feedeR: Reading RSS and Atom Feeds from R
I’m working on a project in which I need to systematically parse a number of RSS and Atom feeds from within R. I was somewhat surprised to find that no package currently exists on CRAN to handle this task. So this presented the opportunity for a bit of DIY.
You can find the fruits of my morning’s labour here.
Installing and Loading
The package is currently hosted on GitHub.
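A minimal sketch of installing from GitHub and loading the package. It assumes you use devtools and that the repository lives at DataWookie/feedeR; the path is not stated above, so treat it as an assumption and adjust if it differs.

```r
# Install devtools first if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Assumption: the package repository is DataWookie/feedeR on GitHub.
devtools::install_github("DataWookie/feedeR")

# Load the package into the current session.
library(feedeR)
```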
Reading an RSS Feed
Although Atom is supposed to be a better format from a technical perspective, RSS is far more widely used. The vast majority of blogs provide an RSS feed. We’ll look at the feed exposed by R-bloggers.
There are three metadata elements pertaining to the feed.
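Something along these lines should read the feed and show its metadata. The feed URL is an assumption (R-bloggers has published its feed via FeedBurner), and the code needs a network connection.

```r
library(feedeR)

# Assumption: this is the R-bloggers feed URL; substitute another if it has moved.
rbloggers <- feed.extract("https://feeds.feedburner.com/RBloggers")

# The result is a list: feed-level metadata plus an items element.
names(rbloggers)

# Keep only the metadata, leaving the entries aside for now.
metadata <- rbloggers[names(rbloggers) != "items"]
str(metadata)
```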
The actual entries on the feed are captured in the items element. For each entry the title, date and link are captured. There are often more fields available for each entry, but these three are generally present.
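A quick way to look at the entries might be the following sketch. It assumes the same feed URL as before and that the common fields are named title, date and link; it selects only those that are actually present.

```r
library(feedeR)

# Assumption: this is the R-bloggers feed URL.
rbloggers <- feed.extract("https://feeds.feedburner.com/RBloggers")

items <- rbloggers$items

# How many entries came back, and which fields they carry.
nrow(items)
names(items)

# Assumption: the three common fields are named title, date and link.
core <- intersect(c("title", "date", "link"), names(items))
head(items[, core])
```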
Reading an Atom Feed
Atom feeds are definitely in the minority, but this format is still used by a number of popular sites. We’ll look at the feed from The R Journal.
The same three elements of metadata are present.
Atom feeds do not appear to consistently provide the date on which each of the entries was originally published. The title and link fields are always present though!
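Because dates can be missing, downstream code should not assume that every entry has one. The sketch below uses base R only and a small hand-made table of hypothetical entries (not real feed output) to flag undated entries and sort the rest newest first.

```r
# Hypothetical entries shaped like an items table with title, date and link.
items <- data.frame(
  title = c("First entry", "Second entry", "Third entry"),
  date  = as.POSIXct(c("2016-06-01 09:00:00", NA, "2016-06-15 12:30:00"),
                     tz = "UTC"),
  link  = c("https://example.org/1", "https://example.org/2",
            "https://example.org/3"),
  stringsAsFactors = FALSE
)

# Flag entries without a publication date rather than dropping them silently.
items$has_date <- !is.na(items$date)

# Sort newest first, keeping undated entries at the end.
items_sorted <- items[order(items$date, decreasing = TRUE, na.last = TRUE), ]

print(items_sorted)
```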
I’m still testing this across a selection of feeds. If you find a feed that breaks the package, please let me know and I’ll debug as necessary.