

Tips for Parsing RSS and Atom Feeds

Sunday, June 22nd, 2008

Syndication tooling has accumulated many layers, but the underlying formats are simple enough that writing your own parsers and generators is quite easy. A good knowledge of the formats and a few basic tricks will let you customize your feeds to suit your own needs.

Tips for Generating Good Feeds

RSS and Atom are easy to work with, but like any new format, you may encounter some problems in using them. This section attempts to address the most common issues that arise when generating a feed.

  • Distinct Entries — Make sure that aggregators can tell your entries apart, by using different identifiers in rdf:about (RSS 1.0), guid (RSS 2.0) and id (Atom). This will save a lot of headaches down the road.
  • Meaningful Metadata — Try to make the metadata useful on its own; for example, if you only include a short <title>, people may not know what the link is about. By the same token, if you shove an entire article into <description>, it’ll crowd people’s view of the feed, and they’re less likely to stay interested in what you have to say. Generally, you want to put enough into the feed to help someone decide whether they should follow the link.
  • Encoding HTML — Although it’s tempting, refrain from including HTML markup (like <a href="...">, <b> or <p>) in your RSS feed; because you don’t know how it will be presented, doing so can prevent your feed from being displayed correctly. If you need to include a tag in the text of the feed (e.g., the title of an entry is “Ode to <title>”), make sure you escape ampersands and angle brackets (so that it would be “Ode to &lt;title&gt;”).
  • XML Entities — Remember that XML predefines only five entities (&lt;, &gt;, &amp;, &apos; and &quot;), far fewer than HTML; therefore, you won’t have &nbsp;, &copy; and other common entities available. You can define them in the XML, or alternatively just use a character encoding that makes the characters you need available.
  • Character Encoding — Some software generates feeds using Windows character sets, and sometimes mislabels them. The safest thing to do is to encode your feed as UTF-8 and check it by parsing it with an XML parser.
  • Communicating with Viewers — Don’t use entries in your feed to communicate to your users; for example, some feeds have been known to use the <description> to dictate copyright terms. Use the appropriate element or module.
  • Communicating with Machines — Likewise, use the appropriate HTTP status codes if your feed has relocated (usually, 301 Moved Permanently) or is no longer available (410 Gone or 404 Not Found).
  • Making your Feed Cache-Friendly — Successful feeds see a fair amount of traffic because clients poll them often to see if they’ve changed. To support the load, Web Caching can help; see the caching tutorial.
  • Validate — Use the Feed Validator to catch any problems in your feed; it works with RSS and Atom. Also, don’t just run it once; check your feed regularly, so that you can catch transient errors.
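
Several of the tips above (distinct identifiers, escaping markup, UTF-8 encoding, and checking the result with an XML parser) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library's xml.etree.ElementTree (Python 3.8 or later). The function name build_rss, the dictionary keys, and the example.org URLs are assumptions made for this example and do not come from any of the tools listed below.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, entries):
    """Return a small RSS 2.0 document as UTF-8 encoded bytes.

    `entries` is a list of dicts with the (hypothetical) keys
    'guid', 'title', 'link' and 'description'.
    """
    # Distinct Entries: refuse to build a feed with duplicate guids.
    guids = [entry["guid"] for entry in entries]
    if len(set(guids)) != len(guids):
        raise ValueError("every entry needs a distinct guid")

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description

    for entry in entries:
        item = ET.SubElement(channel, "item")
        # Encoding HTML: ElementTree escapes &, < and > in element text
        # when serializing, so plain strings can be assigned directly.
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # Assumes the guid is the entry's permanent URL.
        ET.SubElement(item, "guid").text = entry["guid"]
        ET.SubElement(item, "description").text = entry["description"]

    # Character Encoding: serialize as UTF-8 with an XML declaration.
    data = ET.tostring(rss, encoding="utf-8", xml_declaration=True)

    # Sanity check: parse the output again; a ParseError here would
    # mean the document is not well-formed XML.
    ET.fromstring(data)
    return data


if __name__ == "__main__":
    feed = build_rss(
        "Example feed",
        "http://example.org/",
        "A hypothetical feed used for illustration",
        [
            {
                "guid": "http://example.org/posts/1",
                "title": "Ode to <title>",
                "link": "http://example.org/posts/1",
                "description": "Short summary with an ampersand & a © sign",
            }
        ],
    )
    print(feed.decode("utf-8"))
```

Because the serializer does the escaping, the title is written into the feed as “Ode to &lt;title&gt;”, and the © character is emitted as UTF-8 bytes rather than as an undefined &copy; entity. Re-parsing the output only confirms that it is well-formed XML; it does not replace a run through the Feed Validator.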

Feed Tools

This is an incomplete list of tools for creating feeds and checking that you’ve done so correctly. Note that there are many more libraries that help with parsing feeds; these haven’t been included here because this tutorial focuses on the Webmaster, not on consumers of feeds.

  • xpath2rss — Tool for scraping Web sites using XPath expressions (a method of selecting parts of HTML and XML documents).
  • Site Summaries in XHTML — Online service (also available as an XSLT stylesheet) that uses hints in your HTML to generate a feed.
  • myRSS — An online, third-party automated scraping service. Doesn’t require any special markup.
  • RSS.py — Python library for generating and parsing RSS.
  • ROME — Java library for parsing and generating RSS and Atom feeds, as well as translating between formats.
  • XML::RSS — Perl module for generating and parsing RSS.
  • Online Validator - Check your RSS 1.0, 2.0 and Atom feeds.





© 2006-2008 Ryan Christensen