Firehose Access and Data Structure

Your unique Firehose link gives unfiltered access to all crawled data for a single
vertical data feed: News, Blogs, Discussions, or Reviews.

The URL looks like this:

The link points to a directory listing of ZIP files. Each file is capped at 10 MB, and the listing is updated every other minute as newly crawled data becomes available.
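
Because the listing keeps growing, a consumer usually needs to work out which ZIP files it has not fetched yet. The following is a minimal Python sketch of that step, not an official client. It assumes the directory listing is served as an HTML page with one anchor tag per ZIP file, which this document does not confirm. The names ZipLinkParser and new_zip_urls, and the example.invalid address in the usage note, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkParser(HTMLParser):
    """Collects href values ending in .zip from an HTML directory listing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags can carry links to the archives.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".zip"):
                self.links.append(value)


def new_zip_urls(listing_html, base_url, seen):
    """Return absolute URLs of ZIP files not yet in `seen`, in listing order.

    `seen` is a set owned by the caller; it is updated in place so that
    repeated polls only report files that appeared since the last call.
    """
    parser = ZipLinkParser()
    parser.feed(listing_html)
    fresh = []
    for href in parser.links:
        url = urljoin(base_url, href)  # resolve relative links against the listing URL
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

A caller would fetch the listing page on a timer, pass its text together with the Firehose link (for example a placeholder such as https://example.invalid/feed/) to new_zip_urls, and download only the URLs it returns. Keeping the seen set on disk between runs avoids fetching the same archive again after a restart.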


Each ZIP file contains a set of XML files, each of which corresponds to a few posts.
Here's a sample: