Overview and Examples

Firehose is an enterprise-level data feed that provides access to all the data Webz.io crawls from Open Web verticals, across all territories and in every language.

Unlike the other APIs, where the retrieved data is pre-filtered, the Firehose solution provides all the data from News, Blogs, Discussions and Reviews. Posts are formatted as XML, compressed into 10-megabyte ZIP files, and made available for retrieval every other minute on a dedicated FTP site.
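As a rough illustration of consuming this kind of delivery, the sketch below opens a ZIP archive and parses every XML file inside it using only the Python standard library. It assumes the archive has already been downloaded from the FTP site. The helper name `iter_xml_roots`, the file name `sample.xml`, and the `<post>`/`<title>` elements are placeholders invented for this example, not the actual Firehose layout; the real fields are described in the Firehose Output section.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_xml_roots(zip_bytes):
    """Yield (member_name, root_element) for every .xml member of a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip anything that is not an XML file
            with archive.open(name) as member:
                yield name, ET.parse(member).getroot()


# Build a tiny stand-in archive in memory so the sketch runs without FTP access.
# The <post>/<title> element names are placeholders, not the real Firehose schema.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.xml", "<post><title>Hello</title></post>")

parsed = list(iter_xml_roots(buffer.getvalue()))
for name, root in parsed:
    print(name, root.tag, root.findtext("title"))
```

In practice the bytes passed to `iter_xml_roots` would come from a downloaded Firehose archive rather than an in-memory sample, and the element lookups would follow the documented output fields.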


Firehose access is not included in standard or evaluation subscription plans.

If you require Open Web data at scale, Webz.io Data Experts would love to hear from you!

Open Web Firehose Scope

The Open Web Firehose section covers:

Firehose Section | What can I see here?
Overview and Examples | A quick overview of the Firehose feeds plus this outline.
Access | Understand how to access the Firehose dedicated FTP site and the Data Structure.
Data Retrieval | Do read through this section to understand the concepts of de-duplication and identifying unique and newly crawled documents.
Firehose Output | A layout of the fields available on the firehose output XML files.
Frequently Asked Questions | Just that