Firehose is an enterprise-level data feed that provides access to all the data Webz.io crawls from Open Web verticals, across all territories and in every language.
Unlike the other APIs, where the retrieved data is pre-filtered, Firehose provides all the data from News, Blogs, Discussions and Reviews. Posts are formatted as XML, compressed, and uploaded as 10-megabyte ZIP files to a dedicated FTP site, where new files become available for retrieval every other minute.
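As a rough illustration of working with this delivery format, the sketch below reads every XML file inside a ZIP archive that has already been downloaded and collects one field per post. The element names `posts`, `post`, and `title`, and the helper names, are hypothetical placeholders chosen for this example; the actual field layout is described in the Firehose Output section. The sample archive is built in memory so the snippet runs without FTP access.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_xml_roots(zip_bytes):
    """Yield (member_name, root_element) for each XML file in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip anything that is not an XML member
            with archive.open(name) as handle:
                yield name, ET.parse(handle).getroot()


def build_sample_archive():
    """Build a tiny in-memory ZIP. The XML element names are hypothetical."""
    xml_text = (
        "<posts>"
        "<post><title>Hello</title></post>"
        "<post><title>World</title></post>"
        "</posts>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("sample.xml", xml_text)
    return buffer.getvalue()


def collect_titles(zip_bytes):
    """Return the text of every hypothetical <title> under each <post>."""
    titles = []
    for _name, root in iter_xml_roots(zip_bytes):
        for post in root.iter("post"):
            titles.append(post.findtext("title"))
    return titles


if __name__ == "__main__":
    print(collect_titles(build_sample_archive()))
```

In practice the bytes would come from a file retrieved from the dedicated FTP site rather than from `build_sample_archive`, and the element names would need to be replaced with the real ones.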
Firehose access is not included in standard or evaluation subscription plans.
If you require Open Web Data at scale, Webz.io Data Experts would love to hear from you!
Open Web Firehose Scope
The Open Web Firehose section covers:
Firehose Section | What can I see here? |
---|---|
Overview and Examples | A quick overview of the Firehose feeds plus this outline. |
Access | Understand how to access the Firehose dedicated FTP site and the Data Structure. |
Data Retrieval | Do read through this section to understand the concepts of de-duplication and identifying unique and newly crawled documents. |
Firehose Output | A layout of the fields available in the Firehose output XML files. |
Frequently Asked Questions | Just that |