Can I Use Webz.io to Build my own Datasets for Machine Learning?
Yes. Artificial intelligence (AI) models that rely on machine learning need datasets to train on. Webz.io delivers both historical and real-time data feeds at scale that can power use cases such as predictive analytics engines, natural language processing (NLP) tools, and financial analysis programs. Webz.io users can draw on over 25TB of historical data from the open and archived web. In addition, those considering the service can evaluate it using free datasets that include content retrieved from blog posts, online message boards, and news articles. Webz.io has been used successfully to power models that identify fake news.
How do the Open Web APIs differ from the Google Search API?
The Google Search API was deprecated in April 2017 and was replaced by the Custom Search API, which returns results from user-built Google custom search engines. These custom search engines are commonly embedded in websites and blogs and search only a list of URLs defined by the user. While Google is inarguably the best-known search engine in the world, it no longer provides its general search results in a machine-readable format. By comparison, the Webz.io API scans and extracts data from hundreds of thousands of global data sources, giving organizations access to a wide range of data from blogs, news sources, and other content sources. Webz.io’s APIs are suited to supplying enterprise-class applications with the volume of open web data they require.
What is the scope of your language and geographic coverage?
Webz.io supports 150+ languages across every geographic territory with online access.
Is it possible to query for posts in multiple languages?
Sure. A simple OR Boolean query will do the trick.
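As a rough illustration, a multi-language query could be assembled along these lines. This is a minimal sketch, not the official client: the `language:` field filter and the helper name `build_language_query` are assumptions made for the example, so check the query reference for the exact field names.

```python
def build_language_query(keyword, languages):
    """Combine a quoted keyword with an OR group of language filters."""
    if not languages:
        raise ValueError("at least one language is required")
    # Assumed filter syntax: language:<name>, joined with the OR operator.
    language_clause = " OR ".join(f"language:{lang}" for lang in languages)
    # Parentheses keep the OR group separate from the keyword condition.
    return f'"{keyword}" AND ({language_clause})'


query = build_language_query("electric vehicles", ["english", "spanish", "german"])
print(query)
```

The parentheses matter: without them, the keyword condition would bind only to the first language filter.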
How many keywords can we track per month?
You can enter any Boolean query, with no set limit on the number of tracked keywords. Subscription plans are limited by the number of monthly requests, and you can upgrade your plan at any time.
How many sources do you crawl? Can you share your complete list of sources on your crawling cycle?
It is impossible to provide an up-to-date list of crawled sites. Our site database is dynamic by nature, continuously aggregating new sources. We can tell you, however, that the number of crawled sites runs into the millions, with over 10 million posts indexed daily.
We pride ourselves on our ability to add new sources quickly, typically within a few hours of detection.
Moreover, you can use the API playground (https://webz.io/web-content-api) to confirm coverage for a particular source. Our users frequently send us source requests, often including a long list of sources. If you send us a list of sources, we will include them in our coverage and send you confirmation within a few days.
Can I ask that you do not log my queries?
By all means! Just let us know and we'll disable logging for your subscription.
Does Webz.io provide the full text from the pages you crawl?
Yes. We always provide the full text of a news article, comment, blog post, or review.
Does the API support wildcard expressions?
Yes. The query syntax is based on Elasticsearch query string syntax and wildcards are supported.
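In Elasticsearch query string syntax, `?` stands for a single character and `*` for zero or more characters. The sketch below only approximates that behaviour locally with Python's `fnmatch` module, which gives `?` and `*` the same meaning; it does not call the API, and the sample terms and the helper name `matching_terms` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical sample terms, used only to show what each pattern would match.
terms = ["analytics", "analysis", "analyst", "banal"]


def matching_terms(pattern, candidates):
    """Return the candidates that match a ?/* wildcard pattern."""
    return [term for term in candidates if fnmatchcase(term, pattern)]


# '*' matches any run of characters, including none.
print(matching_terms("analy*", terms))
# '?' matches exactly one character.
print(matching_terms("analys?s", terms))
```

Trying a pattern against a few sample terms like this can help catch overly broad wildcards before they use up monthly requests.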