Filters

Use the following filters to focus only on the data you need.

📘
Escaping reserved characters
To use any of the reserved characters literally in your query (rather than as an operator), escape it with a leading backslash.
The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
Failing to escape these characters correctly can cause a syntax error that prevents your query from running.
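
The escaping rule above can be sketched as a small helper. This is a minimal illustration, not an official client: the function name `escape_reserved` is hypothetical, and treating `&&` and `||` as single operators that take one leading backslash is an assumption.

```python
import re

# Reserved operators from the list above. The two-character operators && and ||
# are matched first so that each receives a single leading backslash (assumption).
_RESERVED = re.compile(r'(&&|\|\||[+\-=><!(){}\[\]^"~*?:\\/])')


def escape_reserved(value: str) -> str:
    """Prefix every reserved operator found in `value` with a backslash."""
    return _RESERVED.sub(r"\\\1", value)


print(escape_reserved("https://finance.yahoo.com/"))
# https\:\/\/finance.yahoo.com\/
```

Apply the helper only to the value part of a filter (for example the URL), not to the whole query, otherwise the colon that separates the filter name from its value would be escaped as well.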

Post Content Filters

Parameter	Description	Example
language	The language of the post. The default is any.	Find posts in French or Italian: (language:french OR language:italian) See Supported Languages under 'References'
author	Return posts written by a specific author	Find posts written by Admin: author:Admin
text	A textual Boolean query describing the keywords that should (or should not) appear in the post text.	text:(apple OR android)
external_links	Search for posts that include links to another site.	Search for posts that link to LinkedIn (Note that both the slashes and colons are preceded by a backslash):
is_first	A Boolean parameter that limits the search to first posts only (excluding comments).	is_first:true
published	A timestamp (in milliseconds) enabling you to filter posts that were published before or after a certain date/time. Here is a Timestamp/Date converter	Return posts published after Thu, 30 Mar 2017 09:16:28 GMT: published:>1490865388000 Return posts published within the last hour: published:>now-1h
num_chars	Search for posts with a minimum or maximum number of characters in the text. Excluded languages: Japanese, Korean, Chinese.	Return posts with 1800 or more characters in the text field: num_chars:>=1800
sentiment	Filter news articles by their emotional tone, categorized as positive, negative, or neutral. Supported for English, Spanish, French, Italian, Catalan, Portuguese, Chinese, Chineset, Arabic, German, Russian and Hindi news articles, sourced from the top 200,000 globally ranked domains.	sentiment:positive returns only positive news articles
category	The "Category" filter allows users to refine their news feed based on a predetermined list of thematic categories. The category values are based on the top IPTC categories: Arts, Culture and Entertainment; Crime, Law and Justice; Disaster and Accident; Economy, Business and Finance; Education; Environment; Health; Human Interest; Labor; Lifestyle and Leisure; Politics; Religion and Belief; Science and Technology; Social Issue; Sport; War, Conflict and Unrest; Weather. Supported for news articles in English, Spanish, French, Italian, Catalan, Portuguese, Chinese, Chineset, Arabic, German, Russian and Hindi, sourced from the top 200,000 globally ranked domains.	category:sport returns only news articles dealing with sports
topic	The "Topic" filter allows users to filter news articles based on categories derived from IPTC levels 2 and 3, providing a detailed and structured approach to content classification. Using AI, Webz.io assigns topics under each IPTC top category, ensuring comprehensive coverage of the article's context. Articles may have multiple topics to reflect their multifaceted nature. In addition to IPTC-based topics, Webz.io enhances contextual relevance by including supplementary topics like "layoffs" or "mergers and acquisitions" when applicable. This approach ensures users receive more nuanced and actionable insights from the filtered content. Click here for a full list of supported topics. Supported for news articles in English, Spanish, French, Italian, Catalan, Portuguese, Chinese, Chineset, Arabic, German, Russian and Hindi, sourced from the top 200,000 globally ranked domains.	topic:layoffs
webz_reporter	Return online news articles that are generated from factual information extracted from selected news websites using large language model (LLM) technology. Setting this parameter to true (i.e., webz_reporter:true) limits the search to articles generated using the WebzReporter feature. Note that to access the WebzReporter feature, the webz_reporter GET parameter must also be set to true, as it is false by default.	webz_reporter:true
ai_allow	If true, returns articles that may be used for any purpose, including LLM training. If false, returns articles that may not be used for LLM training (other uses are allowed). On average, 98% of Webz's open web data is allowed for any purpose, and 2% is disallowed only for LLM training.	ai_allow:true
has_canonical	The has_canonical filter identifies posts that include a canonical HTML tag pointing to another website. This is often used to indicate that the article is syndicated from the original source on a different website.	has_canonical:true
breaking	The breaking filter is a boolean filter that indicates whether a news article was published with a latency of less than one hour and originated from a popular website (Top News). If set to `true`, it returns only articles that were detected within an hour of their publication time, ensuring access to the most up-to-date and rapidly emerging news.	breaking:true
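
The `published` filter above expects a timestamp in milliseconds. As a minimal sketch, the conversion from a date to that value can be done as follows; the helper names `to_epoch_millis` and `published_after` are hypothetical and only illustrate the arithmetic.

```python
import calendar
from datetime import datetime, timezone


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # utctimetuple() normalizes to UTC; timegm() returns whole seconds as an int.
    seconds = calendar.timegm(moment.utctimetuple())
    return seconds * 1000 + moment.microsecond // 1000


def published_after(moment: datetime) -> str:
    """Build a `published:>` filter for the given moment."""
    return f"published:>{to_epoch_millis(moment)}"


print(published_after(datetime(2017, 3, 30, 9, 16, 28, tzinfo=timezone.utc)))
# published:>1490865388000
```

The same conversion applies to the other millisecond-based filters on this page, such as `crawled` and `thread.published`.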

Site Filters

Parameter	Description	Example
site_type	The type of sites to search in (the default is any). Available site types are: News, Blogs, Discussions. Without this filter, all site types are included.	Only news: site_type:news News & Blogs: (site_type:news OR site_type:blogs)
site	Limit the results to a specific site or sites.	Limit the results to posts from Yahoo or CNN: (site:yahoo.com OR site:cnn.com)
thread.country	The article's country of origin is determined by its domain, subdomain, or site section. This is established through the country indicators in the web address (e.g. *.co.fr) or by analyzing the country that generates the most traffic.	Return posts from sites from Hong Kong: thread.country:HK To get a full country code list, visit CountryCode.org
site_suffix	Limit the results to a specific site suffix	Return posts from sites where their top level domain (TLD) ends with .fr: site_suffix:fr
site_full	Filter sites based on the domain and, optionally, the subdomain.	Return posts from Yahoo Answers: site_full:answers.yahoo.com
site_category	Limit the results to posts originating from sites in one or more categories. This filter is also used to filter top news for 59 countries. List Site Category Values	Return posts from sites categorized as sports or games related: (site_category:sports OR site_category:games)
site_section	Get all the posts of a specific site section (note that you must escape the http:// part of the URL like this: http\:\/\/).	site_section:https\:\/\/finance.yahoo.com\/
domain_rank	A rank that specifies how popular a domain is	Search for posts from the top 1,000 sites: domain_rank:<1000
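
Several examples in these tables combine alternative values of one filter with OR inside parentheses. A minimal sketch of building such a group is shown below; the helper name `any_of` is hypothetical, and values are inserted as given, so any reserved characters in them must already be escaped.

```python
from typing import Iterable


def any_of(field: str, values: Iterable[str]) -> str:
    """Join `field:value` terms with OR, wrapping multiple terms in parentheses."""
    terms = [f"{field}:{value}" for value in values]
    if not terms:
        raise ValueError("at least one value is required")
    if len(terms) == 1:
        # A single term needs no grouping.
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


print(any_of("site", ["yahoo.com", "cnn.com"]))
# (site:yahoo.com OR site:cnn.com)
print(any_of("site_type", ["news"]))
# site_type:news
```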

Thread Filters

A thread holds global information about a page as a whole. A thread can contain multiple posts grouped together.

Parameter	Description	Example
thread.title	A textual Boolean query describing the keywords that should (or should not) appear in the thread title.	Search for posts containing the word "glass" and not "metal" in their title: thread.title:glass -thread.title:metal
thread.section_title	A textual Boolean query describing the keywords that should (or should not) appear in a site’s section where the post was published	Search for the posts containing the word food only under sections with a title that contains the word "restaurants": food AND thread.section_title:restaurants
thread.site_title	The title text shown in the browser tab for the site's home page.	Search for the posts containing the word sport only under sites with a tab title that contains the word "basketball": sport AND thread.site_title:basketball
thread.url	Get all the posts of a specific thread (note that you must escape the http:// part of the URL like this: http\:\/\/).	thread.url:
thread.published	A timestamp (in milliseconds) filtering threads that were published before or after a certain date/time. Here is a Timestamp/Date converter	Return threads published after Thu, 30 Mar 2017 09:16:28 GMT: thread.published:>1490865388000
crawled	A timestamp (in milliseconds) filtering posts that were crawled before or after a certain date/time. Here is a Timestamp/Date converter	Return posts crawled after Thu, 30 Mar 2017 09:16:28 GMT: crawled:>1490865388000
participants_count	The `participants_count` filter in Webz.io retrieves threads based on unique participants rather than total replies. If all comments come from one author, `participants_count` is `1`. This helps identify real discussions, filter out monologues, and track engagement.	`participants_count:>3` finds threads with at least four participants, while `participants_count:1` shows single-author threads.
licensing_agency	Get posts provided by a licensed news agency. Supported agencies: NLA, NCA and CFC.	Search for posts from the NLA agency: licensing_agency:NLA
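
The difference between unique participants and total replies, as described for `participants_count`, can be illustrated with a short sketch. This only mirrors the counting rule stated in the table; it is not Webz.io's implementation, and the author names are made up.

```python
from typing import Iterable


def count_participants(authors: Iterable[str]) -> int:
    """Count unique authors in a thread, regardless of how many posts each wrote."""
    return len(set(authors))


monologue = ["alice", "alice", "alice"]                 # three posts, one author
discussion = ["alice", "bob", "carol", "dave", "bob"]   # five posts, four authors

print(count_participants(monologue))        # 1
print(count_participants(discussion))       # 4
print(count_participants(discussion) > 3)   # True, so participants_count:>3 would match
```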

Social Filters

Parameter	Description	Example
performance_score	A virality score for news and blog posts only. The score ranges from 0 to 10. A score of 0 means that the post didn't do well: it was rarely or never shared. A score of 10 means that the post was "on fire", being shared thousands of times on Facebook.	Search for news or blog posts with a performance score higher than 8 (highly viral): apple performance_score:>8
social.facebook.likes	Return posts filtered by the number of Facebook likes.	Return posts with more than 10 Facebook likes: social.facebook.likes:>10
social.facebook.shares	Return posts filtered by the number of Facebook shares.	Return posts with more than 10 Facebook shares: social.facebook.shares:>10
social.facebook.comments	Return posts filtered by the number of Facebook comments.	Return posts with more than 10 Facebook comments: social.facebook.comments:>10
social.vk.shares	Return posts filtered by the number of VK shares.	Return posts with more than 10 VK shares: social.vk.shares:>10

Entities & Entity Sentiment Filters

We extract entities such as Persons, Organizations and Locations from all the English news and blog posts that we crawl. We detect the sentiment attached to Persons and Organizations (not Locations) from the top news outlets.

Parameter	Description	Example
person	Filter by person name. You should use this filter only for disambiguation, otherwise you should use a simple keyword search.	person:"barack obama"
organization	Use this filter to search by the name of any organization or company. It’s especially helpful for disambiguation when a keyword may refer to multiple entities (e.g., "Apple" the company vs. the fruit). If you're looking specifically for publicly traded companies, the organization field also includes stock tickers and stock exchanges with the following fields: ticker (filter by the company's stock ticker); exchange (filter by the stock exchange where the company is listed).	Name example: organization:"apple" Ticker example: ticker:"aapl" Stock exchange example: exchange:"nasdaq"
location	Filter by location name. Important: do not confuse this with the country filter. If you want to search for sites from a specific country, use the thread.country parameter (explained above).	location:"germany"
entity.sentiment	Find an entity with a sentiment context attached to it.	person.positive:"obama" organization.negative:"apple" organization.neutral:"google"
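
The entity sentiment examples above follow the pattern `<entity type>.<sentiment>:"<name>"`. A minimal sketch of a builder for that pattern follows; the function name `entity_sentiment` is hypothetical, and the allowed entity types reflect the statement above that sentiment is detected for Persons and Organizations but not Locations.

```python
# Sentiment is attached to persons and organizations only (not locations).
ENTITY_TYPES = {"person", "organization"}
SENTIMENTS = {"positive", "negative", "neutral"}


def entity_sentiment(entity_type: str, sentiment: str, name: str) -> str:
    """Build an entity sentiment filter such as person.positive:"obama"."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unsupported entity type: {entity_type!r}")
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unsupported sentiment: {sentiment!r}")
    return f'{entity_type}.{sentiment}:"{name}"'


print(entity_sentiment("person", "positive", "obama"))
# person.positive:"obama"
```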

Syndication (grouping) Filters

Syndicated content is content that is classified as similar to other content in the same group, and is therefore tagged as such.

Each syndicated cluster is tagged with a unique ID and is based on the SimHash algorithm.

Parameter	Description	Example
syndication.syndicated	Filters in or out syndicated (similar) content	syndication.syndicated:true finds all the similar/duplicated content. syndication.syndicated:false excludes it, which helps filter out articles or posts that are very similar or duplicated across different sites.
syndication.syndicate_id	Filter by unique ID of the group (generated by the system)	syndication.syndicate_id:7f3de401c5c89f3ac579918dd20b1f05ba6051b3
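
For readers unfamiliar with SimHash, the general idea can be sketched as follows. This is a textbook-style illustration only and says nothing about how Webz.io tokenizes text, weights tokens, or chooses thresholds; the use of MD5 as the per-token hash is an assumption made for reproducibility.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Compute a SimHash fingerprint; `bits` must be a multiple of 8, at most 128."""
    counts = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


original = simhash("Markets rallied after the central bank held rates steady")
reprint = simhash("Markets rallied after the central bank held rates steady today")
print(hamming_distance(original, reprint))
```

Texts that share most of their tokens tend to produce fingerprints with a small Hamming distance, which is what makes it possible to group near-duplicates into one cluster.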

Trust Filters

Parameter	Description	Example
trust.category	The `trust.category` filter in Webz.io allows users to refine their search by filtering content based on the trustworthiness of the source. This helps users assess the reliability of the data they retrieve. Possible values include: `fake_news` (returns posts from sites recognized as known fake news sources); `trusted_news` (returns posts from sites recognized as known trusted news sources); `satirical_news` (returns posts from sites known for publishing satirical content). By applying this filter, users can either include or exclude certain types of sources to improve the trustworthiness of their dataset. Please note: The classification of fake news, trusted news and satirical news is conducted at the domain level and NOT at the article level.	trust.category:fake_news returns posts from sites that are known fake news sites.
trust.top_news	The `trust.top_news` filter in Webz.io enables users to narrow their search by focusing on content from top news sites. Users can either search across all top news sources or limit results to a specific country by adding the relevant country suffix (See full list).	`trust.top_news:*` or `trust.top_news:top_news` returns posts from all top news sites. `trust.top_news:top_news_us` returns posts from top news sites in the US.
trust.bias	The `trust.bias` filter in Webz.io enables users to narrow their search results by filtering content according to political bias. Available options: left, center, right.	`trust.bias:right` returns posts from news sites recognized for having a right-leaning political bias.
trust.source (newsroom,gov,local)	Use this filter to search news sites by trust source. The trust source type (trust.source.type) can be one of the following: newsroom, gov_news, local_news. In addition, it is possible to filter using the following fields: trust.source.city, trust.source.state, trust.source.country, trust.source.domain_type, trust.source.agency, trust.source.organization_name.	`trust.source.type:newsroom` returns posts from company newsrooms (such as press releases, investor relations). `trust.source.type:gov_news` returns posts from the news sections of government sites. `trust.source.type:local_news` returns news related to local states/cities (e.g. foxla.com/news). `trust.source.city:"new york"` returns local or gov posts related to the selected city.
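
The `trust.top_news` examples above suggest a simple pattern for the country suffix. A minimal sketch, assuming the suffix is a lowercase country code appended after `top_news_` (inferred only from the `top_news_us` example; the full list referenced in the table is authoritative), with a hypothetical helper name:

```python
from typing import Optional


def top_news(country_code: Optional[str] = None) -> str:
    """Build a trust.top_news filter, optionally limited to one country."""
    if country_code is None:
        # No suffix: all top news sites.
        return "trust.top_news:top_news"
    return f"trust.top_news:top_news_{country_code.lower()}"


print(top_news())      # trust.top_news:top_news
print(top_news("US"))  # trust.top_news:top_news_us
```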