Filters

Use the following filters to focus only on the data you need.


Escaping reserved characters

If you need to use any of the characters that function as operators as literal characters in your query (and not as operators), escape each of them with a leading backslash. For instance, to search for `external_links:https://www.linkedin.com/*`, you would need to write your query as:

`external_links:https\:\/\/www.linkedin.com\/*`

The reserved characters are: `+ - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /`

Failing to escape these characters correctly can cause a syntax error that prevents your query from running.
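If you build queries in code, the escaping rule can be automated. The following is a minimal Python sketch, assuming you only need to prefix each reserved character listed above with a backslash; the names `escape_reserved` and `keep` are hypothetical helpers, not part of the API. It also escapes single `&` and `|` characters, which is a conservative superset of escaping the `&&` and `||` operators.

```python
# Reserved characters from the list above. Single "&" and "|" are included so
# that "&&" and "||" are escaped character by character.
RESERVED = set('+-=&|><!(){}[]^"~*?:\\/')


def escape_reserved(value: str, keep: str = "") -> str:
    """Prefix every reserved character in `value` with a backslash.

    Characters listed in `keep` are left unescaped so they still act as
    operators (for example, "*" kept as a wildcard). Hypothetical helper.
    """
    out = []
    for ch in value:
        if ch in RESERVED and ch not in keep:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Escape the URL, but keep the trailing "*" as a wildcard operator.
    query = "external_links:" + escape_reserved("https://www.linkedin.com/*", keep="*")
    print(query)  # external_links:https\:\/\/www.linkedin.com\/*
```

Escaping the value separately from the field name matters: the colon after `external_links` must stay unescaped because it separates the field from the value.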

Post Content Filters

| Parameter | Description | Example |
| --- | --- | --- |
| language | The language of the post. The default is any language. See Supported Languages under "References". | Find posts in French or Italian: `(language:french OR language:italian)` |
| author | Returns posts written by a specific author. | Find posts written by Admin: `author:Admin` |
| text | A textual Boolean query describing the keywords that should (or should not) appear in the post text. | `text:(apple OR android)` |
| has_video | A Boolean parameter. `has_video:true` returns only posts that contain a video; `has_video:false` excludes posts with videos. Without this filter, the presence of videos is ignored. | `has_video:true` |
| external_links | Searches for posts that include links to another site. | Find posts that link to LinkedIn (note that both the colon and the slashes in the URL are preceded by a backslash): `external_links:https\:\/\/www.linkedin.com\/*` |
| is_first | A Boolean parameter. When true, searches only the first post of each thread and excludes the comments. | `is_first:true` |
| published | A timestamp in milliseconds (Unix epoch time) for filtering posts published before or after a certain date and time. A timestamp/date converter can help you build these values. | Return posts published after Thu, 30 Mar 2017 09:16:28 GMT: `published:>1490865388000`. Return posts published within the last hour: `published:>now-1h` |
| num_chars | Searches for posts with a minimum or maximum number of characters in the text. Not supported for Japanese, Korean, or Chinese. | Return posts with 1800 or more characters in the `text` field: `num_chars:>=1800` |
| sentiment | Filters news articles by their emotional tone: positive, negative, or neutral. Supported for English, Spanish, French, Italian, Catalan, and Portuguese news articles. | `sentiment:positive` returns only positive news articles. |
| category | Refines results using a predetermined list of thematic categories based on the top-level IPTC categories: Arts, Culture, and Entertainment; Crime, Law and Justice; Disaster and Accident; Economy, Business and Finance; Education; Environment; Health; Human Interest; Labor; Lifestyle and Leisure; Politics; Religion and Belief; Science and Technology; Social Issue; Sport; War, Conflict and Unrest; Weather. Supported for English, Spanish, French, Italian, Catalan, and Portuguese news articles. | `category:sport` returns only news articles about sports. |
| webz_reporter | Returns online news articles generated with large language model (LLM) technology from factual information extracted from selected news websites. Setting `webz_reporter:true` limits the search to articles generated by the WebzReporter feature. To access WebzReporter, you must also set the `webz_reporter` GET parameter to true, because it is false by default. | `webz_reporter:true` |
| ai_allow | If true, returns articles that are allowed for general use, including LLM training. If false, returns articles that are disallowed only for LLM training (all other uses are allowed). On average, 98% of Webz's open web data is allowed for any purpose, while 2% is disallowed only for LLM training. | `ai_allow:true` |
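The `published` filter takes a timestamp in milliseconds, and several post content filters are usually combined in one query. The following Python sketch converts a timezone-aware date to epoch milliseconds and joins filters with explicit `AND`/`OR`; the helper names `to_millis`, `any_of`, and `post_query` are hypothetical, and the combined query only illustrates the syntax from the table.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    # Integer timedelta division avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def any_of(field: str, values: list[str]) -> str:
    """Build "(field:a OR field:b ...)" from a list of values."""
    return "(" + " OR ".join(f"{field}:{v}" for v in values) + ")"


def post_query(languages: list[str], published_after: datetime, min_chars: int) -> str:
    """Long, video-free posts in the given languages, published after a date."""
    parts = [
        any_of("language", languages),
        f"published:>{to_millis(published_after)}",
        f"num_chars:>={min_chars}",
        "has_video:false",
    ]
    return " AND ".join(parts)


if __name__ == "__main__":
    after = datetime(2017, 3, 30, 9, 16, 28, tzinfo=timezone.utc)
    print(to_millis(after))  # 1490865388000
    print(post_query(["french", "italian"], after, 1800))
```

Rejecting naive datetimes is a design choice: a naive value would be interpreted in an unknown local time zone, silently shifting the filter by hours.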

Site Filters

| Parameter | Description | Example |
| --- | --- | --- |
| site_type | The type of sites to search. Available site types are News, Blogs, and Discussions. The default is any: without this filter, all site types are included. | Only news: `site_type:news`. News and blogs: `(site_type:news OR site_type:blogs)` |
| site | Limits the results to a specific site or sites. | Limit the results to posts from Yahoo or CNN: `(site:yahoo.com OR site:cnn.com)` |
| thread.country | The article's country of origin, determined by its domain, subdomain, or site section. It is established from the country indicators in the web address (for example, `*.co.fr`) or by identifying the country that generates the most traffic. For a full list of country codes, see CountryCode.org. | Return posts from sites from Hong Kong: `thread.country:HK` |
| site_suffix | Limits the results to a specific site suffix. | Return posts from sites whose top-level domain (TLD) is .fr: `site_suffix:fr` |
| site_full | Filters sites by domain and, optionally, by subdomain. | Return posts from Yahoo Answers: `site_full:answers.yahoo.com` |
| site_category | Limits the results to posts from sites in one or more site categories. This filter is also used to filter top news for 59 countries. See List Site Category Values for the available values. | Return posts from sites categorized as sports- or games-related: `(site_category:sports OR site_category:games)` |
| site_section | Returns all the posts of a specific site section. You must escape the `http://` part of the URL like this: `http\:\/\/`. | `site_section:https\:\/\/finance.yahoo.com\/` |
| domain_rank | A rank that indicates how popular a domain is; a lower rank means a more popular domain. | Search for posts from the top 1,000 sites: `domain_rank:<1000` |
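Site filters are often combined, for example restricting site types and specific sites while capping the domain rank. The following Python sketch assumes the `(field:a OR field:b)` grouping and explicit `AND` shown in the examples above; `any_of`, `site_filters`, and `site_section_filter` are hypothetical helpers, and the URL escaping handles only `:` and `/`, as the `site_section` description requires.

```python
def any_of(field, values):
    """Join values into "(field:a OR field:b ...)" as in the site_type and site examples."""
    return "(" + " OR ".join(f"{field}:{v}" for v in values) + ")"


def site_filters(site_types=(), sites=(), max_rank=None):
    """Combine optional site filters with AND; empty inputs are skipped."""
    clauses = []
    if site_types:
        clauses.append(any_of("site_type", site_types))
    if sites:
        clauses.append(any_of("site", sites))
    if max_rank is not None:
        # A lower domain_rank means a more popular domain.
        clauses.append(f"domain_rank:<{max_rank}")
    return " AND ".join(clauses)


def site_section_filter(section_url):
    """Escape ':' and '/' in a section URL (other reserved characters are not handled)."""
    escaped = section_url.replace(":", "\\:").replace("/", "\\/")
    return f"site_section:{escaped}"


if __name__ == "__main__":
    print(site_filters(["news", "blogs"], ["yahoo.com", "cnn.com"], 1000))
    print(site_section_filter("https://finance.yahoo.com/"))
```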

Thread Filters

A thread contains global information about the whole page and its content. A thread can contain multiple posts grouped together.

| Parameter | Description | Example |
| --- | --- | --- |
| thread.title | A textual Boolean query describing the keywords that should (or should not) appear in the thread title. | Search for posts containing the word "glass" but not "metal" in their title: `thread.title:glass -thread.title:metal` |
| thread.section_title | A textual Boolean query describing the keywords that should (or should not) appear in the title of the site section where the post was published. | Search for posts containing the word "food" only under sections whose title contains the word "restaurants": `food AND thread.section_title:restaurants` |
| thread.url | Returns all the posts of a specific thread. You must escape the `http://` part of the URL like this: `http\:\/\/`. | `thread.url:"https\:\/\/www.rt.com\/news\/487006-lavrov-who-finance-stop-unfair\/"` |
| thread.published | A timestamp in milliseconds for filtering threads published before or after a certain date and time. A timestamp/date converter can help you build these values. | Return threads published after Thu, 30 Mar 2017 09:16:28 GMT: `thread.published:>1490865388000` |
| crawled | A timestamp in milliseconds for filtering posts crawled before or after a certain date and time. | Return posts crawled after Thu, 30 Mar 2017 09:16:28 GMT: `crawled:>1490865388000` |
| image_text | Finds threads containing images that include the requested text. Partial values are allowed. | `image_text:cola` returns threads that contain images with the word "cola" in the image. |
| image_label | Finds posts containing images with a certain object in them. The object is represented by a label such as person, Wedding, or Car. | `image_label:person` |
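A sketch of two thread filters built in Python follows: a title filter that requires some words and excludes others (joined with spaces exactly as in the `thread.title` example), and a quoted `thread.url` filter. The helper names are hypothetical; escaping backslashes and double quotes before `:` and `/` is an assumption about what keeps the quoted phrase well-formed.

```python
def title_filter(include, exclude=()):
    """Require each word in `include` and exclude each word in `exclude` from thread titles."""
    parts = [f"thread.title:{word}" for word in include]
    parts += [f"-thread.title:{word}" for word in exclude]
    return " ".join(parts)


def thread_url_filter(url):
    """Quote a thread URL, escaping backslashes, quotes, ':' and '/'."""
    escaped = (
        url.replace("\\", "\\\\")  # backslashes first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace(":", "\\:")
        .replace("/", "\\/")
    )
    return f'thread.url:"{escaped}"'


if __name__ == "__main__":
    print(title_filter(["glass"], ["metal"]))  # thread.title:glass -thread.title:metal
    print(thread_url_filter("https://www.rt.com/news/487006-lavrov-who-finance-stop-unfair/"))
```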

Social Filters

| Parameter | Description | Example |
| --- | --- | --- |
| performance_score | A virality score for news and blog posts only, ranging from 0 to 10. A score of 0 means the post did not do well: it was rarely or never shared. A score of 10 means the post was "on fire", shared thousands of times on Facebook. | Search for news or blog posts mentioning "apple" with a performance score higher than 8 (highly viral): `apple performance_score:>8` |
| social.facebook.likes | Returns posts filtered by the number of Facebook likes. | Return posts with more than 10 Facebook likes: `social.facebook.likes:>10` |
| social.facebook.shares | Returns posts filtered by the number of Facebook shares. | Return posts with more than 10 Facebook shares: `social.facebook.shares:>10` |
| social.facebook.comments | Returns posts filtered by the number of Facebook comments. | Return posts with more than 10 Facebook comments: `social.facebook.comments:>10` |
| social.vk.shares | Returns posts filtered by the number of VK shares. | Return posts with more than 10 VK shares: `social.vk.shares:>10` |
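Because only certain network and metric pairs exist, a query builder can reject unsupported combinations before sending a request. The following Python sketch validates against the four social counters listed above and mirrors the `apple performance_score:>8` example; `social_filter` and `viral_query` are hypothetical helpers.

```python
# Social counters listed in the Social Filters table.
SOCIAL_METRICS = {
    "facebook": {"likes", "shares", "comments"},
    "vk": {"shares"},
}


def social_filter(network: str, metric: str, more_than: int) -> str:
    """Build e.g. social.facebook.shares:>10 for a listed network/metric pair."""
    if metric not in SOCIAL_METRICS.get(network, set()):
        raise ValueError(f"unsupported social metric: {network}.{metric}")
    return f"social.{network}.{metric}:>{more_than}"


def viral_query(keyword: str, min_score: int = 8) -> str:
    """Keyword search restricted to posts whose performance_score exceeds min_score."""
    if not 0 <= min_score <= 10:
        raise ValueError("performance_score ranges from 0 to 10")
    return f"{keyword} performance_score:>{min_score}"


if __name__ == "__main__":
    print(social_filter("facebook", "shares", 10))  # social.facebook.shares:>10
    print(viral_query("apple"))  # apple performance_score:>8
```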

Entities & Entity Sentiment Filters

We extract entities such as Persons, Organizations and Locations from all the English news and blog posts that we crawl. We detect the sentiment attached to Persons and Organizations (not Locations) from the top news outlets.

| Parameter | Description | Example |
| --- | --- | --- |
| person | Filters by person name. Use this filter only for disambiguation; otherwise, use a simple keyword search. | `person:"barack obama"` |
| organization | Filters by organization or company name. Use this filter only for disambiguation; otherwise, use a simple keyword search. | `organization:"apple"` |
| location | Filters by location name. Important: do not confuse this with the country filter. To search for sites from a specific country, use the `thread.country` parameter (explained above). | `location:"germany"` |
| entity.sentiment | Finds an entity mentioned with a given sentiment, written as the entity type (`person` or `organization`) followed by `.positive`, `.negative`, or `.neutral`. | `person.positive:"obama"`, `organization.negative:"apple"`, `organization.neutral:"google"` |
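Since sentiment is detected only for persons and organizations (not locations), a small Python builder can enforce that rule while producing the `<entity type>.<sentiment>:"<name>"` form shown above. `entity_sentiment` is a hypothetical helper, and escaping backslashes and quotes inside the name is an assumption about keeping the quoted phrase valid.

```python
SENTIMENTS = {"positive", "negative", "neutral"}
# Sentiment is detected for persons and organizations, not locations.
SENTIMENT_ENTITY_TYPES = {"person", "organization"}


def entity_sentiment(entity_type, sentiment, name):
    """Build a filter such as person.positive:"obama" after validating both parts."""
    if entity_type not in SENTIMENT_ENTITY_TYPES:
        raise ValueError(f"sentiment is not available for {entity_type!r}")
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unknown sentiment {sentiment!r}")
    # Escape characters that would break a quoted phrase.
    quoted = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'{entity_type}.{sentiment}:"{quoted}"'


if __name__ == "__main__":
    print(entity_sentiment("person", "positive", "obama"))
    print(entity_sentiment("organization", "negative", "apple"))
```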

Syndication (grouping) Filters

Syndicated content is content that is classified as similar to other content in its group, and it is therefore tagged as syndicated.

Each syndicated cluster is tagged with a unique ID, and the clustering is based on the SimHash algorithm.
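This page does not describe Webz's own implementation, so the following is only a generic Python sketch of the SimHash technique: each token's 64-bit hash votes on every bit position, the fingerprint keeps the bits with a positive total, and near-duplicate texts tend to produce fingerprints that differ in few bits (a small Hamming distance). The lowercase word tokenizer, the BLAKE2b token hash, and equal token weights are assumptions made for illustration.

```python
import hashlib
import re


def token_hash(token: str) -> int:
    """64-bit hash of a token: an 8-byte BLAKE2b digest read as an integer."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(text: str) -> int:
    """64-bit SimHash fingerprint of `text`, using unweighted word tokens."""
    counts = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        h = token_hash(token)
        for bit in range(64):
            # Each token votes +1 for bits set in its hash and -1 otherwise.
            counts[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(64):
        if counts[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a = simhash("The central bank kept interest rates unchanged on Monday")
    b = simhash("The central bank kept interest rates unchanged on Tuesday")
    c = simhash("Local team wins the championship after overtime thriller")
    # Near-duplicates usually differ in far fewer bits than unrelated texts.
    print(hamming_distance(a, b), hamming_distance(a, c))
```

A clustering system built on this idea would group documents whose fingerprints fall within some small distance threshold; the document does not state which threshold, if any, Webz uses.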

| Parameter | Description | Example |
| --- | --- | --- |
| syndication.syndicated | If true, finds all similar or duplicated content. | `syndication.syndicated:true` |
| syndication.syndicate_id | Filters by the unique ID of the group (generated by the system). | `syndication.syndicate_id:7f3de401c5c89f3ac579918dd20b1f05ba6051b3` |
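When a query such as `syndication.syndicated:true` is sent over HTTP, it must be URL-encoded so that spaces and backslash escapes survive intact. The following Python sketch uses the standard `urllib.parse` module; the endpoint `https://api.example.com/search` and the `q` parameter name are hypothetical placeholders, while the `webz_reporter` GET parameter is the one mentioned for the `webz_reporter` filter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and parameter name; check your API documentation.
BASE_URL = "https://api.example.com/search"


def build_request_url(query, extra_params=None):
    """Attach the query (and optional GET parameters) to the base URL.

    urlencode() percent-encodes the query, so backslash escapes such as
    \\: are sent as %5C%3A and arrive unchanged.
    """
    params = {"q": query}  # assumption: the filter query is sent as "q"
    params.update(extra_params or {})
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    query = r"syndication.syndicated:true AND site_section:https\:\/\/finance.yahoo.com\/"
    url = build_request_url(query, {"webz_reporter": "true"})
    print(url)
    # Decoding the query string gives back exactly the same filter text.
    print(parse_qs(urlsplit(url).query)["q"][0] == query)  # True
```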