Pagination and Sorting

👍

This is important!

To get the best value out of your data feeds, pay close attention to the paging and sorting sections that follow. See the Pagination tutorial for more information.
We suggest that you first work out your filtered searches in the playground, and then use them in a script.

With the technicalities of pagination out of the way, you can concentrate on what is important: your data feeds!

Pagination: Collecting More than 100 Posts from a Query

Each request returns up to 100 posts matching your query. However, there may be significantly more results matching your filter parameters. To consume all the data, after each request call the URL string that appears in the "next" key of that request's output.

JSON Output Example:

{
  "posts": [...],
  "totalResults": 7043729,
  "moreResultsAvailable": 7043629,
  "next": "/filterWebContent?token=XXXXXXX-XXX-XXXX-XXXX-XXXXXXXXXXX&format=json&ts=1488442985732&q=%28language%3Aenglish+OR+language%3Afrench%29",
  "requestsLeft": 999932
}
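
The loop of following "next" can be sketched in Python as below. This is a minimal sketch, not an official client: BASE_URL is a hypothetical host, and the names iter_posts, http_get_json and max_requests are made up for illustration. Stopping when "moreResultsAvailable" reaches 0 is also an assumption; a stream consumer may prefer to keep polling the last "next" URL instead.

```python
import json
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical host: replace with the real API host for your account.
BASE_URL = "https://api.example.com"


def http_get_json(url: str) -> Dict:
    """Fetch a URL and decode its JSON body using only the standard library."""
    with urlopen(url) as response:
        return json.load(response)


def iter_posts(
    first_path: str,
    fetch: Callable[[str], Dict] = http_get_json,
    max_requests: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield posts page by page, following the "next" value of each response."""
    path: Optional[str] = first_path
    requests_made = 0
    while path:
        if max_requests is not None and requests_made >= max_requests:
            break
        # "next" is a relative path, so resolve it against the host.
        page = fetch(urljoin(BASE_URL, path))
        requests_made += 1
        yield from page.get("posts", [])
        # Assumption: no remaining results means we can stop paging.
        if page.get("moreResultsAvailable", 0) <= 0:
            break
        path = page.get("next")
```

Passing max_requests is one way to stay within the quota reported in "requestsLeft" while testing a query.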

Sorting

Depending on the sort order you want, the "next" URL includes one of two paging parameters:

Sort by crawl date (consume the data as a stream - default):

The "next" URL contains the "ts" (timestamp) paging parameter which returns posts crawled after this timestamp.

Sort by any of the following values:

  • relevancy
  • reviews_count
  • reviewers_count
  • domain_rank
  • Review Order
  • rating

The "next" URL contains the "from" paging parameter.
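
To check which paging mode a response is using, you can inspect the query string of its "next" URL. The helper below is a small illustrative sketch; the function name paging_parameter is hypothetical, and it only relies on the "ts" and "from" parameter names described above.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def paging_parameter(next_url: str) -> Optional[Tuple[str, str]]:
    """Return the paging parameter ("ts" or "from") found in a "next" URL.

    The result is a (name, value) pair, or None if neither is present.
    """
    query = parse_qs(urlsplit(next_url).query)
    for name in ("ts", "from"):
        if name in query:
            return name, query[name][0]
    return None
```

Applied to the "next" value in the JSON output example, this returns the "ts" parameter and its timestamp, which indicates the default crawl-date stream.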

Sort order
If you sort the posts by any of the numeric sort values above, you can also choose the order in which they are returned:

  • asc (default)
  • desc

With &order=desc, results run from now (the most recent results) back to the timestamp value you pass in "&ts".
With &order=asc, results run from the timestamp passed in "&ts" up to now, so the most recent results come last in your result set.
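
As a rough illustration, a first request path with these options could be assembled as follows. This is a sketch under assumptions: the function build_query_path is hypothetical, the "sort" parameter name is assumed (only "order", "ts", "token", "format" and "q" appear in the text and example above), and the endpoint path is copied from the JSON output example.

```python
from typing import Optional
from urllib.parse import urlencode


def build_query_path(
    token: str,
    query: str,
    ts: Optional[int] = None,
    sort: Optional[str] = None,
    order: Optional[str] = None,
    endpoint: str = "/filterWebContent",
) -> str:
    """Build a relative request path with optional ts, sort and order values."""
    if order is not None and order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    params = {"token": token, "format": "json", "q": query}
    if ts is not None:
        params["ts"] = ts
    if sort is not None:
        # Assumed parameter name; check the API reference before relying on it.
        params["sort"] = sort
    if order is not None:
        params["order"] = order
    # urlencode percent-encodes the query and turns spaces into "+".
    return f"{endpoint}?{urlencode(params)}"
```

Leaving sort and order unset keeps the default crawl-date stream, which is the mode recommended when data integrity matters.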

🚧

Choosing Sorting Parameters

Sorting by any sort parameter value other than crawl date (default) may result in missing important posts. If data integrity is important, stick with the default recommended sort parameter value of crawl date, and consume the data as a stream.