Step 1: Add New Search Method
Overview
To start collecting product documents, add a search method through our Search Methods endpoint. Collection of product documents from a search method can take up to 48 hours to complete.
To begin, choose the eCommerce domain to be crawled and identify the best method for getting your product data. We have 3 types of parsers (web scrapers), categorized by the type of result they return.
- Search - these scrapers extract search results data from the category pages of a source.
- The output is up to 100 product documents per page from the url added.
- An example of a category can be: Headphones, Wearable Technology, etc.
- Keyword - these scrapers extract the results of a keyword search run through the main search bar of a domain.
- The output is up to 100 product documents per page from the url added.
- Product - these scrapers extract details for a single product by using its url.
- The output is the product document corresponding to the product url added.
URL Structure
- Use the following url to call on the “Add New Search Method” endpoint.
Example
- As a user, I want to find products with keyword “samsung tv” from amazon.com
- User inputs the following query into Add New Search Method endpoint:
Input:
Input | Method Types | Definition | Example |
---|---|---|---|
token | - | A unique identifier used to authenticate API access | ?token=123456789 |
method (only one can be used per Search Method) | PRODUCT_URL, SEARCH_URL, KEYWORD | PRODUCT_URL: a specific product url; SEARCH_URL: a category page url; KEYWORD: a general search term | &method=PRODUCT_URL, &method=SEARCH_URL, or &method=KEYWORD |
domain | - | Website name (without www.) | &domain=amazon.com |
value | - | The value for the selected method | PRODUCT_URL: &value=amazon.com/iPhone-Pro-256GB-Sierra-Blue/dp/B0BGYFDQJX/ ; SEARCH_URL: &value=amazon.com/s?i=specialty-aps&bbn=16225007011 ; KEYWORD: &value=Iphone%206s |
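The input table can be turned into a query string with a few lines of code. The sketch below is only an illustration: the endpoint URL is not given in this section, so `ADD_METHOD_URL` is a hypothetical placeholder, and the function name is ours. It builds the "samsung tv" keyword request from the example above.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder: substitute the real "Add New Search Method" URL.
ADD_METHOD_URL = "https://api.example.com/addSearchMethod"

# The three method types listed in the input table.
VALID_METHODS = {"PRODUCT_URL", "SEARCH_URL", "KEYWORD"}


def build_add_method_url(token: str, method: str, domain: str, value: str) -> str:
    """Build the request URL for a single search method."""
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {sorted(VALID_METHODS)}")
    params = {"token": token, "method": method, "domain": domain, "value": value}
    # quote_via=quote encodes spaces as %20, as in the table's KEYWORD example.
    return f"{ADD_METHOD_URL}?{urlencode(params, quote_via=quote)}"


url = build_add_method_url("123456789", "KEYWORD", "amazon.com", "samsung tv")
print(url)
```

Only one method type can be passed per call, which is why the helper takes a single `method` argument and rejects anything outside the three documented values.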
Output:
Output | Definition | Example |
---|---|---|
status code and messages | Whether the search method was successfully added or there was an error. | {status: 200, messages: [ “Topic already exists” ]} |
method | The method ID assigned to the search method added. | method: 123 |
If you’re interested in learning more about our current HTTP status codes, see the corresponding section.
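Once the call returns, the method ID is what you need to keep for later queries. The following is a minimal sketch, assuming the response is a JSON object that carries `status`, `messages`, and `method` together and that status 200 indicates success; the function name is hypothetical.

```python
def extract_method_id(response: dict) -> int:
    """Return the method ID from a parsed response, or raise on a non-200 status."""
    status = response.get("status")
    messages = response.get("messages", [])
    if status != 200:
        raise RuntimeError(f"Add New Search Method failed ({status}): {messages}")
    # Messages can be informational even on success, e.g. "Topic already exists".
    return int(response["method"])


# Sample response assembled from the output table's example values.
sample = {"status": 200, "messages": ["Topic already exists"], "method": 123}
print(extract_method_id(sample))
```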
Step 2: Get Products Data
Overview
After a search method has been successfully added, use the “Get Products Data” endpoint to extract the products collected for that method. It returns product documents (structured, sorted data for each product crawled). It’s important to remember that data may take up to 48 hours after the method was added to become visible in the endpoint. Updated product information will only be available after product activation.
URL Structure:
- Use the following url to access the “Get Products Data” endpoint.
Example
- As a user, I want to retrieve product documents from all search methods collecting data from amazon.com
- User inputs the following query into the Get Products Data endpoint:
Input & Output
Input | Input Definition | Output | Example |
---|---|---|---|
token | A unique identifier used to authenticate API access | - | ?token=123456789 |
q | Search query; all product fields can be used for the search | Product Documents corresponding to query | &q=domain:amazon.com |
q (omitted) | When no query is listed | All Product Documents that have been crawled | - |
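A small helper can assemble the `q` parameter in the `field:value` form shown in the table. This is a sketch under assumptions: `GET_PRODUCTS_URL` is a hypothetical placeholder for the real endpoint URL, and only a single-field query is built because the syntax for combining fields is not described here.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Hypothetical placeholder: substitute the real "Get Products Data" URL.
GET_PRODUCTS_URL = "https://api.example.com/getProducts"


def build_products_url(token: str, field: Optional[str] = None,
                       value: Optional[str] = None) -> str:
    """Build a Get Products Data URL; omit field/value to request all documents."""
    params = {"token": token}
    if field is not None and value is not None:
        params["q"] = f"{field}:{value}"
    # safe=":" keeps the field:value separator readable, as in the table example.
    return f"{GET_PRODUCTS_URL}?{urlencode(params, quote_via=quote, safe=':')}"


filtered = build_products_url("123456789", "domain", "amazon.com")
everything = build_products_url("123456789")
print(filtered)
print(everything)
```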
Product Object:
Field Name | Description | Searchable | Type | Example |
---|---|---|---|---|
uuid | A unique ID representing the item | Yes | String | 18e5b30e542cf3a1fbb983e4572551490f07dc15 |
url | The link to the item | Yes | String | https://amazon.com/dp/B07Q45VKVF |
parent_url | The link corresponding to the search method | Yes | String | https://www.amazon.com/s?k=Sky Organics |
image_url | A link to the main or first image of the item | Yes | String | https://images-na.ssl-images-amazon.com/images/I/61Xv-uMM%2BXL._SL1024_.jpg |
name | The title of the item | No | String | Sky Organics USDA Organic Moroccan Argan Oil: Unrefined, 100% Pure, Cold-Pressed, Moisturizing & Healing, for Dry Skin, Sensitive Skin, Hair Conditioning, Cruelty Free, Vegan, 4 oz (Pack of 2) |
description | Product details and description | No | String | Contains: 2 x 4 oz. bottle of Organic Argan Oil by Sky Organics. 100% Pure and Cold-pressed Oil: Our Moroccan Argan Oil is free of synthetic ingredients, fragrance, alcohol, silicones, or fillers….. |
price | Original price of product | Yes | String | $28.35 |
sku | Manufacturer stock keeping unit id | Yes | String | B07Q45VKVF |
product_id | Product unique identifier | Yes | String | B07Q45VKVF |
review_count | The amount of reviews submitted on a product | Yes | Integer | 138 |
rating_count (source-specific field) | The total number of customer ratings that a product has received | Yes | Integer | 138 |
aggregated_rating | Average rating of product as listed | Yes | Float | 4.2 |
brand_name | Brand of product listing | Yes | String | Sky Organics |
domain | Source name | Yes | String | amazon.com |
method | Search method id | Yes | String | 9189 |
methods | All search method ids, if the product appears in multiple methods | Yes | String | 9189 9187 |
reviews_retrieval | Whether review retrieval is activated for the product | Yes | Boolean | false |
historical_collection | Whether historical reviews are collected in addition to new reviews (false: new reviews only) | No | Boolean | false |
crawled | Date data collected | Yes | Date Format: yyyy-MM-dd'T'HH:mm:ss.SSSXXX | 2020-05-21T17:21:27.226+03:00 |
updated | Date data was last updated | Yes | Date Format: yyyy-MM-dd'T'HH:mm:ss.SSSXXX | 2020-05-21T17:21:27.226+03:00 |
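Several product fields need light parsing before use. The sketch below shows two of them: the `crawled`/`updated` timestamps, which the documented format makes ISO 8601 compatible, and `methods`, which we assume is a space-separated string because that is how the example value appears. The helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def parse_timestamp(text: str) -> datetime:
    """Parse a yyyy-MM-dd'T'HH:mm:ss.SSSXXX value into a timezone-aware datetime."""
    return datetime.fromisoformat(text)


def method_ids(product: dict) -> List[str]:
    """Return every search method id attached to a product document."""
    # Assumption: "methods" is space-separated; fall back to the single "method".
    raw = product.get("methods") or product.get("method") or ""
    return str(raw).split()


# Sample document using example values from the Product Object table.
product = {
    "uuid": "18e5b30e542cf3a1fbb983e4572551490f07dc15",
    "method": "9189",
    "methods": "9189 9187",
    "crawled": "2020-05-21T17:21:27.226+03:00",
}
crawled = parse_timestamp(product["crawled"])
print(crawled.isoformat(), method_ids(product))
```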
Step 3: Update Products (Review Activation)
Overview
This endpoint lets you choose which product documents (identified by uuid) should be continuously refreshed and have their reviews extracted. Setting the reviews_retrieval field to "true" activates review collection: we first collect all available historical reviews for the corresponding product or entity, then continue collecting new reviews (the delta) on an ongoing basis until the setting is changed. Activated products are also re-crawled in parallel to update the product information fields, which can be found in the getProducts endpoint. If you do not need historical reviews, set the historical_collection field to "false"; only new reviews will then be collected for the activated product. It’s important to remember that review data may take up to 48 hours to be collected.
URL Structure:
- Use the following url to access the “Update Products” endpoint.
Update/Activation URL Example:
Usage Example:
- As a user, after receiving product data from a search method collection, I want to start collecting review data for the following product uuid:
- “033cd2dc1eadfff83fcc242dcd8b3031496cac7c”
- User uses the following endpoint to activate review retrieval on the product uuid.
Input:
Input | Definition | Example |
---|---|---|
token | A unique identifier used to authenticate API access | ?token=123456789 |
uuid | A unique ID representing the item | &uuid=033cd2dc1eadfff83fcc242dcd8b3031496cac7c |
reviews_retrieval | “null”: the product has not been activated yet; “true”: start receiving reviews for the corresponding product; “false”: stop receiving reviews for the corresponding product | &reviews_retrieval=true |
historical_collection | (Optional input parameter) “null”: the parameter was not used; “true”: collect historical reviews and new reviews; “false”: only collect new reviews | &historical_collection=true |
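As an illustration of how these parameters combine, the sketch below builds an activation request that collects new reviews only. `UPDATE_PRODUCTS_URL` is a hypothetical placeholder for the real endpoint URL, and the function name is ours; booleans are lowercased to match the `true`/`false` values in the table.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real "Update Products" URL.
UPDATE_PRODUCTS_URL = "https://api.example.com/updateProducts"


def build_activation_url(token: str, uuid: str, reviews_retrieval: bool,
                         historical_collection: Optional[bool] = None) -> str:
    """Build an Update Products URL; historical_collection is optional."""
    params = {
        "token": token,
        "uuid": uuid,
        "reviews_retrieval": str(reviews_retrieval).lower(),
    }
    if historical_collection is not None:
        params["historical_collection"] = str(historical_collection).lower()
    return f"{UPDATE_PRODUCTS_URL}?{urlencode(params)}"


activation = build_activation_url(
    "123456789",
    "033cd2dc1eadfff83fcc242dcd8b3031496cac7c",
    reviews_retrieval=True,
    historical_collection=False,  # new reviews only
)
print(activation)
```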
Output:
Output | Definition | Example |
---|---|---|
status code and messages | Whether review retrieval was successfully activated or there was an error. | {status: 500, messages: [ “Failed to update product : xyz123” ]} |
If you’re interested in learning more about our current HTTP status codes, see the corresponding section.
Step 4: Get Product Review Data
Overview
This endpoint lets you extract reviews on activated products (reviews_retrieval field set to “true”). Once the initial historical data has been collected, we continue to crawl activated products and deliver new reviews (the delta from what was already collected, if any) at intervals of up to 48 hours. Users will receive data in descending order.
URL Structure:
- Use the following url to access the “Get Product Review Data” endpoint.
Example:
- As a user, I want to see the review data for a specific product uuid.
- User inputs the following query into the Get Product Review Data endpoint:
Input & Output
Input | Input Definition | Output | Query Example |
---|---|---|---|
token | A unique identifier used to authenticate API access | - | ?token=123456789 |
q | Search query; all product fields can be used for the search | Review Documents corresponding to query | &q=product_uuid:e8fb0a6f89a013e9b385aa22e294724b2e6da361 |
q (omitted) | No query | All Review Documents | - |
Review Object, Core Fields:
Field Name | Description | Searchable | Type | Example |
---|---|---|---|---|
uuid | A unique ID representing a post in a thread | Yes | String | 596787e5146389e00e88acb55a8b452b433afc64 |
review_id | Consumer review unique identifier | Yes | String | 220866944 |
text | The text body of the review | No | String | We’ve had this TV and sound bar for about a year now and it’s been great. We’re not that tech savvy so it took a little while for us to set up; the sound bar, Wifi, but after we got it hooked up, nice….. |
published | The date/time when the review was published. | Yes | Date Format: yyyy-MM-dd'T'HH:mm:ss.SSSXXX | 2022-11-06T05:28:00.000+02:00 |
author | The name of the review author | Yes | String | JakeR |
rating | The star rating for the review, a floating-point number between 0.0 and 5.0. | Yes | Float | 5.0 |
title | The title of the review | No | String | Picture quality is great |
domain | The source crawled | Yes | String | https://www.samsung.com/es |
url | A link to the review of the item | Yes | String | https://www.samsung.com/es/tvs/qled-tv/q50a-32-inch-qled-smart-tv-qe32q50aauxxc/#220866944 |
product_id | Corresponding product id | Yes | String | QE32Q50AAUXXC |
product_uuid | Corresponding product that was reviewed | Yes | String | cde042856d476a8b9dfde7d7111efe1a8a421ece |
image_urls (source-specific field) | Images included in customer reviews, displayed as direct URLs in a list. | Yes | List | https://m.media-amazon.com/images/I/61TKWWlevAL._SL1600_.jpg |
num_of_images (source-specific field) | Total number of images posted for the review | Yes | Integer | 6 |
crawled | The date/time when the review was crawled. | Yes | Date Format: yyyy-MM-dd'T'HH:mm:ss.SSSXXX | 2022-11-13T14:30:50.686+02:00 |
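Review documents with these core fields are straightforward to post-process. The sketch below sorts reviews newest first by `published` and averages `rating`. It assumes the documents have already been parsed into dictionaries; the helper names are hypothetical and the two sample reviews are made-up illustration values, not real API output.

```python
from datetime import datetime
from typing import List, Optional


def newest_first(reviews: List[dict]) -> List[dict]:
    """Sort review documents by their published timestamp, most recent first."""
    return sorted(reviews,
                  key=lambda r: datetime.fromisoformat(r["published"]),
                  reverse=True)


def average_rating(reviews: List[dict]) -> Optional[float]:
    """Average the rating field, ignoring reviews that have no rating."""
    ratings = [float(r["rating"]) for r in reviews if r.get("rating") is not None]
    return sum(ratings) / len(ratings) if ratings else None


# Made-up sample documents for illustration only.
reviews = [
    {"review_id": "a", "rating": 3.0, "published": "2022-11-01T10:00:00.000+02:00"},
    {"review_id": "b", "rating": 5.0, "published": "2022-11-06T05:28:00.000+02:00"},
]
print([r["review_id"] for r in newest_first(reviews)], average_rating(reviews))
```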
Review Object, Optional Fields:
Field Name | Description | Searchable | Type | Example |
---|---|---|---|---|
is_verified (source-specific field) | Whether the reviewer is a verified purchaser | Yes | Boolean | true |
variant (source-specific field) | Product ID of a specific variant | Yes | String | B01M1NAMVM |
syndicated_text (source-specific field) | A domain where the review was originally written | Yes | String | amazon.com |
is_vine (source-specific field) | See: https://www.amazon.com/vine/about | Yes | Boolean | true |
is_recommended (source-specific field) | Whether the reviewer recommends the product | Yes | Boolean | true |
is_incentivized (source-specific field) | The contributor received a free product or service to review | Yes | Boolean | true |
is_sweepstakes_entry (source-specific field) | The reviewer was offered an entry to win a prize, discount, or any other type of reward for writing a review | Yes | Boolean | true |
is_seed_member (source-specific field) | Only for homedepot.com | Yes | Boolean | true |
is_reviewer_program (source-specific field) | Only for homedepot.com | Yes | Boolean | true |
is_neighbors_program (source-specific field) | Only for wayfair.com | Yes | Boolean | true |
had_tried_product (source-specific field) | This reviewer was invited to try the product in exchange for their honest opinion. Only for currys.co.uk | Yes | Boolean | true |
is_prize_draw_participant (source-specific field) | Only for bosch.co.uk | Yes | Boolean | true |
is_sponsored_rating (source-specific field) | Only for aeg.de | Yes | Boolean | true |
is_part_of_competition (source-specific field) | Only for aeg.de | No | Boolean | true |
purchased (source-specific field) | Only for bestbuy.com | Yes | Date Format: yyyy-MM-dd'T'HH:mm:ss.SSSXXX | 2023-04-26T03:00:00.000+03:00 |
is_partner_review (source-specific field) | Only for johnlewis.com | No | Boolean | true |
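Because these fields are source specific, a given review may not contain them at all, so code that reads them should tolerate missing keys. The sketch below flags reviews written in exchange for some benefit. Which flags count as an incentive is our own grouping for illustration, not something the API defines, and the function name is hypothetical.

```python
# Our own illustrative grouping of optional flags; adjust to your needs.
INCENTIVE_FLAGS = (
    "is_incentivized",
    "is_vine",
    "is_sweepstakes_entry",
    "is_prize_draw_participant",
    "is_part_of_competition",
)


def has_incentive(review: dict) -> bool:
    """True if any incentive-related flag is present and set to true."""
    # .get() returns None for fields the source does not provide.
    return any(review.get(flag) is True for flag in INCENTIVE_FLAGS)


print(has_incentive({"is_vine": True}))        # flag present and true
print(has_incentive({"is_verified": True}))    # no incentive flags present
```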
Get Search Methods
External endpoint - not part of main flow
Overview
This endpoint lets you manage, access, and sort through the search methods you have already added. Reviewing your active search methods shows what type of data you extracted in the past and can give you ideas about what you may want to extract in the future.
URL Structure:
- Use the following url to access the “Get Search Methods” endpoint.
Example:
- As a user I want to see all my search methods collecting data from amazon.com
- User inputs the following query into Get Search Methods endpoint
Input & Output:
Input | Input Definition | Output | Query Example |
---|---|---|---|
token | A unique identifier used to authenticate API access | - | ?token=123456789 |
q | Search query | Methods and details corresponding to the query | &domain=amazon.com |
q (omitted) | No query | All methods and corresponding details | - |
Restriction
- Currently, this endpoint can only be queried by the “domain” field (however, you can still use all the general get parameters).
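Since only the domain filter is supported, a request builder for this endpoint needs just one optional argument. A minimal sketch, assuming the `&domain=` form shown in the query example; `GET_METHODS_URL` is a hypothetical placeholder for the real endpoint URL.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real "Get Search Methods" URL.
GET_METHODS_URL = "https://api.example.com/getSearchMethods"


def build_methods_url(token: str, domain: Optional[str] = None) -> str:
    """Build a Get Search Methods URL, optionally filtered by domain."""
    params = {"token": token}
    if domain:
        params["domain"] = domain  # the only filter this endpoint supports
    return f"{GET_METHODS_URL}?{urlencode(params)}"


methods_url = build_methods_url("123456789", "amazon.com")
print(methods_url)
```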
Get Status
External endpoint - not part of main flow
Overview
This endpoint gives API subscription customers the ability to check and understand how many credits are available to them for products and reviews collection.
To learn more about our pricing plans and subscription models please contact [email protected]
URL Structure:
Use the following url to access the “Get Status” endpoint:
Usage Example:
- As a user I want to check how many products and reviews credits I have available in order to plan out my next requests.
- User inputs the following query into Get Status endpoint:
Input
Input | Definition | Example |
---|---|---|
token | A unique identifier used to authenticate API access | ?token=123456789 |
Output
Output | Definition | Type | Example |
---|---|---|---|
productRequestsLeft | How many more product doc requests/credits are available in your current subscription plan. | integer | "productRequestsLeft": 886 |
reviewRequestsLeft | How many more review doc requests/credits are available in your current subscription plan. | integer | "reviewRequestsLeft": 886 |
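The two counters can be checked before queuing more work. The following sketch assumes the Get Status response has been parsed into a dictionary with the two keys from the output table; the function name and the planned request counts are hypothetical.

```python
def has_enough_credits(status: dict, products_needed: int, reviews_needed: int) -> bool:
    """Check whether the remaining credits cover a planned batch of requests."""
    products_left = status.get("productRequestsLeft", 0)
    reviews_left = status.get("reviewRequestsLeft", 0)
    return products_left >= products_needed and reviews_left >= reviews_needed


# Sample status built from the example values in the output table.
status = {"productRequestsLeft": 886, "reviewRequestsLeft": 886}
print(has_enough_credits(status, products_needed=100, reviews_needed=500))
```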
Delete Products
External endpoint - not part of main flow
Overview
This endpoint gives API subscription customers the ability to delete products collected - helping the user manage their products' credit capacity in accordance with the package purchased. Please note that when a product is deleted, the corresponding reviews associated with the product will be deleted as well, but the credits used for those reviews will not be recycled.
To learn more about our pricing plans and subscription models please contact [email protected]
URL Structure
Use the following url to access the “Delete Products” endpoint:
Usage Example:
- As a user I no longer need data corresponding to a specific product and want to delete it from my data collected.
- User inputs the following query into Delete Products endpoint:
Input
Input | Definition | Example |
---|---|---|
token | A unique identifier used to authenticate API access | ?token=123456789 |
uuid | A unique ID representing the item | &uuid=xyz123 |
Output
Output | Definition | Example |
---|---|---|
status code and message | Whether the product was successfully deleted or there was an error. | {status: 500, messages: [ “Failed to delete product : xyz123” ]} |