We do our best to extract a unique URL for each post from the pages we crawl. However, sources don't always make these available, and sometimes there is no way to obtain a unique URL.
In such cases, we fabricate our own ID per post.
See the <post_id> field on the Firehose Output.
Please note that when we fabricate an ID, it is not included in the post_url, so the post_url on its own does not identify the post.
- To identify a post, concatenate its post_url and post_id values.
- Only the combination of the post_url and post_id fields identifies a unique post.
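
As a rough illustration of the points above, here is a minimal Python sketch that builds a key from post_url + post_id and uses it to drop duplicate posts. It assumes each post has already been parsed into a dictionary with `post_url` and `post_id` keys; the function names `unique_post_key` and `dedupe_posts` and the example URLs are hypothetical and not part of the Firehose output.

```python
from typing import Iterable, Iterator, Mapping


def unique_post_key(post: Mapping[str, object]) -> str:
    """Concatenate post_url and post_id into a single key.

    Assumption: a missing or null field is treated as an empty string.
    """
    post_url = post.get("post_url")
    post_id = post.get("post_id")
    url_part = "" if post_url is None else str(post_url)
    id_part = "" if post_id is None else str(post_id)
    return url_part + id_part


def dedupe_posts(posts: Iterable[Mapping[str, object]]) -> Iterator[Mapping[str, object]]:
    """Yield each post once, keeping the first occurrence of every key."""
    seen = set()
    for post in posts:
        key = unique_post_key(post)
        if key not in seen:
            seen.add(key)
            yield post


if __name__ == "__main__":
    # Hypothetical sample data: two posts share a post_url but differ in post_id.
    sample = [
        {"post_url": "http://example.com/thread/1", "post_id": "a1"},
        {"post_url": "http://example.com/thread/1", "post_id": "b2"},
        {"post_url": "http://example.com/thread/1", "post_id": "a1"},
    ]
    for post in dedupe_posts(sample):
        print(unique_post_key(post))
```

In this sketch the two posts that share a post_url are both kept because their post_id values differ, while the exact repeat is dropped.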