This how-to guide shows how to filter search results to include only specific sources. This
is implemented by combining the search and knowledge_graph services.
Below is a step-by-step example that searches for content mentioning Amazon, restricted
to sources from the United States with the highest source quality and a
retention period longer than one year.
First, let’s import the required dependencies and prepare the client:
Import required dependencies and prepare the client
from bigdata_client import Bigdata
from bigdata_client.models.sources import SourceRank, SourceRetentionPeriod
from bigdata_client.models.advanced_search_query import Similarity, Source, Entity
from bigdata_client.daterange import RollingDateRange
from functools import reduce
client = Bigdata()
Then, we need to query for a list of sources that match our criteria. In this case, we want sources
from the United States, with a rank of RANK_1 (highest quality), and a retention period longer
than one year, which is expressed here as the TWO_YEARS, FIVE_YEARS, and FULL_HISTORY retention
periods. We can use the find_sources method to retrieve these sources:
Find the relevant sources based on the criteria
sources = client.knowledge_graph.find_sources(
    country="US",
    rank=SourceRank.RANK_1,
    retention=[
        SourceRetentionPeriod.FULL_HISTORY,
        SourceRetentionPeriod.FIVE_YEARS,
        SourceRetentionPeriod.TWO_YEARS,
    ],
    limit=100,
)
if len(sources) == 0:
    raise ValueError("No sources found matching the criteria")
Once we have the sources, we can prepare a filter that matches any of the sources by joining them
using the | operator.
Prepare the source filter for the query
source_filter = reduce(lambda x, y: x | y, sources)
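To make the folding step concrete, here is a minimal sketch of what reduce does with the | operator. The Term and AnyOf classes and the any_of helper are hypothetical stand-ins written for this illustration; they are not part of bigdata_client, whose source objects may combine differently internally. The sketch also shows why the empty-list check earlier matters: reduce called on an empty sequence without an initial value raises a TypeError.

```python
from dataclasses import dataclass
from functools import reduce


def _members(node):
    """Return the names held by a Term or an AnyOf node as a tuple."""
    return node.names if isinstance(node, AnyOf) else (node.name,)


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for a source object (not a bigdata_client class)."""
    name: str

    def __or__(self, other):
        return AnyOf(_members(self) + _members(other))


@dataclass(frozen=True)
class AnyOf:
    """Hypothetical OR node listing every term that was combined."""
    names: tuple

    def __or__(self, other):
        return AnyOf(_members(self) + _members(other))


def any_of(terms):
    """Fold a non-empty list of terms into a single OR expression."""
    if not terms:
        # reduce() without an initial value fails on an empty sequence,
        # so reject that case with a clearer message first.
        raise ValueError("cannot build a filter from an empty list")
    return reduce(lambda x, y: x | y, terms)


combined = any_of([Term("a"), Term("b"), Term("c")])
print(combined)  # AnyOf(names=('a', 'b', 'c'))
```

With a single term, the fold returns that term unchanged, so the same code path works whether the lookup returns one source or a hundred.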
Then we need to create the final query that matches the entity of Amazon and (&) the source filter
we just created:
Prepare the final filter for the query
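# Take the first autosuggest match for "amazon" and filter on its entity ID
amazon = client.knowledge_graph.autosuggest("amazon")[0]
entity_filter = Entity(amazon.id)
# Require both the entity and any one of the selected sources
query = entity_filter & source_filter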
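Building the source filter as its own variable before applying & also keeps the grouping correct. In Python, & binds more tightly than |, so a one-line expression that mixes the two without parentheses groups differently. The sketch below illustrates this with a hypothetical Expr class that only records how it was combined; it is an assumption made for illustration and not a bigdata_client type.

```python
class Expr:
    """Hypothetical query node that records how it was combined."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):
        return Expr(f"({self.text} OR {other.text})")


entity = Expr("amazon")
src_a, src_b = Expr("src_a"), Expr("src_b")

# OR group built first, then AND applied to the whole group
grouped = entity & (src_a | src_b)

# Without parentheses, & is evaluated before |
ungrouped = entity & src_a | src_b

print(grouped.text)    # (amazon AND (src_a OR src_b))
print(ungrouped.text)  # ((amazon AND src_a) OR src_b)
```

The ungrouped form would match anything from src_b regardless of the entity, which is not the intent of this guide.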
Finally, run the query to get the results (limited to 5 results for demonstration purposes):
documents = client.search.new(
    query=query,
    date_range=RollingDateRange.LAST_YEAR,
).run(limit=5)
for result in documents:
    print(result.headline)
Amazon And Google Backed AI Startup Anthropic Reportedly Hits $3 Billion In Annualized Revenue - Where's ChatGPT-Parent OpenAI At?
Amazon's Zoox Unit Issues Software Recall for 270 Robotaxis
Market Chatter: Amazon, Stellantis Agree to End Collaboration on In-Car Software
AWS Announces General Availability Of Amazon Aurora DSQL: Serverless, Distributed SQL Database Offering 99.999% Multi-Region Availability, Strong Consistency, PostgreSQL Compatibility, And 4x Faster Performance Without Infrastructure Management
The New York Times inks deal with Amazon to license content for AI training