This how-to guide shows how to restrict search results to specific sources. It combines two services: knowledge_graph, to look up the sources, and search, to run the filtered query.

Below is a step-by-step example that searches for content mentioning Amazon, restricted to sources from the United States with the highest source quality and a retention period longer than one year.

First, let’s import the required dependencies and prepare the client:

Import required dependencies and prepare the client
from bigdata_client import Bigdata
from bigdata_client.models.sources import SourceRank, SourceRetentionPeriod

from bigdata_client.models.advanced_search_query import Entity
from bigdata_client.daterange import RollingDateRange

from functools import reduce

client = Bigdata()

Next, we query for the sources that match our criteria: based in the United States, with a rank of RANK_1 (highest quality), and a retention period longer than one year. The find_sources method retrieves them. The call below lists the qualifying retention periods explicitly (two years, five years, and full history) and caps the result at 100 sources:

Find the relevant sources based on the criteria
sources = client.knowledge_graph.find_sources(
    country="US",
    rank=SourceRank.RANK_1,
    retention=[
        SourceRetentionPeriod.FULL_HISTORY,
        SourceRetentionPeriod.FIVE_YEARS,
        SourceRetentionPeriod.TWO_YEARS,
    ],
    limit=100,
)
if not sources:
    raise ValueError("No sources found matching the criteria")

Once we have the sources, we can build a filter that matches any of them by joining them with the | (OR) operator.

Prepare the source filter for the query
source_filter = reduce(lambda x, y: x | y, sources)
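To make the fold performed by reduce easier to picture, here is a minimal, self-contained sketch. It does not use the real bigdata_client objects: Term is a hypothetical stand-in class, and the only assumption carried over from the step above is that query components overload the | operator.

```python
from functools import reduce
from operator import or_


class Term:
    """Hypothetical stand-in for a query component that supports `|`."""

    def __init__(self, expr):
        self.expr = expr

    def __or__(self, other):
        # Combining two terms yields a new term meaning "either one".
        return Term(f"({self.expr} | {other.expr})")


terms = [Term("source_a"), Term("source_b"), Term("source_c")]

# reduce folds the list left to right: (source_a | source_b) | source_c
combined = reduce(or_, terms)
print(combined.expr)
```

Here operator.or_ plays the same role as the lambda in the guide. Note that reduce raises a TypeError when given an empty list and no initial value, which is one reason the empty-result check in the previous step is useful.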

Next, we build the final query by combining (&) an entity filter for Amazon with the source filter we just created. The example looks up Amazon by name and takes the first suggestion returned:

Prepare the final filter for the query
amazon = client.knowledge_graph.autosuggest("amazon")[0]
entity_filter = Entity(amazon.id)

query = entity_filter & source_filter
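Indexing a lookup result with [0] fails with an IndexError if nothing comes back. One possible guard is sketched below. first_or_raise is a hypothetical helper, not part of bigdata_client, and plain strings stand in for the objects a real lookup would return.

```python
def first_or_raise(candidates, term):
    """Return the first candidate, or raise a descriptive error if there are none."""
    if not candidates:
        raise ValueError(f"No suggestions found for {term!r}")
    return candidates[0]


# Plain strings stand in for the objects a lookup would return.
best = first_or_raise(["Amazon.com Inc.", "Amazon Web Services"], "amazon")
print(best)
```

In the guide's flow, a helper like this could wrap the list of suggestions before the entity filter is built, mirroring the explicit check already applied to the list of sources.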

Finally, run the query to get the results (limited to 5 results for demonstration purposes):

Run the desired query
documents = client.search.new(
    query=query,
    date_range=RollingDateRange.LAST_YEAR,
).run(limit=5)

for result in documents:
    print(result.headline)
Output
Amazon And Google Backed AI Startup Anthropic Reportedly Hits $3 Billion In Annualized Revenue - Where's ChatGPT-Parent OpenAI At?
Amazon's Zoox Unit Issues Software Recall for 270 Robotaxis
Market Chatter: Amazon, Stellantis Agree to End Collaboration on In-Car Software
AWS Announces General Availability Of Amazon Aurora DSQL: Serverless, Distributed SQL Database Offering 99.999% Multi-Region Availability, Strong Consistency, PostgreSQL Compatibility, And 4x Faster Performance Without Infrastructure Management
The New York Times inks deal with Amazon to license content for AI training