Bigdata’s ecosystem comprises high-quality content, including web content, premium news, press wires, call transcripts, and regulatory filings, drawn from more than 13,000 sources.

These content sources are classified by country and by the retention period of the content they provide, and then ranked by the influence and trustworthiness of the provider.

  • Rank 1: Sources that are fully accountable, reputable, and balanced.
  • Rank 2: Sources that are official, reliable, and honest.
  • Rank 3: Sources that are acknowledged, formal, and credible.
  • Rank 4: Sources that are known and reasonably credible.
  • Rank 5: Sources that have satisfactory credibility.
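
To make the ordering concrete, the five ranks can be read as an ordered scale in which a lower number means a more trusted source. The sketch below is purely illustrative: TrustRank and at_least are hypothetical names defined here and are not part of the Bigdata client.

```python
from enum import IntEnum


class TrustRank(IntEnum):
    """Hypothetical local stand-in for the five source ranks (1 = most trusted)."""
    RANK_1 = 1
    RANK_2 = 2
    RANK_3 = 3
    RANK_4 = 4
    RANK_5 = 5


def at_least(rank: TrustRank, threshold: TrustRank) -> bool:
    """Return True when `rank` is as trusted as `threshold` or better."""
    return rank <= threshold


# Keep only the ranks that are at least as trusted as Rank 2.
trusted = [r for r in TrustRank if at_least(r, TrustRank.RANK_2)]
print(trusted)
```

Treating the ranks as integers keeps comparisons such as "Rank 2 or better" to a single expression.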

The knowledge_graph service provides a way to search for these sources and retrieve them based on various criteria such as their name, country, source rank, and retention period. This allows users to filter and find the sources that best fit their needs for analysis and research.

Usage

The simplest way to search for sources in Bigdata is to call the find_sources method with a name. This is similar to how autosuggest() works, but guarantees that the results will be Sources only.

The following Python code will return a list of source objects that match the search term.

Find by name
sources = bigdata.knowledge_graph.find_sources("bussiness insiders")

for source in sources:
    print(source)

Filtering sources by characteristics

If you want to find all sources that match some specific criteria, you can use the find_sources method to filter all sources available in Bigdata by country, source rank and retention period.

For example, to retrieve sources from the United States, with full historic data and a source rank of 1, you can use the following code:

Note how the values parameter is left unset

Find by characteristics
from bigdata_client.models.sources import SourceRank, SourceRetentionPeriod

sources = bigdata.knowledge_graph.find_sources(
    country="US",
    rank=SourceRank.RANK_1,
    retention=SourceRetentionPeriod.FULL_HISTORY,
)

The country parameter follows the ISO 3166-1 alpha-2 format.
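
Because the country filter expects two-letter codes, it can help to normalize user input before passing it on. The following is a minimal sketch, assuming a hypothetical helper named normalize_country that is not part of the Bigdata client. It only checks the shape of the code (two ASCII letters) and does not verify that the code is actually assigned in ISO 3166-1.

```python
import re


def normalize_country(code: str) -> str:
    """Strip and upper-case `code`; raise ValueError unless it is two ASCII letters."""
    candidate = code.strip()
    if not re.fullmatch(r"[A-Za-z]{2}", candidate):
        raise ValueError(f"not an alpha-2 style code: {code!r}")
    return candidate.upper()


# Hypothetical user input with stray whitespace and lower-case letters.
print(normalize_country(" us "))
```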

Advanced usage

For more advanced usage, you can increase the limit on how many sources to retrieve and provide multiple values to each of the filtering parameters. For example, to retrieve 100 sources from the United States and Canada with a source rank of 1 and a retention period of 5 years or full history, you can use the following code:

Advanced usage
from bigdata_client.models.sources import SourceRank, SourceRetentionPeriod

sources = bigdata.knowledge_graph.find_sources(
    country=["US", "CA"],
    rank=SourceRank.RANK_1,
    retention=[
        SourceRetentionPeriod.FIVE_YEARS,
        SourceRetentionPeriod.FULL_HISTORY,
    ],
    limit=100,
)

The limit parameter is optional and defaults to 20. The maximum value is 1000.
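
If the limit comes from user input, one option is to check it on the client side before making the call. The sketch below uses the default and maximum stated above. The helper resolve_limit is hypothetical, and the lower bound of 1 is an assumption. How the service itself responds to out-of-range values is not described here.

```python
from typing import Optional

DEFAULT_LIMIT = 20   # default stated in the text
MAX_LIMIT = 1000     # maximum stated in the text


def resolve_limit(limit: Optional[int] = None) -> int:
    """Return the default when `limit` is unset, otherwise a range-checked value."""
    if limit is None:
        return DEFAULT_LIMIT
    # Assumption: a limit below 1 is not meaningful.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}, got {limit}")
    return limit


print(resolve_limit())      # uses the default
print(resolve_limit(100))   # passes a valid value through unchanged
```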

Next steps

Once you have found the set of sources you are interested in, you can use them to filter your search queries. This allows you to focus on specific content providers. Some examples of how to use the sources for filtering a search query can be found in the Search with specific sources guide.