
Search

We are sunsetting our SDKs and will no longer add new features, security patches, bug fixes, or technical support for them. To access the latest capabilities and ongoing improvements, we encourage you to migrate to our RESTful API. SDK support will officially end on December 31, 2026. On this date, the underlying endpoints used by the SDKs and related documentation will be decommissioned. To avoid any disruption to your services, please ensure your migration is complete by that date. For migration assistance, please contact us at support@bigdata.com.

search_by_companies

Screen for documents based on a list of companies, sentences, and other filters.
Parameters
  • companies (List[Company]): The list of companies to use.
  • sentences (List[str]): The list of sentences to screen for.
  • start_date (str): The start date for the search.
  • end_date (str): The end date for the search.
  • scope (DocumentType): The document type scope (e.g., DocumentType.ALL, DocumentType.TRANSCRIPTS). Defaults to DocumentType.ALL.
  • fiscal_year (Optional[int]): The fiscal year to filter queries. If None, no fiscal year filter is applied.
  • sources (Optional[List[str]]): List of sources to filter on. If None, searches across all sources.
  • keywords (List[str]): A list of keywords for constructing keyword queries.
  • control_entities (Dict): A dictionary of control entities of different types for creating co-mentions queries.
  • freq (str): The frequency of the date ranges. Defaults to 'M'.
  • sort_by (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
  • rerank_threshold (Optional[float]): The threshold for reranking the search results.
  • document_limit (int): The maximum number of documents to return per Bigdata query.
  • batch_size (int): The number of entities to include in each batched query.
  • **kwargs: Additional keyword arguments.
Returns
  • DataFrame: The DataFrame with the screening results.
    Columns include:
    timestamp_utc, document_id, sentence_id, headline, entity_id, document_type, is_reporting_entity, entity_name, entity_sector, entity_industry, entity_country, entity_ticker, text, other_entities, entities, masked_text, other_entities_map.
Example
from bigdata_research_tools.search import search_by_companies
from bigdata_client.models.entities import Company
from bigdata_client.models.search import DocumentType, SortBy

companies = [Company(name="Apple Inc.", ticker="AAPL"), Company(name="Microsoft Corp.", ticker="MSFT")]
sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]

results_df = search_by_companies(
    companies=companies,
    sentences=sentences,
    start_date="2024-01-01",
    end_date="2024-06-30",
    scope=DocumentType.NEWS,
    sort_by=SortBy.RELEVANCE,
    document_limit=20,
    batch_size=5
)
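
To make the batch_size parameter more concrete, here is a minimal sketch of splitting a list of entities into fixed-size batches. It is an illustration only: chunk_entities is a hypothetical helper that is not part of bigdata_research_tools, plain ticker strings stand in for Company objects, and the library's actual batching logic may differ.

```python
from typing import List, Sequence


def chunk_entities(entities: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split entities into consecutive batches of at most batch_size items.

    Hypothetical helper for illustration; not part of the SDK.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [list(entities[i:i + batch_size]) for i in range(0, len(entities), batch_size)]


# Seven tickers with batch_size=5 yield one full batch and one partial batch.
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA"]
batches = chunk_entities(tickers, batch_size=5)
print(batches)
```

Under this reading, a smaller batch_size would produce more, narrower queries, and a larger one fewer, broader queries.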

search_narratives

Screen for documents based on input sentences and other filters.
Parameters
  • sentences (List[str]): The list of theme sentences to screen for.
  • start_date (str): The start date for the search.
  • end_date (str): The end date for the search.
  • scope (DocumentType): The document type scope (e.g., DocumentType.NEWS, DocumentType.TRANSCRIPTS).
  • fiscal_year (Optional[int]): The fiscal year to filter queries. If None, no fiscal year filter is applied.
  • sources (Optional[List[str]]): List of sources to filter on. If None, searches across all sources.
  • keywords (Optional[List[str]]): A list of keywords for constructing keyword queries.
  • control_entities (Optional[List[str]]): A list of control entity IDs for creating co-mentions queries.
  • freq (str): The frequency of the date ranges. Defaults to 'M'.
  • sort_by (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
  • rerank_threshold (Optional[float]): The threshold for reranking the search results.
  • document_limit (int): The maximum number of documents to return per Bigdata query.
  • batch_size (int): The number of entities to include in each batched query.
  • **kwargs: Additional keyword arguments.
Returns
  • DataFrame: The DataFrame with the screening results.
    Columns include: timestamp_utc, document_id, sentence_id, headline, text.
Example
from bigdata_research_tools.search import search_narratives
from bigdata_client.models.search import DocumentType, SortBy

sentences = [
    "AI is transforming healthcare.",
    "Cloud adoption is accelerating."
]

results_df = search_narratives(
    sentences=sentences,
    start_date="2024-01-01",
    end_date="2024-06-30",
    scope=DocumentType.NEWS,
    sort_by=SortBy.RELEVANCE,
    document_limit=20,
    batch_size=5
)
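
The freq parameter is described only as "the frequency of the date ranges". As a rough sketch, assuming that 'M' means the window between start_date and end_date is split into calendar-month sub-ranges, the splitting could look like the following. monthly_ranges is a hypothetical helper written for this illustration; it is not part of the SDK and may not match how the library builds its date ranges.

```python
import calendar
from datetime import date
from typing import List, Tuple


def monthly_ranges(start_date: str, end_date: str) -> List[Tuple[date, date]]:
    """Split [start_date, end_date] into calendar-month sub-ranges.

    Hypothetical helper for illustration; not part of the SDK.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    ranges = []
    current = start
    while current <= end:
        # Last calendar day of the month that `current` falls in.
        last_day = calendar.monthrange(current.year, current.month)[1]
        month_end = date(current.year, current.month, last_day)
        ranges.append((current, min(month_end, end)))
        # Move to the first day of the next month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return ranges


for range_start, range_end in monthly_ranges("2024-01-01", "2024-06-30"):
    print(range_start, range_end)
```

For the dates used in the example above, this produces six sub-ranges, one per month from January through June 2024.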

run_search

Execute multiple searches concurrently using the Bigdata client, with rate limiting.
Parameters
  • queries (list[QueryComponent]): A list of QueryComponent objects.
  • date_ranges (Optional[Union[AbsoluteDateRange, RollingDateRange, List[Union[AbsoluteDateRange, RollingDateRange]]]]): Date range filter for the search results.
  • sortby (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
  • scope (DocumentType): The scope of the documents to include. Defaults to DocumentType.ALL.
  • limit (int): The maximum number of documents to return per query. Defaults to 10.
  • only_results (bool): If True, return only the search results. If False, return the queries along with the results. Defaults to True.
  • rerank_threshold (Optional[float]): The threshold for reranking the search results.
  • **kwargs: Additional keyword arguments to pass to the underlying search manager.
Returns
  • list[list[Document]] if only_results is True: List of search results.
  • dict[tuple[QueryComponent, Union[AbsoluteDateRange, RollingDateRange]], list[Document]] if only_results is False: Mapping of query/date range to results.
Example
from bigdata_research_tools.search import run_search
from bigdata_client.models.search import QueryComponent, DocumentType, SortBy

queries = [
    QueryComponent(query="AI in healthcare"),
    QueryComponent(query="Cloud computing trends"),
]

results = run_search(
    queries=queries,
    date_ranges=None,
    sortby=SortBy.RELEVANCE,
    scope=DocumentType.NEWS,
    limit=5,
    only_results=True,
    rerank_threshold=0.7
)
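
The two return shapes described under Returns call for slightly different handling. The sketch below uses plain strings and tuples as stand-ins for Document, QueryComponent, and date-range objects so that it runs without the SDK; all values are made up for illustration.

```python
from typing import Dict, List, Tuple

# Stand-in for only_results=True: one list of documents per executed search.
results_list: List[List[str]] = [
    ["doc-1", "doc-2"],
    ["doc-3"],
]

# Stand-in for only_results=False: (query, date range) -> documents.
results_map: Dict[Tuple[str, Tuple[str, str]], List[str]] = {
    ("AI in healthcare", ("2024-01-01", "2024-01-31")): ["doc-1", "doc-2"],
    ("Cloud computing trends", ("2024-01-01", "2024-01-31")): ["doc-3"],
}

# Flatten the list-of-lists shape into a single list of documents.
all_docs = [doc for docs in results_list for doc in docs]

# With the mapping shape, each group of documents keeps its originating query.
docs_per_query = {query: len(docs) for (query, _date_range), docs in results_map.items()}

print(all_docs)
print(docs_per_query)
```

The mapping shape is the more convenient one when results need to be traced back to the query and date range that produced them.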

build_batched_query

Build a list of batched query objects for advanced search, supporting similarity, keyword, entity, control entity, source, and fiscal year filters.
Parameters
  • sentences (List[str]): Sentences for creating similarity queries.
  • keywords (Optional[List[str]]): Keywords for constructing keyword queries.
  • entities (Optional[EntitiesToSearch]): Entities to include in the query (companies, people, orgs, etc).
  • control_entities (Optional[EntitiesToSearch]): Control entities for co-mentions or control queries.
  • sources (Optional[List[str]]): List of sources for constructing source queries.
  • batch_size (int): Number of entities per batch when auto-batching.
  • fiscal_year (Optional[int]): Fiscal year to filter queries.
  • scope (DocumentType): Document type scope (e.g., ALL, TRANSCRIPTS).
  • custom_batches (Optional[List[EntitiesToSearch]]): Custom entity batches for advanced batching.
Returns
  • List[QueryComponent]: List of expanded query components.
Example
from bigdata_research_tools.search.query_builder import build_batched_query, EntitiesToSearch
from bigdata_client.models.search import DocumentType

sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]
entities = EntitiesToSearch(companies=["AAPL", "MSFT"])
sources = ["MT Newswires"]
batch_size = 10

queries = build_batched_query(
    sentences=sentences,
    keywords=None,
    entities=entities,
    control_entities=None,
    sources=sources,
    batch_size=batch_size,
    fiscal_year=None,
    scope=DocumentType.NEWS,
    custom_batches=None
)
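
The custom_batches parameter is documented only as "custom entity batches for advanced batching". One plausible use, sketched here under that assumption, is to group companies by sector so that each batch is queried together. Plain dicts stand in for EntitiesToSearch instances so the snippet runs without the SDK, and the ticker-to-sector mapping is hand-written for this example.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative ticker -> sector mapping (hand-written for this example).
sector_by_ticker: Dict[str, str] = {
    "AAPL": "Technology",
    "MSFT": "Technology",
    "JPM": "Financials",
    "GS": "Financials",
    "XOM": "Energy",
}

# Group tickers by sector, preserving insertion order.
grouped: Dict[str, List[str]] = defaultdict(list)
for ticker, sector in sector_by_ticker.items():
    grouped[sector].append(ticker)

# Each dict mirrors the `companies` field of EntitiesToSearch. With the SDK
# installed, each one could presumably be turned into EntitiesToSearch(**batch).
custom_batches = [{"companies": tickers} for tickers in grouped.values()]
print(custom_batches)
```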

EntitiesToSearch

A dataclass for specifying which entities to include in a search.
Fields
  • people (Optional[List[str]])
  • product (Optional[List[str]])
  • org (Optional[List[str]])
  • place (Optional[List[str]])
  • topic (Optional[List[str]])
  • concepts (Optional[List[str]])
  • companies (Optional[List[str]])
Example
from bigdata_research_tools.search.query_builder import EntitiesToSearch

entities = EntitiesToSearch(
    companies=["AAPL", "MSFT"],
    people=["Elon Musk"],
    org=["OpenAI"]
)
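
To show how the documented fields fit together, the sketch below defines a stand-in dataclass with the same field names and a small helper that reports which entity types are populated. EntitiesToSearchSketch and populated_fields are hypothetical names used only here; the stand-in is not the SDK class, and the assumption that every field defaults to None is not confirmed by the reference above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EntitiesToSearchSketch:
    """Stand-in mirroring the documented fields; NOT the SDK class.

    Assumption: every field defaults to None.
    """
    people: Optional[List[str]] = None
    product: Optional[List[str]] = None
    org: Optional[List[str]] = None
    place: Optional[List[str]] = None
    topic: Optional[List[str]] = None
    concepts: Optional[List[str]] = None
    companies: Optional[List[str]] = None


def populated_fields(entities: EntitiesToSearchSketch) -> List[str]:
    """Return the names of the fields that hold at least one entity."""
    return [f.name for f in fields(entities) if getattr(entities, f.name)]


entities = EntitiesToSearchSketch(
    companies=["AAPL", "MSFT"],
    people=["Elon Musk"],
    org=["OpenAI"],
)
print(populated_fields(entities))
```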