Search

We are sunsetting our SDKs and will no longer provide new features, security patches, bug fixes, or technical support for them. To access the latest capabilities and ongoing improvements, we encourage you to migrate to our RESTful API. SDK support will officially end on December 31, 2026. On this date, the underlying endpoints used by the SDKs and related documentation will be decommissioned. To avoid any disruption to your services, please ensure your migration is complete by that date. For migration assistance, please contact us at support@bigdata.com.

search_by_companies

Screen for documents based on a list of companies, sentences, and other filters.

Parameters

companies (List[Company]): The list of companies to use.
sentences (List[str]): The list of sentences to screen for.
start_date (str): The start date for the search.
end_date (str): The end date for the search.
scope (DocumentType): The document type scope (e.g., DocumentType.ALL, DocumentType.TRANSCRIPTS). Defaults to DocumentType.ALL.
fiscal_year (Optional[int]): The fiscal year to filter queries. If None, no fiscal year filter is applied.
sources (Optional[List[str]]): List of sources to filter on. If None, searches across all sources.
keywords (List[str]): A list of keywords for constructing keyword queries.
control_entities (Dict): A dictionary of control entities of different types for creating co-mentions queries.
freq (str): The frequency of the date ranges. Defaults to 'M'.
sort_by (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
rerank_threshold (Optional[float]): The threshold for reranking the search results.
document_limit (int): The maximum number of documents to return per Bigdata query.
batch_size (int): The number of entities to include in each batched query.
**kwargs: Additional keyword arguments.

Returns

DataFrame: The DataFrame with the screening results.
Columns include: timestamp_utc, document_id, sentence_id, headline, entity_id, document_type, is_reporting_entity, entity_name, entity_sector, entity_industry, entity_country, entity_ticker, text, other_entities, entities, masked_text, other_entities_map.

Example

from bigdata_research_tools.search import search_by_companies
from bigdata_client.models.entities import Company
from bigdata_client.models.search import DocumentType, SortBy

companies = [Company(name="Apple Inc.", ticker="AAPL"), Company(name="Microsoft Corp.", ticker="MSFT")]
sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]

results_df = search_by_companies(
    companies=companies,
    sentences=sentences,
    start_date="2024-01-01",
    end_date="2024-06-30",
    scope=DocumentType.NEWS,
    sort_by=SortBy.RELEVANCE,
    document_limit=20,
    batch_size=5
)
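
The batch_size parameter sets how many entities go into each batched query. The sketch below illustrates only that chunking idea with plain ticker strings; it is not the library's implementation, and the helper name chunk_entities and the ticker list are hypothetical.

```python
from typing import List, Sequence


def chunk_entities(entities: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split a sequence of entity identifiers into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        list(entities[i:i + batch_size])
        for i in range(0, len(entities), batch_size)
    ]


# Hypothetical input: seven tickers, batched five at a time.
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA"]
batches = chunk_entities(tickers, batch_size=5)
print(len(batches))  # 2 batches: the first holds 5 entities, the second 2
```

Under this assumption, a smaller batch_size means more queries with fewer entities in each.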

search_narratives

Screen for documents based on input sentences and other filters.

Parameters

sentences (List[str]): The list of theme sentences to screen for.
start_date (str): The start date for the search.
end_date (str): The end date for the search.
scope (DocumentType): The document type scope (e.g., DocumentType.NEWS, DocumentType.TRANSCRIPTS).
fiscal_year (Optional[int]): The fiscal year to filter queries. If None, no fiscal year filter is applied.
sources (Optional[List[str]]): List of sources to filter on. If None, searches across all sources.
keywords (Optional[List[str]]): A list of keywords for constructing keyword queries.
control_entities (Optional[List[str]]): A list of control entity IDs for creating co-mentions queries.
freq (str): The frequency of the date ranges. Defaults to 'M'.
sort_by (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
rerank_threshold (Optional[float]): The threshold for reranking the search results.
document_limit (int): The maximum number of documents to return per Bigdata query.
batch_size (int): The number of entities to include in each batched query.
**kwargs: Additional keyword arguments.

Returns

DataFrame: The DataFrame with the screening results.
Columns include: timestamp_utc, document_id, sentence_id, headline, text.

Example

from bigdata_research_tools.search import search_narratives
from bigdata_client.models.search import DocumentType, SortBy

sentences = [
    "AI is transforming healthcare.",
    "Cloud adoption is accelerating."
]

results_df = search_narratives(
    sentences=sentences,
    start_date="2024-01-01",
    end_date="2024-06-30",
    scope=DocumentType.NEWS,
    sort_by=SortBy.RELEVANCE,
    document_limit=20,
    batch_size=5
)
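
The freq parameter describes the frequency of the date ranges used between start_date and end_date. As a rough illustration, and assuming that 'M' denotes calendar-month windows (the source does not spell this out), the search period could be split as follows. The function monthly_ranges is a hypothetical helper built on the standard library, not part of the SDK.

```python
from datetime import date, timedelta
from typing import List, Tuple


def monthly_ranges(start_date: str, end_date: str) -> List[Tuple[date, date]]:
    """Split [start_date, end_date] into inclusive calendar-month windows."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")

    ranges = []
    current = start
    while current <= end:
        # First day of the month that follows `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # A window ends on the last day of its month or on end_date,
        # whichever comes first.
        window_end = min(next_month - timedelta(days=1), end)
        ranges.append((current, window_end))
        current = next_month
    return ranges


windows = monthly_ranges("2024-01-01", "2024-06-30")
print(len(windows))  # 6 windows, January through June 2024
```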

run_search

Execute multiple searches concurrently using the Bigdata client, with rate limiting.

Parameters

queries (list[QueryComponent]): A list of QueryComponent objects.
date_ranges (Optional[Union[AbsoluteDateRange, RollingDateRange, List[Union[AbsoluteDateRange, RollingDateRange]]]]): Date range filter for the search results.
sortby (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
scope (DocumentType): The scope of the documents to include. Defaults to DocumentType.ALL.
limit (int): The maximum number of documents to return per query. Defaults to 10.
only_results (bool): If True, return only the search results. If False, return the queries along with the results. Defaults to True.
rerank_threshold (Optional[float]): The threshold for reranking the search results.
**kwargs: Additional keyword arguments to pass to the underlying search manager.

Returns

list[list[Document]] if only_results is True: List of search results.
dict[tuple[QueryComponent, Union[AbsoluteDateRange, RollingDateRange]], list[Document]] if only_results is False: Mapping of query/date range to results.

Example

from bigdata_research_tools.search import run_search
from bigdata_client.models.search import QueryComponent, DocumentType, SortBy

queries = [
    QueryComponent(query="AI in healthcare"),
    QueryComponent(query="Cloud computing trends"),
]

results = run_search(
    queries=queries,
    date_ranges=None,
    sortby=SortBy.RELEVANCE,
    scope=DocumentType.NEWS,
    limit=5,
    only_results=True,
    rerank_threshold=0.7
)
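
To make the two return shapes and the idea of rate-limited concurrent execution concrete, here is a minimal standard-library sketch. Everything in it is an assumption for illustration: run_all, MinIntervalLimiter, and fake_search are hypothetical names, queries and date ranges are plain strings, and one result list per query/date-range pair is assumed. It does not call the Bigdata client.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Dict, List, Tuple, Union

Pair = Tuple[str, str]


class MinIntervalLimiter:
    """Let at most one call start per `interval` seconds across threads."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the next free slot for whichever caller comes after us.
            self._next_allowed = max(now, self._next_allowed) + self.interval
        if delay:
            time.sleep(delay)


def run_all(
    queries: List[str],
    date_ranges: List[str],
    search_fn: Callable[[str, str], List[str]],
    only_results: bool = True,
    interval: float = 0.01,
) -> Union[List[List[str]], Dict[Pair, List[str]]]:
    """Run search_fn for every (query, date_range) pair on a thread pool."""
    limiter = MinIntervalLimiter(interval)
    pairs = list(product(queries, date_ranges))

    def task(pair: Pair) -> List[str]:
        limiter.wait()
        return search_fn(*pair)

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(task, pairs))  # map preserves input order

    if only_results:
        return results
    return dict(zip(pairs, results))


def fake_search(query: str, date_range: str) -> List[str]:
    """Stand-in for a real search call; returns made-up document labels."""
    return [f"{query} | {date_range} | doc{i}" for i in range(2)]


queries = ["AI in healthcare", "Cloud computing trends"]
date_ranges = ["2024-01", "2024-02"]
as_list = run_all(queries, date_ranges, fake_search, only_results=True)
as_map = run_all(queries, date_ranges, fake_search, only_results=False)
print(len(as_list), len(as_map))  # 4 4
```

The mapping form is convenient when each result list has to be traced back to the query and date range that produced it.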

build_batched_query

Build a list of batched query objects for advanced search, supporting similarity, keyword, entity, control entity, source, and fiscal year filters.

Parameters

sentences (List[str]): Sentences for creating similarity queries.
keywords (Optional[List[str]]): Keywords for constructing keyword queries.
entities (Optional[EntitiesToSearch]): Entities to include in the query (companies, people, orgs, etc).
control_entities (Optional[EntitiesToSearch]): Control entities for co-mentions or control queries.
sources (Optional[List[str]]): List of sources for constructing source queries.
batch_size (int): Number of entities per batch when auto-batching.
fiscal_year (Optional[int]): Fiscal year to filter queries.
scope (DocumentType): Document type scope (e.g., ALL, TRANSCRIPTS).
custom_batches (Optional[List[EntitiesToSearch]]): Custom entity batches for advanced batching.

Returns

List[QueryComponent]: List of expanded query components.

Example

from bigdata_research_tools.search.query_builder import build_batched_query, EntitiesToSearch
from bigdata_client.models.search import DocumentType

sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]
entities = EntitiesToSearch(companies=["AAPL", "MSFT"])
sources = ["MT Newswires"]
batch_size = 10

queries = build_batched_query(
    sentences=sentences,
    keywords=None,
    entities=entities,
    control_entities=None,
    sources=sources,
    batch_size=batch_size,
    fiscal_year=None,
    scope=DocumentType.NEWS,
    custom_batches=None
)
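
One way to picture the "expanded query components" is as a pairing of every sentence with every entity batch. The sketch below works under that assumption only. It uses plain dictionaries rather than QueryComponent objects, and expand_queries and its dictionary keys are hypothetical names, so the real expansion logic may differ.

```python
from itertools import product
from typing import Any, Dict, List, Optional


def expand_queries(
    sentences: List[str],
    entity_batches: List[List[str]],
    fiscal_year: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Pair every sentence with every entity batch as a plain dict spec."""
    specs = []
    for sentence, batch in product(sentences, entity_batches):
        spec: Dict[str, Any] = {"similarity": sentence, "any_of_entities": batch}
        if fiscal_year is not None:
            # The filter is attached only when a fiscal year is requested.
            spec["fiscal_year"] = fiscal_year
        specs.append(spec)
    return specs


sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]
entity_batches = [["AAPL", "MSFT"], ["GOOGL"]]  # hypothetical pre-built batches
specs = expand_queries(sentences, entity_batches)
print(len(specs))  # 4: two sentences times two batches
```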

EntitiesToSearch

A dataclass for specifying which entities to include in a search.

Fields

people (Optional[List[str]])
product (Optional[List[str]])
org (Optional[List[str]])
place (Optional[List[str]])
topic (Optional[List[str]])
concepts (Optional[List[str]])
companies (Optional[List[str]])

Example

from bigdata_research_tools.search.query_builder import EntitiesToSearch

entities = EntitiesToSearch(
    companies=["AAPL", "MSFT"],
    people=["Elon Musk"],
    org=["OpenAI"]
)
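
For readers who want to see the shape of such a dataclass without installing the SDK, the following stand-in mirrors the fields listed above. It is a sketch, not the library's class: the name EntitiesToSearchSketch, the None defaults, and the populated helper are all assumptions.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class EntitiesToSearchSketch:
    """Hypothetical stand-in with the same field names as EntitiesToSearch."""

    people: Optional[List[str]] = None
    product: Optional[List[str]] = None
    org: Optional[List[str]] = None
    place: Optional[List[str]] = None
    topic: Optional[List[str]] = None
    concepts: Optional[List[str]] = None
    companies: Optional[List[str]] = None

    def populated(self) -> Dict[str, List[str]]:
        """Return only the fields that were given a non-empty list."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name)
        }


sketch = EntitiesToSearchSketch(
    companies=["AAPL", "MSFT"],
    people=["Elon Musk"],
    org=["OpenAI"],
)
print(list(sketch.populated()))  # ['people', 'org', 'companies']
```

Fields left unset stay None, so only the entity types that were supplied appear in the result.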
