Search

search_by_companies

Screen for documents based on a list of companies, sentences, and other filters.

Parameters

  • companies (List[Company]): The list of companies to use.
  • sentences (List[str]): The list of sentences to screen for.
  • start_date (str): The start date for the search.
  • end_date (str): The end date for the search.
  • scope (DocumentType): The document type scope (e.g., DocumentType.ALL, DocumentType.TRANSCRIPTS). Defaults to DocumentType.ALL.
  • fiscal_year (Optional[int]): The fiscal year to filter queries. If None, no fiscal year filter is applied.
  • sources (Optional[List[str]]): List of sources to filter on. If None, searches across all sources.
  • keywords (List[str]): A list of keywords for constructing keyword queries.
  • control_entities (Dict): A dictionary of control entities of different types for creating co-mentions queries.
  • freq (str): The frequency used to split the period between start_date and end_date into date ranges. Defaults to 'M' (monthly).
  • sort_by (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
  • rerank_threshold (Optional[float]): The threshold for reranking the search results.
  • document_limit (int): The maximum number of documents to return per Bigdata query.
  • batch_size (int): The number of entities to include in each batched query.
  • **kwargs: Additional keyword arguments.
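The start_date, end_date and freq parameters together determine which time windows are searched. As a minimal sketch, assuming freq='M' means the period is cut into calendar-month windows (the library's own splitting may use different boundaries), the period 2024-01-01 to 2024-06-30 used in this section's example would yield six windows. monthly_windows is a hypothetical helper, not part of bigdata_research_tools.

```python
from datetime import date, timedelta


def monthly_windows(start: str, end: str) -> list[tuple[date, date]]:
    """Split the inclusive period [start, end] into calendar-month windows.

    Hypothetical helper illustrating the assumed meaning of freq='M';
    it is not the library's implementation.
    """
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    windows = []
    while current <= last:
        # First day of the following month
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Clip the window so it never runs past the requested end date
        window_end = min(next_month - timedelta(days=1), last)
        windows.append((current, window_end))
        current = next_month
    return windows


windows = monthly_windows("2024-01-01", "2024-06-30")
print(len(windows))  # 6
print(windows[0])    # (datetime.date(2024, 1, 1), datetime.date(2024, 1, 31))
```

Under this assumption, a period that starts or ends mid-month produces partial first and last windows.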

Returns

  • DataFrame: The DataFrame with the screening results.
    Columns include:
    timestamp_utc, document_id, sentence_id, headline, entity_id, document_type, is_reporting_entity, entity_name, entity_sector, entity_industry, entity_country, entity_ticker, text, other_entities, entities, masked_text, other_entities_map.

Example

from bigdata_research_tools.search import search_by_companies
from bigdata_client.models.entities import Company
from bigdata_client.models.search import DocumentType, SortBy

companies = [Company(name="Apple Inc.", ticker="AAPL"), Company(name="Microsoft Corp.", ticker="MSFT")]
sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]

results_df = search_by_companies(
    companies=companies,
    sentences=sentences,
    start_date="2024-01-01",
    end_date="2024-06-30",
    scope=DocumentType.NEWS,
    sort_by=SortBy.RELEVANCE,
    document_limit=20,
    batch_size=5
)
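Because the returned DataFrame has both document_id and sentence_id columns, one document can contribute several rows for the same company. A common follow-up is counting distinct documents per company. The sketch below works on plain dicts, assuming the rows come from results_df.to_dict("records"); documents_per_entity is a hypothetical helper and the rows are toy values, not real search output.

```python
from collections import defaultdict


def documents_per_entity(rows: list[dict]) -> dict[str, int]:
    """Count distinct document_id values per entity_name.

    `rows` is assumed to be results_df.to_dict("records"), i.e. one dict
    per result row keyed by the documented column names.
    """
    docs: dict[str, set] = defaultdict(set)
    for row in rows:
        docs[row["entity_name"]].add(row["document_id"])
    return {name: len(ids) for name, ids in docs.items()}


# Toy rows for illustration only (not real search output)
rows = [
    {"entity_name": "Apple Inc.", "document_id": "doc-1", "sentence_id": "doc-1-3"},
    {"entity_name": "Apple Inc.", "document_id": "doc-1", "sentence_id": "doc-1-7"},
    {"entity_name": "Apple Inc.", "document_id": "doc-2", "sentence_id": "doc-2-1"},
    {"entity_name": "Microsoft Corp.", "document_id": "doc-3", "sentence_id": "doc-3-4"},
]
print(documents_per_entity(rows))  # {'Apple Inc.': 2, 'Microsoft Corp.': 1}
```

With pandas directly, results_df.groupby("entity_name")["document_id"].nunique() computes the same counts.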

search_narratives

Screen for documents based on input sentences and other filters.

Parameters

  • sentences (List[str]): The list of theme sentences to screen for.
  • start_date (str): The start date for the search.
  • end_date (str): The end date for the search.
  • scope (DocumentType): The document type scope (e.g., DocumentType.NEWS, DocumentType.TRANSCRIPTS).
  • fiscal_year (Optional[int]): The fiscal year to filter queries. If None, no fiscal year filter is applied.
  • sources (Optional[List[str]]): List of sources to filter on. If None, searches across all sources.
  • keywords (Optional[List[str]]): A list of keywords for constructing keyword queries.
  • control_entities (Optional[List[str]]): A list of control entity IDs for creating co-mentions queries.
  • freq (str): The frequency used to split the period between start_date and end_date into date ranges. Defaults to 'M' (monthly).
  • sort_by (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
  • rerank_threshold (Optional[float]): The threshold for reranking the search results.
  • document_limit (int): The maximum number of documents to return per Bigdata query.
  • batch_size (int): The number of entities to include in each batched query.
  • **kwargs: Additional keyword arguments.

Returns

  • DataFrame: The DataFrame with the screening results.
    Columns include: timestamp_utc, document_id, sentence_id, headline, text.

Example

from bigdata_research_tools.search import search_narratives
from bigdata_client.models.search import DocumentType, SortBy

sentences = [
    "AI is transforming healthcare.",
    "Cloud adoption is accelerating."
]

results_df = search_narratives(
    sentences=sentences,
    start_date="2024-01-01",
    end_date="2024-06-30",
    scope=DocumentType.NEWS,
    sort_by=SortBy.RELEVANCE,
    document_limit=20,
    batch_size=5
)
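Narrative screening is often used to track how often a theme appears over time. As a minimal sketch, assuming the timestamp_utc column holds datetime values (pandas Timestamps are datetime subclasses, so .year and .month work on them), matches can be bucketed by calendar month. hits_per_month is a hypothetical helper and the timestamps are toy values.

```python
from collections import Counter
from datetime import datetime


def hits_per_month(timestamps) -> dict[str, int]:
    """Count result rows per calendar month, keyed as 'YYYY-MM', in date order."""
    counts = Counter(f"{ts.year:04d}-{ts.month:02d}" for ts in timestamps)
    return dict(sorted(counts.items()))


# Toy timestamps for illustration only
stamps = [
    datetime(2024, 1, 5, 14, 30),
    datetime(2024, 1, 20, 9, 0),
    datetime(2024, 3, 2, 11, 15),
]
print(hits_per_month(stamps))  # {'2024-01': 2, '2024-03': 1}
```

Applied to real results, this would be hits_per_month(results_df["timestamp_utc"]); months with no matches are simply absent from the output.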

run_search

Execute multiple searches concurrently using the Bigdata client, with rate limiting.

Parameters

  • queries (list[QueryComponent]): A list of QueryComponent objects.
  • date_ranges (Optional[Union[AbsoluteDateRange, RollingDateRange, List[Union[AbsoluteDateRange, RollingDateRange]]]]): Date range filter for the search results.
  • sortby (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
  • scope (DocumentType): The scope of the documents to include. Defaults to DocumentType.ALL.
  • limit (int): The maximum number of documents to return per query. Defaults to 10.
  • only_results (bool): If True, return only the search results. If False, return the queries along with the results. Defaults to True.
  • rerank_threshold (Optional[float]): The threshold for reranking the search results.
  • **kwargs: Additional keyword arguments to pass to the underlying search manager.

Returns

  • list[list[Document]] if only_results is True: List of search results.
  • dict[tuple[QueryComponent, Union[AbsoluteDateRange, RollingDateRange]], list[Document]] if only_results is False: Mapping of query/date range to results.
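With only_results=False, each key pairs a query with one date range, so a single query can appear under several keys. The sketch below merges those per-window lists into one list per query. It uses plain strings as stand-ins for QueryComponent, date-range and Document objects; results_by_query is a hypothetical helper, and it relies only on the keys being hashable, which they must already be as dict keys.

```python
from collections import defaultdict


def results_by_query(mapping: dict) -> dict:
    """Merge per-(query, date_range) result lists into one list per query."""
    merged = defaultdict(list)
    for (query, _date_range), documents in mapping.items():
        merged[query].extend(documents)
    return dict(merged)


# Placeholder keys and values for illustration only
mapping = {
    ("q1", "2024-01"): ["doc-a", "doc-b"],
    ("q1", "2024-02"): ["doc-c"],
    ("q2", "2024-01"): [],
}
print(results_by_query(mapping))  # {'q1': ['doc-a', 'doc-b', 'doc-c'], 'q2': []}
```

The same document may be returned for more than one window, so deduplicating (for example by document ID) may be needed after merging.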

Example

from bigdata_research_tools.search import run_search
from bigdata_client.query import Similarity
from bigdata_client.models.search import DocumentType, SortBy

# Similarity is a concrete QueryComponent; QueryComponent itself is a base class
queries = [
    Similarity("AI in healthcare"),
    Similarity("Cloud computing trends"),
]

results = run_search(
    queries=queries,
    date_ranges=None,
    sortby=SortBy.RELEVANCE,
    scope=DocumentType.NEWS,
    limit=5,
    only_results=True,
    rerank_threshold=0.7
)
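run_search is described as executing searches concurrently with rate limiting. The sketch below shows that general pattern using only the standard library: a thread pool runs the searches while a shared limiter spaces out request start times. RateLimiter, run_concurrently and fake_search are hypothetical names; this is an illustration of the technique, not the library's implementation, and its actual worker count and rate limits are not documented here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Allow at most `rate` call starts per second by spacing them evenly."""

    def __init__(self, rate: float):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_time = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free slot under the lock
        with self.lock:
            now = time.monotonic()
            start = max(now, self.next_time)
            self.next_time = start + self.interval
        # Sleep outside the lock so other threads can reserve their own slots
        time.sleep(max(0.0, start - now))


def run_concurrently(search_fn, queries, rate=5.0, max_workers=4):
    """Run search_fn on every query in parallel, rate-limited, keeping input order."""
    limiter = RateLimiter(rate)

    def task(query):
        limiter.wait()
        return search_fn(query)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `queries`
        return list(pool.map(task, queries))


# Stand-in for a real search call (hypothetical)
def fake_search(query: str) -> list[str]:
    return [f"{query}-result"]


results = run_concurrently(fake_search, ["AI in healthcare", "Cloud computing trends"])
print(results)  # [['AI in healthcare-result'], ['Cloud computing trends-result']]
```

Returning results in input order mirrors the list[list[Document]] shape documented for only_results=True.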

build_batched_query

Build a list of batched query objects for advanced search, supporting similarity, keyword, entity, control entity, source, and fiscal year filters.

Parameters

  • sentences (List[str]): Sentences for creating similarity queries.
  • keywords (Optional[List[str]]): Keywords for constructing keyword queries.
  • entities (Optional[EntitiesToSearch]): Entities to include in the query (companies, people, organizations, etc.).
  • control_entities (Optional[EntitiesToSearch]): Control entities for co-mentions or control queries.
  • sources (Optional[List[str]]): List of sources for constructing source queries.
  • batch_size (int): Number of entities per batch when auto-batching.
  • fiscal_year (Optional[int]): Fiscal year to filter queries.
  • scope (DocumentType): Document type scope (e.g., ALL, TRANSCRIPTS).
  • custom_batches (Optional[List[EntitiesToSearch]]): Custom entity batches for advanced batching.

Returns

  • List[QueryComponent]: List of expanded query components.

Example

from bigdata_research_tools.search.query_builder import build_batched_query, EntitiesToSearch
from bigdata_client.models.search import DocumentType

sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]
entities = EntitiesToSearch(companies=["AAPL", "MSFT"])
sources = ["MT Newswires"]
batch_size = 10

queries = build_batched_query(
    sentences=sentences,
    keywords=None,
    entities=entities,
    control_entities=None,
    sources=sources,
    batch_size=batch_size,
    fiscal_year=None,
    scope=DocumentType.NEWS,
    custom_batches=None
)
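batch_size controls how many entities go into each batched query. As a minimal sketch of auto-batching, the entity list is cut into consecutive chunks of at most batch_size items. The query count shown assumes one query per (sentence, batch) pair, which is an assumption about how build_batched_query combines its inputs; chunk and the entity IDs are hypothetical.

```python
import math


def chunk(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


entity_ids = [f"ENT{i}" for i in range(23)]  # hypothetical entity IDs
batches = chunk(entity_ids, 10)
print([len(b) for b in batches])  # [10, 10, 3]

# Assumption: one query per (sentence, batch) pair
sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]
print(len(sentences) * math.ceil(len(entity_ids) / 10))  # 6
```

Under that assumption, a smaller batch_size produces more, narrower queries, each returning up to document_limit documents.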

EntitiesToSearch

A dataclass for specifying which entities to include in a search.

Fields

  • people (Optional[List[str]])
  • product (Optional[List[str]])
  • org (Optional[List[str]])
  • place (Optional[List[str]])
  • topic (Optional[List[str]])
  • concepts (Optional[List[str]])
  • companies (Optional[List[str]])

Example

from bigdata_research_tools.search.query_builder import EntitiesToSearch

entities = EntitiesToSearch(
    companies=["AAPL", "MSFT"],
    people=["Elon Musk"],
    org=["OpenAI"]
)
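Since EntitiesToSearch is a dataclass whose fields are all optional, it can be useful to see which entity types were actually populated. The sketch below uses dataclasses.fields for that. It defines a hypothetical stand-in, EntitiesToSearchSketch, that mirrors the documented fields and assumes unspecified fields default to None (the example above constructs the class with only some fields); non_empty_fields is also hypothetical, but it should work on any dataclass instance.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EntitiesToSearchSketch:
    """Stand-in mirroring the documented EntitiesToSearch fields (hypothetical)."""
    people: Optional[List[str]] = None
    product: Optional[List[str]] = None
    org: Optional[List[str]] = None
    place: Optional[List[str]] = None
    topic: Optional[List[str]] = None
    concepts: Optional[List[str]] = None
    companies: Optional[List[str]] = None


def non_empty_fields(entities) -> dict[str, list[str]]:
    """Return only the entity types that were populated, in field order."""
    return {
        f.name: getattr(entities, f.name)
        for f in fields(entities)
        if getattr(entities, f.name)
    }


entities = EntitiesToSearchSketch(
    companies=["AAPL", "MSFT"], people=["Elon Musk"], org=["OpenAI"]
)
print(non_empty_fields(entities))
# {'people': ['Elon Musk'], 'org': ['OpenAI'], 'companies': ['AAPL', 'MSFT']}
```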