Search

search_by_companies

Screen for documents based on a list of companies, sentences, and other filters.

Parameters

companies (List[Company]): The list of companies to use.
sentences (List[str]): The list of sentences to screen for.
start_date (str): The start date for the search.
end_date (str): The end date for the search.
scope (DocumentType): The document type scope (e.g., DocumentType.ALL, DocumentType.TRANSCRIPTS). Defaults to DocumentType.ALL.
fiscal_year (Optional[int]): The fiscal year to filter queries. If None, no fiscal year filter is applied.
sources (Optional[List[str]]): List of sources to filter on. If None, searches across all sources.
keywords (List[str]): A list of keywords for constructing keyword queries.
control_entities (Dict): A dictionary of control entities of different types for creating co-mentions queries.
freq (str): The frequency of the date ranges. Defaults to 'M'.
sort_by (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
rerank_threshold (Optional[float]): The threshold for reranking the search results.
document_limit (int): The maximum number of documents to return per Bigdata query.
batch_size (int): The number of entities to include in each batched query.
**kwargs: Additional keyword arguments.
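
The freq parameter controls how the start_date/end_date window is divided into date ranges. The sketch below illustrates one plausible reading, assuming 'M' denotes calendar months (as in pandas-style frequency aliases). The helper monthly_windows is hypothetical and is not part of bigdata_research_tools; the library's actual window boundaries may differ.

```python
from datetime import date, timedelta


def monthly_windows(start_date: str, end_date: str) -> list[tuple[str, str]]:
    """Split an inclusive [start_date, end_date] window into calendar months.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    windows = []
    cursor = start
    while cursor <= end:
        # Compute the first day of the month following the cursor.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        # A window ends on the last day of the month or at end_date, whichever is earlier.
        window_end = min(next_month - timedelta(days=1), end)
        windows.append((cursor.isoformat(), window_end.isoformat()))
        cursor = next_month
    return windows


windows = monthly_windows("2024-01-01", "2024-06-30")
for window_start, window_end in windows:
    print(window_start, window_end)
```

Under this assumption, a six-month window produces six date ranges, each of which would be searched separately with its own document_limit.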

Returns

DataFrame: The DataFrame with the screening results.
Columns include: timestamp_utc, document_id, sentence_id, headline, entity_id, document_type, is_reporting_entity, entity_name, entity_sector, entity_industry, entity_country, entity_ticker, text, other_entities, entities, masked_text, other_entities_map.

Example

from bigdata_research_tools.search import search_by_companies
from bigdata_client.models.entities import Company
from bigdata_client.models.search import DocumentType, SortBy

companies = [Company(name="Apple Inc.", ticker="AAPL"), Company(name="Microsoft Corp.", ticker="MSFT")]
sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]

results_df = search_by_companies(
    companies=companies,
    sentences=sentences,
    start_date="2024-01-01",
    end_date="2024-06-30",
    scope=DocumentType.NEWS,
    sort_by=SortBy.RELEVANCE,
    document_limit=20,
    batch_size=5
)

search_narratives

Screen for documents based on input sentences and other filters.

Parameters

sentences (List[str]): The list of theme sentences to screen for.
start_date (str): The start date for the search.
end_date (str): The end date for the search.
scope (DocumentType): The document type scope (e.g., DocumentType.NEWS, DocumentType.TRANSCRIPTS).
fiscal_year (Optional[int]): The fiscal year to filter queries. If None, no fiscal year filter is applied.
sources (Optional[List[str]]): List of sources to filter on. If None, searches across all sources.
keywords (Optional[List[str]]): A list of keywords for constructing keyword queries.
control_entities (Optional[List[str]]): A list of control entity IDs for creating co-mentions queries.
freq (str): The frequency of the date ranges. Defaults to 'M'.
sort_by (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
rerank_threshold (Optional[float]): The threshold for reranking the search results.
document_limit (int): The maximum number of documents to return per Bigdata query.
batch_size (int): The number of entities to include in each batched query.
**kwargs: Additional keyword arguments.

Returns

DataFrame: The DataFrame with the screening results.
Columns include: timestamp_utc, document_id, sentence_id, headline, text.

Example

from bigdata_research_tools.search import search_narratives
from bigdata_client.models.search import DocumentType, SortBy

sentences = [
    "AI is transforming healthcare.",
    "Cloud adoption is accelerating."
]

results_df = search_narratives(
    sentences=sentences,
    start_date="2024-01-01",
    end_date="2024-06-30",
    scope=DocumentType.NEWS,
    sort_by=SortBy.RELEVANCE,
    document_limit=20,
    batch_size=5
)

run_search

Execute multiple searches concurrently using the Bigdata client, with rate limiting.

Parameters

queries (list[QueryComponent]): A list of QueryComponent objects.
date_ranges (Optional[Union[AbsoluteDateRange, RollingDateRange, List[Union[AbsoluteDateRange, RollingDateRange]]]]): Date range filter for the search results.
sortby (SortBy): The sorting criterion for the search results. Defaults to SortBy.RELEVANCE.
scope (DocumentType): The scope of the documents to include. Defaults to DocumentType.ALL.
limit (int): The maximum number of documents to return per query. Defaults to 10.
only_results (bool): If True, return only the search results. If False, return the queries along with the results. Defaults to True.
rerank_threshold (Optional[float]): The threshold for reranking the search results.
**kwargs: Additional keyword arguments to pass to the underlying search manager.

Returns

list[list[Document]] if only_results is True: List of search results.
dict[tuple[QueryComponent, Union[AbsoluteDateRange, RollingDateRange]], list[Document]] if only_results is False: Mapping of query/date range to results.
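
Because the return type depends on only_results, downstream code may need to handle both shapes. The sketch below shows one way to flatten either shape into a single list. The helper flatten_results is hypothetical, and plain strings stand in for Document objects and for the query/date range keys.

```python
from typing import Any, Union


def flatten_results(results: Union[list, dict]) -> list[Any]:
    """Return one flat list of documents from either run_search return shape.

    Hypothetical helper for illustration only.
    """
    # only_results=True: a list holding one inner list of documents per search.
    # only_results=False: a dict keyed by (query, date_range) tuples.
    groups = results.values() if isinstance(results, dict) else results
    return [doc for group in groups for doc in group]


# Stand-in data: strings in place of Document, QueryComponent, and date range objects.
as_list = [["doc-a", "doc-b"], ["doc-c"]]
as_dict = {
    ("query-1", "2024-01"): ["doc-a", "doc-b"],
    ("query-2", "2024-01"): ["doc-c"],
}

flat_from_list = flatten_results(as_list)
flat_from_dict = flatten_results(as_dict)
print(flat_from_list)
print(flat_from_dict)
```

Keeping only_results=False is useful when each document needs to be traced back to the query and date range that produced it; the flattening above discards that mapping.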

Example

from bigdata_research_tools.search import run_search
from bigdata_client.models.search import QueryComponent, DocumentType, SortBy

queries = [
    QueryComponent(query="AI in healthcare"),
    QueryComponent(query="Cloud computing trends"),
]

results = run_search(
    queries=queries,
    date_ranges=None,
    sortby=SortBy.RELEVANCE,
    scope=DocumentType.NEWS,
    limit=5,
    only_results=True,
    rerank_threshold=0.7
)

build_batched_query

Build a list of batched query objects for advanced search, supporting similarity, keyword, entity, control entity, source, and fiscal year filters.

Parameters

sentences (List[str]): Sentences for creating similarity queries.
keywords (Optional[List[str]]): Keywords for constructing keyword queries.
entities (Optional[EntitiesToSearch]): Entities to include in the query (companies, people, orgs, etc).
control_entities (Optional[EntitiesToSearch]): Control entities for co-mentions or control queries.
sources (Optional[List[str]]): List of sources for constructing source queries.
batch_size (int): Number of entities per batch when auto-batching.
fiscal_year (Optional[int]): Fiscal year to filter queries.
scope (DocumentType): Document type scope (e.g., ALL, TRANSCRIPTS).
custom_batches (Optional[List[EntitiesToSearch]]): Custom entity batches for advanced batching.
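
The batch_size parameter groups entities so that each query covers a limited number of them. The sketch below illustrates that idea with a hypothetical chunk helper and placeholder identifiers. It is not the library's implementation, and how the batches are combined with sentences, keywords, and sources is not specified here.

```python
def chunk(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most batch_size elements.

    Hypothetical helper for illustration only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Placeholder identifiers; real calls would use the identifiers the library expects.
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA"]
batches = chunk(tickers, 3)
print(batches)
```

A smaller batch_size yields more queries with fewer entities each. When automatic grouping is not suitable, custom_batches allows the groups to be supplied explicitly.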

Returns

List[QueryComponent]: List of expanded query components.

Example

from bigdata_research_tools.search.query_builder import build_batched_query, EntitiesToSearch
from bigdata_client.models.search import DocumentType

sentences = ["AI is transforming business.", "Cloud adoption is accelerating."]
entities = EntitiesToSearch(companies=["AAPL", "MSFT"])
sources = ["MT Newswires"]
batch_size = 10

queries = build_batched_query(
    sentences=sentences,
    keywords=None,
    entities=entities,
    control_entities=None,
    sources=sources,
    batch_size=batch_size,
    fiscal_year=None,
    scope=DocumentType.NEWS,
    custom_batches=None
)

EntitiesToSearch

A dataclass for specifying which entities to include in a search.

Fields

people (Optional[List[str]])
product (Optional[List[str]])
org (Optional[List[str]])
place (Optional[List[str]])
topic (Optional[List[str]])
concepts (Optional[List[str]])
companies (Optional[List[str]])

Example

from bigdata_research_tools.search.query_builder import EntitiesToSearch

entities = EntitiesToSearch(
    companies=["AAPL", "MSFT"],
    people=["Elon Musk"],
    org=["OpenAI"]
)
