
ThematicScreener

We are sunsetting our SDKs and will no longer add new features, security patches, bug fixes, or technical support for them. To access the latest capabilities and ongoing improvements, we encourage you to migrate to our RESTful API. SDK support will officially end on December 31, 2026. On this date, the underlying endpoints used by the SDKs and related documentation will be decommissioned. To avoid any disruption to your services, please ensure your migration is complete by that date. For migration assistance, please contact us at support@bigdata.com.
Screen a universe of companies for exposure to a given theme using LLM-powered analysis.
Parameters
  • llm_model (str): LLM to use for text processing and analysis, given as <provider>::<model> (e.g., "openai::gpt-4o-mini").
  • main_theme (str): The main theme to screen for. Sub-themes are generated from this.
  • companies (List[Company]): List of companies to analyze.
  • start_date (str): Start date for searching relevant documents (YYYY-MM-DD).
  • end_date (str): End date for searching relevant documents (YYYY-MM-DD).
  • document_type (DocumentType): Type of documents to search (NEWS, FILINGS, TRANSCRIPTS).
  • fiscal_year (int, optional): Fiscal year to analyze.
  • sources (Optional[List[str]]): Filter search results by document sources.
  • rerank_threshold (Optional[float]): Threshold for reranking search results.
  • focus (str, optional): Focus for sub-theme generation.
Returns
  • ThematicScreener instance.
Example
from bigdata_research_tools.screener import ThematicScreener
from bigdata_client.models.entities import Company
from bigdata_client.models.search import DocumentType

companies = [
    Company(name="Apple Inc.", ticker="AAPL"),
    Company(name="Microsoft Corp.", ticker="MSFT"),
]

screener = ThematicScreener(
    llm_model="openai::gpt-4o-mini",
    main_theme="AI",
    companies=companies,
    start_date="2024-01-01",
    end_date="2024-06-30",
    document_type=DocumentType.NEWS,
    rerank_threshold=0.7
)
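
The constructor takes several string parameters with an implied format: llm_model as <provider>::<model>, and start_date and end_date as YYYY-MM-DD. The following is a minimal sketch of checking those values before building the screener. The helper names split_llm_model and check_date_range are hypothetical and are not part of bigdata_research_tools; whether the library performs similar checks itself is not stated here.

```python
from datetime import date, datetime


def split_llm_model(llm_model: str) -> tuple[str, str]:
    """Split a "<provider>::<model>" string into (provider, model).

    Hypothetical helper, not part of the SDK.
    """
    provider, sep, model = llm_model.partition("::")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>::<model>', got {llm_model!r}")
    return provider, model


def check_date_range(start_date: str, end_date: str) -> tuple[date, date]:
    """Parse two YYYY-MM-DD strings and reject a start date after the end date.

    Hypothetical helper, not part of the SDK.
    """
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    if start > end:
        raise ValueError(f"start_date {start_date} is after end_date {end_date}")
    return start, end


provider, model = split_llm_model("openai::gpt-4o-mini")
start, end = check_date_range("2024-01-01", "2024-06-30")
print(provider, model, start, end)
```

Running such checks up front can surface a malformed argument before any search or LLM call is made.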

screen_companies

Screen companies for thematic exposure and generate labeled results, company/industry summaries, and motivations.
Parameters
  • document_limit (int, optional): Max documents per query (default: 10).
  • batch_size (int, optional): Number of entities per batch (default: 10).
  • frequency (str, optional): Date range frequency: 'Y', 'M', 'W', or 'D', optionally prefixed with a count as in the default '3M'.
  • word_range (Tuple[int, int], optional): Word count range for motivations (default: (50, 100)).
  • export_path (str, optional): Path to export results as Excel.
Returns
  • dict with:
    • df_labeled: DataFrame with labeled search results.
    • df_company: DataFrame with company-level output.
    • df_industry: DataFrame with industry-level output.
    • theme_tree: ThemeTree object used for screening.
Example
results = screener.screen_companies(
    document_limit=10,
    batch_size=10,
    frequency="3M",
    word_range=(50, 100),
    export_path="output/thematic_screening.xlsx"
)

df_labeled = results["df_labeled"]
df_company = results["df_company"]
df_industry = results["df_industry"]
theme_tree = results["theme_tree"]
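
To illustrate what a frequency of "3M" could mean for the screener's date range, here is a rough sketch that splits the period between start_date and end_date into consecutive three-month windows. This is an assumption about the general idea only: the functions add_months and month_windows are hypothetical, and the SDK's actual splitting logic may differ (for example in how it aligns window boundaries).

```python
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that lies `months` months after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def month_windows(start: date, end: date, months: int = 3) -> list[tuple[date, date]]:
    """Split [start, end] into consecutive windows of at most `months` calendar months.

    Windows after the first begin on the first day of a month.
    Hypothetical helper, not the SDK's implementation.
    """
    if months < 1:
        raise ValueError("months must be at least 1")
    windows = []
    cursor = start
    while cursor <= end:
        next_start = add_months(cursor, months)
        # Each window ends the day before the next one starts, capped at `end`.
        window_end = min(next_start - timedelta(days=1), end)
        windows.append((cursor, window_end))
        cursor = next_start
    return windows


for window_start, window_end in month_windows(date(2024, 1, 1), date(2024, 6, 30)):
    print(window_start, window_end)
# prints:
# 2024-01-01 2024-03-31
# 2024-04-01 2024-06-30
```

Under this reading, the example date range above would be searched as two quarterly windows, with document_limit applying per query rather than to the whole period.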