ThematicScreener

Screen a universe of companies for exposure to a given theme using LLM-powered analysis.

Parameters

  • llm_model (str): LLM identifier in <provider>::<model> format, used for text processing and analysis (e.g., "openai::gpt-4o-mini").
  • main_theme (str): The main theme to screen for. Sub-themes are generated from this.
  • companies (List[Company]): List of companies to analyze.
  • start_date (str): Start date for searching relevant documents (YYYY-MM-DD).
  • end_date (str): End date for searching relevant documents (YYYY-MM-DD).
  • document_type (DocumentType): Type of documents to search (NEWS, FILINGS, TRANSCRIPTS).
  • fiscal_year (int, optional): Fiscal year to analyze.
  • sources (List[str], optional): Filter search results by document sources.
  • rerank_threshold (float, optional): Threshold for reranking search results.
  • focus (str, optional): Focus for sub-theme generation.

Returns

  • ThematicScreener instance.

Example

from bigdata_research_tools.screener import ThematicScreener
from bigdata_client.models.entities import Company
from bigdata_client.models.search import DocumentType

companies = [
    Company(name="Apple Inc.", ticker="AAPL"),
    Company(name="Microsoft Corp.", ticker="MSFT"),
]

screener = ThematicScreener(
    llm_model="openai::gpt-4o-mini",
    main_theme="AI",
    companies=companies,
    start_date="2024-01-01",
    end_date="2024-06-30",
    document_type=DocumentType.NEWS,
    rerank_threshold=0.7
)
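
Because start_date and end_date are passed as plain strings, it can help to check them before constructing the screener. The following is a minimal sketch using only the standard library; validate_date_range is a hypothetical helper, not part of bigdata_research_tools, and whether the library performs its own validation is not stated here.

```python
from datetime import date, datetime


def validate_date_range(start_date: str, end_date: str) -> tuple[date, date]:
    """Parse two date strings with the %Y-%m-%d format and check their order.

    Hypothetical helper for illustration; not part of the library.
    """
    # strptime raises ValueError on input it cannot parse, e.g. "2024-13-01".
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    if start > end:
        raise ValueError(f"start_date {start_date} is after end_date {end_date}")
    return start, end


start, end = validate_date_range("2024-01-01", "2024-06-30")
print(start, end)
```

The validated strings can then be passed unchanged to ThematicScreener as in the example above.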

screen_companies

Screen companies for thematic exposure and generate labeled results, company/industry summaries, and motivations.

Parameters

  • document_limit (int, optional): Max documents per query (default: 10).
  • batch_size (int, optional): Number of entities per batch (default: 10).
  • frequency (str, optional): Date range frequency used to split the search period, e.g. 'Y', 'M', 'W', or 'D', optionally prefixed with a multiplier as in the default '3M'.
  • word_range (Tuple[int, int], optional): Word count range for motivations (default: (50, 100)).
  • export_path (str, optional): Path to export results as Excel.
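
To make the frequency parameter more concrete, the sketch below shows one way a date range could be split into calendar-aligned windows of a given number of months. This is an illustration only: add_months and split_into_windows are hypothetical helpers, and the library's actual splitting logic may differ (for example in how window boundaries are aligned).

```python
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def split_into_windows(start: date, end: date, months: int = 3):
    """Split [start, end] into consecutive windows of at most `months` calendar months.

    Hypothetical helper; an assumption about what a '3M' frequency could mean.
    """
    if months < 1:
        raise ValueError("months must be at least 1")
    windows = []
    window_start = start
    while window_start <= end:
        next_start = add_months(window_start, months)
        # Each window ends the day before the next one starts, capped at `end`.
        window_end = min(next_start - timedelta(days=1), end)
        windows.append((window_start, window_end))
        window_start = next_start
    return windows


for window_start, window_end in split_into_windows(date(2024, 1, 1), date(2024, 6, 30)):
    print(window_start, window_end)
# 2024-01-01 2024-03-31
# 2024-04-01 2024-06-30
```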

Returns

  • dict with the following keys:
    • df_labeled: DataFrame with labeled search results.
    • df_company: DataFrame with company-level output.
    • df_industry: DataFrame with industry-level output.
    • theme_tree: ThemeTree object used for screening.

Example

results = screener.screen_companies(
    document_limit=10,
    batch_size=10,
    frequency="3M",
    word_range=(50, 100),
    export_path="output/thematic_screening.xlsx"
)

df_labeled = results["df_labeled"]
df_company = results["df_company"]
df_industry = results["df_industry"]
theme_tree = results["theme_tree"]
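
Before indexing into the returned dict, a caller may want to confirm that all documented keys are present. The sketch below does this with a stand-in dict so it runs without the library; missing_result_keys is a hypothetical helper, and the key names are the ones listed under Returns.

```python
# Keys documented in the Returns section of screen_companies.
EXPECTED_KEYS = ("df_labeled", "df_company", "df_industry", "theme_tree")


def missing_result_keys(results: dict) -> list:
    """Return the documented keys that are absent from a results dict."""
    return [key for key in EXPECTED_KEYS if key not in results]


# Stand-in dict for illustration; real values would be DataFrames and a ThemeTree.
example_results = {"df_labeled": None, "df_company": None, "theme_tree": None}
print(missing_result_keys(example_results))  # ['df_industry']
```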