Thematic Screeners
Identify Companies Aligned with Investment Themes
Why?
In an investment landscape increasingly shaped by long-term structural trends — such as decarbonization, AI adoption, or supply chain reshaping — investors need tools to quickly identify which companies are exposed to the themes driving markets. Whether you’re looking for new investment ideas, managing thematic portfolios, or assessing risks tied to specific trends, thematic screening enables you to systematically map theme-company connections using vast amounts of unstructured data.
The bigdata-research-tools package provides a specialised class, ThematicScreener, designed to help you uncover these connections across your chosen universe of companies. By applying thematic logic across different document types, you can assess how companies are positioned relative to the themes that matter:
- DocumentType.NEWS → capturing thematic mentions in news coverage
- DocumentType.TRANSCRIPTS → spotting companies’ strategic alignment to a theme in earnings calls
- DocumentType.FILINGS → identifying disclosures, risks, or opportunities in SEC filings
Each Thematic Screener workflow follows a clear process:
- Select the company universe to screen for thematic exposure
- Define your own thematic labels that describe the target theme, or use our LLM-generated mindmapper to build a theme taxonomy
- Retrieve and process relevant documents using Bigdata’s search tools
- Apply LLM-based classification to tag theme mentions at the company and industry level
- Analyze the outputs to quantify exposure, trends, and relative positioning
This notebook will walk you through using the Thematic Screener to generate actionable insights on how specific companies link to major investment themes — turning unstructured data into structured, decision-ready intelligence.
Ready to get started? Let’s dive in!
Setup and Imports
Below is the Python code required for setting up our environment and importing necessary libraries.
Define Output Paths
We define the output paths for our thematic screening results.
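As a minimal sketch of this step, the snippet below creates an output folder and builds the file paths used later for export. The folder name and file names are hypothetical placeholders, not names taken from the original notebook.

```python
from pathlib import Path

# Hypothetical folder name; adapt it to your own project layout.
OUTPUT_DIR = Path("output")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Hypothetical file names for the results exported at the end of the workflow.
labeled_results_path = OUTPUT_DIR / "thematic_screener_labeled.xlsx"
company_scores_path = OUTPUT_DIR / "thematic_screener_company_scores.xlsx"
industry_scores_path = OUTPUT_DIR / "thematic_screener_industry_scores.xlsx"
```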
Load Environment Variables
The Thematic Screener requires API credentials for both the Bigdata API and the LLM API (in this case, OpenAI). Make sure you have these credentials available as environment variables or in a secure credential store.
Never hardcode credentials directly in your notebook or scripts.
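One way to confirm the credentials are in place before running anything is a small check like the one below. The variable names are assumptions for illustration; use whichever names your Bigdata and LLM clients actually expect.

```python
import os

# Assumed variable names; check your own setup and the client documentation.
REQUIRED_VARS = ("BIGDATA_USERNAME", "BIGDATA_PASSWORD", "OPENAI_API_KEY")


def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_credentials()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All credentials found.")
```

The check only reports which names are missing and never prints the values themselves.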
Defining your Screening Parameters
- Main Theme (main_theme): The central concept to explore
- Company Universe (companies): The set of companies to screen
- Time Period (start_date and end_date): The date range over which to run the search
- Document Type (document_type): Specify which documents to search over (transcripts, filings, news)
- Sources (sources): Specify the set of sources within a document type, for example which news outlets (available via Bigdata API) you wish to search over
- Fiscal Year (fiscal_year): If the document type is transcripts or filings, the fiscal year needs to be specified
- Model Selection (llm_model): The LLM model used to mindmap the theme and label the search result chunks
- Rerank Threshold (rerank_threshold): By setting this value, you’re enabling the cross-encoder which reranks the results and selects those whose relevance is above the percentile you specify (0.7 being the 70th percentile). More information on the re-ranker can be found here.
- Focus (focus): Specify a focus within the main theme. This will then be used in building the LLM-generated mindmapper
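The parameters above can be gathered in one place and sanity-checked before any search is run. The sketch below is illustrative only: the ScreeningParams class, its validate method, and every example value (dates, model string, fiscal year) are hypothetical and not part of the bigdata-research-tools package.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreeningParams:
    """Hypothetical container for the screening parameters listed above."""
    main_theme: str
    companies: List[str]
    start_date: str  # ISO format, e.g. "2024-01-01"
    end_date: str
    document_type: str  # "news", "transcripts" or "filings"
    llm_model: str
    sources: Optional[List[str]] = None
    fiscal_year: Optional[int] = None
    rerank_threshold: Optional[float] = None
    focus: str = ""

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the setup is consistent."""
        problems = []
        if self.document_type not in ("news", "transcripts", "filings"):
            problems.append(f"unknown document_type: {self.document_type}")
        # Transcripts and filings need a fiscal year, as described above.
        if self.document_type in ("transcripts", "filings") and self.fiscal_year is None:
            problems.append("fiscal_year is required for transcripts and filings")
        # The threshold is a percentile expressed as a fraction (0.7 = 70th percentile).
        if self.rerank_threshold is not None and not 0 <= self.rerank_threshold <= 1:
            problems.append("rerank_threshold must be between 0 and 1")
        # ISO date strings compare correctly as plain strings.
        if self.start_date > self.end_date:
            problems.append("start_date must not be after end_date")
        return problems


# Placeholder values for illustration only.
params = ScreeningParams(
    main_theme="Supply Chain Reshaping",
    companies=["Company A", "Company B"],
    start_date="2024-01-01",
    end_date="2024-12-31",
    document_type="transcripts",
    llm_model="placeholder-model-name",
    fiscal_year=2024,
    rerank_threshold=0.7,
)
print(params.validate())
```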
Mindmap a Theme Taxonomy with Bigdata Research Tools
You can leverage Bigdata Research Tools to generate a comprehensive theme taxonomy with an LLM, breaking down a megatrend into smaller, well-defined concepts for more targeted analysis. More details on the implementation can be found in the API Reference here.
The taxonomy tree includes descriptive sentences that explicitly connect each sub-theme back to the “Supply Chain Reshaping” main theme, ensuring all search results remain contextually relevant to our central trend.
Retrieve Content
With the theme taxonomy and screening parameters, you can leverage the Bigdata API to run a search on company transcripts. We need to define 3 more parameters for searching:
- Frequency (freq): The frequency of the date ranges to search over. Defaults to ‘3M’. Supported values:
  - ‘Y’: Yearly intervals.
  - ‘M’: Monthly intervals.
  - ‘W’: Weekly intervals.
  - ‘D’: Daily intervals.
- Document Limit (document_limit): The maximum number of documents to return per query to Bigdata API.
- Batch Size (batch_size): The number of entities to include in a single batched query.
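To make the frequency parameter concrete, the sketch below splits a date range into consecutive search windows. It is an illustration, not the package's implementation: the helper names are hypothetical, and it assumes that a value such as ‘3M’ combines a multiplier with one of the unit letters above (three-month windows).

```python
import calendar
import re
from datetime import date, timedelta


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def split_date_range(start, end, freq="3M"):
    """Split [start, end] into consecutive windows of the given frequency."""
    match = re.fullmatch(r"(\d*)([YMWD])", freq)
    if match is None:
        raise ValueError(f"unsupported frequency: {freq!r}")
    n = int(match.group(1) or 1)
    unit = match.group(2)
    if n < 1:
        raise ValueError("frequency multiplier must be at least 1")

    def boundary(k):
        # Start of the k-th window, always computed from the original start
        # so that month-end clamping does not accumulate.
        if unit == "D":
            return start + timedelta(days=k * n)
        if unit == "W":
            return start + timedelta(weeks=k * n)
        return add_months(start, k * n * (12 if unit == "Y" else 1))

    windows = []
    k = 0
    while boundary(k) <= end:
        window_end = min(boundary(k + 1) - timedelta(days=1), end)
        windows.append((boundary(k), window_end))
        k += 1
    return windows


for window_start, window_end in split_date_range(date(2024, 1, 1), date(2024, 12, 31)):
    print(window_start, "to", window_end)
```

Shorter windows mean more queries, each capped by the document limit, so the choice of frequency trades query volume against how evenly documents are sampled over time.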
Label the Results
Use an LLM to analyze each text chunk and determine its relevance to the sub-themes. Any chunks which aren’t explicitly linked to supply chain reshaping will be filtered out.
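The filtering step can be pictured as follows. This is a simplified sketch that assumes each chunk already carries a label assigned by the LLM; the dictionary layout, the "unclear" marker, and the keep_relevant helper are hypothetical, and the example chunks are made-up placeholders.

```python
def keep_relevant(chunks, sub_themes):
    """Keep only chunks whose label is one of the taxonomy's sub-themes."""
    allowed = set(sub_themes)
    return [chunk for chunk in chunks if chunk.get("label") in allowed]


# Placeholder sub-themes and chunks for illustration only.
sub_themes = ["Automation & Robotics", "Circular Economy Practices"]
chunks = [
    {"company": "Company A", "text": "placeholder text 1", "label": "Automation & Robotics"},
    {"company": "Company B", "text": "placeholder text 2", "label": "unclear"},
    {"company": "Company B", "text": "placeholder text 3", "label": "Circular Economy Practices"},
]

relevant = keep_relevant(chunks, sub_themes)
print(len(relevant), "of", len(chunks), "chunks kept")
```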
Assess Thematic Exposure
We’ll look at the top 10 most exposed companies to supply chain reshaping. The function get_scored_df will calculate the composite thematic score, summing up the scores across the sub-themes for each company (df_company) or industry (df_industry).
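The idea behind the composite score can be sketched in plain Python. This is not the package's get_scored_df; it is a hypothetical stand-in that assumes each sub-theme score is the number of labeled chunks for that company, and the rows below are made-up placeholders.

```python
from collections import defaultdict


def score_by_company(labeled_rows):
    """Count labeled chunks per company and sub-theme, then sum them per company.

    labeled_rows is an iterable of (company, sub_theme) pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for company, sub_theme in labeled_rows:
        counts[company][sub_theme] += 1

    table = [
        {
            "company": company,
            "sub_theme_scores": dict(by_theme),
            "composite_score": sum(by_theme.values()),
        }
        for company, by_theme in counts.items()
    ]
    # Highest composite score first; ties broken alphabetically by company.
    table.sort(key=lambda row: (-row["composite_score"], row["company"]))
    return table


# Placeholder rows for illustration only.
rows = [
    ("Company A", "AI and Machine Learning"),
    ("Company A", "AI and Machine Learning"),
    ("Company A", "IoT Integration"),
    ("Company B", "Automation & Robotics"),
]

ranking = score_by_company(rows)
top_10 = ranking[:10]
for row in top_10:
    print(row["company"], row["composite_score"])
```

Grouping by an industry field instead of the company name would give the industry-level view in the same way.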
Now, let’s visualize the results using Plotly to create an interactive dashboard:
Extract Key Insights
The visualizations reveal key insights about how companies are positioning themselves within the supply chain reshaping theme:
AI and Machine Learning Emerges as the Core Enabler
With the highest cumulative score across all companies, AI and Machine Learning is the most dominant theme, highlighting its foundational role in predictive analytics, automation, and optimization within modern supply chains.
Circular Economy and Automation as Structural Shifts
The strong presence of Circular Economy Practices and Automation & Robotics indicates a structural shift toward sustainable and efficient supply chain models—companies are not just digitizing but rethinking operational design.
Tech-Centric Players Lead the Pack
Siemens AG, Infineon Technologies AG, and Qualcomm Inc. are the frontrunners in thematic exposure, underscoring that companies at the intersection of industrial technology and digital infrastructure are best positioned to drive—and benefit from—supply chain transformation.
IoT Integration as a Bridge Between Physical and Digital
IoT’s high ranking shows its critical role in connecting assets, enabling real-time visibility, and facilitating advanced automation, especially for manufacturers and hardware-driven firms.
Industry Polarisation
Sector Engagement
- Semiconductors and Computer Services industries show the strongest average exposure, reflecting their integral role in enabling supply chain tech (e.g., sensors, connectivity, software).
- Traditional Sectors like Diversified Industrials show broader but shallower engagement, suggesting they are still in earlier phases of thematic adoption.
Strategic Focus
Concentration vs. Diversification in Exposure
Most companies exhibit thematic concentration, focusing efforts on a few high-impact areas rather than spreading across all themes—likely reflecting strategic prioritization rather than lack of alignment.
Export the Results
Export the data as Excel files for further analysis or to share with the team.
Conclusion
The Thematic Screener provides a powerful way to identify companies that are most aligned with or exposed to specific investment themes. By leveraging Bigdata’s search capabilities and applying LLM-based classification, you can:
- Discover thematic leaders - Find companies with the strongest strategic alignment to emerging trends
- Compare across industries - Identify which sectors are most proactive in addressing thematic challenges and opportunities
- Identify investment opportunities - Spot companies that may be undervalued relative to their thematic positioning
- Monitor thematic evolution - Track how themes gain or lose prominence across your investment universe over time
Whether you’re building thematic portfolios, conducting sector research, or seeking alpha through theme-based strategies, the Thematic Screener transforms unstructured data into structured, decision-ready intelligence.