What is it?

Thematic investing has emerged as one of the most powerful approaches for capturing long-term structural trends that reshape industries and markets. However, identifying which companies are genuinely positioned to benefit from mega-trends like artificial intelligence, decarbonization, or supply chain transformation requires systematic analysis beyond traditional sector classifications. Thematic screening is the process of systematically identifying and quantifying corporate exposure to specific investment themes using unstructured data analysis. Unlike simple keyword searches or manual research, modern thematic screening leverages AI and natural language processing to analyze earnings calls, regulatory filings, and news articles to understand how companies are actually positioned relative to emerging trends. This approach enables investors to move beyond surface-level theme association and discover companies with genuine strategic alignment to structural changes, often uncovering opportunities that traditional sector-based analysis might miss.

Deploy your own thematic screener service

To support users building with Bigdata, we’ve released a pre-built Docker image that lets you run your own thematic screening analysis. The service is built on the same Thematic Screener technology available in our bigdata-research-tools package and exposes it as a deployable API for systematic thematic analysis. It uses the Bigdata API to analyze corporate exposure to specific themes using unstructured data from news, earnings calls, and regulatory filings, and combines LLM-powered theme taxonomies, semantic content retrieval, and structured scoring methodologies to transform narrative signals into quantifiable thematic exposure metrics. If you prefer to work with the thematic screener directly as a Python package, explore the Thematic Screeners section, which provides comprehensive Jupyter notebook cookbooks and detailed examples.

What the Thematic Screener Service Provides

The thematic screener service offers several key capabilities for systematic thematic investing:
  • Automated Theme Taxonomy Generation: Uses AI to break down complex investment themes into specific, measurable sub-categories for comprehensive analysis
  • Systematic Positioning Analysis: Identifies how companies align with key themes through analysis of corporate communications and strategic initiatives
  • Cross-sector Exposure Comparison: Enables portfolio-level thematic assessment by providing standardized exposure metrics across different industries
  • Qualitative-to-Quantitative Transformation: Converts narrative signals from unstructured data into structured, actionable investment insights
  • Multi-source Intelligence: Analyzes data from earnings calls, regulatory filings, and news articles to capture comprehensive thematic positioning
  • RESTful API Interface: Offers programmatic access for integration into existing investment research and portfolio management workflows
  • Interactive Web Interface: Provides a user-friendly dashboard for ad-hoc thematic analysis and exploration
Current Limitation: The demo version is currently restricted to analyzing TRANSCRIPTS only. Support for additional document sources (NEWS and FILINGS) is planned for future releases.

Setup

The following prerequisites are required to run the service:
  • A Bigdata.com account that supports programmatic access.
  • A Bigdata.com API key, which can be obtained from your account settings.
  • An LLM and embeddings provider. Currently, the service supports OpenAI.
To build and run the Docker image, you also need Docker installed on your machine.
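Before starting the container, a small preflight check can confirm that the prerequisites are in place. The sketch below is not part of the service: it assumes you keep your keys in environment variables named BIGDATA_API_KEY and OPENAI_API_KEY (the names the container expects via -e), and the helper names missing_prerequisites and docker_available are hypothetical.

```python
import os
import shutil
from typing import Mapping

# Environment variables the quickstart passes to the container with -e.
REQUIRED_ENV_VARS = ("BIGDATA_API_KEY", "OPENAI_API_KEY")


def missing_prerequisites(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def docker_available() -> bool:
    """Return True if a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    if not docker_available():
        print("Docker CLI not found on PATH")
```

Passing a plain dict instead of os.environ makes the check easy to test without touching your shell environment.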

Quickstart

To get started quickly, you have two options:
  1. Build and run locally: build the Docker image first, then run it:
# Clone the repository and navigate to the folder
git clone git@github.com:Bigdata-com/bigdata-thematic-screener.git
cd "bigdata-thematic-screener"

# Build the docker image
docker build -t bigdata_thematic_screener .

# Run the docker image
docker run -d \
  --name bigdata_thematic_screener \
  -p 8000:8000 \
  -e BIGDATA_API_KEY=<bigdata-api-key-here> \
  -e OPENAI_API_KEY=<openai-api-key-here> \
  bigdata_thematic_screener
  2. Run directly from the GitHub Container Registry:
docker run -d \
  --name bigdata_thematic_screener \
  -p 8000:8000 \
  -e BIGDATA_API_KEY=<bigdata-api-key-here> \
  -e OPENAI_API_KEY=<openai-api-key-here> \
  ghcr.io/bigdata-com/bigdata-thematic-screener:latest
This will start the thematic screener service locally on port 8000. You can then access the service at http://localhost:8000/ and the documentation for the API at http://localhost:8000/docs.
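Because docker run -d returns as soon as the container is created, the API may not be ready immediately. A minimal readiness check, sketched below under the assumption that the /docs page answers with HTTP 200 once the service is up, polls that URL until it responds or a timeout expires. The function name wait_for_service is hypothetical.

```python
import time
import urllib.request


def wait_for_service(url: str = "http://localhost:8000/docs",
                     timeout: float = 60.0,
                     interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, refused connections and socket timeouts are all OSError:
            # the container is still starting or nothing is listening yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call wait_for_service() right after docker run -d and only send screening requests once it returns True.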
For custom enterprise-ready solutions, please contact us at support@bigdata.com. If you are interested in using a different LLM provider, whether an enterprise-grade or a self-hosted solution, let us know by opening an issue on the Bigdata.com GitHub repository or through our support channels.

Usage

The thematic screener service provides a comprehensive API for systematic thematic analysis across your investment universe. Once the service is running, you can access it on port 8000 by default through both programmatic endpoints and an interactive web interface.

Access Methods

  • Interactive Web Interface: Navigate to http://localhost:8000/ for a user-friendly dashboard that lets you run thematic screenings through a visual interface
  • API Documentation: Visit http://localhost:8000/docs for complete API documentation with interactive examples
  • Programmatic Access: Use the RESTful API endpoints for integration into your existing investment research workflows
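The interactive page at /docs looks like Swagger UI, which is typically generated from an OpenAPI schema. Assuming the service also serves that schema at /openapi.json (common for frameworks that provide /docs, but not confirmed here), the sketch below fetches it and lists the available operations. Both helper names are hypothetical.

```python
import json
import urllib.request

# Operation keys defined by the OpenAPI Path Item Object; other keys
# (e.g. "parameters", "summary") are path-level metadata, not endpoints.
OPENAPI_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings for every operation in an OpenAPI schema."""
    endpoints = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key.lower() in OPENAPI_METHODS:
                endpoints.append(f"{key.upper()} {path}")
    return sorted(endpoints)


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI document, assumed to live at /openapi.json."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

With the service running, print("\n".join(list_endpoints(fetch_schema()))) should include the POST /thematic-screener endpoint used in the example below the parameter list.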

Core Parameters

To run a thematic screening, you’ll need to configure several key parameters that define the scope and focus of your analysis:
Theme Definition Parameters:
  • theme: The main theme, topic, or trend you want to screen for exposure (e.g., “Artificial Intelligence”, “Supply Chain Reshaping”, “Energy Transition”). It can be a single word or a short sentence. The screener generates a list of sub-themes, each representing an individual, self-contained component of the main theme. The theme can contain multiple core concepts, but we recommend not including too many in the same screener run.
  • focus: Additional custom instructions passed to the LLM when it breaks the theme down into sub-themes. Use this parameter to guide the mindmap creation and customize it to your needs: it lets you inject your own domain knowledge and point of view, and keeps the mindmap focused on the core concepts you care about.
Company Universe:
  • companies: The portfolio of companies you want to screen for exposure, given either as a list of RavenPack entity IDs representing individual companies (["4A6F00", "D8442A"]) or as a watchlist ID ("44118802-9104-4265-b97a-2e6d88d74893"). Watchlists can be created programmatically using the Bigdata.com SDK or through the Bigdata app.
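Since companies accepts two different shapes, a small client-side helper can catch malformed input before a request is sent. The sketch below assumes that watchlist IDs are always UUIDs (as in the example above) and that entity IDs are non-empty strings; it does not check whether the IDs actually exist. The name normalize_companies is hypothetical.

```python
import uuid
from typing import Sequence, Union


def normalize_companies(companies: Union[str, Sequence[str]]) -> Union[str, list[str]]:
    """Return a value suitable for the `companies` field.

    A string is treated as a watchlist ID and must parse as a UUID; any other
    sequence is treated as entity IDs, stripped and de-duplicated in order.
    """
    if isinstance(companies, str):
        uuid.UUID(companies)  # raises ValueError if this is not a UUID
        return companies
    entity_ids = list(dict.fromkeys(c.strip() for c in companies))
    if not entity_ids or any(not c for c in entity_ids):
        raise ValueError("provide at least one non-empty entity ID")
    return entity_ids
```

De-duplicating keeps the original order, so repeated entries in a hand-built list don't trigger redundant work on the service side.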
Analysis Configuration:
  • start_date / end_date: The start and end of the time sample during which you want to screen your portfolio for thematic exposure. The value has to be specified as a string in YYYY-MM-DD format.
  • document_type: The type of documents to search. Use this parameter to point the screener at text extracted from news, corporate transcripts, or corporate filings. Currently, only “TRANSCRIPTS” is supported.
  • fiscal_year: When screening Transcripts and Filings, documents can be further filtered by their reporting details. fiscal_year is the annual reporting period of the document; combined with start_date and end_date, it restricts queries to documents that match both the calendar window and the reporting period. This parameter does not apply to News, because news articles are not augmented with reporting metadata.
  • frequency: Breaks the sample range into higher-frequency intervals. This is useful on long samples: because the document_limit parameter caps how many documents search retrieves, a single query spanning many months may not return a representative sample. Rather than raising the document limit, splitting the range into smaller intervals gives you more control over retrieval and a more meaningful picture of exposure over time. The value must be one of: D, W, M, 3M, or Y.
For a complete list of parameters and advanced configuration options, refer to the API documentation at http://localhost:8000/docs.
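To make the effect of frequency concrete, the sketch below splits an inclusive start_date/end_date range into consecutive intervals. It is an illustration only: the service's exact interval boundaries may differ, the calendar-month alignment for M and 3M is an assumption, and it also accepts W as a weekly code, which is an assumption about the supported values. The function name split_range is hypothetical.

```python
from datetime import date, timedelta


def _add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def split_range(start: str, end: str, frequency: str) -> list[tuple[str, str]]:
    """Split [start, end] (inclusive, YYYY-MM-DD) into consecutive intervals."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    if first > last:
        raise ValueError("start_date must not be after end_date")
    intervals = []
    cursor = first
    while cursor <= last:
        if frequency == "D":
            nxt = cursor + timedelta(days=1)
        elif frequency == "W":
            nxt = cursor + timedelta(weeks=1)  # 7-day blocks from the start date
        elif frequency == "M":
            nxt = _add_months(cursor, 1)  # next calendar month
        elif frequency == "3M":
            nxt = _add_months(cursor, 3)
        elif frequency == "Y":
            nxt = date(cursor.year + 1, 1, 1)  # next calendar year
        else:
            raise ValueError(f"unsupported frequency: {frequency!r}")
        # Clip the last interval so it never runs past end_date.
        interval_end = min(nxt - timedelta(days=1), last)
        intervals.append((cursor.isoformat(), interval_end.isoformat()))
        cursor = nxt
    return intervals
```

For the 2024 sample used in the example below, split_range("2024-01-01", "2024-12-31", "M") yields twelve monthly windows, each of which can be searched separately instead of relying on one year-long query.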

Example: Analyzing Supply Chain Reshaping Theme

Here’s a practical example analyzing how companies in your portfolio are positioned relative to supply chain transformation trends:
curl -X 'POST' \
  'http://localhost:8000/thematic-screener' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "theme": "Supply Chain Reshaping",
  "focus": "Logistics automation, nearshoring strategies, and supply chain digitalization",
  "companies": "44118802-9104-4265-b97a-2e6d88d74893",
  "start_date": "2024-01-01",
  "end_date": "2024-12-31",
  "document_type": "TRANSCRIPTS",
  "fiscal_year": 2024,
  "frequency": "M"
}'
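The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl call above; the 600-second timeout is an assumption (a screening run over a full year may take a while), and the response is returned as a plain dict because its schema is documented at http://localhost:8000/docs rather than here. The helper names are hypothetical.

```python
import json
import urllib.request

SCREENER_URL = "http://localhost:8000/thematic-screener"


def build_screener_request(payload: dict, url: str = SCREENER_URL) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def run_screener(payload: dict, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_screener_request(payload), timeout=timeout) as resp:
        return json.load(resp)


payload = {
    "theme": "Supply Chain Reshaping",
    "focus": "Logistics automation, nearshoring strategies, and supply chain digitalization",
    "companies": "44118802-9104-4265-b97a-2e6d88d74893",
    "start_date": "2024-01-01",
    "end_date": "2024-12-31",
    "document_type": "TRANSCRIPTS",
    "fiscal_year": 2024,
    "frequency": "M",
}
```

With the service running, result = run_screener(payload) returns the parsed response. Keeping request construction separate from sending makes the payload easy to inspect or log before a potentially long run.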

Building on top of the thematic screener service

The thematic screener service serves as a powerful foundation for building sophisticated thematic investment and research applications. Here are some use cases you can develop:

Thematic Portfolio Management

  • Theme-Based Portfolio Construction: Build portfolios that systematically capture exposure to specific mega-trends with quantified thematic scores
  • Dynamic Theme Allocation: Automatically adjust portfolio weights based on changing thematic exposure scores as companies evolve their strategic focus
  • Multi-Theme Optimization: Create portfolios that balance exposure across multiple complementary themes (e.g., AI + Cybersecurity + Cloud Computing)
  • Thematic ESG Integration: Combine thematic analysis with ESG scores to build sustainable investing strategies aligned with structural trends
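As a starting point for theme-based portfolio construction, the sketch below turns exposure scores into portfolio weights. It assumes you have already reduced the service's output to one non-negative exposure score per company (a hypothetical dict keyed by entity ID; the response format is not specified in this section). The rule shown, weights proportional to score with a per-name cap, is one simple choice, not the service's methodology, and exposure_weights is a hypothetical name.

```python
def exposure_weights(scores: dict[str, float], max_weight: float = 1.0) -> dict[str, float]:
    """Weight names in proportion to positive exposure scores, capping each weight.

    Weight above the cap is redistributed pro rata among the uncapped names.
    """
    positive = {name: s for name, s in scores.items() if s > 0}
    if not positive:
        return {}
    if max_weight * len(positive) < 1:
        raise ValueError("max_weight too small to allocate 100% across these names")
    weights: dict[str, float] = {}
    remaining = dict(positive)
    budget = 1.0  # share of the portfolio not yet allocated
    while remaining:
        total = sum(remaining.values())
        trial = {n: budget * s / total for n, s in remaining.items()}
        capped = [n for n, w in trial.items() if w > max_weight]
        if not capped:
            weights.update(trial)
            break
        # Pin capped names at the cap and re-spread the rest of the budget.
        for n in capped:
            weights[n] = max_weight
            budget -= max_weight
            del remaining[n]
    return weights
```

Companies with zero exposure are excluded entirely, so the resulting portfolio only holds names the screener found evidence for.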

Investment Research Automation

  • Thematic Discovery Engine: Automatically identify emerging themes by analyzing patterns in corporate communications across your investment universe
  • Competitor Thematic Analysis: Compare how different companies within the same industry are positioned relative to key investment themes
  • Theme Evolution Tracking: Monitor how thematic positioning changes over time to identify companies accelerating or decelerating their strategic alignment
  • Cross-Sector Theme Mapping: Discover unexpected thematic leaders in non-obvious industries
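For theme evolution tracking, running the screener with a frequency such as M gives you one exposure reading per period. Assuming you have reshaped that output into an ordered list of per-period scores per company (a hypothetical structure), the sketch below estimates each company's trend with an ordinary least-squares slope, which is less sensitive to a single noisy period than comparing only the first and last values. The threshold of 0.05 is an arbitrary placeholder.

```python
from statistics import fmean


def exposure_slope(scores: list[float]) -> float:
    """Least-squares slope of exposure score against period index 0, 1, ..., n-1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = fmean(scores)
    # Sum of cross-products over sum of squared deviations gives the OLS slope.
    cross = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(scores))
    spread = sum((i - x_mean) ** 2 for i in range(n))
    return cross / spread


def classify_trends(history: dict[str, list[float]], threshold: float = 0.05) -> dict[str, str]:
    """Label each company's exposure as accelerating, decelerating, or stable."""
    labels = {}
    for company, scores in history.items():
        slope = exposure_slope(scores)
        if slope > threshold:
            labels[company] = "accelerating"
        elif slope < -threshold:
            labels[company] = "decelerating"
        else:
            labels[company] = "stable"
    return labels
```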

Client-Facing Investment Solutions

  • Personalized Thematic Strategies: Create custom investment strategies based on clients’ specific thematic interests and conviction levels
  • Thematic Education Tools: Build interactive tools that help clients understand how their investments align with global mega-trends
  • Theme-Based Performance Attribution: Provide detailed analysis of how thematic positioning contributed to portfolio performance
  • Thematic Risk Reporting: Generate reports showing how portfolio exposure to specific themes creates concentration risk or opportunity

Institutional Applications

  • Pension Fund Thematic Mandates: Help institutional investors implement thematic mandates with systematic exposure measurement and monitoring
  • Endowment Theme Integration: Enable endowments to align their investment strategies with their institution’s mission through thematic analysis
  • Sovereign Wealth Fund Trend Analysis: Provide country-level investors with tools to assess how global themes affect their domestic economies and investment opportunities
  • Insurance Company Theme Assessment: Help insurance companies understand how thematic trends might impact their investment portfolios and underwriting risks

Advanced Analytics Platform

  • Machine Learning Enhancement: Use thematic scores as features in ML models for stock selection and portfolio optimization
  • Alternative Data Integration: Combine thematic analysis with satellite data, patent filings, or other alternative datasets for enhanced insights
  • Real-Time Theme Monitoring: Build systems that continuously update thematic scores based on new corporate communications and market developments
  • Theme Sentiment Analysis: Track how market sentiment toward specific themes evolves and identify inflection points

Integration and Workflow Applications

  • Research Management System Integration: Feed thematic insights directly into existing research management platforms
  • Trading System Integration: Automatically generate trade ideas based on changes in thematic exposure scores
  • CRM Enhancement: Enrich client relationship management with thematic insights for more informed investment conversations
  • Compliance and Mandate Monitoring: Build automated systems that ensure portfolios maintain required thematic exposure levels per investment mandates
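A mandate monitor can be reduced to a simple check: compute the portfolio's weighted exposure to each theme and flag themes that fall below the required minimum. The sketch below assumes hypothetical inputs: portfolio weights per company, and per-theme exposure scores per company derived from separate screener runs. Companies without a score are treated as having zero exposure, which is the conservative choice for a minimum-exposure mandate.

```python
def portfolio_exposure(weights: dict[str, float], scores: dict[str, float]) -> float:
    """Weight-averaged exposure score; names without a score count as zero exposure."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("portfolio weights must sum to a positive number")
    return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total_weight


def mandate_breaches(weights: dict[str, float],
                     theme_scores: dict[str, dict[str, float]],
                     minimums: dict[str, float]) -> dict[str, float]:
    """Return {theme: exposure} for every theme whose exposure is below its minimum."""
    breaches = {}
    for theme, minimum in minimums.items():
        exposure = portfolio_exposure(weights, theme_scores.get(theme, {}))
        if exposure < minimum:
            breaches[theme] = exposure
    return breaches
```

Running this after each scheduled screening turns the mandate into an automated alert rather than a manual review.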