Why It Matters

Navigating AI disruption means identifying which companies will thrive and which are vulnerable to displacement, but evaluating AI positioning across scattered news, earnings calls, and filings is resource-intensive. Investment decisions need systematic analysis of both AI risks and strategic responses to identify likely winners and potential shorts.

What It Does

The GenerateReport class in the bigdata-research-tools package systematically evaluates both AI disruption risks and proactive AI adoption across company watchlists using unstructured data from multiple sources. Built for portfolio managers and financial analysts, it transforms scattered AI-related information into quantifiable positioning intelligence and identifies investment opportunities based on AI readiness.

How It Works

The GenerateReport class combines semantic content retrieval, dual-theme analysis, and comparative scoring to deliver:
  • Risk-proactivity assessment measuring both AI disruption vulnerability and strategic AI adoption initiatives
  • Standardized scoring system enabling cross-company comparison of AI positioning and competitive readiness
  • Investment intelligence generation revealing underlying narratives that shape each company’s AI transformation journey
  • Structured output for reporting that ranks companies by AI resilience and strategic positioning

A Real-World Use Case

This cookbook demonstrates the complete workflow through analyzing AI disruption risks and proactive responses across a company watchlist, showing how the generator automatically quantifies AI positioning and identifies potential investment opportunities based on strategic AI readiness. Ready to get started? Let’s dive in!

Prerequisites

To run the Report Generator workflow, you can choose between two options:
  • 💻 GitHub cookbook
    • Use this if you prefer working locally or in a custom environment.
    • Follow the setup and execution instructions in the README.md.
    • API keys are required:
      • Option 1: Follow the key setup process described in the README.md
      • Option 2: Refer to this guide: How to initialise environment variables
        • ❗ When using this method, you must manually add the OpenAI API key:
          # OpenAI credentials
          OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
          
  • 🐳 Docker Installation
    • A Docker installation is available as an alternative setup method. It runs the workflow in a container, which simplifies environment configuration.
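
Instead of pasting the key into a notebook cell, the key can also be read from the environment at runtime. The helper below is a minimal sketch and not part of the cookbook: the function name `load_api_key` is hypothetical, and it assumes the key has been exported as an environment variable named `OPENAI_API_KEY`.

```python
import os


def load_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the named key from the environment, failing early if it is missing."""
    # Hypothetical helper: not part of the cookbook or of bigdata-research-tools.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With the variable exported, `OPENAI_API_KEY = load_api_key()` would give the same name that the later cells expect, and a missing key fails at setup time rather than in the middle of a run.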

Setup and Imports

Below is the Python code required for setting up our environment and importing necessary libraries.
from src.report_generator import GenerateReport
from src.summary.summary import SummarizerCompany
from src.tool import create_styled_table, plot_company_scores, plot_company_scores_log_log, generate_html_report_v4, display_report


from bigdata_research_tools.search.screener_search import search_by_companies
from bigdata_research_tools.labeler.screener_labeler import ScreenerLabeler
from bigdata_research_tools.excel import ExcelManager
from bigdata_client.models.search import DocumentType
from bigdata_client import Bigdata

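import asyncio
import os
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
import pandas as pd
from IPython.display import display, HTML

# Connect to Bigdata (assumes the Bigdata credentials are already set as environment variables)
bigdata = Bigdata()

# Define output file paths for our report
current_dir = os.getcwd()  # assumption: outputs are written under the current working directory
output_dir = f"{current_dir}/output"
os.makedirs(output_dir, exist_ok=True)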

export_path = f"{output_dir}/ai_disruption_report.xlsx"

Defining the Report Parameters

Fixed Parameters

  • Keywords (keywords): Keywords used for improving retrieval
  • Main Theme (main_theme_risk): The central concept to explore
  • Main Theme Proactivity (main_theme_proactivity): Proactive measures and strategies companies take to address the main theme risk
  • List of risk sentences (list_sentences_risks): Sentences used to improve retrieval for the risk theme
  • List of proactivity sentences (list_sentences_proactivity): Sentences used to improve retrieval for the proactivity theme
  • Bigdata (bigdata): Bigdata connection
# ===== Fixed Parameters =====

# Keywords for filtering content
keywords = ['AI']

# Risk Theme
main_theme_risk = 'Risks or concerns related to AI disruption'

# Proactivity Theme
main_theme_proactivity = 'Integrating GenAI into products or forming partnerships to improve GenAI offerings'

# Sentences expressing risk-related narratives
list_sentences_risks = [
    "Companies face substantial risks from AI-driven market disruptions today.",
    "AI advancements threaten traditional business models and competitive advantages.",
    "Disruption by AI can lead to loss of market share rapidly.",
    "Existing workflows may become obsolete due to AI innovations.",
    "Companies risk being outpaced by agile AI-focused competitors."
]

# Sentences expressing proactivity-related narratives
list_sentences_proactivity = [
    "Companies actively incorporate GenAI to enhance product offerings and features.",
    "Partnerships aim to improve GenAI solutions in competitive markets.",
    "Proactive firms leverage GenAI technology to boost customer engagement strategies.",
    "Organizations collaborate to advance GenAI capabilities and applications effectively.",
    "Firms invest in GenAI partnerships to innovate their service delivery."
]

Customizable Parameters

  • Watchlist (my_watchlist_id): The set of companies to analyze. This is the ID of your watchlist in the watchlist section of the app.
  • Model Selection (llm_model): The LLM model used to label search result document chunks and generate summaries
  • Frequency (search_frequency, passed to search_by_companies as freq): The frequency of the date ranges to search over. Defaults to 3M. Supported values:
    • Y: Yearly intervals.
    • M: Monthly intervals.
    • W: Weekly intervals.
    • D: Daily intervals.
  • Time Period (start_date and end_date): The date range over which to run the analysis
  • Fiscal Year (fiscal_year): If the document type is transcripts or filings, fiscal year needs to be specified
  • Focus (focus): Specify a focus within the main theme
  • Document Limit (document_limit_news, document_limit_filings, document_limit_transcripts): The maximum number of documents to return per query to Bigdata API for each category of documents
  • Batch Size (batch_size): The number of entities to include in a single batched query
# ===== Customizable Parameters =====

# Company Universe (from Watchlist)
my_watchlist_id = "37fe5403-9688-4313-9775-9fd9770ca6e2"
watchlist = bigdata.watchlists.get(my_watchlist_id)
companies = bigdata.knowledge_graph.get_entities(watchlist.items)

# LLM Specification
llm_model = "openai::gpt-4o-mini"

# Search Frequency
search_frequency='M'

# Specify Time Range
start_date="2025-01-01"
end_date="2025-04-20"

# Fiscal Year
fiscal_year = 2025

# Document Limits
document_limit_news=10
document_limit_filings=5
document_limit_transcripts=5

# Others
batch_size=1
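
To make the interaction between the frequency and the time period concrete, here is an illustrative sketch of splitting a date range into monthly windows. How bigdata-research-tools builds its intervals internally is not shown in this cookbook, so this is only an approximation of the idea; the function `monthly_windows` is hypothetical.

```python
from datetime import date, timedelta


def monthly_windows(start_date: str, end_date: str) -> list:
    """Split [start_date, end_date] into consecutive calendar-month windows."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    windows = []
    current = start
    while current <= end:
        # First day of the month that follows `current`
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # A window ends on the last day of its month, or at end_date if that comes first
        window_end = min(next_month - timedelta(days=1), end)
        windows.append((current.isoformat(), window_end.isoformat()))
        current = next_month
    return windows


windows = monthly_windows("2025-01-01", "2025-04-20")
```

For the dates used in this cookbook, the sketch yields four windows, the last one running from 2025-04-01 to 2025-04-20. Under this reading, each window would be queried separately, so the document limits apply per query rather than to the whole period.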

Generate Report

We first initialize the GenerateReport class. The following sections of the cookbook then walk through each step this class uses to generate the report. In the Colab cookbook, you can skip the step-by-step process and run the generate_report() method directly, in the section Direct Method.
report_generator = GenerateReport(
        watchlist_id=my_watchlist_id,
        keywords=keywords,
        main_theme_risk=main_theme_risk,
        main_theme_proactivity=main_theme_proactivity,
        list_sentences_risks=list_sentences_risks,
        list_sentences_proactivity=list_sentences_proactivity,
        llm_model=llm_model,
        api_key=OPENAI_API_KEY,
        start_date=start_date,
        end_date=end_date,
        fiscal_year=fiscal_year,
        search_frequency=search_frequency,
        document_limit_news=document_limit_news,
        document_limit_filings=document_limit_filings,
        document_limit_transcripts=document_limit_transcripts,
        batch_size=batch_size,
        bigdata=bigdata
)

Retrieve Content

You can leverage the Bigdata API to run a search on company news, filings and transcripts.
# Initialize empty lists to collect dataframes
df_risk_list = []
df_proactivity_list = []

# Define the document types and limits
doc_configs = [
   (DocumentType.NEWS, document_limit_news, None),
   (DocumentType.FILINGS, document_limit_filings, fiscal_year),
   (DocumentType.TRANSCRIPTS, document_limit_transcripts, fiscal_year)
]

# Iterate through each document type configuration
for doc_type, doc_limit, year in doc_configs:
   if doc_limit > 0:
       # Search for semantic risk sentences
       df_sentences_semantic_risk_search = search_by_companies(
           companies=companies,
           keywords=keywords,
           sentences=list_sentences_risks,
           fiscal_year=year,
           start_date=start_date,
           end_date=end_date,
           scope=doc_type,
           freq=search_frequency,
           document_limit=doc_limit,
           batch_size=batch_size
       )
       df_risk_list.append(df_sentences_semantic_risk_search)

       # Search for semantic proactivity sentences
       df_sentences_semantic_proactivity_search = search_by_companies(
           companies=companies,
           keywords=keywords,
           sentences=list_sentences_proactivity,
           fiscal_year=year,
           start_date=start_date,
           end_date=end_date,
           scope=doc_type,
           freq=search_frequency,
           document_limit=doc_limit,
           batch_size=batch_size
       )
       df_proactivity_list.append(df_sentences_semantic_proactivity_search)

# Combine all risk dataframes into a single dataset
df_sentences_semantic_risk = pd.concat(df_risk_list, ignore_index=True)

# Combine all proactivity dataframes into a single dataset
df_sentences_semantic_proactivity = pd.concat(df_proactivity_list, ignore_index=True)
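
One edge case in the loop above: if every document limit is set to 0, both lists stay empty, and `pd.concat` raises a `ValueError` when given an empty list. The following is a small defensive sketch; the helper `concat_or_empty` is hypothetical and not part of the cookbook.

```python
import pandas as pd


def concat_or_empty(frames, columns):
    """Concatenate the non-empty frames, or return an empty frame with `columns`."""
    # Hypothetical helper: drops missing or empty results before concatenating.
    non_empty = [df for df in frames if df is not None and not df.empty]
    if not non_empty:
        return pd.DataFrame(columns=columns)
    return pd.concat(non_empty, ignore_index=True)


# Toy frames standing in for per-document-type search results
news = pd.DataFrame({"masked_text": ["chunk 1", "chunk 2"]})
filings = pd.DataFrame({"masked_text": ["chunk 3"]})

combined = concat_or_empty([news, filings], columns=["masked_text"])
nothing = concat_or_empty([], columns=["masked_text"])
```

The second call returns an empty frame that still has a `masked_text` column, so later steps that select that column do not fail on a missing key.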

Label the Results

Use an LLM to analyze each document chunk and determine its relevance to the corresponding theme. Any document chunks that aren’t explicitly linked to AI disruption risk (or, in the second pass, to AI proactivity) are filtered out.
labeler = ScreenerLabeler(llm_model=llm_model)

df_risk_labels = labeler.get_labels(
                main_theme=main_theme_risk,
                labels=['risk'],
                texts=df_sentences_semantic_risk["masked_text"].tolist()
            )

# Merge the datasets
df_risk_labeled = pd.merge(df_sentences_semantic_risk, df_risk_labels, left_index=True, right_index=True)

df_proactivity_labels = labeler.get_labels(
                main_theme=main_theme_proactivity,
                labels=['proactivity'],
                texts=df_sentences_semantic_proactivity["masked_text"].tolist()
            )


# Merge the datasets
df_proactivity_labeled = pd.merge(df_sentences_semantic_proactivity, df_proactivity_labels, left_index=True, right_index=True)

# Process the results
df_risk_labeled_relevant = df_risk_labeled.loc[~df_risk_labeled.label.isin(['', 'unassigned', 'unclear'])].copy()
df_proactivity_labeled_relevant = df_proactivity_labeled.loc[~df_proactivity_labeled.label.isin(['', 'unassigned', 'unclear'])].copy()
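
The merge-and-filter pattern above can be tried on toy data. This sketch assumes the labeler returns one row per input text, in the same order; apart from `masked_text` and `label`, which appear in the cookbook code, everything here is made up for illustration.

```python
import pandas as pd

# Toy search results; only the "masked_text" column name mirrors the cookbook
df_search = pd.DataFrame({
    "masked_text": [
        "Target Company warns that AI rivals may take market share.",
        "Target Company declared its quarterly dividend.",
        "AI tools could make Target Company's core workflow obsolete.",
    ]
})

# Toy labels, one per text and in the same order (an assumption about the labeler output)
df_labels = pd.DataFrame({"label": ["risk", "unassigned", "risk"]})

# Index-aligned merge: row i of the labels is attached to row i of the search results
df_labeled = pd.merge(df_search, df_labels, left_index=True, right_index=True)

# Keep only chunks that received a usable label
excluded = ["", "unassigned", "unclear"]
df_relevant = df_labeled.loc[~df_labeled["label"].isin(excluded)].copy()
```

Because the merge is index-based, the labels must be produced from the same frame they are merged into. Labels computed from a different frame would still merge without error but would be attached to the wrong chunks.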

Document Distribution Visualization

The tables below show the number of relevant documents of each type for every company in the universe. They help you understand the distribution and availability of information across sources for each entity.
# Names of the watchlist companies (assumes each entity exposes a `name` attribute)
company_names = [company.name for company in companies]

create_styled_table(df_risk_labeled_relevant, title='Relevant Document Count for AI Disruption Risk by Company and Document Type', companies_list=company_names)
[Image: Document Risk Count]
create_styled_table(df_proactivity_labeled_relevant, title='Relevant Document Count for AI Proactivity by Company and Document Type', companies_list=company_names)
[Image: Document Proactivity Count]

Summarizer

The following code is used to create a summary for each company using the information from the retrieved documents.
summarizer_company = SummarizerCompany(
    model=llm_model.split('::')[1],
    api_key=OPENAI_API_KEY,
    logger= GenerateReport.logger,
    verbose=True
)
df_risk_by_company = asyncio.run(
            summarizer_company.process_by_company(
                df_labeled=df_risk_labeled_relevant,
                list_entities=companies,
                theme=main_theme_risk,
                focus='risk'
            ))

df_proactivity_by_company = asyncio.run(
            summarizer_company.process_by_company(
                df_labeled=df_proactivity_labeled_relevant,
                list_entities=companies,
                theme=main_theme_proactivity,
                focus=''
            )
        )
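
The calls above wrap a coroutine in `asyncio.run`. As a rough illustration of why a per-company summarizer is a natural fit for asyncio, the sketch below gathers several summaries concurrently. It does not reproduce `SummarizerCompany`: the functions `summarize_company` and `summarize_all` are hypothetical stand-ins, and the LLM call is replaced by a stub.

```python
import asyncio


async def summarize_company(name: str, texts: list) -> dict:
    """Hypothetical stand-in for one LLM summarization request."""
    await asyncio.sleep(0)  # placeholder for the network call
    return {
        "entity_name": name,
        "n_documents": len(texts),
        "topic_summary": " ".join(texts),
    }


async def summarize_all(chunks_by_company: dict) -> list:
    """Run one summarization task per company and collect results in input order."""
    tasks = [summarize_company(name, texts) for name, texts in chunks_by_company.items()]
    return await asyncio.gather(*tasks)


results = asyncio.run(summarize_all({
    "Company A": ["First chunk.", "Second chunk."],
    "Company B": ["Only chunk."],
}))
```

Note that `asyncio.run` raises a `RuntimeError` when an event loop is already running, which is the case in Jupyter-style notebooks; there, awaiting the coroutine directly (`await summarize_all(...)`) is the usual alternative.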

Getting the Scores

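Process the datasets and compute the AI Disruption Risk Score and AI Proactivity Score. Each score is a company’s document count divided by the watchlist average, so a score above 1 means more relevant documents than the average company in the watchlist.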
dfr = df_risk_by_company[['entity_id', 'entity_name', 'topic_summary', 'n_documents']].copy()
dfr = dfr.rename(columns={'topic_summary': 'risk_summary', 'n_documents': 'n_documents_risk'})
dfp = df_proactivity_by_company[['entity_id', 'entity_name', 'topic_summary', 'n_documents']].copy()
dfp = dfp.rename(columns={'topic_summary': 'proactivity_summary', 'n_documents': 'n_documents_proactivity'})
df_by_company = dfr.merge(dfp, on=['entity_id', 'entity_name'], how='outer')

df_by_company['n_documents_risk'] = df_by_company['n_documents_risk'].fillna(0)
df_by_company['n_documents_proactivity'] = df_by_company['n_documents_proactivity'].fillna(0)
df_by_company['ai_disruption_risk_score'] = df_by_company['n_documents_risk']/df_by_company['n_documents_risk'].mean()
df_by_company['ai_proactivity_score'] = df_by_company['n_documents_proactivity']/df_by_company['n_documents_proactivity'].mean()
df_by_company['ai_proactivity_minus_disruption_risk_score'] = df_by_company['ai_proactivity_score'] - df_by_company['ai_disruption_risk_score']
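
A worked example with made-up document counts shows how the normalization and the difference score behave. The company names and counts below are invented for illustration; the column names follow the code above.

```python
import pandas as pd

# Invented counts for three companies
df = pd.DataFrame({
    "entity_name": ["Company A", "Company B", "Company C"],
    "n_documents_risk": [10, 20, 30],
    "n_documents_proactivity": [40, 20, 0],
})

# Normalize each count by the watchlist average (20 for both columns here)
df["ai_disruption_risk_score"] = df["n_documents_risk"] / df["n_documents_risk"].mean()
df["ai_proactivity_score"] = df["n_documents_proactivity"] / df["n_documents_proactivity"].mean()

# Positive values mean proactivity coverage outweighs risk coverage
df["ai_proactivity_minus_disruption_risk_score"] = (
    df["ai_proactivity_score"] - df["ai_disruption_risk_score"]
)

# Rank companies from strongest to weakest response relative to risk
ranked = df.sort_values("ai_proactivity_minus_disruption_risk_score", ascending=False)
```

Company A scores 2.0 on proactivity and 0.5 on risk, for a difference of 1.5, while Company C ends at -1.5. Because the scores are ratios to the watchlist mean, they are only comparable within the same watchlist and time period, and a column whose mean is 0 would produce NaN or infinite scores.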

Final Dataset

The final dataset for the report is generated by processing and merging the risk and proactivity datasets.

df_quotes_risk = GenerateReport.aggregate_verbatim(df_risk_labeled_relevant, 'risk')
df_by_company = pd.merge(df_by_company, df_quotes_risk, how='left', on='entity_name')
df_quotes_proactivity = GenerateReport.aggregate_verbatim(df_proactivity_labeled_relevant, 'proactivity')
df_report = pd.merge(df_by_company, df_quotes_proactivity, how='left', on='entity_name')
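
The implementation of `GenerateReport.aggregate_verbatim` is not shown in this cookbook. As a guess at the general idea only, the sketch below collects the supporting quotes for each company into one row and left-merges them onto a per-company table. The function `aggregate_quotes` and the column names `text` and `risk_quotes` are hypothetical.

```python
import pandas as pd

# Toy labeled chunks; "text" is a hypothetical column holding the quoted passage
df_chunks = pd.DataFrame({
    "entity_name": ["Company A", "Company A", "Company B"],
    "text": ["Quote one.", "Quote two.", "Quote three."],
})


def aggregate_quotes(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Collect all quotes per company into a list column named '<label>_quotes'."""
    return (
        df.groupby("entity_name")["text"]
        .agg(list)
        .reset_index()
        .rename(columns={"text": f"{label}_quotes"})
    )


df_quotes = aggregate_quotes(df_chunks, "risk")

# Left merge keeps companies that have no quotes; their quote cell is NaN
df_companies = pd.DataFrame({"entity_name": ["Company A", "Company B", "Company C"]})
df_merged = pd.merge(df_companies, df_quotes, how="left", on="entity_name")
```

The left merge mirrors the `how='left'` used above: a company with a score but no retained quotes stays in the report with an empty quote field instead of being dropped.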

Results Visualization

The charts below plot each company’s AI Disruption Risk Score against its AI Proactivity Score, first on a linear scale and then on a log-log scale.
plot_company_scores(df_report, 'ai_disruption_risk_score', 'ai_proactivity_score', 'AI Narrative Signals')
[Image: AI Narrative Signals chart]
AI Disruption Risk Score vs AI Proactivity Score - (Log-Log Scale)
plot_company_scores_log_log(df_report, 'ai_disruption_risk_score', 'ai_proactivity_score', 'AI Narrative Signals')
[Image: AI Narrative Signals log-log chart]

Generate Final Report

Using these results, the cookbook provides an example of how a report could be formatted.
display_report(df_report=df_report, score='ai_proactivity_minus_disruption_risk_score', top='top', nb_entities=5, export_to_path=output_dir+'/report_ai_disruption_top_5.html')
The call above exports the top 5 companies. The three with the highest AI Proactivity Minus Disruption Risk Score are shown below:

Score Definitions

  • AI Disruption Risk Score: The number of unique documents related to the risk of GenAI disruption, normalized by the average number of risk documents retrieved across the entire watchlist.
  • AI Proactivity Score: The number of unique documents related to the proactive adoption of GenAI, normalized by the average number of proactivity documents retrieved across the entire watchlist.
  • AI Proactivity Minus Disruption Risk Score: The difference between the AI Proactivity Score and the AI Disruption Risk Score. A higher score indicates a stronger company response relative to the identified risks.

1. Palantir Technologies Inc. 🥇

AI Proactivity Minus Disruption Risk Score: 0.72
AI Disruption Risk Score: 0.48 / Nb Documents Risk: 18
AI Proactivity Score: 1.19 / Nb Documents Proactivity: 45

AI Disruption Risk

Palantir Technologies Inc. faces significant risks related to AI disruption, particularly from emerging competitors like DeepSeek, which could undermine its market position and pricing model. The company’s reliance on government contracts exposes it to vulnerabilities from potential policy changes and defense spending cuts, which could impact its revenue stability. Additionally, the rapid commoditization of AI models and the increasing competition from larger tech firms threaten to erode Palantir’s competitive advantages, especially as AI infrastructure evolves. Investors are advised to remain cautious due to these competitive pressures and the high valuation of Palantir’s stock amidst these uncertainties.

AI Proactivity

Palantir Technologies Inc. has significantly advanced its integration of Generative AI (GenAI) through various strategic partnerships, including collaborations with EllisDon, SAUR, and TWG Global, aimed at enhancing operational efficiencies and transforming contract management processes. The company has also formed alliances with Databricks and Everfox to deliver secure AI solutions and improve military operations, respectively. Notably, Palantir’s AI services have driven substantial revenue growth, with a reported $1.8 billion in contracts and a 73% increase in its U.S. commercial customer base. These initiatives underscore Palantir’s commitment to leveraging AI across multiple sectors, including defense, healthcare, and financial services, positioning it as a leader in the AI landscape.

2. Advanced Micro Devices Inc. 🥈

AI Proactivity Minus Disruption Risk Score: 0.69
AI Disruption Risk Score: 0.45 / Nb Documents Risk: 17
AI Proactivity Score: 1.14 / Nb Documents Proactivity: 43

AI Disruption Risk

Advanced Micro Devices Inc. (AMD) faces significant risks related to AI disruption, including a decline in market share due to competition from Nvidia and emerging custom AI chip solutions from companies like Broadcom and Marvell. The company’s heavy investment in AI has not yielded expected returns, with recent reports indicating a 9.9% drop in shares following disappointing AI chip revenue and a forecasted 7% decrease in data center sales. Additionally, AMD is projected to incur an $800 million loss due to export restrictions on its AI chips, further complicating its financial outlook. Analysts have expressed concerns over AMD’s ability to sustain growth in the AI sector, leading to downgrades and reduced price targets amid a challenging market environment.

AI Proactivity

Advanced Micro Devices Inc. (AMD) has made significant strides in integrating Generative AI (GenAI) into its product offerings and forming strategic partnerships to enhance its AI capabilities. Notably, AMD’s collaboration with Absci, which includes a $20 million investment, aims to accelerate AI-driven drug discovery, leveraging AMD’s high-performance computing solutions. Additionally, AMD’s partnerships with companies like Dell and Ocient focus on enhancing AI capabilities in commercial devices and data analytics, respectively. Furthermore, AMD’s recent acquisition of ZT Systems is set to bolster its AI infrastructure, enabling faster deployment of AI solutions tailored for various customer needs.

3. Adobe Inc. 🥉

AI Proactivity Minus Disruption Risk Score: 0.61
AI Disruption Risk Score: 0.66 / Nb Documents Risk: 25
AI Proactivity Score: 1.27 / Nb Documents Proactivity: 48

AI Disruption Risk

Adobe Inc. faces significant risks related to AI disruption, particularly in monetizing its AI innovations like Firefly, which has shown user engagement but lacks clear revenue growth strategies. The company is under pressure from established competitors like Microsoft and Salesforce, which have stronger pricing power and bundled offerings, potentially limiting Adobe’s market share. Additionally, regulatory challenges and compliance costs associated with AI could hinder Adobe’s ability to adapt and innovate effectively, impacting its financial performance. Concerns about the rapid evolution of AI technologies and the competitive landscape further exacerbate the uncertainty surrounding Adobe’s future in the market.

AI Proactivity

Adobe Inc. is significantly enhancing its product offerings through the integration of generative AI (GenAI) technologies, particularly with its Firefly suite, which has been embedded across Creative, Document, and Experience Clouds to improve user productivity and streamline workflows. Partnerships with companies like IBM and Publicis Groupe are expanding Adobe’s capabilities in digital marketing and content creation, leveraging Firefly for personalized customer experiences and efficient content production. The launch of new AI-driven features, such as the Firefly Video Model and Acrobat AI Assistant, is expected to drive revenue growth and user engagement, with Adobe’s AI innovations already contributing to a substantial increase in annual recurring revenue. As of early 2025, Adobe’s strategic focus on GenAI is positioned to attract new users and enhance retention, further solidifying its market leadership in creative and marketing solutions.
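
As a sanity check on the three score cards above, the score definition (document count divided by the watchlist average) can be inverted to recover the implied watchlist average from each reported pair. The sketch below uses only the figures printed above; since those are rounded to two decimals, the results are approximate.

```python
# (document count, reported score) pairs copied from the score cards above
reported = {
    "Palantir Technologies Inc.": {"risk": (18, 0.48), "proactivity": (45, 1.19), "diff": 0.72},
    "Advanced Micro Devices Inc.": {"risk": (17, 0.45), "proactivity": (43, 1.14), "diff": 0.69},
    "Adobe Inc.": {"risk": (25, 0.66), "proactivity": (48, 1.27), "diff": 0.61},
}

implied_means = {}
diff_gaps = {}
for company, values in reported.items():
    n_risk, risk_score = values["risk"]
    n_pro, pro_score = values["proactivity"]
    # score = count / mean, so mean = count / score
    implied_means[company] = (n_risk / risk_score, n_pro / pro_score)
    # Gap between the reported difference and the difference of the rounded scores
    diff_gaps[company] = abs(values["diff"] - (pro_score - risk_score))
```

Each implied average lands between 37 and 38 documents per company for both themes, so the three cards are mutually consistent. The reported differences match the rounded scores to within about 0.01, which is what rounding of the underlying unrounded scores would explain.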

Export the Results

Export the data as an Excel file for further analysis or to share with the team.
try:
    # Create the Excel manager
    excel_manager = ExcelManager()

    # Define the dataframes and their sheet configurations
    df_args = [
        (df_report, "Report AI Disruption Risk", (2, 3))
    ]

    # Save the workbook
    excel_manager.save_workbook(df_args, export_path)

except Exception as e:
    print(f"Warning while exporting to excel: {e}")

Conclusion

The Report Generator provides a comprehensive automated framework for analyzing AI threats and opportunities across your investment universe. By systematically combining advanced information retrieval with LLM-powered analysis, this workflow transforms unstructured data from news, filings, and transcripts into actionable intelligence for strategic decision-making. Through the automated analysis of AI disruption risks and proactive responses, you can:
  1. Identify AI-resilient leaders - Discover companies that are not only aware of AI disruption risks but are actively positioning themselves to capitalize on these changes through strategic partnerships and product integration
  2. Assess competitive positioning - Compare how companies within your watchlist are responding to AI transformation relative to their peers, highlighting potential winners and laggards
  3. Quantify risk-response balance - The AI Proactivity Minus Disruption Risk Score provides a clear metric to identify companies that demonstrate strong strategic responses relative to their exposure to AI-driven market disruption
  4. Monitor strategic evolution - Track how companies’ AI strategies and risk profiles evolve over time, enabling dynamic watchlist adjustments based on changing competitive landscapes
  5. Generate watchlist insights - Create comprehensive reports that can inform investment committees, risk management decisions, and thematic investment strategies
From conducting due diligence on AI exposure to building thematic watchlists focused on artificial intelligence or assessing watchlist-wide risks from technological disruption, the Report Generator automates the research process while maintaining the depth and nuance required for professional analysis. The standardized scoring methodology ensures consistent evaluation across companies and time periods, making it an invaluable tool for systematic AI impact assessment.