Why It Matters

Understanding how market narratives emerge and evolve across different information sources is crucial for investment decision-making, but manually tracking narrative development across scattered news coverage, earnings calls, and regulatory filings is time-consuming. Investment decisions require systematic analysis of narrative progression to identify emerging trends and timing patterns.

What It Does

The NarrativeMiner class in the bigdata-research-tools package systematically tracks narrative evolution across multiple document types using unstructured data from news, transcripts, and filings. Built for analysts and investment professionals, it transforms scattered narrative signals into quantified trend intelligence and identifies timing patterns across different information sources.

How It Works

The NarrativeMiner combines multi-source content retrieval, temporal narrative tracking, and cross-source comparative analysis to deliver:
  • Cross-document narrative mapping across news media, earnings calls, and SEC filings
  • Temporal evolution tracking showing how narratives develop and change over time across sources
  • Intensity measurement quantifying narrative prevalence and significance across document types

A Real-World Use Case

This cookbook demonstrates the complete workflow by analyzing the “AI Bubble Concerns” narrative as it emerges and evolves across news, earnings calls, and regulatory filings, highlighting the difference between public discourse and corporate communications. Ready to get started? Let’s dive in!

Prerequisites

To run the Narrative Miner workflow, you can choose between two options:
  • 💻 GitHub cookbook
    • Use this if you prefer working locally or in a custom environment.
    • Follow the setup and execution instructions in the README.md.
    • API keys are required:
      • Option 1: Follow the key setup process described in the README.md
      • Option 2: Refer to this guide: How to initialise environment variables
        • ❗ When using this method, you must manually add the OpenAI API key:
          # OpenAI credentials
          OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
          
  • 🐳 Docker Installation
    • Provides an alternative, containerized setup that simplifies environment configuration for those who prefer Docker-based solutions.

Setup and Imports

Below is the Python code required for setting up our environment and importing necessary libraries.
from IPython.display import display, HTML, IFrame
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings

from src.tool import (
    load_results,
    extract_narrative_insights,
    create_source_summary,
    display_sample_data,
    visualize_cross_source_narratives,
    visualize_news_narrative_breakdown
)

from bigdata_research_tools.workflows.narrative_miner import NarrativeMiner
from bigdata_research_tools.excel import ExcelManager
from bigdata_client import Bigdata
from bigdata_client.daterange import RollingDateRange
from bigdata_client.models.sources import Source
from bigdata_client.models.search import DocumentType

# Setup for Plotly in Colab
import plotly.offline as pyo
import plotly
import plotly.io as pio
import plotly.graph_objects as go

# Enable offline mode for Colab
pyo.init_notebook_mode(connected=True)

# Set renderer
pio.renderers.default = 'colab'
print("✅ Plotly configured for Colab")

# Define output file paths for our results
output_dir = "output"
os.makedirs(output_dir, exist_ok=True)

news_results_path = f"{output_dir}/ai_bubble_news.xlsx"
transcripts_results_path = f"{output_dir}/ai_bubble_transcripts.xlsx"
filings_results_path = f"{output_dir}/ai_bubble_filings.xlsx"
visualization_path = f"{output_dir}/ai_bubble_narratives.html"
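The knowledge-graph lookup used later (bigdata.knowledge_graph.find_sources) requires an authenticated Bigdata client, which the snippet above does not create. A minimal sketch, assuming your Bigdata credentials were exposed as the environment variables BIGDATA_USERNAME and BIGDATA_PASSWORD during the prerequisite setup (adjust to however you initialised them):
# Instantiate the Bigdata client used by the knowledge-graph lookup below
# (assumes BIGDATA_USERNAME / BIGDATA_PASSWORD environment variables are set)
bigdata = Bigdata(
    os.environ["BIGDATA_USERNAME"],
    os.environ["BIGDATA_PASSWORD"],
)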

Defining the Narrative Analysis Parameters

  • AI Bubble Narratives (main_narratives): Specific narrative sentences related to AI bubble concerns
  • Model Selection (llm_model): The LLM model used to label search result document chunks and generate summaries
  • Time Period (start_date and end_date): The date range over which to run the analysis
  • Rerank Threshold (rerank_threshold): By setting this value, you’re enabling the cross-encoder which reranks the results and selects those whose relevance is above the percentile you specify (0.7 being the 70th percentile). More information on the re-ranker can be found here.
  • Document Limit (document_limit): The maximum number of documents to return per query to the Bigdata API.
  • Fiscal Year (fiscal_year): If the document type is transcripts or filings, the fiscal year must be specified
  • Frequency (freq): The frequency of the date ranges to search over. Supported values:
    • Y: Yearly intervals.
    • M: Monthly intervals.
    • W: Weekly intervals.
    • D: Daily intervals.
# AI Bubble Narratives
main_narratives = [
    "Tech valuations have detached from fundamental earnings potential",
    "AI investments show classic signs of irrational exuberance",
    "Market is positioning AI as revolutionary without proven ROI",
    "Current AI investments may not generate predicted financial returns",
    "Tech CEOs acknowledge AI implementation challenges amid high expectations",
    "Analysts are questioning the timeline for AI-driven profits",
    "Companies are spending billions on unproven AI technology",
    "AI infrastructure costs are rising but revenue gains remain uncertain",
    "Venture capital is flooding AI startups at unsustainable valuations",
    "Regulatory concerns could derail AI market growth projections",
    "Public discourse about AI capabilities exceeds technical realities",
    "AI talent acquisition costs have created an unsustainable bubble",
    "Corporate executives privately express concerns about AI ROI timelines",
    "AI market projections rely on aggressive and unproven assumptions",
    "Industry veterans drawing parallels to previous tech bubbles"
]

# LLM Specification
llm_model = "openai::gpt-4o-mini"

# Rerank Threshold
rerank_threshold = 0.7

# Specify Time Range
start_date = "2024-03-01"
end_date = "2025-03-28"

# Search Frequency
freq = '6M'

# Fiscal Year
fiscal_year = 2024

# Document Limits
document_limit = 10

Configure the Narrative Miners

Create narrative miners for each document type. In this example, we select MT Newswires as the news source for focused analysis.
# Common Params
common_params = {
    "narrative_sentences": main_narratives,
    "llm_model": llm_model,
    "start_date": start_date,
    "end_date": end_date,
    "rerank_threshold": rerank_threshold}

# Choose MT Newswires as a news source
tech_news_sources = bigdata.knowledge_graph.find_sources("MT Newswires")
tech_news_ids = [source.id for source in tech_news_sources if "MT Newswires" == source.name]

# Create the specialized miners for each document type
news_miner = NarrativeMiner(
    sources=tech_news_ids,
    document_type=DocumentType.NEWS,
    fiscal_year=None,
    **common_params
)

transcripts_miner = NarrativeMiner(
    sources=None,
    document_type=DocumentType.TRANSCRIPTS,
    fiscal_year=fiscal_year,
    **common_params
)

filings_miner = NarrativeMiner(
    sources=None,
    fiscal_year=fiscal_year,
    document_type=DocumentType.FILINGS,
    **common_params
)

Run Narrative Mining Across Sources

Execute the narrative mining processes for news, earnings call transcripts, and SEC filings. Each miner independently analyzes its document type for the specified narratives.
# Mine news narratives
print("Mining news narratives...")
try:
    news_results = news_miner.mine_narratives(
        document_limit=document_limit,
        freq=freq,
        export_path=news_results_path
    )
    print("✅ News mining completed successfully!")
except Exception as e:
    print(f"Warning during news mining: {e}")

# Mine transcripts narratives
print("Mining earnings call transcripts...")
try:
    transcripts_results = transcripts_miner.mine_narratives(
        document_limit=document_limit,
        freq=freq,
        export_path=transcripts_results_path
    )
    print("✅ Transcripts mining completed successfully!")
except Exception as e:
    print(f"Warning during transcripts mining: {e}")

# Mine filings narratives
print("Mining SEC filings...")
try:
    filings_results = filings_miner.mine_narratives(
        document_limit=document_limit,
        freq=freq,
        export_path=filings_results_path
    )
    print("✅ Filings mining completed successfully!")
except Exception as e:
    print(f"Warning during filings mining: {e}")

Load and Process Results

Load the exported Excel files, clean the data, and display a comprehensive summary of narrative findings across all document sources.
# Load results from all three document types with labeling
news_df = load_results(news_results_path, "News Media")
transcripts_df = load_results(transcripts_results_path, "Earnings Calls")
filings_df = load_results(filings_results_path, "SEC Filings")

# Create and display summary
source_summary = create_source_summary(news_df, transcripts_df, filings_df)
display(source_summary)

# Display sample data from each source
display_sample_data(news_df, transcripts_df, filings_df)

Create Narrative Visualizations

Generate comparative visualizations showing narrative evolution across sources and detailed breakdown of news narratives. These visualizations reveal timing patterns and intensity variations across different information channels.
# Suppress future warnings for cleaner output
warnings.filterwarnings("ignore", message=".*'method'.*", category=FutureWarning)

# Create the comparative source visualization
print("Creating cross-source narrative visualization...")
fig1 = visualize_cross_source_narratives(news_df, transcripts_df, filings_df)
fig1.show()
Figure: Cross-source narrative evolution showing AI bubble concerns across news, earnings calls, and SEC filings

# Create the narrative breakdown visualization for news
print("Creating news narrative breakdown...")
fig2 = visualize_news_narrative_breakdown(news_df)
fig2.show()
Figure: Detailed breakdown of AI bubble narratives in news coverage
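
The visualization_path defined during setup is not used by the snippet above. If you want a shareable copy of the interactive chart, Plotly figures can be exported as standalone HTML; a minimal optional sketch:
# Optionally persist the cross-source chart as a standalone HTML file
fig1.write_html(visualization_path)
print(f"Saved interactive visualization to {visualization_path}")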

Extract and Display Key Insights

Extract key insights from the narrative mining data to understand narrative progression patterns and cross-source relationships.
# Extract insights from our narrative mining data
insights = extract_narrative_insights(news_df, transcripts_df, filings_df)

print("## AI Bubble Narrative Key Insights\n")
print(f"Peak month for news coverage: {insights['peak_news_month']}")
print(f"Peak month for earnings call mentions: {insights['peak_transcript_month']}")
print(f"Peak month for regulatory filing mentions: {insights['peak_filing_month']}")
print(f"\nDominant narrative in news: \"{insights['top_news_narrative']}\"")
print(f"Dominant narrative in earnings calls: \"{insights['top_transcript_narrative']}\"")
print(f"Dominant narrative in regulatory filings: \"{insights['top_filing_narrative']}\"")
print(f"\nTotal narrative mentions in news: {insights['total_news_mentions']}")
print(f"Total mentions in earnings calls: {insights['total_transcript_mentions']}")
print(f"Total mentions in regulatory filings: {insights['total_filing_mentions']}")
print(f"\nAverage lag between news coverage peaks and SEC filings: {insights['avg_lag_days']} days")

AI Bubble Narrative Key Insights

  • Peak month for news coverage: January 2025
  • Peak month for earnings call mentions: May 2024
  • Peak month for regulatory filing mentions: March 2025

  • Dominant narrative in news: “Companies are spending billions on unproven AI technology”
  • Dominant narrative in earnings calls: “Tech CEOs acknowledge AI implementation challenges amid high expectations”
  • Dominant narrative in regulatory filings: “Current AI investments may not generate predicted financial returns”

  • Total narrative mentions in news: 115
  • Total mentions in earnings calls: 352
  • Total mentions in regulatory filings: 2440

  • Average lag between news coverage peaks and SEC filings: 32 days

Key Narrative Patterns Revealed

The analysis reveals important patterns in how the AI bubble narrative evolved across information sources:

Timing and Intensity Variations

  • News media shows major spikes in AI bubble concerns, often leading the narrative cycle with the highest peaks
  • Earnings calls demonstrate cyclical attention to bubble concerns, with executives addressing topics most prominently during specific quarters
  • SEC filings show the most volatile pattern with multiple significant spikes, suggesting ongoing regulatory concerns

Narrative Progression

  • Media coverage often leads the initial bubble narrative, potentially triggering corporate responses visible in earnings calls
  • Corporate executives’ discussions peak during specific periods but tend to diminish over time
  • SEC filing mentions frequently show increased intensity throughout the analysis period, indicating persistent regulatory attention

Cross-Source Intelligence

  • Different sources provide complementary perspectives on the same underlying narrative
  • Timing lags between sources reveal information flow patterns and decision-making hierarchies
  • The intensity patterns help identify when narratives are gaining or losing momentum across different stakeholder groups
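
To make the timing-lag idea above concrete, below is a minimal sketch of how a rough lead-lag estimate between two sources could be computed from the mined results. It assumes each results DataFrame exposes a document timestamp column, hypothetically named "Date" here; adjust the column name to match the actual Excel exports:
def peak_lag_days(leading_df, lagging_df, date_col="Date"):
    """Rough lead-lag estimate: days between the peak-mention months of two sources.

    The 'Date' column name is an assumption; adapt it to the columns present
    in the Excel files exported by the NarrativeMiner.
    """
    # Count narrative mentions per calendar month for each source
    lead_counts = leading_df.groupby(pd.to_datetime(leading_df[date_col]).dt.to_period("M")).size()
    lag_counts = lagging_df.groupby(pd.to_datetime(lagging_df[date_col]).dt.to_period("M")).size()
    # Difference in days between the two peak months
    lead_peak = lead_counts.idxmax().to_timestamp()
    lag_peak = lag_counts.idxmax().to_timestamp()
    return (lag_peak - lead_peak).days

# Example: approximate lag between the news peak and the SEC-filings peak
# print(peak_lag_days(news_df, filings_df))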

Export the Results

Export the data as Excel files for further analysis or to share with the team.
try:
    # Create the Excel manager
    excel_manager = ExcelManager()

    # Define the dataframes and their sheet configurations
    df_args = [
        (news_df, "News Narratives", (0, 0)),
        (transcripts_df, "Earnings Call Narratives", (0, 0)),
        (filings_df, "SEC Filing Narratives", (0, 0)),
        (source_summary, "Summary", (1, 1))
    ]

    # Save the workbook
    combined_results_path = f"{output_dir}/ai_bubble_narrative_analysis.xlsx"
    excel_manager.save_workbook(df_args, combined_results_path)

    print(f"✅ Results exported to {combined_results_path}")

except Exception as e:
    print(f"Warning while exporting to excel: {e}")

Conclusion

The NarrativeMiner workflow provides a comprehensive, automated framework for tracking narrative evolution across multiple information sources simultaneously. By systematically combining advanced information retrieval with temporal analysis, it transforms scattered narrative signals into structured intelligence for strategic decision-making. Through the automated analysis of AI bubble concerns across news, earnings calls, and regulatory filings, you can:
  1. Identify narrative emergence patterns - Discover how market narratives first appear and which sources tend to lead narrative development cycles
  2. Track cross-source narrative flow - Monitor how narratives propagate from initial media coverage through corporate communications to regulatory attention
  3. Quantify narrative intensity - Measure the relative importance and persistence of specific narratives across different information channels and time periods
  4. Detect timing patterns - Identify lead-lag relationships between sources that reveal decision-making hierarchies and information flow patterns
  5. Monitor narrative evolution - Track how narratives change, intensify, or diminish over time across different stakeholder groups
  6. Generate comparative intelligence - Create comprehensive reports that reveal which narratives are gaining momentum and where, enabling proactive strategic positioning
From early-warning narrative monitoring to building investment strategies based on narrative momentum or assessing how market sentiment evolves across information sources, the NarrativeMiner automates the research process while maintaining the depth required for professional analysis. The cross-source methodology ensures comprehensive coverage of narrative development, making it an invaluable tool for systematic market narrative intelligence in dynamic information environments.

This analysis demonstrates how systematic narrative mining across multiple document types provides richer insights than analyzing any single source in isolation, revealing the complete lifecycle of market narratives from emergence to institutional response.