Board Management Monitoring

This workflow demonstrates the capabilities of Bigdata to monitor specific entities. In this example, we track the People entity to identify where and when an individual is mentioned across selected documents.

Why It Matters

Tracking specific individuals across news coverage is essential for governance analysis and investment decisions, but manually monitoring when and where they are mentioned across thousands of articles is resource-intensive. This workflow uses Bigdata’s entity tracking capabilities to monitor individuals systematically, analyzing their media exposure and reputation signals to surface potential risks and opportunities.

What It Does

This Board Management Monitoring workflow systematically tracks specific individuals across news coverage using multiple search strategies. Built for analysts, portfolio managers, and investment professionals, it transforms scattered mentions into structured intelligence about management activity and board dynamics.

How It Works

The workflow combines multi-mode search strategies, entity-specific filtering, and temporal analysis to deliver:

Comprehensive person tracking across multiple name variations and contexts
Company-specific filtering ensuring relevance to the monitored organization
Multi-mode search precision from strict entity matching to broader coverage with post-filtering
Source filtering enabling focused analysis across trusted news sources
Temporal analysis showing how coverage patterns evolve over time

A Real-World Use Case

This workflow demonstrates monitoring Emmanuel Faber from Danone S.A., tracking his exposure across management integrity themes and board governance topics, showing how different search modes capture varying levels of coverage. Ready to get started? Let’s dive in!

Prerequisites

To run the analysis, you can choose between two options:

💻 GitHub cookbook
- Use this if you prefer working locally or in a custom environment.
- Follow the setup and execution instructions in the README.md.
- API keys are required:
  - Option 1: Follow the key setup process described in the README.md
  - Option 2: Refer to this guide: How to initialise environment variables
    - ❗ When using this method, you must manually add the OpenAI API key:
      # OpenAI credentials
      OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
🐳 Docker Installation
- Docker installation is available as an alternative setup method for containerized deployment.
- It simplifies environment configuration for those who prefer Docker-based solutions.

Setup and Imports

Below is the Python code required for setting up our environment and importing necessary libraries.

import os
import time
from datetime import datetime, date
from typing import List, Dict, Any, Optional, Tuple
import pandas as pd

from bigdata_client import Bigdata
from bigdata_client.models.search import DocumentType, SortBy
from bigdata_research_tools.search.search import run_search

import plotly.io as pio

from src.tool import (
    timer,
    run_monitoring_workflow,
    plot_combined_monitoring_activity,
    plot_management_distributions,
    get_common_quarter_ticks,
    build_queries_for_monitoring,
    filter_documents_by_company,
    deduplicate_results,
    process_results_to_dataframe,
    plot_top_sources
)

# Setup for Plotly in Colab (plotly.io is already imported above as pio)
import plotly.offline as pyo
from plotly.subplots import make_subplots
import plotly
import plotly.graph_objects as go

pyo.init_notebook_mode(connected=True)
# The 'colab' renderer targets Google Colab; outside Colab, choose another Plotly renderer such as 'notebook'
pio.renderers.default = 'colab'
print("✅ Plotly configured for Colab")

# Define output file paths for our results
output_dir = "output"
os.makedirs(output_dir, exist_ok=True)

Parameters Definition

Fixed Parameters

First Theme (management_themes): List of management integrity topics to search for
Second Theme (board_themes): List of board governance topics to search for
Dates (date_periods): List of tuples containing the start and end date of each monitoring period. The dates used in this workflow are chosen to provide adequate temporal coverage around the earnings dates
Trusted Sources (trusted_sources): Dictionary mapping each source name to its ID
Person of interest (persons_dict): Dictionary containing the name of the person to search for, their ID, and the name variations under which they may appear
Company of interest (company_name, company_data): The name of the company to search for and a dictionary with its ID
Search Mode (search_mode): The strategy used to search for that person
- strict: Company entity and person must appear at chunk level
- relaxed: Only person name variations are used
- relaxed_post: Person search with company filtering at document level
Trusted Sources Flag (use_trusted_sources): Flag to activate filtering by trusted sources

# Management Integrity Themes
management_themes = [
    "Past Misconduct of Management",
    "Inconsistencies in Statements",
    "Whistleblower Allegations",
    "Negative Media Coverage on Integrity",
    "Accusations of Unethical Behavior",
    "Reputation of Management in Media",
    "Positive Analyst Ratings of Management",
    "Management's Reputation in Industry",
    "Introduction of Innovations",
    "Successful M&A Activities",
    "Company Turnaround Success",
    "Pressure from Activist Investors for Better Oversight",
    "External Audits or Reviews",
    "Changes in Board Composition",
    "Resignations Due to Oversight Concerns",
    "Conflicts of Interest Involving the Board",
    "Criticism of Corporate Governance Practices",
    "Reports Highlighting Oversight Issues",
    "Management Compensation Linked to Performance",
    "Shareholder-Friendly Capital Allocation",
    "Transparent Communication with Investors",
]

# Board Governance Themes
board_themes = [
    "Board member appointments and new director appointments",
    "Board member resignations and director departures",
    "Board meetings and governance proceedings",
    "Board member retirements and succession plans",
    "Death of board members and directors",
    "Board diversity and composition changes",
    "Board member compensation and salaries",
    "Board member firings and removals",
    "Board member health issues and medical leaves",
    "Board member absences and attendance issues"
]

# Time Ranges for Monitoring
date_periods = [
    ("2024-12-04", "2025-03-04"),
    ("2024-09-05", "2024-12-04"),
    ("2024-05-30", "2024-09-05"),
    ("2024-02-28", "2024-05-30"),
    ("2023-12-01", "2024-02-28"),
    ("2023-09-07", "2023-12-01"),
    ("2023-06-01", "2023-09-07"),
    ("2023-03-02", "2023-06-01"),
    ("2022-12-01", "2023-03-02"),
    ("2022-09-08", "2022-12-01"),
    ("2022-06-02", "2022-09-08"),
    ("2022-03-03", "2022-06-02"),
    ("2021-12-02", "2022-03-03"),
    ("2021-09-09", "2021-12-02"),
    ("2021-06-03", "2021-09-09"),
    ("2021-03-04", "2021-06-03"),
    ("2020-12-03", "2021-03-04"),
    ("2020-09-10", "2020-12-03"),
    ("2020-06-06", "2020-09-10"),
    ("2020-03-07", "2020-06-06"),
    ("2020-01-19", "2020-03-07")
]

# Trusted News Sources Configuration
use_trusted_sources = False  # Set True to activate the filter

trusted_sources = {
    'The Economist': '5B7D72', 'Washington Post': 'DC6F95',
    'Bloomberg News': '208421', 'The Washington Post Blog': '471CDE',
    'BNN Bloomberg': '7490C8', 'CNN': '2435A4',
    'BBC': 'A61D00', 'FOX Business': '2D0020',
    'Reuters': '751371', 'Financial Times': 'DA9FC6',
    'Wall Street Journal': 'AA6E89', 'CNBC': 'AA1167',
    'MarketWatch': '1E5E35', 'Washington Post Via Web': 'B6ACEE',
    'Forbes': '22AC8B', "Investor's Business Daily": '15B968',
    'Al Jazeera (English)': 'CD85BA',
}

# Person Details
persons_dict = {
    'Emmanuel Faber': {
        'id': '6D9368',
        'variations': [
            'Emmanuel Faber',   # First Name + Last Name
            'E. Faber',         # Initial + Last Name
            'Faber, Emmanuel',  # Last Name + comma + First Name
            'Emmanuel F.'       # First Name + Initial
        ]
    }
}

# Company Details
company_name = 'Danone S.A.'
company_data = {'id': '3E149C'}
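
Consecutive entries in date_periods share a boundary date, so the periods tile the monitored range without gaps. If you edit the list, a quick sanity check can catch accidental gaps or overlaps before any queries are run. The following is a minimal sketch, not part of the cookbook's src.tool module: the helper name check_periods and the three-period sample are assumptions made for illustration.

```python
from datetime import date

# Illustrative subset of the periods defined above (newest first).
sample_periods = [
    ("2024-12-04", "2025-03-04"),
    ("2024-09-05", "2024-12-04"),
    ("2024-05-30", "2024-09-05"),
]

def check_periods(periods):
    """Return a list of problems found in (start, end) ISO date tuples.

    Hypothetical helper: periods are expected to share boundary dates,
    as they do in the workflow's date_periods list.
    """
    problems = []
    # Sort chronologically so neighbours can be compared pairwise.
    parsed = sorted(
        (date.fromisoformat(start), date.fromisoformat(end))
        for start, end in periods
    )
    for start, end in parsed:
        if start >= end:
            problems.append(f"empty or reversed period: {start} to {end}")
    for (_, prev_end), (next_start, _) in zip(parsed, parsed[1:]):
        if next_start > prev_end:
            problems.append(f"gap between {prev_end} and {next_start}")
        elif next_start < prev_end:
            problems.append(f"overlap between {next_start} and {prev_end}")
    return problems

print(check_periods(sample_periods) or "periods are contiguous")
```

Passing the full date_periods list instead of sample_periods applies the same check to the whole monitoring range.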

Search Mode Configuration

The workflow supports three different search modes, each offering different levels of precision and recall:

Search Mode Options

Strict Mode: Company entity and person must appear at chunk level - highest precision
Relaxed Mode: Only person name variations are used - highest recall
Relaxed Post Mode: Person search with company filtering at document level - balanced approach
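
To make the difference between the three modes concrete, here is a toy sketch that applies each rule to a few hand-written documents. It is an illustration only: the actual workflow matches Bigdata entity IDs through build_queries_for_monitoring and filter_documents_by_company, whereas this sketch uses plain substring matching, and the helper names (contains_any, matches) and the sample documents are invented for the example.

```python
def contains_any(text, names):
    """True if any of the given names occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(name.lower() in lowered for name in names)

def matches(doc_chunks, person_variations, company_names, mode):
    """Apply one of the three search-mode rules to a document given as a list of chunks."""
    person_chunks = [c for c in doc_chunks if contains_any(c, person_variations)]
    if mode == "strict":
        # Person and company must co-occur in the same chunk.
        return any(contains_any(c, company_names) for c in person_chunks)
    if mode == "relaxed":
        # Any person name variation is enough.
        return bool(person_chunks)
    if mode == "relaxed_post":
        # Person anywhere, company anywhere in the same document.
        return bool(person_chunks) and any(
            contains_any(c, company_names) for c in doc_chunks
        )
    raise ValueError(f"unknown search mode: {mode}")

# Invented sample documents, each split into chunks.
docs = {
    "same_chunk": ["Emmanuel Faber steps down as Danone chairman.", "Shares rose."],
    "same_document": ["E. Faber spoke at a summit.", "Danone declined to comment."],
    "person_only": ["Emmanuel Faber joined a panel on sustainability."],
}
variations = ["Emmanuel Faber", "E. Faber", "Faber, Emmanuel", "Emmanuel F."]
companies = ["Danone"]

for mode in ("strict", "relaxed", "relaxed_post"):
    kept = [name for name, chunks in docs.items()
            if matches(chunks, variations, companies, mode)]
    print(f"{mode}: {kept}")
```

In this toy setting, strict keeps only the document where both names share a chunk, relaxed keeps all three, and relaxed_post drops the document that never mentions the company.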

Multi-Mode Monitoring Execution

The workflow executes all three search modes to provide comprehensive coverage analysis:

search_modes = ["strict", "relaxed", "relaxed_post"]
selected_sources = trusted_sources if use_trusted_sources else None
results_files = {}

for search_mode in search_modes:
    print(f"\nRunning {search_mode.upper()} mode monitoring...")

    with timer(f"{search_mode} mode execution"):
        # Build queries for current search mode
        queries, date_ranges, query_details = build_queries_for_monitoring(
            date_periods=date_periods,
            persons=persons_dict,
            company=company_data,
            management_themes=management_themes,
            board_themes=board_themes,
            search_mode=search_mode,
            sources=selected_sources,
            use_source_filter=use_trusted_sources
        )

        # Execute search
        search_results = run_search(
            queries=queries,
            date_ranges=date_ranges,
            sortby=SortBy.RELEVANCE,
            scope=DocumentType.NEWS,
            limit=100,
            only_results=True,
            rerank_threshold=None
        )

        # Filter and process results
        filtered_results = filter_documents_by_company(
            search_results=search_results,
            query_details_template=query_details,
            company_name=company_name,
            search_mode=search_mode
        )

        # Remove duplicates
        deduplicated_results = deduplicate_results(filtered_results)

        # Convert to structured DataFrame
        df_results = process_results_to_dataframe(deduplicated_results)

        # Save results
        source_type = "trusted" if use_trusted_sources else "all"
        output_file = os.path.join(
            output_dir,
            f"board_monitoring_{search_mode}_{source_type}_sources.csv"
        )
        df_results.to_csv(output_file, index=False, encoding="utf-8-sig")
        results_files[search_mode] = output_file

        print(f"{search_mode.upper()} mode: {df_results.shape[0]} documents saved to {output_file}")

Results Analysis and Visualization

After executing all search modes, load and compare the results:

# Load results from all three search modes, using the paths recorded during execution
# (these paths reflect the use_trusted_sources setting)
df_strict = pd.read_csv(results_files["strict"])
df_relaxed = pd.read_csv(results_files["relaxed"])
df_relaxed_post = pd.read_csv(results_files["relaxed_post"])

print("Search Mode Comparison:")
print(f"Strict Mode: {len(df_strict)} documents")
print(f"Relaxed Mode: {len(df_relaxed)} documents")
print(f"Relaxed Post Mode: {len(df_relaxed_post)} documents")

Search Mode Comparison:

Strict Mode: 97 documents
Relaxed Mode: 172 documents
Relaxed Post Mode: 172 documents

Quarterly Activity Visualization

Generate comprehensive quarterly analysis showing how coverage volume changes across time periods:

# Generate quarterly comparison visualization
common_tick_dates, common_tick_text = get_common_quarter_ticks("2020Q1", "2025Q4")
x_axis_range = [common_tick_dates[0], common_tick_dates[-1]]

# Plot for the distribution of management and board mentions for each search mode
fig = plot_management_distributions(
    df_strict=df_strict,
    df_relaxed=df_relaxed, 
    df_relaxed_post=df_relaxed_post,
    title="Management vs Board Distribution - All Search Modes",
    x_range=x_axis_range,
    tick_vals=common_tick_dates,
    tick_text=common_tick_text,
    interactive=True,
    normalize_to_percentage=False
)
fig.show()

# Plot for the distribution of all mentions across search modes
fig_combined = plot_combined_monitoring_activity(
    df_strict,
    df_relaxed,
    df_relaxed_post,
    title="Emmanuel Faber Board Monitoring - Search Mode Comparison",
    x_range=x_axis_range,
    tick_vals=common_tick_dates,
    tick_text=common_tick_text,
    interactive=True # set to False to have the static plot
)

# Display the visualization
fig_combined.show()

The quarterly comparison chart provides insights into:

Temporal Pattern Analysis

Coverage Volume: Quarterly document counts for the monitored individual across different periods
Search Mode Comparison: Visual comparison of different approaches
Activity Peaks: Identification of periods with heightened media attention

Key Insights and Analysis

print("Coverage Analysis:")
print("• Strict Mode captures the most precise mentions with company co-occurrence")
print("• Relaxed Mode provides comprehensive coverage but may include false positives")
print("• Relaxed Post Mode offers balanced precision-recall by post-filtering for company mentions")

print("\nTemporal Patterns:")
strict_dates = pd.to_datetime(df_strict['Date']).dt.tz_localize(None).dt.to_period('Q').value_counts().sort_index()
relaxed_dates = pd.to_datetime(df_relaxed['Date']).dt.tz_localize(None).dt.to_period('Q').value_counts().sort_index()


if len(strict_dates) > 0 and len(relaxed_dates) > 0:
    print(f"• Peak activity period in Strict Mode: {strict_dates.idxmax()}")
    print(f"• Peak activity period in Relaxed Mode: {relaxed_dates.idxmax()}")

Coverage Analysis:

Strict Mode captures the most precise mentions with company co-occurrence
Relaxed Mode provides comprehensive coverage but may include false positives
Relaxed Post Mode offers balanced precision-recall by post-filtering for company mentions

Temporal Patterns:

Peak activity period in Strict Mode: 2021Q1
Peak activity period in Relaxed Mode: 2021Q1
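
Beyond the single peak quarter, it can help to see the quarterly counts of each mode side by side. The sketch below shows one way to build such a table with pandas. It runs on made-up toy dates, not on the workflow's results; the only thing taken from the workflow is the 'Date' column name, and the helper quarterly_counts is a hypothetical name.

```python
import pandas as pd

# Toy data standing in for df_strict and df_relaxed (dates are invented).
toy_strict = pd.DataFrame(
    {"Date": ["2021-01-15", "2021-03-02", "2021-03-15", "2020-11-20"]}
)
toy_relaxed = pd.DataFrame(
    {"Date": ["2021-01-15", "2021-03-02", "2021-03-15",
              "2020-11-20", "2020-11-25", "2021-05-01"]}
)

def quarterly_counts(df, label):
    """Count documents per calendar quarter and name the resulting series."""
    quarters = pd.to_datetime(df["Date"]).dt.to_period("Q")
    return quarters.value_counts().sort_index().rename(label)

# Align both series on the union of quarters; missing quarters count as zero.
table = (
    pd.concat(
        [quarterly_counts(toy_strict, "Strict"),
         quarterly_counts(toy_relaxed, "Relaxed")],
        axis=1,
    )
    .fillna(0)
    .astype(int)
    .sort_index()
)
print(table)
```

With the real DataFrames, keep the timezone handling from the snippet above (dt.tz_localize(None)) before converting to periods if the 'Date' values carry timezone information.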

Source Analysis

Analyze which news sources provide the most coverage for comprehensive media monitoring:

plot_top_sources(df_relaxed, person_name="Emmanuel Faber", top_n=5, interactive=True)
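
If you want the same information as a table rather than a chart, the counts can be derived directly from the 'SourceID' column. This is a rough sketch under stated assumptions: it does not reproduce the internals of plot_top_sources, the helper name top_sources is hypothetical, the toy rows are invented, and the ID-to-name mapping is a small subset of trusted_sources inverted by hand.

```python
import pandas as pd

# Subset of trusted_sources, inverted to map ID -> name.
source_names = {
    "751371": "Reuters",
    "DA9FC6": "Financial Times",
    "208421": "Bloomberg News",
}

# Invented rows standing in for df_relaxed; "UNKNOWN_ID" is a placeholder.
toy = pd.DataFrame(
    {"SourceID": ["751371", "751371", "DA9FC6", "UNKNOWN_ID", "751371", "DA9FC6"]}
)

def top_sources(df, names, top_n=5):
    """Return the top_n sources by document count, with names where known."""
    counts = df["SourceID"].value_counts().head(top_n)
    return pd.DataFrame({
        "SourceID": counts.index,
        "Source": [names.get(sid, "unknown") for sid in counts.index],
        "Documents": counts.values,
    })

print(top_sources(toy, source_names).to_string(index=False))
```

When loading the saved CSV files, passing dtype={"SourceID": str} to pd.read_csv keeps purely numeric IDs such as 751371 as strings, so lookups against the mapping behave consistently.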

Export the Results

Create and export a comprehensive summary for reporting and further analysis:

# Create comprehensive summary
summary_data = {
    'Search Mode': ['Strict', 'Relaxed', 'Relaxed Post'],
    'Documents Retrieved': [len(df_strict), len(df_relaxed), len(df_relaxed_post)],
    'Unique Sources': [
        df_strict['SourceID'].nunique() if len(df_strict) > 0 else 0,
        df_relaxed['SourceID'].nunique() if len(df_relaxed) > 0 else 0,
        df_relaxed_post['SourceID'].nunique() if len(df_relaxed_post) > 0 else 0
    ]
}

summary_df = pd.DataFrame(summary_data)
summary_df.to_csv(os.path.join(output_dir, "monitoring_summary.csv"), index=False)

print("Final Summary:")
print(summary_df.to_string(index=False))

Conclusion

The Board Management Monitoring workflow provides a comprehensive automated framework for tracking individuals across news coverage with multiple precision levels. By systematically combining entity tracking with temporal analysis, this workflow transforms scattered media mentions into structured intelligence for governance and risk assessment. Through automated multi-mode monitoring, you can:

Track individual exposure - Monitor specific board members and executives across comprehensive news coverage with varying levels of precision and recall
Compare search strategies - Utilize multiple search modes to balance precision and recall based on specific monitoring requirements
Analyze coverage volume - Quantify media attention intensity across quarterly periods and identify peaks in coverage
Monitor source diversity - Track which news outlets provide the most coverage for comprehensive media monitoring
Generate structured datasets - Export processed results with metadata for further qualitative analysis and reporting

The multi-mode approach ensures comprehensive coverage while providing flexibility to choose the appropriate precision level based on your monitoring requirements, making it a valuable tool for systematic individual tracking across news media.
