This workflow demonstrates the capabilities of Bigdata to monitor specific entities. In this example, we track the ‘People’ entity to identify where and when an individual is mentioned across selected documents.

Why It Matters

Tracking specific individuals across news coverage is essential for governance analysis and investment decisions, but manually monitoring when and where these people are mentioned in thousands of articles is resource-intensive. This workflow leverages Bigdata’s entity tracking capabilities to systematically monitor individuals, providing comprehensive analysis of their media exposure and reputation signals to identify potential risks and opportunities.

What It Does

This Board Management Monitoring workflow systematically tracks specific individuals across news coverage using multiple search strategies. Built for analysts, portfolio managers, and investment professionals, it transforms scattered mentions into structured intelligence about management activity and board dynamics.

How It Works

The workflow combines multi-mode search strategies, entity-specific filtering, and temporal analysis to deliver:
  • Comprehensive person tracking across multiple name variations and contexts
  • Company-specific filtering ensuring relevance to the monitored organization
  • Multi-mode search precision from strict entity matching to broader coverage with post-filtering
  • Source filtering enabling focused analysis across trusted news sources
  • Temporal analysis showing how coverage patterns evolve over time

A Real-World Use Case

This workflow demonstrates monitoring Malcolm Wilson from GXO Logistics Inc., tracking his exposure across management integrity themes and board governance topics, showing how different search modes capture varying levels of coverage. Ready to get started? Let’s dive in!

Prerequisites

To run the analysis, you can choose between two options:
  • 💻 GitHub cookbook
    • Use this if you prefer working locally or in a custom environment.
    • Follow the setup and execution instructions in the README.md.
    • API keys are required:
      • Option 1: Follow the key setup process described in the README.md
      • Option 2: Refer to this guide: How to initialise environment variables
        • ❗ When using this method, you must manually add the OpenAI API key:
          # OpenAI credentials
          OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
          
  • 🐳 Docker Installation
    • Docker installation provides an alternative, containerized setup that simplifies environment configuration for those who prefer Docker-based solutions.
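
Before running the workflow, it can help to confirm that the required credentials are actually set. The following is a minimal sketch, not part of the cookbook: `missing_env_vars` is a hypothetical helper, `OPENAI_API_KEY` comes from the instructions above, and the `BIGDATA_USERNAME` / `BIGDATA_PASSWORD` names are an assumption that should be checked against the README.md for your setup.

```python
import os
from typing import Iterable, List, Mapping, Optional

# OPENAI_API_KEY is named in the setup instructions; the BIGDATA_* names are
# an assumption to verify against the README.md.
REQUIRED_VARS = ("BIGDATA_USERNAME", "BIGDATA_PASSWORD", "OPENAI_API_KEY")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the variable names that are unset or blank in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All required credentials are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.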

Setup and Imports

Below is the Python code required for setting up our environment and importing necessary libraries.
import os
import time
from datetime import datetime, date
from typing import List, Dict, Any, Optional, Tuple
import pandas as pd

from bigdata_client import Bigdata
from bigdata_client.models.search import DocumentType, SortBy
from bigdata_research_tools.search.search import run_search

import plotly.io as pio

from src.tool import (
    timer,
    run_monitoring_workflow,
    plot_combined_monitoring_activity,
    get_common_quarter_ticks,
    build_queries_for_monitoring,
    filter_documents_by_company,
    deduplicate_results,
    process_results_to_dataframe,
    plot_top_sources
)

# Setup for Plotly in Colab (plotly.io is already imported above as pio)
import plotly
import plotly.offline as pyo
import plotly.graph_objects as go
from plotly.subplots import make_subplots

pyo.init_notebook_mode(connected=True)
pio.renderers.default = 'colab'
print("✅ Plotly configured for Colab")

# Define output file paths for our results
output_dir = "output"
os.makedirs(output_dir, exist_ok=True)

Parameters Definition

Fixed Parameters

  • Management Integrity Themes (news_search_mapping): List of short phrases describing the management-integrity topics to search for
  • Board Governance Themes (board_themes): List of short phrases describing the board-governance topics to search for
  • Dates (date_periods): List of (start date, end date) tuples, one per monitoring period. The periods used in this workflow are chosen to provide adequate temporal coverage surrounding the earnings dates
  • Trusted Sources (trusted_sources): Dictionary mapping each source name to its ID
  • Person of interest (persons_dict): Dictionary mapping the person’s name to their ID and the name variations to search for
  • Company of interest (company_name, company_data): The name of the company to search for and a dictionary with its ID
  • Search Mode (search_mode): Different ways of searching for that specific person
    • strict: Company entity and person must appear at chunk level
    • relaxed: Only person name variations are used
    • relaxed_post: Person search with company filtering at document level
  • Trusted Sources Flag (use_trusted_sources): Flag to activate filtering with trusted sources
# Management Integrity Themes
news_search_mapping = [
    "Past Misconduct of Management",
    "Inconsistencies in Statements",
    "Whistleblower Allegations",
    "Negative Media Coverage on Integrity",
    "Accusations of Unethical Behavior",
    "Reputation of Management in Media",
    "Positive Analyst Ratings of Management",
    "Management's Reputation in Industry",
    "Introduction of Innovations",
    "Successful M&A Activities",
    "Company Turnaround Success",
    "Pressure from Activist Investors for Better Oversight",
    "External Audits or Reviews",
    "Changes in Board Composition",
    "Resignations Due to Oversight Concerns",
    "Conflicts of Interest Involving the Board",
    "Criticism of Corporate Governance Practices",
    "Reports Highlighting Oversight Issues",
    "Management Compensation Linked to Performance",
    "Shareholder-Friendly Capital Allocation",
    "Transparent Communication with Investors",
]

# Board Governance Themes
board_themes = [
    "Board member appointments and new director appointments",
    "Board member resignations and director departures",
    "Board meetings and governance proceedings",
    "Board member retirements and succession plans",
    "Death of board members and directors",
    "Board diversity and composition changes",
    "Board member compensation and salaries",
    "Board member firings and removals",
    "Board member health issues and medical leaves",
    "Board member absences and attendance issues"
]

# Time Ranges for Monitoring
date_periods = [
    ("2024-12-04", "2025-03-04"),
    ("2024-09-05", "2024-12-04"),
    ("2024-05-30", "2024-09-05"),
    ("2024-02-28", "2024-05-30"),
    ("2023-12-01", "2024-02-28"),
    ("2023-09-07", "2023-12-01"),
    ("2023-06-01", "2023-09-07"),
    ("2023-03-02", "2023-06-01"),
    ("2022-12-01", "2023-03-02"),
    ("2022-09-08", "2022-12-01"),
    ("2022-06-02", "2022-09-08"),
    ("2022-03-03", "2022-06-02"),
    ("2021-12-02", "2022-03-03"),
    ("2021-09-09", "2021-12-02"),
    ("2021-06-03", "2021-09-09"),
    ("2021-03-04", "2021-06-03"),
    ("2020-12-03", "2021-03-04"),
    ("2020-09-10", "2020-12-03"),
    ("2020-06-06", "2020-09-10"),
    ("2020-03-07", "2020-06-06"),
    ("2020-01-19", "2020-03-07")
]

# Trusted News Sources Configuration
use_trusted_sources = False  # Set True to activate the filter

trusted_sources = {
    'Washington Post': 'DC6F95', 'Washington Post Via Web': 'B6ACEE',
    'Bloomberg News': '208421', 'The Washington Post Blog': '471CDE',
    'BNN Bloomberg': '7490C8', 'CNN': '2435A4',
    'BBC': 'A61D00', 'FOX Business': '2D0020',
    'Fox News': '6F1A9E', 'Associated Press': 'D904DE',
    'Reuters': '751371', 'Business & Financial Times': '9207B2',
    'Wall Street Journal': 'AA6E89', 'CNBC': 'AA1167',
    'MarketWatch': '1E5E35', 'The Economist': '5B7D72',
    'Forbes': '22AC8B', "Investor's Business Daily": '15B968',
    'Business Insider': 'C75B8C', 'Al Jazeera (English)': 'CD85BA',
    'Deutsche Welle': '967022', 'The Guardian': 'C85B0B',
    'APN News': '9E4F25', 'The Jerusalem Post': '6E182B',
    'The Times of Israel': 'CA6D21', 'VOA Press Releases and Documents': 'C2B46E',
    'The Economic Times (IN)': 'C307F2', 'Challenges': 'C589CE',
    'Neue Zürcher Zeitung': '96ED6F', 'Le Temps (Geneva)': '7617F5',
    'SWI swissinfo.ch': 'AEF7A0', 'Business Sweden': '8D142A',
    'The Local': '400069', 'The Copenhagen Post': '34EA9E'
}

# Person and Company Details
persons_dict = {
    'Malcolm Wilson': {
        'id': 'W7W0CG',
        'variations': [
            'Malcolm Wilson',    # First Name + Last Name
            'M. Wilson',         # Initial + Last Name
            'Wilson, Malcolm',   # Last Name + comma + First Name
            'Malcolm W.'         # First Name + Initial
        ]
    }
}

company_name = 'GXO Logistics Inc.'
company_data = {'id': 'MSER6L'}
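
The four entries under 'variations' follow the patterns named in their comments. As a rough sketch, a list like this could be generated for other people with a small helper; `name_variations` is hypothetical and not part of `src.tool`, and it only looks at the first and last token of the name, so middle names are ignored.

```python
from typing import List


def name_variations(full_name: str) -> List[str]:
    """Build common written forms of a "First Last" name."""
    parts = full_name.split()
    if len(parts) < 2:
        # A single token has no meaningful variations.
        return [full_name]
    first, last = parts[0], parts[-1]
    candidates = [
        f"{first} {last}",       # First Name + Last Name
        f"{first[0]}. {last}",   # Initial + Last Name
        f"{last}, {first}",      # Last Name + comma + First Name
        f"{first} {last[0]}.",   # First Name + Initial
    ]
    # Preserve order while dropping accidental duplicates.
    return list(dict.fromkeys(candidates))


print(name_variations("Malcolm Wilson"))
```

Short forms such as 'M. Wilson' can match unrelated people, which is one reason a name-only search may return false positives.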

Search Mode Configuration

The workflow supports three different search modes, each offering different levels of precision and recall:

Search Mode Options

  • Strict Mode: Company entity and person must appear at chunk level - highest precision
  • Relaxed Mode: Only person name variations are used - highest recall
  • Relaxed Post Mode: Person search with company filtering at document level - balanced approach
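
To make the difference between the three modes concrete, here is a toy illustration over in-memory text. It is only a sketch under simplifying assumptions: the documents, the `matches` function, and the plain substring matching are all hypothetical, whereas the actual workflow builds Bigdata queries and filters on entities rather than raw strings.

```python
from typing import Dict, List

PERSON_VARIATIONS = ["Malcolm Wilson", "M. Wilson"]
COMPANY = "GXO Logistics"

# Toy documents: each document is a list of text chunks (illustrative only).
DOCS: Dict[str, List[str]] = {
    "doc_a": ["Malcolm Wilson, CEO of GXO Logistics, spoke on Tuesday."],
    "doc_b": [
        "Malcolm Wilson commented on the results.",
        "GXO Logistics shares rose after the call.",
    ],
    "doc_c": ["Malcolm Wilson won the local golf tournament."],
}


def mentions_person(chunk: str) -> bool:
    """True if any name variation appears in the chunk."""
    return any(name in chunk for name in PERSON_VARIATIONS)


def matches(chunks: List[str], mode: str) -> bool:
    """Apply one of the three search modes to a document's chunks."""
    if mode == "strict":
        # Person and company must co-occur inside a single chunk.
        return any(mentions_person(c) and COMPANY in c for c in chunks)
    if mode == "relaxed":
        # Any person mention is enough.
        return any(mentions_person(c) for c in chunks)
    if mode == "relaxed_post":
        # Person mention plus a company mention anywhere in the document.
        return any(mentions_person(c) for c in chunks) and any(
            COMPANY in c for c in chunks
        )
    raise ValueError(f"Unknown search mode: {mode}")


for mode in ("strict", "relaxed", "relaxed_post"):
    hits = [doc_id for doc_id, chunks in DOCS.items() if matches(chunks, mode)]
    print(mode, hits)
```

In this toy data, strict keeps only the document where both names share a chunk, relaxed keeps all three, and relaxed_post drops the one that never mentions the company.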

Multi-Mode Monitoring Execution

The workflow executes all three search modes to provide comprehensive coverage analysis:
search_modes = ["strict", "relaxed", "relaxed_post"]
selected_sources = trusted_sources if use_trusted_sources else None
results_files = {}

for search_mode in search_modes:
    print(f"\nRunning {search_mode.upper()} mode monitoring...")

    with timer(f"{search_mode} mode execution"):
        # Build queries for current search mode
        queries, date_ranges, query_details = build_queries_for_monitoring(
            date_periods=date_periods,
            persons=persons_dict,
            company=company_data,
            news_search_mapping=news_search_mapping,
            board_themes=board_themes,
            search_mode=search_mode,
            sources=selected_sources,
            use_source_filter=use_trusted_sources
        )

        # Execute search
        search_results = run_search(
            queries=queries,
            date_ranges=date_ranges,
            sortby=SortBy.RELEVANCE,
            scope=DocumentType.NEWS,
            limit=100,
            only_results=True,
            rerank_threshold=None
        )

        # Filter and process results
        filtered_results = filter_documents_by_company(
            search_results=search_results,
            query_details_template=query_details,
            company_name=company_name,
            search_mode=search_mode
        )

        # Remove duplicates
        deduplicated_results = deduplicate_results(filtered_results)

        # Convert to structured DataFrame
        df_results = process_results_to_dataframe(deduplicated_results)

        # Save results
        source_type = "trusted" if use_trusted_sources else "all"
        output_file = os.path.join(
            output_dir,
            f"board_monitoring_{search_mode}_{source_type}_sources.csv"
        )
        df_results.to_csv(output_file, index=False, encoding="utf-8-sig")
        results_files[search_mode] = output_file

        print(f"{search_mode.upper()} mode: {df_results.shape[0]} documents saved to {output_file}")
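
Because many theme queries run over the same periods, the same document can plausibly come back more than once, which is why a deduplication step sits before the DataFrame conversion. The sketch below shows the general idea on a small table with pandas; it is not the implementation of `deduplicate_results`, and the `DocumentID` column and `drop_duplicate_documents` helper are assumptions made for illustration.

```python
import pandas as pd


def drop_duplicate_documents(df: pd.DataFrame, key: str = "DocumentID") -> pd.DataFrame:
    """Keep the first row for each document identifier."""
    return df.drop_duplicates(subset=key, keep="first").reset_index(drop=True)


# Illustrative rows: document A1 was returned by two different queries.
rows = pd.DataFrame(
    {
        "DocumentID": ["A1", "A1", "B2", "C3"],
        "Date": ["2024-11-05", "2024-11-05", "2024-11-06", "2024-11-06"],
        "SourceID": ["751371", "751371", "208421", "AA1167"],
    }
)

deduplicated = drop_duplicate_documents(rows)
print(len(rows), "rows before,", len(deduplicated), "rows after")
```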

Results Analysis and Visualization

After executing all search modes, load and compare the results:
# Load results from all three search modes, using the paths saved by the loop above
# (this also works when use_trusted_sources is True and the file names change)
df_strict = pd.read_csv(results_files["strict"])
df_relaxed = pd.read_csv(results_files["relaxed"])
df_relaxed_post = pd.read_csv(results_files["relaxed_post"])

print("Search Mode Comparison:")
print(f"Strict Mode: {len(df_strict)} documents")
print(f"Relaxed Mode: {len(df_relaxed)} documents")
print(f"Relaxed Post Mode: {len(df_relaxed_post)} documents")

Search Mode Comparison:
  • Strict Mode: 34 documents
  • Relaxed Mode: 290 documents
  • Relaxed Post Mode: 34 documents
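
These counts can be put on a common scale by expressing each mode as a share of the relaxed baseline. The short calculation below uses only the numbers reported above; the `shares` helper is hypothetical.

```python
from typing import Dict

# Document counts reported by the comparison above.
counts = {"strict": 34, "relaxed": 290, "relaxed_post": 34}


def shares(counts: Dict[str, int], baseline: str = "relaxed") -> Dict[str, float]:
    """Express each mode's count as a fraction of the baseline mode."""
    base = counts[baseline]
    if base == 0:
        raise ValueError("Baseline mode returned no documents")
    return {mode: n / base for mode, n in counts.items()}


for mode, share in shares(counts).items():
    print(f"{mode}: {counts[mode]} documents ({share:.1%} of relaxed)")

filtered_out = counts["relaxed"] - counts["relaxed_post"]
print(f"Removed by the company post-filter: {filtered_out}")
```

Roughly one in nine relaxed hits survives the company filter here. Equal counts for strict and relaxed_post do not by themselves show that the two modes returned the same documents; that would need a comparison of the document sets.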

Quarterly Activity Visualization

Generate a quarterly analysis showing how coverage volume changes across time periods:
# Generate quarterly comparison visualization
common_tick_dates, common_tick_text = get_common_quarter_ticks("2020Q1", "2025Q4")
x_axis_range = [common_tick_dates[0], common_tick_dates[-1]]

# Create comprehensive comparison chart
fig_combined = plot_combined_monitoring_activity(
    df_strict,
    df_relaxed,
    df_relaxed_post,
    title="Malcolm Wilson Board Monitoring - Search Mode Comparison",
    x_range=x_axis_range,
    tick_vals=common_tick_dates,
    tick_text=common_tick_text
)

# Display the visualization
fig_combined.show()
[Figure: Temporal Trend]
The quarterly comparison chart provides insights into:

Temporal Pattern Analysis

  • Coverage Volume: Quarterly document counts for the monitored individual across different periods
  • Search Mode Comparison: Visual comparison of different approaches
  • Activity Peaks: Identification of periods with heightened media attention
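
Comparing modes quarter by quarter is easier when every series covers the same quarters, including those with no documents. The following sketch shows one way to compute such aligned counts with pandas. It is not the implementation of `get_common_quarter_ticks` or `plot_combined_monitoring_activity`; the `quarterly_counts` helper and the sample dates are assumptions for illustration.

```python
import pandas as pd


def quarterly_counts(dates: pd.Series, start: str, end: str) -> pd.Series:
    """Count documents per calendar quarter, keeping empty quarters as 0."""
    quarters = pd.period_range(start=start, end=end, freq="Q")
    # Normalize to UTC, then drop the timezone before converting to periods.
    parsed = pd.to_datetime(dates, utc=True).dt.tz_localize(None)
    counts = parsed.dt.to_period("Q").value_counts()
    return counts.reindex(quarters, fill_value=0)


# Illustrative dates only; in the workflow this would be a 'Date' column.
sample = pd.Series(["2024-10-03", "2024-11-20", "2024-12-01", "2025-01-15"])
counts = quarterly_counts(sample, "2024Q3", "2025Q1")
print(counts)
```

Applying the same `start` and `end` to each mode's dates yields series with identical indexes, so they can be placed side by side or plotted on a shared axis.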

Key Insights and Analysis

print("Coverage Analysis:")
print("• Strict Mode captures the most precise mentions with company co-occurrence")
print("• Relaxed Mode provides comprehensive coverage but may include false positives")
print("• Relaxed Post Mode offers balanced precision-recall by post-filtering for company mentions")

print("\nTemporal Patterns:")
strict_dates = pd.to_datetime(df_strict['Date']).dt.tz_localize(None).dt.to_period('Q').value_counts().sort_index()
relaxed_dates = pd.to_datetime(df_relaxed['Date']).dt.tz_localize(None).dt.to_period('Q').value_counts().sort_index()


if len(strict_dates) > 0 and len(relaxed_dates) > 0:
    print(f"• Peak activity period in Strict Mode: {strict_dates.idxmax()}")
    print(f"• Peak activity period in Relaxed Mode: {relaxed_dates.idxmax()}")

Coverage Analysis:
  • Strict Mode captures the most precise mentions with company co-occurrence
  • Relaxed Mode provides comprehensive coverage but may include false positives
  • Relaxed Post Mode offers balanced precision-recall by post-filtering for company mentions
Temporal Patterns:
  • Peak activity period in Strict Mode: 2024Q4
  • Peak activity period in Relaxed Mode: 2024Q4

Source Analysis

Analyze which news sources provide the most coverage for comprehensive media monitoring:
plot_top_sources(df_relaxed, person_name="Malcolm Wilson", top_n=5, interactive=True)
[Figure: Top 5 sources]
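
The same ranking can be inspected as a table rather than a chart. This is a minimal sketch assuming the results contain a `SourceID` column, as used in the summary export; `top_sources` is a hypothetical helper, not the implementation of `plot_top_sources`, and the name lookup reuses three entries from `trusted_sources`.

```python
from typing import Dict

import pandas as pd


def top_sources(df: pd.DataFrame, id_to_name: Dict[str, str], top_n: int = 5) -> pd.DataFrame:
    """Rank sources by document count and attach a readable name."""
    counts = df["SourceID"].value_counts().head(top_n)
    return pd.DataFrame(
        {
            "SourceID": counts.index,
            "Source": [id_to_name.get(sid, "unknown") for sid in counts.index],
            "Documents": counts.values,
        }
    )


# Three entries copied from trusted_sources, inverted to map ID -> name.
sources = {"Reuters": "751371", "Bloomberg News": "208421", "CNBC": "AA1167"}
id_to_name = {source_id: name for name, source_id in sources.items()}

# Illustrative rows only.
sample = pd.DataFrame(
    {"SourceID": ["751371", "751371", "208421", "751371", "AA1167", "208421"]}
)
ranking = top_sources(sample, id_to_name, top_n=2)
print(ranking.to_string(index=False))
```

When loading results from CSV, passing `dtype={"SourceID": str}` to `pd.read_csv` keeps purely numeric IDs such as '751371' as strings so the lookup keys match.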

Export the Results

Create and export a summary table for reporting and further analysis:
# Create comprehensive summary
summary_data = {
    'Search Mode': ['Strict', 'Relaxed', 'Relaxed Post'],
    'Documents Retrieved': [len(df_strict), len(df_relaxed), len(df_relaxed_post)],
    'Unique Sources': [
        df_strict['SourceID'].nunique() if len(df_strict) > 0 else 0,
        df_relaxed['SourceID'].nunique() if len(df_relaxed) > 0 else 0,
        df_relaxed_post['SourceID'].nunique() if len(df_relaxed_post) > 0 else 0
    ]
}

summary_df = pd.DataFrame(summary_data)
summary_df.to_csv(os.path.join(output_dir, "monitoring_summary.csv"), index=False)

print("Final Summary:")
print(summary_df.to_string(index=False))

Conclusion

The Board Management Monitoring workflow provides a comprehensive automated framework for tracking individuals across news coverage with multiple precision levels. By systematically combining entity tracking with temporal analysis, this workflow transforms scattered media mentions into structured intelligence for governance and risk assessment. Through automated multi-mode monitoring, you can:
  1. Track individual exposure - Monitor specific board members and executives across comprehensive news coverage with varying levels of precision and recall
  2. Compare search strategies - Utilize multiple search modes to balance precision and recall based on specific monitoring requirements
  3. Analyze coverage volume - Quantify media attention intensity across quarterly periods and identify peaks in coverage
  4. Monitor source diversity - Track which news outlets provide the most coverage for comprehensive media monitoring
  5. Generate structured datasets - Export processed results with metadata for further qualitative analysis and reporting
The multi-mode approach lets you choose the precision level that fits your monitoring requirements: strict for high-precision mentions where the person and company co-occur, relaxed for the highest recall, and relaxed_post for a balance between the two.