Why It Matters
In commodity markets like crude oil, timely awareness of trending news can directly influence trading strategies, hedging decisions, and market forecasts. As news cycles accelerate, being able to identify not just what is being talked about but also its potential market impact is essential for staying ahead of price-moving developments.What It Does
This workflow identifies, verifies, clusters, and summarizes the most relevant and impactful news trends in the crude oil market using the Bigdata API for retrieval and large language models for topic analysis. It produces both daily market reports and structured datasets that can be used for monitoring or backtesting.How It Works
The notebook implements a four-step agentic workflow built on Bigdata API:- Lexicon Generation of industry-specific jargon to maximize recall in news retrieval
- Content Retrieval via the Bigdata API, splitting searches into daily windows and parallelizing keyword lookups for speed
- Topic Clustering & Selection to verify, group, and summarize news into ranked trending topics, scoring each for trendiness, novelty, impact, and magnitude
- Custom Report Generation in the form of a daily digest with a configurable ranking system, supported by granular news sources for verification
A Real-World Use Case
This cookbook illustrates the full workflow through a practical example: tracking daily stories in the crude oil market during the Israel-Iran tensions of June 2025. You’ll learn how to transform unstructured news into structured, ranked insights on market-moving narratives. Ready to get started? Let’s dive in!Prerequisites
To run the Daily Digest Crude Oil workflow, you can choose between two options:-
💻 GitHub cookbook
- Use this if you prefer working locally or in a custom environment.
- Follow the setup and execution instructions in the
README.md. - API keys are required:
- Option 1: Follow the key setup process described in the
README.md - Option 2: Refer to this guide: How to initialise environment variables
- ❗ When using this method, you must manually add the OpenAI API key:
- ❗ When using this method, you must manually add the OpenAI API key:
- Option 1: Follow the key setup process described in the
-
🐳 Docker Installation
- Docker installation is available for containerized deployment.
- Provides an alternative setup method with containerized deployment, simplifying the environment configuration for those preferring Docker-based solutions.
Setup and Imports
Below is the Python code required for setting up our environment and importing necessary libraries.Defining Your Daily Digest Context and Parameters
To perform a trending topics analysis, we need to define several key parameters:- Main Theme (
main_theme): The main topic, asset class, or context to analyze (e.g. Crude Oil) - Point of View (
point_of_view): The additional instructions to the LLM-based Impact and Magnitude generation. Use this parameter to add your domain expertise and contextualize your own definition of Financial Materiality - Time Period (
start_dateandend_date): The date range over which to run the search - Document Type (
document_type): Specify which documents to search over (transcripts, filings, news) - Model Selection (
llm_model): The AI model used for semantic analysis and topic classification - Frequency (
frequency): The frequency of the date ranges to search over. Supported values:Y: Yearly intervalsM: Monthly intervalsW: Weekly intervalsD: Daily intervals (default)
- Document Limit (
document_limit): The maximum number of documents to return per query to Bigdata API
Instantiating the Lexicon Generator
In this step, we identify the specialized industry-specific jargon relevant to the crude oil market to ensure a high recall in the content retrieval.Content Retrieval from Bigdata Search API
In this section, we perform a keyword search on the news content with the Bigdata API to retrieve documents, splitting the search over daily timeframes and multi-threading the content search on the individual keywords for speed purpose. With the list of market-specific keywords parameters, you can leverage the Search functionalities in bigdata-research-tools, built with Bigdata API, to run search at scale against news documents.Topic Clustering and Summarization
In this step, we perform topic modelling using a large language model to verify and cluster the news. Then, the summarization ensures topic selection identifying the top trending news for crude oil, while deriving advanced analytics to quantify the trendiness, novelty, impact and magnitude of the trending topics.Topic Scoring
Trendiness and Novelty Scores: We derive analytics related to the trendiness of the topic based on the news volume, and the novelty of the topic based on the changes in daily summaries.Generate a Custom Daily Digest
In this step, we rank the topics, allowing the user to customize the ranking system to reindex the news, based on their trendiness, novelty, and financial materiality on crude oil prices.
Export the Results
Export the data as CSV files for further analysis or to share with the team.Conclusion
The Daily Digest provides a comprehensive framework for identifying and quantifying trending topics in the crude oil market. By leveraging advanced information retrieval and LLM-powered analysis, this workflow transforms unstructured data into actionable market intelligence. Through the automated analysis of crude oil market dynamics, you can:- Identify trending topics - Discover the most relevant and impactful news trends in the crude oil market through systematic analysis
- Assess market impact - Use scoring methodology to evaluate the potential impact and magnitude of news developments on crude oil prices
- Generate daily reports - Create professional HTML reports with ranked topics and comprehensive market summaries
- Export structured data - Obtain structured datasets for backtesting and further quantitative analysis

