Skip to main content

June 23, 2026

Data Science, Workflows, AI, Search/6 minutes read
Bigdata Briefs is a service with a simple job: given a company and a date, generate a short list of novel, materially relevant bullet points about that company.

GitHub

Setup, configuration, and API reference

Web App

Try it live: real companies, real briefs, updated daily.
The first version worked by running each entity through a fixed set of research topics and applying a novelty filter on top. Building on that foundation, we designed a second version with three specific improvements. First, we wanted retrieval to adapt to each entity rather than lean on a fixed taxonomy. The topics and concepts driving the search should reflect what is actually happening in the news for that specific company at that moment, built on the fly instead of pulled from a predefined list. Second, novelty is inherently non-binary: a topic may recur over time while still containing incremental but meaningful developments. Capturing these degrees of novelty requires more than a simple duplicate-or-not decision, as the relevance and significance of new information often depend on context. Because modern reasoning models can reliably determine whether two facts convey the same underlying information despite differences in phrasing, while also assessing their relative relevance, they are the right tool for this evaluation. This enables a nuanced assessment of novelty that rule-based filters cannot easily provide. Third, we wanted a novelty system that works reliably without requiring previous runs to exist, so coverage on a new entity is solid from day one. To get there, novelty is checked at the news level rather than against prior bullet points. This post explains how each of those three pieces works in practice.

The Three Improvements

1. What to search for

A single broad query returns results that cluster around whatever dominated the news flow. If there was a product announcement, those chunks crowd out the supply chain story, the regulatory development, and the executive commentary that happened in the same period. The service avoids this by separating discovery from retrieval. A broad exploratory pass runs first to understand which themes are active for this entity in this window. An LLM then identifies up to five distinct topic areas from those results, each with the specific concepts that matter within it. Targeted retrieval runs separately per theme, so the evidence going into each section of the brief reflects what actually happened across all areas, not just the most-covered one.

2. How to keep bullets from repeating each other

Each theme gets its own generation step: the LLM reads that theme’s evidence and writes its bullets. Bullets from earlier themes are passed into the prompt so the model avoids repeating the same information. Evidence naturally overlaps across themes, so without this cross-theme hint the same fact can surface twice, phrased differently but carrying the same information. Every generated bullet is then scored for relevance on a 1-to-5 scale, based on its actionability, materiality, and direct impact on the entity. Bullets that don’t clear the threshold are dropped before the novelty stage runs.

3. How to enforce novelty across runs

Text similarity alone is not enough: the same fact can be phrased in many ways, a statistic can shift slightly while representing the same trend, a quote can be paraphrased. The service uses a two-stage approach that escalates precision as needed. Stage 1: Embedding filter. Each bullet is embedded and used to retrieve the most similar previously published bullets for the same entity from a vector index. An LLM reads the retrieved candidates and makes a coarse decision: keep, discard, or flag for deeper inspection. Clear repeats are discarded here. Everything else, whether kept as-is or flagged as partially novel, moves to the second stage. Stage 2: Claim-level search judgment. Each bullet is broken down into its individual factual claims. For each claim, a targeted search retrieves the evidence most relevant to that specific assertion, and an LLM judges whether the claim is new, already known, or somewhere in between. The claim-level judgments determine what happens to the bullet:
  • Publish: all claims are new.
  • Rewrite: the bullet contains genuine new information alongside already-known context. The rewriter foregrounds the novel part and subordinates the old: “While X has been a known concern, the company disclosed Y in its Q1 filing.”
  • Discard: no claim adds new information, or a claim is an inference not supported by evidence.
Rewritten bullets are rescored for relevance and dropped if the rewrite produced low-value output. Once the bullets are finalized, the service can optionally generate a one-sentence narrative, a brief editorial summary of the day’s output, useful as a headline or preview before reading the full brief.

What This Enables

Recurring credit monitoring: A fixed-income desk running weekly updates on 150 issuers needs each brief to contain only what changed. The novelty layer ensures a leverage ratio from last week’s brief does not reappear unless there is a new development that changes its significance. Earnings coverage across a book: A PM covering 40 names needs the same structured brief on every company before its print date, guidance track record, estimate momentum, material developments, applied consistently regardless of how much media coverage each name receives in a given week. Full-portfolio event coverage: one run surfaces the material developments on every name in the portfolio, down to the low-coverage ones carried by a single article that would otherwise be drowned out by whatever dominates the news flow. Auditable output at scale: Because every bullet carries its evidence trace and every filtered bullet carries its rejection reason, the output can be reviewed at the claim level rather than the document level. That is the bar institutional workflows have to clear.

How you use Briefs

A brief is only useful if it reaches you where you already work. There are three ways to read briefs and to trigger new ones, from zero-code to fully programmatic:
  • The Web App, a ready-to-use desk you open in the browser.
  • Claude integration via MCP, so an assistant can generate and reason over briefs in your own workflows.
  • Direct requests to the running service, for teams that want to wire briefs into their own systems.

The Web App

The service ships with a desk-style Web App, so a portfolio can be monitored without writing a single line of code. You build a watchlist once, and every morning the app shows you what changed. The Brief landing view of the Bigdata Briefs app The Brief is the landing view. On the left, every company in your portfolio with its ticker and today’s item count; one click loads that company’s brief. On the right, the Portfolio Brief ranks the top five names by media-attention momentum, with a toggle between the headline bullets and a one-line narrative per company. A calendar strip underneath surfaces upcoming earnings calls and conferences across the portfolio. A company tearsheet in the Bigdata Briefs app Click any company and you land on its tearsheet: the editorial narrative up top, the published bullets grouped by theme with every source citation inline (publisher, headline, excerpt), and a right rail tracking the company’s media attention and sentiment over time. An Audit tab shows every bullet the service considered, published or discarded, with the reason behind each decision, so nothing is a black box. Per-run cost breakdown in the Bigdata Briefs app And because covering a universe at scale has a real cost, every run is fully accounted for. The Costs view breaks a single run into its main drivers: the compute-token cost of the LLM calls that discover themes and write the bullets, the embeddings cost of the novelty checks, and the grounding-token cost of validating every bullet against its cited sources. Each is attributed to the stage that incurred it, so the economics of running the service are transparent from day one.

Claude integration via MCP

Briefs is also exposed as an MCP server, so Claude can generate and read briefs directly, on demand or on a schedule, and feed the output straight into whatever comes next. That turns briefs from something you open into something an assistant works with on your behalf. Bigdata Briefs MCP integration: an automated, ranked price-impact report generated in Claude Code In this example, a scheduled routine in Claude Code runs the Briefs tool across an entire watchlist and then works directly with what it gets back: it reads each brief, ranks the companies where the day’s novel developments are most likely to move the stock, and pulls in a sentiment score for each name from another Bigdata tool, with every signal linked back to the source it came from. There is no dashboard to watch: the report arrives on its own, already prioritized, deduplicated against what was already known, and ready to act on. The same pattern works anywhere an MCP client does, from a research assistant in your IDE to a fully automated reporting pipeline.

Direct requests to the running service

For teams that want briefs inside their own systems, the running service exposes a simple API: trigger a run for any company or universe, poll for completion, and pull back the published bullets and narratives as structured data. It is the most flexible and most technical of the three options, and the place to look when you want to embed briefs in your own stack.

Getting Started

Briefs is open and self-contained: you can run it against your own entity list with a Bigdata API key and an OpenAI API key. The reference guide is the single place for the technical detail: the full architecture and configuration, the API endpoint reference with request examples, and step-by-step instructions for running your first brief locally.
Francesco Ricigliano

Francesco Ricigliano

Data Scientist