
# How to Use Your Indexed Email Archive as a Search Grounding Layer

<p class="text-gray-400 text-base mb-8">April 21, 2026</p>

<div class="flex items-center gap-3 mb-6">
  <span class="text-sm font-medium text-primary">Claude · Bigdata.com · MCP · Private content API · Skills</span>
  <span class="text-gray-500">/</span>
  <span class="text-sm text-gray-400">5 minute read</span>
</div>

Every research desk eventually hits the same wall: **your best signals live in private email, connectors, and uploads**, while **the numbers that justify a thesis live in market data**. Copy-pasting between inboxes and terminals does not scale. Neither does “ask the model” without a repeatable way to **discover what you actually have**, **ground answers in both corpora**, and **leave a memory** for the next run.

This post walks through a workflow we implemented end to end: **private newsletter analysis** plus **public and tearsheet enrichment**, using **your files** as a first-class source.

***

## The problem: two silos, one question

Analysts and operators routinely need to answer questions that **span private material and market reality at once**: weighing what their own content implies against consensus, prices, and public news flow, or turning their research documents into something they can **revisit with ease** instead of a static file on disk. Traditional tooling still splits **private retrieval** from **market intelligence**. The friction is not curiosity; it is **workflow**: discovery, parallel retrieval, synthesis, and persistence.

***

## The approach: one prompt, three parallel lanes, one memory loop

We combined four ideas:

1. **[Bigdata.com](https://bigdata.com) MCP**: `bigdata_search`, `find_companies`, company and sentiment tearsheets, and related tools for **public** grounding and **structured** company metrics ([Bigdata MCP documentation](/blog/mcp/claude_with_bigdata) describes the broader Claude + MCP + skills stack for institutional-style outputs).
2. **Bigdata private content API**: Connectors, tags, documents, and additional search requests with filters specific to private content, so the agent searches the data you actually have available, not a vague “upload folder”.
3. **A Claude skill (“playbook”)**: Routes when to list tags, when to filter search by tag, when to call `find_companies` before tearsheets, and how to upload artifacts back through **enrich-document**.
4. **Automatic tagging from email provenance**: So “only mail from this sender” becomes a **stable filter** instead of brittle full-text guesses.

The lifecycle is **one continuous pipeline**, from the user’s prompt through discovery and retrieval to an artifact stored back in Bigdata, rather than a patchwork of unrelated tools. After discovery, **three lanes run in parallel** (**Private**, **Public**, and **Companies**); their outputs merge into interpretation, then the artifact is **automatically** ingested so it is discoverable in later research, building a memory loop of curated reports.

<Frame>
  <img src="https://mintcdn.com/ravenpackinternational/Sf9MtziD0iLKyb1C/images/blog/proprietary-content/workflow-private-public-companies.png?fit=max&auto=format&n=Sf9MtziD0iLKyb1C&q=85&s=208cb05dca68f7301ebef0a51230e8cd" alt="Workflow diagram: user prompt, content and tag discovery, parallel private search, public grounding, company tearsheets, interpret and combine, report artifact, auto-stored in Bigdata, indexed in my_files" width="849" height="1024" data-path="images/blog/proprietary-content/workflow-private-public-companies.png" />
</Frame>

*Diagram: full lifecycle with parallel execution after content and tag discovery; the synthesized report is then enriched and indexed in Bigdata.*
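The fan-out and merge can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the run: the three lane functions are hypothetical stubs standing in for the private search, `bigdata_search`, and tearsheet calls, and the theme and company values are placeholders.

```python
import asyncio


async def private_lane(themes):
    # Hypothetical stub: stands in for tag-filtered searches over my_files.
    await asyncio.sleep(0)
    return {"lane": "private", "themes": list(themes)}


async def public_lane(themes):
    # Hypothetical stub: stands in for MCP bigdata_search calls.
    await asyncio.sleep(0)
    return {"lane": "public", "themes": list(themes)}


async def companies_lane(names):
    # Hypothetical stub: stands in for find_companies plus tearsheets.
    await asyncio.sleep(0)
    return {"lane": "companies", "names": list(names)}


async def run_lanes(themes, names):
    # gather() runs the three coroutines concurrently and returns
    # their results in the same order as the arguments.
    private, public, companies = await asyncio.gather(
        private_lane(themes),
        public_lane(themes),
        companies_lane(names),
    )
    # The merge step hands all three outputs to interpretation together.
    return {"private": private, "public": public, "companies": companies}


if __name__ == "__main__":
    print(asyncio.run(run_lanes(["example theme"], ["Example Co"])))
```

Keeping each lane behind its own function means a slow or failing lane can be retried without touching the other two.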

### The prompt

The run was driven by this exact user prompt:

```
Analyze the emails received from dan@tldrnewsletter.com. Identify key players and generate a report comparing the most relevant events of each organization.
Enrich this report with quantitative data you can extract from company and sentiment tearsheet via Bigdata MCP.
The report concludes with a series of forward-looking indicators and a list of relevant events to keep in mind.
```

That instruction pins down the **private slice** (newsletter / sender scope), the deliverable format, and the **market enrichment** (company + sentiment tearsheets over [Bigdata.com](https://bigdata.com) MCP).

### Discovery: inventory before retrieval

Before any heavy retrieval, the agent can use the **private content APIs** to explore what is actually indexed: **GET** `/contents/v1/tags`, **GET** `/contents/v1/documents`, and connector listings when needed. Those calls answer the questions a careful analyst would ask up front: *What can I filter on?*, *What content do I have access to?*, and *Roughly what volume sits behind each tag or connector?* The next step is then grounded in real names and counts instead of guesswork. Once that inventory is clear, later search filters such as **`query.filters.tag.any_of`** can use **real tag names** from the platform, which keeps the newsletter or sender scope in the prompt aligned with identifiers that actually exist in the index.
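As a rough sketch, the two inventory calls could be prepared with the Python standard library as shown here. The paths come from this post; the base URL, the `X-API-KEY` header name, and the helper names are assumptions for illustration, so check the List tags and List documents references for the authoritative request and response shapes.

```python
import json
import urllib.request

# Assumption: base URL and auth header name are placeholders for illustration.
BASE_URL = "https://api.bigdata.com"
AUTH_HEADER = "X-API-KEY"


def build_inventory_request(path, api_key):
    """Prepare a GET request for an inventory endpoint without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={AUTH_HEADER: api_key, "Accept": "application/json"},
        method="GET",
    )


def fetch_json(request):
    """Send a prepared request and decode the JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Paths taken from this post; the key is a placeholder.
tags_request = build_inventory_request("/contents/v1/tags", "YOUR_API_KEY")
documents_request = build_inventory_request("/contents/v1/documents", "YOUR_API_KEY")
```

Calling `fetch_json(tags_request)` with a real key would return the tag inventory; the response layout is not shown in this post, so the sketch stops at decoding the JSON.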

### Private lane (teal): skill-shaped API calls

Private grounding runs through the **Search API** with category **my\_files**. Claude issues **`POST /v1/search`** with `query.text` shaped from the report spec and narrows results with **`query.filters.tag.any_of`** when tags are known. That includes **sender-level tags** produced by **automatic EML metadata tagging** on ingest, so provenance becomes a filter instead of a brittle full-text workaround. For this newsletter exercise, that meant using tags such as `from:dan@tldrnewsletter.com` to stay on the TLDR slice, running **multiple thematic searches** under that constraint to maximize recall, and then **ranking recurring organizations** from the retrieved set (OpenAI, Anthropic, Alphabet, Apple, Meta, Nvidia, fintech names, and others).
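A minimal sketch of the request bodies for those thematic searches follows. The `query.text` and `query.filters.tag.any_of` field names come from this post; the theme strings and the helper name are hypothetical, and the `my_files` category selector is left out because its position in the body is not shown here (see the Search documents reference).

```python
import json

SENDER_TAG = "from:dan@tldrnewsletter.com"

# Hypothetical themes; a real run would derive them from the report spec.
THEMES = ["AI model releases", "chip supply", "fintech funding"]


def build_private_search(text, tags):
    """Build a body for POST /v1/search constrained to the given tags."""
    if not tags:
        raise ValueError("at least one tag is required for a tag-scoped search")
    return {
        "query": {
            "text": text,
            # any_of matches documents carrying at least one listed tag.
            "filters": {"tag": {"any_of": list(tags)}},
        }
    }


# One payload per theme, all under the same sender constraint.
payloads = [build_private_search(theme, [SENDER_TAG]) for theme in THEMES]

if __name__ == "__main__":
    print(json.dumps(payloads[0], indent=2))
```

Holding the tag filter constant while varying only `query.text` is what lets several searches widen recall without drifting off the newsletter slice.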

### Public lane (purple): MCP `bigdata_search`

The same themes that drive private `query.text` are mirrored into **`bigdata_search`** so the narrative is anchored in **web and market corpus** context: quotes, catalysts, and sector chatter that do not exist in your private content. Those public pulls ran **in parallel** with the private searches so the write-up could contrast newsletter emphasis with what the wider market corpus was highlighting.

### Companies lane (coral): resolve, then tearsheet

For each named issuer, **`find_companies`** returns the RavenPack entity id and public/private type. That unlocks **`bigdata_company_tearsheet`** and **`bigdata_sentiment_tearsheet`** (public names) so tables can carry **EPS surprises, multiples, consensus, KPIs**, and **media sentiment scores**. After the organization shortlist was stable, Claude **pulled tearsheets** for key public filers and a **private-company** profile where data was available, so the tables were filled from tools, not hand-typed placeholders.
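The resolve-first ordering can be expressed as a small guard. In this sketch the three callables are hypothetical stand-ins for `find_companies`, `bigdata_company_tearsheet`, and `bigdata_sentiment_tearsheet`; their argument and return shapes are assumptions, and the directory entries are made-up placeholders rather than real entity IDs.

```python
def resolve_then_tearsheet(name, find_company, company_tearsheet, sentiment_tearsheet):
    """Resolve a name to an entity first; only then request tearsheets."""
    entity = find_company(name)
    if entity is None:
        # Unresolved names never reach the tearsheet tools.
        return {"name": name, "resolved": False}
    row = {
        "name": name,
        "resolved": True,
        "entity_id": entity["id"],
        "type": entity["type"],
    }
    # Assumption: both tearsheets are requested only for public names.
    if entity["type"] == "public":
        row["company"] = company_tearsheet(entity["id"])
        row["sentiment"] = sentiment_tearsheet(entity["id"])
    return row


# Placeholder directory and stubs so the sketch runs without any service.
FAKE_DIRECTORY = {
    "Example Public Co": {"id": "ID-PUBLIC", "type": "public"},
    "Example Private Co": {"id": "ID-PRIVATE", "type": "private"},
}


def fake_find(name):
    return FAKE_DIRECTORY.get(name)


def fake_company(entity_id):
    return {"entity_id": entity_id, "kind": "company"}


def fake_sentiment(entity_id):
    return {"entity_id": entity_id, "kind": "sentiment"}


if __name__ == "__main__":
    for candidate in ["Example Public Co", "Example Private Co", "Unknown Co"]:
        print(resolve_then_tearsheet(candidate, fake_find, fake_company, fake_sentiment))
```

Passing the tools in as callables keeps the ordering rule testable on its own, independent of how the MCP tools are actually invoked.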

### Deliverable: report, workflow notes, and the repeatable contract

With retrieval and enrichment in place, Claude wrote the prompted report. See the generated artifact below:

<Frame caption="Sample report (inline preview)">
  <iframe src="https://mozilla.github.io/pdf.js/web/viewer.html?file=https%3A%2F%2Fraw.githubusercontent.com%2FBigdata-com%2Fbigdata-docs-resources%2Fmain%2Fmcp%2Freport-examples%2Ftldr_report_2026-04-10.pdf" title="Sample report (inline preview)." />
</Frame>

When the report reads well and you want it to **live as first-class memory**, ask Claude to **publish or enrich** the report through the usual content path (for example **enrich-document** in your skill). The platform then **processes and indexes** the artifact so the next research session can **search it alongside newsletters and tearsheets** instead of losing it as a one-off export. You can also enrich it with custom tags and configurations that fit your research workflow.

The important product is not any single file format; it is the **repeatable contract**: discovery, parallel lanes, merge, **enrich**, and **indexed memory**.

***

## Why automatic EML tagging matters

Email connectors are noisy. **Automatic EML metadata tagging** turns “who sent this?” into a **first-class filter** in the Search API. The user’s hint, “ground only mail from this sender,” becomes a **tag constraint**, not a fragile regex over bodies. That is how a **newsletter connector** stays usable at scale: each issue inherits predictable and discoverable tags.
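To make the idea concrete, here is a small sketch of how a sender tag could be derived from an EML header using Python's standard `email` package. It only illustrates the concept: the platform's actual tagging logic is not described in this post, the `sender_tag` helper is hypothetical, and the only detail taken from the source is the `from:<address>` tag form.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_tag(raw_eml):
    """Derive a from:<address> tag from the From header, or None if absent."""
    message = message_from_string(raw_eml)
    # parseaddr splits "Display Name <addr>" into (name, addr).
    _, address = parseaddr(message.get("From", ""))
    return f"from:{address.lower()}" if address else None


SAMPLE_EML = (
    "From: Newsletter <dan@tldrnewsletter.com>\n"
    "Subject: Example issue\n"
    "\n"
    "Body text.\n"
)

if __name__ == "__main__":
    print(sender_tag(SAMPLE_EML))  # from:dan@tldrnewsletter.com
```

Because the tag is computed from a structured header, two issues with very different bodies still end up under the same filter value.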

***

## Skills as the glue in Claude

We ran this in **Claude** with the Bigdata MCP connector enabled and a **packaged skill** that encodes the playbook, similar in spirit to the ecosystem’s [Financial Research Analyst skill](/skills-reference/mcp-helpers/financial-research-analyst). The pattern is always **structured routing + tool contracts + provenance in the answer**.

Our skill encodes:

* **Inventory-first** when tags or connectors are unknown.
* **`find_companies` before tearsheets** to avoid wrong-entity financials.
* **Enrich upload** when the user wants the artifact to join **`my_files`** with explicit tags for the next “memory workload.”

***

## What comes next

This is only the beginning. We are actively expanding the library of skills and workflows available through Bigdata MCP. If your team has a research workflow that could benefit from this approach, email us at [support@bigdata.com](mailto:support@bigdata.com). We would love to hear what you are building or what we can build together.

***

## Relevant links

* [Professional-Grade Financial Reports with Claude, Bigdata MCP, and Skills](/blog/mcp/claude_with_bigdata): Claude, MCP, and skills stack for institutional-style outputs.
* [Financial Research Analyst skill](/skills-reference/mcp-helpers/financial-research-analyst): packaged playbook for research and writing with Bigdata tools.
* [Claude MCP integration](/mcp-reference/oauth-integrations/claude-mcp-integration): enabling the Bigdata connector in Claude.
* [Content API introduction](/api-rest/content_introduction): connectors, uploads, and private document operations.
* [Upload your own content](/getting-started/upload_your_own_content): tags, listing documents, and using tags in search.
* [List tags](/api-reference/tags/list-tags): **GET** `/contents/v1/tags`.
* [List documents](/api-reference/documents/list-documents): **GET** `/contents/v1/documents`.
* [Search documents](/api-reference/search/search-documents): **POST** `/v1/search`, including `my_files` and `query.filters.tag`.
* [Search in uploaded files](/how-to-guides/search_in_uploaded_files): end-to-end upload and search over private content.

***

<div class="flex items-center gap-3 mt-8">
  <div class="w-10 h-10 rounded-full overflow-hidden">
    <img src="https://mintcdn.com/ravenpackinternational/Sf9MtziD0iLKyb1C/images/blog/authors/victor_pimentel_naranjo.png?fit=max&auto=format&n=Sf9MtziD0iLKyb1C&q=85&s=da0b16935d37b8a2dfb2ff903fa46ad6" alt="Víctor Pimentel Naranjo" class="author-avatar-image" width="512" height="512" data-path="images/blog/authors/victor_pimentel_naranjo.png" />
  </div>

  <div>
    <p class="text-sm font-semibold m-0">Víctor Pimentel Naranjo</p>
    <p class="text-xs text-gray-400 m-0">Senior Product Manager, Team Lead</p>
  </div>
</div>
