# How to Use Your Indexed Email Archive as a Search Grounding Layer

April 21, 2026

Claude · Bigdata.com · MCP · Private content API · Skills / 5-minute read

Every research desk eventually hits the same wall: **your best signals live in private email, connectors, and uploads**, while **the numbers that justify a thesis live in market data**. Copy-pasting between inboxes and terminals does not scale. Neither does “ask the model” without a repeatable way to **discover what you actually have**, **ground answers in both corpora**, and **leave a memory** for the next run.

This post walks through a workflow we implemented end to end: **private newsletter analysis** plus **public and tearsheet enrichment**, using **your files** as a first-class source.

***

## The problem: two silos, one question

Analysts and operators routinely face questions that **span private material and market reality at once**: weighing what their own content implies against consensus, prices, and public news flow, or turning their research documents into something they can **revisit with ease** instead of a static file on disk.

Traditional tooling still splits **private retrieval** and **market intelligence**. The friction is not curiosity; it is **workflow**: discovery, parallel retrieval, synthesis, and persistence.

***

## The approach: one prompt, three parallel lanes, one memory loop

We combined four ideas:

1. **[Bigdata.com](https://bigdata.com) MCP**: `bigdata_search`, `find_companies`, company and sentiment tearsheets, and related tools for **public** grounding and **structured** company metrics (the [Bigdata MCP documentation](/blog/mcp/claude_with_bigdata) describes the broader Claude + MCP + skills stack for institutional-style outputs).
2. **Bigdata private content API**: Connectors, tags, documents, and search requests with private-content-specific filters, so the agent searches the data you actually have indexed, not a vague “upload folder”.
3. **A Claude skill (“playbook”)**: Decides when to list tags, when to filter search by tag, when to call `find_companies` before tearsheets, and how to upload artifacts back through **enrich-document**.
4. **Automatic tagging from email provenance**: Turns “only mail from this sender” into a **stable filter** instead of brittle full-text guesses.

The lifecycle is **one continuous pipeline**, from the user’s prompt through discovery and retrieval to an artifact stored back in Bigdata, rather than a patchwork of unrelated tools. After discovery, **three lanes run in parallel** (**Private**, **Public**, and **Companies**). Their outputs merge into interpretation, and the resulting artifact is then **automatically** ingested so it is discoverable in later research, building a memory loop of curated reports.

Workflow diagram: user prompt, content and tag discovery, parallel private search, public grounding, company tearsheets, interpret and combine, report artifact, auto-stored in Bigdata, indexed in my_files

*Diagram: full lifecycle with parallel execution after content & tag discovery; synthesis feeds the enrich step, which indexes the resulting report in Bigdata.*

### The prompt

The run was driven by this exact user prompt:

```
Analyze the emails received from dan@tldrnewsletter.com. Identify key players and generate a report comparing the most relevant events of each organization. Enrich this report with quantitative data you can extract from company and sentiment tearsheet via Bigdata MCP. The report concludes with a series of forward-looking indicators and a list of relevant events to keep in mind.
```

That instruction fixes the **private slice** (newsletter / sender scope), the deliverable format, and the **market enrichment** (company + sentiment tearsheets over [Bigdata.com](https://bigdata.com) MCP).

### Discovery: inventory before retrieval

Before any heavy retrieval, the agent can use the **private content APIs** to explore what is actually indexed: **GET** `/contents/v1/tags`, **GET** `/contents/v1/documents`, and connector listings when needed. Those calls answer the questions a careful analyst would ask up front: *What can I filter on?*, *What content do I have access to?*, *Roughly what volume sits behind each tag or connector?* The next step is then grounded in real names and counts instead of guesswork.

Once that inventory is clear, later search filters such as **`query.filters.tag.any_of`** can use **real tag names** from the platform, which keeps the newsletter or sender scope in the prompt aligned with identifiers that actually exist in the index.
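
The inventory-then-filter step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the helper names (`sender_tag`, `resolve_tags`, `build_search_body`) and the sample inventory are hypothetical, and the response schema of `/contents/v1/tags` is not shown in this post, so the inventory is modeled as a plain list of tag names. Only the `from:<address>` tag form and the `query.text` / `query.filters.tag.any_of` fields come from the text above. How the `my_files` category is expressed in the request is left out because the post does not show it.

```python
import json
from typing import Iterable


def sender_tag(address: str) -> str:
    """Build a sender-level tag in the `from:<address>` form used in this post."""
    return f"from:{address.strip()}"


def resolve_tags(wanted: Iterable[str], inventory: Iterable[str]) -> list[str]:
    """Keep only the wanted tags that exist in the discovered inventory."""
    known = set(inventory)
    found = [tag for tag in wanted if tag in known]
    if not found:
        # Fail early instead of searching with a filter that matches nothing.
        raise ValueError("none of the requested tags exist in the index")
    return found


def build_search_body(text: str, tags: list[str]) -> dict:
    """Assemble a body carrying query.text and query.filters.tag.any_of."""
    return {"query": {"text": text, "filters": {"tag": {"any_of": tags}}}}


# Hypothetical inventory, standing in for tag names parsed from
# GET /contents/v1/tags (the real response schema is an assumption here).
inventory = ["from:dan@tldrnewsletter.com", "from:alerts@example.com"]

tags = resolve_tags([sender_tag("dan@tldrnewsletter.com")], inventory)
body = build_search_body("AI model launches and funding rounds", tags)
print(json.dumps(body, indent=2))
```

Resolving tags against the inventory before building the body is the design choice that matters: a misspelled or missing sender tag raises immediately rather than silently returning an empty result set.
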
### Private lane (teal): skill-shaped API calls

Private grounding runs through the **Search API** with category **my\_files**: Claude issues **`POST /v1/search`** with `query.text` shaped from the report spec, and narrows results with **`query.filters.tag.any_of`** when tags are known. That includes **sender-level tags** produced by **automatic EML metadata tagging** on ingest, so provenance becomes a filter instead of a brittle full-text workaround.

For this newsletter exercise, that meant using tags such as `from:dan@tldrnewsletter.com` to stay on the TLDR slice, running **multiple thematic searches** under that constraint to maximize recall, and then **ranking recurring organizations** from the retrieved set (OpenAI, Anthropic, Alphabet, Apple, Meta, Nvidia, fintech names, and others).

### Public lane (purple): MCP `bigdata_search`

The same themes that drive private `query.text` are mirrored into **`bigdata_search`** so the narrative is anchored in **web and market corpus** context: quotes, catalysts, and sector chatter that do not exist in your private content. Those public pulls ran **in parallel** with the private searches so the write-up could contrast newsletter emphasis with what the wider market corpus was highlighting.

### Companies lane (coral): resolve, then tearsheet

For each named issuer, **`find_companies`** returns the RavenPack entity ID and public/private type. That unlocks **`bigdata_company_tearsheet`** and **`bigdata_sentiment_tearsheet`** (for public names) so tables can carry **EPS surprises, multiples, consensus, KPIs**, and **media sentiment scores**. After the organization shortlist was stable, Claude **pulled tearsheets** for key public filers and a **private-company** profile where data was available, so the tables were filled from tools, not hand-typed placeholders.

### Deliverable: report, workflow notes, and the repeatable contract

With retrieval and enrichment in place, Claude wrote the prompted report. See the generated artifact below: