GROUNDING
The Research Agent stream includes GROUNDING messages, each containing one or more references that link a specific span of the
answer text to a specific source. Used correctly, GROUNDING lets you render
verifiable inline citations and footnote-style source lists. Used incorrectly, it
silently mis-attributes claims to the wrong sentences.
This guide explains the structure of a GroundingReference, the buffering rule
clients must follow, and how grounding interacts with AUDIT traces. It then walks
through a complete worked example that turns a streamed response into a Markdown
answer with inline citations.
Anatomy of a GroundingReference
A GROUNDING message carries a references array. Each reference describes one
attribution.
| Field | Description |
|---|---|
| start | Inclusive character offset where the cited span begins in the cumulative answer text. |
| end | Exclusive character offset where the cited span ends. |
| tool_name | The tool that produced this reference (commonly "search"). |
| audit_id | Identifier linking to a specific entry in a preceding AUDIT message. |
| source | The cited document or web result. Populated only for search results; null for every other tool (earnings calendar, company tearsheet, charts, and so on), where the citation is anchored at the whole-tool level via audit_id rather than to a single document. |
source is one of two shapes, discriminated on its type field, both produced only
by the search tool:

- type: "BIGDATA": a document from the Bigdata.com content platform.
- type: "EXTERNAL": a result from an external web source.
For the full field list of each shape (the BigdataDocument and
ExternalResult schemas), see the
Research Agent API reference
rather than relying on a copy here.
A null source is normal and expected — it does not mean the reference is
broken. It means the citing tool is not a search tool, so the reference grounds
an answer span to that tool’s result as a whole. Tie it back to the matching
AUDIT trace through audit_id, and render it as a tool-level attribution (or
leave it out of a document-only citation list).
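As a minimal sketch of the distinction above, the snippet below separates document-level citations from tool-level attributions by checking whether source is null. The helper name split_references, the tool name "earnings_calendar", and the sample identifiers are illustrative assumptions, not values taken from the API reference.

```python
from typing import Any

Reference = dict[str, Any]


def split_references(
    references: list[Reference],
) -> tuple[list[Reference], list[Reference]]:
    """Split references into (document-level, tool-level) lists.

    A reference with a populated source cites a single document; a reference
    whose source is None grounds the span to the tool result as a whole.
    """
    document_refs = [r for r in references if r.get("source") is not None]
    tool_refs = [r for r in references if r.get("source") is None]
    return document_refs, tool_refs


# Sample data: identifiers and the non-search tool name are hypothetical.
sample_references = [
    {"start": 0, "end": 24, "tool_name": "search", "audit_id": "t-1",
     "source": {"type": "BIGDATA", "id": "doc-1"}},
    {"start": 25, "end": 60, "tool_name": "earnings_calendar", "audit_id": "t-2",
     "source": None},
]

document_refs, tool_refs = split_references(sample_references)
for ref in tool_refs:
    # Tool-level attribution: identify it by tool_name and audit_id.
    print(f"{ref['tool_name']} (audit {ref['audit_id']})")
```

Keeping the two lists separate makes it easy to leave tool-level references out of a document-only citation list while still surfacing them elsewhere.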
The buffering rule
Concretely, build the answer text by concatenating ANSWER.content values in
arrival order, exactly as they appear. Do not trim, normalize, or rewrap the text
between chunks: any character-level edit invalidates downstream offsets. Once the
stream is fully consumed (or the answer phase is finished), slice the buffer using
the reference offsets to extract the cited spans.
GROUNDING messages arrive in the same stream as ANSWER chunks. Once the stream is
consumed, the answer text up to ref["end"] is guaranteed to exist by the time you
apply the slice, which is why post-stream resolution is the safest pattern.
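The buffering rule can be sketched as follows. This is an illustration under stated assumptions: each stream message is taken to be an already-parsed dict with a "type" key, ANSWER messages carry "content", GROUNDING messages carry "references", and offsets are treated as Python string (code point) indices. None of those details are confirmed by this guide beyond the field names it mentions.

```python
from typing import Any

Message = dict[str, Any]


def buffer_stream(messages: list[Message]) -> tuple[str, list[dict[str, Any]]]:
    """Concatenate ANSWER content verbatim and collect GROUNDING references."""
    parts: list[str] = []
    references: list[dict[str, Any]] = []
    for msg in messages:
        if msg.get("type") == "ANSWER":
            # No strip(), no whitespace normalization: offsets depend on
            # every character being preserved exactly as received.
            parts.append(msg.get("content", ""))
        elif msg.get("type") == "GROUNDING":
            references.extend(msg.get("references", []))
    return "".join(parts), references


# Hypothetical stream: a GROUNDING message arrives between two ANSWER chunks.
stream = [
    {"type": "ANSWER", "content": "Revenue rose 12% "},
    {"type": "GROUNDING", "references": [
        {"start": 0, "end": 16, "tool_name": "search",
         "audit_id": "t-1", "source": None},
    ]},
    {"type": "ANSWER", "content": "year over year."},
]

answer, references = buffer_stream(stream)

# Resolve spans only after the whole stream has been consumed.
spans = [answer[ref["start"]:ref["end"]] for ref in references]
print(spans)
```

Note that the trailing space in the first chunk is kept; trimming it would shift the offset of every later character by one.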
Linking AUDIT traces
The audit_id field on each reference matches the tool_id field on entries in
preceding AUDIT messages. This linkage lets you retrieve the full tool execution
context for a citation — for example, the search query that produced the source
document.
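One way to resolve this linkage is to index AUDIT entries by tool_id and look each reference up by audit_id. The sketch below assumes AUDIT messages expose their entries under a key named "entries" and that a search entry records its query under "arguments"; both names are hypothetical placeholders, so check the API reference for the actual layout.

```python
from typing import Any, Optional

Message = dict[str, Any]


def index_audit_entries(messages: list[Message]) -> dict[str, dict[str, Any]]:
    """Map tool_id -> AUDIT entry across all AUDIT messages in the stream."""
    index: dict[str, dict[str, Any]] = {}
    for msg in messages:
        if msg.get("type") != "AUDIT":
            continue
        # "entries" is an assumed key name for the list of tool executions.
        for entry in msg.get("entries", []):
            index[entry["tool_id"]] = entry
    return index


def audit_entry_for(
    reference: dict[str, Any], index: dict[str, dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Return the AUDIT entry a reference points at, or None if not found."""
    return index.get(reference["audit_id"])


# Hypothetical stream fragment.
stream = [
    {"type": "AUDIT", "entries": [
        {"tool_id": "t-1", "tool_name": "search",
         "arguments": {"query": "ACME Q2 revenue"}},
    ]},
    {"type": "GROUNDING", "references": [
        {"start": 0, "end": 16, "tool_name": "search",
         "audit_id": "t-1", "source": None},
    ]},
]

audit_index = index_audit_entries(stream)
reference = stream[1]["references"][0]
entry = audit_entry_for(reference, audit_index)
print(entry["arguments"]["query"] if entry else "no matching audit entry")
```

Returning None for an unknown audit_id keeps rendering code from failing when an AUDIT message was filtered out or not yet received.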
Worked example: render inline citations
The script below executes a Research Agent request, buffers the streamed answer, collects GROUNDING references, and produces a Markdown document where each cited span is annotated with a numeric superscript that points at a footnote-style source list.

The output places [^1], [^2], etc. markers at the end of
each cited span, followed by a footnote list whose entries use the Bigdata.com
brand-standard format Source name - YYYY-MM-DD linked to the source URL.
_build_markdown skips references whose source is null — the tool-level
citations described above. If you want to surface those too, render them from
their tool_name and audit_id rather than expecting a document.
Deduplication strategies
A single source often supports multiple claims in the answer, so the same source may appear in many references. Most renderings should map each unique source to a single footnote number (as the example above does), so users see one entry per source rather than a duplicate-laden list. Good deduplication keys, in order of preference:

- source.id: present on BIGDATA and EXTERNAL sources alike; the most reliable identifier.
- source.url: stable for EXTERNAL sources; not always present for BIGDATA.
- source.hd (headline): the fallback when no id or url exists; can collide between unrelated documents that share a title.
Avoid deduplicating on audit_id: a single tool call can return many sources, so
two references with the same audit_id may point at different documents.
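A small sketch of the key preference described in this section is shown below. The helper name dedup_key and the sample sources are hypothetical; the key is returned as a (field, value) pair so that an id can never collide with a url or headline that happens to contain the same string.

```python
from typing import Any, Optional


def dedup_key(source: dict[str, Any]) -> Optional[tuple[str, str]]:
    """Return the preferred deduplication key: id, then url, then hd."""
    for field in ("id", "url", "hd"):
        value = source.get(field)
        if value:
            return (field, value)
    return None  # nothing usable; caller decides how to treat the source


# Hypothetical references: one tool call (t-1) returned two different
# documents, and doc-1 is cited again by a later tool call (t-2).
sample_references = [
    {"audit_id": "t-1", "source": {"type": "BIGDATA", "id": "doc-1",
                                   "hd": "Q2 results"}},
    {"audit_id": "t-1", "source": {"type": "EXTERNAL", "id": "ext-9",
                                   "url": "https://example.com/q2"}},
    {"audit_id": "t-2", "source": {"type": "BIGDATA", "id": "doc-1",
                                   "hd": "Q2 results"}},
]

unique_sources: dict[Any, dict[str, Any]] = {}
for ref in sample_references:
    # setdefault keeps the first occurrence and ignores later duplicates.
    unique_sources.setdefault(dedup_key(ref["source"]), ref["source"])

print(list(unique_sources))
```

Grouping the same sample by audit_id would merge doc-1 with ext-9 and split the two citations of doc-1, which is the mis-grouping this section warns about.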
Source attribution format
The Bigdata.com brand-standard inline format for source citations pairs a source
name with a date, as in Source name - YYYY-MM-DD. For BIGDATA sources, derive
the name from source.src_name and the date from source.ts. For EXTERNAL
sources, derive both from the nested source.action object. Use ISO-style
YYYY-MM-DD when locale-aware month abbreviations are not feasible.
Reports and end-of-response source sections should additionally aggregate every
unique source into a “Sources” (or “Data sources”) section at the bottom of the
output, matching the same format.
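As an illustration of the aggregation described above, the sketch below formats BIGDATA sources and collects them into a closing section. It assumes source.ts is an ISO 8601 timestamp string, which this guide does not confirm, and it skips EXTERNAL sources because the field names inside source.action are not listed here. The function names and sample values are hypothetical.

```python
from datetime import datetime
from typing import Any


def format_attribution(name: str, ts: str) -> str:
    """Return 'Source name - YYYY-MM-DD' from a name and an ISO 8601 string."""
    # Replace a trailing "Z" so fromisoformat accepts it on older Pythons.
    date = datetime.fromisoformat(ts.replace("Z", "+00:00")).date().isoformat()
    return f"{name} - {date}"


def sources_section(sources: list[dict[str, Any]]) -> str:
    """Aggregate unique BIGDATA sources into a Markdown 'Sources' section."""
    lines = ["## Sources", ""]
    seen: set[str] = set()
    for source in sources:
        if source.get("type") != "BIGDATA":
            # EXTERNAL name and date live under source.action; that layout
            # is not assumed in this sketch.
            continue
        if source["id"] in seen:
            continue
        seen.add(source["id"])
        lines.append(f"- {format_attribution(source['src_name'], source['ts'])}")
    return "\n".join(lines)


# Hypothetical sources, including one duplicate.
sample_sources = [
    {"type": "BIGDATA", "id": "doc-1", "src_name": "ACME Newswire",
     "ts": "2024-07-30T13:05:00Z"},
    {"type": "BIGDATA", "id": "doc-2", "src_name": "Example Times",
     "ts": "2024-08-01T09:00:00Z"},
    {"type": "BIGDATA", "id": "doc-1", "src_name": "ACME Newswire",
     "ts": "2024-07-30T13:05:00Z"},
]

section = sources_section(sample_sources)
print(section)
```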
Next steps
Streaming responses
Full reference for every message type in the SSE stream.
Error handling
How to handle ERROR, TOOL_ERROR, and LLM_RETRY events robustly.