Downloads
The BDDF schema defines all supported fields and their data types. You can download it and the example files below:
- Latest BDDF JSON Schema - defines all supported fields and data types - download
- Expert Interview Example - sample BDDF file for expert-interview content - download
- Report Example - sample BDDF file for report-type content - download
Tips & Tricks
A few tips to make your content easier to process, both for humans reading it and for machines working behind the scenes:
Be mindful of paragraphs
Well-structured paragraphs improve readability and also give NLP (Natural Language Processing) systems more context to work with.
- For text/plain and text/markdown, we recommend separating paragraphs with two newlines (\n\n)
- For application/html, use proper <p> tags
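The paragraph advice above can be sketched as a small pre-processing step. This is a minimal illustration, not part of any BDDF tooling: the function names are hypothetical, and it assumes the input is ordinary prose (it collapses soft line wraps, so it is not suitable for Markdown that already contains lists or tables).

```python
import html
import re


def split_paragraphs(raw: str) -> list[str]:
    """Split raw prose into paragraphs on blank lines (hypothetical helper)."""
    chunks = re.split(r"\n\s*\n", raw.strip())
    # Collapse hard-wrapped lines and repeated spaces inside each paragraph.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def to_plain_or_markdown(raw: str) -> str:
    """Join paragraphs with exactly two newlines, as recommended for
    text/plain and text/markdown."""
    return "\n\n".join(split_paragraphs(raw))


def to_html(raw: str) -> str:
    """Wrap each paragraph in <p> tags, as recommended for application/html.
    Special characters are escaped so the text stays valid markup."""
    return "".join(f"<p>{html.escape(p)}</p>" for p in split_paragraphs(raw))
```

For example, `to_plain_or_markdown("First line\nwrapped.\n\n\n\nSecond paragraph.")` yields two paragraphs separated by a single blank line.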
The contentBlock for the title should usually just be a short, clear string. Plain text is best here rather than formatted markup.
Use the right content_type for complex content
If you need to express rich structures like tables, lists, or headings, don’t force them into plain text. Use either:
- application/html — if you want precise control with tags, or
- text/markdown — if you prefer lightweight formatting
The body can be provided as a single contentBlock. However, there are scenarios in which it is recommended to split the body into multiple content blocks:
- When the document has sections
  - Break the body into one contentBlock per section
  - Use the section field to identify where each block belongs
- When the document is paginated
  - For paged formats (like PDFs), create one contentBlock per page
  - Use the pages field to indicate which page(s) the content came from
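The two splitting strategies above could be sketched roughly as follows. The `content_type`, `section`, and `pages` names come from this page; the `text` key, the list-of-integers shape for `pages`, and both function names are assumptions made for illustration, so check the BDDF JSON Schema for the actual field layout.

```python
def blocks_by_section(
    sections: list[tuple[str, str]], content_type: str = "text/markdown"
) -> list[dict]:
    """Build one content block per (section name, section text) pair."""
    return [
        # "text" is an assumed key name for the block's content.
        {"content_type": content_type, "section": name, "text": text}
        for name, text in sections
        if text.strip()  # skip empty sections
    ]


def blocks_by_page(
    pages: list[str], content_type: str = "text/plain"
) -> list[dict]:
    """Build one content block per page, numbering pages from 1."""
    return [
        # "pages" is assumed here to be a list of page numbers.
        {"content_type": content_type, "pages": [number], "text": text}
        for number, text in enumerate(pages, start=1)
        if text.strip()  # skip blank pages but keep the original numbering
    ]
```

Blank pages are skipped without renumbering, so the `pages` value still points at the page the text came from.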
Simple but correct example
- Text is already formatted in markdown
- No section or page metadata
- No change in content_type
Different content types mixed - separate content blocks expected
- The table of contents uses Markdown formatting, while the rest of the text is plain text, so a separate content block is needed.
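One way to apply this rule is to tag each piece of text with its content type and start a new block whenever the type changes. The sketch below assumes the pieces are already tagged upstream; the function name and the `text` key are hypothetical, and only `content_type` is taken from this page.

```python
from itertools import groupby


def group_by_content_type(segments: list[tuple[str, str]]) -> list[dict]:
    """Merge consecutive (content_type, text) segments that share a content
    type into one block, and start a new block when the type changes."""
    blocks = []
    for content_type, run in groupby(segments, key=lambda seg: seg[0]):
        # Paragraphs inside one block stay separated by two newlines.
        text = "\n\n".join(segment_text for _, segment_text in run)
        blocks.append({"content_type": content_type, "text": text})
    return blocks
```

A Markdown table of contents followed by two plain-text paragraphs would produce two blocks: one `text/markdown` block and one `text/plain` block holding both paragraphs.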
Avoid spanning multiple pages ❌
- As a general rule, start a new content block at the beginning of each new page.
- Tables are an exception to this rule: they should remain in a single block even if they span multiple pages.
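The page rule and its table exception might be combined as in the sketch below. It assumes an upstream extraction step that yields fragments in reading order, each with a `page` number, its `text`, and a `continues_table` flag marking a table piece carried over from the previous page. That flag, the `text` key, and the function name are all hypothetical; only `pages` is taken from this page.

```python
def build_page_blocks(fragments: list[dict]) -> list[dict]:
    """Start a new block for each fragment, except when the fragment
    continues a table from the previous block: then append it there and
    record the extra page."""
    blocks: list[dict] = []
    for fragment in fragments:
        if fragment.get("continues_table") and blocks:
            table_block = blocks[-1]
            # Table rows are joined with single newlines so the table stays intact.
            table_block["text"] += "\n" + fragment["text"]
            if fragment["page"] not in table_block["pages"]:
                table_block["pages"].append(fragment["page"])
        else:
            blocks.append({"pages": [fragment["page"]], "text": fragment["text"]})
    return blocks
```

With this approach, a table that starts on page 1 and finishes on page 2 ends up in one block whose `pages` value is `[1, 2]`, while the remaining text on page 2 gets its own block.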
Quick Links
- Onboarding Overview - typical onboarding flow
- Quick Start Guide - step by step guide to your first BDDF file
- Bigdata Document Format - in-depth schema definition with examples