Evaluation
During the evaluation stage, we aim to understand your data as thoroughly as possible. The more we know upfront, the better prepared we’ll be to ingest and work with your content effectively later on. Here are some of the things we are typically interested in (a filled-in example follows the list):
- Original Content Format: XML, JSON, PDF, HTML?
- Content Type: Text, Audio, Video?
- Format Consistency: Is the format consistent across the entire archive?
- Historical Coverage: How many years of history are available?
- Known Gaps: Are there any known missing periods or data?
- Archive Volume: Total size or document count in the archive.
- Data Anomalies: Any known spikes, dips, or anomalies in volume and why?
- Delivery Frequency: Real-time, Intraday, Daily, or Weekly?
- Expected Volume: Average number of documents expected per day (depending on the frequency)
- Metadata: What document metadata do you have available?
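As a purely illustrative example, the answers for a hypothetical provider might be summarized like this (every value below is made up):

```python
# A purely illustrative evaluation summary - all values are made up for this example.
archive_profile = {
    "original_format": "XML",
    "content_type": "text",
    "format_consistent": True,
    "history_years": 12,
    "known_gaps": ["2016-03 to 2016-05 missing after a CMS migration"],
    "archive_volume_documents": 4_500_000,
    "volume_anomalies": "spike in 2020 due to expanded coverage",
    "delivery_frequency": "daily",
    "expected_documents_per_day": 1500,
    "metadata_fields": ["title", "publication_date", "company_identifiers", "language"],
}
```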
Preparation
This part is a two-way street:
- You’ll be focused on preparing your documents - don’t worry, there’s plenty of guidance coming up to help you with that. At the end of this page you’ll find a link to our Quick Start Guide.
- On our side, we’ll be setting up the pipeline: provisioning infrastructure, creating internal IDs, and preparing an SFTP account for you. Most of this is fully automated, but we will make sure the setup is tailored to your specific content type. We recommend you start from the SFTP page, as we’ll need a few pieces of information to get started with the pipeline setup.
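Once your SFTP account has been created, a quick connectivity check is a good first step on your side. Here is a minimal sketch using Python and the paramiko library; the hostname and username are placeholders, and your real details will come from the SFTP setup:

```python
import paramiko

# Placeholder connection details - replace with the ones provided during SFTP setup.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="acme_provider", password="...")  # or key-based auth
sftp = paramiko.SFTPClient.from_transport(transport)

# If this prints the contents of your landing directory, the account is working.
print(sftp.listdir("."))

sftp.close()
transport.close()
```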
Validation
Now that you’re ready to prepare BDDF documents, we’d like to validate a few of them, so we’ll ask that you share a small sample of the actual content. The type of sample can vary depending on your content volume, for example:
- all documents you have related to a specific group of companies
- all content from the most recent month
- all documents of a certain document type (for example, all annual reports) for a given set of companies
- or whatever else makes sense for validating the format, the content, the coverage, etc. (a small selection sketch follows this list)
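For instance, if we settle on the company-based option above, assembling the sample can be a simple filtering step. A rough sketch, assuming one JSON file per document and a hypothetical "companies" metadata field (not the actual BDDF schema):

```python
import json
import shutil
from pathlib import Path

# Hypothetical layout: one JSON document per file, each carrying a
# "companies" metadata field (placeholder name, not the BDDF schema).
SAMPLE_COMPANIES = {"ACME Corp", "Globex"}
source_dir = Path("archive")
sample_dir = Path("validation_sample")
sample_dir.mkdir(exist_ok=True)

for path in source_dir.glob("*.json"):
    doc = json.loads(path.read_text(encoding="utf-8"))
    if SAMPLE_COMPANIES & set(doc.get("companies", [])):
        shutil.copy(path, sample_dir / path.name)
```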
Data load
Our preferred data delivery mechanism is an SFTP server - we recommend you start your journey on the SFTP page so we can get everything sorted and prepared for when you are ready to start uploading. We’re currently working on supporting bundled files - a single large JSON file containing multiple documents. Until that’s ready, please continue sending your documents individually.
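If your existing export produces one bundled JSON file, a small pre-processing step can split it into individual documents before upload. A minimal sketch, assuming a hypothetical bundle containing a JSON array and a placeholder "id" field (the actual BDDF document structure is covered in the Quick Start Guide):

```python
import json
from pathlib import Path

# Hypothetical: split a bundled JSON array of documents into one file per document,
# which is the delivery format we can process today.
bundle_path = Path("bundle.json")   # placeholder input file
output_dir = Path("to_upload")
output_dir.mkdir(exist_ok=True)

documents = json.loads(bundle_path.read_text(encoding="utf-8"))
for doc in documents:
    out_file = output_dir / f"{doc['id']}.json"   # "id" is a placeholder field name
    out_file.write_text(json.dumps(doc, ensure_ascii=False, indent=2), encoding="utf-8")
```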
Real-time Delivery
As you create documents, upload each one separately to the SFTP server, and our processing pipeline will automatically pick them up. Just a quick reminder: you’ve been provided with two separate SFTP usernames. Make sure to use the one designated for real-time delivery when sending real-time documents. The distinction is subtle, but important - it determines how we handle the files on our end. Real-time documents will be picked up and processed immediately, without overloading the system. Loading (large) historical archives, on the other hand, is something we like to do in a semi-manual and controlled manner.
And that’s it! Your documents will begin showing up on Bigdata.com, and you’ll be able to start interacting with them right away. To get the best possible results, though, we’ll also need your historical archive.
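A minimal real-time upload sketch in Python with paramiko; the hostname, username, and file name are placeholders - the important point is to connect with the username designated for real-time delivery:

```python
import paramiko

HOST = "sftp.example.com"           # placeholder hostname
REALTIME_USER = "acme_realtime"     # hypothetical real-time username from your SFTP setup

transport = paramiko.Transport((HOST, 22))
transport.connect(username=REALTIME_USER, password="...")  # or key-based auth
sftp = paramiko.SFTPClient.from_transport(transport)

# Upload each freshly created document as its own file, as soon as it is ready.
sftp.put("to_upload/doc-20240501-0001.json", "doc-20240501-0001.json")

sftp.close()
transport.close()
```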
Historical Data Delivery
Once real-time ingestion is activated and confirmed to be working smoothly, we can begin planning the onboarding of your historical archive. It’s important to note that while real-time ingestion doesn’t require perfect data to get started, onboarding the full historical archive is different. We want to get it right the first time. If there are any major improvements or blockers, we’d prefer to address them upfront, before ingestion begins. This is especially critical for large archives spanning several years or containing millions of documents, as re-processing such large volumes would be time-consuming for both sides. After the validation phase, we will have aligned on the key requirements that must be in place before we move forward with ingesting the archive. Once those are met, we can get started.
Historical Data Onboarding Process
- Archive Scope - as with the initial sample, we may begin with a subset of your archive. This could include a defined number of years or content for a specific set of companies. We’ll confirm the exact scope with you.
- Date Range Definition - we’ll provide you with the specific date range we’re looking to ingest.
- File Format Requirements - please upload each document as an individual file. At this time, we’re unable to process archives that are bundled together in a single file or compressed (e.g., ZIP files).
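Putting the three points together, a historical batch upload might look like the sketch below; the historical username, local directory layout, and date-in-filename convention are all assumptions made for the example:

```python
import paramiko
from pathlib import Path

HOST = "sftp.example.com"                    # placeholder hostname
HISTORICAL_USER = "acme_historical"          # hypothetical username for archive delivery
ARCHIVE_DIR = Path("archive")                # local folder with one JSON file per document
DATE_RANGE = ("2018-01-01", "2019-12-31")    # the range agreed for this batch

transport = paramiko.Transport((HOST, 22))
transport.connect(username=HISTORICAL_USER, password="...")
sftp = paramiko.SFTPClient.from_transport(transport)

for path in sorted(ARCHIVE_DIR.glob("*.json")):
    # Assumes file names start with an ISO date, e.g. "2018-03-14_report.json".
    doc_date = path.name[:10]
    if DATE_RANGE[0] <= doc_date <= DATE_RANGE[1]:
        # Each document is uploaded as its own file - no ZIPs, no bundles.
        sftp.put(str(path), path.name)

sftp.close()
transport.close()
```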
Launch 🚀
Once all content has been successfully onboarded, real-time data is flowing, and historical data is fully available, we move into the final stage: preparing for launch. This includes creating marketing materials, defining the scope of the data package, and getting everything ready for listing in the Bigdata Store. Once that’s in place, we’re all set to go live and make the package available to customers.
Now that you know what to expect, we are ready to get started. Head to the Quick Start Guide and follow our step-by-step guide towards your first upload. Here’s to plunder and no blunder, may ye find fair winds! ⛵

