This is the most straightforward delivery mechanism, which is why we picked it as the first one we support: we create a root folder for you on our secure SFTP server, arrange credentials, and you're done - you can start pushing documents whenever you are ready. Please read the section on best practices below; it may give you ideas on how best to organize your content.

To securely exchange data with our system via SFTP, follow the steps below:

  1. Send Your SSH Public Key: Please generate an SSH key pair and send us your public key (id_ed25519.pub or similar) via a secure channel (e.g., encrypted email or a secure file-sharing tool). This key will be used to authenticate your access to the SFTP server.

    ⚠️ Do not send your private key. If you are not familiar with public/private key authentication, you can read about it on SSH.com.
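If you have not generated a key pair before, a typical command looks like the following sketch. The file name `bigdata_sftp` and the comment are only examples - pick whatever fits your setup:

```shell
# Generate an Ed25519 key pair into a dedicated file, so nothing existing
# gets overwritten; -N "" means no passphrase (consider setting one in practice).
ssh-keygen -t ed25519 -f ~/.ssh/bigdata_sftp -N "" -C "you@example.com"

# Send us ONLY the public half, which is the .pub file:
cat ~/.ssh/bigdata_sftp.pub
```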

  2. SFTP Setup: Once we receive your SSH public key, we will:

    • Set up your SFTP user account.
    • Create a dedicated folder for you on our secure SFTP server.
    • Associate your public key with your user for passwordless authentication.
  3. Receive Connection Details: We will securely send you the SFTP connection information:

    • Hostname
    • Port
    • Username

    Once you receive these details, you can connect using any standard SFTP client.
  4. Start Uploading: You can begin uploading files into your assigned SFTP folder. Files placed here will be automatically picked up and processed by our Bigdata system.
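With the connection details in hand, a command-line session might look like the sketch below. The hostname, port, username, key path, and file names are placeholders, not the actual values we will send you:

```shell
# Connect using the key whose public half you sent us (all values are placeholders).
sftp -i ~/.ssh/bigdata_sftp -P 22 your-username@sftp.example.com

# Inside the sftp session, upload into your assigned folder, e.g.:
#   sftp> put article-123.json my-source-id/
#   sftp> bye
```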


Best practices for maximizing your SFTP use

  1. Organize by Source Create a dedicated subfolder for each content source within your SFTP root directory. For example, if you gather content from multiple websites, use a separate folder for each. If you already use internal source IDs, feel free to mirror that structure here - and also include the ID in the appropriate metadata field within the Bigdata Document Format.
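As a sketch of what this layout might mean for your upload logic - the source IDs and filename below are made up:

```python
from pathlib import PurePosixPath

def remote_path(source_id: str, filename: str) -> str:
    """Build the destination path inside your SFTP root: one subfolder per source."""
    return str(PurePosixPath(source_id) / filename)

# Documents from two hypothetical sources land in separate folders:
print(remote_path("website-acme", "article-123.json"))  # website-acme/article-123.json
print(remote_path("newswire-42", "story-456.json"))     # newswire-42/story-456.json
```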

  2. Separate Real-Time and Historical Data Within each content source's folder, it's good practice to create a subfolder dedicated to historical data. The reason lies in how our system is configured: every upload triggers our collector to fetch and process the file. This is perfect for real-time documents, but it can overload the system if you deliver large historical archives. If you deliver historical data in a separate folder (and let us know about it via email), we can schedule its processing without overloading the system.
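Putting the two practices above together, a layout separating real-time from historical deliveries might look like this (folder names are illustrative, not required):

```shell
# One folder per source, with a "historical" subfolder for backfill archives.
mkdir -p website-acme/historical
mkdir -p newswire-42/historical

# Real-time files go into the source folder; archives into historical/, e.g.:
#   website-acme/article-123.json
#   website-acme/historical/archive-2010-2020.tar.gz
```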

  3. Use Clear and Consistent File Naming Choose filenames that clearly describe the content. This makes file management easier and helps us (and you) debug issues when needed.

    File names must be unique. If you upload a file under a name that already exists, it will overwrite the previous file.

    Our system processes files automatically, but human-readable names go a long way when something needs a closer look - we are sure you know what we are talking about 😉
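One common way to keep names both unique and readable - purely an illustration, not a required convention - is to combine the source ID, a UTC timestamp, and the document ID:

```python
from datetime import datetime, timezone

def build_filename(source_id: str, doc_id: str, ext: str = "json") -> str:
    """Compose a unique, human-readable filename from source, UTC timestamp, and doc id.

    If you may upload several documents per second, the document ID keeps
    names unique even when timestamps collide.
    """
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{source_id}_{ts}_{doc_id}.{ext}"

# e.g. "website-acme_20240601T120000_article-123.json"
print(build_filename("website-acme", "article-123"))
```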