You can upload your own files, such as PDFs, TXT documents, and other text formats, to Bigdata.com to analyze them and extract insights.

Once uploaded, your files are automatically indexed, making their content searchable and accessible through both the search and Chat endpoints.

The script below makes it easy to upload multiple files from a local directory using parallel threads, ideal for batch uploads.

If your browser displays the Python source instead of downloading it, press Ctrl+S after the file opens to save it.

Script parameters

  • workdir: Absolute path to the work directory. For instance /home/user/workdir_batch_01
  • upload_txt_filename: Name of a text file that lists the absolute path of each file to upload, one per line. This file must be inside the work directory above. For instance, file_list.txt with the following content:
/home/user/files_to_upload/file_01.txt
/home/user/files_to_upload/file_02.pdf
/home/user/files_to_upload/file_03.csv
  • max_concurrency: Number of threads used to upload files concurrently. For instance 5
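
To illustrate what max_concurrency controls, here is a minimal sketch of a thread-pool upload loop. It is not the code of batch_file_upload.py: upload_one is a hypothetical placeholder that returns a fake ID, and upload_all is an assumed name. In a real run the placeholder would be replaced by an actual upload call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_one(path: str) -> str:
    """Hypothetical placeholder for a single upload; returns a fake file ID."""
    return f"ID_FOR_{path}"


def upload_all(paths: list[str], max_concurrency: int = 5) -> dict[str, str]:
    """Run upload_one for every path with at most max_concurrency threads."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Map each future back to its path so results can be reported per file.
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                # One failed file should not stop the rest of the batch.
                results[path] = f"UPLOAD_ERROR: {exc}"
    return results
```

A higher max_concurrency starts more uploads at once, but the pool never runs more than that many threads at the same time.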

How to run the script

  1. Follow the Prerequisites instructions to set up the required environment
  2. Place all the files that you want to upload in a directory, for instance /home/user/files_to_upload
  3. Create the work directory, for instance /home/user/workdir_batch_01
  4. In the work directory, create a text file containing the absolute path of each file to upload, one per line, for instance file_list.txt
/home/user/files_to_upload/file_01.txt
/home/user/files_to_upload/file_02.pdf
/home/user/files_to_upload/file_03.csv
  5. Finally, run the script:
python3 batch_file_upload.py                                 \
    workdir=/home/user/workdir_batch_01                      \
    upload_txt_filename=file_list.txt                        \
    max_concurrency=5
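
The file list from step 4 can also be generated rather than written by hand. The following is a minimal sketch, assuming every regular file directly inside the source directory should be uploaded and that subdirectories are skipped; write_file_list is a hypothetical helper name and is not part of batch_file_upload.py.

```python
from pathlib import Path


def write_file_list(source_dir: str, workdir: str,
                    list_name: str = "file_list.txt") -> Path:
    """Write the absolute path of each regular file in source_dir, one per line."""
    source = Path(source_dir).resolve()
    target = Path(workdir).resolve() / list_name
    # Create the work directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Only direct children that are regular files; subdirectories are skipped.
    paths = sorted(str(p) for p in source.iterdir() if p.is_file())
    target.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return target
```

For the example paths used on this page, the call would be write_file_list("/home/user/files_to_upload", "/home/user/workdir_batch_01").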

The script will generate two files:

  • Log file: Contains details about the upload process. For instance: bigdata_processing_20241026_002610.log

  • CSV file with IDs: Lists the IDs of the uploaded files so you can manage them (delete, download, etc.) in the future. The CSV file contains the following values:

    • file_id: File identifier that you can use in future requests to download or delete the uploaded file
    • upload_status: Status of the upload. It can be UPLOAD_DONE or UPLOAD_ERROR
    • original_absolute_file_path: The absolute path of the uploaded file

Example of the file uploaded_file_ids_20241026_002611.csv

4C303FEB0B384EEB882FAF927D4F1961,UPLOAD_DONE,/home/user/files_to_upload/file_01.txt
3BDBA5EBA34A4A65817954E3559476BB,UPLOAD_DONE,/home/user/files_to_upload/file_02.pdf
F6FCC64ABAD64D52AC8A6864AE5F7C40,UPLOAD_DONE,/home/user/files_to_upload/file_03.csv
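
After a batch finishes, the CSV file can be read back to find the files that need another attempt. This is a minimal sketch, assuming the file has no header row (as in the example above) and that its three columns appear in the documented order; read_upload_results and failed_paths are hypothetical helper names.

```python
import csv

# Column order as described in the list of CSV values above.
COLUMNS = ["file_id", "upload_status", "original_absolute_file_path"]


def read_upload_results(csv_path: str) -> list[dict[str, str]]:
    """Read the results CSV into one dict per uploaded file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, fieldnames=COLUMNS)
        # Assumption: no header row. If one is present anyway, skip it.
        return [row for row in reader if row["file_id"] != "file_id"]


def failed_paths(rows: list[dict[str, str]]) -> list[str]:
    """Return the original paths of every file marked UPLOAD_ERROR."""
    return [
        row["original_absolute_file_path"]
        for row in rows
        if row["upload_status"] == "UPLOAD_ERROR"
    ]
```

The paths returned by failed_paths can be written to a new list file in a fresh work directory and passed to the script again as upload_txt_filename.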