You can upload your own files, such as PDFs, TXT documents, and other textual formats, to Bigdata.com to analyze them and extract insights. Once uploaded, your files are automatically indexed, which makes their content searchable and accessible through both the Search and Chat endpoints. The script below uploads multiple files from a local directory using parallel threads, which makes it well suited to batch uploads.

batch_file_upload.py

If your browser displays the Python source instead of downloading it, press Ctrl+S after the file opens to save it.

Script parameters

workdir: Absolute path to the work directory. For instance: /home/user/workdir_batch_01
upload_txt_filename: Name of the text file that lists the absolute paths of the files to upload, one path per line. This file must be inside the work directory above. For instance: file_list.txt

/home/user/files_to_upload/file_01.txt
/home/user/files_to_upload/file_02.pdf
/home/user/files_to_upload/file_03.csv

max_concurrency: Number of threads used to upload files concurrently. For instance: 5
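To illustrate what max_concurrency controls, here is a minimal sketch of a bounded thread pool. It is not the code of batch_file_upload.py: upload_one is a hypothetical stand-in for the real upload call, and recording an empty ID for a failed upload is an assumption made for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_one(path):
    """Hypothetical stand-in for a real upload call; returns a fake file ID."""
    return "FAKE_" + os.path.basename(path).upper()


def upload_all(paths, max_concurrency=5, upload_fn=upload_one):
    """Run upload_fn on every path using at most max_concurrency threads.

    Returns (file_id, upload_status, path) tuples in completion order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Map each future back to its path so failures can be attributed.
        futures = {pool.submit(upload_fn, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results.append((future.result(), "UPLOAD_DONE", path))
            except Exception:
                # Assumption: a failed upload is recorded, not fatal to the batch.
                results.append(("", "UPLOAD_ERROR", path))
    return results
```

A higher value uploads more files at the same time, at the cost of more simultaneous connections. Results arrive in completion order, which may differ from the order of the input list.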

How to run the script

Follow the Prerequisites instructions to set up the required environment.
Place all the files that you want to upload in a directory, for instance /home/user/files_to_upload
Create the work directory, for instance /home/user/workdir_batch_01
In the work directory, create a text file containing the absolute path of each file to upload, one per line, for instance file_list.txt

/home/user/files_to_upload/file_01.txt
/home/user/files_to_upload/file_02.pdf
/home/user/files_to_upload/file_03.csv

Finally, run batch_file_upload.py:

python3 batch_file_upload.py                                 \
    workdir=/home/user/workdir_batch_01                      \
    upload_txt_filename=file_list.txt                        \
    max_concurrency=5
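The file list from the steps above does not have to be written by hand. The following is a minimal sketch that generates it; write_file_list is a hypothetical helper, not part of batch_file_upload.py, and it assumes that only the top-level files of the source directory should be listed (subdirectories are skipped).

```python
from pathlib import Path


def write_file_list(source_dir, workdir, list_name="file_list.txt"):
    """Write the absolute path of each regular file in source_dir, one per line.

    Creates workdir if it does not exist and returns the path of the list file.
    """
    source = Path(source_dir).resolve()
    work = Path(workdir).resolve()
    work.mkdir(parents=True, exist_ok=True)

    # Only regular files directly inside source_dir; sorted for a stable order.
    paths = sorted(str(p) for p in source.iterdir() if p.is_file())

    list_path = work / list_name
    list_path.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return list_path


# Example call using the directories from this guide:
# write_file_list("/home/user/files_to_upload", "/home/user/workdir_batch_01")
```

Run this helper before batch_file_upload.py, then pass the resulting file name as upload_txt_filename.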

batch_file_upload.py generates two files:

Logging file: Contains details about the upload process. For instance: bigdata_processing_20241026_002610.log
CSV file with IDs: Lists the IDs of the uploaded files so you can manage them (delete, download, etc.) in the future. The CSV file contains the following values:
- file_id: File identifier that you can use in future requests to download or delete the uploaded file
- upload_status: Status of the upload. It can be UPLOAD_DONE or UPLOAD_ERROR
- original_absolute_file_path: The absolute path of the uploaded file

Example contents of the file uploaded_file_ids_20241026_002611.csv:

4C303FEB0B384EEB882FAF927D4F1961,UPLOAD_DONE,/home/user/files_to_upload/file_01.txt
3BDBA5EBA34A4A65817954E3559476BB,UPLOAD_DONE,/home/user/files_to_upload/file_02.pdf
F6FCC64ABAD64D52AC8A6864AE5F7C40,UPLOAD_DONE,/home/user/files_to_upload/file_03.csv
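To act on the results, the CSV can be read back programmatically. The sketch below assumes the file has no header row and that its columns appear in the order listed above, as in the example; read_upload_results and split_by_status are hypothetical helper names.

```python
import csv

# Column order as documented above; the absence of a header row is an assumption.
COLUMNS = ["file_id", "upload_status", "original_absolute_file_path"]


def read_upload_results(csv_path):
    """Return one dict per CSV row, keyed by the three documented column names."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, fieldnames=COLUMNS))


def split_by_status(rows):
    """Separate the IDs of successful uploads from the paths that failed."""
    done_ids = [r["file_id"] for r in rows if r["upload_status"] == "UPLOAD_DONE"]
    failed_paths = [
        r["original_absolute_file_path"]
        for r in rows
        if r["upload_status"] == "UPLOAD_ERROR"
    ]
    return done_ids, failed_paths


# Example usage:
# rows = read_upload_results("uploaded_file_ids_20241026_002611.csv")
# done_ids, failed_paths = split_by_status(rows)
```

The failed paths can be written to a new list file and passed to the script again to retry them. Rows whose status is neither UPLOAD_DONE nor UPLOAD_ERROR, such as a header row if one turns out to be present, are ignored by split_by_status.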
