Bigdata automatically generates rich analytics for your uploaded content, including inline entity and topic tagging. These enrichments help you better understand the context and structure of your documents. If you’ve already uploaded files, you can download their analytics using the script below, which processes data efficiently using parallel threads. Haven’t uploaded your files yet? Follow the Batch files upload guide to get started.

batch_file_analytics_download.py

If your browser displays the Python source instead of downloading it, press Ctrl+S after the file opens to save it.

Script parameters

workdir: Absolute path to the work directory. For instance: /home/user/workdir_batch_01
output_dir: Absolute path of the directory where all analytic files are downloaded. For instance: /home/user/workdir_batch_01/analytics_files
uploaded_file_ids_csv_filename: Filename of the previously generated CSV containing the IDs of the uploaded files. For instance: uploaded_file_ids_20241026_002611.csv
max_concurrency: The number of concurrent threads to use.
max_download_timeout: Timeout, in seconds, that the script waits for each file if it has not been processed yet.
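The parameters are passed on the command line as key=value pairs. As a rough illustration of how such arguments could be parsed and validated, here is a minimal sketch. The function name parse_key_value_args and the integer conversion of the two numeric parameters are assumptions for illustration; the actual script may handle its arguments differently.

```python
# Hypothetical helper: not taken from batch_file_analytics_download.py.
EXPECTED_KEYS = {
    "workdir",
    "output_dir",
    "uploaded_file_ids_csv_filename",
    "max_concurrency",
    "max_download_timeout",
}


def parse_key_value_args(argv):
    """Turn ['key=value', ...] into a dict, rejecting malformed or unknown keys."""
    params = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        if key not in EXPECTED_KEYS:
            raise ValueError(f"unknown parameter {key!r}")
        params[key] = value

    missing = EXPECTED_KEYS - params.keys()
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")

    # Assumption: both numeric parameters are whole numbers.
    params["max_concurrency"] = int(params["max_concurrency"])
    params["max_download_timeout"] = int(params["max_download_timeout"])
    return params
```

In a script, this would typically be called as parse_key_value_args(sys.argv[1:]) after importing sys.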

How to run the script

If you have not done so yet, follow the Prerequisites instructions to set up the required environment.
Ensure that the CSV file uploaded_file_ids_YYYYMMDD_HHMMSS.csv, containing the IDs of the previously uploaded files, is in the work directory /home/user/workdir_batch_01
Create a new directory to store the analytic files that you plan to download, for instance /home/user/workdir_batch_01/analytics_files
Finally, run the script:

python3 batch_file_analytics_download.py                                       \
    workdir=/home/user/workdir_batch_01                                        \
    output_dir=/home/user/workdir_batch_01/analytics_files                     \
    uploaded_file_ids_csv_filename=uploaded_file_ids_20241026_002611.csv       \
    max_concurrency=50                                                         \
    max_download_timeout=100
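To illustrate what max_concurrency controls, the following is a minimal sketch of a thread-pool download loop. It is not the script's actual implementation: download_all is a hypothetical name, and download_one stands in for whatever call fetches the analytics of a single file (that call is not shown in this guide). Only the two status values come from this page.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(file_ids, download_one, max_concurrency):
    """Run download_one(file_id) on a thread pool and return {file_id: status}.

    download_one is a caller-supplied (hypothetical) function that raises
    an exception when the download of a file fails.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Map each future back to the file ID it is working on.
        futures = {pool.submit(download_one, fid): fid for fid in file_ids}
        for future in as_completed(futures):
            fid = futures[future]
            try:
                future.result()
                results[fid] = "DOWNLOAD_DONE"
            except Exception:
                results[fid] = "DOWNLOAD_ERROR"
    return results
```

At most max_concurrency downloads run at once; one failed file is recorded as DOWNLOAD_ERROR without stopping the others.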

The script will download and store the analytic files in the output_dir folder. The analytic files will have the following format:

<original_base_filename>_<original_file_extension>_analytics.json. For instance: file_01_abc_analytics.json
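The naming rule can be sketched as a small function. This is an illustration inferred from the example above, not code from the script. The name analytics_filename is hypothetical, and the handling of files without an extension is an assumption.

```python
from pathlib import PurePosixPath


def analytics_filename(original_path):
    """Build '<base>_<extension>_analytics.json' from an uploaded file path."""
    path = PurePosixPath(original_path)
    extension = path.suffix.lstrip(".")  # 'abc' for 'file_01.abc'
    parts = [path.stem]
    if extension:
        # Assumption: files without an extension simply omit this part.
        parts.append(extension)
    return "_".join(parts) + "_analytics.json"
```

For a file named file_01.abc, this yields file_01_abc_analytics.json, matching the example.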

The script will also generate an output CSV file download_result_%Y%m%d_%H%M%S.csv with the following columns:

file_id: File identifier that we can use in future requests to download or delete files
download_status: Status of the download. It can be DOWNLOAD_DONE or DOWNLOAD_ERROR
original_absolute_file_path: The absolute path of the uploaded file

Example of the file download_result_20241026_003611.csv

4C303FEB0B384EEB882FAF927D4F1961,DOWNLOAD_DONE,/home/user/files_to_upload/file_01.txt
3BDBA5EBA34A4A65817954E3559476BB,DOWNLOAD_DONE,/home/user/files_to_upload/file_02.pdf
F6FCC64ABAD64D52AC8A6864AE5F7C40,DOWNLOAD_ERROR,/home/user/files_to_upload/file_03.csv

If the file contains any DOWNLOAD_ERROR entries, you can run the script again, this time passing download_result_20241026_003611.csv as the uploaded_file_ids_csv_filename parameter. The script will then try to download every file ID with the status DOWNLOAD_ERROR.

python3 batch_file_analytics_download.py                                       \
    workdir=/home/user/workdir_batch_01                                        \
    output_dir=/home/user/workdir_batch_01/analytics_files                     \
    uploaded_file_ids_csv_filename=download_result_20241026_003611.csv         \
    max_concurrency=50                                                         \
    max_download_timeout=100
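Before re-running, it can be useful to check which file IDs would be retried. The sketch below reads a result CSV in the three-column layout shown in the example (file_id, download_status, original_absolute_file_path) and returns the IDs marked DOWNLOAD_ERROR. The function name failed_file_ids is hypothetical, and the sketch assumes the columns appear in that order.

```python
import csv


def failed_file_ids(csv_file):
    """Return the file IDs whose download_status is DOWNLOAD_ERROR.

    csv_file is any iterable of CSV lines, such as an open file object.
    Rows with fewer than two columns are skipped; a header row, if one
    exists, is ignored because its second column is not DOWNLOAD_ERROR.
    """
    reader = csv.reader(csv_file)
    return [
        row[0]
        for row in reader
        if len(row) >= 2 and row[1] == "DOWNLOAD_ERROR"
    ]
```

A typical call would open the file with open(path, newline="") and pass the file object to failed_file_ids.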
