Download file analytics
Bigdata automatically generates rich analytics for your uploaded content, including inline entity and topic tagging. These enrichments help you better understand the context and structure of your documents. If you’ve already uploaded files, you can download their analytics using the script below, which processes data efficiently using parallel threads.
Haven’t uploaded your files yet? Follow the Batch files upload guide to get started.
If your browser displays the Python script as text instead of downloading it, press Ctrl+S after the file opens to save it.
Script parameters
workdir
: Absolute path to the work directory. For instance: /home/user/workdir_batch_01
output_dir
: Absolute path of the directory where all analytic files are downloaded. For instance: /home/user/workdir_batch_01/analytics_files
uploaded_file_ids_csv_filename
: Filename of the previously generated CSV containing the IDs of the uploaded files. For instance: uploaded_file_ids_20241026_002611.csv
max_concurrency
: The number of concurrent threads to use.
max_download_timeout
: Time in seconds that the script waits for each file if it has not been processed yet.
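To illustrate how max_concurrency and max_download_timeout could interact, here is a minimal sketch. It is not the downloadable script itself. The function download_all, the callable fetch_analytics, and the poll_interval argument are hypothetical names introduced for illustration, and reporting a timeout as DOWNLOAD_ERROR is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download_all(file_ids, fetch_analytics, max_concurrency=4,
                 max_download_timeout=60, poll_interval=1.0):
    """Call fetch_analytics(file_id) for every ID on a bounded thread pool.

    fetch_analytics is a hypothetical callable: it returns the analytics
    payload, or None while the file has not been processed yet.
    """

    def worker(file_id):
        # Each file gets its own deadline of max_download_timeout seconds.
        deadline = time.monotonic() + max_download_timeout
        while True:
            try:
                payload = fetch_analytics(file_id)
            except Exception:
                return file_id, "DOWNLOAD_ERROR"
            if payload is not None:
                return file_id, "DOWNLOAD_DONE"
            if time.monotonic() >= deadline:
                # Assumption: a file that is never ready counts as an error.
                return file_id, "DOWNLOAD_ERROR"
            time.sleep(poll_interval)

    # max_concurrency caps the number of worker threads running at once.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return dict(pool.map(worker, file_ids))
```

A higher max_concurrency shortens the total run time only while the service and network can keep up. The timeout bounds how long any single unprocessed file can hold a thread.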
How to run the script
- (If not yet done) Follow the Prerequisites instructions to set up the required environment
- Ensure that the CSV file
uploaded_file_ids_YYYYMMDD_HHMMSS.csv
, containing the IDs of the previously uploaded files, is in the work directory /home/user/workdir_batch_01
- Create a new directory to store all analytic files that we plan to
download, for instance
/home/user/workdir_batch_01/analytics_files
- Finally, you can run the script
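The preparation steps above can also be checked programmatically before a run. The helper below is a hypothetical sketch, not part of the provided script. It verifies that the CSV is in the work directory and creates the output directory if it does not exist.

```python
from pathlib import Path


def prepare_run(workdir, output_dir, uploaded_file_ids_csv_filename):
    """Validate the inputs and create output_dir if it is missing."""
    workdir = Path(workdir)
    if not workdir.is_absolute():
        raise ValueError(f"workdir must be an absolute path: {workdir}")

    # The CSV with the uploaded file IDs must already be in the work directory.
    csv_path = workdir / uploaded_file_ids_csv_filename
    if not csv_path.is_file():
        raise FileNotFoundError(f"CSV of uploaded file IDs not found: {csv_path}")

    # Create the directory for the analytic files, including parents.
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    return csv_path, output_dir
```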
The script will download and store the analytic files in the
output_dir
folder. The analytic files will have the following format:
<original_base_filename>_<original_file_extension>_analytics.json
. For instance: file_01_abc_analytics.json
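As a sketch of the naming format described above, the hypothetical helper below derives the analytics filename from an original file path. It assumes the extension is the text after the last dot. How the script handles files without an extension is not stated in this guide.

```python
from pathlib import Path


def analytics_filename(original_path):
    """Build <base>_<extension>_analytics.json from an original file path."""
    p = Path(original_path)
    extension = p.suffix.lstrip(".")  # "abc" for "file_01.abc"
    return f"{p.stem}_{extension}_analytics.json"


print(analytics_filename("/home/user/workdir_batch_01/file_01.abc"))
# file_01_abc_analytics.json
```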
The script will also generate an output CSV file
download_result__%Y%m%d_%H%M%S.csv
with the following values:
file_id
: File identifier that we can use in future requests to download or delete files
download_status
: Status of the download. It can be DOWNLOAD_DONE or DOWNLOAD_ERROR
original_absolute_file_path
: The absolute path of the uploaded file
Example of the file download_result_20241026_003611.csv
If the file contains any DOWNLOAD_ERROR
entries, you can run the script again,
passing download_result_20241026_003611.csv
as the parameter
uploaded_file_ids_csv_filename
. The script will then retry the download
for all file IDs with the status DOWNLOAD_ERROR.
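To see which files a rerun would retry, the result CSV can be inspected with a short helper like the one below. This is a hypothetical sketch. It assumes the CSV has a header row with the column names file_id and download_status listed above.

```python
import csv


def failed_file_ids(result_csv_path):
    """Return the file IDs whose download_status is DOWNLOAD_ERROR."""
    with open(result_csv_path, newline="", encoding="utf-8") as fh:
        return [
            row["file_id"]
            for row in csv.DictReader(fh)
            if row["download_status"] == "DOWNLOAD_ERROR"
        ]
```

An empty list means that every download finished with DOWNLOAD_DONE and no rerun is needed.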