# Batch files upload

You can upload your own files, such as PDFs, TXT documents, and other text formats, to Bigdata.com. Once uploaded, they are enriched automatically (content extraction, structuring, and annotation) and indexed, which makes them available to the Search and Research Agent endpoints.

The script below uploads multiple files to Bigdata using the REST API: it reads a list of file paths, uploads each file (POST → PUT to presigned URL → poll until enrichment completes), and writes results to a CSV.

* [batch\_file\_upload.py](https://github.com/Bigdata-com/bigdata-docs-resources/tree/main/how_to_guides)

<Tip>
  If your browser displays the Python source instead of downloading the file, press **Ctrl+S** (or **Cmd+S** on Mac) after the file opens.
</Tip>
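
The three-step flow described above (POST, then PUT to the presigned URL, then poll) can be sketched as follows. This is an illustration, not the script's actual code: the HTTP calls are passed in as callables (`create_upload`, `put_bytes`, `get_status` are hypothetical names), so the sketch assumes nothing about endpoint paths or response shapes. The terminal status names are taken from the result CSV example in this guide, and `max_polls` is an assumed safeguard.

```python
import time
from typing import Callable, Tuple

# Assumption: these are the terminal statuses, taken from the CSV example in this guide.
TERMINAL_STATUSES = {"UPLOAD_DONE", "UPLOAD_ERROR"}


def upload_one(
    path: str,
    create_upload: Callable[[str], Tuple[str, str]],  # POST step: returns (file_id, presigned_url)
    put_bytes: Callable[[str, bytes], None],          # PUT step: sends the bytes to the presigned URL
    get_status: Callable[[str], str],                 # poll step: returns the current status
    poll_interval_sec: float = 10.0,
    max_polls: int = 360,                             # hypothetical upper bound on polling
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Upload one file and wait until it reaches a terminal status."""
    file_id, presigned_url = create_upload(path)
    with open(path, "rb") as fh:
        put_bytes(presigned_url, fh.read())
    for _ in range(max_polls):
        status = get_status(file_id)
        if status in TERMINAL_STATUSES:
            return file_id, status
        sleep(poll_interval_sec)
    raise TimeoutError(f"{path}: no terminal status after {max_polls} polls")
```

Injecting the three steps keeps the control flow testable without network access; in a real client each callable would wrap an authenticated HTTP request.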

## Setup

1. **Create a virtual environment (recommended)**

   ```bash theme={null}
   python -m venv .venv
   source .venv/bin/activate   # Linux/macOS
   # or:  .venv\Scripts\activate   # Windows
   ```

2. **Install dependencies**

   ```bash theme={null}
   pip install -r requirements.txt
   ```

3. **Configure environment**

   Copy `.env` to a new file if needed, then edit `.env` and set your Bigdata API key:

   ```bash theme={null}
   cp .env .env.local
   # Edit .env or .env.local and set BIGDATA_API_KEY=your-api-key
   ```

   The script loads variables from `.env` in the script directory. You can also set `BIGDATA_API_KEY` (and optionally `BIGDATA_API_BASE_URL`) in your shell.

   For general environment setup, see [Prerequisites](../how_to_prerequisites).

## Usage

Run the script with these parameters, each passed as `key=value`:

| Parameter                 | Description                                                                                                                           |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| **workdir**               | Directory that contains your files and where the log and result CSV will be written.                                                  |
| **upload\_txt\_filename** | Name of a text file inside `workdir` that lists files to upload (one path per line; paths are relative to `workdir` unless absolute). |
| **max\_concurrency**      | Number of files to upload in parallel (e.g. `5`).                                                                                     |

Create a list file (e.g. `file_list.txt`) in `workdir` with one filename or path per line:

```text theme={null}
report.pdf
data/other_doc.PDF
```
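
To illustrate the path rule (entries are relative to `workdir` unless absolute), here is a minimal sketch of how such a list file could be read. The function name `read_file_list` is hypothetical, and skipping blank lines is an assumption rather than documented behavior.

```python
from pathlib import Path
from typing import List


def read_file_list(workdir: str, list_filename: str) -> List[Path]:
    """Return one resolved path per non-empty line of the list file."""
    base = Path(workdir)
    resolved = []
    for line in (base / list_filename).read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines are ignored
        candidate = Path(entry)
        # Absolute paths are kept as-is; relative ones are anchored at workdir.
        resolved.append(candidate if candidate.is_absolute() else base / candidate)
    return resolved
```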

### Example: run the script

From the `batch_file_upload` directory, with `BIGDATA_API_KEY` set in `.env`:

```bash theme={null}
cd /path/to/bigdata-docs-resources/how_to_guides/batch_file_upload
pip install -r requirements.txt
# Edit .env and set BIGDATA_API_KEY=your-api-key

# Run (paths in the list file are relative to workdir)
python batch_file_upload.py \
  workdir=/home/you/Documents/PDFsamples \
  upload_txt_filename=file_list.txt \
  max_concurrency=5
```

Or set the API key in the shell and run from any directory:

```bash theme={null}
export BIGDATA_API_KEY=your-api-key
python /path/to/batch_file_upload/batch_file_upload.py \
  workdir=/home/you/Documents/PDFsamples \
  upload_txt_filename=file_list.txt \
  max_concurrency=5
```
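
The commands above pass parameters as `key=value` tokens rather than `--flags`. A minimal sketch of parsing that style is shown below; `parse_key_value_args` and its validation rules are assumptions for illustration and may differ from what the script does.

```python
from typing import Dict, List

# Parameter names from the table in this guide.
REQUIRED = ("workdir", "upload_txt_filename", "max_concurrency")


def parse_key_value_args(argv: List[str]) -> Dict[str, str]:
    """Parse ['workdir=/data', 'max_concurrency=5', ...] into a dict of strings."""
    params: Dict[str, str] = {}
    for arg in argv:
        key, sep, value = arg.partition("=")  # split on the first '=' only
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        params[key] = value
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    if int(params["max_concurrency"]) < 1:
        raise ValueError("max_concurrency must be at least 1")
    return params
```

A caller would typically invoke it as `parse_key_value_args(sys.argv[1:])` and convert `max_concurrency` to `int` before use.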

The script will:

1. Write a log file in `workdir` (e.g. `bigdata_processing_20260312_120000.log`).
2. Write a result CSV in `workdir` (e.g. `uploaded_file_ids_20260312_120000.csv`) with columns: `file_id`, `upload_status`, `file_path`.

Example `uploaded_file_ids_20260312_120000.csv`:

```text theme={null}
file_id,upload_status,file_path
4C303FEB0B384EEB882FAF927D4F1961,UPLOAD_DONE,report.pdf
3BDBA5EBA34A4A65817954E3559476BB,UPLOAD_DONE,data/other_doc.PDF
F6FCC64ABAD64D52AC8A6864AE5F7C40,UPLOAD_ERROR,another.pdf
```
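
A result CSV like the one above can be post-processed to find files that did not finish successfully. The sketch below uses only the three documented columns; the helper name `failed_paths` is hypothetical, and treating every status other than `UPLOAD_DONE` as a failure is an assumption.

```python
import csv
from typing import List


def failed_paths(csv_path: str) -> List[str]:
    """Return the file_path of every row whose upload_status is not UPLOAD_DONE."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return [
            row["file_path"]
            for row in csv.DictReader(fh)
            if row["upload_status"] != "UPLOAD_DONE"
        ]
```

Writing the returned paths to a new list file, one per line, gives a ready-made input for a retry run.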

## Environment variables

| Variable                           | Required | Default                   | Description                                                |
| ---------------------------------- | -------- | ------------------------- | ---------------------------------------------------------- |
| `BIGDATA_API_KEY`                  | Yes      | —                         | Your Bigdata API key.                                      |
| `BIGDATA_API_BASE_URL`             | No       | `https://api.bigdata.com` | API base URL.                                              |
| `BIGDATA_RATE_LIMIT_PER_MINUTE`    | No       | `500`                     | Max requests per minute (should match your WAF limit).     |
| `BIGDATA_RATE_LIMIT_SAFETY_MARGIN` | No       | `20`                      | Margin under the limit (actual cap = limit − margin).      |
| `BIGDATA_POLL_INTERVAL_SEC`        | No       | `10`                      | Seconds between status polls while waiting for completion. |
| `BIGDATA_UPLOAD_MAX_RETRIES`       | No       | `5`                       | Max retries per file on 429/5xx.                           |

Variables are loaded from `.env` in the script folder; you can override them in the shell.
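
As a sketch of how these variables and their defaults fit together, the snippet below reads them from a mapping (the process environment by default) and derives the effective rate cap as limit minus margin. The `Settings` class and `load_settings` function are hypothetical names, and the sketch does not parse `.env` files itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    rate_limit_per_minute: int
    rate_limit_safety_margin: int
    poll_interval_sec: float
    upload_max_retries: int

    @property
    def effective_rate_cap(self) -> int:
        # Actual cap = limit - margin, as described in the table.
        return self.rate_limit_per_minute - self.rate_limit_safety_margin


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("BIGDATA_API_KEY", "")
    if not api_key:
        raise RuntimeError("BIGDATA_API_KEY is required")
    return Settings(
        api_key=api_key,
        base_url=env.get("BIGDATA_API_BASE_URL", "https://api.bigdata.com"),
        rate_limit_per_minute=int(env.get("BIGDATA_RATE_LIMIT_PER_MINUTE", "500")),
        rate_limit_safety_margin=int(env.get("BIGDATA_RATE_LIMIT_SAFETY_MARGIN", "20")),
        poll_interval_sec=float(env.get("BIGDATA_POLL_INTERVAL_SEC", "10")),
        upload_max_retries=int(env.get("BIGDATA_UPLOAD_MAX_RETRIES", "5")),
    )
```

With only `BIGDATA_API_KEY` set, the defaults give an effective cap of 480 requests per minute (500 minus 20).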
