# Sunsetting Legacy Upload your own content

<Tip>
  **SDK upload methods are being sunset** and will stop working in a future release. Use the **Content API** (REST) instead for direct upload, tagging, and managing your files. See the [Upload your own content](/getting-started/upload_your_own_content) guide to get started.
</Tip>

Bigdata lets you query external data, and also upload your own content for search and analysis.

<Check>
  Private & Secure: No LLM training on your data
</Check>

The following method allows you to upload files from disk using the **Python SDK**:

```python theme={null}
from bigdata_client import Bigdata

bigdata = Bigdata()
file = bigdata.uploads.upload_from_disk('path/to/file')
```

The `file` object returned is a
`bigdata_client.models.uploads.File`
object, which contains:

* `id`: The unique identifier of the file.
* `name`: The name of the file. It is set to the name of the
  original file on disk.
* `status`: The status of the file. Check
  `bigdata_client.file_status.FileStatus` for the list of possible statuses.
* `uploaded_at`: The datetime when the file was uploaded, according to
  the server.
* `raw_size`: The size of the file in bytes.

Besides the `path`, the `upload_from_disk()` method also accepts the
following optional parameters:

* `provider_document_id`: Allows you to assign a specific ID to your
  document, which will be available as `provider_document_id` in the
  metadata node of the `annotated.json`. It is useful when you want
  to correlate your own IDs with the ones provided by Bigdata.
* `provider_date_utc`: Allows you to assign a specific timestamp (a
  string with `YYYY-MM-DD hh:mm:ss` format or a datetime) to your
  document. This will modify the document published date, allowing us to
  better assign a reporting date to detected events.
* `primary_entity`: You can specify a "Primary Entity" to boost entity
  recognition in your document. When a primary entity is set for a
  document, it increases the chances of detecting events even when the
  entity is not explicitly mentioned. Setting a primary entity is
  optional, and you can use either a name or the corresponding
  `rp_entity_id`.
* `skip_metadata`: If `True`, the file is uploaded but its metadata is
  not retrieved. Recommended for bulk uploads. It is `False` by default.

```python theme={null}
file = bigdata.uploads.upload_from_disk(
  'path/to/file',
  provider_document_id='my_document_id',
  provider_date_utc='2022-01-01 12:00:00',
  primary_entity='Apple Inc.',
  skip_metadata=True
)
```
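
If your timestamps are timezone-aware `datetime` objects, the string form described above can be produced with a small helper. This is a minimal sketch: `format_provider_date` is a hypothetical name, not part of the SDK, and it assumes the value should be expressed in UTC, as the parameter name suggests.

```python
from datetime import datetime, timedelta, timezone


def format_provider_date(dt: datetime) -> str:
    """Return dt as a 'YYYY-MM-DD hh:mm:ss' string in UTC.

    Hypothetical helper (not part of bigdata_client). Naive datetimes
    are assumed to already be in UTC and are formatted unchanged.
    """
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")


# 14:00 at UTC+2 corresponds to 12:00 UTC
local_time = datetime(2022, 1, 1, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(format_provider_date(local_time))  # 2022-01-01 12:00:00
```

The resulting string can then be passed as `provider_date_utc=format_provider_date(local_time)`.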

Once a file is uploaded, the service will analyze it and index it to the Vector Database, so it becomes available for [tagging](./upload_content#add-tag), [sharing](./upload_content#sharing-private-content), and querying with both the Chat and Search services.

Please use the following method to wait for the file to be fully processed:

```python theme={null}
file.wait_for_completion()
```

If your company has disabled the indexing step because you are only interested in the analytics, you can use the following method to wait for the file to be fully analyzed:

```python theme={null}
file.wait_for_analysis_complete()
```

<Note>
  By default, these methods wait up to `2400` seconds (40 minutes) for the file to be processed. To customize how long you are willing to wait, pass a `timeout` parameter (in seconds) to the method.

  Once the timeout is reached, the method raises a `TimeoutError` exception:

  ```python theme={null}
  file.wait_for_completion(timeout=300)
  ```
</Note>
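
The timeout behavior described in the note can be handled with a small wrapper. This is only a sketch: `wait_or_report` is a hypothetical helper, and it assumes the exception raised is Python's built-in `TimeoutError`.

```python
def wait_or_report(file, timeout: int = 300) -> bool:
    """Wait for processing; return True on success, False on timeout.

    `file` is any object exposing wait_for_completion(timeout=...),
    such as the object returned by upload_from_disk().
    Hypothetical helper, not part of bigdata_client.
    """
    try:
        file.wait_for_completion(timeout=timeout)
    except TimeoutError:
        # Assumption: the SDK raises the built-in TimeoutError
        print(f"File was not processed within {timeout} seconds")
        return False
    return True
```

A caller could retry later or skip the file when the function returns `False`, instead of letting the exception stop a bulk upload.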

<Note>
  The platform must format the original file with extra metadata before processing it, and the maximum file size after that normalization is 10 MB.
</Note>

# Tag uploaded files

You can modify file tags using the `add_tags()`, `remove_tags()`, and
`set_tags()` methods of the `File` class objects. The file object may
come from the `list()`, `get()`, or `upload_from_disk()` methods.

## Add Tag

To add a tag to a file, use the `add_tags()` method. You can add a
single tag or a list of tags.

```python theme={null}
file = bigdata.uploads.get("4DC8AF5500AD4EB0A360D0C7BD6F9286")
print(file.tags)
# []

file.add_tags(["New Tag"])
print(file.tags)
# ["New Tag"]

file.add_tags(["New Tag 2", "New Tag 3"])
print(file.tags)
# ["New Tag", "New Tag 2", "New Tag 3"]
```

## Remove Tag

To remove a tag from a file, use the `remove_tags()` method. You can
remove a single tag or a list of tags.

```python theme={null}
file.remove_tags(["New Tag"])
print(file.tags)
# ["New Tag 2", "New Tag 3"]

# To remove all tags from a file
file.remove_tags(file.tags)
print(file.tags)
# []
```

## Set Tags

To replace all tags with new ones, use the `set_tags()` method. This
operation is permanent and replaces all existing tags.

```python theme={null}
file.set_tags(["Final Tag"])
print(file.tags)
# ["Final Tag"]

file.set_tags(["New Final Tag 1", "New Final Tag 2"])
print(file.tags)
# ["New Final Tag 1", "New Final Tag 2"]
```
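
When you want to reach a target set of tags without discarding the ones already in place, you can compute which tags to add and which to remove. The sketch below works on plain lists and does not call the SDK; `tag_changes` is a hypothetical helper name.

```python
def tag_changes(current, desired):
    """Return (to_add, to_remove) lists that turn `current` into `desired`.

    Order is preserved from the input lists. Hypothetical helper.
    """
    current_set, desired_set = set(current), set(desired)
    to_add = [tag for tag in desired if tag not in current_set]
    to_remove = [tag for tag in current if tag not in desired_set]
    return to_add, to_remove


to_add, to_remove = tag_changes(
    ["New Tag", "New Tag 2"],
    ["New Tag 2", "Final Tag"],
)
print(to_add)     # ['Final Tag']
print(to_remove)  # ['New Tag']
```

The two lists could then be passed to `file.add_tags(to_add)` and `file.remove_tags(to_remove)`. Whether those methods accept an empty list is not stated here, so it may be safer to skip the call when a list is empty.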

## List my tags

You can list all of the tags used across your own files with the
`list_my_tags()` method.

```python theme={null}
bigdata.uploads.list_my_tags()
# ["New Final Tag 1", "New Final Tag 2"]
```

## List tags shared with me

Files shared with you can also have their own tags. To list all of
these tags, use the `list_tags_shared_with_me()` method.

```python theme={null}
bigdata.uploads.list_tags_shared_with_me()
# ["Tag set by another user", "Another tag set by another user"]
```

# Working with your files

To list all the files that have been uploaded to the server, you can use
the `list()` method:

```python theme={null}
files = bigdata.uploads.list()
for file in files:
    print(file)
```

If you have many files, the results are paginated and you must iterate over the pages:

```python theme={null}
import itertools

for n in itertools.count(start=1):
    files = bigdata.uploads.list(page_number=n)
    if not files:
        break  # An empty page means there are no more files
    do_stuff_with_files(files)
```

Printing a file shows its ID, size, upload date, and name:

```console theme={null}
C48410DA1AEE439ABAA0619F272B67F4  123 Jan  1 2021 My First Document.pdf,
BE61DA39E0F540A599E958BBEB9BA3D5   1K Feb 10 2023 Document_2.txt,
687A8B473E654416A0C19CD79EE77413 120K Jul 31 2024 Document-3.docx,
F1345B07DDE145CAB30C08CC01B393D6 1.2M Dec 31 2024 Another file.docx,
3A56AC4B2BCB42FEA7B0AF062FE78534 1.1G Apr 10 2024 The last file.pdf,
```
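
The page-by-page loop shown earlier can be wrapped in a generator so that callers iterate over files rather than pages. This is a minimal sketch: `iter_all_files` is a hypothetical helper, and it relies only on the `page_number` parameter and the empty-page stop condition used above.

```python
import itertools


def iter_all_files(list_page):
    """Yield every item from a paginated listing.

    `list_page` is any callable that accepts a `page_number` keyword and
    returns a list, for example bigdata.uploads.list.
    Iteration stops at the first empty page.
    """
    for n in itertools.count(start=1):
        page = list_page(page_number=n)
        if not page:
            return
        yield from page
```

With the SDK client from the examples above, this could be used as `for file in iter_all_files(bigdata.uploads.list): print(file)`.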

Additionally, you can get a file by its ID:

```python theme={null}
file = bigdata.uploads.get("<document_id>")
print(file)

# C48410DA1AEE439ABAA0619F272B67F4  123 Jan  1 2021 My First Document.pdf
```

Once your files are processed, you can download 3 different versions of
the file:

* The original file, by calling the `download_original()` method of the
  file object.
* The annotated version of the file, by calling the
  `download_annotated()` method of the file object. This is a JSON file
  containing the text together with the detections made by the system.
* The analytics version of the file, by calling the
  `download_analytics()` method of the file object. This is a JSON file
  containing the analytics created by the system.

```python theme={null}
file.download_original('path/to/save/original_file')
file.download_annotated('path/to/save/annotated_file.json')
file.download_analytics('path/to/save/analytics_file.json')
```
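
To fetch all three versions of a file into one folder, the three calls can be grouped in a helper. This is a sketch under assumptions: `download_all_versions` is a hypothetical name, and the output file names are illustrative choices built from the `id` and `name` attributes described earlier.

```python
from pathlib import Path


def download_all_versions(file, out_dir):
    """Download the original, annotated, and analytics versions of `file`.

    Hypothetical helper, not part of bigdata_client. The naming scheme
    for the output files is an illustrative assumption.
    Returns the three paths that were requested.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    original = out / file.name
    annotated = out / f"{file.id}_annotated.json"
    analytics = out / f"{file.id}_analytics.json"

    file.download_original(str(original))
    file.download_annotated(str(annotated))
    file.download_analytics(str(analytics))
    return original, annotated, analytics
```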

Additionally, you can get the annotations or the analytics directly as a
Python dictionary by calling the corresponding `get_<file_type>_dict()` method:

```python theme={null}
annotations = file.get_annotated_dict()
print(annotations)

analytics = file.get_analytics_dict()
print(analytics)
```

# Sharing Private Content

You can share your private content with other members of your
organization. This allows your colleagues to find the documents you
share in their search results. To share a document, use the
`share_with_company` method. For example:

```python theme={null}
file = bigdata.uploads.get("<document_id>")
file.share_with_company()  # Option 1. Operating on the object in memory
```

After sharing, the `company_shared_permission` attribute of the file
object will be set to `SharePermission.READ`. Alternatively, you can
share a file directly by its ID:

```python theme={null}
bigdata.uploads.share_with_company("<document_id>")  # Option 2. Operating on the file ID
```

To **unshare** a file, use the `unshare_with_company` method:

```python theme={null}
file = bigdata.uploads.get("<document_id>")
file.unshare_with_company()  # Option 1. Operating on the object in memory
```

```python theme={null}
bigdata.uploads.unshare_with_company("<document_id>")  # Option 2. Operating on the file ID
```

To list all the files that have been shared with you, use the
`list_shared()` method:

```python theme={null}
files = bigdata.uploads.list_shared()
for file in files:
    print(file)
```

If many files have been shared with you, the results are paginated and you must iterate over the pages:

```python theme={null}
import itertools

for n in itertools.count(start=1):
    files = bigdata.uploads.list_shared(page_number=n)
    if not files:
        break  # An empty page means there are no more files
    print(files)
```

The same operations to download the original version of the file, the
annotated structure, and its analytics are also available for the shared
files.

# Deleting uploaded files

To delete a file, use the `delete()` method of the file object, which
may come from the `list()`, `get()`, or `upload_from_disk()` methods:

```python theme={null}
import itertools

# Collect all of your files, page by page
files = []
for n in itertools.count(start=1):
    files_in_page = bigdata.uploads.list(page_number=n)
    if not files_in_page:
        break
    files.extend(files_in_page)

for i, file in enumerate(files):
    print(f"{i} {file}")

print(f"Enter the file row number to delete: [0 - {len(files)-1}]")
row = int(input())
if 0 <= row < len(files):  # Row 0 is a valid choice
    files[row].delete()

# The file is now deleted and bigdata.uploads.get() will raise an exception since the file does not exist anymore
```

<Warning>
  Note that deleting a file is a permanent operation and cannot be undone.
</Warning>

Another way to delete a file, if you know its ID, is to use the
`delete()` method of the `Uploads` object. This avoids the need to
get the file object first:

```python theme={null}
bigdata.uploads.delete("<document_id>")
```

<Warning>
  Only files that are in the `COMPLETED` or `FAILED` status can be deleted.
  Attempting to delete a file that is still being processed will raise an
  exception. To avoid this, you can use the `wait_for_completion()`
  method:

  ```python theme={null}
  file = bigdata.uploads.upload_from_disk('path/to/file')

  # Wait for the file to be processed
  file.wait_for_completion()
  file.delete()
  ```
</Warning>
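
When you cannot block on `wait_for_completion()`, for example while cleaning up a list of existing files, you can check the status first and skip files that are still being processed. This is a sketch under assumptions: `delete_if_ready` is a hypothetical helper, and it assumes `file.status` is either a string or an enum member whose `name` matches the `COMPLETED` and `FAILED` labels mentioned above.

```python
DELETABLE_STATUSES = {"COMPLETED", "FAILED"}


def delete_if_ready(file) -> bool:
    """Delete `file` only if its status allows it; return True if deleted.

    Assumption: `file.status` is an enum member whose `.name` is
    'COMPLETED' or 'FAILED' (or a plain string with that value).
    Adjust the comparison if FileStatus is defined differently.
    """
    status = getattr(file.status, "name", str(file.status))
    if status not in DELETABLE_STATUSES:
        return False
    file.delete()
    return True
```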
