The file object returned is a bigdata_client.models.uploads.File object, which contains:
- id: The unique identifier of the file.
- name: The name of the file. It is set to the name of the original file on disk.
- status: The status of the file. Check bigdata_client.file_status.FileStatus for the list of possible statuses.
- uploaded_at: The datetime when the file was uploaded, according to the server.
- raw_size: The size of the file in bytes.
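As a minimal sketch of how these attributes might be read, the helper below formats them into a single line. The function name summarize_file is hypothetical; only the attribute names come from the list above.

```python
def summarize_file(file):
    """Build a one-line summary from the documented File attributes."""
    return (
        f"{file.name} (id={file.id}, status={file.status}, "
        f"uploaded_at={file.uploaded_at}, raw_size={file.raw_size} bytes)"
    )
```

Any object exposing those five attributes can be passed in, so the helper works the same for files obtained from list(), get(), or upload_from_disk().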
In addition to path, the upload_from_disk() method also accepts the following optional parameters:
- provider_document_id: Allows you to assign a specific ID to your document, which will be available as provider_document_id in the metadata node of the annotated.json. It is useful when you want to correlate your own IDs with the ones provided by Bigdata.
- provider_date_utc: Allows you to assign a specific timestamp (a string in YYYY-MM-DD hh:mm:ss format or a datetime) to your document. This modifies the document's published date, allowing us to better assign a reporting date to detected events.
- primary_entity: You can specify a “Primary Entity” to boost entity recognition in your document. When a primary entity is set for a document, it increases the chances of detecting events even when the entity is not explicitly mentioned. Setting a primary entity is optional, and you can use either a name or the corresponding rp_entity_id.
- skip_metadata: If True, the file is uploaded but its metadata is not retrieved. Recommended for bulk uploads. It is False by default.
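The optional parameters above could be passed as keyword arguments, roughly as in the sketch below. The helper name upload_with_metadata is hypothetical, and it assumes that uploads is the object exposing upload_from_disk() and that path is accepted as the first positional argument. Only options the caller actually sets are forwarded, so the library defaults stay in effect otherwise.

```python
def upload_with_metadata(uploads, path, provider_document_id=None,
                         provider_date_utc=None, primary_entity=None,
                         skip_metadata=False):
    """Call upload_from_disk() with only the optional parameters that were set."""
    options = {
        "provider_document_id": provider_document_id,  # your own document ID
        "provider_date_utc": provider_date_utc,        # datetime or "YYYY-MM-DD hh:mm:ss"
        "primary_entity": primary_entity,              # entity name or rp_entity_id
    }
    # Drop unset options instead of sending None (assumption: omitting is safest).
    kwargs = {key: value for key, value in options.items() if value is not None}
    if skip_metadata:
        kwargs["skip_metadata"] = True
    return uploads.upload_from_disk(path, **kwargs)
```

For a bulk upload, the same helper could be called in a loop with skip_metadata=True, as recommended above.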
By default, these methods will wait 2400 seconds (40 minutes) until the file is processed. To customize how long you are willing to wait, pass a timeout parameter (in seconds) to the method. Once the timeout is reached, the method raises a TimeoutError exception.

The platform must format the original file with extra metadata before processing it, and the maximum file size after that normalization is 10 MB.
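The timeout behaviour described above might be handled as in this sketch. It assumes that wait_for_completion() is one of the methods that accepts the timeout parameter and that the exception raised is Python's built-in TimeoutError; the helper name wait_or_report is hypothetical.

```python
def wait_or_report(file, timeout_seconds=600):
    """Return True if processing finished in time, False if the wait timed out."""
    try:
        # timeout is given in seconds; the documented default is 2400.
        file.wait_for_completion(timeout=timeout_seconds)
    except TimeoutError:
        return False
    return True
```

A False result only means the wait ended early; the file may still finish processing later.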
Tag uploaded files
You can modify file tags using the add_tags(), remove_tags(), and set_tags() methods of File objects. The file object may come from the list(), get(), or upload_from_disk() methods.
Add Tag
To add a tag to a file, use the add_tags() method. You can add a single tag or a list of tags.
Remove Tag
To remove a tag from a file, use the remove_tags() method. You can remove a single tag or a list of tags.
Set Tags
To replace all tags with new ones, use the set_tags() method. This operation is permanent and replaces all existing tags.
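The three tagging methods could be combined as in the sketch below. The helper names are hypothetical, and the code assumes each method takes the tag or list of tags as its single argument.

```python
def update_tags(file, add=None, remove=None):
    """Add and remove tags; each argument may be a single tag or a list of tags."""
    if add:
        file.add_tags(add)
    if remove:
        file.remove_tags(remove)


def replace_tags(file, tags):
    """Overwrite every existing tag on the file. This cannot be undone."""
    file.set_tags(tags)
```

Keeping set_tags() in its own helper makes the destructive operation explicit at the call site.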
List my tags
You can find all the tags used across your own files and list them with the list_my_tags() method.
List tags shared with me
Files shared with you can also have their own tags. To find and list all of these tags, use the list_tags_shared_with_me() method.
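As a rough sketch, both tag listings could be merged into one view. This assumes that both methods live on the same uploads object and return an iterable of tag strings; neither detail is stated here, and the helper name all_visible_tags is hypothetical.

```python
def all_visible_tags(uploads):
    """Merge your own tags and tags on files shared with you into a sorted list."""
    own = uploads.list_my_tags()                 # assumed: iterable of strings
    shared = uploads.list_tags_shared_with_me()  # assumed: iterable of strings
    return sorted(set(own) | set(shared))
```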
Working with your files
To list all the files that have been uploaded to the server, use the list() method.
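One possible use of list() is to group file names by status, as sketched below. It assumes that list() returns an iterable of File objects with the name and status attributes described earlier; the helper name group_by_status is hypothetical.

```python
from collections import defaultdict


def group_by_status(uploads):
    """Map each status (as a string) to the names of the files in that status."""
    groups = defaultdict(list)
    for file in uploads.list():
        groups[str(file.status)].append(file.name)
    return dict(groups)
```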
A file can be downloaded in three versions:
- The original file, by calling the download_original() method of the file object.
- The annotated version of the file, by calling the download_annotated() method of the file object. This is a JSON file containing the text together with the detections made by the system.
- The analytics version of the file, by calling the download_analytics() method of the file object. This is a JSON file containing the analytics created by the system.
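The three download methods could be called together as in this sketch. It assumes that each method accepts a destination path as its only argument, which is not stated above; the helper name and the output file names are hypothetical.

```python
from pathlib import Path


def download_all_versions(file, target_dir):
    """Save the original, annotated, and analytics versions under target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = {
        "original": target / file.name,
        "annotated": target / f"{file.id}_annotated.json",
        "analytics": target / f"{file.id}_analytics.json",
    }
    # Assumption: each download method takes the destination path as a string.
    file.download_original(str(paths["original"]))
    file.download_annotated(str(paths["annotated"]))
    file.download_analytics(str(paths["analytics"]))
    return paths
```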
The content can also be obtained as a dictionary with the corresponding get_<file_type>_dict() method.
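The get_<file_type>_dict() naming pattern might be used generically as sketched below. It assumes that annotated and analytics are valid substitutions for <file_type> and that the method takes no arguments; the helper name load_version is hypothetical.

```python
def load_version(file, file_type):
    """Call file.get_<file_type>_dict() and return its result.

    file_type is assumed to be "annotated" or "analytics".
    """
    getter = getattr(file, f"get_{file_type}_dict")
    return getter()
```

An unsupported file_type raises AttributeError, which surfaces a wrong substitution immediately.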
Sharing Private Content
You can share your private content with other members of your organization. This allows your colleagues to find the documents you share in their search results. To share a document, use the share_with_company method. Once a document is shared, the company_shared_permission attribute of the search object is set to SharePermission.READ.
To stop sharing a document, use the unshare_with_company method.
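Sharing and unsharing could be wrapped in a single toggle, as in this sketch. It assumes that both methods are called on the file object without arguments; the helper name set_company_sharing is hypothetical.

```python
def set_company_sharing(file, shared):
    """Share the file with the organization when shared is True, otherwise unshare it."""
    if shared:
        file.share_with_company()
    else:
        file.unshare_with_company()
```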
To list shared files, use the list_shared() method.
Deleting uploaded files
To delete a file, use the delete() method of the file object, which may come from the list(), get(), or upload_from_disk() methods.
Note that deleting a file is a permanent operation and cannot be undone.
You can also use the delete() method of the Uploads object. This avoids the need to get the file object first.
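Deleting through the Uploads object might look like the sketch below. It assumes that delete() takes the file's id as its argument, which seems implied by not needing the file object but is not stated; the helper name delete_by_ids is hypothetical.

```python
def delete_by_ids(uploads, file_ids):
    """Delete each file by id and return how many delete calls were made."""
    count = 0
    for file_id in file_ids:
        uploads.delete(file_id)  # permanent: cannot be undone
        count += 1
    return count
```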
Only files in the COMPLETED or FAILED status can be deleted. Attempting to delete a file that is still being processed will raise an exception. To avoid this, use the wait_for_completion() method.
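Waiting before deleting could be sketched as follows. It assumes that wait_for_completion() optionally accepts the timeout parameter described earlier and returns once processing has ended; the helper name delete_when_ready is hypothetical.

```python
def delete_when_ready(file, timeout_seconds=None):
    """Wait until processing has ended, then delete the file."""
    if timeout_seconds is None:
        file.wait_for_completion()
    else:
        file.wait_for_completion(timeout=timeout_seconds)
    # Reached only if the wait returned without raising.
    file.delete()
```

If the wait raises (for example on timeout), the exception propagates and the file is left untouched.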