Bigdata not only allows you to query and analyze pre-existing data, but also to upload your own content to be analyzed and searched. The only method currently supported is to upload a file from disk:
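A minimal sketch of such an upload, assuming the client is authenticated with username and password (the credentials and path below are placeholders; check the SDK reference for your authentication setup):

```python
from bigdata_client import Bigdata

# Authenticate against the Bigdata API (placeholder credentials)
bigdata = Bigdata("your_username", "your_password")

# Upload a local file; returns a File object describing the upload
file = bigdata.uploads.upload_from_disk("reports/q3_earnings.pdf")
print(file.id, file.status)
```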
The `file` object returned is a `bigdata_client.models.uploads.File` object, which contains:

- `id`: The unique identifier of the file.
- `name`: The name of the file. It is set to the name of the original file on disk.
- `status`: The status of the file. Check `bigdata_client.file_status.FileStatus` for the list of possible statuses.
- `uploaded_at`: The datetime when the file was uploaded, according to the server.
- `raw_size`: The size of the file in bytes.

Besides the `path`, the `upload_from_disk()` method also accepts the following optional parameters:
- `provider_document_id`: Assigns a specific ID to your document, which will be available as `provider_document_id` in the metadata node of the `annotated.json`. Useful if you want to correlate your own IDs with the ones provided by Bigdata.
- `provider_date_utc`: Assigns a specific timestamp (a string in `YYYY-MM-DD hh:mm:ss` format, or a datetime) to your document. This modifies the document's published date, allowing the system to better assign a reporting date to detected events.
- `primary_entity`: Specifies a "Primary Entity" to boost entity recognition in your document. When a primary entity is set for a document, it increases the chances of detecting events even when the entity is not explicitly mentioned. Setting a primary entity is optional, and you can use either a name or the corresponding `rp_entity_id`.
- `skip_metadata`: If True, uploads the file but does not retrieve its metadata. Recommended for bulk uploads. Defaults to False.

Once a file is uploaded, the service analyzes it and indexes it into the Vector Database, so it becomes available for tagging, sharing, and querying with both the Chat and Search services.
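An upload call using the optional parameters above might look like this (a sketch; the credentials, path, document ID, and entity name are placeholders):

```python
from datetime import datetime

from bigdata_client import Bigdata

bigdata = Bigdata("your_username", "your_password")
file = bigdata.uploads.upload_from_disk(
    "reports/acme_q3.pdf",
    provider_document_id="ACME-2024-Q3",  # your own ID, echoed in annotated.json
    provider_date_utc=datetime(2024, 10, 15, 9, 30),  # or "2024-10-15 09:30:00"
    primary_entity="Acme Corp",  # a name, or the corresponding rp_entity_id
    skip_metadata=True,  # recommended for bulk uploads
)
```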
Please use the following method to wait for the file to be fully processed:
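For example, assuming `file` is the object returned by `upload_from_disk()`:

```python
# Blocks until the file reaches a terminal status
file.wait_for_completion()
print(file.status)
```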
If your company disabled the indexing step, because you are only interested in the analytics, you can use the following method to wait for the file to be fully analyzed:
By default, these methods wait up to 2400 seconds (40 minutes) for the file to be processed. If you want to customize how long you are willing to wait, pass a `timeout` parameter (in seconds) to the method. Once the timeout is reached, the method raises a `TimeoutError` exception:
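A sketch of customizing the timeout and handling its expiry, assuming `file` is an uploaded `File` object:

```python
try:
    file.wait_for_completion(timeout=600)  # wait at most 10 minutes
except TimeoutError:
    print(f"File {file.id} was not processed within 10 minutes")
```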
The platform must format the original file with extra metadata before processing it, and the maximum file size after that normalization is 10 MB.
You can modify file tags using the `add_tags()`, `remove_tags()`, and `set_tags()` methods of the `File` class. The file object may come from the `list()`, `get()`, or `upload_from_disk()` methods.
To add a tag to a file, use the add_tags()
method. You can add a
single tag or a list of tags.
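For example, assuming `file` is a `File` object:

```python
file.add_tags("earnings")            # a single tag
file.add_tags(["earnings", "2024"])  # or a list of tags
```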
To remove a tag from a file, use the remove_tags()
method. You can
remove a single tag or a list of tags.
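For example:

```python
file.remove_tags("earnings")             # a single tag
file.remove_tags(["earnings", "2024"])   # or a list of tags
```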
To replace all tags with new ones, use the set_tags()
method. This
operation is permanent and replaces all existing tags.
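For example:

```python
# Overwrites all existing tags; the previous ones are lost
file.set_tags(["quarterly-report", "reviewed"])
```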
You can find all of the tags used across your own files and list them with the `list_my_tags()` method.
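A sketch, assuming an authenticated client `bigdata` and that the method hangs off the uploads service (an assumption; check the SDK reference):

```python
my_tags = bigdata.uploads.list_my_tags()
print(my_tags)
```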
Files shared with you can also have their own tags. To find and list all of these tags, use the `list_tags_shared_with_me()` method.
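A sketch under the same assumptions as above (authenticated client, method on the uploads service):

```python
shared_tags = bigdata.uploads.list_tags_shared_with_me()
print(shared_tags)
```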
To list all the files that have been uploaded to the server, you can use
the list()
method:
If you have many files, you can iterate over the results; the output contains the ID, file size, upload date, and name of each file:
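A sketch, assuming an authenticated client `bigdata`:

```python
for file in bigdata.uploads.list():
    print(file.id, file.raw_size, file.uploaded_at, file.name)
```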
Additionally, you can get a file by its ID:
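For example (the ID is a placeholder):

```python
file = bigdata.uploads.get("FILE_ID")
```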
Once your files are processed, you can download three different versions of the file:

- The original file, with the `download_original()` method of the file object.
- The annotated version, with the `download_annotated()` method of the file object. This is a JSON file containing the text together with the detections made by the system.
- The analytics, with the `download_analytics()` method of the file object. This is a JSON file containing the analytics created by the system.

Additionally, you can get the annotations directly as a Python dictionary by calling the `get_<file_type>_dict()` method:
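A sketch, assuming `file` is a processed `File` object; the output paths are placeholders, and `get_annotated_dict()` is an assumed instance of the `get_<file_type>_dict()` pattern (check the SDK reference for the exact names):

```python
file.download_original("acme_q3.pdf")
file.download_annotated("acme_q3.annotated.json")
file.download_analytics("acme_q3.analytics.json")

# Annotations as a Python dictionary, without writing to disk
annotated = file.get_annotated_dict()
```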
You can share your private content with other members of your
organization. This allows your colleagues to find the documents you
share in their search results. To share a document, use the
share_with_company
method. For example:
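Assuming `file` is one of your own uploaded files:

```python
file.share_with_company()
```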
After sharing, the `company_shared_permission` attribute of the file object will be set to `SharePermission.READ`.
To unshare a file, use the unshare_with_company
method:
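For example:

```python
file.unshare_with_company()
```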
To list all the files that have been shared with you, use the `list_shared()` method:
If you have many files, you can iterate over the results:
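A sketch, assuming an authenticated client `bigdata`:

```python
for file in bigdata.uploads.list_shared():
    print(file.id, file.name)
```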
The same operations to download the original version of the file, the annotated structure, and its analytics are also available for the shared files.
To delete a file, you can use the `delete()` method of the file object, where the object may come from the `list()`, `get()`, or `upload_from_disk()` methods:
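For example (the ID is a placeholder):

```python
file = bigdata.uploads.get("FILE_ID")
file.delete()
```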
Note that deleting a file is a permanent operation and cannot be undone.
Another way to delete a file, if you know its ID, is to use the `delete()` method of the `Uploads` object. This avoids the need to get the file object first:
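For example (the ID is a placeholder):

```python
bigdata.uploads.delete("FILE_ID")
```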
Only files that are in the `COMPLETED` or `FAILED` status can be deleted.
Attempting to delete a file that is still being processed will raise an
exception. To avoid this, you can use the wait_for_completion()
method:
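A sketch, assuming `file` is a recently uploaded `File` object:

```python
file.wait_for_completion()  # ensures a COMPLETED or FAILED status
file.delete()
```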