It will only take you 5 minutes and will guide you through:
✅ Install bigdata-client package
✅ Authenticate to bigdata.com
✅ Create two sample files to upload
✅ Upload private files
✅ Query bigdata.com
Ready to get started? Let’s dive in!
Install bigdata-client package
Follow the Prerequisites instructions to set up the required environment.
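If the environment is not set up yet, the client is typically installed from PyPI; check the Prerequisites page for the exact command and supported Python versions:
pip install bigdata-client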
Authenticate to bigdata.com
Because you have already set your credentials in the environment following the Prerequisites step, the Bigdata constructor will read them.
from bigdata_client import Bigdata
bigdata = Bigdata()
Create two sample files to upload
Create the following two sample files in your local directory.
File name data_science_research-2020-06.txt:
RavenPack Data Science researchers recommend the following stocks in June 2020
Microsoft (NASDAQ: MSFT): Microsoft has been heavily investing in AI and cloud computing, which are key growth areas for the company.
Datadog (NASDAQ: DDOG): Datadog is a leading provider of monitoring and analytics solutions for cloud-based applications.
Oracle (NYSE: ORCL): Oracle is a major player in the enterprise software and cloud computing market.
File name soup_recipes-2020-06.txt:
We recommend making chicken noodle soup with homemade chicken stock
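If you prefer to create these files from Python instead of by hand, a short sketch using only the file names and contents above could be:
from pathlib import Path

# Write the two sample files used in this quickstart to the current directory
Path("data_science_research-2020-06.txt").write_text(
    "RavenPack Data Science researchers recommend the following stocks in June 2020\n"
    "Microsoft (NASDAQ: MSFT): Microsoft has been heavily investing in AI and cloud computing, "
    "which are key growth areas for the company.\n"
    "Datadog (NASDAQ: DDOG): Datadog is a leading provider of monitoring and analytics solutions "
    "for cloud-based applications.\n"
    "Oracle (NYSE: ORCL): Oracle is a major player in the enterprise software and cloud computing market.\n"
)
Path("soup_recipes-2020-06.txt").write_text(
    "We recommend making chicken noodle soup with homemade chicken stock\n"
)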
Upload private files
We will upload two files and use the parameter provider_date_utc to
inform bigdata.com about their creation date. This will modify the document's
published date, allowing us to better assign a reporting date to
detected events.
file = bigdata.uploads.upload_from_disk("./data_science_research-2020-06.txt",
                                        provider_document_id='my_document_id',
                                        provider_date_utc='2020-06-10 12:00:00',
                                        primary_entity='RavenPack',
                                        skip_metadata=True)
# Check the file's processing status
file.reload_status()
print(f"File processing status: {file.status}")
# Wait for completion
file.wait_for_completion(timeout=60)
print(f"File processing status: {file.status}")
Output:
File processing status: PENDING
File processing status: COMPLETED
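wait_for_completion blocks until processing finishes or the timeout expires. If you prefer not to block, a minimal polling sketch using only the reload_status call and status attribute shown above could look like this (the comparison against the string "COMPLETED" is an assumption based on the printed output):
import time

# Poll the processing status every few seconds instead of blocking (sketch only)
for _ in range(12):
    file.reload_status()
    if str(file.status).endswith("COMPLETED"):
        break
    time.sleep(5)
print(f"File processing status: {file.status}")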
The first file was successfully analysed and indexed into the
bigdata.com vector database.
As you might have many types of private documents, you can also assign
tags to each type and use them during the search:
file.add_tags(["Data Science Research"])
print(f"File tags: {file.tags}")
Output:
File tags: ['Data Science Research']
Let’s upload the second file:
file = bigdata.uploads.upload_from_disk("./soup_recipes-2020-06.txt",
                                        provider_document_id='my_document_id',
                                        provider_date_utc='2020-06-10 12:00:00',
                                        primary_entity='RavenPack',
                                        skip_metadata=True)
# Check the file's processing status
file.reload_status()
print(f"File processing status: {file.status}")
# Wait for completion
file.wait_for_completion(timeout=60)
print(f"File processing status: {file.status}")
Output:
File processing status: PENDING
File processing status: COMPLETED
and tag it as Cooking recipes:
file.add_tags(["Cooking recipes"])
print(f"File tags: {file.tags}")
Output:
File tags: ['Cooking recipes']
Query bigdata.com
Let’s do a Similarity search with the text recommend stock in the
month of June 2020:
from bigdata_client.query import Similarity
from bigdata_client.daterange import AbsoluteDateRange
from bigdata_client.models.search import DocumentType
# Similarity search
query = Similarity("recommend stock")
# Date range within June 2020
in_june_2020 = AbsoluteDateRange("2020-06-01T08:00:00", "2020-06-30T00:00:00")
# Create a bigdata search
search = bigdata.search.new(query, date_range=in_june_2020, scope=DocumentType.ALL)
# Retrieve content of four documents
documents = search.run(4)
for doc in documents:
    print(f"\nDocument headline: {doc.headline}")
Output:
Document headline: Nifty outlook and stock recommendations by CapitalVia: Buy RBL Bank, ONGC
Document headline: Forget the Naysayers: 3 Top Retail Stocks You Should Own
Document headline: Here's How to Invest Like Warren Buffett
Document headline: 2 Tech Stocks to Buy Right Now
The private files were indexed, but there are many other documents. Let’s
narrow the date range to only 2 seconds around the publication timestamp
of our private files:
# Narrow down the date range to 2 seconds
two_secs_in_june_2020 = AbsoluteDateRange("2020-06-10T11:59:59", "2020-06-10T12:00:01")
# Create a bigdata search
search = bigdata.search.new(query, date_range=two_secs_in_june_2020, scope=DocumentType.ALL)
# Retrieve content of four documents
documents = search.run(4)
for doc in documents:
    print(f"\nDocument headline: {doc.headline}")
Output:
Document headline: soup_recipes-2020-06.txt
Document headline: Deutsche Post AG: Investor Meeting
Document headline: Ford Motor Co.: Deutsche Bank Global Auto Industry Conference
Document headline: data_science_research-2020-06.txt
🎉We see them both!
If we only want to get insights from our private files, we can set the
scope to DocumentType.FILES.
# Create a bigdata search with scope "FILES"
search = bigdata.search.new(query, date_range=in_june_2020, scope=DocumentType.FILES)
# Retrieve content of four documents
documents = search.run(4)
# Read all retrieved documents and print some details
for doc in documents:
    print(f"\nDocument headline: {doc.headline}")
    for chunk in doc.chunks:
        print(f" Chunk text: {chunk.text}")
Output:
Document headline: soup_recipes-2020-06.txt
Chunk text: We recommend making chicken noodle soup with homemade chicken stock
Document headline: data_science_research-2020-06.txt
Chunk text: (Sample file for testing purpose) RavenPack Data Science researches recommend the following stocks in June 2020 Microsoft (NASDAQ: MSFT): Microsoft has been heavily investing in AI and cloud computing, which are key growth areas for the company.
We can even use tags to focus on a specific type of private file, for
instance Data Science Research:
from bigdata_client.query import Similarity, FileTag
# Similarity search
query = Similarity("recommend stock") & FileTag("Data Science Research")
# Create a bigdata search
search = bigdata.search.new(query, date_range=in_june_2020, scope=DocumentType.FILES)
# Retrieve content of four documents
documents = search.run(4)
# Read all retrieved documents and print some details
for doc in documents:
    print(f"\nDocument headline: {doc.headline}")
    for chunk in doc.chunks:
        print(f" Chunk text: {chunk.text}")
Output:
Document headline: data_science_research-2020-06.txt
Chunk text: (Sample file for testing purpose) RavenPack Data Science researches recommend the following stocks in June 2020 Microsoft (NASDAQ: MSFT): Microsoft has been heavily investing in AI and cloud computing, which are key growth areas for the company.
Summary
Congratulations! 🎉 You have successfully uploaded private files and
retrieved insights about them amongst millions of other documents.
The following pages are related to private file uploading, managing tags,
and searching using the tag query filter:
- Upload your own content: It describes all supported parameters and methods to manage private files.
- Batch file upload: It contains a script to help your organization quickly upload all private files.
- FileTag: It describes the FileTag query filter.
- Query operators: It describes the supported query operators: &, |, ~, All and Any.
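To close, here is a small sketch combining the operators above. It assumes Similarity and FileTag queries compose with & and | as described in the Query operators page, and it reuses bigdata, in_june_2020 and DocumentType from the earlier steps:
from bigdata_client.query import Similarity, FileTag

# Match private files carrying either tag AND similar to the text (sketch only)
query = Similarity("recommend stock") & (FileTag("Data Science Research") | FileTag("Cooking recipes"))
search = bigdata.search.new(query, date_range=in_june_2020, scope=DocumentType.FILES)
for doc in search.run(4):
    print(f"Document headline: {doc.headline}")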