Looping through search.run(n) will yield objects of type Document.

Document main attributes

The Document object has the following attributes:
  • id: The unique identifier of the document, a 32-character hexadecimal string.
  • headline: The title of the document.
  • url: The URL of the original document.
  • sentiment (Deprecated): The sentiment of the first chunks retrieved from a given document.
  • source: The source of the document, an instance of DocumentSource, which contains the key, name, and rank of the source.
  • timestamp: The timestamp of the document, a datetime object.
  • chunks: The content of the document, as a list of DocumentChunk.
  • language: The language of the document, an instance of Language.
Each DocumentChunk object in chunks has the following attributes:
  • text: The text of the chunk.
  • chunk: The index of the chunk in the document, zero-based. You can use it to sort them in order.
  • entities: A list of entities detected in the chunk. Each entity is an instance of DocumentSentenceEntity and contains the entity key and the position in the text where it was detected. Note that these entity objects can’t be used to create queries for searches.
  • sentences: A list of sentences detected in the chunk. Each sentence is an instance of DocumentSentence and contains only its paragraph and sentence numbers.
  • relevance: A float between 0 and 1 that indicates how closely the chunk matches the terms in your query. Scores are only meaningful within a single search: each execution generates its own scores, so they should not be compared across different queries.
  • sentiment: A score between -1.0 and +1.0 that represents the chunk-level sentiment.
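The chunk attributes above can be combined, for example to restore reading order or to pick the best-matching chunk of a document. The following is a minimal sketch rather than part of the SDK: the helper names ordered_chunks and best_chunk are hypothetical, and the SimpleNamespace objects only stand in for real Document and DocumentChunk instances so the snippet runs without an API connection.

```python
from types import SimpleNamespace


def ordered_chunks(document):
    """Return the document's chunks sorted by their zero-based index."""
    return sorted(document.chunks, key=lambda c: c.chunk)


def best_chunk(document):
    """Return the chunk with the highest relevance (same search only)."""
    return max(document.chunks, key=lambda c: c.relevance)


# Stand-in for a Document returned by a search (illustrative values only).
document = SimpleNamespace(
    headline="Example headline",
    chunks=[
        SimpleNamespace(text="Later part.", chunk=4, relevance=0.41, sentiment=0.1),
        SimpleNamespace(text="Earlier part.", chunk=1, relevance=0.55, sentiment=0.5),
    ],
)

for chunk in ordered_chunks(document):
    print(chunk.chunk, chunk.relevance, chunk.text)

print("Most relevant chunk index:", best_chunk(document).chunk)
```

With real search results, the same helpers would be called on each Document yielded by the search. Because relevance is only comparable within one search, best_chunk should not be used to rank chunks coming from different queries.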

Document dictionary

We can print the document in dictionary format with the following code:
document = documents[0]
print(document.__dict__)
Output:
{
    'id': '0B4623C81D073FAE45DE6D6F23874878', 
    'headline': 'The new industry partnerships fuelling innovation',
    'sentiment': 0.54,  # (Deprecated)
    'document_scope': <DocumentScope.NEWS: 'news'>, 
    'source': DocumentSource(
      key='D6F0EE', 
      name='Drug Discovery World', 
      rank=3), 
    'timestamp': datetime.datetime(
        2025, 4, 8, 10, 10, 14,
        tzinfo=datetime.timezone.utc), 
    'chunks': [
        DocumentChunk(
            text="This agreement represents the sixth collaboration signed under "
            "the broader strategic partnership between Flagship Pioneering and  "
            "Pfizer to create a new pipeline of innovative assets, following  "
            "programmes announced with Flagship companies ProFound Therapeutics, "
            "Quotient Therapeutics, Montai Therapeutics, and Ampersand ."
            " Biomedicines \nIntegrated DNA Technologies and Elegen   \nDNA "
            "Technologies (IDT) and Elegen are teaming up to offer IDT customers"
            " early access to Elegen's ENFINIA Plasmid DNA, a long and "
            "high-complexity clonal gene synthesis service.", 
            chunk=13,  # The index of the chunk in the document, zero-based.
            entities=[
                DocumentSentenceEntity(
                  key='E3E549', 
                  start=36, end=49, 
                  query_type=<QueryType.ENTITY: 'entity'>), 
                DocumentSentenceEntity(
                  key='913660', 
                  start=439, end=448, 
                  query_type=<QueryType.ENTITY: 'entity'>), 
                ...
                DocumentSentenceEntity(
                  key='71QN2J', 
                  start=309, end=331, 
                  query_type=<QueryType.ENTITY: 'entity'>), 
                DocumentSentenceEntity(
                  key='business,partnerships,partnership,,', 
                  start=75, end=135, 
                  query_type=<QueryType.TOPIC: 'rp_topic'>), 
                ...
            ],
            sentences=[
                DocumentSentence(paragraph=15, sentence=1),
                DocumentSentence(paragraph=15, sentence=2)
            ],
            relevance=0.5533394852057426,
            sentiment=0.54,
            section_metadata=None,
            speaker=None)
    ],
    'language': 'English', 
    'cluster': None,
    'reporting_period': None, 
    'document_type': None, 
    'reporting_entities': None, 
    'url': 'https://www.ddw-online.com/the-new-industry-partnerships-fuelling-innovation-34442-202504/'
}
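The start and end values of each DocumentSentenceEntity in the output above appear to be character offsets into the chunk text, with an exclusive end. Under that assumption, the matched text can be recovered by slicing. This is a sketch only: the helper entity_spans is hypothetical, and the chunk below is a stand-in built from the opening of the sample text and its first entity.

```python
from types import SimpleNamespace


def entity_spans(chunk):
    """Return (key, matched_text) pairs for every entity in a chunk.

    Assumes start/end are character offsets into chunk.text, end-exclusive.
    """
    return [(e.key, chunk.text[e.start:e.end]) for e in chunk.entities]


# Stand-in chunk: text and offsets copied from the start of the sample output.
chunk = SimpleNamespace(
    text="This agreement represents the sixth collaboration signed under the broader",
    entities=[SimpleNamespace(key="E3E549", start=36, end=49)],
)

for key, span in entity_spans(chunk):
    print(key, repr(span))
```

If the offsets turn out to follow a different convention in your data, the slices will look misaligned, which makes this a quick sanity check before relying on them.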

Download the entire document

If you are interested in the entire document, you can use the following method:
  • download_annotated_dict()
Example:
import json

doc = documents[0]
# Download document with the document ID in the file name
with open(f'annotated_doc_{doc.id}.json', 'w', encoding='utf-8') as outfile:
    json.dump(doc.download_annotated_dict(), outfile, ensure_ascii=False, indent=2)
It will create the following file in the current directory:
  • annotated_doc_0B4623C81D073FAE45DE6D6F23874878.json
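To save several documents, the same pattern can be wrapped in a small helper. The sketch below is an illustration, not an SDK feature: save_annotated is a hypothetical name, and fake_doc is a stand-in whose payload does not reflect the real structure returned by download_annotated_dict(). It only lets the snippet run offline.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_annotated(doc, directory="."):
    """Write doc.download_annotated_dict() to annotated_doc_<id>.json.

    Returns the path of the file that was written.
    """
    path = Path(directory) / f"annotated_doc_{doc.id}.json"
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(doc.download_annotated_dict(), outfile, ensure_ascii=False, indent=2)
    return path


# Stand-in for a real Document; the payload is a placeholder, not the real schema.
fake_doc = SimpleNamespace(
    id="0B4623C81D073FAE45DE6D6F23874878",
    download_annotated_dict=lambda: {"placeholder": "annotated content"},
)

with tempfile.TemporaryDirectory() as tmp:
    saved = save_annotated(fake_doc, tmp)
    print(saved.name)
    # Read the file back to confirm it round-trips as JSON.
    restored = json.loads(saved.read_text(encoding="utf-8"))
```

With real results, calling save_annotated(doc, "some_directory") for each doc in documents would write one file per document. The target directory must already exist.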