# Similarity Search Demystified

*Finding* **Relevant** *Needles in the Data Haystack!*

# Introduction

The **Bigdata.com API** provides powerful retrieval capabilities,
enabling you to search and analyze news articles, transcripts, corporate
filings, and other documents. Notably, it supports both **keyword-based
searches** and **similarity searches**, along with a range of other
[advanced search
features](../../getting-started/search/overview).

In this notebook, we'll demonstrate how to use the Bigdata.com API to
perform a **similarity search** effectively.
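To make the distinction concrete, here is a toy sketch of the idea behind similarity search. Documents and the query are mapped to vectors and ranked by cosine similarity, so a document can match without sharing any keywords with the query. The three-dimensional vectors below are hand-made for illustration. They stand in for embeddings that a real system would compute with a trained model, and nothing here reflects how the Bigdata.com API is implemented.

```python
import math

# Hand-made toy "embeddings" over three made-up axes:
# (monetary policy, trade, sports). Real systems learn these vectors.
DOCS = {
    "Central bank weighs rate path as import duties lift prices": [0.9, 0.8, 0.0],
    "Fed chair discusses inflation outlook": [0.95, 0.1, 0.0],
    "Local team wins championship": [0.0, 0.0, 1.0],
}
QUERY = "Fed addresses inflation amid tariff concerns"
QUERY_VEC = [0.9, 0.7, 0.0]  # toy vector standing in for QUERY's embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_hits(query, text):
    """Number of lowercase words shared by the query and the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def rank_by_similarity(query_vec, docs):
    """Return (title, score) pairs sorted from most to least similar."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for title, score in rank_by_similarity(QUERY_VEC, DOCS):
    print(f"{score:.2f}  keywords={keyword_hits(QUERY, title)}  {title}")
```

The top-ranked headline shares no words with the query, so a pure keyword search would miss it, while the vector comparison still places it first.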

```python theme={null}
# Import required modules and classes
import html

from IPython.display import display, HTML
from bigdata_client import Bigdata
from bigdata_client.daterange import RollingDateRange
from bigdata_client.models.advanced_search_query import Similarity
from bigdata_client.models.search import DocumentType, SortBy

# Initialize the Bigdata client
# Make sure BIGDATA_USERNAME and BIGDATA_PASSWORD are set in the environment
# Alternatively, you can pass your credentials directly to the Bigdata class
bigdata = Bigdata()
```
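Because `Bigdata()` reads credentials from the environment, as the comment above notes, it can help to fail early with a clear message when they are missing. This is a minimal sketch using only the standard library. `missing_credentials` is a hypothetical helper, not part of `bigdata_client`, and it only checks that the two variables named above are set, not that they are valid.

```python
import os

# Variable names taken from the comment in the setup cell above.
REQUIRED_VARS = ("BIGDATA_USERNAME", "BIGDATA_PASSWORD")


def missing_credentials(env=None):
    """Return the required credential variables that are unset or empty.

    `env` defaults to os.environ; pass a dict to check something else.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example check to run before creating the client:
# missing = missing_credentials()
# if missing:
#     raise RuntimeError(f"Set {', '.join(missing)} before calling Bigdata()")
```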

# Helper Functions

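We define two helper functions to display the search results as
formatted HTML: `escape_special_chars` makes document text safe to
embed, and `print_results_html` renders one card per document: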

<Accordion title="Click to show">
  ```python theme={null}
  def escape_special_chars(text):
      """Escapes special characters for safe HTML display."""
      text = html.escape(text)  # Escapes HTML special characters like <, >, &
      # text = text.replace(r"$", r"\$")  # Escape the dollar sign properly
      text = text.replace("  ", "&nbsp;&nbsp;")  # Preserve double spaces
      return text


  def print_results_html(results):
      """Prints search results in a readable format."""
      html_output = """
      <style>
          .results-container {
              font-family: Arial, sans-serif;
              background: #1e1e1e;
              color: white;
              padding: 20px;
              border-radius: 10px;
              max-width: 800px;
              margin: auto;
              box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.5);
          }
          .result-card {
              border: 1px solid #444;
              padding: 15px;
              margin: 15px 0;
              border-radius: 8px;
              background: #2a2a2a;
              transition: transform 0.2s, box-shadow 0.2s;
          }
          .result-card:hover {
              transform: scale(1.02);
              box-shadow: 0px 4px 10px rgba(255, 255, 255, 0.1);
          }
          .rank-container {
              display: flex;
              gap: 10px; /* Space between rank bubbles */
              align-items: center;
              margin-bottom: 10px;
          }
          .rank-badge {
              font-weight: bold;
              font-size: 16px;
              padding: 6px 12px;
              border-radius: 20px;
              display: inline-block;
              color: white;
          }
          .badge-blue {
              background: #1E88E5;
          }
          .headline {
              font-size: 20px;
              font-weight: bold;
              color: #ffcc00;
          }
          .timestamp {
              font-size: 14px;
              color: #cccccc;
          }
          .text {
              font-size: 16px;
              line-height: 1.6;
              color: #dddddd;
          }
      </style>
      <div class='results-container'>
      """

      for idx, document in enumerate(results, 1):
          # Extract and escape the fields shown on this document's card
          headline = escape_special_chars(document.headline.title())
          timestamp = document.timestamp.strftime("%Y-%m-%d %H:%M:%S")
          relevance = round(document.chunks[0].relevance, 2)
          first_chunk_text = escape_special_chars(document.chunks[0].text)

          html_output += f"""
          <div class='result-card'>
              <div class='rank-container'>
                  <div class='rank-badge badge-blue'>{('📕📗📘' * idx)[:idx]}</div>
              </div>
              <div class='headline'>{headline}</div>
              <div class='timestamp'><strong>Timestamp:</strong> {timestamp}</div>
              <div class='relevance'><strong>📘 Relevance:</strong> {relevance}</div>
              <div class='text'>{first_chunk_text}</div>
          </div>
          """

      html_output += "</div>"

      display(HTML(html_output))
  ```
</Accordion>
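The two pieces of logic inside the helpers that are easiest to misread can be checked offline, without IPython or an API call. The sketch below restates the active lines of `escape_special_chars` and wraps the badge expression from `print_results_html` in a hypothetical `rank_badge` function, so you can see exactly what each produces:

```python
import html


def escape_special_chars(text):
    """HTML-escape text, then keep double spaces visible (same steps as the helper above)."""
    text = html.escape(text)  # <, >, & and quotes become HTML entities
    return text.replace("  ", "&nbsp;&nbsp;")


def rank_badge(idx):
    """Badge for 1-based rank idx: cycles through three book emojis, idx of them."""
    return ("📕📗📘" * idx)[:idx]


print(escape_special_chars("S&P 500 <flat>  ahead of Fed"))
for rank in range(1, 5):
    print(rank, rank_badge(rank))
```

The badge grows by one book per rank (rank 4 shows 📕📗📘📕), so a tenth result carries ten emojis. If you prefer a compact label, `f"#{idx}"` is a drop-in alternative inside the f-string.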

# Define Search Query and Parameters

We define our search parameters: the query, the time window, the rerank
threshold, and the maximum number of documents to retrieve. In this
example, we search for articles about the Federal Reserve's response to
inflation amid concerns about tariffs.

```python theme={null}
# Create a similarity search query
query = Similarity('Fed addresses inflation amid tariff concerns')

# Search within a specific time frame
DATE_RANGE = RollingDateRange.LAST_WEEK

# Set the rerank threshold to improve search relevance
RERANK_THRESHOLD = 0.85

# Unused placeholder: the news-only scope is applied through
# `scope=DocumentType.NEWS` in the search call below
chunk_relevance = ...

# Set the maximum number of documents to retrieve
DOCUMENT_LIMIT = 10
```

# Execute Search

We now run the search using the specified parameters.

One of the key features of the Bigdata API is the ability to **rerank**
search results by relevance. Reranking uses a cross-encoder, a model
that scores the query and each candidate passage together, to surface
the most relevant documents first. For details, see the
[reranking guide](../../how-to-guides/rerank_search).
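Conceptually, a reranker rescores each retrieved candidate against the query and discards the ones that fall below a cutoff. The sketch below shows that filter, sort, and truncate step on toy data. `toy_cross_encoder`, its scores, and the passages are hypothetical, and the sketch assumes the threshold acts as an inclusive minimum score. That is our reading rather than a documented guarantee, so check the reranking guide for the API's exact behavior.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pair scores standing in for a cross-encoder's output.
# A real cross-encoder reads the query and passage together and returns a score.
TOY_SCORES: Dict[str, float] = {
    "Passage A: central bank signals patience as tariffs cloud inflation outlook": 0.93,
    "Passage B: policymakers weigh how import duties feed into prices": 0.88,
    "Passage C: technology shares rally on earnings": 0.41,
}


def toy_cross_encoder(query: str, passage: str) -> float:
    """Stand-in scorer: looks up a fixed score and ignores the query."""
    return TOY_SCORES.get(passage, 0.0)


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    threshold: float,
    limit: int,
) -> List[Tuple[str, float]]:
    """Score each (query, candidate) pair, drop scores below threshold, sort, truncate."""
    scored = [(passage, score_pair(query, passage)) for passage in candidates]
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


reranked = rerank(
    "Fed addresses inflation amid tariff concerns",
    list(TOY_SCORES),
    toy_cross_encoder,
    threshold=0.85,
    limit=10,
)
for passage, score in reranked:
    print(f"{score:.2f}  {passage}")
```

With a threshold of 0.85, only the first two passages survive. Raising the threshold trades recall for precision, which is the same trade-off `RERANK_THRESHOLD` controls.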

We enable reranking by passing `rerank_threshold` to
`bigdata.search.new`:

```python theme={null}
# Configure the search with the parameters defined above
search = bigdata.search.new(
    query=query,
    date_range=DATE_RANGE,
    rerank_threshold=RERANK_THRESHOLD,
    scope=DocumentType.NEWS,  # Limit to news articles
    sortby=SortBy.RELEVANCE  # Sort by relevance score
)

# Run the search and get results
results = search.run(DOCUMENT_LIMIT)
```

# Display Results

Now that we have the search results, we can display them in a readable
format:

```python theme={null}
print_results_html(results)
```
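Outside a notebook, `IPython.display` has nothing to render into. As a minimal sketch, a hypothetical `summarize_results` function can print the same fields the HTML helper uses (`headline`, `timestamp`, and the first chunk's `relevance` and `text`) as plain text. The demo at the bottom uses `SimpleNamespace` mock objects with those attribute names, so it runs without an API call:

```python
from datetime import datetime
from types import SimpleNamespace


def summarize_results(results, max_chars=120):
    """Return one plain-text entry per document, using its first chunk."""
    lines = []
    for idx, document in enumerate(results, 1):
        chunk = document.chunks[0]
        snippet = chunk.text
        if len(snippet) > max_chars:
            snippet = snippet[:max_chars].rstrip() + "..."
        lines.append(
            f"{idx}. [{chunk.relevance:.2f}] {document.timestamp:%Y-%m-%d} "
            f"{document.headline}\n   {snippet}"
        )
    return lines


# Offline demo: mock objects exposing only the attributes used above.
demo_results = [
    SimpleNamespace(
        headline="Example headline",
        timestamp=datetime(2025, 1, 2, 9, 30),
        chunks=[SimpleNamespace(relevance=0.912, text="Example chunk text.")],
    )
]
print("\n".join(summarize_results(demo_results)))
```

With the real results from the search above, call `print("\n".join(summarize_results(results)))` instead.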

# Conclusion

For more details, refer to the [official Bigdata.com API
documentation](../../getting-started/search/overview), which also covers
the many additional filters you can apply to narrow down your search
results.

**Happy Searching!** 🚀
