# Search with Re-Ranking

The Bigdata.com API lets you search a vast number of documents
instantly. To refine the search results further, Bigdata provides
a cross encoder re-ranker that assigns a new relevance score to each
candidate document. This allows the system to filter out less pertinent
results and focus on the most promising ones.

# Introduction

In this use case, we look at the cross encoder in action. In
the first step, a similarity search rapidly scans hundreds of millions
of documents to surface a list of matches. While this method gathers
candidates efficiently, not all of them may align with the
query's intent. We call these the *baseline* search results. We then run
the same query with cross encoder re-ranking enabled, which
filters out documents that do not meet the relevance threshold.

# Step 0: Prerequisites

We need to import the Bigdata client library with the supporting
modules:

```python theme={null}
import html

from IPython.display import display, HTML
from bigdata_client import Bigdata
from bigdata_client.daterange import RollingDateRange
from bigdata_client.query import Similarity
```

# Step 1: Initialization

First, we create an instance of the `Bigdata` class, which is used
to interact with the Bigdata API.

Authentication also happens at this point: the client loads the
`BIGDATA_USERNAME` and `BIGDATA_PASSWORD` environment variables. If they
are not set, either set them or pass the `username` and `password`
arguments to the `Bigdata` class:

```python theme={null}
bigdata = Bigdata()
```
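
If you would rather fail early with a clear message when credentials are missing, a small check along these lines can run before the client is created. This is a minimal sketch: `missing_credentials` is a hypothetical helper and not part of `bigdata_client`; only the two environment variable names and the `username`/`password` arguments come from the text above.

```python
import os

REQUIRED_VARS = ("BIGDATA_USERNAME", "BIGDATA_PASSWORD")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not part of bigdata_client.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing environment variables:", ", ".join(missing))
    # Alternative: pass the credentials explicitly, as described above:
    # bigdata = Bigdata(username="<your-username>", password="<your-password>")
```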

# Step 2: Define Helper Functions

We define helper functions to display the search results in a readable
format: one escapes text for HTML, one infers each document's ranks, and
one renders the results as HTML cards:

```python theme={null}
def escape_special_chars(text):
    """Escapes special characters for safe HTML display."""
    text = html.escape(text)  # Escapes HTML special characters like <, >, &
    # text = text.replace(r"$", r"\$")  # Escape the dollar sign properly
    text = text.replace("  ", "&nbsp;&nbsp;")  # Preserve double spaces
    return text


def infer_ranks(idx, document):
    """Infers the original and new ranks for a document."""
    new_rank = idx
    # baseline_rank is only attached to re-ranked results; default to None
    original_rank = getattr(document, "baseline_rank", None)
    return new_rank, original_rank


def print_results_html(results):
    """Prints search results in a readable format."""
    html_output = """
    <style>
        .results-container {
            font-family: Arial, sans-serif;
            background: #1e1e1e;
            color: white;
            padding: 20px;
            border-radius: 10px;
            max-width: 800px;
            margin: auto;
            box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.5);
        }
        .result-card {
            border: 1px solid #444;
            padding: 15px;
            margin: 15px 0;
            border-radius: 8px;
            background: #2a2a2a;
            transition: transform 0.2s, box-shadow 0.2s;
        }
        .result-card:hover {
            transform: scale(1.02);
            box-shadow: 0px 4px 10px rgba(255, 255, 255, 0.1);
        }
        .rank-container {
            display: flex;
            gap: 10px; /* Space between rank bubbles */
            align-items: center;
            margin-bottom: 10px;
        }
        .rank-badge {
            font-weight: bold;
            font-size: 16px;
            padding: 6px 12px;
            border-radius: 20px;
            display: inline-block;
            color: white;
        }
        .badge-blue {
            background: #1E88E5;
        }
        .badge-green {
            background: #28A745;
        }
        .headline {
            font-size: 20px;
            font-weight: bold;
            color: #ffcc00;
        }
        .timestamp {
            font-size: 14px;
            color: #cccccc;
        }
        .relevance {
            font-size: 16px;
            font-weight: bold;
            color: #00d1b2;
        }
        .text {
            font-size: 16px;
            line-height: 1.6;
            color: #dddddd;
        }
    </style>
    <div class='results-container'>
    """

    for idx, document in enumerate(results, 1):
        # Infer ranks for the document
        new_rank, original_rank = infer_ranks(idx, document)

        headline = escape_special_chars(document.headline.title())
        timestamp = document.timestamp.strftime("%Y-%m-%d %H:%M:%S")
        relevance = round(document.chunks[0].relevance, 2)
        first_chunk_text = escape_special_chars(document.chunks[0].text)

        html_output += """
        <div class='result-card'>
        """

        # Display rank badges if available
        if original_rank:
            html_output += f"""
                <div class='rank-container'>
                    <div class='rank-badge badge-green'>New Rank: {new_rank}</div>
                    <div class='rank-badge badge-blue'>Original Rank: {original_rank}</div>
                </div>
            """
        else:
            html_output += f"""
                <div class='rank-container'>
                    <div class='rank-badge badge-blue'>Rank: {new_rank}</div>
                </div>
            """

        html_output += f"""
            <div class='headline'>{headline}</div>
            <div class='timestamp'><strong>Timestamp:</strong> {timestamp}</div>
            <div class='relevance'><strong>📘 Relevance:</strong> {relevance}</div>
            <div class='text'>{first_chunk_text}</div>
        </div>
        """

    html_output += "</div>"

    display(HTML(html_output))
```
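
`display(HTML(...))` only renders inside a notebook. For scripts or terminals, a plain-text variant may be more convenient. The sketch below is one possible fallback, not part of the original helpers: `format_results_text` is a hypothetical name, and it reads the same attributes as `print_results_html` (`headline`, `timestamp`, `chunks[0].relevance`, `chunks[0].text`, and the optional `baseline_rank`). The sample document is a stand-in with made-up values.

```python
from datetime import datetime
from types import SimpleNamespace


def format_results_text(results):
    """Render results as plain text, one block per document.

    Hypothetical fallback for environments without IPython.
    """
    blocks = []
    for idx, document in enumerate(results, 1):
        baseline_rank = getattr(document, "baseline_rank", None)
        rank = f"Rank: {idx}"
        if baseline_rank:
            rank = f"New Rank: {idx} (Original Rank: {baseline_rank})"
        first_chunk = document.chunks[0]
        lines = [
            rank,
            document.headline,
            f"Timestamp: {document.timestamp:%Y-%m-%d %H:%M:%S}",
            f"Relevance: {round(first_chunk.relevance, 2)}",
            first_chunk.text,
        ]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Stand-in document with made-up values, only to show the output shape
sample = SimpleNamespace(
    headline="Example headline",
    timestamp=datetime(2025, 1, 15, 9, 30, 0),
    chunks=[SimpleNamespace(relevance=0.9312, text="Example chunk text.")],
)
print(format_results_text([sample]))
```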

# Step 3: Define Search Query

As an example, we explore the potential impact of President Trump's
proposed tax cuts (referred to as "Trump 2.0" tax cuts) on the federal
deficit. We use a `Similarity` query to search for documents that are
similar to the query string.

```python theme={null}
query = Similarity('Trump 2.0 tax cuts impact on federal deficit')
```

# Step 4: Phase 1 - Baseline Search

We first run a search with the query string to see the results without
cross encoder re-ranking. This will give us a *baseline* to compare the
results later once we enable cross encoder re-ranking.

```python theme={null}
results_baseline = bigdata.search.new(
    query,
    date_range=RollingDateRange.LAST_THIRTY_DAYS,
).run(limit=10)
```

```python theme={null}
# Baseline search results, i.e. without cross encoder re-ranking enabled
print_results_html(results_baseline)
```

# Step 4: Phase 2 - Search with Re-Ranking

The `rerank_threshold` argument enables cross encoder re-ranking of the
initial search results against the query. Documents whose re-ranking
relevance score falls below this threshold are filtered out, which
improves the relevance of the final search results.

```python theme={null}
results_rerank = bigdata.search.new(
    query,
    rerank_threshold=0.9,  # Filter out results with relevance below 0.9
    date_range=RollingDateRange.LAST_THIRTY_DAYS,
).run(limit=10)
```
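
Conceptually, threshold-based re-ranking amounts to re-scoring the candidates, dropping those below the cutoff, and ordering the rest by the new score. The sketch below illustrates that idea locally on made-up scores. It is not the service's implementation: `apply_rerank_threshold` is a hypothetical name, and keeping a score exactly equal to the threshold is an assumption.

```python
def apply_rerank_threshold(scored_candidates, threshold):
    """Keep candidates scoring at or above threshold, best first.

    scored_candidates: iterable of (doc_id, rerank_score) pairs.
    Illustrative only; the API performs its own scoring and filtering.
    """
    kept = [
        (doc_id, score)
        for doc_id, score in scored_candidates
        if score >= threshold
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up scores, listed in the candidates' original (baseline) order
candidates = [("doc-a", 0.62), ("doc-b", 0.97), ("doc-c", 0.91), ("doc-d", 0.35)]
print(apply_rerank_threshold(candidates, threshold=0.9))
# [('doc-b', 0.97), ('doc-c', 0.91)]
```

Raising the threshold trades recall for precision: fewer documents survive, and a high enough value can return no results at all.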

We compare the results of the baseline search with the re-ranked search
results to see the impact of cross encoder re-ranking.

```python theme={null}
# Add baseline rank to each document in the re-ranked results
baseline_document_ids = [doc.id for doc in results_baseline]
for document in results_rerank:
    try:
        baseline_rank = baseline_document_ids.index(document.id) + 1
    except ValueError:
        baseline_rank = None
    document.baseline_rank = baseline_rank
```
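
To summarize the comparison without rendering anything, the same bookkeeping can be done on plain lists of document IDs. This is a sketch under the assumption that IDs are unique within each result list; `summarize_rank_changes` is a hypothetical helper, and the IDs below are made up.

```python
def summarize_rank_changes(baseline_ids, reranked_ids):
    """Compare two ordered lists of document IDs.

    Returns (moves, dropped): moves maps each re-ranked ID to a
    (new_rank, baseline_rank) tuple, where baseline_rank is None if the
    document was not in the baseline list; dropped lists the baseline
    IDs that are absent from the re-ranked list.
    """
    baseline_rank = {doc_id: rank for rank, doc_id in enumerate(baseline_ids, 1)}
    moves = {
        doc_id: (rank, baseline_rank.get(doc_id))
        for rank, doc_id in enumerate(reranked_ids, 1)
    }
    reranked = set(reranked_ids)
    dropped = [doc_id for doc_id in baseline_ids if doc_id not in reranked]
    return moves, dropped


# Made-up IDs; with real results, build the lists from doc.id as above
moves, dropped = summarize_rank_changes(["a", "b", "c", "d"], ["c", "a", "e"])
print(moves)    # {'c': (1, 3), 'a': (2, 1), 'e': (3, None)}
print(dropped)  # ['b', 'd']
```

The dictionary lookup avoids calling `list.index` once per document, which matters little for ten results but keeps the comparison linear for larger limits.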

```python theme={null}
print_results_html(results_rerank)
```

# Conclusion

The results are **re-ranked** to prioritize matches that closely align
with the query intent, so the most pertinent results appear at the top.
For example, results related to adjacent topics (e.g., "tariffs") are
deprioritized in favor of those directly aligned with the search query.
Any results falling below the `rerank_threshold` are filtered out,
**improving** overall relevance.

For more details, please refer to the [Bigdata.com API official
documentation](../../getting-started/introduction).

**Happy Searching!** 🚀
