Resource retrieval
Get the pre-trained net:
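One possible sketch; the resource name below is a placeholder for the actual name of this model in the Wolfram Neural Net Repository:
(* placeholder resource name; replace it with the model's actual repository name *)
modelName = "Transformer-Based Reranker Trained on MS-MARCO Data";
net = NetModel[modelName]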
NetModel parameters
This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:
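For example, assuming the placeholder name defined above:
NetModel[modelName, "ParametersInformation"]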
Pick a non-default net by specifying the parameters:
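A sketch with a hypothetical parameter combination; the valid parameter names and values are the ones reported by "ParametersInformation":
(* "Size" -> "Small" is a hypothetical combination *)
smallNet = NetModel[{modelName, "Size" -> "Small"}]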
Evaluation function
Get the tokenizer:
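One possible approach, assuming the reranker shares a BERT-style WordPiece vocabulary, is to reuse the subword tokenizer attached to the input of the public BERT model; the actual resource may ship its own tokenizer:
tokenizer = NetExtract[
   NetModel["BERT Trained on BookCorpus and English Wikipedia Data"],
   "Input"]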
Write a function that preprocesses a list of input sentences:
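A minimal sketch, here taking the question together with the list of sentences, assuming the tokenizer returns integer token indices with the special tokens already included and using a hypothetical padding index:
padID = 1;  (* hypothetical padding index; it must match the model's vocabulary *)
preprocess[question_String, sentences : {__String}] :=
 Module[{qIDs, sIDs, ids, types, maxLen},
  qIDs = tokenizer[question];                (* assumed to be [CLS] question [SEP] *)
  sIDs = Rest[tokenizer[#]] & /@ sentences;  (* drop the leading [CLS] of each sentence *)
  ids = Join[qIDs, #] & /@ sIDs;             (* [CLS] question [SEP] sentence [SEP] *)
  types = Join[ConstantArray[0, Length[qIDs]], ConstantArray[1, Length[#]]] & /@ sIDs;
  maxLen = Max[Length /@ ids];
  <|
   "input_ids" -> (PadRight[#, maxLen, padID] & /@ ids),
   "attention_mask" -> (PadRight[ConstantArray[1, Length[#]], maxLen, 0] & /@ ids),
   "token_type_ids" -> (PadRight[#, maxLen, 0] & /@ types)
  |>]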
Write a function that incorporates all the steps, from tokenizing the input strings to outputting the scores:
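A sketch that chains preprocessing and net evaluation, assuming the net's input ports match the association keys above and that it returns one relevance logit per question-sentence pair:
getScore[question_String, sentences : {__String}, model_ : net] :=
 Flatten[model[preprocess[question, sentences]]]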
Basic usage
Get the question and the sentences:
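For example, with illustrative inputs:
question = "How many people live in Paris?";
sentences = {
   "Paris is the capital and largest city of France.",
   "Around 2.1 million people live in the city of Paris.",
   "The Eiffel Tower was completed in 1889."};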
Get the scores for a given question for each sentence:
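Continuing with the definitions above:
getScore[question, sentences]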
Get the scores for a given question for each sentence using a non-default model:
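Using the non-default variant obtained earlier:
getScore[question, sentences, smallNet]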
Input preprocessing
Preprocess a batch of sentences into the inputs expected by the model. The result is an association with the following keys:
• "input_ids": integer token indices representing each token of the concatenated question-sentence pairs
• "attention_mask": binary mask indicating valid tokens vs. padding tokens
• "token_type_ids": segment IDs used for sentence pair tasks showing which sentence each token belongs to (here 0 for the question and 1 for the sentence)
Get the dimensions of the preprocessed sentence pairs:
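Mapping Dimensions over the association gives the array shape stored under each key:
Dimensions /@ inputs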
Visualize the preprocessed sentences:
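One possible visualization is a matrix plot of the padded token indices:
MatrixPlot[inputs["input_ids"]]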
Get the scores from the pairs of sentences:
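Feeding the preprocessed association directly to the net (assuming its ports match the keys) gives one score per pair:
net[inputs]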
Reranker-only document retrieval
The reranker model directly evaluates the joint relationship between a query and each document by concatenating them and producing a single relevance score. This full query-document interaction provides high accuracy, but because reranker scores cannot be precomputed and must be recalculated for every pair, applying it to all documents is computationally expensive and scales poorly. Therefore, this approach is best suited for small corpora. First, get the documents:
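As an illustrative document source (any collection of text works), one could take a Wikipedia article:
documents = WikipediaData["Paris"];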
Split the input data into individual sentences to use each as a separate passage:
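For example, using TextSentences:
passages = TextSentences[documents];
Length[passages]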
Get the query:
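For example:
query = "How many people live in Paris?";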
Run the reranker directly on all passages and get the scores for each query-passage pair, recording the evaluation time:
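A sketch wrapping the scoring function defined earlier in AbsoluteTiming:
{rerankerTime, rerankerScores} = AbsoluteTiming[getScore[query, passages]];
rerankerTime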
Get the top-K relevant answers for the given query with the corresponding relevance scores:
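For example, with K = 5, pairing every passage with its score and taking the largest values:
rerankerTopK = TakeLargest[AssociationThread[passages -> rerankerScores], 5]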
Show the results:
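For instance, as a Dataset:
Dataset[rerankerTopK]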
Embedding-only document retrieval
In large-scale information retrieval, processing every document with a reranker for each query is infeasible. To handle this efficiently, a text embedder encodes each document into a dense vector representation and stores these embeddings for reuse. At query time, the user query is encoded into the same vector space, and the system retrieves the most semantically similar documents based on vector distances. This approach enables fast and scalable retrieval with minimal latency. First, define the utility functions responsible for embedding extraction:
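A simplified sketch that stands in for a dedicated text embedder: it averages GloVe word vectors to obtain one vector per text and measures relevance with the cosine distance. The embedder actually paired with this reranker may differ:
(* stand-in embedder: mean of the per-token GloVe vectors *)
glove = NetModel[
   "GloVe 100-Dimensional Word Vectors Trained on Wikipedia and Gigaword 5 Data"];
getEmbedding[text_String] := Mean[glove[text]]
getEmbeddings[texts : {__String}] := getEmbedding /@ texts
getDistances[queryVec_, passageVecs_] := CosineDistance[queryVec, #] & /@ passageVecs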
Evaluate the utility functions to compute embeddings for the query and all passages, recording the time:
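For example:
{passageEmbedTime, passageEmbeddings} = AbsoluteTiming[getEmbeddings[passages]];
{queryEmbedTime, queryEmbedding} = AbsoluteTiming[getEmbedding[query]];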
Use the utility function to compute distance scores between the query embedding and all passage embeddings:
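Continuing with the helpers defined above:
{distanceTime, distances} = AbsoluteTiming[getDistances[queryEmbedding, passageEmbeddings]];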
Get the top-K relevant answers for the given query with the corresponding distance scores:
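For example, with K = 5, taking the passages with the smallest distances:
embeddingTopK = TakeSmallest[AssociationThread[passages -> distances], 5]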
Show the results:
Record the overall runtime, excluding the document embedding step, since those embeddings are computed once and reused across queries:
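Under this accounting, only the query embedding and the distance computation count toward the query-time cost:
embeddingOnlyTime = queryEmbedTime + distanceTime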
Two-stage document retrieval (embedding + reranker)
Two-stage retrieval first uses the fast embedder to fetch a small pool of top-N candidate documents and then applies the more accurate reranker only to those candidates. This keeps retrieval efficient while still benefiting from the reranker’s fine-grained relevance scoring. This process is summarized in the diagram:
Get the indices of the top-N closest passages by selecting the smallest distance values from all computed distance scores:
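For example, with N = 20 (Ordering with a second argument returns the positions of the smallest elements):
topNIndices = Ordering[distances, Min[20, Length[distances]]]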
Run the reranker on the top-N most similar passages to get the relevance scores:
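Reranking only those candidates:
{twoStageRerankTime, twoStageScores} =
  AbsoluteTiming[getScore[query, passages[[topNIndices]]]];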
Get the top-K relevant answers for the given query with the corresponding relevance scores:
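Again with K = 5:
twoStageTopK = TakeLargest[AssociationThread[passages[[topNIndices]] -> twoStageScores], 5]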
Show the results:
Record the overall runtime, excluding the document embedding step, since those embeddings are computed once and reused across queries:
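Here the query embedding, the distance computation and the reranking of the top-N candidates count toward the query-time cost:
twoStageTime = queryEmbedTime + distanceTime + twoStageRerankTime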
Timing summary
This section compares the total runtime across three retrieval strategies: embedding-only retrieval, reranker-only retrieval and two-stage retrieval. Each approach was evaluated on the same query and document set to measure their relative computational efficiency. Create an association storing the total runtime of each retrieval method and show the results:
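Using the timing variables accumulated above:
timings = <|
   "Embedding-only" -> embeddingOnlyTime,
   "Reranker-only" -> rerankerTime,
   "Two-stage" -> twoStageTime|>;
Dataset[timings]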