HyDE: How Hypothetical Document Embeddings Supercharge RAG Retrieval

The effectiveness of Retrieval-Augmented Generation (RAG) hinges on one core capability: finding the right information to augment your large language model (LLM). But what if the way we search for that information is fundamentally flawed? Traditional semantic search, embedding the user’s query directly, can sometimes miss the mark. Enter HyDE, or Hypothetical Document Embeddings, a clever technique that can significantly boost your RAG system’s retrieval quality. Let’s dive into how it works and, more importantly, how to implement it.

The Problem with Naive Semantic Search

Imagine a user asks: “What were the main criticisms of the film Citizen Kane?”

A standard RAG pipeline would embed this question and search a vector database for similar embeddings, hoping to retrieve relevant documents. The problem is that the question itself doesn’t necessarily resemble the answer. A document containing a detailed analysis of Citizen Kane’s flaws might use different vocabulary, focusing on cinematic techniques, narrative structure, and Orson Welles’ performance. The raw query simply lacks the “shape” of a good answer.

This is where HyDE shines. Instead of embedding the question directly, we first ask the LLM to generate a hypothetical answer.
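To make the vocabulary gap concrete, here is a toy illustration using bag-of-words cosine similarity as a stand-in for dense embeddings. The texts are invented for demonstration; real embedding models capture more than word overlap, but the lexical mismatch between a question and an answer-shaped passage is the same underlying issue.

```python
# Toy illustration of the vocabulary gap: bag-of-words cosine similarity
# as a crude stand-in for dense embeddings.
import math
import re
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts over word counts."""
    va = Counter(re.findall(r"[a-z]+", a.lower()))
    vb = Counter(re.findall(r"[a-z]+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

query = "What were the main criticisms of the film Citizen Kane?"
document = ("Critics faulted the deep focus cinematography as distracting "
            "and the non-linear narrative structure as confusing.")
hypothetical = ("The film was criticized for its confusing non-linear "
                "narrative structure and its distracting deep-focus "
                "cinematography.")

# The answer-shaped text overlaps with the document far more than the
# raw question does.
print(cosine(query, document))
print(cosine(hypothetical, document))
```

The hypothetical answer scores much higher against the document than the raw question does, which is exactly the effect HyDE exploits in embedding space.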

HyDE: Answering Before Asking

The core idea of HyDE is remarkably simple:

  1. Generate a Hypothetical Answer: Provide the user’s query to an LLM and ask it to generate a plausible, even if imperfect, answer. Optionally, instruct the LLM to adopt a specific tone or perspective, which will be reflected in the embedding.
  2. Embed the Hypothetical Answer: Embed this generated “answer” using your chosen embedding model.
  3. Search the Vector Database: Use the embedding of the hypothetical answer to perform semantic search and retrieve relevant documents.

The rationale is that the hypothetical answer, even if not entirely accurate, will contain terminology and concepts more closely related to the actual relevant documents. It provides a better “search query” in embedding space.

Here’s Python code demonstrating a HyDE implementation using Langchain, OpenAI, and a ChromaDB vectorstore. (These are all available in the Quartalis ecosystem.)

import os
from langchain.llms import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# Initialize LLM and embedding model
llm = OpenAI(temperature=0)  # zero temperature for more consistent results
embedding_function = OpenAIEmbeddings()

# Load your documents and create a ChromaDB vectorstore
# (Assuming you've already done this)
# Example:
# from langchain.document_loaders import TextLoader
# documents = TextLoader("your_data.txt").load()
# db = Chroma.from_documents(documents, embedding_function)

# Or, assume you already have one created, from persistent storage:
persist_directory = 'db'
db = Chroma(persist_directory=persist_directory, embedding_function=embedding_function)

# Define the HyDE prompt template
hyde_prompt_template = """Please write a hypothetical answer to the question below.
Question: {question}
Hypothetical answer:"""
hyde_prompt = PromptTemplate(template=hyde_prompt_template, input_variables=["question"])

# Create the HyDE chain
hyde_chain = LLMChain(llm=llm, prompt=hyde_prompt)

def hyde_search(query, vectorstore, k=4):
    """
    Performs HyDE search using an LLM to generate a hypothetical answer.

    Args:
        query (str): The user's query.
        vectorstore (Chroma): The ChromaDB vectorstore.
        k (int): Number of documents to return.

    Returns:
        list: A list of retrieved documents.
    """
    hypothetical_answer = hyde_chain.run(query)
    embedding = embedding_function.embed_query(hypothetical_answer)
    results = vectorstore.similarity_search_by_vector(embedding, k=k)
    return results

# Example usage
query = "What were the main criticisms of the film Citizen Kane?"
results = hyde_search(query, db)

for doc in results:
    print(doc.page_content)

In this code:

  • We initialize an OpenAI LLM and the OpenAI embeddings model.
  • We create a ChromaDB vectorstore (you’ll need to load your documents into it beforehand).
  • We define a prompt template that instructs the LLM to generate a hypothetical answer. Setting the temperature to zero when initializing the LLM makes the generations more consistent (but potentially less creative).
  • The hyde_search function takes the user’s query, generates a hypothetical answer, embeds it, and then performs a similarity search against the vector database.
  • Finally, we print the content of the retrieved documents.

Evaluating Retrieval Quality: A Practical Example

To demonstrate the effectiveness of HyDE, let’s consider a small dataset consisting of three documents related to the film Citizen Kane:

  • Document 1: A general overview of the film’s plot and critical reception.
  • Document 2: An in-depth analysis of the film’s cinematography and use of deep focus.
  • Document 3: A discussion of the film’s narrative structure and its impact on filmmaking.

Without HyDE, a query like “What were the main criticisms of the film Citizen Kane?” might retrieve Document 1, but could easily miss Documents 2 and 3, which contain more specific criticisms related to technical aspects and storytelling.

Using HyDE, the LLM might generate a hypothetical answer that mentions the film’s unconventional narrative, its sometimes confusing plot, and the perceived self-indulgence of Orson Welles. This hypothetical answer, when embedded, is much more likely to retrieve Documents 2 and 3, providing a more comprehensive answer to the user’s query.

Here’s an example of running this with the above code, after adding three dummy documents to the Chroma DB:

Document 1 Content: Citizen Kane is a 1941 American drama film by Orson Welles.  It was critically acclaimed for its innovative cinematography, music, and narrative structure. However, it also received some criticism for its complex plot and perceived ambiguity.

Document 2 Content: The cinematography of Citizen Kane is notable for its use of deep focus, which allows multiple planes of the image to be in focus simultaneously. This technique, while visually striking, was sometimes criticized for being distracting and drawing attention away from the story.

Document 3 Content: Citizen Kane's non-linear narrative structure, jumping between different time periods, was groundbreaking for its time. However, some viewers found it confusing and difficult to follow, leading to criticism of its accessibility.

Now, if you run the Python code, you should see results similar to this:

Citizen Kane's non-linear narrative structure, jumping between different time periods, was groundbreaking for its time. However, some viewers found it confusing and difficult to follow, leading to criticism of its accessibility.
Citizen Kane is a 1941 American drama film by Orson Welles.  It was critically acclaimed for its innovative cinematography, music, and narrative structure. However, it also received some criticism for its complex plot and perceived ambiguity.
The cinematography of Citizen Kane is notable for its use of deep focus, which allows multiple planes of the image to be in focus simultaneously. This technique, while visually striking, was sometimes criticized for being distracting and drawing attention away from the story.

Notice that we get the most relevant document first, whereas a vanilla query might have put the general overview first.
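To go beyond eyeballing the ordering, you can score retrieval directly. Here is a minimal recall@k helper; it is generic rather than part of the article’s pipeline, and the document IDs below are hypothetical:

```python
# Minimal retrieval metric: what fraction of the relevant documents
# appear in the top-k results?
def recall_at_k(retrieved_ids: list, relevant_ids: list, k: int = 4) -> float:
    top = set(retrieved_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)

# Hypothetical run: docs 2 and 3 hold the specific criticisms.
print(recall_at_k(["doc3", "doc1", "doc2"], ["doc2", "doc3"], k=2))  # 0.5
print(recall_at_k(["doc3", "doc2", "doc1"], ["doc2", "doc3"], k=2))  # 1.0
```

Running both the vanilla and HyDE pipelines over a handful of labeled queries and comparing their recall@k gives you a concrete number to optimize when tuning prompts.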

Fine-Tuning the Hypothetical Answer

The quality of the hypothetical answer is crucial to HyDE’s success. Experiment with different prompts to guide the LLM. Consider these strategies:

  • Specify the Tone: Ask the LLM to answer in a specific style (e.g., “as a film critic,” “as a historian”).
  • Limit the Length: Constrain the answer to a specific number of sentences or words. Shorter answers can sometimes be more focused.
  • Request Specificity: Encourage the LLM to include specific details and examples.

For example, you might modify the prompt template like this:

hyde_prompt_template = """Please write a concise hypothetical answer, in the style of a film critic, to the question below.  Focus on specific criticisms.
Question: {question}
Hypothetical answer:"""
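Another lever, used in the original HyDE paper, is to generate several hypothetical answers and average their embeddings (together with the query’s own embedding), smoothing out any one generation’s quirks. A minimal sketch, where generate and embed are stand-ins for your LLM call (e.g. hyde_chain.run) and embedding model (e.g. embed_query):

```python
# Variant: average the embeddings of several hypothetical answers plus
# the query itself. `generate` and `embed` are stand-ins for the LLM
# call and embedding model from the pipeline above.
from typing import Callable, List

def averaged_hyde_embedding(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], List[float]],
    n: int = 4,
) -> List[float]:
    """Embed n generated answers and the query, then average componentwise."""
    texts = [generate(query) for _ in range(n)] + [query]
    vectors = [embed(t) for t in texts]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

The averaged vector can then be passed to similarity_search_by_vector exactly as in hyde_search. Note that averaging only helps if the LLM’s temperature is above zero so the generations actually differ; with temperature=0 every call returns the same answer.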

HyDE and the Quartalis Ecosystem

The Quartalis ecosystem provides a robust foundation for building and deploying RAG pipelines with HyDE. Its modular design makes it easy to integrate different LLMs, embedding models, and vector databases. The built-in monitoring and evaluation tools allow you to track the performance of your HyDE implementation and fine-tune it for optimal retrieval quality. Furthermore, features like managed vector storage and serverless function deployment make it straightforward to scale your RAG applications. Consider using Quartalis to streamline your development and deployment process.

Wrapping Up

HyDE offers a powerful technique for improving the retrieval quality of your RAG systems. By generating a hypothetical answer before searching, you can bridge the gap between the user’s query and the relevant documents. While it adds a layer of complexity to your pipeline, the potential gains in accuracy and relevance are well worth the effort. Experiment with different prompts, LLMs, and embedding models to find the optimal configuration for your specific use case.
