Agentic RAG: When Your AI System Thinks Before It Answers

Reading time: ~12–15 minutes
Traditional Retrieval-Augmented Generation (RAG) systems are like chefs following a recipe: they take a query, find relevant documents, and produce an answer. But what happens when the recipe calls for an ingredient that isn't in the pantry, or the kitchen lacks the right tool? This is where agentic RAG shines: it's not just a chef following instructions, but an autonomous kitchen that can detect gaps, fetch missing ingredients, and even redesign the recipe when needed.
Agentic RAG introduces a paradigm shift in how AI systems interact with knowledge bases. Unlike passive RAG pipelines, which follow a linear flow from query to retrieval to generation, agentic systems operate as autonomous agents capable of self-directed reasoning, iterative retrieval, and dynamic query refinement. This approach is particularly valuable in complex domains where a single retrieval pass is insufficient to answer a question. In this post, we’ll explore the mechanics of agentic RAG, focusing on three core capabilities: gap detection, multi-step retrieval loops, and query decomposition. We’ll also walk through a real-world implementation using Quartalis’ ecosystem tools, showing how these techniques can be applied in practice.
Understanding the Limitations of Traditional RAG
Traditional RAG systems follow a straightforward workflow:
- A user submits a query.
- The system retrieves documents from a vector database using similarity search.
- The retrieved documents are passed to a language model, which generates an answer.
This approach works well for simple, factual questions but falters in complex scenarios. For example:
- Partial information: If the retrieved documents contain only fragments of the required answer, the generated response may be incomplete or inaccurate.
- Missing context: A query might require information from multiple sources that are not simultaneously retrieved.
- Ambiguity: Complex questions may need decomposition into sub-questions, each requiring separate retrieval steps.
Consider a query like: “What are the key factors contributing to the decline in UK manufacturing output between 2010 and 2020, and how did Brexit influence this trend?” A traditional RAG system might retrieve documents about UK manufacturing trends or Brexit’s economic impact, but it could miss the nuanced interplay between the two.
Agentic RAG addresses these limitations by introducing a feedback loop that allows the system to:
- Identify gaps in retrieved information.
- Dynamically refine queries based on missing context.
- Execute multiple retrieval steps to gather comprehensive evidence.
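The feedback loop above can be sketched as a small, bounded control loop. The snippet below is a minimal illustration, not a production implementation: the keyword-matching retriever and topic-keyword gap check are toy stand-ins for vector search and an LLM-based sufficiency judge, and the corpus is invented for the example.

```python
# Minimal sketch of the agentic RAG feedback loop. The retriever and gap
# checker are toy stand-ins for vector search and an LLM sufficiency judge.
CORPUS = [
    "Deforestation in the Amazon is driven by agricultural expansion and logging.",
    "Mitigation strategies include reforestation and stricter law enforcement.",
]

def retrieve(query):
    """Toy keyword retriever standing in for vector similarity search."""
    terms = set(query.lower().split())
    return [doc for doc in CORPUS if terms & set(doc.lower().split())]

def find_gaps(docs, required_topics):
    """Return the required topics that no retrieved document mentions."""
    text = " ".join(docs).lower()
    return [topic for topic in required_topics if topic not in text]

def agentic_answer(query, required_topics, max_steps=3):
    evidence, next_query = [], query
    for _ in range(max_steps):                   # bounded retrieval loop
        for doc in retrieve(next_query):
            if doc not in evidence:              # deduplicate evidence
                evidence.append(doc)
        gaps = find_gaps(evidence, required_topics)
        if not gaps:                             # sufficient context found
            break
        next_query = f"{gaps[0]} in the Amazon"  # refined follow-up query
    return evidence

evidence = agentic_answer("causes deforestation", ["deforestation", "mitigation"])
```

The essential shape is the same in real systems: retrieve, judge sufficiency, refine, and repeat, with a cap on iterations so the loop always terminates.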
Introducing Agentic RAG: The Autonomous Approach
Agentic RAG systems operate as self-directed agents, combining elements of planning, retrieval, and generation. The core idea is to treat the retrieval process not as a one-time event but as an iterative, adaptive process. This is achieved through:
1. Gap Detection
The system evaluates the retrieved documents to identify missing information. For example, if a query about a technical process requires data from a specific year but the retrieved documents only cover earlier years, the agent can flag this gap and trigger a follow-up retrieval.
2. Multi-Step Retrieval Loops
Instead of a single retrieval pass, the system may perform multiple rounds of retrieval. Each iteration uses the results of the previous step to refine the search. This is particularly useful for questions that require cross-referencing multiple sources or resolving ambiguities.
3. Query Decomposition
Complex queries are broken down into sub-questions, each of which is addressed through targeted retrieval. This ensures that all aspects of the original query are thoroughly explored.
Let’s look at how this works in practice.
Key Components of Agentic RAG
Gap Detection: Identifying Missing Information
Gap detection is the first step in agentic RAG. The system evaluates the retrieved documents to determine whether they provide sufficient context for the query. If not, it identifies the missing pieces and triggers a follow-up retrieval.
For example, suppose a user asks: “What are the primary causes of deforestation in the Amazon rainforest, and what mitigation strategies have been proposed?” A traditional RAG system might retrieve documents on deforestation causes and mitigation strategies separately. However, it may miss documents that discuss the interplay between economic drivers and environmental policies.
An agentic system would:
- Retrieve initial documents on deforestation causes.
- Analyze the retrieved text to identify gaps (e.g., lack of information on mitigation strategies).
- Generate a follow-up query like “What mitigation strategies have been proposed for Amazon deforestation?” and perform a second retrieval.
- Combine both sets of documents to generate a comprehensive answer.
This approach ensures that the system doesn’t rely on a single retrieval pass but instead adapts its search based on the information it already has.
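To make the sufficiency check concrete, here is one lightweight way to implement it. This sketch uses bag-of-words cosine similarity in place of real embeddings: an aspect of the query counts as covered if at least one retrieved document scores above a threshold. Both the scoring scheme and the 0.2 cutoff are illustrative choices, not recommendations.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity; a cheap stand-in for embeddings."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def detect_gaps(aspects, retrieved_docs, threshold=0.2):
    """Return the query aspects that no retrieved document covers."""
    return [
        aspect for aspect in aspects
        if all(cosine(aspect, doc) < threshold for doc in retrieved_docs)
    ]

aspects = [
    "causes of deforestation in the Amazon",
    "mitigation strategies for deforestation",
]
retrieved = ["Deforestation in the Amazon is driven by agricultural expansion."]
gaps = detect_gaps(aspects, retrieved)  # each gap becomes a follow-up query
```

Each returned gap then becomes the seed for a follow-up query in the next retrieval round.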
Multi-Step Retrieval Loops: Iterative Refinement
Multi-step retrieval loops allow the system to refine its search iteratively. Each retrieval step builds on the previous one, gradually narrowing down the scope of the query.
Here’s a simplified example using Python and FAISS for vector similarity search:
import faiss
from sentence_transformers import SentenceTransformer

# Initialize embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample documents (in practice, these would come from a database)
documents = [
    "Deforestation in the Amazon is driven by agricultural expansion and logging.",
    "Mitigation strategies include reforestation and stricter enforcement of environmental laws.",
    "Economic incentives for sustainable agriculture have been proposed as a solution."
]

# Embed documents. encode() returns float32 NumPy arrays by default, which is
# what FAISS expects (avoid convert_to_tensor=True here, since FAISS cannot
# consume torch tensors).
document_embeddings = model.encode(documents)
index = faiss.IndexFlatL2(document_embeddings.shape[1])
index.add(document_embeddings)

def retrieve(query, index, model, documents, top_k=2):
    query_embedding = model.encode([query])
    distances, indices = index.search(query_embedding, top_k)
    return [documents[i] for i in indices[0]]

# Initial retrieval
initial_query = "What are the primary causes of deforestation in the Amazon?"
initial_results = retrieve(initial_query, index, model, documents)
print("Initial retrieval results:", initial_results)

# Gap detection: assume we identify a need for mitigation strategies
follow_up_query = "What mitigation strategies have been proposed for Amazon deforestation?"
follow_up_results = retrieve(follow_up_query, index, model, documents)
print("Follow-up retrieval results:", follow_up_results)

In this example, the system first retrieves information on deforestation causes and then identifies a gap in mitigation strategies. It performs a second retrieval to address this gap, demonstrating the power of multi-step loops.
Query Decomposition: Breaking Down Complex Questions
Query decomposition is the process of splitting a complex question into smaller, more manageable sub-questions. Each sub-question is then addressed through targeted retrieval.
For example, consider the query: “How did the 2008 financial crisis impact the UK housing market, and what policy changes were implemented in response?” A traditional RAG system might retrieve documents on the financial crisis and housing market trends but may miss policy changes.
An agentic system would decompose the query into:
- “What was the impact of the 2008 financial crisis on the UK housing market?”
- “What policy changes were implemented in the UK in response to the 2008 financial crisis?”
Each sub-question is then addressed through separate retrieval steps, ensuring that all aspects of the original query are covered.
This approach is particularly valuable in domains where questions require cross-referencing multiple sources or resolving ambiguities.
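As a minimal sketch, decomposition can be approximated by splitting a compound question on its conjunctions; production systems typically prompt an LLM to generate the sub-questions instead. The regex and example below are illustrative only:

```python
import re

def decompose(query):
    """Naive decomposition: split a compound question on ', and' / ' and '.
    Real systems would prompt an LLM to produce the sub-questions."""
    parts = re.split(r",?\s+and\s+", query.rstrip("?"))
    return [part.strip() + "?" for part in parts if part.strip()]

sub_questions = decompose(
    "How did the 2008 financial crisis impact the UK housing market, "
    "and what policy changes were implemented in response?"
)
# Each sub-question is then answered through its own retrieval step.
```

Even this crude split recovers the two sub-questions listed above; the payoff of an LLM-based decomposer is handling questions whose parts are not joined by an explicit conjunction.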
Real-World Implementation Example: Building an Agentic RAG System
To illustrate how agentic RAG works in practice, let’s walk through a real-world implementation using Quartalis’ ecosystem tools. Suppose we’re building a customer support chatbot for a SaaS company. The chatbot needs to answer complex technical questions by retrieving relevant documentation and code examples.
Step 1: Set Up the Knowledge Base
We start by indexing technical documentation, FAQs, and code examples into a vector database. Using Quartalis’ semantic caching tools, we can reduce latency and costs by storing frequently accessed documents in a high-performance cache.
from quartalis import SemanticCache

# Initialize semantic cache
cache = SemanticCache(index_type='faiss', embedding_model='all-MiniLM-L6-v2')

# Index technical documentation
documents = ["...", "...", "..."]  # Replace with actual documentation
cache.index_documents(documents)

Step 2: Implement Gap Detection and Multi-Step Retrieval
The chatbot uses a language model to generate answers, but it also includes a gap detection mechanism. If the retrieved documents don’t provide sufficient context, the system automatically refines the query and performs additional retrievals.
def answer_query(query):
    # Initial retrieval
    initial_results = cache.retrieve(query, top_k=3)

    # Check for gaps (simplified example: a real system would use an LLM
    # or a coverage metric rather than a hard-coded keyword)
    if "error handling" not in " ".join(initial_results):
        follow_up_query = "How should errors be handled in this scenario?"
        follow_up_results = cache.retrieve(follow_up_query, top_k=2)
        initial_results.extend(follow_up_results)

    # Generate answer using the combined results
    # (generate_answer is a placeholder for your LLM call)
    answer = generate_answer(initial_results)
    return answer

Step 3: Query Decomposition for Complex Questions
For complex questions, the system decomposes the query into sub-questions and addresses each one separately. For example:
User Query: “How can I integrate the Quartalis API into my Python application, and what are the best practices for handling API rate limits?”
Decomposed Sub-Questions:
- “How can I integrate the Quartalis API into a Python application?”
- “What are the best practices for handling API rate limits?”
Each sub-question is addressed through targeted retrieval, ensuring that the final answer is comprehensive and accurate.
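The per-sub-question retrieval and merge step might look like the following sketch. The overlap-based retriever and sample documents are toy placeholders for illustration; in the chatbot above, `cache.retrieve` would play the retriever's role, and an LLM would synthesize the final answer from the merged evidence.

```python
import re

DOCS = [
    "The Quartalis API provides a RESTful interface for interacting with AI systems.",
    "API rate limits are enforced at 100 requests per minute per user.",
]

def tokens(text):
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_retrieve(query, docs, top_k=1):
    """Rank documents by word overlap with the query (embedding stand-in)."""
    return sorted(docs, key=lambda d: len(tokens(query) & tokens(d)),
                  reverse=True)[:top_k]

def answer_sub_questions(sub_questions):
    evidence = []
    for sub in sub_questions:            # targeted retrieval per sub-question
        for doc in toy_retrieve(sub, DOCS):
            if doc not in evidence:      # deduplicate shared evidence
                evidence.append(doc)
    return " ".join(evidence)            # in practice, an LLM synthesizes this

answer = answer_sub_questions([
    "How can I integrate the Quartalis API into a Python application?",
    "What are the best practices for handling API rate limits?",
])
```

Deduplicating before synthesis matters: sub-questions derived from one query often retrieve overlapping documents, and repeating them wastes context window.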
Implementation Details with Code
Let’s dive deeper into the code for an agentic RAG system. The following example uses Python, FAISS, and the sentence-transformers library to implement a basic agentic RAG pipeline.
1. Setting Up the Vector Database
import faiss
from sentence_transformers import SentenceTransformer

# Initialize embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample documents (replace with your own data)
documents = [
    "The Quartalis API provides a RESTful interface for interacting with AI systems.",
    "To integrate the API, use Python's `requests` library with the base URL `https://api.quartalis.co`.",
    "API rate limits are enforced at 100 requests per minute per user."
]

# Embed documents as float32 NumPy arrays (FAISS cannot consume torch tensors)
document_embeddings = model.encode(documents)
index = faiss.IndexFlatL2(document_embeddings.shape[1])
index.add(document_embeddings)

2. Gap Detection and Multi-Step Retrieval
def retrieve(query, index, model, documents, top_k=2):
    query_embedding = model.encode([query])
    distances, indices = index.search(query_embedding, top_k)
    return [documents[i] for i in indices[0]]

def answer_query(query, index, model, documents):
    # Initial retrieval
    initial_results = retrieve(query, index, model, documents)
    print("Initial results:", initial_results)

    # Gap detection: check whether the results are sufficient
    if "rate limits" not in " ".join(initial_results):
        follow_up_query = "How are API rate limits handled in Quartalis?"
        follow_up_results = retrieve(follow_up_query, index, model, documents)
        initial_results.extend(follow_up_results)

    # Generate answer (simplified; a real system would pass this evidence to an LLM)
    answer = " ".join(initial_results)
    return answer

3. Query Decomposition
For complex queries, we can prompt a language model to decompose the question into sub-questions. The sketch below leaves the model behind a generic `llm` callable, since any chat-completion API can fill that role:

def decompose_query(query, llm):
    # `llm` is any callable that maps a prompt string to generated text,
    # e.g. a thin wrapper around your chat-completion API of choice.
    prompt = (
        "Break the following question into self-contained sub-questions, "
        "one per line:\n" + query
    )
    response = llm(prompt)
    return [line.strip("- ").strip() for line in response.splitlines() if line.strip()]

This approach allows the system to dynamically break down complex queries into manageable parts, ensuring that all aspects are addressed.
What’s Next: Scaling Agentic RAG with Quartalis
Agentic RAG is still an emerging field, and there are many opportunities to refine and expand its capabilities. Future work could include:
- Self-hosting agentic systems: Using Quartalis’ self-hosting tools to deploy agentic RAG pipelines on-premises or in hybrid environments.
- Enhanced query decomposition: Leveraging advanced language models to improve the accuracy of sub-question generation.
- Integration with corrective RAG: Combining agentic RAG with corrective techniques to refine answers based on user feedback.
For more information on implementing agentic RAG systems, refer to the Quartalis documentation.
This guide provides a comprehensive overview of agentic RAG, from gap detection and multi-step retrieval to query decomposition and real-world implementation. By leveraging tools like Quartalis, developers can build powerful, adaptive systems that deliver accurate and context-aware answers.