Guide to Adaptive RAG Systems with LangGraph


Large language models answer questions using the knowledge they learned during training. This fixed knowledge base limits them: they can’t give you current or highly specific information. Retrieval-Augmented Generation (RAG) helps by letting LLMs pull in external data, but even RAG struggles with complex questions. Adaptive RAG offers a solution. It picks the best approach for answering each question: a direct answer from the LLM, a single data lookup, multiple data lookups, or even a web search. A classifier looks at the question’s complexity and decides the best method.

Here are the three answering strategies Adaptive RAG uses:

  • Straightforward Response: For easy questions, the LLM uses its own stored knowledge. This gives fast answers without needing to look up external information.
  • Single-step Approach: For questions that are a bit harder, the system gets information from one external source. This keeps the answer quick but also well-informed.
  • Multi-step Approach: For the most complex questions, the system looks at several sources. This builds a detailed and complete answer.
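To make the idea concrete, here is a minimal sketch of this three-way dispatch, assuming a LangChain-style chat model; the classify_complexity heuristic and the two retrieval helpers are hypothetical stand-ins, not part of any library:

def classify_complexity(question: str) -> str:
    """Stand-in heuristic; in practice a small trained LM predicts this."""
    n = len(question.split())
    return "simple" if n < 6 else "moderate" if n < 15 else "complex"

def retrieve_once(question: str) -> str:
    return "<top documents from one retrieval pass>"  # placeholder

def retrieve_iteratively(question: str) -> str:
    return "<documents accumulated over several retrieval rounds>"  # placeholder

def answer(question: str, llm) -> str:
    complexity = classify_complexity(question)
    if complexity == "simple":
        # Straightforward response: the LLM answers from its own knowledge.
        return llm.invoke(question).content
    # Single-step retrieval for moderate questions, multi-step for complex ones.
    context = (retrieve_once(question) if complexity == "moderate"
               else retrieve_iteratively(question))
    return llm.invoke(f"Context: {context}\n\nQuestion: {question}").content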

Why Does Adaptive-RAG Work So Well?

Adaptive-RAG is flexible. It handles each question efficiently, using only the resources needed. This saves computing power and makes users happier because they get precise answers quickly. Adaptive-RAG also adapts its strategy to each question’s complexity, which reduces the chance of outdated or overly general answers. It gives current, specific, and reliable answers.

Improving Accuracy

Adaptive-RAG is highly accurate because it gauges the complexity of each question and adapts. By matching the retrieval method to the question, it keeps answers relevant and draws on the newest available information. This adaptability sets a new standard for accurate automated question answering.

Figure: QA performance (F1) and efficiency (time per query) for different retrieval-augmented generation approaches, from the Adaptive-RAG research paper.

The image from the research paper compares different RAG approaches. Adaptive RAG excels in both speed and performance.

What is Adaptive RAG?

Adaptive Retrieval-Augmented Generation (Adaptive RAG) is a RAG architecture designed to improve how large language models (LLMs) handle queries. It adjusts the strategy for answering questions based on their complexity.

The Key Idea of Adaptive RAG

Adaptive RAG selects the best approach for answering a query, whether it’s simple or complex. A classifier evaluates the query and decides whether to use a straightforward method, a single-step retrieval, or a multi-step process.

Comparison with other approaches

As shown in the image above, here is how Adaptive RAG compares with the single-step and multi-step approaches.

1. Single-Step Approach

This method retrieves information in one pass and hands it to the LLM to generate an answer directly. While it works well for simple questions like “What is Agentic RAG?”, it struggles with complex queries such as “What is the use of Agentic RAG in the healthcare sector?”, because it cannot reason over the retrieved information.

2. Multi-Step Approach

This method processes all queries through multiple retrieval steps, refining answers iteratively. While effective for complex queries, it wastes resources on simple questions. For instance, repeatedly processing “What is Agentic RAG?” is inefficient.

3. Adaptive Approach

A classifier determines the complexity of the query and selects the appropriate strategy:

• Straightforward Query: Generates an answer without retrieval (e.g., “What is the capital of India?”).
• Simple Query: Uses a single-step retrieval.
• Complex Query: Employs multi-step retrieval for detailed reasoning.

This approach avoids unnecessary processing for simple queries while ensuring accuracy for complex ones.

Adaptive RAG Framework

• Classifier Role: A smaller language model predicts the complexity of the query, learning from past outcomes and patterns in the data (a rough stand-in is sketched below).
• Dynamic Strategy Selection: The framework conserves resources for simple queries and ensures thorough reasoning for complex ones.
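The paper trains a dedicated smaller model as this classifier. As a rough prompt-based stand-in, the snippet below uses the same structured-output pattern this guide later applies to the router; the label set and prompt are illustrative assumptions, not the paper’s trained classifier:

from typing import Literal
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

class QueryComplexity(BaseModel):
    """Predicted complexity label for an incoming query."""
    complexity: Literal["straightforward", "simple", "complex"] = Field(
        description="straightforward = answer directly, simple = single-step "
                    "retrieval, complex = multi-step retrieval"
    )

classifier_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
complexity_classifier = (
    ChatPromptTemplate.from_messages([
        ("system", "Classify the complexity of the user question."),
        ("human", "{question}"),
    ])
    | classifier_llm.with_structured_output(QueryComplexity)
)

print(complexity_classifier.invoke({"question": "What is the capital of India?"}).complexity)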

RAG System Architecture Flow from LangGraph

Here’s another example of an adaptive RAG system architecture flow, from LangGraph:

Adaptive RAG System Architecture Flow:

        1. Query Analysis

        The process begins with analyzing the user query to determine the most appropriate pathway for retrieving and generating the answer.

        •  Route Determination
          • The query is classified into categories based on its relevance to the existing index (database or vector store).
          • Related to Index: If the query is aligned with the indexed content, it is routed to the RAG module for retrieval and generation.
          • Unrelated to Index: If the query is outside the scope of the index, it is routed for a web search or another external knowledge source.
        • Optional Routes: Additional pathways can be added for more specialized scenarios, such as domain-specific tools or external APIs.

        2. RAG + Self-Reflection

        If the query is routed through the RAG module, it undergoes an iterative, self-reflective process to ensure high-quality and accurate responses.

        • Retrieve Node
          • Retrieves documents from the indexed database based on the query.
          • These documents are passed to the next stage for evaluation.
        • Grade Node
          • Assesses the relevance of the retrieved documents.
          • Decision Point:
            • If documents are relevant: Proceed to generate an answer.
            • If documents are irrelevant: The query is rewritten for better retrieval and the process loops back to the retrieve node.
        • Generate Node
          • Generates a response based on the relevant documents.
          • The generated response is evaluated further to ensure accuracy and relevance.
        • Self-Reflection Steps
          • Does it answer the question?
            • If yes: The process ends, and the answer is returned to the user.
            • If no: The query undergoes another iteration, potentially with additional refinements.
          • Hallucinations Check
            • If hallucinations are detected (inaccuracies or made-up facts): The query is rewritten, or additional retrieval is triggered for correction.
        • Re-write Question Node
          • Refines the query for better retrieval results and loops it back into the process.
          • This ensures that the model adapts dynamically to handle edge cases or incomplete data.

        3. Web Search for Unrelated Queries

        If the query is deemed unrelated to the indexed knowledge base during the Query Analysis stage:

        • Generate Node with Web Search: The system directly performs a web search and uses the retrieved data to generate a response.
        • Answer with Web Search: The generated response is delivered directly to the user.

        In summary, Adaptive RAG is a smart framework that enhances response quality and efficiency by adapting to the complexity of queries. Now let’s build a RAG application using the Adaptive-RAG strategy. Our application sends user questions to the best data source. Here’s the setup:

        Implementation Steps

        Here’s a breakdown of how we build the application using the LangGraph framework:

1. Indexing Documents: We start by indexing documents into ChromaDB, a vector database integrated with LangChain. This lets us quickly find documents relevant to a user’s question.
        2. Routing: The key part of our application is sending each question to the right place. This could be our web search or our internal vector database.
        3. Retrieval Grader: We use a retrieval grader to check if the retrieved documents are relevant. It gives a simple “yes” or “no” score.
        4. Generation: To create answers, we combine the prompt, the language model (LLM), and the data from the vector database or web search. This builds a clear and relevant response.
        5. Hallucination Grader: To make sure the information is correct, a hallucination grader checks if the answer is based on facts. It also gives a “yes” or “no” score.
        6. Answer Grader: An answer grader checks if the response fully answers the user’s question.
        7. Question Rewriter: This part improves the question. It rewrites the question to make it work better with the vector database.
        8. Web Search: The system finds relevant information online with the paraphrased question.
        9. Graph Representation using LangGraph: Finally, we use LangGraph to show how the application works. We define nodes and build a graph that shows the whole process.

        Building an Adaptive RAG System with Self-Reflection

We will now implement the Adaptive RAG system discussed so far using LangGraph. We will load some of our blog content into the ChromaDB vector database using PyPDFLoader, and use the Tavily Search tool for web search. Connections to LLMs and prompting will be handled with LangChain, and the agent will be built using LangGraph. For our LLM, we will use OpenAI’s GPT-4o, a powerful model with native support for tool calling. Although any other LLM could be used, it is advisable to pick one fine-tuned for tool calling.

        Workflow of Adaptive RAG

        Workflow of Adaptive RAG
        Source: Author

This image explains the workflow of the adaptive RAG system.

        Install OpenAI, ChromaDB, LangGraph and LangChain Dependencies

!pip install -U langchain_community tiktoken langchain-openai langchainhub langchain_chroma langchain langgraph tavily-python pypdf

        Set up OpenAI and Tavily API

        from getpass import getpass

        OPENAI_KEY = getpass('Enter Open AI API Key: ')

        TAVILY_API_KEY = getpass('Enter Tavily Search API Key: ')

        Setting up environment variables

        import os
        os.environ['OPENAI_API_KEY'] = OPENAI_KEY
        os.environ['TAVILY_API_KEY'] = TAVILY_API_KEY

        Set up OpenAI embeddings

        from langchain.text_splitter import RecursiveCharacterTextSplitter
        from langchain_community.document_loaders import WebBaseLoader
        from langchain_community.vectorstores import Chroma
        from langchain_openai import OpenAIEmbeddings
        openai_embed_model = OpenAIEmbeddings(model="text-embedding-3-small")
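As a quick optional sanity check, we can embed a sample string and inspect the vector size (text-embedding-3-small produces 1536-dimensional vectors):

# Embed a sample query and check the dimensionality of the returned vector
sample_vector = openai_embed_model.embed_query("adaptive retrieval augmented generation")
print(len(sample_vector))  # expected: 1536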

        Load Knowledge Base

We have downloaded the blog PDFs and made them available in a folder named “pdf” on Google Drive; you can download it from here.

Google Drive link.

Now upload the “pdf” folder to Colab.

After loading the PDFs, docs_list will contain the documents loaded from the files in the directory.

        from langchain_community.document_loaders import DirectoryLoader
        from langchain_community.document_loaders import PyPDFLoader
        
        
        pdf_directory = "/content/pdf"
        
        
        # Create a DirectoryLoader with PyPDFLoader as the loader
        loader = DirectoryLoader(pdf_directory, glob="**/*.pdf", loader_cls=PyPDFLoader)
        
        
        docs_list = loader.load()
        
        
        # Now, docs_list contains the documents loaded from the PDFs in the directory
        print(f"Loaded {len(docs_list)} documents.")
        
        
        # Example: Print the first few pages of the first document (if it has pages)
        if docs_list:
           first_doc = docs_list[0]
           if hasattr(first_doc, 'page_content'):
               print(first_doc.page_content[:500])  # Print the first 500 characters of the first page

        Output

        Loaded 60 documents.
        
        5 Types of AI Agents that you Must Know About
        
        Introduction
        
        What if machines could make their own decisions, solve problems, and adapt to new
         situations just like wedo? This would potentially lead to a world where artificial
         intelligence becomes not just a tool but acollaborator. That’s exactly what AI
         agents aim to achieve! These smart systems are designed tounderstand their 
        surroundings, process information, and act independently to accomplish specific
         tasks.

        Let’s think about your daily life—w

        Performing the Chunking using RecursiveCharacterTextSplitter

        splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=300)
        chunked_docs = splitter.split_documents(docs_list)
        chunked_docs[67].page_content

        Output

        # Instantiate tools docs_tool = 
        DirectoryReadTool(directory='/home/xy/VS_Code/Ques_Ans_Gen/blog-posts')\nfile_tool =
        FileReadTool() search_tool = SerperDevTool() web_rag_tool =
        WebsiteSearchTool()\nExplanations\ndocs_tool: Reads files from the specified
        directory(/home/badrinarayan/VS_Code/Ques_Ans_Gen/blog-posts). This could be used
        for reading past blogposts to help in writing new ones.\nfile_tool: Reads
        individual files, which could come in handy for retrieving research materials or
        drafts.\nsearch_tool: Performs web searches using the Serper API to gather data on
        market trends in AI.\nweb_rag_tool: Searches for specific website content to assist
        in research.\n# Create agents researcher = Agent( role="Market Research Analyst"
        , goal="Provide 2024 market analysis of the\nAI industry", backstory='An expert
        analyst with a keen eye for market trends.', tools=[search_tool,\nweb_rag_tool],
        verbose=True )\nExplanations\nResearcher: This agent conducts market research. It
        uses the search_too

        Creating a Vector Store using Chroma

Storing vectors in ChromaDB using an HNSW index with cosine distance for better retrieval:

        from langchain_chroma import Chroma
        chroma_db = Chroma.from_documents(documents=chunked_docs,
                                         collection_name="rag_db",
                                         embedding=openai_embed_model,
        # need to set the distance function to cosine else it uses Euclidean by default
        # check https://docs.trychroma.com/guides#changing-the-distance-function
                                         collection_metadata={"hnsw:space": "cosine"},
                                         persist_directory="./rag_db")

Here we set up a similarity_threshold_retriever that returns the top chunks whose similarity score with the user query exceeds a threshold.

        similarity_threshold_retriever = chroma_db.as_retriever(search_type="similarity_score_threshold",
                              search_kwargs={"k": 3,
                              "score_threshold": 0.3})

Let’s Test Our Retriever

        query = "what is AI agents?"
        top3_docs = similarity_threshold_retriever.invoke(query)
        top3_docs

        Output

        [Document(id='f808a58f-69b9-4442-86b4-fdf3247bdffa', metadata={'creationdate':
         '2025-03-06T11:05:04+00:00', 'creator': 'Chromium', 'moddate': '2025-03-
        06T11:05:04+00:00', 'page': 0, 'page_label': '1', 'producer': 'Skia/PDF m83',
         'source': '/content/pdf/Understanding LangChain Agent Framework.pdf', 
        'total_pages': 9}, page_content="AI Agent Terms\nHere’s the list of AI Agent
         terms:\n1. AI Agent\nAn AI agent is a type of digital assistant or robot that
         perceives its environment, processes it, and performsactions to accomplish
         particular goals. An example is a self-driving vehicle that has cameras and
         sensorsto drive on the road. It is an AI that is working to keep you safe in real
         time.\n2. Autonomous Agent\nAn autonomous agent is an AI that operates on its own
         and does not require human supervision. Anexample would be a package delivery drone 
        that decides on a specific route, avoids obstacles, and makesthe delivery without 
        any help.\nA I  A G E N T S\nA R T I F I C I A L  I N T E L L I G E N C E\nB E G I N
         N E R"),
        
        Document(id='eef333a4-e2b4-4876-924a-e02ab77052a9', metadata={'creationdate': '2025-
        03-06T11:05:04+00:00', 'creator': 'Chromium', 'moddate': '2025-03-
        06T11:05:04+00:00', 'page': 0, 'page_label': '1', 'producer': 'Skia/PDF m83', 
        'source': '/content/pdf/Understanding LangChain Agent Framework.pdf', 'total_pages':
         9}, page_content="AI Agent Terms\nHere’s the list of AI Agent terms:\n1. AI
         Agent\nAn AI agent is a type of digital assistant or robot that perceives its
         environment, processes it, and performsactions to accomplish particular goals. An
         example is a self-driving vehicle that has cameras and sensorsto drive on the road.
         It is an AI that is working to keep you safe in real time.\n2. Autonomous Agent\nAn
         autonomous agent is an AI that operates on its own and does not require human
         supervision. Anexample would be a package delivery drone that decides on a specific
         route, avoids obstacles, and makesthe delivery without any help.\nA I  A G E N T
         S\nA R T I F I C I A L  I N T E L L I G E N C E\nB E G I N N E R"),
        
        
        Document(id='2a9a3be4-2a61-4bed-8e46-804f34dc624c', metadata={'creationdate': '2024-
        07-10T10:50:06+00:00', 'creator': 'Chromium', 'moddate': '2024-07-
        
        10T10:50:06+00:00', 'page': 1, 'page_label': '2', 'producer': 'Skia/PDF m83',
         'source': '/content/pdf/Build a Customized AI Agent Without Code Using 
        Relevance.ai.pdf', 'total_pages': 11}, page_content="Frequently Asked
         Questions\nUnderstanding AI Agents\nAI agents are self-governing creatures that
         employ sensors to keep an eye on their environment, processinformation, and 
        accomplish predefined goals. They can be anything from basic bots to 
        sophisticatedsystems that can adjust and learn over time. Typical instances include
         recommendation engines likeNetflix and Amazon’s, chatbots like Siri and Alexa, and
         self-driving cars from Tesla and Waymo.\nAlso essential in a number of sectors are these agents: UiPath and Blue Prism are examples of roboticprocess automation (RPA)
         programs that automate repetitive processes. DeepMind and IBM Watson Healthare 
        examples of healthcare diagnostics systems that help diagnose diseases and recommend
         treatments.In their domains, AI agents greatly improve productivity, precision, and
         customisation.\nWhy AI Agents are Important?\nThese agents play a critical role in
         improving our daily lives and accomplishing particular objectives.\nAI agents are
         significant because they can:\nlowering the amount of human labor required to 
        complete routine operations, resulting in increasedproduction and
         efficiency.\nanalyzing enormous volumes of data to offer conclusions and suggestions that support decision-making.\nutilizing chatbots and virtual assistants
         to provide individualized interactions and assistance.\nenabling complex 
        applications in industries like as banking, transportation, and healthcare.\nIn
         essence, AI agents are pivotal in driving the next wave of technological
         advancements, making systemssmarter and more responsive to user
         needs.\nApplications and Use Cases of AI Agents\nAI agents have a wide range of
         applications across various industries. Here are some notable use cases:\nCustomer
         Service: AI agents in the form of chatbots and virtual assistants handle customer
         inquiries,resolve issues, and provide personalized support. They can operate 24/7,
         offering consistent andefficient service.")]
        query = "what is the capital of USA?"
        top3_docs = similarity_threshold_retriever.invoke(query)
        top3_docs

        Let’s now try a question that is out of context, such that no context documents related to the question are there in the vector database.

        Output

        []

        We can see that our RAG system is working well.

Creating the Router

It routes the user query to either web search or the vector store, depending on the nature of the query.

        from typing import Literal
        from langchain_core.prompts import ChatPromptTemplate
        from langchain_core.pydantic_v1 import BaseModel, Field
        from langchain_openai import ChatOpenAI
        
        
        # Data model
        class RouteQuery(BaseModel):
           """Route a user query to the most relevant datasource."""
           datasource: Literal["vectorstore", "web_search"] = Field(
               description="Given a user question choose to route it to web search or a vectorstore.",
           )
        # LLM with function call
        llm = ChatOpenAI(model="gpt-4o", temperature=0)
        structured_llm_router = llm.with_structured_output(RouteQuery)
        # Prompt
        system = """You are an expert at routing a user question to a vectorstore or web search.
        The vectorstore contains documents related to AI agents, Agentic Patterns and Types of AI agents.
        Use the vectorstore for questions on these topics. Otherwise, use web-search."""
        route_prompt = ChatPromptTemplate.from_messages(
           [
               ("system", system),
               ("human", "{question}"),
           ]
        )
        question_router = route_prompt | structured_llm_router
        print(question_router.invoke({"question": "Who won the IPL 2024?"}))
        print(question_router.invoke({"question": "What are the types of AI agents?"}))

        Output

        datasource="web_search"

        datasource="vectorstore"

Retrieval Grader

It indicates, with a simple “yes” or “no” output, whether each retrieved document is relevant to the query.

        # Data model
        class GradeDocuments(BaseModel):
           """Binary score for relevance check on retrieved documents."""
        
        
           binary_score: str = Field(
               description="Documents are relevant to the question, 'yes' or 'no'"
           )
        
        # LLM with function call
        llm = ChatOpenAI(model="gpt-4o", temperature=0)
        structured_llm_grader = llm.with_structured_output(GradeDocuments)
        
        
        # Prompt
        system = """You are a grader assessing the relevance of a retrieved document to a user question. \n
           If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
           It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
           Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question."""
        grade_prompt = ChatPromptTemplate.from_messages(
           [
               ("system", system),
               ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
           ]
        )
        
        
        retrieval_grader = grade_prompt | structured_llm_grader
        question = "agentic patterns"
        docs = similarity_threshold_retriever.invoke(question)
        doc_txt = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))

        Output

        binary_score="yes"

        Generation Node

This piece of code fetches the RAG prompt from the LangChain hub and then generates the answer using the LLM:

        from langchain import hub
        from langchain_core.output_parsers import StrOutputParser
        
        
        # Prompt
        prompt = hub.pull("rlm/rag-prompt")
        
        
        # LLM
        llm = ChatOpenAI(model_name="gpt-4o", temperature=0)
        
        # Post-processing
        def format_docs(docs):
           return "\n\n".join(doc.page_content for doc in docs)
        
        # Chain
        rag_chain = prompt | llm | StrOutputParser()
        
        # Run
        generation = rag_chain.invoke({"context": docs, "question": question})
        print(generation)

        Output

        Agentic patterns in AI design refer to strategies that enhance AI systems' autonomy
        and effectiveness. The key agentic design patterns include the Reflection Pattern,
        which focuses on self-evaluation and refinement of outputs, Tool Use, Planning, and
        Multi-Agent Collaboration. These patterns enable AI models to perform complex tasks
        by encouraging self-evaluation, tool integration, strategic thinking, and collaboration.

        Hallucination Grader

Here we check whether the generated answer contains hallucinations. This is a self-reflection step:

        # Data model
        class GradeHallucinations(BaseModel):
           """Binary score for hallucination present in generated answer."""
        
        
           binary_score: str = Field(
               description="Answer is grounded in the facts, 'yes' or 'no'"
           )
        
        # LLM with function call
        llm = ChatOpenAI(model="gpt-4o", temperature=0)
        structured_llm_grader = llm.with_structured_output(GradeHallucinations)
        
        
        # Prompt
        system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n
            Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""
        hallucination_prompt = ChatPromptTemplate.from_messages(
           [
               ("system", system),
               ("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"),
           ]
        )
        
        
        hallucination_grader = hallucination_prompt | structured_llm_grader
        hallucination_grader.invoke({"documents": docs, "generation": generation})

        Output

        GradeHallucinations(binary_score="yes")

        Answer Grader

It checks whether the generated answer actually addresses the question asked:

        # Data model
        class GradeAnswer(BaseModel):
           """Binary score to assess answer addresses question."""
        
        
           binary_score: str = Field(
               description="Answer addresses the question, 'yes' or 'no'"
           )
        
        
        # LLM with function call
        llm = ChatOpenAI(model="gpt-4o", temperature=0)
        structured_llm_grader = llm.with_structured_output(GradeAnswer)
        
        
        # Prompt
        system = """You are a grader assessing whether an answer addresses / resolves a question \n
    Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question."""
        answer_prompt = ChatPromptTemplate.from_messages(
           [
               ("system", system),
               ("human", "User question: \n\n {question} \n\n LLM generation: {generation}"),
           ]
        )
        
        
        answer_grader = answer_prompt | structured_llm_grader
        answer_grader.invoke({"question": question, "generation": generation})

        Output

        GradeAnswer(binary_score="no")

        Question Re-Writer

It rewrites the question into a version that carries more semantic information; this is used when the answer grader tags the answer as not useful, or when no relevant documents are retrieved.

        # LLM
        llm = ChatOpenAI(model="gpt-4o", temperature=0)
        
        
        # Prompt
        system = """You a question re-writer that converts an input question to a better version that is optimized \n
            for vectorstore retrieval. Look at the input and try to reason about the underlying semantic intent / meaning."""
        re_write_prompt = ChatPromptTemplate.from_messages(
           [
               ("system", system),
               (
                   "human",
                   "Here is the initial question: \n\n {question} \n Formulate an improved question.",
               ),
           ]
        )
        
        
        question_rewriter = re_write_prompt | llm | StrOutputParser()
        question_rewriter.invoke({"question": question})

        Output

        ‘What are agentic patterns and how do they influence behavior or decision-making?’

Defining the Search Tool

Here the Tavily search tool is defined, which will be used to search the web:

        from langchain_community.tools.tavily_search import TavilySearchResults
        
        web_search_tool = TavilySearchResults(k=3)
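Optionally, we can give the tool a quick standalone try; each result is a dict containing (among other fields) a "content" snippet, which the web_search node below joins into a single Document. The sample query here is just an illustration:

# Sample query to sanity-check the Tavily tool
results = web_search_tool.invoke({"query": "Who won the IPL 2024?"})
for r in results:
    print(r["content"][:120])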

        Constructing the Graph using LangGraph

Capturing the flow as a graph.

        Defining the GraphState

• question: query of the user
• generation: LLM-generated answer
• documents: list of retrieved documents

from typing import List
        from typing_extensions import TypedDict
        
        class GraphState(TypedDict):
           """
           Represents the state of our graph.
        
        
           Attributes:
               question: question
               generation: LLM generation
               documents: list of documents
           """
        
        
           question: str
           generation: str
           documents: List[str]

Creating functions for the graph’s nodes:

• retrieve: Adds the retrieved documents to the graph state
• generate: Adds the generated answer to the graph state
• grade_documents: Updates the documents key with only the relevant documents
• transform_query: Rephrases the query to capture more semantic meaning
• web_search: Searches the web with the rephrased question and updates the documents key accordingly

from langchain.schema import Document
        def retrieve(state):
           """
           Retrieve documents
           Args:
               state (dict): The current graph state
        
        
           Returns:
               state (dict): New key added to state, documents, that contains retrieved documents
           """
           print("---RETRIEVE---")
           question = state["question"]
        
        
           # Retrieval
           documents = similarity_threshold_retriever.invoke(question)
           return {"documents": documents, "question": question}
        
        
        def generate(state):
           """
           Generate answer
        
        
           Args:
               state (dict): The current graph state
        
           Returns:
               state (dict): New key added to state, generation, that contains LLM generation
           """
           print("---GENERATE---")
           question = state["question"]
           documents = state["documents"]
        
        
           # RAG generation
           generation = rag_chain.invoke({"context": documents, "question": question})
           return {"documents": documents, "question": question, "generation": generation}
        
        def grade_documents(state):
           """
           Determines whether the retrieved documents are relevant to the question.
        
        
           Args:
               state (dict): The current graph state
        
        
           Returns:
               state (dict): Updates documents key with only filtered relevant documents
           """
        
        
           print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
           question = state["question"]
           documents = state["documents"]
        
        
           # Score each doc
           filtered_docs = []
           for d in documents:
               score = retrieval_grader.invoke(
                   {"question": question, "document": d.page_content}
               )
               grade = score.binary_score
               if grade == "yes":
                   print("---GRADE: DOCUMENT RELEVANT---")
                   filtered_docs.append(d)
               else:
                   print("---GRADE: DOCUMENT NOT RELEVANT---")
                   continue
           return {"documents": filtered_docs, "question": question}
        
        def transform_query(state):
           """
           Transform the query to produce a better question.
        
        
           Args:
               state (dict): The current graph state
        
           Returns:
               state (dict): Updates question key with a re-phrased question
           """
        
        
           print("---TRANSFORM QUERY---")
           question = state["question"]
           documents = state["documents"]
        
        
           # Re-write question
           better_question = question_rewriter.invoke({"question": question})
           return {"documents": documents, "question": better_question}
        
        
        def web_search(state):
           """
           Web search based on the re-phrased question.
        
           Args:
               state (dict): The current graph state
        
        
           Returns:
               state (dict): Updates documents key with appended web results
           """
           print("---WEB SEARCH---")
           question = state["question"]
        
   # Web search
   docs = web_search_tool.invoke({"query": question})
   # Join the "content" snippets of all results into a single Document;
   # this path skips document grading and goes straight to the generate node
   web_results = "\n".join([d["content"] for d in docs])
   web_results = Document(page_content=web_results)


   return {"documents": web_results, "question": question}

Creating Edge Functions

• route_question: Directs the question to web search or RAG
• decide_to_generate: Decides whether to generate the answer or transform the query
• grade_generation_v_documents_and_question: Checks whether the generated answer is grounded in the documents and addresses the question
        
        def route_question(state):
           """
           Route question to web search or RAG.
        
        
           Args:
               state (dict): The current graph state
        
        
           Returns:
               str: Next node to call
           """
        
        
           print("---ROUTE QUESTION---")
           question = state["question"]
           source = question_router.invoke({"question": question})
           if source.datasource == "web_search":
               print("---ROUTE QUESTION TO WEB SEARCH---")
               return "web_search"
           elif source.datasource == "vectorstore":
               print("---ROUTE QUESTION TO RAG---")
               return "vectorstore"
        
        
        def decide_to_generate(state):
           """
           Determines whether to generate an answer, or re-generate a question.
        
           Args:
               state (dict): The current graph state
        
           Returns:
               str: Binary decision for next node to call
           """
        
        
           print("---ASSESS GRADED DOCUMENTS---")
           state["question"]
           filtered_documents = state["documents"]
        
        
           if not filtered_documents:
               # All documents have been filtered check_relevance
               # We will re-generate a new query
               print(
                   "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---"
               )
               return "transform_query"
           else:
               # We have relevant documents, so generate answer
               print("---DECISION: GENERATE---")
               return "generate"
        
        
        def grade_generation_v_documents_and_question(state):
           """
           Determines whether the generation is grounded in the document and answers question.
        
        
           Args:
               state (dict): The current graph state
        
        
           Returns:
               str: Decision for next node to call
           """
        
        
           print("---CHECK HALLUCINATIONS---")
           question = state["question"]
           documents = state["documents"]
           generation = state["generation"]
        
        
           score = hallucination_grader.invoke(
               {"documents": documents, "generation": generation}
           )
           grade = score.binary_score
        
        
           # Check hallucination
           if grade == "yes":
               print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
               # Check question-answering
               print("---GRADE GENERATION vs QUESTION---")
               score = answer_grader.invoke({"question": question, "generation": generation})
               grade = score.binary_score
               if grade == "yes":
                   print("---DECISION: GENERATION ADDRESSES QUESTION---")
                   return "useful"
               else:
                   print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
                   return "not useful"
           else:
               pprint("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
               return "not supported"
        

        Defining the Graph Architecture

Let’s walk through the workflow description below:

        Initial Routing:

        • The workflow begins at the START node.
        • The route_question function determines whether to proceed with a web_search or a retrieve operation based on the incoming question.

        Data Acquisition and Processing:

        • If web_search is chosen, the system performs a web search and then moves to the generate node.
        • If retrieve is chosen, the system retrieves relevant documents from a vector store, which are then passed to the grade_documents node.
        • The grade_documents node then determines if the flow should go to transform_query or to generate.
• If the flow goes to transform_query, the query is transformed and sent back to the retrieve node.

        Content Generation and Evaluation:

        • The generate node creates a response based on the information gathered.
        • The grade_generation_v_documents_and_question function assesses the generated content.

        Decision and Termination:

• If the generated content is deemed useful, the workflow reaches the END node, concluding the process.
• If the generated content is deemed not useful, the workflow goes to the transform_query node.
• If the generation is not grounded in the documents, the generate node is retried.

from langgraph.graph import END, StateGraph, START
        
        
        workflow = StateGraph(GraphState)
        
        
        # Define the nodes
        workflow.add_node("web_search", web_search)  # web search
        workflow.add_node("retrieve", retrieve)  # retrieve
        workflow.add_node("grade_documents", grade_documents)  # grade documents
        workflow.add_node("generate", generate)  # generatae
        workflow.add_node("transform_query", transform_query)  # transform_query
        
        
        # Build graph
        workflow.add_conditional_edges(
           START,
           route_question,
           {
               "web_search": "web_search",
               "vectorstore": "retrieve",
           },
        )
        workflow.add_edge("web_search", "generate")
        workflow.add_edge("retrieve", "grade_documents")
        workflow.add_conditional_edges(
           "grade_documents",
           decide_to_generate,
           {
               "transform_query": "transform_query",
               "generate": "generate",
           },
        )
        workflow.add_edge("transform_query", "retrieve")
workflow.add_conditional_edges(
   "generate",
   grade_generation_v_documents_and_question,
   {
       "not supported": "generate",  # retry when generation is not grounded
       "useful": END,
       "not useful": "transform_query",
   },
)
        
        
        # Compile
        app = workflow.compile()
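Besides streaming node by node (shown below), the compiled graph can also be invoked directly to get the final state in one call; the question here is just a sample from our indexed content:

# One-shot invocation; returns the final graph state as a dict
final_state = app.invoke({"question": "What are the types of AI agents?"})
print(final_state["generation"])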

        Workflow Graph

Printing the workflow graph using draw_mermaid:

        from IPython.display import Image, display, Markdown
        display(Image(app.get_graph().draw_mermaid_png()))
        Workflow Graph

        This image depicts the workflow of Adaptive RAG

        Testing the Agent

        from pprint import pprint
        
        
        # Run
        inputs = {
           "question": "What are the types of Agentic Patterns?"
        }
        for output in app.stream(inputs):
           for key, value in output.items():
               # Node
               pprint(f"Node '{key}':")
               # Optional: print full state at each node
               # pprint.pprint(value["keys"], indent=2, width=80, depth=None)
           pprint("\n---\n")
        
        
        # Final generation
        pprint(value["generation"])

        Output

        ---ROUTE QUESTION---
        ---ROUTE QUESTION TO RAG---
        ---RETRIEVE---
        "Node 'retrieve':"
        '\n---\n'
        ---CHECK DOCUMENT RELEVANCE TO QUESTION---
        ---GRADE: DOCUMENT RELEVANT---
        ---GRADE: DOCUMENT RELEVANT---
        ---GRADE: DOCUMENT RELEVANT---
        ---ASSESS GRADED DOCUMENTS---
        ---DECISION: GENERATE---
        "Node 'grade_documents':"
        '\n---\n'
        ---GENERATE---
        ---CHECK HALLUCINATIONS---
        ---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
        ---GRADE GENERATION vs QUESTION---
        ---DECISION: GENERATION ADDRESSES QUESTION---
        "Node 'generate':"
        '\n---\n'
        ('The types of Agentic Patterns are Reflection, Tool Use, Planning, and '
         'Multi-Agent Collaboration.')
Now let’s try a question that should route to the web search:

from pprint import pprint
        
        
        # Run
        inputs = {
           "question": "What player at the IPL 2024 hit maximum sixes?"
        }
        for output in app.stream(inputs):
           for key, value in output.items():
               # Node
               pprint(f"Node '{key}':")
               # Optional: print full state at each node
               # pprint.pprint(value["keys"], indent=2, width=80, depth=None)
           pprint("\n---\n")
        
        
        # Final generation
        pprint(value["generation"])

Output

        ---ROUTE QUESTION---
        ---ROUTE QUESTION TO WEB SEARCH---
        ---WEB SEARCH---
        "Node 'web_search':"
        '\n---\n'
        ---GENERATE---
        ---CHECK HALLUCINATIONS---
        ---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
        ---GRADE GENERATION vs QUESTION---
        ---DECISION: GENERATION ADDRESSES QUESTION---
        "Node 'generate':"
        '\n---\n'
        ('Abhishek Sharma from Sunrisers Hyderabad hit the maximum sixes in IPL 2024, '
         'with a total of 42 sixes.')

Hence, we can see that our Adaptive RAG agent is working well: it uses its self-reflection capabilities to reason about the relevance of the LLM’s output, and falls back on the web search tool when the retrieved documents are not good enough.

        Conclusion

Adaptive RAG with LangGraph is an important advancement in creating intelligent question-answering systems. It selects the best retrieval method based on how complex a question is, which improves both speed and accuracy. This approach overcomes the limitations of traditional language models and basic retrieval-augmented generation systems, providing users with relevant, current, and trustworthy information.

        The examples and code resources available show how to implement Adaptive RAG Systems with LangGraph. This makes it easier to develop more effective and user-friendly AI applications. As AI technology progresses, methods like Adaptive RAG will be essential for building systems that can understand and respond to human questions more effectively.

        Harsh Mishra is an AI/ML Engineer who spends more time talking to Large Language Models than actual humans. Passionate about GenAI, NLP, and making machines smarter (so they don’t replace him just yet). When not optimizing models, he’s probably optimizing his coffee intake. 🚀☕
