AI Agents are the next evolution of large language models (LLMs). They are not just chatbots that respond to a single prompt; they are intelligent systems equipped with tools, memory, and a planning loop that allows them to take complex actions and achieve multi-step goals.
This guide walks you through the foundational steps of building a simple yet powerful AI Agent using the Retrieval-Augmented Generation (RAG) pattern, which allows the agent to consult a private knowledge base before answering a question.
The Foundation: Setting Up the Environment
The first step in any AI development project is setting up the necessary libraries. We will use LangChain to orchestrate the agent's components and other libraries for data handling and model interaction.
The following code installs the core LangChain packages, including those for Google's Gemini models (langchain-google-genai), Groq's fast LLMs (langchain-groq), and PDF handling (pypdf).
!pip install -qU langchain langchain-core langchain-groq langchain-chroma langchain-google-genai langchain-community pypdf
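The embedding and chat models used below also need API keys. A minimal sketch, assuming you keep the keys in environment variables named GEMINI_API_KEY and GROQ_API_KEY (adapt this to however you manage secrets, such as a .env file or Colab secrets):
import os
# Assumption: the keys were exported beforehand as environment variables
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]
GROQ_API_KEY = os.environ["GROQ_API_KEY"]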
Preparing the Knowledge Base (RAG)
A key feature of a RAG agent is its ability to retrieve information from custom documents. This process involves three sub-steps: loading, chunking, and vectorizing.
A. Document Loading and Chunking
We load the target document and break it down into smaller, manageable pieces called chunks. This is crucial because LLMs have token limits, and smaller chunks allow the agent to retrieve only the most relevant passages.
We use PyPDFLoader to read the file and RecursiveCharacterTextSplitter to handle the chunking, with an overlap to maintain context between chunks.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load the document
file_path = "[YOUR_RESUME_LOCATION]"
loader = PyPDFLoader(file_path)
pages = loader.load()
# Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=100)
docs = text_splitter.split_documents(pages)
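Before moving on, it can help to check how many chunks were produced and what one looks like; this quick inspection is optional:
# Optional sanity check on the split
print(f"Created {len(docs)} chunks")
print(docs[0].page_content)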
B. Embedding and Vector Store
Next, we convert these text chunks into numerical representations called embeddings using a specialized embedding model (in this case, gemini-embedding-001). These embeddings are then stored in a vector store like Chroma, which makes fast similarity search possible.
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma
# Create embeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001", google_api_key=GEMINI_API_KEY)
# Initialize and populate the Vector Store
vector_store = Chroma(
    collection_name="resume_bot_collection",
    embedding_function=embeddings,
)
vector_store.add_documents(documents=docs)
Finally, we configure a retriever that determines how the agent searches the vector store. Here, we use Maximum Marginal Relevance (MMR) to balance relevance with diversity in the retrieved chunks.
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 3})
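You can query the retriever directly to confirm it surfaces sensible chunks before wiring it into the agent; the query string here is only an example:
# Optional: verify the retriever returns relevant resume chunks
sample_chunks = retriever.invoke("What projects are listed on the resume?")
for chunk in sample_chunks:
    print(chunk.page_content[:200], "\n---")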
Equipping the Agent with a Tool
An AI Agent's power comes from its tools. In this RAG setup, the retriever becomes a specialized tool the LLM can decide to use when it needs external information.
We use LangChain's @tool decorator to define a function, get_resume_data, that encapsulates the retrieval logic. Crucially, the docstring provides the LLM with a clear description of when to use the tool.
from langchain_core.tools import tool
@tool
def get_resume_data(query: str) -> str:
    """
    Get information about [YOUR_NAME]'s background, experience, and projects from the resume.
    """
    # Join the retrieved chunks into one string, matching the declared return type
    retrieved = retriever.invoke(query)
    return "\n\n".join(doc.page_content for doc in retrieved)
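Like any LangChain tool, get_resume_data can be invoked directly (outside the agent) to verify it returns resume text; the query below is just illustrative:
# Optional: call the tool by hand to confirm retrieval works
print(get_resume_data.invoke({"query": "work experience"}))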
The Agent's Brain: The LLM Setup
The LLM is the brain of the agent, responsible for understanding the user's intent, deciding whether to use a tool, and generating the final response.
A. Model Initialization and Tool Binding
We initialize the LLM (using ChatGroq for high-speed inference) and then bind the custom tool we created to the model. This is how the model learns about the new capability.
from langchain_groq import ChatGroq
llm = ChatGroq(
    api_key=GROQ_API_KEY,
    model="moonshotai/kimi-k2-instruct-0905",
    temperature=0,
)
# Bind the tool to the LLM
llm_with_tools = llm.bind_tools([get_resume_data])
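A quick way to confirm the binding worked is to send the model a resume-related question and inspect its tool calls; a small sketch with an illustrative question:
# The model should emit a tool call rather than guessing an answer
probe = llm_with_tools.invoke("Where did [YOUR_NAME] study?")
print(probe.tool_calls)  # e.g. [{'name': 'get_resume_data', 'args': {'query': ...}, 'id': ...}]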
Bringing it to Life: The Agent Loop
The agent loop manages the flow of conversation, including message history, system instructions, and the crucial logic for handling tool calls.
The Conversation Loop
The system message provides a core instruction, defining the agent's persona and primary rule: "If you don’t know an answer, call the tool." The loop then processes user input, checks the LLM's response for a tool_call, executes the tool (runs the RAG retrieval), and sends the tool's output back to the LLM for a final, informed response.
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage, SystemMessage
messages = []
system_msg = SystemMessage("""
You are an assistant for [YOUR_NAME]. If you don’t know an answer, call the tool 'get_resume_data' with the question to retrieve info from the resume.
Answer ONLY with info retrieved from the tool.
""")
while True:
    user_input = input("\nEnter your query: ").strip().lower()
    if user_input in {"exit", "close", "quit"}:
        break
    messages.append(HumanMessage(content=user_input))
    combined_message = [system_msg] + messages
    ai_message = llm_with_tools.invoke(combined_message)
    messages.append(ai_message)
    if ai_message.tool_calls:
        # Tool call detected: run the retrieval tool for each requested call
        for call in ai_message.tool_calls:
            tool_output = str(get_resume_data.invoke(call["args"]))
            messages.append(ToolMessage(tool_call_id=call["id"], content=tool_output))
        # Re-invoke the LLM with the tool output for a final, informed answer
        final_response = llm_with_tools.invoke([system_msg] + messages)
        print("\nAnswer:")
        print(final_response.content)
        messages.append(final_response)
    else:
        # No tool call, provide a direct answer
        print("\nAnswer:")
        print(ai_message.content)

This complete loop creates a robust AI Agent that can intelligently consult an external document, showcasing the core principles of AI Agent development: retrieval, tool use, and conversational memory.




