AI Agents are the next evolution of large language models (LLMs). They are not just chatbots that respond to a single prompt; they are intelligent systems equipped with tools, memory, and a planning loop that allows them to take complex actions and achieve multi-step goals.
This guide walks you through the foundational steps of building a simple yet powerful AI Agent using the Retrieval-Augmented Generation (RAG) pattern, which allows the agent to consult a private knowledge base before answering a question.
The Foundation: Setting Up the Environment
The first step in any AI development project is setting up the necessary libraries. We will use LangChain to orchestrate the agent's components and other libraries for data handling and model interaction.
The following code installs the core LangChain packages, including those for Google's Gemini models (langchain-google-genai), Groq's fast LLMs (langchain-groq), and PDF handling (pypdf).
!pip install -qU langchain langchain-core langchain-groq langchain-chroma langchain-google-genai langchain-community pypdf
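The embedding and chat models used below also need API keys. A minimal sketch, assuming you keep the keys in environment variables named GEMINI_API_KEY and GROQ_API_KEY (adapt this to however you manage secrets, such as a .env file or Colab secrets):
import os
# Assumption: the keys were exported beforehand as environment variables
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]
GROQ_API_KEY = os.environ["GROQ_API_KEY"]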
Preparing the Knowledge Base (RAG)
A key feature of a RAG agent is its ability to retrieve information from custom documents. This process involves three sub-steps: loading, chunking, and vectorizing.
A. Document Loading and Chunking
We load the target document and break it down into smaller, manageable pieces called chunks. This is crucial because LLMs have token limits, and smaller chunks allow the agent to retrieve only the most relevant passages.
We use PyPDFLoader to read the file and RecursiveCharacterTextSplitter to handle the chunking, with an overlap to maintain context between chunks.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load the document
file_path = "[YOUR_RESUME_LOCATION]"
loader = PyPDFLoader(file_path)
pages = loader.load()
# Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=100)
docs = text_splitter.split_documents(pages)
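Before moving on, it can help to check how many chunks were produced and what one looks like; this quick inspection is optional:
# Optional sanity check on the split
print(f"Created {len(docs)} chunks")
print(docs[0].page_content)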
B. Embedding and Vector Store
Next, we convert these text chunks into numerical representations called embeddings using a specialized embedding model (in this case, gemini-embedding-001). These embeddings are then stored in a vector store like Chroma, which makes fast similarity search possible.
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma
# Create embeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001", google_api_key=GEMINI_API_KEY)
# Initialize and populate the Vector Store
vector_store = Chroma(
    collection_name="resume_bot_collection",
    embedding_function=embeddings,
)
vector_store.add_documents(documents=docs)
Finally, we configure a retriever that determines how the agent searches the vector store. Here, we use Maximum Marginal Relevance (MMR) to balance relevance with diversity in the retrieved chunks.
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 3})
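You can query the retriever directly to confirm it surfaces sensible chunks before wiring it into the agent; the query string here is only an example:
# Optional: verify the retriever returns relevant resume chunks
sample_chunks = retriever.invoke("What projects are listed on the resume?")
for chunk in sample_chunks:
    print(chunk.page_content[:200], "\n---")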
Equipping the Agent with a Tool
An AI Agent's power comes from its tools. In this RAG setup, the retriever becomes a specialized tool the LLM can decide to use when it needs external information.
We use LangChain's @tool decorator to define a function, get_resume_data, that encapsulates the retrieval logic. Crucially, the docstring provides the LLM with a clear description of when to use the tool.
from langchain_core.tools import tool
@tool
def get_resume_data(query: str) -> str:
    """
    Get information about [YOUR_NAME]'s background, experience, and projects from the resume.
    """
    # Join the retrieved chunks into one string, matching the declared return type
    retrieved = retriever.invoke(query)
    return "\n\n".join(doc.page_content for doc in retrieved)
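Like any LangChain tool, get_resume_data can be invoked directly (outside the agent) to verify it returns resume text; the query below is just illustrative:
# Optional: call the tool by hand to confirm retrieval works
print(get_resume_data.invoke({"query": "work experience"}))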
The Agent's Brain: The LLM Setup
The LLM is the brain of the agent, responsible for understanding the user's intent, deciding whether to use a tool, and generating the final response.
A. Model Initialization and Tool Binding
We initialize the LLM (using ChatGroq for high-speed inference) and then bind the custom tool we created to the model. This is how the model learns about the new capability.
from langchain_groq import ChatGroq
llm = ChatGroq(
    api_key=GROQ_API_KEY,
    model="moonshotai/kimi-k2-instruct-0905",
    temperature=0,
)
# Bind the tool to the LLM
llm_with_tools = llm.bind_tools([get_resume_data])
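A quick way to confirm the binding worked is to send the model a resume-related question and inspect its tool calls; a small sketch with an illustrative question:
# The model should emit a tool call rather than guessing an answer
probe = llm_with_tools.invoke("Where did [YOUR_NAME] study?")
print(probe.tool_calls)  # e.g. [{'name': 'get_resume_data', 'args': {'query': ...}, 'id': ...}]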
Bringing it to Life: The Agent Loop
The agent loop manages the flow of conversation, including message history, system instructions, and the crucial logic for handling tool calls.
The Conversation Loop
The system message provides a core instruction, defining the agent's persona and primary rule: "If you don’t know an answer, call the tool." The loop then processes user input, checks the LLM's response for a tool_call, executes the tool (runs the RAG retrieval), and sends the tool's output back to the LLM for a final, informed response.
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage, SystemMessage
messages = []
system_msg = SystemMessage("""
You are an assistant for [YOUR_NAME]. If you don’t know an answer, call the tool 'get_resume_data' with the question to retrieve info from the resume.
Answer ONLY with info retrieved from the tool.
""")
while True:
    user_input = input("\nEnter your query: ").strip().lower()
    if user_input in {"exit", "close", "quit"}:
        break
    messages.append(HumanMessage(content=user_input))
    combined_message = [system_msg] + messages
    ai_message = llm_with_tools.invoke(combined_message)
    messages.append(ai_message)
    if ai_message.tool_calls:
        # Tool call detected: run the retrieval tool for each requested call
        for call in ai_message.tool_calls:
            tool_output = str(get_resume_data.invoke(call["args"]))
            messages.append(ToolMessage(tool_call_id=call["id"], content=tool_output))
        # Re-invoke the LLM with the tool output for a final, informed answer
        final_response = llm_with_tools.invoke([system_msg] + messages)
        print("\nAnswer:")
        print(final_response.content)
        messages.append(final_response)
    else:
        # No tool call, provide a direct answer
        print("\nAnswer:")
        print(ai_message.content)

This complete loop creates a robust AI Agent that can intelligently consult an external document, showcasing the core principles of AI Agent development: retrieval, tool use, and conversational memory.




