Introduction
The system accepts data from receipts, bank statements, and third-party applications, then produces a consistent internal representation that supports analysis and advisory queries: the kinds of questions people usually answer through manual reconciliation or with the help of a human advisor. The full project is available at https://github.com/capabl-ai/financial_advisor.
The Ingestion Layer
The foundation of the system is the ingestion layer. Financial data in practice is inconsistent by default. Receipts are poorly printed, bank exports vary across institutions, and APIs encode business logic that does not map cleanly to accounting semantics. From the start, the ingestion layer assumes three things:
- Input formats are unreliable.
- AI output is probabilistic.
- Every data point must remain traceable to its source.
All ingestion paths converge into a single normalized transaction schema backed by SQL. No source is trusted enough to bypass validation.
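That shared target can be sketched as a single dataclass; the field names here are illustrative, not the project's exact model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Sketch of the canonical schema every ingestion path converges on.
# Field names are assumptions for illustration, not the exact model.
@dataclass
class NormalizedTransaction:
    amount: float
    transaction_type: str         # "expense" or "income"
    transaction_date: datetime
    description: str
    category: str
    external_source: str          # "receipt", "bank_csv", "splitwise", ...
    external_id: Optional[str] = None  # provenance: id in the source system
```

Keeping provenance fields (`external_source`, `external_id`) in the schema itself is what makes every downstream number traceable to its origin.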
Visual Bill Ingestion
Visual bill ingestion is designed for convenience without compromising correctness. Traditional OCR produces text, which still requires downstream parsing and fragile heuristics. Instead, a multimodal model extracts structured fields directly from receipt images. The prompt strictly constrains the output to a predefined schema and rejects explanatory text entirely.
def generate_structured_bill(image, mime_type):
    prompt = """
    You are an expert financial document analyzer.
    Extract the following fields from the bill image provided:
    - amount (numeric)
    - transaction_type (expense or income)
    - transaction_date (ISO 8601 timestamp)
    - description (short summary)
    - category (food, travel, utilities)
    Return only a JSON object with these keys.
    """
    response = gemini.generate_with_image(prompt, image, mime_type)
    return response
Staging and Validation
Extraction quality depends on image clarity, merchant formatting, and layout. To prevent silent ledger corruption, extracted data never enters the transaction table directly. Every bill first lands in a staging table where validation checks ensure type correctness and format consistency.
try:
    raw_json_string = generate_structured_bill(image_bytes, mime_type)
    structured_data = json.loads(raw_json_string)
except Exception as e:  # covers JSONDecodeError and extraction failures
    pending_bill = PendingBill(
        user_id=current_user.id,
        image_path=image_filename,
        filename=file.filename,
        error_message=str(e),
        extracted_data=None
    )
    db.add(pending_bill)
    return {"message": "Bill uploaded for manual review"}
This staging approach creates a hard boundary between probabilistic extraction and deterministic accounting. Human intervention is part of the expected workflow rather than an exception.
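The gate between staging and the ledger can be sketched as a hypothetical `promote_if_valid` helper; the field names follow the extraction schema above, but the helper itself is illustrative:

```python
from datetime import datetime

REQUIRED_FIELDS = {"amount", "transaction_type",
                   "transaction_date", "description", "category"}

def promote_if_valid(extracted):
    """Return True only if a staged bill passes type and format
    checks; anything suspect stays in staging for human review."""
    if not REQUIRED_FIELDS <= set(extracted):
        return False
    try:
        float(extracted["amount"])                          # numeric amount
        datetime.fromisoformat(extracted["transaction_date"])  # ISO 8601 date
    except (TypeError, ValueError):
        return False
    return extracted["transaction_type"] in ("expense", "income")
```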
Third-Party and Bank Integrations
Expense-Sharing Platforms
Platforms such as Splitwise represent social obligations, not financial reality. A direct import would misclassify user behavior. When a user pays for a shared expense, only their portion represents personal spending; the remainder is a receivable. The ingestion logic separates three concepts: cash outflow, personal expense, and settlement balance.
user_paid_share = 0
user_net_balance = 0
for user in expense.users:
    if user.id == splitwise_user_id:
        user_paid_share = float(user.paid_share)
        user_net_balance = float(user.net_balance)
        break
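The three concepts can then be separated with a small helper; `paid_share` and `owed_share` mirror Splitwise's field names, but the function itself is an illustrative sketch:

```python
def classify_shared_expense(paid_share, owed_share):
    """Separate one shared expense into the three quantities
    the ledger tracks independently."""
    cash_outflow = paid_share              # what actually left the account
    personal_expense = owed_share          # the user's own share of the cost
    receivable = paid_share - owed_share   # owed back by (or to) others
    return cash_outflow, personal_expense, receivable
```

Paying a 90-unit bill split three ways yields a 90 cash outflow, a 30 personal expense, and a 60 receivable; only the 30 should ever appear as spending.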
Bank Statement Normalization
Bank statements introduce variability in format, column names, and debit semantics. The parser resolves this through a configurable alias system that maps headers onto canonical fields. Once normalized, transactions are enriched through asynchronous categorization, so the user can access their data immediately while categories are filled in the background.
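A minimal version of that alias system might look like the following; the alias table is illustrative and would be extended per institution:

```python
# Illustrative alias table; real deployments extend it per bank.
HEADER_ALIASES = {
    "amount": {"amount", "txn amount", "transaction amount", "amt"},
    "date": {"date", "txn date", "value date", "transaction date"},
    "description": {"description", "narration", "particulars", "details"},
    "debit": {"debit", "withdrawal", "dr"},
    "credit": {"credit", "deposit", "cr"},
}

def canonicalize_headers(headers):
    """Map a bank export's raw column names onto canonical fields."""
    mapping = {}
    for raw in headers:
        key = raw.strip().lower()
        for canonical, aliases in HEADER_ALIASES.items():
            if key in aliases:
                mapping[raw] = canonical
                break
    return mapping
```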
def infer_transaction_categories(descriptions):
    prompt = f"""
    You are a financial transaction categorizer.
    Rules for Indian transactions:
    - Swiggy = food
    - IRCTC = travel
    - Zepto = food
    Transaction descriptions:
    {json.dumps(descriptions)}
    """
Reasoning and Semantic Retrieval
Once transactions are normalized, the system shifts to reasoning. To support context beyond arithmetic, curated financial writing is indexed and embedded for retrieval. Source material includes books and commentary from established investors, chunked and embedded into a persistent vector store.
User transactions are embedded into a separate collection, allowing for semantic queries like "impulse purchases." This retrieves relevant transactions even without explicit tags, replacing rigid labeling with natural analysis.
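Under the hood this reduces to nearest-neighbour search over the embeddings. A dependency-free sketch, assuming some embedding model has already produced the vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_search(query_vec, txn_vectors, top_k=3):
    """Rank transactions by similarity to the query embedding.
    txn_vectors: list of (transaction_id, vector) pairs produced
    by whatever embedding model backs the vector store."""
    scored = sorted(
        txn_vectors,
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [txn_id for txn_id, _ in scored[:top_k]]
```

A production vector store performs the same ranking with an approximate index rather than a linear scan, but the retrieval contract is identical.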
Tool Execution and Safety
Language models cannot compute balances reliably. Deterministic tools bridge this gap, each encapsulating a single responsibility with a strict interface. Database connections and user identifiers are injected during tool initialization, not passed through prompts.
tools = [
    AggregateSpendByCategoryTool(db=db, user_id=user_id),
    SQLQueryTool(db=db, user_id=user_id)
]
Constrained SQL Tooling
For flexible querying, the system uses a constrained SQL tool. The database role is read-only, and the tool enforces user scoping and SELECT-only execution.
class SQLQueryTool(BaseTool):
    name = "sql_query"
    description = "Execute a read-only SELECT query on user transactions."

    async def _arun(self, input_dict):
        sql = input_dict.get("sql", "").strip()
        # Enforce SELECT-only execution before anything else runs.
        if not sql.upper().startswith("SELECT"):
            return "error: only SELECT statements are allowed"
        # Inject user scoping so the model can never query other users.
        if "WHERE" in sql.upper():
            sql = sql.replace(
                "WHERE",
                f"WHERE user_id = {self.user_id} AND ",
                1
            )
        else:
            sql = sql.replace(
                "FROM transactions",
                f"FROM transactions WHERE user_id = {self.user_id}",
                1
            )
        result = await self.db.execute(text(sql))
        return f"query results:\n{result.fetchall()}"
Multi-Step Reasoning with LangGraph
Multi-step reasoning is handled through a state machine. Using LangGraph, the agent alternates between decision and execution until sufficient information is gathered, allowing for complex aggregation and contextual interpretation.
graph = StateGraph(AgentState)
graph.add_node("model", call_model)
graph.add_node("tools", call_tools)

def should_continue(state):
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return END

graph.add_conditional_edges("model", should_continue)
graph.add_edge("tools", "model")
graph.set_entry_point("model")
app = graph.compile()
Data Architecture
The data layer preserves provenance by storing all transactions in a single table with metadata describing their origin.
class Transaction(Base):
    __tablename__ = "transactions"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, index=True)
    amount = Column(Float)
    date = Column(DateTime)
    category = Column(String)
    external_id = Column(String, index=True)  # id in the source system
    external_source = Column(String)          # "receipt", "bank_csv", ...
Goals are modeled independently from transactions. A contribution table links the two, allowing a single transaction to fund multiple goals without distorting balances. Relational (PostgreSQL) and vector storage operate in parallel to ensure both arithmetic correctness and semantic retrieval.
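The invariant that contributions must sum exactly back to the funding transaction's amount can be sketched with a hypothetical allocator:

```python
def allocate_contributions(amount, goal_weights):
    """Split one transaction's amount across goals by weight.
    The returned contributions always sum back to the original
    amount, so a transaction funding multiple goals never
    distorts balances. Assumes a non-empty goal_weights dict."""
    total = sum(goal_weights.values())
    alloc = {g: round(amount * w / total, 2)
             for g, w in goal_weights.items()}
    # Assign any rounding remainder to the first goal so totals reconcile.
    drift = round(amount - sum(alloc.values()), 2)
    first = next(iter(alloc))
    alloc[first] = round(alloc[first] + drift, 2)
    return alloc
```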
Conclusion
The system was built around a single constraint: financial advice must be explainable and reversible. Every number can be traced to its source, and every decision can be recomputed. By relying on structure, validation, and explicit boundaries rather than just prompts, the system meets the minimum standard required for software that interacts with money.