Building a Financial Advisor: A Technical deep dive

Nav Srijan

This article argues that a reliable AI financial advisor depends on systems engineering, not chatty intelligence. The design separates AI suggestions from financial truth, validates every transaction before it affects balances, and keeps goals, money, and meaning in distinct layers. By pairing a relational database for accuracy with a vector database for understanding, and by tracking model costs at the token level, the system treats AI as a controlled component rather than an authority. The result is software that works with money safely instead of talking about it confidently.

Introduction

The system accepts data from receipts, bank statements, and third-party applications, then produces a consistent internal representation that supports analysis and advisory queries. These are the same questions people usually answer through manual reconciliation or with the help of a human advisor. The full project is available at https://github.com/capabl-ai/financial_advisor.

The Ingestion Layer

The foundation of the system is the ingestion layer. Financial data in practice is inconsistent by default. Receipts are poorly printed, bank exports vary across institutions, and APIs encode business logic that does not map cleanly to accounting semantics. From the start, the ingestion layer assumes three things:

  • Input formats are unreliable.
  • AI output is probabilistic.
  • Every data point must remain traceable to its source.

All ingestion paths converge into a single normalized transaction schema backed by SQL. No source is trusted enough to bypass validation.

Visual Bill Ingestion

Visual bill ingestion is designed for convenience without compromising correctness. Traditional OCR produces text, which still requires downstream parsing and fragile heuristics. Instead, a multimodal model extracts structured fields directly from receipt images. The prompt strictly constrains the output to a predefined schema and rejects explanatory text entirely.

def generate_structured_bill(image, mime_type):
    prompt = """
    You are an expert financial document analyzer.
    Extract the following fields from the bill image provided:
    - amount (numeric)
    - transaction_type (expense or income)
    - transaction_date (ISO 8601 timestamp)
    - description (short summary)
    - category (food, travel, utilities)

    Return only a JSON object with these keys.
    """

    response = gemini.generate_with_image(prompt, image, mime_type)
    return response

Staging and Validation

Extraction quality depends on image clarity, merchant formatting, and layout. To prevent silent ledger corruption, extracted data never enters the transaction table directly. Every bill first lands in a staging table where validation checks ensure type correctness and format consistency.

try:
    raw_json_string = generate_structured_bill(image_bytes, mime_type)
    structured_data = json.loads(raw_json_string)

except (json.JSONDecodeError, Exception) as e:
    pending_bill = PendingBill(
        user_id=current_user.id,
        image_path=image_filename,
        filename=file.filename,
        error_message=str(e),
        extracted_data=None
    )
    db.add(pending_bill)
    return {"message": "Bill uploaded for manual review"}

This staging approach creates a hard boundary between probabilistic extraction and deterministic accounting. Human intervention is part of the expected workflow rather than an exception.

Third-Party and Bank Integrations

Expense-Sharing Platforms

Platforms such as Splitwise represent social obligations, not financial reality. A direct import would misclassify user behavior. When a user pays for a shared expense, only their portion represents personal spending; the remainder is a receivable. The ingestion logic separates three concepts: cash outflow, personal expense, and settlement balance.

user_paid_share = 0
user_net_balance = 0

for user in expense.users:
    if user.id == splitwise_user_id:
        user_paid_share = float(user.paid_share)
        user_net_balance = float(user.net_balance)
        break

Bank Statement Normalization

Bank statements introduce variability in format, column names, and debit semantics. The parser resolves this through a configurable alias system that maps headers into canonical fields. Once normalized, transactions are enriched through asynchronous categorization to ensure immediate data access for the user.

def infer_transaction_categories(descriptions):
    prompt = f"""
    You are a financial transaction categorizer.

    Rules for Indian transactions:
    - Swiggy = food
    - IRCTC = travel
    - Zepto = food

    Transaction descriptions:
    {json.dumps(descriptions)}
    """

Reasoning and Semantic Retrieval

Once transactions are normalized, the system shifts to reasoning. To support context beyond arithmetic, curated financial writing is indexed and embedded for retrieval. Source material includes books and commentary from established investors, chunked and embedded into a persistent vector store.

User transactions are embedded into a separate collection, allowing for semantic queries like "impulse purchases." This retrieves relevant transactions even without explicit tags, replacing rigid labeling with natural analysis.

Tool Execution and Safety

Language models cannot compute balances reliably. Deterministic tools bridge this gap, each encapsulating a single responsibility with a strict interface. Database connections and user identifiers are injected during tool initialization, not passed through prompts.

tools = [
    AggregateSpendByCategoryTool(db=db, user_id=user_id),
    SQLQueryTool(db=db, user_id=user_id)
]

Constrained SQL Tooling

For flexible querying, the system uses a constrained SQL tool. The database role is read-only, and the tool enforces user scoping and SELECT-only execution.

class SQLQueryTool(BaseTool):
    name = "sql_query"
    description = "Execute a read-only SELECT query on user transactions."

    async def _arun(self, input_dict):
        sql = input_dict.get("sql", "")

        if "WHERE" in sql.upper():
            sql = sql.replace(
                "WHERE",
                f"WHERE user_id = {self.user_id} AND ",
                1
            )
        else:
            sql = sql.replace(
                "FROM transactions",
                f"FROM transactions WHERE user_id = {self.user_id}",
                1
            )

        result = await self.db.execute(text(sql))
        return f"query results:\n{result.fetchall()}"

Multi-Step Reasoning with LangGraph

Multi-step reasoning is handled through a state machine. Using LangGraph, the agent alternates between decision and execution until sufficient information is gathered, allowing for complex aggregation and contextual interpretation.

graph = StateGraph(AgentState)

graph.add_node("model", call_model)
graph.add_node("tools", call_tools)

def should_continue(state):
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return END

graph.add_conditional_edges("model", should_continue)
graph.add_edge("tools", "model")


Data Architecture

The data layer preserves provenance by storing all transactions in a single table with metadata describing their origin.

class Transaction(Base):
    __tablename__ = "transactions"

    amount = Column(Float)
    date = Column(DateTime)
    category = Column(String)

    external_id = Column(String, index=True)
    external_source = Column(String)

Goals are modeled independently from transactions. A contribution table links the two, allowing a single transaction to fund multiple goals without distorting balances. Relational (PostgreSQL) and vector storage operate in parallel to ensure both arithmetic correctness and semantic retrieval.

Conclusion

The system was built around a single constraint: financial advice must be explainable and reversible. Every number can be traced to its source, and every decision can be recomputed. By relying on structure, validation, and explicit boundaries rather than just prompts, the system meets the minimum standard required for software that interacts with money.

Inspire Others – Share Now

Table of Contents

  • Introduction | Verifiability and Project Goals
  • Ingestion Layer | Handling Unreliable Data Sources
  • Structured Extraction | Visual Document Parsing and Staging
  • Integrations | Normalizing Splitwise and Bank Statements
  • Semantic Reasoning | Vector Stores and Context Retrieval
  • Deterministic Tools | Secure SQL Execution and Tooling
  • Agent Orchestration | State Management with LangGraph
  • Data Architecture | Unified Ledger and Goal Modeling
  • Conclusion | Traceability as the Minimum Standard