Multi Agentic RAG & Live AI for Finance and Legal Solutions

Going Beyond Static AI: Multi Agentic RAG for Intelligent Finance and Legal Solutions
Introduction to Multi Agentic RAG
AI has evolved from basic automation to powerful LLMs, boosting productivity across industries. But while impressive, LLMs often lack accuracy, context, and real-time awareness. Retrieval-Augmented Generation (RAG) improved this by grounding responses in external data. Now, multi agentic RAG takes it further—letting AI autonomously retrieve information, reason, and act.
This is a game-changer for finance and legal sectors, where professionals face complex documents, strict regulations, and constant change. By applying multi agentic RAG, we can automate workflows, streamline compliance, and improve decision-making—reducing risk and boosting efficiency.
Recognizing these sector-specific challenges, we propose an architecture that applies agentic RAG to automate complex finance and legal workflows, streamlining compliance, sharpening decision-making, and reducing costly errors.
Already Have Access to an LLM? Why Use Multi Agentic RAG?
Suppose you're using a traditional LLM to analyze a complex legal agreement or financial compliance document. Initially, the LLM might generate a seemingly coherent summary or analysis. However, it might miss subtle but critical regulatory nuances, rely on outdated training data, or even "hallucinate" inaccurate details—potentially leading to non-compliance, legal risks, or financial penalties.
In contrast, our infrastructure doesn't rely solely on the LLM’s static knowledge. Instead, it employs Agentic RAG, autonomously recognizing the need for real-time regulatory updates or precise financial context, proactively retrieving accurate information from trusted sources, reasoning intelligently about the implications, and generating precise, actionable recommendations. With this, your AI transitions from a passive generator of content into an active, reliable partner, significantly enhancing accuracy, ensuring compliance, and reducing risks in critical finance and legal tasks.
A Quick Demonstration of the Multi Agentic RAG System with Adaptive Retrieval:

Deep Dive Into the Architecture: How Does Multi Agentic RAG Work?
A solid architecture is the backbone of any reliable agentic RAG setup. It’s what keeps all the moving parts—from retrieval to generation to execution—working in sync. Without a clear structure, things get messy fast—especially when agents need to reason, act, and adapt dynamically. We’ve designed one that’s clean, scalable, and built to handle real-world complexity. Let’s break it down.

Code Repository – Complete Setup & Usage Guide
Server-Side: Intelligent Data Processing & Context-Aware Retrieval
At the core of our backend is Pathway, enabling low-latency data streaming and processing. Here's how we handle the complexity under the hood:
Smart Chunking That Understands Context
Large documents are broken down into manageable, semantically meaningful chunks using sentence-based splitting via NLTK. We enforce strict token limits to ensure model compatibility, and we've extended Pathway’s splitter classes with custom logic tailored to high-accuracy use cases.
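To make the idea concrete, here is a minimal sketch of sentence-based chunking under a strict token cap. The `count_tokens` helper is a stand-in for the target model's tokenizer, and the function is illustrative rather than Pathway's actual splitter API:

```python
# Sentence-based chunking with a strict token cap (illustrative; the real
# system extends Pathway's splitter classes). `count_tokens` stands in for
# the target model's tokenizer.
import nltk

nltk.download("punkt", quiet=True)

def count_tokens(text: str) -> int:
    return len(text.split())  # crude proxy; swap in e.g. tiktoken

def chunk_by_sentences(text: str, max_tokens: int = 512) -> list[str]:
    chunks, current, current_len = [], [], 0
    for sentence in nltk.sent_tokenize(text):
        sent_len = count_tokens(sentence)
        if current and current_len + sent_len > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += sent_len
    if current:
        chunks.append(" ".join(current))
    return chunks
```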
Multi-Format Parsing That Just Works
Whether it’s a structured PDF or a messy JSON, our parser stack handles it seamlessly:
- PDFs: Structured documents go through OpenParse, while scanned or unstructured ones are processed using PyMuPDF + Tesseract OCR.
- CSVs and JSONs: We summarize CSV structure (column names, types, and sample rows) and flatten JSONs into queryable key-value pairs for instant access.
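A simplified dispatcher for this stack might look like the sketch below. The `fitz` (PyMuPDF), `pytesseract`, and `pandas` calls follow those libraries' public APIs, but the routing heuristic and helper names are illustrative; the production stack routes structured PDFs through OpenParse rather than reading the text layer directly:

```python
# Simplified parsing dispatch (illustrative). Structured PDFs keep their
# embedded text layer; scanned pages fall back to Tesseract OCR.
import fitz  # PyMuPDF
import pandas as pd
import pytesseract
from PIL import Image

def parse_pdf(path: str) -> str:
    doc = fitz.open(path)
    text = "\n".join(page.get_text() for page in doc)
    if text.strip():
        return text  # structured PDF with an embedded text layer
    pages = []
    for page in doc:  # scanned PDF: render each page and OCR it
        pix = page.get_pixmap()
        img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
        pages.append(pytesseract.image_to_string(img))
    return "\n".join(pages)

def summarize_csv(path: str) -> str:
    df = pd.read_csv(path)
    return (f"columns={list(df.columns)}, dtypes={df.dtypes.astype(str).to_dict()}, "
            f"sample={df.head(3).to_dict(orient='records')}")

def flatten_json(obj, prefix: str = "") -> dict:
    # Flatten already-parsed JSON into queryable key-value pairs.
    flat = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            flat.update(flatten_json(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            flat.update(flatten_json(v, f"{prefix}{i}."))
    else:
        flat[prefix.rstrip(".")] = obj
    return flat
```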
Hybrid Retrieval for Better Accuracy
We don’t just rely on one method of search—we combine the best of both worlds:
- BM25 for traditional keyword-based matching.
- Dense vector search using nomic-ai/nomic-embed-text-v1.5 with UsearchKnn (cosine similarity) for contextual, semantically rich lookups.

This hybrid approach, powered by Pathway’s retrievers, ensures we retrieve not just relevant information, but the right information.
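For intuition, the sketch below fuses a BM25 ranking with a dense cosine-similarity ranking via reciprocal rank fusion, one common fusion rule (not necessarily the exact one used in production). The real system relies on Pathway's retrievers with nomic-ai/nomic-embed-text-v1.5 and UsearchKnn; `embed` here is a placeholder embedding function:

```python
# Hybrid retrieval via reciprocal rank fusion (RRF), shown for intuition.
import numpy as np
from rank_bm25 import BM25Okapi

def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(query: str, docs: list[str], doc_vecs: np.ndarray, embed) -> list[int]:
    bm25 = BM25Okapi([d.split() for d in docs])          # sparse: keyword match
    sparse = list(np.argsort(bm25.get_scores(query.split()))[::-1])
    q = embed(query)                                     # dense: semantic match
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    dense = list(np.argsort(sims)[::-1])
    return rrf([sparse, dense])                          # fused document order
```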
Client-Side: From Raw Queries to Meaningful Answers
On the front end, we’ve built a system that does more than just send questions to an LLM. It acts as an intelligent orchestrator for nuanced query handling:
Smart File Handling
Users can upload PDF, TXT, CSV, or JSON files. Large files are stored in the main vector store for deep retrieval, while summarized previews are kept in a secondary store for quick lookups.
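A minimal sketch of this routing rule, with hypothetical store interfaces and an assumed size threshold:

```python
# Illustrative routing rule; the store interfaces and threshold are
# hypothetical stand-ins for the two Pathway-backed stores.
LARGE_FILE_TOKENS = 4_000  # assumed cutoff, tune per deployment

def ingest(doc_text: str, main_store, secondary_store, summarize, count_tokens):
    if count_tokens(doc_text) > LARGE_FILE_TOKENS:
        main_store.add(doc_text)                  # deep retrieval path
        secondary_store.add(summarize(doc_text))  # quick-lookup preview
    else:
        secondary_store.add(doc_text)             # small file: store whole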
Guardrails First
Every incoming query is evaluated by a safety layer. Unsafe or inappropriate inputs are automatically flagged and responded to with predefined, responsible outputs.
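In its simplest form, the guardrail is a gate in front of the pipeline. The sketch below uses a keyword check purely for illustration; a production safety layer would typically call a dedicated moderation model:

```python
# Minimal guardrail gate. The blocked topics and refusal text are
# hypothetical; swap the check for a real safety model in production.
BLOCKED_TOPICS = ("insider trading tips", "forge a signature")  # hypothetical
REFUSAL = ("I can't help with that request, but I'm happy to assist with "
           "compliant finance or legal questions.")

def guard(query: str) -> str | None:
    """Return a predefined refusal for unsafe queries, or None if safe."""
    lowered = query.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return REFUSAL
    return None
```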
Query Understanding Through Agentic Reformulation
We don’t just process queries—we understand and evolve them.
- The system reformulates user queries using the chat history and any uploaded document for deeper context.
- Then, an agent classifies each query into one of three types (a sketch of this step follows the list):
  - General – casual or creative questions handled directly by the LLM.
  - Direct – file-based questions requiring precision, answered using the secondary store.
  - Contextual – deep questions that require multi-source retrieval, potentially invoking web searches or background agents.
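Here is a hedged sketch of the reformulate-and-classify step, assuming a generic `llm` chat-completion callable; the prompt wording is illustrative, not our exact prompt:

```python
# Reformulation + routing sketch; `llm` is any chat-completion callable.
import json

CLASSIFY_PROMPT = """Rewrite the user query using the chat history and any
uploaded-document context, then classify it as General, Direct, or Contextual.
Return JSON: {{"query": "...", "category": "..."}}

History: {history}
Document summary: {doc_summary}
Query: {query}"""

def reformulate_and_route(query: str, history: str, doc_summary: str, llm):
    raw = llm(CLASSIFY_PROMPT.format(
        history=history, doc_summary=doc_summary, query=query))
    parsed = json.loads(raw)
    return parsed["query"], parsed["category"]  # e.g. ("...", "Contextual")
```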
Adaptive Retrieval & Summarization
- For summarization, we use Latent Semantic Analysis (LSA) to condense large chunks into useful insights.
- If the retrieved context isn't enough, the system automatically triggers a web search, keeping responses accurate and complete (this fallback is sketched after the list).
- For queries involving scripts or technical analysis, code execution is handled on demand by a pair of conversing agents built with Autogen. Any plots generated are stored locally for later viewing and use.
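Putting those pieces together, the adaptive flow can be sketched as follows; `retriever`, `lsa_summarize`, `web_search`, `code_agents`, and `llm` are illustrative stand-ins for the components described above:

```python
# Illustrative decision flow tying the adaptive pieces together.
def needs_code(query: str) -> bool:
    return any(w in query.lower() for w in ("plot", "compute", "script", "calculate"))

def is_sufficient(query: str, context: str, llm) -> bool:
    verdict = llm(
        "Does this context fully answer the question? Reply YES or NO.\n"
        f"Context: {context}\nQuestion: {query}"
    )
    return verdict.strip().upper().startswith("YES")

def answer(query, retriever, lsa_summarize, web_search, code_agents, llm):
    context = lsa_summarize(retriever.search(query))
    if needs_code(query):                       # script/technical analysis path
        return code_agents.run(query, context)  # two conversing Autogen agents
    if not is_sufficient(query, context, llm):  # LLM self-check on coverage
        context += "\n" + web_search(query)     # augment with fresh web results
    return llm(f"Answer using this context:\n{context}\n\nQuestion: {query}")
```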
Final Response with a Human Touch
Once the relevant information is retrieved and processed:
- The LLM generates a coherent final response, grounded in the freshly retrieved information and clear instructions.
- If the query is ambiguous, the system asks the user for clarification, ensuring nothing is left to interpretation.
Latent Semantic Analysis
LSA is an advanced text analysis technique that uncovers hidden relationships between words by grouping them based on meaning rather than exact matches. It simplifies large textual datasets by converting them into a structured, mathematically driven format to reveal underlying patterns.

In our approach, we apply LSA to power summarization and extraction across complex financial and legal documents. It lets us condense large volumes of text, such as financial disclosures, compliance documents, investment analyses, contracts, and legal briefs, into precise, insightful summaries, surfacing latent themes and relationships that keyword-based methods often miss at this volume and complexity.
In practice, LSA sharpened our ability to rapidly identify key financial indicators, spot emerging trends, and flag potential risks, supporting faster, better-informed decisions. In legal contexts, it helped us pinpoint essential clauses, interpret complex regulations, and verify compliance more reliably, yielding a leaner analytical workflow for large, intricate financial and legal texts.
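For intuition, here is a minimal LSA-style extractive summarizer built on scikit-learn: TF-IDF sentence vectors, truncated SVD to expose latent topics, and selection of the sentences that load most strongly on them. This is a sketch of the technique, not our production code:

```python
# Minimal LSA-style extractive summarizer (sketch, not production code).
import nltk
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

def lsa_summarize(text: str, n_sentences: int = 5, n_topics: int = 3) -> str:
    sentences = nltk.sent_tokenize(text)
    if len(sentences) <= n_sentences:
        return text
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    n_comp = max(1, min(n_topics, tfidf.shape[1] - 1))
    topic_weights = TruncatedSVD(n_components=n_comp).fit_transform(tfidf)
    salience = np.linalg.norm(topic_weights, axis=1)   # topic loading per sentence
    top = sorted(np.argsort(salience)[-n_sentences:])  # keep document order
    return " ".join(sentences[i] for i in top)
```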
Code Execution and Detailed Financial Analysis
Queries that require code execution are classified into an appropriate sub-category and routed to a pair of conversing agents: one handles code generation and identification, while the other, equipped with a terminal-based code executor, runs the code and returns the output.
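The sketch below shows this pattern with the classic pyautogen two-agent setup: an assistant agent writes the code, and a user proxy with a local command-line executor runs it. The model name, API key, and example task are placeholders:

```python
# Two-agent code execution with pyautogen (classic API); config values and
# the example task are placeholders.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "..."}]}  # placeholder

coder = AssistantAgent("coder", llm_config=llm_config)
executor = UserProxyAgent(
    "executor",
    human_input_mode="NEVER",
    # Local command-line executor; generated plots land in ./coding.
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

executor.initiate_chat(
    coder,
    message="Compute the debt-to-equity trend from balance_sheet.csv and plot it.",
)
```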
The risk analysis mode doesn't just analyze documents—it thinks like a domain expert, iteratively refining its understanding and integrating new insights chunk by chunk.
Smarter Chunking: Sentence-Level Precision with Memory
Large documents—especially contracts, financial disclosures, and compliance frameworks—can’t be accurately interpreted in one pass. Our system leverages NLTK-based sentence tokenization to break the document into semantically meaningful chunks. But we go further: each chunk includes a tail of sentences from the previous segment, maintaining narrative continuity and contextual memory.
This ensures that critical cross-paragraph nuances—such as conditions, exceptions, or scope definitions—aren’t lost in fragmentation.
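The overlap mechanic itself is small. Building on the sentence chunker sketched earlier, each chunk is prefixed with the tail of the previous one (the `tail` size here is illustrative):

```python
# Each chunk starts with the last few sentences of the previous one so
# cross-paragraph conditions survive the split. `tail` is illustrative.
def chunk_with_overlap(sentences: list[str], chunk_size: int = 40, tail: int = 3) -> list[list[str]]:
    chunks, start = [], 0
    while start < len(sentences):
        end = min(start + chunk_size, len(sentences))
        lead_in = sentences[max(0, start - tail):start]  # memory from the last chunk
        chunks.append(lead_in + sentences[start:end])
        start = end
    return chunks
```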
Domain-Specific Prompts with Historical Awareness
Each chunk is passed through a highly structured prompt system tailored to legal and financial analysis, covering areas such as:
- Legal Risks: contractual obligations, indemnities, dispute resolution, governing law, force majeure.
- Financial Risks: credit, liquidity, operational vulnerabilities, market volatility, and strategic threats.
A Quick Demo of AI in Legal & Financial Risk Analysis:

But what makes this truly powerful is contextual merging: prior responses are carried forward, ensuring the model incrementally builds a comprehensive risk map. It’s not just analyzing sections in isolation—it’s connecting the dots across the document, identifying contradictions, redundancies, or gaps in protections.
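Contextual merging reduces to a simple accumulation loop: the risk summary so far is fed back in with every new chunk. The prompt wording below is illustrative, and `llm` is again a generic completion callable:

```python
# Contextual merging: prior findings accumulate across chunks instead of
# each section being analyzed in isolation.
RISK_PROMPT = """You are a legal and financial risk analyst.
Known risks so far:
{running_summary}

New document section:
{chunk}

Update the risk map: add new risks, note contradictions with earlier
sections, and flag changes in liability or scope."""

def build_risk_map(chunks: list[str], llm) -> str:
    running_summary = "None yet."
    for chunk in chunks:
        running_summary = llm(RISK_PROMPT.format(
            running_summary=running_summary, chunk=chunk))
    return running_summary
```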
Adaptive Reasoning
This isn’t a static lookup. Our engine actively:
- Determines the type of analysis (Legal, Financial, or Combined) based on the content.
- Reconciles prior and current findings, ensuring nothing is lost or overwritten without reason.
- Highlights deltas — new risks, changes in interpretation, or shifts in liability.
- Asks for clarification when ambiguities arise, enabling human-in-the-loop review if needed.
Whether you’re evaluating a 200-page MSA or a complex financial disclosure, this mode ensures you’re not just scanning for issues—you’re surfacing deep, actionable risk intelligence.
Evaluation: How Well Does the Multi Agentic RAG Pipeline Perform?
Everything can sound impressive, but when it comes to validating a RAG pipeline, talk is cheap. That’s why we’ve got you covered with a thorough evaluation that cuts through the noise. We didn’t just build it — we rigorously tested our RAG pipeline across real-world scenarios using diverse datasets and comprehensive metrics to measure both quality and efficiency.
Datasets Used
To ensure robustness across domains and complexity levels, we tested our system on:
- Financial 10-K Reports — structured, domain-specific documents.
- SQuAD — general, open-domain QA dataset.
- Complex Legal/Financial Queries — designed to challenge context comprehension and long-form reasoning.
Metrics Tracked
We went beyond surface-level evaluation, focusing on both quality and efficiency (a minimal tracking harness is sketched after this list):
- Token Usage: Total tokens (input + output) per query — directly impacting performance and cost.
- LLM Calls: Number of times the model is invoked per query.
- Processing Time: Average time taken to generate a response.
- Faithfulness (0 to 1): Does the response accurately reflect the retrieved content?
- Helpfulness (0 to 1): Is the answer actually useful and relevant to the query?
- Correctness (0 to 1): Is the information factually accurate?
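A minimal harness for this kind of tracking might look like the sketch below; the `pipeline` result fields and the 0-to-1 LLM `judge` interface are assumptions, not our exact instrumentation:

```python
# Illustrative per-query metrics harness; result fields and the judge
# interface are assumed stand-ins.
import time

def evaluate(pipeline, judge, dataset):
    rows = []
    for example in dataset:  # e.g. {"query": ..., "reference": ...}
        start = time.perf_counter()
        result = pipeline(example["query"])  # returns answer + usage counters
        rows.append({
            "latency_s": time.perf_counter() - start,
            "tokens": result["prompt_tokens"] + result["completion_tokens"],
            "llm_calls": result["llm_calls"],
            "faithfulness": judge("faithfulness", result, example),
            "helpfulness": judge("helpfulness", result, example),
            "correctness": judge("correctness", result, example),
        })
    return rows
```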
Evaluation Results
➤ On Financial 10-K Queries

- LSA + Hybrid gives better quality but at higher token usage and latency.
- LSA + KNN is more efficient and slightly more faithful but less helpful overall.
➤ On Long-Context Questions

- Hybrid + Non-LSA is faster and more faithful but lacks relevance and accuracy.
- Hybrid + LSA is slower but provides a stronger balance across all metrics.
➤ On Mixed General + Financial Queries

- Both perform well, but Hybrid + LSA shines in helpfulness and correctness.
- KNN + LSA offers stronger faithfulness and slightly more efficiency.
Key Takeaways: Quality vs Efficiency Trade-off
- Using LSA boosts helpfulness and correctness but increases token usage and latency.
- Hybrid retrieval paired with summarization yields better quality answers for complex queries.
- Efficiency vs. accuracy is tunable via hyperparameters like retrieval depth and LSA chunk size — letting you balance speed, cost, and response quality based on the use case.
Summing Up the Shift
Well, that was it! A robust, scalable architecture leveraging multi agentic RAG is now powering real solutions to complex financial and legal challenges: easing workloads, reducing friction, and accelerating outcomes. With Pathway’s real-time data ingestion, responses remain up-to-date and context-aware, adapting as data evolves.
Layered with capabilities like code execution for real-time reasoning, intelligent query classification for agent delegation, and dynamic query reformulation based on interaction history, this system goes far beyond simple retrieval. It’s a purpose-built, full-stack intelligence layer—precise, adaptive, and ready for real-world complexity at scale.
If you are interested in diving deeper into the topic, here are some good references to get started with Pathway:
- Pathway Developer Documentation
- Pathway App Templates
- Discord Community of Pathway
- Power and Deploy RAG Agent Tools with Pathway
- End-to-end Real-time RAG app with Pathway
The resources above are also a good starting point for the concepts and frameworks used throughout this architecture.
Authors

Pathway Community
Multiple authors