Live AI for Financial Intelligence with Event-Based State Machine

In today’s fast-paced markets, analysts rely on AI to answer complex financial questions quickly. Retrieval-Augmented Generation (RAG) has emerged as a popular technique: it augments a language model with relevant documents from financial databases, grounding its answers in facts. For example, a typical RAG pipeline (see figure below) embeds the user’s query, searches a vector database for matching documents, and feeds those documents into an LLM to generate the response. This end‑to‑end loop forms the event‑based state machine that powers Pathway's Live AI framework for Financial Intelligence.
In theory, “grounding” answers in up-to-date data improves accuracy and reduces hallucinations. However, in practice even RAG can hallucinate when it misses key information. In finance, such errors are unacceptable – a hallucinated answer about a company’s earnings or a regulatory compliance issue could lead to bad decisions. Our team set out to fix this by designing an agentic RAG pipeline for financial QA that dramatically lowers hallucinations. We combine multiple AI agents, hybrid retrieval methods, and a novel Multi-HyDE strategy to boost recall and verify answers. The result is a system that finds more relevant evidence and flags uncertainty, yielding much more reliable financial intelligence.
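The basic RAG loop described above can be sketched in a few lines. This is a toy illustration, not our production system: `embed` is a bag-of-words stand-in for a real embedding model, `vector_search` stands in for a vector database lookup, and the final string stands in for the LLM generation step.

```python
from math import sqrt

def embed(text: str) -> dict:
    # Toy bag-of-words "embedding"; a real pipeline calls an embedding model.
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query_vec: dict, corpus: list, k: int = 2) -> list:
    # Rank documents by similarity to the query embedding, as a vector DB would.
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]

def rag_answer(query: str, corpus: list) -> str:
    context = vector_search(embed(query), corpus, k=1)
    # A real pipeline prompts an LLM with the query plus retrieved context.
    return f"Grounded in: {context[0]}"
```

Every refinement discussed below (agents, Multi-HyDE, hybrid retrieval, adaptive navigation) is a modification of this one loop.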

The Hallucination Challenge in Financial Intelligence
AI “hallucinations” occur when a model confidently states something untrue or irrelevant. Even though RAG was introduced to combat this, it can still happen. As one analysis notes, RAG should help by grounding answers in facts, but “in reality, RAG is also prone to hallucinations” if it only has unstructured or incomplete data. In fact, research shows retrieval errors are a primary source of hallucinations in RAG systems. If the retrieval step pulls irrelevant or noisy documents, the LLM has little choice but to guess, often producing an incorrect answer. This is especially troubling in finance, where queries about the latest filings or market events require precise, up-to-date information. Hallucinations can mislead analysts and erode trust in AI. Even sophisticated agentic RAG approaches still face this problem: a recent review found that “AI-generated responses still suffer from hallucinations… with unwarranted confidence” despite using agents to refine answers. To make financial intelligence safe and useful, we must do better than conventional RAG.
Agentic State-Machine Based Multi-Agent Pipelines
Our solution wraps RAG in an agentic pipeline – a network of intelligent agents that coordinate retrieval and reasoning. Rather than a single LLM call, the system involves multiple steps and specialized tools. For example, an Agentic Router first decides whether to attempt an answer or return “I don’t know,” preventing off-topic responses. Other agents handle tasks like breaking the question into sub-queries, re-ranking candidates, or even fetching structured data via an API. These agents collaborate and iterate: one agent’s findings inform the next round of search. In our prototype, the agents form a “self-improving network” that “collaborate, cross-check findings, and exchange knowledge,” iteratively refining the answer. This mimics how expert analysts work together on complex problems.
By adding this agentic layer, we inject checks that guard against hallucination. For instance, one agent might use the LLM to “judge” whether the retrieved documents actually answer the query. If not, the pipeline loops: the query is reformulated or another retrieval strategy is tried. This multi-step orchestration is known to improve retrieval quality, and we found it cuts down on spurious answers. In short, our agentic pipeline treats retrieval as an interactive process, not a one-shot lookup, an essential trait of any robust event‑based state machine.
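The retrieve-judge-reformulate loop can be sketched as below. Here `judge_relevance` and `reformulate` are crude heuristic stand-ins for what are, in our system, LLM calls; only the control flow is the point.

```python
def judge_relevance(query: str, docs: list) -> bool:
    # Stand-in for an LLM-as-judge call: crude keyword-overlap check.
    terms = set(query.lower().split())
    return any(terms & set(d.lower().split()) for d in docs)

def reformulate(query: str, attempt: int) -> str:
    # Stand-in for an LLM rewrite; a real agent rephrases the query.
    return f"{query} (reformulation {attempt})"

def retrieve_with_checks(query: str, retrieve, max_attempts: int = 3) -> list:
    """Loop: retrieve, judge, reformulate until the judge accepts or we give up."""
    q = query
    for attempt in range(1, max_attempts + 1):
        docs = retrieve(q)
        if judge_relevance(query, docs):
            return docs
        q = reformulate(query, attempt)
    return []  # Defer: signal "I don't know" upstream rather than guess.
```

Returning an empty list (rather than the best bad candidates) is what lets the downstream router answer “I don’t know” instead of hallucinating.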

Multi-HyDE Retrieval
A core innovation in our system is Multi-HyDE, inspired by the HyDE (Hypothetical Document Embeddings) technique. HyDE works by letting an LLM generate a “hypothetical answer” for the user’s query, and then embedding that generated text to search the database. Essentially, instead of searching for the query terms themselves, it searches for documents semantically similar to a crafted answer. This often finds relevant information missed by direct keyword search, because it captures the intent of the query.
We take HyDE a step further by generating multiple hypothetical answers. For a given query, we prompt our LLM agent to produce several plausible answers or key points (covering different angles of the question). Each hypothetical answer is embedded and used to retrieve documents. This Multi-HyDE strategy broadens coverage: if one “answer” misses certain facts, another may capture them. In practice, this means our retrieval step gathers evidence from several perspectives before composing the final answer. As one of our developers put it, we implemented “a novel state-machine based Multi-HyDE retrieval” in our RAG framework. In effect, the system cycles through various candidate hypotheses, ensuring no reasonable lead is overlooked, strengthening our overall financial intelligence workflow.
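The Multi-HyDE step reduces to a small merge loop, sketched below. The callables `generate_hypotheses`, `embed`, and `search` are hypothetical stand-ins for the LLM, the embedding model, and the vector-store lookup; the essential logic is retrieving once per hypothesis and deduplicating the union.

```python
def multi_hyde_retrieve(query, generate_hypotheses, embed, search,
                        k_per_hypothesis: int = 3) -> list:
    """Generate several hypothetical answers, retrieve for each, merge results.

    Each hypothesis is embedded and used as a separate search probe, so
    evidence missed by one phrasing of the answer can be caught by another.
    """
    merged, seen = [], set()
    for hypothesis in generate_hypotheses(query):
        for doc in search(embed(hypothesis), k=k_per_hypothesis):
            if doc not in seen:  # deduplicate across hypotheses
                seen.add(doc)
                merged.append(doc)
    return merged
```

The merged list preserves first-seen order, so documents surfaced by the earliest (usually most direct) hypothesis are ranked first before any re-ranking step.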

How Did Pathway Help as a Live AI Layer?
With Pathway, we successfully integrated data from diverse unstructured data sources into a single server-client interface, allowing for efficient searches across enterprise-scale data. This integration was supported by a robust and highly optimized Rust backend in the Pathway vector store, which ensured very low latency. Moreover, community support from major frameworks such as LangChain facilitated seamless integration into our existing agentic system and event‑based state machine architecture.
Hybrid Retrieval with BM25
Along with Multi-HyDE, we combine semantic search with classic IR. We integrate BM25, a proven keyword-matching algorithm, into our retrieval ensemble. BM25 ranks documents by matching term frequency and inverse document frequency – in other words, it excels at finding documents that contain the exact query terms. This is valuable for financial text full of jargon, tickers, or legal phrases that LLM embeddings might under-emphasize. In our system, we perform both vector (neural) and BM25 retrieval, then merge the results. We set BM25’s weight (e.g. 0.5) so it contributes alongside the embedding search. This hybrid approach balances lexical precision (catching the exact financial terms) with semantic relevance (capturing broader context). In practice it pulls in documents that purely semantic search could miss. Citations from prior work show that an ensemble retriever can “effectively prioritize documents…likely to be more relevant” by combining BM25 with embedding similarity. Our experience confirms this: BM25 ensures critical keyword matches aren’t lost, further reducing hallucination risk.
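The hybrid merge can be sketched as follows: a minimal BM25 scorer plus a weighted blend with precomputed vector-similarity scores. This is an illustrative implementation, not the library code we use in production; it assumes whitespace tokenization and non-negative vector scores, and uses the 0.5 BM25 weight mentioned above.

```python
from math import log

def bm25_scores(query: str, corpus: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Minimal Okapi-style BM25 over a whitespace-tokenized corpus."""
    docs = [d.lower().split() for d in corpus]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    scores = [0.0] * N
    for term in query.lower().split():
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0:
            continue
        idf = log(1 + (N - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            scores[i] += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(d) / avgdl))
    return scores

def hybrid_rank(query: str, corpus: list, vector_scores: list,
                bm25_weight: float = 0.5) -> list:
    """Blend normalized BM25 and vector scores; return indices, best first."""
    def norm(xs):
        hi = max(xs) or 1.0  # avoid division by zero on all-zero scores
        return [x / hi for x in xs]
    bm25 = norm(bm25_scores(query, corpus))
    vec = norm(vector_scores)
    combined = [bm25_weight * s1 + (1 - bm25_weight) * s2
                for s1, s2 in zip(bm25, vec)]
    return sorted(range(len(corpus)), key=lambda i: combined[i], reverse=True)
```

With `bm25_weight=0.5`, a document containing the exact ticker or legal phrase can outrank one that is merely semantically close, which is exactly the behavior we want for jargon-heavy financial text.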
Adaptive Meta-State Navigation
Finally, our pipeline adapts its strategy on the fly using an Adaptive Meta-State Navigation (AMSN) technique. Think of each query’s retrieval process as a “state” the system is in; depending on feedback (e.g. no good documents found, or inconsistent answers), the agents can change course. This is like meta-learning: the pipeline trains itself to choose the right retrieval tactics. In fact, recent surveys describe how adaptive meta-learning in RAG lets the system “self-optimize retrieval strategies based on prior retrieval effectiveness”. Concretely, if our agents detect that the current search plan is failing (say, the answer’s confidence is low), they can invoke a different plan – perhaps reformulate the query, try a new database, or skip to another agent’s logic. Over time, the pipeline learns which routes work best for certain query types. This meta-level agility helps ensure that complex, multi-faceted questions get thorough treatment, further boosting answer reliability and enriching the event‑based state machine that drives Live AI.
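At its core, AMSN is a state machine over retrieval strategies with a confidence-gated transition rule, which can be sketched as below. The state names, the `strategies` mapping, and the `confidence` callable are hypothetical stand-ins (in our system, confidence comes from an LLM-based check).

```python
def amsn_answer(query: str, strategies: dict, confidence,
                threshold: float = 0.7):
    """Adaptive Meta-State Navigation (sketch): treat each retrieval strategy
    as a state and advance to the next state when confidence is low.

    strategies: maps state name -> retrieval callable (query -> docs).
    confidence: callable (query, docs) -> float, a stand-in for an
    LLM-based confidence estimate.
    """
    plan = ["vector_search", "reformulated_search", "api_lookup"]
    for state in plan:
        if state not in strategies:
            continue  # this deployment lacks the tool; try the next state
        docs = strategies[state](query)
        if docs and confidence(query, docs) >= threshold:
            return state, docs
    return "defer", []  # all states exhausted: answer "insufficient data"
```

A learned version would reorder `plan` per query type based on past outcomes; the fixed list above keeps the sketch minimal while showing the transition logic.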

Our Final Pipeline:
In our final pipeline, we combined the Stateless and AMSN approaches. This lets us exploit the speed and efficiency of the Stateless method for one-hop queries that are primarily extractive, while routing queries that require deeper reasoning and analysis to the AMSN pipeline, which reliably delivers high-accuracy answers. This way, we benefit from the strengths of both approaches in performance and capability. Most importantly, the final pipeline includes an explainability agent that provides, alongside the answer, an explanation of how that answer was reached, with citations to the original documents. Decisions made with our system are therefore more robust, grounded, and supportive of advanced Financial Intelligence workflows.
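The routing-plus-explainability structure of the final pipeline can be sketched as follows. All four callables are hypothetical stand-ins: `is_multi_hop` for an LLM query classifier, `stateless` and `amsn` for the two sub-pipelines (each returning an answer and its source documents), and `explain` for the explainability agent.

```python
def run_final_pipeline(query: str, is_multi_hop, stateless, amsn, explain) -> dict:
    """Route the query, answer it, then attach an explanation with citations."""
    if is_multi_hop(query):
        answer, sources = amsn(query)       # deeper reasoning, higher accuracy
    else:
        answer, sources = stateless(query)  # fast path for extractive queries
    return {
        "answer": answer,
        "explanation": explain(query, answer, sources),
        "citations": sources,  # references back to the original documents
    }
```

Because citations travel with the answer through the whole return path, the explainability agent never has to reconstruct provenance after the fact.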

User Walkthrough:

Ablation Study:
In our analysis, we compared several approaches for enhancing data retrieval and processing:
- The combination of Multi-query, HyDE, BM25, and re-ranking techniques significantly outperformed the State-Machine based approach by 15%, primarily because the vector store already held sufficient data for most queries. The State-Machine approach was effective when required information was outdated or absent, but introduced unnecessary complexity when sufficient data was available.
- OpenAI embeddings proved faster and more accurate than Fast Embeddings, greatly improving our system's performance.
- HTML tables, which preserve structural information, led to more accurate interpretations than plain-text formats.
- Incorporating retrieval tools did not enhance accuracy significantly, but helped manage structured data effectively.
- Among LLMs, ChatGPT and Gemini excelled compared to Llama 70b at following instructions and producing valid outputs.
- Where the RAG system was challenged by outdated information and potential biases in retrieved content, we implemented backup retrieval methods and robust content moderation filters to mitigate these risks.
Overall, our dynamic RAG solution, seamlessly integrated with Pathway products, showcases a balance of retrieval effectiveness and flexibility, reinforcing responsible AI practices and ensuring accurate and safe outcomes for Financial Intelligence.


Conclusions
Together, these innovations pay off in real-world financial QA. In our experiments on SEC filings and market data, the enhanced RAG system dramatically improved accuracy and slashed hallucination rates compared to a vanilla pipeline. For example, our agentic Multi-HyDE approach answered correctly on a far higher fraction of tricky financial queries, and was far less likely to hallucinate facts. (Precisely, we observed accuracy gains of 20.2% and 15% on our financial QA datasets with the final pipeline as compared to Multi-HyDE alone.) Crucially, the system also learned to defer—if the evidence was insufficient, it would output “insufficient data” rather than a fabricated answer. In practice this means analysts get fewer “bad” answers and more reliable results.
In financial services, this level of trustworthiness is a game-changer. By blending agents and multiple retrieval modes, we achieved the goal of grounded, verifiable answers. Our system can cite specific document passages or say “I don’t know” when appropriate, unlike a plain LLM. As one industry commentator noted, advancing RAG in this way directly addresses key limitations like hallucination. We also saw the benefit of iterative reasoning: the vertical agent layer effectively “cross-checks” insights, akin to an analyst team reviewing each other’s work. All told, the enhanced system delivers consistently accurate, up‑to‑date Financial Intelligence with a much higher degree of confidence – all orchestrated by an underlying Event‑Based State Machine.

If you are interested in diving deeper into the topic, here are some good references to get started with Pathway:
- Pathway Developer Documentation
- Pathway's Ready-to-run App Templates
- End-to-end Real-time RAG app with Pathway
- Discord Community
- Pathway's LLM Tooling
- Power and Deploy RAG Agent Tools with Pathway
Authors

Pathway Community
Multiple authors