(Bonus) Adaptive RAG Overview
TLDR: You can dynamically adapt the number of documents in a RAG prompt using feedback from the LLM. This cuts the LLM cost of RAG question answering by roughly 4x while maintaining good accuracy. The method also helps explain the lineage of LLM outputs.
Understanding Adaptive RAG
As you know, Retrieval Augmented Generation (RAG) allows Large Language Models (LLMs) to answer questions based on knowledge not present in their original training set.
At Pathway, we use RAG to build document intelligence solutions that answer questions based on private document collections, such as a repository of legal contracts. We are constantly working on improving the accuracy and explainability of our models while keeping costs low. Adaptive RAG is a trick that helps us achieve those goals.
It's all about Balancing Costs and Accuracy
In practical implementations, the number of documents in the prompt must balance costs, desired answer quality, and explainability. A larger number of documents increases the LLM's ability to provide a correct answer, but it also increases costs and can complicate model outputs.
Adaptive RAG strategy
Adaptive RAG dynamically adjusts the context size based on the question's complexity and the LLM's feedback:
- Initial Query: Ask the LLM with a small number of context documents.
- Adaptive Expansion: If the LLM refuses to answer, re-ask with a larger prompt, expanding the context size using a geometric series (doubling the number of documents each time).
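The two steps above can be sketched as a simple retry loop. This is a minimal illustration, not Pathway's actual implementation: `retrieve` and `ask_llm` are hypothetical stand-ins for your retriever and LLM client, and the refusal check is a naive string match.

```python
# Minimal sketch of the Adaptive RAG loop (hypothetical helpers,
# not Pathway's production code).

def adaptive_rag(question, retrieve, ask_llm, start_k=2, max_k=64):
    """Ask with a small context; double the document count on refusal."""
    k = start_k
    while k <= max_k:
        docs = retrieve(question, k)      # top-k supporting documents
        answer = ask_llm(question, docs)  # prompt = question + context docs
        if "do not know" not in answer.lower():
            return answer, k              # accepted answer and context size used
        k *= 2                            # geometric expansion: 2, 4, 8, ...
    return "I do not know", max_k
```

Because most questions are answered in the first round with a tiny prompt, the average token cost stays low; only the hard questions pay for a large context.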
Experiment Insights
We conducted experiments to analyze the accuracy and cost efficiency of this approach:
- Base RAG: Shows a typical relationship between accuracy and supporting context size.
- Error Analysis: Reveals that more context reduces "Do not know" responses but increases hallucinated answers.
- Adaptive RAG: Efficiently balances cost and accuracy by starting with a minimal prompt and expanding only when necessary.
Key Findings
- Even a single supporting document yields 68% accuracy.
- Expanding the context only when the LLM refuses keeps most queries cheap, while each doubling incrementally improves accuracy.
- Overlapping prompt strategy maintains accuracy better than non-overlapping prompts.
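To make the last finding concrete, here is a hypothetical illustration of the two expansion strategies over a ranked retrieval list: an overlapping prompt re-sends the earlier documents along with new ones, while a non-overlapping prompt sends only the documents not shown before.

```python
# Illustrative comparison of overlapping vs. non-overlapping expansion
# (assumed setup: documents already ranked by relevance).

ranked_docs = [f"doc{i}" for i in range(1, 17)]  # most relevant first

def overlapping(ranked, k):
    """Round of size k re-uses everything from earlier, smaller rounds."""
    return ranked[:k]

def non_overlapping(ranked, k, prev_k):
    """Round of size k shows only documents not sent in earlier rounds."""
    return ranked[prev_k:k]

# Going from k=2 to k=4: overlapping keeps doc1 and doc2 (the most
# relevant documents) in the prompt; non-overlapping drops them and
# sends only doc3 and doc4, which is why its accuracy degrades.
```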
To understand this better: Read the blog post on Adaptive RAG
This might also be a good time for you to revisit the complete talk by Łukasz Kaiser (co-creator of ChatGPT, the Transformer, GPT-4o, and TensorFlow) and Jan Chorowski (co-author with Bengio and Hinton, ex-Google Brain, CTO at Pathway) on the future of LLMs at a recent Pathway SF Meetup.
The talk includes various elements we've covered so far.