Building End-to-End RAG with NPCI’s AI Leader
Insights from a RAG masterclass by NPCI’s AI leader, Jayprasad Hegde
Are you looking to enhance your Large Language Models (LLMs) with domain-specific knowledge? Building an end-to-end open source Retrieval Augmented Generation (RAG) system might be the solution you've been searching for.
To see these concepts in action, watch the detailed session Pathway recently hosted with industry experts, in collaboration with our friends from IIT Kanpur and IIT BHU:
Drawing on that discussion, this blog walks you through constructing an open source RAG system tailored to your needs.
Introduction
LLMs are incredibly powerful but often operate like they're taking a closed-book exam—they rely solely on their training data. If you want to extend their capabilities, especially for specialized applications, incorporating an end-to-end open source RAG system is essential. Enterprises and organizations operating at large scales need systems that go beyond general-purpose question-answering to provide precise, context-aware interactions.
Assumed Knowledge:
Before diving in, it helps if you're familiar with:
- What Large Language Models (LLMs) are
- What vector embeddings are
Need a refresher? Check out the Free Realtime LLM and RAG Pipelines Bootcamp by Pathway.
1. Assessing Your Hardware Infrastructure for Open Source RAG
The foundation of a powerful open source RAG system lies in its hardware. GPUs are crucial for the heavy computation behind running LLMs and processing large datasets. High-performance options like NVIDIA's H100 and RTX 4090 GPUs excel at these tasks thanks to their CUDA-enabled architectures.
Determining If You Have the Necessary Hardware
Before getting started, it's important to assess whether your existing hardware can support your intended LLM workloads:
- Use Open Source Tools: Leverage tools like the Hugging Face hardware checker to evaluate if your setup is sufficient.
- Consider In-House Hardware: For projects requiring data privacy, using in-house hardware is often preferred. Organizations operating at scale often utilize local clusters for training large workloads, ensuring data security and privacy.
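Before reaching for a full hardware checker, a back-of-envelope estimate often answers the question. The sketch below is a rough heuristic, not a precise sizing tool: the `overhead_factor` for KV cache and activations is an assumed figure, and real memory use varies with batch size and context length.

```python
def estimate_vram_gb(n_params_billion: float, bytes_per_param: float = 2.0,
                     overhead_factor: float = 1.2) -> float:
    """Rough VRAM needed to serve a model: weights dominate, with an
    assumed overhead factor for KV cache and activations."""
    weights_gb = n_params_billion * bytes_per_param  # 1e9 params * bytes / 1e9
    return weights_gb * overhead_factor

# A 7B-parameter model in fp16 (2 bytes per parameter):
print(f"7B fp16: ~{estimate_vram_gb(7):.1f} GB")
# The same model 4-bit quantized (0.5 bytes per parameter):
print(f"7B int4: ~{estimate_vram_gb(7, 0.5):.1f} GB")
```

Numbers like these make it clear why quantization is often the difference between needing a data-center GPU and fitting on a consumer card.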
Open Source Considerations
- Leverage Open Source Frameworks: Utilize open source platforms and libraries to build your RAG system without incurring licensing costs.
- Community Support: Open source tools often have robust communities that can provide assistance and updates.
2. Data Preprocessing: Segmentation and Tokenization
Preparing your data is a critical step. You'll need to break down your documents into manageable chunks and tokenize them—converting text into tokens that LLMs can understand.
Calculating Chunk Size
- Balance Context and Processing Speed: Smaller chunks enable faster processing but may lack sufficient context. Larger chunks offer more context but require more computational resources.
- Token Limits of Open Source Models: Most open source LLMs have a maximum context length. For example, Llama 2 7B supports a 4k-token context window, and even filling that window can require significant RAM.
- Optimal Chunking: Perform calculations to determine the chunk size that balances context with efficiency.
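The chunking trade-off above can be sketched with a simple overlapping splitter. Whitespace tokens stand in for model tokens here for simplicity; in practice you would count tokens with your model's own tokenizer, and the chunk and overlap sizes are illustrative values, not recommendations.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size tokens.
    Overlap preserves context that would otherwise be cut at chunk edges."""
    tokens = text.split()  # proxy for real tokenization
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

doc = ("word " * 500).strip()          # a 500-token toy document
pieces = chunk_text(doc, chunk_size=200, overlap=40)
print(len(pieces))  # chunks start at tokens 0, 160, 320 -> 3 chunks
```

Tuning `chunk_size` and `overlap` against your model's context window is exactly the calculation the bullet above refers to.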
3. Embeddings and Similarity Search: Grasping Contextual Nuance
Embeddings are the cornerstone of an open source RAG system's ability to understand context. They represent text as vectors in a high-dimensional space, allowing you to compare text chunks and user queries using similarity search techniques like cosine similarity.
Choosing Open Source Embedding Models
- Consult Open Source Benchmarks: Use resources like the MTEB (Massive Text Embedding Benchmark) leaderboard on Hugging Face to select high-performing models.
- Model Compatibility: Ensure the embedding model is compatible with your chosen LLM and suits your specific use case.
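Cosine similarity itself is just a dot product over normalized vectors. The sketch below uses toy 3-dimensional vectors and made-up values purely for illustration; real embedding models produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 means
    identical direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for a query and two stored chunks:
query = [1.0, 0.0, 1.0]
chunks = {
    "chunk about payments": [0.9, 0.1, 0.8],
    "chunk about weather":  [0.0, 1.0, 0.1],
}
best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
print(best)  # chunk about payments
```

Retrieval in a RAG system is this comparison repeated over every stored chunk, which is why efficient vector stores matter at scale.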
Implementing Similarity Search with Pathway's In-Memory Vector Store
- Start In-Memory: According to Jayprasad Hegde of NPCI, it's best to start with an in-memory vector store.
- Pathway's Solution: Pathway provides an in-memory vector store for production use cases at scale that automatically adapts to changes in data sources. This allows for efficient and dynamic similarity searches without the overhead of external databases.
- Explore the Demo: You can check out Pathway's Demo Q&A App Template to see how this works in practice.
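To make the idea concrete, here is a minimal sketch of what an in-memory vector store does, assuming plain Python lists as vectors. This illustrates the concept only, not Pathway's API; Pathway's store adds the live adaptation to changing data sources described above.

```python
import math

class InMemoryVectorStore:
    """Minimal in-memory vector store: add (text, vector) pairs, then
    retrieve the top-k most similar chunks for a query vector."""

    def __init__(self):
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def query(self, vector: list[float], k: int = 3) -> list[str]:
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a))
                          * math.sqrt(sum(x * x for x in b)))
        ranked = sorted(self._items, key=lambda item: cos(vector, item[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

store = InMemoryVectorStore()
store.add("refund policy", [1.0, 0.2])   # hypothetical 2-d embeddings
store.add("office hours",  [0.1, 1.0])
print(store.query([0.9, 0.3], k=1))  # ['refund policy']
```

A brute-force scan like this is perfectly adequate for small corpora, which is why starting in-memory is sound advice before reaching for an external database.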
4. Cost Optimization and Scalability
Building a RAG system requires balancing cost with scalability. Open source tools can significantly reduce costs, but you still need to manage hardware expenses and system requirements.
Understanding Costs
- Model Selection: The choice of model impacts both performance and cost. Open source models can reduce licensing fees but may require robust hardware.
- Hardware Investment: Weigh the upfront costs of hardware against long-term operational expenses.
- Open Source Benefits: Utilizing open source software eliminates licensing costs, allowing you to allocate resources elsewhere.
Insight:
API costs can add up quickly, even if they seem minimal at first glance. Understanding the cost aspects and system sizing thoroughly is crucial. For example, even a relatively small model like Llama 2 7B, run at its full 4k-token context window, can consume significant RAM.
Scalability Considerations
- Iterative Development: Start with basic methods and scale up as needed. This approach allows you to understand system behavior and optimize effectively.
- Planning for Growth: Design your infrastructure to accommodate future expansion without significant overhauls.
- Performance Optimization: Continuously monitor and adjust performance to handle increasing workloads efficiently.
5. Privacy and Local Open Source RAG Systems
Local RAG systems offer significant benefits, especially concerning privacy. By using open source tools, you can build a system that keeps all data on-premises, maintaining strict control over data flow.
Assessing Local vs. Cloud Solutions
- Data Privacy Regulations: Ensure compliance with legal and regulatory requirements regarding data privacy.
- Control Over Data: Local systems provide greater control and reduce the risk of data breaches.
- Complexity vs. Security: Weigh the challenges of managing a local open source RAG system against the security benefits it provides.
Pathway's Private RAG App Template:
For those interested in building a private RAG system, Pathway offers a Private RAG App Template that can help you set up a secure, on-premises solution tailored to your organization's needs.
6. Re-ranking and Optimization for Relevance and Accuracy
Re-ranking is essential to ensure the most relevant content is presented to users. Open source tools offer various algorithms and models to enhance this process.
Implementing Re-ranking with Pathway
- Iterative Improvement: Begin with basic retrieval methods and progressively introduce more sophisticated algorithms.
- Model-Assisted Re-ranking: Utilize open source models to assess and re-rank the relevance of retrieved chunks.
- Pathway's Rerankers: Pathway provides tools for implementing rerankers that can significantly improve the accuracy of your RAG system. Learn more in their User Guide.
- Continuous Testing: Maintain rigorous testing to ensure accuracy and reliability.
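The re-ranking steps above can be sketched as a generic function that re-orders retrieved chunks by a relevance score. In practice the scorer would be an open source cross-encoder model; here a toy keyword-overlap scorer stands in so the sketch runs anywhere, and the example texts are invented.

```python
def rerank(query: str, chunks: list[str], score_fn) -> list[str]:
    """Re-order retrieved chunks so the most relevant come first,
    according to an arbitrary scoring function."""
    return sorted(chunks, key=lambda chunk: score_fn(query, chunk),
                  reverse=True)

def keyword_overlap(query: str, chunk: str) -> float:
    """Toy scorer: fraction of query words that appear in the chunk.
    A model-assisted reranker would replace this with learned scores."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words)

retrieved = [
    "opening hours of the branch",
    "how to dispute a failed UPI payment",
    "annual report summary",
]
ranked = rerank("failed UPI payment dispute", retrieved, keyword_overlap)
print(ranked[0])  # how to dispute a failed UPI payment
```

Because `rerank` takes the scorer as a parameter, you can start with a cheap heuristic and swap in a model-based scorer later, matching the iterative approach recommended above.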
Conclusion
Building an end-to-end open source RAG system is a multifaceted journey that involves careful consideration of hardware, data processing, embeddings, search algorithms, cost, and privacy.
By thoughtfully addressing these areas, you can develop a robust, scalable, and cost-effective open source RAG system tailored to your needs. Remember:
- Start Simple with Open Source Tools: Begin with in-memory solutions and basic models.
- Leverage Pathway's Resources: Utilize tools and templates provided by Pathway to accelerate your development.
- Iteratively Improve: Continuously refine based on performance and feedback.
- Prioritize Privacy: Assess whether a local open source RAG system is necessary for your data security requirements.
Are you looking to build an enterprise-grade RAG app?
Pathway is trusted by industry leaders such as NATO and Intel, and is natively available on both AWS and Azure Marketplaces. If you’d like to explore how Pathway can support your RAG and Generative AI initiatives, we invite you to schedule a discovery session with our team.
Schedule a 15-minute demo with one of our experts to see how Pathway can be the right solution for your enterprise needs.
By focusing on open source RAG systems and understanding how to build them end-to-end, you empower yourself and your organization to leverage the full potential of LLMs while maintaining control over costs and data privacy.