Building a Private RAG project

How will you go about it?

  • Installation: Installing Pathway and required libraries.
  • Data Loading: Loading documents for answer retrieval.
  • Embedding Model Selection: Choosing an open-source embedding model.
  • Local LLM Deployment: Deploying a local LLM using Ollama.
  • LLM Initialization: Setting up the LLM instance.
  • Vector Document Index Creation: Building an index for efficient document retrieval.
  • Retriever Setup: Defining the context retrieval strategy.
  • Pipeline Execution: Running the Private RAG pipeline.

Easily Executable Google Colab Notebook

If you would rather check out the notebook directly, you can visit the link below. It also demonstrates the Adaptive RAG technique used in this walkthrough.

Private RAG with Connected Data Sources using Mistral, Ollama, and Pathway (Google Colab)

Let's Understand the Steps

1. Installation

Install Pathway into a Python 3.10+ Linux runtime with a simple pip command:

!pip install -U --prefer-binary pathway

Next, install LiteLLM, a library that provides convenient Python wrappers for calling the LLM:

!pip install "litellm>=1.35"

Lastly, install Sentence-Transformers for embedding the chunked texts:

!pip install sentence-transformers
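As an optional sanity check, you can print the installed versions before moving on; this is a small sketch using the standard library, not part of the pipeline itself:

# Optional sanity check: confirm the freshly installed packages are present
from importlib.metadata import version

for pkg in ("pathway", "litellm", "sentence-transformers"):
    print(pkg, version(pkg))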

2. Data Loading

Start by testing with a static sample of knowledge data. Download the sample:

!wget -q -nc https://public-pathway-releases.s3.eu-central-1.amazonaws.com/data/adaptive-rag-contexts.jsonl
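To get a feel for the data before wiring it into Pathway, you can peek at the first record of the JSONL file. This is an illustrative snippet; the "context" field it reads is the same one mapped via json_field_paths below:

# Optional: inspect the first record of the downloaded knowledge file
import json

with open("adaptive-rag-contexts.jsonl") as f:
    first_record = json.loads(f.readline())

print(list(first_record.keys()))      # the "context" field is used as the document text below
print(first_record["context"][:200])  # preview the first 200 characters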

Import necessary libraries:

import pandas as pd
import pathway as pw
from pathway.stdlib.indexing import default_vector_document_index
from pathway.xpacks.llm import embedders
from pathway.xpacks.llm.llms import LiteLLMChat
from pathway.xpacks.llm.question_answering import (
    answer_with_geometric_rag_strategy_from_index,
)

Load documents in which answers will be searched:

class InputSchema(pw.Schema):
    doc: str

documents = pw.io.fs.read(
    "adaptive-rag-contexts.jsonl",
    format="json",
    schema=InputSchema,
    json_field_paths={"doc": "/context"},
    mode="static",
)

# Check that the documents loaded correctly (prints the table contents)
# pw.debug.compute_and_print(documents)

Create a table with example questions:

df = pd.DataFrame(
    {
        "query": [
            "When it is burned what does hydrogen make?",
            "What was undertaken in 2010 to determine where dogs originated from?",
        ]
    }
)
query = pw.debug.table_from_pandas(df)

3. Embedding Model Selection

Use the pathway.xpacks.llm.embedders module to load open-source embedding models from the Hugging Face model hub. For this showcase, use the avsolatorio/GIST-small-Embedding-v0 model:

embedding_model = "avsolatorio/GIST-small-Embedding-v0"

embedder = embedders.SentenceTransformerEmbedder(
    embedding_model, call_kwargs={"show_progress_bar": False}
)  # disable verbose logs
embedding_dimension: int = embedder.get_embedding_dimension()
print("Embedding dimension:", embedding_dimension)

4. Local LLM Deployment

Run the Mistral 7B model locally, deployed as a service with Ollama:

  1. Download Ollama from ollama.com/download.
  2. In your terminal, run ollama serve.
  3. In another terminal, run ollama run mistral.

You can test it with the following:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Here is a story about llamas eating grass"
}'
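If you prefer testing from Python rather than curl, a minimal sketch with the requests library (assuming the default Ollama port 11434) looks like this:

# Minimal sketch: call the local Ollama server from Python instead of curl
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "Here is a story about llamas eating grass",
        "stream": False,  # ask for a single JSON response instead of a token stream
    },
)
print(resp.json()["response"])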

5. LLM Initialization

Initialize the LLM instance to call your local model:

model = LiteLLMChat(
    model="ollama/mistral",
    temperature=0,
    top_p=1,
    api_base="http://localhost:11434",  # local deployment
    format="json",  # only available in Ollama local deploy, do not use in Mistral API
)
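Before wiring the model into the pipeline, you can optionally smoke-test the same endpoint through LiteLLM directly. This is a sketch assuming the local Ollama deployment from the previous step is running:

# Optional smoke test: call the local model through LiteLLM directly
import litellm

response = litellm.completion(
    model="ollama/mistral",
    messages=[{"role": "user", "content": "Reply with a short greeting."}],
    api_base="http://localhost:11434",
)
print(response.choices[0].message.content)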

6. Vector Document Index Creation

Specify the index with documents and embedding model:

index = default_vector_document_index(
    documents.doc, documents, embedder=embedder, dimensions=embedding_dimension
)
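Conceptually, the index embeds every document once and answers queries by similarity between the query embedding and the document embeddings. The toy sketch below illustrates that idea with plain numpy and made-up documents; it is not how Pathway implements the index internally:

# Toy illustration of vector retrieval (not Pathway's internal implementation)
import numpy as np
from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")
docs = [
    "Hydrogen burns in oxygen to form water.",
    "Dogs diverged from an extinct wolf-like canid.",
]
doc_vecs = st_model.encode(docs)                     # one embedding per document
query_vec = st_model.encode("What does hydrogen make when burned?")

scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(docs[int(np.argmax(scores))])                  # closest document by cosine similarity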

7. Retriever Setup

Specify how relevant context is retrieved from the vector index for a user query. Use Adaptive RAG, which starts with a small amount of context and adds more documents only when the answer is not found:

result = query.select(
    question=query.query,
    result=answer_with_geometric_rag_strategy_from_index(
        query.query,
        index,
        documents.doc,
        model,
        n_starting_documents=2,
        factor=2,
        max_iterations=4,
        strict_prompt=True,  # needed for open source models, instructs LLM to give JSON output strictly
    ),
)
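With the parameters above, the geometric strategy first prompts the LLM with the 2 closest documents and, if no answer is found, multiplies the context size by the factor on each retry. A tiny snippet makes the resulting schedule explicit (this mirrors the parameters shown, not the library internals):

# How many documents the LLM sees on each successive attempt with these parameters
n_starting_documents, factor, max_iterations = 2, 2, 4
schedule = [n_starting_documents * factor**i for i in range(max_iterations)]
print(schedule)  # [2, 4, 8, 16]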

8. Pipeline Execution

Run the pipeline once and print the results table with pw.debug.compute_and_print:

pw.debug.compute_and_print(result)

Example answers based on the questions provided:

Hydrogen makes water when burned [2].
Extensive genetic studies were conducted during the 2010s which indicated that dogs diverged from an extinct wolf-like canid in Eurasia around 40,000 years ago.

Going to Production

Now you have a fully private RAG pipeline set up with Pathway and Ollama. All your data remains safe on your system. The setup is also optimized for speed, thanks to how Ollama runs the LLM and how Pathway's adaptive retrieval mechanism reduces token consumption without sacrificing accuracy.

You can now build and deploy your RAG application in production with Pathway, keeping it in constant connection with your data sources and serving endpoints 24/7. All the code logic built so far can be reused directly!
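For example, switching the file connector from static to streaming mode and replacing the debug run with pw.run() turns the same logic into a continuously running service. A minimal sketch, assuming your documents live in a hypothetical local ./data folder:

# Sketch: the same pipeline watching a folder and running continuously
documents = pw.io.fs.read(
    "./data/",                  # hypothetical folder holding your JSONL documents
    format="json",
    schema=InputSchema,
    json_field_paths={"doc": "/context"},
    mode="streaming",           # keep reacting to new and updated files
)

# ... rebuild the index and the retriever exactly as above ...

pw.run()  # run the pipeline indefinitely instead of a one-off debug run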