RAG configuration YAML Examples
Here you can find YAML configuration examples to help you configure your Pathway RAG template.
See the dedicated article for explanations and examples of how to define the data sources in the YAML file.
Pathway provides RAG classes that build a RAG pipeline over an index. You can interact with the resulting RAG pipeline using a REST API to perform the following operations:
- Answer a query.
- List the indexed documents.
- Retrieve the closest documents from the index based on a query.
- Obtain statistics about the indexed documents.
- Summarize a list of texts.
You can define a standard RAG pipeline using BaseRAGQuestionAnswerer. Here are its main parameters:
- llm: an LLM Chat which defines the LLM model used.
- indexer: a Document Store in which the data is indexed, used to retrieve the documents to answer the queries.
- search_topk (optional): the number of documents to be included as the context of the query.
- prompt_template (optional): the prompt to use for querying. The prompt must be a string with {query} used as a placeholder for the question and {context} as a placeholder for the context documents.
question_answerer: !pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer
llm: $llm
indexer: $document_store
search_topk: 6
prompt_template: "Given these documents: {context}, please answer the question: {query}"
Pathway provides an advanced RAG technique called Adaptive RAG that lowers the cost of queries. Adaptive RAG first asks the LLM with a small number of context documents. If the LLM refuses to answer, the question is asked again with a larger number of context documents. You can define such a pipeline using AdaptiveRAGQuestionAnswerer. Here are its main parameters:
- llm: an LLM Chat which defines the LLM model used.
- indexer: a Document Store in which the data is indexed, used to retrieve the documents to answer the queries.
- n_starting_documents: the number of documents retrieved for the first try.
- factor: multiplicative factor for the number of retrieved documents after each failed try.
- max_iterations: maximum number of tries before stopping.
- prompt_template (optional): the prompt to use for querying. The prompt must be a string with {query} used as a placeholder for the question and {context} as a placeholder for the context documents.
question_answerer: !pw.xpacks.llm.question_answering.AdaptiveRAGQuestionAnswerer
llm: $llm
indexer: $document_store
n_starting_documents: 2
factor: 2
max_iterations: 4
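With the values above, the first try uses 2 context documents, each failed try multiplies that number by the factor, and at most 4 tries are made, so the question is asked with 2, 4, 8, and finally 16 documents.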
Pathway provides wrappers for the different LLM providers. Those wrappers, called "chats", are used to configure and call the different LLM APIs. You can learn more about them in the dedicated article.
For models from Hugging Face that you want to run locally, Pathway provides a separate wrapper called HFPipelineChat, which creates a Hugging Face pipeline. It accepts any argument of the pipeline, including the name of the model.
llm: !pw.xpacks.llm.llms.HFPipelineChat
model: "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
The Pathway wrapper for LiteLLM Chat services, LiteLLMChat, allows you to use many popular LLMs such as Gemini (see the example after the parameter list below). LiteLLM is compatible with several providers, and each provider takes its own parameters: see the LiteLLM documentation to know which parameters you should use. Here is a small sample of the main parameters of LiteLLMChat:
- model: ID of the model to use. Check the providers.
- retry_strategy: Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - ExponentialBackoffRetryStrategy: Retry strategy with exponential backoff with jitter and maximum retries.
  - FixedDelayRetryStrategy: Retry strategy with fixed delay and maximum retries.
- cache_strategy: Defines the caching mechanism.
  - DefaultCache: The default caching strategy. Persistence layer will be used if enabled. Otherwise, cache will be disabled.
  - InMemoryCache: In-memory LRU cache. It is not persisted between runs.
- capacity: Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
- temperature: What sampling temperature to use, between 0 and 2.
- api_base: API endpoint to be used for the call.
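LiteLLM routes requests based on the model string. As an illustration, and assuming LiteLLM's provider/model naming for Gemini (e.g. gemini/gemini-1.5-flash; check the LiteLLM documentation for the exact identifier and the required API key environment variable), a Gemini setup could look like this:
$llm: !pw.xpacks.llm.llms.LiteLLMChat
  model: "gemini/gemini-1.5-flash" # assumed LiteLLM model identifier, verify against the LiteLLM docs
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DefaultCache
  temperature: 0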
You can use a local Mistral model with the following setup:
$llm: !pw.xpacks.llm.llms.LiteLLMChat
model: "ollama/mistral"
retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
max_retries: 6
cache_strategy: !pw.udfs.DefaultCache
temperature: 0
api_base: "http://localhost:11434"
You can learn more about this template by visiting its associated public GitHub project.
The Pathway wrapper for OpenAI Chat services is called OpenAIChat. OpenAIChat allows deep customization thanks to its large number of parameters. Here are the main ones:
- model: ID of the model, see the available OpenAI models.
- retry_strategy: Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - ExponentialBackoffRetryStrategy: Retry strategy with exponential backoff with jitter and maximum retries.
  - FixedDelayRetryStrategy: Retry strategy with fixed delay and maximum retries.
- cache_strategy: Defines the caching mechanism.
  - DefaultCache: The default caching strategy. Persistence layer will be used if enabled. Otherwise, cache will be disabled.
  - InMemoryCache: In-memory LRU cache. It is not persisted between runs.
- capacity: Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
- temperature: What sampling temperature to use, between 0 and 2.
$llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
max_retries: 6
cache_strategy: !pw.udfs.DefaultCache
temperature: 0
capacity: 8
Pathway DocumentStore builds a document indexing pipeline for processing documents and querying the closest documents to a query according to a specified index. It will take care of the parsing, splitting, and indexing of the documents. It can be configured using the following parameters:
- docs: A list of Pathway tables which contain source documents. The table must contain a data column of type bytes. See our explanations and our examples about how to define the data sources in the YAML file.
- parser: The parser used to preprocess the documents, see the parser section.
- splitter: The splitter used to preprocess the documents, see the splitter section.
- retriever_factory: The index used, see the index section.
$document_store: !pw.xpacks.llm.document_store.DocumentStore
docs: $sources
parser: $parser
splitter: $splitter
retriever_factory: $retriever_factory
Pathway provides several indices for the DocumentStore.
To use a Brute Force index, you need to use the BruteForceKnnFactory. It has two main parameters: embedder and metric.
The metric parameter will determine how the similarity is computed between a query and a document:
- L2 distance: !pw.stdlib.indexing.BruteForceKnnMetricKind.L2SQ
- Cosine distance: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
reserved_space: 1000 # Reserved space for the index
embedder: $embedder
metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
To use the Tantivy BM25 index, use the TantivyBM25Factory. It has only two parameters:
- ram_budget: The maximum capacity in bytes.
- in_memory_index: whether the whole index is stored in RAM or in Pathway's disk storage.
$retriever_factory: !pw.stdlib.indexing.TantivyBM25Factory
ram_budget: 1073741824 # 1GB
in_memory_index: True
To use the USearch index, use the UsearchKnnFactory.
It has two main parameters: embedder and metric. The metric parameter will determine how the similarity is computed between a query and a document:
- Inner product: !pw.stdlib.indexing.USearchMetricKind.IP
- L2 distance: !pw.stdlib.indexing.USearchMetricKind.L2SQ
- Cosine distance: !pw.stdlib.indexing.USearchMetricKind.COS
- Pearson: !pw.stdlib.indexing.USearchMetricKind.PEARSON
- Haversine: !pw.stdlib.indexing.USearchMetricKind.HAVERSINE
- Divergence: !pw.stdlib.indexing.USearchMetricKind.DIVERGENCE
- Hamming: !pw.stdlib.indexing.USearchMetricKind.HAMMING
- Tanimoto: !pw.stdlib.indexing.USearchMetricKind.TANIMOTO
- Sorensen: !pw.stdlib.indexing.USearchMetricKind.SORENSEN
$retriever_factory: !pw.stdlib.indexing.UsearchKnnFactory
  embedder: $embedder
  metric: !pw.stdlib.indexing.USearchMetricKind.HAMMING
To use Pathway Hybrid Index, use the HybridIndexFactory. It takes two parameters:
- retriever_factories: The list of indices that will be used.
- k: A constant used for calculating the ranking score.
$retriever_factories:
- !pw.stdlib.indexing.BruteForceKnnFactory
reserved_space: 1000
embedder: $embedder
metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
- !pw.stdlib.indexing.TantivyBM25Factory
ram_budget: 1073741824
in_memory_index: True
$retriever_factory: !pw.stdlib.indexing.HybridIndexFactory
retriever_factories: $retriever_factories
When storing a document in a vector store, you compute the embedding vector for the text and store the vector with a reference to the original document. You can then compute the embedding of a query and find the embedded documents closest to the query. You can learn more about embedders in the dedicated article.
The default model for OpenAIEmbedder is text-embedding-3-small.
$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
model: "text-embedding-3-small"
The model for LiteLLMEmbedder has to be specified during initialization. No default is provided.
$embedder: !pw.xpacks.llm.embedders.LiteLLMEmbedder
model: "text-embedding-3-small"
The SentenceTransformerEmbedder embedder allows you to use models from the Hugging Face Sentence Transformers library. The model is specified during initialization. Here is a list of available models.
$embedder: !pw.xpacks.llm.embedders.SentenceTransformerEmbedder
model: "intfloat/e5-large-v2"
GeminiEmbedder is the embedder for Google's Gemini Embedding Services. Available models can be found here.
$embedder: !pw.xpacks.llm.embedders.GeminiEmbedder
model: "models/text-embedding-004"
Parsers play a crucial role in the Retrieval-Augmented Generation (RAG) pipeline by transforming raw, unstructured data into structured formats that can be effectively indexed, retrieved, and processed by language models. In a RAG system, data often comes from diverse sources such as documents, web pages, APIs, and databases, each with its own structure and format. Parsers help extract relevant content, normalize it into a consistent structure, and enhance the retrieval process by making information more accessible and usable. You can learn more about them in the dedicated article.
Utf8Parser is a simple parser designed to decode text encoded in UTF-8. It ensures that raw byte-encoded content is converted into a readable string format for further processing in a RAG pipeline.
$parser: !pw.xpacks.llm.parsers.Utf8Parser
UnstructuredParser leverages the parsing capabilities of Unstructured. It supports various document types, including PDFs, HTML, Word documents, and more.
It supports several chunking modes.
Basic
Splits text into chunks shorter than the specified max_characters length (set via the chunking_kwargs argument). It also supports a soft threshold for chunk length using new_after_n_chars.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "basic"
chunking_kwargs:
max_characters: 3000 # hard limit on number of characters in each chunk
new_after_n_chars: 2000 # soft limit on number of characters in each chunk
By Title
Similar to basic chunking but with additional constraints to split chunks at section or page breaks, resulting in more structured chunks.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "by_title"
chunking_kwargs:
max_characters: 3000 # hard limit on number of characters in each chunk
new_after_n_chars: 2000 # soft limit on number of characters in each chunk
Elements
Breaks down a document into homogeneous Unstructured elements such as Title, NarrativeText, Footer, ListItem, etc.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "elements"
Paged
Collects all elements found on a single page into one chunk. Useful for documents where content is well-separated across pages.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "paged"
Single
Aggregates all Unstructured elements into a single large chunk. Use this mode when applying other chunking strategies available in Pathway or when using a custom chunking approach.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "single"
DoclingParser is a PDF parser that utilizes the docling library to extract structured content from PDFs.
$parser: !pw.xpacks.llm.parsers.DoclingParser
Table parsing
There are two main approaches for parsing tables: (1) using the Docling engine or (2) parsing with a multimodal LLM. The first runs Docling OCR on top of the table in the PDF and transforms it into Markdown format. The second transforms the table into an image, sends it to a multimodal LLM, and asks it to parse the table. As of now, only LLMs with the same API interface as OpenAI are supported.
To choose between these two approaches, set table_parsing_strategy to either llm or docling. If you don't want to parse tables, simply set this argument to None.
Using Docling OCR:
$parser: !pw.xpacks.llm.parsers.DoclingParser
table_parsing_strategy: 'docling'
Using a multimodal LLM:
$multimodal_llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
$parser: !pw.xpacks.llm.parsers.DoclingParser
table_parsing_strategy: 'llm'
multimodal_llm: $multimodal_llm
Image parsing
You can parse images with image_parsing_strategy: "llm": the parser detects images within the document, processes them with a multimodal LLM (such as OpenAI's GPT-4o), and embeds their descriptions in the Markdown output. If disabled, images are replaced with placeholders.
$multimodal_llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
$parser: !pw.xpacks.llm.parsers.DoclingParser
image_parsing_strategy: "llm"
multimodal_llm: $multimodal_llm
pdf_pipeline_options:
do_formula_enrichment: True
image_scale: 1.5
See PdfPipelineOptions for a reference of the possible configuration options, like OCR options, picture classification, code OCR, scientific formula enrichment, etc.
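For example, assuming the docling option names do_code_enrichment and do_picture_classification (fields of docling's PdfPipelineOptions; verify them against the docling documentation for your version), these enrichments could be enabled like this:
$parser: !pw.xpacks.llm.parsers.DoclingParser
  pdf_pipeline_options:
    do_code_enrichment: True # assumed docling option enabling code OCR
    do_picture_classification: True # assumed docling option enabling picture classification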
PypdfParser is a lightweight PDF parser that utilizes the pypdf library to extract text from PDF documents.
$parser: !pw.xpacks.llm.parsers.PypdfParser
The ImageParser parser can be used to transform an image (e.g. in .png or .jpg format) into a textual description made by a multimodal LLM. On top of that, it can be used to extract structured information from the image via a predefined schema. It requires a multimodal LLM and a prompt:
$multimodal_llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
$prompt: "Please provide a description of the image."
$image_schema: !pw.schema_from_types
breed: str
surrounding: str
colors: str
$parser: !pw.xpacks.llm.parsers.ImageParser
llm: $multimodal_llm
parse_prompt: $prompt
detail_parse_schema: $image_schema
SlideParser is a powerful parser designed to extract information from PowerPoint (PPTX) and PDF slide decks using vision-based LLMs. It converts slides into images before processing them with a vision LLM that tries to describe the content of each slide. As with ImageParser, you can also extract information specified in a Pydantic schema.
$multimodal_llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
$prompt: "Please provide a description of the image."
$image_schema: !pw.schema_from_types
breed: str
surrounding: str
colors: str
$parser: !pw.xpacks.llm.parsers.SlideParser
llm: $multimodal_llm
parse_prompt: $prompt
detail_parse_schema: $image_schema
Chunking helps manage and process large documents efficiently by breaking them into smaller, more manageable pieces, improving retrieval accuracy and generation. You can learn more about the splitters in the dedicated article.
Pathway offers a TokenCountSplitter for token-based chunking. The list of encodings is available in OpenAI's tiktoken guide.
$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  min_tokens: 100
  max_tokens: 500
  encoding_name: "cl100k_base"
RecursiveSplitter measures chunk length based on the number of tokens required to encode the text and processes a document by iterating through a list of ordered separators (configurable in the constructor), starting with the most granular and moving to the least. Its main parameters are:
- chunk_size: maximum size of a chunk in characters/tokens.
- chunk_overlap: number of characters/tokens to overlap between chunks.
- separators: list of strings to split the text on.
- encoding_name: name of the encoding from tiktoken. For the list of available encodings, please refer to the tiktoken documentation.
- model_name: name of the model from tiktoken. See the link above for more details.
$splitter: !pw.xpacks.llm.splitters.RecursiveSplitter
  chunk_size: 400
  chunk_overlap: 200
  separators:
    - "\n#"
    - "\n##"
    - "\n\n"
    - "\n"
  model_name: "gpt-4o-mini"
By default, DoclingParser also chunks the document. You can turn this off simply with chunk: False.
$parser: !pw.xpacks.llm.parsers.DoclingParser
chunk: False