RAG configuration YAML Examples
Here you can find YAML configuration examples to help you configure your Pathway RAG template.
See the dedicated article for explanations and examples of how to define the data sources in the YAML file.
Pathway provides RAG classes that build a RAG pipeline over an index. You can interact with the resulting RAG pipeline using a REST API to perform the following operations:
- Answer a query.
- List the indexed documents.
- Retrieve the closest documents from the index based on a query.
- Obtain statistics about the indexed documents.
- Summarize a list of texts.
You can define a standard RAG pipeline using BaseRAGQuestionAnswerer. Here are its main parameters:
- llm: an LLM Chat which defines the LLM model used.
- indexer: a Document Store in which the data is indexed, used to retrieve the documents to answer the queries.
- search_topk (optional): the number of documents to be included as the context of the query.
- prompt_template (optional): the prompt to use for querying. The prompt must be a string with {query} used as a placeholder for the question and {context} as a placeholder for the context documents.
question_answerer: !pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer
llm: $llm
indexer: $document_store
search_topk: 6
prompt_template: "Given these documents: {context}, please answer the question: {query}"
Pathway provides an advanced RAG technique called Adaptive RAG that lowers the cost of queries. Adaptive RAG first asks the LLM with a small number of context documents. If the LLM refuses to answer, the question is asked again with a larger number of context documents. You can define such a pipeline using AdaptiveRAGQuestionAnswerer. Here are its main parameters:
- llm: an LLM Chat which defines the LLM model used.
- indexer: a Document Store in which the data is indexed, used to retrieve the documents to answer the queries.
- n_starting_documents: the number of documents retrieved for the first try.
- factor: multiplicative factor for the number of retrieved documents after each failed try.
- max_iterations: maximum number of tries before stopping.
- prompt_template (optional): the prompt to use for querying. The prompt must be a string with {query} used as a placeholder for the question and {context} as a placeholder for the context documents.
question_answerer: !pw.xpacks.llm.question_answering.AdaptiveRAGQuestionAnswerer
llm: $llm
indexer: $document_store
n_starting_documents: 2
factor: 2
max_iterations: 4
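With the values above, the first try uses 2 context documents, each failed try multiplies that number by the factor, and at most 4 tries are made, so the question is asked with 2, 4, 8, and finally 16 documents.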
Pathway provides wrappers for the different LLM providers. Those wrappers, called "chats", are used to configure and call the different LLM APIs. You can learn more about them in the dedicated article.
For models from Hugging Face that you want to run locally, Pathway provides a separate wrapper called HFPipelineChat, which creates a Hugging Face pipeline. It accepts any argument of the pipeline, including the name of the model.
llm: !pw.xpacks.llm.llms.HFPipelineChat
model: "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
The Pathway wrapper for LiteLLM Chat services, LiteLLMChat, allows you to use many popular LLMs such as Gemini (see the example after the parameter list below). LiteLLM is compatible with several providers, and each provider takes its own parameters: see the LiteLLM documentation to know which parameters you should use. Here is a small sample of the main parameters of LiteLLMChat:
- model: ID of the model to use. Check the providers.
- retry_strategy: Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - ExponentialBackoffRetryStrategy: Retry strategy with exponential backoff with jitter and maximum retries.
  - FixedDelayRetryStrategy: Retry strategy with fixed delay and maximum retries.
- cache_strategy: Defines the caching mechanism.
  - DefaultCache: The default caching strategy. Persistence layer will be used if enabled. Otherwise, cache will be disabled.
  - InMemoryCache: In-memory LRU cache. It is not persisted between runs.
- capacity: Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
- temperature: What sampling temperature to use, between 0 and 2.
- api_base: API endpoint to be used for the call.
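LiteLLM routes requests based on the model string. As an illustration, and assuming LiteLLM's provider/model naming for Gemini (e.g. gemini/gemini-1.5-flash; check the LiteLLM documentation for the exact identifier and the required API key environment variable), a Gemini setup could look like this:
$llm: !pw.xpacks.llm.llms.LiteLLMChat
  model: "gemini/gemini-1.5-flash" # assumed LiteLLM model identifier, verify against the LiteLLM docs
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DefaultCache
  temperature: 0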
You can use a local Mistral model with the following setup:
$llm: !pw.xpacks.llm.llms.LiteLLMChat
model: "ollama/mistral"
retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
max_retries: 6
cache_strategy: !pw.udfs.DefaultCache
temperature: 0
api_base: "http://localhost:11434"
You can learn more about this template by visiting its associated public GitHub project.
The Pathway wrapper for OpenAI Chat services is called OpenAIChat. OpenAIChat allows deep customization thanks to its large number of parameters. Here are the main ones:
- model: ID of the model, see the available OpenAI models.
- retry_strategy: Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - ExponentialBackoffRetryStrategy: Retry strategy with exponential backoff with jitter and maximum retries.
  - FixedDelayRetryStrategy: Retry strategy with fixed delay and maximum retries.
- cache_strategy: Defines the caching mechanism.
  - DefaultCache: The default caching strategy. Persistence layer will be used if enabled. Otherwise, cache will be disabled.
  - InMemoryCache: In-memory LRU cache. It is not persisted between runs.
- capacity: Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
- temperature: What sampling temperature to use, between 0 and 2.
$llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
max_retries: 6
cache_strategy: !pw.udfs.DefaultCache
temperature: 0
capacity: 8
Pathway DocumentStore builds a document indexing pipeline for processing documents and querying the closest documents to a query according to a specified index. It will take care of the parsing, splitting, and indexing of the documents. It can be configured using the following parameters:
- docs: A list of Pathway tables which contain source documents. The table must contain a data column of type bytes. See our explanations and our examples about how to define the data sources in the YAML file.
- parser: The parser used to preprocess the documents, see the parser section.
- splitter: The splitter used to preprocess the documents, see the splitter section.
- retriever_factory: The index used, see the index section.
$document_store: !pw.xpacks.llm.document_store.DocumentStore
docs: $sources
parser: $parser
splitter: $splitter
retriever_factory: $retriever_factory
Pathway provides several indices for the DocumentStore.
To use a Brute Force index, you need to use the BruteForceKnnFactory. It has two main parameters: embedder and metric.
The metric parameter will determine how the similarity is computed between a query and a document:
- L2 distance: !pw.stdlib.indexing.BruteForceKnnMetricKind.L2SQ
- Cosine distance: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
reserved_space: 1000 # Reserved space for the index
embedder: $embedder
metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
To use the Tantivy BM25 index, use the TantivyBM25Factory. It has only two parameters:
- ram_budget: The maximum capacity in bytes.
- in_memory_index: whether the whole index is stored in RAM or in Pathway's disk storage.
$retriever_factory: !pw.stdlib.indexing.TantivyBM25Factory
ram_budget: 1073741824 # 1GB
in_memory_index: True
To use the USearch index, use the UsearchKnnFactory.
It has two main parameters: embedder and metric. The metric parameter will determine how the similarity is computed between a query and a document:
- Inner product: !pw.stdlib.indexing.USearchMetricKind.IP
- L2 distance: !pw.stdlib.indexing.USearchMetricKind.L2SQ
- Cosine distance: !pw.stdlib.indexing.USearchMetricKind.COS
- Pearson: !pw.stdlib.indexing.USearchMetricKind.PEARSON
- Haversine: !pw.stdlib.indexing.USearchMetricKind.HAVERSINE
- Divergence: !pw.stdlib.indexing.USearchMetricKind.DIVERGENCE
- Hamming: !pw.stdlib.indexing.USearchMetricKind.HAMMING
- Tanimoto: !pw.stdlib.indexing.USearchMetricKind.TANIMOTO
- Sorensen: !pw.stdlib.indexing.USearchMetricKind.SORENSEN
$retriever_factory: !pw.stdlib.indexing.UsearchKnnFactory
  embedder: $embedder
  metric: !pw.stdlib.indexing.USearchMetricKind.HAMMING
To use Pathway Hybrid Index, use the HybridIndexFactory. It takes two parameters:
- retriever_factories: The list of indices that will be used.
- k: A constant used for calculating the ranking score.
$retriever_factories:
- !pw.stdlib.indexing.BruteForceKnnFactory
reserved_space: 1000
embedder: $embedder
metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
- !pw.stdlib.indexing.TantivyBM25Factory
ram_budget: 1073741824
in_memory_index: True
$retriever_factory: !pw.stdlib.indexing.HybridIndexFactory
retriever_factories: $retriever_factories
When storing a document in a vector store, you compute the embedding vector for the text and store the vector with a reference to the original document. You can then compute the embedding of a query and find the embedded documents closest to the query. You can learn more about embedders in the dedicated article.
The default model for OpenAIEmbedder is text-embedding-3-small.
$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
model: "text-embedding-3-small"
The model for LiteLLMEmbedder has to be specified during initialization. No default is provided.
$embedder: !pw.xpacks.llm.embedders.LiteLLMEmbedder
model: "text-embedding-3-small"
The SentenceTransformerEmbedder embedder allows you to use models from the Hugging Face Sentence Transformers library. The model is specified during initialization. Here is a list of available models.
$embedder: !pw.xpacks.llm.embedders.SentenceTransformerEmbedder
model: "intfloat/e5-large-v2"
GeminiEmbedder is the embedder for Google's Gemini Embedding Services. Available models can be found here.
$embedder: !pw.xpacks.llm.embedders.GeminiEmbedder
model: "models/text-embedding-004"
Parsers play a crucial role in the Retrieval-Augmented Generation (RAG) pipeline by transforming raw, unstructured data into structured formats that can be effectively indexed, retrieved, and processed by language models. In a RAG system, data often comes from diverse sources such as documents, web pages, APIs, and databases, each with its own structure and format. Parsers help extract relevant content, normalize it into a consistent structure, and enhance the retrieval process by making information more accessible and usable. You can learn more about them in the dedicated article.
Utf8Parser is a simple parser designed to decode text encoded in UTF-8. It ensures that raw byte-encoded content is converted into a readable string format for further processing in a RAG pipeline.
$parser: !pw.xpacks.llm.parsers.Utf8Parser
UnstructuredParser leverages the parsing capabilities of Unstructured. It supports various document types, including PDFs, HTML, Word documents, and more.
It supports several chunking modes.
Basic
Splits text into chunks shorter than the specified max_characters length (set via the chunking_kwargs argument). It also supports a soft threshold for chunk length using new_after_n_chars.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "basic"
chunking_kwargs:
max_characters: 3000 # hard limit on number of characters in each chunk
new_after_n_chars: 2000 # soft limit on number of characters in each chunk
By Title
Similar to basic chunking but with additional constraints to split chunks at section or page breaks, resulting in more structured chunks.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "by_title"
chunking_kwargs:
max_characters: 3000 # hard limit on number of characters in each chunk
new_after_n_chars: 2000 # soft limit on number of characters in each chunk
Elements
Breaks down a document into homogeneous Unstructured elements such as Title, NarrativeText, Footer, ListItem, etc.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "elements"
Paged
Collects all elements found on a single page into one chunk. Useful for documents where content is well-separated across pages.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "paged"
Single
Aggregates all Unstructured elements into a single large chunk. Use this mode when applying other chunking strategies available in Pathway or when using a custom chunking approach.
$parser: !pw.xpacks.llm.parsers.UnstructuredParser
chunking_mode: "single"
DoclingParser is a PDF parser that utilizes the docling library to extract structured content from PDFs.
$parser: !pw.xpacks.llm.parsers.DoclingParser
Table parsing
There are two main approaches for parsing tables: (1) using the Docling engine or (2) parsing with a multimodal LLM. The first runs Docling OCR on top of the table in the PDF and transforms it into Markdown format. The second transforms the table into an image, sends it to a multimodal LLM, and asks it to parse the table. As of now, only LLMs with the same API interface as OpenAI are supported.
To choose between these two approaches, set table_parsing_strategy to either llm or docling. If you don't want to parse tables, simply set this argument to None.
Using Docling OCR:
$parser: !pw.xpacks.llm.parsers.DoclingParser
table_parsing_strategy: 'docling'
Using a multimodal LLM:
$multimodal_llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
$parser: !pw.xpacks.llm.parsers.DoclingParser
table_parsing_strategy: 'llm'
multimodal_llm: $multimodal_llm
Image parsing
You can parse images with image_parsing_strategy: "llm": the parser detects images within the document, processes them with a multimodal LLM (such as OpenAI's GPT-4o), and embeds their descriptions in the Markdown output. If disabled, images are replaced with placeholders.
$multimodal_llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
$parser: !pw.xpacks.llm.parsers.DoclingParser
image_parsing_strategy: "llm"
multimodal_llm: $multimodal_llm
pdf_pipeline_options:
do_formula_enrichment: True
image_scale: 1.5
See PdfPipelineOptions for a reference of the possible configuration options, like OCR options, picture classification, code OCR, scientific formula enrichment, etc.
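For example, assuming the docling option names do_code_enrichment and do_picture_classification (fields of docling's PdfPipelineOptions; verify them against the docling documentation for your version), these enrichments could be enabled like this:
$parser: !pw.xpacks.llm.parsers.DoclingParser
  pdf_pipeline_options:
    do_code_enrichment: True # assumed docling option enabling code OCR
    do_picture_classification: True # assumed docling option enabling picture classification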
PypdfParser is a lightweight PDF parser that utilizes the pypdf library to extract text from PDF documents.
$parser: !pw.xpacks.llm.parsers.PypdfParser
The ImageParser parser can be used to transform an image (e.g. in .png or .jpg format) into a textual description made by a multimodal LLM. On top of that, it can be used to extract structured information from the image via a predefined schema. It requires a multimodal LLM and a prompt:
$multimodal_llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
$prompt: "Please provide a description of the image."
$image_schema: !pw.schema_from_types
breed: str
surrounding: str
colors: str
$parser: !pw.xpacks.llm.parsers.ImageParser
llm: $multimodal_llm
parse_prompt: $prompt
detail_parse_schema: $image_schema
SlideParser is a powerful parser designed to extract information from PowerPoint (PPTX) and PDF slide decks using vision-based LLMs. It converts slides into images before processing them with a vision LLM that tries to describe the content of each slide. As with ImageParser, you can also extract information specified in a Pydantic schema.
$multimodal_llm: !pw.xpacks.llm.llms.OpenAIChat
model: "gpt-4o-mini"
$prompt: "Please provide a description of the image."
$image_schema: !pw.schema_from_types
breed: str
surrounding: str
colors: str
$parser: !pw.xpacks.llm.parsers.SlideParser
llm: $multimodal_llm
parse_prompt: $prompt
detail_parse_schema: $image_schema
Chunking helps manage and process large documents efficiently by breaking them into smaller, more manageable pieces, improving retrieval accuracy and generation. You can learn more about the splitters in the dedicated article.
Pathway offers a TokenCountSplitter for token-based chunking. The list of encodings is available in OpenAI's tiktoken guide.
$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  min_tokens: 100
  max_tokens: 500
  encoding_name: "cl100k_base"
RecursiveSplitter measures chunk length based on the number of tokens required to encode the text and processes a document by iterating through a list of ordered separators (configurable in the constructor), starting with the most granular and moving to the least. Its main parameters are:
- chunk_size: maximum size of a chunk in characters/tokens.
- chunk_overlap: number of characters/tokens to overlap between chunks.
- separators: list of strings to split the text on.
- encoding_name: name of the encoding from tiktoken. For the list of available encodings, please refer to the tiktoken documentation.
- model_name: name of the model from tiktoken. See the link above for more details.
$splitter: !pw.xpacks.llm.splitters.RecursiveSplitter
  chunk_size: 400
  chunk_overlap: 200
  separators:
    - "\n#"
    - "\n##"
    - "\n\n"
    - "\n"
  model_name: "gpt-4o-mini"
By default, DoclingParser also chunks the document. You can turn this off simply with chunk: False.
$parser: !pw.xpacks.llm.parsers.DoclingParser
chunk: False