Introduction to the LLM xpack

The LLM xpack provides all the tools you need to use Large Language Models in Pathway. It includes wrappers for the most common LLM services, along with utilities that make working with LLMs as easy as possible.

You can find ready-to-run LLM and RAG examples on our App Templates page.

To install Pathway along with the LLM xpack functionality, run:

`pip install "pathway[xpack-llm]"`

Alternatively, to install all optional dependencies, including the LLM xpack, run:

`pip install "pathway[all]"`

Wrappers for LLMs (LLM Chats)

Out of the box, the LLM xpack provides wrappers for text generation and embedding LLMs. For text generation, you can use native wrappers for OpenAI, HuggingFace models, Cohere and LiteLLM. LiteLLM enables you to use many other popular models, including Azure OpenAI, HuggingFace models accessed through their API, and Gemini.

To use a wrapper, first create an instance of it, then apply it to a column containing prompts. You can find more information about the other wrappers and how to use them on the LLM Chats page.

Creating a Pathway LLM pipeline

You can now combine these wrappers to create an LLM pipeline using Pathway. To learn how to do this, read our tutorial.

Preparing documents for LLMs

The Pathway xpack for LLMs provides tools for preparing your documents and texts in order to use them with LLMs. You can use one of our UDFs like UnstructuredParser for parsing your documents into texts and TokenCountSplitter for dividing texts into smaller chunks.

Parsing documents

Use the UnstructuredParser class to parse documents in Pathway. Under the hood, it uses the Unstructured library to parse your documents. To use it, read the contents of your files into a Pathway Table using any connector of your choice. Then apply an instance of UnstructuredParser to get a Pathway Table with the parsed content of the documents.

UnstructuredParser has a chunking_mode argument that controls how the parsed output is divided. With single, the whole document is returned as one string. With paged, there is one string for each page of the document. With elements, Unstructured's division into elements is preserved. The mode can be set either during initialization or execution of UnstructuredParser.

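```python
import os

import pathway as pw
from pathway.xpacks.llm.parsers import UnstructuredParser

# Read raw file contents from the directory given by the DATA_DIR variable.
# In streaming mode, new or modified files are picked up automatically.
files = pw.io.fs.read(
    os.environ.get("DATA_DIR"),
    mode="streaming",
    format="binary",  # keep raw bytes in the `data` column for the parser
    autocommit_duration_ms=50,
)

# Preserve Unstructured's division of each document into elements
parser = UnstructuredParser(chunking_mode="elements")
documents = files.select(elements=parser(pw.this.data))
```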
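The following plain-Python toy model shows how the same document would be grouped under each of the three modes. It is not Pathway or Unstructured code. The `parse_toy` function, its list-of-pages input and the `page_number` metadata key are all hypothetical. The real parser works on raw file bytes and returns whatever metadata Unstructured produces.

```python
def parse_toy(pages: list[list[str]], mode: str) -> list[tuple[str, dict]]:
    """Hypothetical model of the single / paged / elements modes.

    `pages` represents a document as a list of pages,
    where each page is a list of element texts.
    """
    if mode == "single":
        # The whole document as one string
        text = "\n\n".join(element for page in pages for element in page)
        return [(text, {})]
    if mode == "paged":
        # One string per page
        return [
            ("\n\n".join(page), {"page_number": number})
            for number, page in enumerate(pages, start=1)
        ]
    if mode == "elements":
        # One entry per element, keeping element boundaries
        return [
            (element, {"page_number": number})
            for number, page in enumerate(pages, start=1)
            for element in page
        ]
    raise ValueError(f"unknown mode: {mode!r}")


document = [["Introduction", "Pathway processes data streams."], ["Conclusion"]]
for mode in ("single", "paged", "elements"):
    print(mode, len(parse_toy(document, mode)))
```

In this toy, each mode returns a list of (text, metadata) tuples. The modes differ only in how much text goes into each tuple.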

For each document, UnstructuredParser returns a list of tuples containing the parsed text and the associated metadata returned by Unstructured. To put each piece of text in its own row of the table, use the flatten method.

```python
# Flatten the list column: one row per (text, metadata) tuple
documents = documents.flatten(pw.this.elements)
# Unpack each tuple into separate text and metadata columns
documents = documents.select(text=pw.this.elements[0], metadata=pw.this.elements[1])
```
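The following plain-Python sketch mimics what these two calls do to the data. It uses a list of dictionaries with made-up sample rows in place of a table. It illustrates the semantics only: in Pathway, flatten and select operate on Table objects, not on Python lists.

```python
# Hypothetical rows: each "elements" value is a list of (text, metadata) tuples
rows = [
    {"elements": [("Introduction", {"page_number": 1}),
                  ("Pathway processes data streams.", {"page_number": 1})]},
    {"elements": [("Conclusion", {"page_number": 2})]},
]

# Analogue of flatten(pw.this.elements): one output row per list item
flattened = [{"elements": item} for row in rows for item in row["elements"]]

# Analogue of select(text=..., metadata=...): unpack each tuple into two columns
documents = [
    {"text": row["elements"][0], "metadata": row["elements"][1]}
    for row in flattened
]

for doc in documents:
    print(doc["text"], doc["metadata"])
```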

Splitting texts

Once you have some texts in a Pathway Table, you can use the TokenCountSplitter class to divide them into smaller chunks. It tries to split the text so that each part has between min_tokens and max_tokens tokens, without cutting sentences in half.

TokenCountSplitter has three parameters: min_tokens, max_tokens and encoding_name. Each of them can be overridden when calling the splitter. As mentioned above, min_tokens and max_tokens set the minimum and maximum length of each chunk. encoding_name is the name of the tiktoken encoding used to count tokens.

```python
from pathway.xpacks.llm.splitters import TokenCountSplitter

# Aim for chunks of 100 to 300 tokens, counted with tiktoken's cl100k_base encoding
splitter = TokenCountSplitter(min_tokens=100, max_tokens=300, encoding_name="cl100k_base")
texts = documents.select(chunk=splitter(pw.this.text))
```

TokenCountSplitter returns data in the same format as UnstructuredParser. For each row, it returns a list of tuples, where each tuple consists of a string with the text of a chunk and a dictionary with associated metadata.
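The splitting strategy described above can be sketched as a greedy loop over sentences. This is not TokenCountSplitter's implementation, and the `split_by_tokens` name is hypothetical. The sketch detects sentences with a simple punctuation regex and approximates tokens with whitespace-separated words instead of a tiktoken encoding. Like TokenCountSplitter, it returns a list of (text, metadata) tuples.

```python
import re


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): whitespace-separated words, not tiktoken
    return len(text.split())


def split_by_tokens(text: str, min_tokens: int, max_tokens: int) -> list[tuple[str, dict]]:
    # Split after '.', '!' or '?' followed by whitespace, so sentences stay whole
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[tuple[str, dict]] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the chunk if this sentence would overflow it
        # and the chunk has already reached the minimum length
        if current and current_len + n > max_tokens and current_len >= min_tokens:
            chunks.append((" ".join(current), {}))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append((" ".join(current), {}))  # whatever remains becomes the last chunk
    return chunks


for chunk, metadata in split_by_tokens(
    "One two three. Four five six. Seven eight nine. Ten.", min_tokens=3, max_tokens=6
):
    print(repr(chunk), metadata)
```

A chunk only closes once it has reached min_tokens. As a result, a single very long sentence can still exceed max_tokens, and the final chunk may be shorter than min_tokens.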

Ready-to-use Document Store

With these tools, you can easily build a Pathway pipeline that serves as a DocumentStore. It automatically indexes documents and updates the index as new data arrives.

To make interacting with a DocumentStore easier, you can also use DocumentStoreServer, which handles API calls.
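The small in-memory class below illustrates the idea of an index that stays up to date as documents arrive. `ToyDocumentStore` is hypothetical and far simpler than Pathway's DocumentStore. It scores documents by keyword overlap rather than with embeddings. It also updates its index on explicit `upsert` calls instead of reacting to a streaming connector.

```python
from collections import defaultdict


class ToyDocumentStore:
    """Hypothetical store that keeps an inverted index in sync with its documents."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}
        self.index: defaultdict[str, set[str]] = defaultdict(set)

    def upsert(self, doc_id: str, text: str) -> None:
        # If the document is being replaced, remove its stale index entries first
        if doc_id in self.docs:
            for term in set(self.docs[doc_id].lower().split()):
                self.index[term].discard(doc_id)
        self.docs[doc_id] = text
        for term in set(text.lower().split()):
            self.index[term].add(doc_id)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Score each document by the number of distinct query terms it contains
        scores: defaultdict[str, int] = defaultdict(int)
        for term in set(query.lower().split()):
            for doc_id in self.index.get(term, ()):
                scores[doc_id] += 1
        # Highest score first; ties broken by document id for determinism
        return sorted(scores, key=lambda d: (-scores[d], d))[:k]


store = ToyDocumentStore()
store.upsert("a", "Pathway indexes documents")
store.upsert("b", "Rerankers refine retrieval")
print(store.retrieve("indexes documents"))
store.upsert("b", "Pathway documents update live")  # new version replaces the old one
print(store.retrieve("documents"))
```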

You can learn more about Document Store in Pathway in a dedicated tutorial and check out a QA app example in the llm-app repository.

Integrating with LlamaIndex and LangChain

Vector Store offers integrations with both LlamaIndex and LangChain. These allow you to incorporate the Vector Store Client in your LlamaIndex and LangChain pipelines, or to use LlamaIndex and LangChain components in the Vector Store. Read more about the integrations in the articles on LlamaIndex and LangChain.

Rerankers

Rerankers evaluate the relevance of documents to a given query and are commonly used in a two-stage retrieval process. First, a vector store retrieves a broad set of documents, typically more than needed for the query context, and many of them may be irrelevant. This happens because indexing compresses a document's entire meaning into a single vector, which can reduce accuracy. Rerankers refine this first pass by reassessing each document's relevance. Running a reranker is more expensive than index-based retrieval, but it usually improves accuracy significantly.

Pathway offers three rerankers:

LLMReranker asks an LLM chat of your choice to rank the relevance of a document against a query on a scale from 1 to 5,
CrossEncoderReranker is a wrapper on CrossEncoder from the sentence_transformers library,
EncoderReranker uses embeddings from the sentence_transformers library to calculate the relevance of a document against a query.
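As a rough illustration of embedding-based relevance scoring, the sketch below scores each document by the cosine similarity between a query vector and the document's vector. Cosine similarity is a common choice, but it is an assumption here, not a description of EncoderReranker's internals. The tiny hand-written vectors stand in for embeddings that would normally come from a sentence_transformers model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def relevance_scores(query_vec: list[float], doc_vecs: list[list[float]]) -> list[float]:
    # One score per document, in the original document order
    return [cosine_similarity(query_vec, doc_vec) for doc_vec in doc_vecs]


# Hypothetical 2-dimensional "embeddings"
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(relevance_scores(query, docs))
```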

Additionally, once you have ranked the documents, you can use rerank_topk_filter to keep only the k best ones.
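The filtering step amounts to sorting documents by score and keeping the first k. The `topk_filter` function below is a hypothetical plain-Python sketch of that step, not rerank_topk_filter itself. It returns the kept documents together with their scores so that the two lists stay aligned.

```python
def topk_filter(docs: list[str], scores: list[float], k: int = 5) -> tuple[list[str], list[float]]:
    """Keep the k highest-scoring documents, best first (hypothetical helper)."""
    # Sort (document, score) pairs by score, highest first; ties keep input order
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)[:k]
    return [doc for doc, _ in ranked], [score for _, score in ranked]


# Scores on a 1-to-5 scale, as an LLM-based reranker might produce
kept_docs, kept_scores = topk_filter(["doc A", "doc B", "doc C"], [2.0, 5.0, 4.0], k=2)
print(kept_docs, kept_scores)
```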

Modular RAGs

To combine all the pieces into a RAG (Retrieval Augmented Generation) pipeline, you can use one of the modular RAGs available in the LLM xpack. BaseRAGQuestionAnswerer is a standard RAG. Given a query, it retrieves the k best documents from the vector store and sends them, along with the question, to the LLM chat. AdaptiveRAGQuestionAnswerer tries to limit the number of documents sent to the LLM chat to save tokens. To do that, it initially sends only a small number of documents to the chat and increases that number until an answer is found. To read more about why this can save tokens without sacrificing accuracy, check our showcase on Adaptive RAG.
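The adaptive strategy can be sketched as a loop that retries with a geometrically growing number of documents. Several names in this sketch are assumptions for illustration: `adaptive_answer`, its parameters (`n_starting_documents`, `factor`, `max_iterations`) and the convention that `ask_llm` returns None when the context is insufficient. AdaptiveRAGQuestionAnswerer's actual interface may differ. The documents are assumed to be already sorted by relevance.

```python
from typing import Callable, Optional


def adaptive_answer(
    question: str,
    documents: list[str],
    ask_llm: Callable[[str, list[str]], Optional[str]],
    n_starting_documents: int = 2,
    factor: int = 2,
    max_iterations: int = 4,
) -> Optional[str]:
    """Ask with few documents first; enlarge the context only when needed."""
    n = n_starting_documents
    for _ in range(max_iterations):
        answer = ask_llm(question, documents[:n])  # the n most relevant documents
        if answer is not None:
            return answer
        if n >= len(documents):
            break  # every document has already been sent
        n *= factor  # grow the context geometrically
    return None


# Hypothetical stand-in for an LLM call: answers only if "doc3" is in the context
calls: list[int] = []


def fake_llm(question: str, context: list[str]) -> Optional[str]:
    calls.append(len(context))  # record how many documents each call received
    return "found it" if "doc3" in context else None


docs = ["doc1", "doc2", "doc3", "doc4", "doc5"]
print(adaptive_answer("Where is it?", docs, fake_llm), calls)
```

In this run, the first call sends two documents and fails. The second call sends four and succeeds, so the fifth document is never sent.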

Both RAGs are customizable. You can choose the LLM used to answer questions, the vector store used to retrieve documents, and the templates used to embed context chunks in the question.
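The minimal prompt builder below shows what a template for embedding context chunks does. The template wording, the `{context}` and `{query}` placeholders and the `build_prompt` function are all hypothetical. The templates that ship with the LLM xpack may be worded and structured differently.

```python
DEFAULT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {query}\n"
    "Answer:"
)


def build_prompt(query: str, chunks: list[str], template: str = DEFAULT_TEMPLATE) -> str:
    # Render each retrieved chunk as its own bullet line, then fill the template
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return template.format(context=context, query=query)


print(build_prompt(
    "What does Pathway do?",
    ["Pathway processes data streams.", "It supports LLM pipelines."],
))
```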

Discuss tricks & tips for RAG

Join our Discord community and dive into discussions on tricks and tips for mastering Retrieval Augmented Generation.