Introduction to the LLM xpack
The LLM xpack provides you with all the tools you need to use Large Language Models (LLMs) in Pathway. Wrappers for the most common LLM services and utilities are included, making working with LLMs as easy as possible.
You can find ready-to-run LLM and RAG examples on our App Templates page.
To install Pathway along with the LLM xpack functionality, simply run:
`pip install "pathway[xpack-llm]"`
or
`pip install "pathway[all]"`
Wrappers for LLMs (LLM Chats)
Out of the box, the LLM xpack provides wrappers for text generation and embedding LLMs. For text generation, you can use native wrappers for OpenAI and HuggingFace models, Cohere, and LiteLLM (which lets you use many other popular models, including Azure OpenAI, HuggingFace when using their API, or Gemini).
To use a wrapper, first create an instance of it, which you can then apply to a column containing prompts. You can find more information about the other wrappers and how to use them on the LLM Chats page.
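For instance, here is a minimal sketch of applying the `OpenAIChat` wrapper to a column of prompts. It assumes an OpenAI API key is available in the `OPENAI_API_KEY` environment variable and that the questions come from a hypothetical prompts.csv file with a single prompt column; the model name is just an example.
import pathway as pw
from pathway.xpacks.llm.llms import OpenAIChat, prompt_chat_single_qa

# Configure the chat wrapper once; it is a UDF that can be applied to a column.
chat = OpenAIChat(model="gpt-4o-mini", temperature=0.0)

# Hypothetical input: a CSV file with a single "prompt" column.
class PromptSchema(pw.Schema):
    prompt: str

queries = pw.io.csv.read("prompts.csv", schema=PromptSchema, mode="static")

# prompt_chat_single_qa wraps a plain string into the chat message format expected by the wrapper.
responses = queries.select(result=chat(prompt_chat_single_qa(pw.this.prompt)))

pw.run()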
Creating a Pathway LLM pipeline
You can now combine these wrappers to create an LLM pipeline using Pathway. To learn how to do this, read our tutorial.
Preparing documents for LLMs
The Pathway xpack for LLMs provides tools for preparing your documents and texts in order to use them with LLMs. You can use one of our UDFs, like `UnstructuredParser` for parsing your documents into texts and `TokenCountSplitter` for dividing texts into smaller chunks.
Parsing documents
Use the `UnstructuredParser` class to parse documents in Pathway. Under the hood, it utilizes the Unstructured library to parse your documents. To use it, you need to read the contents of a file into a Pathway Table using any connector of your choice. Then, apply an instance of the `UnstructuredParser` class to get a Pathway Table with the parsed content of the documents. `UnstructuredParser` has an argument `chunking_mode` which takes one of three values: `single`, `paged` or `elements`. If set to `single`, the whole document is returned as one string; if set to `paged`, there is a string for each page in the document; and if set to `elements`, Unstructured's division into elements is preserved. The `chunking_mode` argument can be set either during initialization or when calling `UnstructuredParser`.
import os

import pathway as pw
from pathway.xpacks.llm.parsers import UnstructuredParser

# Read raw file contents from the directory given by the DATA_DIR environment variable.
files = pw.io.fs.read(
    os.environ.get("DATA_DIR"),
    mode="streaming",
    format="binary",
    autocommit_duration_ms=50,
)

# Parse each file; with chunking_mode="elements", Unstructured's division into elements is preserved.
parser = UnstructuredParser(chunking_mode="elements")
documents = files.select(elements=parser(pw.this.data))
For each document, `UnstructuredParser` returns a list of tuples with the parsed text and the associated metadata returned from Unstructured. If you want each text string in a separate row of the table, use the `flatten` function.
documents = documents.flatten(pw.this.elements) # flatten list into multiple rows
documents = documents.select(text=pw.this.elements[0], metadata=pw.this.elements[1]) # extract text and metadata from tuple
Splitting texts
Once you have some texts in a Pathway Table, you can use the `TokenCountSplitter` class to divide them into smaller chunks. It tries to split the text in such a way that each chunk has between `min_tokens` and `max_tokens` tokens, but without cutting sentences in half.
`TokenCountSplitter` has three parameters - `min_tokens`, `max_tokens` and `encoding_name` - and each of them can be overridden during the call of the function. `min_tokens` and `max_tokens`, as mentioned above, set the minimum and maximum length of each chunk, whereas `encoding_name` is the name of the tiktoken encoding to be used.
from pathway.xpacks.llm.splitters import TokenCountSplitter

splitter = TokenCountSplitter(min_tokens=100, max_tokens=300, encoding_name="cl100k_base")
texts = documents.select(chunk=splitter(pw.this.text))
`TokenCountSplitter` returns data in the same format as `UnstructuredParser` - that is, for each row it returns a list of tuples, where each tuple consists of a string with the text of a chunk and a dictionary with the associated metadata.
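As before, you can flatten this list so that each chunk ends up in its own row. This is a short sketch that mirrors the earlier flatten example; the column names are illustrative.
texts = texts.flatten(pw.this.chunk) # flatten list of chunks into multiple rows
texts = texts.select(chunk_text=pw.this.chunk[0], chunk_metadata=pw.this.chunk[1]) # extract text and metadata from tuple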
Ready-to-use Document Store
With these tools it is easy to create a pipeline in Pathway that serves as a `DocumentStore`, which automatically indexes documents and gets updated when new data arrives. To make interaction with the `DocumentStore` easier, you can also use the `DocumentStoreServer`, which handles the API calls.
You can learn more about the Document Store in Pathway in a dedicated tutorial and check out a QA app example in the llm-app repository.
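The snippet below is a minimal sketch of wiring the parser and splitter defined earlier into a `DocumentStore` and exposing it over HTTP with a `DocumentStoreServer`. The retriever factory, embedder, host and port are illustrative choices, and the exact constructor arguments may differ slightly between Pathway versions.
from pathway.stdlib.indexing import BruteForceKnnFactory
from pathway.xpacks.llm.document_store import DocumentStore
from pathway.xpacks.llm.embedders import OpenAIEmbedder
from pathway.xpacks.llm.servers import DocumentStoreServer

# Embedder used by the index; assumes OPENAI_API_KEY is set in the environment.
embedder = OpenAIEmbedder(model="text-embedding-3-small")

# The store parses, splits, embeds and indexes the documents read earlier.
store = DocumentStore(
    docs=files,
    retriever_factory=BruteForceKnnFactory(embedder=embedder),
    parser=parser,
    splitter=splitter,
)

# Expose retrieval over a REST API.
server = DocumentStoreServer(host="127.0.0.1", port=8000, document_store=store)
server.run()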
Integrating with LlamaIndex and LangChain
The Vector Store offers integrations with both LlamaIndex and LangChain. These allow you to incorporate the Vector Store Client in your LlamaIndex and LangChain pipelines, or to use LlamaIndex and LangChain components in the Vector Store. Read more about the integrations in the articles on LlamaIndex and on LangChain.
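For example, on the LangChain side you can query a running Pathway server with the `PathwayVectorClient` from the langchain-community package; the host and port below are assumed to match the server started above.
from langchain_community.vectorstores import PathwayVectorClient

# Connect to a running Pathway vector/document server (host and port are assumptions).
client = PathwayVectorClient(host="127.0.0.1", port=8000)
docs = client.similarity_search("What is Pathway?")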
Rerankers
Rerankers evaluate the relevance of documents to a given query, commonly used in a two-stage retrieval process. Initially, a vector store retrieves a broad set of documents, typically more than needed for query context, many of which may be irrelevant. This occurs because indexing flattens a document's entire meaning into a single vector, which can reduce accuracy. Rerankers refine this by reassessing each document’s relevance. Running a reranker is more expensive than running an index-based retrieval, but it usually provides a significant improvement in accuracy.
Pathway offers three rerankers:
- `LLMReranker` asks an LLM chat of your choice to rank the relevance of a document against a query on a scale from 1 to 5,
- `CrossEncoderReranker` is a wrapper on CrossEncoder from the sentence_transformers library,
- `EncoderReranker` uses embeddings from the sentence_transformers library to calculate the relevance of a document against a query.

Additionally, once you have ranked the documents, you can use `rerank_topk_filter` to choose the `k` best documents.
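As an illustration, here is a sketch of reranking retrieved documents with `LLMReranker` and keeping only the best ones with `rerank_topk_filter`. It assumes a table docs with doc and query columns and a chat wrapper configured as above; the column names, the grouping step and k=3 are illustrative.
from pathway.xpacks.llm.rerankers import LLMReranker, rerank_topk_filter

# Score each candidate document against its query with an LLM (scale from 1 to 5).
reranker = LLMReranker(llm=chat)
scored = docs.select(
    query=pw.this.query,
    doc=pw.this.doc,
    score=reranker(pw.this.doc, pw.this.query),
)

# Gather the documents and scores for each query into lists.
grouped = scored.groupby(pw.this.query).reduce(
    query=pw.this.query,
    docs=pw.reducers.tuple(pw.this.doc),
    scores=pw.reducers.tuple(pw.this.score),
)

# Keep only the k best documents for each query.
best = grouped.select(
    query=pw.this.query,
    docs_and_scores=rerank_topk_filter(pw.this.docs, pw.this.scores, k=3),
)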
Modular RAGs
To combine all the pieces into a RAG (Retrieval Augmented Generation) pipeline, you can use one of the modular RAGs available in the LLM xpack. `BaseRAGQuestionAnswerer` is a standard RAG that, given a query, obtains the `k` best documents from the vector store and sends them along with the question to the LLM chat. `AdaptiveRAGQuestionAnswerer` tries to limit the number of documents sent to the LLM chat in order to save tokens - it initially sends only a small number of documents to the chat, and this number is increased until an answer is found. To read more about why this can save tokens without sacrificing accuracy, check our showcase on Adaptive RAG.
Both of these RAGs are customizable with the LLM model used to answer questions, the vector store for retrieving documents, and templates for embedding context chunks in the question.
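For instance, a minimal sketch of a question-answering pipeline built on the document store and chat wrapper from above might look as follows; search_topk, the host and the port are illustrative values, and the server helper methods may vary slightly between versions.
from pathway.xpacks.llm.question_answering import BaseRAGQuestionAnswerer

# Standard RAG: retrieve the best documents and send them, with the question, to the chat.
rag = BaseRAGQuestionAnswerer(llm=chat, indexer=store, search_topk=6)

# Expose the question-answering endpoints over REST and start the pipeline.
rag.build_server(host="127.0.0.1", port=8080)
rag.run_server()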
Join our Discord community and dive into discussions on tricks and tips for mastering Retrieval Augmented Generation.