Customizing LLM templates with YAML configuration files
Pathway offers a number of ready-to-use and easy-to-deploy LLM templates. To adapt them to your needs without altering their Python code, you can configure them with YAML configuration files. Pathway uses a custom YAML parser to make configuring templates easier, and this guide explores the capabilities of that parser.
Mapping tags
The YAML format allows you to assign tags to key-value mappings by prepending a chosen string with !. In Pathway configuration files, these tags are used to reference Python objects. If the referenced object is callable, it is called with the arguments taken from the mapping. For example, this can be used to define a source Table using an input connector:
source: !pw.io.fs.read
  path: data
  format: binary
  with_metadata: true
Since classes are also callables, this syntax can also be used to initialize objects.
llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-3.5-turbo"
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DiskCache
  temperature: 0.05
  capacity: 8
While these examples refer to components from the Pathway package, you can use this syntax to import any object - in particular, your own functions and classes.
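For example, suppose your working directory contains a module my_components that defines a custom splitter class ParagraphSplitter (both names are hypothetical and serve only as an illustration). As long as the module is importable, you can reference it with a tag in the same way - the class is called with the arguments from the mapping and the resulting instance is used in the configuration:
splitter: !my_components.ParagraphSplitter
  min_length: 50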
Referencing Enums
If the Python object referenced with ! is not callable, the associated mapping needs to be empty and the object is used as is. In LLM templates this is used to set the value of an argument that is an enum, e.g. BruteForceKnnFactory requires its metric argument to be a value from the pw.stdlib.indexing.BruteForceKnnMetricKind enum:
retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
  dimensions: 1536
Defining Schemas
If you need to define a Schema in the YAML file, the easiest way is to use one of the functions for defining a schema inline. For example, defining a Schema with a field text of type str using pw.schema_from_types looks like this:
schema: !pw.schema_from_types
  text: str
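Such a schema is typically passed to a component that needs to know the structure of its input. For instance (a sketch assuming your data directory contains CSV files with a single text column), you could combine it with the filesystem connector:
source: !pw.io.fs.read
  path: data
  format: csv
  schema: !pw.schema_from_types
    text: str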
Variables
Identifiers starting with $ have a special meaning - they denote variables that can be referenced later in the configuration file.
$retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
  max_retries: 6

llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-3.5-turbo"
  retry_strategy: $retry_strategy
  cache_strategy: !pw.udfs.DiskCache
  temperature: 0.05
  capacity: 8

embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
  model: "text-embedding-ada-002"
  retry_strategy: $retry_strategy
  cache_strategy: !pw.udfs.DiskCache
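Variables can hold plain values as well, not only objects - this is handy for settings, such as a model name, that you want to define in one place. The local RAG example later in this guide uses this pattern; a minimal sketch:
$model_name: "gpt-3.5-turbo"

llm: !pw.xpacks.llm.llms.OpenAIChat
  model: $model_name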
Example: Demo-Question-Answering
To see these YAML files in practice, let's look at the demo-question-answering pipeline. Note that it differs from the adaptive RAG, multimodal RAG and private RAG templates only in the YAML configuration file - their Python code is the same.
Here is the content of app.yaml from demo-question-answering:
$sources:
  - !pw.io.fs.read
    path: data
    format: binary
    with_metadata: true

  # - !pw.xpacks.connectors.sharepoint.read
  #   url: $SHAREPOINT_URL
  #   tenant: $SHAREPOINT_TENANT
  #   client_id: $SHAREPOINT_CLIENT_ID
  #   cert_path: sharepointcert.pem
  #   thumbprint: $SHAREPOINT_THUMBPRINT
  #   root_path: $SHAREPOINT_ROOT
  #   with_metadata: true
  #   refresh_interval: 30

  # - !pw.io.gdrive.read
  #   object_id: $DRIVE_ID
  #   service_user_credentials_file: gdrive_indexer.json
  #   name_pattern:
  #     - "*.pdf"
  #     - "*.pptx"
  #   object_size_limit: null
  #   with_metadata: true
  #   refresh_interval: 30

$llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-3.5-turbo"
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DiskCache
  temperature: 0.05
  capacity: 8

$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
  model: "text-embedding-ada-002"
  cache_strategy: !pw.udfs.DiskCache

$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  max_tokens: 400

$parser: !pw.xpacks.llm.parsers.ParseUnstructured

$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
  dimensions: 1536

$document_store: !pw.xpacks.llm.document_store.DocumentStore
  docs: $sources
  parser: $parser
  splitter: $splitter
  retriever_factory: $retriever_factory

question_answerer: !pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer
  llm: $llm
  indexer: $document_store

# Change host and port by uncommenting these lines
# host: "0.0.0.0"
# port: 8000

# Cache configuration
# with_cache: true

# If `terminate_on_error` is true then the program will terminate whenever any error is encountered.
# Defaults to false, uncomment the following line if you want to set it to true
# terminate_on_error: true
This demo requires the question_answerer to be defined in the configuration file, and allows you to override the values of host, port, with_cache and terminate_on_error - to do so, uncomment and adjust the corresponding lines at the end of the file, for example as shown below.
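As a sketch (the specific values below are only an illustration), the end of the file could then look like:
host: "0.0.0.0"
port: 8080
with_cache: true
terminate_on_error: false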
The first thing you can try is to add another input connector, e.g. one that reads files from Google Drive. The stub is already present in the file, so just uncomment it and fill in object_id and service_user_credentials_file.
$sources:
  - !pw.io.fs.read
    path: data
    format: binary
    with_metadata: true

  - !pw.io.gdrive.read
    object_id: FILL_YOUR_DRIVE_ID
    service_user_credentials_file: FILL_PATH_TO_CREDENTIALS_FILE
    name_pattern:
      - "*.pdf"
      - "*.pptx"
    object_size_limit: null
    with_metadata: true
    refresh_interval: 30

$llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-3.5-turbo"
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DiskCache
  temperature: 0.05
  capacity: 8

$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
  model: "text-embedding-ada-002"
  cache_strategy: !pw.udfs.DiskCache

$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  max_tokens: 400

$parser: !pw.xpacks.llm.parsers.ParseUnstructured

$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
  dimensions: 1536

$document_store: !pw.xpacks.llm.document_store.DocumentStore
  docs: $sources
  parser: $parser
  splitter: $splitter
  retriever_factory: $retriever_factory

question_answerer: !pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer
  llm: $llm
  indexer: $document_store
If you want to change the provider of the LLM models, you can change the values of llm and embedder. By changing llm to LiteLLMChat, which uses a local api_base, and embedder to SentenceTransformerEmbedder, you obtain a local RAG pipeline that does not call any external services (this pipeline is now very similar to the private RAG from the llm-app).
$sources:
  - !pw.io.fs.read
    path: data
    format: binary
    with_metadata: true

$llm_model: "ollama/mistral"

$llm: !pw.xpacks.llm.llms.LiteLLMChat
  model: $llm_model
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DiskCache
  temperature: 0
  top_p: 1
  format: "json"  # only available in Ollama local deploy, not usable in Mistral API
  api_base: "http://localhost:11434"

$embedding_model: "avsolatorio/GIST-small-Embedding-v0"

$embedder: !pw.xpacks.llm.embedders.SentenceTransformerEmbedder
  model: $embedding_model
  call_kwargs:
    show_progress_bar: false

$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  max_tokens: 400

$parser: !pw.xpacks.llm.parsers.ParseUnstructured

$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.engine.BruteForceKnnMetricKind.COS

$document_store: !pw.xpacks.llm.document_store.DocumentStore
  docs: $sources
  parser: $parser
  splitter: $splitter
  retriever_factory: $retriever_factory

question_answerer: !pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer
  llm: $llm
  indexer: $document_store
Alternatively, you may wish to improve retrieval by using a hybrid index. In this example you'll use a HybridIndex that combines a vector-based index - BruteForceKnn - with a text-search-based index - TantivyBM25.
$sources:
  - !pw.io.fs.read
    path: data
    format: binary
    with_metadata: true

$llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-3.5-turbo"
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DiskCache
  temperature: 0.05
  capacity: 8

$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
  model: "text-embedding-ada-002"
  cache_strategy: !pw.udfs.DiskCache

$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  max_tokens: 400

$parser: !pw.xpacks.llm.parsers.ParseUnstructured

$knn_index: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.engine.BruteForceKnnMetricKind.COS
  dimensions: 1536

$bm25_index: !pw.stdlib.indexing.TantivyBM25Factory

$hybrid_index_factory: !pw.stdlib.indexing.HybridIndexFactory
  retriever_factories:
    - $knn_index
    - $bm25_index

$document_store: !pw.xpacks.llm.document_store.DocumentStore
  docs: $sources
  parser: $parser
  splitter: $splitter
  retriever_factory: $hybrid_index_factory

question_answerer: !pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer
  llm: $llm
  indexer: $document_store
These are just a few examples, but you can use any components from the LLM xpack to build a pipeline that fully meets your needs!