Full YAML Template Examples
This page gathers complete YAML configuration files for the Pathway Live Data Framework RAG templates. For each template, you can find an example for each of the following data sources: file system, SharePoint, Google Drive, and S3. You can find more examples of how to configure the data sources on the dedicated page.
For further configurations, you can learn how to configure YAML templates and see our different YAML examples for each component.
Adaptive RAG
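Configuration of the Pathway Live Data Framework Adaptive RAG pipeline. The Pathway Live Data Framework provides an advanced RAG technique called Adaptive RAG that lowers the cost of queries: it first asks the LLM with a small number of retrieved documents and expands the context only when the LLM cannot answer.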
You can find the template on GitHub. Here are the complete configurations for each data source.
File System
$sources:
  # File System connector, reading data locally.
  - !pw.io.fs.read
    path: data # Path to the data directory
    format: binary # Format of the data to be read
    with_metadata: true # Include metadata in the data

# Configures the LLM model settings for generating responses.
$llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-4o-mini"
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DefaultCache
  temperature: 0
  capacity: 8

# Specifies the embedder model for converting text into embeddings.
$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
  model: "text-embedding-ada-002"
  cache_strategy: !pw.udfs.DefaultCache

# Defines the splitter settings for dividing text into smaller chunks.
$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  max_tokens: 400

# Configures the parser for processing and extracting information from documents.
$parser: !pw.xpacks.llm.parsers.DoclingParser
  cache_strategy: !pw.udfs.DefaultCache

# Sets up the retriever factory for indexing and retrieving documents.
$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS

# Manages the storage and retrieval of documents for the RAG template.
$document_store: !pw.xpacks.llm.document_store.DocumentStore
  docs: $sources
  parser: $parser
  splitter: $splitter
  retriever_factory: $retriever_factory

# Configures the question-answering component using the RAG approach.
question_answerer: !pw.xpacks.llm.question_answering.AdaptiveRAGQuestionAnswerer
  llm: $llm
  indexer: $document_store
  n_starting_documents: 2
  factor: 2
  max_iterations: 4
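
To make the last three `question_answerer` settings more concrete, here is a minimal Python sketch of the geometric retrieval schedule they describe. It is an illustration only, not Pathway's implementation: the function names and the `try_answer` callback are hypothetical, and the assumption is that each retry multiplies the number of retrieved documents by `factor`, for at most `max_iterations` attempts.

```python
from typing import Callable, Optional


def adaptive_document_counts(
    n_starting_documents: int, factor: int, max_iterations: int
) -> list[int]:
    """Number of documents offered to the LLM at each successive attempt."""
    return [n_starting_documents * factor**i for i in range(max_iterations)]


def answer_adaptively(
    try_answer: Callable[[int], Optional[str]],
    n_starting_documents: int = 2,
    factor: int = 2,
    max_iterations: int = 4,
) -> Optional[str]:
    """Retry with a geometrically growing context until an answer is found.

    `try_answer(k)` is a hypothetical callback: it queries the LLM with the
    top-k retrieved documents and returns None when no answer can be given.
    """
    for k in adaptive_document_counts(n_starting_documents, factor, max_iterations):
        answer = try_answer(k)
        if answer is not None:
            return answer  # Stop early: larger (costlier) prompts are skipped.
    return None


# With the values used in the configuration above:
print(adaptive_document_counts(2, 2, 4))  # [2, 4, 8, 16]
```

Under these assumptions, a question that can be answered from the first two documents never pays for the larger prompts, which is where the cost saving comes from.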
SharePoint
$sources:
  # Connect to your SharePoint data.
  - !pw.xpacks.connectors.sharepoint.read
    url: $SHAREPOINT_URL # URL of the SharePoint site
    tenant: $SHAREPOINT_TENANT # Tenant ID for SharePoint
    client_id: $SHAREPOINT_CLIENT_ID # Client ID for authentication
    cert_path: sharepointcert.pem # Path to the certificate file
    thumbprint: $SHAREPOINT_THUMBPRINT # Thumbprint of the certificate
    root_path: $SHAREPOINT_ROOT # Root path in SharePoint
    with_metadata: true # Include metadata in the data
    refresh_interval: 30 # Interval to refresh data (in seconds)

# Configures the LLM model settings for generating responses.
$llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-4o-mini"
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DefaultCache
  temperature: 0
  capacity: 8

# Specifies the embedder model for converting text into embeddings.
$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
  model: "text-embedding-ada-002"
  cache_strategy: !pw.udfs.DefaultCache

# Defines the splitter settings for dividing text into smaller chunks.
$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  max_tokens: 400

# Configures the parser for processing and extracting information from documents.
$parser: !pw.xpacks.llm.parsers.DoclingParser
  cache_strategy: !pw.udfs.DefaultCache

# Sets up the retriever factory for indexing and retrieving documents.
$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS

# Manages the storage and retrieval of documents for the RAG template.
$document_store: !pw.xpacks.llm.document_store.DocumentStore
  docs: $sources
  parser: $parser
  splitter: $splitter
  retriever_factory: $retriever_factory

# Configures the question-answering component using the RAG approach.
question_answerer: !pw.xpacks.llm.question_answering.AdaptiveRAGQuestionAnswerer
  llm: $llm
  indexer: $document_store
  n_starting_documents: 2
  factor: 2
  max_iterations: 4
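
As a rough illustration of what the `$splitter` setting `max_tokens: 400` implies, the sketch below caps each chunk at a fixed token budget. This is a simplified stand-in, not the behavior of `TokenCountSplitter` itself: the function name is hypothetical, whitespace-separated words are assumed to stand in for model tokens, and the real splitter relies on a proper tokenizer and may choose break points differently.

```python
def split_by_token_count(text: str, max_tokens: int = 400) -> list[str]:
    """Greedily cut text into chunks of at most `max_tokens` tokens."""
    # Assumption: whitespace-separated words approximate model tokens.
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


# A tiny budget makes the chunk boundaries easy to see.
print(split_by_token_count("one two three four five", max_tokens=2))
# ['one two', 'three four', 'five']
```

A smaller `max_tokens` produces more, shorter chunks, so each retrieved document adds less text to the prompt.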
Google Drive
$sources:
  # Connect to your data in Google Drive
  - !pw.io.gdrive.read
    object_id: $DRIVE_ID
    service_user_credentials_file: gdrive_indexer.json
    file_name_pattern:
      - "*.pdf"
      - "*.pptx"
    object_size_limit: null
    with_metadata: true
    refresh_interval: 30

# Configures the LLM model settings for generating responses.
$llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-4o-mini"
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DefaultCache
  temperature: 0
  capacity: 8

# Specifies the embedder model for converting text into embeddings.
$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
  model: "text-embedding-ada-002"
  cache_strategy: !pw.udfs.DefaultCache

# Defines the splitter settings for dividing text into smaller chunks.
$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  max_tokens: 400

# Configures the parser for processing and extracting information from documents.
$parser: !pw.xpacks.llm.parsers.DoclingParser
  cache_strategy: !pw.udfs.DefaultCache

# Sets up the retriever factory for indexing and retrieving documents.
$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS

# Manages the storage and retrieval of documents for the RAG template.
$document_store: !pw.xpacks.llm.document_store.DocumentStore
  docs: $sources
  parser: $parser
  splitter: $splitter
  retriever_factory: $retriever_factory

# Configures the question-answering component using the RAG approach.
question_answerer: !pw.xpacks.llm.question_answering.AdaptiveRAGQuestionAnswerer
  llm: $llm
  indexer: $document_store
  n_starting_documents: 2
  factor: 2
  max_iterations: 4
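
The `$retriever_factory` above selects a brute-force KNN index with the cosine metric. The following Python sketch shows what that combination means conceptually: every stored vector is compared with the query and the closest ones are kept. It is not Pathway's index; the function names and the toy two-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # Raises ZeroDivisionError for zero vectors.


def brute_force_knn(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k vectors most similar to the query."""
    # Brute force: score every stored vector, with no approximate index.
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy embeddings standing in for embedded document chunks.
chunk_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(brute_force_knn([1.0, 0.0], chunk_embeddings, k=2))  # [0, 2]
```

Scanning every vector gives exact results and costs time proportional to the number of stored chunks.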
S3
$sources:
  # Connect to your data in S3
  - !pw.io.s3.read
    path: $path
    format: "binary"
    aws_s3_settings: !pw.io.s3.AwsS3Settings
      bucket_name: $bucket
      region: "eu-west-3"
      access_key: $s3_access_key
      secret_access_key: $s3_secret_access_key

# Configures the LLM model settings for generating responses.
$llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-4o-mini"
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DefaultCache
  temperature: 0
  capacity: 8

# Specifies the embedder model for converting text into embeddings.
$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
  model: "text-embedding-ada-002"
  cache_strategy: !pw.udfs.DefaultCache

# Defines the splitter settings for dividing text into smaller chunks.
$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  max_tokens: 400

# Configures the parser for processing and extracting information from documents.
$parser: !pw.xpacks.llm.parsers.DoclingParser
  cache_strategy: !pw.udfs.DefaultCache

# Sets up the retriever factory for indexing and retrieving documents.
$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS

# Manages the storage and retrieval of documents for the RAG template.
$document_store: !pw.xpacks.llm.document_store.DocumentStore
  docs: $sources
  parser: $parser
  splitter: $splitter
  retriever_factory: $retriever_factory

# Configures the question-answering component using the RAG approach.
question_answerer: !pw.xpacks.llm.question_answering.AdaptiveRAGQuestionAnswerer
  llm: $llm
  indexer: $document_store
  n_starting_documents: 2
  factor: 2
  max_iterations: 4