LLM Chats

Out of the box, the LLM xpack provides wrappers for text generation and embedding LLMs. For text generation, you can use native wrappers for the OpenAI chat models and for Hugging Face models running locally. Many other popular models, including Azure OpenAI, Hugging Face (when using their API), or Gemini, can be used with the LiteLLM wrapper. For the full list of providers supported by LiteLLM, check the LiteLLM documentation.

Currently, Pathway provides wrappers for the following LLMs:

OpenAIChat

For OpenAI, you create a wrapper using the OpenAIChat class.

chat: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-4o-mini
  api_key: $OPENAI_API_KEY
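
Parameters of the underlying OpenAI API can be set in the same place. A minimal sketch, assuming that standard OpenAI chat parameters such as temperature and max_tokens are forwarded to the chat completions endpoint:

chat: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-4o-mini"
  api_key: $OPENAI_API_KEY
  temperature: 0  # sampling temperature, forwarded to the OpenAI API
  max_tokens: 300  # cap on the length of each completion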

LiteLLM

Pathway provides a wrapper for LiteLLM, the LiteLLMChat class. For example, to use Gemini with LiteLLM, create an instance of LiteLLMChat and then apply it to the column containing the messages to be sent over the API.

llm: !pw.xpacks.llm.llms.LiteLLMChat
  model: "gemini/gemini-pro", # Choose the model you want

With the wrapper for LiteLLM, Pathway allows you to use many popular LLMs.
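
For example, here is a hypothetical configuration for an Anthropic model. The provider prefix in the model string and the api_key parameter follow LiteLLM's conventions; the exact model name is only an illustration:

llm: !pw.xpacks.llm.llms.LiteLLMChat
  model: "anthropic/claude-3-haiku-20240307"  # LiteLLM provider prefix followed by the model name
  api_key: $ANTHROPIC_API_KEY  # alternatively, set the ANTHROPIC_API_KEY environment variable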

Hugging Face pipeline

For models from Hugging Face that you want to run locally, Pathway provides a separate wrapper called HFPipelineChat (to call Hugging Face models through their API, use the LiteLLM wrapper instead). Creating an instance of this wrapper initializes a Hugging Face pipeline, so any arguments to the pipeline, including the name of the model, must be set during the initialization of HFPipelineChat. Parameters to pipeline.__call__ can likewise be set during initialization or overridden when the wrapper is applied.

llm: !pw.xpacks.llm.llms.HFPipelineChat
  model: "TinyLlama/TinyLlama-1.1B-Chat-v1.0", # Choose the model you want

Note that the format of the input used in a Hugging Face pipeline depends on the model. Some models, like gpt2, expect a prompt string, whereas conversational models also accept messages given as a list of dicts. In the latter case, the model's prompt template is applied to the conversation.
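
For illustration, such a conversation uses OpenAI-style messages, each a dict with a role and a content key (shown here in YAML form):

- role: system
  content: You are a helpful assistant.
- role: user
  content: What is Pathway?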

Note that Pathway AI pipelines expect conversational models, so models like gpt2 cannot be used with them.

For more information, see the Hugging Face pipeline documentation.

Wrappers are asynchronous

The wrappers for OpenAI and LiteLLM, both for chats and embedders, are asynchronous, and Pathway allows you to set three parameters that control their behavior. These are:

  • capacity, which sets the number of concurrent operations allowed,
  • retry_strategy, which sets the strategy for handling retries in case of failures,
  • cache_strategy, which defines the cache mechanism.

These three parameters need to be set during the initialization of the wrapper. You can read more about them in the UDFs guide.

chat: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-4o-mini
  capacity: 10
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 5
    initial_delay: 1000
    backoff_factor: 2
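
The cache_strategy is configured the same way. As a sketch, assuming the DefaultCache strategy exposed under pw.udfs in recent Pathway versions (it stores results through Pathway's persistence mechanism when one is configured):

chat: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-4o-mini"
  cache_strategy: !pw.udfs.DefaultCache {}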