pw.xpacks.llm.embedders

Pathway embedder UDFs.

class pw.xpacks.llm.embedders.GeminiEmbedder(*, capacity=None, retry_strategy=None, cache_strategy=None, model='models/embedding-001', api_key=None, **gemini_kwargs)

Pathway wrapper for Google Gemini Embedding services.

The capacity, retry_strategy and cache_strategy need to be specified during object construction. All other arguments can be overridden during application.

  • Parameters
    • capacity (int | None) – Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
    • retry_strategy (AsyncRetryStrategy | None) – Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
    • cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
    • model (str | None) – ID of the model to use. Check the Gemini documentation for the list of available models. To specify the model in the UDF call, set it to None in the constructor.
    • api_key (str | None) – API key for Gemini API services. Can be provided in the constructor, in __call__, or by setting the GOOGLE_API_KEY environment variable.
    • gemini_kwargs – any other arguments accepted by the Gemini embedding service. Check the Gemini documentation for the list of accepted arguments.
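
To make the capacity parameter concrete, here is a minimal sketch of what a limit on concurrent operations means. It is not Pathway's implementation: `run_with_capacity` and `fake_embed` are hypothetical names, and the semaphore approach is an assumption chosen for illustration.

```python
import asyncio

async def run_with_capacity(items, worker, capacity=None):
    """Apply `worker` to every item with at most `capacity` calls in flight."""
    # capacity=None means no limit, mirroring the documented default.
    semaphore = asyncio.Semaphore(capacity) if capacity is not None else None

    async def guarded(item):
        if semaphore is None:
            return await worker(item)
        async with semaphore:
            return await worker(item)

    # gather preserves the order of the inputs in its results.
    return await asyncio.gather(*(guarded(item) for item in items))

# Hypothetical stand-in for a remote embedding call; it records peak concurrency.
stats = {"in_flight": 0, "peak": 0}

async def fake_embed(text):
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulate network latency
    stats["in_flight"] -= 1
    return [float(len(text))]

results = asyncio.run(
    run_with_capacity(["a", "bb", "ccc", "dddd"], fake_embed, capacity=2)
)
```

With capacity=2, no more than two of the four simulated calls run at once, while the results still come back in input order.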

Example:

import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.GeminiEmbedder(model="models/text-embedding-004")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))

import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.GeminiEmbedder()
t = pw.debug.table_from_markdown('''
txt  | model
Text | models/embedding-001
''')
t.select(ret=embedder(pw.this.txt, model=pw.this.model))

__call__(input, *args, **kwargs)

Embeds texts in a Column.

get_embedding_dimension(**kwargs)

Computes the number of embedding dimensions by asking the embedder to embed ".".

  • Parameters
    • **kwargs – parameters of the embedder; if unset, the defaults from the constructor are used.
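
The probing idea can be sketched without any external service. The helper below is an illustration only: `embedding_dimension` and `toy_embed` are hypothetical names, and a real embedder would call its model instead of building a fixed-length list.

```python
def embedding_dimension(embed, **kwargs):
    """Return the vector length by embedding the string "." once."""
    return len(embed(".", **kwargs))

def toy_embed(text, dim=4):
    # Hypothetical stand-in embedder: a fixed-length vector derived from the text.
    return [float(len(text))] * dim

default_dim = embedding_dimension(toy_embed)        # uses the default dim
custom_dim = embedding_dimension(toy_embed, dim=8)  # kwargs override the default
```

With a real embedder object, the equivalent call is embedder.get_embedding_dimension(), optionally with keyword arguments that override the constructor defaults.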

class pw.xpacks.llm.embedders.LiteLLMEmbedder(*, capacity=None, retry_strategy=None, cache_strategy=None, model=None, **llmlite_kwargs)

Pathway wrapper for litellm.embedding.

The model has to be specified either in the constructor call or in each application; no default is provided. The capacity, retry_strategy and cache_strategy need to be specified during object construction. All other arguments can be overridden during application.

  • Parameters
    • capacity (int | None) – Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
    • retry_strategy (AsyncRetryStrategy | None) – Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
    • cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
    • model (str | None) – The embedding model to use.
    • timeout – The timeout for the API call. Defaults to 10 minutes.
    • litellm_call_id – The call ID for litellm logging.
    • litellm_logging_obj – The litellm logging object.
    • logger_fn – The logger function.
    • api_base – Optional. The base URL for the API.
    • api_version – Optional. The version of the API.
    • api_key – Optional. The API key to use.
    • api_type – Optional. The type of the API.
    • custom_llm_provider – The custom llm provider.
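
To illustrate what a retry strategy does for a failing embedding call, here is a minimal sketch of retrying with exponentially growing delays. It is an assumption-laden illustration, not the AsyncRetryStrategy implementation: `call_with_retries`, its parameters, and `flaky_embed` are hypothetical.

```python
import asyncio

async def call_with_retries(func, *args, max_retries=3,
                            initial_delay=0.01, backoff_factor=2.0):
    """Await func(*args), retrying failed calls with growing delays."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return await func(*args)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            await asyncio.sleep(delay)
            delay *= backoff_factor

# Hypothetical stand-in for a remote call that fails twice, then succeeds.
attempts = {"count": 0}

async def flaky_embed(text):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return [0.1, 0.2, 0.3]

vector = asyncio.run(call_with_retries(flaky_embed, "Text"))
```

With retry_strategy left at None, the first failure would propagate immediately, which matches the documented default of no retries.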

Any of these arguments can be provided either to the constructor or in the UDF call. To specify the model in the UDF call, set it to None in the constructor.

Example:

import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.LiteLLMEmbedder(model="text-embedding-ada-002")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))

import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.LiteLLMEmbedder()
t = pw.debug.table_from_markdown('''
txt  | model
Text | text-embedding-ada-002
''')
t.select(ret=embedder(pw.this.txt, model=pw.this.model))

__call__(input, *args, **kwargs)

Embeds texts in a Column.

get_embedding_dimension(**kwargs)

Computes the number of embedding dimensions by asking the embedder to embed ".".

  • Parameters
    • **kwargs – parameters of the embedder; if unset, the defaults from the constructor are used.

class pw.xpacks.llm.embedders.OpenAIEmbedder(*, capacity=None, retry_strategy=None, cache_strategy=None, model='text-embedding-ada-002', **openai_kwargs)

Pathway wrapper for OpenAI Embedding services.

The capacity, retry_strategy and cache_strategy need to be specified during object construction. All other arguments can be overridden during application.

  • Parameters
    • capacity (int | None) – Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
    • retry_strategy (AsyncRetryStrategy | None) – Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
    • cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
    • model (str | None) – ID of the model to use. You can use the List models API to see all of your available models, or see Model overview for descriptions of them.
    • encoding_format – The format to return the embeddings in. Can be either float or base64.
    • user – A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
    • extra_headers – Send extra headers
    • extra_query – Add additional query parameters to the request
    • extra_body – Add additional JSON properties to the request
    • timeout – Timeout for requests, in seconds

Any of these arguments can be provided either to the constructor or in the UDF call. To specify the model in the UDF call, set it to None in the constructor.
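
The precedence between constructor arguments and per-call arguments can be pictured as a dictionary merge in which the call wins. This is a sketch of the documented behavior under that assumption, not the library's code; `resolve_call_kwargs` is a hypothetical helper.

```python
def resolve_call_kwargs(constructor_kwargs, call_kwargs):
    """Merge constructor defaults with per-call overrides; the call wins."""
    merged = {**constructor_kwargs, **call_kwargs}
    if merged.get("model") is None:
        raise ValueError("model must be set in the constructor or in the call")
    return merged

# Constructor fixes the model; the call only adds an extra argument.
fixed = resolve_call_kwargs(
    {"model": "text-embedding-ada-002"}, {"encoding_format": "float"}
)

# Constructor leaves model as None; each call supplies it (e.g. from a column).
per_call = resolve_call_kwargs({"model": None}, {"model": "text-embedding-ada-002"})
```

The second case corresponds to passing model=pw.this.model in the UDF call, as in the example that reads the model name from a table column.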

Example:

import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.OpenAIEmbedder(model="text-embedding-ada-002")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))

import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.OpenAIEmbedder()
t = pw.debug.table_from_markdown('''
txt  | model
Text | text-embedding-ada-002
''')
t.select(ret=embedder(pw.this.txt, model=pw.this.model))

__call__(input, *args, **kwargs)

Embeds texts in a Column.

get_embedding_dimension(**kwargs)

Computes the number of embedding dimensions by asking the embedder to embed ".".

  • Parameters
    • **kwargs – parameters of the embedder; if unset, the defaults from the constructor are used.

class pw.xpacks.llm.embedders.SentenceTransformerEmbedder(model, call_kwargs={}, device='cpu', **sentencetransformer_kwargs)

Pathway wrapper for Sentence-Transformers embedder.

  • Parameters
    • model (str) – model name or path
    • call_kwargs (dict) – kwargs that will be passed to each call of encode. These can be overridden during each application. For possible arguments check the Sentence-Transformers documentation.
    • device (str) – defines which device will be used to run the Pipeline
    • sentencetransformer_kwargs – kwargs accepted during initialization of SentenceTransformers. For possible arguments check the Sentence-Transformers documentation.

Example:

import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.SentenceTransformerEmbedder(model="intfloat/e5-large-v2")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))

__call__(input, *args, **kwargs)

Embeds texts in a Column.

get_embedding_dimension(**kwargs)

Computes the number of embedding dimensions by asking the embedder to embed ".".

  • Parameters
    • **kwargs – parameters of the embedder; if unset, the defaults from the constructor are used.