pw.xpacks.llm.embedders
Pathway embedder UDFs.
class pw.xpacks.llm.embedders.GeminiEmbedder(*, capacity=None, retry_strategy=None, cache_strategy=None, model='models/embedding-001', api_key=None, **gemini_kwargs)
Pathway wrapper for Google Gemini Embedding services.
The capacity, retry_strategy and cache_strategy need to be specified during object construction. All other arguments can be overridden during application.
- Parameters
  - capacity (int | None) – Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
  - retry_strategy (AsyncRetryStrategy | None) – Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
  - model (str | None) – ID of the model to use. Check the Gemini documentation for the list of available models. To specify the model in the UDF call, set it to None in the constructor.
  - api_key (str | None) – API key for Gemini API services. Can be provided in the constructor, in __call__, or by setting the GOOGLE_API_KEY environment variable.
  - gemini_kwargs – Any other arguments accepted by the Gemini embedding service. Check the Gemini documentation for the list of accepted arguments.
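To make the capacity parameter more concrete, the following standalone sketch shows what a cap on concurrent operations means, using asyncio.Semaphore. It does not use Pathway: run_with_capacity and fake_embed are hypothetical names introduced here for illustration, and nothing below should be read as Pathway's internal implementation.

```python
import asyncio


async def run_with_capacity(texts, capacity):
    """Embed texts concurrently, never running more than `capacity` calls at once."""
    semaphore = asyncio.Semaphore(capacity)
    in_flight = 0  # calls currently running
    peak = 0       # highest concurrency observed

    async def fake_embed(text):
        # Hypothetical stand-in for a remote embedding request.
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulate network latency
            in_flight -= 1
            return [float(len(text))]

    results = await asyncio.gather(*(fake_embed(t) for t in texts))
    return results, peak


results, peak = asyncio.run(
    run_with_capacity(["a", "bb", "ccc", "dddd", "eeeee"], capacity=2)
)
print(peak)  # never exceeds the configured capacity
```

With capacity=None there would be no semaphore at all, which matches the "no specific limit" default described above.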
Example:
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.GeminiEmbedder(model="models/text-embedding-004")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))

# The model can also be passed per row in the UDF call.
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.GeminiEmbedder()
t = pw.debug.table_from_markdown('''
txt | model
Text | models/embedding-001
''')
t.select(ret=embedder(pw.this.txt, model=pw.this.model))
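As a rough illustration of what a cache strategy buys, the sketch below memoizes a toy embedding function so that repeated inputs do not trigger a second request. It is a conceptual model only: CachedEmbedder and toy_embed are hypothetical, and keying the cache on the text plus the call arguments is an assumption, not a statement about how Pathway's CacheStrategy works.

```python
import json


class CachedEmbedder:
    """Wrap an embedding function and reuse results for repeated inputs."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._cache = {}
        self.calls = 0  # number of times the wrapped function actually ran

    def __call__(self, text, **kwargs):
        # Assumption: the cache key covers the text and every call argument.
        key = json.dumps([text, kwargs], sort_keys=True)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._embed_fn(text, **kwargs)
        return self._cache[key]


def toy_embed(text, model="toy-model"):
    # Hypothetical embedder: no network access, just a deterministic vector.
    return [float(len(text)), float(len(model))]


embedder = CachedEmbedder(toy_embed)
first = embedder("Text", model="toy-model")
second = embedder("Text", model="toy-model")    # served from the cache
third = embedder("Text", model="other-model")   # different arguments, new call
print(embedder.calls)
```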
__call__(input, *args, **kwargs)
Embeds texts in a Column.
- Parameters
  - input (ColumnExpression[str]) – Column with texts to embed.
get_embedding_dimension(**kwargs)
Computes the number of dimensions of the embeddings by asking the embedder to embed ".".
- Parameters
  - **kwargs – Parameters of the embedder; if unset, the defaults from the constructor are used.
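The probing idea described for get_embedding_dimension can be sketched without any embedding service: embed the string "." once and measure the length of the result. The names embedding_dimension and toy_embed are hypothetical, and the toy embedder's dimensions argument exists only to show keyword arguments being forwarded.

```python
def embedding_dimension(embed_fn, **kwargs):
    """Return the vector length that `embed_fn` produces for the probe text "."."""
    return len(embed_fn(".", **kwargs))


def toy_embed(text, dimensions=8):
    # Hypothetical embedder: pads or truncates the text to a fixed-length vector.
    padded = text.ljust(dimensions)[:dimensions]
    return [float(ord(char) % 7) for char in padded]


default_dim = embedding_dimension(toy_embed)
custom_dim = embedding_dimension(toy_embed, dimensions=3)
print(default_dim, custom_dim)
```

A typical use of such a probe is to size a vector index before any real data has been embedded.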
class pw.xpacks.llm.embedders.LiteLLMEmbedder(*, capacity=None, retry_strategy=None, cache_strategy=None, model=None, **llmlite_kwargs)
Pathway wrapper for litellm.embedding.
The model has to be specified either in the constructor call or in each application; no default is provided. The capacity, retry_strategy and cache_strategy need to be specified during object construction. All other arguments can be overridden during application.
- Parameters
  - capacity (int | None) – Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
  - retry_strategy (AsyncRetryStrategy | None) – Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
  - model (str | None) – The embedding model to use.
  - timeout – The timeout value for the API call, default 10 mins.
  - litellm_call_id – The call ID for litellm logging.
  - litellm_logging_obj – The litellm logging object.
  - logger_fn – The logger function.
  - api_base – Optional. The base URL for the API.
  - api_version – Optional. The version of the API.
  - api_key – Optional. The API key to use.
  - api_type – Optional. The type of the API.
  - custom_llm_provider – The custom llm provider.
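For readers unfamiliar with retry strategies, here is a minimal standalone sketch of retrying a failing asynchronous call with exponentially growing delays. It illustrates the general technique only: call_with_retries, flaky_embed, and the parameter names max_retries, initial_delay and backoff_factor are assumptions made for this example and are not taken from Pathway's AsyncRetryStrategy.

```python
import asyncio


async def call_with_retries(operation, *, max_retries=3, initial_delay=0.01,
                            backoff_factor=2.0):
    """Await `operation()`, retrying failed attempts with growing delays."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            await asyncio.sleep(delay)
            delay *= backoff_factor  # wait longer before the next attempt


state = {"attempts": 0}


async def flaky_embed():
    # Hypothetical request that fails twice before succeeding.
    state["attempts"] += 1
    if state["attempts"] < 3:
        raise ConnectionError("transient failure")
    return [0.1, 0.2, 0.3]


vector = asyncio.run(call_with_retries(flaky_embed))
print(state["attempts"], vector)
```

With the default retry_strategy=None described above, the first ConnectionError would propagate immediately instead.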
Any argument can be provided either to the constructor or in the UDF call. To specify the model in the UDF call, leave it as None in the constructor.
Example:
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.LiteLLMEmbedder(model="text-embedding-ada-002")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))

# The model can also be passed per row in the UDF call.
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.LiteLLMEmbedder()
t = pw.debug.table_from_markdown('''
txt | model
Text | text-embedding-ada-002
''')
t.select(ret=embedder(pw.this.txt, model=pw.this.model))
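The rule that the model must come from either the constructor or the call can be pictured as a dictionary merge followed by a check. The sketch below is an assumption about the semantics, not the wrapper's actual code: resolve_kwargs is a hypothetical helper, and the timeout value of 600 seconds simply mirrors the 10 minute default mentioned above.

```python
def resolve_kwargs(constructor_kwargs, call_kwargs):
    """Merge constructor defaults with per-call overrides; the call wins on conflicts."""
    merged = {**constructor_kwargs, **call_kwargs}
    if merged.get("model") is None:
        raise ValueError("model must be set in the constructor or in the call")
    return merged


# Hypothetical constructor arguments: no model, so each call must supply one.
defaults = {"model": None, "timeout": 600}
per_call = resolve_kwargs(defaults, {"model": "text-embedding-ada-002"})
print(per_call)
```

Calling resolve_kwargs(defaults, {}) raises ValueError, which corresponds to the "no default is provided" case.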
__call__(input, *args, **kwargs)
Embeds texts in a Column.
- Parameters
  - input (ColumnExpression[str]) – Column with texts to embed.
get_embedding_dimension(**kwargs)
Computes the number of dimensions of the embeddings by asking the embedder to embed ".".
- Parameters
  - **kwargs – Parameters of the embedder; if unset, the defaults from the constructor are used.
class pw.xpacks.llm.embedders.OpenAIEmbedder(*, capacity=None, retry_strategy=None, cache_strategy=None, model='text-embedding-ada-002', **openai_kwargs)
Pathway wrapper for OpenAI Embedding services.
The capacity, retry_strategy and cache_strategy need to be specified during object construction. All other arguments can be overridden during application.
- Parameters
  - capacity (int | None) – Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
  - retry_strategy (AsyncRetryStrategy | None) – Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
  - model (str | None) – ID of the model to use. You can use the List models API to see all of your available models, or see Model overview for descriptions of them.
  - encoding_format – The format to return the embeddings in. Can be either float or base64.
  - user – A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
  - extra_headers – Send extra headers.
  - extra_query – Add additional query parameters to the request.
  - extra_body – Add additional JSON properties to the request.
  - timeout – Timeout for requests, in seconds.
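To show how the two encoding_format values relate, the sketch below converts a base64 payload back into a list of floats. It assumes the payload is a base64-encoded array of little-endian 32-bit floats; that layout is an assumption of this example and should be checked against the OpenAI documentation. decode_base64_embedding is a hypothetical helper, and the sample payload is built locally rather than fetched from the API.

```python
import base64
import struct


def decode_base64_embedding(payload):
    """Decode a base64 string into floats, assuming little-endian float32 values."""
    raw = base64.b64decode(payload)
    count = len(raw) // 4  # four bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


# Build a sample payload locally; these values are exactly representable in float32.
original = [0.5, -1.25, 2.0]
payload = base64.b64encode(struct.pack("<3f", *original)).decode("ascii")

decoded = decode_base64_embedding(payload)
print(decoded)
```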
Any argument can be provided either to the constructor or in the UDF call. To specify the model in the UDF call, set it to None in the constructor.
Example:
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.OpenAIEmbedder(model="text-embedding-ada-002")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))

# The model can also be passed per row in the UDF call.
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.OpenAIEmbedder()
t = pw.debug.table_from_markdown('''
txt | model
Text | text-embedding-ada-002
''')
t.select(ret=embedder(pw.this.txt, model=pw.this.model))
__call__(input, *args, **kwargs)
Embeds texts in a Column.
- Parameters
  - input (ColumnExpression[str]) – Column with texts to embed.
get_embedding_dimension(**kwargs)
Computes the number of dimensions of the embeddings by asking the embedder to embed ".".
- Parameters
  - **kwargs – Parameters of the embedder; if unset, the defaults from the constructor are used.
class pw.xpacks.llm.embedders.SentenceTransformerEmbedder(model, call_kwargs={}, device='cpu', **sentencetransformer_kwargs)
Pathway wrapper for Sentence-Transformers embedder.
- Parameters
  - model (str) – Model name or path.
  - call_kwargs (dict) – kwargs that will be passed to each call of encode. These can be overridden during each application. For possible arguments check the Sentence-Transformers documentation.
  - device (str) – Defines which device will be used to run the Pipeline.
  - sentencetransformer_kwargs – kwargs accepted during initialization of SentenceTransformers. For possible arguments check the Sentence-Transformers documentation.
Example:
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.SentenceTransformerEmbedder(model="intfloat/e5-large-v2")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))
__call__(input, *args, **kwargs)
Embeds texts in a Column.
- Parameters
  - input (ColumnExpression[str]) – Column with texts to embed.
get_embedding_dimension(**kwargs)
Computes the number of dimensions of the embeddings by asking the embedder to embed ".".
- Parameters
  - **kwargs – Parameters of the embedder; if unset, the defaults from the constructor are used.