pw.xpacks.llm.rerankers
class CrossEncoderReranker(model_name, *, cache_strategy=None, **init_kwargs)
Pointwise cross-encoder reranker module.
Uses the CrossEncoder from the sentence_transformers library. For reference, see the Cross-Encoders documentation.
- Parameters
  - model_name (str) – Name of the cross-encoder model to be used.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
Suggested model: cross-encoder/ms-marco-TinyBERT-L-2-v2
Example:
import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers
reranker = rerankers.CrossEncoderReranker(model_name="cross-encoder/ms-marco-TinyBERT-L-2-v2")
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
table
__call__(doc, query, **kwargs)
Evaluates the doc against the query.
- Parameters
  - doc (pw.ColumnExpression[str]) – Document or document chunk to be scored.
  - query (pw.ColumnExpression[str]) – User query or prompt that will be used to evaluate relevance of the doc.
  - **kwargs – Overrides for the defaults set in the constructor.
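"Pointwise" means that each (doc, query) pair is scored on its own, without looking at the other documents. The sketch below illustrates that idea in plain Python. The `overlap_score` and `score_pointwise` helpers are hypothetical, and the word-overlap score is only a toy stand-in for a real model; it does not reflect how CrossEncoderReranker computes its scores.

```python
def overlap_score(doc, query):
    """Toy stand-in for a model: fraction of query words found in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def score_pointwise(docs, query, score_fn=overlap_score):
    """Score every doc against the query independently of the other docs."""
    return [score_fn(doc, query) for doc in docs]


docs = ["Something", "Something else", "Pathway"]
pointwise_scores = score_pointwise(docs, "something else")
```

Because every pair is scored separately, the scores keep the order of the input documents; sorting or filtering is a separate step.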
class EncoderReranker(model_name, *, cache_strategy=None, **init_kwargs)
Pointwise encoder reranker module.
Uses the encoders from the sentence_transformers library. For reference, see the Pretrained Models documentation.
- Parameters
  - model_name (str) – Embedding model to be used.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
Suggested model: BAAI/bge-large-zh-v1.5
Example:
import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers
reranker = rerankers.EncoderReranker(model_name="BAAI/bge-large-zh-v1.5")
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
table
__call__(doc, query, **kwargs)
Evaluates the doc against the query.
- Parameters
  - doc (pw.ColumnExpression[str]) – Document or document chunk to be scored.
  - query (pw.ColumnExpression[str]) – User query or prompt that will be used to evaluate relevance of the doc.
  - **kwargs – Overrides for the defaults set in the constructor.
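An encoder-based reranker embeds the doc and the query separately and then compares the two vectors. The sketch below shows that comparison in plain Python with hand-written toy vectors in place of real embeddings. The `cosine_score` helper is hypothetical, and the choice of cosine similarity is an assumption made for illustration; this page does not state which similarity measure EncoderReranker uses.

```python
import math


def cosine_score(doc_vec, query_vec):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(d * q for d, q in zip(doc_vec, query_vec))
    norm = math.sqrt(sum(d * d for d in doc_vec)) * math.sqrt(
        sum(q * q for q in query_vec)
    )
    # Guard against zero vectors, which have no direction to compare.
    return dot / norm if norm else 0.0


# Toy 2-dimensional "embeddings"; a real model would produce these vectors.
query_vec = [1.0, 0.0]
doc_vecs = {"Pathway": [1.0, 0.0], "Something": [0.0, 1.0]}
similarity = {name: cosine_score(vec, query_vec) for name, vec in doc_vecs.items()}
```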
class LLMReranker(llm, *, retry_strategy=udfs.ExponentialBackoffRetryStrategy(max_retries=6), cache_strategy=None, use_logit_bias=None)
Pointwise LLM reranking module.
Asks the LLM to rate the relevance of a given doc to a query on a scale from 1 to 5.
- Parameters
  - llm (BaseChat) – Chat instance to be called during reranking.
  - retry_strategy (AsyncRetryStrategy | None) – Strategy for handling retries in case of failures. In the signature above it defaults to udfs.ExponentialBackoffRetryStrategy(max_retries=6); None means no retries.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
  - use_logit_bias (bool | None) – If None, the reranker checks whether the LLM provider supports the logit_bias argument; set it to True or False to override this check. Defaults to None.
Example:
import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers, llms
chat = llms.OpenAIChat(model="gpt-3.5-turbo")
reranker = rerankers.LLMReranker(chat)
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
table
__call__(doc, query, **kwargs)
Evaluates the doc against the query.
- Parameters
  - doc (pw.ColumnExpression[str]) – Document or document chunk to be scored.
  - query (pw.ColumnExpression[str]) – User query or prompt that will be used to evaluate relevance of the doc.
  - **kwargs – Overrides for the defaults set in the constructor.
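Since the LLM is asked for a rating between 1 and 5, its text reply has to be turned into a number at some point. The sketch below shows one way such a reply could be parsed and validated in plain Python. `parse_rating` is a hypothetical helper written for illustration; nothing here describes how LLMReranker handles replies internally.

```python
import re


def parse_rating(reply: str) -> float:
    """Return the first standalone digit from 1 to 5 found in a model reply."""
    match = re.search(r"\b[1-5]\b", reply)
    if match is None:
        raise ValueError(f"no rating between 1 and 5 found in {reply!r}")
    return float(match.group())


# Replies may be a bare digit or a digit wrapped in a short phrase.
ratings = [parse_rating(reply) for reply in ["4", "Score: 2"]]
```

Rejecting replies without a valid digit, instead of guessing a default, keeps malformed answers from silently becoming scores.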
rerank_topk_filter(docs, scores, k=5)
Apply top-k filtering to docs using the relevance scores.
- Parameters
  - docs (list[dict[str, str | dict]]) – A column with lists of documents or chunks to rank. Each row in this column is filtered separately.
  - scores (list[float]) – A column with lists of re-ranking scores for chunks.
  - k (int) – The number of documents to keep after filtering.
Example:
import pathway as pw
from pathway.xpacks.llm import rerankers
import pandas as pd
retrieved_docs = [
    {"text": "Something"},
    {"text": "Something else"},
    {"text": "Pathway"},
]
df = pd.DataFrame({"docs": retrieved_docs, "reranker_scores": [1.0, 3.0, 2.0]})
table = pw.debug.table_from_pandas(df)
docs_table = table.reduce(
    doc_list=pw.reducers.tuple(pw.this.docs),
    score_list=pw.reducers.tuple(pw.this.reranker_scores),
)
docs_table = docs_table.select(
    docs_scores_tuple=rerankers.rerank_topk_filter(
        pw.this.doc_list, pw.this.score_list, 2
    )
)
docs_table = docs_table.select(
    doc_list=pw.this.docs_scores_tuple[0],
    score_list=pw.this.docs_scores_tuple[1],
)
pw.debug.compute_and_print(docs_table, include_id=False)
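The filtering step itself can be sketched without Pathway. The plain-Python version below is a minimal illustration, assuming that the kept documents are ordered by descending score and that the result is a (docs, scores) pair, as the indexing in the example above suggests. `topk_filter` is a hypothetical helper, not the library function.

```python
def topk_filter(docs, scores, k=5):
    """Keep the k highest-scoring docs, ordered by descending score."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    # Pair each doc with its score, sort by score, then cut to k entries.
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)[:k]
    kept_docs = tuple(doc for doc, _ in ranked)
    kept_scores = tuple(score for _, score in ranked)
    return kept_docs, kept_scores


retrieved_docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
top_docs, top_scores = topk_filter(retrieved_docs, [1.0, 3.0, 2.0], k=2)
```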