pw.xpacks.llm.rerankers

class pw.xpacks.llm.rerankers.CrossEncoderReranker(model_name, *, cache_strategy=None, **init_kwargs)

Pointwise cross-encoder reranker module: each doc is scored against the query independently of the other documents.

Uses the CrossEncoder class from the sentence_transformers library. For reference, see the Cross encoders documentation.

  • Parameters
    • model_name – Name of the cross-encoder model to be used.
    • cache_strategy – Defines the caching mechanism. To enable caching, pass a valid CacheStrategy. See Cache strategy for more information. Defaults to None.
    • doc – Document or document chunk to be scored.
    • query – User query or prompt against which the relevance of the doc is evaluated.

model_name and cache_strategy are initialization arguments; doc and query are runtime-only arguments.
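
To make the split between initialization and runtime arguments concrete, here is a minimal, library-free sketch of a pointwise reranker object: it is configured once with a scoring function and a caching switch, then called with one (doc, query) pair at a time. SketchReranker and overlap_score are hypothetical stand-ins for the cross-encoder, and the plain dictionary only illustrates what caching buys (repeated pairs are not rescored); it does not model Pathway's CacheStrategy objects.

```python
from typing import Callable, Optional


def overlap_score(doc: str, query: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words found in the doc."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


class SketchReranker:
    """Hypothetical pointwise reranker: configured once, then called per (doc, query)."""

    def __init__(
        self, score_fn: Callable[[str, str], float], *, use_cache: bool = False
    ) -> None:
        # Initialization arguments: the scoring "model" and whether to cache.
        self.score_fn = score_fn
        self.cache: Optional[dict] = {} if use_cache else None
        self.model_calls = 0  # how many times the scorer actually ran

    def __call__(self, doc: str, query: str) -> float:
        # Runtime arguments: one document and one query per call.
        key = (doc, query)
        if self.cache is not None and key in self.cache:
            return self.cache[key]  # repeated pair: reuse the stored score
        self.model_calls += 1
        score = self.score_fn(doc, query)
        if self.cache is not None:
            self.cache[key] = score
        return score


reranker = SketchReranker(overlap_score, use_cache=True)
print(reranker("Pathway is a stream processor", "what is pathway"))
print(reranker("Pathway is a stream processor", "what is pathway"))  # cache hit
print(reranker.model_calls)  # 1
```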

Suggested model: cross-encoder/ms-marco-TinyBERT-L-2-v2

Example:

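import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers

# Initialization argument: the cross-encoder model to load.
reranker = rerankers.CrossEncoderReranker(model_name="cross-encoder/ms-marco-TinyBERT-L-2-v2")

# One row per document; every row carries the same query text.
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)

# Runtime arguments: each row's document text is scored against its prompt.
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
pw.debug.compute_and_print(table)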
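
Pointwise reranking scores each document against the query on its own, with no reference to the other candidates; the ordering is produced afterwards by sorting on the scores. The sketch below shows that score-then-sort pattern in plain Python. toy_score is a hypothetical stand-in for the cross-encoder (a real cross-encoder reads the query and the document together and outputs a relevance score); only the overall structure is meant to carry over.

```python
def toy_score(doc: str, query: str) -> float:
    """Hypothetical relevance score: number of shared lowercase words."""
    return float(len(set(doc.lower().split()) & set(query.lower().split())))


def pointwise_rerank(docs: list[str], query: str) -> list[tuple[str, float]]:
    # Score every (query, doc) pair independently of the other docs...
    scored = [(doc, toy_score(doc, query)) for doc in docs]
    # ...then order by score, highest first (sorted() keeps ties in input order).
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = ["Something", "Something else", "Pathway reranker docs"]
for doc, score in pointwise_rerank(docs, "pathway reranker"):
    print(f"{score:.1f}  {doc}")
```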

class pw.xpacks.llm.rerankers.EncoderReranker(model_name, *, cache_strategy=None, **init_kwargs)

Pointwise encoder reranker module.

Uses the embedding (bi-encoder) models from the sentence_transformers library. For reference, see the Pretrained models documentation.

  • Parameters
    • model_name – Name of the embedding model to be used.
    • cache_strategy – Defines the caching mechanism. To enable caching, pass a valid CacheStrategy. See Cache strategy for more information. Defaults to None.
    • doc – Document or document chunk to be scored.
    • query – User query or prompt against which the relevance of the doc is evaluated.

model_name and cache_strategy are initialization arguments; doc and query are runtime-only arguments.

Suggested model: BAAI/bge-large-zh-v1.5

Example:

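import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers

# Initialization argument: the embedding model to load.
reranker = rerankers.EncoderReranker(model_name="BAAI/bge-large-zh-v1.5")

# One row per document; every row carries the same query text.
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)

# Runtime arguments: each row's document text is scored against its prompt.
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
pw.debug.compute_and_print(table)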
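
Unlike a cross-encoder, an embedding (bi-encoder) model turns the doc and the query into vectors separately, and relevance is measured by comparing those vectors. Cosine similarity is a common comparison; whether EncoderReranker uses exactly this measure is an assumption here. In the sketch, embed is a hypothetical bag-of-words embedding standing in for a sentence_transformers model.

```python
import math
from collections import Counter


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical embedding: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


vocabulary = ["pathway", "stream", "processing", "something", "else"]
query_vec = embed("pathway stream processing", vocabulary)  # encoded once per query
for doc in ["Something", "Something else", "Pathway stream processing"]:
    doc_vec = embed(doc, vocabulary)  # each doc is encoded on its own
    print(f"{cosine_similarity(doc_vec, query_vec):.3f}  {doc}")
```

Because the two sides are encoded independently, document vectors can in principle be computed ahead of time and reused across queries, which is the usual trade-off against the more precise but per-pair cross-encoder.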

class pw.xpacks.llm.rerankers.LLMReranker(llm, *, retry_strategy=<pathway.internals.udfs.retries.ExponentialBackoffRetryStrategy object>, cache_strategy=None, use_logit_bias=None)

Pointwise LLM reranking module.

Asks the LLM to rate the relevance of a given doc to a query on a scale from 1 to 5.
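
A rough sketch of such a pointwise rating loop: build a prompt asking for a single digit from 1 to 5, send it to a chat model, and parse the reply into a number. The prompt wording, build_rating_prompt, parse_rating and fake_llm are all hypothetical; they illustrate the idea and are not the prompt or parser LLMReranker actually uses.

```python
import re
from typing import Callable


def build_rating_prompt(doc: str, query: str) -> str:
    """Hypothetical prompt asking for a single relevance rating."""
    return (
        "Rate how relevant the document is to the query on a scale from 1 to 5, "
        "where 1 means irrelevant and 5 means highly relevant. "
        "Answer with a single digit.\n"
        f"Query: {query}\nDocument: {doc}\nRating:"
    )


def parse_rating(reply: str) -> float:
    """Take the first digit 1-5 in the reply; raise if there is none."""
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"no rating between 1 and 5 in reply: {reply!r}")
    return float(match.group())


def llm_score(llm: Callable[[str], str], doc: str, query: str) -> float:
    # One LLM call per (doc, query) pair: the pointwise pattern.
    return parse_rating(llm(build_rating_prompt(doc, query)))


def fake_llm(prompt: str) -> str:
    """Hypothetical chat model: high rating if 'Pathway' appears in the document part."""
    document = prompt.split("Document:", 1)[1]
    return "5" if "Pathway" in document else "Rating: 1"


for doc in ["Something", "Pathway"]:
    print(doc, llm_score(fake_llm, doc, "What is Pathway?"))
```

Free-form replies are why the parser has to be defensive; constraining the model's output tokens (for example through a logit_bias setting on providers that support it) is one way to make such replies more predictable.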

  • Parameters
    • llm – Chat instance to be called during reranking.
    • retry_strategy – Strategy for handling retries in case of failures. Defaults to an ExponentialBackoffRetryStrategy instance, as shown in the signature.
    • cache_strategy – Defines the caching mechanism. To enable caching, pass a valid CacheStrategy. See Cache strategy for more information. Defaults to None.
    • use_logit_bias – bool or None. If None, the reranker checks whether the LLM provider supports the logit_bias argument; set it to True or False to override that check. Defaults to None.
    • doc – Document or document chunk to be scored.
    • query – User query or prompt against which the relevance of the doc is evaluated.

llm, use_logit_bias, retry_strategy and cache_strategy are initialization arguments; doc and query are runtime-only arguments.
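
The default retry_strategy is an exponential backoff strategy: a failed call is retried after a delay that grows by a constant factor each time. The sketch below shows that general technique in plain Python. call_with_backoff and its max_retries, initial_delay and backoff_factor parameters are hypothetical and are not the constructor arguments of Pathway's ExponentialBackoffRetryStrategy; the sleep function is injectable so the example runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    *,
    max_retries: int = 3,
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; after each failure wait, grow the delay geometrically, and retry."""
    delay = initial_delay
    retries = 0
    while True:
        try:
            return fn()
        except Exception:
            if retries >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delay)
            delay *= backoff_factor
            retries += 1


# Usage: a flaky call that fails twice before succeeding.
attempts: list[int] = []
waits: list[float] = []


def flaky_llm_call() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "4"


print(call_with_backoff(flaky_llm_call, sleep=waits.append))  # 4
print(waits)  # [1.0, 2.0]
```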

Example:

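import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers, llms

# Initialization arguments: the chat model that rates each doc.
# OpenAIChat needs OpenAI credentials, e.g. the OPENAI_API_KEY environment variable.
chat = llms.OpenAIChat(model="gpt-3.5-turbo")
reranker = rerankers.LLMReranker(chat)

# One row per document; every row carries the same query text.
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)

# Runtime arguments: each row's document text is rated against its prompt.
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
pw.debug.compute_and_print(table)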

pw.xpacks.llm.rerankers.rerank_topk_filter(docs, scores, k=5)

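Apply top-k filtering to docs using the relevance scores: the k documents with the highest scores are kept. Returns a tuple of the filtered documents and their scores.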

  • Parameters
    • docs – Documents or chunks.
    • scores – Re-ranking scores for the chunks.
    • k – Number of documents to keep after filtering. Defaults to 5.

Example:

import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers

retrieved_docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": retrieved_docs, "reranker_scores": [1.0, 3.0, 2.0]})
table = pw.debug.table_from_pandas(df)

# Collect all documents and all scores into two tuples in a single row,
# so each score stays aligned with its document.
docs_table = table.reduce(
    doc_list=pw.reducers.tuple(pw.this.docs),
    score_list=pw.reducers.tuple(pw.this.reranker_scores),
)

# Keep the 2 highest-scoring documents; the result is a (docs, scores) tuple.
docs_table = docs_table.select(
    docs_scores_tuple=rerankers.rerank_topk_filter(
        pw.this.doc_list, pw.this.score_list, 2
    )
)

# Unpack the tuple back into separate columns.
docs_table = docs_table.select(
    doc_list=pw.this.docs_scores_tuple[0],
    score_list=pw.this.docs_scores_tuple[1],
)
pw.debug.compute_and_print(docs_table)
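
Conceptually, the filtering step pairs each document with its score, sorts the pairs by score from highest to lowest, and keeps the first k, returning documents and scores as two parallel sequences (matching the [0] and [1] indexing in the example). topk_filter_sketch below is a hypothetical plain-Python sketch of that behaviour, not Pathway's source; in particular, keeping ties in input order is an assumption.

```python
from typing import Any, Sequence


def topk_filter_sketch(
    docs: Sequence[Any], scores: Sequence[float], k: int = 5
) -> tuple[list[Any], list[float]]:
    """Hypothetical re-implementation: keep the k highest-scoring docs."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    # Sort (doc, score) pairs by score, highest first; ties keep input order.
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)[:k]
    return [doc for doc, _ in ranked], [score for _, score in ranked]


retrieved_docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
top_docs, top_scores = topk_filter_sketch(retrieved_docs, [1.0, 3.0, 2.0], k=2)
print(top_docs)    # [{'text': 'Something else'}, {'text': 'Pathway'}]
print(top_scores)  # [3.0, 2.0]
```

Sorting only on the score (via the key function) means the documents themselves never need to be comparable, so dictionaries or other unordered payloads work unchanged.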