pw.xpacks.llm.document_store

Pathway Document Store for processing and indexing documents.

The document store reads source documents, builds a vector index over them, and exposes multiple methods for querying.

class pw.xpacks.llm.document_store.DocumentStore(docs, retriever_factory, parser=None, splitter=None, doc_post_processors=None)

Builds a document indexing pipeline that processes documents and retrieves the documents closest to a query according to the specified index.

  • Parameters
    • docs – Pathway tables, typically coming out of connectors, which contain source documents.
    • retriever_factory – factory for building an index, which will be provided texts by the DocumentStore.
    • parser – callable that parses file contents into a list of documents.
    • splitter – callable that splits long documents.
    • doc_post_processors – optional list of callables that modify parsed files and metadata. Each callable takes two arguments (text: str, metadata: dict) and returns them as a tuple.
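
The post-processor contract described above (a callable that takes `text: str` and `metadata: dict` and returns both as a tuple) can be sketched in plain Python. The function names `strip_whitespace`, `add_char_count`, and `apply_in_order` are hypothetical and used only for illustration. The helper `apply_in_order` is not part of Pathway; it assumes the processors are applied one after another, which this page does not confirm.

```python
from typing import Callable

# Type alias for the contract stated in the parameter description:
# (text: str, metadata: dict) -> (text, metadata)
PostProcessor = Callable[[str, dict], tuple[str, dict]]


def strip_whitespace(text: str, metadata: dict) -> tuple[str, dict]:
    """Hypothetical post-processor: trim surrounding whitespace from the text."""
    return text.strip(), metadata


def add_char_count(text: str, metadata: dict) -> tuple[str, dict]:
    """Hypothetical post-processor: record the text length in a metadata copy."""
    return text, {**metadata, "char_count": len(text)}


def apply_in_order(
    processors: list[PostProcessor], text: str, metadata: dict
) -> tuple[str, dict]:
    """Local stand-in (an assumption, not Pathway code) that chains processors."""
    for processor in processors:
        text, metadata = processor(text, metadata)
    return text, metadata


# A list shaped like the one that would be passed as doc_post_processors.
doc_post_processors = [strip_whitespace, add_char_count]

text, metadata = apply_in_order(
    doc_post_processors, "  Hello, Pathway!  ", {"path": "a.txt"}
)
```

Returning a new metadata dict rather than mutating the input keeps each processor free of side effects, which makes them easier to test in isolation before wiring them into a pipeline.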

class InputsResultSchema

classmethod from_langchain_components(docs, retriever_factory, parser=None, splitter=None, **kwargs)

Initializes DocumentStore by using LangChain components.

  • Parameters
    • docs – Pathway tables, typically coming out of connectors, which contain source documents.
    • retriever_factory – factory for building an index, which will be provided texts by the DocumentStore.
    • parser – callable that parses file contents into a list of documents.
    • splitter – LangChain component for splitting documents into parts.

classmethod from_llamaindex_components(docs, retriever_factory, transformations, parser=None, **kwargs)

Initializes DocumentStore by using LlamaIndex TransformComponents.

  • Parameters
    • docs – Pathway tables, typically coming out of connectors, which contain source documents.
    • retriever_factory – factory for building an index, which will be provided texts by the DocumentStore.
    • transformations – list of LlamaIndex components.
    • parser – callable that parses file contents into a list of documents.

inputs_query(input_queries)

Queries DocumentStore for the list of input documents.

retrieve_query(retrieval_queries)

Queries DocumentStore for the list of texts closest to a given query.

statistics_query(info_queries)

Queries DocumentStore for statistics about indexed documents. It returns the number of indexed texts, the time of the last modification, and the time of the last indexing of input documents.