pw.xpacks.llm.document_store
Pathway Document Store for processing and indexing documents.
The document store reads source documents, builds a vector index over them, and exposes multiple methods for querying.
class pw.xpacks.llm.document_store.DocumentStore(docs, retriever_factory, parser=None, splitter=None, doc_post_processors=None)
Builds a document indexing pipeline for processing documents and querying the closest documents to a query according to a specified index.
- Parameters
  - docs – Pathway tables, typically coming out of connectors, which contain source documents.
  - retriever_factory – factory for building an index, which will be provided texts by the DocumentStore.
  - parser – callable that parses file contents into a list of documents.
  - splitter – callable that splits long documents.
  - doc_post_processors – optional list of callables that modify parsed files and metadata. Each callable takes two arguments (text: str, metadata: dict) and returns them as a tuple.
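As a rough illustration of the doc_post_processors contract described above, the sketch below defines a callable that takes (text: str, metadata: dict) and returns both as a tuple. The names `normalize_whitespace` and `apply_post_processors` and the `char_count` metadata key are hypothetical. The chaining helper only shows one plausible way such callables could be applied in order; it is not Pathway's implementation.

```python
from typing import Callable

# Type of a post-processor as described in the parameter list:
# it receives (text, metadata) and returns them as a tuple.
PostProcessor = Callable[[str, dict], tuple[str, dict]]


def normalize_whitespace(text: str, metadata: dict) -> tuple[str, dict]:
    """Collapse runs of whitespace and record the cleaned length.

    The "char_count" key is a made-up example field.
    """
    cleaned = " ".join(text.split())
    # Copy the metadata instead of mutating the caller's dict.
    updated = {**metadata, "char_count": len(cleaned)}
    return cleaned, updated


def apply_post_processors(
    text: str, metadata: dict, processors: list[PostProcessor]
) -> tuple[str, dict]:
    """Hypothetical helper: run each processor in list order."""
    for processor in processors:
        text, metadata = processor(text, metadata)
    return text, metadata
```

Under these assumptions, such a callable would be supplied as `doc_post_processors=[normalize_whitespace]` when constructing the DocumentStore.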
class InputsResultSchema
classmethod from_langchain_components(docs, retriever_factory, parser=None, splitter=None, **kwargs)
Initializes DocumentStore by using LangChain components.
- Parameters
  - docs – Pathway tables, typically coming out of connectors, which contain source documents.
  - retriever_factory – factory for building an index, which will be provided texts by the DocumentStore.
  - parser – callable that parses file contents into a list of documents.
  - splitter – LangChain component for splitting documents into parts.
classmethod from_llamaindex_components(docs, retriever_factory, transformations, parser=None, **kwargs)
Initializes DocumentStore by using LlamaIndex TransformComponents.
- Parameters
  - docs – Pathway tables, typically coming out of connectors, which contain source documents.
  - retriever_factory – factory for building an index, which will be provided texts by the DocumentStore.
  - transformations – list of LlamaIndex components.
  - parser – callable that parses file contents into a list of documents.
inputs_query(input_queries)
Query DocumentStore for the list of input documents.
retrieve_query(retrieval_queries)
Query DocumentStore for the list of closest texts to a given query.
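To make "closest texts to a given query" concrete, here is a toy, self-contained sketch that ranks texts by bag-of-words cosine similarity. It is only a conceptual stand-in: in DocumentStore the actual index comes from the retriever_factory, and the function names and the `k` parameter below are assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Very rough 'embedding': lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_texts(query: str, texts: list[str], k: int = 3) -> list[str]:
    """Return the k texts most similar to the query, best first."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        texts,
        key=lambda t: cosine(query_vec, bag_of_words(t)),
        reverse=True,
    )
    return ranked[:k]
```

A real retriever would typically use learned embeddings rather than token counts, but the ranking step has the same shape: score every indexed text against the query and keep the top results.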
statistics_query(info_queries)
Query DocumentStore for statistics about indexed documents. It returns the number of indexed texts, the time of last modification, and the time of last indexing of input documents.
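The three statistics listed above can be pictured with a small sketch over per-text metadata. Everything here is an assumption for illustration: the function name `summarize_index`, the `modified_at` and `indexed_at` keys, and the shape of the result are hypothetical and do not describe the schema that statistics_query actually returns.

```python
from typing import Optional


def summarize_index(chunks: list[dict]) -> dict[str, Optional[int]]:
    """Summarize a list of per-text metadata dicts.

    Each dict is assumed to carry integer timestamps under the
    hypothetical keys "modified_at" and "indexed_at".
    """
    if not chunks:
        # Nothing indexed yet: no timestamps to report.
        return {"count": 0, "last_modified": None, "last_indexed": None}
    return {
        "count": len(chunks),
        "last_modified": max(c["modified_at"] for c in chunks),
        "last_indexed": max(c["indexed_at"] for c in chunks),
    }
```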