pw.xpacks.llm.vector_store
Pathway vector search server and client.
The server reads source documents and builds a vector index over them, then starts serving HTTP requests.
The client queries the server and returns matching documents.
class SlidesVectorStoreServer(*docs, embedder, parser=None, splitter=None, doc_post_processors=None)
[source]Accompanying vector index server for the slide-search demo.
Builds a document indexing pipeline and starts an HTTP REST server.
Modifies the VectorStoreServer’s pw_list_document endpoint to return the set of
metadata after the parsing and document post-processing stages.
class InputsResultSchema
classmethod from_langchain_components(*docs, embedder, parser=None, splitter=None, **kwargs)
[source]Initializes VectorStoreServer using LangChain components.
- Parameters
  - embedder (Embeddings) – LangChain component for embedding documents
  - parser (Callable[[bytes], list[tuple[str, dict]]] | None) – callable that parses file contents into a list of documents
  - splitter (BaseDocumentTransformer | None) – LangChain component for splitting documents into parts
classmethod from_llamaindex_components(*docs, transformations, parser=None, **kwargs)
[source]Initializes VectorStoreServer using LlamaIndex TransformComponents.
- Parameters
  - transformations (list[TransformComponent]) – list of LlamaIndex components. The last component in this list is required to inherit from LlamaIndex BaseEmbedding.
  - parser (Callable[[bytes], list[tuple[str, dict]]] | None) – callable that parses file contents into a list of documents
inputs_query(input_queries)
[source]Query DocumentStore for the list of input documents.
retrieve_query(retrieval_queries)
[source]Query DocumentStore for the list of closest texts to a given query.
run_server(host, port, threaded=False, with_cache=True, cache_backend=pw.persistence.Backend.filesystem('./Cache'), **kwargs)
[source]Builds the document processing pipeline and runs it.
- Parameters
  - host – host to bind the HTTP listener
  - port – port to bind the HTTP listener
  - threaded (bool) – if True, run in a thread; otherwise block the computation
  - with_cache (bool) – if True, embedding requests for the same contents are cached
  - cache_backend (Backend | None) – the backend to use for caching if it is enabled. The default is the disk cache, hosted locally in the folder ./Cache. You can use the Backend class of the [persistence API](/developers/api-docs/persistence-api/#pathway.persistence.Backend) to override it.
  - kwargs – optional parameters to be passed to run()
- Returns
  If threaded, returns the Thread object; otherwise does not return.
statistics_query(info_queries)
[source]Query DocumentStore for statistics about indexed documents. It returns the number
of indexed texts, the time of last modification, and the time of last indexing of the input documents.
class VectorStoreClient(*args, **kwargs)
[source]A client you can use to query VectorStoreServer.
Provide either url, or host and port.
- Parameters
  - host – host on which VectorStoreServer listens
  - port – port on which VectorStoreServer listens
  - url – url at which VectorStoreServer listens
  - timeout – timeout for the POST requests, in seconds
get_input_files(metadata_filter=None, filepath_globpattern=None, return_status=False)
[source]Fetch information on documents in the vector store.
- Parameters
  - metadata_filter (str | None) – optional string representing the metadata filtering query in the JMESPath format. The search will happen only for documents satisfying this filtering.
  - filepath_globpattern (str | None) – optional glob pattern specifying which documents will be searched for this query.
  - return_status (bool) – flag telling whether _indexing_status should be returned for each document
get_vectorstore_statistics()
[source]Fetch basic statistics about the vector store.
query(query, k=3, metadata_filter=None, filepath_globpattern=None)
[source]Perform a query to the vector store and fetch results.
- Parameters
  - query (str) – the query text
  - k (int) – number of documents to be returned
  - metadata_filter (str | None) – optional string representing the metadata filtering query in the JMESPath format. The search will happen only for documents satisfying this filtering.
  - filepath_globpattern (str | None) – optional glob pattern specifying which documents will be searched for this query.
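The client wraps a plain HTTP POST to the server's REST endpoint. A hand-rolled equivalent using only the standard library might look like the sketch below; the `/v1/retrieve` path and the JSON field names are assumptions mirroring the client's signature, not a documented contract:

```python
import json
import urllib.request

def build_query_request(base_url, query, k=3,
                        metadata_filter=None, filepath_globpattern=None):
    """Build (but do not send) the POST request for a retrieval query."""
    payload = {"query": query, "k": k}
    if metadata_filter is not None:
        payload["metadata_filter"] = metadata_filter
    if filepath_globpattern is not None:
        payload["filepath_globpattern"] = filepath_globpattern
    return urllib.request.Request(
        base_url + "/v1/retrieve",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request(
    "http://127.0.0.1:8000",
    "how to run the server",
    k=5,
    filepath_globpattern="**/*.md",
)
```

Sending the request with `urllib.request.urlopen(req)` would return the matching documents as JSON, the same data the client's `query` method deserializes for you.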
class VectorStoreServer(*docs, embedder, parser=None, splitter=None, doc_post_processors=None)
[source]Builds a document indexing pipeline and starts an HTTP REST server for nearest neighbors queries.
- Parameters
  - docs (Table) – Pathway tables, typically coming out of connectors, which contain source documents
  - embedder (UDF) – callable that embeds a single document
  - parser (Callable[[bytes], list[tuple[str, dict]]] | UDF | None) – callable that parses file contents into a list of documents
  - splitter (Callable[[str], list[tuple[str, dict]]] | UDF | None) – callable that splits long documents
  - doc_post_processors (list[Callable[[str, dict], tuple[str, dict]]] | None) – optional list of callables that modify parsed files and metadata. Each callable takes two arguments (text: str, metadata: dict) and returns them as a tuple.
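The doc_post_processors contract is plain Python and can be exercised without a running pipeline. A minimal sketch (the `strip_and_count` helper is hypothetical, shown only to illustrate the (text, metadata) in, (text, metadata) out shape):

```python
def strip_and_count(text: str, metadata: dict) -> tuple[str, dict]:
    """Collapse runs of whitespace and record the cleaned length."""
    cleaned = " ".join(text.split())
    return cleaned, {**metadata, "n_chars": len(cleaned)}

text, meta = strip_and_count("  slide   one  ", {"path": "deck.pdf"})
# text == "slide one", meta == {"path": "deck.pdf", "n_chars": 9}

# Wiring it in (requires pathway and a configured embedder):
#   VectorStoreServer(docs, embedder=my_embedder,
#                     doc_post_processors=[strip_and_count])
```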
class InputsResultSchema
classmethod from_langchain_components(*docs, embedder, parser=None, splitter=None, **kwargs)
[source]Initializes VectorStoreServer using LangChain components.
- Parameters
  - embedder (Embeddings) – LangChain component for embedding documents
  - parser (Callable[[bytes], list[tuple[str, dict]]] | None) – callable that parses file contents into a list of documents
  - splitter (BaseDocumentTransformer | None) – LangChain component for splitting documents into parts
classmethod from_llamaindex_components(*docs, transformations, parser=None, **kwargs)
[source]Initializes VectorStoreServer using LlamaIndex TransformComponents.
- Parameters
  - transformations (list[TransformComponent]) – list of LlamaIndex components. The last component in this list is required to inherit from LlamaIndex BaseEmbedding.
  - parser (Callable[[bytes], list[tuple[str, dict]]] | None) – callable that parses file contents into a list of documents
inputs_query(input_queries)
[source]Query DocumentStore for the list of input documents.
retrieve_query(retrieval_queries)
[source]Query DocumentStore for the list of closest texts to a given query.
run_server(host, port, threaded=False, with_cache=True, cache_backend=pw.persistence.Backend.filesystem('./Cache'), **kwargs)
[source]Builds the document processing pipeline and runs it.
- Parameters
  - host – host to bind the HTTP listener
  - port – port to bind the HTTP listener
  - threaded (bool) – if True, run in a thread; otherwise block the computation
  - with_cache (bool) – if True, embedding requests for the same contents are cached
  - cache_backend (Backend | None) – the backend to use for caching if it is enabled. The default is the disk cache, hosted locally in the folder ./Cache. You can use the Backend class of the [persistence API](/developers/api-docs/persistence-api/#pathway.persistence.Backend) to override it.
  - kwargs – optional parameters to be passed to run()
- Returns
  If threaded, returns the Thread object; otherwise does not return.
statistics_query(info_queries)
[source]Query DocumentStore for statistics about indexed documents. It returns the number
of indexed texts, the time of last modification, and the time of last indexing of the input documents.