pw.xpacks.llm.vector_store

Pathway vector search server and client.

The server reads source documents, builds a vector index over them, and then starts serving HTTP requests.

The client queries the server and returns matching documents.

class pw.xpacks.llm.vector_store.VectorStoreClient(host=None, port=None, url=None, timeout=15, additional_headers=None)

A client you can use to query VectorStoreServer.

Please provide either the url, or host and port.
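The either/or rule above can be sketched in plain Python. `resolve_base_url` is a hypothetical helper, not part of Pathway, and the `http://` scheme is an assumption; the real client may validate its arguments or build its address differently.

```python
from typing import Optional


def resolve_base_url(
    host: Optional[str] = None,
    port: Optional[int] = None,
    url: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the 'url, or host and port' rule."""
    if url is not None:
        # A full URL was given, so host and port must not be supplied as well.
        if host is not None or port is not None:
            raise ValueError("Provide either url, or host and port, not both.")
        return url
    if host is None or port is None:
        raise ValueError("Provide either url, or both host and port.")
    # Assumption: plain HTTP; the real client may construct this differently.
    return f"http://{host}:{port}"
```

Under the same assumptions, the matching constructor calls would be `VectorStoreClient(host="127.0.0.1", port=8765)` or `VectorStoreClient(url="http://127.0.0.1:8765")`, where the address values are placeholders.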

get_input_files(metadata_filter=None, filepath_globpattern=None)

Fetch information on documents in the vector store.

  • Parameters
    • metadata_filter (str | None) – optional string representing the metadata filtering query in the JMESPath format. The search will happen only for documents satisfying this filtering.
    • filepath_globpattern (str | None) – optional glob pattern specifying which documents will be searched for this query.

get_vectorstore_statistics()

Fetch basic statistics about the vector store.

query(query, k=3, metadata_filter=None, filepath_globpattern=None)

Perform a query to the vector store and fetch results.

  • Parameters
    • query (str) – the query text to search for
    • k (int) – number of documents to be returned
    • metadata_filter (str | None) – optional string representing the metadata filtering query in the JMESPath format. The search will happen only for documents satisfying this filtering.
    • filepath_globpattern (str | None) – optional glob pattern specifying which documents will be searched for this query.
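A minimal sketch of how the two optional filters might be combined in a single call. `search_markdown` is a hypothetical wrapper, and the `owner` metadata field, its value, and the glob pattern are made up for illustration. Which metadata fields exist and which glob dialect the server applies are not stated here. The function accepts any object exposing the `query` signature shown above, so it can be tried without a running server.

```python
def search_markdown(client, question: str, k: int = 3):
    """Hypothetical wrapper: query only Markdown files owned by 'alice'."""
    return client.query(
        question,
        k=k,
        # JMESPath expression; 'alice' is a raw string literal in single quotes.
        # The `owner` field is an assumption about the document metadata.
        metadata_filter="owner == 'alice'",
        # Assumed glob pattern restricting the search to Markdown files.
        filepath_globpattern="**/*.md",
    )
```

With a real server running, `client` would be a `VectorStoreClient` instance; the return value is passed through unchanged because its shape is not described in this section.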

class pw.xpacks.llm.vector_store.VectorStoreServer(*docs, embedder, parser=None, splitter=None, doc_post_processors=None, index_params=None)

Builds a document indexing pipeline and starts an HTTP REST server for nearest-neighbor queries.

  • Parameters
    • docs – Pathway tables, typically coming out of connectors, which contain source documents
    • embedder – callable that embeds a single document
    • parser – callable that parses file contents into a list of documents
    • splitter – callable that splits long documents
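To make the three callable roles concrete, here is a toy, Pathway-free sketch of a parse, split, embed flow. All function names are hypothetical, and the exact signatures and return types that VectorStoreServer expects from these callables are not specified in this section, so the shapes below are assumptions chosen for readability.

```python
from typing import List, Tuple


def toy_parser(contents: bytes) -> List[str]:
    """Parse raw file contents into documents, one per blank-line-separated block."""
    text = contents.decode("utf-8")
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def toy_splitter(document: str, max_words: int = 5) -> List[str]:
    """Split a long document into chunks of at most `max_words` words."""
    words = document.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def toy_embedder(chunk: str) -> List[float]:
    """Embed a single chunk as [character count, word count].

    This is a stand-in for illustration, not a real embedding model.
    """
    return [float(len(chunk)), float(len(chunk.split()))]


def index_file(contents: bytes) -> List[Tuple[str, List[float]]]:
    """Run parse -> split -> embed and return (chunk, vector) pairs."""
    return [
        (chunk, toy_embedder(chunk))
        for document in toy_parser(contents)
        for chunk in toy_splitter(document)
    ]
```

In a real deployment the embedder would be an actual embedding model and the resulting vectors would go into the index rather than a Python list.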

classmethod from_langchain_components(*docs, embedder, parser=None, splitter=None, **kwargs)

Initializes VectorStoreServer by using LangChain components.

  • Parameters
    • docs – Pathway tables, typically coming out of connectors, which contain source documents
    • embedder – LangChain component for embedding documents
    • parser – callable that parses file contents into a list of documents
    • splitter – LangChain component for splitting documents into parts

classmethod from_llamaindex_components(*docs, transformations, parser=None, **kwargs)

Initializes VectorStoreServer by using LlamaIndex TransformComponents.

  • Parameters
    • docs – Pathway tables, typically coming out of connectors, which contain source documents
    • transformations – list of LlamaIndex components. The last component in this list must inherit from LlamaIndex BaseEmbedding
    • parser – callable that parses file contents into a list of documents

run_server(host, port, threaded=False, with_cache=True, cache_backend=<pathway.persistence.Backend object>)

Builds the document processing pipeline and runs it.

  • Parameters
    • host – host to bind the HTTP listener to
    • port – port to bind the HTTP listener to
    • threaded – if True, run in a thread; otherwise, block the computation
    • with_cache – if True, embedding requests for the same contents are cached
    • cache_backend – the backend to use for caching, if caching is enabled. The default is the disk cache, hosted locally in the folder ./Cache. You can use the Backend class of the [persistence API](/developers/api-docs/persistence-api/#pathway.persistence.Backend) to override it.
  • Returns
    If threaded, returns the Thread object; otherwise, does not return.
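The threaded mode can be sketched as below. `start_in_background` is a hypothetical helper, the default host and port are placeholders, and disabling the cache is only a choice made here to avoid creating the ./Cache folder. It relies solely on the `run_server` signature documented above and works with any object that exposes it.

```python
import threading


def start_in_background(
    server, host: str = "127.0.0.1", port: int = 8765
) -> threading.Thread:
    """Hypothetical helper: start serving without blocking the caller.

    With threaded=True, run_server is documented to return the Thread
    object, which the caller can keep to monitor or join later.
    """
    return server.run_server(host, port, threaded=True, with_cache=False)
```

After the thread is started, a client pointed at the same host and port can issue queries from the main thread. With `threaded=False`, the same call would block and never return.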