Vector Database

A vector database is a specialized database designed to store, index, and query data in vectorized form, i.e. as multidimensional numerical representations. Vector databases are optimized for applications that require fast similarity searches over very large datasets.

How does a vector database work?

A vector database stores data as high-dimensional vectors. Regular structured or unstructured data is first converted into a multidimensional numerical representation called a vector. These vector representations (also called vector embeddings) have a useful geometric property: items with similar content map to vectors that lie close together. This makes it possible to find similar entries in a large collection of data points by measuring the distance between their vectors.
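As a minimal sketch of what "finding similar entries" means in practice, the example below ranks items by cosine similarity, one common way to compare vectors. The three-dimensional embeddings are hand-written for illustration and do not come from a real embedding model; the names `cosine_similarity`, `embeddings`, and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption: real embeddings typically have
# hundreds or thousands of dimensions and are produced by a model).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "truck": [0.1, 0.2, 0.95],
}


def most_similar(query, k=2):
    """Rank every other entry by cosine similarity to the query entry."""
    q = embeddings[query]
    scored = [
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


for name, score in most_similar("cat"):
    print(f"{name}: {score:.3f}")
```

With these toy values, "kitten" ranks well above "truck" for the query "cat", because its vector points in nearly the same direction.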

The vectors are organized using a vector index, which enables fast retrieval of similarity search results.
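To illustrate why an index speeds up retrieval, here is a toy sketch of one possible indexing idea: group vectors into buckets by their nearest centroid, then compare the query only against the vectors in its own bucket. This is an assumption chosen for illustration, loosely in the spirit of inverted-file indexes; it is not how any particular vector database is implemented. The `BucketIndex` class and the fixed centroids are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BucketIndex:
    """Toy index: each vector is stored in the bucket of its nearest centroid."""

    def __init__(self, centroids):
        # Assumption: centroids are given up front; real systems usually
        # learn them from the data (for example with k-means).
        self.centroids = centroids
        self.buckets = defaultdict(list)

    def _nearest_centroid(self, vec):
        return min(
            range(len(self.centroids)),
            key=lambda i: euclidean(vec, self.centroids[i]),
        )

    def add(self, key, vec):
        self.buckets[self._nearest_centroid(vec)].append((key, vec))

    def search(self, query, k=1):
        # Only the query's own bucket is scanned, not the whole collection.
        candidates = self.buckets.get(self._nearest_centroid(query), [])
        ranked = sorted(candidates, key=lambda item: euclidean(query, item[1]))
        return [key for key, _ in ranked[:k]]


index = BucketIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("a", [0.5, 1.0])
index.add("b", [1.5, 0.2])
index.add("c", [9.0, 9.5])
index.add("d", [10.5, 11.0])

print(index.search([9.2, 9.4], k=1))  # ['c']
```

The trade-off in this sketch is that the search is approximate: a true nearest neighbor that happens to sit in a different bucket is never examined. Skipping most of the collection is what makes the lookup fast.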

What is the difference between a vector database and a regular database?

A vector database differs from a regular database in that it stores and indexes data as mathematical vectors, whereas a standard relational database stores structured data in tables. Regular databases generally support structured queries (such as SQL), while vector databases are optimized for similarity queries.
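The difference between the two query styles can be sketched side by side. The structured query below uses Python's built-in `sqlite3` module and returns rows that match exact conditions; the similarity query ranks every item by closeness to a query vector. The product data, the two-dimensional vectors, and the function names are all hypothetical, and the brute-force ranking stands in for what a vector database would do with an index.

```python
import math
import sqlite3

# Structured side: rows in a table, queried with exact SQL conditions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("trail runner", "shoes", 89.0),
        ("leather boot", "shoes", 149.0),
        ("rain jacket", "outerwear", 120.0),
    ],
)


def structured_query(connection, category, max_price):
    """Return names of rows that satisfy the conditions exactly."""
    rows = connection.execute(
        "SELECT name FROM products WHERE category = ? AND price < ? ORDER BY name",
        (category, max_price),
    ).fetchall()
    return [row[0] for row in rows]


# Vector side: hypothetical 2-d embeddings for the same products.
product_vectors = {
    "trail runner": [0.9, 0.1],
    "leather boot": [0.6, 0.5],
    "rain jacket": [0.1, 0.9],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def similarity_query(query_vector, k=2):
    """Return the k product names whose vectors are closest to the query."""
    ranked = sorted(
        product_vectors,
        key=lambda name: cosine_similarity(query_vector, product_vectors[name]),
        reverse=True,
    )
    return ranked[:k]


print(structured_query(conn, "shoes", 100.0))  # ['trail runner']
print(similarity_query([0.2, 0.8], k=2))       # ['rain jacket', 'leather boot']
```

The structured query either matches a row or it does not, while the similarity query always returns a ranking, even when nothing matches exactly.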

When should I use a vector database?

Consider using a vector database when your main goal is to perform fast similarity searches over a very large dataset, for example when you are building a recommendation system or an LLM-based interactive chatbot.

When should I not use a vector database?

If your primary data processing method is to run SQL-like queries, consider using a regular database instead.

In some cases, you may also be able to avoid using a vector database entirely by using Pathway. Our "Building LLM Apps without a Vector Database" tutorial uses a vector index under the hood. Try it out to get a feel for the fast performance that vector indexing can deliver.