A vector index is an indexing technique applied to data stored in vector form. It organizes vectorized data so that items can be retrieved efficiently based on their similarity or distance to one another.
A vector database typically stores high-dimensional vectors representing various types of data such as images, documents, or audio signals. Each vector is composed of multiple numerical values (dimensions) that capture different aspects or features of the data.
Two common types of vector indexes are tree-based indexes and hashing-based indexes.
Tree-based indexes use tree structures, such as binary trees or multi-dimensional trees, to hierarchically partition the vector space. These structures enable fast search results by repeatedly dividing the space into smaller regions based on vector properties.
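To make the idea of hierarchical partitioning concrete, here is a minimal sketch of a k-d tree, one kind of multi-dimensional tree. The names `KDNode`, `build_kd_tree`, and `nearest` are hypothetical and chosen for illustration only. The sketch assumes Euclidean distance and a small in-memory data set, and it is not meant to reflect how any particular vector database implements its index.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

Vector = Sequence[float]


@dataclass
class KDNode:
    """One node of the tree: a stored point plus the axis it splits on."""
    point: Vector
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_kd_tree(points: List[Vector], depth: int = 0) -> Optional[KDNode]:
    """Recursively split the points at the median of one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kd_tree(ordered[:mid], depth + 1),
        right=build_kd_tree(ordered[mid + 1:], depth + 1),
    )


def nearest(node: Optional[KDNode], query: Vector,
            best: Optional[Vector] = None) -> Optional[Vector]:
    """Return the stored point closest to `query` (Euclidean distance)."""
    if node is None:
        return best
    if best is None or math.dist(query, node.point) < math.dist(query, best):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    # Search the side of the split that contains the query first.
    best = nearest(near, query, best)
    # Visit the other side only if it could still hold a closer point.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best
```

The pruning step in `nearest` is what narrows the search: whole subtrees are skipped when the splitting plane is farther from the query than the best match found so far. This kind of pruning tends to become less effective as the number of dimensions grows.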
Hashing-based indexes use hash functions, typically locality-sensitive ones, to map vectors to specific locations in a data structure such as a hash table. Because these functions are designed so that similar vectors are more likely to be mapped to the same or nearby locations, search results can be returned quickly.
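One way to build a hash function with this property is to use random hyperplanes, a form of locality-sensitive hashing. The sketch below is illustrative only: the function names (`make_hyperplanes`, `hash_vector`, `build_lsh_index`, `candidates`) are hypothetical, a single hash table is used, and the number of hyperplanes is an arbitrary choice.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def make_hyperplanes(num_planes: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw random hyperplane normals; each one contributes one bit of the hash."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def hash_vector(vector: Vector, hyperplanes: List[List[float]]) -> int:
    """Record which side of every hyperplane the vector falls on, as a bit string."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def build_lsh_index(vectors: List[Vector],
                    hyperplanes: List[List[float]]) -> Dict[int, List[int]]:
    """Group vector ids into buckets keyed by their hash."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[hash_vector(vec, hyperplanes)].append(idx)
    return buckets


def candidates(query: Vector, buckets: Dict[int, List[int]],
               hyperplanes: List[List[float]]) -> List[int]:
    """Return ids of stored vectors that landed in the same bucket as the query."""
    return buckets.get(hash_vector(query, hyperplanes), [])
```

Vectors that point in similar directions tend to fall on the same side of most hyperplanes, so they tend to share a bucket. A query then only needs to be compared against the candidates in its own bucket rather than against every stored vector, at the cost of occasionally missing a true neighbor.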
In both cases, the vector index provides a way to quickly narrow down the search space and retrieve vectors that are similar to a given query vector.
Vector representations (also called vector embeddings) are generally stored in a vector database.
Regular databases generally support structured queries (such as SQL) while vector databases are optimized for similarity queries.
Vector indexes are used in a wide range of applications. A common use case is vector-based similarity search: given a query vector, the search returns the stored vectors that are closest to it.
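As a baseline for what a similarity search computes, the following sketch compares a query against every stored vector and returns the best matches. It assumes cosine similarity as the measure, and the names `cosine_similarity` and `top_k` are hypothetical. A vector index aims to return the same or nearly the same results without scanning every vector.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero-length vectors
    return dot / (norm_a * norm_b)


def top_k(query: Vector, vectors: List[Vector], k: int = 3) -> List[Tuple[float, int]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Each result is a `(score, index)` pair, so the caller can look up the original item by its position in the list. The cost of this exhaustive scan grows linearly with the number of stored vectors.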
Real-world applications of a vector index include discovering similarities in medical records, in financial transactions, or between websites indexed by a search engine.
Similarity searches that use a vector index can be extremely fast because of the mathematical properties that data exhibits when converted into vectors. Comparing high-dimensional vectors directly can be computationally expensive, so many vector indexing techniques employ dimensionality reduction (such as Principal Component Analysis) to reduce the complexity of the data while retaining the important information about the similarity between data entries. This lowers the cost of each comparison and enables a fast return of search results.
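The sketch below illustrates the idea behind Principal Component Analysis in its simplest form: it finds the single direction along which the data varies most and projects each vector onto it. It uses power iteration on the covariance matrix, a stand-in chosen here to keep the example dependency-free. A practical system would more likely rely on a numerical library and keep more than one component. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def mean_center(rows: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """Subtract the per-dimension mean so the data is centered on the origin."""
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows], means


def covariance(centered: List[List[float]]) -> List[List[float]]:
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def top_component(cov: List[List[float]], iterations: int = 200) -> List[float]:
    """Approximate the leading eigenvector of `cov` by power iteration.

    Assumes the starting vector is not orthogonal to that eigenvector.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance left to follow
        v = [x / norm for x in w]
    return v


def project(rows: List[List[float]], component: List[float],
            means: List[float]) -> List[float]:
    """Reduce each row to one number: its coordinate along `component`."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(component)))
            for r in rows]
```

After projection, each vector is represented by fewer numbers, so each distance computation is cheaper. The trade-off is that some information is discarded, which can change which neighbors are returned.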
Our Building LLM Apps without a Vector Database tutorial uses a vector index under the hood. Try it out to get a feel for the fast performance vector indexing can deliver.