Creating a vector index in Postgres: testing locally
When working with vectors and embeddings, especially when performing similarity search, hitting a performance limit is only a matter of time. In this article we will create a vector index in Postgres with the pgvector extension and validate that it performs as expected.
IVFFlat vs HNSW
pgvector supports two types of indexes:
- IVFFlat (Inverted File Flat) - Builds faster and uses less memory than HNSW, but has a worse speed-recall tradeoff at query time.
- HNSW (Hierarchical Navigable Small World) - Offers a better speed-recall tradeoff than IVFFlat, but builds more slowly and requires more memory.
We will use an HNSW index in this article, but you can experiment with IVFFlat as well.
Step 1: Setting Up Sample Data
Start Postgres with pgvector in a Docker container; the pgvector project publishes a ready-made pgvector/pgvector image on Docker Hub.
Then, create a table:
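A minimal sketch of this step, run from psql inside the container. The table name items, the column name embedding, and the 3-dimensional vector size are assumptions chosen for readability; real embeddings usually have hundreds of dimensions.

```sql
-- Enable the extension (it is registered under the name "vector").
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table: "items" and "embedding" are illustrative names,
-- and vector(3) is an assumed dimension for this local test.
CREATE TABLE items (
    id        bigserial PRIMARY KEY,
    embedding vector(3)
);
```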
And let's insert some sample data, e.g. 1000 rows:
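One possible way to seed the table, assuming the hypothetical items table with a vector(3) column. The values are random, so they only exercise the index and say nothing about recall on real embeddings.

```sql
-- Insert 1000 rows of random 3-dimensional vectors.
-- random() is evaluated once per generated row; the array is cast to vector.
INSERT INTO items (embedding)
SELECT ARRAY[random(), random(), random()]::vector(3)
FROM generate_series(1, 1000);

-- Sanity check on the row count.
SELECT count(*) FROM items;
```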
Step 2: Sequential Scan Without Index
After the data is seeded, let's run a sample query without an index:
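A sketch of such a query, assuming the hypothetical items table and an arbitrary query vector. The <=> operator is pgvector's cosine distance, and EXPLAIN ANALYZE prints the plan Postgres actually executed.

```sql
-- Find the 5 nearest neighbours of an arbitrary query vector by cosine distance.
-- With no index on "embedding", Postgres has to read every row and sort them.
EXPLAIN ANALYZE
SELECT id, embedding <=> '[0.1, 0.2, 0.3]' AS distance
FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;
```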
As we can see, without an index the query has to scan every row in the table (a Seq Scan node in the plan), which becomes inefficient as the table grows.
Step 3: Creating an HNSW Index
Now let's add an index to improve query performance.
Depending on the nature of the dataset and your use-case, add an index for each distance function you want to use. Here’s an example of creating an index for cosine distance:
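A minimal sketch for cosine distance, assuming the hypothetical items table. The index name is illustrative, and the m and ef_construction values shown are pgvector's documented defaults, written out only to make the tuning knobs visible.

```sql
-- HNSW index for cosine distance (matches the <=> operator).
-- Other operator classes include vector_l2_ops (<->) and vector_ip_ops (<#>).
CREATE INDEX items_embedding_hnsw_idx
    ON items
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```

An index only serves queries that use the operator matching its operator class, which is why each distance function needs its own index.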
Step 4: Query Performance With Index
After creating the index, run VACUUM ANALYZE on the table to update the planner statistics:
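For the hypothetical items table used in this walkthrough, that would look like this:

```sql
-- Reclaim dead tuples and refresh planner statistics for the table.
VACUUM ANALYZE items;
```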
Now, let's run the same query again.
As per the docs, for the index to be used the query has to combine ORDER BY and LIMIT:
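A sketch of the indexed query, again assuming the hypothetical items table and an arbitrary query vector. The ORDER BY expression uses the distance operator directly and in ascending order, which is the shape the index can serve.

```sql
-- Same nearest-neighbour query as before; the planner can now use the HNSW index.
EXPLAIN ANALYZE
SELECT id
FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;

-- On a table this small the planner may still prefer a sequential scan.
-- For a local experiment only, it can be discouraged for the session:
-- SET enable_seqscan = off;
```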
We can see that the index is used: the plan now contains an Index Scan node on the HNSW index instead of a Seq Scan.
Conclusion
Indexes play a crucial role in query performance for large datasets in pgvector-powered databases. By understanding how to create and use the different index types, IVFFlat and HNSW, you can keep similarity queries fast as your data grows.
Experiment with both types of indexes based on your dataset size and precision requirements to find what works best for your application needs.
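To try IVFFlat on the same data, a sketch along these lines may help. The items table and index name are hypothetical, and the lists and probes values are arbitrary picks for a 1000-row test table rather than tuned recommendations.

```sql
-- IVFFlat index for cosine distance; build it after the table has data,
-- since the list centroids are computed from existing rows.
CREATE INDEX items_embedding_ivfflat_idx
    ON items
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 10);

-- Number of lists probed per query: higher means better recall, slower queries.
SET ivfflat.probes = 3;

-- If several indexes cover the same column, the planner picks the one
-- it estimates to be cheapest.
EXPLAIN ANALYZE
SELECT id
FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;
```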
In the next part, we will look at how to run this in production.