Install on Debian/Ubuntu Postgres 16
Since Debian 12 / Ubuntu 24.04, the PGDG repo ships a prebuilt extension package. Add the repo (if you haven't already):
sudo apt install -y curl ca-certificates
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc \
--fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
. /etc/os-release
echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] \
https://apt.postgresql.org/pub/repos/apt $VERSION_CODENAME-pgdg main" \
| sudo tee /etc/apt/sources.list.d/pgdg.list
sudo apt update
sudo apt install postgresql-16 postgresql-16-pgvector
If you're on Postgres 15 or 17, swap the version in the two package names. To verify it loaded:
sudo -u postgres psql -c '\dx'
# Should not yet list "vector" — that's per-database
Create the extension
Per database:
sudo -u postgres createdb rag
sudo -u postgres psql -d rag -c 'CREATE EXTENSION vector;'
sudo -u postgres psql -d rag -c '\dx'
Schema
Vectors are a real column type. Dimension is fixed and must match what your embedding model produces (768 for nomic-embed-text, 1024 for mxbai-embed-large, 1536 for OpenAI text-embedding-3-small, 3072 for text-embedding-3-large):
CREATE TABLE chunks (
id bigserial PRIMARY KEY,
document_id bigint NOT NULL,
chunk_idx int NOT NULL,
content text NOT NULL,
embedding vector(768),
created_at timestamptz DEFAULT now()
);
CREATE INDEX ON chunks (document_id);
Note embedding can be NULL — useful when you want to insert rows now and embed asynchronously.
Insert vectors
The literal syntax is a JSON-style array of floats, single-quoted:
INSERT INTO chunks (document_id, chunk_idx, content, embedding)
VALUES (1, 0, 'WireGuard uses UDP and runs in the kernel.',
'[0.012, -0.430, 0.219, ...]');
From application code you typically pass the vector as a parameter. With psycopg + pgvector-python:
import psycopg
from pgvector.psycopg import register_vector
conn = psycopg.connect("dbname=rag")
register_vector(conn)
emb = embed_function("WireGuard uses UDP and runs in the kernel.") # returns list[float] of len 768
conn.execute(
"INSERT INTO chunks (document_id, chunk_idx, content, embedding) "
"VALUES (%s, %s, %s, %s)",
(1, 0, content, emb)
)
conn.commit()
Distance operators
pgvector ships three operators — pick the one that matches how your embedding model was trained:
| Operator | Metric | Use when |
|---|---|---|
<-> | L2 (Euclidean) distance | The default; safe choice if you don't know. |
<=> | Cosine distance | Most modern text-embedding models are trained with cosine similarity — use this. |
<#> | Negative inner product | Equivalent to cosine on already-normalized vectors, and slightly faster. |
Lower is "closer" for all three. A nearest-neighbor query:
SELECT id, content, embedding <=> %s AS distance
FROM chunks
ORDER BY embedding <=> %s
LIMIT 5;
(Yes, the query vector is bound twice — once for the projection, once for the ORDER BY. Postgres can't reuse it across clauses unless you wrap in a CTE.)
Indexes: HNSW vs IVFFlat
Without an index, the query above is a full table scan — fine for prototypes, painful past ~50k rows. pgvector supports two ANN indexes:
-- HNSW: build is slow, queries are fast, no training data needed.
-- Preferred for almost all workloads.
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
-- IVFFlat: build is fast, queries are decent, needs representative
-- training data (the table should be populated first).
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100); -- rule of thumb: sqrt(N rows)
Match the _ops suffix to the distance operator you'll query with: vector_l2_ops, vector_cosine_ops, vector_ip_ops. A mismatched index won't be used; the query will silently fall back to a scan.
Build-time: m (default 16, max connections per layer) and ef_construction (default 64, search width during build). Query-time: SET hnsw.ef_search = 100; — higher means better recall at the cost of latency. Start with defaults; raise ef_search only if recall numbers say to.
A complete RAG-style query
The killer feature of pgvector vs a standalone vector DB is that nearest-neighbor join keys are first-class SQL. You can filter, join, paginate, and order alongside the similarity search:
WITH q AS (
SELECT %s::vector AS embedding
)
SELECT
d.title,
c.content,
c.embedding <=> q.embedding AS distance
FROM chunks c
JOIN documents d ON d.id = c.document_id
JOIN q ON true
WHERE d.workspace_id = %s
AND d.archived = false
ORDER BY c.embedding <=> q.embedding
LIMIT 10;
That filters by workspace and archive status and does ANN search, all in one plan. With a standalone vector DB you'd have to fetch a wider candidate set, then re-filter in your app, then re-rank — pgvector skips the round trips.
Generating the embeddings
Cheapest path on a server with no GPU budget — run a small embedding model locally via Ollama:
curl http://localhost:11434/api/embeddings -d '{
"model": "nomic-embed-text",
"prompt": "WireGuard uses UDP and runs in the kernel."
}'
Returns {"embedding": [0.012, -0.430, ...]} — 768 floats — which goes straight into the vector(768) column.
Common pitfalls
- Dim mismatch.
ERROR: expected 768 dimensions, not 1536. Two embedding models with different output shapes can't share a column; either pick one and stick with it, or store them in separate columns / tables. - Index not being used.
EXPLAINshowsSeq Scan. Most common causes: distance operator doesn't match_opsin the index definition; the query has too restrictive aWHEREclause and the planner thinks a scan is cheaper; or you used a parameter without a type cast (%s::vector). - Slow index build. HNSW builds are CPU-bound and single-threaded by default. Bump
maintenance_work_memto ~2GB and setSET max_parallel_maintenance_workers = 7;before runningCREATE INDEX. The build will use all the cores you give it. - Vectors not normalized. For cosine, pgvector normalizes internally — you don't need to. For inner product (
<#>) you do, otherwise the "distances" are meaningless. - Updating embeddings rebuilds the index? No — HNSW supports inserts and deletes incrementally. But heavy churn degrades recall over time;
REINDEX INDEX CONCURRENTLYperiodically if the table is write-heavy.