pgvector for embeddings & semantic search

Install on Debian/Ubuntu Postgres 16

Since Debian 12 / Ubuntu 24.04, the PGDG repo ships a prebuilt extension package. Add the repo (if you haven't already):

sudo apt install -y curl ca-certificates
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc \
  --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc

. /etc/os-release
echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] \
https://apt.postgresql.org/pub/repos/apt $VERSION_CODENAME-pgdg main" \
  | sudo tee /etc/apt/sources.list.d/pgdg.list

sudo apt update
sudo apt install postgresql-16 postgresql-16-pgvector

If you're on Postgres 15 or 17, swap the version in the two package names. To verify it loaded:

sudo -u postgres psql -c '\dx'
# Should not yet list "vector" — that's per-database

Create the extension

Per database:

sudo -u postgres createdb rag
sudo -u postgres psql -d rag -c 'CREATE EXTENSION vector;'
sudo -u postgres psql -d rag -c '\dx'

Schema

Vectors are a real column type. Dimension is fixed and must match what your embedding model produces (768 for nomic-embed-text, 1024 for mxbai-embed-large, 1536 for OpenAI text-embedding-3-small, 3072 for text-embedding-3-large):

CREATE TABLE chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint NOT NULL,
    chunk_idx   int    NOT NULL,
    content     text   NOT NULL,
    embedding   vector(768),
    created_at  timestamptz DEFAULT now()
);

CREATE INDEX ON chunks (document_id);

Note embedding can be NULL — useful when you want to insert rows now and embed asynchronously.

Insert vectors

The literal syntax is a JSON-style array of floats, single-quoted:

INSERT INTO chunks (document_id, chunk_idx, content, embedding)
VALUES (1, 0, 'WireGuard uses UDP and runs in the kernel.',
        '[0.012, -0.430, 0.219, ...]');

From application code you typically pass the vector as a parameter. With psycopg + pgvector-python:

import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=rag")
register_vector(conn)

emb = embed_function("WireGuard uses UDP and runs in the kernel.")  # returns list[float] of len 768
conn.execute(
    "INSERT INTO chunks (document_id, chunk_idx, content, embedding) "
    "VALUES (%s, %s, %s, %s)",
    (1, 0, content, emb)
)
conn.commit()

Distance operators

pgvector ships three operators — pick the one that matches how your embedding model was trained:

Operator	Metric	Use when
`<->`	L2 (Euclidean) distance	The default; safe choice if you don't know.
`<=>`	Cosine distance	Most modern text-embedding models are trained with cosine similarity — use this.
`<#>`	Negative inner product	Equivalent to cosine on already-normalized vectors, and slightly faster.

Lower is "closer" for all three. A nearest-neighbor query:

SELECT id, content, embedding <=> %s AS distance
FROM chunks
ORDER BY embedding <=> %s
LIMIT 5;

(Yes, the query vector is bound twice — once for the projection, once for the ORDER BY. Postgres can't reuse it across clauses unless you wrap in a CTE.)

Indexes: HNSW vs IVFFlat

Without an index, the query above is a full table scan — fine for prototypes, painful past ~50k rows. pgvector supports two ANN indexes:

-- HNSW: build is slow, queries are fast, no training data needed.
-- Preferred for almost all workloads.
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);

-- IVFFlat: build is fast, queries are decent, needs representative
-- training data (the table should be populated first).
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);     -- rule of thumb: sqrt(N rows)

Match the _ops suffix to the distance operator you'll query with: vector_l2_ops, vector_cosine_ops, vector_ip_ops. A mismatched index won't be used; the query will silently fall back to a scan.

HNSW tuning knobs

Build-time: m (default 16, max connections per layer) and ef_construction (default 64, search width during build). Query-time: SET hnsw.ef_search = 100; — higher means better recall at the cost of latency. Start with defaults; raise ef_search only if recall numbers say to.

A complete RAG-style query

The killer feature of pgvector vs a standalone vector DB is that nearest-neighbor join keys are first-class SQL. You can filter, join, paginate, and order alongside the similarity search:

WITH q AS (
    SELECT %s::vector AS embedding
)
SELECT
    d.title,
    c.content,
    c.embedding <=> q.embedding AS distance
FROM chunks c
JOIN documents d ON d.id = c.document_id
JOIN q ON true
WHERE d.workspace_id = %s
  AND d.archived = false
ORDER BY c.embedding <=> q.embedding
LIMIT 10;

That filters by workspace and archive status and does ANN search, all in one plan. With a standalone vector DB you'd have to fetch a wider candidate set, then re-filter in your app, then re-rank — pgvector skips the round trips.

Generating the embeddings

Cheapest path on a server with no GPU budget — run a small embedding model locally via Ollama:

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "WireGuard uses UDP and runs in the kernel."
}'

Returns {"embedding": [0.012, -0.430, ...]} — 768 floats — which goes straight into the vector(768) column.

Common pitfalls

Dim mismatch. ERROR: expected 768 dimensions, not 1536. Two embedding models with different output shapes can't share a column; either pick one and stick with it, or store them in separate columns / tables.
Index not being used. EXPLAIN shows Seq Scan. Most common causes: distance operator doesn't match _ops in the index definition; the query has too restrictive a WHERE clause and the planner thinks a scan is cheaper; or you used a parameter without a type cast (%s::vector).
Slow index build. HNSW builds are CPU-bound and single-threaded by default. Bump maintenance_work_mem to ~2GB and set SET max_parallel_maintenance_workers = 7; before running CREATE INDEX. The build will use all the cores you give it.
Vectors not normalized. For cosine, pgvector normalizes internally — you don't need to. For inner product (<#>) you do, otherwise the "distances" are meaningless.
Updating embeddings rebuilds the index? No — HNSW supports inserts and deletes incrementally. But heavy churn degrades recall over time; REINDEX INDEX CONCURRENTLY periodically if the table is write-heavy.