The stack
- Postgres — the actual database, one instance per node, with streaming replication between them.
- Patroni — the supervisor. One per node. Coordinates which instance is primary; promotes a replica on failover.
- etcd (or Consul, or ZooKeeper, or Kubernetes API) — the Distributed Configuration Store (DCS). Holds the cluster state; the source of truth for "who is the current leader."
- HAProxy (see that tutorial) — in front of the cluster, routes writes to the current primary by querying Patroni's REST API per backend.
- PgBouncer (see that tutorial) — optional; between HAProxy and Postgres for connection pooling.
Topology: 3 Postgres nodes + 3 etcd nodes
For real HA, etcd is its own 3-node Raft cluster (can lose 1 node), and Postgres is 3 nodes too. For a homelab, you can colocate etcd on the same boxes as Postgres. The trade-off is a larger blast radius: losing one box takes out an etcd member and a Postgres node at the same time.
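The "can lose 1 node" figure follows from Raft's majority rule. As a small sketch of that arithmetic (the function names `quorum` and `tolerated_failures` are made up for this illustration):

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} etcd members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Note that 4 members tolerate no more failures than 3, which is why odd cluster sizes are the usual choice.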
Install on Debian (each Postgres node)
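# If your Debian release does not ship postgresql-16, add the PGDG repository (apt.postgresql.org) first
sudo apt install postgresql-16 postgresql-contrib python3-pip python3-psycopg2
# The extra must match the DCS section used in patroni.yml (etcd3 here); quote it so the shell leaves the brackets alone.
# Debian 12 and later refuse system-wide pip installs (PEP 668): use a virtualenv or pass --break-system-packages.
sudo pip3 install 'patroni[etcd3]'
# Don't let the default Postgres start; Patroni manages it
sudo systemctl stop postgresql
sudo systemctl disable postgresql
# Remove the default cluster the package created (this deletes its data directory)
sudo pg_dropcluster --stop 16 main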
etcd cluster (separately or colocated)
sudo apt install etcd-server etcd-client
# /etc/default/etcd on each etcd node
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.5.10:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.5.10:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.5.10:2380"
ETCD_INITIAL_CLUSTER="etcd-1=http://10.0.5.10:2380,etcd-2=http://10.0.5.11:2380,etcd-3=http://10.0.5.12:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="pg-cluster"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.5.10:2379"
sudo systemctl enable --now etcd
etcdctl member list
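Only the node name and its own IP change between the three etcd nodes; ETCD_INITIAL_CLUSTER and the token are identical everywhere. A minimal sketch that renders the file for each member, assuming the addresses used above (the names `MEMBERS` and `render_etcd_env` are hypothetical helpers, not part of etcd):

```python
# Hypothetical inventory: member name -> IP, taken from the example addresses above.
MEMBERS = {
    "etcd-1": "10.0.5.10",
    "etcd-2": "10.0.5.11",
    "etcd-3": "10.0.5.12",
}


def render_etcd_env(name: str, members: dict, token: str = "pg-cluster") -> str:
    """Render the contents of /etc/default/etcd for one member."""
    ip = members[name]
    # Every member lists all peers, itself included, in the same order.
    initial_cluster = ",".join(f"{n}=http://{addr}:2380" for n, addr in members.items())
    lines = [
        f'ETCD_NAME="{name}"',
        'ETCD_DATA_DIR="/var/lib/etcd"',
        f'ETCD_LISTEN_PEER_URLS="http://{ip}:2380"',
        f'ETCD_LISTEN_CLIENT_URLS="http://{ip}:2379,http://127.0.0.1:2379"',
        f'ETCD_INITIAL_ADVERTISE_PEER_URLS="http://{ip}:2380"',
        f'ETCD_INITIAL_CLUSTER="{initial_cluster}"',
        'ETCD_INITIAL_CLUSTER_STATE="new"',
        f'ETCD_INITIAL_CLUSTER_TOKEN="{token}"',
        f'ETCD_ADVERTISE_CLIENT_URLS="http://{ip}:2379"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for member in MEMBERS:
        print(f"# ---- {member} ----")
        print(render_etcd_env(member, MEMBERS))
```

Paste each rendered block into /etc/default/etcd on the matching node before starting etcd.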
Patroni config (one per Postgres node)
# /etc/patroni/patroni.yml on pg-1
scope: prod-pg
namespace: /db/
name: pg-1
restapi:
  listen: 10.0.6.10:8008
  connect_address: 10.0.6.10:8008
etcd3:
  hosts: 10.0.5.10:2379,10.0.5.11:2379,10.0.5.12:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    synchronous_mode: true
    synchronous_mode_strict: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 200
        max_wal_senders: 10
        wal_keep_size: 512MB
        max_replication_slots: 10
        wal_level: replica
        hot_standby: on
        archive_mode: on
        archive_command: 'wal-g wal-push %p'
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 10.0.6.0/24 scram-sha-256
    - host all all 10.0.0.0/8 scram-sha-256
    - local all all peer
  users:
    admin:
      password: <long-random>
      options: [createrole, createdb]
postgresql:
  listen: 10.0.6.10:5432
  connect_address: 10.0.6.10:5432
  data_dir: /var/lib/postgresql/16/main
  bin_dir: /usr/lib/postgresql/16/bin
  pgpass: /tmp/pgpass
  authentication:
    superuser:
      username: postgres
      password: <long-random>
    replication:
      username: replicator
      password: <long-random>
  parameters:
    unix_socket_directories: '/var/run/postgresql'
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
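Use the same config on pg-2 and pg-3, with name and the IPs under restapi and postgresql adjusted. The bootstrap: section is only read when the cluster is first initialized. Once the cluster exists in the DCS, Patroni ignores it and new nodes clone from the current leader, so you can leave it in place or drop it. Later changes to the dcs settings must go through patronictl edit-config. Also note that Patroni 4.0 removed bootstrap.users; on current versions, create the admin role after bootstrap instead (for example from a post_bootstrap script).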
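The three timing values under bootstrap.dcs are related: Patroni's documentation asks that loop_wait + 2 * retry_timeout stay at or below ttl, otherwise the leader key can expire while a healthy leader is still retrying. A small sketch of that check (the function name `check_dcs_timing` is made up for this illustration):

```python
def check_dcs_timing(ttl: int, loop_wait: int, retry_timeout: int) -> list:
    """Return a list of problems with the timing settings; an empty list means OK."""
    problems = []
    for label, value in (("ttl", ttl), ("loop_wait", loop_wait), ("retry_timeout", retry_timeout)):
        if value <= 0:
            problems.append(f"{label} must be positive, got {value}")
    needed = loop_wait + 2 * retry_timeout
    if needed > ttl:
        problems.append(
            f"loop_wait + 2 * retry_timeout = {needed} exceeds ttl = {ttl}; "
            "the leader key could expire during a retry"
        )
    return problems


if __name__ == "__main__":
    # Values from the config above: 10 + 2 * 10 = 30, which is exactly ttl.
    print(check_dcs_timing(ttl=30, loop_wait=10, retry_timeout=10) or "timing OK")
```

The values used above sit exactly on the boundary, so lowering ttl alone would break the rule; shorten loop_wait and retry_timeout along with it.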
Run Patroni
sudo tee /etc/systemd/system/patroni.service <<'EOF'
[Unit]
Description=Patroni Postgres HA
After=network-online.target etcd.service
Wants=network-online.target
[Service]
User=postgres
Group=postgres
Type=simple
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
Restart=always
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now patroni
sudo systemctl status patroni
Within 30 seconds, one node becomes the leader; the other two start streaming replication from it.
Inspect cluster state
patronictl -c /etc/patroni/patroni.yml list
# + Cluster: prod-pg (7300000000000000001) +----+-----------+
# | Member | Host      | Role    | State   | TL | Lag in MB |
# +--------+-----------+---------+---------+----+-----------+
# | pg-1   | 10.0.6.10 | Leader  | running |  1 |           |
# | pg-2   | 10.0.6.11 | Replica | running |  1 |         0 |
# | pg-3   | 10.0.6.12 | Replica | running |  1 |         0 |
# +--------+-----------+---------+---------+----+-----------+
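The same information is available over Patroni's REST API at GET /cluster, which is handy for monitoring scripts. The sketch below assumes the response is a JSON object with a "members" list whose entries carry "name", "role", and (for replicas) "lag" in bytes; that matches my understanding of the endpoint, but verify the field names against your Patroni version. `fetch_cluster` and `summarize` are hypothetical helper names.

```python
import json
import urllib.request


def fetch_cluster(base_url: str, timeout: float = 2.0) -> dict:
    """GET {base_url}/cluster from any Patroni node and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/cluster", timeout=timeout) as resp:
        return json.load(resp)


def summarize(cluster: dict) -> dict:
    """Pick out the leader(s) and any replica reporting a non-zero lag."""
    members = cluster.get("members", [])
    leaders = [m["name"] for m in members if m.get("role") == "leader"]
    lagging = {
        m["name"]: m["lag"]
        for m in members
        if m.get("role") != "leader" and isinstance(m.get("lag"), int) and m["lag"] > 0
    }
    return {"leaders": leaders, "lagging": lagging}


if __name__ == "__main__":
    # Address of pg-1's REST API from the config above.
    print(summarize(fetch_cluster("http://10.0.6.10:8008")))
```

Exactly one entry in "leaders" is the healthy state; zero means an election is in progress or the cluster is down.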
HAProxy in front for transparent failover
Patroni exposes a REST API at :8008 on each node. The endpoints /leader and /replica respond with HTTP 200 only when the local instance is in that role. HAProxy uses this as a health check:
# haproxy.cfg
listen postgres-write
    bind *:5432
    mode tcp
    option httpchk GET /leader
    http-check expect status 200
    # on-marked-down shutdown-sessions closes client connections to a node that stops being the leader
    server pg-1 10.0.6.10:5432 check port 8008 inter 1s rise 2 fall 2 on-marked-down shutdown-sessions
    server pg-2 10.0.6.11:5432 check port 8008 inter 1s rise 2 fall 2 backup on-marked-down shutdown-sessions
    server pg-3 10.0.6.12:5432 check port 8008 inter 1s rise 2 fall 2 backup on-marked-down shutdown-sessions
listen postgres-read
    bind *:5433
    mode tcp
    balance roundrobin
    option httpchk GET /replica
    http-check expect status 200
    server pg-1 10.0.6.10:5432 check port 8008 inter 1s
    server pg-2 10.0.6.11:5432 check port 8008 inter 1s
    server pg-3 10.0.6.12:5432 check port 8008 inter 1s
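Apps connect to haproxy:5432 for writes and haproxy:5433 for read-only queries. When the primary fails, the health-check responses change: the old primary stops answering 200 on /leader and the promoted replica starts. HAProxy then routes new write connections to the new primary within a few seconds. Connections that were open to the old primary are broken, so applications must reconnect and retry any in-flight transaction.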
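To see what HAProxy sees, the same check can be reproduced by hand. This is a minimal sketch, not HAProxy's actual implementation: `probe` and `healthy_backends` are hypothetical names, and the node addresses are the ones assumed in the config above.

```python
import urllib.error
import urllib.request

# REST API addresses from the Patroni configs above.
NODES = {
    "pg-1": "http://10.0.6.10:8008",
    "pg-2": "http://10.0.6.11:8008",
    "pg-3": "http://10.0.6.12:8008",
}

# Bypass any proxy from the environment; health checks should go direct.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(base_url: str, role: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of GET {base_url}/{role}, or 0 if the node is unreachable."""
    try:
        with _opener.open(f"{base_url.rstrip('/')}/{role}", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-200 answer: the node is up but not in the requested role.
        return err.code
    except OSError:
        # Connection refused, timeout, DNS failure, and similar.
        return 0


def healthy_backends(statuses: dict) -> list:
    """Backends a check with 'expect status 200' would consider up."""
    return [name for name, code in statuses.items() if code == 200]


if __name__ == "__main__":
    for role in ("leader", "replica"):
        statuses = {name: probe(url, role) for name, url in NODES.items()}
        print(role, statuses, "->", healthy_backends(statuses))
```

In a healthy cluster the leader check should list exactly one backend and the replica check the other two.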
Trigger a failover (for testing)
# Graceful switch
patronictl -c /etc/patroni/patroni.yml switchover prod-pg
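# Force a failover by stopping Patroni on the current leader. This is a clean shutdown that releases the leader key;
# to simulate a crash instead, kill the Patroni and Postgres processes or cut the node off the network.
sudo systemctl stop patroni # on the current leader
Watch patronictl list on another node. After a clean stop the leader key is released immediately and a new leader is usually elected within a few seconds. After a hard crash the key has to expire first, which takes between ttl - loop_wait and ttl (20-30s with the settings above), and HAProxy then needs about inter times rise (2s here) to mark the new leader up. Clients resume writes once they reconnect.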
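For the hard-crash case, where the leader key has to expire before anyone can take over, the dependency on ttl, loop_wait, inter and rise can be put into a rough model. This sketch ignores the time Postgres needs to promote and the time clients need to reconnect, so treat the result as a lower bound on real downtime; `failover_window` is a made-up name.

```python
def failover_window(ttl: float, loop_wait: float, inter: float, rise: int) -> tuple:
    """Rough (best, worst) seconds from a leader crash until HAProxy routes to the new leader.

    Assumes the leader refreshes its key once per loop_wait, so at the moment of
    the crash the key has between ttl - loop_wait and ttl seconds left to live.
    """
    lock_expiry_best = ttl - loop_wait   # crashed just before the next refresh
    lock_expiry_worst = ttl              # crashed just after a refresh
    haproxy_detect = inter * rise        # consecutive passing checks needed
    return (lock_expiry_best + haproxy_detect, lock_expiry_worst + haproxy_detect)


if __name__ == "__main__":
    best, worst = failover_window(ttl=30, loop_wait=10, inter=1, rise=2)
    print(f"hard-crash failover: roughly {best}-{worst}s plus promotion time")
```

Lowering ttl shortens the window but makes the cluster more sensitive to brief etcd or network hiccups.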
Backups: WAL-G to S3
Don't rely on replication as backup. WAL-G ships continuous WAL + periodic base-backup to S3/MinIO:
sudo apt install wal-g
# /etc/wal-g.env
WALG_S3_PREFIX=s3://my-pg-backups/prod-pg
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_ENDPOINT=https://s3.example.com # for MinIO; omit for AWS
WALG_COMPRESSION_METHOD=brotli
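# WAL is shipped continuously by archive_command (wal-g wal-push, set in the Patroni config above),
# which needs these variables in the postgres user's environment.
# Base backups are separate: schedule this on the leader (cron or a systemd timer), running as postgres.
set -a; . /etc/wal-g.env; set +a
wal-g backup-push /var/lib/postgresql/16/main
A daily base backup plus continuous WAL archiving gives point-in-time recovery to any moment covered by the retained backups and WAL.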
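If the base backup is launched from a scheduler rather than an interactive shell, the settings in /etc/wal-g.env have to be loaded explicitly. A minimal sketch of one way to do that, assuming the simple KEY=VALUE format shown above; `parse_env_file` and `run_backup_push` are hypothetical helpers, and WAL-G will also need the usual libpq variables (PGHOST, PGUSER and so on) to reach Postgres.

```python
import os
import subprocess


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks, full-line and trailing comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Drop a trailing " # comment" and surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        env[key.strip()] = value
    return env


def run_backup_push(env_path: str = "/etc/wal-g.env",
                    data_dir: str = "/var/lib/postgresql/16/main") -> None:
    """Run wal-g backup-push with the env file merged into the current environment."""
    with open(env_path) as fh:
        extra = parse_env_file(fh.read())
    subprocess.run(["wal-g", "backup-push", data_dir], env={**os.environ, **extra}, check=True)


if __name__ == "__main__":
    run_backup_push()
```

Run it as the postgres user on the current leader; a failed upload raises CalledProcessError, so the scheduler sees a non-zero exit.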
Split-brain protection
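Split-brain protection comes from Patroni's leader lock in etcd: only the node holding the lock runs as primary, and a primary that cannot reach etcd to renew it demotes itself. Synchronous replication (synchronous_mode: true above) adds protection against data loss: the primary waits for a synchronous replica to confirm each commit, and Patroni only promotes a replica that was synchronous. Together they keep a partitioned primary from accepting writes that the replicas in the other partition never see. With synchronous_mode_strict: false, Patroni falls back to asynchronous replication when no replica is available, which keeps writes flowing at the cost of that guarantee.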
Worth knowing
- Patroni doesn't replace your familiarity with Postgres. All the usual Postgres tuning (work_mem, shared_buffers, max_connections, indexes) still matters; Patroni just makes the failover automatic.
- etcd's clock matters. Pair with chrony (see that tutorial) on every node; serious clock skew causes etcd leader-election issues.
- Don't run Patroni on Kubernetes from scratch. Use an operator instead: Zalando's Postgres Operator wraps Patroni with K8s-native primitives, while CloudNativePG implements its own failover management without Patroni.