The stack

  • Postgres — the actual database, one instance per node, with streaming replication between them.
  • Patroni — the supervisor. One per node. Coordinates which instance is primary; promotes a replica on failover.
  • etcd (or Consul, or ZooKeeper, or Kubernetes API) — the Distributed Configuration Store (DCS). Holds the cluster state; the source of truth for "who is the current leader."
  • HAProxy (see that tutorial) — in front of the cluster, routes writes to the current primary by querying Patroni's REST API per backend.
  • PgBouncer (see that tutorial) — optional; between HAProxy and Postgres for connection pooling.

Topology: 3 Postgres nodes + 3 etcd nodes

For real HA, run etcd as its own 3-node Raft cluster (it tolerates the loss of 1 node) alongside 3 Postgres nodes. For a homelab you can colocate etcd on the same boxes as Postgres; the trade-off is a larger blast radius, since losing one box takes out an etcd member and a Postgres node together.
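
The one-node failure tolerance of a 3-member etcd cluster follows from Raft's majority rule. A minimal sketch of that arithmetic; the function names are illustrative and not part of etcd:

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Note that a 2-member cluster tolerates no failures at all, which is why the topology starts at 3.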

Install on Debian (each Postgres node)

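sudo apt install postgresql-16 postgresql-contrib python3-pip python3-psycopg2
# Quote the extra so shells such as zsh do not treat the brackets as a glob;
# etcd3 matches the etcd3: section used in patroni.yml
sudo pip3 install 'patroni[etcd3]'  # add other DCS extras as needed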

# Don't let the packaged Postgres service start; Patroni manages Postgres itself
sudo systemctl stop postgresql
sudo systemctl disable postgresql

# Remove the default cluster the package created (this deletes its data directory)
sudo pg_dropcluster --stop 16 main

etcd cluster (separately or colocated)

sudo apt install etcd-server etcd-client

# /etc/default/etcd on each etcd node (shown for etcd-1; change ETCD_NAME and the 10.0.5.10 addresses per node)
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.5.10:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.5.10:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.5.10:2380"
ETCD_INITIAL_CLUSTER="etcd-1=http://10.0.5.10:2380,etcd-2=http://10.0.5.11:2380,etcd-3=http://10.0.5.12:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="pg-cluster"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.5.10:2379"

sudo systemctl enable --now etcd
etcdctl member list
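
Only ETCD_NAME and the node's own address change between the three files; ETCD_INITIAL_CLUSTER is identical everywhere. As a sketch of how the per-node files could be generated, assuming etcd-2 and etcd-3 mirror the etcd-1 file shown above (the helper names `etcd_env` and `render` are hypothetical):

```python
# Hypothetical helper: node names and IPs are taken from ETCD_INITIAL_CLUSTER above.
ETCD_NODES = {"etcd-1": "10.0.5.10", "etcd-2": "10.0.5.11", "etcd-3": "10.0.5.12"}


def etcd_env(name, nodes=ETCD_NODES):
    """Build the /etc/default/etcd settings for one member of the cluster."""
    ip = nodes[name]
    # Every member lists all peers, itself included, in the same order.
    initial_cluster = ",".join(f"{n}=http://{addr}:2380" for n, addr in nodes.items())
    return {
        "ETCD_NAME": name,
        "ETCD_DATA_DIR": "/var/lib/etcd",
        "ETCD_LISTEN_PEER_URLS": f"http://{ip}:2380",
        "ETCD_LISTEN_CLIENT_URLS": f"http://{ip}:2379,http://127.0.0.1:2379",
        "ETCD_INITIAL_ADVERTISE_PEER_URLS": f"http://{ip}:2380",
        "ETCD_INITIAL_CLUSTER": initial_cluster,
        "ETCD_INITIAL_CLUSTER_STATE": "new",
        "ETCD_INITIAL_CLUSTER_TOKEN": "pg-cluster",
        "ETCD_ADVERTISE_CLIENT_URLS": f"http://{ip}:2379",
    }


def render(env):
    """Format the settings as KEY="value" lines."""
    return "\n".join(f'{key}="{value}"' for key, value in env.items())


for node in ETCD_NODES:
    print(f"# /etc/default/etcd on {node}")
    print(render(etcd_env(node)))
    print()
```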

Patroni config (one per Postgres node)

# /etc/patroni/patroni.yml on pg-1
scope: prod-pg
namespace: /db/
name: pg-1

restapi:
  listen: 10.0.6.10:8008
  connect_address: 10.0.6.10:8008

etcd3:
  hosts: 10.0.5.10:2379,10.0.5.11:2379,10.0.5.12:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    synchronous_mode: true
    synchronous_mode_strict: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 200
        max_wal_senders: 10
        wal_keep_size: 512MB
        max_replication_slots: 10
        wal_level: replica
        hot_standby: on
        archive_mode: on
        archive_command: 'wal-g wal-push %p'

  initdb:
    - encoding: UTF8
    - data-checksums

  pg_hba:
    - host  replication  replicator  10.0.6.0/24  scram-sha-256
    - host  all          all         10.0.0.0/8   scram-sha-256
    - local all          all                      peer

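  # Note: bootstrap.users is deprecated in Patroni 3.2 and removed in 4.0;
  # on newer versions create extra roles from a post_bootstrap script instead.
  users: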
    admin:
      password: <long-random>
      options: [createrole, createdb]

postgresql:
  listen: 10.0.6.10:5432
  connect_address: 10.0.6.10:5432
  data_dir: /var/lib/postgresql/16/main
  bin_dir: /usr/lib/postgresql/16/bin
  pgpass: /tmp/pgpass
  authentication:
    superuser:
      username: postgres
      password: <long-random>
    replication:
      username: replicator
      password: <long-random>
  parameters:
    unix_socket_directories: '/var/run/postgresql'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

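Use the same config on pg-2 and pg-3, with name and the listen/connect_address IPs adjusted. The bootstrap: section is only read when the cluster is first initialized; after that the dynamic settings live in the DCS and are changed with patronictl edit-config, so you can drop the section on the other nodes.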

Run Patroni

sudo tee /etc/systemd/system/patroni.service <<'EOF'
[Unit]
Description=Patroni Postgres HA
After=network-online.target etcd.service
Wants=network-online.target

[Service]
User=postgres
Group=postgres
Type=simple
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now patroni
sudo systemctl status patroni

Within about 30 seconds, one node becomes the leader; the other two clone it with a base backup and then start streaming replication from it.

Inspect cluster state

patronictl -c /etc/patroni/patroni.yml list

# + Cluster: prod-pg (7300000000000000001) ----+----+-----------+
# | Member | Host       | Role    | State   | TL | Lag in MB |
# +--------+------------+---------+---------+----+-----------+
# | pg-1   | 10.0.6.10  | Leader  | running |  1 |           |
# | pg-2   | 10.0.6.11  | Replica | running |  1 |       0   |
# | pg-3   | 10.0.6.12  | Replica | running |  1 |       0   |
# +--------+------------+---------+---------+----+-----------+
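
For monitoring scripts, the same listing can be requested as JSON with `patronictl list -f json`. The sketch below is an assumption-laden example: it assumes the JSON keys mirror the table headers above (`Member`, `Role`, `State`) and that healthy members report `running` or `streaming`; check the output of your Patroni version before relying on it. The helper names are hypothetical.

```python
import json
import subprocess


def cluster_members(config="/etc/patroni/patroni.yml"):
    """Run patronictl and return its member list as parsed JSON."""
    result = subprocess.run(
        ["patronictl", "-c", config, "list", "-f", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize(members):
    """Pick out the leader(s) and any member that is not in a healthy state.

    Assumption: keys are named like the table columns shown by patronictl.
    """
    healthy_states = ("running", "streaming")
    return {
        "leaders": [m["Member"] for m in members if m.get("Role") == "Leader"],
        "unhealthy": [m["Member"] for m in members if m.get("State") not in healthy_states],
    }
```

On a live node, `summarize(cluster_members())` should report exactly one leader; zero or more than one is worth an alert.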

HAProxy in front for transparent failover

Patroni exposes a REST API at :8008 on each node. The endpoints /leader and /replica respond with HTTP 200 only when the local instance is in that role. HAProxy uses this as a health check:

# haproxy.cfg
listen postgres-write
    bind *:5432
    mode tcp
    option httpchk GET /leader
    http-check expect status 200

    server pg-1 10.0.6.10:5432 check port 8008 inter 1s rise 2 fall 2 on-marked-down shutdown-sessions
    server pg-2 10.0.6.11:5432 check port 8008 inter 1s rise 2 fall 2 backup on-marked-down shutdown-sessions
    server pg-3 10.0.6.12:5432 check port 8008 inter 1s rise 2 fall 2 backup on-marked-down shutdown-sessions

listen postgres-read
    bind *:5433
    mode tcp
    balance roundrobin
    option httpchk GET /replica
    http-check expect status 200

    server pg-1 10.0.6.10:5432 check port 8008 inter 1s
    server pg-2 10.0.6.11:5432 check port 8008 inter 1s
    server pg-3 10.0.6.12:5432 check port 8008 inter 1s

Apps connect to haproxy:5432 for writes and haproxy:5433 for read-only queries. When the primary fails, the health-check responses change (one of the former replicas now answers 200 on /leader) and HAProxy routes new write connections to it within a few seconds. Connections that were open to the old primary are dropped, so clients need to reconnect and retry.
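
The health check HAProxy performs can be reproduced by hand, which helps when debugging routing. A minimal sketch using only the standard library; the node map reuses the addresses from the config above, and `http_status` and `nodes_in_role` are hypothetical helper names:

```python
import urllib.error
import urllib.request

# Addresses from the Patroni configs above; the REST API listens on port 8008.
PATRONI_NODES = {"pg-1": "10.0.6.10", "pg-2": "10.0.6.11", "pg-3": "10.0.6.12"}


def http_status(url, timeout=1.0):
    """Return the HTTP status code for a GET on `url`, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (e.g. 503 when the node is not in the requested role)
        return err.code
    except OSError:
        # Connection refused, timeout, unreachable host
        return None


def nodes_in_role(endpoint, nodes=PATRONI_NODES, fetch=http_status):
    """List the nodes whose Patroni API answers 200 on `endpoint`."""
    return [
        name for name, ip in nodes.items()
        if fetch(f"http://{ip}:8008{endpoint}") == 200
    ]
```

Against a healthy cluster, `nodes_in_role("/leader")` should return a single name and `nodes_in_role("/replica")` the other two. The `fetch` parameter exists so the logic can be exercised without a live cluster.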

Trigger a failover (for testing)

# Graceful switch
patronictl -c /etc/patroni/patroni.yml switchover prod-pg

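# Force a failover by stopping Patroni on the leader. This is a clean stop: Patroni shuts
# Postgres down and releases the leader key. To test a hard failure, power the node off instead.
sudo systemctl stop patroni       # on the current leader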

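Watch patronictl list on another node: within ~10-20 seconds a new leader is elected, HAProxy notices, and clients resume writes against the new leader. Total downtime depends on Patroni's ttl and loop_wait plus HAProxy's inter and rise settings; with the above, expect 15-30s. A hard failure (power loss, network partition) sits at the upper end, because the replicas must wait for the leader key's ttl to expire before one of them can take over.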
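
To see how those settings combine, here is a rough back-of-the-envelope model for a hard failure. It is a simplification under stated assumptions, not Patroni's actual timing logic: the leader refreshes its key once per loop_wait, so the key has between ttl - loop_wait and ttl seconds left when the node dies; promotion time itself is ignored; HAProxy then needs `rise` successful checks spaced `inter` apart.

```python
def failover_window(ttl, loop_wait, inter, rise):
    """Rough (best, worst) seconds until HAProxy routes to a new leader.

    Assumes a hard failure of the old leader and ignores the time Postgres
    needs to promote, so real numbers will be somewhat higher.
    """
    # Remaining lifetime of the leader key at the moment the leader dies
    lock_expiry_best = ttl - loop_wait
    lock_expiry_worst = ttl
    # Time for HAProxy to mark the new leader as up
    haproxy_up = inter * rise
    return (lock_expiry_best + haproxy_up, lock_expiry_worst + haproxy_up)


best, worst = failover_window(ttl=30, loop_wait=10, inter=1, rise=2)
print(f"estimated outage: {best}-{worst} s")  # estimated outage: 22-32 s
```

Lowering ttl shortens the window but makes the cluster more sensitive to brief etcd or network hiccups, so treat it as a trade-off rather than a free win.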

Backups: WAL-G to S3

Don't rely on replication as backup. WAL-G ships continuous WAL + periodic base-backup to S3/MinIO:

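sudo apt install wal-g    # if your repositories package it; otherwise install the release binary from the WAL-G GitHub project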

# /etc/wal-g.env
WALG_S3_PREFIX=s3://my-pg-backups/prod-pg
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_ENDPOINT=https://s3.example.com    # for MinIO; omit for AWS
WALG_COMPRESSION_METHOD=brotli

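# archive_command (wal-g wal-push) ships WAL from the leader and needs these variables in its environment.
# Base backups are a separate, scheduled job (e.g. daily from cron), run as the postgres user:
set -a; . /etc/wal-g.env; set +a
wal-g backup-push /var/lib/postgresql/16/main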

A daily base backup plus continuous WAL archiving gives point-in-time recovery to any moment covered by the retained backups and WAL.
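
A scheduled base backup could be driven by a small wrapper like the sketch below. It assumes the simple KEY=value format of /etc/wal-g.env shown above (no quoting or multi-line values), and the helper names are hypothetical; `wal-g backup-push <data_dir>` is the only WAL-G command it relies on.

```python
import os
import subprocess


def load_env_file(path):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop a trailing inline comment such as "value    # note"
            value = value.split(" #", 1)[0].strip()
            env[key.strip()] = value
    return env


def push_base_backup(data_dir="/var/lib/postgresql/16/main", env_file="/etc/wal-g.env"):
    """Run a WAL-G base backup with the env file merged into the environment."""
    env = {**os.environ, **load_env_file(env_file)}
    subprocess.run(["wal-g", "backup-push", data_dir], env=env, check=True)
```

Run `push_base_backup()` as the postgres user from cron or a systemd timer; `check=True` makes a failed backup surface as a non-zero exit so the scheduler can alert on it.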

Split-brain protection

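Split-brain protection comes from Patroni's etcd-backed leader lock: a primary that loses contact with etcd cannot renew the lock and demotes itself, so a partitioned primary stops accepting writes that the other partition's replicas never see. Synchronous replication (synchronous_mode: true above) adds data-loss protection on top: the primary waits for at least one replica to confirm each commit, and only a synchronous replica is eligible for promotion. With synchronous_mode_strict: false, Patroni falls back to asynchronous commits when no replica is available, so writes keep flowing at the cost of that guarantee.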
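
The leader lock behaves like a lease with a time-to-live. The toy model below illustrates the idea only; it is not Patroni's or etcd's implementation, and the class and method names are made up for the example.

```python
class LeaderLease:
    """Toy model of a TTL-based leader key held in a DCS."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def acquire_or_renew(self, node, now):
        """Grant or extend the lease if it is free, expired, or already held by `node`."""
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


lease = LeaderLease(ttl=30)
print(lease.acquire_or_renew("pg-1", now=0))   # pg-1 becomes leader
print(lease.acquire_or_renew("pg-2", now=10))  # refused: pg-1 still holds the lease
# pg-1 is partitioned from the DCS and cannot renew; it must demote itself.
print(lease.acquire_or_renew("pg-2", now=30))  # lease expired, pg-2 takes over
print(lease.acquire_or_renew("pg-1", now=35))  # refused: pg-1 may only rejoin as a replica
```

The safety of the scheme rests on the old primary stopping writes before the lease can be handed to another node, which is why it demotes itself as soon as it can no longer renew.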

Worth knowing

  • Patroni doesn't replace your familiarity with Postgres. All the usual Postgres tuning (work_mem, shared_buffers, max_connections, indexes) still matters; Patroni just makes the failover automatic.
  • etcd's clock matters. Pair with chrony (see that tutorial) on every node; serious clock skew causes etcd leader-election issues.
  • Don't run Patroni on Kubernetes from scratch. Use an operator instead: the Zalando Postgres Operator wraps Patroni with K8s-native primitives, while CloudNativePG handles failover itself against the Kubernetes API, without Patroni.