The mental model
Galera uses a synchronous certification-based replication protocol. The flow per transaction:
- Client sends BEGIN + statements to any node.
- At COMMIT, the node sends the write-set (all changes) to every other node.
- Every node certifies the write-set (checks for conflicts with concurrent commits).
- If a majority certify, COMMIT succeeds everywhere; if not, the original node's COMMIT fails with a deadlock-like error.
The result: consistent reads from any node, no failover step needed (multi-master), automatic split-brain protection (quorum required).
Install on 3 nodes
On all three nodes (Debian / Ubuntu):
sudo apt install mariadb-server galera-4
# Stop the auto-started single-node instance
sudo systemctl stop mariadb
Configure
On each node, create /etc/mysql/mariadb.conf.d/60-galera.cnf:
[galera]
# Mandatory for Galera
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 0
# Cluster definition (same on every node)
wsrep_cluster_name = "prod-cluster"
wsrep_cluster_address = "gcomm://10.0.5.10,10.0.5.11,10.0.5.12"
# This node's identity (different per node)
wsrep_node_address = "10.0.5.10" # <-- THIS NODE'S IP
wsrep_node_name = "node-1" # <-- THIS NODE'S NAME
# State transfer method (rsync = simplest; mariabackup = production)
wsrep_sst_method = mariabackup
wsrep_sst_auth = "sst:<long-random>"
# Listen on all interfaces (or restrict to cluster network)
bind-address = 0.0.0.0
Verify config:
sudo galera_recovery # sanity-check the install
Bootstrap the cluster (first node only)
# On node 1 (and only on node 1, the first time)
sudo galera_new_cluster
# Verify
sudo mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Should show: wsrep_cluster_size 1
Start the other nodes
# On node 2 and node 3 (in sequence, not in parallel)
sudo systemctl start mariadb
# Verify
sudo mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# After node 2 joins: 2
# After node 3 joins: 3
Each subsequent node performs a State Snapshot Transfer (SST) from an existing cluster member; for an empty cluster this is instant, for a TB-sized DB it takes time + bandwidth.
Test it
# Create a table on node 1
mysql -h 10.0.5.10 -e "CREATE DATABASE test; USE test; CREATE TABLE t (id INT PRIMARY KEY, v TEXT); INSERT INTO t VALUES (1, 'hello');"
# Read from node 2
mysql -h 10.0.5.11 -e "SELECT * FROM test.t"
# (1, hello)
# Write from node 3
mysql -h 10.0.5.12 -e "INSERT INTO test.t VALUES (2, 'world')"
# Read from node 1
mysql -h 10.0.5.10 -e "SELECT * FROM test.t"
# (1, hello), (2, world)
Any node accepts writes; all nodes see the same data.
Load balancer / read-write splitting
Front the cluster with HAProxy (see that tutorial) for client connection pooling:
# haproxy.cfg
listen mariadb-rw
bind *:3306
mode tcp
option mysql-check user haproxy_check
balance source # session affinity by client IP
server node-1 10.0.5.10:3306 check inter 1s rise 2 fall 2
server node-2 10.0.5.11:3306 check inter 1s rise 2 fall 2 backup
server node-3 10.0.5.12:3306 check inter 1s rise 2 fall 2 backup
Apps connect to haproxy:3306; HAProxy routes to node-1 primarily; switches to node-2 / node-3 on failure. Optional: round-robin all three for write traffic spread (be aware of certification conflicts — high contention on the same rows can fail commits).
Optimistic locking + retries
The certification model means a COMMIT can fail with "deadlock" error code (40001 / Galera's SQLSTATE 40001 / 1213) if concurrent commits on different nodes touched the same rows. App code must handle this:
// Pseudo-code
const MAX_RETRIES = 5;
for (int i = 0; i < MAX_RETRIES; i++) {
try {
beginTx();
// ... do work ...
commit();
break;
} catch (DeadlockException e) {
if (i == MAX_RETRIES - 1) throw;
sleep(50 * (1 << i)); // exponential backoff
}
}
Most ORMs have built-in retry-on-deadlock; configure it.
Failure scenarios
- One node dies. Cluster continues with 2 of 3; reads + writes still work. Restart the failed node; it rejoins via SST.
- Network partition (split brain). The majority partition continues; the minority partition becomes "non-primary" and refuses writes. When the partition heals, the minority re-syncs.
- All nodes shut down. Cluster must be bootstrapped from the node with the latest data (
galera_new_clusteron the right one).
Backups
# mariabackup is the right tool (live, consistent backup)
mariabackup --backup --target-dir=/backup/$(date +%F) \
--user=root --password=<pw>
# Compress + ship offsite
tar czf - /backup/$(date +%F) | restic backup --stdin --stdin-filename=mariadb-$(date +%F).tar.gz
Don't rely on Galera replication as backup — a logical corruption (bad DROP TABLE, accidental DELETE) replicates instantly to every node. Real backups offsite, restic to S3 / Garage (see that tutorial).
Performance characteristics
- Write performance is bounded by network RTT. Each commit waits for majority certification. Same-LAN = fine; cross-region = much slower than single-leader.
- Read performance scales with node count. Three nodes give roughly 3x read throughput.
- Concurrent writes to the same rows from different nodes fail. Architect schemas to avoid this (sharding writes by primary key range / time).
- Large transactions (millions of rows) are problematic. Each node certifies the whole write-set; memory + network spikes. Batch into smaller transactions.
Galera vs alternatives
- MySQL Group Replication / InnoDB Cluster — Oracle's equivalent. Similar shape, similar trade-offs.
- Postgres + Patroni (see that tutorial) — single-leader, async or sync replication. Different model; failover takes seconds; one-writer constraint at any time.
- CockroachDB (see that tutorial) — distributed SQL designed multi-region from the ground up. Postgres-compatible. Heavier ops but better cross-region story.
- MariaDB Replication (async master-slave) — the classic; no HA; async lag.
When Galera is the right pick
- App needs MySQL / MariaDB compatibility (existing code).
- Single-LAN deployment (all nodes < 10ms apart).
- Read-heavy workload + need for HA writes.
- You're comfortable handling deadlock retries in app code.
When it isn't
- Write-heavy workload with high row contention (certification failures dominate).
- Multi-region (Galera's synchronous protocol is high-latency across the WAN).
- You can use Postgres — Patroni HA is more proven and the ecosystem is richer.