The mental model

Galera uses a synchronous certification-based replication protocol. The flow per transaction:

  1. Client sends BEGIN + statements to any node.
  2. At COMMIT, the node sends the write-set (all changes) to every other node.
  3. Every node certifies the write-set (checks for conflicts with concurrent commits).
  4. If a majority certify, COMMIT succeeds everywhere; if not, the original node's COMMIT fails with a deadlock-like error.

The result: consistent reads from any node, no failover step needed (multi-master), automatic split-brain protection (quorum required).

Install on 3 nodes

On all three nodes (Debian / Ubuntu):

sudo apt install mariadb-server galera-4

# Stop the auto-started single-node instance
sudo systemctl stop mariadb

Configure

On each node, create /etc/mysql/mariadb.conf.d/60-galera.cnf:

[galera]
# Mandatory for Galera
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 0

# Cluster definition (same on every node)
wsrep_cluster_name = "prod-cluster"
wsrep_cluster_address = "gcomm://10.0.5.10,10.0.5.11,10.0.5.12"

# This node's identity (different per node)
wsrep_node_address = "10.0.5.10"        # <-- THIS NODE'S IP
wsrep_node_name = "node-1"               # <-- THIS NODE'S NAME

# State transfer method (rsync = simplest; mariabackup = production)
wsrep_sst_method = mariabackup
wsrep_sst_auth = "sst:<long-random>"

# Listen on all interfaces (or restrict to cluster network)
bind-address = 0.0.0.0

Verify config:

sudo galera_recovery     # sanity-check the install

Bootstrap the cluster (first node only)

# On node 1 (and only on node 1, the first time)
sudo galera_new_cluster

# Verify
sudo mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Should show: wsrep_cluster_size  1

Start the other nodes

# On node 2 and node 3 (in sequence, not in parallel)
sudo systemctl start mariadb

# Verify
sudo mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# After node 2 joins: 2
# After node 3 joins: 3

Each subsequent node performs a State Snapshot Transfer (SST) from an existing cluster member; for an empty cluster this is instant, for a TB-sized DB it takes time + bandwidth.

Test it

# Create a table on node 1
mysql -h 10.0.5.10 -e "CREATE DATABASE test; USE test; CREATE TABLE t (id INT PRIMARY KEY, v TEXT); INSERT INTO t VALUES (1, 'hello');"

# Read from node 2
mysql -h 10.0.5.11 -e "SELECT * FROM test.t"
# (1, hello)

# Write from node 3
mysql -h 10.0.5.12 -e "INSERT INTO test.t VALUES (2, 'world')"

# Read from node 1
mysql -h 10.0.5.10 -e "SELECT * FROM test.t"
# (1, hello), (2, world)

Any node accepts writes; all nodes see the same data.

Load balancer / read-write splitting

Front the cluster with HAProxy (see that tutorial) for client connection pooling:

# haproxy.cfg
listen mariadb-rw
    bind *:3306
    mode tcp
    option mysql-check user haproxy_check
    balance source             # session affinity by client IP

    server node-1 10.0.5.10:3306 check inter 1s rise 2 fall 2
    server node-2 10.0.5.11:3306 check inter 1s rise 2 fall 2 backup
    server node-3 10.0.5.12:3306 check inter 1s rise 2 fall 2 backup

Apps connect to haproxy:3306; HAProxy routes to node-1 primarily; switches to node-2 / node-3 on failure. Optional: round-robin all three for write traffic spread (be aware of certification conflicts — high contention on the same rows can fail commits).

Optimistic locking + retries

The certification model means a COMMIT can fail with "deadlock" error code (40001 / Galera's SQLSTATE 40001 / 1213) if concurrent commits on different nodes touched the same rows. App code must handle this:

// Pseudo-code
const MAX_RETRIES = 5;
for (int i = 0; i < MAX_RETRIES; i++) {
    try {
        beginTx();
        // ... do work ...
        commit();
        break;
    } catch (DeadlockException e) {
        if (i == MAX_RETRIES - 1) throw;
        sleep(50 * (1 << i));   // exponential backoff
    }
}

Most ORMs have built-in retry-on-deadlock; configure it.

Failure scenarios

  • One node dies. Cluster continues with 2 of 3; reads + writes still work. Restart the failed node; it rejoins via SST.
  • Network partition (split brain). The majority partition continues; the minority partition becomes "non-primary" and refuses writes. When the partition heals, the minority re-syncs.
  • All nodes shut down. Cluster must be bootstrapped from the node with the latest data (galera_new_cluster on the right one).

Backups

# mariabackup is the right tool (live, consistent backup)
mariabackup --backup --target-dir=/backup/$(date +%F) \
    --user=root --password=<pw>

# Compress + ship offsite
tar czf - /backup/$(date +%F) | restic backup --stdin --stdin-filename=mariadb-$(date +%F).tar.gz

Don't rely on Galera replication as backup — a logical corruption (bad DROP TABLE, accidental DELETE) replicates instantly to every node. Real backups offsite, restic to S3 / Garage (see that tutorial).

Performance characteristics

  • Write performance is bounded by network RTT. Each commit waits for majority certification. Same-LAN = fine; cross-region = much slower than single-leader.
  • Read performance scales with node count. Three nodes give roughly 3x read throughput.
  • Concurrent writes to the same rows from different nodes fail. Architect schemas to avoid this (sharding writes by primary key range / time).
  • Large transactions (millions of rows) are problematic. Each node certifies the whole write-set; memory + network spikes. Batch into smaller transactions.

Galera vs alternatives

  • MySQL Group Replication / InnoDB Cluster — Oracle's equivalent. Similar shape, similar trade-offs.
  • Postgres + Patroni (see that tutorial) — single-leader, async or sync replication. Different model; failover takes seconds; one-writer constraint at any time.
  • CockroachDB (see that tutorial) — distributed SQL designed multi-region from the ground up. Postgres-compatible. Heavier ops but better cross-region story.
  • MariaDB Replication (async master-slave) — the classic; no HA; async lag.

When Galera is the right pick

  • App needs MySQL / MariaDB compatibility (existing code).
  • Single-LAN deployment (all nodes < 10ms apart).
  • Read-heavy workload + need for HA writes.
  • You're comfortable handling deadlock retries in app code.

When it isn't

  • Write-heavy workload with high row contention (certification failures dominate).
  • Multi-region (Galera's synchronous protocol is high-latency across the WAN).
  • You can use Postgres — Patroni HA is more proven and the ecosystem is richer.