MariaDB Galera: synchronous multi-master MariaDB

The mental model

Galera uses a synchronous certification-based replication protocol. The flow per transaction:

Client sends BEGIN + statements to any node.
At COMMIT, the node sends the write-set (all changes) to every other node.
Every node certifies the write-set (checks for conflicts with concurrent commits).
If a majority certify, COMMIT succeeds everywhere; if not, the original node's COMMIT fails with a deadlock-like error.

The result: consistent reads from any node, no failover step needed (multi-master), automatic split-brain protection (quorum required).

Install on 3 nodes

On all three nodes (Debian / Ubuntu):

sudo apt install mariadb-server galera-4

# Stop the auto-started single-node instance
sudo systemctl stop mariadb

Configure

On each node, create /etc/mysql/mariadb.conf.d/60-galera.cnf:

[galera]
# Mandatory for Galera
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 0

# Cluster definition (same on every node)
wsrep_cluster_name = "prod-cluster"
wsrep_cluster_address = "gcomm://10.0.5.10,10.0.5.11,10.0.5.12"

# This node's identity (different per node)
wsrep_node_address = "10.0.5.10"        # <-- THIS NODE'S IP
wsrep_node_name = "node-1"               # <-- THIS NODE'S NAME

# State transfer method (rsync = simplest; mariabackup = production)
wsrep_sst_method = mariabackup
wsrep_sst_auth = "sst:<long-random>"

# Listen on all interfaces (or restrict to cluster network)
bind-address = 0.0.0.0

Verify config:

sudo galera_recovery     # sanity-check the install

Bootstrap the cluster (first node only)

# On node 1 (and only on node 1, the first time)
sudo galera_new_cluster

# Verify
sudo mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Should show: wsrep_cluster_size  1

Start the other nodes

# On node 2 and node 3 (in sequence, not in parallel)
sudo systemctl start mariadb

# Verify
sudo mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# After node 2 joins: 2
# After node 3 joins: 3

Each subsequent node performs a State Snapshot Transfer (SST) from an existing cluster member; for an empty cluster this is instant, for a TB-sized DB it takes time + bandwidth.

Test it

# Create a table on node 1
mysql -h 10.0.5.10 -e "CREATE DATABASE test; USE test; CREATE TABLE t (id INT PRIMARY KEY, v TEXT); INSERT INTO t VALUES (1, 'hello');"

# Read from node 2
mysql -h 10.0.5.11 -e "SELECT * FROM test.t"
# (1, hello)

# Write from node 3
mysql -h 10.0.5.12 -e "INSERT INTO test.t VALUES (2, 'world')"

# Read from node 1
mysql -h 10.0.5.10 -e "SELECT * FROM test.t"
# (1, hello), (2, world)

Any node accepts writes; all nodes see the same data.

Load balancer / read-write splitting

Front the cluster with HAProxy (see that tutorial) for client connection pooling:

# haproxy.cfg
listen mariadb-rw
    bind *:3306
    mode tcp
    option mysql-check user haproxy_check
    balance source             # session affinity by client IP

    server node-1 10.0.5.10:3306 check inter 1s rise 2 fall 2
    server node-2 10.0.5.11:3306 check inter 1s rise 2 fall 2 backup
    server node-3 10.0.5.12:3306 check inter 1s rise 2 fall 2 backup

Apps connect to haproxy:3306; HAProxy routes to node-1 primarily; switches to node-2 / node-3 on failure. Optional: round-robin all three for write traffic spread (be aware of certification conflicts — high contention on the same rows can fail commits).

Optimistic locking + retries

The certification model means a COMMIT can fail with "deadlock" error code (40001 / Galera's SQLSTATE 40001 / 1213) if concurrent commits on different nodes touched the same rows. App code must handle this:

// Pseudo-code
const MAX_RETRIES = 5;
for (int i = 0; i < MAX_RETRIES; i++) {
    try {
        beginTx();
        // ... do work ...
        commit();
        break;
    } catch (DeadlockException e) {
        if (i == MAX_RETRIES - 1) throw;
        sleep(50 * (1 << i));   // exponential backoff
    }
}

Most ORMs have built-in retry-on-deadlock; configure it.

Failure scenarios

One node dies. Cluster continues with 2 of 3; reads + writes still work. Restart the failed node; it rejoins via SST.
Network partition (split brain). The majority partition continues; the minority partition becomes "non-primary" and refuses writes. When the partition heals, the minority re-syncs.
All nodes shut down. Cluster must be bootstrapped from the node with the latest data (galera_new_cluster on the right one).

Backups

# mariabackup is the right tool (live, consistent backup)
mariabackup --backup --target-dir=/backup/$(date +%F) \
    --user=root --password=<pw>

# Compress + ship offsite
tar czf - /backup/$(date +%F) | restic backup --stdin --stdin-filename=mariadb-$(date +%F).tar.gz

Don't rely on Galera replication as backup — a logical corruption (bad DROP TABLE, accidental DELETE) replicates instantly to every node. Real backups offsite, restic to S3 / Garage (see that tutorial).

Performance characteristics

Write performance is bounded by network RTT. Each commit waits for majority certification. Same-LAN = fine; cross-region = much slower than single-leader.
Read performance scales with node count. Three nodes give roughly 3x read throughput.
Concurrent writes to the same rows from different nodes fail. Architect schemas to avoid this (sharding writes by primary key range / time).
Large transactions (millions of rows) are problematic. Each node certifies the whole write-set; memory + network spikes. Batch into smaller transactions.

Galera vs alternatives

MySQL Group Replication / InnoDB Cluster — Oracle's equivalent. Similar shape, similar trade-offs.
Postgres + Patroni (see that tutorial) — single-leader, async or sync replication. Different model; failover takes seconds; one-writer constraint at any time.
CockroachDB (see that tutorial) — distributed SQL designed multi-region from the ground up. Postgres-compatible. Heavier ops but better cross-region story.
MariaDB Replication (async master-slave) — the classic; no HA; async lag.

When Galera is the right pick

App needs MySQL / MariaDB compatibility (existing code).
Single-LAN deployment (all nodes < 10ms apart).
Read-heavy workload + need for HA writes.
You're comfortable handling deadlock retries in app code.

When it isn't

Write-heavy workload with high row contention (certification failures dominate).
Multi-region (Galera's synchronous protocol is high-latency across the WAN).
You can use Postgres — Patroni HA is more proven and the ecosystem is richer.