Pick a RAID level
- RAID 0 — striping. No redundancy. Performance only. One disk dies, everything dies. Don't use for data you can't lose.
- RAID 1 — mirror. Two disks, both hold identical data. Lose one, the other still has everything. Capacity = size of one disk.
- RAID 5 — striping with distributed parity (one disk's worth). Survives one disk failure. Avoid for >2 TB drives: the rebuild stresses the remaining disks, and the URE (unrecoverable read error) rate on modern large drives makes rebuild failure a real outcome.
- RAID 6 — striping with double distributed parity (two disks' worth). Survives any two disk failures. The right pick for arrays of more than 4 modern HDDs.
- RAID 10 — stripe-of-mirrors. Half the raw capacity goes to redundancy. Fast, survives multiple disk failures as long as both halves of a mirror don't die together. Recommended for SSD arrays.
For 4 disks: RAID 10 if performance matters, RAID 6 if surviving any two disk failures matters. Both leave two disks' worth of usable capacity. Avoid RAID 5.
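The capacity side of the list above can be sketched as a few lines of arithmetic. The helper name usable_gb and the 4000 GB disk size are illustrative assumptions, and the RAID 10 case assumes the usual two-copy layout.

```shell
# usable_gb LEVEL DISKS SIZE_GB
# Hypothetical helper: usable capacity for an array of equal-sized disks.
usable_gb() {
    level="$1"; disks="$2"; size="$3"
    case "$level" in
        0)  echo $(( disks * size )) ;;          # no redundancy
        1)  echo "$size" ;;                      # every disk holds the same data
        5)  echo $(( (disks - 1) * size )) ;;    # one disk's worth of parity
        6)  echo $(( (disks - 2) * size )) ;;    # two disks' worth of parity
        10) echo $(( disks * size / 2 )) ;;      # two copies of every block
        *)  echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}

# Compare the options for four 4000 GB disks.
for lvl in 0 5 6 10; do
    echo "RAID $lvl: $(usable_gb "$lvl" 4 4000) GB"
done
```

With four disks, RAID 6 and RAID 10 both come out at 8000 GB, so the choice between them comes down to speed versus failure tolerance rather than space.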
Prepare the disks
lsblk
# Identify the drives you'll use, e.g. /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde
# Wipe any existing data (CAREFUL — double-check device names)
for d in /dev/sd{b,c,d,e}; do
    sudo wipefs -a "$d"
done
# Optional: partition the disks (mdadm can work on whole disks too)
# Use whole disks if they're identical; partition if you want a small offset.
# Whole-disk style:
DEVS=(/dev/sdb /dev/sdc /dev/sdd /dev/sde)
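Since wipefs is destructive, a guard before the wipe loop may be worth the few extra lines. The sketch below is one possible check, not part of mdadm or wipefs; the function name is_wipe_candidate is hypothetical, and it only tests that the path is a block device with nothing mounted from it.

```shell
# is_wipe_candidate DEVICE
# Hypothetical helper: succeed only for an unmounted block device.
is_wipe_candidate() {
    if [ ! -b "$1" ]; then
        echo "$1: not a block device" >&2
        return 1
    fi
    # Prefix match, so a mounted partition such as /dev/sdb1 also blocks /dev/sdb.
    if grep -q "^$1" /proc/mounts 2>/dev/null; then
        echo "$1: in use (mounted)" >&2
        return 1
    fi
    return 0
}

# Usage, before running wipefs:
#   for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
#       is_wipe_candidate "$d" || exit 1
#   done
```

It does not catch a disk that belongs to another md array or LVM volume group, so the lsblk output still deserves a careful read.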
Create the array
# RAID 10 across four disks
sudo mdadm --create /dev/md0 \
    --level=10 \
    --raid-devices=4 \
    --chunk=512 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde
# RAID 6 alternative
# sudo mdadm --create /dev/md0 --level=6 --raid-devices=4 ...
# Watch the initial sync
cat /proc/mdstat
watch -n 5 cat /proc/mdstat
Initial sync writes parity / mirrors across the array. On HDDs, expect hours per TB. The array is usable during sync but slower.
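For scripts that need the sync progress as a number rather than a screen to watch, the percentage can be pulled out of /proc/mdstat. This is a minimal sketch that assumes the usual progress line format (for example "resync =  8.5% (...)"); the function name sync_percent is hypothetical.

```shell
# sync_percent: reads /proc/mdstat-style text on stdin and prints the
# progress percentage of any running resync, recovery, reshape or check.
# Prints nothing when no such operation is in progress.
sync_percent() {
    grep -Eo '(resync|recovery|reshape|check) *= *[0-9.]+%' | grep -Eo '[0-9.]+%'
}

# Usage:
#   sync_percent < /proc/mdstat
```

If the goal is only to block until the sync is done, sudo mdadm --wait /dev/md0 does that without any parsing.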
Save the config
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u
# The update-initramfs step is critical: without it, the initramfs may not assemble the array on boot.
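Because tee -a appends, running the save step twice leaves two ARRAY lines for the same array in mdadm.conf. A quick check like the following sketch can catch that; the function name duplicate_array_uuids is hypothetical, and it assumes ARRAY lines in the format that mdadm --detail --scan prints.

```shell
# duplicate_array_uuids CONF_FILE
# Print every array UUID that appears on more than one ARRAY line.
duplicate_array_uuids() {
    grep '^ARRAY' "$1" | grep -Eo 'UUID=[0-9a-f:]+' | sort | uniq -d
}

# Usage:
#   duplicate_array_uuids /etc/mdadm/mdadm.conf
```

Empty output means each array is listed once. If a UUID is printed, remove the extra line by hand and rerun update-initramfs -u.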
Put LVM on top
Why LVM on top of mdraid? Two reasons: resize-friendly logical volumes without repartitioning, and the ability to use multiple file systems on the same array without committing the layout up front.
# Initialize the mdraid as a physical volume
sudo pvcreate /dev/md0
# Create a volume group named "data"
sudo vgcreate data /dev/md0
# Create a 2 TiB logical volume (roughly half of this example VG)
sudo lvcreate -L 2T -n media data
# Give the remaining free space to a second logical volume
sudo lvcreate -l 100%FREE -n backups data
# Format and mount
sudo mkfs.xfs /dev/data/media
sudo mkfs.ext4 /dev/data/backups
sudo mkdir -p /mnt/media /mnt/backups
sudo mount /dev/data/media /mnt/media
sudo mount /dev/data/backups /mnt/backups
fstab
# Get the UUIDs
sudo blkid /dev/data/media /dev/data/backups
# Add to /etc/fstab
UUID=<media-uuid> /mnt/media xfs defaults,noatime 0 2
UUID=<backups-uuid> /mnt/backups ext4 defaults,noatime 0 2
noatime skips the access-time update on file reads. The default relatime already limits these writes, and noatime removes them entirely, which is meaningful on busy mounts.
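To avoid copy-paste mistakes with the UUIDs, the fstab entries can be generated instead of typed. This is only a sketch: fstab_line is a hypothetical helper that reproduces the option set used above.

```shell
# fstab_line UUID MOUNTPOINT FSTYPE
# Print one /etc/fstab entry with the options used in this guide.
fstab_line() {
    printf 'UUID=%s %s %s defaults,noatime 0 2\n' "$1" "$2" "$3"
}

# Usage (needs root and the real logical volumes):
#   fstab_line "$(sudo blkid -s UUID -o value /dev/data/media)" /mnt/media xfs | sudo tee -a /etc/fstab
#   sudo findmnt --verify   # sanity-check the edited fstab
#   sudo mount -a           # mount everything listed, surfacing errors now instead of at boot
```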
Monitoring
Two things you need:
- mdadm monitor daemon — emails you when a disk fails. Enabled by the Debian/Ubuntu package by default; verify and configure:
sudo nano /etc/mdadm/mdadm.conf
# Set: MAILADDR your-email@example.com
sudo systemctl enable --now mdmonitor
- SMART monitoring — predict failures before they happen. Install smartmontools:
sudo apt install smartmontools
sudo systemctl enable --now smartd
# Manual SMART query
sudo smartctl -a /dev/sdb
sudo smartctl -t short /dev/sdb # 1-min self-test
sudo smartctl -t long /dev/sdb # multi-hour deep test
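Set up Prometheus's node_exporter with a SMART collector (see that tutorial) and alert on disk-error counters trending upward. node_exporter does not expose SMART data by itself, so this means a textfile-collector script or a separate exporter such as smartctl_exporter.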
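Without a metrics stack, the counters worth watching can still be read straight from smartctl. The sketch below assumes the ATA attribute table that smartctl -A prints, where the raw value is the tenth column; NVMe drives report a different format. The function name smart_raw_value is hypothetical.

```shell
# smart_raw_value ATTRIBUTE_NAME
# Reads `smartctl -A` output on stdin and prints the raw value of one attribute.
smart_raw_value() {
    awk -v attr="$1" '$2 == attr { print $10 }'
}

# Usage:
#   sudo smartctl -A /dev/sdb | smart_raw_value Reallocated_Sector_Ct
#   sudo smartctl -A /dev/sdb | smart_raw_value Current_Pending_Sector
```

A nonzero and growing value for either attribute is a reasonable cue to plan a replacement. To confirm that mail delivery works, sudo mdadm --monitor --scan --test --oneshot sends a test alert for each array.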
Periodic scrub
Bit rot happens. Without periodic scrubs, latent corruption stays invisible until you need the data and find it's wrong. The mdadm package schedules a periodic check: on Debian/Ubuntu, /etc/cron.d/mdadm runs one on the first Sunday of each month (newer releases may use a systemd timer such as mdcheck_start.timer instead). Verify it's enabled.
Manual scrub:
echo check | sudo tee /sys/block/md0/md/sync_action # read every copy and parity block, count mismatches
echo repair | sudo tee /sys/block/md0/md/sync_action # same pass, but also rewrite mismatched blocks
cat /proc/mdstat # watch progress
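The result of a check lands in the array's mismatch_cnt file in sysfs, next to sync_action. A small status helper might look like this; scrub_status and the SYSFS_ROOT override are hypothetical names, the latter existing only so the function can be pointed at a test directory.

```shell
# SYSFS_ROOT is a hypothetical override for testing; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# scrub_status MD_NAME
# Print the current sync action and the mismatch count from the last check.
scrub_status() {
    md="$1"
    dir="$SYSFS_ROOT/block/$md/md"
    if [ ! -r "$dir/sync_action" ]; then
        echo "$md: no md sysfs entries under $dir" >&2
        return 1
    fi
    echo "$md: sync_action=$(cat "$dir/sync_action") mismatch_cnt=$(cat "$dir/mismatch_cnt")"
}

# Usage:
#   scrub_status md0
```

Once sync_action reads idle again, a mismatch_cnt of 0 means the check found nothing to complain about.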
Replace a failed disk
When /proc/mdstat shows [U_UU] (the underscore marks a missing or failed member), or you get an mdmonitor email:
cat /proc/mdstat
# md0 : active raid10 sdb[0] sdd[2] sde[3]
# 3906250240 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
# Identify which physical disk is the failed one by serial
sudo mdadm --detail /dev/md0
sudo smartctl -i /dev/sdc # the failed one
# Mark it failed (if not already) and remove
sudo mdadm /dev/md0 --fail /dev/sdc
sudo mdadm /dev/md0 --remove /dev/sdc
# Power down, swap the drive, power back up.
# The new drive may show up with the same or different name; identify it
lsblk
sudo wipefs -a /dev/sdc # wipe any prior signatures on the new drive
# Add it to the array
sudo mdadm /dev/md0 --add /dev/sdc
cat /proc/mdstat # rebuild starts automatically
Rebuild time depends on disk size and load. The array is usable during rebuild, just slower.
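The [U_UU] pattern is also easy to test for in a script, for instance from a cron job on a machine where mail alerts are not set up. This sketch assumes the /proc/mdstat layout shown above; the function name degraded_arrays is hypothetical.

```shell
# degraded_arrays: reads /proc/mdstat-style text on stdin and prints the name
# of every array whose member map (e.g. [U_UU]) contains an underscore.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /\[[U_]*_[U_]*\]/ { print name }'
}

# Usage:
#   degraded_arrays < /proc/mdstat
```

Empty output means every listed array has all of its members.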
Grow the array
Adding capacity to an existing RAID 5/6/10 array:
# Add the new disk as a spare, then reshape the array to include it
sudo mdadm /dev/md0 --add /dev/sdf
sudo mdadm --grow /dev/md0 --raid-devices=5
# Reshape begins; cat /proc/mdstat to watch
# After reshape finishes, extend the LVM PV to use the new space
sudo pvresize /dev/md0
# Extend a logical volume
sudo lvextend -L +500G /dev/data/media
sudo xfs_growfs /mnt/media # XFS
# sudo resize2fs /dev/data/media # ext4
Grow operations are slow but online — the array is available throughout.
Things that go wrong
- Two drive failures in close succession during rebuild — RAID 5 with large drives is the canonical scenario. Use RAID 6 (or RAID 10) for arrays larger than ~4 TB total.
- Forgetting update-initramfs -u — the system reboots and the array isn't assembled; the boot fails or comes up degraded.
- RAID is not a backup. An rm -rf on the mount point removes the data from all mirrors immediately. Pair the RAID with restic (see that tutorial) or Btrfs send/receive (that tutorial) for actual backup.
- Mixing different drive sizes in one array — mdadm uses the smallest as the capacity unit; the extra on larger drives is wasted. Use matched drives.
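The RAID 5 rebuild risk in the first bullet can be made concrete with a back-of-the-envelope estimate. It assumes a datasheet rate of one unrecoverable read error per 10^14 bits read (a commonly quoted consumer HDD figure, not something stated above) and treats errors as independent. Real drives often do better than the datasheet, so read the result as a pessimistic bound rather than a prediction.

```latex
% p = assumed URE probability per bit read, b = bits that must be read to rebuild
P(\text{at least one URE}) = 1 - (1 - p)^{b} \approx 1 - e^{-pb}

% Example: 4-disk RAID 5 of 4 TB drives, one failed, so 3 x 4 TB must be read
b = 3 \times 4 \times 10^{12} \times 8 = 9.6 \times 10^{13} \text{ bits}, \qquad p = 10^{-14}

pb = 0.96 \quad\Rightarrow\quad P \approx 1 - e^{-0.96} \approx 0.62
```

Under the same assumptions RAID 6 can still correct such an error during a single-disk rebuild, because a second set of parity remains.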
Alternatives worth knowing
- ZFS (see that tutorial) — integrated RAID + filesystem + checksums + native compression. More disk space efficient, end-to-end checksummed, but a steeper learning curve and licensing complications on Linux.
- Btrfs RAID — works for RAID 0/1/10; RAID 5/6 had data-loss bugs for years and is still flagged as experimental. Stick to RAID 1/10 with Btrfs, or use mdadm underneath.
- Hardware RAID — performant but tied to a specific controller. If the controller dies, the disks may be unreadable on different hardware. For homelab / small business, software RAID is the safer bet.