Pick a RAID level

  • RAID 0 — striping. No redundancy. Performance only. One disk dies, everything dies. Don't use for data you can't lose.
  • RAID 1 — mirror. Two disks, both hold identical data. Lose one, the other still has everything. Capacity = size of one disk.
  • RAID 5 — striping with one disk's worth of parity, distributed across all members. Survives one disk failure. Avoid for >2 TB drives: the rebuild stresses the remaining disks, and the URE (uncorrectable read error) rate on modern large drives makes rebuild failure a real outcome.
  • RAID 6 — striping with two disks' worth of distributed parity. Survives any two disk failures. The right pick for arrays of more than four modern HDDs.
  • RAID 10 — stripe-of-mirrors. Half the raw capacity goes to redundancy. Fast, survives multiple disk failures as long as both halves of a mirror don't die together. Recommended for SSD arrays.

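For 4 disks: RAID 10 if performance matters, RAID 6 if you need to survive any two disk failures. Both leave two disks' worth of usable capacity at this size; RAID 6 only pulls ahead on capacity from five disks up. Avoid RAID 5.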
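
To compare levels by usable capacity before buying disks, the arithmetic fits in a few lines of shell. This is a minimal sketch: usable_disks is a hypothetical helper name, and it assumes identical disks, an even disk count for RAID 10, and the default two copies per RAID 10 mirror.

```shell
# usable_disks LEVEL N
# Print how many disks' worth of capacity is usable with N identical disks.
# Assumption: RAID 10 keeps two copies of every block and N is even.
usable_disks() (
    level=$1 n=$2
    case $level in
        0)  echo "$n" ;;            # no redundancy
        1)  echo 1 ;;               # every disk holds the same data
        5)  echo $((n - 1)) ;;      # one disk's worth of parity
        6)  echo $((n - 2)) ;;      # two disks' worth of parity
        10) echo $((n / 2)) ;;      # half the disks are mirror copies
        *)  echo "unknown RAID level: $level" >&2; exit 1 ;;
    esac
)
```

Multiply the result by the size of one disk. For example, usable_disks 6 4 and usable_disks 10 4 both print 2, while at six disks RAID 6 gives 4 against 3 for RAID 10.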

Prepare the disks

lsblk
# Identify the drives you'll use, e.g. /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde

# Wipe existing filesystem/RAID signatures (CAREFUL: double-check device names)
# wipefs removes signatures only; it does not overwrite the data itself.
for d in /dev/sd{b,c,d,e}; do
    sudo wipefs -a "$d"
done

# Optional: partition the disks (mdadm can work on whole disks too)
# Use whole disks if they're identical; partition if you want a small offset.
# Whole-disk style:
DEVS=(/dev/sdb /dev/sdc /dev/sdd /dev/sde)
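
Before creating the array it is worth confirming that the disks really are the same size, since mdadm sizes every member to the smallest one. A minimal sketch, assuming input in the form printed by lsblk -bdno NAME,SIZE; sizes_match is a hypothetical helper name.

```shell
# sizes_match
# Read "NAME SIZE" lines on stdin. Exit 0 if every SIZE equals the first one,
# otherwise print each odd one out and exit 1.
sizes_match() {
    awk '
        NR == 1     { first = $2 }
        $2 != first { printf "%s is %s bytes, expected %s\n", $1, $2, first; bad = 1 }
        END         { exit bad }
    '
}
```

Usage: lsblk -bdno NAME,SIZE /dev/sdb /dev/sdc /dev/sdd /dev/sde | sizes_match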

Create the array

# RAID 10 across four disks
sudo mdadm --create /dev/md0 \
    --level=10 \
    --raid-devices=4 \
    --chunk=512 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde

# RAID 6 alternative
# sudo mdadm --create /dev/md0 --level=6 --raid-devices=4 ...

# Watch the initial sync
cat /proc/mdstat
watch -n 5 cat /proc/mdstat

Initial sync writes parity / mirrors across the array. On HDDs, expect hours per TB. The array is usable during sync but slower.
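
For a one-line status instead of the full /proc/mdstat dump, the progress line can be filtered out. A sketch only: sync_progress is a hypothetical helper, and it assumes the usual mdstat progress format ("resync = 5.9% ... finish=312.4min").

```shell
# sync_progress
# Read /proc/mdstat-style text on stdin and print the operation, percentage
# and estimated time left for any resync, recovery, reshape or check line.
sync_progress() {
    sed -nE 's/.*(resync|recovery|reshape|check) *= *([0-9.]+%).*finish=([0-9.]+min).*/\1 \2, about \3 left/p'
}
```

Usage: sync_progress < /proc/mdstat. It prints nothing when the array is idle. To simply block until the sync is done, sudo mdadm --wait /dev/md0 does that without any parsing.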

Save the config

sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u
# This is critical: without it, the initramfs may not assemble the array on boot.
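
Because tee -a appends, running the save step twice leaves duplicate ARRAY lines in mdadm.conf. One way around that is to filter out lines the file already has. A minimal sketch: new_array_lines is a hypothetical helper, and it only catches lines that are identical character for character.

```shell
# new_array_lines CONF
# Read "mdadm --detail --scan" output on stdin and print only the ARRAY
# lines that are not already present verbatim in CONF.
new_array_lines() (
    conf=$1
    while IFS= read -r line; do
        case $line in
            ARRAY*) grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" ;;
        esac
    done
)
```

Usage: sudo mdadm --detail --scan | new_array_lines /etc/mdadm/mdadm.conf | sudo tee -a /etc/mdadm/mdadm.conf, followed by sudo update-initramfs -u as before.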

Put LVM on top

Why LVM on top of mdraid? Two reasons: resize-friendly logical volumes without repartitioning, and the ability to use multiple file systems on the same array without committing the layout up front.

# Initialize the mdraid as a physical volume
sudo pvcreate /dev/md0

# Create a volume group named "data"
sudo vgcreate data /dev/md0

# Create a fixed 2 TiB logical volume, then give the rest of the VG to a second one
sudo lvcreate -L 2T -n media data
sudo lvcreate -l 100%FREE -n backups data

# Format and mount
sudo mkfs.xfs /dev/data/media
sudo mkfs.ext4 /dev/data/backups

sudo mkdir -p /mnt/media /mnt/backups
sudo mount /dev/data/media /mnt/media
sudo mount /dev/data/backups /mnt/backups
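
mkfs.xfs normally picks up the stripe geometry of the md array by itself, even through LVM. To sanity-check that it did, compare the sunit and swidth values from xfs_info /mnt/media against what the array layout implies. A rough sketch: expected_xfs_stripe is a hypothetical helper, and it assumes 4 KiB filesystem blocks and, for RAID 10, two near-copies.

```shell
# expected_xfs_stripe LEVEL N CHUNK_KIB
# Print the sunit/swidth (in 4 KiB blocks) implied by an md array of N disks
# with the given chunk size in KiB.
expected_xfs_stripe() (
    level=$1 n=$2 chunk_kib=$3
    case $level in
        5)  data=$((n - 1)) ;;    # data-bearing disks per stripe
        6)  data=$((n - 2)) ;;
        10) data=$((n / 2)) ;;    # assumption: two near-copies
        *)  echo "no stripe geometry for level $level" >&2; exit 1 ;;
    esac
    sunit=$((chunk_kib / 4))
    echo "sunit=$sunit swidth=$((sunit * data))"
)
```

For the four-disk RAID 10 with 512K chunks created above, expected_xfs_stripe 10 4 512 prints sunit=128 swidth=256.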

fstab

# Get the UUIDs
sudo blkid /dev/data/media /dev/data/backups

# Add to /etc/fstab
UUID=<media-uuid>    /mnt/media    xfs   defaults,noatime  0 2
UUID=<backups-uuid>  /mnt/backups  ext4  defaults,noatime  0 2

noatime turns off access-time updates on reads. The kernel default, relatime, already skips most of them, so the saving is modest, but it still helps on busy mounts.
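
Copying UUIDs by hand is where fstab typos come from, so the lines can be generated instead. A minimal sketch: fstab_line is a hypothetical helper that hard-codes the mount options and the "0 2" fields used above.

```shell
# fstab_line UUID MOUNTPOINT FSTYPE
# Print one tab-separated /etc/fstab entry with the options used in this guide.
fstab_line() {
    printf 'UUID=%s\t%s\t%s\tdefaults,noatime\t0 2\n' "$1" "$2" "$3"
}
```

Usage: fstab_line "$(sudo blkid -s UUID -o value /dev/data/media)" /mnt/media xfs | sudo tee -a /etc/fstab. Afterwards, sudo findmnt --verify and sudo mount -a will catch most mistakes before a reboot does.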

Monitoring

Two things you need:

  1. mdadm monitor daemon — emails you when a disk fails. Enabled by the Debian/Ubuntu package by default; verify and configure:
    sudo nano /etc/mdadm/mdadm.conf
    # Set: MAILADDR your-email@example.com
    sudo systemctl enable --now mdmonitor
  2. SMART monitoring — predict failures before they happen. smartmontools:
    sudo apt install smartmontools
    sudo systemctl enable --now smartd
    
    # Manual SMART query
    sudo smartctl -a /dev/sdb
    sudo smartctl -t short /dev/sdb       # quick self-test, typically 1-2 min
    sudo smartctl -t long  /dev/sdb       # multi-hour deep test

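Set up Prometheus's node_exporter (see that tutorial) and alert on disk-error counters trending upward. SMART data is not one of node_exporter's built-in collectors; feed it in through the textfile collector or run a separate smartctl exporter.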
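
If mail delivery is not set up yet, a small check that can run from cron is a useful fallback. The sketch below flags any array whose status brackets in /proc/mdstat contain an underscore; degraded_arrays is a hypothetical helper name.

```shell
# degraded_arrays
# Read /proc/mdstat-style text on stdin and print the name of every array
# whose member status (e.g. [U_UU]) shows a missing member.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }
    '
}
```

Usage: degraded_arrays < /proc/mdstat, which prints nothing when all arrays are healthy. To confirm that mdadm's own alerts reach you, sudo mdadm --monitor --scan --oneshot --test sends a test message for each array.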

Periodic scrub

Bit rot happens. Without periodic scrubs, latent corruption stays invisible until you need the data and find it's wrong. The mdadm package schedules a monthly check. On Debian/Ubuntu, older releases run it from /etc/cron.d/mdadm on the first Sunday of each month; newer ones use the mdcheck_start.timer systemd unit instead. Verify that one of them is enabled.

Manual scrub:

echo check  | sudo tee /sys/block/md0/md/sync_action     # read every copy / parity block, count mismatches in md/mismatch_cnt
echo repair | sudo tee /sys/block/md0/md/sync_action     # also rewrite mismatches (md cannot tell which copy was the good one)
cat /proc/mdstat                                          # watch progress
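
Once a check has finished, the result sits in two sysfs files next to sync_action. A minimal sketch of reading them: scrub_report is a hypothetical helper, and the directory argument exists mainly so it can be pointed at another array.

```shell
# scrub_report [MD_SYSFS_DIR]
# Summarise the state of the last check from sync_action and mismatch_cnt.
# Defaults to /sys/block/md0/md.
scrub_report() (
    dir=${1:-/sys/block/md0/md}
    action=$(cat "$dir/sync_action")
    count=$(cat "$dir/mismatch_cnt")
    if [ "$action" != idle ]; then
        echo "$action still running"
    elif [ "$count" -eq 0 ]; then
        echo "clean"
    else
        echo "$count mismatched sectors"
    fi
)
```

A non-zero count after a check is the cue to look at SMART data and the kernel log before deciding whether to run repair.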

Replace a failed disk

When /proc/mdstat shows [U_UU] (the underscore marks a missing or failed member), or you get an mdmonitor email:

cat /proc/mdstat
# md0 : active raid10 sdb[0] sdd[2] sde[3]
#       3906250240 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]

# Identify which physical disk is the failed one by serial
sudo mdadm --detail /dev/md0
sudo smartctl -i /dev/sdc      # the failed one

# Mark it failed (if not already) and remove
sudo mdadm /dev/md0 --fail /dev/sdc
sudo mdadm /dev/md0 --remove /dev/sdc

# Power down, swap the drive, power back up.
# The new drive may come up under a different name; identify it with lsblk
# and substitute that name for /dev/sdc in the commands below.
lsblk
sudo wipefs -a /dev/sdc       # wipe any prior signatures on the new drive

# Add it to the array
sudo mdadm /dev/md0 --add /dev/sdc
cat /proc/mdstat              # rebuild starts automatically

Rebuild time depends on disk size and load. The array is usable during rebuild, just slower.
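
When the failed disk has dropped out of /proc/mdstat entirely, it helps to compare the member list against the disks you expect. A sketch under the assumption that members are whole disks named as in this guide; missing_members is a hypothetical helper.

```shell
# missing_members ARRAY DEV...
# Read /proc/mdstat-style text on stdin and print every expected DEV that is
# not listed as a healthy member of ARRAY. Members flagged (F) or (S) count
# as missing.
missing_members() (
    array=$1; shift
    line=$(grep -m1 "^$array :")
    for dev in "$@"; do
        printf '%s\n' "$line" | grep -Eq "(^| )$dev\[[0-9]+\]( |\$)" || echo "$dev"
    done
)
```

Usage: missing_members md0 sdb sdc sdd sde < /proc/mdstat. With the degraded output shown above it prints sdc. To match a device name to the label on the physical drive, lsblk -dno NAME,SERIAL lists the serial numbers.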

Grow the array

Adding capacity to an existing RAID 5/6/10 array:

# Add the new disk as a spare, then reshape the array to include it
sudo mdadm /dev/md0 --add /dev/sdf
sudo mdadm --grow /dev/md0 --raid-devices=5
# Reshape begins; cat /proc/mdstat to watch

# After reshape finishes, extend the LVM PV to use the new space
sudo pvresize /dev/md0

# Extend a logical volume
sudo lvextend -L +500G /dev/data/media
sudo xfs_growfs /mnt/media        # XFS
# sudo resize2fs /dev/data/media   # ext4

Grow operations are slow but online — the array is available throughout.
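
The last step differs by filesystem: xfs_growfs takes the mount point, resize2fs takes the device. A small sketch that picks the right command; grow_cmd is a hypothetical helper and only knows about XFS and the ext family.

```shell
# grow_cmd FSTYPE DEVICE MOUNTPOINT
# Print the command that grows the filesystem to fill its logical volume.
grow_cmd() {
    case $1 in
        xfs)            echo "xfs_growfs $3" ;;   # XFS grows via the mount point
        ext2|ext3|ext4) echo "resize2fs $2" ;;    # ext* grows via the device
        *)              echo "unsupported filesystem: $1" >&2; return 1 ;;
    esac
}
```

Usage: grow_cmd "$(findmnt -no FSTYPE /mnt/media)" /dev/data/media /mnt/media, then run the printed command with sudo. Alternatively, lvextend -r resizes the filesystem in the same step as the logical volume.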

Things that go wrong

  • Two drive failures in close succession during rebuild — RAID 5 with large drives is the canonical scenario. Use RAID 6 (or RAID 10) for arrays larger than ~4 TB total.
  • Forgetting update-initramfs -u — the system reboots and the array isn't assembled; the boot fails or comes up degraded.
  • RAID is not a backup. Files deleted by an rm -rf on the mount point are gone from all mirrors immediately. Pair the RAID with restic (see that tutorial) or Btrfs send/receive (see that tutorial) for actual backup.
  • Mixing different drive sizes in one array — mdadm uses the smallest as the capacity unit; the extra on larger drives is wasted. Use matched drives.
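
The rebuild risk in the first item can be put into rough numbers. The sketch below estimates the chance of hitting at least one unrecoverable read error while reading the surviving disks in full. It is a back-of-the-envelope model only: rebuild_ure_risk is a hypothetical helper, the default rate of one error per 1e14 bits is an assumed datasheet-style worst case rather than a measured value, and errors are treated as independent.

```shell
# rebuild_ure_risk DISKS_READ SIZE_TB [URE_RATE]
# Print the probability (0 to 1) of at least one unrecoverable read error
# when DISKS_READ disks of SIZE_TB terabytes each are read end to end.
# Assumption: URE_RATE is errors per bit read, default 1e-14.
rebuild_ure_risk() {
    awk -v disks="$1" -v tb="$2" -v rate="${3:-1e-14}" 'BEGIN {
        bits = disks * tb * 1e12 * 8
        # (1 - rate)^bits is approximated by exp(-bits * rate)
        printf "%.2f\n", 1 - exp(-bits * rate)
    }'
}
```

Rebuilding a four-disk RAID 5 of 4 TB drives reads the three survivors, so rebuild_ure_risk 3 4 prints 0.62 under these assumptions. With a rate of 1e-15 the same call prints 0.09. Real drives usually beat their datasheet figure, so treat the output as an upper-bound illustration, not a prediction.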

Alternatives worth knowing

  • ZFS (see that tutorial) — integrated RAID + filesystem + checksums + native compression. More disk space efficient, end-to-end checksummed, but a steeper learning curve and licensing complications on Linux.
  • Btrfs RAID — works for RAID 0/1/10; RAID 5/6 had data-loss bugs for years and is still flagged as experimental. Stick to RAID 1/10 with Btrfs, or use mdadm underneath.
  • Hardware RAID — performant but tied to a specific controller. If the controller dies, the disks are often unreadable without the same or a compatible model. For homelab / small business, software RAID is the safer bet.