How to hot-swap SATA disks on Linux

From Leo's Notes
Last edited on 10 October 2023, at 21:38.

When running Linux on a server or NAS device, you might want to hot-swap disks without bringing the system down. On machines without a RAID controller, this can be a little trickier, since you have to tell Linux to remove and add the device manually. This page covers how to hot-swap SATA devices on Linux.

Removing a Drive

If a device becomes unresponsive, or if you simply want to remove a SATA disk from a running system:

  1. Ensure the disk is unmounted.
  2. Ensure the disk isn't used by swap or LVM volume groups.
  3. Remove the disk from the system by running echo 1 > /sys/block/sdX/device/delete as root. This should also power off the drive. If this doesn't work, try putting the device into its lowest power state (sleep mode) with hdparm -Y /dev/sdX and then try again.

Once the drive has spun down, you can disconnect the power and SATA data cables.
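The removal steps above can be bundled into a small shell function that refuses to detach a disk that is still in use. This is a minimal sketch, not the method from these notes: the function name remove_sata_disk and the SYSFS_ROOT, MOUNTS_FILE and SWAPS_FILE overrides are hypothetical, added only so the checks can be exercised against fake files. It checks /proc/mounts and /proc/swaps for the disk or its partitions, treats any entry under the sysfs holders directories as an active LVM (device-mapper) or md user, and only then writes 1 to the delete file.

```shell
#!/bin/sh
# Hypothetical helper, a sketch only: remove_sata_disk sdX
# SYSFS_ROOT, MOUNTS_FILE and SWAPS_FILE default to the real kernel paths and
# exist only so the logic can be tested against a fake directory tree.
remove_sata_disk() {
    disk="$1"
    sysfs="${SYSFS_ROOT:-/sys}"
    mounts="${MOUNTS_FILE:-/proc/mounts}"
    swaps="${SWAPS_FILE:-/proc/swaps}"
    delete="$sysfs/block/$disk/device/delete"

    if [ ! -e "$delete" ]; then
        echo "no delete file for $disk at $delete" >&2
        return 1
    fi

    # Step 1: neither the whole disk nor a partition (sdX1, sdX2, ...) may be mounted.
    if grep -q "^/dev/${disk}[0-9]* " "$mounts"; then
        echo "/dev/$disk (or a partition on it) is still mounted" >&2
        return 1
    fi

    # Step 2a: it must not be an active swap device.
    if grep -q "^/dev/${disk}[0-9]* " "$swaps"; then
        echo "/dev/$disk (or a partition on it) is in use as swap" >&2
        return 1
    fi

    # Step 2b: active LVM volumes (dm-N) or md arrays appear under holders/
    # of the disk or of one of its partitions.
    for holders in "$sysfs/block/$disk/holders" "$sysfs/block/$disk/$disk"*/holders; do
        if [ -d "$holders" ] && [ -n "$(ls -A "$holders")" ]; then
            echo "/dev/$disk is held by: $(ls "$holders" | tr '\n' ' ')" >&2
            return 1
        fi
    done

    # Step 3: ask the kernel to detach the device (needs root on a real system).
    echo 1 > "$delete"
}
```

On a real system you would source the function as root and run remove_sata_disk sdb, unplugging the drive only after it returns successfully. An inactive volume group has no holders entry, so check pvs as well if you use LVM.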

Rescanning SATA Bus

Depending on the chipset and SATA controller, a newly connected drive may not show up on the system until you force a rescan of the SATA bus. On Linux, you trigger this rescan manually through sysfs.

First, determine which controller the disk is attached to. Listing the /sys/class/scsi_host directory should give you a clue about which host is associated with which ATA port and PCI device:

# ls -al /sys/class/scsi_host
lrwxrwxrwx  1 root root 0 Feb 18 22:21 host0 -> ../../devices/pci0000:00/0000:00:06.0/ata1/host0/scsi_host/host0/
lrwxrwxrwx  1 root root 0 Feb 18 22:21 host1 -> ../../devices/pci0000:00/0000:00:06.0/ata2/host1/scsi_host/host1/
lrwxrwxrwx  1 root root 0 Feb 18 22:21 host2 -> ../../devices/pci0000:00/0000:00:04.0/0000:01:06.0/ata3/host2/scsi_host/host2/
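Reading the ataN component from those link targets can be scripted, and for a disk that is already attached, the /sys/block/sdX symlink target contains the hostN it sits on, which helps identify the controller serving a given bay. This is a sketch; the function names list_scsi_hosts and host_for_disk and the SYSFS_ROOT override (default /sys, used only for testing) are hypothetical.

```shell
#!/bin/sh
# Sketch only: map scsi_host entries to ATA ports, and disks to hosts.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) for testing.

# Print "hostN ataM" for every SCSI host, based on its symlink target.
list_scsi_hosts() {
    root="${SYSFS_ROOT:-/sys}"
    for link in "$root"/class/scsi_host/host*; do
        [ -L "$link" ] || continue
        # Split the link target on '/' and keep the first ataN component.
        port=$(readlink "$link" | tr '/' '\n' | grep '^ata[0-9][0-9]*$' | head -n 1)
        printf '%s %s\n' "${link##*/}" "${port:-unknown}"
    done
}

# Print the hostN that an attached disk (for example sda) hangs off.
host_for_disk() {
    root="${SYSFS_ROOT:-/sys}"
    readlink "$root/block/$1" | tr '/' '\n' | grep '^host[0-9][0-9]*$' | head -n 1
}
```

With the listing above, list_scsi_hosts would print host0 ata1, host1 ata2 and host2 ata3. Hosts behind controllers that are not libata-based print unknown.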

Trigger a rescan by writing three dashes, each of which is a wildcard, to the host's scan file. The three fields represent the channel, SCSI target ID, and LUN, respectively.

# echo "- - -" > /sys/class/scsi_host/host#/scan
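A small wrapper can make the three fields explicit. In this sketch the function name rescan_host and the SYSFS_ROOT override (default /sys, for testing only) are hypothetical. Omitted fields default to the - wildcard, so rescan_host host2 writes "- - -", while rescan_host host2 0 0 0 would probe only channel 0, target 0, LUN 0.

```shell
#!/bin/sh
# Sketch only: rescan_host HOST [CHANNEL [TARGET [LUN]]]
# Any field left out defaults to "-", the wildcard. SYSFS_ROOT is a
# hypothetical override (defaults to /sys) used only for testing.
rescan_host() {
    host="$1"
    channel="${2:--}"
    target="${3:--}"
    lun="${4:--}"
    scan="${SYSFS_ROOT:-/sys}/class/scsi_host/$host/scan"

    # Reject anything that does not look like a hostN name.
    case "$host" in
        host[0-9]*) ;;
        *) echo "expected a name like host0, got '$host'" >&2; return 1 ;;
    esac
    if [ ! -e "$scan" ]; then
        echo "no scan file at $scan" >&2
        return 1
    fi

    # Write "channel target lun", space-separated, to the scan file.
    printf '%s %s %s\n' "$channel" "$target" "$lun" > "$scan"
}
```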

If you're not sure which controller your disk is connected to, you may try to rescan all controllers:

# for scan in /sys/class/scsi_host/*/scan; do echo "- - -" > "$scan"; done

Once the rescan is triggered, run dmesg to see if a new disk was attached to the system.
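As a complement to reading dmesg, you can compare the block devices listed in /sys/block before and after the rescan. This is a sketch with hypothetical function names (snapshot_disks, new_disks) and the same test-only SYSFS_ROOT override. It only reports device names, so dmesg remains the place to look for link or error messages.

```shell
#!/bin/sh
# Sketch only: report block devices that appeared after a rescan.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) for testing.

# Print the current block device names, one per line, sorted.
snapshot_disks() {
    ls "${SYSFS_ROOT:-/sys}/block" | sort
}

# Given a file holding an earlier snapshot, print names that are new now.
# comm -13 keeps only lines unique to the second input (the current list).
new_disks() {
    snapshot_disks | comm -13 "$1" -
}
```

A typical sequence would be snapshot_disks > /tmp/disks.before, then the rescan loop, a few seconds' wait for the drive to spin up, and finally new_disks /tmp/disks.before.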