How to Manually Re-enable Failed Paths Reported by Multipath in CentOS/RHEL

The Problem

When running the multipath command, you may see devices reporting failed paths, such as "sdb" in the output below.


# multipath -l
3600144f0da627ad700005f6aaefc0004 dm-4 SUN     ,ZFS Storage 7120
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  `- 5:0:0:2 sdd 8:48 active undef running
3600144f0da627ad700005f6aaf110005 dm-5 SUN     ,ZFS Storage 7120
size=30G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  `- 5:0:0:3 sde 8:64 active undef running
3600144f0da627ad700005f6aaec30003 dm-3 SUN     ,ZFS Storage 7120
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  `- 5:0:0:1 sdc 8:32 active undef running
3600144f0da627ad700005f6aaea10002 dm-2 SUN     ,ZFS Storage 7120
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=enabled
  `- 5:0:0:0 sdb 8:16 failed undef running
#
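Before fixing anything, it can help to list just the failed paths. A short awk filter over the output above does this; the field positions assume the path-line format shown, where the device name is the third field and the path state the fifth:

```shell
# Print the block device name for every path multipath reports as "failed".
# Field positions match path lines such as:
#   `- 5:0:0:0 sdb 8:16 failed undef running
multipath -l | awk '$5 == "failed" { print $3 }'
```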

Further checking confirmed that the underlying storage devices were all online and available, but the device-mapper multipath layer did not automatically re-enable the failed paths.


# iscsiadm -m session -P3
... output omitted ...
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 5  State: running
                scsi5 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdb          State: running
                scsi5 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdc          State: running
                scsi5 Channel 00 Id 0 Lun: 2
                        Attached scsi disk sdd          State: running
                scsi5 Channel 00 Id 0 Lun: 3
                        Attached scsi disk sde          State: running
#

The Solution

Although a full server reboot would recover access to all paths, a non-disruptive manual procedure that could be performed online was preferred.

The procedure to re-enable failed or down paths is as follows:

  • Delete the “failed” device paths from the OS.
  • Rescan storage via the HBAs to reinitialize (re-enable paths for) those devices.
  • Refresh multipath view to enable the paths to those devices.

The detailed action plan is:

1. To delete a device path from the OS, write to a file in sysfs for that device. Specifically, write the value "1" to the file /sys/block/${DEV}/device/delete.

Example: To remove a failed device path for /dev/sdb, run the following command:

# echo 1 > /sys/block/sdb/device/delete
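When more than one path has failed, the deletes can be generated in a loop. The sketch below only prints each command so the list can be reviewed first; the device list here is an example and should be replaced with the names actually reported as failed:

```shell
# Print (for review) the sysfs delete command for each failed path device.
# Replace the list with the devices multipath reported as "failed";
# pipe the output to `sh` as root to actually execute the deletes.
for dev in sdb; do
    echo "echo 1 > /sys/block/$dev/device/delete"
done
```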

2. Repeat the step above for every device that has a failed path in the multipath output. Then, to rescan the HBAs, issue the command:

# echo "- - -" > /sys/class/scsi_host/${HBA}/scan

The HBAs are numbered in the order they are discovered and are named "hostN" (e.g., "host0", "host1", and so on).

Therefore, to rescan HBA 0, issue the command:

# echo "- - -" > /sys/class/scsi_host/host0/scan

Be sure to rescan all the HBAs that have paths leading to the devices deleted in step 1.
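All hosts can be rescanned in one pass with a short loop. This is a sketch: the function name is ours, and the optional directory argument exists only so the loop can be exercised against a test tree; writes to the real sysfs files require root.

```shell
# Issue a wildcard ("- - -") SCSI rescan on every host under the given
# sysfs directory (defaults to the live tree; writes require root).
rescan_all_hosts() {
    dir="${1:-/sys/class/scsi_host}"
    for scan in "$dir"/host*/scan; do
        if [ -w "$scan" ]; then
            echo "- - -" > "$scan"
        fi
    done
}

rescan_all_hosts
```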

3. To get the multipath service to look for changes to paths associated with managed devices, issue the command multipath. Use multipath -v2 for more verbose status output.

Conclusion

The multipath service periodically rescans the devices it manages to find state-change information. The multipath command can be run manually if you do not want to wait for the periodic rescan.