Understanding Linux multipath configuration file /etc/multipath.conf

DM-Multipath makes many of its features user-configurable through the configuration file /etc/multipath.conf. Both the multipath command and the multipathd daemon read their configuration from this file. The file is consulted only while multipath devices are being configured.

In other words, if the user makes any changes to this file, the multipath command needs to be rerun to reconfigure the multipath devices (i.e. the user has to run multipath -F followed by multipath). Support for many devices (as listed below) is built into the user-space component of DM-Multipath. The user needs to modify this file only if support for a specific storage device is not built in or if some of the default values need to be overridden.

This file has 5 sections:

1. System level defaults ("defaults"): Where the user can specify system-level default overrides.
2. Blacklisted devices ("blacklist"): Where the user can list the devices that should not be under the control of DM-Multipath. These devices will be excluded.
3. Blacklist exceptions ("blacklist_exceptions"): Specific devices to be treated as multipath candidates even if they match the blacklist.
4. Storage controller specific settings ("devices"): User-specified configuration settings that are applied to devices with the specified "Vendor" and "Product" information.
5. Device specific settings ("multipaths"): Where the user can fine-tune configuration settings for individual LUNs.

Persistent device names: The names (uid_names, mpath names or alias names) that appear in /dev/mapper are persistent across boots, whereas the names dm-0, dm-1, etc. can change between reboots. It is therefore advisable to use the device names that appear under /dev/mapper and to avoid the dm-? names.
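As a rough illustration of the five sections, the following sketch writes a skeleton configuration to a scratch file for review rather than to /etc/multipath.conf. The vendor, product, WWID and alias values are placeholders invented for this example, not settings for any real array.

```shell
# Draft a skeleton multipath.conf in a scratch file; review it before
# copying anything into /etc/multipath.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
defaults {
        user_friendly_names yes
}
blacklist {
        wwid ".*"
}
blacklist_exceptions {
        wwid "WWID_PLACEHOLDER"
}
devices {
        device {
                vendor  "EXAMPLE"
                product "EXAMPLE-ARRAY"
                path_grouping_policy group_by_prio
        }
}
multipaths {
        multipath {
                wwid  "WWID_PLACEHOLDER"
                alias data01
        }
}
EOF

# List the top-level section names found in the draft.
grep -E '^[a-z_]+ \{' "$conf" | cut -d' ' -f1
```

The blacklist and blacklist_exceptions pair here excludes every WWID and then re-admits one, which is only one possible way to combine the two sections.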

1. Restart of tools after changing the multipath.conf file: Once the multipath.conf file is changed, the multipath tools need to be rerun for the new configuration values to take effect. One has to kill multipathd, run multipath -F, and then restart multipathd and multipath.

Applying Changes Made to the /etc/multipath.conf File: Changes to the /etc/multipath.conf file cannot take effect when multipathd is running. After you make changes, save and close the file, then do the following to apply the changes:

- Stop the multipathd service.

- Clear old multipath bindings by entering:

/sbin/multipath -F

- Create new multipath bindings by entering:

/sbin/multipath -v2

- Start the multipathd service.

- Run mkinitrd to re-create the INITRD on your system, then reboot in order for the changes to take effect.

# service multipathd reload
# dracut --force --add multipath --include /etc/multipath /etc/multipath
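The sequence above can be collected into a small script. This is only a sketch: the run wrapper, the DRY_RUN variable and the function name are made up for this example, and by default the commands are printed rather than executed. It uses multipath -v2 without -l for the rebuild step, on the assumption that -l only lists the existing topology.

```shell
# Sketch of the apply sequence. DRY_RUN defaults to 1, so the commands are
# only printed; set DRY_RUN=0 (as root) to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_multipath_conf() {
    run service multipathd stop      # stop the daemon first
    run /sbin/multipath -F           # flush all unused multipath maps
    run /sbin/multipath -v2          # rebuild the maps from the new config
    run service multipathd start     # start the daemon again
}

apply_multipath_conf
```

Rebuilding the initrd or initramfs is left out of the sketch because the right command depends on the release.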

2. Devices with partitions: Create device partitions before running multipath, because kpartx is configured to run at that point and create the multipathed partitions. Partitions on device mpath0 appear as /dev/mapper/mpath0p1, /dev/mapper/mpath0p2, etc.

3. Using the bindings file in a clustered environment: The bindings file holds the bindings between the device mapper names and the uid of the underlying device. By default the file is /var/lib/multipath/bindings or /etc/multipath/bindings; this can be changed with the multipath command line option -b. In a clustered environment, this file can be created on one node and transferred to the others to get the same names. Note that the same effect can also be achieved by using alias and having the same multipath.conf file on all the nodes of the cluster.
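To make the idea of a binding concrete, the sketch below builds a sample bindings file in a scratch location and looks up the name bound to a WWID. Each non-comment line holds an alias followed by a WWID. The WWIDs and the alias_for_wwid helper are invented for this illustration.

```shell
# Build a sample bindings file in a scratch location; the WWIDs are made up.
bindings=$(mktemp)
cat > "$bindings" <<'EOF'
# alias wwid
mpatha 36000000000000000000000000000000a
mpathb 36000000000000000000000000000000b
EOF

# Print the alias bound to a WWID, skipping comment lines.
alias_for_wwid() {
    awk -v w="$1" '$1 !~ /^#/ && $2 == w { print $1 }' "$bindings"
}

alias_for_wwid 36000000000000000000000000000000b
```

Copying the same file to every node means the same lookup gives the same name on each of them.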

4. Getting the multipath device name corresponding to a SCSI device: If one knows the name of a SCSI device and wants to get the device mapper name associated with it, they could use multipath -l /dev/sda, where sda is the SCSI device. On the other hand, if one knows the device mapper name and wants to know the underlying device names, they could use the same command with the device mapper name, i.e. multipath -l mpath0, where mpath0 is the device mapper name.

5. When using LVM on dm-multipath devices, it is better to turn LVM scanning off on the underlying SCSI devices. This can be done by changing the filter parameter in /etc/lvm/lvm.conf to be filter = [ "a/dev/mapper/.*/", "r/dev/sd.*/" ]. If your root device is also a multipathed LVM device, then make the above change before you create a new initrd image. Recreate the initrd/initramfs before rebooting after making this change.
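The effect of the two filter patterns can be tried out without touching LVM. The sketch below mimics the first-match logic with grep for a single path name. The function name is made up, and the default accept for unmatched names reflects how LVM filters are documented to behave, not a call into LVM itself.

```shell
# Emulate the first-match logic of the two filter patterns with grep.
# This only mimics the documented behaviour; it does not call LVM.
lvm_filter_verdict() {
    dev=$1
    if printf '%s\n' "$dev" | grep -Eq 'dev/mapper/.*'; then
        echo accept          # matched "a/dev/mapper/.*/"
    elif printf '%s\n' "$dev" | grep -Eq 'dev/sd.*'; then
        echo reject          # matched "r/dev/sd.*/"
    else
        echo accept          # no pattern matched: accepted by default
    fi
}

for d in /dev/mapper/mpath0 /dev/sda /dev/vda; do
    printf '%s %s\n' "$d" "$(lvm_filter_verdict "$d")"
done
```

Note that a device that matches neither pattern, such as /dev/vda here, is still scanned. Append a catch-all reject pattern if that is not wanted.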

Rebuilding the initrd (CentOS/RHEL 5)

1. It is recommended to make a backup copy of the initrd in case the new version has an unexpected problem:

# cp /boot/initrd-$(uname -r).img /boot/initrd-$(uname -r).img.bak

2. Now build the initrd:

# mkinitrd -f -v /boot/initrd-$(uname -r).img $(uname -r)

3. If you are running a kernel version different from the one the initrd is being built for (including when you are in Rescue Mode), you must specify the full kernel version, without architecture:

# mkinitrd -f -v /boot/initrd-2.6.39-400.21.1.el5uek.img 2.6.39-400.21.1.el5uek

The -v verbose flag causes mkinitrd to display the names of all the modules it is including in the initial ramdisk. The -f option will force an overwrite of any existing initial ramdisk image at the path you have specified.

Rebuilding the initramfs (CentOS/RHEL 6)

1. It is recommended to make a backup copy of the initramfs in case the new version has an unexpected problem:

# cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak

2. Now rebuild the initramfs for the current kernel version:

# dracut -f

3. If you are running a kernel version different from the one the initramfs is being built for (including when you are in Rescue Mode), you must specify the full kernel version, including architecture:

# dracut -f /boot/initramfs-2.6.39-400.17.1.el6uek.x86_64.img 2.6.39-400.17.1.el6uek.x86_64

The -f option will force an overwrite of any existing initial ramdisk image at the path you have specified.

Multipath Configuration Defaults

1. polling_interval: Specifies the interval between two path checks in seconds. For properly functioning paths, the interval between checks will gradually increase to (4 * polling_interval). The default value is 5.
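As a quick worked example of that upper bound, using the default value from the text:

```shell
polling_interval=5                          # default value, in seconds
max_interval=$((4 * polling_interval))      # upper bound for healthy paths
echo "healthy paths are checked at most every ${max_interval}s"
```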

2. udev_dir: the directory where udev device nodes are created. The default value is /dev.

3. multipath_dir: The directory where the dynamic shared objects are stored. The default value is system dependent, commonly /lib/multipath.

4. path_selector: Specifies the default algorithm used to determine which path to use for the next I/O operation. Possible values include:

- round-robin 0: Loop through every path in the path group, sending the same amount of I/O to each.
- queue-length 0: Send the next bunch of I/O down the path with the least number of outstanding I/O requests.
- service-time 0: Send the next bunch of I/O down the path with the shortest estimated service time, which is determined by dividing the total size of the outstanding I/O to each path by its relative throughput.

The default value is round-robin 0.
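The service-time calculation can be illustrated with a few made-up numbers. The sketch below divides the outstanding I/O size of each path by its relative throughput and picks the smallest result. The path names, byte counts, throughput values and function name are assumptions for this example and do not come from a real device.

```shell
# Input columns: path  outstanding_bytes  relative_throughput (made-up data).
pick_service_time_path() {
    awk '{
        est = $2 / $3                      # estimated service time
        if (NR == 1 || est < best) { best = est; path = $1 }
    } END { print path }'
}

printf '%s\n' \
    'sda 4096 1' \
    'sdb 8192 4' \
    'sdc 3072 1' | pick_service_time_path
```

Here sdb has the most outstanding I/O but also four times the relative throughput, so its estimate (2048) is lower than that of sda (4096) or sdc (3072).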

5. path_grouping_policy: Specifies the default path grouping policy to apply to unspecified multipaths. Possible values include:

- failover: 1 path per priority group.
- multibus: all valid paths in 1 priority group.
- group_by_serial: 1 priority group per detected serial number.
- group_by_prio: 1 priority group per path priority value. Priorities are determined by callout programs specified as global, per-controller, or per-multipath options.
- group_by_node_name: 1 priority group per target node name. Target node names are fetched from /sys/class/fc_transport/target*/node_name.

The default value is failover.
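To see how the policies differ, the sketch below counts the priority groups that three of them would form for four sample paths. The path names and priority values are invented, and the counting is a simplification of what multipath does, not its actual implementation.

```shell
# Columns: path  priority (sample values, not taken from a real array).
paths='sda 50
sdb 50
sdc 10
sdd 10'

# Count the priority groups each policy would form for the sample paths.
count_groups() {
    case $1 in
        failover)      printf '%s\n' "$paths" | awk 'END { print NR }' ;;
        multibus)      echo 1 ;;
        group_by_prio) printf '%s\n' "$paths" |
                           awk '!seen[$2]++ { n++ } END { print n }' ;;
    esac
}

for policy in failover multibus group_by_prio; do
    echo "$policy: $(count_groups "$policy") group(s)"
done
```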

6. getuid_callout: Specifies the default program and arguments to call out to obtain a unique path identifier. An absolute path is required. The default value is /lib/udev/scsi_id --whitelisted --device=/dev/%n.

7. prio: Specifies the default function to call to obtain a path priority value. For example, the ALUA bits in SPC-3 provide an exploitable prio value. Possible values include:

- const: Set a priority of 1 to all paths.
- emc: Generate the path priority for EMC arrays.
- alua: Generate the path priority based on the SCSI-3 ALUA settings. As of Red Hat Enterprise Linux 6.8, if you specify prio "alua exclusive_pref_bit" in your device configuration, multipath will create a path group that contains only the path with the pref bit set and will give that path group the highest priority.
- tpg_pref: Generate the path priority based on the SCSI-3 ALUA settings, using the preferred port bit.
- ontap: Generate the path priority for NetApp arrays.
- rdac: Generate the path priority for LSI/Engenio RDAC controller.
- hp_sw: Generate the path priority for Compaq/HP controller in active/standby mode.
- hds: Generate the path priority for Hitachi HDS Modular storage arrays.

The default value is const.

8. path_checker: Specifies the default method used to determine the state of the paths. Possible values include:

- readsector0: Read the first sector of the device.
- tur: Issue a TEST UNIT READY to the device.
- emc_clariion: Query the EMC Clariion specific EVPD page 0xC0 to determine the path state.
- hp_sw: Check the path state for HP storage arrays with Active/Standby firmware.
- rdac: Check the path state for LSI/Engenio RDAC storage controller.
- directio: Read the first sector with direct I/O.

The default value is directio.

9. failback: Manages path group failback. A value of immediate specifies immediate failback to the highest priority path group that contains active paths. A value of manual specifies that there should not be immediate failback and that failback can happen only with operator intervention. A value of followover specifies that automatic failback should be performed when the first path of a path group becomes active. This keeps a node from automatically failing back when another node requested the failover. A numeric value greater than zero specifies deferred failback, expressed in seconds. The default value is manual.

10. rr_min_io: Specifies the number of I/O requests to route to a path before switching to the next path in the current path group. This setting is only for systems running kernels older than 2.6.31. Newer systems should use

11. user_friendly_names: If set to yes, specifies that the system should use the /etc/multipath/bindings file to assign a persistent and unique alias to the multipath, in the form of mpathn. If set to no, specifies that the system should use the WWID as the alias for the multipath. In either case, what is specified here will be overridden by any device-specific aliases you specify in the multipaths section of the configuration file. The default value is no.

12. fast_io_fail_tmo: The number of seconds the SCSI layer will wait after a problem has been detected on an FC remote port before failing I/O to devices on that remote port. This value should be smaller than the value of dev_loss_tmo. Setting this to off will disable the timeout.

13. dev_loss_tmo: The number of seconds the SCSI layer will wait after a problem has been detected on an FC remote port before removing it from the system.
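A draft defaults section can be checked for the rule that fast_io_fail_tmo should be smaller than dev_loss_tmo. The sketch below writes such a section to a scratch file and compares the two values. The timeout values (5 and 30) and the setting helper are illustrative assumptions, not recommendations.

```shell
# Draft a defaults section in a scratch file; values are illustrative only.
conf=$(mktemp)
cat > "$conf" <<'EOF'
defaults {
        polling_interval    5
        path_selector       "round-robin 0"
        failback            manual
        user_friendly_names no
        fast_io_fail_tmo    5
        dev_loss_tmo        30
}
EOF

# Read the value of a simple "name value" setting from the draft.
setting() {
    awk -v k="$1" '$1 == k { print $2 }' "$conf"
}

if [ "$(setting fast_io_fail_tmo)" -lt "$(setting dev_loss_tmo)" ]; then
    echo "timeouts ok"
else
    echo "fast_io_fail_tmo should be smaller than dev_loss_tmo"
fi
```

The numeric comparison only works for numeric settings; a value of off for fast_io_fail_tmo would need separate handling.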

14. wwid: Specifies the WWID of the multipath device to which the multipath attributes apply. This parameter is mandatory in each multipath entry of the multipaths section of the multipath.conf file.

15. alias: Specifies the symbolic name for the multipath device to which the multipath attributes apply. If you are using user_friendly_names, do not set this value to mpathn; this may conflict with an automatically assigned user friendly name and give you incorrect device node names.
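The wwid and alias attributes go together in an entry of the multipaths section. The sketch below drafts two such entries in a scratch file and flags any alias that looks like an automatically assigned user-friendly name. The WWID placeholders, the aliases and the pattern used in alias_is_safe (mpath followed by a letter or digit) are assumptions made for this example.

```shell
# Draft two multipath entries; WWIDs and aliases are placeholders.
conf=$(mktemp)
cat > "$conf" <<'EOF'
multipaths {
        multipath {
                wwid  WWID_PLACEHOLDER_1
                alias data01
        }
        multipath {
                wwid  WWID_PLACEHOLDER_2
                alias mpath3
        }
}
EOF

# Return non-zero for aliases that resemble user-friendly names (mpathN).
alias_is_safe() {
    case $1 in
        mpath|mpath[a-z0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

awk '$1 == "alias" { print $2 }' "$conf" | while read -r a; do
    if alias_is_safe "$a"; then echo "$a: ok"; else echo "$a: avoid"; fi
done
```

The check is deliberately conservative: any alias starting with mpath followed by a letter or digit is flagged, even if it would never collide in practice.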