CentOS/RHEL 7 - Stale File Handle On Snapshot NFS Directories
A NetApp system provides NFS shares that are accessed from several environments, including CentOS/RHEL 6 and 7. As part of its internal workings, the NetApp system takes periodic snapshots of these shares and also manages the lifecycle of those snapshots: they eventually expire and are deleted by the NetApp system.
When these NFS shares are used from CentOS/RHEL 7, any command that accesses a snapshot after it has expired fails with the error “Stale file handle”.
Auto-mounted filesystems have an internal expiry mechanism that causes the filesystem to be unmounted automatically. On the kernel used by CentOS/RHEL 7.7, this mechanism changed to a sliding window:
- any operation that checks filesystem information (such as an ‘ls’ on a path to the mount, or a ‘df’), even if it is served from the local cache, resets the expiry timer, regardless of whether the export still exists on the NFS server.
- if these operations happen frequently (at intervals shorter than fs.nfs.nfs_mountpoint_timeout seconds), the kernel keeps the mount in place indefinitely.
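To see which interval applies on a given client, the timeout named above can be read with sysctl. This is a minimal sketch: it assumes the sysctl utility is installed and that the NFS client module is loaded, and it falls back to a placeholder string (an illustration choice, not a kernel value) when the key cannot be read.

```shell
# Read the NFS mount-point expiry timeout, in seconds.
# If sysctl or the key is missing (for example, NFS module not loaded),
# fall back to the placeholder "unavailable".
timeout=$(sysctl -n fs.nfs.nfs_mountpoint_timeout 2>/dev/null || echo "unavailable")
echo "fs.nfs.nfs_mountpoint_timeout: $timeout"
```

Any job that touches the mount more often than this value would, per the description above, keep resetting the timer.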
To avoid this behavior, the “strictexpire” option was introduced for the auto.master file, as documented in the following excerpt from “man 5 auto.master”:
strictexpire Use a strict expire policy for this automount. Using this option means that last use of autofs directory entries will not be updated during path walks so that mounts in an automount won't be kept mounted by applications scanning the mount tree. Note that this doesn't completely resolve the problem of expired automounts being immediately re-mounted due to application accesses triggered by the expire itself.
Edit the /etc/auto.master file and add the strictexpire option to every mount point affected by this issue.
For instance, if /home is affected, change the line from
/home /etc/auto.home --timeout=300 nobrowse
to
/home /etc/auto.home --timeout=300 nobrowse strictexpire
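The same change can be scripted when many hosts need it. The following is only a sketch, assuming GNU sed (as shipped with CentOS/RHEL) and the sample /home entry from this article. It works on a scratch copy created with mktemp rather than on the real /etc/auto.master, and it skips the entry if strictexpire is already present.

```shell
# Work on a scratch copy so the real /etc/auto.master is not touched.
master=$(mktemp)
printf '%s\n' '/home /etc/auto.home --timeout=300 nobrowse' > "$master"

# Append strictexpire to the /home entry unless the option is already there.
sed -i '/^\/home[[:space:]]/{/strictexpire/!s/$/ strictexpire/}' "$master"

# Show the resulting entry, then clean up the scratch file.
result=$(cat "$master")
echo "$result"
rm -f "$master"
```

To apply it for real, back up /etc/auto.master first and adjust the /home pattern to the affected mount point.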
After making all required changes, restart the autofs service by running the following command as the root user:
# systemctl restart autofs