How to Clear Stuck NFS Locks on NetApp Filer(s)
Question: We are using NFS for datafile storage, without Real Application Clusters (RAC), and the mount point holding the datafiles uses the 'nolock' NFS mount option. Two nodes then accidentally opened the same database. Here is the error we encounter when running the startup command.
Database detects the other node, and fails during mount:
SQL> startup
ORACLE instance started.
Total System Global Area 1191182336 bytes
Fixed Size                  1321528 bytes
Variable Size             703321544 bytes
Database Buffers          452984832 bytes
Redo Buffers               33554432 bytes
ORA-00600: internal error code, arguments: [kccsbck_first], , , , , , ,
How can this be resolved so that the database opens in read-write mode?
The following procedure, per NetApp's recommendation, clears the stuck locks on the NetApp filer and the Linux host.
1. First, shut down all Oracle database instances and kill any stray background Oracle processes. Check for stray processes using:
$ ps -ef | grep ora
to get the process IDs (PIDs) of any remaining Oracle processes.
Kill each remaining Oracle process using the kill command:
$ kill -9 [pid]
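The filtering above can be wrapped in a small helper. This is our own sketch, not part of the NetApp procedure; `list_ora_pids` is a hypothetical name, and the awk filter simply picks column 2 (the PID) from `ps -ef` output while skipping the grep/awk process itself:

```shell
# Sketch: extract PIDs of Oracle processes from "ps -ef" output.
# Review the list before killing anything with -9.
list_ora_pids() {
  # PID is column 2 of "ps -ef"; match "ora" anywhere on the line,
  # but skip the awk/grep process running this very filter.
  awk '/ora/ && !/awk|grep/ {print $2}'
}
```

Usage, after reviewing the output of `ps -ef | list_ora_pids`:

```shell
for pid in $(ps -ef | list_ora_pids); do kill -9 "$pid"; done
```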
2. Unmount all NFS partitions, e.g.:
# umount /u07
If you have problems unmounting because of open files, use /usr/sbin/lsof to determine which files are still open on which mount points.
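If lsof is not installed, a rough /proc-based sketch (Linux only; the function name is ours, not a standard tool) can show which PIDs still hold files open under a mount point:

```shell
# Sketch: list PIDs with open files under a given path, by scanning
# /proc/<pid>/fd/* symlinks. Only readable processes are visible to
# a non-root user; run as root for a complete picture.
pids_using_path() {
  target=$1
  for fd in /proc/[0-9]*/fd/*; do
    pid=${fd#/proc/}; pid=${pid%%/*}
    case "$(readlink "$fd" 2>/dev/null)" in
      "$target"|"$target"/*) echo "$pid" ;;
    esac
  done | sort -u
}
```

For example, `pids_using_path /u07` would print the PIDs blocking the unmount of /u07.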
3. Shut down the NFS statd and lockd services:
# /sbin/service nfs stop
# /sbin/service nfslock stop
4. Clear NetApp filer locks: Execute the following from the NetApp filer command line:
$ priv set advanced
$ sm_mon -l
In many cases specifying a hostname does not clear all of the offending locks, so NetApp's recommendation is to NOT specify a hostname.
5. Restart the NFS services and remount the NFS partitions:
# /sbin/service nfs start
# /sbin/service nfslock start
# mount -a
If some NFS partitions cannot be unmounted, schedule a reboot of the Linux host to clear the stale NFS locks.
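The host-side sequence above can be sketched as a single script. The `run()` dry-run wrapper and the `clear_stale_nfs_locks` name are our own additions; /u07 and the RHEL-style `service` commands come from the steps above. DRY_RUN defaults to 1, so by default the script only prints the commands instead of executing them:

```shell
# Sketch of steps 2-5 on the Linux host. Run step 1 (stopping Oracle)
# first, and run step 4 on the NetApp filer between the stop and start.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

clear_stale_nfs_locks() {
  run umount /u07                # step 2: unmount NFS partitions
  run /sbin/service nfs stop     # step 3: stop NFS services
  run /sbin/service nfslock stop
  # step 4 happens on the filer:  priv set advanced; sm_mon -l
  run /sbin/service nfs start    # step 5: restart and remount
  run /sbin/service nfslock start
  run mount -a
}
```

Set DRY_RUN=0 (as root) only after verifying the printed command list matches your environment.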