How to Clear Stuck NFS Locks on NetApp Filer(s)

Question: We are using NFS for datafile storage, without Real Application Clusters (RAC), and the mount point holding the datafiles uses the 'nolock' NFS mount option. Two nodes then accidentally opened the same database. Below is the error we encounter when running the startup command.

The database detects the other node and fails during mount:

SQL> startup
ORACLE instance started.
Total System Global Area 1191182336 bytes
Fixed Size                  1321528 bytes
Variable Size             703321544 bytes
Database Buffers          452984832 bytes
Redo Buffers               33554432 bytes
ORA-00600: internal error code, arguments: [kccsbck_first], [1], [526988247], [], [], [], [], []

The ORA-00600 [kccsbck_first] error typically means the instance believes it is not the first to mount the database: another instance, or a stale NFS lock left behind by one, still holds the control files. How can this be resolved so that the database can be opened read-write?

The Solution

The following procedure, recommended by NetApp, clears stale locks held on a NetApp filer.

1. First, shut down all Oracle database instances and kill any stray background Oracle processes. Check for stray processes using:

$ ps -ef | grep ora

to get the process IDs (PIDs) of any remaining Oracle processes.

Kill each remaining Oracle process using the kill command:

$ kill -9 [pid]
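As a hedged convenience, the loop below finds and kills all remaining Oracle background processes in one pass. It assumes the standard ora_<process>_<SID> naming convention for Oracle background processes; the bracketed pattern keeps grep from matching its own command line:

$ for pid in $(ps -ef | grep '[o]ra_' | awk '{print $2}'); do
>     kill -9 "$pid"
> done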

2. Unmount all NFS partitions, e.g.:

# umount /u07

If you have problems unmounting because of open files, use /usr/sbin/lsof to determine which files are still open on which mount points.
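For example, to see which processes are holding files open under a busy mount point (the path below is the one from step 2 and is purely illustrative):

# /usr/sbin/lsof /u07

If lsof is not installed, fuser reports similar information for a mounted filesystem:

# /usr/bin/fuser -vm /u07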

3. Shut down the NFS statd and lockd services:

# /sbin/service nfs stop
# /sbin/service nfslock stop
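The service commands above apply to SysV-init Linux distributions. On newer systemd-based distributions, the equivalents are typically:

# systemctl stop nfs-server
# systemctl stop rpc-statd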

4. Clear the locks on the NetApp filer. Execute the following from the filer command line:

filer> priv set advanced
filer*> sm_mon -l

In many cases, specifying a hostname does not clear all of the offending locks, so NetApp's recommendation is to NOT specify a hostname.
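Once the locks are cleared, it is good practice to drop back to the normal privilege level (standard in Data ONTAP 7-mode; the prompt shown is illustrative):

filer*> priv set admin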

5. Restart the NFS services and remount the NFS partitions:

# /sbin/service nfs start
# /sbin/service nfslock start
# mount -a
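Before restarting the database, it is worth confirming that the datafile mount points are back, for example (reusing the illustrative path from step 2):

# mount -t nfs
# df -h /u07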

If it is not possible to unmount all of the NFS partitions, schedule a reboot of the Linux host to clear its stale NFS locks.