Too Many kdmflush Processes Causing ORA-27300, ORA-27301, ORA-27302 Errors

The Problem

On CentOS/RHEL 6.x, a large number of kdmflush processes owned by the root user is observed. The issue typically comes to light after Oracle raises the following errors:

ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5

The Solution

These ORA errors can be caused by either of the following:

1. The number of processes for a user exceeds the limit specified in /etc/security/limits.conf.

2. Low setting for the OS kernel parameter pid_max.

As the process table grows, the kernel can fail to allocate a new PID because the assignable PID range (bounded by kernel.pid_max) is temporarily exhausted. The fork(2) system call then returns -EAGAIN (error 11), which Oracle surfaces as the errors above.
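Both causes can be checked directly from a shell. The commands below are a diagnostic sketch; the example values in the comments (16384, 65536, the "oracle" user) are illustrative, not recommendations:

```shell
# 1. Per-user process limit (nproc), as enforced for the current shell;
#    the persistent values live in /etc/security/limits.conf:
ulimit -u

# 2. Kernel-wide assignable PID range:
cat /proc/sys/kernel/pid_max

# How many PIDs are in use right now (each thread consumes a PID):
ps -eLf | wc -l

# If either limit is the bottleneck, it can be raised, for example:
#   echo "oracle soft nproc 16384" >> /etc/security/limits.conf
#   sysctl -w kernel.pid_max=65536        # plus a matching /etc/sysctl.conf entry
```

If the in-use PID count approaches pid_max, or the Oracle owner's process count approaches its nproc limit, fork() failures with EAGAIN are expected.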

Checking the processes owned by root shows output similar to the following:

$ ps -elf | grep -i root
4 S root         1     0  0  80   0 -  5374 poll_s Nov14 ? 00:42:31 /sbin/init
1 S root         2     0  0  80   0 -     0 kthrea Nov14 ? 00:00:00 [kthreadd]
1 S root         3     2  0  80   0 -     0 run_ks Nov14 ? 00:03:26 [ksoftirqd/0]
1 S root         6     2 99 -40   - -     0 cpu_st Nov14 ? 34-16:37:47 [migration/0]
1 S root       165     2  0  80   0 -     0 worker Nov14 ? 00:00:00 [kworker/23:1]
1 S root       167     2  0  80   0 -     0 worker Nov14 ? 00:03:37 [kworker/25:1]
1 S root       170     2  0  80   0 -     0 worker Nov14 ? 00:01:10 [kworker/28:1]
1 S root       171     2  0  80   0 -     0 worker Nov14 ? 00:03:41 [kworker/29:1]
1 S root       172     2  0  80   0 -     0 worker Nov14 ? 00:04:09 [kworker/30:1]
1 S root      5584     2  0  80   0 -     0 bdi_wr 19:54 ? 00:00:00 [flush-252:188]
1 S root      5586     2  0  80   0 -     0 bdi_wr 19:54 ? 00:00:00 [flush-252:189]
1 S root      5591     2  0  80   0 -     0 bdi_wr 19:54 ? 00:00:00 [flush-252:193]
1 S root      5598     2  0  80   0 -     0 bdi_wr 19:54 ? 00:00:00 [flush-252:198]
1 S root      5600     2  0  80   0 -     0 bdi_wr 19:54 ? 00:00:00 [flush-252:199]
1 S root      5678     2  0  80   0 -     0 worker Dec09 ? 00:01:18 [kworker/30:0]
4 S root      6100 15663  0  80   0 - 28808 unix_s 19:54 ? 00:00:00 sshd: sa537610 [priv]
0 S root      6518  1863  0  80   0 - 26534 wait   19:54 pts/0 00:00:00 /bin/sh /usr/libexec/ipsec/barf
0 D root      6529  6518 39  80   0 -  1049 sleep_ 19:54 pts/0 00:00:00 egrep -q Starting Openswan /var/log/rmlog
0 S sa537610  6542  6293  0  80   0 - 25823 pipe_w 19:54 pts/7 00:00:00 grep -i root
1 S root      6864     2  0  80   0 -     0 worker 16:13 ? 00:00:04 [kworker/20:0]
1 S root      6868     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kmpathd]
1 S root      6869     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kmpath_handlerd]
1 S root      7099     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root      7101     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root      7105     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root      7110     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root      7115     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root      7120     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root      7126     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root      7132     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root      7139     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root      7147     2  0  60 -20 -     0 rescue Nov14 ? 00:00:00 [kdmflush]
# ls /dev/mapper | wc -l
  200
# ps -ef|grep -i kdmflush|wc -l
  200

In cases where there are hundreds of kdmflush processes, compare the output of:

# ls /dev/mapper | wc -l
# ps -ef|grep -i kdmflush|wc -l

If the two counts are roughly equal, this is normal behavior: kdmflush is a kernel thread, and one kdmflush thread exists for each device-mapper device. A high kdmflush count can therefore be ignored when it matches the number of device-mapper devices; it is not itself the cause of the fork failures.
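The comparison can be scripted as a quick sanity check. This is only a sketch of the logic described above; the `[k]dmflush` bracket pattern simply keeps grep from matching its own command line:

```shell
# Count device-mapper devices and kdmflush kernel threads, then compare.
dm_devices=$(ls /dev/mapper | wc -l)
kdmflush_threads=$(ps -ef | grep -c '[k]dmflush')

if [ "$kdmflush_threads" -le "$dm_devices" ]; then
    echo "Normal: at most one kdmflush thread per device-mapper device"
else
    echo "Investigate: more kdmflush threads than device-mapper devices"
fi
```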