Interpreting Displayed Load Average Values in "uptime" Command Output
Describing Load Average
Load average is a measurement provided by the Linux kernel that is a simple way to represent the perceived system load over time. It can be used as a rough gauge of how many system resource requests are pending, and to determine whether system load is increasing or decreasing over time.
Every five seconds, the kernel collects the current load number, based on the number of processes in runnable and uninterruptible states. This number is accumulated and reported as an exponential moving average over the most recent 1, 5, and 15 minutes.
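The exponential moving average described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, floating-point math is used for readability, and the kernel's own fixed-point implementation differs in detail.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples, as described above


def update_load(previous, active, window_seconds):
    """Fold one sample into an exponentially damped moving average.

    previous: the average before this sample
    active: processes counted in this sample (runnable or uninterruptible)
    window_seconds: 60, 300, or 900 for the 1, 5, and 15 minute averages
    """
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    return previous * decay + active * (1.0 - decay)


def simulate(active_counts, window_seconds, start=0.0):
    """Apply update_load to a sequence of samples and return the final value."""
    load = start
    for active in active_counts:
        load = update_load(load, active, window_seconds)
    return load


if __name__ == "__main__":
    # Ten minutes (120 samples) with two processes counted in every sample.
    samples = [2] * 120
    for label, window in (("1 min", 60), ("5 min", 300), ("15 min", 900)):
        print(f"{label:>6}: {simulate(samples, window):.2f}")
```

Under a sustained load, the shorter window converges on the sampled value fastest, which is why the three figures diverge while load is changing.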
Understanding the Linux Load Average Calculation
The load average represents the perceived system load over a time period. Linux determines this by reporting how many processes are ready to run on a CPU, and how many processes are waiting for disk or network I/O to complete.
- The load number is essentially based on the number of processes that are ready to run (in process state R) or are waiting for I/O to complete (in process state D).
- Some UNIX systems only consider CPU utilization or run queue length to indicate system load. Linux also includes disk or network utilization because that can have as significant an impact on system performance as CPU load. When experiencing high load averages with minimal CPU activity, examine disk and network activity.
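To see which of the two states is contributing, one option is to tally process states from /proc. The sketch below assumes a Linux /proc file system and uses hypothetical helper names. It takes a snapshot of per-process states, so it will not exactly match the kernel's own count.

```python
from collections import Counter
from pathlib import Path


def parse_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' instead of on spaces.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_load_states(stat_lines):
    """Count entries in state R (runnable) and D (uninterruptible sleep)."""
    states = Counter(parse_state(line) for line in stat_lines)
    return {"R": states["R"], "D": states["D"]}


def read_stat_lines(proc_root="/proc"):
    """Yield the stat line of every process visible under proc_root."""
    for stat_file in Path(proc_root).glob("[0-9]*/stat"):
        try:
            yield stat_file.read_text(errors="replace")
        except OSError:
            continue  # the process exited between listing and reading


if __name__ == "__main__":
    print(count_load_states(read_stat_lines()))
```

A high D count alongside a low R count points toward disk or network waits rather than CPU contention.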
Load average is a rough measurement of how many processes are currently waiting for a request to complete before they can do anything else. The request might be for CPU time to run the process. Alternatively, the request might be for a critical disk I/O operation to complete, and the process cannot be run on the CPU until the request completes, even if the CPU is idle. Either way, system load is impacted and the system appears to run more slowly because processes are waiting to run.
Interpreting Displayed Load Average Values
The uptime command is one way to display the current load average. It prints the current time, how long the machine has been up, how many user sessions are running, and the current load average.
[[email protected] ~]$ uptime
 15:29:03 up 14 min,  2 users,  load average: 2.92, 4.48, 5.20
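For scripting, the three values can be pulled out of an uptime line like the one above. This is a minimal sketch that assumes the output format shown here (period as the decimal separator); parse_uptime is a hypothetical helper name. Python's standard os.getloadavg() returns the same three numbers without parsing any text.

```python
import os
import re

# Matches the trailing "load average: a, b, c" part of an uptime line.
LOAD_PATTERN = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_uptime(line):
    """Return the (1 min, 5 min, 15 min) load averages from an uptime line."""
    match = LOAD_PATTERN.search(line)
    if match is None:
        raise ValueError("no load average found in: " + line)
    return tuple(float(value) for value in match.groups())


if __name__ == "__main__":
    sample = "15:29:03 up 14 min, 2 users, load average: 2.92, 4.48, 5.20"
    print(parse_uptime(sample))
    try:
        # Available on Unix-like systems; asks the operating system directly.
        print(os.getloadavg())
    except (AttributeError, OSError):
        print("load average is not available on this platform")
```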
The three values for the load average represent the load over the last 1, 5, and 15 minutes. A quick glance indicates whether system load appears to be increasing or decreasing. If the main contribution to load average is from processes waiting for the CPU, you can calculate the approximate per-CPU load value to determine whether the system is experiencing significant waiting.
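The "quick glance" at the trend amounts to comparing the three numbers in order. A small sketch, with a hypothetical function name and a deliberately simple rule:

```python
def load_trend(one, five, fifteen):
    """Classify the direction of the load from the three averages.

    Rising values toward the most recent window suggest increasing load,
    falling values suggest decreasing load, anything else is reported
    as mixed.
    """
    if one > five > fifteen:
        return "increasing"
    if one < five < fifteen:
        return "decreasing"
    return "mixed"


if __name__ == "__main__":
    # Values taken from the uptime example above.
    print(load_trend(2.92, 4.48, 5.20))
```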
The lscpu command can help you determine how many CPUs a system has. In the following example, the system is a dual-core, single-socket system with two hyperthreads per core. Roughly speaking, Linux treats this as a four-CPU system for scheduling purposes.
[[email protected] ~]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
...output omitted...
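The logical CPU count in this output is the product of sockets, cores per socket, and threads per core. The sketch below checks that against the CPU(s) field, using the sample text above as input. The helper names are hypothetical, and the product is assumed to match only when all CPUs are online.

```python
# Sample copied from the lscpu output above.
SAMPLE = """\
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
"""


def parse_lscpu(text):
    """Turn 'Key: value' lines into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def logical_cpus(fields):
    """Multiply sockets, cores per socket, and threads per core."""
    return (
        int(fields["Socket(s)"])
        * int(fields["Core(s) per socket"])
        * int(fields["Thread(s) per core"])
    )


if __name__ == "__main__":
    fields = parse_lscpu(SAMPLE)
    print(logical_cpus(fields), "logical CPUs; lscpu reports", fields["CPU(s)"])
```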
For a moment, imagine that the only contribution to the load number is from processes that need CPU time. Then you can divide the displayed load average values by the number of logical CPUs in the system. A value below 1 indicates satisfactory resource utilization and minimal wait times. A value above 1 indicates resource saturation and some amount of processing delay.
# From lscpu, the system has four logical CPUs, so divide by 4:
#                               load average: 2.92, 4.48, 5.20
#           divide by number of logical CPUs:    4     4     4
#                                             ----  ----  ----
#                       per-CPU load average: 0.73  1.12  1.30
#
# This system's load average appears to be decreasing.
# With a load average of 2.92 on four CPUs, all CPUs were in use ~73% of the time.
# During the last 5 minutes, the system was overloaded by ~12%.
# During the last 15 minutes, the system was overloaded by ~30%.
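The same division can be scripted. The following sketch reproduces the arithmetic above with hypothetical helper names. Like the text, it assumes that CPU demand is the only contributor to the load number.

```python
import os


def per_cpu_load(load_averages, cpu_count):
    """Divide each load average by the number of logical CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(round(value / cpu_count, 2) for value in load_averages)


def describe(per_cpu_value):
    """Summarize one per-CPU value as utilization or overload."""
    if per_cpu_value <= 1:
        return f"CPUs in use about {round(per_cpu_value * 100)}% of the time"
    return f"overloaded by about {round((per_cpu_value - 1) * 100)}%"


if __name__ == "__main__":
    # Figures from the example above: three load averages, four logical CPUs.
    for label, value in zip(("1 min", "5 min", "15 min"),
                            per_cpu_load((2.92, 4.48, 5.20), 4)):
        print(f"{label:>6}: {value:.2f}  {describe(value)}")

    # On a live Unix-like system, the inputs could come from the os module.
    if hasattr(os, "getloadavg"):
        print(per_cpu_load(os.getloadavg(), os.cpu_count() or 1))
```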
An idle CPU queue has a load number of 0. Each process running on or waiting for a CPU adds a count of 1 to the load number. If one process is running on a CPU, the load number is 1: the resource (the CPU) is in use, but there are no requests waiting. If that process keeps running, its contribution to the one-minute load average approaches 1; because the value is an exponential moving average, it takes somewhat longer than one minute to get there.
However, processes uninterruptibly sleeping for critical I/O due to a busy disk or network resource are also included in the count and increase the load average. While not an indication of CPU utilization, these processes are added to the queue count because they are waiting for resources and cannot run on a CPU until they get them. This is still system load due to resource limitations that is causing processes not to run.
Until resources are saturated, the per-CPU load average remains below 1, since tasks are seldom found waiting in the queue. The load average rises only when resource saturation causes requests to remain queued, where they are counted by the load calculation routine. When resource utilization approaches 100%, each additional request starts experiencing service wait time.
A number of additional tools report load average, including w and top.