Load Average

Linux commands such as top, uptime and w report the system load average. For instance, the first line of the output of top looks something like this:

top - 04:56:31 up 60 days, 23:18, 1 user, load average: 0.02, 0.06, 0.05

The last three numbers give the load average over the last one, five and fifteen minutes respectively. Load average indicates whether the system's resources (mainly the CPU) have been adequate for the system load, that is, the processes that were running, runnable or in uninterruptible sleep during the previous n minutes.

A process is running when it has the CPU. A runnable process is one that is waiting for the CPU. A process is in uninterruptible sleep when it is waiting for a resource and cannot be interrupted; it returns from sleep only when the resource becomes available or a timeout occurs (if it went to sleep with a timeout). For example, a process may be waiting for disk or network I/O. Runnable processes indicate a shortage of CPUs: for each additional CPU, one runnable process would have been given that CPU and would have become a running process. Similarly, processes in uninterruptible sleep indicate I/O bottlenecks.

The load number at any instant is the number of running, runnable and uninterruptible-sleep processes in the system (we will call these collectively runnable processes). The load average is an exponential moving average of the load number, in which recent samples count the most and n (1, 5 or 15 minutes) sets how quickly older samples fade.
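The exponential moving average described above can be sketched in a few lines. This is a floating-point illustration only, not the kernel's actual fixed-point code: it assumes the load number is sampled every 5 seconds (the interval the Linux kernel is generally documented to use) and that each average decays with a time constant of 1, 5 or 15 minutes. The names update_load_average, simulate, SAMPLE_INTERVAL and PERIODS, as well as the sample workload, are made up for this example.

```python
import math

SAMPLE_INTERVAL = 5.0           # seconds between samples (assumption of this sketch)
PERIODS = (60.0, 300.0, 900.0)  # 1, 5 and 15 minutes, in seconds


def update_load_average(average, load_number, period, interval=SAMPLE_INTERVAL):
    """Fold one sample of the load number into an exponential moving average."""
    decay = math.exp(-interval / period)
    return average * decay + load_number * (1.0 - decay)


def simulate(samples):
    """Return the 1, 5 and 15 minute averages after feeding in all samples."""
    averages = [0.0, 0.0, 0.0]
    for load_number in samples:
        averages = [
            update_load_average(avg, load_number, period)
            for avg, period in zip(averages, PERIODS)
        ]
    return averages


if __name__ == "__main__":
    # Hypothetical workload: 2 runnable processes for one minute (12 samples).
    one, five, fifteen = simulate([2] * 12)
    print(f"{one:.2f} {five:.2f} {fifteen:.2f}")
```

Starting from an idle system, one minute of a constant load number of 2 moves the 1-minute average only about 63% of the way to 2, which is why the reported figures lag behind sudden changes in load.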

If the load average on a single-CPU system is 5, the situation is undesirable: one process runs on the CPU while the other 4 wait their turn, so the system is overloaded by 400%. With two CPUs and a load average of 5, we are better off, but not by much: 2 of the 5 processes get a CPU and the other 3 have to wait, so demand exceeds the two-CPU capacity by 150%. If the number of CPUs equals the number of runnable processes, the system breaks even. In the top output above, assuming a single-CPU system, the load average for the last minute is 0.02, which indicates that the system was about 98% underloaded. Load average figures help in judging how well a system is keeping up with the processing demands placed on it, and they provide useful input for load balancing across systems.
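The arithmetic above can be written as a small helper. This is a minimal sketch under one stated convention: the excess is expressed relative to the capacity of all CPUs together, so a load of 5 on two CPUs comes out as 150%. The function name excess_load_percent is hypothetical; os.getloadavg() and os.cpu_count() are standard-library calls, and os.getloadavg() is available on Unix-like systems only.

```python
import os


def excess_load_percent(load_average, cpu_count):
    """Percentage by which the load exceeds (positive) or falls short of
    (negative) the capacity of cpu_count CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return (load_average - cpu_count) / cpu_count * 100.0


if __name__ == "__main__":
    one_minute, _, _ = os.getloadavg()  # Unix-like systems only
    cpus = os.cpu_count() or 1          # fall back to 1 if the count is unknown
    percent = excess_load_percent(one_minute, cpus)
    print(f"{percent:+.0f}% relative to {cpus} CPU(s)")
```

A positive result means processes are waiting for a CPU, zero is the break-even point, and a negative result means there is spare capacity.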

/proc/loadavg

Load average is available in the file /proc/loadavg, from which commands like top read it. The file can be printed directly:

$ cat /proc/loadavg
0.14 0.05 0.06 1/122 13870

The first three fields of /proc/loadavg give the load average for 1 minute, 5 minutes and 15 minutes respectively. The fourth field has two numbers separated by a slash. The number before the slash is the number of kernel scheduling entities (processes and threads) that are currently runnable; because it includes entities waiting for a CPU, it can exceed the number of CPUs in the system. The number after the slash is the total number of kernel scheduling entities that currently exist in the system. The last field is the process ID of the most recently created process.
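As an illustration of the format, the following sketch splits the single line of /proc/loadavg into its five fields. The names LoadAvg and parse_loadavg, and the field names, are invented for this example and are not part of any standard API.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float      # 1-minute load average
    five: float     # 5-minute load average
    fifteen: float  # 15-minute load average
    runnable: int   # number before the slash in the fourth field
    total: int      # number after the slash in the fourth field
    last_pid: int   # process ID in the last field


def parse_loadavg(text):
    """Parse the single line found in /proc/loadavg."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


if __name__ == "__main__":
    with open("/proc/loadavg") as f:  # Linux only
        print(parse_loadavg(f.read()))
```

For the sample line shown earlier, parse_loadavg("0.14 0.05 0.06 1/122 13870") yields the load averages 0.14, 0.05 and 0.06, the counts 1 and 122, and the process ID 13870.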

CPU utilization versus load average

Load average gives a better picture than CPU utilization of the system's ability to meet processing requirements. CPU utilization is the time the CPU spends working, expressed as a percentage of the total elapsed clock time. On processing-intensive systems this could be 100%. On a single-CPU system, the load average can be any non-negative number, and whenever it is greater than 1 the CPU utilization may be around 100%. The problem becomes more and more severe as the load average climbs (2, 3, 4, ...), yet in all of these cases the CPU utilization may stay around 100% and so does not show the difference.
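The contrast can be made concrete with an idealized model. The sketch below assumes that every process counted in the load is purely CPU-bound and can keep one CPU fully busy, which real workloads with I/O waits will not match exactly; the function name cpu_utilization_percent is hypothetical.

```python
def cpu_utilization_percent(busy_processes, cpu_count=1):
    """Idealized utilization when each busy process can keep one CPU fully occupied."""
    return min(busy_processes, cpu_count) / cpu_count * 100.0


if __name__ == "__main__":
    # On a single CPU, utilization saturates while the load keeps growing.
    for load in (1, 2, 3, 10):
        print(load, cpu_utilization_percent(load))
```

Under these assumptions, utilization reaches its ceiling as soon as there is one busy process per CPU, whereas the load number continues to rise with every additional waiting process.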
