The load average is a measure of system activity, giving the average number of processes that are either in a runnable or uninterruptible state. It is typically expressed as three numbers, representing the load average over the past one, five, and fifteen minutes.
The load average can be viewed by running the command “uptime” in the terminal. A high load average can indicate that the server is under heavy load and may require additional resources or optimization.
Here is an example of the output you might see when running the “uptime” command on a Linux server:
$ uptime
 13:45:01 up 5 days, 23:32,  3 users,  load average: 0.22, 0.16, 0.13
In this example, the load average is reported as:
- 0.22 over the past 1 minute
- 0.16 over the past 5 minutes
- 0.13 over the past 15 minutes
This means that, on average over the past minute, there were 0.22 processes in a runnable or uninterruptible state; over the past 5 minutes, 0.16 processes; and over the past 15 minutes, 0.13 processes.
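On Linux, these figures come from the kernel and can be read directly, without parsing uptime's output. A minimal sketch, assuming a Linux system where `/proc/loadavg` is available:

```shell
# /proc/loadavg fields: 1-, 5-, and 15-minute averages,
# runnable/total tasks, and the most recently created PID.
cat /proc/loadavg
# Print just the three averages with labels:
awk '{print "1m:", $1, " 5m:", $2, " 15m:", $3}' /proc/loadavg
```

Scripts and monitoring agents typically read this file rather than scraping uptime, since the format is stable and machine-friendly.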
It is important to note that the load average is a relative metric: the ideal value depends on the number of CPU cores and threads and on the overall capacity of the system.
What a high Load Average means
A high load average is generally considered to be a load average that is significantly higher than the number of physical or virtual CPUs in the system. For example, if you have a server with 4 cores, a load average of 4 or less is likely to be normal, while a load average of 8 or higher might indicate that the server is under heavy load.
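A quick way to apply this rule of thumb is to divide the 1-minute figure by the CPU count. A sketch, assuming a Linux system with `nproc` and `/proc/loadavg` available:

```shell
# Normalize the 1-minute load average by the CPU count; a per-core
# value above 1.0 suggests more runnable tasks than the CPUs can serve.
cores=$(nproc)
load=$(awk '{print $1}' /proc/loadavg)
per_core=$(awk -v l="$load" -v c="$cores" 'BEGIN {printf "%.2f", l / c}')
echo "load=$load cores=$cores per-core=$per_core"
```

On the 4-core example above, a load of 8 would print a per-core value of 2.00, i.e. twice the CPU capacity.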
However, it’s important to note that the threshold for high load average can vary depending on the specific usage and resource constraints of the system. For example, if the system is running a lot of IO-bound or background processes that don’t use much CPU, a high load average may be acceptable.
It’s also important to monitor the load average over time: occasional spikes may not be a cause for concern, but a sustained high load usually indicates a problem that needs to be investigated. Monitoring other system metrics, such as CPU utilization, memory usage, disk I/O, and network traffic, gives a better picture of the state of the system and helps identify the cause of a high load average.
If you find that the load average is consistently high, it may be necessary to take action to reduce the load on the system, such as by adding more resources, optimizing the configuration, or scaling out the system.
What a low Load Average may indicate
A low load average can indicate that the system is not being fully utilized and that it has available resources to handle more load. This can happen if the system is not being used to its full capacity, or if there is a lack of demand for the services the system provides.
A low load average can also indicate that the system is over-provisioned and that some resources, such as CPU or memory, are not being fully used. In this case, it may be possible to reduce the costs associated with the system by downsizing or scaling down the resources.
It’s also important to note that a low load average does not necessarily mean that the server is running optimally. It’s possible for a system to have a low load average and still have performance issues or bottlenecks. In this case, it’s necessary to monitor other metrics such as CPU utilization, memory usage, disk I/O and network traffic to get a better understanding of the server’s performance.
In summary, low load average may indicate an underutilized system or a server that is over-provisioned, but it’s important to check other metrics to understand the global performance.
The main bottlenecks that cause a high Load Average
High load average can be caused by several bottlenecks, including:
- CPU usage: If the system’s CPU usage is consistently high, it can cause the load average to increase as the system struggles to keep up with the demand for processing power.
- Memory usage: If the system is running low on memory, the kernel may start swapping memory to disk, which can cause the load average to increase as the system struggles to keep up with the demand for memory.
- Disk I/O: If the system is performing a lot of disk I/O, such as reading or writing large amounts of data, it can cause the load average to increase as the system struggles to keep up with the demand for disk access.
- Network usage: High network usage can cause the load average to increase as the system struggles to keep up with the demand for network access.
- Running a lot of background tasks: a large number of background tasks can drive the load average up, as the scheduler has to switch between them frequently.
- Running out-of-date software: Running out-of-date software can cause the system to have a high load average, as it may not be optimized for the current system resources or usage patterns.
- Badly coded software: Poorly written software, such as software with memory leaks or infinite loops, can cause the load average to increase as the system struggles to keep up with the demand for resources.
It’s important to note that a high load average can be caused by a combination of these factors, and it’s often necessary to monitor several system metrics to identify the cause of high load average.
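One way to separate CPU pressure from I/O pressure is to look at process states. A sketch, assuming a Linux system with GNU `ps` (procps):

```shell
# Processes in uninterruptible sleep (state "D") are usually blocked on
# disk I/O; on Linux they count toward the load average even though
# they consume no CPU.
ps -eo state,pid,comm | awk '$1 ~ /^D/'
# The heaviest CPU consumers, for comparison:
ps -eo pcpu,pid,comm --sort=-pcpu | head -5
```

Many "D"-state processes with low overall CPU usage point at a disk or network-filesystem bottleneck rather than a shortage of processing power.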
The role of databases in a high load average
Databases can play a significant role in causing a high load average, as they can be a major source of resource usage on a system. Some of the ways that databases can cause high load average include:
- High CPU usage: Database processes can consume a significant amount of CPU resources, especially if they are performing complex queries or transactions.
- High memory usage: Database queries can consume large amounts of memory, especially when sorting or caching large result sets, or when the database serves a high number of concurrent connections.
- High disk I/O: Transactions can generate a lot of disk I/O, especially if they are performing frequent writes or updates, or if the storage system is not optimized for the workload.
- High network usage: Databases can generate a lot of network traffic, especially if they are serving many clients or if the network connection is not optimized for the workload.
- Concurrent connections: Having a high number of concurrent connections to the database can cause the load average to increase, as the system must switch between the connections frequently.
- Unoptimized queries: Running unoptimized queries, especially on large datasets or with poor indexing, can cause the load average to increase as the system struggles to keep up with the demand for resources.
- Database maintenance tasks: maintenance tasks such as indexing, backups, and replication can increase the load average while they run, as they compete with regular queries for CPU and disk resources.
It’s important to note that the load average should be monitored over time, and in conjunction with other system metrics to get a complete understanding of the state of the system.
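Since database-driven load spikes often show up as I/O waits rather than CPU usage, a quick check of the iowait share can help. A sketch, assuming a Linux system where `/proc/stat` is available:

```shell
# Share of CPU time spent waiting on I/O since boot (field 6 of the
# aggregate "cpu" line in /proc/stat). A high iowait share alongside a
# high load average points at the storage layer rather than the CPU.
awk '/^cpu /{for (i = 2; i <= NF; i++) total += $i;
             printf "iowait: %.1f%%\n", 100 * $6 / total}' /proc/stat
```

Tools such as iostat and pidstat (from the sysstat package) can then narrow the I/O load down to specific devices and processes.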
How to monitor the server Load Average
There are several ways to monitor the load average on a Linux server:
- The “uptime” command displays the current load average, along with the time since the last reboot and the number of users currently logged in.
- The “top” command provides a real-time view of the processes running on the system, with the load average shown in the first line of its header.
- The “vmstat” command provides a detailed view of system statistics, including the run queue length, memory usage, CPU usage, and disk I/O.
- The “sar” command (part of the sysstat package) can be used to collect and report system activity information, including the load average, memory usage, CPU usage, and disk I/O.
- The “htop” command is similar to top, but provides a more user-friendly interface with additional features such as a process tree view and the ability to filter processes by name.
- Graphical monitoring tools: there are also many graphical monitoring tools available for Linux servers, such as Nagios, Cacti, and Munin, which can be used to monitor the load average and other system metrics over time.
You can also use remote server monitoring software to keep track of the load average of multiple servers from one central location.
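For a lightweight alternative to full monitoring software, a small script can compare the current load against the CPU count. A minimal sketch: the threshold choice (1-minute load above the CPU count) and message format are illustrative, not a standard:

```shell
#!/bin/sh
# Minimal load-average watchdog for Linux: warn when the 1-minute
# load exceeds the number of CPUs.
threshold=$(nproc)
load=$(awk '{print $1}' /proc/loadavg)
high=$(awk -v l="$load" -v t="$threshold" 'BEGIN {print ((l > t) ? 1 : 0)}')
if [ "$high" -eq 1 ]; then
    echo "WARNING: 1-minute load $load exceeds $threshold CPUs"
else
    echo "OK: 1-minute load $load is within $threshold CPUs"
fi
```

Run from cron (e.g. every minute) and pointed at a mailer or alerting hook, this catches sustained overload without any extra software.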
It’s important to note that the load average should be monitored over time and in conjunction with other system metrics to get a complete picture of the system’s performance.