Confusing server load average explained!

Server load average is a pretty big term in the web hosting industry. Customers trust the servers with the least CPU load. Moreover, I have seen that they feel very secure when they are on a server averaging a CPU load of less than 1. I am very familiar with one question from new customers on the live chat desk: what is your average CPU load? Let me go deeper into this discussion and see if I can find something new for you.

A modern operating system provides many metrics for measuring current system performance. CPU load average is one of them. On Linux it is exposed through the proc file system (as /proc/loadavg) and is readable from user space.
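As a rough illustration of reading this metric from user space, here is a minimal Python sketch. It assumes a Linux system where /proc/loadavg holds five space-separated fields (the 1, 5, and 15 minute averages, a running/total count of scheduling entities, and the last PID created). The function names parse_loadavg and read_loadavg are my own, not part of any library.

```python
from pathlib import Path


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dictionary.

    Assumed format (Linux): "0.42 0.35 0.30 2/345 12345"
    """
    fields = text.split()
    one, five, fifteen = (float(value) for value in fields[:3])
    # Fourth field is "currently runnable / total" scheduling entities.
    running, total = (int(value) for value in fields[3].split("/"))
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "running": running,
        "total": total,
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load averages from the proc file system."""
    return parse_loadavg(Path(path).read_text())


if __name__ == "__main__":
    # Only meaningful on systems that expose /proc/loadavg (e.g. Linux).
    if Path("/proc/loadavg").exists():
        print(read_loadavg())
```

These are the same three numbers that uptime and top print on their first line.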

Now, let's come to what this metric means. I have found a couple of articles explaining the definition, and they seem pretty good. One of the most reputable members of Webhostingtalk, who is also a moderator there, has an article explaining this on his blog:

“Server load – just a number?

Well, yes, basically the server load is a number. This number is usually under the x.xx format and can have values starting from 0.00. It expresses how many processes are waiting in the queue to access the processor(s). Of course, this is calculated for a certain period of time and of course, the smaller the number, the better. A high number is often associated with a decrease in the performance of the server.”

So, it clearly states that CPU load tells you the number of processes waiting for your server's processor to execute them. Or does it? Let me tell you, that is a very wrong definition of CPU load. Here is something from the manual of the “uptime” Unix command:

“uptime gives a one line display of the following information. The current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.”

That means the average you see does not show the processes waiting right now, but the processes that waited over the past 1, 5, and 15 minutes. (Solaris includes some running processes, but it cannot predict at all which processes will be waiting in the next minute's queue.) So, if the CPU load is 4 and you have 4 CPUs, does that mean 4 processes were waiting for CPU access in the last 60 seconds? Does that really seem like a lot for today's servers with RAID-controlled hard drives and 4 or 8 GB of RAM? Just to note, the Linux kernel treats a thread as a process. Threads can improve performance a lot, which is why most people use thread-based models these days. So, where Linux is concerned, the 4 waiting processes could just as well be 4 threads. And in some cases, threads can be served faster than processes.

We are done with the definition, so now let's go deeper into CPU load and CPU performance. I have seen people state that a CPU is running at 100% or 200% after seeing the past load average cross the number of CPUs. That is a completely wrong way to measure CPU performance. You can never use 200% of your CPU. If that were possible, I highly doubt you would ever have seen a multi-processor or multi-core CPU. This metric was not made to measure CPU performance at all.

CPU performance depends on the time the CPU remains idle. More idle CPU time means a more stable CPU. Now the question may come: how can I measure the idle CPU time? A system admin can do this using the “sysstat” software, which most Linux distributions include. You can also check the idle CPU using the top command or sar. And how does the system count idle CPU time? It excludes the time spent on user processes, system software or services, and I/O wait; what remains is the idle CPU.

Does the idle CPU time have any impact on server load? It may or may not; you cannot tell from the load figure alone. One thing I can assure you: the more your server load fluctuates, the more your idle CPU is getting exhausted. So you can measure the fluctuation of the CPU load to understand how your system is performing. Here is what I believe about server/CPU load: the more stable your server/CPU load, the more stable the server you are on. You should find your sites loading pretty fast, and if not, it's time to contact hosting support. I have seen many good hosting companies share mpstat/sar output covering 30 minutes or so with their clients, to reassure them by showing the real CPU usage.
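One way to put a number on "fluctuation" is to sample the 1-minute load average at regular intervals and look at how widely the samples spread. The sketch below is only an illustration of that idea under my own assumptions: the function name load_fluctuation is hypothetical, the sample values are made up, and the choice of standard deviation and range as the measures is mine, not an established rule.

```python
import statistics


def load_fluctuation(samples):
    """Summarise how much a series of load-average samples moves around.

    samples: 1-minute load averages collected at regular intervals,
             for example once a minute for half an hour.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to measure fluctuation")
    return {
        "mean": statistics.mean(samples),
        # Population standard deviation: typical distance from the mean.
        "stdev": statistics.pstdev(samples),
        # Distance between the calmest and the busiest sample.
        "range": max(samples) - min(samples),
    }


if __name__ == "__main__":
    # Made-up samples, purely for illustration.
    steady = [1.9, 2.0, 2.1, 2.0, 1.9, 2.1]
    jumpy = [0.3, 4.8, 0.5, 6.2, 0.4, 5.1]
    print("steady:", load_fluctuation(steady))
    print("jumpy: ", load_fluctuation(jumpy))
```

Two servers with a similar mean load can look very different by this measure, which is the point: a steady figure and a jumpy one are not telling you the same thing.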

Most system admins these days tune the server to make sure it keeps more idle CPU time. I have seen a CPU load of 10 on a 4-CPU system with 60% idle CPU, running a backup/log process at low priority. The low priority gives the backup/log process smaller time slices, so it spends more time waiting for the CPU; simple math. When the idle CPU time frequently drops to 0-5%, the server becomes sluggish. It is not really right to judge CPU usage from the server/CPU load average of a Linux system. Most importantly, misreading this metric as a CPU usage figure like 200% is what I don't feel is right!
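To look at idle CPU time directly instead of guessing it from the load average, you can take two snapshots of the aggregate "cpu" line in /proc/stat and compare them, which is roughly what tools like top and sar report. This is a minimal sketch assuming the Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are my own, and I/O wait is counted as non-idle here, in line with the description above.

```python
import time
from pathlib import Path


def parse_cpu_line(line):
    """Return the tick counters (user .. steal) from the aggregate cpu line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Keep at most eight counters; guest time is already included in user.
    return [int(value) for value in parts[1:9]]


def idle_percent(before, after):
    """Percentage of CPU time spent idle between two 'cpu' line snapshots."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("snapshots must be taken at two different times")
    # Index 3 is the idle counter; iowait (index 4) counts as non-idle.
    return 100.0 * deltas[3] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as handle:
        return handle.readline()


if __name__ == "__main__":
    # Only meaningful on systems that expose /proc/stat (e.g. Linux).
    if Path("/proc/stat").exists():
        first = read_cpu_line()
        time.sleep(1)
        second = read_cpu_line()
        print(f"idle over the last second: {idle_percent(first, second):.1f}%")
```

Putting this figure next to the load average is how a case like the one above shows up: a load of 10 on 4 CPUs alongside plenty of idle CPU.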

Happy reading, and never feel guilty about asking your host for an explanation. You put your most important data on their hard drives, so trust them! 🙂
