CPU load becomes more relevant with virtualisation

By David Mytton,
CEO & Founder of Server Density.

Published on the 22nd December, 2011.

On Linux, the decimal CPU load you can see with tools like top can generally be thought of as a queue indicator. If you have a single CPU[1], a load below 1 means nothing is having to wait for CPU time, while a load above 1 means processes are queuing for it. This shows up as slower overall response times, or even as requests backing up and timing out as the queue grows.
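To make the single-CPU rule concrete on a multi-CPU machine, the load can be divided by the number of CPUs before comparing it with 1. The following is a minimal sketch, assuming a Unix system where Python's `os.getloadavg()` and `os.cpu_count()` are available. The helper names `load_per_cpu` and `describe` are hypothetical. Note that Linux's load average also counts tasks blocked in uninterruptible I/O, so treat the result as a rough guide rather than a pure CPU queue length.

```python
import os


def load_per_cpu(load: float, cpus: int) -> float:
    """Normalise a load average by the number of CPUs (hypothetical helper)."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


def describe(load: float, cpus: int) -> str:
    """Interpret a load average using the 'below 1 / above 1 per CPU' rule."""
    ratio = load_per_cpu(load, cpus)
    if ratio < 1.0:
        return "spare capacity: on average nothing is waiting for CPU time"
    if ratio == 1.0:
        return "fully utilised: no spare capacity, but no queue either"
    return "saturated: processes are queuing for CPU time"


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() returns the 1, 5 and 15 minute load averages.
    one_min, _five_min, _fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    print(f"1-min load {one_min:.2f} on {cpus} CPU(s): {describe(one_min, cpus)}")
```

Normalising per CPU keeps a single threshold of 1 meaningful regardless of instance size.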

When we were using physical hardware, we generally ignored CPU load. This was probably partly because our servers were fairly lightly loaded in those days, but also because of the high-spec, multi-socket, multi-core CPUs we had installed. On physical hardware, those CPUs were 100% dedicated to serving our requests.

The same principle still applies with virtualisation, but another factor comes into play: the host workload. As a VM, you rely on the host's virtualisation layer to share out the physical CPU resources amongst all the guests, including yourself. If you don't control the host, as with a VPS or a cloud provider like Amazon EC2, the usage of the other guests on that host can affect you in unpredictable ways.

This is nothing new, and is one of the caveats of public virtualised environments, but it means CPU load becomes relevant again. You might see low CPU utilisation % but high CPU load, because your requests for CPU time are being queued.

Linux also has a metric called CPU steal, shown as st in the top output. It indicates how much time your virtual CPU spent waiting while the hypervisor was servicing something other than your VM. It's generally associated with the Xen hypervisor (which Amazon EC2 uses) but is also valid on VMware platforms, where the equivalent VMware metric is CPU ready time. You can therefore see low usage from your own processes (e.g. User %) alongside a high CPU steal %, while the CPU load stays high because your runnable processes are still waiting for CPU time.
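The steal percentage can also be computed directly from the kernel's counters, which is roughly what monitoring tools do. This is a minimal sketch, assuming a Linux guest where the first line of `/proc/stat` is the aggregate `cpu` line with fields in the order user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice. It is similar in spirit to top's st value but is not top's exact code; the function names are hypothetical.

```python
import os
import time

# Field order of the aggregate "cpu" line in /proc/stat on Linux.
# guest and guest_nice follow steal but are already counted in user/nice,
# so they are deliberately left out of the total.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line into a dict of cumulative tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels may omit trailing fields such as steal; treat them as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def steal_percent(before: dict, after: dict) -> float:
    """Share of elapsed CPU time stolen by the hypervisor between two samples."""
    total = sum(after[f] - before[f] for f in FIELDS)
    if total <= 0:
        return 0.0
    return 100.0 * (after["steal"] - before["steal"]) / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # the first line is the aggregate across all CPUs


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1)  # the counters are cumulative, so compare two samples
    second = parse_cpu_line(read_cpu_line())
    print(f"steal: {steal_percent(first, second):.1f}%")
```

Because the counters are cumulative since boot, only the difference between two samples says anything about current contention.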

[Image: top output showing CPU steal %]

Together, CPU load and CPU steal let you see whether VM performance problems are caused by the host, in which case you may need to upgrade your instance type or move to dedicated hardware.
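The reasoning above can be expressed as a simple rule: if the per-CPU load shows a queue and steal is high, the host is the likely culprit; if there is a queue but little steal, your own workload is. This is a minimal sketch; the `Sample` type, the `diagnose` function and both thresholds are hypothetical and chosen only for illustration, so they would need tuning for a real workload.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    load_1min: float  # 1-minute load average from the guest
    cpus: int         # number of virtual CPUs visible to the guest
    steal_pct: float  # % of CPU time stolen by the hypervisor


# Hypothetical thresholds for illustration only.
LOAD_ALERT_RATIO = 1.0   # per-CPU load above this means work is queuing
STEAL_ALERT_PCT = 10.0   # steal at or above this suggests host contention


def diagnose(s: Sample) -> str:
    """Classify a sample as ok, host contention, or a busy guest."""
    per_cpu = s.load_1min / max(s.cpus, 1)
    if per_cpu <= LOAD_ALERT_RATIO:
        return "ok"
    if s.steal_pct >= STEAL_ALERT_PCT:
        return "host contention"  # queue driven by other guests on the host
    return "guest busy"           # our own workload is saturating the CPUs


print(diagnose(Sample(load_1min=3.0, cpus=1, steal_pct=40.0)))
```

A "host contention" result is the case where upgrading the instance type or moving to dedicated hardware is worth considering.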

[1] As a very simple explanation, the threshold scales with the number of CPUs: on a machine with 4 CPUs, a load of 4 means every CPU is busy with nothing waiting. See the Wikipedia article on load for more details.
