Troubleshooting with ESXTOP

This post looks at some of the most useful esxtop views and columns for troubleshooting performance on an ESXi host. To run esxtop, open a connection to your host's ESXi Shell. This can be done with an SSH client such as PuTTY, connecting on port 22 with your root login details. For more information on enabling and accessing the ESXi Shell, see Using the ESXi Shell.

Once you are logged in to the command line interface, simply type ‘esxtop’ and press Enter. Within esxtop, different keys switch between views so you can examine particular resources: CPU ‘c’, memory ‘m’, network ‘n’ and disk ‘d’. The main parameters for troubleshooting performance issues on an ESXi host are covered below.

At any time you can press space to update the display (it also refreshes automatically every 5 seconds), ‘q’ to quit and ‘h’ for help. Add or remove fields using ‘f’, and if you want to save this configuration for next time, type an upper-case ‘W’.

CPU

To understand the information presented by esxtop, we first need to understand the role of the CPU scheduler. ESXi pools the host's physical CPU resources, such as logical processors, so that the CPU scheduler can allocate them independently. A virtual machine's virtual CPUs (vCPUs) may run on logical processors within the same core or on different physical cores. It is the job of the CPU scheduler to examine the processor topology across sockets, cores and logical processors and optimise the placement of vCPUs across the system.

The second point to understand is that esxtop uses worlds to show CPU usage. A world is a VMkernel schedulable entity, similar to a process or thread in other operating systems. You will see references to both the CPU scheduler and VM worlds in the explanations below.

Once in the esxtop screen press ‘c’ to display CPU statistics.

CPU load average shows the average CPU usage of the ESXi host over the last 1, 5 and 15 minutes. A load average of 0.5 suggests the physical CPUs are half utilised, 1.0 suggests they are fully utilised, and a value above 1.0 means demand exceeds the physical CPU capacity available, so some worlds have to wait to be scheduled.
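As a rough sketch, the rule of thumb above can be written as a small Python function. It simply restates the interpretation in this post (1.0 meaning the physical CPUs are fully utilised); the function name and wording are illustrative, not anything esxtop itself provides.

```python
def describe_load_average(load: float) -> str:
    """Interpret an esxtop 'CPU load average' figure using the rule of thumb
    that 1.0 means the physical CPUs are fully utilised."""
    if load < 0:
        raise ValueError("load average cannot be negative")
    if load < 1.0:
        return f"about {load:.0%} of physical CPU capacity in use"
    if load == 1.0:
        return "physical CPUs fully utilised"
    # Anything above 1.0 is demand the physical CPUs cannot currently serve.
    return f"demand exceeds physical CPU capacity by about {load - 1.0:.0%}"


# The three figures shown on the CPU screen cover 1, 5 and 15 minutes.
for minutes, load in zip((1, 5, 15), (0.5, 1.0, 1.25)):
    print(f"{minutes:>2} min: {describe_load_average(load)}")
```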

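%RDY shows the percentage of time a world was ready to run but waiting for the CPU scheduler to give it a physical CPU. For a virtual machine with several vCPUs this figure is summed across its vCPUs, so expand the VM with ‘e’ to see per-vCPU values. If you are comparing against the CPU ready figures (in milliseconds) in the vSphere Client real-time performance charts, which use a 20-second sample interval, 1% is roughly 200 milliseconds and 100% is roughly 20,000 milliseconds. A value of 5% to 10% or higher (1,000 to 2,000 milliseconds per 20-second sample) could be a cause for concern.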
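The conversion between %RDY and milliseconds is plain arithmetic: ready time divided by the length of the sample interval. The sketch below assumes the 20-second interval of the vSphere Client real-time charts by default; pass 5 seconds to match esxtop's default refresh, where 1% works out to roughly 50 milliseconds. The function names are hypothetical helpers.

```python
def ready_percent_to_ms(ready_pct: float, interval_s: float = 20.0) -> float:
    """Convert a %RDY value into milliseconds of ready time for the interval.

    1% of an interval of N seconds is N * 10 milliseconds.
    """
    return ready_pct * interval_s * 10.0


def ready_ms_to_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert milliseconds of ready time in an interval back into %RDY."""
    return ready_ms / (interval_s * 10.0)


for pct in (1, 5, 10):
    print(f"{pct}% ready = {ready_percent_to_ms(pct):.0f} ms over 20 s, "
          f"{ready_percent_to_ms(pct, 5.0):.0f} ms over 5 s")
```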

%USED shows the percentage of physical CPU time used by the world. A substantially higher value on one virtual machine compared with the others could mean it is causing the performance issues on the host.

%SYS shows the percentage of time spent in the VMkernel performing system services on behalf of the world. A value of 10% to 20% or higher could be a symptom of a virtual machine with high I/O.

%CSTP applies to virtual machines with multiple vCPUs and shows the percentage of time a vCPU spent co-stopped, that is, ready to run but held back while the scheduler co-schedules the VM's other vCPUs. A value above 3% generally means the VM has more vCPUs than it can use efficiently, and the vCPU count should be reduced.

%MLMTD shows the percentage of time a ready-to-run vCPU was not scheduled because of a CPU limit setting. If this value is above 0, the limit is throttling the VM and should be removed (or raised) to improve performance.

%SWPWT shows the percentage of time a world spent waiting for swapped pages to be read from disk. A value above 5% could point to an issue with memory over-commitment.
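To make the thresholds above easier to apply, here is a minimal Python sketch that checks a set of values read off the CPU screen against them. The threshold table uses the lower end of each range quoted in this post, the sample values are made up, and %RDY is assumed to be a per-vCPU figure. %USED is left out because it is only meaningful when compared with other VMs.

```python
from typing import Dict, List

# Rule-of-thumb thresholds from the column descriptions above (lower end of
# each quoted range). A metric is flagged when its value is strictly greater.
CPU_THRESHOLDS: Dict[str, float] = {
    "%RDY": 5.0,    # assumed to be a per-vCPU value
    "%SYS": 10.0,
    "%CSTP": 3.0,
    "%MLMTD": 0.0,
    "%SWPWT": 5.0,
}


def flag_cpu_metrics(sample: Dict[str, float]) -> List[str]:
    """Return the CPU metrics in `sample` that exceed their threshold.

    Metrics missing from `sample` are treated as 0 and never flagged.
    """
    return [
        name
        for name, limit in CPU_THRESHOLDS.items()
        if sample.get(name, 0.0) > limit
    ]


# Hypothetical values read off the esxtop CPU screen for one VM.
cpu_sample = {"%RDY": 12.5, "%USED": 80.0, "%SYS": 4.0,
              "%CSTP": 4.0, "%MLMTD": 0.0, "%SWPWT": 0.0}
print(flag_cpu_metrics(cpu_sample))
```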

Memory

Once in the esxtop screen press ‘m’ to display memory statistics.

At the top of the screen you will see PMEM /MB. These values are in MB and show the total memory of the host (total), the memory used by the ESXi VMkernel (vmk), other memory in use (other) and free memory (free).

Another useful field is the memory state, which categorises the ESXi host's memory based on how much free memory is available relative to the minimum free memory threshold (minfree). A high state means there is enough free memory available to the host. Clear means free memory is below 100% of minfree, and ESXi begins actively calling transparent page sharing (TPS) to collapse pages. Soft means it is below 64% of minfree, and the host begins reclaiming memory using the balloon driver. Hard means it is below 32% of minfree, and the host compresses and swaps memory. Low means it is below 16% of minfree, and ESXi blocks VMs from allocating more RAM.
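The state boundaries above can be expressed as a simple lookup. This is a minimal sketch using only the percentage-of-minfree thresholds described in this post, with an assumed minfree of 1,000 MB in the example; it does not attempt to model how ESXi itself calculates minfree or moves between states over time.

```python
# Action the host takes in each state, as described above.
RECLAIM_ACTION = {
    "high": "enough free memory, no reclamation",
    "clear": "TPS actively collapses pages",
    "soft": "balloon driver reclaims memory",
    "hard": "memory is compressed and swapped",
    "low": "VMs are blocked from allocating more RAM",
}


def memory_state(free_mb: float, minfree_mb: float) -> str:
    """Map free host memory to a state using percentage-of-minfree thresholds."""
    if minfree_mb <= 0:
        raise ValueError("minfree must be positive")
    ratio = free_mb / minfree_mb
    if ratio >= 1.0:
        return "high"
    if ratio >= 0.64:
        return "clear"
    if ratio >= 0.32:
        return "soft"
    if ratio >= 0.16:
        return "hard"
    return "low"


MINFREE_MB = 1000.0  # assumed minfree, for illustration only
for free in (1500.0, 800.0, 500.0, 200.0, 100.0):
    state = memory_state(free, MINFREE_MB)
    print(f"free={free:.0f} MB -> {state}: {RECLAIM_ACTION[state]}")
```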

The other useful columns, listed below, mainly relate to over-commitment of physical memory.

MEM overcommit avg shows the average memory overcommit for the last 1, 5 and 15 minutes.

MCTLSZ shows the amount of guest physical memory, in MB, that the ESXi host is reclaiming by inflating the balloon driver. This occurs when the host is over-committed and does not have enough available physical memory.

SWCUR shows the amount of memory, in MB, currently swapped out by the VMkernel. Again, any value over 0 is a symptom of memory over-commitment.

SWR/s and SWW/s show the rate at which the host is reading from and writing to swapped memory. Once again, any value over 0 indicates possible memory over-commitment.

CACHEUSD shows the amount of memory, in MB, held in the compression cache, i.e. memory the ESXi host has compressed. Compression only occurs when the host is over-committed on memory, so this value should not be above 0.

ZIP/s and UNZIP/s show the rate at which the host is compressing memory and accessing compressed memory, respectively. Values greater than 0 imply the host is over-committed on memory.
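Because every column above follows the same "anything over 0 is a warning sign" rule, a quick check can be sketched in a few lines of Python. The column names match the esxtop headings; the values in the example are hypothetical.

```python
from typing import Dict, List

# Columns where, per the descriptions above, any value over 0 points to
# memory over-commitment on the host.
OVERCOMMIT_COLUMNS = ("MCTLSZ", "SWCUR", "SWR/s", "SWW/s",
                      "CACHEUSD", "ZIP/s", "UNZIP/s")


def overcommit_symptoms(row: Dict[str, float]) -> List[str]:
    """Return the over-commitment columns in `row` whose value is above 0."""
    return [col for col in OVERCOMMIT_COLUMNS if row.get(col, 0.0) > 0]


# Hypothetical values for one VM on the esxtop memory screen.
mem_sample = {"MCTLSZ": 512.0, "SWCUR": 0.0, "SWR/s": 0.0, "SWW/s": 0.0,
              "CACHEUSD": 64.0, "ZIP/s": 0.0, "UNZIP/s": 0.0}
print(overcommit_symptoms(mem_sample))
```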

Network

Once in the esxtop screen press ‘n’ to display network statistics.

For network, look at the %DRPTX and %DRPRX columns. These show the percentage of transmitted and received packets that were dropped, respectively. Values above 0 here could signify high network utilisation.

The USED-BY and TEAM-PNIC columns show each virtual machine (and VMkernel port) on the host and the physical NIC (vmnic) it is using.

Outside of esxtop, but also useful, back at the command line use ‘esxcli network nic list’ to list the available network adapters. To see statistics such as packets and bytes transmitted, received and dropped, enter ‘esxcli network nic stats get -n vmnic0’, changing the vmnic as appropriate.
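If you want a drop percentage similar in spirit to %DRPRX and %DRPTX from the esxcli counters, the sketch below parses "Name: value" lines and divides dropped packets by all packets seen. The sample output is illustrative and its field names are an assumption, so check them against your own host. The esxcli counters are assumed to be cumulative rather than per refresh interval, so the result is not directly comparable with what esxtop shows, and esxtop may calculate its percentages differently.

```python
from typing import Dict

# Illustrative output only: the field names are assumed to match what
# 'esxcli network nic stats get -n vmnic0' prints; check them on your host.
SAMPLE_OUTPUT = """NIC statistics for vmnic0
   Packets received: 98000
   Packets sent: 50000
   Bytes received: 120000000
   Bytes sent: 40000000
   Receive packets dropped: 2000
   Transmit packets dropped: 0
"""


def parse_nic_stats(text: str) -> Dict[str, int]:
    """Turn 'Name: number' lines into a dict, skipping non-counter lines."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def drop_percent(dropped: int, delivered: int) -> float:
    """Dropped packets as a percentage of all packets seen (delivered + dropped)."""
    total = delivered + dropped
    return 0.0 if total == 0 else dropped / total * 100.0


stats = parse_nic_stats(SAMPLE_OUTPUT)
rx_drop = drop_percent(stats["Receive packets dropped"], stats["Packets received"])
tx_drop = drop_percent(stats["Transmit packets dropped"], stats["Packets sent"])
print(f"RX dropped: {rx_drop:.2f}%  TX dropped: {tx_drop:.2f}%")
```

To approximate a current rate rather than a lifetime figure, run the command twice a few seconds apart and apply the same calculation to the differences between the two sets of counters.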

These are just some parts of esxtop I have found useful; it goes much deeper, with more commands and syntax documented here: https://communities.vmware.com/docs/DOC-9279.
