Troubleshooting and Performance Monitoring Virtualized Environments
Revision as of 17:51, 30 July 2012
A virtual environment brings new considerations to troubleshooting and performance monitoring. Those considerations are discussed in this section.
Performance indicators are still valid from within virtual machines. For the UC applications that support it, use RTMT or the perfmon data to analyze the performance of the UC application. Data from these tools provides a view of the guest performance: disk, CPU, memory, and other details.
Move to the VMware infrastructure when there is a need to get the perspective from the ESXi host. Use the vSphere Client to view data:
- If vCenter is available, historical data is available through the client
- If vCenter is not available, live data from the host is available through the client
VMware and VM Configuration
- Verify your virtualization configuration matches the UC requirements/restrictions
- Verify that your VM conforms to the specifications of one of the supported configurations available from the OVA for the specific release of the application you are running.
|Note:||The released OVAs include virtual disk drives with aligned partition(s) (to optimize performance). It is required that the OVA be used to create the virtual machine.|
|Note:||Recall that VMware vCenter is mandatory for UC on UCS Specs-based and HP/IBM Specs-based, as described here. VMware vCenter is optional for UC on UCS TRC deployments.|
Just like some of the UC applications, vCenter can be configured to save more performance data. The more historical data is saved, the more disk space the vCenter database needs. This is one of the main reasons to use vCenter rather than going directly to the ESXi host for performance data: vCenter can save historical data that the ESXi host does not keep.
The settings that control the amount of historical data saved by vCenter are located in the vSphere client under Administration > Server Settings. The statistics level can be set for each interval duration and save time. The statistics levels range from 1 to 4, with level 4 containing the most data. View the data size estimates to ensure there is enough space to keep all statistics.
For a UC on UCS Specs-based or HP/IBM Specs-based deployment, Statistics Level 4 is required on all statistics. Configuring VMware vCenter to capture detailed logs, as shown in Figure 1 below, is strongly recommended. If not configured by default, Cisco TAC may request enabling these settings in order to troubleshoot problems.
VMware Performance Indicators
The following table lists the performance indicators to monitor and view from a VMware perspective when a virtual machine is having suboptimal (or bad) performance. Most counters are from the ESXi host, which can give a perspective of VM interactions and overall host and data store utilization.
|Performance Area||Object||Counter||Acceptable range|
|CPU||Host||Usage||Less than 80%|
|CPU||Virtual Machine||Ready||Less than 3%|
|Memory||Host||Consumed||General trend is stable|
|Memory||Host||Balloon/Swap used||0 KB|
|Disk||Specific datastore||Kernel command latency||Less than 3ms|
|Disk||Specific datastore||Physical device command latency||Less than 20ms|
|Disk||Specific datastore||Average commands issued per second||Less than LUN capacity|
|Network||Host||Receive packets dropped/Transmit packets dropped||0 packets|
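The acceptable ranges in the table above can be turned into a simple automated check. The following is a minimal sketch, not a Cisco or VMware tool: the counter names used as dictionary keys are hypothetical, and the sample values are assumed to have been collected by some other means (for example, read off the vSphere Client charts). The "general trend is stable" and "less than LUN capacity" rows are left out because they need site-specific judgment.

```python
# Upper limits transcribed from the table above ("less than" rows).
# The key names are hypothetical labels chosen for this sketch.
THRESHOLDS = {
    "host_cpu_usage_pct": 80.0,
    "vm_cpu_ready_pct": 3.0,
    "kernel_latency_ms": 3.0,
    "device_latency_ms": 20.0,
}

# Counters that the table expects to be exactly zero.
MUST_BE_ZERO = ("balloon_kb", "swap_used_kb", "rx_dropped", "tx_dropped")


def find_violations(sample):
    """Return the names of counters in `sample` that are outside the acceptable range.

    Counters missing from `sample` are skipped rather than reported.
    """
    violations = []
    for name, limit in THRESHOLDS.items():
        # The table says "less than", so reaching the limit counts as a violation.
        if name in sample and sample[name] >= limit:
            violations.append(name)
    for name in MUST_BE_ZERO:
        if sample.get(name, 0) != 0:
            violations.append(name)
    return violations


if __name__ == "__main__":
    # Illustrative values only, not measurements from a real host.
    sample = {"host_cpu_usage_pct": 85.0, "vm_cpu_ready_pct": 1.2, "swap_used_kb": 512}
    print(find_violations(sample))
```

A check like this is only a first filter; a counter that violates its limit still needs to be investigated using the sections that follow.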
Physical Hardware Serviceability Items
|Area||Top Items||View at||Alerted How?|
||ESXi Host or vCenter||SNMP/Email(via vCenter)|
||ESXi Host or vCenter||SNMP/Email(via vCenter)|
|| ESXi Host or vCenter ||SNMP/Email(via vCenter)|
||ESXi Host or vCenter (DAS only)||SNMP/Email(via vCenter)|
ESXi Host or vCenter
|| ESXi Host or vCenter(C-series)|
|IO Controller|| ||ESXi Host or vCenter(DAS only)||SNMP/Email(via vCenter)|
|Note:||The vSphere client can be used to view the data and alarms. vCenter is required for any automatic notification.|
High CPU usage could be due to a small number of VMs taking all of the resources or to too many VMs running on the host. If too many VMs are running, look at the VMs on the host and see whether CPU reservations are in use (see oversubscription section). To isolate a CPU issue for a particular VM, consider moving it to another ESXi host.
To view the CPU performance indicators, go to the ESXi host's Performance tab and select the Advanced button. Under Chart options, select CPU, timeframe, and then only the host (not individual cores) to view overall CPU usage on the host. You can view each VM's CPU usage from the Virtual Machines tab on the host.
To get a view of the reservations set by all of the VMs, use the Resource Allocation tab of the cluster.
|Note:||The "Resource Allocation" tab is only available via vCenter.|
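The performance indicator table gives the acceptable CPU Ready range as a percentage, while performance charts may report Ready as a time summation in milliseconds. The sketch below shows the conversion, assuming the counter is a millisecond summation over one chart sample interval and that the real-time chart interval is 20 seconds; both are assumptions to verify against your vSphere version, and the function name is hypothetical.

```python
def cpu_ready_percent(ready_ms, interval_s=20.0):
    """Convert a CPU Ready summation (milliseconds) to a percentage of the interval.

    ready_ms   -- Ready time accumulated during one sample interval, in ms
    interval_s -- length of the sample interval in seconds (assumed 20 s
                  for real-time charts; historical intervals are longer)
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


if __name__ == "__main__":
    # Illustrative value only: 450 ms of Ready time in a 20 s sample.
    pct = cpu_ready_percent(450)
    print(f"CPU Ready: {pct:.2f}% (acceptable range is less than 3%)")
```

If the value is summed across several vCPUs, it may also need to be divided by the number of vCPUs before it is compared with the 3% guideline.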
Our guidelines do not support memory sharing between VMs. To verify, check the following performance indicators to make sure the swapping and ballooning counters are zero. If a given VM does not have enough memory and there are no memory issues on the specific host, consider increasing the VM's memory.
To view the memory performance indicators, go to the ESXi host's Performance tab and select the Advanced button. Under Chart options, select Memory and Timeframe, then select the following counters:
- Used memory (to view general trends)
- Swap used
Swap and Balloon should always be zero; otherwise, memory sharing is being used (which should not be the case).
Bad disk performance often shows up as high CPU usage. IOPS data can provide information on how hard the application/VM is working the disks. Specific activities can cause spikes in IOPS: upgrades and DB maintenance are two examples. If VMs running on the same datastore are all doing these activities at the same time, the disks might not be able to keep up. IOPS data can be seen from vCenter or the SAN. Disk latency (response time) is a good indicator of disk performance.
To view the disk performance indicators, go to the ESXi host's Performance tab and select the Advanced button. The appropriate datastore needs to be selected, which can be found on the datastore page (see below). Under Chart options, select Disk and Timeframe, then select the following counters:
- Physical device command latency
- Kernel command latency
- Average commands issued per second
The kernel counter should not be greater than 2-3 ms. The physical device counter should not be greater than 15-20 ms. The "Average commands issued per second" counter can be used if IOPS are not available from the SAN. Consider IOPS if the datastore appears to be overloaded. This IOPS data is viewable from the host and from each VM. Note that for NFS datastores, the physical device and kernel latency data is not available. Starting with VMware 4.0 Update 2, the esxtop command (see below) can be used to view NFS counters, in particular the guest latency (called GAVG in esxtop). The guest latency is the sum of the physical device and kernel latencies.
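The relationship between the three latency figures can be written out as a short calculation. This is a minimal sketch using the upper ends of the limits quoted above (3 ms kernel, 20 ms physical device); the function and constant names are hypothetical, and the input values are assumed to have been read from the performance charts or esxtop.

```python
# Upper ends of the ranges quoted above, in milliseconds.
KERNEL_LIMIT_MS = 3.0
DEVICE_LIMIT_MS = 20.0


def guest_latency_ms(kernel_ms, device_ms):
    """Guest latency is the sum of the kernel and physical device latencies."""
    return kernel_ms + device_ms


def assess_datastore(kernel_ms, device_ms):
    """Return a list of findings for one datastore; an empty list means no concern."""
    findings = []
    if kernel_ms > KERNEL_LIMIT_MS:
        findings.append("kernel command latency above limit")
    if device_ms > DEVICE_LIMIT_MS:
        findings.append("physical device command latency above limit")
    return findings


if __name__ == "__main__":
    # Illustrative values only.
    kernel, device = 1.5, 12.0
    print("guest latency (ms):", guest_latency_ms(kernel, device))
    print("findings:", assess_datastore(kernel, device) or "none")
```

Keeping the two components separate matters because they can point to different layers: the kernel figure is measured inside the ESXi host, while the physical device figure reflects the storage behind it.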
On the C-series UCS servers there have been issues with the write cache battery backup. If this battery is not operating correctly, performance will suffer. Use a tool like wbemcli to verify that the battery is OK. An example using wbemcli:
wbemcli ei -noverify 'https://root:<password>@<ESXi Host IP>:5989/root/cimv2:VMware_HHRCBattery'
See the MegaCli User Guide for more information.
Generally, network performance issues show up as dropped packets. If dropped packets are seen from an ESXi host, investigate the network infrastructure, which might include a virtualized switch (Nexus 1000V). In ESXi 4.1, issues have been seen with large file transfers (for example, SFTP/FTP transfers). For this issue, the Large Receive Offload (LRO) options need to be disabled on the ESXi host. These settings are found on the host's Configuration tab -> Advanced Settings -> Net.*. Note that there are several LRO settings on this page and all of them need to be disabled. If a VM has been cloned and uses static MAC addresses, verify that there are no duplicate MAC addresses in the network.
To view the network performance indicators, go to the ESXi host's Performance tab and select the Advanced button. Under Chart options, select Network and Timeframe, then select the following counters:
- Receive packets dropped
- Transmit packets dropped
The main thing to check is that no packets are getting dropped in the network.
|Note:||Advanced network debugging and configuration can be done on Nexus 1000v (if used, which requires vCenter and Enterprise Plus licensing).|
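For the duplicate MAC address check on cloned VMs mentioned above, a small script can compare addresses once they have been collected. The sketch below assumes you have already gathered (VM name, MAC address) pairs by hand or from an inventory export; the function name and the sample VM names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_macs(inventory):
    """Return {mac: [vm names]} for every MAC address used by more than one VM.

    inventory -- iterable of (vm_name, mac_address) pairs
    """
    by_mac = defaultdict(list)
    for vm_name, mac in inventory:
        # Normalize case and separator so 00-50-56-AA-BB-01 matches 00:50:56:aa:bb:01.
        key = mac.strip().lower().replace("-", ":")
        by_mac[key].append(vm_name)
    return {mac: vms for mac, vms in by_mac.items() if len(vms) > 1}


if __name__ == "__main__":
    # Illustrative inventory only.
    inventory = [
        ("cucm-pub", "00:50:56:AA:BB:01"),
        ("cucm-sub", "00:50:56:aa:bb:02"),
        ("cucm-clone", "00-50-56-aa-bb-01"),
    ]
    for mac, vms in find_duplicate_macs(inventory).items():
        print(f"{mac} is used by: {', '.join(vms)}")
```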
Alternate Access to Performance Data
If vCenter and/or the vSphere client are not available, some real-time data can be pulled using command line tools. If you have a vMA VM, then the resxtop tool can be used. The resxtop tool is a remote version of the esxtop tool. Otherwise, the esxtop tool can be used directly on the ESXi host (root access must be enabled). See http://communities.vmware.com/docs/DOC-11812 for details on esxtop.
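esxtop can also write its counters to a CSV file in batch mode (for example, `esxtop -b -d 5 -n 12 > stats.csv`), which makes the data available for offline review. The following is a minimal sketch of summarizing such a file. It assumes only that the first row is a header and the remaining rows are samples; the column-name substring you search for is an assumption that must be checked against the header of your own capture, and the function name is hypothetical.

```python
import csv
import io


def column_maxima(csv_text, header_substring):
    """Return {column name: maximum value} for columns whose header contains
    `header_substring`. Cells that are missing or not numeric are ignored."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if header_substring in name]
    maxima = {header[i]: None for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue
            name = header[i]
            if maxima[name] is None or value > maxima[name]:
                maxima[name] = value
    return maxima


if __name__ == "__main__":
    # Made-up sample with simplified column names, not real esxtop output.
    sample = (
        'Time,"disk A Guest Latency","disk A Reads"\n'
        '"t1","1.5","10"\n'
        '"t2","4.0","12"\n'
    )
    print(column_maxima(sample, "Guest Latency"))
```

To use it on a real capture, read the file contents into a string and pass a substring taken from the relevant column headers in that file.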