Troubleshooting and Performance Monitoring Virtualized Environments

From DocWiki

(Difference between revisions)
(vCenter Settings)
m (1 revision)
 
(4 intermediate revisions not shown)
Line 1: Line 1:
-
= Introduction  =
+
{| border="1" class="wikitable"
-
 
+
-
A virtual environment brings new considerations to troubleshooting and performance monitoring. Those considerations are discussed in this section.<br>
+
-
 
+
-
= General Guidelines  =
+
-
 
+
-
Performance indicators are still valid from within virtual machines. For the UC applications that support it, use RTMT or the ''perfmon'' data to analyze the performance of the UC application. Data from these tools provides a view of the guest performance: disk, CPU, memory, and other details.
+
-
 
+
-
Move to the VMware infrastructure when there is a need to get the perspective from the ESXi host. Use the vSphere Client to view data:
+
-
 
+
-
*If vCenter is available, historical data is available through the client<br>
+
-
*If vCenter is ''not ''available, live data from the host is available through the client
+
-
 
+
-
= VMware and VM Configuration  =
+
-
 
+
-
#Verify your virtualization configuration matches the UC requirements/restrictions<br>
+
-
#Verify your VM conforms to the specifications of one of the supported configurations available from the OVA for the specific release of the application you are running.<br>
+
-
 
+
-
{{note| The released OVAs include virtual disk drives with aligned partition(s) (to optimize performance). It is required that the OVA be used to create the virtual machine.}}
+
-
 
+
-
<br>
+
-
= vCenter Settings  =
+
-
 
+
-
{{ note | Recall that VMware vCenter is mandatory for UC on UCS Specs-based and HP/IBM Specs-based, [[Unified Communications VMWare Requirements| as described here]].  VMware vCenter is optional for UC on UCS TRC deployments. }} <br>
+
-
 
+
-
Just like some of the UC applications, vCenter can be configured to save more performance data. The more historical data is saved, the more disk space is needed by the database used by vCenter. Note that this is one of the main areas where you need vCenter rather than going directly to the ESXi host for performance data: vCenter can save historical data that the ESXi host does not keep.
+
-
 
+
-
The settings that control the amount of historical data saved by vCenter are located in the vSphere client under '''Administration '''&gt; '''Server Settings'''. The statistics level can be set for each interval duration and save time. The statistics levels range from 1 to 4, with level 4 containing the most data. View the data size estimates to ensure there is enough space to keep all statistics.<br><br>
+
-
 
+
-
 
+
-
'''For a UC on UCS Specs-based or HP/IBM Specs-based deployment, Statistics Level 4 is required on all statistics'''.  Configuring VMware vCenter to capture detailed logs, as shown in Figure 1 below, is strongly recommended. If not configured by default, Cisco TAC may request enabling these settings in order to troubleshoot problems.
+
-
<br>
+
-
 
+
-
=== Figure 1  ===
+
-
 
+
-
[[Image:VCenter SS.jpg]]
+
-
 
+
-
= VMware Performance Indicators  =
+
-
 
+
-
The following table lists the performance indicators to monitor and view from a VMware perspective when a virtual machine is having suboptimal (or bad) performance. Most counters are from the ESXi host, which can give a perspective of VM interactions and overall host and data store utilization.
+
-
 
+
-
{| cellspacing="1" cellpadding="1" border="1"
+
|-
|-
-
! scope="col" | Performance Area
+
! style="background-color: rgb(255,215,0)" | Return to [[Unified Communications in a Virtualized Environment|Home]]
-
! scope="col" | Object
+
-
! scope="col" | Counter
+
-
! scope="col" | Acceptable range
+
-
|-
+
-
| CPU
+
-
| Host
+
-
| Usage
+
-
| Less than 80%
+
-
|-
+
-
| CPU
+
-
| Virtual Machine
+
-
| Ready
+
-
| Less than 3%
+
-
|-
+
-
| Memory
+
-
| Host
+
-
| Consumed
+
-
| General trend is stable
+
-
|-
+
-
| Memory
+
-
| Host
+
-
| Balloon/Swap used
+
-
| 0 KB
+
-
|-
+
-
| Disk
+
-
| Specific datastore
+
-
| Kernel command latency
+
-
| Less than 3ms
+
-
|-
+
-
| Disk
+
-
| Specific datastore
+
-
| Physical device command latency
+
-
| Less than 20ms
+
-
|-
+
-
| Disk
+
-
| Specific datastore
+
-
| Average commands issued per second
+
-
| Less than LUN capacity
+
-
|-
+
-
| Network
+
-
| Host
+
-
| Receive packets dropped/Transmit packets dropped
+
-
| 0 packets
+
|}
|}
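The acceptable ranges in the performance indicator table can be encoded as data and checked against a sample of counter values. The following is a minimal sketch, not a supported tool: the dictionary keys (such as <code>host_cpu_usage_pct</code>) are hypothetical names chosen for illustration, and the sample values are assumed to have been collected separately (for example, read from the vSphere Client charts). Counters without a numeric limit in the table (memory consumed trend, LUN capacity) are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Indicator:
    area: str                       # performance area from the table
    counter: str                    # counter name from the table
    is_ok: Callable[[float], bool]  # True when the value is acceptable
    limit: str                      # human-readable acceptable range


# Limits are taken from the table; the keys are hypothetical names.
INDICATORS: Dict[str, Indicator] = {
    "host_cpu_usage_pct": Indicator("CPU", "Host usage", lambda v: v < 80, "less than 80%"),
    "vm_cpu_ready_pct": Indicator("CPU", "VM ready", lambda v: v < 3, "less than 3%"),
    "host_balloon_kb": Indicator("Memory", "Balloon", lambda v: v == 0, "0 KB"),
    "host_swap_used_kb": Indicator("Memory", "Swap used", lambda v: v == 0, "0 KB"),
    "kernel_latency_ms": Indicator("Disk", "Kernel command latency", lambda v: v < 3, "less than 3 ms"),
    "device_latency_ms": Indicator("Disk", "Physical device command latency", lambda v: v < 20, "less than 20 ms"),
    "rx_dropped": Indicator("Network", "Receive packets dropped", lambda v: v == 0, "0 packets"),
    "tx_dropped": Indicator("Network", "Transmit packets dropped", lambda v: v == 0, "0 packets"),
}


def find_violations(sample: Dict[str, float]) -> List[str]:
    """Return one message per counter in the sample that is out of range."""
    problems = []
    for key, value in sample.items():
        indicator = INDICATORS.get(key)
        if indicator is not None and not indicator.is_ok(value):
            problems.append(
                f"{indicator.area}: {indicator.counter} = {value} (expected {indicator.limit})"
            )
    return problems
```

Calling <code>find_violations({"host_cpu_usage_pct": 92.0, "rx_dropped": 0})</code> would report only the host CPU usage, since the dropped packet count is within range. Unknown keys are ignored rather than treated as errors.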
-
<br>  
+
=== <br> <br> '''Portions of the page have been moved'''  ===
-
= Physical Hardware Serviceability Items  =
 
-
{| cellspacing="1" cellpadding="1" border="1"
+
'''The new web address for this page is&nbsp;:&nbsp; [http://www.cisco.com/c/dam/en/us/td/docs/voice_ip_comm/uc_system/virtualization/collaboration-virtualization-hardware.html#vmware http://www.cisco.com/c/dam/en/us/td/docs/voice_ip_comm/uc_system/virtualization/collaboration-virtualization-hardware.html#vmware]'''
-
|-
+
-
! scope="col" | Area
+
-
! scope="col" | Top Items
+
-
! scope="col" | View at
+
-
! scope="col" | Alerted How?
+
-
|-
+
-
| CPU
+
-
|
+
-
#Temperature
+
-
#Utilization/status
+
-
#Thresholds with events
+
-
#Condition &amp; events for abnormal state
+
-
| ESXi Host or vCenter
+
'''Please update your bookmark.'''
-
| SNMP/Email(via vCenter)
+
<BR>
-
|-
+
-
| Memory
+
-
|
+
-
#Utilization/status
+
-
#Errors/condition
+
-
 
+
-
| ESXi Host or vCenter
+
-
| SNMP/Email(via vCenter)
+
-
|-
+
-
| Hard Drives
+
-
|
+
-
#Utilization/status
+
-
#Disk failure alerting
+
-
 
+
-
| ESXi Host or vCenter <br>
+
-
| SNMP/Email(via vCenter)
+
-
|-
+
-
| RAID Controller
+
-
|
+
-
#State (defunct, rebuilding, etc.)
+
-
#Cache/battery status
+
-
#Thresholds with events
+
-
 
+
-
| ESXi Host or vCenter (DAS only)
+
-
| SNMP/Email(via vCenter)
+
-
|-
+
-
| NIC
+
-
|
+
-
#Port failure events
+
-
 
+
-
| vCenter
+
-
| SNMP/Email(via vCenter)
+
-
|-
+
-
| Power Supply
+
-
|
+
-
#Voltage
+
-
#Redundancy status
+
-
#Thresholds with events
+
-
 
+
-
|
+
-
ESXi Host or vCenter
+
-
 
+
-
UCS Manager(B-series)(2)
+
-
 
+
-
| SNMP/Email(via vCenter)
+
-
|-
+
-
| Fans
+
-
|
+
-
#Status/Speed
+
-
#Thresholds with events
+
-
 
+
-
| ESXi Host or vCenter(C-series)<br>UCS Manager(B-series)
+
-
| SNMP/Email(via vCenter)
+
-
|-
+
-
| IO Controller
+
-
| <br>
+
-
| ESXi Host or vCenter(DAS only)
+
-
| SNMP/Email(via vCenter)
+
-
|}
+
-
 
+
-
<br> {{note| The vSphere client can be used to view the data and alarms. vCenter is required for any automatic notification.}}<br>
+
-
 
+
-
= CPU Troubleshooting  =
+
-
 
+
-
High CPU usage could be due to a small number of VMs taking all of the resources or to too many VMs running on the host. If too many VMs are running, look at the VMs on the host and see if CPU reservations are in use (see oversubscription section). To isolate a CPU issue for a particular VM, consider moving it to another ESXi host.
+
-
 
+
-
To view the CPU performance indicators, go to the ESXi host's performance tab and select the '''Advanced '''button. Under Chart options, select '''CPU''', timeframe, and then only the host (not individual cores) to view overall CPU usage on the host. You can view each VM's CPU usage from the Virtual Machines tab on the host.
+
-
 
+
-
To get a view of the reservations set by all of the VMs, use the Resource Allocation tab of the cluster.<br>
+
-
 
+
-
{{note| The "Resource Allocation" tab is only available via vCenter.}}<br>
+
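The indicator table gives the CPU Ready limit as a percentage, while the vSphere performance charts report CPU Ready as a summation in milliseconds per sampling interval. The sketch below shows the commonly used conversion. It assumes the real-time chart interval of 20 seconds as the default; the function and constant names are illustrative only, and the interval must be changed to match the chart being read (historical charts use longer intervals).

```python
REALTIME_INTERVAL_S = 20  # assumed sampling interval of the real-time chart


def cpu_ready_percent(ready_ms: float, interval_s: float = REALTIME_INTERVAL_S) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The interval is converted to milliseconds so both terms share a unit.
    return ready_ms / (interval_s * 1000.0) * 100.0
```

With the 20-second interval, a summation of 600 ms corresponds to 3%, which is the limit listed in the table.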
-
 
+
-
= Memory Troubleshooting  =
+
-
 
+
-
Our guidelines do not support memory sharing between VMs. To verify, monitor the following performance indicators and make sure the swapping and ballooning counters are zero. If a given VM does not have enough memory and there are no memory issues on the specific host, consider increasing the VM's memory.
+
-
 
+
-
To view the memory performance indicators, go to the ESXi host's Performance tab and select the '''Advanced '''button. Under Chart options, select '''Memory and Timeframe''', then select the following counters:
+
-
 
+
-
*Used memory (to view general trends)<br>
+
-
*Swap used
+
-
*Balloon
+
-
 
+
-
Swap and Balloon should always be zero; otherwise, memory sharing is being used (which should not be the case).<br>
+
-
 
+
-
= Disk Troubleshooting  =
+
-
 
+
-
Bad disk performance often shows up as high CPU usage. IOPS data can provide information on how hard the application/VM is working the disks. Specific activities can cause spikes in IOPS: upgrades and DB maintenance are two examples. If VMs running on the same datastore are all doing these activities at the same time, the disks might not be able to keep up. IOPS data can be seen from vCenter or the SAN. Disk latency (response time) is a good indicator of disk performance.
+
-
 
+
-
To view the disk performance indicators, go to the ESXi host's performance tab and select the advanced button. The appropriate datastore needs to be selected, which can be found on the datastore page (see below). Under chart options, select disk and timeframe, then select the following counters:
+
-
 
+
-
*Physical device command latency
+
-
*Kernel command latency
+
-
*Average commands issued per second
+
-
 
+
-
The kernel counter should not be greater than 2-3 ms. The physical device counter should not be greater than 15-20 ms. The "average commands issued per second" counter can be used if IOPS are not available from the SAN. IOPS should be considered if the datastore appears to be overloaded. This IOPS data is viewable from the host and each VM. Note that for NFS datastores, the physical device and kernel latency data is not available. Starting with VMware 4.0 Update 2, the esxtop command (see below) can be used to view NFS counters, in particular the guest latency (called GAVG in esxtop). The guest latency is the sum of the physical device and kernel latencies.
+
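The relationship between the three latency values can be sketched as follows. This is an illustrative helper, not part of any VMware tool: the function names are hypothetical, and the limits use the upper end of the ranges stated above (3 ms for kernel latency, 20 ms for physical device latency).

```python
from typing import List

KERNEL_LIMIT_MS = 3.0   # upper end of the 2-3 ms guideline
DEVICE_LIMIT_MS = 20.0  # upper end of the 15-20 ms guideline


def guest_latency_ms(kernel_ms: float, device_ms: float) -> float:
    """Guest latency is the sum of kernel and physical device latency."""
    return kernel_ms + device_ms


def diagnose_latency(kernel_ms: float, device_ms: float) -> List[str]:
    """List which latency components exceed their guideline."""
    findings = []
    if kernel_ms > KERNEL_LIMIT_MS:
        findings.append(f"kernel latency {kernel_ms} ms exceeds {KERNEL_LIMIT_MS} ms")
    if device_ms > DEVICE_LIMIT_MS:
        findings.append(f"device latency {device_ms} ms exceeds {DEVICE_LIMIT_MS} ms")
    return findings
```

For example, a kernel latency of 1.5 ms and a device latency of 12 ms give a guest latency of 13.5 ms with no findings. On NFS datastores only the guest latency is visible, so the per-component check does not apply there.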
-
 
+
-
On the C-series UCS servers there have been issues with the write cache battery backup. If this battery is not operating correctly, performance will suffer. Use a tool like wbemcli to verify the battery is ok. An example of using the wbemcli:
+
-
<pre>wbemcli ei -noverify 'https://root:&lt;password&gt;@&lt;ESXi Host IP&gt;:5989/root/cimv2:VMware_HHRCBattery'</pre>
+
-
 
+
-
See the [http://techpubs.sgi.com/library/manuals/0000/860-0488-001/pdf/860-0488-001.pdf MegaCli User Guide] for more information.
+
-
 
+
-
= Network Troubleshooting  =
+
-
 
+
-
Generally, network performance issues show up as dropped packets. If dropped packets are seen from an ESXi host, investigate the network infrastructure, which might include a virtualized switch (Nexus 1000V). In ESXi 4.1, issues have been seen with large file transfers (e.g. SFTP/FTP transfers). For this issue, the Large Receive Offload (LRO) options need to be disabled on the ESXi host. That setting is found on the host's Configuration tab -&gt; Advanced Settings -&gt; Net.*. Note that there are several LRO settings on this page and all of them need to be disabled. If a VM has been cloned and uses static MAC addresses, verify that there are no duplicate MAC addresses in the network. LRO settings:
+
-
 
+
-
To view the network performance indicators, go to the ESXi host's performance tab and select the advanced button. Under chart options, select Network, timeframe, then select the following counters:
+
-
 
+
-
*Receive packets dropped
+
-
*Transmit packets dropped
+
-
 
+
-
The main thing to check is that no packets are getting dropped in the network. <br>  
+
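The duplicate MAC address check for cloned VMs mentioned in this section can be sketched as below. The inventory format (a mapping of VM name to its list of MAC addresses) and the VM names are assumptions for illustration; how the inventory is exported from the environment is left open.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalize_mac(mac: str) -> str:
    """Strip separators and lowercase so different notations compare equal."""
    return re.sub(r"[^0-9a-fA-F]", "", mac).lower()


def find_duplicate_macs(inventory: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return each normalized MAC used by more than one VM, with the VM names."""
    owners = defaultdict(set)
    for vm_name, macs in inventory.items():
        for mac in macs:
            owners[normalize_mac(mac)].add(vm_name)
    return {mac: sorted(vms) for mac, vms in owners.items() if len(vms) > 1}
```

Normalizing first means that <code>00:50:56:AA:BB:01</code> and <code>00-50-56-aa-bb-01</code> are treated as the same address, which is the case a clone with a copied static MAC would produce.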
-
 
+
-
{{note| Advanced network debugging and configuration can be done on Nexus 1000v (if used, which requires vCenter and Enterprise Plus licensing).}}<br>
+
-
 
+
-
= Alternate Access to Performance Data  =
+
-
 
+
-
If vCenter and/or the vSphere client are not available, some real-time data can be pulled using command-line tools. If you have a [http://www.vmware.com/support/developer/vima/ '''vMA VM'''], then the resxtop tool can be used. The resxtop tool is a remote version of the esxtop tool. Otherwise, the esxtop tool can be used directly on the ESXi host (root access must be enabled). See '''http://communities.vmware.com/docs/DOC-11812 for details on esxtop'''. <br> <br>
+
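Both esxtop and resxtop can run in batch mode and write CSV output, for example <code>esxtop -b -d 5 -n 12 > esxtop.csv</code> (batch mode, 5-second delay, 12 iterations). The sketch below averages every column whose name contains a given fragment. The sample data is synthetic, and the column names in it are illustrative assumptions; check the header row of a real capture for the exact names before relying on them.

```python
import csv
import io
from typing import Dict


def column_averages(csv_text: str, name_fragment: str) -> Dict[str, float]:
    """Average each CSV column whose header contains name_fragment."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name_fragment in name]
    totals = {header[i]: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        rows += 1
        for i in wanted:
            totals[header[i]] += float(row[i])
    if rows == 0:
        return {}
    return {name: total / rows for name, total in totals.items()}


# Synthetic two-sample capture; host name and counters are hypothetical.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Physical Cpu(_Total)\% Util Time","\\esx01\Memory\Free MBytes"',
    '"01/09/2017 15:26:00","40.0","2048"',
    '"01/09/2017 15:26:05","60.0","2048"',
])
```

Calling <code>column_averages(SAMPLE, "% Util Time")</code> returns a single entry for the CPU column. Matching on a name fragment keeps the helper independent of the host name embedded in each column header.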
-
 
+
-
----
+
-
 
+
-
<br>
+
-
 
+
-
{| border="1" class="wikitable"
+
-
|-
+
-
! style="background-color: rgb(255, 215, 0);" | '''Back to:''' [[Unified Communications in a Virtualized Environment|Unified Communications in a Virtualized Environment]]
+
-
|}
+

Latest revision as of 15:26, 9 January 2017

Return to Home



Portions of the page have been moved

The new web address for this page is :  http://www.cisco.com/c/dam/en/us/td/docs/voice_ip_comm/uc_system/virtualization/collaboration-virtualization-hardware.html#vmware

Please update your bookmark.
