Cisco MDS SAN-OS Troubleshooting Guide -- Troubleshooting Cisco Fabric Services

From DocWiki

Revision as of 07:21, 26 July 2010 by Pravraj
This section describes procedures used to troubleshoot Cisco Fabric Services (CFS) problems in the Cisco MDS 9000 Family multilayer directors and fabric switches.

Overview

Many features in the Cisco MDS 9000 Family switches require their configuration to be synchronized across all switches in the fabric so that the fabric behaves consistently.

As of Cisco MDS SAN-OS Release 2.0(1b), Cisco Fabric Services (CFS) provides a common infrastructure for automatic configuration synchronization in the fabric. It provides the transport function as well as a rich set of common services to the applications. CFS can discover CFS-capable switches in the fabric as well as their application capabilities.

Some of the applications that can be synchronized using CFS include:

  • IVR
  • NTP
  • DPVM
  • User roles
  • AAA server addresses, RADIUS and TACACS+ daemons
  • SFM
  • SDV
  • syslog
  • port-security
  • call home

As of Cisco MDS SAN-OS Release 3.2(1), the scope of configuration synchronization can be restricted, by using CFS regions, to a limited set of switches within the physical scope of an application. CFS regions are designed to:

  • Fine-tune the distribution of configuration for an application.
  • Restrict synchronization or merging of configuration information from a switch to a region, rather than distributing information across the entire physical scope of the application.
  • Span across some or all of the switches in the topology, within the physical scope of the application.

All switches in the fabric must be CFS capable. A Cisco MDS 9000 Family switch is CFS capable if it is running Cisco SAN-OS Release 2.0(1b) or later. Switches that are not CFS capable do not receive distributions, so part of the fabric does not receive the intended distribution.

CFS has the following features:

  • Implicit CFS usage—The first time you issue a CFS task for a CFS-enabled application, the configuration modification process begins and the application locks the fabric.
  • Pending database—The pending database is a temporary buffer that holds uncommitted information. The uncommitted changes are not applied immediately, to ensure that the database stays synchronized with the database in the other switches in the fabric. When you commit the changes, the pending database overwrites the configuration database (also known as the active database or the effective database).
  • CFS distribution enabled or disabled on a per-application basis—The default (enable or disable) for CFS distribution state differs between applications. If CFS distribution is disabled for an application, then that application does not distribute any configuration nor does it accept a distribution from other switches in the fabric.
  • Explicit CFS commit—Most applications require an explicit commit operation, which copies the changes in the temporary buffer to the application database, distributes the new database to the fabric, and releases the fabric lock. The changes in the temporary buffer are not applied if you do not perform the commit operation.
  • Globally disable CFS distribution—Use the no cfs enable command, in config mode, to isolate the switch from the rest of the fabric. The switch then acts like a single-switch fabric. All other behaviors of CFS and CFS-enabled applications are unaffected.
  • Enable IPv4 and IPv6 distribution from Fabric Manager—Go to Physical Attributes > Switches > CFS. GLOBAL indicates CFS distribution and IP MULTICAST indicates IPv4 and IPv6 distributions.
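
The interaction between the implicit fabric lock, the pending database, and the explicit commit can be illustrated with a small Python model. This is a conceptual sketch only, not how SAN-OS implements CFS: the class CfsAppModel, its methods, and the example keys are hypothetical names introduced here for illustration, and the lock is simplified to a single owner name.

```python
class CfsAppModel:
    """Toy model of one CFS-enabled application (illustration only)."""

    def __init__(self, active=None):
        self.active = dict(active or {})  # configuration (active) database
        self.pending = None               # pending database; None if no session
        self.lock_owner = None            # holder of the fabric lock, if any

    def change(self, user, key, value):
        # Implicit CFS usage: the first change starts a session and
        # takes the fabric lock for this application.
        if self.lock_owner is None:
            self.lock_owner = user
            self.pending = dict(self.active)
        elif self.lock_owner != user:
            raise RuntimeError("fabric is locked by " + self.lock_owner)
        # Changes go to the pending database only.
        self.pending[key] = value

    def commit(self, user):
        # Explicit commit: pending overwrites active, then the lock is released.
        if self.lock_owner != user:
            raise RuntimeError("no open session for " + user)
        self.active = self.pending
        self.pending = None
        self.lock_owner = None


if __name__ == "__main__":
    app = CfsAppModel({"server": "10.0.0.1"})
    app.change("admin", "server", "10.0.0.2")
    print("before commit:", app.active, "lock:", app.lock_owner)
    app.commit("admin")
    print("after commit: ", app.active, "lock:", app.lock_owner)
```

Until commit is called, the active database is unchanged and the lock stays held, which is the state the checklist and lock troubleshooting sections of this guide ask you to look for.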

As of Cisco SAN-OS Release 3.1(2), some applications, such as Inter-VSAN Routing (IVR), require configuration distribution over some specific VSANs. These applications can specify to CFS the set of VSANs over which to restrict the distribution.

Initial Troubleshooting Checklist

Begin troubleshooting CFS issues by checking the following items first:

Checklist

  • Verify that CFS is enabled for the same applications on all affected switches.
  • Verify that CFS distribution is enabled for the same applications on all affected switches.
  • If the CFS Regions feature is in use, verify that the application is in the same region on all the affected switches.
  • Verify that there are no pending changes for an application and that a CFS commit was issued for any configuration changes in a CFS enabled application.
  • Verify that there are no unexpected CFS locked sessions. Clear any unexpected locked sessions.
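
The first checklist item can be checked offline by comparing show cfs application output captured from each switch. The following Python sketch assumes the three-column output layout shown in the CLI verification section of this guide (application, Yes or No, scope); the function names and the switch labels used as dictionary keys are hypothetical, and how the output is collected from each switch is left out.

```python
def parse_cfs_applications(output):
    """Parse 'show cfs application' text into {name: (enabled, scope)}."""
    apps = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like: "<name> <Yes|No> <Physical|Logical>".
        # Header, separator, and total lines do not match this shape.
        if len(fields) == 3 and fields[1] in ("Yes", "No"):
            apps[fields[0]] = (fields[1] == "Yes", fields[2])
    return apps


def enabled_mismatches(per_switch):
    """Return {app: {switch: True/False/None}} for apps whose state differs.

    per_switch maps a switch label to the dict from parse_cfs_applications().
    None marks an application that a switch does not list at all.
    """
    all_apps = set()
    for apps in per_switch.values():
        all_apps.update(apps)
    mismatches = {}
    for app in sorted(all_apps):
        states = {sw: (apps[app][0] if app in apps else None)
                  for sw, apps in per_switch.items()}
        if len(set(states.values())) > 1:
            mismatches[app] = states
    return mismatches
```

An empty result from enabled_mismatches means every captured switch reports the same enabled state for every listed application; any entry in the result names an application to investigate.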

Verifying CFS Using Fabric Manager

To verify CFS using Fabric Manager or Device Manager, follow these steps:


1. Choose Admin > CFS on Device Manager to verify that an application is listed and enabled. Repeat this on all switches.

2. To list the set of switches in which an application is registered with CFS, choose the application configuration menu on Fabric Manager and select the CFS tab. For example, to verify that DPVM is enabled and global distribution is enabled on all switches, choose Fabricxx > All VSANs > DPVM and select the CFS tab. Verify that the Oper field is enabled and the Global field is enabled for all switches in the fabric.

3. To determine whether all the switches in the fabric constitute one CFS fabric or multiple partitioned CFS fabrics, use Device Manager as follows:

   a. Choose Admin > CFS and highlight the application that you want to verify CFS on.

   b. Click Details and select the Merge tab in the Details dialog box.

   c. If you see multiple rows in the Merge status table, then the fabric is partitioned into multiple CFS fabrics. Some features enable CFS per VSAN, in which case multiple rows are expected. If the selected feature should be fabric wide but you see multiple rows in the Merge status table, then the fabric may be partitioned, and the merge status may show that the merge has failed, is pending, or is waiting.


Verifying CFS Using the CLI

To verify CFS using the CLI, follow these steps:


1. To verify that an application is listed and enabled, issue the show cfs application command on all switches. An example of the show cfs application command follows:

Switch# show cfs application
-------------------------------------------
 Application    Enabled   Scope
-------------------------------------------
ivr            Yes       Physical
ntp            No        Physical
dpvm           Yes       Physical
fscm           Yes       Physical
role           Yes       Physical
radius         Yes       Physical
fctimer        No        Physical
syslogd        No        Physical
callhome       No        Physical
device-alias   Yes       Physical
port-security  Yes       Logical
Total number of entries = 11

The Physical scope means that CFS applies the configuration for that application to the entire switch. The Logical scope means that CFS applies the configuration for that application to a specific VSAN.

2. Verify the set of switches in which an application is registered with CFS, using the show cfs peers name application-name command for physical scope applications, and the show cfs peers name application-name vsan vsan-id command for logical scope applications.

An example command output for a physical scope application follows:

Switch# show cfs peers name dpvm
Scope      : Physical
--------------------------------------------------
 Switch WWN               IP Address
--------------------------------------------------
 20:00:00:0e:d7:0e:bf:c0  10.76.100.51    [Local]
 20:00:00:0e:d7:00:3c:9e  10.76.100.52    
Total number of entries = 2

Note: The show cfs peers name application-name command displays the peers for all VSANs when applied to a logical application.

An example command output for a logical scope application follows:

Switch# show cfs peers name port-security
Scope      :Logical [VSAN 1]
-----------------------------------------------------------
 Domain   Switch WWN               IP Address
-----------------------------------------------------------
 236      20:00:00:0e:d7:00:3c:9e  10.76.100.52    [Local]
 239      20:00:00:05:30:00:6b:9e  10.76.100.167
 101      20:00:00:0d:ec:06:55:c0  10.76.100.205   
Total number of entries = 3
Scope      :Logical [VSAN 2]
-----------------------------------------------------------
 Domain   Switch WWN               IP Address
-----------------------------------------------------------
 239      20:00:00:0e:d7:00:3c:9e  10.76.100.52    [Local]
 211      20:00:00:05:30:00:6b:9e  10.76.100.167
 110      20:00:00:0d:ec:06:55:c0  10.76.100.205   
Total number of entries = 3
Scope      :Logical [VSAN 3]
-----------------------------------------------------------
 Domain   Switch WWN               IP Address
-----------------------------------------------------------
 103      20:00:00:0e:d7:00:3c:9e  10.76.100.52    [Local]
 221      20:00:00:05:30:00:6b:9e  10.76.100.167
 11       20:00:00:0d:ec:06:55:c0  10.76.100.205   
Total number of entries = 3

3. To determine whether all the switches in the fabric constitute one CFS fabric or multiple partitioned CFS fabrics, issue the show cfs merge status name application-name command and the show cfs peers name application-name command and compare the outputs. If the outputs contain the same list of switches, the entire set of switches constitutes one CFS fabric. In that case, the merge status should show success on all switches. Example command outputs follow:

Switch# show cfs merge status name dpvm
Physical  Merge Status: Success [ Sat Nov 20 11:59:36 2004 ]
 Local Fabric
---------------------------------------------------------
 Switch WWN               IP Address
---------------------------------------------------------
 20:00:00:05:30:00:4a:de  10.76.100.51    [Merge Master]
 20:00:00:0d:ec:0c:f1:40  10.76.100.204

Switch# show cfs peers name dpvm
Scope      : Physical
--------------------------------------------------
 Switch WWN               IP Address
--------------------------------------------------
 20:00:00:0d:ec:0c:f1:40  10.76.100.204   [Local]
 20:00:00:05:30:00:4a:de  10.76.100.51
Total number of entries = 2

If the list of switches in the show cfs merge status name command output is shorter than that of the show cfs peers name command output, the fabric is partitioned into multiple CFS fabrics and the merge status may show that the merge has failed, is pending, or is waiting.
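
The comparison in step 3 can be sketched in Python for a physical scope application. The sketch assumes the output layouts shown above, treats only the switches listed before any "Remote Fabric" heading as the local merge list (an assumption about the layout, based on the merge failure example later in this guide), and uses hypothetical function names.

```python
import re

# A switch WWN as printed in the outputs, e.g. 20:00:00:0d:ec:0c:f1:40.
WWN_RE = re.compile(r"(?:[0-9a-f]{2}:){7}[0-9a-f]{2}", re.IGNORECASE)
# Matches both "Merge Status: Success" and "Merge Status:Failure".
STATUS_RE = re.compile(r"Merge Status:\s*(\w+)")


def switch_wwns(text):
    """Return the set of switch WWNs that appear in a block of CLI output."""
    return {wwn.lower() for wwn in WWN_RE.findall(text)}


def check_cfs_fabric(merge_status_output, peers_output):
    """Compare 'show cfs merge status name X' with 'show cfs peers name X'."""
    match = STATUS_RE.search(merge_status_output)
    status = match.group(1) if match else None
    # Assumption: switches listed after "Remote Fabric" are not part of
    # the local CFS fabric, so they are excluded from the merge list.
    local_part = merge_status_output.split("Remote Fabric")[0]
    missing = switch_wwns(peers_output) - switch_wwns(local_part)
    return {
        "merge_status": status,
        "missing_from_merge": sorted(missing),
        "single_fabric": status == "Success" and not missing,
    }
```

A non-empty missing_from_merge list corresponds to the case described above, where the merge status list is shorter than the peers list and the fabric is partitioned.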


Merge Failure Troubleshooting

During a merge, the merge managers in the merging fabrics exchange their configuration databases with each other. The application on one of them merges the information, decides if the merge is successful, and informs all switches in the combined fabric of the status of the merge. When a merge is successful, the merged database is distributed to all switches in the combined fabric and the entire new fabric remains in a consistent state. A merge failure indicates that the merged fabrics contain inconsistent data that could not be merged.

If a new switch is added to the fabric and the merge status for any application shows "In Progress" for a prolonged period of time, then there may be an active session for that application in some switch. Check the lock status for that application on all the switches using the show cfs lock CLI command. If there are any locks, then the merge will not proceed. Commit the changes or clear the session lock so that the merge can proceed.


Note: Analyze merge failures carefully before recovering. Exercise caution when choosing the switch on which to issue a blank commit: because the commit restores all peers to that switch's configuration database, a small configuration may wipe out a larger one.

Recovering from a Merge Failure with Fabric Manager

To recover from a merge failure using Fabric Manager, follow these steps:


1. Select the CFS tab for the application that you are configuring and check the merge field to identify a switch that shows a merge failure. For example, choose Fabricxx > All VSANs > DPVM and select the CFS tab to determine if there is a merge failure for DPVM.

2. Set the Config Action drop-down menu to commit and click Apply Changes to restore all peers in the fabric to the same configuration database.


Recovering from a Merge Failure with the CLI

To recover from a merge failure using the CLI, follow these steps:


1. To identify a switch that shows a merge failure, issue the show cfs merge status name application-name command. Example command output follows:

Switch# show cfs merge status name ntp 
Physical  Merge Status:Failure [ Mon Nov 22 06:49:52 2004 ]
 Failure Reason: Conflicting entries in the compared databases
 Local Fabric
---------------------------------------------------------
 Switch WWN               IP Address
---------------------------------------------------------
 20:00:00:05:30:00:6b:9e  10.76.100.167   [Merge Master]
 20:00:00:0e:d7:00:3c:9e  10.76.100.52
 Remote Fabric
---------------------------------------------------------
 Switch WWN               IP Address
---------------------------------------------------------
 20:00:00:0d:ec:06:55:c0  10.76.100.205   [Merge Master]

2. For a more detailed description of the merge failure, issue the show cfs internal session-history name application-name detail command. Example command output follows:

switch# show cfs internal session-history name ntp detail
     --------------------------------------------------------------------------------
      Time Stamp                Source WWN               Event
      User Name                 Session ID
     --------------------------------------------------------------------------------
      Fri Aug 24 04:30:19 2007  20:00:00:0d:ec:04:99:c0  LOCK_REQUEST
      admin                     3848
      Fri Aug 24 04:30:19 2007  20:00:00:0d:ec:04:99:c0  LOCK_ACQUIRED
      admin                     3848
      Fri Aug 24 04:30:19 2007  20:00:00:0d:ec:04:99:c0  COMMIT
      admin                     3849
      Fri Aug 24 04:30:19 2007  20:00:00:0d:ec:04:99:c0  LOCK_RELEASE_REQUEST
      admin                     3848
      Fri Aug 24 04:30:19 2007  20:00:00:0d:ec:04:99:c0  LOCK_RELEASED
      admin                     3848
      Fri Aug 24 04:33:07 2007  20:00:00:0d:ec:04:99:c0  LOCK_REQUEST
      admin                     3868
      Fri Aug 24 04:33:07 2007  20:00:00:0d:ec:04:99:c0  LOCK_ACQUIRED
      admin                     3868
     --------------------------------------------------------------------------------

3. Enter configuration mode and issue the application-name commit command to restore all peers in the fabric to the same configuration database. Example command output follows:

Switch# config terminal
Switch(config)# ntp commit
Switch(config)#
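
The session history from step 2 can also be scanned for sessions that acquired a lock and never released it, which is the condition that stalls a merge. The Python sketch below assumes the two-line record layout shown in the example output (time stamp, source WWN, and event on one line; user name and session ID on the next) and uses hypothetical function names. It reads history only; the show cfs lock command remains the way to confirm that a lock is currently held.

```python
import re

# First line of a record: "Fri Aug 24 04:30:19 2007  <WWN>  <EVENT>".
EVENT_ROW = re.compile(
    r"^\s*(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"((?:[0-9a-f]{2}:){7}[0-9a-f]{2})\s+(\w+)\s*$",
    re.IGNORECASE,
)
# Second line of a record: "<user name>  <session id>".
USER_ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s*$")


def parse_session_history(output):
    """Return a list of event dicts from the session-history detail output."""
    events = []
    pending = None  # first line of a record waiting for its user line
    for line in output.splitlines():
        match = EVENT_ROW.match(line)
        if match:
            pending = match.groups()
            continue
        match = USER_ROW.match(line)
        if match and pending:
            time, wwn, event = pending
            events.append({"time": time, "wwn": wwn, "event": event,
                           "user": match.group(1),
                           "session": int(match.group(2))})
            pending = None
    return events


def unreleased_sessions(events):
    """Return session IDs with LOCK_ACQUIRED but no later LOCK_RELEASED."""
    held = {}
    for event in events:
        if event["event"] == "LOCK_ACQUIRED":
            held[event["session"]] = event
        elif event["event"] == "LOCK_RELEASED":
            held.pop(event["session"], None)
    return sorted(held)
```

Applied to the example output in step 2, this reports session 3868, whose lock was acquired at 04:33:07 with no release recorded afterward.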

Lock Failure Troubleshooting

Resolving Lock Failure Issues Using Fabric Manager

Resolving Lock Failure Issues Using the CLI

System State Inconsistent and Locks Being Held

Clearing Locks Using Fabric Manager

Clearing Locks Using the CLI

Distribution Status Verification

Verifying Distribution Using Fabric Manager

Verifying Distribution Using the CLI

CFS Regions Troubleshooting

Distribution Failure

Regions for Conditional Service

Changing Regions
