How FACT Works
Scanning and Reporting
FACT scans a subnet by logging in to the management ports of many switches, finding the master Subnet Manager, and interrogating it through the CLI. FACT then constructs a view of the network topology and saves it to a file to answer later queries. If presented with a network with no Subnet Managers or with several master Subnet Managers, FACT has reduced function, but it can report the location of the master and standby Subnet Managers. For more information about the Subnet Manager, see the "Understanding the Subnet Manager" section.
FACT provides a set of queries for scanning the fabric, similar to those in the Subnet Manager. These queries show which switches are in the network, which ports are active, how they are connected, and so on. FACT differs from the Subnet Manager show commands in that it reports at both the chip level (InfiniBand nodes and ports) and the chassis level (chassis, slots, and external ports).
In addition to scanning the fabric for connectivity information, FACT can perform a thorough technical-support scan, which collects extensive diagnostic information from every SFS OS switch. Cisco TAC or engineering can use this information to diagnose problems remotely. For more information about scans, see the "Scanning" section.
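The relationship between the two reporting levels can be pictured as a lookup from chip-level coordinates to chassis-level coordinates. The following Python sketch is illustrative only: the class names, the table contents, the GUID, and the output wording are hypothetical and do not represent FACT's internal data model or its report format.

```python
from typing import NamedTuple, Optional


class ChipPort(NamedTuple):
    node_guid: str  # InfiniBand node GUID of a switch chip
    port: int       # port number on that chip


class ChassisPort(NamedTuple):
    chassis: str        # chassis name
    slot: int           # slot (card) number within the chassis
    external_port: int  # port number as labeled on the card


# Hypothetical mapping for one chassis; all values are made up.
CHIP_TO_CHASSIS = {
    ChipPort("00:00:00:00:00:00:00:01", 1): ChassisPort("switch-a", 1, 1),
    ChipPort("00:00:00:00:00:00:00:01", 2): ChassisPort("switch-a", 1, 2),
}


def to_chassis_level(chip_port: ChipPort) -> Optional[ChassisPort]:
    """Return the chassis-level location, or None if the port is not external."""
    return CHIP_TO_CHASSIS.get(chip_port)


def describe(chip_port: ChipPort) -> str:
    """Render one port at both the chip level and the chassis level."""
    chip = f"node {chip_port.node_guid} port {chip_port.port}"
    location = to_chassis_level(chip_port)
    if location is None:
        return f"{chip} (no external port)"
    return (f"{chip} = {location.chassis} slot {location.slot} "
            f"port {location.external_port}")
```

A chip port that has no entry in the table is treated here as internal to the chassis, which is one reason a chassis-level view can be shorter than the chip-level view of the same switch.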
All information that FACT collects is stored in its repository, which is a directory in the file system. The FACT repository contains the following items:
- Results of all scans
- A log of analysis errors
- Transcripts of all switch CLI sessions
- A "last-known neighbor" map used to remember neighbor relationships after links go down
- A pointer to the current scan
The repository also maintains the notion of a "current scan." Because queries are always performed against the current scan, the repository has a history mechanism that allows the current scan to be rolled back to an earlier version. For more information about the repository, see the "Maintaining the Repository" section.
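The "current scan" pointer and its rollback history can be sketched as follows. This is a minimal in-memory model written for illustration; the class and method names are hypothetical, and the real repository is a directory in the file system whose layout is not described here.

```python
class ScanRepository:
    """Toy model of a scan store with a movable 'current scan' pointer."""

    def __init__(self):
        self._scans = []      # scan results, oldest first
        self._current = None  # index of the current scan, or None if empty

    def add_scan(self, result):
        """Store a new scan result and make it the current scan."""
        self._scans.append(result)
        self._current = len(self._scans) - 1
        return self._current

    def current(self):
        """Return the scan that queries would run against."""
        if self._current is None:
            raise LookupError("repository contains no scans")
        return self._scans[self._current]

    def roll_back(self, steps=1):
        """Move the current-scan pointer back to an earlier scan."""
        if self._current is None or steps < 0 or steps > self._current:
            raise ValueError("cannot roll back that far")
        self._current -= steps
        return self._current
```

In this model, rolling back only moves the pointer. Earlier and later scan results stay in the store, which matches the idea that the repository keeps the results of all scans.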
The scanning function requires that FACT be able to log in to each switch in the network. Also, if you are using HSM, FACT must be able to log in to each host that is running HSM. FACT can connect in the following ways:
- Using SSH to connect to a switch management port
- Using SSH to log in to a host running HSM and starting the HSM CLI
- Using SSH to log in to a host running HSM and running the HSM CLI directly as a subprocess if HSM and FACT are on the same host
The credentials files control which mechanisms FACT uses for each managed node. For more information about credentials files, see the "About Credentials Files" section.
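The choice among these connection mechanisms can be sketched as a small function that returns the command to run for a given node. This is a hedged illustration, not FACT's implementation: the function name, the node kinds, and the `hsm-cli` executable name are hypothetical, and the sketch assumes that a local HSM CLI is launched directly as a subprocess. It does not read credentials files.

```python
def connection_command(kind, host, user, local_host, hsm_cli="hsm-cli"):
    """Return an argv list for reaching the CLI of one managed node.

    kind       -- "switch" or "hsm" (hypothetical node kinds)
    host       -- name of the switch management port or HSM host
    user       -- login name to use over SSH
    local_host -- name of the host on which this code is running
    hsm_cli    -- placeholder name for the HSM CLI executable
    """
    if kind == "switch":
        # SSH to the switch management port; the switch presents its CLI.
        return ["ssh", f"{user}@{host}"]
    if kind == "hsm":
        if host == local_host:
            # Same host: start the HSM CLI directly as a subprocess.
            return [hsm_cli]
        # Remote host: log in through SSH and start the HSM CLI there.
        return ["ssh", f"{user}@{host}", hsm_cli]
    raise ValueError(f"unknown node kind: {kind!r}")
```

The returned list is in the form that Python's `subprocess.Popen` accepts, so a caller could start the session and capture a transcript of it.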
FACT can continuously monitor the health of a subnet by monitoring the syslog from the Subnet Manager. FACT recognizes that certain log messages signify real problems. If you use the pass-through parameter in the command, FACT passes these messages through and prints the log to the screen. If you use the filter parameter, FACT prints only the log messages that it annotates.
In its annotations, FACT attaches names to numeric constants. For example, FACT annotates an InfiniBand node GUID with its corresponding chassis type, its chassis location, and its slot/card and chip number. FACT also annotates an InfiniBand port GUID with its chassis type and chassis location, and with the chassis type and location of its neighbor. FACT also watches for Performance Monitor (PM) messages and annotates PM warnings about port error rates.
To configure FACT monitoring, the system administrator must configure the hosts and switches that can run a Subnet Manager to route their syslog output to the host that is running FACT. FACT then reads the combined syslog messages and writes them to its standard output.
FACT can query the firmware versions on switches in a network and perform firmware updates, either on individual switches or on multiple switches simultaneously. FACT can also control ports. FACT can enable or disable one or more ports and change their width and speed. Using this port-control capability is preferable to using the existing switch CLI because FACT is a single point of control. FACT allows the port to be specified by either the chassis or the chip location.
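The annotation step can be illustrated with a short Python sketch that finds GUIDs in a log line and appends a descriptive label to each known GUID. The colon-separated GUID text format, the lookup table, the label wording, and the function names are assumptions made for this example; they are not FACT's actual message format or parameters.

```python
import re

# Assumes GUIDs appear in log text as eight colon-separated hex bytes.
GUID_RE = re.compile(r"\b(?:[0-9a-f]{2}:){7}[0-9a-f]{2}\b", re.IGNORECASE)

# Hypothetical table of known node GUIDs; keys are lowercase.
KNOWN_NODES = {
    "00:00:00:00:00:00:00:01": "chassis-x at rack 1, slot 3, chip 1",
}


def annotate(line, names):
    """Append a bracketed label after every GUID found in names."""
    def add_label(match):
        guid = match.group(0)
        label = names.get(guid.lower())
        return f"{guid} [{label}]" if label else guid
    return GUID_RE.sub(add_label, line)


def filter_lines(lines, names):
    """Yield only the lines that gained at least one annotation."""
    for line in lines:
        annotated = annotate(line, names)
        if annotated != line:
            yield annotated
```

Here `annotate` alone leaves unrecognized lines unchanged, while `filter_lines` suppresses them, which loosely mirrors the difference between passing the log through and printing only annotated messages.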
Hardware and Software Compatibility
FACT runs on a Linux host and requires the following supported software distributions:
- Red Hat Enterprise Linux, Version 4 or 5, or SUSE Linux Enterprise Server Distribution, Version 9 or 10
- Python, Version 2.3 or later
FACT can log in to Cisco Server Fabric Switches that run Cisco SFS OS, to OEM switches from QLogic (Cisco SFS 7012 and Cisco SFS 7024), and to Unix/Linux-based hosts, including those that run the host-based Subnet Manager.
FACT can also monitor and control unmanaged switches to a limited extent by using in-band InfiniBand messages. FACT uses the ibspark tool to upgrade firmware on unmanaged switches. FACT uses the ibportstate tool to control ports. These tools must be installed and available on a Linux host that is directly connected to the InfiniBand network and that FACT can log in to through SSH. FACT performs all other operations through IP to the management ports on a device, so it can run on any network-connected host.
FACT can optionally use configurations that map node GUIDs and system image GUIDs to user-specified names. This option is especially useful when working with unmanaged switches because unmanaged switches do not have IP addresses or other identifiers; their only identifiers are their node GUIDs. However, GUID names can be used for any switch or channel adapter.
Understanding the Subnet Manager
FACT works closely with the Subnet Manager to understand the network fabric because the Subnet Manager configures and maintains fabric operations. The Subnet Manager is the central repository of all information that is required to set up and bring up the InfiniBand fabric. The master Subnet Manager does the following:
- Discovers the fabric topology
- Discovers end nodes
- Configures switches and end nodes with their parameters, such as Partition Keys (P_Keys)
- Configures switch forwarding tables
- Receives traps from Subnet Management Agents (SMAs)
- Sweeps the subnet, discovering topology changes and managing changes as nodes are added and deleted
A network may contain multiple Subnet Managers acting as standbys, but it may contain only one master Subnet Manager.
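The one-master rule lends itself to a simple consistency check over the Subnet Managers found in a subnet. The sketch below is illustrative: the function name, the input shape, and the state labels are hypothetical and are not taken from FACT.

```python
def classify_subnet_managers(managers):
    """Classify a subnet by the Subnet Managers discovered in it.

    managers -- iterable of (location, role) pairs, where role is
                "master" or "standby" (hypothetical input shape)

    Returns (state, masters, standbys), where state is "ok",
    "no-master", or "multiple-masters".
    """
    managers = list(managers)  # allow any iterable, including generators
    masters = [loc for loc, role in managers if role == "master"]
    standbys = [loc for loc, role in managers if role == "standby"]

    if len(masters) == 1:
        state = "ok"
    elif not masters:
        state = "no-master"
    else:
        state = "multiple-masters"
    return state, masters, standbys
```

Even in the two abnormal states, the function still returns the locations of whatever masters and standbys were found, so a caller can report them.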
Understanding Secure Shell
Secure Shell (SSH) is a network protocol that provides a secure remote access connection to network devices. Cisco FACT uses SSH to provide secure communication from network computers to the Cisco SFS OS CLI.