Recovering from database corruption during switch version to 8.5(1) SU3


Latest revision as of 20:27, 9 January 2013

Recovering from database corruption during switch version to 8.5(1) SU3 or later

Problem Summary: Many customer cases have been caused by database corruption during switch version. The root cause is that the customer typically believes the switch version is not progressing and restarts the CCX server in the middle of it. This usually corrupts CCX database tables, which then require a lot of time and effort from TAC and DE to recover.

To address this issue, changes have been made to the switch version code in 8.5(1) SU3 to detect this corruption during switch version and to aid recovery.
Traditionally during switch version, a database backup is taken as the first step in the database switch version script.
The scenario in which the database gets corrupted during switch version is as follows:
1. The customer initiates the first switch version. A good database backup is taken in the db switch version script. The customer then restarts the box before the switch version completes.
2. The server comes up with a corrupt database because of the restart in Step 1.
3. The customer initiates a second switch version. A new database backup is taken in the db switch version script. This overwrites the good backup taken in Step 1 and rules out any chance of recovery from it.

As part of the changes introduced in 8.5(1) SU3, the db switch version script first checks the CCX database for corruption before taking a database backup. If the database is found to be corrupt, the switch version is aborted without taking a new backup, so the original good database backup taken in Step 1 is protected from being overwritten.
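
The ordering change described above can be illustrated with a small model. This is only a sketch for reasoning about the two flows, not the actual switch version script: the `Server` class, the `switch_version` function, and the `check_first` flag are hypothetical names introduced here, and the model tracks nothing beyond whether the stored backup is still the good one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """Hypothetical model of the state that matters for this scenario."""
    db_corrupt: bool = False
    backup_is_good: Optional[bool] = None  # None means no backup exists yet


def switch_version(server: Server, check_first: bool, rebooted_midway: bool = False) -> str:
    """Model one switch version attempt (illustration only)."""
    if check_first and server.db_corrupt:
        # Behaviour described for 8.5(1) SU3: abort before the backup step,
        # so the existing backup is left untouched.
        return "aborted"
    # The backup captures the database in its current state and
    # replaces whatever backup was stored before.
    server.backup_is_good = not server.db_corrupt
    if rebooted_midway:
        # Restarting the box mid-way leaves the database corrupt.
        server.db_corrupt = True
    return "proceeded"


def run_scenario(check_first: bool) -> Server:
    """Replay Steps 1 to 3 with or without the corruption check."""
    server = Server()
    switch_version(server, check_first, rebooted_midway=True)  # Steps 1 and 2
    switch_version(server, check_first)                        # Step 3
    return server


if __name__ == "__main__":
    print("without check, good backup kept:", run_scenario(check_first=False).backup_is_good)
    print("with check, good backup kept:", run_scenario(check_first=True).backup_is_good)
```

In this model the second attempt overwrites the good backup only when the corruption check is absent, which is the failure the new check is meant to prevent.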

Error Message: If the customer has rebooted during switch version and the database has become corrupted, subsequent switch versions will not be allowed to go through until the database has been recovered from the good backup.
When a switch version fails due to database corruption, the following message appears in the install logs (/var/log/install/uccx-install.log):

Cisco Unified CCX DB appears to be corrupt.
Please use the CLI command "utils uccx switch-version [db-check | db-recover]" to recover the CCX DB before retrying the switch version
If this CLI command is not available in your release of CCX software, please contact Cisco Technical Assistance. for this problem...


Note: The following message in the install logs can be ignored when upgrading from lower versions to 8.5(1) SU3.
/partB//opt/cisco/uccx/bin/uccx_db_l2_upgrade_partA.sh: line 220: /base_scripts/xml_writer.sh: No such file or directory
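
When reviewing a collected copy of the install log, the check can be scripted. The following is a minimal sketch, assuming the log is readable from a machine with Python available; the function name `scan_install_log` and the result keys are hypothetical, and only the log path and the two message strings come from this page.

```python
from pathlib import Path

# Path and message text as given on this page.
INSTALL_LOG = Path("/var/log/install/uccx-install.log")
CORRUPTION_MARKER = "Cisco Unified CCX DB appears to be corrupt."
# Tail of the message that can be ignored when upgrading to 8.5(1) SU3.
IGNORABLE_MARKER = "/base_scripts/xml_writer.sh: No such file or directory"


def scan_install_log(text: str) -> dict:
    """Report whether the corruption message is present and list ignorable lines."""
    lines = text.splitlines()
    return {
        "corruption_detected": any(CORRUPTION_MARKER in line for line in lines),
        "ignorable_lines": [line for line in lines if IGNORABLE_MARKER in line],
    }


if __name__ == "__main__":
    if INSTALL_LOG.exists():
        result = scan_install_log(INSTALL_LOG.read_text(errors="replace"))
        print("DB corruption reported:", result["corruption_detected"])
        print("Ignorable xml_writer.sh lines:", len(result["ignorable_lines"]))
    else:
        print("Install log not found at", INSTALL_LOG)
```

The scan only matches the literal strings quoted above, so a log that reports the failure with different wording would not be flagged.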


Possible Cause: The customer restarted the CCX server while the switch version was in progress, which corrupted the database.
Recommended Action: A CLI command has been introduced for recovering the database in this scenario, but it can be used only in future upgrades from 8.5(1)SU3 to higher releases.

To recover from this issue when it occurs during an upgrade from a lower release to 8.5(1) SU3 or later, a recovery script (tac_sv_recover_db.sh) is being provided to TAC.

(Script Location: http://zed.cisco.com/confluence/display/CRSSU/List+of+ETs+given+to+customers)

The script can be used only in the following specific scenario:
-Switch version failure happened when migrating to 8.5(1) SU3 or later.
-Install logs indicate that switch version failed due to detection of database corruption.
-The last switch version attempt was not successful.
-The script must be run during a maintenance window with no call activity in the system.

The script will display the timestamp of the database backup that was taken prior to Step 1 and offer to restore this backup.

Once the script has completed the recovery of the database, a new switch version can be attempted.

If this recovery is performed immediately upon detection of the database corruption, there will be no loss of data.

If the system goes operational without immediate recovery and this script is later used to restore the database backup taken during switch version, any changes made to the database after the time of the backup will be lost.

Release 8.5(1)SU3
Associated CDETS # CSCtx89404
