RCA for Engine failover

From DocWiki

Revision as of 06:14, 24 September 2010 by Sdandu (Talk | contribs)

An abrupt Engine mastership failover to the HA node can happen for several reasons:

  • The CVD process crashed. The Engine service depends on the CVD service, so if CVD crashes or is restarted, the Engine is restarted too, which causes mastership to fail over to the other node.
  • The Engine ran out of memory (OutOfMemoryError).
    • Check the MIVR logs for the OOM reason.
      • Example: java.lang.OutOfMemoryError: GC overhead limit exceeded
    • Debug it according to the OOM reason. Refer to "How to debug OutOfMemoryError".
  • The nodes went into island mode (multiple masters) and then recovered. Upon recovery, the publisher node retains mastership.
    • Check the MCVD logs for the failover entries.
  • An application error occurred and the Engine decided to shut down.
    • Look for com.cisco.wfapi.WFKeepAliveException: KeepAliveException in ManagerManagerImpl in the MIVR logs.
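The log checks above can be partly automated. The following is a minimal sketch, assuming the MIVR logs are plain text files that have already been collected locally. The two search strings come from the checklist; the cause labels, the function names and the file path in the usage note are hypothetical and only for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Signatures taken from the checklist above. The cause labels are
# illustrative names, not terms used by the product.
SIGNATURES = {
    "out_of_memory": re.compile(r"java\.lang\.OutOfMemoryError"),
    "keepalive_shutdown": re.compile(r"com\.cisco\.wfapi\.WFKeepAliveException"),
}


def classify_lines(lines: Iterable[str]) -> Counter:
    """Count how many log lines match each known failover signature."""
    counts: Counter = Counter()
    for line in lines:
        for cause, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[cause] += 1
    return counts


def scan_file(path: str) -> Counter:
    """Scan one collected log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return classify_lines(handle)
```

For example, scan_file("mivr_collected.log") (a hypothetical file name) returns a Counter keyed by cause. A non-zero count only indicates where to start reading; the CVD crash and island mode cases are not covered here because the source does not give a log signature for them.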
