RCA for Engine failover

From DocWiki

Abrupt Engine mastership failover to the HA node can happen for several reasons:

  • CVD process crashed. The Engine service depends on the CVD service, so if CVD crashes or restarts, Engine is restarted as well, which causes mastership to fail over to the other node.
  • Engine ran out of memory (OutOfMemoryError).
    • Check the MIVR logs for the OOM reason, for example:
      • java.lang.OutOfMemoryError: GC overhead limit exceeded
    • Debug it based on the OOM reason. Refer to How to debug OutOfMemoryError.
  • Nodes went into island mode (multiple masters) and then recovered. Upon recovery, the publisher node retains mastership.
    • Check the MCVD logs for failover messages.
  • An application error occurred, and Engine decided to shut down.
    • Look for com.cisco.wfapi.WFKeepAliveException: KeepAliveException in ManagerManagerImpl in the MIVR logs.
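The two MIVR log checks in the list above can be scripted when many log files need to be searched. The following is a minimal Python sketch, assuming you already have copies of the MIVR log files on a workstation. The script name, the category labels, and the function names are hypothetical. It only looks for the two signatures named above (java.lang.OutOfMemoryError and com.cisco.wfapi.WFKeepAliveException). The CVD crash and island-mode cases still need a manual review of the MCVD logs.

```python
import re
import sys
from typing import Dict, Iterable, List, Set

# Signatures taken from the checklist; the category labels are hypothetical.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(?P<reason>.*\S))?")
# Matches only the exception class name; confirm by hand that the hit is in
# ManagerManagerImpl, as the checklist describes.
KEEPALIVE_PATTERN = re.compile(r"com\.cisco\.wfapi\.WFKeepAliveException")

SIGNATURES = {
    "out_of_memory": OOM_PATTERN,
    "keepalive_shutdown": KEEPALIVE_PATTERN,
}


def scan_mivr_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by the suspected failover cause they match."""
    hits: Dict[str, List[str]] = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def oom_reasons(lines: Iterable[str]) -> Set[str]:
    """Collect the text after 'OutOfMemoryError:', e.g. 'GC overhead limit exceeded'."""
    reasons: Set[str] = set()
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match and match.group("reason"):
            reasons.add(match.group("reason"))
    return reasons


def main(paths: List[str]) -> None:
    for path in paths:
        # errors="replace" keeps stray bytes in a log from aborting the scan.
        with open(path, encoding="utf-8", errors="replace") as log:
            lines = log.readlines()
        for cause, matched in scan_mivr_lines(lines).items():
            if matched:
                print(f"{path}: {cause}: {len(matched)} line(s), first: {matched[0]}")
        for reason in sorted(oom_reasons(lines)):
            print(f"{path}: OOM reason: {reason}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage (hypothetical script name): python scan_mivr.py MIVR*.log. A hit narrows down the likely cause, but the surrounding log context should still be read before concluding the RCA.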
