A production OpenStack deployment has multiple layers that must be monitored to achieve Service Assurance. At a high level, these layers are captured in the figure below:
2. Monitoring Tools Info
Collectd is an agent-based system metrics collection tool. An agent is deployed on every host that needs to be monitored. Its configurable plugin architecture enables collection, storage and processing of metrics as needed. The plugins supported by collectd are listed here.
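As one illustration of the plugin architecture, collectd's exec plugin can run an external script and read metrics that the script prints to standard output as `PUTVAL` lines in collectd's plain-text protocol. The sketch below is a minimal example under assumptions: the plugin name `demo`, the instance name `load1` and the helper names are hypothetical, and the `gauge` type is assumed to be defined in the host's `types.db`.

```python
import os
import time


def putval_line(host, plugin, type_name, value, interval=10, instance=None):
    """Format one PUTVAL command in collectd's plain-text protocol."""
    type_part = f"{type_name}-{instance}" if instance else type_name
    identifier = f"{host}/{plugin}/{type_part}"
    # "N" asks collectd to timestamp the value with the current time.
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def read_load1():
    """Return the 1-minute load average (Unix-like systems only)."""
    return os.getloadavg()[0]


def main():
    # The exec plugin exports these variables to the script; the fallbacks
    # are assumptions for running the script by hand.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    while True:
        line = putval_line(host, "demo", "gauge", read_load1(),
                           interval=interval, instance="load1")
        # Flush so collectd sees each line immediately.
        print(line, flush=True)
        time.sleep(interval)
```

A real script would end by calling `main()` and would be registered in the exec plugin's configuration on each monitored host.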
Graphite is a real-time graphing system. It is composed of two components: a webapp frontend and a backend storage application. External applications feed monitoring data into it, and its “carbon” backend agent processes the data and stores it in a specialized Graphite database (e.g. whisper). The Graphite processes need to be deployed on the monitoring management box only. Some of the advantages of using Graphite are captured here.
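To make "feeding monitoring data" concrete, the sketch below sends one data point to carbon using its plaintext protocol, where each line has the form `metric.path value timestamp`. It assumes carbon is listening on its default plaintext port 2003; the host name, the metric path and the function names are hypothetical.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of carbon's plaintext protocol."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Open a TCP connection to carbon and send a single data point."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `send_metric("servers.web01.load", 0.42, host="monitoring-box")` would write one point for the (hypothetical) metric `servers.web01.load`, which could then be graphed from the webapp frontend.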
Nagios is a service health check and alerting system. It has a plugin model that allows service checks to be carried out on host groups and generates alerts/notifications when issues are detected. The Nagios-Core agent runs on the monitoring management box and provides a webapp frontend and a daemon that reads configuration from its resource files and object-definition files. Nagios-NRPE is an addon that allows Nagios to execute plugins on remote hosts.
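A Nagios plugin is simply an executable that prints a one-line status message and reports the result through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The following is a minimal sketch of a disk-usage check written that way; the thresholds, message wording and function names are assumptions for illustration, not a standard Nagios plugin.

```python
import shutil
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_pct, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a (status code, message) pair."""
    if used_pct >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def check_disk(path="/"):
    """Measure usage of the filesystem containing `path` and evaluate it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the measurement is reported as UNKNOWN.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return evaluate(100.0 * usage.used / usage.total)


def main():
    status, message = check_disk("/")
    print(message)    # Nagios displays the first line of output
    sys.exit(status)  # the exit code determines the service state
```

A real plugin would end by calling `main()`. With Nagios-NRPE, the same script could be installed on a remote host and executed there on behalf of the Nagios daemon.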
3. Monitoring Stack
Nagios can be used as a stand-alone monitoring tool for graphing and for system metric collection. It can also be integrated with an external metrics collection tool such as collectd using plugins. For graphing, collectd provides a plugin that sends data to the Graphite interface.
a) Graphite Interface integrated with Collectd
b) Nagios Service Health Check Interface