Service Status via SaltStack 2014.7 with Nagios
One of the most exciting new features to me in SaltStack 2014.7 is the nagios module. This module supports remote execution of nagios-plugins on your minions. It can also run predefined lists of checks and their targets, defined in (and targeted through) a Pillar. Jinja templating and Grains can of course be used as well, making for an extremely versatile monitoring and testing solution.
I haven’t implemented anything crazy with it yet, but I am really seeing the power of this. A couple ideas off the top of my head:
- Service configuration best practices checklists
- Distributed connectivity tests
- Distributed latency reporting
I would like to build a very simple internal help desk service status page with no Nagios back-end requirement. It could perhaps even run from a */5 * * * * ... cron entry that redirects the JSON output of the following command to a file in a web directory:

    salt --out=json -s -G roles:monitoring nagios.run_all_pillar nagios_test

A jQuery/HTML5 generated page would then display the results to help desk techs, with nice green/yellow/red statuses for each monitored item based on the return codes of the checks.
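The green/yellow/red mapping could be sketched roughly as below. This is only a minimal illustration, not a finished status page. It relies on the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The JSON shape is an assumption: minion, then check group, then check name, then a dict with `retcode` and `stdout`. The sample data, the function names, and the "gray" color for unknown results are all hypothetical.

```python
import json

# Standard Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
RETCODE_COLORS = {0: "green", 1: "yellow", 2: "red"}


def status_color(retcode):
    """Map a plugin exit code to a display color.

    Anything outside 0-2 (including UNKNOWN or a missing code) becomes
    "gray", which is an illustrative choice rather than a convention.
    """
    return RETCODE_COLORS.get(retcode, "gray")


def summarize(results):
    """Flatten the ASSUMED shape
    {minion: {group: {check: {"retcode": n, "stdout": "..."}}}}
    into a list of rows that a status page could render.
    """
    rows = []
    for minion, groups in sorted(results.items()):
        for group, checks in sorted(groups.items()):
            for check, data in sorted(checks.items()):
                rows.append({
                    "minion": minion,
                    "group": group,
                    "check": check,
                    "color": status_color(data.get("retcode")),
                    "output": data.get("stdout", ""),
                })
    return rows


# Hypothetical sample data, standing in for the file written by cron.
SAMPLE = json.loads("""
{
  "minion1.example.com": {
    "Load": {"check_load": {"retcode": 0, "stdout": "OK"}},
    "APT": {"check_apt": {"retcode": 1, "stdout": "WARNING"}}
  }
}
""")

if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row["minion"], row["group"], row["check"], row["color"])
```

In a real setup the `SAMPLE` dict would be replaced by `json.load()` on the file the cron job writes, and the actual return structure should be checked against what your Salt version emits before relying on the `retcode` key.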
**Update:** My doc changes have been committed! I’ve got one commit in SaltStack!
Below is a simple working sample nagios check Pillar, along with some output from running it.
    base:
      '*':
        - nagios

    nagios_test:
      Ping_google:
        - check_icmp: 18.104.22.168
        - check_icmp: google.com
      Load:
        - check_load: -w 0.8 -c 1
      APT:
        - check_apt
This check Pillar can then be run with the command below, and produces output like the following:

    salt nagios.\* nagios.run_pillar nagios_test
    nagios.henchman21.net:
        ----------
        APT:
            ----------
            check_apt:
                APT OK: 0 packages available for upgrade (0 critical updates).
        Load:
            ----------
            check_load_-w0.8-c1:
                OK - load average: 0.09, 0.13, 0.13|load1=0.090;0.800;1.000;0; load5=0.130;0.800;1.000;0; load15=0.130;0.800;1.000;0;
        Ping_google:
            ----------
            check_icmp_22.214.171.124:
                OK - 126.96.36.199: rta 30.415ms, lost 0%|rta=30.415ms;200.000;500.000;0; pl=0%;40;80;; rtmax=30.718ms;;;; rtmin=30.226ms;;;;
            check_icmp_google.com:
                OK - google.com: rta 1.328ms, lost 0%|rta=1.328ms;200.000;500.000;0; pl=0%;40;80;; rtmax=1.425ms;;;; rtmin=1.247ms;;;;
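Each check line above has two parts separated by a `|`: the human-readable status text, then the plugin's performance data in the usual `label=value[unit];warn;crit;min;max` form. For something like latency reporting, that second part could be pulled apart with a small parser. The sketch below is a simplified illustration only: the function name and the returned dict layout are hypothetical, and it does not handle quoted labels containing spaces.

```python
import re

# One perfdata item: label=value[unit], optionally followed by ;warn;crit;min;max
PERF_ITEM = re.compile(
    r"^(?P<label>[^=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<unit>[^;]*)(?:;(?P<rest>.*))?$"
)


def parse_check_output(line):
    """Split a plugin output line into (status_text, metrics).

    metrics maps each label to its numeric value, unit, and the raw
    warn/crit threshold strings (empty when the plugin omits them).
    """
    text, _, perf = line.partition("|")
    metrics = {}
    for item in perf.split():
        match = PERF_ITEM.match(item)
        if not match:
            continue  # skip anything this simplified pattern cannot read
        fields = (match.group("rest") or "").split(";")
        fields += [""] * (4 - len(fields))  # pad to warn, crit, min, max
        metrics[match.group("label")] = {
            "value": float(match.group("value")),
            "unit": match.group("unit"),
            "warn": fields[0],
            "crit": fields[1],
        }
    return text.strip(), metrics


# Line taken from the check_icmp output shown above.
LINE = (
    "OK - google.com: rta 1.328ms, lost 0%|rta=1.328ms;200.000;500.000;0; "
    "pl=0%;40;80;; rtmax=1.425ms;;;; rtmin=1.247ms;;;;"
)

if __name__ == "__main__":
    text, metrics = parse_check_output(LINE)
    print(text)
    for label, data in metrics.items():
        print(label, data["value"], data["unit"])
```

Keeping the thresholds as raw strings avoids guessing at Nagios range syntax, which can be more than a plain number.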