BGP.guru

Nerd blog.

21 Nov 2014

Service Status via SaltStack 2014.7 with Nagios

One of the most exciting new features to me in SaltStack 2014.7 is the nagios module. This module supports remote execution of nagios-plugins on your minions. It can also run predefined lists of checks that are defined in a Pillar and targeted at minions the same way as any other Pillar data. Jinja templating and Grains can of course be used as well, making for an extremely versatile monitoring and testing solution.

I haven’t implemented anything crazy with it yet, but I am really seeing the power of this. A couple of ideas off the top of my head:

  • Service configuration best practices checklists
  • Distributed connectivity tests
  • Distributed latency reporting

I would like to build a very simple internal help desk service status page with no Nagios back-end requirement. It could even run from a */5 * * * * ... cron job that redirects the JSON output of salt --out=json -s -G roles:monitoring nagios.run_all_pillar nagios_test to a file in a web directory. A jQuery/HTML5 page would then show help desk techs a nice green/yellow/red status for each monitored item, based on the return codes of the checks.
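As a rough sketch of the status-page idea, the snippet below turns that JSON into green/yellow/red rows. It assumes the output is shaped as minion, then check group, then check name, with each check holding a cmd.run_all style dictionary that includes a retcode key; I have not confirmed that shape against every Salt release. The names summarise, STATUS_COLOURS and SAMPLE, and the sample data itself, are made up for illustration. The 0/1/2 meanings are the standard Nagios plugin exit codes (OK, WARNING, CRITICAL).

```python
import json

# Standard Nagios plugin exit codes mapped to status-page colours.
# Anything else (3 = UNKNOWN, or a missing retcode) is shown as grey.
STATUS_COLOURS = {0: "green", 1: "yellow", 2: "red"}


def summarise(salt_json):
    """Flatten {minion: {group: {check: {"retcode": n}}}} into sorted rows.

    The nesting is an assumption about what nagios.run_all_pillar returns
    with --out=json -s; adjust the loops if your output differs.
    """
    rows = []
    for minion, groups in sorted(json.loads(salt_json).items()):
        for group, checks in sorted(groups.items()):
            for check, result in sorted(checks.items()):
                colour = STATUS_COLOURS.get(result.get("retcode"), "grey")
                rows.append((minion, group, check, colour))
    return rows


# Hypothetical sample data, not real output from a minion.
SAMPLE = json.dumps({
    "nagios.example.net": {
        "Load": {"check_load_-w0.8-c1": {"retcode": 1, "stdout": "WARNING"}},
        "APT": {"check_apt": {"retcode": 0, "stdout": "APT OK"}},
    }
})

for minion, group, check, colour in summarise(SAMPLE):
    print(f"{minion} {group} {check}: {colour}")
```

The same function could read the file written by the cron job instead of SAMPLE, and the resulting rows could be rendered as HTML or handed to the jQuery page as-is.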


I have made a documentation pull request to SaltStack, because the example pillar in the nagios module docs doesn’t fully work as currently written.

Update: My doc changes have been committed! I’ve got one commit in SaltStack!

Below is a simple, working sample nagios check Pillar, along with the output from running it.

/srv/pillar/top.sls:

base:
  '*':
    - nagios

/srv/pillar/nagios.sls:

nagios_test:
  Ping_google:
    - check_icmp: 8.8.8.8
    - check_icmp: google.com
  Load:
    - check_load: -w 0.8 -c 1
  APT:
    - check_apt

This check Pillar can then be run with salt 'nagios.*' nagios.run_pillar nagios_test, and produces output like the following:

nagios.henchman21.net:
    ----------
    APT:
        ----------
        check_apt:
            APT OK: 0 packages available for upgrade (0 critical updates).
    Load:
        ----------
        check_load_-w0.8-c1:
            OK - load average: 0.09, 0.13, 0.13|load1=0.090;0.800;1.000;0; load5=0.130;0.800;1.000;0; load15=0.130;0.800;1.000;0;
    Ping_google:
        ----------
        check_icmp_8.8.8.8:
            OK - 8.8.8.8: rta 30.415ms, lost 0%|rta=30.415ms;200.000;500.000;0; pl=0%;40;80;; rtmax=30.718ms;;;; rtmin=30.226ms;;;;
        check_icmp_google.com:
            OK - google.com: rta 1.328ms, lost 0%|rta=1.328ms;200.000;500.000;0; pl=0%;40;80;; rtmax=1.425ms;;;; rtmin=1.247ms;;;;
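For something like the latency reporting idea, the numbers after the | in each result are the interesting part. Here is a minimal sketch that splits one line of plugin output into its human-readable text and its performance data, assuming the usual label=value;warn;crit;min;max layout that nagios-plugins emit. The function name parse_perfdata is my own, and this simplified version does not handle quoted labels that contain spaces.

```python
def parse_perfdata(plugin_output):
    """Split 'text|label=value;warn;crit;min;max ...' into (text, metrics).

    Values are kept as strings, units included (e.g. '1.328ms'); min and
    max are ignored to keep the sketch short.
    """
    text, _, perf = plugin_output.partition("|")
    metrics = {}
    for item in perf.split():
        label, sep, fields = item.partition("=")
        if not sep:
            continue  # not a label=value pair, skip it
        # Pad so that short entries still unpack into three fields.
        value, warn, crit = (fields.split(";") + ["", ""])[:3]
        metrics[label] = {"value": value, "warn": warn, "crit": crit}
    return text.strip(), metrics


# The check_icmp_google.com line from the output above.
LINE = ("OK - google.com: rta 1.328ms, lost 0%|rta=1.328ms;200.000;500.000;0; "
        "pl=0%;40;80;; rtmax=1.425ms;;;; rtmin=1.247ms;;;;")

text, metrics = parse_perfdata(LINE)
print(text)
for label, fields in metrics.items():
    print(label, fields)
```

Run across many minions, the rta values would give a per-location latency table without any Nagios server involved.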

Theodore Baschak - Theo is a network engineer with experience operating core internet technologies like HTTP, HTTPS and DNS. He has extensive experience running service provider networks with OSPF, MPLS, and BGP.