Je suis Charlie

Autres trucs

Accueil

Seulement les RFC

Seulement les fiches de lecture

Mon livre « Cyberstructure »

Ève

My first Nagios plugin in Go

First publication of this article on 4 December 2012


An important feature of the Nagios monitoring system (or of its compatibles like the one I use, Icinga), is that you can easily write your own plugins, to test things that cannot be tested with the "standard" Nagios plugins. I wrote several such plugins in my life, to do tests which were useful to me. This one is the first I wrote in Go, and it tests a DNS zone by querying every name server, thus checking that all name servers are OK.

This is an important test because DNS is too robust for monitoring: even with almost all the name servers down or broken, the zone will work, thus leaving people with a feeling of false security. Hence the idea of testing every name server of the zone (monitoring typically requires that you test differently from what the ordinary user does, to see potential problems). The original idea comes from the check_soa program, distributed in the famous book "DNS and BIND" (the code is available on line).

And why using Go? Well, Nagios plugins can be written in any language but I like Go: faster and lighter than Perl or Python but much easier to deal with than C. A good language for system and network programming.

Nagios plugins have to abide by a set of rules (in theory: in practice, many "quick & dirty" plugins skip many of the rules). Basically, the testing plugin receives standard arguments telling it what to test and with which parameters, and it spits out one line summarizing the results and, more important, exits with a status code indicating what happened (OK, Warning, Critical, Unknown).

That's one point where my plugin fails: the standard Go library for parsing arguments (flag) cannot do exactly what Nagios wants. For instance, Nagios wants repeating options (-v increases the verbosity of the output each time it appears) and they are not possible with the standard module. So, in practice, my plugin does not conform to the standard. Here is its use:

% /usr/local/share/nagios/libexec/check_dns_soa  -h
CHECK_DNS_SOA UNKNOWN: Help requested
  -4=false: Use IPv4 only
  -6=false: Use IPv6 only
  -H="": DNS zone name
  -V=false: Displays the version number of the plugin
  -c=1: Number of name servers broken to trigger a Critical situation
  -i=3: Maximum number of tests per nameserver
  -r=false: When using -4 or -6, requires that all servers have an address of this family
  -t=3: Timeout (in seconds)
  -v=0: Verbosity (from 0 to 3)
  -w=1: Number of name servers broken to trigger a Warning

The most important argument is the standard Nagios one, -H, indicating the zone you want to test. For instance, the TLD .tf :

% /usr/local/share/nagios/libexec/check_dns_soa  -H tf
CHECK_DNS_SOA OK: Zone tf. is fine

% echo $?
0

The return code show that everything was OK. You can test other, non-TLD, domains:

% /usr/local/share/nagios/libexec/check_dns_soa  -H github.com
CHECK_DNS_SOA OK: Zone github.com. is fine

The plugin works by sending a SOA request to every name server of the zone and checking there is an answer, and it is authoritative. With tcpdump, you can see its activity (192.168.2.254 is the local DNS resolver):

# Request the list of name servers
12:00:39.954330 IP 192.168.2.4.58524 > 192.168.2.254.53: 0+ NS? github.com. (28)
12:00:39.979266 IP 192.168.2.254.53 > 192.168.2.4.58524: 0 4/0/0 NS ns2.p16.dynect.net., NS ns3.p16.dynect.net., NS ns1.p16.dynect.net., NS ns4.p16.dynect.net. (114)

# Requests the IP addresses
12:00:40.063714 IP 192.168.2.4.37993 > 192.168.2.254.53: 0+ A? ns3.p16.dynect.net. (36)
12:00:40.064237 IP 192.168.2.254.53 > 192.168.2.4.37993: 0 1/0/0 A 208.78.71.16 (52)
12:00:40.067517 IP 192.168.2.4.47375 > 192.168.2.254.53: 0+ AAAA? ns3.p16.dynect.net. (36)
12:00:40.068085 IP 192.168.2.254.53 > 192.168.2.4.47375: 0 1/0/0 AAAA 2001:500:94:1::16 (64)

# Queries the name servers
12:00:40.609507 IP 192.168.2.4.51344 > 208.78.71.16.53: 0 SOA? github.com. (28)
12:00:40.641900 IP 208.78.71.16.53 > 192.168.2.4.51344: 0*- 1/4/0 SOA (161)
12:00:40.648342 IP6 2a01:e35:8bd9:8bb0:ba27:ebff:feba:9094.33948 > 2001:500:94:1::16.53: 0 SOA? github.com. (28)
12:00:40.689689 IP6 2001:500:94:1::16.53 > 2a01:e35:8bd9:8bb0:ba27:ebff:feba:9094.33948: 0*- 1/4/0 SOA (161)

Then, you can configure Nagios (or Icinga, in my case), to use it:

define command {
        command_name    check_dns_soa
        command_line    /usr/local/share/nagios/libexec/check_dns_soa -H $HOSTADDRESS$ $ARG1$ $ARG2$
        }

define hostgroup
       hostgroup_name my-zones
       members example.com,example.org,example.net
}

define service {
       use generic-service
       hostgroup_name my-zones
       service_description CHECK_DNS_SOA
       check_command check_dns_soa
}

Nagios (or Icinga) forces you to declare a host, even if it makes no sense in this case. So, my zones are declared as a special kind of hosts:

define host{
        name                            dns-zone
        use                      generic-host
	check_command             check-always-up
        register 0
        # Other options, quite ordinary
}

define host{
        use dns-zone
        host_name example.com
}

Nagios wants a command to check that the "host" is up (otherwise, the "host" stays in the "pending" state for ever). That's why we provide check-always-up which is defined as always true:

define command{
        command_name    check-always-up
        command_line    /bin/true
}

With this configuration, it works fine and you can see nice trends of 100 % uptime if everything is properly configured and connected. And if it's not? Then, the plugin will report the zone as Warning or Critical (depending on the options you used, see -c and -w) :

% /usr/local/share/nagios/libexec/check_dns_soa  -H ml
CHECK_DNS_SOA CRITICAL: Cannot get SOA from ciwara.sotelma.ml./217.64.97.50: read udp 192.168.2.4:40836: i/o timeout

% echo $?
2

If you are more lenient and accept two broken name servers before setting the status as Critical, use -c 3 and you'll just get a Warning:

% /usr/local/share/nagios/libexec/check_dns_soa  -v 3 -c 3 -H ml
CHECK_DNS_SOA WARNING: Cannot get SOA from ciwara.sotelma.ml./217.64.97.50: read udp 192.168.2.4:35810: i/o timeout
217.64.100.112 (SERVFAIL) 

% echo $?
1

This plugin is hosted at FramaGit, where you can retrieve the code, report bugs, etc. To compile it, see the README. You'll need the excellent Go DNS library first (a good way to make DNS requests from Go).

Two implementation notes: the plugin does not use one of the Go's greatest strengths, its built-in parallelism, thanks to "goroutines". This is because making it parallel would certainly decrease the wall-clock time to run the tests but, for an unattended program like Nagios, it is not so important (it would be different for an interactive program). And, second, there is a library to write Nagios plugins in Go but I did not use it because of several limitations (and I do not use performance data, where this library could be handy). Note also that other people are using Go with Nagios in a different way.

Thanks to Miek Gieben for his reading and patching.

Version PDF de cette page (mais vous pouvez aussi imprimer depuis votre navigateur, il y a une feuille de style prévue pour cela)

Source XML de cette page (cette page est distribuée sous les termes de la licence GFDL)