Using SSH Control Master for efficient monitoring of remote hosts in Nagios

Published: April 13, 2010 in linux/unix, monitoring, nagios, script, sysadmin

I want to use SSH to monitor some hosts remotely. The problem with SSH is the overhead of creating a new connection each time.

But we can use the Control Master feature of OpenSSH (see http://www.revsys.com/writings/quicktips/ssh-faster-connections.html). With it, ssh connects only once and reuses that connection for later sessions.

This is great for monitoring scripts.

To use Control Master, you have to execute something like:

ssh -o "ControlMaster=yes" -o "ControlPath=/somepath/ssh-%r@%h:%p" -N user@host
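To see which socket a ControlPath template resolves to, and whether a master is listening on it, a sketch like the following may help. The helper names control_socket_path and master_is_alive are made up for illustration, and port 22 is assumed when none is given; -O check is the OpenSSH option that asks an existing master whether it is running.

```shell
#!/bin/sh
# Hypothetical helper: expand the %r (remote user), %h (host) and %p (port)
# tokens of a ControlPath template. Only these three tokens are handled.
control_socket_path() {
    template=$1; user=$2; host=$3; port=${4:-22}
    printf '%s\n' "$template" |
        sed -e "s|%r|$user|g" -e "s|%h|$host|g" -e "s|%p|$port|g"
}

# Hypothetical helper: returns 0 if a master connection answers on the socket.
# Arguments: ControlPath template, user@host
master_is_alive() {
    ssh -o "ControlPath=$1" -O check "$2" 2>/dev/null
}

# Prints the socket path for the example command above
control_socket_path '/somepath/ssh-%r@%h:%p' user host
```

master_is_alive is not called here because it needs a reachable host; it would be used as: master_is_alive '/somepath/ssh-%r@%h:%p' user@host && echo "master up".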

The idea: create a remote user on each monitored host, establish a persistent SSH connection, and use Control Master from the monitoring scripts. To daemonize and control the SSH master connections, I will use runit.

Implementing the idea

First, I will create the following directory structure:

  • SSH_CONTROL_MASTER_HOME=/etc/nagios3/ssh/: Main service directory
    • ./controlpath/: Path for the Control Master sockets
    • ./ssh-hosts-available.d/ and ./ssh-hosts-enabled.d/: Directories for host entries (these will be runsv directories, see below)

SSH_CONTROL_MASTER_HOME=/etc/nagios3/ssh/
mkdir -p $SSH_CONTROL_MASTER_HOME
mkdir -p $SSH_CONTROL_MASTER_HOME/controlpath
mkdir -p $SSH_CONTROL_MASTER_HOME/ssh-hosts-available.d
mkdir -p $SSH_CONTROL_MASTER_HOME/ssh-hosts-enabled.d

And we create the necessary files for the SSH connection:

  1. SSH keys. Using a passphrase is optional. If you set one, you will need to run an SSH agent:
    ssh-keygen -t rsa -b 2048  -f $SSH_CONTROL_MASTER_HOME/id_rsa
  2. config and known_hosts files for the SSH client:
    cat > $SSH_CONTROL_MASTER_HOME/config <<EOF
    host *
        ControlMaster auto
        IdentitiesOnly yes
        ControlPath /etc/nagios3/ssh/controlpath/ssh-%r@%h:%p
        IdentityFile /etc/nagios3/ssh/id_rsa
        UserKnownHostsFile /etc/nagios3/ssh/known_hosts
    EOF
    touch $SSH_CONTROL_MASTER_HOME/known_hosts
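If these files were created by a user other than the one that runs ssh (nagios in this article), that user still needs to read the key, update known_hosts and create sockets in controlpath. One possible way to arrange this is sketched below; the function name secure_ssh_home and the chosen modes are assumptions, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: tighten permissions under the SSH home directory and,
# if an owner is given, hand the files ssh must read or write to that user.
# Arguments: directory, optional owner (chown normally needs root)
secure_ssh_home() {
    home=$1; owner=${2:-}
    chmod 700 "$home/controlpath"      # sockets live here
    chmod 600 "$home/id_rsa"           # ssh refuses keys readable by others
    chmod 644 "$home/id_rsa.pub" "$home/config" "$home/known_hosts"
    if [ -n "$owner" ]; then
        chown -R "$owner" "$home/controlpath" "$home/id_rsa" \
            "$home/id_rsa.pub" "$home/known_hosts"
    fi
}

# Example (as root): secure_ssh_home /etc/nagios3/ssh nagios
```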

Next, we create a default runsv directory for the ssh services (see the runsv manpage). I will create a common run script that reads its options from a file called ./host.conf.

This script depends on ./host.conf defining SSH_USER_HOST. runsv ensures that the working directory is the service directory (./).

mkdir $SSH_CONTROL_MASTER_HOME/basic-host/
cat > $SSH_CONTROL_MASTER_HOME/basic-host/run <<"EOF"
#!/bin/sh
. ./host.conf || exit 1

exec sudo -u nagios ssh \
        -o "ControlMaster=yes" \
        -F /etc/nagios3/ssh/config \
        -N $SSH_USER_HOST
EOF
chmod +x $SSH_CONTROL_MASTER_HOME/basic-host/run
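Before handing a service directory to runsv, it can be useful to see which command its run script would exec. The sketch below sources host.conf in a subshell and prints the command instead of running it; the function name show_run_command is hypothetical.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the ssh command that the run script
# would exec for a given service directory, without connecting anywhere.
show_run_command() {
    (
        cd "$1" || exit 1
        # Same contract as the run script: host.conf must define SSH_USER_HOST
        . ./host.conf || exit 1
        if [ -z "${SSH_USER_HOST:-}" ]; then
            echo "SSH_USER_HOST is not set in $1/host.conf" >&2
            exit 1
        fi
        echo "sudo -u nagios ssh -o ControlMaster=yes -F /etc/nagios3/ssh/config -N $SSH_USER_HOST"
    )
}

# Example: show_run_command /etc/nagios3/ssh/ssh-hosts-available.d/ahost
```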

To create hosts easily, I will use a script like the following. It copies a “template” host in $SSH_CONTROL_MASTER_HOME/ssh-hosts-available.d and initializes its host.conf. The template contains only a link to $SSH_CONTROL_MASTER_HOME/basic-host/run

# Create the template
mkdir $SSH_CONTROL_MASTER_HOME/ssh-hosts-available.d/template
ln -s ../../basic-host/run $SSH_CONTROL_MASTER_HOME/ssh-hosts-available.d/template/

# Create the script
cat > $SSH_CONTROL_MASTER_HOME/create-host.sh <<"EOF2"
#!/bin/sh
cd $(dirname $0)

if [ "$1" = "" ]; then
        cat <<EOF
Usage:
        $0 [-l] <host>
EOF
        exit 1
fi
HOST=$1

if [ -d ssh-hosts-available.d/$HOST ]; then
        echo "ssh-hosts-available.d/$HOST already exists"
        exit 1
fi

cp -Ra ssh-hosts-available.d/template  ssh-hosts-available.d/$HOST
cat <<EOF > ssh-hosts-available.d/$HOST/host.conf
SSH_USER_HOST=nagios@$HOST
EOF
echo "Execute this command in '`pwd`' to activate the host:"
echo "ln -s ../ssh-hosts-available.d/$HOST ssh-hosts-enabled.d"
EOF2
chmod +x $SSH_CONTROL_MASTER_HOME/create-host.sh

Control script

Now we need a control script to start all the ssh services.

This script is based on the Debian init.d script skeleton. It launches runsvdir on $SSH_CONTROL_MASTER_HOME/ssh-hosts-enabled.d, using start-stop-daemon to daemonize it.

I am not really sure if this is the best way, but it works.

cat > $SSH_CONTROL_MASTER_HOME/nagios-ssh-mastercontrol.ctl.sh <<"EOF"
#! /bin/sh
### BEGIN INIT INFO
# Provides:          nagios-ssh-mastercontrol
# Required-Start:    $remote_fs
# Required-Stop:     $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Nagios Master Control ssh services
# Description:       Starts a set of connections via SSH to servers, establishing mastercontrol
#                    connections to be used in nagios monitoring
### END INIT INFO

# Author: Hector Rivas Gandara <keymon@gmail.com>

# Do NOT "set -e"

# PATH should only include /usr/* if it runs after the mountnfs.sh script
SVDIR="/etc/nagios3/ssh/ssh-hosts-enabled.d/"

PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="Nagios SSH Master Control connections"
NAME=nagios-ssh-mastercontrol
DAEMON=/usr/bin/runsvdir
DAEMON_ARGS=$SVDIR
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
USER=nagios

# Exit if the package is not installed
#[ -x "$DAEMON" ] || exit 0

# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME

# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions

#
# Function that starts the daemon/service
#
do_start()
{
        # Return
        #   0 if daemon has been started
        #   1 if daemon was already running
        #   2 if daemon could not be started
        start-stop-daemon -u $USER --start  --pidfile $PIDFILE --exec $DAEMON --test > /dev/null \
                || return 1
        start-stop-daemon -b -m -u $USER --start --pidfile $PIDFILE --exec $DAEMON -- \
                $DAEMON_ARGS \
                || return 2
        # Add code here, if necessary, that waits for the process to be ready
        # to handle requests from services started subsequently which depend
        # on this one.  As a last resort, sleep for some time.
}

#
# Function that stops the daemon/service
#
do_stop()
{
#       for i in $SVDIR/*; do
#               if [ -d $i ]; then
#                       echo "Stopping $i..."
#                       sv down $i
#               fi
#       done
        # Return
        #   0 if daemon has been stopped
        #   1 if daemon was already stopped
        #   2 if daemon could not be stopped
        #   other if a failure occurred
        start-stop-daemon --stop  --retry=HUP/30/TERM/5 --pidfile $PIDFILE
        RETVAL="$?"
        [ "$RETVAL" = 2 ] && return 2
        # Wait for children to finish too if this is a daemon that forks
        # and if the daemon is only ever run from this initscript.
        # If the above conditions are not satisfied then add some other code
        # that waits for the process to drop all resources that could be
        # needed by services started subsequently.  A last resort is to
        # sleep for some time.
        start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --pidfile $PIDFILE
        [ "$?" = 2 ] && return 2
        # Many daemons don't delete their pidfiles when they exit.
        #rm -f $PIDFILE
        return "$RETVAL"
}

#
# Function that sends a SIGHUP to the daemon/service
#
do_reload() {
        #
        # If the daemon can reload its configuration without
        # restarting (for example, when it is sent a SIGHUP),
        # then implement that here.
        #
        start-stop-daemon -u $USER -m --stop --signal 1 --quiet --pidfile $PIDFILE --name $NAME
        return 0
}

case "$1" in
  start)
        [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
        do_start
        case "$?" in
                0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
                2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
        esac
        ;;
  stop)
        [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
        do_stop
        case "$?" in
                0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
                2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
        esac
        ;;
  #reload|force-reload)
        #
        # If do_reload() is not implemented then leave this commented out
        # and leave 'force-reload' as an alias for 'restart'.
        #
        #log_daemon_msg "Reloading $DESC" "$NAME"
        #do_reload
        #log_end_msg $?
        #;;
  restart|force-reload)
        #
        # If the "reload" option is implemented then remove the
        # 'force-reload' alias
        #
        log_daemon_msg "Restarting $DESC" "$NAME"
        do_stop
        case "$?" in
          0|1)
                do_start
                case "$?" in
                        0) log_end_msg 0 ;;
                        1) log_end_msg 1 ;; # Old process is still running
                        *) log_end_msg 1 ;; # Failed to start
                esac
                ;;
          *)
                # Failed to stop
                log_end_msg 1
                ;;
        esac
        ;;
  *)
        #echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
        echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
        exit 3
        ;;
esac

:
EOF
chmod +x $SSH_CONTROL_MASTER_HOME/nagios-ssh-mastercontrol.ctl.sh
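Once runsvdir is running, the state of each connection can be queried with sv status, which ships with runit. A minimal sketch, assuming the same service directory as the control script; the SV variable is only there so the command can be swapped out (for example with echo) for a dry run, and status_all is a hypothetical name.

```shell
#!/bin/sh
# Assumed defaults; override them in the environment if your layout differs
SVDIR=${SVDIR:-/etc/nagios3/ssh/ssh-hosts-enabled.d}
SV=${SV:-sv}

# Hypothetical helper: run "sv status" on every enabled service directory
status_all() {
    for dir in "$SVDIR"/*; do
        [ -d "$dir" ] || continue   # skip plain files and an empty glob
        "$SV" status "$dir"
    done
}

# Example (usually as root): status_all
```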

Defining a new service in Nagios via SSH

Finally, we use the nagios plugin check_by_ssh to remotely execute our plugins, adding the options needed to use ControlMaster:

  • -o "ControlMaster=no": This uses an existing master connection, but does not create one if it does not exist.
  • -o "ControlPath=/etc/nagios3/ssh/controlpath/ssh-%r@%h:%p"
  • -o "PasswordAuthentication=no"

For example:

define command{
        command_name    check_myplugin_by_ssh
        command_line    /usr/lib/nagios/plugins/check_by_ssh -H $HOSTADDRESS$ -l nagiosusr  -o "ControlMaster=no" -o "ControlPath=/etc/nagios3/ssh/controlpath/ssh-%r@%h:%p" -o "PasswordAuthentication=no" -C "/a_path/myplugin $ARG2$"
        }
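Since ControlMaster=no never creates a master, checks lose the benefit of the shared connection whenever the persistent one is down. It may be worth monitoring the socket itself. The following is a minimal sketch of such a check; the function name check_master_socket is hypothetical, and the exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```shell
#!/bin/sh
# Hypothetical Nagios-style check: is there a socket at the given path?
# Note: an existing socket does not prove the master behind it is healthy.
check_master_socket() {
    if [ -S "$1" ]; then
        echo "OK - control socket $1 exists"
        return 0
    fi
    echo "CRITICAL - control socket $1 not found"
    return 2
}

# Example: check_master_socket "/etc/nagios3/ssh/controlpath/ssh-nagios@ahost:22"
```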

Defining new hosts

First, we should create the new user that will execute nagios plugins in the remote server, and populate its $HOME/.ssh/authorized_keys with $SSH_CONTROL_MASTER_HOME/id_rsa.pub. For instance, we can do:

ssh-copy-id -i $SSH_CONTROL_MASTER_HOME/id_rsa.pub nagiosuser@machine

Next, to define a new host, we use the script create-host.sh

# ./create-host.sh ahost
Execute this command in '/etc/nagios3/ssh' to activate the host:
ln -s ../ssh-hosts-available.d/ahost ssh-hosts-enabled.d

With runsvdir, the service will start automatically once we create the link, as documented:

At least every five seconds runsvdir checks whether the time of last modification, the inode, or the device, of the services directory dir has changed. If so, it re-scans the service directory, and if it sees a new subdirectory, or new symlink to a directory, in dir, it starts a new runsv(8) process;

But first we must accept the SSH host key of the remote host. We can do that by simply executing the run script:

# cd $SSH_CONTROL_MASTER_HOME/ssh-hosts-available.d/ecvignevcml1/
# sudo -u nagios ./run
The authenticity of host 'ecvignevcml1 (172.16.13.14)' can't be established.
RSA key fingerprint is 28:6d:eb:38:42:e9:82:90:22:6e:af:17:ad:86:44:83.
Are you sure you want to continue connecting (yes/no)? yes
<Ctrl+C>

And, now, we can link it:

ln -s ../ssh-hosts-available.d/ahost ssh-hosts-enabled.d
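Enabling and disabling a host then comes down to creating or removing a symlink, which runsvdir notices on its next scan. A small sketch, with hypothetical function names and the base directory passed as the first argument:

```shell
#!/bin/sh
# Hypothetical helper: link an available host into the enabled directory.
# Arguments: base directory (e.g. /etc/nagios3/ssh), host name
enable_host() {
    base=$1; host=$2
    if [ ! -d "$base/ssh-hosts-available.d/$host" ]; then
        echo "unknown host: $host" >&2
        return 1
    fi
    ln -s "../ssh-hosts-available.d/$host" "$base/ssh-hosts-enabled.d/$host"
}

# Hypothetical helper: remove the link only; the available entry is kept.
disable_host() {
    base=$1; host=$2
    rm -f "$base/ssh-hosts-enabled.d/$host"
}

# Example: enable_host /etc/nagios3/ssh ahost
```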

Example: Disk usage monitoring in AIX

I will combine the technique described in this article with a restricted shell built on bash to monitor disk usage on a remote AIX host. We will use this data:
  • Monitoring user name: monxusr
  • User home: /srv/mon/monxusr
  • User groups: staff,sshcon (ssh restricts the connection to sshcon group)

First of all, we install the plugins: we compile the Nagios plugins on AIX and install them with stow. I will not go into detail on this:

tar -xvzf nagios-plugins_1.4.11.orig.tar.gz
cd nagios-plugins-1.4.11
./configure --help
./configure --prefix=/usr/local/stow/nagios-plugins-1.4.11
sudo make install
cd /usr/local/stow
sudo mkdir -p /usr/local/share/locale
sudo stow nagios-plugins-1.4.11/

On the remote host we create the user, without a password. The shell will initially be the default one.

RESTRICTED_USER=monxusr
USERHOME=/srv/mon/monxusr
GROUPS=staff,sshcon 

mkdir $(dirname $USERHOME)
mkuser -R compat id=10102 pgrp=staff groups=staff,sshcon home=$USERHOME maxexpired=-1 loginretries=-1  $RESTRICTED_USER
pwdadm -R compat -c $RESTRICTED_USER

We create the restricted shell script, we add it to valid shells and we change the user to use this shell.

mkdir $USERHOME/bin
cat >$USERHOME/bin/rbash <<EOF
#!/usr/bin/bash -e
export PATH=$USERHOME/bin
f=\$1
if [ "\$1" != "" ]; then
 shift
 exec /bin/bash \$f "\$*"
else
 exec /bin/bash \$*
fi
EOF
chmod +x $USERHOME/bin/rbash

chsec -f /etc/security/login.cfg -s usw -a shells=$(lssec -f /etc/security/login.cfg -s usw -a shells | cut -f 2 -d =),$USERHOME/bin/rbash
chuser -R compat shell=$USERHOME/bin/rbash  $RESTRICTED_USER
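The wrapper's argument handling makes more sense once you know that sshd runs a remote command by calling the user's login shell as: shell -c "command". The sketch below only prints what the wrapper would exec for a given argument list; describe_invocation is a hypothetical name and nothing is actually executed.

```shell
#!/bin/sh
# Hypothetical helper mirroring the wrapper's logic: the first argument
# (normally -c) is kept as is, the rest is joined into a single string.
describe_invocation() {
    f=${1:-}
    if [ "$f" != "" ]; then
        shift
        printf '/bin/bash %s "%s"\n' "$f" "$*"
    else
        # No arguments: an interactive login, so plain bash is started
        printf '/bin/bash\n'
    fi
}

# What sshd would trigger for: ssh monxusr@remoteserver check_disk -w 10% -c 5% -p /tmp
describe_invocation -c 'check_disk -w 10% -c 5% -p /tmp'
```

Because PATH is set to $USERHOME/bin before bash is started, only the commands linked into that directory are found by name.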

We configure ssh keys:

sudo -u $RESTRICTED_USER mkdir $USERHOME/.ssh
sudo -u $RESTRICTED_USER tee -a $USERHOME/.ssh/authorized_keys <<EOF
ssh-rsa AAAAqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnm nagios@nagiosserver
EOF

And finally we link the Nagios plugins into the user's bin directory:

ln -s /usr/local/libexec/* $USERHOME/bin

We can test it:

# su - monxusr
$  check_disk -w 10% -c 5% -p /tmp -p /var -C -w 100000 -c 50000 -p /
DISK CRITICAL - free space: /tmp 913 MB (89% inode=99%); /var 138 MB (22% inode=84%); / 17 MB (6% inode=59%);| /tmp=110MB;921;972;0;1024 /var=469MB;547;577;0;608 /=254MB;-99728;-49728;0;272

To set up the Nagios server, first we configure the SSH connection as described:

# SSH_CONTROL_MASTER_HOME=/etc/nagios3/ssh/

# cd $SSH_CONTROL_MASTER_HOME
# ./create-host.sh remoteserver
Execute this command in '/etc/nagios3/ssh' to activate the host:

# (cd ./ssh-hosts-available.d/remoteserver/; ./run)
The authenticity of host 'remoteserver (192.168.1.2)' can not be established.
RSA key fingerprint is 14:0c:5e:20:0c:22:34:58:e3:da:06:55:fd:e5:58:4e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'remoteserver,192.168.1.2' (RSA) to the list of known hosts.
<Ctrl+C>

# ln -s ../ssh-hosts-available.d/remoteserver ssh-hosts-enabled.d

# ps -fea| grep ssh |grep remoteserver
nagios   29674 29143  0 17:07 ?        00:00:00 ssh -o ControlMaster=yes -F /etc/nagios3/ssh/config -N monxusr@remoteserver

And we test the plugin:

# time ssh -F /etc/nagios3/ssh/config monxusr@remoteserver check_disk -w 10% -c 5% -p /tmp -p /var -C -w 100000 -c 50000 -p /
DISK CRITICAL - free space: /tmp 913 MB (89% inode=99%); /var 138 MB (22% inode=84%); / 17 MB (6% inode=58%);| /tmp=110MB;921;972;0;1024 /var=469MB;547;577;0;608 /=254MB;-99728;-49728;0;272

real    0m0.068s
user    0m0.008s
sys     0m0.004s

As you can see, this is much faster than the same check without ControlMaster:

root@dclinuxapps1:/etc/nagios3/ssh/# time ssh -o ControlMaster=no -i id_rsa -o UserKnownHostsFile=/etc/nagios3/ssh/known_hosts monxusr@xcaixnetiml1 check_disk -w 10% -c 5% -p /tmp -p /var -C -w 100000 -c 50000 -p /

DISK CRITICAL - free space: /tmp 913 MB (89% inode=99%); /var 138 MB (22% inode=84%); / 17 MB (6% inode=58%);| /tmp=110MB;921;972;0;1024 /var=469MB;547;577;0;608 /=254MB;-99728;-49728;0;272

real    0m0.581s
user    0m0.016s
sys     0m0.004s
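Using the two "real" times above, the gain can be put into a number. This is just arithmetic on the measurements shown; the speedup helper is a hypothetical convenience.

```shell
#!/bin/sh
# Hypothetical helper: ratio of two wall-clock times, one decimal place.
# Arguments: time without ControlMaster, time with ControlMaster (seconds)
speedup() {
    LC_ALL=C awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# 0.581 s without the master versus 0.068 s with it
speedup 0.581 0.068   # prints 8.5
```

A single run is a rough indication only; repeating both commands several times would give a steadier figure.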

Finally we define the nagios command and a service for the server:

define command{
        command_name    check_disk_by_ssh
        command_line    /usr/lib/nagios/plugins/check_by_ssh -H $HOSTADDRESS$ -l monxusr -o "ControlMaster=no" -o "ControlPath=/etc/nagios3/ssh/controlpath/ssh-%r@%h:%p" -o "PasswordAuthentication=no" -C "check_disk -w '$ARG1$' -c '$ARG2$' -e -p '$ARG3$'"
        }

define service {
        host_name                       remoteserver
        service_description             remoteserver disk usage
        display_name                    remoteserver disk usage
        check_command                   check_disk_by_ssh!10!5!/
        use                             generic-service
}
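
Nagios splits check_command on "!": the first field is the command name and the remaining fields become $ARG1$, $ARG2$ and so on. The sketch below mimics that splitting for a three-argument value to show the remote command that check_by_ssh would be given. expand_check_command is a hypothetical helper and the sample values are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: split a check_command value on "!" and fill the
# three $ARGn$ slots of the check_disk_by_ssh command line.
# The body runs in a subshell so the IFS change does not leak out.
expand_check_command() (
    IFS='!'
    set -f            # no filename expansion while splitting
    set -- $1         # intentional word splitting on "!"
    shift             # drop the command name
    printf "check_disk -w '%s' -c '%s' -e -p '%s'\n" "${1:-}" "${2:-}" "${3:-}"
)

expand_check_command 'check_disk_by_ssh!10!5!/'
```

The command definition uses exactly three $ARGn$ macros, so the service should pass exactly three values after the command name.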
