Archivos de la categoría ‘monitoring’

I never liked have to install agents for different tasks like Backups or monitoring. I think that is always enough with SSH. In this post I will introduce some concepts that I am using as an alternative to the NRPE for nagios.

Time ago I explained how to setup SSH for remote monitor servers in Nagios, using the ControlMaster feature to reuse the connection.

In that post I was using runit to keep the connections alive.

But in OpenSSH 5.6 a new feature has been released:

* Added a ControlPersist option to ssh_config(5) that automatically
starts a background ssh(1) multiplex master when connecting. This
connection can stay alive indefinitely, or can be set to
automatically close after a user-specified duration of inactivity.

And this is COOL! We can just use some options in the check_by_ssh plugin to automatically create the session. The options are:

  • -i /etc/nagios/nagiosssh.id_rsa: Private ssh key generated with ssh-keygen.
  • -o ControlMaster=auto: Create the control master socket automatically
  • -o ControlPersist=yes: Enable Control persist. It will spam a ssh process in background that will keep the connection (can be stopped with -O exit)
  • -o ControlPath=/var/run/nagiosssh/$HOSTNAME$: Path to the control socket. We can create a dir in /var/run/nagiosssh.
  • -l nagiosssh -H $HOSTNAME$: User and host were we are connecting.

So, the command definition can be:

define command{
command_name    check_users_ssh
command_line    $USER1$/check_by_ssh \
-o ControlMaster=auto \
-o ControlPath=/var/run/nagios/$HOSTNAME$ \
-o ControlPersist=yes \
-i $USER6$ -H $HOSTADDRESS$ -l $USER5$ \
'check_users -w $ARG1$ -c $ARG2$'

Note: You have to define the USER variables in resources.cfg.

Then we only need to create the proper user in the remote host. To improve the security, you can:

  • Use bash in restricted mode:
    1. Create the user ‘nagiosssh’ with shell=/home/nagiosssh/rbash
    2. Create a script /home/nagiosssh/rbash:
      # Restricted shell for the client.
      # Sets the path to checks
      PATH=/home/icingassh/checks exec /bin/bash --restricted "$@"
    3. Create the directory /home/icingassh/checks  and link here all the desired checks.
  • Restrict the ssh connection setting options in .ssh/authorized_keys. For example:

    no-agent-forwarding,no-port-forwarding,no-pty,no-X11-forwarding,from="" ssh-rsa AAAAB3NzaC1...

Maybe in some days I upload a chef recipe to setup this.

These is one of the proposed solutions for the job assessment commented in a previous post.

Provide a design which is able to parse Apache access-logs so we can generate an overview of which IP has visited a specific link. This design needs to be usable for a 500+ node webcluster. Please provide your configs/possible scripts and explain your choices and how scalable they are.

I will consider these requirements:

  • It is not critical to register all the log entries. It is no needed ensure that all the web hits are registered.
  • No control on duplicated log entries. It is not needed
    to check that the log entries had been already loaded.
  • It is also needed to propose a mechanism to gather the logs from the webservers.
  • It must be scalable.
  • It is a plus to make it flexible to allow further different analysis.

The problems to be solved are log storage and log gathering, but the main bottleneck will be the storage.

One realizes that the best option is a noSQL database due to
the characteristics of the data to process (log entries):

  • Time ordered entries
  • no duplicates
  • need of fast insertion
  • fixed fields
  • no data relation or conceptual integrity
  • need to be rotated (old entries removed)
  • etc…

So, I will propose the usage of MongoDB [1] (, that fits the requirements:

  • It is fast, both at inserting and querying.
  • Scales horizontally without disruption (is initially proper configured).
  • Supports replication and High Availability.
  • Well known solution. Commercial support if needed.
  • Python bindings (pyMongo)
[1] Note: I will not enter in details of a MongoDB scalable HA architecture.
See the quick start guide
to setup a single node and the documentation
for architecture examples.

To parse the logs and store them in MongoDB, I will propose a
simple python script: that:

  • Setup a direct MongoDB connection.
  • Read the access log from standard input.
  • Parse the logs and store all the fields, including: client_ip, url, referer, status code, timestamp, timezone…
  • I do not set any indexes in the NoSQL db. Indexes could be
    created on url or client_ip fields, but not having indexes allows faster
    insertions, that is the objective. The reads are very uncommon and performed
    in batch processes.
  • Notice that it should be improved to be more reliable. For instance, it
    does not check for errors (DB failures, etc.). It could buffer entries in case of DB failure.
  • A second script called queries the DB and prints the access. It gets an optional argument, the relative URL.

To feed the DB with the logs from the webservers, some solutions could be:

  • Copy the log files with a scheduled task via SSH or similar, then process them with in a centralized server (or cluster of servers).

    • Pros: Logs are centralized. Only a set of servers access to MongoDB.
      System can be stopped as needed.
    • Cons: Needs extra programming to get the logs.
      No realtime data.
  • Use a centralized syslog service, like syslog-ng
    (can be balanced and configured in HA),
    and setup all the webservers to send the logs via syslog
    (see this article).

    In the log server, we can process resulting files with a batch process or send all the messages to For instance, the configuration for syslog-ng:

    destination d_prog { program("/apath/"
                                  template-escape(no)); };
    • Pros: Centralized logs. No extra programming. Realtime data.
      Use of existent infrastructure (syslog). Only a set of servers access to MongoDB.
    • Cons: Some logs entries can be dropped. Can not be stopped, if not log entries will be lost.
  • Pipe the webserver logs directly to the script,
    In Apache configuration:

    CustomLog "|/apath/" combined
    • Pros: Easy to implement. No extra programming or infrastructure. Realtime data.
    • Cons: Some logs entries can be dropped. It can not be stopped or log entries will be lost.
      The script should be improved to make it more reliable.

These is one of the proposed solutions for the job assessment commented in a previous post.

The AIX service secldapclntd “Provides and manages connection between the AIX LDAP load module of the local host and LDAP Security Information Server, and handles transactions from the LDAP load module to the LDAP Security Information Server.”

This services fails too often. Each new version of AIX, brings new failures in this services. Failures appear more often if the LDAP server has a lot of users and groups.

When it fails:

  • sometimes does not reply, its hung
  • sometimes it consumes all CPU and lasts a lot to reply
  • sometimes it simply dies with a core.

This script will check and monitor it and restart it if necesary.

You can test this script stoping the service:

kill -STOP $(ps -fea| grep -v grep |grep /usr/sbin/secldapclntd| awk '{print $2}' )

You can add an entry to cron to execute it:

if crontab -l | grep /usr/local/bin/; then
   echo "Already configured."
  crontab -l > /tmp/$$.crontab
  cat >> /tmp/$$.crontab <<EOF
# Check secldapclntd each 5 minutes
5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/ check-and-restart > /dev/null
  crontab /tmp/$$.crontab
  rm /tmp/$$.crontab

And here goes the script (/usr/local/bin/

I want to use SSH to remotelly monitor some hosts. The problem with ssh is the overhead related to the creation of new connections.

But we can use the functionality of Control Master from OpenSSH (see Using it, ssh will connect only once reusing the connection.

This is great for monitoring scripts.

To use Control Master, you have to execute something like:

ssh -o "ControlMaster=yes" -o "ControlPath=/somepath/ssh-%r@%h:%p" -N user@host

The idea: create a remote user in each monitored host, stablish persistent ssh connection and use Control Master for the monitoring scripts. To deamonize and control the ssh master connections, I will use runit