How to create a Monitoring Cloud Product with Puppet and Nagios for openQRM Cloud

This HowTo describes how to create a Cloud Product for automatic application monitoring of Cloud systems (VMs and bare-metal) with Puppet and Nagios 3 in an openQRM Cloud environment.

Requirements

  • One (or more) physical servers
  • at least 1 GB of memory
  • at least 100 GB of disk space
  • VT (Virtualization Technology) enabled in the system BIOS so that the openQRM Server can run virtual machines later

Install openQRM on Debian

Install a minimal Debian system on a physical server.

Install and initialize openQRM

Please note!
Detailed HowTos for this initial setup are available: "Install openQRM on Debian", "Virtualization with KVM and openQRM on Debian" and "Cloud Computing with openQRM on Debian".

For this HowTo we assume that you have successfully completed "Cloud Computing with openQRM on Debian".

Goal

The goal of this HowTo is to create a "Monitoring" Cloud Product with the following properties:

  • It needs to be fully automated without regular manual tasks for the system administrator(s)
  • It needs to be "selectable" as a regular Cloud Product in the openQRM Cloud Portal and in openQRM Enterprise Cloud Zones.
  • It needs to provide basic system monitoring (ping, ssh), but should also support a custom Nagios monitoring configuration for all standard applications that are available as Cloud Products, e.g.
    Apache (http, https, certificate)
    MySQL (TCP 3306, check_mysql)
    and other custom application stacks added to the openQRM Puppet plugin
  • It needs to send email notifications to the purchaser of the Cloud System (VM or bare-metal)
  • The dynamically generated monitoring configuration must not affect any existing, statically configured monitoring

Idea

Puppet has very good support for configuring Nagios automatically (for example its built-in nagios_host and nagios_service resource types). openQRM Cloud already generates the Puppet configuration for each Cloud user's system according to the purchased application products. As an additional step, this Puppet configuration is stored in a database, and an asynchronous process (e.g. cron) automatically generates a custom Nagios monitoring configuration from the "Stored Configuration" in the Puppet database.

Please check http://projects.puppetlabs.com/projects/puppet/wiki/Using_Stored_Configuration for more information about "Stored Configurations" in Puppet.

Requirements

Since the "Monitoring" Cloud Product should be "purchasable" by openQRM Cloud customers like any other Cloud Application Product, it must be defined in the openQRM Cloud Product configuration. The best options for integrating the automated Nagios configuration are Puppet and Ansible. In this HowTo we use Puppet.

The next problem is that the Puppet client that configures the target Cloud System runs on the Cloud System itself, while Nagios runs on the openQRM Server. The solution is Puppet's "Stored Configuration": the Cloud System exports its Nagios resources into a database, and the Puppet run on the openQRM Server collects them from there. Therefore, we need to customize the PuppetMaster configuration on the central openQRM Server as follows:

Please note!
In the following configuration commands and Puppet classes, replace:
  • OPENQRM_SERVER_HOSTNAME with the hostname of your openQRM Server
  • SECRET_DB_PASSWORD with your password for the openQRM database
  • OPENQRM_DOMAIN_NAME with the domain name actually configured in the DNS plugin
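Replacing the placeholders by hand in several files is error-prone. A minimal sketch using sed is shown below; the helper name replace_placeholders is hypothetical (it is not part of openQRM), it edits the given file in place with GNU sed, and it assumes that your values contain no "|", "&" or backslash characters, which sed would interpret in the replacement.

```shell
#!/bin/bash
# Hypothetical helper (not part of openQRM): replace the placeholders of this
# HowTo in a local copy of a file, editing it in place with GNU sed.
# Assumes the values contain no "|", "&" or backslash characters.
replace_placeholders() {
  local file="$1" host="$2" password="$3" domain="$4"
  sed -i \
    -e "s|OPENQRM_SERVER_HOSTNAME|${host}|g" \
    -e "s|SECRET_DB_PASSWORD|${password}|g" \
    -e "s|OPENQRM_DOMAIN_NAME|${domain}|g" \
    "$file"
}

# usage: replace_placeholders puppet.conf openqrm01 'my-db-password' example.com
```

Working on local copies (as done for puppet.conf below) keeps the originals untouched until the result has been checked.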

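1. Install RubyGems

If the server needs a proxy to reach the Internet, export the http_proxy environment variable before running wget.

wget http://production.cf.rubygems.org/rubygems/rubygems-2.1.7.tgz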
tar -xvzf rubygems-2.1.7.tgz
cd rubygems-2.1.7
ruby setup.rb

2. Install Rails and the MySQL Ruby bindings

gem install rails -v 2.2.2
gem install mysql -- --with-mysql-config=/usr/bin/mysql_config

3. Create a database and a user for the stored Puppet configuration (in the MySQL client, as the MySQL root user)

create database puppet;
grant all privileges on puppet.* to puppet@localhost identified by 'SECRET_DB_PASSWORD';
flush privileges;

4. Configure the Puppet Master for stored configuration

Create a local working copy of the openQRM puppet.conf:

cp /usr/share/openqrm/plugins/puppet/web/puppet/puppet.conf puppet.conf

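Then adapt it as shown below. The dbsocket path /var/lib/mysql/mysql.sock is the Red Hat default; on Debian the MySQL socket is usually /var/run/mysqld/mysqld.sock.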

[master]
templatedir=/var/lib/puppet/templates
storeconfigs = true
dbadapter = mysql
dbuser = puppet
dbpassword = SECRET_DB_PASSWORD
dbserver = localhost
dbsocket = /var/lib/mysql/mysql.sock
dbname = puppet

[agent]
classfile = $vardir/classes.txt
server = OPENQRM_SERVER_HOSTNAME
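Before copying the file back, it can be worth checking that the settings really ended up in the [master] section. The sketch below uses a hypothetical awk-based helper, ini_get, that prints the value of a key within one section of an INI-style file such as puppet.conf. It is a simple check, not a full puppet.conf parser: for example, it prints variables such as $vardir literally instead of expanding them.

```shell
#!/bin/bash
# Hypothetical helper: print the value of KEY in [SECTION] of an INI-style file
# such as puppet.conf (first match wins, surrounding blanks are stripped).
ini_get() {
  local file="$1" section="$2" key="$3"
  awk -v section="$section" -v key="$key" '
    # a "[section]" header line starts a new section
    /^[ \t]*\[.*\][ \t]*$/ {
      current = $0
      gsub(/[ \t]/, "", current)
      current = substr(current, 2, length(current) - 2)
      next
    }
    # inside the wanted section: split "key = value" at the first "="
    current == section && index($0, "=") > 0 {
      eq = index($0, "=")
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; exit }
    }
  ' "$file"
}

# usage: check the working copy before copying it back into openQRM
#   [ "$(ini_get puppet.conf master storeconfigs)" = "true" ] || echo "storeconfigs is not enabled"
```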

Now copy the adapted puppet.conf back into openQRM:

cp puppet.conf /usr/share/openqrm/plugins/puppet/web/puppet/puppet.conf

5. Restart the PuppetMaster and check the new functionality

/etc/init.d/puppetmaster restart

After a while, several new tables are created in the puppet database to hold the stored Puppet configuration.

This completes the prerequisites.

Puppet recipe "c_monitoring" for the Cloud Server

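Thanks to the existing Puppet integration in openQRM, we only need to create a suitable Puppet class. Its Nagios resources are declared with the @@ prefix, i.e. as exported resources: they are not applied on the Cloud Server itself but stored in the Puppet database, from which the openQRM Server collects them.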

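The first part of the following c_monitoring Puppet class generates the basic monitoring (ssh, ping) for the target Cloud Server. The host address is taken from the ipaddress_eth1 fact, so adapt it if your Cloud Servers are reached from the openQRM Server via a different interface.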

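The second part, starting with "if defined(Class[...])", generates the application-specific monitoring. Note that defined() only sees classes that have already been evaluated, so the application classes (webserver, lamp, database-server) must be included before c_monitoring.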

class c_monitoring {
  # define Server, contactgroup and base service checks in Nagios
  @@nagios_host { "hostdefinition_${hostname}":
    use => "default-template",
    host_name => "${hostname}",
    alias => "${fqdn}",
    address => "${ipaddress_eth1}",
    check_command => "check-host-alive",
    max_check_attempts => "3",
    #checks_enabled => "1",
    notification_interval => "60",
    target => "/etc/nagios/conf.d/${hostname}_host_definition.cfg"
  }

  @@nagios_contactgroup { "contactgroup_${hostname}":
    contactgroup_name => "contactgroup_${hostname}",
    alias => "${hostname} Admins",
    members => "nagiosadmin",
    target => "/etc/nagios/conf.d/${hostname}_contactgroup_definition.cfg"
  }

  @@nagios_service { "check_ping_${hostname}":
    use => "default-service-template",
    host_name => "${hostname}",
    service_description => "check_ping",
    check_command => "check_ping!100.0,20%!500.0,60%",
    max_check_attempts => "3",
    normal_check_interval => "5",
    retry_check_interval => "1",
    check_period => "24x7",
    notification_interval => "60",
    notification_period => "24x7",
    notification_options => "c,w,u,r",
    contact_groups => "contactgroup_${hostname}",
    target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
  }

  @@nagios_service { "check_ssh_${hostname}":
    use => "default-service-template",
    host_name => "${hostname}",
    service_description => "check_ssh",
    check_command => "check_ssh",
    max_check_attempts => "3",
    normal_check_interval => "5",
    retry_check_interval => "1",
    check_period => "24x7",
    notification_interval => "60",
    notification_period => "24x7",
    notification_options => "c,w,u,r",
    contact_groups => "contactgroup_${hostname}",
    target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
  }

  # Monitoring for Apache
  if defined(Class[webserver]) or defined(Class[lamp]) {
    @@nagios_service { "check_http_${hostname}":
      use => "default-service-template",
      host_name => "${hostname}",
      service_description => "check_http",
      check_command => "check_http",
      max_check_attempts => "3",
      normal_check_interval => "5",
      retry_check_interval => "1",
      check_period => "24x7",
      notification_interval => "60",
      notification_period => "24x7",
      notification_options => "c,w,u,r",
      contact_groups => "contactgroup_${hostname}",
      target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
    }

    @@nagios_service { "check_https_${hostname}":
      use => "default-service-template",
      host_name => "${hostname}",
      service_description => "check_https",
      check_command => "check_https",
      max_check_attempts => "3",
      normal_check_interval => "5",
      retry_check_interval => "1",
      check_period => "24x7",
      notification_interval => "60",
      notification_period => "24x7",
      notification_options => "c,w,u,r",
      contact_groups => "contactgroup_${hostname}",
      target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
    }
  }

  # Monitoring for Mysql
  if defined(Class['database-server']) or defined(Class[lamp]) {
    @@nagios_service { "check_mysql_${hostname}":
      use => "default-service-template",
      host_name => "${hostname}",
      service_description => "check_mysql",
      check_command => "check_mysql",
      max_check_attempts => "3",
      normal_check_interval => "5",
      retry_check_interval => "1",
      check_period => "24x7",
      notification_interval => "60",
      notification_period => "24x7",
      notification_options => "c,w,u,r",
      contact_groups => "contactgroup_${hostname}",
      target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
    }
  }
}

Puppet recipe for the PuppetMaster Server (openQRM)

The following Puppet recipe for the openQRM Server collects the exported Nagios resources from the stored configuration in the database, writes them as Nagios configuration files, enriches them with the Cloud user's name and email address, and then refreshes the Nagios service. Since this runs periodically (every 20 minutes) and independently of the Cloud Servers, it does not affect other monitoring processes.

class nagios {
  Nagios_host <<||>> {
    notify => [Exec[gen_nagios_contacts],Exec[make-nag-cfg-readable],Service[nagios]],
  }
  Nagios_service <<||>> {
    notify => [Exec[gen_nagios_contacts],Exec[make-nag-cfg-readable],Service[nagios]],
  }
  Nagios_contactgroup <<||>> {
    notify => [Exec[gen_nagios_contacts],Exec[make-nag-cfg-readable],Service[nagios]],
  }

  # get the Cloud User contact and email from the openQRM Database
  exec { 'gen_nagios_contacts':
    command => "/app/openqrm/tools/gen_nagios_contacts",
    cwd => "/app/openqrm/tools",
    path => "/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/global/bin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin",
  }

  # make the Nagios configuration readable
  exec { 'make-nag-cfg-readable':
    command => "find /etc/nagios -type f -name '*cfg' | xargs chmod +r",
    cwd => "/etc/nagios",
    path => "/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/global/bin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin",
    require => Exec[gen_nagios_contacts],
  }

  # Nagios Service
  service { nagios:
    ensure => running,
    enable => true,
    require => Exec[make-nag-cfg-readable],
  }
}

node 'OPENQRM_SERVER_HOSTNAME' {
  include nagios
}
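The nagios class only picks up new or removed Cloud Servers when the Puppet agent runs on the openQRM Server itself. One way to get the 20-minute period mentioned above, assuming the agent is not already running as a daemon, is a cron entry. The sketch below writes such an entry; the file name /etc/cron.d/puppet-nagios and the helper install_puppet_cron are hypothetical. If the agent daemon is used instead, setting runinterval = 1200 (seconds) in the [agent] section of puppet.conf achieves the same.

```shell
#!/bin/bash
# Hypothetical helper: install a cron.d entry that runs the Puppet agent on the
# openQRM Server every 20 minutes, so the nagios class collects new exports.
# The default file name is an assumption; cron.d entries need a user column.
install_puppet_cron() {
  local cron_file="${1:-/etc/cron.d/puppet-nagios}"
  cat > "$cron_file" <<'EOF'
# run the Puppet agent once every 20 minutes (m h dom mon dow user command)
*/20 * * * * root /usr/bin/puppet agent --onetime --no-daemonize >/dev/null 2>&1
EOF
  # cron ignores cron.d files that are group or world writable
  chmod 0644 "$cron_file"
}

# usage (as root): install_puppet_cron
```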

Shell script "gen_nagios_contacts" for generating the Nagios contacts (/app/openqrm/tools/gen_nagios_contacts, as referenced in the nagios class)

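#!/bin/bash
# Generates Nagios contacts for the Cloud users owning the monitored Cloud Servers
# and removes the generated configuration of hosts without a matching Cloud request.

NAGIOS_CONF_DIR="/etc/nagios/conf.d"
MYSQL="/usr/bin/mysql -uroot -pSECRET_DB_PASSWORD openqrm -s -N"

# without nullglob an unmatched pattern is used literally, HOSTNAME becomes "*"
# and the rm below would delete every "*_*" file, including static configuration
shopt -s nullglob

for DATEI in ${NAGIOS_CONF_DIR}/*_contactgroup_definition.cfg
do
  # the file name starts with the host name, e.g. web01_contactgroup_definition.cfg
  HOSTNAME="$(basename "${DATEI}" | cut -d'_' -f1)"
  USERNAME="$(echo "select u.cu_name from cloud_requests r INNER JOIN cloud_users u ON r.cr_cu_id = u.cu_id where cr_status='3' AND cr_appliance_hostname = '${HOSTNAME}';" | ${MYSQL})"
  EMAIL="$(echo "select u.cu_email from cloud_requests r INNER JOIN cloud_users u ON r.cr_cu_id = u.cu_id where cr_status='3' AND cr_appliance_hostname = '${HOSTNAME}';" | ${MYSQL})"

  # no Cloud user (or no email) found for this host: drop its generated configuration
  if [ -z "${USERNAME}" ] || [ -z "${EMAIL}" ]
  then
    rm -f "${NAGIOS_CONF_DIR}/${HOSTNAME}"_*
    continue
  fi

  # Nagios contact for the Cloud user
  echo "
define contact{
  contact_name  ${USERNAME}
  use  generic-contact
  alias  Contact for appliance ${HOSTNAME}
  email  ${EMAIL}
}
" > "${NAGIOS_CONF_DIR}/user_${USERNAME}.cfg"

  # rewrite the exported contactgroup so that it also notifies the Cloud user
  echo "
define contactgroup {
  members      nagiosadmin, ${USERNAME}
  contactgroup_name  contactgroup_${HOSTNAME}
  alias      ${HOSTNAME} Admins
}
" > "${DATEI}"
done
# allow Nagios to re-read its configuration
chmod -R +r "${NAGIOS_CONF_DIR}"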
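The script above sends two almost identical queries per host. A possible variant, sketched below under the assumption that the same MYSQL client command is used, fetches name and email in a single query: with -s -N and piped input, the mysql client prints the row tab-separated and without a header line, so read can split it. The function name lookup_cloud_contact is hypothetical.

```shell
#!/bin/bash
# Sketch: fetch name and email of a host's Cloud user with one query.
# MYSQL is the same client command as in gen_nagios_contacts (overridable).
MYSQL="${MYSQL:-/usr/bin/mysql -uroot -pSECRET_DB_PASSWORD openqrm -s -N}"

# Hypothetical helper: prints "USERNAME<TAB>EMAIL" for the given host name,
# or nothing if no matching Cloud request exists.
lookup_cloud_contact() {
  local host="$1"
  echo "select u.cu_name, u.cu_email from cloud_requests r INNER JOIN cloud_users u ON r.cr_cu_id = u.cu_id where cr_status='3' AND cr_appliance_hostname = '${host}';" | ${MYSQL}
}

# usage inside the loop of gen_nagios_contacts:
#   IFS=$'\t' read -r USERNAME EMAIL < <(lookup_cloud_contact "${HOSTNAME}")
```

Besides saving a query, this guarantees that name and email come from the same database row.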

Automatic removal of the monitoring configuration

To remove a server's monitoring configuration when the server is stopped, the hook "/usr/share/openqrm/plugins/ip-mgmt/web/openqrm-ip-mgmt-external-dns-hook.php" of the IP-Management plugin can be used. Extend its "remove)" branch as follows:

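remove)
        shift
        remove_dns_ptr_record $@
        remove_dns_a_record $@
        # remove the generated Nagios configuration of this host and reload Nagios
        rm -f /etc/nagios/conf.d/${1}_*
        /etc/init.d/nagios reload
        # remove the host from the Puppet stored configuration database
        /app/openqrm/tools/kill_node_in_storedconfigs_db.rb ${1}.OPENQRM_DOMAIN_NAME
        ;;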
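The rm line above removes every file whose name starts with the host name and an underscore. If you prefer to delete exactly the three files written by the c_monitoring class, and to refuse to run when the host name is empty, a small guard function such as the hypothetical remove_monitoring_config below can be called instead. The user_<name>.cfg contact file is deliberately kept, because the same Cloud user may own other monitored hosts.

```shell
#!/bin/bash
# Hypothetical helper for the "remove)" branch: delete only the Nagios files
# generated for one host and refuse to run without a host name.
remove_monitoring_config() {
  local conf_dir="$1" host="$2"
  if [ -z "$host" ]; then
    echo "remove_monitoring_config: no host name given" >&2
    return 1
  fi
  # the target files used in the c_monitoring class
  rm -f "${conf_dir}/${host}_host_definition.cfg" \
        "${conf_dir}/${host}_contactgroup_definition.cfg" \
        "${conf_dir}/${host}_service_definition.cfg"
}

# usage in the hook: remove_monitoring_config /etc/nagios/conf.d "${1}"
```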

Here is the content of the /app/openqrm/tools/kill_node_in_storedconfigs_db.rb tool:

#!/usr/bin/env ruby
# Removes a node (given by its FQDN) from the Puppet stored configuration database.
# Usage: kill_node_in_storedconfigs_db.rb <fqdn>

require 'puppet/rails'

abort "Usage: #{$0} <fqdn>" if ARGV.empty?

# read the database settings of the [master] section from puppet.conf
Puppet[:config] = "/etc/puppet/puppet.conf"
Puppet.parse_config
pm_conf = Puppet.settings.instance_variable_get(:@values)[:master]

adapter = pm_conf[:dbadapter]
args = {:adapter => adapter, :log_level => pm_conf[:rails_loglevel]}

case adapter
when "sqlite3"
  args[:dbfile] = pm_conf[:dblocation]
when "mysql", "postgresql"
  args[:host]     = pm_conf[:dbserver] unless pm_conf[:dbserver].empty?
  args[:username] = pm_conf[:dbuser] unless pm_conf[:dbuser].empty?
  args[:password] = pm_conf[:dbpassword] unless pm_conf[:dbpassword].empty?
  args[:database] = pm_conf[:dbname]
  socket          = pm_conf[:dbsocket]
  args[:socket]   = socket unless socket.empty?
else
  raise ArgumentError, "Invalid db adapter %s" % adapter
end

ActiveRecord::Base.establish_connection(args)

# look up the node by its FQDN and delete it from the stored configuration
if @host = Puppet::Rails::Host.find_by_name(ARGV[0].strip)
  print "Killing #{ARGV[0]}..."
  $stdout.flush
  @host.destroy
  puts "done."
else
  puts "Can't find host #{ARGV[0]}."
end

Post Configuration

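Finally, the following post-configuration steps are required:

  • Add the Nagios check command definitions used by the c_monitoring class (e.g. check_https, check_mysql) and the templates "default-template" and "default-service-template" it references, unless your Nagios installation already provides them, and make sure nagios.cfg reads /etc/nagios/conf.d (cfg_dir entry)
  • Add the new "Monitoring" Cloud Product to the openQRM Cloud via the openQRM Cloud Product Manager
  • In our test on Red Hat 6, the "reload" action of the init script /etc/init.d/nagios was faulty. Please make sure that the "reload" action of the Nagios init script works correctly.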
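For the first post-configuration step, the sketch below writes the two less common check commands into a separate file. The plugin options (check_http -S for SSL, check_mysql with -H/-u/-p), the file name and the MONITOR_USER/MONITOR_PASSWORD credentials are assumptions to adapt. The file name deliberately contains no underscore, so the host-based clean-up patterns (<host>_*) can never match it. Only add commands that your distribution does not already define, since duplicate command definitions make Nagios complain during configuration verification.

```shell
#!/bin/bash
# Hypothetical helper: write Nagios command definitions for check command names
# used in the c_monitoring class. Plugin options and credentials are assumptions.
write_cloud_check_commands() {
  local target="${1:-/etc/nagios/conf.d/cloud-check-commands.cfg}"
  # quoted EOF keeps the Nagios macros ($USER1$, $HOSTADDRESS$) unexpanded
  cat > "$target" <<'EOF'
# HTTPS: check_http in SSL mode against the host address
define command{
  command_name  check_https
  command_line  $USER1$/check_http -H $HOSTADDRESS$ -S
}

# MySQL: MONITOR_USER / MONITOR_PASSWORD are hypothetical placeholders
define command{
  command_name  check_mysql
  command_line  $USER1$/check_mysql -H $HOSTADDRESS$ -u MONITOR_USER -p MONITOR_PASSWORD
}
EOF
}

# usage (as root): write_cloud_check_commands && /etc/init.d/nagios reload
```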

Conclusion

This HowTo shows how easy it is to add "your own ideas" to openQRM!

Links