An Example of Monitoring a Server Room with Nagios and NetPing Devices

  • Published In: Tutorial
  • Created Date: 2018-10-20

Nagios is one of the service monitoring systems available on the market, and one of its biggest advantages is its flexibility. This article walks through a basic Nagios configuration for working with NetPing server room monitoring devices, using a UniPing server solution v3 and a set of sensors as an example.

To implement this solution, we will need:

Configuring the UniPing server solution v3 Monitoring Device for the Chosen Set of Sensors

Before working with the device, perform its basic configuration: go to the device settings section (1) and configure the necessary network parameters (2) and access parameters (3).

Configuring UniPing server solution v3

To connect a 1-Wire sensor, we need to know its unique ID number. The process of connecting sensors is described in detail in this article. For information on connecting and configuring sensors, please consult the official documentation.

1-Wire Temperature Sensors

Connecting a 1-Wire temperature sensor to UniPing server solution v3

1-Wire Humidity Sensors

Connecting a 1-Wire humidity sensor to UniPing server solution v3

The supply voltage sensor, liquid sensor, door sensor, and airflow sensor are connected to IO lines. These are "dry contact" sensors; their configuration is described in the documentation.

Connecting a dry contact sensor to UniPing server solution v3

Installing Nagios

The installation of the Nagios monitoring system is described in detail in the official documentation. Let's go through the main points.

The installation must be performed with SELinux disabled or in permissive mode. It is disabled by default in Ubuntu 18.04, but if you are not sure, run the following command:

sudo dpkg -l 'selinux*'

Install the necessary packages:

sudo apt-get update

sudo apt-get install -y autoconf gcc libc6 make wget unzip apache2 php libapache2-mod-php7.2 libgd-dev

Download the Nagios source files and unpack the archive:

cd /tmp
wget -O nagioscore.tar.gz https://github.com/NagiosEnterprises/nagioscore/archive/nagios-4.4.1.tar.gz
tar xzf nagioscore.tar.gz

Compile:

cd /tmp/nagioscore-nagios-4.4.1/
sudo ./configure --with-httpd-conf=/etc/apache2/sites-enabled
sudo make all

Create the nagios user and group, and add the www-data user to the nagios group:

sudo make install-groups-users
sudo usermod -a -G nagios www-data

Install:

sudo make install

Install the daemon (service) files and configure Nagios to start at boot:

sudo make install-daemoninit

Install the external command file and the sample configuration:

sudo make install-commandmode
sudo make install-config

Install the Apache configuration files and enable the required Apache modules:

sudo make install-webconf
sudo a2enmod rewrite
sudo a2enmod cgi

Allow Apache through the firewall:

sudo ufw allow Apache
sudo ufw reload

Create an administrator account for logging in to the Nagios web interface:

sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Restart Apache:

sudo systemctl restart apache2.service

Start Nagios: 

sudo systemctl start nagios.service

The system is now installed. The web interface is available at http://192.168.0.150/nagios/ (192.168.0.150 is the address of our Nagios server), and the browser shows its standard dialog box asking for a login and password:

Nagios authorization form

After logging in, we see the web interface:

Nagios home page

By default, Nagios monitors several services on the local server; they are listed in the Services section:

Nagios list of services

We see a lot of errors. This is because a default Nagios installation does not include the standard monitoring plugins (ping, snmp, http, etc.; if you see no errors, the plugins are already included in your version). Let's install them:

Install the packages required to compile and run the standard plugins:

sudo apt-get install -y autoconf gcc libc6 libmcrypt-dev make libssl-dev wget bc gawk dc build-essential snmp libnet-snmp-perl gettext

Download the plugin source files and unpack the archive:

cd /tmp
wget --no-check-certificate -O nagios-plugins.tar.gz https://github.com/nagios-plugins/nagios-plugins/archive/release-2.2.1.tar.gz
tar zxf nagios-plugins.tar.gz

Compile and install:

cd /tmp/nagios-plugins-release-2.2.1/
sudo ./tools/setup
sudo ./configure
sudo make
sudo make install

Restart Nagios:

sudo systemctl restart nagios.service

Even now, the Services section still shows errors, because the services have not yet been re-checked within the polling interval. Instead of waiting, you can schedule a check of a particular service for the next couple of seconds. To do this, select one of the services:

Nagios issues with services

This opens detailed information about the service and a list of commands. We need "Re-schedule the next check of this service":

Nagios forced service requesting

Then click "Commit", leaving the fields unchanged:

Nagios forced service requesting

Go back to Services and repeat this for other services if necessary (or simply wait for the next scheduled checks). When all services have been checked, we see the following:

Nagios results of service requesting
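The same forced re-check can also be requested from the command line by writing to the Nagios external command file. The sketch below assumes the command file path of a default source installation (/usr/local/nagios/var/rw/nagios.cmd) and that external commands are enabled (check_external_commands=1 in nagios.cfg); np_forced_check_cmd and np_force_check are hypothetical helper names of our own.

```shell
#!/bin/sh
# Sketch: schedule a forced service check through the Nagios external
# command file instead of the web form.
# Assumption: command file path of a default source install; override
# NAGIOS_CMD_FILE if command_file in nagios.cfg points elsewhere.
NAGIOS_CMD_FILE=${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}

# Print the external command line:
#   [submit time] SCHEDULE_FORCED_SVC_CHECK;host;service;check time
# $1 = host_name, $2 = service_description, $3 = time in epoch seconds
np_forced_check_cmd() {
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' "$3" "$1" "$2" "$3"
}

# Send the command for "now". Requires write access to the command file
# (for example, run as the nagios user or a member of the nagios group).
np_force_check() {
    np_forced_check_cmd "$1" "$2" "$(date +%s)" > "$NAGIOS_CMD_FILE"
}
```

For example, np_force_check localhost "Current Load" asks Nagios to check the Current Load service of localhost right away.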

For this article, we prepared monitoring plugins for reading temperature and humidity sensors and IO lines. You can download them as an archive and unpack it to /usr/local/nagios/libexec. The plugins are written in bash and use the net-snmp tools. We chose this approach instead of the standard check_snmp plugin because check_snmp is inconvenient to use when working with several OIDs.
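The idea of fetching several OIDs in one request can be sketched as follows. This is not the code of our plugins, only an illustration under assumptions: the device answers SNMPv1 on the given port, net-snmp's snmpget is available, and the OIDs are placeholders to be taken from the device MIB. The helpers np_snmp_values and np_pick are hypothetical names.

```shell
#!/bin/sh
# Sketch: read several OIDs with a single snmpget call (net-snmp tools).
# Assumptions: SNMPv1, 2 s timeout, 1 retry; OIDs come from the device MIB.
STATE_OK=0
STATE_UNKNOWN=3
SNMPGET=${SNMPGET:-snmpget}

# Usage: np_snmp_values HOST PORT COMMUNITY OID [OID...]
# Prints only the values (-Oqv), one per line, in the order of the OIDs;
# returns non-zero if the SNMP request fails.
np_snmp_values() {
    _host=$1 _port=$2 _community=$3
    shift 3
    "$SNMPGET" -v1 -c "$_community" -t 2 -r 1 -Oqv "$_host:$_port" "$@"
}

# Print the N-th line of standard input (the value of the N-th OID).
np_pick() {
    sed -n "${1}p"
}

# Example plugin body (not executed here; OID_A and OID_B are hypothetical):
# vals=$(np_snmp_values "$1" "$2" "$3" OID_A OID_B) || {
#     echo "UNKNOWN - SNMP request failed"; exit "$STATE_UNKNOWN"; }
# echo "OK - $(printf '%s\n' "$vals" | np_pick 1)"; exit "$STATE_OK"
```

One request per check keeps the values consistent with each other and reduces the load on the device compared with one snmpget call per OID.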

Next, configure commands for these plugins. To do this, add the following line to the file /usr/local/nagios/etc/nagios.cfg:

cfg_file=/usr/local/nagios/etc/objects/np.cfg

Then create the file /usr/local/nagios/etc/objects/np.cfg with the following contents:

define command {
    command_name np_temp
    command_line $USER1$/np_temp $HOSTADDRESS$ $_HOSTPORT$ $_HOSTCOMMUNITY$ $ARG1$
    # ./np_temp ip port community N
}

define command {
    command_name np_relhum
    command_line $USER1$/np_relhum $HOSTADDRESS$ $_HOSTPORT$ $_HOSTCOMMUNITY$ $ARG1$
    # ./np_relhum ip port community N
}

define command {
    command_name np_io
    command_line $USER1$/np_io $HOSTADDRESS$ $_HOSTPORT$ $_HOSTCOMMUNITY$ $ARG1$ $ARG2$ $ARG3$
    # ./np_io ip port community N normVal alertVal
}

define command {
    command_name np_uptime
    command_line $USER1$/np_uptime $HOSTADDRESS$ $_HOSTPORT$ $_HOSTCOMMUNITY$
    # ./np_uptime ip port community
}

define command {
    command_name np_description
    command_line $USER1$/np_description $HOSTADDRESS$ $_HOSTPORT$ $_HOSTCOMMUNITY$
    # ./np_description ip port community
}

This file defines which script is executed, and with which parameters, when a check command runs. A plugin can receive host or service macros ($HOSTADDRESS$, $_HOSTPORT$, $_HOSTCOMMUNITY$; the last two are custom host variables set with _port and _community in the host definition) as well as arguments set explicitly when the command is called ($ARG1$, $ARG2$, $ARG3$, separated by "!" in check_command). You can also find ready-made plugins on the official Nagios resources or consult the documentation on writing your own.
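To illustrate how these pieces fit together, here is a simplified model of the expansion Nagios performs for a service with check_command np_io!1!1!0. It assumes $USER1$ keeps its default value from resource.cfg (/usr/local/nagios/libexec) and that, as in np.cfg, the command line lists the host macros first and then $ARG1$ to $ARGn$ in order; Nagios itself substitutes each macro by name. np_expand is a hypothetical helper.

```shell
#!/bin/sh
# Simplified model of Nagios macro expansion for the np.cfg commands.
# Assumption: default $USER1$ from resource.cfg.
USER1=${USER1:-/usr/local/nagios/libexec}

# $1 = check_command (e.g. "np_io!1!1!0"), $2 = $HOSTADDRESS$,
# $3 = $_HOSTPORT$, $4 = $_HOSTCOMMUNITY$
# Runs in a subshell so the IFS and "set -f" changes do not leak out.
np_expand() (
    name=${1%%!*}                      # command name before the first "!"
    case $1 in
        *!*) args=${1#*!} ;;           # arguments after the first "!"
        *)   args= ;;
    esac
    set -f                             # no filename globbing while splitting
    IFS='!'
    set -- "$2" "$3" "$4" $args        # unquoted on purpose: split on "!"
    printf '%s/%s' "$USER1" "$name"
    for word in "$@"; do
        printf ' %s' "$word"
    done
    printf '\n'
)
```

For our host, np_expand 'np_io!1!1!0' 192.168.0.100 161 ping prints /usr/local/nagios/libexec/np_io 192.168.0.100 161 ping 1 1 0.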

At this point, the main Nagios configuration is complete, and it is minimally sufficient for monitoring. To extend the system, we will add graphs and a map showing host locations. Nagios cannot do this out of the box, but many plugins extend its functionality: we use PNP4Nagios for graphs and NagMap Reborn for the map.

Installing and Configuring a PNP4Nagios Plugin

Installation and configuration of this plugin are described in detail in the official documentation. Let's go through the main points.

Install a package for working with RRD databases:

sudo apt-get install rrdtool

Download and unpack the archive:

wget http://docs.pnp4nagios.org/_media/dwnld/pnp4nagios-head.tar.gz
tar -xvzf pnp4nagios-head.tar.gz
cd pnp4nagios

When PNP4Nagios is installed for Nagios (it can also work with the Icinga monitoring system), no special configure options are needed, so simply run:

./configure

The output shows the results of checking the system for the components PNP4Nagios requires, as well as the paths to the plugin's files:

PNP4Nagios checking the availability of necessary components

If any item is marked "Not Found" or shows an error, put the installation aside and fix the problem first; most often it is caused by a missing package. If everything is fine, continue:

make all
sudo make install
sudo make fullinstall

At this point, PNP4Nagios is installed. Now it must be configured to work correctly with our Nagios installation. The configuration and the available operating modes are described in the official documentation; here are the main points:

In this article, we use "Bulk Mode" because it is recommended as the most stable. To configure PNP4Nagios in this mode, make sure the following line in /usr/local/nagios/etc/nagios.cfg looks like this:

process_performance_data=1

Then add the following lines to the corresponding sections:

#
# service performance data
#
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file

#
# host performance data starting with Nagios 3.0
#
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file

Then add the following to the end of the file /usr/local/nagios/etc/objects/commands.cfg:

define command {
    command_name process-service-perfdata-file
    command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/service-perfdata
}

define command {
    command_name process-host-perfdata-file
    command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/host-perfdata
}
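To see what Nagios actually writes into these bulk files, a small helper can pull one field out of the records produced by the perfdata file templates in nagios.cfg (tab-separated KEY::VALUE pairs). This is a sketch of our own (np_perf_field is a hypothetical name); since the files are handed to process_perfdata.pl every 15 seconds, they may contain few or no records when you look at them.

```shell
#!/bin/sh
# Sketch: print the value of one KEY from bulk-mode perfdata records.
# Each record is one line of tab-separated KEY::VALUE fields.
# Usage: np_perf_field KEY [FILE...]   (reads standard input without FILE)
np_perf_field() {
    _key=$1
    shift
    awk -F '\t' -v key="$_key" '{
        for (i = 1; i <= NF; i++) {
            p = index($i, "::")          # split the field at the first "::"
            if (p > 0 && substr($i, 1, p - 1) == key)
                print substr($i, p + 2)
        }
    }' "$@"
}
```

For example, np_perf_field SERVICEPERFDATA /usr/local/pnp4nagios/var/service-perfdata lists the raw performance data waiting to be processed.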

Restart Nagios:

sudo systemctl restart nagios.service

Next, check that all required packages are available and correctly configured, and that the paths to the configuration files are correct. To do this, open http://192.168.0.150/pnp4nagios/; you should see the following:

PNP4Nagios testing the availability of necessary packages

If any parameter is marked with a red line, investigate the corresponding package (most often it is outdated or missing). If your page looks like the screenshot, move on to the final part of the PNP4Nagios configuration: remove the file /usr/local/pnp4nagios/share/install.php, as the page prompts:

sudo rm /usr/local/pnp4nagios/share/install.php

Next, add host and service templates that link hosts and services to their PNP4Nagios graphs. To do this, add the following to the end of the file /usr/local/nagios/etc/objects/templates.cfg:

define host {
    name host-pnp
    action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_
    register 0
}

define service {
    name srv-pnp
    action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
    register 0
}

These templates are enabled for hosts and services with use host-pnp and use srv-pnp, respectively. Let's add them to the localhost host and its PING service: edit the corresponding blocks in /usr/local/nagios/etc/objects/localhost.cfg so that they look like this:

define host{
    use linux-server,host-pnp ; Name of host templates to use
                                             ; This host definition will inherit all variables that are defined
                                             ; in (or inherited by) the linux-server host template definition.
    host_name localhost
    alias localhost
    address 127.0.0.1
}

define service{
    use local-service,srv-pnp ; Name of service template to use
    host_name localhost
    service_description PING
    check_command check_ping!100.0,20%!500.0,60%
}

Restart Nagios:

sudo systemctl restart nagios.service

Open the list of services in the Nagios web interface:

PNP4Nagios graphs in the Nagios interface

Graph icons now appear next to the host and the PING service. Click the icon next to PING:

PNP4Nagios Ping graph

We see graphs of the localhost response time over periods of up to one year (if errors appear instead of graphs, see the documentation for a solution). PNP4Nagios builds the graphs from the performance data in the service check plugin's output:

Nagios information about a service status

When writing your own plugins, make sure the plugin prints its performance data after a "|" character in its output, in the format 'label'=value[UOM];[warn];[crit];[min];[max], where:

  • label - the data source (variable) name, for example rta (round-trip average) in check_ping;
  • value - the measured value;
  • UOM - the unit of measurement;
  • warn - the threshold at which the service switches to the WARNING state (shown in yellow in the web interface by default);
  • crit - the threshold at which the service switches to the CRITICAL state (shown in red in the web interface by default);
  • min, max - the minimum and maximum possible values of the metric.
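A minimal sketch of a helper that prints a status line with performance data in this format follows. The name np_perf_line is hypothetical; note that the service state itself comes from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), not from the text.

```shell
#!/bin/sh
# Sketch: print "TEXT|'label'=value[UOM];[warn];[crit];[min];[max]".
# Everything after "|" is treated by Nagios as performance data;
# omitted optional fields are left empty.
# Usage: np_perf_line TEXT LABEL VALUE [UOM] [WARN] [CRIT] [MIN] [MAX]
np_perf_line() {
    printf "%s|'%s'=%s%s;%s;%s;%s;%s\n" \
        "$1" "$2" "$3" "${4-}" "${5-}" "${6-}" "${7-}" "${8-}"
}
```

For example, np_perf_line 'PING OK - rta 0.05 ms' rta 0.05 ms 100 500 0 prints PING OK - rta 0.05 ms|'rta'=0.05ms;100;500;0; and a plugin would then exit with code 0.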

More details can be found in the official Nagios and PNP4Nagios documentation. Out of the box, PNP4Nagios includes graph templates for the standard Nagios plugins, located in /usr/local/pnp4nagios/share/templates.dist. The templates use the standard RRDtool graph syntax. Plugins without a template (the template name must match the check command name) use the default template default.php, which plots all data from the RRD database on a graph without a legend, for example:

PNP4Nagios default graph template

For convenience, we prepared graph templates for the temperature and humidity sensor readings used in this article. You can download them as an archive and extract its contents to /usr/local/pnp4nagios/share/templates.dist/. This completes the PNP4Nagios configuration.
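The template lookup can be sketched in a few lines. This is a simplified model, assuming PNP4Nagios searches share/templates before share/templates.dist and that no custom template mapping is configured; np_pnp_template is a hypothetical helper, not part of PNP4Nagios.

```shell
#!/bin/sh
# Simplified model of PNP4Nagios graph template selection:
# <command>.php first, then default.php; templates/ before templates.dist/.
# The command name is the check_command text before the first "!".
# Usage: np_pnp_template CHECK_COMMAND [SHARE_DIR]
np_pnp_template() {
    _share=${2:-/usr/local/pnp4nagios/share}
    for _tpl in "${1%%!*}" default; do
        for _dir in "$_share/templates" "$_share/templates.dist"; do
            if [ -f "$_dir/$_tpl.php" ]; then
                printf '%s\n' "$_dir/$_tpl.php"
                return 0
            fi
        done
    done
    return 1                       # no template found at all
}
```

Under this model, a service checked with np_io!1!1!0 falls back to default.php unless a file named np_io.php exists in one of the two directories.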

Installing the NagMap Plugin

To display our hosts on a map, we use the NagMap Reborn plugin. It is easy to install and configure and provides all the basic features. It requires a Google Maps API key.

First, download the NagMap Reborn files and copy them to /var/www/nagmap:

sudo mkdir /var/www/nagmap
wget https://github.com/jocafamaka/nagmapReborn/archive/master.zip
unzip master.zip
cd nagmapReborn-master
sudo cp -r . /var/www/nagmap

Then create the configuration file from the template:

cd /var/www/nagmap
sudo cp config.php.example config.php

Specify your Google Maps API key in it:

$nagMapR_key = '[Google Maps API Key]';

Configure Apache: create the configuration file /etc/apache2/sites-enabled/nagmap.conf with the following contents:

<VirtualHost *:80>
    Alias /nagios/map "/var/www/nagmap"
    DocumentRoot /var/www/nagmap
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Then restart Apache so that it picks up the new configuration:

sudo systemctl restart apache2.service

To place a host on the map, specify its latitude and longitude in the host configuration; how to do this is described below. The map is available at http://192.168.0.150/nagios/map and looks like this:

NagMap Reborn

The "For development purposes only" watermark appears because a free API key is used. Host status changes are shown at the bottom of the map.
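Because the coordinates are typed by hand, it may help to generate the notes line for the host definition with a basic range check. A sketch, assuming decimal degrees in the "notes latlng: LAT,LNG" form used for the UniPing host in this article; np_latlng_note is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print "notes latlng: LAT,LNG" for a Nagios host definition,
# rejecting malformed or out-of-range decimal-degree coordinates.
# Usage: np_latlng_note LAT LNG
np_latlng_note() {
    if awk -v lat="$1" -v lng="$2" 'BEGIN {
        num = "^-?[0-9]+([.][0-9]+)?$"
        ok = (lat ~ num) && (lng ~ num)
        ok = ok && (lat + 0 >= -90) && (lat + 0 <= 90)
        ok = ok && (lng + 0 >= -180) && (lng + 0 <= 180)
        exit (ok ? 0 : 1)
    }'; then
        printf 'notes latlng: %s,%s\n' "$1" "$2"
    else
        printf 'invalid coordinates: %s,%s\n' "$1" "$2" >&2
        return 1
    fi
}
```

For example, np_latlng_note 55.754404 37.618481 prints notes latlng: 55.754404,37.618481, while a latitude of 95 is rejected.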

Configuring Nagios to Work with the UniPing server solution v3 Server Room Monitoring Device

Nagios ships with many plugins for monitoring standard system services, but they do not suit our needs. For example, to read one temperature sensor connected to the UniPing server solution v3 server room monitoring device, we need to get the values of five OIDs via SNMP: the current sensor status, the current temperature, the lower and upper limits of the safe temperature range, and the sensor memo. Therefore, we prepared a set of plugin scripts for Nagios that make this more convenient. These plugins are designed for our monitoring devices, the UniPing server solution v3 in particular, and we already downloaded and installed them earlier.
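For example, the decision step of such a temperature plugin might look like the sketch below. It is not our plugin's code: the values are passed as arguments instead of being read via SNMP, the sensor status OID is not modelled, whole-degree integer readings are assumed, and np_temp_eval is a hypothetical name. The warn field uses the Nagios range syntax LOW:HIGH, which raises an alert outside the range.

```shell
#!/bin/sh
# Sketch of a temperature plugin's decision step (values passed as arguments).
# Assumptions: integer readings; WARNING outside the safe range.

# Succeeds if $1 is an integer such as 24 or -5.
np_is_int() {
    case $1 in ''|-|*[!0-9-]*|?*-*) return 1 ;; esac
    return 0
}

# Usage: np_temp_eval TEMP LOW HIGH MEMO   (return code = Nagios state)
np_temp_eval() {
    for _v in "$1" "$2" "$3"; do
        if ! np_is_int "$_v"; then
            echo "UNKNOWN - non-numeric value '$_v'"
            return 3
        fi
    done
    if [ "$1" -lt "$2" ] || [ "$1" -gt "$3" ]; then
        _state=WARNING _rc=1
    else
        _state=OK _rc=0
    fi
    # Status text, then perfdata: 'temp'=value;LOW:HIGH;;; (warn as a range)
    printf "%s - %s: %s C (safe range %s..%s)|'temp'=%s;%s:%s;;;\n" \
        "$_state" "$4" "$1" "$2" "$3" "$1" "$2" "$3"
    return "$_rc"
}
```

In a real plugin, the function's return code would be passed to exit so that Nagios sets the service state.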

To add the UniPing server solution v3 server room monitoring device to Nagios, create a configuration file describing the host and its services (in our case, the connected sensors). To do this, add the following line to the main Nagios configuration file /usr/local/nagios/etc/nagios.cfg:

cfg_dir=/usr/local/nagios/etc/netping

Then create the directory /usr/local/nagios/etc/netping and place in it a file with a human-readable name, for example UniPing_server_solution_v3.cfg, with the following contents:

define host {
    host_name msk_UniPing_server_solution_v3
    max_check_attempts 10
    alias UniPing server solution v3
    address 192.168.0.100
    _community ping
    _port 161
    use generic-host,host-pnp
    check_command check-host-alive
    notes latlng: 55.754404,37.618481 ; coordinates for displaying the device on the map
}

########## Check ping ########## Checking host availability with ping

define service {
    use local-service,srv-pnp
    host_name msk_UniPing_server_solution_v3
    service_description PING
    check_command check_ping!100.0,20%!500.0,60%
}

########## Get uptime ########## Getting the host uptime
define service {
    use generic-service
    host_name msk_UniPing_server_solution_v3
    service_description Uptime
    check_command np_uptime
}

########## Get device info ########## Getting a device model and a software version
define service {
    use generic-service
    host_name msk_UniPing_server_solution_v3
    service_description Version
    check_command np_description
}

########## Get temp ########## Reads temperature sensor No. 1. Along with the reading, the plugin shows the upper and lower limits of the safe range and the sensor memo in the interface, and switches the service to the WARNING state if the temperature leaves the safe range.
define service {
    use generic-service,srv-pnp
    host_name msk_UniPing_server_solution_v3
    service_description Temp 1
    check_command np_temp!1
}

########## Get hum ########## Reads humidity (and temperature) sensor No. 1. Along with the readings, the plugin shows the upper and lower limits and the sensor memo in the interface, and switches the service to the WARNING state when a reading leaves its limits.
define service {
    use generic-service,srv-pnp
    host_name msk_UniPing_server_solution_v3
    service_description Humidity 1
    check_command np_relhum!1
}

########## Get IO lines status ########## Reads the state of dry contact sensors. The first plugin argument is the IO line number, the second is the normal level, and the third is the level that triggers the WARNING state. The interface also shows the line memo.
define service {
    use generic-service,srv-pnp
    host_name msk_UniPing_server_solution_v3
    service_description IO 1
    check_command np_io!1!1!0
}

########## Get IO lines status ##########
define service {
    use generic-service,srv-pnp
    host_name msk_UniPing_server_solution_v3
    service_description IO 2
    check_command np_io!2!1!0
}

########## Get IO lines status ##########
define service {
    use generic-service,srv-pnp
    host_name msk_UniPing_server_solution_v3
    service_description IO 3
    check_command np_io!3!1!0
}

########## Get IO lines status ##########
define service {
    use generic-service,srv-pnp
    host_name msk_UniPing_server_solution_v3
    service_description IO 4
    check_command np_io!4!1!0
}
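The three np_io arguments can be read as in the following sketch (np_io!1!1!0: line 1, normal level 1, alarm level 0). This is an assumption-based illustration, not the actual np_io code: the current level is passed in rather than read via SNMP, any level other than the normal and alarm values is reported as UNKNOWN, and np_io_eval is a hypothetical name.

```shell
#!/bin/sh
# Sketch of interpreting np_io's arguments (level passed in, not read via SNMP).
# Usage: np_io_eval N LEVEL NORMAL ALARM MEMO   (return code = Nagios state)
np_io_eval() {
    if [ "$2" = "$3" ]; then
        _state=OK _rc=0            # line is at its normal level
    elif [ "$2" = "$4" ]; then
        _state=WARNING _rc=1       # line is at the alarm level
    else
        _state=UNKNOWN _rc=3       # assumption: any other value is unexpected
    fi
    # Status text, then perfdata with min 0 and max 1 for a logic level.
    printf "%s - IO%s %s: level %s|'io%s'=%s;;;0;1\n" \
        "$_state" "$1" "$5" "$2" "$1" "$2"
    return "$_rc"
}
```

With this reading, a water sensor wired as np_io!2!1!0 that pulls line 2 down to 0 puts the service into the WARNING state.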

Restart Nagios:  

sudo systemctl restart nagios.service

In the Services section of the web interface, we can see that all services have been added successfully, monitoring is running, and data is being collected.

Nagios UniPing server solution v3 monitoring

As you can see, things are not so good in our server room: the temperature sensor shows an elevated temperature, and the water sensor has detected water. There are also network problems: the response time of 335 ms is higher than normal. In addition, thanks to the PNP4Nagios plugin, we can view graphs of the sensor readings by clicking the graph icons next to the service names:

Host ping; the graph also shows the warning and critical thresholds (built-in graph template):

Nagios graph for Ping UniPing server solution v3

IO line level (default graph template; the value 1 corresponds to logic level 1 on an IO line in input mode, and the value 0 to logic level 0):

Nagios graph for an IO line Airflow UniPing server solution v3

Temperature from the first sensor (user's custom template):

Nagios temperature sensor graph for UniPing server solution v3

Readings of a humidity sensor with a built-in temperature sensor (user's custom template):

Nagios humidity sensor graph for UniPing server solution v3

Result

Together with its plugins, Nagios is a capable monitoring system, flexible enough to meet the monitoring needs of even the most unusual services. It is easy to install and configure, and a basic configuration can be deployed in about half an hour. A drawback is that it may put off less experienced administrators, since it is configured only by editing configuration files. All in all, I can safely recommend Nagios for production use.

