Learning Nagios 4
About this ebook
This book is great for system administrators interested in using Nagios to monitor their systems. It will also help professionals who have already worked with earlier versions of Nagios to understand the new features of Nagios 4, and it provides usable solutions to real-life problems related to Nagios administration. To use this book effectively, system administration knowledge is required. If you want to create your own plugins, knowledge of scripting languages such as Perl, shell, and Python is expected.
Wojciech Kocjan
Wojciech Kocjan is a system administrator and a programmer with 10 years of experience. His expertise includes managing Linux, Sun, and IBM servers. He also has several years of experience in a variety of open source projects.
Learning Nagios 4 - Wojciech Kocjan
Table of Contents
Learning Nagios 4
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introducing Nagios
Understanding the basics of Nagios
The benefits of monitoring resources
Main features
Soft and hard states
What's new in Nagios 4.0
Summary
2. Installing Nagios 4
Installation
Upgrading from previous versions
Installing prerequisites
Obtaining Nagios
Setting up users and groups
Compiling and installing Nagios
Compiling and installing Nagios plugins
Setting up Nagios as a system service
Resolving errors with script for Nagios system service
Configuring Nagios
Creating the main configuration file
Understanding macro definitions
Configuring hosts
Configuring host groups
Configuring services
Configuring service groups
Configuring commands
Configuring time periods
Configuring contacts
Configuring contact groups
Verifying the configuration
Understanding notifications
Templates and object inheritance
Summary
3. Using the Nagios Web Interface
Setting up the web interface
Configuring the web server
Creating an administrative user for Nagios
Accessing the web interface
Troubleshooting
Using the web interface
Checking the tactical overview
Viewing the status map
Managing hosts
Checking statuses
Viewing host information
Managing services
Checking statuses
Viewing service information
Managing downtime
Checking downtime statuses
Scheduling downtime
Managing comments
Nagios information
Viewing process information
Checking performance information
Generating reports
Changing the look of the Nagios web interface
Third-party Nagios web interfaces
Summary
4. Using the Nagios Plugins
Understanding how checks work
Monitoring using the standard network plugins
Testing the connection to a remote host
Testing the connectivity using TCP and UDP
Monitoring the e-mail servers
Checking the POP3 and IMAP servers
Testing the SMTP protocol
Monitoring network services
Checking the FTP server
Verifying the DHCP protocol
Monitoring the Nagios process
Testing the websites
Monitoring the database systems
Checking MySQL
Checking PostgreSQL
Checking Oracle
Checking other databases
Monitoring the storage space
Checking the swap space
Monitoring the disk status using SMART
Checking the disk space
Testing the free space for remote shares
Monitoring the resources
Checking the system load
Checking the processes
Monitoring the logged-in users
Monitoring other operations
Checking for updates with APT
Monitoring the UPS status
Gathering information from the lm-sensors
Using the dummy check plugin
Manipulating other plugins' output
Additional and third-party plugins
Monitoring the network software
Using third-party plugins
Summary
5. Advanced Configuration
Creating maintainable configurations
Configuring the file structure
Defining the dependencies
Creating the host dependencies
Creating the service dependencies
Using the templates
Creating the templates
Inheriting from multiple templates
Using the custom variables
Understanding flapping
Summary
6. Notifications and Events
Creating effective notifications
Using multiple notifications
Sending instant messages via Jabber
Notifying users with text messages
Integrating with HipChat
Understanding escalations
Setting up escalations
Understanding how escalations work
Sending commands to Nagios
Adding comments to hosts and services
Scheduling host and service checks
Modifying custom variables
Creating event handlers
Restarting services automatically
Modifying notifications
Using adaptive monitoring
Summary
7. Passive Checks and NSCA
Understanding passive checks
Configuring passive checks
Sending passive check results for hosts
Sending passive check results for services
Troubleshooting errors
Using NSCA
Downloading NSCA
Compiling NSCA
Configuring the NSCA server
Sending results over NSCA
Configuring NSCA for secure communication
Summary
8. Monitoring Remote Hosts
Monitoring over SSH
Configuring the SSH connection
Using the check_by_ssh plugin
Performing multiple checks
Troubleshooting the SSH-based checks
Monitoring using NRPE
Obtaining NRPE
Compiling NRPE
Configuring the NRPE daemon
Setting up NRPE as a system service
Configuring Nagios for NRPE
Using command arguments with NRPE
Troubleshooting NRPE
Comparing NRPE and SSH
Alternatives to SSH and NRPE
Summary
9. Monitoring using SNMP
Introducing SNMP
Understanding data objects
Working with SNMP and MIB
Using graphical tools
Setting up an SNMP agent
Using SNMP from Nagios
Receiving traps
Using additional plugins
Summary
10. Advanced Monitoring
Monitoring Windows hosts
Setting up NSClient++
Performing tests using check_nt
Performing checks with NRPE protocol
Performing passive checks using NSCA Protocol
Understanding distributed monitoring
Introducing obsessive notifications
Configuring Nagios instances
Performing freshness checking
Using templates for distributed monitoring
Creating the host and service objects
Customizing checks with custom variables
Summary
11. Programming Nagios
Introducing Nagios customizations
Programming in C with libnagios
Creating custom active checks
Testing the correctness of the MySQL database
Monitoring local time with a time server
Writing plugins correctly
Checking websites
Virtualization and clouds
Monitoring VMware
Monitoring Amazon Web Services
Writing commands to send notifications
Managing Nagios
Using passive checks
Summary
12. Using the Query Handler
Introducing the query handler
Communicating with the query handler
Using the query handler programmatically
Using the core service
Introducing Nagios Event Radio Dispatcher
Displaying real-time status updates
Displaying checks using Gource
Summary
Index
Learning Nagios 4
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: October 2008
Second Edition: March 2014
Production Reference: 1140314
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK
ISBN 978-1-78328-864-9
www.packtpub.com
Cover Image by Francesco Langiulli (<[email protected]>)
Credits
Author
Wojciech Kocjan
Reviewers
Péter Károly Stone Juhász
Emilien Kenler
Daniel Parraz
Pall Sigurdsson
Acquisition Editors
Anthony Albuquerque
Nikhil Chinnari
Content Development Editor
Chalini Victor
Technical Editors
Monica John
Akashdeep Kundu
Faisal Siddiqui
Copy Editors
Alisha Aranha
Roshni Banerjee
Brandt D'Mello
Deepa Nambiar
Project Coordinator
Kranti Berde
Proofreaders
Clyde Jenkins
Lucy Rowland
Indexer
Hemangini Bari
Graphics
Ronak Dhruv
Disha Haria
Yuvraj Mannari
Production Coordinator
Pooja Chiplunkar
Cover Work
Pooja Chiplunkar
About the Author
Wojciech Kocjan is a system administrator and programmer with 10 years of experience. His work experience includes several years of using Nagios for enterprise IT infrastructure monitoring. He also has experience with a large variety of devices and servers, including routers, Linux, Solaris, and AIX servers, and i5/OS mainframes. His programming experience includes multiple languages (such as Java, Ruby, Python, and Perl) and focuses on web applications as well as client-server solutions.
I'd like to thank my wife Joanna and my son Kacper for all of the help and support during the writing of this book.
About the Reviewers
Péter Károly Stone Juhász was born in 1980 in Hungary, where he lives with his family and their cat. He holds an MSc degree in Programmer Mathematics. At the very beginning of his career, he turned toward operations. Since 2004, he has been working as a general (mainly GNU/Linux) system administrator.
His average working day includes patching in the server room, installing servers, managing PBX, maintaining VMware vSphere infrastructure and servers at Amazon AWS, managing storage and backups, monitoring with Nagios, trying out new technology, and writing scripts to ease everyday work.
His interests in IT are Linux, server administration, virtualization, artificial intelligence, network security, and distributed systems. His hobbies include learning Chinese, program developing, reading, hiking, playing the game Go, listening to music and unicycling. For his contact information or to find out more about him, you can visit his website at http://midway.hu.
Emilien Kenler, after working on small web projects, began to focus on Game Development in 2008, when he was in high school. Until 2011, he worked for different groups and has specialized in system administration. In 2011, he founded a company, HostYourCreeper (http://www.hostyourcreeper.com) to sell Minecraft servers, while he was studying Computer Science Engineering. He created a lightweight IaaS based on new technologies such as Node.js and RabbitMQ.
Thereafter, he worked at TaDaweb as a system administrator, building its infrastructure and creating tools to manage deployments and monitoring. In 2014, he began a new adventure at Wizcorp, Tokyo. He will graduate at the end of the year from the University of Technology of Compiègne.
Daniel Parraz was raised in New Mexico and began using computer-type devices at an early age. After graduating from school, he found a technical support job and started to learn Linux. He has been administrating Linux/Unix systems since 2001 and has worked on large storage engineering and installations with Fortune 500 companies and start-ups. He currently lives in Albuquerque, New Mexico, with his family, and enjoys hiking, reading, and growing fruits and vegetables as a volunteer with an agriculture group supported by a local community.
Pall Sigurdsson is a lifelong open source geek with a special interest in automation and monitoring. He is known for his work in developing Adagios, a modern web status and configuration interface for monitoring systems that are compatible with Nagios.
Pall also maintains other projects such as Pynag (a high-level python API for Nagios configuration files) and okconfig (a set of preconfigured Nagios plugins and configuration templates).
www.PacktPub.com
Support files, eBooks, discount offers and more
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
Why Subscribe?
Fully searchable across every book published by Packt
Copy and paste, print and bookmark content
On demand and accessible via web browser
Free Access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Preface
The book is a practical guide to setting up Nagios 4, an open source network monitoring tool. It is a system that checks whether hosts and services are working properly and notifies users when problems occur. The book covers the installation and configuration of Nagios 4 on various operating systems, with a focus on the Ubuntu Linux operating system.
The book takes the reader through all the steps of compiling Nagios from sources, installing it, and configuring advanced features such as redundant monitoring. It also explains how to monitor various services such as e-mail, WWW, databases, and file sharing. The book describes what SNMP is and how it can be used to monitor various devices. It also gives details on monitoring Microsoft Windows computers. The book contains troubleshooting sections that aid the reader in case any problems arise while setting up Nagios functionality.
No previous experience with network monitoring is required, although it is assumed that the reader has a basic understanding of Unix systems. The book also includes examples of extending Nagios in several languages such as Perl, Python, Tcl, and Java, so that readers who are familiar with at least one of these technologies can benefit from extending Nagios. When you finish this book, you'll be able to set up Nagios to monitor your network and will have a good understanding of what can be monitored.
What this book covers
Chapter 1, Introducing Nagios, talks about Nagios and system monitoring in general. It shows the benefits of using system monitoring software and the advantages of Nagios in particular. It also introduces the basic concepts of Nagios.
Chapter 2, Installing Nagios 4, covers the installation of Nagios both by compiling from source code and by using prebuilt packages. Details on how to configure users, hosts, and services as well as information on how Nagios sends notifications to users are given in this chapter.
Chapter 3, Using the Nagios Web Interface, talks about how to set up and use the Nagios web interface. It describes the basic views for hosts and services and gives detailed information on each individual item. It also introduces some features such as adding comments, scheduled downtimes, viewing detailed information, and generating reports.
Chapter 4, Using the Nagios Plugins, goes through the standard set of Nagios plugins that allows you to perform checks of various services. It shows how you can check for standard services such as e-mail, Web, file, and database servers. It also describes how to monitor resources such as CPU usage, storage, and memory usage.
Chapter 5, Advanced Configuration, focuses on the efficient management of large configurations and the use of templates. It shows how dependencies between hosts and services can be defined and discusses custom variables and adaptive monitoring. It also introduces the concept of flapping and how it detects services that start and stop frequently.
Chapter 6, Notifications and Events, describes the notification system in more detail. It focuses on effective ways of communicating problems to the users and how to set up problem escalations. It also describes how events work in Nagios and how they can be used to perform automatic recovery of services.
Chapter 7, Passive Checks and NSCA, focuses on cases where external processes send results to Nagios. It introduces the concept of a passive check, which is not scheduled and run by Nagios, and gives practical examples of when and how it can be used. It also shows how to use Nagios Service Check Acceptor (NSCA) to send passive check results.
Chapter 8, Monitoring Remote Hosts, covers how Nagios checks can be run on remote machines. It walks through details of deploying checks remotely over SSH using public key authentication. It also shows how Nagios Remote Plugin Executor (NRPE) can be used for deploying plugins remotely.
Chapter 9, Monitoring using SNMP, describes how the Simple Network Management Protocol (SNMP) can be used from Nagios. It provides an overview of SNMP and its versions. It explains the reading of SNMP values from the SNMP-aware devices and covers how that can then be used to perform checks from Nagios.
Chapter 10, Advanced Monitoring, focuses on how Nagios can be set up on multiple hosts and how that information could be gathered on a central server. It also covers how to monitor computers that run the Microsoft Windows operating system.
Chapter 11, Programming Nagios, shows how to extend Nagios. It explains how to write custom check commands, how to create custom ways of notifying users, and how passive checks and NSCA can be used to integrate your solutions with Nagios. The chapter covers many programming languages to show how Nagios can be integrated with them.
Chapter 12, Using the Query Handler, focuses on the use of the Nagios query handler to send commands to Nagios as well as receive results and notifications from these commands. It shows how the query handler can be used from multiple programming languages and how it can be used to build an application to display Nagios updates in real time.
What you need for this book
This book requires a Linux server. As all of the examples are created using Ubuntu Linux, it is recommended that you use this distribution. The book goes through the process of setting up Nagios, so installing it is not a prerequisite of this book.
The Nagios web interface requires a web server. Chapter 3, Using the Nagios Web Interface, provides step-by-step instructions on how to set up an Apache web server and configure it so that it can be used with Nagios.
Who this book is for
The target readers of this book are System Administrators who are interested in using Nagios. This book will introduce Nagios along with the new features of Version 4.
Conventions
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.
Code words in text, object names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: This service group consists of the mysql and pgsql services on the linuxbox01 host.
A block of code is set as follows:
define service{
host_name linuxbox01
service_description mysql
check_command check_ssh
servicegroups databaseservices
}
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
define service{
host_name linuxbox01
service_description mysql
check_command check_ssh
servicegroups databaseservices
}
Any command-line input or output is written as follows:
# cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample
/etc/asterisk/cdr_mysql.conf
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: You should start by downloading the source tarball of the latest Nagios 4.x branch. It is available under the Get Nagios Core section.
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
Questions
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
Chapter 1. Introducing Nagios
Imagine you're working as an administrator of a large IT infrastructure. You have just started receiving e-mails saying that a web application has stopped working. When you try to access the same page, it just doesn't load. What are the possibilities? Is it the router? Is it the firewall? Perhaps the machine hosting the page is down? Before you even start thinking rationally about what to do, your boss calls about the critical situation and demands explanations. In all this panic, you'll probably start plugging everything in and out of the network and rebooting the machine, and that doesn't help.
After hours of nervous digging into the issue, you finally find the cause: the web server was working properly, but its communication with the database server was timing out. This was because the machine with the DB did not receive the correct IP address, as yet another box ran out of memory and killed the DHCP server on it. Imagine how much time it would take to find all that manually. It would be a nightmare if the database server was in another branch of the company or in a different time zone and perhaps the people over there were still sleeping.
But what if you had Nagios up and running across your entire company? You would just go to the web interface and see that there are no problems with the web server and the machine on which it is running. There would also be a list of issues: the machine serving IP addresses to the entire company is not doing its job and the database is down. If the setup also monitored the DHCP server itself, you'd get a warning e-mail that little swap memory is available on it or that too many processes are running. Maybe it would even have an event handler for such cases to just kill or restart noncritical processes. Nagios would also try to restart the dhcpd process over the network in case it is down.
In the worst case, Nagios would cut hours of investigation down to 10 minutes. In the best case, you would just get an e-mail that there was such a problem and another e-mail that it's already fixed. You would then disable a few services and increase the swap size for the DHCP machine to solve the problem once and for all. Nobody would even notice that there was such a problem.
Understanding the basics of Nagios
Nagios is a tool for system monitoring. This means that Nagios watches computers or devices on your network and ensures that they are working as they should. Nagios constantly checks whether other machines are working properly. It also verifies that various services on those machines are working fine. In addition, Nagios accepts status reports from other processes or machines; for example, a web server can report directly to Nagios whether it is overloaded.
The main purpose of system monitoring is to detect as soon as possible any system that is not working properly so that users of that system will not report the issue to you first.
System monitoring in Nagios is split into two categories of objects: hosts and services. Hosts represent a physical or virtual device on your network (servers, routers, workstations, printers, and so on). Services are particular functionalities, for example, a Secure Shell (SSH) server (sshd process on the machine) can be defined as a service to be monitored. Each service is associated with a host on which it is running. In addition, machines can be grouped into host groups.
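The relationship between hosts, services, and host groups can be pictured with a small data model. The following is only an illustrative sketch in Python, not how Nagios stores its objects; the class names and the linux-servers group name are assumptions made for this example, while linuxbox01 is the host name used in the Conventions section.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    description: str  # for example "ssh"
    host_name: str    # each service is associated with exactly one host


@dataclass
class Host:
    name: str
    services: list = field(default_factory=list)

    def add_service(self, description):
        """Create a service bound to this host and remember it."""
        service = Service(description, self.name)
        self.services.append(service)
        return service


@dataclass
class HostGroup:
    name: str
    members: list = field(default_factory=list)


# One host with an SSH service, placed in a hypothetical host group.
linuxbox01 = Host("linuxbox01")
linuxbox01.add_service("ssh")
linux_servers = HostGroup("linux-servers", [linuxbox01])
```

The point of the sketch is the direction of the links: a service always refers back to its host, while a host group only collects hosts.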
A major benefit of Nagios checks is that they use only four distinct states: Ok, Warning, Critical, and Unknown. Nagios is also based on plugins, which means that if you want to check something that is not yet supported, you just need to write a simple piece of code.
Offering only four states allows administrators to ignore the monitored values themselves and just decide what the warning and critical limits are. This is a proven concept, and it is far more efficient than monitoring graphs and analyzing trends. For example, system administrators tend to ignore things such as gradually declining storage space, often until a critical process runs out of disk space. Having a strict limit to watch is much better, because you always catch a problem regardless of whether it turns from warning to critical in 15 minutes or in a week. This is exactly what Nagios does. Each check performed by Nagios is turned from numeric values (such as the amount of disk space or CPU usage) into one of the four possible states.
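The idea of turning a numeric value into a state can be sketched in a few lines of Python. This is a minimal illustration rather than Nagios code: the function name classify and the example limits of 80 and 90 percent are assumptions, and it presumes that higher values are worse (as with disk usage).

```python
OK, WARNING, CRITICAL, UNKNOWN = "OK", "WARNING", "CRITICAL", "UNKNOWN"


def classify(value, warning, critical):
    """Map a measurement to a state; higher values are assumed to be worse."""
    if value is None:
        # No measurement could be obtained, so the state is not known.
        return UNKNOWN
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


# Disk usage in percent against hypothetical limits of 80 and 90.
state = classify(85, warning=80, critical=90)
```

With limits like these, it does not matter whether the disk fills up in 15 minutes or over a week: the state changes the moment a limit is crossed.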
Another benefit is a report stating that X services are up and running, Y are in warning state, and Z are currently critical, which is much more readable than a matrix of values. It saves you the time of analyzing what's working and what's failing. It can also help prioritize what needs to be handled first, and which problems can be handled later.
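Such a summary is easy to derive once every check has been reduced to a state. The following Python sketch assumes the current states are available as a simple dictionary; the service names and the summarize function are made up for illustration and do not correspond to any Nagios API.

```python
from collections import Counter


def summarize(states):
    """Count how many services are in each state and format one summary line."""
    counts = Counter(states.values())
    # Counter returns 0 for states that do not occur at all.
    return "{} OK, {} WARNING, {} CRITICAL, {} UNKNOWN".format(
        counts["OK"], counts["WARNING"], counts["CRITICAL"], counts["UNKNOWN"]
    )


# Hypothetical snapshot of service states.
current = {"web": "OK", "ssh": "OK", "dhcp": "WARNING", "db": "CRITICAL"}
print(summarize(current))
```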
Nagios performs all of its checks using plugins. These are external components for which Nagios passes information on what should be checked and what the warning and critical limits are. Plugins are responsible for performing the checks and analyzing results. The output from such a check is the status (working, questionable, or failure) and additional text describing information on the service in details. This text is mainly intended for system administrators to be able to read the detailed status of a service.
Nagios comes with a set of standard plugins that allow performance checks for almost all services your company might offer. See Chapter 4, Using the Nagios Plugins, for detailed information on plugins that are developed along with Nagios. Moreover, if you need to perform a specific check (for example, connect to a Web service and invoke methods), it is very easy to write your own plugins. And that's not all—they can be written in any language and it takes less than 15 minutes to write a complete check command! Chapter 11, Programming Nagios, talks about that ability in more detail.
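To give a feel for how small such a plugin can be, here is a rough sketch of a disk usage check in Python. It follows the usual plugin conventions (one line of status text on standard output, and the state returned as exit code 0, 1, 2, or 3), but the function names, option defaults, and message format are assumptions of this example and are not taken from the standard plugins.

```python
import argparse
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_disk(path, warning, critical, usage=shutil.disk_usage):
    """Return (exit_code, status_text) for the used space on `path`.

    `usage` is injectable so the logic can be exercised without a real disk.
    """
    try:
        total, used, _free = usage(path)
    except OSError as exc:
        return UNKNOWN, "DISK UNKNOWN - cannot read {}: {}".format(path, exc)
    if total == 0:
        return UNKNOWN, "DISK UNKNOWN - {} reports zero capacity".format(path)
    percent = 100.0 * used / total
    if percent >= critical:
        code = CRITICAL
    elif percent >= warning:
        code = WARNING
    else:
        code = OK
    return code, "DISK {} - {:.1f}% used on {}".format(LABELS[code], percent, path)


def main(argv=None):
    """Parse the limits, run the check, print the text, return the exit code."""
    parser = argparse.ArgumentParser(description="Sketch of a disk usage check")
    parser.add_argument("-p", "--path", default="/")
    parser.add_argument("-w", "--warning", type=float, default=80.0)
    parser.add_argument("-c", "--critical", type=float, default=95.0)
    args = parser.parse_args(argv)
    code, text = check_disk(args.path, args.warning, args.critical)
    print(text)  # one line of human-readable status text
    return code  # the caller turns this into the process exit code
```

To use the sketch as a command, the file would end by passing the result of `main()` to `sys.exit`, so that the state reaches Nagios as the exit code of the process.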
The benefits of monitoring resources
There are many reasons to ensure that all your resources are working as expected. If you're still not convinced after reading the introduction to this chapter, here are a few important reasons to monitor your infrastructure.
The main reason is quality improvement. If your IT staff can notice failures quicker by using a monitoring tool, they will also be able to respond to them much faster. Sometimes it takes hours or days to get the first report of a failure even if many users bump into errors. Nagios ensures that if something is not working, you'll know about it. In some cases, event handling can even be done so that Nagios can switch to the backup solution until the primary process is fixed. A typical case would be to start a dial-up connection and use it as a primary connection in cases when the company VPN is down.
Another reason is much better problem determination. Very often what the users report as a failure is far from the root cause of the problem; for example, an e-mail system may be down because the LDAP service is not working correctly. If you define dependencies between hosts correctly, then Nagios will point out that the POP3 e-mail server is assumed to be not working because the LDAP service that it depends upon has a problem. Nagios will start checking the e-mail server as soon as the problem with LDAP has been resolved.
Nagios is also very flexible when it comes to notifying people of what isn't functioning correctly. In most cases, your company has a large IT team or multiple teams. Usually, you want some people to handle servers, and others to handle network switches, routers, and modems. There might also be a team responsible for network printers, or responsibilities might be divided by geographical location. You can instruct Nagios on who is responsible for particular machines or groups of machines, so that when something is wrong, the right people will get to know of it. You can also use Nagios' web interface to manage who is working on what issue.
Monitoring resources is not only useful for finding problems, but can also save you from having them, because Nagios handles warnings and critical situations differently. This means that it's possible to be aware of situations that may become problems really soon. For example, if your disk storage on an e-mail server is running out, it's better to be aware of this situation before it becomes a critical issue.
Monitoring can also be set up on multiple machines across various locations. These machines will then communicate all their results to a central Nagios server so that information on all hosts and services in your system can be accessed from a single machine. This gives you a more accurate picture of your IT infrastructure, and also allows testing more complex systems such as firewalls. For example, it is vital that a testing environment is accessible from a production environment, but not the other way around.
It is also possible to set up a Nagios server outside the company's intranet (for example, over a dedicated DSL) to make sure that traffic from the Internet is properly blocked. It can be used to check if only certain services are available, for example, verify that only SSH and Hypertext Transfer Protocol (HTTP) are accessible from external IP addresses, and that services such as databases are inaccessible to users.
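A check of this kind boils down to probing a list of ports from the outside and comparing the outcome with what is expected. The sketch below is only an illustration: the function names, the message format, and the decision to report any mismatch as Critical are assumptions of this example.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def audit_exposure(host, expected_open, expected_closed, probe=port_open):
    """Return (exit_code, status_text); any mismatch is reported as critical.

    `probe` is injectable so the logic can be tested without network access.
    """
    problems = []
    for port in expected_open:
        if not probe(host, port):
            problems.append("port {} should be reachable".format(port))
    for port in expected_closed:
        if probe(host, port):
            problems.append("port {} should be blocked".format(port))
    if problems:
        return 2, "EXPOSURE CRITICAL - " + "; ".join(problems)
    checked = len(expected_open) + len(expected_closed)
    return 0, "EXPOSURE OK - {} ports behave as expected".format(checked)
```

For instance, calling `audit_exposure` with ports 22 and 80 as `expected_open` and a database port such as 3306 as `expected_closed` expresses the rule that only SSH and HTTP should be reachable from external addresses; the database port chosen here is only an example.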
Main features
Nagios' main strength is flexibility—it can be configured to monitor your IT infrastructure in the way you want it. It also has a mechanism to react automatically to problems and has a powerful notification system. All of this is based on a clear object definition system, which in turn is based on a few types of objects, shown as follows:
Commands: These are definitions of how Nagios should perform particular types of checks. They are an abstraction layer on top of actual plugins that allow you to group similar types of operations.
Time periods: These are date and time spans during which an operation should or should not be performed. For example, Monday–Friday, 09:00–17:00.
Hosts and host groups: These are devices along with the possibility to group hosts. A single host might be a member of more than one group.
Services: These are various functionalities or resources to monitor on a specific host. For example, CPU usage, storage space, or Web server.
Contacts and contact groups: These are the people who should be notified, along with information on how and when they should be contacted; contacts can be grouped, and a single contact might be a member of more than one group.
Notifications: These define who should be notified of what, for example, all errors for the linux-servers host group should go to the linux-admins contact group during working hours and to the critsit-team contact group outside of working hours. Notifications are not strictly an object, but a combination of all the preceding objects and are an essential part of Nagios.
Escalations: These are an extension to notifications; they define that after an object has been in the same state for a specific period of time, other people should get notified of certain events. For example, a critical server being down for more than 4 hours should alert IT management so that they track the issue.
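The way time periods, contact groups, notifications, and escalations combine can be sketched in a few lines. This is an illustration of the idea only and not how Nagios is configured or implemented: the function names and the `it-management` group are hypothetical, while the working hours, the `linux-admins` and `critsit-team` groups, and the four-hour limit come from the examples in the list above.

```python
from datetime import datetime, timedelta

ESCALATE_AFTER = timedelta(hours=4)


def in_working_hours(moment):
    """Monday to Friday, 09:00 to 17:00 (the example time period above)."""
    return moment.weekday() < 5 and 9 <= moment.hour < 17


def contact_groups_for(moment, down_since):
    """Return the contact groups to notify about a problem at `moment`."""
    groups = ["linux-admins" if in_working_hours(moment) else "critsit-team"]
    if moment - down_since > ESCALATE_AFTER:
        groups.append("it-management")  # hypothetical escalation group
    return groups


if __name__ == "__main__":
    down_since = datetime(2024, 1, 8, 17, 30)  # a Monday evening
    print(contact_groups_for(datetime(2024, 1, 8, 18, 0), down_since))
    print(contact_groups_for(datetime(2024, 1, 8, 22, 0), down_since))
```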
Another beneficial feature of Nagios is its mature dependency system. For any administrator, it is obvious that if your router is down, then all machines accessed via it will fail. Some systems don't take that into account, and in such cases, you get a list of several failing machines and services. Nagios allows you to define dependencies between hosts to reflect actual network topology. For example, if a router that connects you to the rest of your network is down, Nagios will not perform checks for the subsequent parts and machines that are dependent on the router. This is illustrated in the following figure:
You can also define that a particular service depends on another service, either on the same host or a different host. If one of the services it depends on is down, the check for the dependent service is not even performed.
For example, in order for your company's intranet application to function properly, both an underlying Web server and database server must be running properly. So, if a database service is not working properly, Nagios will not perform checks and/or not send notifications that your application is not working, because the root cause of the problem is that the database is not working properly. The database server might be on the same host or a different host. If the database is not working properly, or if the machine it runs on is down or not accessible, all services dependent on the database service will not be checked either.
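The effect of dependencies on checking can be sketched as a walk over a dependency map. The names in `DEPENDS_ON` and both functions are hypothetical, and treating a dependency with no recorded state as failing is an assumption of this sketch; Nagios' own dependency handling is configurable and more detailed than this.

```python
# Hypothetical dependency map: each item lists what it directly depends on.
DEPENDS_ON = {
    "intranet-app": ["web-server", "database"],
    "web-server": ["router"],
    "database": ["router"],
}


def failing_dependencies(name, states, depends_on=DEPENDS_ON):
    """Return every direct or indirect dependency of `name` that is not OK."""
    failing, seen = [], set()
    queue = list(depends_on.get(name, []))
    while queue:
        dep = queue.pop(0)
        if dep in seen:
            continue
        seen.add(dep)
        # A dependency with no recorded state is treated as failing here.
        if states.get(dep) != "OK":
            failing.append(dep)
        queue.extend(depends_on.get(dep, []))
    return failing


def should_check(name, states, depends_on=DEPENDS_ON):
    """A check is worth running only if everything it depends on is OK."""
    return not failing_dependencies(name, states, depends_on)


if __name__ == "__main__":
    states = {"router": "OK", "web-server": "OK", "database": "CRITICAL"}
    print(should_check("intranet-app", states))
    print(failing_dependencies("intranet-app", states))
```

Returning the list of failing dependencies, rather than just a yes or no answer, also makes it possible to point at the root cause instead of the symptom.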
Nagios offers a consistent system of macro definitions. These are variables that can be put into all object definitions and depend on the context. They can be put inside commands, and depending on the host, service, and many other parameters, macro definitions are substituted accordingly. For example, a command definition might use an IP address of the host it is currently checking in all remote tests. It also makes it possible to put information such as the previous and current status of a service in a notification e-mail. Nagios 3 also offers various extensions to macro definitions, which make it an even more powerful