Seattle Area System Administrators Guild: Scaling Nagios To Monitor Large Heterogeneous Environments
Scaling Nagios to monitor large heterogeneous environments
Dave Blunt
February 21, 2008
What is Nagios?
Nagios is an open source host, service, and network monitoring program. It started as NetSaint in 1999 and became Nagios in 2002 (www.nagios.org).
Availability and performance monitoring: is it up or is it down? How much load, memory, or disk is in use?
[Diagram: Nagios architecture. Labels: CGIs, five Nagios child PIDs, Notification, Event Handler]
Facilitates monitoring across multiple Windows domains, SNMP communities, and other security zones.
Placing the database on a separate server will greatly improve performance, and both examples support it.
[Diagram: Nagios parent PID with CGIs, Configuration, five Nagios child PIDs, Notification, Event Handler]
Parallelizing pings gives a huge increase in host check capacity (8,000+ checks a minute). The downside of passive host updates is the possibility of some extra service alarms.
[Diagram: Fping Feeder checking Host A, Host B, ... Host n, alongside the Nagios parent PID, CGIs, Configuration, Status And Events, five Nagios child PIDs, Notification, Event Handler]
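The fping feeder idea can be sketched roughly as follows: ping many hosts in one fping run, then hand the results to Nagios as passive host check results. This is a minimal sketch, not the feeder used in the talk. It assumes fping prints one "<host> is alive" or "<host> is unreachable" line per host on standard output, and that the Nagios external command file lives at /usr/local/nagios/var/rw/nagios.cmd; the function names are made up for illustration.

```python
import subprocess
import time

# Nagios host state codes used in passive host check results.
HOST_UP, HOST_DOWN = 0, 1


def parse_fping(output):
    """Map each host in fping's per-host summary lines to a host state code."""
    states = {}
    for line in output.splitlines():
        host, _, rest = line.strip().partition(" is ")
        if not rest:
            continue  # skip blank lines and anything not shaped like "<host> is <state>"
        states[host] = HOST_UP if rest == "alive" else HOST_DOWN
    return states


def passive_host_results(states, now=None):
    """Format one PROCESS_HOST_CHECK_RESULT external command per host."""
    ts = int(time.time() if now is None else now)
    text = {HOST_UP: "fping: alive", HOST_DOWN: "fping: unreachable"}
    return [
        f"[{ts}] PROCESS_HOST_CHECK_RESULT;{host};{code};{text[code]}"
        for host, code in sorted(states.items())
    ]


def feed(hosts, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Ping all hosts in one fping run and write the results to the command file.

    The command file path is an assumption; use the command_file value
    from your own nagios.cfg.
    """
    # fping exits non-zero when any host is unreachable, so check=True is not used.
    proc = subprocess.run(["fping", *hosts], capture_output=True, text=True)
    with open(command_file, "w") as cmd:
        for line in passive_host_results(parse_fping(proc.stdout)):
            cmd.write(line + "\n")
```

Calling feed(["hostA", "hostB"]) from cron or a small loop would replace one active host check per host with a single parallel ping sweep, which is where the capacity gain comes from.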
DNX (dnx.sourceforge.net)
Specifically tied to distributed monitoring.
* With dual 3 GHz Xeon, 4 GB RAM, 10k RPM disk, RHEL4 ES 32-bit OS.
** Based on a Service being checked once every 10 minutes, and 1% of Services and Hosts being in transition between OK and non-OK states. Retry interval for non-OK states is 1 minute.
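To make the (**) assumptions concrete, the check rate they imply can be estimated with a little arithmetic. The sketch below is one reading of the footnote: 99% of services are checked every 10 minutes, and the 1% in transition are rechecked every minute. The function name and parameters are illustrative and do not come from the slides.

```python
def checks_per_minute(services, normal_interval_min=10.0,
                      retry_interval_min=1.0, fraction_in_transition=0.01):
    """Estimate the steady-state service check rate under the footnote's assumptions."""
    # Services in a stable state are checked once per normal interval.
    steady = services * (1.0 - fraction_in_transition) / normal_interval_min
    # Services in transition are rechecked once per retry interval.
    retrying = services * fraction_in_transition / retry_interval_min
    return steady + retrying
```

Under these assumptions, 10,000 services work out to 990 + 100 = 1,090 checks a minute, so the small fraction in transition accounts for almost a tenth of the load.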
Different tier one monitoring tools, e.g. syslog, SNMP traps, Ganglia, Cacti
Feed data from these up to your primary Nagios server: install the right agent on that server, process the results, and then submit them to Nagios.
Syslog-ng (www.balabit.com/network-security/syslog-ng/)
Snmptt (www.snmptt.org)
Ganglia (ganglia.sourceforge.net)
Cacti (www.cacti.net)
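The "process results, then submit to Nagios" step might look something like this for syslog. This is a minimal sketch: it matches a log message against a few rules and formats a PROCESS_SERVICE_CHECK_RESULT external command for the Nagios command file. The rules, service descriptions, and function name are hypothetical; the matching services would need to exist in your Nagios configuration with passive checks enabled.

```python
import re
import time

# Nagios service state codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical rules: (pattern, service description, state). Adjust to your logs.
RULES = [
    (re.compile(r"I/O error", re.I), "disk_health", CRITICAL),
    (re.compile(r"temperature above threshold", re.I), "hw_temperature", WARNING),
]


def syslog_to_passive(host, message, now=None):
    """Return a passive service check command for the first matching rule, else None."""
    ts = int(time.time() if now is None else now)
    for pattern, service, state in RULES:
        if pattern.search(message):
            # Semicolons separate fields in external commands, so replace
            # them in the free-text output to be safe.
            output = message.replace(";", ",").strip()
            return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
                    f"{host};{service};{state};{output}")
    return None
```

Each returned line would be written to the Nagios external command file. Messages that match no rule return None and are simply dropped, which keeps log noise out of Nagios.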
Alternative approach
DNX? Still in beta, but offers significant maintenance advantages.
* With dual 3 GHz Xeon, 4 GB RAM, 10k RPM disk, RHEL4 ES 32-bit OS.
** Based on a Service being checked once every 10 minutes, and 1% of Services and Hosts being in transition between OK and non-OK states. Retry interval for non-OK states is 1 minute.
*** Using Bronx Event Broker and the assumptions listed for note (**).
2008 GroundWork Open Source, Inc.
Heterogeneous environments
Mix of operating systems, network security zones, applications, and administrators!
Approaches to the problem:
Same agent type on every system
  - Consistent
  - Limited coverage
Mix of methods
  - Flexible
  - More difficult to maintain
  - Must normalize data
Methods
UNIX
  - SNMP / SNMP traps
  - SSH with plugins (www.nagios.org/downloads)
  - NRPE with plugins (www.nagios.org/downloads)
  - Cron with plugins
  - Port-based checks
  - Syslog (aka traps)
Windows (http://www.crn.com/software/206801053)
  - SNMP / SNMP traps
  - NRPE_NT with plugins (www.nagiosexchange.org/Windows_NRPE.66.0.html?&tx_netnagext_pi1[p_view]=235)
  - WMI (with proxy)
  - NT_Scheduler with plugins
  - Port-based checks
  - Event logs (aka traps) (www.intersectalliance.com/projects/SnareWindows/ or www.steveshipway.org/software/f_nagios.html)
Network
  - SNMP / SNMP traps
  - Syslog
  - Port-based checks
Special devices
  - SNMP / SNMP traps
  - Syslog
  - Port-based checks
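Port-based checks are the one method listed for every platform above, so a small example may help. The sketch below is a hypothetical plugin, not one of the standard Nagios plugins. It assumes only the usual plugin convention: one line of status text on standard output, with exit code 0 for OK and 2 for CRITICAL.

```python
import socket
import sys

# Nagios plugin exit codes used here.
OK, CRITICAL = 0, 2


def check_port(host, port, timeout=5.0):
    """Return (exit_code, status_line) following the Nagios plugin convention."""
    try:
        # A completed TCP handshake is treated as "service is up".
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepted a connection"
    except OSError as exc:
        # Covers refused connections, timeouts, and name resolution failures.
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"


# Usage as a plugin: python check_port.py <host> <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    code, line = check_port(sys.argv[1], int(sys.argv[2]))
    print(line)
    sys.exit(code)
```

Because it needs no agent on the target, the same check works unchanged against UNIX hosts, Windows hosts, network gear, and special devices.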
GroundWork Open Source, Inc. 139 Townsend Street, Suite 100 San Francisco, CA 94107 phone: (415) 992-4500
www.groundworkopensource.com