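I. Nagios configuration file: /usr/local/nagios/etc/nagios.cfg
Use the following command to check the configuration for problems. It reports Warnings and Errors: warnings do not prevent Nagios from starting, while an error aborts the check and prints the first error message, and Nagios cannot start until you fix it.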
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
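Nagios ships with some sample configuration files. You can enable one by removing the leading "#" from its line, keep your own configuration files elsewhere, or point Nagios at a directory that holds them.
The configuration files you are likely to use: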
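/usr/local/nagios/etc/cgi.cfg Configuration for the CGIs
/usr/local/nagios/etc/nagios.cfg Main Nagios configuration file
/usr/local/nagios/etc/resource.cfg Defines macros such as $USER1$ that other configuration files reference
/usr/local/nagios/etc/objects Directory holding sample object configuration files:
/commands.cfg Command definitions
/contacts.cfg Contacts and contact groups
/localhost.cfg Monitoring of the local host
/printer.cfg Sample configuration for monitoring a printer
/switch.cfg Sample configuration for monitoring a router or switch
/templates.cfg Host and service templates
/timeperiods.cfg Time periods for monitoring
/windows.cfg Sample configuration for a Windows host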
To use one of them, uncomment its line in nagios.cfg, or define your own. For example, I put all my configuration files in a monitor directory:
# Definitions for monitoring a router/switch
#cfg_file=/usr/local/nagios/monitor/switch.cfg
# Use commands.cfg from the monitor directory
cfg_file=/usr/local/nagios/monitor/commands.cfg
Or load every .cfg file found in a directory:
cfg_dir=/usr/local/nagios/monitor
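When a nagios.cfg mixes commented and uncommented entries, it can be hard to see what Nagios will actually load. The sketch below lists only the active cfg_file and cfg_dir lines; the function name active_cfg_sources is made up for this illustration.

```shell
# Hypothetical helper: print the uncommented cfg_file= and cfg_dir= lines.
active_cfg_sources() {
    # $1 = path to nagios.cfg
    # Lines starting with "#" are skipped because the pattern is anchored
    # to optional whitespace followed directly by the directive name.
    grep -E '^[[:space:]]*cfg_(file|dir)=' "$1"
}

# Example (not run here):
#   active_cfg_sources /usr/local/nagios/etc/nagios.cfg
```

With the three lines shown above, the commented switch.cfg entry is left out and only the commands.cfg and monitor directory entries are printed.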
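How the configuration files relate to each other:
1. Which hosts, host groups, services and service groups to monitor;
2. Which commands perform the checks;
3. During which time periods to monitor;
4. Which contacts and contact groups receive the alerts.
Following that logic, we need to define these configuration files:
- Create a file that defines the hosts and host groups. You can use hosts.cfg; in my case I use "cfl.cfg" for my hosts and groups, and "radios.cfg" for a group of microwave radios
- Create services.cfg to define the services
- Use the default contacts.cfg to define contacts and contact groups
- Use the default commands.cfg to define commands
- Use the default timeperiods.cfg to define the monitoring time periods
- Use the default templates.cfg as the set of templates that the other definitions inherit from
II. Configuration example
As an example we add two routers and the link between them, using the ping check that ships with Nagios and adding an uptime check.
(1) Define the contact group and contacts: I defined mine in contacts.cfg; you can also copy the sample. The notification commands referenced here are defined in the commands step below.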
define contact {
name cfl-contact
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
service_notification_commands cfl-notify-service-by-email
host_notification_commands cfl-notify-host-by-email
register 0
}
define contact {
contact_name axing
use cfl-contact
alias Anthony Xing
email anthony.xing@gmail.com
host_notification_options d,u,r,s
service_notification_options w,u,c,r,s
}
(2) Define commands: in commands.cfg, define custom commands that send the alert email:
define command{
command_name cfl-notify-host-by-email
command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $HOSTSTATE$\nDuration: $HOSTDURATION$\n\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n\nhttps://nagios.communityfibre.co.uk/nagios/cgi-bin/extinfo.cgi?type=1&host=$HOSTNAME$" | /usr/sbin/sendmail -vt $CONTACTEMAIL$
}
define command{
command_name cfl-notify-service-by-email
command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\nhttps://nagios.communityfibre.co.uk/nagios/cgi-bin/extinfo.cgi?type=2&host=$HOSTNAME$&service=$SERVICEDESC$" | /usr/sbin/sendmail -vt $CONTACTEMAIL$
}
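Before relying on these commands it can help to preview what the printf "%b" part produces, without piping anything to sendmail. The following is a minimal sketch that fills a shortened version of the host template with sample values; the function name preview_host_alert and the sample values are assumptions for illustration, and the shell's own printf stands in for /usr/bin/printf.

```shell
# Hypothetical helper: render a shortened host alert with sample values
# in place of the Nagios macros, writing to stdout instead of sendmail.
preview_host_alert() {
    # $1 = notification type, $2 = host name, $3 = host state
    # "%b" makes printf expand the \n escapes inside the argument,
    # which is what the command_line above relies on.
    printf "%b" "Subject: $1 Host Alert: $2 is $3\n\n***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\n"
}

preview_host_alert PROBLEM ar01.cle.lon DOWN
```

The first output line is the Subject header, followed by a blank line and the body, which is the layout sendmail expects on its standard input.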
(4) Define the monitored group: in cfl.cfg, also add a routers group:
define hostgroup {
hostgroup_name routers ; The name of the hostgroup
alias NE05E-SQ and NE05E-SE ; Long name of the group
}
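(5) Define services: these can go in services.cfg or directly in cfl.cfg. Note that for a custom check you need to look up the OID in the MIB. The OID used here for the uptime of the Huawei NE05E router is 1.3.6.1.6.3.10.2.1.3.0 (snmpEngineTime in SNMP-FRAMEWORK-MIB, the number of seconds since the SNMP engine last started):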
define service {
use generic-service ; Inherit values from a template
hostgroup_name routers ; The host group the service is associated with
service_description PING ; The service description
check_command check_ping!200.0,20%!600.0,60% ; The command used to monitor the service
check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_interval 1 ; Re-check the service every minute until its final/hard state is determined
}
# Monitor uptime via SNMP
define service {
use generic-service ; Inherit values from a template
hostgroup_name routers
service_description Uptime
check_command check_snmp!-C kjsd934js -o 1.3.6.1.6.3.10.2.1.3.0
}
"generic-service" is defined in templates.cfg. The "generic" family also includes contact, host and so on, and you can reference any of them with use.
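To debug the Uptime service it is useful to know which command line Nagios ends up running. The sketch below only prints that command line. It assumes the stock check_snmp command definition ($USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$) and that $USER1$ points to /usr/local/nagios/libexec; the function name snmp_uptime_cmd and the PLUGIN_DIR variable are made up for this illustration.

```shell
# Hypothetical helper: print the check_snmp command line for the uptime OID.
snmp_uptime_cmd() {
    # $1 = host address, $2 = SNMP community string
    # PLUGIN_DIR stands in for $USER1$; the default path is an assumption.
    plugin_dir="${PLUGIN_DIR:-/usr/local/nagios/libexec}"
    printf '%s -H %s -C %s -o %s\n' \
        "$plugin_dir/check_snmp" "$1" "$2" "1.3.6.1.6.3.10.2.1.3.0"
}

# Example (not run here): print the command, then paste it into a shell
# on the Nagios server to see the plugin output directly.
#   snmp_uptime_cmd 191.209.86.143 kjsd934js
```

Running the printed command by hand shows the same output and exit status that Nagios would record for the service.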
(6) Add the hosts to monitor: add the following to cfl.cfg:
#
# Clem Attlee
#
define host {
use generic-switch ; Inherit default values from a template
host_name ar01.cle.lon ; The name we're giving to this switch
alias Clem Attlee NE05E-SQ ; A longer name associated with the switch
address 191.209.86.143 ; IP address of the switch
hostgroups routers ; Host groups this switch is associated with
parents ar01.whi.lon ; Parent host this switch is reached through
icon_image router41.jpg
# statusmap_image router.gd2
}
#
# West Kensington
#
define host {
use generic-switch ; Inherit default values from a template
host_name ar01.wsk.lon ; The name we're giving to this switch
alias West Kensington NE05E-SQ ; A longer name associated with the switch
address 191.209.86.162 ; IP address of the switch
hostgroups routers ; Host groups this switch is associated with
parents ar01.cle.lon ; Parent host this switch is reached through (must match the host_name exactly)
icon_image router41.jpg
# statusmap_image router.gd2
}
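Object names in Nagios are case-sensitive, so every parents value has to match a host_name exactly. The following is a rough sketch that lists parents values with no matching host definition in the files it is given; the function name undefined_parents is made up, and the parsing only handles the simple one-directive-per-line layout used above.

```shell
# Hypothetical helper: list parents values that no "define host" block declares.
undefined_parents() {
    # usage: undefined_parents FILE...
    awk '
        { sub(/;.*/, "") }                                   # strip trailing ";" comments
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { in_host = 1 }   # matches host, not hostgroup
        /[}]/                              { in_host = 0 }
        in_host && $1 == "host_name" { defined[$2] = 1 }
        in_host && $1 == "parents" {
            list = $0
            sub(/^[ \t]*parents[ \t]+/, "", list)            # keep only the value
            gsub(/[ \t]/, "", list)                          # tolerate "a, b"
            n = split(list, p, ",")
            for (i = 1; i <= n; i++) if (p[i] != "") wanted[p[i]] = 1
        }
        END { for (h in wanted) if (!(h in defined)) print h }
    ' "$@" | LC_ALL=C sort
}

# Example (not run here):
#   undefined_parents /usr/local/nagios/monitor/*.cfg
```

Pass every file that contains host definitions, since a parent such as ar01.whi.lon may be defined in a different file from the host that references it.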
(7) Upload the files to /usr/local/nagios/monitor (the directory configured in nagios.cfg earlier),
then run the pre-flight check:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.4.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
...
Running pre-flight check on configuration data...
Checking objects...
...
Checked 299 services.
Checked 176 hosts.
Checked 12 host groups.
Checked 0 service groups.
Checked 4 contacts.
...
Total Warnings: 6
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
#
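The two summary lines at the end of this output are easy to pick up from a script, for example to log the counts after each change. A minimal sketch, with the function name preflight_summary made up for this illustration:

```shell
# Hypothetical helper: read `nagios -v` output on stdin and print
# the warning and error counts as "WARNINGS ERRORS".
preflight_summary() {
    # The field separator ": *" tolerates extra spaces after the colon.
    # Adding 0 turns a missing value into 0 instead of an empty string.
    awk -F': *' '
        /^Total Warnings:/ { w = $2 }
        /^Total Errors:/   { e = $2 }
        END { print w + 0, e + 0 }
    '
}

# Example (not run here):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_summary
```

For the run shown above this would print the two counts from the "Total Warnings" and "Total Errors" lines.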
There are a few warnings, but they do not block startup. Now restart Nagios:
root@nagios:~# /etc/init.d/nagios-nrpe-server restart
[ ok ] Restarting nagios-nrpe-server (via systemctl): nagios-nrpe-server.service.
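Or use the following, which restarts the Nagios core service itself (nagios-nrpe-server above is the NRPE agent daemon):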
root@nagios:~# service nagios restart
The second command prints nothing when it succeeds.
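Because a configuration with errors keeps Nagios from starting, it is safer to restart only after the check passes. The sketch below chains the two steps. It assumes that nagios -v returns a non-zero exit status when the check fails; the function name validate_then and the NAGIOS_BIN and NAGIOS_CFG variables are made up for this illustration.

```shell
# Hypothetical helper: run the pre-flight check, then the given command
# only if the check succeeded.
validate_then() {
    # usage: validate_then COMMAND [ARGS...]
    bin="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
    cfg="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
    # Capture the check output and show it only when the check fails.
    if ! out=$("$bin" -v "$cfg" 2>&1); then
        printf '%s\n' "$out" >&2
        return 1
    fi
    "$@"
}

# Example (not run here):
#   validate_then service nagios restart
```

If the check fails, its output goes to standard error and the restart command is never run, so the running Nagios keeps its last good configuration.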
That covers basic usage. The next post adds monitoring for the microwave radios.