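Operations Monitoring with Open-Falcon
I. Introduction to Open-Falcon
Open-Falcon is a monitoring system written in Go and Python; the project was started by Xiaomi.
1. Monitoring system: it can monitor servers, operating systems, middleware, and applications, and raise alarms, both at the operations level (basic configuration is enough) and at the application level (custom development that reports log data through a port). This is essential for keeping our systems running normally.
2. Basic monitoring
CPU, load, memory, disk, I/O, network, kernel parameters, ss statistics, port collection, liveness of core service processes, resource consumption of key business processes, NTP offset, and DNS resolution: all of these metrics are supported directly by Open-Falcon's agent component.
Basic Linux operations metrics: http://book.open-falcon.org/zh/faq/linux-metrics.html
Once you understand all of these basic monitoring items thoroughly, you will also have deepened your understanding of how Linux works and of its commands.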
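To make the idea of a basic metric concrete, here is a minimal sketch (Python 3, standard library only) of how load averages can be read from the text of /proc/loadavg. It is not the agent's actual code (the agent is written in Go); the function name `parse_loadavg` is hypothetical, and the metric names follow the naming used in the Linux metrics list linked above.

```python
def parse_loadavg(text):
    """Parse the content of /proc/loadavg into a dict of load metrics.

    The first three whitespace-separated fields of /proc/loadavg are the
    1, 5 and 15 minute load averages.
    """
    fields = text.split()
    return {
        "load.1min": float(fields[0]),
        "load.5min": float(fields[1]),
        "load.15min": float(fields[2]),
    }


# Sample content in the format the Linux kernel exposes (illustrative values).
sample = "0.20 0.18 0.12 1/80 11206\n"
print(parse_loadavg(sample))
```

On a real Linux host you could pass `open("/proc/loadavg").read()` instead of the sample string.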
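3. Third-party monitoring
Every field has its specialists. Many applications run on top of the OS, and the Open-Falcon development team cannot cover monitoring for every third-party application, so the open-source community needs to contribute more plugins. Plugins already exist for many commonly used third-party applications.
4. JVM monitoring
For the many companies that use Java as their main development language, JVM monitoring is indispensable.
The parameters of each JVM application, such as GC, class loading, JVM memory, processes, and threads, can all be reported to Falcon, and all of them can be obtained through MXBeans.
Using Java platform management beans: http://www.ibm.com/developerworks/cn/java/j-mxbeans/
5. Business application monitoring
Interfaces that the business needs to monitor, for example their response time, can report the relevant data to Falcon as needed, and the results can then be viewed in Falcon.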
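As an illustration of reporting business data, the sketch below (Python 3, standard library only) builds one data point and shows how it could be posted to the local agent. It assumes the agent's HTTP push interface at `http://127.0.0.1:1988/v1/push` and the usual field set (endpoint, metric, timestamp, step, value, counterType, tags); the metric name `api.response_time_ms`, the tag, and the helper names are made up for the example, so check them against the documentation of your Open-Falcon version.

```python
import json
import time
import urllib.request

# Assumption: the agent listens on its default HTTP port 1988 on this host.
AGENT_PUSH_URL = "http://127.0.0.1:1988/v1/push"


def build_metric(endpoint, metric, value, step=60,
                 counter_type="GAUGE", tags="", timestamp=None):
    """Build one data point in the JSON shape expected by the push interface."""
    return {
        "endpoint": endpoint,
        "metric": metric,
        "timestamp": int(time.time() if timestamp is None else timestamp),
        "step": step,
        "value": value,
        "counterType": counter_type,
        "tags": tags,
    }


def push(metrics, url=AGENT_PUSH_URL, timeout=5):
    """POST a list of data points to the agent and return the response body."""
    body = json.dumps(metrics).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical business metric: response time of a login interface.
item = build_metric("open-falcon-client", "api.response_time_ms", 123.4,
                    tags="api=/login", timestamp=1532680000)
print(json.dumps([item], sort_keys=True))
# push([item])  # uncomment on a host where the agent is running
```

The `push` call is left commented out so the sketch can be run anywhere; only `build_metric` is exercised.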
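Chinese documentation: https://book.open-falcon.org/zh_0_2/
Chinese and English documentation: https://book.open-falcon.org
Software downloads: https://github.com/open-falcon/falcon-plus/releases
II. The Brainstorming Journey Behind Writing Open-Falcon
III. Environment Preparation
1. System environment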
[root@open-falcon-server ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
2. System tuning
# Install the download tool
yum install wget -y
# Switch to the Aliyun mirror
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
# Install the EPEL repository
yum install epel-release.noarch -y
rpm -Uvh http://mirrors.aliyun.com/epel/epel-release-latest-7.noarch.rpm
yum clean all
yum makecache
# Install common tools
yum install git telnet net-tools tree nmap sysstat lrzsz dos2unix tcpdump ntpdate -y
# Synchronize the time
ntpdate cn.pool.ntp.org
# Change the hostname
hostnamectl set-hostname open-falcon-server
hostname open-falcon-server
# Enable the yum package cache
sed -i 's#keepcache=0#keepcache=1#g' /etc/yum.conf
grep keepcache /etc/yum.conf
# Disable SELinux
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
# Disable the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
3. Software environment
(1) Redis
# Install Redis
yum install redis -y
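# Common Redis commands
redis-server Redis server
redis-cli Redis command-line client
redis-benchmark Redis benchmarking tool
redis-check-aof AOF file repair tool
redis-check-dump RDB file repair tool (shipped as redis-check-rdb from Redis 3.2 on)
redis-sentinel Sentinel server
# Start Redis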
[root@open-falcon-server ~]# redis-server &
[1] 1662
[root@open-falcon-server ~]# 1662:C 27 Jul 14:44:56.463 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1662:M 27 Jul 14:44:56.464 * Increased maximum number of open files to 10032 (it was originally set to 1024).
[Redis ASCII-art startup banner: Redis 3.2.10 (00000000/0) 64 bit, Running in standalone mode, Port: 6379, PID: 1662, http://redis.io]
1662:M 27 Jul 14:44:56.464 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1662:M 27 Jul 14:44:56.464 # Server started, Redis version 3.2.10
1662:M 27 Jul 14:44:56.464 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1662:M 27 Jul 14:44:56.464 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1662:M 27 Jul 14:44:56.464 * The server is now ready to accept connections on port 6379
(2) MySQL
# Install MySQL (MariaDB)
yum install mariadb mariadb-server -y
# Start MySQL
systemctl start mariadb
systemctl enable mariadb
# Log in to the database as a test
[root@open-falcon-server ~]# mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 5.5.56-MariaDB MariaDB Server
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> exit
Bye
# Check that both services are listening
[root@open-falcon-server ~]# netstat -lntp|egrep "3306|6379"
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 1978/mysqld
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN 1662/redis-server *
tcp6 0 0 :::6379 :::* LISTEN 1662/redis-server *
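The same check can be scripted. The following is a minimal sketch (Python 3, standard library only) that tries to open a TCP connection to each port; the helper names `port_is_open` and `check_ports` are hypothetical, and a successful connection only shows that something is listening, not that it is MySQL or Redis.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports):
    """Map each port to True (listening) or False (not reachable)."""
    return {port: port_is_open(host, port) for port in ports}


# 3306 is MySQL/MariaDB and 6379 is Redis, as in the netstat output above.
print(check_ports("127.0.0.1", [3306, 6379]))
```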
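# Initialize the MySQL schema (the root password is still empty at this point, so just press Enter at each password prompt)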
cd /tmp/ && git clone https://github.com/open-falcon/falcon-plus.git
cd /tmp/falcon-plus/scripts/mysql/db_schema/
mysql -h 127.0.0.1 -u root -p < 1_uic-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 2_portal-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 3_dashboard-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 4_graph-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 5_alarms-db-schema.sql
rm -rf /tmp/falcon-plus/
# Set the database root password
mysqladmin -uroot password "123456"
# Check the imported databases
[root@open-falcon-server ~]# mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 5.5.56-MariaDB MariaDB Server
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| alarms |
| dashboard |
| falcon_portal |
| graph |
| mysql |
| performance_schema |
| test |
| uic |
+--------------------+
9 rows in set (0.00 sec)
MariaDB [(none)]> exit
Bye
(3) Installing Go
# Install the Go development package
yum install golang -y
# Check the version
[root@open-falcon-server ~]# go version
go version go1.9.4 linux/amd64
# Find the Go installation path
[root@open-falcon-server ~]# find / -name go
/etc/alternatives/go
/var/lib/alternatives/go
/usr/bin/go
/usr/lib/golang/src/cmd/go # this is the path we need
/usr/lib/golang/src/go
/usr/lib/golang/bin/go
/usr/lib/golang/pkg/linux_amd64/cmd/go
/usr/lib/golang/pkg/linux_amd64/go
IV. Open-Falcon Backend
# Create the working directory
export FALCON_HOME=/home/work
export WORKSPACE=$FALCON_HOME/open-falcon
mkdir -p $WORKSPACE
# Download and unpack the binary package
wget https://github.com/open-falcon/falcon-plus/releases/download/v0.2.1/open-falcon-v0.2.1.tar.gz
tar xf open-falcon-v0.2.1.tar.gz -C $WORKSPACE
# Inspect the unpacked files
[root@open-falcon-server ~]# cd $WORKSPACE
[root@open-falcon-server open-falcon]# ll
total 3896
drwxrwxr-x 7 501 501 67 Aug 15 2017 agent
drwxrwxr-x 5 501 501 40 Aug 15 2017 aggregator
drwxrwxr-x 5 501 501 40 Aug 15 2017 alarm
drwxrwxr-x 6 501 501 51 Aug 15 2017 api
drwxrwxr-x 5 501 501 40 Aug 15 2017 gateway
drwxrwxr-x 6 501 501 51 Aug 15 2017 graph
drwxrwxr-x 5 501 501 40 Aug 15 2017 hbs
drwxrwxr-x 5 501 501 40 Aug 15 2017 judge
drwxrwxr-x 5 501 501 40 Aug 15 2017 nodata
-rwxrwxr-x 1 501 501 3987469 Aug 15 2017 open-falcon
lrwxrwxrwx 1 501 501 16 Aug 15 2017 plugins -> ./agent/plugins/
lrwxrwxrwx 1 501 501 15 Aug 15 2017 public -> ./agent/public/
drwxrwxr-x 5 501 501 40 Aug 15 2017 transfer
Module | Config file path |
---|---|
aggregator | /home/work/open-falcon/aggregator/config/cfg.json |
graph | /home/work/open-falcon/graph/config/cfg.json |
hbs | /home/work/open-falcon/hbs/config/cfg.json |
nodata | /home/work/open-falcon/nodata/config/cfg.json |
api | /home/work/open-falcon/api/config/cfg.json |
alarm | /home/work/open-falcon/alarm/config/cfg.json |
# Edit the configuration files (write the database password into the modules listed above)
sed -i 's#root:@tcp(127.0.0.1:3306)#root:123456@tcp(127.0.0.1:3306)#g' `find ./ -type f -name "cfg.json"|egrep "alarm|api|nodata|hbs|graph|aggregator"`
cat `find ./ -type f -name "cfg.json"|egrep "alarm|api|nodata|hbs|graph|aggregator"` |grep 'root:123456@tcp(127.0.0.1:3306)'
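For readers who prefer a script over the sed one-liner, here is a minimal sketch (Python 3, standard library only) of the same replacement. The function name `set_db_password` is hypothetical; the module list and the `root:@tcp(127.0.0.1:3306)` string are taken from the commands above, and the sketch assumes each cfg.json sits at `<workspace>/<module>/config/cfg.json`.

```python
import pathlib

# Modules whose cfg.json contains a MySQL connection string (from the table above).
MODULES = ("aggregator", "graph", "hbs", "nodata", "api", "alarm")
OLD_DSN = "root:@tcp(127.0.0.1:3306)"


def set_db_password(workspace, password, modules=MODULES):
    """Insert the root password into each module's cfg.json.

    Returns the list of modules whose file was actually changed.
    """
    new_dsn = "root:%s@tcp(127.0.0.1:3306)" % password
    changed = []
    for module in modules:
        cfg = pathlib.Path(workspace) / module / "config" / "cfg.json"
        if not cfg.is_file():
            continue  # module not present in this workspace
        text = cfg.read_text(encoding="utf-8")
        if OLD_DSN in text:
            cfg.write_text(text.replace(OLD_DSN, new_dsn), encoding="utf-8")
            changed.append(module)
    return changed


# Example call (not executed here because it edits files in place):
# set_db_password("/home/work/open-falcon", "123456")
```

Running it a second time changes nothing, because the empty-password string is no longer present.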
# Start the backend modules
[root@open-falcon-server open-falcon]# cd /home/work/open-falcon
[root@open-falcon-server open-falcon]# ./open-falcon start
[falcon-graph] 5583
[falcon-hbs] 5592
[falcon-judge] 5600
[falcon-transfer] 5606
[falcon-nodata] 5613
[falcon-aggregator] 5620
[falcon-agent] 5628
[falcon-gateway] 5635
[falcon-api] 5641
[falcon-alarm] 5653
# Check that the services started
[root@open-falcon-server open-falcon]# ./open-falcon check
falcon-graph UP 5583
falcon-hbs UP 5592
falcon-judge UP 5600
falcon-transfer UP 5606
falcon-nodata UP 5613
falcon-aggregator UP 5620
falcon-agent UP 5628
falcon-gateway UP 5635
falcon-api UP 5641
falcon-alarm UP 5653
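If you want to act on this output automatically, a small parser is enough. The sketch below (Python 3, standard library only) assumes each line has the form `name state pid` as shown above; the helper names are hypothetical, and the `DOWN -` line in the sample is an assumed format for a stopped module, so compare it with what your version actually prints.

```python
def parse_check_output(text):
    """Turn the text printed by './open-falcon check' into {module: state}."""
    status = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("falcon-"):
            status[parts[0]] = parts[1]
    return status


def modules_down(text):
    """Return the sorted names of modules whose state is not UP."""
    return sorted(name for name, state in parse_check_output(text).items()
                  if state != "UP")


# Sample input; the DOWN line is an assumption made for illustration.
sample = """falcon-graph UP 5583
falcon-hbs UP 5592
falcon-agent DOWN -
"""
print(modules_down(sample))
```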
# More command-line usage
# ./open-falcon [start|stop|restart|check|monitor|reload] module
./open-falcon start agent
./open-falcon check
falcon-graph UP 53007
falcon-hbs UP 53014
falcon-judge UP 53020
falcon-transfer UP 53026
falcon-nodata UP 53032
falcon-aggregator UP 53038
falcon-agent UP 53044
falcon-gateway UP 53050
falcon-api UP 53056
falcon-alarm UP 53063
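# For debugging, you can check $WorkDir/$moduleName/log/logs/xxx.log
The backend deployment is now complete.
# Other usage
Reload the configuration (note: after editing cfg.json, you can reload it with the command below; 1988 is the agent's HTTP port, so this request reloads the agent)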
curl 127.0.0.1:1988/config/reload
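V. Open-Falcon Frontend
# Create the working directory (FALCON_HOME is used here, as in the backend section, so that the shell's own HOME variable is not overridden)
export FALCON_HOME=/home/work
export WORKSPACE=$FALCON_HOME/open-falcon
mkdir -p $WORKSPACE
cd $WORKSPACE
# Clone the frontend code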
git clone https://github.com/open-falcon/dashboard.git
# Install the dependencies
yum install -y python-virtualenv
yum install -y python-devel
yum install -y openldap-devel
yum install -y mysql-devel
yum groupinstall "Development tools" -y
# Download and run ez_setup.py
cd ~
wget --no-check-certificate https://bootstrap.pypa.io/ez_setup.py
python ez_setup.py --insecure
# Download and install pip
wget https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
tar xf pip-9.0.1.tar.gz
cd pip-9.0.1
python setup.py install
# Work around slow pip downloads by using the Tsinghua mirror
mkdir -p ~/.pip
echo '[global]' >>~/.pip/pip.conf
echo 'index-url = https://pypi.tuna.tsinghua.edu.cn/simple' >>~/.pip/pip.conf
# Verify that pip works
[root@open-falcon-server ~]# cd /home/work/open-falcon/dashboard
[root@open-falcon-server dashboard]# pip -V
pip 9.0.1 from /usr/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg (python 2.7)
[root@open-falcon-server dashboard]# pip
Usage:
pip <command> [options]
Commands:
install Install packages.
download Download packages.
uninstall Uninstall packages.
freeze Output installed packages in requirements format.
list List installed packages.
show Show information about installed packages.
check Verify installed packages have compatible dependencies.
search Search PyPI for packages.
wheel Build wheels from your requirements.
hash Compute hashes of package archives.
completion A helper command used for command completion.
help Show help for commands.
General Options:
-h, --help Show help.
--isolated Run pip in an isolated mode, ignoring environment variables and user configuration.
-v, --verbose Give more output. Option is additive, and can be used up to 3 times.
-V, --version Show version and exit.
-q, --quiet Give less output. Option is additive, and can be used up to 3 times (corresponding to WARNING, ERROR, and CRITICAL logging levels).
--log <path> Path to a verbose appending log.
--proxy <proxy> Specify a proxy in the form [user:passwd@]proxy.server:port.
--retries <retries> Maximum number of retries each connection should attempt (default 5 times).
--timeout <sec> Set the socket timeout (default 15 seconds).
--exists-action <action> Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort.
--trusted-host <hostname> Mark this host as trusted, even though it does not have valid or any HTTPS.
--cert <path> Path to alternate CA bundle.
--client-cert <path> Path to SSL client certificate, a single file containing the private key and the certificate in PEM format.
--cache-dir <dir> Store the cache data in <dir>.
--no-cache-dir Disable the cache.
--disable-pip-version-check
Don't periodically check PyPI to determine whether a new version of pip is available for download. Implied with --no-index.
# List the modules that need to be installed
[root@open-falcon-server dashboard]# cat pip_requirements.txt
Flask==0.10.1
Flask-Babel==0.9
Jinja2==2.7.2
Werkzeug==0.9.4
gunicorn==19.5.0
python-dateutil==2.2
requests==2.3.0
mysql-python
python-ldap
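# Install the modules
pip install -r pip_requirements.txt
# Edit the configuration file
Configuration notes:
The dashboard's configuration file is 'rrd/config.py'; adjust it to your environment:
# API_ADDR is the address of the backend api component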
API_ADDR = "http://127.0.0.1:8080/api/v1"
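# Adjust PORTAL_DB_* to your environment; the default user is root and the default password is ""
# Adjust ALARM_DB_* to your environment; the default user is root and the default password is ""
Applying the changes:
cp rrd/config.py{,.bak}
vim rrd/config.py
Modified content: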
# Falcon+ API
API_ADDR = os.environ.get("API_ADDR","http://10.0.0.100:8080/api/v1")
# portal database
# TODO: read from api instead of db
PORTAL_DB_HOST = os.environ.get("PORTAL_DB_HOST","10.0.0.100")
PORTAL_DB_PORT = int(os.environ.get("PORTAL_DB_PORT",3306))
PORTAL_DB_USER = os.environ.get("PORTAL_DB_USER","root")
PORTAL_DB_PASS = os.environ.get("PORTAL_DB_PASS","123456")
PORTAL_DB_NAME = os.environ.get("PORTAL_DB_NAME","falcon_portal")
# alarm database
# TODO: read from api instead of db
ALARM_DB_HOST = os.environ.get("ALARM_DB_HOST","10.0.0.100")
ALARM_DB_PORT = int(os.environ.get("ALARM_DB_PORT",3306))
ALARM_DB_USER = os.environ.get("ALARM_DB_USER","root")
ALARM_DB_PASS = os.environ.get("ALARM_DB_PASS","123456")
ALARM_DB_NAME = os.environ.get("ALARM_DB_NAME","alarms")
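Each setting above follows the same pattern: use the environment variable if it is set, otherwise fall back to the value written in the file. The sketch below (Python 3, standard library only) isolates that pattern for the portal database; the function name `portal_db_settings` is hypothetical and takes the environment as a plain dict so the behaviour is easy to see.

```python
def portal_db_settings(environ):
    """Resolve the portal database settings the way rrd/config.py does:
    an environment variable wins, otherwise the default from the file is used."""
    return {
        "host": environ.get("PORTAL_DB_HOST", "10.0.0.100"),
        # Values from the environment are strings, so the port is cast to int.
        "port": int(environ.get("PORTAL_DB_PORT", 3306)),
        "user": environ.get("PORTAL_DB_USER", "root"),
        "password": environ.get("PORTAL_DB_PASS", "123456"),
        "name": environ.get("PORTAL_DB_NAME", "falcon_portal"),
    }


# No variables set: the defaults from the file apply.
print(portal_db_settings({}))
# Two variables exported before starting the dashboard: they override the defaults.
print(portal_db_settings({"PORTAL_DB_HOST": "10.0.0.200", "PORTAL_DB_PORT": "3307"}))
```

In practice this means the database host or password can be changed by exporting a variable before `./control start`, without editing the file again.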
# Start the service
[root@open-falcon-server dashboard]# virtualenv ./env
New python executable in /home/work/open-falcon/dashboard/env/bin/python
Installing setuptools, pip, wheel...done.
[root@open-falcon-server dashboard]# source env/bin/activate
(env) [root@open-falcon-server dashboard]# ./control start
falcon-dashboard started..., pid=20814
(env) [root@open-falcon-server dashboard]# ./control tail
[2018-07-27 16:37:02 +0000] [20814] [INFO] Starting gunicorn 19.5.0
[2018-07-27 16:37:02 +0000] [20814] [INFO] Listening at: http://0.0.0.0:8081 (20814)
[2018-07-27 16:37:02 +0000] [20814] [INFO] Using worker: sync
[2018-07-27 16:37:02 +0000] [20819] [INFO] Booting worker with pid: 20819
[2018-07-27 16:37:02 +0000] [20820] [INFO] Booting worker with pid: 20820
[2018-07-27 16:37:02 +0000] [20821] [INFO] Booting worker with pid: 20821
[2018-07-27 16:37:02 +0000] [20826] [INFO] Booting worker with pid: 20826
^C
(env) [root@open-falcon-server dashboard]# deactivate
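VI. Accessing the Web UI
http://10.0.0.100:8081
# Dashboard user management
The dashboard does not create any account by default, not even an administrator account; you need to register accounts through the web page.
To get a super-administrator account with global management rights, manually register an account named root (the first user registered with the name root is automatically made the super administrator).
The super administrator can grant management permissions to ordinary users.
Tip: anyone who can open the dashboard page can register an account, so once the relevant people have their accounts you should turn registration off. To do so, edit the api component's configuration file cfg.json, set signup_disable to true, and restart api. When you need to create an account for someone, switch the option back, then turn it off again when you are done.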
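Toggling the option can also be scripted. This is a minimal sketch (Python 3, standard library only) that assumes `signup_disable` is a top-level key of the api component's cfg.json and that the file is plain JSON; the function name `set_signup_disable` and the sample keys other than `signup_disable` are illustrative. The api component still has to be restarted afterwards, as described above.

```python
import json


def set_signup_disable(cfg_text, disabled):
    """Return the cfg.json text with signup_disable set to the given value."""
    cfg = json.loads(cfg_text)
    cfg["signup_disable"] = bool(disabled)
    return json.dumps(cfg, indent=4)


# Illustrative fragment, not the full api configuration.
sample = '{"web_port": "0.0.0.0:8080", "signup_disable": false}'
print(set_signup_disable(sample, True))
```

To apply it to the real file, read /home/work/open-falcon/api/config/cfg.json, pass its text through the function, and write the result back.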
VII. Open-Falcon Client
# On the server
[root@open-falcon-server ~]# cd /home/work/open-falcon
[root@open-falcon-server open-falcon]# scp -r agent root@10.0.0.101:/home/
[root@open-falcon-server open-falcon]# scp -r open-falcon root@10.0.0.101:/home/
# On the client
[root@open-falcon-client ~]# mkdir -p /home/work/open-falcon
[root@open-falcon-client ~]# mv /home/open-falcon /home/agent /home/work/open-falcon
[root@open-falcon-client ~]# cd /home/work/open-falcon
[root@open-falcon-client open-falcon]# vim agent/config/cfg.json
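Modified content (the # annotations are explanations only; JSON does not allow comments, so leave them out of the real cfg.json):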
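{
    "debug": true, # Controls the output of some debug information; usually set to false in production
    "hostname": "", # The agent sends collected data to transfer with the endpoint set to the hostname. By default it is obtained via `hostname`; if a hostname is configured here, that value is used instead
    "ip": "", # During heartbeats the agent sends its IP address to hbs. The agent detects the local IP automatically; set this manually if you do not want auto-detection
    "plugin": {
        "enabled": false, # The plugin mechanism is disabled by default
        "dir": "./plugin", # The git repo holding the plugin scripts is cloned into this directory
        "git": "https://github.com/open-falcon/plugin.git", # Address of the git repo holding the plugin scripts
        "logs": "./logs" # Plugin execution logs; check this directory if a plugin misbehaves
    },
    "heartbeat": {
        "enabled": true, # Must be set to true here
        "addr": "10.0.0.100:6030", # Address of hbs; the port is the hbs RPC port
        "interval": 60, # Heartbeat interval, in seconds
        "timeout": 1000 # Timeout for connecting to hbs, in milliseconds
    },
    "transfer": {
        "enabled": true,
        "addrs": [
            "10.0.0.100:18433"
        ], # Addresses of transfer; the port is transfer's RPC port. Several transfer addresses may be listed and the agent will ensure HA. falcon-plus ships transfer with RPC port 8433 by default, so verify this value against transfer's cfg.json
        "interval": 60, # Collection interval, in seconds, i.e. the agent collects data once a minute and sends it to transfer
        "timeout": 1000 # Timeout for connecting to transfer, in milliseconds
    },
    "http": {
        "enabled": true, # Whether to listen on the HTTP port
        "listen": ":1988",
        "backdoor": false
    },
    "collector": {
        "ifacePrefix": ["eth", "em"], # By default only traffic of NICs whose names start with eth or em is collected; leave it empty to collect all NICs, including lo. Per-NIC traffic can be seen in /proc/net/dev
        "mountPoint": []
    },
    "default_tags": {
    },
    "ignore": { # More than 200 metrics are collected by default; list a metric here to stop collecting it
        "cpu.busy": true,
        "df.bytes.free": true,
        "df.bytes.total": true,
        "df.bytes.used": true,
        "df.bytes.used.percent": true,
        "df.inodes.total": true,
        "df.inodes.free": true,
        "df.inodes.used": true,
        "df.inodes.used.percent": true,
        "mem.memtotal": true,
        "mem.memused": true,
        "mem.memused.percent": true,
        "mem.memfree": true,
        "mem.swaptotal": true,
        "mem.swapused": true,
        "mem.swapfree": true
    }
}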
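Because the listing above carries `#` annotations, it is not valid JSON as printed. If you keep an annotated copy for reference, a small filter can produce the real file. The sketch below (Python 3, standard library only) removes everything from a `#` to the end of the line unless the `#` is inside a double-quoted string; the function name `strip_hash_comments` and the shortened sample are illustrative.

```python
import json


def strip_hash_comments(text):
    """Remove '# ...' annotations that sit outside JSON string literals."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for index, char in enumerate(line):
            if escaped:
                escaped = False          # character after a backslash is literal
            elif char == "\\" and in_string:
                escaped = True
            elif char == '"':
                in_string = not in_string
            elif char == "#" and not in_string:
                cut = index              # annotation starts here
                break
        cleaned.append(line[:cut].rstrip())
    return "\n".join(cleaned)


# Shortened, annotated sample in the style of the listing above.
annotated = '''{
    "debug": true, # debug output
    "plugin": {"git": "https://github.com/open-falcon/plugin.git"}, # plugin repo
    "heartbeat": {"addr": "10.0.0.100:6030"} # hbs RPC address
}'''
config = json.loads(strip_hash_comments(annotated))
print(config["heartbeat"]["addr"])
```

Parsing the result with `json.loads` doubles as a syntax check before the file is copied to agent/config/cfg.json.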
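# Start the service
./open-falcon start agent # start the process
./open-falcon stop agent # stop the process
./open-falcon monitor agent # view the log
Check whether the logs under the var directory look normal, or open the agent's port 1988 in a browser. The agent also provides a --check flag that verifies whether the agent can run properly on the current machine.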
cd /home/work/open-falcon/agent/bin/
./falcon-agent --check
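Open the monitoring UI to check the result.
VIII. References
## Open-Falcon
# Operations Monitoring with Open-Falcon
https://www.cnblogs.com/nulige/p/7741580.html
# Installing Open-Falcon and Using It to Monitor a Raspberry Pi
https://yq.aliyun.com/articles/437196
# Open-Falcon: Service Monitoring in Xiaomi's Operations Architecture
https://blog.csdn.net/qq_27384769/article/details/79234270
# An Architect's Path of Growth: Blog and Mind Maps
https://github.com/csy512889371/learnDoc
# The Brainstorming Journey Behind Writing Open-Falcon
http://mp.weixin.qq.com/s?__biz=MjM5OTcxMzE0MQ==&mid=400225178&idx=1&sn=c98609a9b66f84549e41cd421b4df74d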