Background
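Ceph HDD performance suddenly degraded sharply. As a result, ovn-central could not complete leader election and therefore could not pass its health checks. etcd performance was also poor: kube-ovn-controller could not update its leader-election record in etcd, so kube-ovn-controller restarted frequently.
ovsdb is sensitive to disk performance. An ordinary HDD (here, Ceph HDD storage whose performance had dropped) cannot meet its requirements, so ovn-central could not elect a leader normally; read/write IO duration was high, about 4s.
IO contention is another possible cause, for example from filebeat-like workloads.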
# E:\yealink-code\ovs\ovsdb\raft.c
    case RAFT_FOLLOWER:
        if (now < raft->election_base + raft->election_timer) {
            VLOG_WARN_RL(&rl, "ignoring vote request received after only "
                         "%lld ms (minimum election time is %"PRIu64" ms)",
                         now - raft->election_base, raft->election_timer);
            return true;
        }
        return false;

    /* The election timeout base value for leader election, in milliseconds.
     * It can be set by unixctl cluster/change-election-timer. Default value is
     * ELECTION_BASE_MSEC. */
    uint64_t election_timer;
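During an election, leaving network round trips aside, there is at least one log write and one write to the OVN SB DB. If a single read or write takes 3s+, the election cannot complete within the 5s election timer.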
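The arithmetic behind that statement can be sketched as a small shell helper. This is only an illustration: the function name `election_fits` is made up here, the 10 ms "healthy" latency is a hypothetical value, and network time is ignored just as in the text.

```shell
#!/bin/sh
# election_fits WRITES LATENCY_MS TIMER_MS
# Prints whether WRITES sequential disk writes, each taking LATENCY_MS,
# fit inside an election timer of TIMER_MS. Network time is ignored.
election_fits() {
    total=$(( $1 * $2 ))
    if [ "$total" -lt "$3" ]; then
        echo "ok ($total ms < $3 ms)"
    else
        echo "too slow ($total ms >= $3 ms)"
    fi
}

# Two writes (raft log + SB DB) at 3 s each against the 5000 ms timer.
election_fits 2 3000 5000
# Same two writes at a hypothetical healthy latency of 10 ms.
election_fits 2 10 5000
```

The source comment quoted earlier notes that the timer can be changed through `cluster/change-election-timer`. Raising it would not make the disk any faster, so it is not a substitute for moving the databases.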
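Analysis showed the problem was related to insufficient memory after a disk replacement in Ceph: the 5G memory limit on the OSD pods was not enough, and with less than 5G the OSD pods cannot start.
Preparing to migrate the data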
rm -rf /var/run/openvswitch
rm -rf /var/run/ovn
rm -rf /etc/origin/openvswitch/
rm -rf /etc/origin/ovn/
rm -rf /etc/cni/net.d/00-kube-ovn.conflist
rm -rf /etc/cni/net.d/01-kube-ovn.conflist
rm -rf /var/log/openvswitch
rm -rf /var/log/ovn
/etc/origin/
├── openvswitch
│   ├── conf.db
│   └── system-id.conf
└── ovn
    ├── ovnnb_db.db
    └── ovnsb_db.db
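# This directory contains only the OVN and OVS databases
# Estimated size is about 30 MB for 2000 ports, so the data will not exceed 1 GB
# All of these DBs need to sit on at least SSD-backed (Ceph) storage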
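Migration steps
Create a 5G disk, mount it at /etc/origin/, then recreate the ovn-central and ovs-ovn pods.
There are three nodes; replace them one at a time and restart ovn-central on each.
Restarting ovs-ovn interrupts networking on that node, but restarting only the ovs-ovn pods on the masters has no impact because no business workloads run on the masters.
ovn-central on master1 has already been adjusted and master2 is the most affected, so master2 is adjusted first.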
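# mkdir /etc/origin/
mkdir /etc/origin1
mkfs.xfs -f /dev/vdc
mount /dev/vdc /etc/origin1
# -a preserves ownership, permissions and timestamps; "/." also copies dotfiles
cp -a /etc/origin/. /etc/origin1/
umount /etc/origin1
# umount fails if your current directory is inside the mount point
# fuser -um /dev/vdc
mount /dev/vdc /etc/origin
vi /etc/fstab
# add this line to /etc/fstab:
/dev/vdc /etc/origin xfs defaults 0 0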
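Before unmounting the temporary mount point and mounting the new disk over /etc/origin, it may be worth confirming that the copy is complete. The helper below is a minimal sketch, not part of the original procedure: `verify_copy` is a hypothetical name, and it relies only on `diff -r` to compare the two trees.

```shell
#!/bin/sh
# verify_copy SRC DST
# Prints "identical" and returns 0 if both directory trees have the same
# files with the same content; prints "differs" and returns 1 otherwise.
verify_copy() {
    if diff -r "$1" "$2" >/dev/null 2>&1; then
        echo "identical"
    else
        echo "differs"
        return 1
    fi
}

# Intended use during the migration (not executed here):
#   verify_copy /etc/origin /etc/origin1 && umount /etc/origin1
```

If ovsdb-server is still writing to the databases while the copy runs, the comparison can legitimately report `differs`, so the check is most meaningful once the pods on that node have been stopped.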
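Problem: if the new disk is mounted as a blank directory, without copying the old data, the result is as follows
mount /dev/vdc /etc/origin
Recreate ovn-central and ovs
Both ovn-central and ovs initialize normally, but they do not inherit the previous member ID. It is therefore better to keep the previous data, so that the node reuses its existing identity and catches up incrementally.
# Record of the erroneous result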
[root@pro-k8s-master-2 ovn]# tailf ovsdb-server-nb.log
2022-06-10T02:00:20.063Z|00451|raft|INFO|tcp:10.120.33.146:35616: syntax "{"cluster":"c0924b48-fd71-4b92-9b01-39a901e5d5c6","comment":"heartbeat","from":"d57ee44d-74f3-416a-8b41-d8448471b8ff","leader_commit":7884166,"log":[],"prev_log_index":7884166,"prev_log_term":2318,"term":2318,"to":"a9df09bf-a805-4aa0-87b5-2da62490e6dd"}": syntax error: Parsing raft append_request RPC failed: misrouted message (addressed to a9df but we're 5e13)
root@pro-k8s-master-1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
d57e
Name: OVN_Northbound
Cluster ID: c092 (c0924b48-fd71-4b92-9b01-39a901e5d5c6)
Server ID: d57e (d57ee44d-74f3-416a-8b41-d8448471b8ff)
Address: tcp:[10.120.33.146]:6643
Status: cluster member
Role: leader
Term: 2387
Leader: self
Vote: self
Last Election started 53321 ms ago, reason: timeout
Last Election won: 50300 ms ago
Election timer: 5000
Log: [7868635, 7884644]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->248f ->a9df <-248f <-a9df
Disconnections: 243
Servers:
5e13 (5e13 at tcp:[10.120.34.53]:6643) next_index=7884644 match_index=0 last msg 1853455 ms ago # this is the node that joined with blank data; it was given a new member ID
248f (248f at tcp:[10.120.35.101]:6643) next_index=7884644 match_index=7884643 last msg 797 ms ago
d57e (d57e at tcp:[10.120.33.146]:6643) (self) next_index=7884625 match_index=7884643
a9df (a9df at tcp:[10.120.34.53]:6643) next_index=7884644 match_index=7884643 last msg 797 ms ago # the original member ID
The new member can be removed with ovs-appctl cluster/kick
Reference
$ ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl list-commands
The available commands are:
cluster/cid DB
cluster/kick DB SERVER
cluster/leave DB
cluster/sid DB
cluster/status DB
coverage/show
exit
list-commands
memory/show
ovsdb-server/add-db DB
ovsdb-server/add-remote REMOTE
ovsdb-server/compact
ovsdb-server/connect-active-ovsdb-server
ovsdb-server/disable-monitor-cond
ovsdb-server/disconnect-active-ovsdb-server
ovsdb-server/get-active-ovsdb-server
ovsdb-server/get-sync-exclude-tables
ovsdb-server/list-dbs
ovsdb-server/list-remotes
ovsdb-server/perf-counters-clear
ovsdb-server/perf-counters-show
ovsdb-server/reconnect
ovsdb-server/remove-db DB
ovsdb-server/remove-remote REMOTE
ovsdb-server/set-active-ovsdb-server
ovsdb-server/set-sync-exclude-tables
ovsdb-server/sync-status
version
vlog/close
vlog/disable-rate-limit [module]...
vlog/enable-rate-limit [module]...
vlog/list
vlog/list-pattern
vlog/reopen
vlog/set {spec | PATTERN:destination:pattern}
ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/kick OVN_Northbound 5e13
Run it
root@pro-k8s-master-1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/kick OVN_Northbound 5e13
started removal
root@pro-k8s-master-1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
d57e
Name: OVN_Northbound
Cluster ID: c092 (c0924b48-fd71-4b92-9b01-39a901e5d5c6)
Server ID: d57e (d57ee44d-74f3-416a-8b41-d8448471b8ff)
Address: tcp:[10.120.33.146]:6643
Status: cluster member
Role: leader
Term: 2389
Leader: self
Vote: self
Last Election started 246592 ms ago, reason: timeout
Last Election won: 241569 ms ago
Election timer: 5000
Log: [7868635, 7884726]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->248f ->a9df <-248f <-a9df
Disconnections: 243
Servers:
248f (248f at tcp:[10.120.35.101]:6643) next_index=7884726 match_index=7884725 last msg 16 ms ago
d57e (d57e at tcp:[10.120.33.146]:6643) (self) next_index=7884656 match_index=7884725
a9df (a9df at tcp:[10.120.34.53]:6643) next_index=7884726 match_index=7884723 last msg 2305 ms ago
# The member has been removed
Similarly, clean up the conflicting member in the southbound database
root@pro-k8s-master-1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
6a5d
Name: OVN_Southbound
Cluster ID: 7be1 (7be14312-0b05-45c3-91ad-7cad7e9abd2c)
Server ID: 6a5d (6a5df53d-a44c-4524-ad98-ed1a71d9134d)
Address: tcp:[10.120.33.146]:6644
Status: cluster member
Role: follower
Term: 29685
Leader: 5e34
Vote: 5e34
Last Election started 104771 ms ago, reason: leadership_transfer
Last Election won: 104587 ms ago
Election timer: 5000
Log: [333796091, 333800501]
Entries not yet committed: 1
Entries not yet applied: 1
Connections: ->5e34 ->9699 <-5e34 <-9699
Disconnections: 149
Servers:
5e34 (5e34 at tcp:[10.120.34.53]:6644) last msg 20 ms ago
2a81 (2a81 at tcp:[10.120.34.53]:6644) last msg 8971396 ms ago
6a5d (6a5d at tcp:[10.120.33.146]:6644) (self)
9699 (9699 at tcp:[10.120.35.101]:6644) last msg 69925 ms ago
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/kick OVN_Southbound 2a81
Last Election started 252638 ms ago, reason: leadership_transfer
Last Election won: 252454 ms ago
Election timer: 5000
Log: [333796091, 333810309]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->5e34 ->9699 <-5e34 <-9699
Disconnections: 149
Servers:
5e34 (5e34 at tcp:[10.120.34.53]:6644) last msg 3 ms ago
6a5d (6a5d at tcp:[10.120.33.146]:6644) (self)
9699 (9699 at tcp:[10.120.35.101]:6644) last msg 217792 ms ago
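# But the last message from 9699 is far too old. This may be why leadership keeps flipping between the other two servers: the third member responds very slowly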
[root@pro-k8s-master-3 ovn]# tailf ovsdb-server-sb.log
2022-06-10T04:02:43.431Z|00043|raft|INFO|server 6a5d is leader for term 29676
2022-06-10T04:03:11.661Z|00044|raft|INFO|server 5e34 is leader for term 29677
2022-06-10T04:07:26.783Z|00045|raft|INFO|server 6a5d is leader for term 29678
2022-06-10T04:07:53.643Z|00046|raft|INFO|server 5e34 is leader for term 29679
2022-06-10T04:12:17.662Z|00047|raft|INFO|server 6a5d is leader for term 29680
2022-06-10T04:12:47.224Z|00048|raft|INFO|server 5e34 is leader for term 29681
2022-06-10T04:17:25.519Z|00049|raft|INFO|server 6a5d is leader for term 29682
2022-06-10T04:17:58.691Z|00050|raft|INFO|server 5e34 is leader for term 29683
2022-06-10T04:23:23.640Z|00051|raft|INFO|server 6a5d is leader for term 29684
2022-06-10T04:23:58.321Z|00052|raft|INFO|server 5e34 is leader for term 29685
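The log above shows ten terms in about 21 minutes, with leadership alternating between 6a5d and 5e34. A quick way to quantify this from a log file is sketched below. It assumes the `server X is leader for term N` message format seen above, and `leader_flaps` is a hypothetical helper name.

```shell
#!/bin/sh
# leader_flaps: read an ovsdb-server log on stdin and summarize the
# "server X is leader for term N" messages: how many terms were logged,
# how often the leader changed, and how many distinct leaders appeared.
leader_flaps() {
    awk '
        $3 == "is" && $4 == "leader" && $6 == "term" {
            terms++
            if (prev != "" && $2 != prev) changes++
            if (!($2 in seen)) { seen[$2] = 1; leaders++ }
            prev = $2
        }
        END { printf "terms=%d changes=%d leaders=%d\n", terms, changes, leaders }
    '
}

# Intended use (not executed here):
#   leader_flaps < ovsdb-server-sb.log
```

On the ten lines shown above this reports `terms=10 changes=9 leaders=2`, meaning every new term handed leadership to the other server.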
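Summary
etcd and the three OVS/OVN DBs are best placed on local SSDs; they already have their own replication mechanism. Other databases also perform poorly on Ceph.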
When performance was at its worst:
Read: 3.59 * 1000 / 6 ≈ 600
Write: 2.66 * 1000 / 51 ≈ 52
Reads were about 600 times slower and writes about 52 times slower.
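The two ratios can be reproduced with a one-line helper. The units are an assumption: the factor of 1000 suggests the degraded figure is in seconds and the normal figure in milliseconds, but the source does not say so. `slowdown` is a hypothetical name.

```shell
#!/bin/sh
# slowdown DEGRADED NORMAL
# Prints DEGRADED * 1000 / NORMAL with one decimal place, i.e. how many
# times slower the degraded figure is (assuming seconds vs. milliseconds).
slowdown() {
    awk -v d="$1" -v n="$2" 'BEGIN { printf "%.1f\n", d * 1000 / n }'
}

slowdown 3.59 6    # read
slowdown 2.66 51   # write
```

The exact read ratio comes out slightly under 600, so the figure in the text is a rounded value.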
After the disk switch, and once HDD performance had also recovered, both the number and the frequency of kube-ovn-controller restarts dropped.