#### Background
Some business VMs that recently went live run a scheduled job that synchronizes cluster files. During each sync window their network traffic spikes and affects the other business VMs on the same host. The compute nodes use two 1000M NICs bonded as Bond1 (active-backup), so the theoretical maximum throughput is only about 120MB/s. When a VM starts monopolizing the network like this, rate limiting it becomes unavoidable.
(Monitoring graphs for the virtual machine and its host)
The monitoring shows one VM with an inbound traffic spike every hour that nearly saturates the NIC; at the same time its disk Write OPS climbs, which means the data the VM receives during this period is being written to disk. There is also a disk Read OPS spike every hour. Rate limiting looks unavoidable.
#### Procedure
NIC traffic can be rate limited from three angles:
1. OVS queues + flow tables
2. The libvirtd rate-limiting interface
3. Neutron QosPolicy
Note: this article mainly uses method 1 to implement the limit and only briefly touches on methods 2 and 3.
OVS queues + flow tables
Open vSwitch's port QoS policy supports HTB (type linux-htb), which is what we use here.
The idea is as follows:
- Create a QoS record on the VM's port
- The QoS record references one Queue, and it may reference several Queues
- OVS steers the VM's traffic into the right queue through flow table rules
Run ovs-vsctl show to find the tap interface that attaches the VM to OVS.
# Find the VM's tap interface
$ ovs-vsctl show
5a977fc5-4fdf-4fc7-aea3-a7341a305db1
    Bridge br-int
        fail_mode: secure
        Port "tap53eeb988-c7"
            tag: 4
            Interface "tap53eeb988-c7"
        Port "int-br-bo8eb174"
            Interface "int-br-bo8eb174"
                type: patch
                options: {peer="phy-br-bo8eb174"}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-bond_vmouter
        fail_mode: secure
        Port "phy-br-bo8eb174"
            Interface "phy-br-bo8eb174"
                type: patch
                options: {peer="int-br-bo8eb174"}
        Port bond_vmouter
            Interface bond_vmouter
        Port br-bond_vmouter
            Interface br-bond_vmouter
                type: internal
    ovs_version: "2.5.0"
# Find its OpenFlow port number
$ ovs-ofctl show br-int
OFPT_FEATURES_REPLY (xid=0x2): dpid:00005adafb219b49
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(int-br-bo8eb174): addr:ae:1f:61:28:e4:96
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 9(tap53eeb988-c7): addr:fe:16:3e:e5:d9:00
     config:     0
     state:      0
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:5a:da:fb:21:9b:49
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
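Instead of scanning the ovs-ofctl show output, the OpenFlow port number of the tap interface can also be read straight from OVSDB; given the data above this should simply print 9:
$ ovs-vsctl get Interface tap53eeb988-c7 ofport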
Create a QoS record on tap53eeb988-c7 with one queue, q0, guaranteed a minimum of 700Mbps and capped at a maximum of 800Mbps:
$ ovs-vsctl -- set port tap53eeb988-c7 qos=@newqos \
-- --id=@newqos create qos type=linux-htb other-config:max-rate=800000000 queues=0=@q0 \
-- --id=@q0 create queue other-config:min-rate=700000000 other-config:max-rate=800000000
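To double-check that the queue really landed on the kernel side, the new Queue record and the HTB classes OVS programmed on the tap device can be inspected as below (a quick sanity check, assuming iproute2's tc is available on the host; class IDs will vary):
$ ovs-vsctl list queue
$ tc class show dev tap53eeb988-c7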
Check the current flow rules:
$ ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
cookie=0xb01c77077412cf51, duration=4216756.364s, table=0, n_packets=390420, n_bytes=36921445, idle_age=0, hard_age=65534, priority=2,in_port=1 actions=drop
cookie=0xb01c77077412cf51, duration=1554953.959s, table=0, n_packets=620904369, n_bytes=519069450871, idle_age=1, hard_age=65534, priority=9,in_port=19 actions=resubmit(,25)
cookie=0xb01c77077412cf51, duration=4216750.815s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=9,icmp_type=136 actions=resubmit(,24)
cookie=0xb01c77077412cf51, duration=4216750.804s, table=0, n_packets=82396, n_bytes=3460632, idle_age=2242, hard_age=65534, priority=10,arp,in_port=9 actions=resubmit(,24)
cookie=0xb01c77077412cf51, duration=4216751.878s, table=0, n_packets=1037226147, n_bytes=4734216312377, idle_age=0, hard_age=65534, priority=3,in_port=1,dl_vlan=332 actions=mod_vlan_vid:4,NORMAL
cookie=0xb01c77077412cf51, duration=4216756.481s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=NORMAL
cookie=0xb01c77077412cf51, duration=4216756.473s, table=23, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
cookie=0xb01c77077412cf51, duration=4216750.821s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=9,icmp_type=136,nd_target=fe80::f816:3eff:fee5:d900 actions=NORMAL
cookie=0xb01c77077412cf51, duration=4216750.809s, table=24, n_packets=82154, n_bytes=3450468, idle_age=2242, hard_age=65534, priority=2,arp,in_port=9,arp_spa=10.16.32.40 actions=resubmit(,25)
cookie=0xb01c77077412cf51, duration=4216756.466s, table=24, n_packets=1556, n_bytes=65352, idle_age=5669, hard_age=65534, priority=0 actions=drop
cookie=0xb01c77077412cf51, duration=4215276.624s, table=25, n_packets=1051294240, n_bytes=707172859381, idle_age=20, hard_age=65534, priority=2,in_port=9,dl_src=fa:16:3e:e5:d9:00 actions=NORMAL
The flow table shows that packets arriving on in_port=9 are processed in table 25, so the problem becomes simple: modify the table 25 entry so that the QoS queue is applied to in_port=9, as follows:
$ ovs-ofctl mod-flows br-int "table=25, n_packets=1051294240, n_bytes=707172859381, idle_age=20, hard_age=65534, priority=2,in_port=9,dl_src=fa:16:3e:e5:d9:00 actions=set_queue:0,NORMAL"
Now look at the VM's monitoring again: the traffic has been capped successfully.
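Queue-level counters are another way to confirm that traffic is actually being classified into queue 0 (9 is the OpenFlow port number found earlier):
$ ovs-ofctl queue-stats br-int 9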
So how do we query the QoS-related information?
Check the port's attributes:
$ ovs-vsctl list port tap53eeb988-c7
_uuid : 4712ae65-bced-4ee3-bf7d-3b7fa1e52bb7
bond_active_slave : []
bond_downdelay : 0
bond_fake_iface : false
bond_mode : []
bond_updelay : 0
external_ids : {}
fake_bridge : false
interfaces : [1fe8bb0a-6383-45ba-bc86-46e1de03f4e0]
lacp : []
mac : []
name : "tap53eeb988-c7"
other_config : {net_uuid="ea7d53f9-45c6-4027-98b5-23053d10373b", network_type=vlan, physical_network="physnet1", segmentation_id="332", tag="4"}
qos : 82bd0134-4e76-405a-ac1d-22b4ea43e55a
rstp_statistics : {}
rstp_status : {}
statistics : {}
status : {}
tag : 4
trunks : []
vlan_mode : []
The 82bd0134-4e76-405a-ac1d-22b4ea43e55a here is the UUID of the QoS record inside OVS.
Check the QoS record's attributes:
$ ovs-vsctl list qos 82bd0134-4e76-405a-ac1d-22b4ea43e55a
_uuid : 82bd0134-4e76-405a-ac1d-22b4ea43e55a
external_ids : {}
other_config : {max-rate="800000000"}
queues : {0=cc4e5d2e-2dbb-4e5b-a682-d6a28bd7b743}
type : linux-htb
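The Queue record it references can be listed the same way, using the UUID from the queues column above:
$ ovs-vsctl list queue cc4e5d2e-2dbb-4e5b-a682-d6a28bd7b743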
Delete the QoS record and clear the QoS reference on the port:
$ ovs-vsctl -- destroy QoS 82bd0134-4e76-405a-ac1d-22b4ea43e55a -- clear Port tap53eeb988-c7 qos
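Note that destroying the QoS record does not remove the Queue record it referenced. If nothing else on this host uses OVS QoS, the leftovers can be wiped in one transaction (careful: this clears every QoS and Queue row on the node):
$ ovs-vsctl -- --all destroy QoS -- --all destroy Queue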
libvirtd rate-limiting interface
libvirt ships with domiftune for limiting interface traffic.
Check a VM interface's current rate-limit settings:
$ virsh domiftune 4ffbd71f-3324-4500-8636-f9a275b6e479 tap53eeb988-c7
Set the rate limit on the interface:
$ virsh domiftune 4ffbd71f-3324-4500-8636-f9a275b6e479 tap53eeb988-c7 --inbound 700000,800000,800000 --outbound 700000,800000,800000 --live
The units are:
- average bandwidth: kilobytes/second
- peak bandwidth: kilobytes/second
- burst size: kilobytes
The average, peak, and burst values can be derived from one another sensibly. Suggested values:
peak = 1.5 * average
burst = peak/8 * 2 = 3*average/8
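As a worked example (numbers purely illustrative): for an interface that should average about 100 MB/s, average = 100000 KB/s, peak = 1.5 * 100000 = 150000 KB/s, and burst = 150000/8 * 2 = 37500 KB, which maps onto domiftune as:
$ virsh domiftune 4ffbd71f-3324-4500-8636-f9a275b6e479 tap53eeb988-c7 --inbound 100000,150000,37500 --live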
Note that domiftune only applies to networks whose forward mode is nat, route, or similar; limiting is not supported for the bridge, passthrough, private, and hostdev modes. From the libvirt documentation:
The <bandwidth> element allows setting quality of service for a particular network (since 0.9.4). Setting bandwidth for a network is supported only for networks with a <forward> mode of route, nat, or no mode at all (i.e. an "isolated" network). Setting bandwidth is not supported for forward modes of bridge, passthrough, private, or hostdev. Attempts to do this will lead to a failure to define the network or to create a transient network.
Neutron QosPolicy
Somewhat embarrassingly, Neutron on our production OpenStack never had the QoS driver enabled; that is on me for rushing the rollout and not thinking it through. Changing it now risks affecting the production environment, so I will not experiment there; once it has been validated in the test environment I will update this document.
Since Neutron already packages a QoS implementation, we can simply take it and use it directly.
Modify the Neutron configuration files.
In neutron.conf:
service_plugins = neutron.services.qos.qos_plugin.QoSPlugin
In /etc/neutron/plugins/ml2/ml2_conf.ini:
[ml2]
extension_drivers=qos
[agent]
extensions=qos
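One caveat, based on the standard Neutron QoS setup rather than anything verified on this cluster: service_plugins and extension_drivers are normally appended to whatever is already configured (e.g. service_plugins = router,qos and extension_drivers = port_security,qos), and neutron-server plus the OVS agents need a restart afterwards. Whether the qos extension is actually loaded can then be checked with:
$ neutron ext-list | grep qos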
Create a policy:
$ neutron qos-policy-create test1
+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| created_at      | 2017-02-27T15:56:11Z                 |
| description     |                                      |
| id              | 82bd0114-4e76-40da-ac1d-22bbea43e55a |
| name            | test1                                |
| revision_number | 1                                    |
| rules           |                                      |
| shared          | False                                |
| tenant_id       | b61372588a7e4475bc8ecdbaee3fa340     |
| updated_at      | 2017-02-27T15:56:11Z                 |
+-----------------+--------------------------------------+
Add a bandwidth-limit rule to the policy:
$ neutron qos-bandwidth-limit-rule-create test1 --max-kbps 700000 --max-burst-kbps 560000
Created a new bandwidth_limit_rule:
+----------------+--------------------------------------+
| Field          | Value                                |
+----------------+--------------------------------------+
| id             | 4d65fbdf-6c08-456b-8545-6f6339f34881 |
| max_burst_kbps | 560000                               |
| max_kbps       | 700000                               |
+----------------+--------------------------------------+
Bind the policy to a port:
$ neutron port-update <port_id> --qos-policy test1
Or bind it to a network:
$ neutron net-update <network_id> --qos-policy test1
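To confirm the binding took effect, the policy and the port can be inspected afterwards; with the qos extension enabled, port-show carries the policy's UUID in its qos_policy_id field:
$ neutron qos-policy-show test1
$ neutron port-show <port_id>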