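Packet Tracing
As we know, a Linux bridge forwards purely by MAC address. For forwarding problems inside the bridge, checking that the fdb entries are correct and that ebtables or iptables is not blocking the traffic resolves most cases. OVS forwards packets with OpenFlow flow tables, which is far more complex; with multiple datapath bridges and multi-stage flow tables, reading the flows by eye is laborious.
Tracing is covered early so that these tools are at hand when an experiment goes wrong and the problem needs to be located.
With OVN, packets can be traced in two ways: with the OVS trace command or with the OVN trace command.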
- ovs-appctl ofproto/trace
- ovn-trace
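The ofproto/trace output can also be annotated with the OVN logical information by passing it through ovn-detrace: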
ovs-appctl ofproto/trace "xxxx" > tmp
ovn-detrace < tmp
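The two steps above can also be chained in a pipe. The following is a minimal sketch; the function names icmp_flow_spec and trace_and_annotate are hypothetical helpers, not part of OVS or OVN, and the second one assumes a host where ovs-vswitchd is running and ovn-detrace can reach the OVN databases.

```shell
# Build an ofproto/trace flow spec for an ICMP packet.
# Arguments: in_port name, source MAC, destination MAC, source IP, destination IP.
icmp_flow_spec() {
    printf 'in_port=%s,icmp,dl_src=%s,dl_dst=%s,nw_src=%s,nw_dst=%s,nw_ttl=64\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Trace the packet on a bridge and pipe the result straight into ovn-detrace,
# without the temporary file. First argument: bridge name; the remaining
# arguments are passed on to icmp_flow_spec.
trace_and_annotate() {
    bridge=$1
    shift
    ovs-appctl ofproto/trace "$bridge" "$(icmp_flow_spec "$@")" | ovn-detrace
}
```

A call would look like: trace_and_annotate br-int sw-400-port-vm2 fa:10:dd:1b:40:02 02:d4:1d:8c:40:01 40.1.1.12 77.1.1.1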
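Differences between the two:
ovs-appctl ofproto/trace can run on non-central nodes and can also trace datapaths that OVN does not manage, but its output is expressed in raw OpenFlow tables and registers, so it is harder to read;
ovn-trace works only with OVN, runs on the central node, and traces OVN-managed datapaths; its output maps to the OVN logical configuration, so it is easy to read.
What they have in common:
Neither tool injects a real packet. Both work out the path by looking up flow tables: ofproto/trace walks the OpenFlow tables, and ovn-trace walks the OVN logical flows.
So, as we will see later, tracing may not work when the pipeline depends on an external module. For example, if the pipeline uses ct, the trace cannot actually enter the conntrack module to modify the packet (for example a NAT rewrite), and the trace result stops reflecting what a real packet would do. The tools therefore have limits, but when no such dependency is involved the result is accurate.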
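Test
The example below is not entirely accurate (because, as I found, trace cannot handle ct), but it happened to solve the problem anyway.
During testing, for reasons unknown, the VM could no longer ping the public network after an EIP was configured. Let's use each of the commands above to locate the problem.
Using ovs-appctl ofproto/trace
1) Construct an ICMP packet that enters br-int from the VM NIC; it is expected to leave through the public port of br-ex.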
ovs-appctl ofproto/trace br-int in_port="sw-400-port-vm2",icmp,dl_src=fa:10:dd:1b:40:02,dl_dst=02:d4:1d:8c:40:01,nw_src=40.1.1.12,nw_dst=77.1.1.1,nw_ttl=64
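The result matches the expectation: the packet leaves through the public port (metadata=0x8, OUTPUT=3). Although the packet is sent out of the public port, it has not been NATed, because the ct module does not take effect in the trace.
2) Construct an ICMP packet that enters from the public port of br-ex; it is expected to enter br-int and leave through the VM NIC.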
ovs-appctl ofproto/trace br-ex in_port="veth-topub-o",icmp,dl_src=2e:bb:8d:e4:2d:bd,dl_dst=0a:10:dd:1b:40:02,nw_src=77.1.1.1,nw_dst=192.168.77.42,nw_ttl=64
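The result does not match the expectation.
Root cause analysis: the packet is dropped while still in the public egress switch (metadata=0x8), before it reaches the router. The last entry is a drop in table 34. The reason is that table 24 treats the traffic as unknown unicast and sets reg15=0xfffe; table 33 then rewrites reg15 to 0x1, and table 34 drops the packet because the output port (reg15=0x1) is the same as the input port (reg14=0x1).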
[root@172-26-201-7 ~]# ovs-appctl ofproto/trace br-ex in_port="veth-topub-o",icmp,dl_src=2e:bb:8d:e4:2d:bd,dl_dst=0a:10:dd:1b:40:02,nw_src=77.1.1.1,nw_dst=192.168.77.42,nw_ttl=64
Flow: icmp,in_port=2,vlan_tci=0x0000,dl_src=2e:bb:8d:e4:2d:bd,dl_dst=0a:10:dd:1b:40:02,nw_src=77.1.1.1,nw_dst=192.168.77.42,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
bridge("br-ex")
---------------
0. priority 0
NORMAL
-> no learned MAC for destination, flooding
bridge("br-int")
----------------
0. in_port=3,vlan_tci=0x0000/0x1000, priority 100
set_field:0xa->reg13
set_field:0x9->reg11
set_field:0x8->reg12
set_field:0x8->metadata
set_field:0x1->reg14
resubmit(,8)
8. reg14=0x1,metadata=0x8, priority 50, cookie 0x1c211b6a
resubmit(,9)
9. metadata=0x8, priority 0, cookie 0x3ef498ab
resubmit(,10)
10. metadata=0x8, priority 0, cookie 0xbb147995
resubmit(,11)
11. metadata=0x8, priority 0, cookie 0x724fee86
resubmit(,12)
12. metadata=0x8, priority 0, cookie 0x835967af
resubmit(,13)
13. metadata=0x8, priority 0, cookie 0x4ac77a3a
resubmit(,14)
14. metadata=0x8, priority 0, cookie 0x386ad89e
resubmit(,15)
15. metadata=0x8, priority 0, cookie 0x602fe8c3
resubmit(,16)
16. metadata=0x8, priority 0, cookie 0x42d24bd4
resubmit(,17)
17. metadata=0x8, priority 0, cookie 0xda595cd3
resubmit(,18)
18. metadata=0x8, priority 0, cookie 0x886340ce
resubmit(,19)
19. reg14=0x1,metadata=0x8, priority 100, cookie 0xd37a278f
resubmit(,20)
20. metadata=0x8, priority 0, cookie 0x270b74dc
resubmit(,21)
21. metadata=0x8, priority 0, cookie 0x5f917298
resubmit(,22)
22. metadata=0x8, priority 0, cookie 0x3137956e
resubmit(,23)
23. metadata=0x8, priority 0, cookie 0x956e9bd3
resubmit(,24)
24. metadata=0x8, priority 0, cookie 0x107a5839
set_field:0xfffe->reg15
resubmit(,32)
32. priority 0
resubmit(,33)
33. reg15=0xfffe,metadata=0x8, priority 100
set_field:0xa->reg13
set_field:0x1->reg15
resubmit(,34)
34. reg10=0/0x1,reg14=0x1,reg15=0x1,metadata=0x8, priority 100
drop
set_field:0xfffe->reg15
Final flow: unchanged
Megaflow: recirc_id=0,eth,ip,in_port=2,vlan_tci=0x0000/0x1fff,dl_src=2e:bb:8d:e4:2d:bd,dl_dst=0a:10:dd:1b:40:02,nw_frag=no
Datapath actions: 4
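A trace this long is easier to read once it is reduced to the steps that matter here: which tables write reg15 and where the packet is dropped. The following is a rough sketch, assuming the trace output was saved to a file and is fed on stdin; summarize_trace is a hypothetical helper name, and the script stops at the first drop it sees.

```shell
# Read ofproto/trace output on stdin and print, per table, every write to
# reg15 plus the table in which the packet is dropped.
summarize_trace() {
    awk '
        # A line such as "24. metadata=0x8, priority 0, ..." starts a new table.
        /^ *[0-9]+\. / { tbl = $1; sub(/\./, "", tbl); next }
        # Report actions that set the logical output port register.
        /set_field:.*->reg15/ && tbl != "" { print "table " tbl ": " $1 }
        # Report the drop and stop, so trailing register restores are ignored.
        /^ *drop$/ { print "table " tbl ": drop"; exit }
    '
}
```

With the trace above saved as tmp, running summarize_trace < tmp would list the reg15 writes in tables 24 and 33 followed by the drop in table 34.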
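In the normal flow, table 24 of the public egress switch should treat packets destined to the EIP as known unicast and forward them to the router, just like any other known MAC address. As in the entries below, once the destination MAC matches, the ID of the corresponding output port is loaded into reg15, and the packet is later unicast out of that port: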
cookie=0xbac6e0e1, duration=481286.062s, table=24, n_packets=27, n_bytes=2646, priority=50,metadata=0x6,dl_dst=fa:10:dd:1b:40:01 actions=load:0x1->NXM_NX_REG15[],resubmit(,32)
cookie=0xa9d9de34, duration=481286.061s, table=24, n_packets=230, n_bytes=20300, priority=50,metadata=0x6,dl_dst=fa:10:dd:1b:40:02 actions=load:0x2->NXM_NX_REG15[],resubmit(,32)
cookie=0xc613bcbf, duration=472772.977s, table=24, n_packets=816, n_bytes=77747, priority=50,metadata=0x6,dl_dst=02:d4:1d:8c:40:01 actions=load:0x3->NXM_NX_REG15[],resubmit(,32)
cookie=0x1628cdab, duration=340649.864s, table=24, n_packets=0, n_bytes=0, priority=50,metadata=0x5,dl_dst=02:d4:1d:8c:30:01 actions=load:0x3->NXM_NX_REG15[],resubmit(,32)
cookie=0x1326bb75, duration=340649.864s, table=24, n_packets=0, n_bytes=0, priority=50,metadata=0x5,dl_dst=fa:10:dd:1b:30:02 actions=load:0x2->NXM_NX_REG15[],resubmit(,32)
cookie=0x173207be, duration=156661.414s, table=24, n_packets=2, n_bytes=196, priority=50,metadata=0x5,dl_dst=c0:ff:ee:00:30:11 actions=load:0x4->NXM_NX_REG15[],resubmit(,32)
cookie=0x8b9f7063, duration=146529.114s, table=24, n_packets=293, n_bytes=27538, priority=50,metadata=0x8,dl_dst=02:d4:1d:8c:ff:01 actions=load:0x2->NXM_NX_REG15[],resubmit(,32)
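Whether a given destination MAC has such an entry can be checked by filtering the flow dump. The following is a small sketch that reads the output of ovs-ofctl dump-flows on stdin; has_l2_lookup_flow is a hypothetical helper name, and the pattern assumes the dump format shown above.

```shell
# Succeed if the flow dump on stdin contains a table-24 flow for the given
# datapath and destination MAC.
# Arguments: metadata value (e.g. 0x8), destination MAC.
has_l2_lookup_flow() {
    grep -q -- "table=24,.*metadata=$1,dl_dst=$2 actions="
}
```

For example: ovs-ofctl dump-flows br-int table=24 | has_l2_lookup_flow 0x8 0a:10:dd:1b:40:02 || echo "no table-24 entry for this MAC"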
So add a flow manually:
ovs-ofctl add-flow br-int "table=24,priority=50,metadata=0x8,dl_dst=0a:10:dd:1b:40:02 actions=load:0x2->NXM_NX_REG15[],resubmit(,32)"
Test again: it works.
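Using ovn-trace
This is shown only for illustration. The output is more readable and maps to the OVN logical configuration, so it is easier to follow when you know the OVN configuration.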
[root@localhost ~]# ovn-trace --summary sw-400 'inport == "sw-400-port-vm2" && eth.src == fa:10:dd:1b:40:02 && eth.dst == 02:d4:1d:8c:40:01 && ip4.src==40.1.1.12 && ip4.dst==77.1.1.1 && ip.ttl==64'
2021-12-24T03:49:15Z|00001|ovntrace|WARN|eth.dst = eth.src; eth.src = 02:d4:1d:8c:30:1; ip4.dst = 30.1.1.100; ip4.src = 30.1.1.1; udp.src = 67; udp.dst = 68; outport = inport; flags.loopback = 1; output;: parsing actions failed (Invalid numeric constant.)
2021-12-24T03:49:15Z|00002|ovntrace|WARN|outport == "sw-300-port-vm1" && eth.src == 02:d4:1d:8c:30:1 && ip4.src == 30.1.1.1 && udp && udp.src == 67 && udp.dst == 68: parsing expression failed (Invalid numeric constant.)
# ip,reg14=0x2,vlan_tci=0x0000,dl_src=fa:10:dd:1b:40:02,dl_dst=02:d4:1d:8c:40:01,nw_src=40.1.1.12,nw_dst=77.1.1.1,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=64
ingress(dp="sw-400", inport="sw-400-port-vm2") {
next;
next;
outport = "sw-400-port";
output;
egress(dp="sw-400", inport="sw-400-port-vm2", outport="sw-400-port") {
output;
/* output to "sw-400-port", type "patch" */;
ingress(dp="vpc-router", inport="rt-400-port") {
next;
ip.ttl--;
reg0 = 192.168.77.254;
reg1 = 192.168.77.1;
eth.src = 02:d4:1d:8c:ff:01;
outport = "rt-pub-port";
flags.loopback = 1;
next;
get_arp(outport, reg0);
/* MAC binding to 2e:bb:8d:e4:2d:bd. */
next;
next;
output;
egress(dp="vpc-router", inport="rt-400-port", outport="rt-pub-port") {
eth.src = 0a:10:dd:1b:40:02;
ct_dnat;
ct_dnat /* assuming no un-dnat entry, so no change */ {
eth.src = 0a:10:dd:1b:40:02;
ct_snat(192.168.77.42);
ct_snat(ip4.src=192.168.77.42) {
output;
/* output to "rt-pub-port", type "patch" */;
ingress(dp="sw-pub", inport="sw-pub-port-router") {
next;
outport = "_MC_unknown";
output;
multicast(dp="sw-pub", mcgroup="_MC_unknown") {
egress(dp="sw-pub", inport="sw-pub-port-router", outport="sw-pub-port-out") {
output;
/* output to "sw-pub-port-out", type "localnet" */;
};
};
};
};
};
};
};
};
};
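To see only the path through the logical datapaths, the ingress and egress lines can be extracted from the ovn-trace output. The following is a minimal sketch, assuming the output format shown above is fed on stdin; trace_path is a hypothetical helper name.

```shell
# Read ovn-trace output on stdin and print one line per pipeline stage,
# in the form "<ingress|egress> <datapath name>".
trace_path() {
    sed -n \
        -e 's/^ *ingress(dp="\([^"]*\)".*/ingress \1/p' \
        -e 's/^ *egress(dp="\([^"]*\)".*/egress \1/p'
}
```

For the trace above, this would list sw-400, vpc-router and sw-pub in order, each with its ingress and egress stage.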