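Flow Table Walkthrough for Multi-Network Access Scenarios
Contents
- Design overview of the multi-network access scenario
- Flow table pipeline structure
- Key fields
- Flow table implementation analysis
- Access scenarios
- Flow tables analyzed per scenario
Design overview of the multi-network access flow tables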
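In the multi-network access scenario, when network resources are configured on the cloud platform controller (for example creating a subnet, creating a port, or attaching a port), the cloud platform controller on the DPU responds by programming the corresponding flow tables.
The flow table design resembles the Linux kernel networking stack: it provides L2 and L3 forwarding and access to local services, and it has netfilter-like hooks that implement port and MAC binding, egress and ingress rate limiting, security groups (including stateful ones), ACLs, and so on. **IPv6 and other protocols are also supported.**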
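Flow table pipeline structure
The pipeline of the multi-network access flow table design can be roughly divided into:
phase1: handle traffic according to the source port type, VXLAN ports and local ports separately
phase2: egress rate limiting, covering BPS and PPS
phase3: port binding check
phase4: egress security group (stateful supported)
phase5: access to special local services, such as DHCP, ARP spoofing for HAVIP, the metadata service, etc.
phase6: L2 forwarding, either to the local node or out through the tunnel
phase7: L3 forwarding: reach the gateway, look up the route, and modify the packet the way a router does
phase8: L2 forwarding after the route lookup, with the VNI and subnet changed
phase9: ingress security group
phase10: ingress rate limiting, BPS and PPS
phase11: output to the final egress interface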
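Key fields
Registers
- reg3: ct_label
- reg5: tunid, the tunnel ID of the large L2 domain
- reg6: ofport number of the input port
- reg7: ofport number of the output port
- reg8: unknown (in the security group and ACL examples below it is set right before goto_table:200)
- reg9: VM ID
- reg11: ct_mark, which can be used to implement a stateful firewall, among other things
- metadata: VPC + subnet (in the examples the tunid sits in the upper bits and the subnet ID in the low bits, e.g. 0xd54dd00000001)
No other registers have been observed so far.
In the actual flow syntax, actions mostly use register and field names with the NXM_NX_ / NXM_OF_ and OXM_OF_ prefixes. These are aliases for the same fields: NXM names come from Nicira's OpenFlow extensions, while OXM names are the standard OpenFlow 1.2+ match fields. For example, reg5 is the same as NXM_NX_REG5, and in_port is the same as NXM_OF_IN_PORT / OXM_OF_IN_PORT.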
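The register notes above can be made concrete with a short decoder. This is a minimal Python sketch that assumes, based only on the values seen in this document (metadata=0xd54dd00000001 next to reg5=0xd54dd), that the tunid occupies the bits above bit 32 of metadata and the subnet ID the low 32 bits; `split_metadata` and `ct_zone` are hypothetical helper names. `ct_zone` mirrors the `zone=NXM_NX_REG6[0..15]` syntax used later, which takes only bits 0 to 15 of the register.

```python
def split_metadata(metadata: int) -> tuple[int, int]:
    """Split 64-bit metadata into (tunid, subnet_id).

    ASSUMPTION: tunid lives above bit 32 and the subnet ID in the low 32 bits,
    inferred from metadata=0xd54dd00000001 appearing next to reg5=0xd54dd.
    """
    tunid = metadata >> 32
    subnet_id = metadata & 0xFFFFFFFF
    return tunid, subnet_id


def ct_zone(reg: int) -> int:
    """zone=NXM_NX_REGn[0..15] uses only bits 0..15 of the register."""
    return reg & 0xFFFF


if __name__ == "__main__":
    for md in (0xD54DD00000001, 0xD54DD00000002):
        tunid, subnet_id = split_metadata(md)
        print(f"metadata={md:#x} -> tunid={tunid:#x}, subnet_id={subnet_id}")
    print("zone for reg6=0x4:", ct_zone(0x4))
```

Under this reading, the two metadata values used in the cross-subnet example differ only in the subnet ID (1 versus 2) while sharing the same tunid.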
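OVS conntrack states
The ct states worth explaining are the most commonly used ones and how OVS implements them. OVS matches states with the ct_state match field and performs the related operations in the ct action. ct_state is a bit field: each bit indicates whether a particular state is set.
- trk: set for any packet that has passed through the conntrack module
- new: set when, after entering CT, no existing connection is found and the packet starts a new one; set together with +trk
- est: set once packets have been seen in both directions of the connection; mutually exclusive with +new
- rel: the packet is related to another existing connection, e.g. an ICMP unreachable message, or an FTP data connection related to its control connection
- rpl: the packet is in the reply direction
- inv: the ct state is invalid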
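The ct_state matches used throughout the flows (for example `+new-est-rel-inv+trk`) combine required and forbidden flags. The sketch below, in Python with hypothetical function names, shows how such a match string can be read: `+flag` must be set, `-flag` must be clear, and unmentioned flags are ignored. The flag names follow ovs-fields(7); this illustrates the matching rule only and is not OVS code.

```python
import re

# ct_state flag names as documented in ovs-fields(7)
CT_FLAGS = {"new", "est", "rel", "rpl", "inv", "trk", "snat", "dnat"}


def parse_ct_state(expr: str) -> tuple[set[str], set[str]]:
    """Return (must_be_set, must_be_clear) for a match like '+new-est+trk'."""
    must_set: set[str] = set()
    must_clear: set[str] = set()
    for sign, name in re.findall(r"([+-])([a-z]+)", expr):
        if name not in CT_FLAGS:
            raise ValueError(f"unknown ct_state flag: {name}")
        (must_set if sign == "+" else must_clear).add(name)
    return must_set, must_clear


def ct_state_matches(expr: str, packet_flags: set[str]) -> bool:
    """True if every '+' flag is present and no '-' flag is present."""
    must_set, must_clear = parse_ct_state(expr)
    return must_set <= packet_flags and not (must_clear & packet_flags)


if __name__ == "__main__":
    first = {"trk", "new"}
    later = {"trk", "est"}
    print(ct_state_matches("+new-est-rel-inv+trk", first))  # first-packet flow
    print(ct_state_matches("-new+est-rel-inv+trk", later))  # follow-up flow
```

This is why the examples pair a `+new...+trk` flow that commits with a higher-priority `-new+est...+trk` flow that simply continues: each packet hits exactly one of them.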
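OVS conntrack action parameters
- table: the table the packet continues in after the ct action completes
- commit: after a +trk packet has matched on ct_state, commits the connection, i.e. moves it from the unconfirmed table into the confirmed table
- zone: the ct context; connections in different zones are completely isolated from each other
- nat: performs forward SNAT or DNAT; reverse NAT can also be done
- exec([action][,action…]): performs modifications on the ct connection, such as setting ct_mark
- force: forces a commit, recreating the connection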
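To illustrate how `commit`, `zone`, and the `new`/`est`/`rpl` flags interact, here is a deliberately simplified conntrack model in Python. It assumes connections are keyed by (zone, original-direction 5-tuple), that only a committed packet creates an entry, and that `est` appears once a reply has been seen; it ignores timeouts, TCP state, `rel`, `inv`, and NAT, and it is not how OVS or the kernel implement conntrack. `ConnTracker` and its `ct` method are hypothetical names.

```python
from dataclasses import dataclass, field

# (src_ip, dst_ip, proto, src_port, dst_port); ICMP uses 0 for the ports here
Tuple5 = tuple[str, str, int, int, int]


def reverse(t: Tuple5) -> Tuple5:
    src, dst, proto, sport, dport = t
    return (dst, src, proto, dport, sport)


@dataclass
class Conn:
    seen_reply: bool = False


@dataclass
class ConnTracker:
    # committed connections, keyed by (zone, original-direction tuple)
    table: dict = field(default_factory=dict)

    def ct(self, zone: int, pkt: Tuple5, commit: bool = False) -> set:
        """Return the ct_state flags for pkt; commit stores a new connection."""
        flags = {"trk"}
        conn = self.table.get((zone, pkt))
        if conn is None:
            conn = self.table.get((zone, reverse(pkt)))
            if conn is not None:          # packet travels in the reply direction
                conn.seen_reply = True
                flags.add("rpl")
        if conn is None:
            flags.add("new")
            if commit:
                self.table[(zone, pkt)] = Conn()
        elif conn.seen_reply:
            flags.add("est")
        else:
            flags.add("new")              # committed, but no reply seen yet
        return flags
```

Because lookups are keyed by zone, a connection committed in one zone (for example the reg6-derived zone 4) is invisible in another, and a packet that is never committed keeps reporting `+new`.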
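Flow table implementation analysis
VPC implementation
A VXLAN overlay implements the large L2 domain, ARP flooding is replaced by local ARP proxy replies, and subnet and VPC isolation is implemented in the flow tables through metadata and reg5.
L2 access
For large-L2 access, cross-node ARP requests (including requests for the gateway MAC) are answered locally, and a local address lookup decides whether the packet leaves through the tunnel or through a representor port on this node. If the destination MAC is on this node, the packet is sent to the corresponding representor port.
If the destination MAC is not local, the packet is encapsulated with the corresponding VTEP and tunid and sent out of the VXLAN port.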
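The L2 decision described above (local representor port versus VXLAN tunnel) boils down to a lookup keyed by (VNI, destination MAC). A minimal Python sketch follows; the two FDB entries reuse the MACs, VNI, and VTEP from the same-subnet example, `VXLAN_OFPORT = 2` is taken from `set_field:0x2->reg7` in those flows, and the local entry's ofport 4 is assumed from `reg6=0x4` for port-xxxxxq2py2. `FDB`, `L2Result`, and `l2_lookup` are hypothetical names.

```python
from typing import NamedTuple, Optional

VXLAN_OFPORT = 2  # from set_field:0x2->reg7 in the Node1 table=60 flow


class L2Result(NamedTuple):
    out_port: int            # value that would be written to reg7
    tun_dst: Optional[str]   # remote VTEP, None for local delivery
    tun_id: int


# (vni, mac) -> (local representor ofport or None, remote VTEP or None)
FDB = {
    (0xD54DD, "fa:16:3e:46:09:25"): (None, "xx.xx.40.70"),  # VM on Node2
    (0xD54DD, "fa:16:3e:17:4b:9d"): (4, None),              # VM on Node1 (assumed ofport 4)
}


def l2_lookup(vni: int, dst_mac: str) -> Optional[L2Result]:
    entry = FDB.get((vni, dst_mac.lower()))
    if entry is None:
        return None  # no matching flow in this sketch
    local_port, vtep = entry
    if local_port is not None:
        return L2Result(local_port, None, vni)    # deliver to representor port
    return L2Result(VXLAN_OFPORT, vtep, vni)      # encapsulate toward remote VTEP
```

Keying on the VNI as well as the MAC is what keeps identical MACs in different large L2 domains from colliding.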
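L3 access
For L3 access, the L2 lookup finds that the destination MAC is the subnet gateway's MAC. The pipeline then checks whether the TTL requires the packet to be dropped and whether any ACL applies. If no ACL applies, it looks up the destination prefix and matches the source VNI, then rewrites the tunid/subnet metadata for the destination subnet (in the cross-subnet example, metadata changes from 0xd54dd00000001 to 0xd54dd00000002), rewrites the source MAC to the gateway MAC of the destination subnet and the destination MAC to the MAC of the destination IP, and jumps back to the L2 tables for address lookup. The return path works the same way.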
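The L3 steps (TTL check in table 110, prefix check in table 130, host-route rewrite in table 140) can be sketched in Python as below. The addresses are placeholders (10.0.x.x stands in for the masked xx.xx prefixes), the route entry copies the metadata and MAC values from the cross-subnet example, and `ROUTES` and `route` are hypothetical names; the ACL step is omitted.

```python
import ipaddress
from typing import Optional

# Placeholder for the masked xx.xx.0.0/16 VPC range in table=130
VPC_PREFIX = ipaddress.ip_network("10.0.0.0/16")

# dst IP -> (new metadata, MAC of dst VM, gateway MAC of dst subnet), as in table=140
ROUTES = {
    ipaddress.ip_address("10.0.11.4"): (
        0xD54DD00000002, "fa:16:3e:74:20:c6", "fa:16:3e:c4:ed:57"),
}


def route(pkt: dict) -> Optional[dict]:
    """Return the rewritten packet, or None if it would be dropped/not routed."""
    if pkt["ttl"] <= 1:                      # table=110: drop TTL 0 or 1
        return None
    dst = ipaddress.ip_address(pkt["ip_dst"])
    if dst not in VPC_PREFIX:                # table=130: outside the VPC range
        return None
    entry = ROUTES.get(dst)                  # table=140: per-destination route
    if entry is None:
        return None
    metadata, dst_mac, gw_mac = entry
    out = dict(pkt)                          # leave the input untouched
    out.update(metadata=metadata, eth_dst=dst_mac, eth_src=gw_mac,
               ttl=pkt["ttl"] - 1)           # dec_ttl
    return out                               # then back to L2 (170 -> 30 -> 60)
```

The rewritten packet carries the destination subnet's metadata and MACs, which is why the subsequent table=60 lookup can treat it as ordinary L2 traffic.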
Rate limiting
Egress and ingress BPS/PPS rate limiting is implemented with meter instances.
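Meters are referenced from flows with the `meter:N` action (the rate-limiting scenario below uses meter:101 and meter:102) and are configured separately, e.g. with `ovs-ofctl add-meter`. A token bucket is a common way to reason about a meter with a drop band; the Python sketch below assumes that model, treats tokens as bytes for BPS or packets for PPS, and is not the datapath's actual algorithm. `TokenBucket` is a hypothetical class.

```python
import time
from typing import Optional


class TokenBucket:
    """Token-bucket model of a drop-band meter (illustrative only)."""

    def __init__(self, rate: float, burst: float, now: Optional[float] = None):
        self.rate = rate        # tokens per second: bytes/s for BPS, packets/s for PPS
        self.burst = burst      # bucket depth
        self.tokens = burst     # start full
        self.last = time.monotonic() if now is None else now

    def allow(self, cost: float, now: Optional[float] = None) -> bool:
        """Refill by elapsed time, then pass the packet if enough tokens remain."""
        now = time.monotonic() if now is None else now
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False            # over the limit: the drop band discards the packet


if __name__ == "__main__":
    pps = TokenBucket(rate=10, burst=2, now=0.0)      # 10 packets/s, burst 2
    print([pps.allow(1, now=0.0) for _ in range(3)])   # third packet exceeds burst
```

A BPS limit charges `cost = packet length in bytes`, while a PPS limit charges `cost = 1`; tables 85 and 86 chaining two meters corresponds to checking both buckets in sequence.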
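Security groups (stateful supported)
Stateless security groups are implemented per port, and stateful security groups are also supported using the ct state.
ACL
Packets are matched on source, destination, protocol number, and similar fields, and then allowed or denied.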
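OpenFlow resolves overlapping flows by priority: the highest-priority flow whose match fields all agree with the packet wins. The Python sketch below applies that rule to rules modeled on the table=170 flows in this document (the priority 55533 ICMP flow that resubmits to table 30, the priority 24535 icmp6 flow that sets reg8 and jumps to table 200, and the priority 50 default). What table 200 finally does is not shown, so its result is only a label; metadata matching is omitted, and `RULES` and `acl_decide` are hypothetical names.

```python
from typing import Optional

# (priority, match fields, action label); modeled on the table=170 flows
Rule = tuple[int, dict, str]

RULES: list[Rule] = [
    (55533, {"proto": "icmp", "nw_src": "10.2.2.11", "nw_dst": "10.2.1.11"},
     "resubmit(,30)"),
    (24535, {"proto": "icmp6"}, "goto_table:200"),
    (50, {}, "resubmit(,30)"),  # default flow, as in table=170 priority=50
]


def acl_decide(pkt: dict, rules: list[Rule] = RULES) -> Optional[str]:
    """Return the action of the highest-priority rule whose fields all match."""
    for _prio, match, action in sorted(rules, key=lambda r: r[0], reverse=True):
        if all(pkt.get(k) == v for k, v in match.items()):
            return action
    return None  # table miss
```

Since an empty match agrees with every packet, the priority 50 rule acts as the fall-through for anything not covered by a more specific ACL entry.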
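NAT
NAT is used to access some local services. OVS NAT is implemented through different parameters of the ct action.
Main access scenarios
- Same VPC, same subnet, same host
- Same VPC, same subnet, across hosts
- Same VPC, different subnets, same host
- Same VPC, different subnets, across hosts
- L3 access across VPCs
Access scenario examples
Same subnet, across nodes
Topology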
Node1 (xx.xx.10.6) accesses Node2 (xx.xx.10.4)
Node1:
IP: xx.xx.10.6
MAC: fa:16:3e:17:4b:9d
Representor port: port-xxxxxq2py2
vtep: xx.xx.40.67
Node2:
IP: xx.xx.10.4
MAC: fa:16:3e:46:09:25
Representor port: port-yyyyyv66x7
vtep: xx.xx.40.70
Flow tables on Node1 (sender)
ARP handling
table=35
All ARP requests are answered by proxy (ARP is not analyzed again below). How it works: the flow rewrites the SHA, SPA, THA, TPA, and ARP opcode fields (plus the Ethernet addresses) and sends the reply back out the ingress port.
cookie=0x170a30c1320ce4af, table=35, priority=100,arp,metadata=0x47d100000000,arp_tpa=xx.xx.100.7,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],set_field:fa:16:3e:0c:02:73->eth_src,set_field:2->arp_op,set_field:xx.xx.100.7->arp_spa,set_field:fa:16:3e:0c:02:73->arp_sha,IN_PORT
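The ARP proxy flow above can be read as a sequence of field rewrites. The Python sketch below applies the flow's actions, in order, to an ARP request represented as a dictionary; `PROXY_MAC` and `PROXY_IP` are copied from the flow, the metadata match is not modeled, and `arp_proxy_reply` is a hypothetical name. Because each `move` runs before the corresponding `set_field`, the code reads the original request values.

```python
PROXY_MAC = "fa:16:3e:0c:02:73"   # from set_field:...->eth_src / arp_sha
PROXY_IP = "xx.xx.100.7"          # from arp_tpa=... / set_field:...->arp_spa


def arp_proxy_reply(req: dict) -> dict:
    """Turn a matching ARP request into the reply the flow would emit."""
    if req["arp_op"] != 1 or req["arp_tpa"] != PROXY_IP:
        raise ValueError("the proxy flow would not match this packet")
    rep = dict(req)
    rep["eth_dst"] = req["eth_src"]   # move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[]
    rep["arp_tha"] = req["arp_sha"]   # move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[]
    rep["arp_tpa"] = req["arp_spa"]   # move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[]
    rep["eth_src"] = PROXY_MAC        # set_field:...->eth_src
    rep["arp_op"] = 2                 # set_field:2->arp_op (reply)
    rep["arp_spa"] = PROXY_IP         # set_field:...->arp_spa
    rep["arp_sha"] = PROXY_MAC        # set_field:...->arp_sha
    return rep                        # IN_PORT: sent back out the ingress port
```

The requester therefore learns `PROXY_MAC` for `PROXY_IP` without the request ever being flooded across the overlay.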
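IP handling
table=0: traffic entry point. Packets are dispatched by input port and the related registers are set: reg5 is the VNI, reg6 is the input port number, reg9 is the VM ID, and metadata is tunid + subnet ID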
cookie=0x170a32cec8c3e4c1, priority=100,in_port="port-xxxxxq2py2" actions=set_field:0xd54dd->reg5,set_field:0x4->reg6,set_field:0x64->reg9,write_metadata:0xd54dd00000001,goto_table:1
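table=1: all IP packets jump to rate limiting
cookie=0x170a380bc44c5c23, table=1, priority=50 actions=goto_table:5
table=5: egress BPS rate limiting; no rate-limit rule is configured, so it does not apply
cookie=0x170a380bc44c5b6b, table=5, priority=100 actions=goto_table:6
table=6: egress PPS rate limiting; no rate-limit rule is configured
cookie=0x170a380bc44c5b7f, table=6, priority=100 actions=goto_table:10
table=10: port and MAC binding. reg6 is the port's ofport number and dl_src must be the source MAC of the VM behind that port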
cookie=0x170a32cec8c3e501, table=10, priority=1000,ip,reg6=0x4,dl_src=fa:16:3e:17:4b:9d actions=goto_table:20
table=20: egress pre-CT. ICMP packets are sent to conntrack, and the zone is taken from the source port's ofport number (reg6[0..15])
cookie=0x170a380bc44c5c9f, table=20, priority=58000,icmp actions=ct(table=25,zone=NXM_NX_REG6[0..15])
table=25: egress ct state match. The port number (reg6) selects the zone and the ct state must be +new+trk; if both are correct, the connection is committed
cookie=0x170a32cec8c3e4ed, table=25, priority=39799,ct_state=+new-est-rel-inv+trk,ip,reg6=0x4 actions=ct(commit,table=30,zone=NXM_NX_REG6[0..15])
table=30: checks whether a local service is being accessed (see the full table 30 for the list of services); not relevant to this ICMP case
cookie=0x170a3983468be7c9, table=30, priority=50 actions=goto_table:60
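table=60: reg5 (VNI) and the destination MAC select the tunnel encapsulation, and the output port is set to vxlan1 (reg7=0x2); the port-number mapping can be checked with ovs-ofctl show br-int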
cookie=0x170a3983468be931, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd->tun_id,set_field:xx.xx.40.70->tun_dst,set_field:0x2->reg7,goto_table:80
table=80: based on the output port, records the VM ID (reg9)
cookie=0x170a3983468beaf9, table=80, priority=1000,reg7=0x4 actions=set_field:0x64->reg9,goto_table:81
table=81: service probe; not relevant, skipped
cookie=0x170a3983468be853, table=81, priority=100 actions=goto_table:85
table=85: ingress BPS; not relevant
cookie=0x170a3983468be84b, table=85, priority=100 actions=goto_table:86
table=86: ingress PPS; not relevant
cookie=0x170a3983468be8db, table=86, priority=100 actions=goto_table:90
table=90: send out of the output port; in this case, the VXLAN port
cookie=0x170a3983468be837, table=90, priority=1000 actions=output:NXM_NX_REG7[]
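Condensing the goto_table targets of the Node1 flows above into a next-table map gives the order in which this same-subnet ICMP packet visits the tables. This is a Python sketch: the two ct hops (20 to 25, 25 to 30) are treated as plain jumps, the per-table roles are summarized from the notes above, and `NEXT_TABLE`, `ROLE`, and `walk` are hypothetical names.

```python
from typing import Optional

# goto_table targets taken from the Node1 (sender) flows; None = output
NEXT_TABLE: dict[int, Optional[int]] = {
    0: 1, 1: 5, 5: 6, 6: 10, 10: 20, 20: 25, 25: 30, 30: 60,
    60: 80, 80: 81, 81: 85, 85: 86, 86: 90, 90: None,
}

ROLE = {
    0: "classify by in_port, set reg5/reg6/reg9/metadata",
    1: "dispatch to egress QoS", 5: "egress BPS", 6: "egress PPS",
    10: "port/MAC binding", 20: "egress pre-CT (zone=reg6)",
    25: "egress security group, ct commit", 30: "local services",
    60: "L2 lookup / tunnel encap", 80: "record VM ID", 81: "svc probe",
    85: "ingress BPS", 86: "ingress PPS", 90: "output:reg7",
}


def walk(start: int = 0, max_steps: int = 64) -> list[int]:
    """Follow NEXT_TABLE from start until the packet is output."""
    path: list[int] = []
    table: Optional[int] = start
    while table is not None:
        if len(path) >= max_steps:
            raise RuntimeError("loop in next-table map")
        path.append(table)
        table = NEXT_TABLE[table]
    return path


if __name__ == "__main__":
    for t in walk():
        print(f"table={t:<3} {ROLE[t]}")
```

Laid out this way, the trace lines up with phases 1 to 6 and 9 to 11 of the pipeline list; the L3 tables (100 to 170) are skipped because the destination MAC is not the gateway's.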
Flow tables on Node2 (receiver)
table=0: for cross-node traffic, packets are always received on the VXLAN port
cookie=0x170a277782792425, priority=1000,in_port=vxlan1 actions=goto_table:50
table=50: matches the tunnel ID and destination MAC, then sets the registers for the receive-side pipeline: metadata is tunid + subnet ID and reg5 is the VNI
cookie=0x170a2777827925f9, table=50, priority=50,tun_id=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd00000001->metadata,set_field:0xd54dd->reg5,resubmit(,30)
table=30: checks for local service access; not relevant
cookie=0x170a2777827923e7, table=30, priority=50 actions=goto_table:60
table=60: matches tunid (reg5) and MAC for the L2 forwarding lookup
cookie=0x170a277782792601, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd->tun_id,set_field:0x6->reg7,goto_table:70
table=70
cookie=0x170a277782792393, table=70, priority=58000,icmp actions=ct(table=75,zone=NXM_NX_REG7[0..15])
table=75: matches the ct state; the first packet matches +trk+new, subsequent packets match +trk+est
**First packet: matches +trk+new**
cookie=0x170a2777827925eb, duration=33902.506s, table=75, n_packets=9144, n_bytes=895680, idle_age=4, priority=39819,ct_state=+new-est-rel-inv+trk,ip,reg7=0x6 actions=ct(commit,table=80,zone=NXM_NX_REG7[0..15])
**Subsequent packets: match +trk+est**
cookie=0x170a2777827923ed, table=75, priority=60000,ct_state=-new+est-rel-inv+trk actions=goto_table:80
table=80: matches the output port (reg7=0x6) and records the VM ID; the packet will leave through the interface with ofport=6
cookie=0x170a27778279262d, table=80, priority=1000,reg7=0x6 actions=set_field:0x64->reg9,goto_table:81
table=81: service probe; not relevant
cookie=0x170a2777827924a3, table=81, priority=100 actions=goto_table:85
table=85
cookie=0x170a277782792431, table=85, priority=100 actions=goto_table:86
table=86
cookie=0x170a27778279241d, table=86, priority=100 actions=goto_table:90
table=90
cookie=0x170a27778279239d, table=90, priority=1000 actions=output:NXM_NX_REG7[]
Scenario: cross-subnet access
Topology
Node1 (xx.xx.10.6) accesses Node2 (xx.xx.11.4)
Node1:
IP: xx.xx.10.6
MAC: fa:16:3e:17:4b:9d
Next hop: xx.xx.10.1 (fa:16:3e:ec:22:0d)
Representor port: port-xxxxxq2py2 (representor of ens4)
vtep: xx.xx.40.67
Node2:
IP: xx.xx.11.4
MAC: fa:16:3e:74:20:c6
Next hop: xx.xx.11.1 (fa:16:3e:c4:ed:57)
Representor port: port-2zbgfw4f26
vtep: xx.xx.40.70
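Flow tables on Node1 (sender)
table=0: traffic entry point; packets are dispatched by input port and the related registers are set: reg5 is the VNI, reg6 is the input port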
cookie=0x170a48c6658cc733, priority=100,in_port="port-xxxxxq2py2" actions=set_field:0xd54dd->reg5,set_field:0x4->reg6,set_field:0x64->reg9,write_metadata:0xd54dd00000001,goto_table:1
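table=1: all IP packets jump to rate limiting
cookie=0x170a48c6658cc42d, table=1, priority=50 actions=goto_table:5
table=5: egress BPS rate limiting; no rate-limit rule is configured, so it does not apply
cookie=0x170a48c6658cc435, table=5, priority=100 actions=goto_table:6
table=6: egress PPS rate limiting; no rate-limit rule is configured
cookie=0x170a48c6658cc4d5, table=6, priority=100 actions=goto_table:10
table=10: port and MAC binding. reg6 is the port's ofport number and dl_src must be the source MAC of the VM behind that port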
cookie=0x170a48c6658cc74d, duration=1362.918s, table=10, n_packets=13516, n_bytes=1469823, idle_age=1, priority=1000,ip,reg6=0x4,dl_src=fa:16:3e:17:4b:9d actions=goto_table:20
table=20
cookie=0x170a48c6658cc4b5, table=20, priority=58000,icmp actions=ct(table=25,zone=NXM_NX_REG6[0..15])
table=25
**First packet**
cookie=0x170a48c6658cc73b, table=25, priority=39799,ct_state=+new-est-rel-inv+trk,ip,reg6=0x4 actions=ct(commit,table=30,zone=NXM_NX_REG6[0..15])
**Subsequent packets**
cookie=0x170a48c6658cc4fb, table=25, priority=60000,ct_state=-new+est-rel-inv+trk actions=goto_table:30
table=30
cookie=0x170a48c6658cc4a5, table=30, priority=50 actions=goto_table:60
table=60
cookie=0x170a48c6658cc5fb, table=60, priority=100,metadata=0xd54dd00000001,dl_dst=fa:16:3e:ec:22:0d actions=goto_table:100
table=100: L3 forwarding entry; if the destination IP is not local, go to pre-routing
cookie=0x170a48c6658cc4e1, table=100, priority=50,ip actions=goto_table:110
table=110: pre-routing; drop the packet if its TTL is 0 or 1, otherwise continue
cookie=0x170a4b2b174d5e1d, table=110, priority=100 actions=goto_table:120
table=120: ACL match; no rules configured, skipped
cookie=0x170a4b2b174d5e1f, table=120, priority=50 actions=goto_table:130
table=130: if the destination is within xx.xx.0.0/16, go on to the specific route lookup
cookie=0x170a4b2b174d5f1d, duration=180.589s, table=130, n_packets=4860, n_bytes=733806, idle_age=1, priority=10016,ip,metadata=0xd54dd00000001,nw_dst=xx.xx.0.0/16 actions=goto_table:140
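table=140: specific route lookup. reg5 selects the large L2 domain and the destination IP selects the route; the flow rewrites metadata to the destination subnet, rewrites the destination MAC to the MAC of the destination IP (previously the gateway MAC), rewrites the source MAC to the gateway MAC of the destination subnet, decrements the TTL, and jumps to post-routing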
cookie=0x170a4b2b174d5ed3, table=140, priority=100,ip,reg5=0xd54dd,nw_dst=xx.xx.11.4 actions=set_field:0xd54dd00000002->metadata,set_field:fa:16:3e:74:20:c6->eth_dst,set_field:fa:16:3e:c4:ed:57->eth_src,dec_ttl,goto_table:160
table=160: post-routing; no action, skipped
cookie=0x170a4b2b174d5d6b, table=160, priority=50 actions=resubmit(,170)
table=170: after the route lookup, L2 forwarding runs again, i.e. the output port is found from the destination MAC
cookie=0x170a4b2b174d5db3, table=170, priority=50 actions=resubmit(,30)
table=30: no local service is accessed, so go straight to the MAC table
cookie=0x170a4c013de71d19, table=30, priority=50 actions=goto_table:60
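table=60: tunnel encapsulation based on the large-L2 VNI and the destination MAC (the real MAC of the destination IP). Note the assignment to reg7: it is the output port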
cookie=0x170a4c013de71e81, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:74:20:c6 actions=set_field:0xd54dd->tun_id,set_field:xx.xx.40.70->tun_dst,set_field:0x2->reg7,goto_table:80
table=80
cookie=0x170a4c013de71d77, table=80, priority=1000,reg7=0x2 actions=output:vxlan1
Flow tables on Node2 (receiver)
table=0
cookie=0x170d7535048f9acb, priority=1000,in_port=vxlan1 actions=goto_table:50
table=50 l3 lookup
cookie=0x170d7535048f9cc3, table=50, priority=50,tun_id=0xd54dd,dl_dst=fa:16:3e:c4:ed:57 actions=set_field:0xd54dd00000002->metadata,set_field:0xd54dd->reg5,goto_table:140
table=140: destination route lookup
cookie=0x170d7535048f9cd9, table=140, priority=100,ip,reg5=0xd54dd,nw_dst=xx.xx.11.4 actions=set_field:0xd54dd00000002->metadata,set_field:fa:16:3e:74:20:c6->eth_dst,set_field:fa:16:3e:c4:ed:57->eth_src,dec_ttl,goto_table:160
table=160 postrouting
cookie=0x170d7535048f9acf, table=160, priority=50 actions=resubmit(,170)
table=170 ingress acl
cookie=0x170d7535048f9b29, table=170, priority=50 actions=resubmit(,30)
table=30
cookie=0x170d7535048f9afd, table=30, priority=50 actions=goto_table:60
table=60
cookie=0x170d7535048f9e0d, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:74:20:c6 actions=set_field:0xd54dd->tun_id,set_field:0x7->reg7,goto_table:70
table=70
cookie=0x170d7535048f9bd5, table=70, priority=58000,tcp actions=ct(table=75,zone=NXM_NX_REG7[0..15])
table=75
cookie=0x170d7535048f9df5, table=75, priority=39819,ct_state=+new-est-rel-inv+trk,ip,reg7=0x6 actions=ct(commit,table=80,zone=NXM_NX_REG7[0..15])
table=80
cookie=0x170d7535048f9dfb, table=80, priority=1000,reg7=0x6 actions=set_field:0x64->reg9,goto_table:81
table=81
cookie=0x170d7535048f9b91, table=81, priority=100 actions=goto_table:85
table=85
cookie=0x170d7535048f9be1, table=85, priority=100 actions=goto_table:86
table=86
cookie=0x170d7535048f9b49, table=86, priority=100 actions=goto_table:90
table=90
cookie=0x170d7535048f9bef, table=90, priority=1000 actions=output:NXM_NX_REG7[]
Scenario: accessing local services
Accessing the metadata service
cookie=0x170b616612e8b709, table=10, priority=2000,tcp,nw_dst=169.254.169.254,tp_dst=8000 actions=goto_table:30
cookie=0x170b616612e8b781, table=45, priority=100,tcp,nw_dst=169.254.169.254,tp_dst=8000 actions=set_field:fa:16:3e:25:fd:7e->eth_dst,set_field:8111->tcp_dst,goto_table:46
cookie=0x170b616612e8b759, table=46, priority=2000,tcp,nw_dst=169.254.169.254,tp_dst=8111 actions=move:NXM_NX_REG6[]->NXM_OF_IP_SRC[],set_field:128.0.0.0/16->ip_src,output:1
Return path
cookie=0x170b616612e8b859, priority=100,in_port=1 actions=goto_table:47
cookie=0x170b616612e8bd73, table=47, priority=100,tcp,nw_dst=128.0.0.2,tp_src=8111 actions=set_field:169.254.169.254->ip_src,set_field:xx.xx.100.5->ip_dst,set_field:8000->tcp_src,output:"port-17icfhnrgo"
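The return flow matches `nw_dst=128.0.0.2`, and the table=46 flow explains where such an address comes from: it copies reg6 (the VM's input ofport) into ip_src, and then `set_field:128.0.0.0/16->ip_src` overwrites only the upper 16 bits. The Python sketch below reproduces that masked set; under this reading, 128.0.0.2 would correspond to reg6=0x2, but the VM's actual ofport is not shown in this document. `masked_set`, `metadata_src_ip`, and `port_from_src_ip` are hypothetical names, and the inverse function is an illustration only, since the return flow shown here hard-codes the VM's address and output port.

```python
import ipaddress


def masked_set(old: int, value: int, mask: int) -> int:
    """Masked set_field: only the bits set in mask are replaced by value."""
    return (old & ~mask & 0xFFFFFFFF) | (value & mask)


def metadata_src_ip(reg6: int) -> str:
    """Source IP produced by the table=46 actions for a given reg6."""
    ip = reg6 & 0xFFFFFFFF                                     # move:NXM_NX_REG6[]->NXM_OF_IP_SRC[]
    value = int(ipaddress.IPv4Address("128.0.0.0"))
    mask = int(ipaddress.IPv4Network("0.0.0.0/16").netmask)    # 255.255.0.0
    return str(ipaddress.IPv4Address(masked_set(ip, value, mask)))  # set_field:128.0.0.0/16->ip_src


def port_from_src_ip(ip: str) -> int:
    """Hypothetical inverse: recover the low 16 bits (the ofport)."""
    return int(ipaddress.IPv4Address(ip)) & 0xFFFF
```

Encoding the port in the low 16 bits gives every VM a distinct source address toward the metadata service, so replies can be told apart even though all VMs send to the same 169.254.169.254:8000.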
Scenario: security group, stateless per-port security group
table=70
cookie=0x170b616612e8bdc1, table=70, priority=39800,ip,reg7=0x4 actions=set_field:0x46->reg8,goto_table:200
cookie=0x170b616612e8bdc5, table=70, priority=39800,ipv6,reg7=0x4 actions=set_field:0x46->reg8,goto_table:200
Scenario: stateful per-port security group
cookie=0x170b616612e8becb, table=75, priority=39800,ct_state=+new-est-rel-inv+trk,ip,reg7=0xd actions=set_field:0x4b->reg8,goto_table:200
cookie=0x170b616612e8bf11, table=75, priority=39800,ct_state=+new-est-rel-inv+trk,ipv6,reg7=0xd actions=set_field:0x4b->reg8,goto_table:200
Scenario: NAT
cookie=0x170b616612e8bf69, table=44, priority=2000,ct_state=+new-est-rel-inv+trk,tcp,reg6=0x9,tp_dst=20048 actions=encap(tcp_option(tlv(254,0x0a156b03000047d1))),ct(commit,table=80,nat(src=xx.xx.9.207,random))
cookie=0x170b616612e8c0f1, table=44, priority=2000,ct_state=+new-est-rel-inv+trk,tcp6,reg6=0xa,tp_dst=20048 actions=encap(tcp_option(tlv(254,0x010000000007000000123d2100080041))),ct(commit,table=80,nat(src=240e:108:4:200:1:2:0:70f,random))
Scenario: rate limiting
table=85, priority=1000,reg9=0x64 actions=meter:101,goto_table:86
table=86, priority=1000,reg9=0x64 actions=meter:102,goto_table:90
Scenario: ACL
table=170,priority=55533,icmp,metadata=0x2076370000000a,nw_src=10.2.2.11,nw_dst=10.2.1.11 actions=resubmit(,30)
table=170, priority=24535,icmp6,metadata=0x1cc23500000000 actions=set_field:0xaa->reg8,goto_table:200