OpenFlow Flow Tables in Practice: An Analysis

Flow Tables for Multi-Network Access Scenarios

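Contents

  1. Design overview of the multi-network access scenario
  2. Flow table pipeline structure
  3. Key fields
  4. Flow table implementation analysis
  5. Access scenarios
  6. Flow table walkthrough by scenario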

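- Design overview of the flow tables for multi-network access scenarios

In the multi-network access scenario, when network resources are configured on the cloud platform controller (creating a subnet, creating a port, attaching a port, and so on), the cloud platform controller on the DPU responds by programming the corresponding flow tables.

The flow table design resembles the Linux kernel network stack: it supports L2 and L3 forwarding and access to local services, and it provides hooks similar to the netfilter framework. These hooks implement port-to-MAC binding, rate limiting in both directions, security groups (stateful ones included), ACLs, and so on. **Protocols such as IPv6 are also supported.**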

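- Flow table pipeline structure

The pipeline for the multi-network access scenario can be roughly divided into these phases:

phase1: Classify by source port, handling the vxlan type and the local port type separately

phase2: Egress rate limiting, covering BPS and PPS

phase3: Port binding check

phase4: Egress security group (stateful supported)

phase5: Access to special local services, such as DHCP, ARP spoofing, HAVIP, the metadata service, and so on

phase6: L2 forwarding, either to a local port or out through the tunnel

phase7: L3 forwarding: the packet is addressed to the gateway, a route is looked up, and the packet is rewritten the way a router would rewrite it

phase8: L2 forwarding again, after the route lookup has changed the VNI and subnet

phase9: Ingress security group

phase10: Ingress rate limiting, BPS and PPS

phase11: Output, the final egress interface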
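As a reading aid for the dumps later in this article, the phases can be tied to table numbers. The sketch below is only an inference from the flows shown in this article, not a statement about the controller's design: the table numbers, the phase labels, and the `phase_of` and `describe` helpers are all assumptions made for illustration.

```python
# Table numbers are read off the flow dumps in this article; the grouping
# into phases is an assumption made for illustration only.
PIPELINE = [
    ("classify by source port", (0, 1, 50)),
    ("egress rate limit (BPS, PPS)", (5, 6)),
    ("port/MAC binding", (10,)),
    ("egress security group", (20, 25)),
    ("local services", (30,)),
    ("L2 forwarding", (60,)),
    ("L3 forwarding", (100, 110, 120, 130, 140, 160, 170)),
    ("ingress security group", (70, 75)),
    ("VM id lookup and service probe", (80, 81)),
    ("ingress rate limit (BPS, PPS)", (85, 86)),
    ("output", (90,)),
]


def phase_of(table_id):
    """Return the phase label for a table number, or 'unknown'."""
    for name, tables in PIPELINE:
        if table_id in tables:
            return name
    return "unknown"


def describe(path):
    """Annotate a list of visited table numbers with phase labels."""
    return [(table_id, phase_of(table_id)) for table_id in path]


if __name__ == "__main__":
    for table_id, name in describe([0, 1, 5, 6, 10, 20, 25, 30, 60, 80, 90]):
        print(f"table={table_id:<3} {name}")
```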


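Key fields

Registers
  1. reg3: ct_label
  2. reg5: tun_id, the tunnel id of the overlay L2 domain
  3. reg6: in_port, the ingress ofport number
  4. reg7: out_port, the egress ofport number
  5. reg8: unknown
  6. reg9: VM id
  7. reg11: ct_mark, usable for a stateful firewall and similar features
  8. metadata: VPC + subnet

Other registers have not come up much so far.

In the actual flow syntax, actions mostly refer to registers and fields by their NXM_NX_REG and OXM_OF prefixed names. NXM is the Nicira extended match format and OXM is the OpenFlow extensible match format introduced in OpenFlow 1.2; they are simply alternative names for the same fields. For example, reg5 is the same as NXM_NX_REG5, and in_port is the same as OXM_OF_IN_PORT.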

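OVS conntrack states

For ct state, what needs explaining is the handful of most commonly used states and how OVS implements them. OVS matches connection state through the ct_state match field and performs the related operations inside the ct action. ct_state is a bitmap in which each bit says whether a given state is set.

    • trk: set for every packet that has been through the ct module
    • new: after entering ct, no existing connection was found, so a new one is being created; set together with +trk
    • est: set once packets have been seen in both directions of the same connection; mutually exclusive with new
    • rel: related to another existing session, for example an ICMP unreachable message, or the control session and data-transfer session of FTP or iperf
    • rpl: the packet travels in the reply (reverse) direction
    • inv: invalid ct state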
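To make the bitmap idea concrete, here is a minimal sketch of how a match such as `ct_state=+new-est-rel-inv+trk` can be evaluated. The bit values follow the ones listed in the ovs-fields documentation as far as I know; the `CtState` class and the `parse_ct_state` and `matches` helpers are hypothetical names for this illustration, not OVS code.

```python
import re
from enum import IntFlag


class CtState(IntFlag):
    # Bit values as listed in ovs-fields(7), to the best of my knowledge.
    NEW = 0x01
    EST = 0x02
    REL = 0x04
    RPL = 0x08
    INV = 0x10
    TRK = 0x20


def parse_ct_state(spec):
    """Turn '+new-est+trk' into (must_be_set, must_be_clear) bit masks."""
    must_set = CtState(0)
    must_clear = CtState(0)
    for sign, name in re.findall(r"([+-])([a-z]+)", spec):
        flag = CtState[name.upper()]
        if sign == "+":
            must_set |= flag
        else:
            must_clear |= flag
    return must_set, must_clear


def matches(state, spec):
    """True if the packet's ct_state bits satisfy the match spec."""
    must_set, must_clear = parse_ct_state(spec)
    return (state & must_set) == must_set and not (state & must_clear)


if __name__ == "__main__":
    first_packet = CtState.NEW | CtState.TRK
    later_packet = CtState.EST | CtState.TRK
    print(matches(first_packet, "+new-est-rel-inv+trk"))
    print(matches(later_packet, "-new+est-rel-inv+trk"))
```

A bit that appears with neither `+` nor `-` in the spec is a wildcard, which is why the two specs used in this article can leave out `rpl`.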
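OVS conntrack action
  1. table: the table in which the packet continues once the ct action has completed
  2. commit: after a +trk packet has matched on ct_state, commit the connection, i.e. move the session from the unconfirmed list to the confirmed table
  3. zone: the ct context; sessions in different zones are completely isolated from each other
  4. nat: performs forward SNAT, DNAT, and so on, and can also perform the reverse NAT
  5. exec([action][,action…]): applies modifications to the ct session, such as setting ct_mark
  6. force: forces a commit and rebuilds the session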
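The interplay of `zone`, `commit`, and the resulting state can be sketched with a toy connection table. This is a deliberately simplified model under stated assumptions, not how OVS or the kernel conntrack is implemented: connections are keyed only by zone and the pair of endpoints, and a committed connection keeps reporting `new` until a reply has been seen. `MiniConntrack` is a hypothetical name.

```python
class MiniConntrack:
    """Toy model of ct(zone=...) lookups and ct(commit) for illustration."""

    def __init__(self):
        # (zone, frozenset of endpoints) -> {"orig": (src, dst), "replied": bool}
        self.table = {}

    @staticmethod
    def _key(zone, src, dst):
        # Direction-independent key, so a reply finds the same connection.
        return (zone, frozenset((src, dst)))

    def lookup(self, zone, src, dst):
        """Model ct(table=N,zone=Z): return the state flags the next table sees."""
        entry = self.table.get(self._key(zone, src, dst))
        if entry is None:
            return {"trk", "new"}
        if (src, dst) != entry["orig"]:
            entry["replied"] = True
            return {"trk", "est", "rpl"}
        return {"trk", "est"} if entry["replied"] else {"trk", "new"}

    def commit(self, zone, src, dst):
        """Model ct(commit,zone=Z): remember the connection in this zone."""
        key = self._key(zone, src, dst)
        self.table.setdefault(key, {"orig": (src, dst), "replied": False})


if __name__ == "__main__":
    ct = MiniConntrack()
    a, b = "xx.xx.10.6", "xx.xx.10.4"
    print(ct.lookup(4, a, b))   # first packet in zone 4
    ct.commit(4, a, b)
    print(ct.lookup(4, b, a))   # reply in zone 4
    print(ct.lookup(6, a, b))   # same endpoints, different zone
```

Because the zone is part of the key, the same pair of addresses seen in another zone starts again as a new connection, which is the isolation property described in the list.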

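Flow table implementation analysis

VPC implementation

A VXLAN overlay provides the large L2 domain. ARP flooding is replaced by local ARP proxy replies, and metadata and reg5 are used in the flow tables to isolate subnets and VPCs.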

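L2 access

For access within the overlay L2 domain, cross-node ARP requests (including requests for the gateway MAC) are answered locally by proxy. A local lookup then decides whether the packet leaves through the tunnel or goes to a representor port on this node. If the destination MAC is on this node, the packet is sent to the corresponding representor port.
If the destination MAC is not local, the packet is encapsulated with the corresponding VTEP and tun_id and sent out of the vxlan port.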
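The local-versus-tunnel decision can be sketched as two lookups keyed by VNI and destination MAC. The MACs, the VNI `0xd54dd`, the ofport numbers and the masked VTEP address are copied from the flow dumps later in this article; the dictionaries and the `l2_forward` function are hypothetical and only illustrate the decision, not the controller's data model.

```python
# (vni, destination MAC) -> ofport of the local representor port
LOCAL_PORTS = {
    (0xD54DD, "fa:16:3e:17:4b:9d"): 4,
}

# (vni, destination MAC) -> VTEP address of the node hosting that MAC
REMOTE_VTEPS = {
    (0xD54DD, "fa:16:3e:46:09:25"): "xx.xx.40.70",
}

VXLAN_OFPORT = 2  # ofport of vxlan1 in the dumps (reg7=0x2)


def l2_forward(vni, dst_mac):
    """Return the forwarding decision, or None if the MAC is unknown."""
    key = (vni, dst_mac)
    if key in LOCAL_PORTS:
        # Destination is on this node: deliver to its representor port.
        return {"out_port": LOCAL_PORTS[key]}
    if key in REMOTE_VTEPS:
        # Destination is remote: encapsulate and send out of the vxlan port.
        return {"out_port": VXLAN_OFPORT, "tun_id": vni,
                "tun_dst": REMOTE_VTEPS[key]}
    return None


if __name__ == "__main__":
    print(l2_forward(0xD54DD, "fa:16:3e:17:4b:9d"))
    print(l2_forward(0xD54DD, "fa:16:3e:46:09:25"))
```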

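L3 access

For L3 access, the L2 lookup finds that the destination MAC is the subnet gateway's MAC. The pipeline then pre-checks the TTL to decide whether the packet should be dropped and checks whether an ACL applies. If none does, it looks up the destination prefix, matching on the source VNI, rewrites the tun_id to the new VNI, sets the source MAC to the gateway MAC of the new subnet and the destination MAC to the MAC of the destination IP, and jumps back to the L2 table for forwarding. The return path works the same way.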
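The rewrite performed by the routing step can be sketched as a pure function over a packet dictionary. The metadata value and the two MAC addresses come from the table=140 flow shown later in this article; the packet representation, the `ROUTES` table and the `l3_route` function are assumptions made for illustration.

```python
# destination IP -> (new metadata, MAC of the destination IP,
#                    gateway MAC of the destination subnet)
ROUTES = {
    "xx.xx.11.4": (0xD54DD00000002, "fa:16:3e:74:20:c6", "fa:16:3e:c4:ed:57"),
}


def l3_route(pkt, routes=ROUTES):
    """Return the rewritten packet, or None if it has to be dropped."""
    if pkt["ttl"] <= 1:
        return None  # TTL pre-check: 0 or 1 cannot be forwarded
    route = routes.get(pkt["ip_dst"])
    if route is None:
        return None  # no route to the destination
    new_metadata, host_mac, gateway_mac = route
    out = dict(pkt)
    out.update(
        metadata=new_metadata,  # switch to the destination subnet
        eth_dst=host_mac,       # was the source subnet's gateway MAC
        eth_src=gateway_mac,    # gateway of the destination subnet
        ttl=pkt["ttl"] - 1,     # dec_ttl
    )
    return out


if __name__ == "__main__":
    packet = {
        "eth_src": "fa:16:3e:17:4b:9d",
        "eth_dst": "fa:16:3e:ec:22:0d",
        "ip_dst": "xx.xx.11.4",
        "ttl": 64,
        "metadata": 0xD54DD00000001,
    }
    print(l3_route(packet))
```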

Rate limiting

Egress and ingress BPS/PPS rate limits are implemented with meter instances.
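The effect of a BPS or PPS limit can be illustrated with a token bucket. This is a generic sketch of the rate-limiting idea, not the OVS meter implementation; the `TokenBucket` class, its parameters and the numbers used are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: 'rate' tokens per second, at most 'burst' stored."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now, cost):
        """Refill for the elapsed time, then try to spend 'cost' tokens."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: the packet would be dropped


if __name__ == "__main__":
    # For a BPS limit the cost is the packet size; for PPS it is 1 per packet.
    bps = TokenBucket(rate=1000, burst=1500)
    print(bps.allow(0.0, 1500))
    print(bps.allow(0.0, 100))
    print(bps.allow(1.0, 1000))
```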

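Security groups (stateful supported)

Stateless security groups are implemented at port granularity; stateful ones are also supported, using ct state.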

ACL

Matches on source, destination, protocol number and so on, then allows or denies the packet.
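Priority-ordered matching can be sketched as below. The first rule mirrors the ICMP flow in the ACL scenario at the end of this article; the catch-all deny rule, the default of allowing traffic when nothing matches, and the `acl_verdict` function are assumptions made for illustration.

```python
import ipaddress

RULES = [
    # (priority, protocol or None for any, source network, destination network, action)
    (55533, "icmp", "10.2.2.11/32", "10.2.1.11/32", "allow"),
    (100, None, "0.0.0.0/0", "0.0.0.0/0", "deny"),  # hypothetical catch-all
]


def acl_verdict(proto, src, dst, rules=RULES):
    """Return the action of the highest-priority matching rule."""
    for _prio, r_proto, r_src, r_dst, action in sorted(
            rules, key=lambda rule: rule[0], reverse=True):
        if r_proto is not None and r_proto != proto:
            continue
        if ipaddress.ip_address(src) not in ipaddress.ip_network(r_src):
            continue
        if ipaddress.ip_address(dst) not in ipaddress.ip_network(r_dst):
            continue
        return action
    return "allow"  # assumption: with no matching rule the packet continues


if __name__ == "__main__":
    print(acl_verdict("icmp", "10.2.2.11", "10.2.1.11"))
    print(acl_verdict("tcp", "10.2.2.11", "10.2.1.11"))
```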

NAT

Access to some local services uses NAT. In OVS, NAT is implemented through different parameters of the ct action.


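Main access scenarios

  • Same VPC, same subnet, same host
  • Same VPC, same subnet, across hosts
  • Same VPC, across subnets, same host
  • Same VPC, across subnets, across hosts
  • Cross-VPC L3 access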

Scenario walkthroughs

Same subnet, across nodes

Topology
Node1 (10.23.10.6) accesses Node2 (10.23.10.4)
Node1:
IP: xx.xx.10.6
MAC: fa:16:3e:17:4b:9d
Representor port: port-xxxxxq2py2
vtep: 10.24.40.67
Node2:
IP: xx.xx.10.4
MAC: fa:16:3e:46:09:25
Representor port: port-yyyyyv66x7
vtep: xx.xx.40.70
Flow tables on Node1 (sender)
ARP handling

table=xx

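All ARP requests are answered by a local proxy (ARP is not analyzed again below). How it works: the flow rewrites sha, spa, tha, tpa, arp_op and the Ethernet addresses to turn the request into a reply.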

cookie=0x170a30c1320ce4af, table=xx, priority=100,arp,metadata=0x47d100000000,arp_tpa=xx.xx.100.7,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],set_field:fa:16:3e:0c:02:73->eth_src,set_field:2->arp_op,set_field:xx.xx.100.7->arp_spa,set_field:fa:16:3e:0c:02:73->arp_sha,IN_PORT
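The action list of this flow can be read as a function from request to reply. The sketch below mirrors the actions one by one on a plain dictionary; the dictionary layout, the requester's addresses in the example and the `arp_proxy_reply` function are hypothetical, while the answered MAC and the masked target address are taken from the flow.

```python
def arp_proxy_reply(request, answer_mac):
    """Build the proxy ARP reply that the flow's actions produce."""
    return {
        "eth_dst": request["eth_src"],  # move eth_src -> eth_dst
        "eth_src": answer_mac,          # set_field ...->eth_src
        "arp_op": 2,                    # set_field:2->arp_op (reply)
        "arp_tha": request["arp_sha"],  # move arp_sha -> arp_tha
        "arp_tpa": request["arp_spa"],  # move arp_spa -> arp_tpa
        "arp_sha": answer_mac,          # set_field ...->arp_sha
        "arp_spa": request["arp_tpa"],  # the address that was asked for
    }


if __name__ == "__main__":
    # Requester values are made up for the example.
    request = {
        "eth_src": "fa:16:3e:00:00:01",
        "eth_dst": "ff:ff:ff:ff:ff:ff",
        "arp_op": 1,
        "arp_sha": "fa:16:3e:00:00:01",
        "arp_spa": "xx.xx.100.5",
        "arp_tha": "00:00:00:00:00:00",
        "arp_tpa": "xx.xx.100.7",
    }
    print(arp_proxy_reply(request, "fa:16:3e:0c:02:73"))
```

The reply is then sent back out of the port it arrived on, which is what the final `IN_PORT` action does.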
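IP handling

table=0 Traffic entry point. Flows are split by ingress port and the relevant registers are set: reg5 is the VNI, reg6 is the ingress port number, reg9 is the VM id, and metadata is tun_id + subnet id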

cookie=0x170a32cec8c3e4c1, priority=100,in_port="port-xxxxxq2py2" actions=set_field:0xd54dd->reg5,set_field:0x4->reg6,set_field:0x64->reg9,write_metadata:0xd54dd00000001,goto_table:1
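Judging from the values in these dumps, the metadata field appears to carry the VNI in its upper bits and the subnet id in its lower 32 bits (`0xd54dd00000001` pairs VNI `0xd54dd` with subnet 1). That layout is an inference from the dumps, not something the controller documents here, and the helper names below are hypothetical.

```python
SUBNET_BITS = 32  # assumed width of the subnet id, inferred from the dumps


def split_metadata(metadata):
    """Split a metadata value into (vni, subnet_id)."""
    return metadata >> SUBNET_BITS, metadata & ((1 << SUBNET_BITS) - 1)


def join_metadata(vni, subnet_id):
    """Build a metadata value from a VNI and a subnet id."""
    return (vni << SUBNET_BITS) | subnet_id


if __name__ == "__main__":
    vni, subnet_id = split_metadata(0xD54DD00000001)
    print(hex(vni), subnet_id)
    print(hex(join_metadata(0xD54DD, 2)))
```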

table=1 All IP packets jump to rate limiting

cookie=0x170a380bc44c5c23, table=1, priority=50 actions=goto_table:5

table=5 Egress BPS rate limit; no rate-limit rule applies here

cookie=0x170a380bc44c5b6b, table=5, priority=100 actions=goto_table:6

table=6 Egress PPS rate limit; no rate-limit rule

cookie=0x170a380bc44c5b7f, table=6, priority=100 actions=goto_table:10

table=10 Port and MAC binding: reg6 is the port's ofport number and dl_src is the source MAC on the host side

cookie=0x170a32cec8c3e501, table=10, priority=1000,ip,reg6=0x4,dl_src=fa:16:3e:17:4b:9d actions=goto_table:20

table=20 Egress pre-CT: ICMP packets are sent into ct, and the zone is selected by the source port's ofport number

cookie=0x170a380bc44c5c9f, table=20, priority=58000,icmp actions=ct(table=25,zone=NXM_NX_REG6[0..15])

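table=25 Egress ct-state matching: the port number selects the zone and the flow matches ct_state +new+trk. When the zone and state are as expected, the connection is committed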

cookie=0x170a32cec8c3e4ed, table=25, priority=39799,ct_state=+new-est-rel-inv+trk,ip,reg6=0x4 actions=ct(commit,table=30,zone=NXM_NX_REG6[0..15])

table=30 Is this traffic for a local service? See the full dump of table 30 for the services; not involved for this ICMP case

cookie=0x170a3983468be7c9, table=30, priority=50 actions=goto_table:60

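table=60 Matches reg5 (VNI) and the destination MAC to choose the tunnel encapsulation, and sets the egress port to vxlan1; the port-number mapping can be checked with ovs-ofctl show br-int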

cookie=0x170a3983468be931, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd->tun_id,set_field:xx.xx.40.70->tun_dst,set_field:0x2->reg7,goto_table:80

table=80 Keyed on the egress port; stores the VM id

cookie=0x170a3983468beaf9, table=80, priority=1000,reg7=0x4 actions=set_field:0x64->reg9,goto_table:81

table=81 svc probe; not involved, skipped

cookie=0x170a3983468be853, table=81, priority=100 actions=goto_table:85

table=85 Ingress BPS; not involved

cookie=0x170a3983468be84b, table=85, priority=100 actions=goto_table:86

table=86 Ingress PPS; not involved

cookie=0x170a3983468be8db, table=86, priority=100 actions=goto_table:90

table=90 Send the packet out of the egress port; in this case, the vxlan port

cookie=0x170a3983468be837, table=90, priority=1000 actions=output:NXM_NX_REG7[]
Flow tables on Node2 (receiver)

table=0 Cross-node receive: packets always arrive on the vxlan port

cookie=0x170a277782792425, priority=1000,in_port=vxlan1 actions=goto_table:50

table=50 Matches the tunnel id and destination MAC and sets the registers for the receive-side pipeline: metadata is tun_id + subnet id, reg5 is the VNI

cookie=0x170a2777827925f9, table=50, priority=50,tun_id=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd00000001->metadata,set_field:0xd54dd->reg5,resubmit(,30)

table=30 Local service access? Not involved

cookie=0x170a2777827923e7, table=30, priority=50 actions=goto_table:60

table=60 Matches tun_id and MAC for the L2 forwarding lookup

cookie=0x170a277782792601, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd->tun_id,set_field:0x6->reg7,goto_table:70

table=70

cookie=0x170a277782792393, table=70, priority=58000,icmp actions=ct(table=75,zone=NXM_NX_REG7[0..15])

table=75 Matches ct state: the first packet matches +trk+new, later packets match +trk+est

**First packet, matches +trk+new**
cookie=0x170a2777827925eb, duration=33902.506s, table=75, n_packets=9144, n_bytes=895680, idle_age=4, priority=39819,ct_state=+new-est-rel-inv+trk,ip,reg7=0x6 actions=ct(commit,table=80,zone=NXM_NX_REG7[0..15])

**Later packets match +trk+est**
cookie=0x170a2777827923ed, table=75, priority=60000,ct_state=-new+est-rel-inv+trk actions=goto_table:80 

table=80 Matches the egress port; the packet will leave through that port, here the interface with ofport=6

cookie=0x170a27778279262d, table=80, priority=1000,reg7=0x6 actions=set_field:0x64->reg9,goto_table:81

table=81 probe svc; not involved

cookie=0x170a2777827924a3, table=81, priority=100 actions=goto_table:85

table=85

cookie=0x170a277782792431, table=85, priority=100 actions=goto_table:86

table=86

cookie=0x170a27778279241d, table=86, priority=100 actions=goto_table:90

table=90

cookie=0x170a27778279239d, table=90, priority=1000 actions=output:NXM_NX_REG7[]
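When reading a dump like the one above, it helps to follow the table-to-table hand-offs mechanically. The sketch below extracts the next table from an action string (`goto_table:N`, `resubmit(,N)` or `ct(...,table=N,...)`) and walks the Node2 receive path. The action strings are copied from the flows above with one flow per table; `next_table`, `trace` and `NODE2_RX` are hypothetical helpers, and a real trace would also have to evaluate matches and priorities, which this sketch ignores.

```python
import re

# Matches the three ways the flows in this article hand a packet on.
NEXT_TABLE = re.compile(r"(?:goto_table:|resubmit\(,|ct\([^)]*?table=)(\d+)")

# One flow per table, taken from the Node2 receive path above.
NODE2_RX = {
    0: "goto_table:50",
    50: "set_field:0xd54dd00000001->metadata,set_field:0xd54dd->reg5,resubmit(,30)",
    30: "goto_table:60",
    60: "set_field:0xd54dd->tun_id,set_field:0x6->reg7,goto_table:70",
    70: "ct(table=75,zone=NXM_NX_REG7[0..15])",
    75: "ct(commit,table=80,zone=NXM_NX_REG7[0..15])",
    80: "set_field:0x64->reg9,goto_table:81",
    81: "goto_table:85",
    85: "goto_table:86",
    86: "goto_table:90",
    90: "output:NXM_NX_REG7[]",
}


def next_table(actions):
    """Return the table the actions continue in, or None if they end here."""
    found = NEXT_TABLE.search(actions)
    return int(found.group(1)) if found else None


def trace(flows, start=0):
    """Follow the hand-offs from 'start' until the pipeline ends or loops."""
    path, table = [], start
    while table is not None and table not in path:
        path.append(table)
        table = next_table(flows.get(table, ""))
    return path


if __name__ == "__main__":
    print(" -> ".join(str(t) for t in trace(NODE2_RX)))
```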

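Scenario: cross-subnet access

Topology
Node1 (xx.xx.10.6) accesses Node2 (xx.xx.11.4)
Node1:
IP: xx.xx.10.6
MAC: fa:16:3e:17:4b:9d
Next hop: xx.xx.10.1 (fa:16:3e:ec:22:0d)
Representor port: port-xxxxxq2py2 (the representor of ens4)
vtep: xx.xx.40.67
Node2:
IP: xx.xx.11.4
MAC: fa:16:3e:74:20:c6
Next hop: xx.xx.11.1 (fa:16:3e:c4:ed:57)
Representor port: port-2zbgfw4f26
vtep: xx.xx.40.70
Flow tables on Node1 (sender)

table=0 Traffic entry point. Flows are split by ingress port and the relevant registers are set: reg5 is the VNI, reg6 is the ingress port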

cookie=0x170a48c6658cc733, priority=100,in_port="port-xxxxxq2py2" actions=set_field:0xd54dd->reg5,set_field:0x4->reg6,set_field:0x64->reg9,write_metadata:0xd54dd00000001,goto_table:1

table=1 All IP packets jump to rate limiting

cookie=0x170a48c6658cc42d, table=1, priority=50 actions=goto_table:5

table=5 Egress BPS rate limit; no rate-limit rule applies here

cookie=0x170a48c6658cc435, table=5, priority=100 actions=goto_table:6

table=6 Egress PPS rate limit; no rate-limit rule

cookie=0x170a48c6658cc4d5, table=6, priority=100 actions=goto_table:10

table=10 Port and MAC binding: reg6 is the port's ofport number and dl_src is the source MAC on the host side

cookie=0x170a48c6658cc74d, duration=1362.918s, table=10, n_packets=13516, n_bytes=1469823, idle_age=1, priority=1000,ip,reg6=0x4,dl_src=fa:16:3e:17:4b:9d actions=goto_table:20

table=20

cookie=0x170a48c6658cc4b5, table=20, priority=58000,icmp actions=ct(table=25,zone=NXM_NX_REG6[0..15])

table=25

**First packet**
cookie=0x170a48c6658cc73b, table=25, priority=39799,ct_state=+new-est-rel-inv+trk,ip,reg6=0x4 actions=ct(commit,table=30,zone=NXM_NX_REG6[0..15])

**Later packets**
cookie=0x170a48c6658cc4fb, table=25, priority=60000,ct_state=-new+est-rel-inv+trk actions=goto_table:30

table=30

cookie=0x170a48c6658cc4a5, table=30, priority=50 actions=goto_table:60

table=60

cookie=0x170a48c6658cc5fb, table=60, priority=100,metadata=0xd54dd00000001,dl_dst=fa:16:3e:ec:22:0d actions=goto_table:100

table=100 L3 forwarding entry: the destination IP is not a local one, so go on to pre-routing

cookie=0x170a48c6658cc4e1, table=100, priority=50,ip actions=goto_table:110

table=110 Before routing: drop the packet if the TTL is 0 or 1, otherwise continue

cookie=0x170a4b2b174d5e1d, table=110, priority=100 actions=goto_table:120

table=120 ACL matching; no rules, skipped

cookie=0x170a4b2b174d5e1f, table=120, priority=50 actions=goto_table:130

table=130 The destination falls within xx.xx.0.0/16, so go on to the specific-route lookup

cookie=0x170a4b2b174d5f1d, duration=180.589s, table=130, n_packets=4860, n_bytes=733806, idle_age=1, priority=10016,ip,metadata=0xd54dd00000001,nw_dst=xx.xx.0.0/16 actions=goto_table:140

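table=140 Specific-route lookup: reg5 selects the overlay L2 domain and the destination IP selects the route. The flow rewrites metadata to the destination subnet, sets the destination MAC to the MAC of the destination IP (it was the gateway MAC), sets the source MAC to the gateway MAC of the destination subnet, decrements the TTL, and jumps to post-routing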

cookie=0x170a4b2b174d5ed3, table=140, priority=100,ip,reg5=0xd54dd,nw_dst=xx.xx.11.4 actions=set_field:0xd54dd00000002->metadata,set_field:fa:16:3e:74:20:c6->eth_dst,set_field:fa:16:3e:c4:ed:57->eth_src,dec_ttl,goto_table:160

table=160 Post-routing has no actions; skipped

cookie=0x170a4b2b174d5d6b, table=160, priority=50 actions=resubmit(,170)

table=170 After the route lookup, L2 forwarding runs again, i.e. the egress port is looked up by destination MAC

cookie=0x170a4b2b174d5db3, table=170, priority=50 actions=resubmit(,30)

table=30 Not a local service; go straight to the MAC table lookup

cookie=0x170a4c013de71d19, table=30, priority=50 actions=goto_table:60

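table=60 Tunnel encapsulation based on the VNI and the destination MAC (the real MAC of the destination IP); note the value written to reg7, which is the egress port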

cookie=0x170a4c013de71e81, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:74:20:c6 actions=set_field:0xd54dd->tun_id,set_field:xx.xx.40.70->tun_dst,set_field:0x2->reg7,goto_table:80

table=80

cookie=0x170a4c013de71d77, table=80, priority=1000,reg7=0x2 actions=output:vxlan1
Flow tables on Node2 (receiver)

table=0

cookie=0x170d7535048f9acb, priority=1000,in_port=vxlan1 actions=goto_table:50

table=50 L3 lookup

cookie=0x170d7535048f9cc3, table=50, priority=50,tun_id=0xd54dd,dl_dst=fa:16:3e:c4:ed:57 actions=set_field:0xd54dd00000002->metadata,set_field:0xd54dd->reg5,goto_table:140

table=140 Look up the destination gateway

cookie=0x170d7535048f9cd9, table=140, priority=100,ip,reg5=0xd54dd,nw_dst=xx.xx.11.4 actions=set_field:0xd54dd00000002->metadata,set_field:fa:16:3e:74:20:c6->eth_dst,set_field:fa:16:3e:c4:ed:57->eth_src,dec_ttl,goto_table:160

table=160 postrouting

cookie=0x170d7535048f9acf, table=160, priority=50 actions=resubmit(,170)

table=170 ingress acl

cookie=0x170d7535048f9b29, table=170, priority=50 actions=resubmit(,30)

table=30

cookie=0x170d7535048f9afd, table=30, priority=50 actions=goto_table:60

table=60

 cookie=0x170d7535048f9e0d, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:74:20:c6 actions=set_field:0xd54dd->tun_id,set_field:0x7->reg7,goto_table:70

table=70

cookie=0x170d7535048f9bd5, table=70, priority=58000,tcp actions=ct(table=75,zone=NXM_NX_REG7[0..15])

table=75

cookie=0x170d7535048f9df5, table=75, priority=39819,ct_state=+new-est-rel-inv+trk,ip,reg7=0x6 actions=ct(commit,table=80,zone=NXM_NX_REG7[0..15])

table=80

cookie=0x170d7535048f9dfb, table=80, priority=1000,reg7=0x6 actions=set_field:0x64->reg9,goto_table:81

table=81

cookie=0x170d7535048f9b91, table=81, priority=100 actions=goto_table:85

table=85

cookie=0x170d7535048f9be1, table=85, priority=100 actions=goto_table:86

table=86

cookie=0x170d7535048f9b49, table=86, priority=100 actions=goto_table:90

table=90

cookie=0x170d7535048f9bef, table=90, priority=1000 actions=output:NXM_NX_REG7[]

Scenario: accessing local services

Accessing the metadata endpoint

cookie=0x170b616612e8b709, table=10, priority=2000,tcp,nw_dst=169.254.169.254,tp_dst=8000 actions=goto_table:30
cookie=0x170b616612e8b781, table=45, priority=100,tcp,nw_dst=169.254.169.254,tp_dst=8000 actions=set_field:fa:16:3e:25:fd:7e->eth_dst,set_field:8111->tcp_dst,goto_table:46
cookie=0x170b616612e8b759, table=46, priority=2000,tcp,nw_dst=169.254.169.254,tp_dst=8111 actions=move:NXM_NX_REG6[]->NXM_OF_IP_SRC[],set_field:128.0.0.0/16->ip_src,output:1

Return path

cookie=0x170b616612e8b859, priority=100,in_port=1 actions=goto_table:47
cookie=0x170b616612e8bd73, table=47, priority=100,tcp,nw_dst=128.0.0.2,tp_src=8111 actions=set_field:169.254.169.254->ip_src,set_field:xx.xx.100.5->ip_dst,set_field:8000->tcp_src,output:"port-17icfhnrgo"

Scenario: port-based stateless security group

table=70

cookie=0x170b616612e8bdc1, table=70, priority=39800,ip,reg7=0x4 actions=set_field:0x46->reg8,goto_table:200
cookie=0x170b616612e8bdc5, table=70, priority=39800,ipv6,reg7=0x4 actions=set_field:0x46->reg8,goto_table:200

Scenario: port-based stateful security group

cookie=0x170b616612e8becb, table=75, priority=39800,ct_state=+new-est-rel-inv+trk,ip,reg7=0xd actions=set_field:0x4b->reg8,goto_table:200
cookie=0x170b616612e8bf11, table=75, priority=39800,ct_state=+new-est-rel-inv+trk,ipv6,reg7=0xd actions=set_field:0x4b->reg8,goto_table:200

Scenario: NAT

cookie=0x170b616612e8bf69, table=44, priority=2000,ct_state=+new-est-rel-inv+trk,tcp,reg6=0x9,tp_dst=20048 actions=encap(tcp_option(tlv(254,0x0a156b03000047d1))),ct(commit,table=80,nat(src=xx.xx.9.207,random))
cookie=0x170b616612e8c0f1, table=44, priority=2000,ct_state=+new-est-rel-inv+trk,tcp6,reg6=0xa,tp_dst=20048 actions=encap(tcp_option(tlv(254,0x010000000007000000123d2100080041))),ct(commit,table=80,nat(src=240e:108:4:200:1:2:0:70f,random))

Scenario: rate limiting

table=85, priority=1000,reg9=0x64 actions=meter:101,goto_table:86
table=86, priority=1000,reg9=0x64 actions=meter:102,goto_table:90

Scenario: ACL

table=170,priority=55533,icmp,metadata=0x2076370000000a,nw_src=10.2.2.11,nw_dst=10.2.1.11 actions=resubmit(,30)  
table=170, priority=24535,icmp6,metadata=0x1cc23500000000 actions=set_field:0xaa->reg8,goto_table:200
