fd.io Intro – OpNFV fd.io Day


fd.io Intro
Ed Warnicke

fd.io Foundation 1
Evolution of Programmable Networking
• Many industries are transitioning to a more dynamic model to deliver network services
• The great unsolved problem is how to deliver network services in this more dynamic environment
• Inordinate attention has been focused on the non-local network control plane (controllers)
  • Necessary, but insufficient
• There is a giant gap in the capabilities that foster delivery of dynamic Data Plane Services

(Diagram: Cloud – SDN – NFV – Programmable Data Plane)
fd.io Foundation 2
Introducing Fast Data: fd.io
• New project in Linux Foundation
• Multi-party
  • What does multi-party mean?
    • Multiple members – open to all
• Multi-project
  • What does multi-project mean?
    • Multiple subprojects
    • Subproject autonomy
    • Cross-project synergy
    • Open to new subprojects
      • Anyone can propose a subproject
    • Allows for innovation

fd.io Charter
• Create a platform that enables Data Plane Services that are:
  • Highly performant
  • Modular and extensible
  • Open source
  • Interoperable
  • Multi-vendor
• Platform fosters innovation and synergistic interoperability between Data Plane Services
• Source of Continuous Integration resources for Data Plane services based on the Consortium's projects/subprojects
• Meet the functionality needs of developers, deployers, datacenter operators
fd.io Foundation 3
Fast Data Scope
• IO
  • Hardware/vHardware <-> cores/threads
• Processing
  • Classify
  • Transform
  • Prioritize
  • Forward
  • Terminate
• Management Agents
  • Control/manage IO/Processing

(Diagram: Bare Metal/VM/Container stack – Management Agent over Processing over IO)
fd.io Foundation 4
fd.io Members

fd.io Foundation 5
fd.io Projects
Legend: New Projects / Core Projects

Management Agent
• Honeycomb

Packet Processing
• VPP (core)
• NSH_SFC, ONE, VPP Sandbox, TLDK

Network IO

Testing/Performance/Support
• CSIT
• deb_dpdk
fd.io Foundation 6
Governance – At a Glance
Anyone May Participate – Not just members
• Anyone can contribute code
• Anyone can rise to being a committer via meritocracy
• Anyone can propose a subproject

Subprojects:
• Composed of the committers to that subproject – those who can merge code
• Responsible for subproject oversight and autonomous releases
• Make technical decisions for that subproject by consensus, or failing that, majority vote

Technical Steering Committee
• Fosters collaboration among subprojects, but is not involved in day-to-day management of subprojects
• Approves new subprojects, sets development process guidelines for the community, sets release guidelines for multi-project or simultaneous releases, etc.
• Initial TSC will be seeded with representatives from Platinum Membership and core project PTLs, with the goal of replacing representatives with Project Leads after the first year

Governing Board will Oversee Business Decision Making
• Sets scope and policy of the Consortium
• Composed of Platinum member appointees and elected Gold, Silver, and Committer member representatives
• Examples of business needs include: budgeting, planning for large meetings (e.g. a Summit, Hackfest), marketing, websites, developer infrastructure, test infrastructure, etc.

fd.io Foundation 7
VPP:
Vector Packet Processing

fd.io Foundation 8
Introducing Vector Packet Processor - VPP
• VPP is a rapid packet processing development platform for high-performance network applications
• It runs on commodity CPUs and leverages DPDK
• It creates a vector of packet indices and processes them using a directed graph of nodes – resulting in a highly performant solution
• Runs as a Linux user-space application
• Ships as part of both embedded & server products, in volume
• Active development since 2002

(Diagram: Bare Metal/VM/Container stack – Data Plane Management Agent over Packet Processing over Network IO)
fd.io Foundation 9
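The vector-of-packets idea above can be illustrated with a minimal, self-contained C sketch (conceptual only, not VPP source): each graph node processes the whole vector before the next node runs, which amortizes instruction-cache misses and per-node overhead across many packets. The node names mirror VPP's graph (ethernet-input, ip4-lookup, ip4-rewrite), but the code itself is a hypothetical illustration.

    /* Conceptual sketch (not VPP code): run a whole vector of packet indices
     * through each graph node in turn, instead of one packet at a time. */
    #include <stddef.h>

    #define VECTOR_SIZE 256

    typedef struct {
      unsigned int pkt_index[VECTOR_SIZE]; /* indices into a buffer pool */
      int n_packets;
    } frame_t;

    typedef void (*node_fn_t) (frame_t *frame);

    /* Stub nodes named after real VPP graph nodes; bodies are placeholders. */
    static void ethernet_input (frame_t *f) { (void) f; /* parse L2 headers for all packets */ }
    static void ip4_lookup     (frame_t *f) { (void) f; /* FIB lookup for all packets */ }
    static void ip4_rewrite    (frame_t *f) { (void) f; /* rewrite + enqueue to tx for all packets */ }

    int main (void)
    {
      node_fn_t graph[] = { ethernet_input, ip4_lookup, ip4_rewrite };
      frame_t frame = { .n_packets = 0 };

      /* In a real data plane the frame is filled from the NIC rx ring (omitted);
       * the whole vector then visits each node before moving on to the next. */
      for (size_t i = 0; i < sizeof (graph) / sizeof (graph[0]); i++)
        graph[i] (&frame);

      return 0;
    }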
VPP Architecture – Modularity Enabling Flexible Plugins

Plugins == Subprojects

Plugins can:
• Introduce new graph nodes
• Rearrange the packet processing graph
• Be built independently of the VPP source tree
• Be added at runtime (drop into the plugin directory)
• All in user space

Enabling:
• Ability to take advantage of diverse hardware when present (plug in to enable new HW input nodes)
• Support for multiple processor architectures (x86, ARM, PPC)
• Few dependencies on the OS (clib), allowing easier ports to other OSes/environments

(Diagram: a packet vector flowing through graph nodes – ethernet-input, mpls-ethernet-input, ip4-input, ip6-input, llc-input, arp-input, ip6-lookup, ip6-local, ip6-rewrite-transmit – with plugin-created nodes Custom-A and Custom-B inserted into the graph)
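As a rough illustration of what such a plugin graph node looks like, here is a hedged C skeleton modelled on VPP's node registration macro (VLIB_REGISTER_NODE). The node name "my-node", its choice of next nodes, and the empty function body are illustrative assumptions; a real node walks the buffer vector and enqueues packets to its next nodes, as the VPP sample plugin does.

    /* Hedged sketch of a VPP plugin graph node, modelled on the sample plugin.
     * Built against the VPP source tree; names marked below are illustrative. */
    #include <vlib/vlib.h>
    #include <vnet/vnet.h>

    static uword
    my_node_fn (vlib_main_t * vm, vlib_node_runtime_t * node,
                vlib_frame_t * frame)
    {
      /* A real node walks the vector of buffer indices in 'frame', inspects or
       * rewrites each packet, and hands it to a next node (typically with
       * vlib_get_next_frame / vlib_put_next_frame, as in the sample plugin). */
      return frame->n_vectors;
    }

    VLIB_REGISTER_NODE (my_node) = {
      .function = my_node_fn,
      .name = "my-node",                /* illustrative node name */
      .vector_size = sizeof (u32),      /* vector of 32-bit buffer indices */
      .type = VLIB_NODE_TYPE_INTERNAL,
      .n_next_nodes = 2,
      .next_nodes = {
        [0] = "ip4-lookup",             /* hand packets to existing graph nodes */
        [1] = "error-drop",
      },
    };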
VPP Feature Summary
IPv4/IPv6
• 14+ Mpps, single core
• Multimillion-entry FIBs
• Source RPF
• Thousands of VRFs – controlled cross-VRF lookups
• Multipath – ECMP and Unequal Cost
• Multiple million classifiers – arbitrary N-tuple
• VLAN support – single/double tag
• Counters for everything
• Mandatory input checks: TTL expiration, header checksum, L2 length < IP length, ARP resolution/snooping, ARP proxy

IPv4
• GRE, MPLS-GRE, NSH-GRE, VXLAN
• IPSEC
• DHCP client/proxy
• CG NAT

IPv6
• Neighbor discovery
• Router Advertisement
• DHCPv6 proxy
• L2TPv3
• Segment Routing
• MAP/LW46 – IPv4aaS
• iOAM

MPLS
• MPLS-o-Ethernet – deep label stacks supported

L2
• VLAN support – single/double tag
• L2 forwarding with EFP/Bridge Domain concepts
• VTR – push/pop/translate (1:1, 1:2, 2:1, 2:2)
• MAC learning – default limit of 50k addresses
• Bridging – split-horizon group support/EFP filtering
• Proxy ARP
• ARP termination
• IRB – BVI support with RouterMac assignment
• Flooding
• Input ACLs
• Interface cross-connect

fd.io Foundation 11
Code Activity
• fd.io has more code activity (commits, contributors) than other dataplane projects (data from OpenHub)

(Chart: commit and contributor activity for fd.io, OVS and DPDK)

fd.io Foundation 12
Contributor/Committer Diversity

(Chart: contributor/committer organizations, including Universitat Politècnica de Catalunya (UPC))

fd.io Foundation 13
VPP 16.06 Release
• Released 2016-06-17
• Enhanced Switching & Routing
  • IPv6 Segment Routing multicast support
  • LISP xTR support
  • VXLAN over IPv6 underlay
  • per-interface whitelists
  • shared adjacencies in FIB
• New and improved interface support
  • jumbo frame support for vhost-user
  • Netmap interface support
  • AF_Packet interface support
• Expanded and improved programmability
  • Python API bindings
  • Enhanced JVPP Java API bindings
  • Enhanced debugging CLI
• Expanded hardware and software support
  • Support for ARM 32 targets
  • Support for Raspberry Pi
  • Support for DPDK 16.04

fd.io Foundation 14
VPP Technology in a Nutshell

(Chart: NDR rates for 2p10GE, 1 core, L2 NIC-to-NIC [IMIX Gbps] – VPP vs. OVSDPDK with 2, 2k and 20k MACs)
(Chart: NDR rates for 12 port 10GE, 12 cores, IPv4 [IMIX Gbps] – VPP vs. OVSDPDK with 12, 1k, 100k, 500k, 1M and 2M routes; "not tested" marked for some OVSDPDK configurations)

• VPP data plane throughput not impacted by large FIB size
• OVSDPDK data plane throughput heavily impacted by FIB size
• VPP and OVSDPDK tested on Haswell x86 platform with E5-2698v3 2x16C 2.3GHz (Ubuntu 14.04 trusty)
VNET-SLA BENCHMARKING AT SCALE: IPV6 Phy-to-Phy
VPP-based vSwitch

(Chart: zero-packet-loss throughput for 12 port 40GE, 24 cores, IPv6 [Gbps] – 12, 1k, 100k, 500k, 1M and 2M routes at 1518B, IMIX and 64B frame sizes)
(Chart: zero-packet-loss throughput for 12 port 40GE, 24 cores, IPv6 [Mpps] – same route scales and frame sizes)

• FD.io VPP data plane throughput not impacted by large size of the IPv6 FIB
• VPP tested on UCS 4-CPU-socket server with 4 Intel "Haswell" x86-64 processors E7-8890v3 18C 2.5GHz
• 24 cores used – another 48 cores can be used for other network services!

VPP vSwitch IPv6 routed forwarding, FIB with 2 million IPv6 entries:
• 12x40GE (480GE), 64B frames: 200 Mpps, zero frame loss
• 12x40GE (480GE), IMIX frames: 480 Gbps, zero frame loss
• NIC and PCIe is the limit, not VPP – "Sky" is the limit, not VPP
VNET BENCHMARKING AT SCALE: IPV4 + SECURITY

(Chart: zero-packet-loss throughput for 18 port 40GE, 36 cores, IPv4 [Gbps and Mpps] – 1k, 500k, 1M, 2M, 4M and 8M routes at 1518B, IMIX and 64B frame sizes. That is right – no impact on IMIX and 1518B performance, even at 4M and 8M routes.)

• FD.io VPP data plane throughput not impacted by large size of the IPv4 FIB
• VPP tested on UCS 4-CPU server with 4x Intel E7-8890v3 (18C 2.5GHz)
• 36 cores used – NIC RSS=2 to drive NIC performance, VPP cores not busy!
• Another 36 cores available for other services!

VPP vSwitch IPv4 routed forwarding, FIB up to 8M IPv4 routes, 2k whitelist, zero-packet-loss measurements:
• 64B => 238 Mpps
• IMIX => 342 Gbps
• 1518B => 462 Gbps
• NIC and PCIe is the limit, not VPP – "Sky" is the limit, not VPP
VPP Cores Not Completely Busy – And How Do We Know This?
VPP vectors have space for more services and more packets!! PCIe 3.0 and the NICs are the limit.
Simple – a well-engineered telemetry in Linux and VPP tells us so:

========
TC5    120ge.vpp.24t24pc.ip4.cop
TC5.0    120ge.2pnic.6nic.rss2.vpp.24t24pc.ip4.cop
d. testcase-vpp-ip4-cop-scale
        120ge.2pnic.6nic.rss2.vpp.24t24pc.ip4.2m.cop.2.copip4dst.2k.match.100
            64B, 138.000Mpps, 92.736Gbps
            IMIX, 40.124832Mpps, 120.000Gbps
            1518, 9.752925Mpps, 120.000Gbps
            ---------------
            Thread 1 vpp_wk_0 (lcore 2)
            Time 45.1, average vectors/node 23.44, last 128 main loops 1.44 per node 23.00
              vector rates in 4.6791e6, out 4.6791e6, drop 0.0000e0, punt 0.0000e0
                         Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
            TenGigabitEtherneta/0/1-output   active            9003498       211054648               0          1.63e1           23.44
            TenGigabitEtherneta/0/1-tx       active            9003498       211054648               0          7.94e1           23.44
            cop-input                        active            9003498       211054648               0          2.23e1           23.44
            dpdk-input                       polling          45658750       211054648               0          1.52e2            4.62
            ip4-cop-whitelist                active            9003498       211054648               0          4.34e1           23.44
            ip4-input                        active            9003498       211054648               0          4.98e1           23.44
            ip4-lookup                       active            9003498       211054648               0          6.25e1           23.44
            ip4-rewrite-transit              active            9003498       211054648               0          3.43e1           23.44
            ---------------
            Thread 24 vpp_wk_23 (lcore 29)
            Time 45.1, average vectors/node 27.04, last 128 main loops 1.75 per node 28.00
              vector rates in 4.6791e6, out 4.6791e6, drop 0.0000e0, punt 0.0000e0
                         Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
            TenGigabitEthernet88/0/0-outpu   active            7805705       211055503               0          1.54e1           27.04
            TenGigabitEthernet88/0/0-tx      active            7805705       211055503               0          7.75e1           27.04
            cop-input                        active            7805705       211055503               0          2.12e1           27.04
            dpdk-input                       polling          46628961       211055503               0          1.60e2            4.53
            ip4-cop-whitelist                active            7805705       211055503               0          4.35e1           27.04
            ip4-input                        active            7805705       211055503               0          4.86e1           27.04
            ip4-lookup                       active            7805705       211055503               0          6.02e1           27.04
            ip4-rewrite-transit              active            7805705       211055503               0          3.36e1           27.04
VPP Cores Not Completely Busy – And How Do We Know This?
Simple – a well-engineered telemetry in Linux and VPP tells us so. VPP vectors have space for more services and more packets!! PCIe 3.0 and the NICs are the limit.

(Same test case and 'show runtime' output as on the previous slide.)

• The VPP average vector size shown in that output is 23 to 27. This indicates the VPP worker threads are not busy.
• Busy VPP worker threads would show 255 vectors per call, so these worker threads are operating at roughly 10% capacity.
• It's like driving a 1,000 hp car at 100 hp power – lots of room for adding (service) acceleration and (service) speed.
VPP Cores Not Completely Busy – And How Do We Know This?
Simple – a well-engineered telemetry in Linux and VPP tells us so. VPP vectors have space for more services and more packets!! PCIe 3.0 and the NICs are the limit.

(Same test case, 'show runtime' output and vector-size observations – average vector size 23 to 27, roughly 10% of capacity – as on the previous two slides.)
• VPP is also counting the cycles-per-packet (CPP).
• We know exactly what feature, service and packet-processing activity is using the CPU cores.
• We can engineer, we can capacity plan, we can automate service placement.
• We can scale across many, many CPU cores and computers.
• And AUTOMATE it easily – as it is, after all, just SOFTWARE.
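To make the utilisation arithmetic behind these annotations concrete, here is a minimal C sketch using the counters from the 'show runtime' output above (Vectors and Calls for the TenGigabitEtherneta/0/1-output node, against the 256-packet vector limit); it reproduces roughly the "10% capacity" figure quoted above.

    /* Quick check of the utilisation arithmetic: average vectors per call is
     * Vectors / Calls from 'show runtime'; utilisation is that average
     * relative to the 256-packet vector limit. */
    #include <stdio.h>

    int main (void)
    {
      double vectors = 211054648.0;  /* Vectors column, TenGigabitEtherneta/0/1-output */
      double calls   = 9003498.0;    /* Calls column, same node */

      double avg  = vectors / calls;        /* ~23.44 vectors per call           */
      double util = 100.0 * avg / 256.0;    /* ~9% of a full 256-packet vector   */

      printf ("avg vectors/call = %.2f (~%.0f%% of the vector limit)\n", avg, util);
      return 0;
    }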
ONE MORE THING – THE LONG-TERM MAX DELAY
The Fast Data Project (FD.io) – The Soak Test Proof: 18 x 7.7 trillion packets forwarded.

• Low long-term max packet delay with FD.io VPP
• >>120 msec long-term max packet delay measured by others for other vSwitches
• But it is just not there with VPP and stock Ubuntu 14.04 (no Linux tuning!)
• Max packet delay <3.5 msec, including the outliers!!
• Min packet delay 7..10 usec, avg packet delay <23 usec

(Chart: average, min and max packet delay)

Compute Node Software
• Host Operating System: Ubuntu 14.04.3 LTS, kernel version 3.13.0-63-generic
• DPDK: DPDK 2.2.0
• FD.io VPP: vpp v1.0.0-174~g57a90e5

Compute Node Hardware
• Compute Node: Cisco UCS C460 M4
• Chipset: Intel® C610 series chipset
• CPU: 4 x Intel® Xeon® Processor E7-8890 v3 (18 cores, 2.5GHz, 45MB cache)
• Memory: 2133 MHz, 512 GB total
• NICs: 9 x 2p40GE Intel XL710 – 18 x 40GE = 720GE !!
Implementation Example: VPP as a vRouter/vSwitch
Out-of-the-box vSwitch/vRouter
• Including CLI

Switching – can create:
• Bridge domains
• Ports (including tunnel ports)
• Connect ports to bridge domains
• Program ARP termination
• etc.

Routing – can create:
• VRFs – thousands
• Routes – millions

(Diagram: Linux Host running the VPP app – Switch-1/Switch-2, VRF-1/VRF-2 – over DPDK, alongside the kernel)
fd.io Foundation 22
VPP vRouter/vSwitch: Local Programmability
Low Level API
• Complete
• Feature rich
• High performance
  • Example: 900k routes/s
• Shared memory/message queue
• Box local
• All CLI tasks can be done via API

Generated low-level bindings – existing today
• C clients
• Java clients
• Others can be done

(Diagram: Linux Host – external app talking to the VPP app over shared memory, VPP over DPDK, alongside the kernel)
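As a hedged sketch of what a box-local C client of this low-level API might look like: it attaches to VPP's shared-memory API segment and then exchanges the generated binary-API messages (elided). The header paths, the default "/vpe-api" segment name and the exact connect signature are assumptions that may differ between VPP releases; treat this as an outline, not the definitive client API.

    /* Hedged outline of a box-local C client of VPP's shared-memory binary API.
     * Exact headers and signatures vary by release; see vpp_api_test for a
     * complete, authoritative client. */
    #include <vlibapi/api.h>
    #include <vlibmemory/api.h>

    int main (void)
    {
      /* Attach to VPP's API shared-memory segment (assumed default "/vpe-api"). */
      if (vl_client_connect_to_vlib ("/vpe-api", "example-client", 32) < 0)
        return 1;

      /* ... allocate a generated vl_api_*_t request with vl_msg_api_alloc(),
       *     fill it in, send it on the shared-memory queue, and read replies
       *     from the client rx queue (omitted here) ... */

      vl_client_disconnect_from_vlib ();
      return 0;
    }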

fd.io Foundation 23
VPP vRouter/vSwitch: Remote Programmability

High Level API: an approach
• Data Plane Management Agent
• Speaks the low-level API to VPP
• Box (or VM or container) local
• Exposes a higher-level API via some binding (e.g. netconf/yang, REST, other (BGP))

Flexibility:
• VPP does not force a particular Data Plane Management Agent
• VPP does not force only *one* High Level API
• Anybody can bring a Data Plane Management Agent
  • High Level API/Data Plane Management Agent
  • Match VPP app needs

(Diagram: Linux Host – Data Plane Management Agent speaking the low-level API to the VPP app, VPP over DPDK, alongside the kernel)
fd.io Foundation 24
Broader Ecosystem
fd.io Foundation 25
OpenDaylight Virtual Bridge Domain
• Working on installer changes needed to add ODL & VPP as the networking stack for OpenStack
• Provides a programmable interface to include 'edge' network programming into a virtual environment
• Allows for multiple data plane forwarders to be active
• Integration happening in the OpNFV "Fast Data Stacks" project

(Diagram: Control Plane – VBD app; Data Plane – OVS and VPP, with Honeycomb (HC) as the VPP agent)
fd.io Foundation 26
OpNFV FDS: Integrating OpenStack/ODL/fd.io

• Working on installer changes needed to add ODL & VPP as the networking stack for OpenStack
• Provides a programmable interface to include 'edge' network programming into a virtual environment
• Allows for multiple data plane forwarders to be active
• Integration happening in the OpNFV "Fast Data Stacks" project

(Diagram: Control Plane – OpenStack Neutron with ODL plugin; Data Plane – OVS and VPP, with Honeycomb (HC) as the VPP agent)

fd.io Foundation 27
Next Steps – Get Involved
We invite you to participate in fd.io

• Get the Code, Build the Code, Run the Code
• Read/Watch the Tutorials
• Join the Mailing Lists
• Join the IRC Channels
• Explore the wiki
• Join fd.io as a member

Come to the fd.io breakout session at 5pm in Bellevue

fd.io Foundation 28
fd.io Foundation 29
