
Cisco FabricPath

Technology and Design


Elyor Khakimov,
Technical Marketing Engineer
BRKDCT-2081
Session Abstract
“This session provides an introduction to Cisco's FabricPath technology, which
enables simplified high-performance L2 switching fabrics. Topics include a review
of FabricPath technology benefits, implementation details, and use-case/design
options and considerations.”
Session Goal
• To provide you with a conceptual and technical understanding of Cisco
FabricPath, including control-plane functions, data-plane forwarding model, and
typical network designs

• Related Session:
- BRKDCT-3313 – FabricPath Operation and Troubleshooting
Agenda
Introduction to FabricPath
• FabricPath Forwarding
• FabricPath Designs
• Key Takeaways
Introduction to FabricPath
Why Layer 2 in the Data Center?
• Provides “plug-and-play” setup
• Certain protocols / applications require it
• Allows virtual machine / workload mobility
Typical Data Center Design

[Diagram: three PODs (A, B, C), each an access/aggregation block; the L2/L3 boundary sits at the top of each POD, with a routed core above]

Layer 2 benefits limited to a POD
Possible Solution for End-to-End L2?

[Diagram: the L2/L3 boundary pushed up to the core, with STP spanning the entire network]

Just extend STP to the whole network (!?)
Limitations of Traditional Layer 2
• Local problems have network-wide impact
• Tree topology provides limited bandwidth
• Tree topology introduces sub-optimal paths
• MAC address tables don’t scale
Cisco FabricPath Goal

Switching: Easy Configuration | Plug-and-Play | Flexible Provisioning
Routing:   Stable and Scalable | Multipathing (ECMP) | Fast Convergence

FabricPath combines the benefits of Layer 3 routing with the simplicity of Layer 2 switching
Why FabricPath?
• Reduction / elimination of Spanning Tree Protocol (STP)
• Better stability and convergence characteristics
• Simplified configuration
• Leverages parallel paths
• Deterministic throughput and latency using typical designs
• “VLAN anywhere” – flexibility, L2 adjacency, and VM mobility
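For reference, enabling a basic FabricPath fabric takes only a few commands. A minimal sketch follows; the switch ID, VLAN, and interface numbers are illustrative, not requirements:

! Minimal FabricPath enablement (illustrative values)
install feature-set fabricpath
feature-set fabricpath

fabricpath switch-id 100        ! optional - auto-assigned if omitted

vlan 10
  mode fabricpath               ! carry VLAN 10 across the fabric

interface port-channel 10
  switchport mode fabricpath    ! FabricPath core port (IS-IS runs here)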
Agenda
• Introduction to FabricPath
FabricPath Forwarding
• FabricPath Designs
• Key Takeaways
FabricPath Forwarding
FabricPath Forwarding – Control Plane
Key FabricPath control plane elements:
• Routing table – FabricPath IS-IS learns switch IDs (SIDs) and builds routing
table
• Multidestination trees – FabricPath IS-IS elects roots and builds
multidestination forwarding trees
• Mroute table – IGMP snooping learns group membership at the edge,
FabricPath IS-IS floods group-membership LSPs (GM-LSPs) into the fabric
FAQ: Is This MAC-Based Routing?
• NO!
• Routing information consists of Switch IDs
• Forwarding in fabric based on Switch IDs, not MAC addresses
FabricPath Routing Table
• Contains shortest path(s) to each SID, based on link metrics / path cost
• Equal-cost multipath (ECMP) supported on up to 16 next-hop interfaces

[Diagram: spines S10–S40 fully meshed to leaves S100, S200, S300; S100’s uplinks to the four spines are L1–L4]

Routing table on S100:

  Switch   IF
  S10      L1                  <- one ‘best’ path to S10 (via L1)
  S20      L2
  S30      L3
  S40      L4
  S200     L1, L2, L3, L4
  S300     L1, L2, L3, L4      <- four equal-cost paths to S300
  …        …
FabricPath Routing Table
S100# sh fabricpath route
FabricPath Unicast Route Table
'a/b/c' denotes ftag/switch-id/subswitch-id
'[x/y]' denotes [admin distance/metric]
ftag 0 is local ftag
subswitch-id 0 is default subswitch-id

FabricPath Unicast Route Table for Topology-Default

0/100/0, number of next-hops: 0
        via ---- , [60/0], 0 day/s 04:43:51, local
1/10/0, number of next-hops: 1
        via Po10, [115/20], 0 day/s 02:24:02, isis_fabricpath-default
1/20/0, number of next-hops: 1
        via Po20, [115/20], 0 day/s 04:43:25, isis_fabricpath-default
1/30/0, number of next-hops: 1
        via Po30, [115/20], 0 day/s 04:43:25, isis_fabricpath-default
1/40/0, number of next-hops: 1
        via Po40, [115/20], 0 day/s 04:43:25, isis_fabricpath-default
1/200/0, number of next-hops: 4
        via Po10, [115/40], 0 day/s 02:24:02, isis_fabricpath-default
        via Po20, [115/40], 0 day/s 04:43:06, isis_fabricpath-default
        via Po30, [115/40], 0 day/s 04:43:06, isis_fabricpath-default
        via Po40, [115/40], 0 day/s 04:43:06, isis_fabricpath-default
1/300/0, number of next-hops: 4
        via Po10, [115/40], 0 day/s 02:24:02, isis_fabricpath-default
        via Po20, [115/40], 0 day/s 04:43:25, isis_fabricpath-default
        via Po30, [115/40], 0 day/s 04:43:25, isis_fabricpath-default
        via Po40, [115/40], 0 day/s 04:43:25, isis_fabricpath-default
S100#

Reading the output: ‘a/b/c’ encodes topology (Ftag) / Switch ID / sub-switch ID; ‘[x/y]’ encodes administrative distance / routing metric; each ‘via’ line shows the next-hop interface, route age, and client protocol.

[Diagram: S100 (hosting A), S200 (B), S300 (C) uplinked via po10–po40 to spines S10–S40]
FAQ: How Are ECMP Load-Sharing Decisions Made?
• ECMP path chosen based on a hash function
• Hash uses SIP/DIP + L4 + VLAN by default
• Use show fabricpath load-balance unicast to determine the ECMP path for a given packet
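As an illustration, one possible invocation on a Nexus 7000 is sketched below; the flow parameters are assumptions for the example, and the exact option syntax varies by platform, module, and release:

S100# show fabricpath load-balance unicast forwarding-path ftag 1 switchid 300 flow-type l3 src-ip 10.1.1.10 dst-ip 10.1.1.30 module 2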
FabricPath Multidestination Trees
• Multidestination traffic constrained to tree topologies
  – Network-wide identifier (Ftag) assigned to each tree
• Support for multiple trees provides multipathing for multidestination traffic
  – Two trees per topology supported today
• Root switch elected for each multidestination tree in each FabricPath topology

[Diagram: S10 is root for Tree 1 and S20 is root for Tree 2; logical Tree 1 (Ftag 1) and logical Tree 2 (Ftag 2) each touch every switch (S10–S40, S100–S300) over a different subset of links]
Multidestination Root Selection
• FabricPath network elects a primary root switch for the first multidestination tree
in the topology
• Switch with highest priority value becomes root for the tree
– Tie break: root priority → highest system ID → highest SID
• Primary root determines roots of additional trees and announces them in Router
Capability TLV
– Roots spread among available switches to balance load
FAQ: Trees? Roots? Sounds Like Spanning Tree…
• NO! – More like IP multicast routing
• Trees do NOT dictate forwarding path of unicast frames, only multidestination
frames
• Multiple trees allow load-sharing for any multidestination frames
• Control plane state further constrains IP multicast forwarding (based on mrouter
and receiver activity)
Best Practice: Identify the Roots
• Use the root-priority command to explicitly identify primary, secondary,
and tertiary root switches
• Optimizes forwarding paths for multidestination frames
• Simplifies troubleshooting
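A minimal sketch of explicit root assignment (priority values are illustrative; the highest value wins the election):

! On the intended primary root (e.g., spine S10)
fabricpath domain default
  root-priority 255

! On the intended secondary root (e.g., spine S20)
fabricpath domain default
  root-priority 254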
FabricPath Forwarding – Data Plane
Key FabricPath data plane elements:
• MAC table – Hardware performs MAC lookups at CE/FabricPath edge only
• Switch table – Hardware performs destination SID lookups to forward unicast
frames to other switches
• Multidestination table – Hash function selects tree, multidestination table
identifies on which interfaces to flood based on selected tree
FabricPath MAC Table
• Edge switches perform MAC table lookups on ingress frames
• Lookup result identifies output interface or destination FabricPath switch
FabricPath MAC Table
S100# sh mac address-table dynamic vlan 100
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False
VLAN  MAC Address     Type     age   Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 100  0000.0000.000a  dynamic  0     F      F    Eth2/13
* 100  0000.0279.5af1  dynamic  0     F      F    Eth2/17
* 100  0000.02b0.550e  dynamic  0     F      F    Eth2/17
  100  0000.09c3.dd9d  dynamic  0     F      F    300.0.230
  100  0000.0c9f.f001  dynamic  0     F      F    1100.0.65535
  100  0026.51bf.fb41  dynamic  0     F      F    20.0.1054
  100  0026.51cf.ae41  dynamic  660   F      F    10.0.1054
  100  0c00.060b.5481  dynamic  0     F      F    200.0.75
  100  0000.0000.000b  dynamic  0     F      F    200.0.72
* 100  1000.021a.ba87  dynamic  0     F      F    Eth2/17
  100  1400.0579.0395  dynamic  0     F      F    200.0.73
  100  1400.0640.5dc8  dynamic  0     F      F    200.0.74
  100  1400.092d.9cd4  dynamic  0     F      F    300.0.230
  100  1800.0536.0ded  dynamic  0     F      F    200.0.73
  100  1800.0992.3254  dynamic  0     F      F    300.0.230
* 100  1c00.0128.901e  dynamic  0     F      F    Eth2/22
  100  0000.0000.000c  dynamic  120   F      F    300.0.231
  100  1c00.0ae2.7579  dynamic  120   F      F    300.0.221
* 100  2000.0134.9a04  dynamic  0     F      F    Eth2/18
* 100  2400.029b.533e  dynamic  0     F      F    Eth2/17
--More--

Entries with a local interface (e.g., Eth2/13) are local MACs, directly connected to CE edge ports; entries of the form SWID.SSID.LID (e.g., 300.0.230) are remote MACs reached through FabricPath.

[Diagram: S100 (hosting A), S200 (B), S300 (C) uplinked via po10–po40 to spines S10–S40]
Forwarding through the Fabric – FabricPath Encapsulation

Classical Ethernet frame:
  DMAC | SMAC | 802.1Q | Etype | Payload | CRC

FabricPath frame (16 bytes added in front of the original CE frame, CRC recalculated):
  Outer DA (48) | Outer SA (48) | FP Tag (32) | DMAC | SMAC | 802.1Q | Etype | Payload | CRC (new)

Outer DA / Outer SA fields (48 bits each):
  Endnode ID (5:0), 6 bits | U/L, 1 bit | I/G, 1 bit | Endnode ID (7:6), 2 bits | OOO/DL, 1 bit | RSVD, 1 bit | Switch ID, 12 bits | Sub-Switch ID, 8 bits | LID, 16 bits

FP Tag fields (32 bits):
  Etype 0x8903, 16 bits | Ftag, 10 bits | TTL, 6 bits

• Switch ID – Unique number identifying each FabricPath switch
• Ftag (Forwarding tag) – Unique number identifying topology or multidestination tree
FabricPath Switch ID (SID)
• Every FabricPath switch automatically assigned a Switch ID
  – Optionally, network administrator can manually configure SIDs
• FabricPath network automatically detects conflicting SIDs and prevents data-path initialization on the violating switch
• Encoded in “Outer MAC addresses” of FabricPath MAC-in-MAC frames
Best Practice: Manually Assign SIDs
• Use the fabricpath switch-id command to manually assign Switch IDs
• Simplifies management and eases troubleshooting
• Enables deterministic numbering schemes, e.g.:
– Spine switches assigned two-digit SIDs
– Leaf switches assigned three-digit SIDs
– VPC+ virtual SIDs assigned four-digit SIDs
– etc.
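A sketch of the numbering scheme above (the values are examples, not requirements):

! Spine switch
S10(config)# fabricpath switch-id 10

! Leaf switch
S100(config)# fabricpath switch-id 100

! vPC+ virtual switch (configured under the vPC domain on both peers)
S100(config)# vpc domain 1
S100(config-vpc-domain)# fabricpath switch-id 1000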
FabricPath Forwarding Tag (Ftag)
• Forwarding tag – Unique 10-bit number encoded in FabricPath header
• Overloaded field that identifies FabricPath topology or multidestination tree
• For unicast packets, identifies which FabricPath IS-IS topology to use
• For multidestination packets (broadcast, multicast, unknown unicast), identifies
which multidestination tree to use
FAQ: What about VLANs in FabricPath?
• VLANs are still relevant in FabricPath!
• Every frame in the fabric carries an 802.1Q tag
• VLANs still define a broadcast domain in FabricPath – they define the scope of flooding
• FabricPath switches still look at the VLAN ID – including ‘core / spine’ switches
• Frames tagged with a VLAN ID that does not exist in the VLAN database are dropped
Best Practice: Configure All VLANs on All
Switches in Topology
• If a FabricPath switch belongs to a topology, configure all VLANs in that
topology on that switch
• Failure to do so can result in multidestination forwarding issues (black-holing)
• Deviate from this Best Practice carefully!
– Specific designs can work, but make sure to account for all failure cases!
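In practice this best practice reduces to repeating the same VLAN definition on every switch in the topology, for example (VLAN ranges illustrative):

! On every switch in the topology - spines included
vlan 100-199,2000-2099
  mode fabricpath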
FAQ: What about QoS in FabricPath?
• FabricPath hardware is designed to accommodate the FabricPath encapsulation
• All FabricPath-encapsulated frames carry 802.1Q/802.1p (COS)
• IP packets in FabricPath also carry DSCP
• For FabricPath-encapsulated frames, hardware can still:
– Queue based on COS/DSCP
– Match based on L2/L3/L4 header information
– Match/set DSCP
Putting It All Together – Host A to Host B
(1) Broadcast ARP Request

[Diagram: Host A (e1/13 on S100) broadcasts an ARP request for Host B (e2/29 on S300). S100 hashes the frame to Tree 1 and encapsulates it with outer DA FFFF.FFFF.FFFF, outer SA 100, Ftag 1, sending it on its Tree-1 port (po10). S10, root for Tree 1 with Tree-1 ports po100/po200/po300, floods it to the other edge switches, which decapsulate and flood it on their CE edge ports.]

MAC learning during the flood:
• S100 learns MAC A on e1/13 – edge switches learn MACs of directly connected devices unconditionally
• No other switch learns MAC A – switches don’t learn MACs from unknown/flood frames
MAC Address Tables After Broadcast ARP
• S100 – MAC A learned as local entry on e1/13:
S100# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN  MAC Address     Type     age   Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 10   0000.0000.000a  dynamic  0     F      F    Eth1/13
S100#

• S10 (and S20, S30, S40, S200, S300) – MAC A not learned on any other switch:
S10# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN  MAC Address     Type     age   Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
S10#
Broadcast Forwarding
• Ingress FabricPath switch determines which tree to use based on hash result
• Outer Destination MAC remains all-ones (same as Inner DMAC)
• Other FabricPath switches honor Tree ID selected by ingress switch (Tree 1 in
this case) – flood frame on all core ports belonging to selected tree
• Edge FabricPath switches remove FabricPath header and flood in VLAN
– Flood FabricPath encapsulated frame on other core ports as well, if necessary
FAQ: What Is the Destination SID for a Multidestination Frame?
• Broadcast – Copy inner DMAC to outer DMAC
• Multicast – Copy inner DMAC to outer DMAC
• Unknown Unicast – Use reserved multicast MAC “MC1” (010F.FFC1.01C0)
Putting It All Together – Host A to Host B
(2) Unicast ARP Reply

[Diagram: Host B unicasts the ARP reply to MAC A. S300’s MAC table has no entry for A (MISS), so the frame is flooded as unknown unicast: outer DA set to the reserved multicast MAC “MC1” (010F.FFC1.01C0), outer SA 300, Ftag 1 (tree chosen by hash on S300). The frame follows Tree 1 through the fabric. On S100 the DMAC lookup HITs (MAC A is local on e1/13), so S100 learns remote MAC B → SID 300 and delivers the decapsulated frame to Host A.]

Learning rule illustrated here: if the DMAC of a flood frame is a known local MAC, the receiving edge switch learns the remote SMAC; otherwise it does not.
MAC Address Tables After Unicast ARP Reply
• S100 – learns MAC B as remote entry reached through S300:
S100# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN  MAC Address     Type     age   Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 10   0000.0000.000a  dynamic  90    F      F    Eth1/13
  10   0000.0000.000b  dynamic  60    F      F    300.0.64
S100#

• S300 – MAC B learned as local entry on e2/29:
S300# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN  MAC Address     Type     age   Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 10   0000.0000.000b  dynamic  0     F      F    Eth2/29
S300#
Unknown Unicast Forwarding
• Ingress FabricPath switch determines which tree to use based on hash result
• Outer Destination MAC set to well-known “flood to fabric” multicast address
(MC1)*
• Other FabricPath switches honor Tree ID selected by ingress switch (Tree 1 in
this case) – flood frame on all core ports belonging to selected tree
• Edge FabricPath switches remove FabricPath header and flood in VLAN
– Flood FabricPath-encapsulated frame on other core ports as well, if necessary

*MC1 = 010F.FFC1.01C0
FAQ: What Is Conversational MAC Learning?
• New MAC learns performed only on unicast frames destined to a local MAC
address
• Edge switches only need to learn:
– Locally connected host MACs
– MACs with which those local hosts are bidirectionally communicating

• Reduces MAC table capacity requirements on edge switches


• Beware of devices that have conversations with every other host (e.g. default
gateway, file server, firewall, etc.)
– Design options discussed later
FAQ: Is MAC Learning in FabricPath Software-Based?
• NO!
• Binding of MAC address to destination SID at FabricPath edge switches is
completely hardware based
• FabricPath-capable platforms have hardware logic to perform conversational
MAC learning without punting anything to software
FAQ: What Happens If a Host Moves?
FabricPath rules for MAC learning on flood frames:
• Do not perform new learns based on broadcast / unknown unicast frames
• Do perform MAC table updates based on broadcast / unknown unicast frames
• Do perform new learns based on multicast frames (required for learning gateway MACs)

So how are host moves handled?


• Same as in Classical Ethernet
• Incumbent on moving host to tell the network it moved
– Gratuitous ARP, Reverse ARP are typical mechanisms

• FabricPath switches update existing entries based on these frames


FAQ: What Happens When the MAC Table Capacity Is Exceeded?
• Same thing as in traditional Ethernet – new MAC learns may fail
  – MAC lookups based on hash function – hash collisions are possible
• If a hash collision occurs on a new SMAC, the MAC is not learned (hash collision – no new learn)
• If a DMAC lookup returns a miss, the packet is flooded as unknown unicast (MAC table miss – flood to VLAN)

[Diagram: hash-indexed MAC table of n pages × m lines; a full line means an SMAC cannot be learned, a DMAC with no match means flood]
Putting It All Together – Host A to Host B
(3) Unicast Data

[Diagram: Host A sends unicast data to MAC B. On S100, the MAC lookup HITs (B → SID 300, remote), and the FabricPath routing table gives four equal-cost paths to S300 (po10–po40); the hash picks one, and S100 sends the frame with outer DA 300, outer SA 100, Ftag 1. The transit spine (e.g., S30, with routing table entry S300 → po300) forwards on the destination SID. On S300, the MAC lookup HITs (B local on e2/29); because the DMAC is known, S300 learns remote MAC A → SID 100 and delivers the frame to Host B.]
MAC Address Tables After Unicast Data
• S100 – unchanged (MAC A local, MAC B remote through S300):
S100# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN  MAC Address     Type     age   Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 10   0000.0000.000a  dynamic  90    F      F    Eth1/13
  10   0000.0000.000b  dynamic  60    F      F    300.0.64
S100#

• S300 – learns MAC A as remote entry reached through S100:
S300# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN  MAC Address     Type     age   Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
  10   0000.0000.000a  dynamic  30    F      F    100.0.12
* 10   0000.0000.000b  dynamic  90    F      F    Eth2/29
S300#
Creating Multicast State – IGMP Snooping
• IGMP snooping learns about interested receivers on edge switches
• Membership tracked on CE ports based on received IGMP reports / leaves

[Diagram: receivers for group G1 attached to S100 and S200 send IGMP reports; IGMP snooping runs on the edge switches. One spine is root of Tree 1 (Ftag 1), another is root of Tree 2 (Ftag 2).]
Creating Multicast State – GM-LSPs
• FabricPath IS-IS uses Group Membership LSPs (GM-LSPs) to build multicast forwarding state for the fabric
• Flooded to other switches to advertise which edge switches need which multicast groups
• Builds Layer 2 multicast forwarding state for FabricPath core ports

[Diagram: S100 and S200, having received IGMP reports for G1, flood GM-LSPs into the fabric; roots of Tree 1 (Ftag 1) and Tree 2 (Ftag 2) shown]
Multicast State
FabricPath Edge Switch with Receiver
S100# sh ip igmp snooping groups
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port
Vlan  Group Address   Ver  Type  Port list
10    */*             -    RF    Po10
                           RF    Po40
10    239.0.0.1       v2   D     Eth1/13

IGMP snooping knows the local OIF (Eth1/13)…

S100# sh fabricpath isis ip mroute | section 239.0.0.1
VLAN 10: (*, 239.0.0.1)
  Outgoing interface list: (count: 1)
  SWID: 0xc8 (200)

…IS-IS knows the remote receiver (behind switch 200)…

S100# sh fabricpath mroute | section 239.0.0.1
(vlan/10, 0.0.0.0, 239.0.0.1), uptime: 00:00:30, isis igmp
  Outgoing interface list: (count: 2)
  Switch-id 200, uptime: 00:00:28, isis
  Interface Ethernet1/13, uptime: 00:00:30, igmp

…and M2RIB knows about both.

[Diagram: receiver G1 on S100 e1/13 and another receiver behind S200; S100 core ports po10 (toward Root 1) and po40 (toward Root 2)]
Pruned Forwarding Trees for IP Multicast Groups

[Diagram: the G1 pruned tree shown within Multidestination Tree 1 (Ftag 1) and within Multidestination Tree 2 (Ftag 2). With source Src-G1 on S300 and receivers on S100 and S200, G1 traffic follows only the branches of the selected tree that lead to receivers – a frame on Tree 1 can’t use links that belong only to Tree 2, and vice versa.]
FabricPath IP Multicast Data Plane
Tree Selection and Group Lookup on Ingress Switch

[Diagram: source Src-G1 on S300 sends to group G1. A hash on the packet data selects the tree for VLAN 10 (here Tree 1 / Ftag 1); the multicast table lookup (Tree 1, VLAN 10, G1 → SIDs S100, S200) yields the Tree-1 core port toward the root (po2).]
FabricPath IP Multicast Data Plane
Group Lookup on Core Switch

[Diagram: spine S10 (root of Tree 1) looks up (Tree 1, VLAN 10, G1 → SIDs S100, S200) in its multicast table and replicates the frame on po4 and po5, toward S100 and S200.]
FabricPath IP Multicast Data Plane
Group Lookup on Egress Switches

[Diagram: S200’s lookup (Tree 1, VLAN 10, G1) yields po6 and e1/29, delivering to its receiver; S100’s lookup yields po7 and e1/13, delivering to its receiver. Decapsulated data traffic exits the CE edge ports.]
Best Practice: Connect Dual-Homed CE Devices via VPC+
• Dual-homed devices should be connected using VPC+
• Provides active/active uplinks from CE to FabricPath
• ECMP toward CE-attached hosts within fabric
• Removes complexity of STP integration
  – BPDUs still filtered at edge
  – TCNs not propagated through fabric

[Diagram: an STP device and a host, each dual-homed via VPC+ to FabricPath edge switches S1 and S2]
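A minimal vPC+ sketch for one peer; the domain ID, virtual switch ID, address, and port-channel numbers are illustrative:

feature vpc

vpc domain 10
  peer-keepalive destination 192.0.2.2    ! PKA to the other peer
  fabricpath switch-id 1000               ! shared virtual SID - this turns vPC into vPC+

interface port-channel 1
  switchport mode fabricpath              ! vPC+ peer link runs as a FabricPath core port
  vpc peer-link

interface port-channel 20
  switchport mode trunk
  vpc 20                                  ! dual-homed CE device or host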
VPC+ – Physical Topology
• Peer link runs as FabricPath core port; peer link and peer-keepalive (PKA) required
• VPCs configured as normal
• VLANs must be FabricPath VLANs
• No requirements for attached devices other than port-channel support

[Diagram: hosts A, B, C dual-homed to a vPC+ edge pair beneath spines S10–S40]
VPC+ – Logical Topology
• A virtual switch (S1000 here) is introduced into the FabricPath topology
• The rest of the fabric sees vPC+-attached MACs as reachable behind this virtual SID

[Diagram: logical view with virtual switch S1000 inserted between the vPC+ peers and the dual-homed hosts]
Remote MAC Entries for VPC+
View from S200:
S200# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN  MAC Address     Type     age   Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 10   0000.0000.000c  dynamic  1500  F      F    Eth1/30
  10   0000.6500.000a  dynamic  1500  F      F    1000.11.4513
S200#

MAC A is learned behind remote Switch ID 1000 – the vPC+ virtual switch – not behind either physical peer.

[Diagram: MAC A dual-homed via vPC+ (virtual switch S1000); S200 reaches S1000 over po1 and po2; MAC C is local on S200 Eth1/30]
FabricPath Routing for VPC+
S200# sh fabricpath route topology 0 switchid 1000
FabricPath Unicast Route Table
'a/b/c' denotes ftag/switch-id/subswitch-id
'[x/y]' denotes [admin distance/metric]
ftag 0 is local ftag
subswitch-id 0 is default subswitch-id

FabricPath Unicast Route Table for Topology-Default

1/1000/0, number of next-hops: 2
        via Po1, [115/10], 0 day/s 01:09:56, isis_l2mp-default
        via Po2, [115/10], 0 day/s 01:09:56, isis_l2mp-default
S200#

The route to S1000 (from S200) has two next-hops – ECMP toward the virtual switch via both vPC+ peers.

[Diagram: S200 reaches virtual switch S1000 over po1 and po2]
VPC+ and Active/Active HSRP

[Diagram, physical and logical views: an HSRP Active/Standby pair running vPC+ sends hellos to 0100.5E00.0002 sourced from the HSRP VMAC, with outer source SSID 1000. In both views, S200’s FabricPath MAC table binds the HSRP VMAC to S1000 (remote), and its FabricPath routing table reaches S1000 via po1 and po2 – routed traffic is active/active across both HSRP peers.]
Anycast HSRP

[Diagram, physical and logical views: four switches – HSRP Active, Standby, Listen, Listen – share the anycast SSID 1000 and send hellos to 0100.5E00.0002 sourced from the HSRP VMAC. S200’s FabricPath MAC table binds the HSRP VMAC to S1000 (remote), and its routing table reaches S1000 via po1–po4 – 4-way ECMP to the gateway.]
n-Way Active HSRP in FabricPath
VPC+ with FHRP / Anycast HSRP
• Hellos sent by Active router
  – Sourced from the HSRP VMAC (destined to the HSRP multicast MAC, 0100.5E00.0002)
  – Outer source is the virtual SID
• FabricPath edge switches learn VMAC as reached through virtual SID
• Traffic destined to VMAC leverages ECMP
• Any VPC+ peer / Anycast member can route traffic destined to VMAC
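A sketch of an Anycast HSRP bundle, assuming NX-OS 6.2 syntax on Nexus 7000; the bundle ID, switch ID, and VLAN range are illustrative, so verify the exact commands on your release:

feature hsrp

hsrp anycast 1 ipv4
  switch-id 1100          ! anycast switch ID shared by all forwarders
  vlan 10-20              ! VLANs whose HSRP groups join this bundle
  no shutdown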
VPC+ versus Anycast for n-Way Active HSRP

                                   VPC+        Anycast
Number of active routers           Two         Four (NX-OS 6.2)
Peer link / peer-keepalive link    Required    Not required
Leaf software requirement          None        NX-OS 6.2-based
FabricPath Transit Mode
• In a FabricPath network, a pure Layer 2 spine node can be configured in transit mode; in transit mode, all incoming traffic is mapped to one internal bridge domain
• Switch(config)# fabricpath mode transit

Platform                          First Supported Release   Difference
Nexus 5xxx/6xxx series switches   7.0(0)N1(1)               At least one VLAN must be enabled in FabricPath mode
                                                            (it can be VLAN 1, but that is not recommended – set
                                                            another VLAN to FP mode)
Nexus 7xxx series switches        6.2(2)                    No need to enable any VLAN in FP mode; VLAN 1 exists
                                                            by default on the spine node in CE mode
Agenda
• Introduction to FabricPath
• FabricPath Forwarding
FabricPath Designs
• Key Takeaways
FabricPath Designs
FabricPath Designs
• Explore a variety of FabricPath designs and evaluate how they meet key design
criteria
• Introduce concepts / design building-blocks to help you build a design that
meets your requirements
• Assumption is the choice to go L2/FabricPath has already been made
– Objective is not to debate "Layer 2 vs. Layer 3" or "why Layer 2 in the Data Center?"
FabricPath Designs
High-level design options considered in this presentation:
• Routing at Aggregation
• Centralized Routing
• Multi-POD
Routing at Aggregation Designs

[Diagram: routed (L3) core at the top; the aggregation layer forms the L2/L3 boundary; FabricPath runs between aggregation and access; CE Layer 2 at the access edge. Legend: Layer 3 link / Layer 2 CE / Layer 2 FabricPath.]
Routing at Aggregation
Key Design Highlights

• Evolution of current design practices


• Aggregation layer functions as FabricPath spine and L2/L3 boundary
– FabricPath switching for East ↔ West intra-VLAN traffic
– SVIs for East ↔ West inter-VLAN routing. VPC+ for active/active HSRP.
– Routed uplinks for North ↔ South routed flows

• Access layer provides pure L2 functions


– FabricPath core ports facing aggregation layer
– CE edge ports facing hosts
– Option for VPC+ for active/active host connections
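The aggregation-layer role described above boils down to SVIs with an FHRP plus routed uplinks. A minimal sketch (addresses, VLAN, and group numbers are illustrative):

feature interface-vlan
feature hsrp

interface vlan 10                 ! East-West inter-VLAN routing at the L2/L3 boundary
  no shutdown
  ip address 10.0.10.2/24
  hsrp 10
    ip 10.0.10.1                  ! shared gateway VIP

interface ethernet 1/1            ! North-South routed uplink toward the core
  no switchport
  ip address 192.0.2.1/30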
Routing at Aggregation
Two Spine Design
• Simplest design option – meets MANY network requirements!
• Extension of traditional aggregation/access designs

Immediate benefits:
• Simplified configuration
• Removal of STP
• Traffic distribution over all uplinks without VPC port-channels
• Active/active gateways
• “VLAN anywhere” at access layer
• Topological flexibility
  – Direct-path forwarding option
  – Easily provision additional access ↔ aggregation bandwidth
  – Easily deploy L4-7 services
  – Option for VPC+ for legacy access switches

[Diagram: two aggregation spines with SVIs at the L2/L3 boundary; FabricPath to the access layer]
Routing at Aggregation
Two Spine Design Details
• SVIs/routed ports provided by Nexus 7000 M+F-Series or F-Series modules, or Nexus 5696Q
• HSRP between aggregation switches for FHRP (Active/Standby); can run VPC+ for active/active HSRP
• FabricPath core ports provided by F-Series modules or Nexus 5696Q
• Access options: Nexus 7000 F-Series modules for EoR/MoR access; Nexus 5500/6000 for ToR access; FEX

[Diagram legend: Layer 3 link / Layer 2 CE / Layer 2 FabricPath]
Routing at Aggregation
Anycast HSRP
• Anycast HSRP between aggregation switches – all Anycast HSRP forwarders share the same VIP and VMAC
• Each aggregation switch hosts an SVI with gateway IP X and gateway MAC A at the L2/L3 boundary
• Hosts resolve the shared VIP to the shared VMAC
• Edge switches learn GWY MAC A → L1, L2, L3, L4, so routed traffic is spread over the spines based on ECMP

[Diagram: four aggregation switches, each with SVI GWY IP X / GWY MAC A, above the FabricPath fabric]
MAC Scale in Routing at Aggregation Designs
Nexus 7000/7700 F3 or F2/F2E at aggregation:

• 64K (F3) or 16K (F2/F2E) unique host MACs when SVIs enabled
– With SVIs, any ingress SOC must know enough information to route packets to any other VLAN, regardless of whether that
VLAN exists on one of its ports
– n × these limits if SVI VLAN ranges are spread over multiple router pairs

Nexus 7000 M+F1/F2E at aggregation:

• 16K unique host MACs due to mixed chassis learning behavior prior to NX-OS 6.2
– FabricPath core ports must learn SMACs on ingress
– Several typical topologies can result in MAC table overflow (e.g., aggregation ISL/VPC+ peer-link)

• 128K unique host MACs with “proxy L2 learning” in NX-OS 6.2


– Disables core port learning in mixed chassis, and uses M-series MAC table only
– Requires that access switches install all their local MACs on their core ports (behavior starts in NX-OS 6.1)

Nexus 5600/6000 at aggregation:

• 32K unique host MACs with Nexus 5600/6000


– Loss of high availability/ISSU and certain aggregation features
– Check MAC/ARP HW resource carving templates: http://tiny.cc/n5k-mac-carve-template
Centralized Routing Designs

[Diagram: FabricPath spine in the middle; server-access leaf switches on one side; Layer 3 border leaf switches form the L2/L3 boundary toward the routed (L3) core.]

Alternative view: leaf switches each have a “personality” – most for server access, but some for Layer 3 services (routing) and/or L4-7 services (SLB, FW, etc.)
Centralized Routing
Key Design Highlights
• Paradigm shift with respect to typical designs
• Traditional “aggregation” layer becomes pure FabricPath spine
– Provides uniform any-to-any connectivity between leaf switches
– In simplest case, only FabricPath switching occurs in spine
– Optionally, some CE edge ports exist to provide external router connections

• FabricPath leaf switches, connecting to spine, have specific “personality”


– Most leaf switches provide server connectivity, like traditional access switches in “Routing at
Aggregation” designs
– Two or more leaf switches provide L2/L3 boundary, inter-VLAN routing and North ↔ South routing
– Other (or same) leaf switches have L4-7 services personality (future)
• Decouples L2/L3 boundary and L4-7 services provisioning from spine
– Simplifies spine design
Centralized Routing
Single Router Pair (FabricPath-Connected Leaf)
• FabricPath spine with F-Series modules or Nexus 5696Q provides transit fabric (no routing, no MAC learning)
• FabricPath core ports provided by F-Series modules or Nexus 5696Q
• All VLANs available at all leaf switches
• SVIs for all VLANs on the L3 border leaf switch pair (provided by M+F-Series or F-Series modules, or Nexus 5696Q)
• HSRP between L3 services switches for FHRP; can run VPC+ or Anycast for active/active HSRP

[Diagram legend: Layer 3 link / Layer 2 CE / Layer 2 FabricPath]
Centralized Routing
Single Router Pair (FabricPath-Connected Leaf)

[Diagram: bridged flows stay within the FabricPath fabric; inter-VLAN routed flows and North ↔ South routed flows pass through the L3 border leaf pair (SVIs) at the L2/L3 boundary.]
Centralized Routing
Multiple Router Pairs (FabricPath-Connected Leaf)

[Diagram: two L3-services leaf pairs – one pair has SVIs for some VLANs (VLAN set 1), the other for the remaining VLANs (VLAN set 2); all VLANs available at all access switches. Routing adjacencies (OSPF etc.) between the pairs provide a transit path for inter-set routing. Variations include >2 L3 services routers with Anycast HSRP, etc.]
Centralized Routing
Details of Multiple Router Pairs Option
• Discrete SVI “sets”, with one set per L3-services leaf pair
• Transit VLAN to provide inter-set routing
• Requires appropriate platform for L3 services leaf switches to avoid MAC learning on core ports
  – Nexus 7000 with F3/F2E modules, or M+F with “proxy L2 learning” feature (NX-OS 6.2)
  – Nexus 5696Q
• All leaf switches must have all VLANs defined (due to multidestination tree-building behavior)
  – With multi-topology (NX-OS 6.2), can prune VLANs from certain leaf switches
Centralized Routing
Multiple Router Pairs (FabricPath-Connected Leaf)

[Diagram: inter-VLAN routed flows between VLAN sets cross the transit routing adjacency between the two L3-services leaf pairs (inter-VLAN-set transit routing).]
BFD over FabricPath
FabricPath as Transport for BFD
• Routing protocol / FHRP peering over FabricPath network
• BFD between SVIs or sub-interfaces
• Supported on Nexus 5500 / 5600 / 6000 / 7x00 platforms (SVI endpoints over FabricPath require NX-OS 7.2)

[Diagram: BFD sessions between SVIs across the FabricPath fabric, and from an SVI to an external L3 / SVI / sub-interface neighbor]
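A sketch of BFD-protected OSPF peering over a FabricPath SVI (addresses and process/VLAN numbers are illustrative):

feature bfd
feature ospf

router ospf 1

interface vlan 100                ! SVI riding over the FabricPath fabric
  no shutdown
  no ip redirects
  ip address 10.0.100.1/24
  ip router ospf 1 area 0.0.0.0
  ip ospf bfd                     ! BFD protects the OSPF adjacency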
MAC Scale with Nexus 7000 F-Series at Spine
With F1/F2/F2E/F3 FabricPath core ports only at spine:
• Core ports do not learn MAC addresses*
• MAC scale not gated by spine switches

With F1/F2E/F3 FabricPath core ports plus CE edge ports at spine:
• Core ports do not learn MAC addresses
• CE edge ports perform per-SOC conversational learning
  – Only MACs of VLANs on SOC front-panel ports learned
  – No practical limit to MAC scale – theoretically allows for (mac_capacity * num_of_SOCs) MACs

* Note: F2 requires the no hardware fabricpath mac-learning option when functioning as a pure spine; see the sketch below
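That per-module knob looks roughly like the following; the module number is illustrative, so check the exact syntax for your release:

! Disable core-port MAC learning on an F2 module serving as a pure spine
no hardware fabricpath mac-learning module 4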
Multi-Pod Designs

[Diagram: a routed (L3) core and a FabricPath core at the top; three PODs (1, 2, 3) below, each with an aggregation layer at the L2/L3 boundary and a FabricPath access layer.]
Multi-Pod Design
Key Design Highlights

• Hybrid of elements used in other design alternatives


• Combines “Routing at Aggregation” and “Centralized Routing” design elements
• Three possible classes of VLAN in FabricPath domain
– POD-local – VLANs exist only in one POD
– DC-wide – VLANs exist in all PODs
– Multi-POD – VLANs exist only in subset of PODs (not illustrated)
Multi-Pod Design
Active/Active HSRP

[Diagram: three PODs under an L3 core. POD-local VLANs: 100-199 in POD 1, 200-299 in POD 2, 300-399 in POD 3; DC-wide VLANs 2000-2099 exist in all PODs. Each POD runs active/active HSRP for its local VLANs; active/active HSRP for VLANs 2000-2099 runs at the core. The FabricPath core interconnects PODs for bridging in DC-wide VLANs and must include VLANs from all PODs due to multidestination tree behavior. PODs 1 and 2 are native FabricPath; POD 3 is a mixed FabricPath/CE POD that can attach any device. Legend: Layer 3 link / Layer 2 CE / Layer 2 FabricPath; POD-local vs. DC-wide VLANs.]
Multi-Pod Design
FabricPath Multi-Topology

[Diagram: only the DC-wide VLANs (2000-2099) exist in the FabricPath core. Core ports inside a POD belong to the default topology and are also mapped to the POD-local topology (POD 1/2/3 topology + default topology); core ports toward the FabricPath core belong only to the default topology. POD-local VLANs (100-199, 200-299, 300-399) exist only on POD switches and are mapped to the POD-specific topology. Legend: Layer 3 link / Layer 2 CE / Layer 2 FabricPath.]
Multi-Pod Design
FabricPath Multi-Topology (NX-OS 6.2)
• Allows more elegant DC-wide versus POD-local VLAN definition/isolation
  – No need for POD-local VLANs to exist in core
  – Can support VLAN ID reuse in multiple PODs
• Define FabricPath VLANs → map VLANs to topology → map topology to FabricPath core port(s), as sketched below
  – Depending on design, option exists to use a single non-default topology for all PODs (“disconnected” topology)
• Default topology always includes all FabricPath core ports
  – Map DC-wide VLANs to default topology
• POD-local core ports also mapped to POD-local topology
  – Map POD-local VLANs to POD-local topology
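A minimal multi-topology sketch, assuming NX-OS 6.2 syntax; the topology ID, VLAN range, and interface numbers are illustrative, so verify the command names on your release:

fabricpath topology 1
  member vlan 100-199               ! POD-local VLANs mapped to topology 1

interface port-channel 11           ! POD-internal core port:
  switchport mode fabricpath        !   default topology plus topology 1
  fabricpath topology-member 1

interface port-channel 51           ! POD-to-core port: default topology only
  switchport mode fabricpath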
Agenda
• Introduction to FabricPath
• FabricPath Forwarding
• FabricPath Designs
Key Takeaways
Key Takeaways
Key Takeaways – FabricPath Technology
• FabricPath is simple
– Keeps the attractive aspects of Layer 2 – No addressing, simple configuration and
deployment
– Integrates stability and scale of Layer 3 – Frame routing, TTL, RPF check

• FabricPath is efficient
– High bisectional bandwidth (ECMP)
– Optimal path between any two nodes

• FabricPath is scalable
– Can extend a bridged domain without extending the risks generally associated with
Layer 2
Key Takeaways – FabricPath Design
• You can deploy FabricPath today, with traditional network designs
• FabricPath introduces immediate, tangible benefits to any design:
– Simple configuration, eliminate Spanning Tree, leverage parallel network paths, extend
VLANs safely, mitigate loops, etc.
• Provides multiple design options to help you build a network that meets your
requirements
Conclusion
• Thank you for your time today!
• You should now have a thorough understanding of FabricPath
concepts, technology, and design considerations!
Complete Your Online Session Evaluation
• Give us your feedback to be entered into a Daily Survey Drawing. A daily winner will receive a $750 Amazon gift card.
• Complete your session surveys through the Cisco Live mobile app or your computer on Cisco Live Connect.
Don’t forget: Cisco Live sessions will be available
for viewing on-demand after the event at
CiscoLive.com/Online
Continue Your Education
• Demos in the Cisco Campus
• Walk-in Self-Paced Labs
• Table Topics
• Meet the Engineer 1:1 meetings
• Related sessions
Thank you
