BRKDCT 2081
• Related Session:
- BRKDCT-3313 – FabricPath Operation and Troubleshooting
Agenda
• Introduction to FabricPath
• FabricPath Forwarding
• FabricPath Designs
• Key Takeaways
Introduction to FabricPath
Why Layer 2 in the Data Center?
• Provides “plug-and-play” setup
• Certain protocols / applications require it
• Allows virtual machine / workload mobility
Typical Data Center Design
[Figure: typical data center design with L3 and L2 layers; STP runs in the L2 domain]
FabricPath
• Switching: Easy Configuration, Plug-and-Play, Flexible Provisioning
• Routing: Stable and Scalable, Multipathing (ECMP), Fast Convergence
FabricPath Routing Table on S100
Switch  IF
S10     L1
S20     L2
S30     L3
S40     L4
• One ‘best’ path to S10 (via L1)
[Figure: S100, S200, S300 attach to S10, S20, S30, S40 over links L1-L4; hosts A, B, C]
FAQ: How Are ECMP Load-Sharing Decisions Made?
• ECMP path chosen based on hash function
• Hash uses SIP/DIP + L4 + VLAN by default
• Use show fabricpath load-balance unicast to determine ECMP path for a
given packet
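The hash-based choice described above can be sketched as follows. This is only an illustration: the slides do not describe the hardware hash, so CRC32 is a stand-in, and the function name, field order, and sample flow are assumptions. On a real switch, show fabricpath load-balance unicast reports the actual path.

```python
import zlib

def pick_ecmp_path(paths, src_ip, dst_ip, src_port, dst_port, vlan):
    """Pick one equal-cost path by hashing SIP/DIP + L4 ports + VLAN.

    CRC32 is an assumed stand-in for the platform's real hash function.
    """
    if not paths:
        raise ValueError("no equal-cost paths available")
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{vlan}".encode()
    return paths[zlib.crc32(key) % len(paths)]

# Hypothetical example: four equal-cost links toward a remote switch.
paths_to_remote = ["L1", "L2", "L3", "L4"]
flow = ("10.0.0.1", "10.0.0.2", 49152, 80, 10)

first = pick_ecmp_path(paths_to_remote, *flow)
again = pick_ecmp_path(paths_to_remote, *flow)
```

Because the hash input is the flow's own fields, every packet of a flow takes the same link, while different flows spread across the available links.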
FabricPath Multidestination Trees
• Multidestination traffic constrained to tree topology
- Network-wide identifier (Ftag) assigned to each tree
[Figure: switches S10, S20, S30, S40 with a root for Tree 1 and a root for Tree 2; hosts A, B, C]
Forwarding through the Fabric – FabricPath Encapsulation
• Classical Ethernet Frame: DMAC | SMAC | 802.1Q | Etype | Payload | CRC
• FabricPath header (16 bytes) is added in front of the original CE frame
[Figure: broadcast frame from Host A (DMAC FFFF.FFFF.FFFF, SMAC→A) and the multidestination tree table (Tree → IF) on Switch 100, which references po10, po20, po30, and po40]
• S300:
S300# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN MAC Address Type age Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
S300#
Broadcast Forwarding
• Ingress FabricPath switch determines which tree to use based on hash result
• Outer Destination MAC remains all-ones (same as Inner DMAC)
• Other FabricPath switches honor Tree ID selected by ingress switch (Tree 1 in
this case) – flood frame on all core ports belonging to selected tree
• Edge FabricPath switches remove FabricPath header and flood in VLAN
– Flood FabricPath encapsulated frame on other core ports as well, if necessary
FAQ: What Is the Destination SID for a Multidestination Frame?
• Broadcast – Copy inner DMAC to outer DMAC
• Multicast – Copy inner DMAC to outer DMAC
• Unknown Unicast – Use reserved multicast MAC “MC1” (010F.FFC1.01C0)
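The three cases above can be sketched as a small lookup. The MC1 value comes from the slides; the function names, the MAC string format, and the idea of passing a set of known unicast MACs are assumptions made for illustration.

```python
MC1 = "010F.FFC1.01C0"       # reserved multicast MAC quoted in the slides
BROADCAST = "FFFF.FFFF.FFFF"

def is_multicast(mac):
    """True when the group (I/G) bit of the first octet is set."""
    first_octet = int(mac.replace(".", "")[:2], 16)
    return bool(first_octet & 0x01)

def outer_dmac(inner_dmac, known_unicast_macs):
    """Return the outer DMAC for a multidestination frame.

    Returns None for known unicast, which is not multidestination traffic.
    """
    inner = inner_dmac.upper()
    if inner == BROADCAST or is_multicast(inner):
        return inner              # broadcast / multicast: copy inner DMAC
    if inner not in known_unicast_macs:
        return MC1                # unknown unicast: use MC1
    return None
```

For example, outer_dmac("0000.0000.000A", set()) returns MC1, while a broadcast frame keeps the all-ones address in both inner and outer headers.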
Putting It All Together – Host A to Host B
(2) Unicast ARP Reply
[Figure: Host B (MAC B, S300 port e2/29) sends a unicast ARP reply (DMAC→A, SMAC→B) to Host A (MAC A, S100 port e1/13). The figure also shows the multidestination tree tables (Tree → IF, selected by hash result / Ftag) on Switch 100 and Switch 300, which reference po10, po20, po30, and po40, and the FabricPath MAC table on S300.]
FabricPath MAC Table on S100
MAC  IF/SID
A    e1/13 (local)
B    300 (remote)
• On S100, the lookup for DMAC→A hits the local entry (e1/13), so S100 learns MAC B as a remote entry
• S300:
S300# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN MAC Address Type age Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 10 0000.0000.000b dynamic 0 F F Eth2/29
S300#
(MAC B learned as local entry on e2/29)
Unknown Unicast Forwarding
• Ingress FabricPath switch determines which tree to use based on hash result
• Outer Destination MAC set to well-known “flood to fabric” multicast address
(MC1)*
• Other FabricPath switches honor Tree ID selected by ingress switch (Tree 1 in
this case) – flood frame on all core ports belonging to selected tree
• Edge FabricPath switches remove FabricPath header and flood in VLAN
• Flood FabricPath encapsulated frame on other core ports as well, if necessary
*MC1 = 010F.FFC1.01C0
FAQ: What Is Conversational MAC Learning?
• New MAC addresses are learned from the fabric only on unicast frames destined to a local MAC address
• Edge switches only need to learn:
– Locally connected host MACs
– MACs with which those local hosts are bidirectionally communicating
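A minimal sketch of the learning rule above, assuming a simple dictionary as the MAC table. The class and method names are hypothetical and do not correspond to any NX-OS component.

```python
class EdgeSwitch:
    """Toy model of conversational MAC learning on a FabricPath edge switch."""

    def __init__(self, switch_id):
        self.switch_id = switch_id
        # MAC -> ("local", port) or ("remote", source switch ID)
        self.mac_table = {}

    def receive_from_edge_port(self, smac, port):
        # Locally connected hosts are always learned.
        self.mac_table[smac] = ("local", port)

    def receive_from_fabric(self, smac, dmac, source_switch_id):
        # Learn the remote SMAC only if the DMAC is a known local MAC.
        entry = self.mac_table.get(dmac)
        if entry is not None and entry[0] == "local":
            self.mac_table[smac] = ("remote", source_switch_id)
            return True
        return False

s300 = EdgeSwitch(300)
s300.receive_from_edge_port("B", "e2/29")
learned_from_broadcast = s300.receive_from_fabric("A", "FFFF.FFFF.FFFF", 100)
learned_from_unicast = s300.receive_from_fabric("A", "B", 100)
```

The broadcast from A does not create an entry on S300, but the later unicast frame from A to B does, because B is a local MAC on S300.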
[Figure: MAC table hash lookups. Learning SMAC X: hash line full, no new learn. Looking up DMAC X: no match, flood.]
[Figure: FabricPath routing table on S30 (Switch → IF) in a fabric with S10, S20, S30, S40; the table entries are elided (…) in the source]
• S300:
S300# sh mac address-table dynamic
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN MAC Address Type age Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
  10 0000.0000.000a dynamic 30 F F 100.0.12
* 10 0000.0000.000b dynamic 90 F F Eth2/29
S300#
(S300 learns MAC A as remote entry reached through S100)
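The last column of the output above is either a local interface or a SWID.SSID.LID triple (switch ID, sub-switch ID, local ID). A small parsing sketch, with hypothetical type and function names, shows how the two cases can be told apart:

```python
from typing import NamedTuple, Optional

class FabricPathAddress(NamedTuple):
    switch_id: int
    sub_switch_id: int
    local_id: int

def parse_port_field(field: str) -> Optional[FabricPathAddress]:
    """Parse the Ports/SWID.SSID.LID column of the MAC table.

    Returns a FabricPathAddress for remote entries such as "100.0.12",
    or None when the field is a local interface such as "Eth2/29".
    """
    parts = field.split(".")
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        return FabricPathAddress(*(int(p) for p in parts))
    return None

remote = parse_port_field("100.0.12")
local = parse_port_field("Eth2/29")
```

Here the entry for MAC A resolves to switch ID 100, which is why it is reached through S100.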
Creating Multicast State – IGMP Snooping
• IGMP snooping learns about interested receivers from IGMP Reports
[Figure: Receiver G1 sends IGMP Reports toward the fabric edge, where IGMP snooping runs. Labels: S300, Root of Tree 2, Ftag 1, Ftag 2.]
Creating Multicast State – GM-LSPs
• FabricPath IS-IS uses Group Membership LSPs (GM-LSPs)
• Builds Layer 2 multicast forwarding state for FabricPath core ports
[Figure: receivers for G1 send IGMP Reports; edge switches send GM-LSPs through FabricPath IS-IS. Labels: S100, S300, Root of Tree 1, Root of Tree 2, Ftag 1, Ftag 2.]
FabricPath Edge Switch with Receiver
S100# sh ip igmp snooping groups
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port
Vlan Group Address Ver Type Port list
10 */* - RF Po10
RF Po40
10 239.0.0.1 v2 D Eth1/13
(IGMP snooping knows local OIF…)
S100# sh fabricpath isis ip mroute | section 239.0.0.1
VLAN 10: (*, 239.0.0.1)
Outgoing interface list: (count: 1)
SWID: 0xc8 (200)
(IS-IS knows remote receiver…)
S100# sh fabricpath mroute | section 239.0.0.1
(vlan/10, 0.0.0.0, 239.0.0.1), uptime: 00:00:30, isis igmp
Outgoing interface list: (count: 2)
Switch-id 200, uptime: 00:00:28, isis
Interface Ethernet1/13, uptime: 00:00:30, igmp
(M2RIB knows about both)
[Figure: S100 with Rcvr-G1 on e1/13 and core ports po10 and po40; S200 with Rcvr-G1; S300; Root 1; IGMP Reports and GM-LSPs. Legend: Ftag 1, Ftag 2.]
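The relationship between the three outputs can be sketched as a merge: local outgoing interfaces come from IGMP snooping, remote switch IDs come from FabricPath IS-IS, and the M2RIB entry holds both. The function name and data shapes below are assumptions made for illustration, not the actual M2RIB structures.

```python
def build_m2rib_entry(igmp_local_ports, isis_remote_switch_ids):
    """Combine local OIFs (IGMP) and remote switch IDs (IS-IS) into one list."""
    oifs = [("interface", port) for port in sorted(igmp_local_ports)]
    oifs += [("switch-id", swid) for swid in sorted(isis_remote_switch_ids)]
    return oifs

# Values taken from the S100 output: Eth1/13 from IGMP, SWID 0xc8 from IS-IS.
igmp_ports = {"Ethernet1/13"}
isis_swids = {int("0xc8", 16)}   # 0xc8 is switch ID 200

entry = build_m2rib_entry(igmp_ports, isis_swids)
```

The resulting entry has two outgoing elements, matching the "count: 2" shown by sh fabricpath mroute for group 239.0.0.1.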
Pruned Forwarding Trees for IP Multicast Groups
[Figure: Multidestination Tree 1 (Ftag 1, Root 1) and Multidestination Tree 2 (Ftag 2, Root 2), each pruned toward the G1 receivers (Rcvr-G1). Labels: S100, S200.]
[Figure: data traffic from Src-G1 entering the fabric. A hash selects the multicast tree per VLAN (VLAN → Tree (Ftag)), and the FabricPath MAC table maps Tree (Ftag), VLAN, Group, and SID to outgoing IFs. Labels: S10 (Root 1), Root 2, S100, S300, Rcvr-G1, po1, po2, po3, po4, po5.]
FabricPath IP Multicast Data Plane
Group Lookup on Egress Switches
FabricPath MAC Table (→ marks the selected tree, Tree 1)
Tree (Ftag)  VLAN  Group  SID        IFs
→ 1          10    G1     S100,S200  po6,e1/29
  2          10    G1     S100,S200  po6,e1/29
[Figure: data traffic from Src-G1 follows the G1 pruned tree of Multidestination Tree 1. Labels: S10 (Root 1), Root 2, S100 (e1/13), S200 (e1/29), S300, Rcvr-G1, po6, po7, po8. A second FabricPath MAC table, labeled S300, has no recoverable entries.]
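The egress group lookup can be pictured as a table keyed on (Ftag, VLAN, group). The sketch below uses the two G1 rows shown on the slide; the dictionary layout, the function name, and the rule of never sending a frame back out its arrival interface are assumptions for illustration.

```python
# (Ftag, VLAN, group) -> outgoing interfaces, from the slide's table rows.
mcast_table = {
    (1, 10, "G1"): ["po6", "e1/29"],
    (2, 10, "G1"): ["po6", "e1/29"],
}

def egress_interfaces(ftag, vlan, group, arrival_if):
    """Return the interfaces a multicast frame is replicated to."""
    oifs = mcast_table.get((ftag, vlan, group), [])
    # Assumed rule: do not forward back out the arrival interface.
    return [iface for iface in oifs if iface != arrival_if]

out_for_tree1 = egress_interfaces(1, 10, "G1", arrival_if="po6")
out_for_unknown = egress_interfaces(1, 20, "G1", arrival_if="po6")
```

The Ftag chosen by the ingress switch is part of the key, so every switch along the path consults the entry for the same tree.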
[Figure: topology with switch ID S1000. Callouts: “VLANs must be FabricPath VLANs”; “No requirements for attached devices other than port-channel support”; “Remote Switch ID 1000, not S10 or S20” (for MAC A). Labels: S100, S200, S300, S1000, po1, po2, 1/30.]
[Figure: frames with SMAC→HSRP are sourced from S1000, reached over po1 and po2]
FabricPath MAC Table on S200
MAC   IF/SID
HSRP  S1000 (remote)
FabricPath Routing Table on S200
SID    IF
S1000  po1,po2
Anycast HSRP
[Figure: physical and logical topology. S10, S20, S30, S40 each have an SVI; HSRP states are Active, Standby, Listen, Listen. Labels: 0100.5E00.0002, SSID→1000, SMAC→HSRP, S1000, po1, po2, po3, po4.]
FabricPath MAC Table on S200
MAC   IF/SID
HSRP  S1000 (remote)
FabricPath Routing Table on S200
SID    IF
S1000  po1,po2,po3,po4
n-Way Active HSRP in FabricPath
VPC+ with FHRP / Anycast HSRP
                                 VPC+      Anycast
Number of active routers         Two       Four (NX-OS 6.2)
Peer link / Peer keepalive link  Required  Not Required
Leaf software requirement        None      NX-OS 6.2-based
FabricPath Transit mode
• In a FabricPath network, a pure Layer 2 spine node can be configured in transit mode. In transit mode, all incoming traffic is mapped to one internal bridge-domain.
• Switch(config)# fabricpath mode transit
[Figure: routed L3 core above two FabricPath domains]
• Removal of STP
• Topological flexibility
– Direct-path forwarding option
– Easily provision additional access ↔ aggregation bandwidth
– Easily deploy L4-7 services
– Option for VPC+ for legacy access switches
Routing at Aggregation
Two Spine Design Details
• SVIs/routed ports provided by Nexus 7000 M+F-Series or F-Series, or Nexus 5696Q
• HSRP between agg switches for FHRP
• Nexus 7000 F-Series modules for EoR/MoR access
• Nexus 5500/6000 for ToR access
• FEX
[Figure: two-spine design with L3 and SVIs at aggregation. Legend: Layer 3 Link, Layer 2 CE, Layer 2 FabricPath.]
Routing at Aggregation
Anycast HSRP L3
• 64K (F3) or 16K (F2/F2E) unique host MACs when SVIs enabled
– With SVIs, any ingress SOC must know enough information to route packets to any other VLAN, regardless of whether that
VLAN exists on one of its ports
– n times that number if SVI VLAN ranges are spread over n router pairs
• 16K unique host MACs due to mixed chassis learning behavior prior to NX-OS 6.2
– FabricPath core ports must learn SMACs on ingress
– Several typical topologies can result in MAC table overflow (e.g., aggregation ISL/VPC+ peer-link)
[Figure: FabricPath spine above server access leaf switches]
Centralized Routing Designs
Alternative View
[Figure: FabricPath spine above leaf switches, with L3 attached at a leaf]
• Leaf switches each have “personality” – most for server access…
• …but some for Layer 3 services (routing) and/or L4-7 services (SLB, FW, etc.)
Centralized Routing
Key Design Highlights
• Paradigm shift with respect to typical designs
• Traditional “aggregation” layer becomes pure FabricPath spine
– Provides uniform any-to-any connectivity between leaf switches
– In simplest case, only FabricPath switching occurs in spine
– Optionally, some CE edge ports exist to provide external router connections
[Figure: flow types in the centralized routing design: bridged flows, inter-VLAN routed flows, and north↔south routed flows. Labels: HSRP, L3, FabricPath, VPC+.]
Centralized Routing
Multiple Router Pairs (FabricPath-Connected Leaf)
[Figure: two VPC+ router pairs with SVIs attached to FabricPath, with L3 (OSPF etc.) above. Callouts: “All VLANs available at all access switches”; “This router pair has SVIs for some VLANs (VLAN set 1)”; “This pair has SVIs for other VLANs (VLAN set 2)”.]
Centralized Routing
Details of Multiple Router Pairs Option
• Discrete SVI “sets”, with one set per L3-services leaf pair
• Transit VLAN to provide inter-set routing
• Requires appropriate platform for L3 services leaf switches to avoid MAC learning on core ports
– Nexus 7000 with F3/F2E modules, or M+F with “proxy L2 learning” feature (NX-OS 6.2)
– Nexus 5696Q
• All leaf switches must have all VLANs defined (due to multidestination tree-building behavior)
– With multi-topology (NX-OS 6.2), can prune VLANs from certain leaf switches
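The SVI-set idea can be sketched as a mapping from VLAN to the router pair that owns its SVI, with routing between sets crossing the transit VLAN. The pair names and VLAN ranges below are hypothetical; the slides only refer to "VLAN set 1" and "VLAN set 2".

```python
# Hypothetical SVI sets, one per L3-services leaf pair.
ROUTER_PAIRS = {
    "pair-1": set(range(100, 200)),   # stands in for VLAN set 1
    "pair-2": set(range(200, 300)),   # stands in for VLAN set 2
}

def owning_pair(vlan):
    """Return the router pair that hosts the SVI for this VLAN."""
    for pair, vlans in ROUTER_PAIRS.items():
        if vlan in vlans:
            return pair
    raise KeyError(f"no router pair has an SVI for VLAN {vlan}")

def routed_path(src_vlan, dst_vlan):
    """List the routing hops for an inter-VLAN flow."""
    src_pair, dst_pair = owning_pair(src_vlan), owning_pair(dst_vlan)
    if src_pair == dst_pair:
        return [src_pair]                        # routed within one set
    return [src_pair, "transit-vlan", dst_pair]  # inter-set routing

same_set = routed_path(100, 150)
cross_set = routed_path(100, 250)
```

Flows within one set are routed by a single pair, while flows between sets take an extra routed hop over the transit VLAN.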
Centralized Routing
Multiple Router Pairs (FabricPath-Connected Leaf)
[Figure: inter-VLAN routed flows between VLAN sets use transit routing (inter-VLAN-set) between the two VPC+ router pairs with SVIs. Labels: FabricPath, L3.]
BFD over FabricPath
FabricPath as Transport for BFD
[Figure: BFD sessions across FabricPath between SVI endpoints, and between L3 / SVI / sub-interface endpoints]
* - From NX-OS 7.2
MAC Scale with Nexus 7000 F-Series at Spine
With F1/F2/F2E/F3 FabricPath core ports only at spine
• Core ports do not learn MAC addresses*
• MAC scale not gated by spine switches
[Figure: access layer with POD 1, POD 2, POD 3]
Multi-Pod Design
Key Design Highlights
[Figure: L3 core above three PODs. Labels: Native FabricPath PODs, Mixed FabricPath/CE POD, Any device. Legend: Layer 3 Link, Layer 2 CE, Layer 2 FabricPath.]
POD    POD-local VLANs  DC-wide VLANs  FHRP
POD 1  100-199          2000-2099      Active/Active HSRP for VLANs 100-199
POD 2  200-299          2000-2099      Active/Active HSRP for VLANs 200-299
POD 3  300-399          2000-2099      Active/Active HSRP for VLANs 300-399
FabricPath Multi-Topology
• Only DC-wide VLANs (2000-2099) exist in FabricPath core
• Core ports in POD belong to default topology and are also mapped to POD-local topology
• Core ports in the L3 core belong only to default topology
[Figure: L3 core above the PODs. Legend: Layer 2 FP Default Topology.]
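A small sketch of the VLAN placement this implies, using the VLAN ranges from the multi-pod slides. The function name and the "core" keyword are assumptions for illustration; this models only which VLANs a core port carries, not the actual topology configuration.

```python
DC_WIDE = set(range(2000, 2100))          # default topology, present everywhere
POD_LOCAL = {
    "POD 1": set(range(100, 200)),
    "POD 2": set(range(200, 300)),
    "POD 3": set(range(300, 400)),
}

def vlans_on_core_port(location):
    """VLANs carried by a FabricPath core port.

    location is a POD name, or "core" for ports that belong only to the
    default topology.
    """
    if location == "core":
        return set(DC_WIDE)
    return DC_WIDE | POD_LOCAL[location]

core_vlans = vlans_on_core_port("core")
pod1_vlans = vlans_on_core_port("POD 1")
```

POD-local VLANs never appear on ports that belong only to the default topology, which keeps them out of the FabricPath core.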
• FabricPath is efficient
– High bisectional bandwidth (ECMP)
– Optimal path between any two nodes
• FabricPath is scalable
– Can extend a bridged domain without extending the risks generally associated with
Layer 2
Key Takeaways – FabricPath Design
• You can deploy FabricPath today, with traditional network designs
• FabricPath introduces immediate, tangible benefits to any design:
– Simple configuration, eliminate Spanning Tree, leverage parallel network paths, extend
VLANs safely, mitigate loops, etc.
• Provides multiple design options to help you build a network that meets your
requirements
Conclusion
• Thank you for your time today!
• You should now have a thorough understanding of FabricPath
concepts, technology, and design considerations!