Our Nexus Data Center Network With VPC
vPC Summary:
For more information on vPC, please refer to the Cisco vPC white paper.
We simulated two access designs. On the left was the traditional Spanning Tree access design; on the
right was a Nexus 5K access switch configuration following the new vPC access design guidelines and
recommendations.
The right access switch, N5K-vPC (in the diagram), and VLANs 201 and 202 apply to the vPC
access design.
vPC has a concept of a primary and a secondary switch. Here, N7K-01 is configured as the vPC
primary switch and N7K-02 as the vPC secondary switch.
As per vPC best practices, the primary vPC switch (N7K-01) is configured as the Spanning Tree root
and also as the HSRP primary for VLANs 201 and 202.
The vPC keep-alive (ft) link is a 1 Gig L3 point-to-point link between the two Nexus 7Ks, whereas the
vPC peer link (a regular L2 trunk) between the two chassis leverages 2x10 Gig ports (as per the
guidelines); a configuration sketch follows below.
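As a reference point, here is a minimal sketch of what the vPC side of this setup could look like on N7K-01. The domain ID, keep-alive VRF name and addressing, peer-link port-channel number, and SVI subnets are illustrative assumptions, not the lab values; only Po10 (the vPC toward N5K-vPC) and the VLAN numbers follow the test notes. N7K-02 would mirror this with a higher role priority and a lower HSRP priority.

    feature vpc
    feature interface-vlan
    feature hsrp

    ! Keep-alive: 1 Gig L3 point-to-point link in its own VRF (VRF name and addresses assumed)
    vrf context vpc-keepalive
    interface Ethernet1/1
      no switchport
      vrf member vpc-keepalive
      ip address 10.0.0.1/30
      no shutdown

    vpc domain 10
      ! lower role priority makes N7K-01 the vPC primary
      role priority 100
      peer-keepalive destination 10.0.0.2 source 10.0.0.1 vrf vpc-keepalive

    ! vPC peer link: regular L2 trunk over 2x10 Gig ports (port-channel number assumed)
    interface port-channel100
      switchport
      switchport mode trunk
      switchport trunk allowed vlan 201-202
      vpc peer-link

    ! vPC member port channel toward N5K-vPC (Po10 per the test notes)
    interface port-channel10
      switchport
      switchport mode trunk
      switchport trunk allowed vlan 201-202
      vpc 10

    ! N7K-01 as spanning-tree root and HSRP primary for the vPC VLANs (subnet assumed)
    spanning-tree vlan 201-202 priority 4096
    interface Vlan201
      ip address 10.201.0.2/24
      hsrp 201
        priority 110
        ip 10.201.0.1
      no shutdown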
The left access switch, N5K-SPT (in the diagram), and VLANs 101 and 102 apply to the SPT
access design.
N7K-01 is configured as the Spanning Tree root and HSRP primary for VLAN 101, whereas N7K-02
is the Spanning Tree root and HSRP primary for VLAN 102.
The Spanning Tree access design leverages a traditional looped-triangle design, with odd VLANs
preferring the left aggregation switch and even VLANs preferring the right one (see the sketch below).
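For comparison, the SPT access design splits the roles per VLAN. A rough sketch of the aggregation-side configuration is below; the subnets and priority values are assumptions for illustration, and N7K-02 carries the mirror image (root and HSRP primary for VLAN 102).

    ! N7K-01: spanning-tree root and HSRP primary for VLAN 101, secondary for VLAN 102
    spanning-tree vlan 101 priority 4096
    spanning-tree vlan 102 priority 8192

    interface Vlan101
      ip address 10.101.0.2/24
      hsrp 101
        priority 110
        ip 10.101.0.1
      no shutdown

    interface Vlan102
      ip address 10.102.0.2/24
      hsrp 102
        priority 90
        ip 10.102.0.1
      no shutdown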
In our testing, most of the tests were meant to demonstrate functionality and to observe traffic patterns
and convergence times under different failure scenarios; performance testing was out of scope.
We also compared the behavior of the vPC design with that of the traditional SPT looped-triangle access design.
Traffic generators were leveraged to simulate traffic flow from both core switches down to the access
layer. Below is a snapshot of test scenarios and observed traffic patterns.
Normal Conditions:
All members of Po1 @N5K-vPC are in forwarding state. No spanning tree blocked VLANs/ports.
Even though N7K-01 is HSRP active for VLANs 201 and 202, both switches (N7K-01 and N7K-02) are
forwarding traffic out to the core switches, with minimal traffic passing across the vPC peer-link.
Return traffic is being CEF load-balanced from the cores to both N7Ks and forwarded directly to the
N5K-vPC switch.
Po1 @ N5K-SPT is in SPT forwarding state for VLAN 101 and blocking for VLAN 102; vice versa for Po2.
Traffic is being forwarded out to the core switches by N7K-01 for VLAN 101 and by N7K-02 for VLAN 102.
Return traffic is being CEF load-balanced from the core switches to both N7Ks and forwarded directly
to the N5K-SPT switch for the respective VLANs.
Some traffic is observed across the L2 trunk between the two 7Ks (the vPC peer link). A few show commands for verifying this steady state are sketched below.
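The steady-state behavior above can be verified with standard NX-OS show commands on the aggregation switches; this is a sketch of the checks rather than captured lab output:

    ! vPC access design (VLANs 201-202)
    show vpc brief
    ! peer-link, keep-alive, role, and vPC 10 status
    show port-channel summary
    show spanning-tree vlan 201
    ! no blocked ports expected toward N5K-vPC
    show hsrp brief

    ! SPT access design (VLANs 101-102)
    show spanning-tree vlan 101
    show spanning-tree vlan 102
    ! expect alternate blocking per VLAN on the N5K-SPT uplinks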
Failure of L2 Trunk / vPC Peer Link:
When the peer link fails, the vPC secondary switch (N7K-02) suspends its vPC member port channel
(Po10), so VLAN SVIs (interface VLANs) 201 and 202 go down on N7K-02, since there are no longer any
active interfaces in these VLANs.
Since these SVIs are down, N7K-02 stops advertising routes for the VLAN 201 and 202 subnets to
the cores, which then receive only a single route, from N7K-01 (an illustrative routing sketch follows this scenario).
Once the peer links are restored, N7K-02 brings Po10 back up (within 3-4 secs) and advertises routes
for the VLAN 201 and 202 subnets to the cores, and traffic flows as per the normal scenario. No packet
drops were observed.
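The route withdrawal works because the SVI subnets are advertised into the routed core only while the SVIs are up. The write-up does not name the IGP used in the lab, so purely as an illustration with OSPF (process tag, area, and addressing assumed), the relevant piece on each N7K could look like this:

    feature ospf
    router ospf 1

    interface Vlan201
      ! advertise the connected subnet without forming adjacencies on the SVI
      ip router ospf 1 area 0.0.0.0
      ip ospf passive-interface
    ! when the SVI goes down on N7K-02, its subnet is withdrawn and the cores
    ! retain only the path via N7K-01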
On the SPT access design side, since no loop exists any more, N5K-SPT starts forwarding VLANs 101
and 102 on both port channels.
HSRP for VLANs 101 and 102 on both N7Ks goes into an active-active state.
Root guard blocks VLAN 101 on Po3@N7K-02 and VLAN 102 on Po2@N7K-01 (a root-guard configuration sketch follows this scenario).
Outbound (northbound) traffic for VLAN 101 is forwarded via N7K-01, and traffic for VLAN 102 via
N7K-02.
Incoming asymmetric traffic (traffic for VLAN 102 landing on N7K-01, and vice versa) is black-holed
because root guard is blocking the path to those VLANs. (Route tuning was not configured; it would be
the way to go if the SPT design were the selected access design.)
Once the L2 trunk is brought back up, SPT converges back to the Normal Conditions state described
above. Packet loss was observed during re-convergence as well.
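The root-guard behavior observed here corresponds to guard root being enabled on the aggregation downlinks toward N5K-SPT (Po2 on N7K-01 and Po3 on N7K-02, per the test notes); a minimal sketch, with the trunk details assumed:

    ! N7K-01 downlink to N5K-SPT
    interface port-channel2
      switchport
      switchport mode trunk
      switchport trunk allowed vlan 101-102
      spanning-tree guard root
    ! N7K-02 applies the same to port-channel3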
Failure of vPC Keep-alive Link:
No changes in any packet forwarding; just a syslog message on both N7Ks stating that the keep-alive
link is down. This is not a traffic-disrupting condition by itself.
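To confirm that only the keep-alive link is affected (and the peer link is still healthy), the keep-alive status can be checked directly:

    show vpc peer-keepalive
    ! reports whether the peer keep-alive is reachable over the 1 Gig L3 link
    show vpc brief
    ! peer-link and vPC port channels remain up, so forwarding is unchanged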
Double failure scenario - 1 (vPC peer link failure followed by a vPC keep-alive link failure):
The vPC access design converges to the Failure of L2 Trunk / vPC Peer Link state described above when
the peer link fails.
Next, when the keep-alive link fails, a syslog message is generated on both N7Ks stating that the
keep-alive link is down.
Hence all traffic in both directions is now flowing through N7K-01. Once the peer links are
restored, N7K-02 brings Po10 back up and advertises routes for the VLAN 201 and 202 subnets to the
cores, and traffic flows as per the normal scenario.
Since the L2 trunk (vPC peer link) has failed, SPT converges to the Failure of L2 Trunk / vPC Peer Link
SPT access design state described above.
Double failure scenario - 2 (vPC keep-alive link failure followed by a vPC peer link failure):
When the keep-alive link goes down, a syslog message is generated on both N7Ks stating that the
keep-alive link is down.
Next, when the peer link fails, this is a split-brain condition, and the vPC members have no way to
verify the state of the other peer.
No vPC port channels are shut down, and both N7Ks keep forwarding packets.
Existing flows continue to be forwarded as before the failure, but learning of new flows is impaired,
and uncertain forwarding (or a broken state) is observed for new flows; see the verification sketch
after this scenario.
Once the peer links and the keep-alive link are restored, traffic flows return to the normal
scenario.
Since the L2 trunk (vPC peer link) has failed, SPT converges to the Failure of L2 Trunk / vPC Peer Link
SPT access design state described above.
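When working through either double-failure scenario, the vPC state on each N7K can be inspected with the standard vPC show commands; a sketch of checks one might use:

    show vpc role
    ! each switch's own view of the vPC role (primary/secondary)
    show vpc
    ! peer-link and keep-alive status plus per-vPC state
    show vpc consistency-parameters global
    ! the global consistency checks normally exchanged across the peer link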
Hopefully this gives you an idea of where vPC technology is headed in the DC and how it
differs from SPT-type implementations. Also, as I mentioned in my last blog, even though the testing
was done with EFT code (not an official CCO release), we were able to demonstrate all of the expected
behaviors above. As of last Friday (2/6), the 4.1.3 Nexus 7000 code has been released on CCO
and is available for deployments.