A better multipathing system

Question

We're lucky: every server we have has multiple NICs/HBAs/CNAs connected to multiple switches, and this approach has kept our platform up on numerous occasions. That said, we ran into a problem last week that I'm not sure how to fix.

A switch carrying a good chunk of our traffic crashed (the details aren't important, but it was a Cisco 6509 that had a hard CPU crash and didn't come back up automatically). Unfortunately it left its line cards working (i.e. L1 and L2 up) but lost all of its uplinks. The following servers were connected to it:

Windows Server 2003 32-bit EE SP2 with Veritas Storage Foundation
Oracle Enterprise Linux 5.3 64-bit
VMware ESXi 4.0
NetApp 3040 running ONTAP 7.3.2

All of these machines failed to detect the crashed switch and kept sending traffic its way instead of moving their traffic to another switch.

I need help looking at my options for better multipathing. This can't be the first time this has happened, so there must be other ways of doing this (polling the HSRP interfaces, for instance). Can you help?

Thanks in advance.

Antoine Benkemoun · Accepted Answer · 2010-09-20 14:37:45Z

If there are switches between your Cisco 6509 and your servers, and they are also Cisco, you have the option of shutting down the server-facing ports when the uplinks go down. You define a set of "upstream" ports and a set of "downstream" ports. If all the upstream ports go down, the switch takes down the downstream ports.

It is called link state tracking and it is designed for situations like yours.

You will find a little info on this page.
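
To make the upstream/downstream rule concrete, here is a minimal Python sketch of the behaviour described above. It is only a toy model, not switch configuration: the class name, method names, and port names are hypothetical, and the real logic runs inside the switch, so it only helps while the switch itself is still alive.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LinkStateGroup:
    """Toy model of one link state tracking group (all names are hypothetical)."""

    upstream: Dict[str, bool] = field(default_factory=dict)  # port -> link is up?
    downstream: List[str] = field(default_factory=list)      # server-facing ports

    def set_upstream(self, port: str, up: bool) -> None:
        """Record a link state change on an upstream port."""
        self.upstream[port] = up

    def downstream_enabled(self) -> bool:
        """Downstream ports stay up while at least one upstream port is up."""
        return any(self.upstream.values())

    def downstream_state(self) -> Dict[str, str]:
        """Return the state the switch would force onto each downstream port."""
        state = "up" if self.downstream_enabled() else "down"
        return {port: state for port in self.downstream}


if __name__ == "__main__":
    group = LinkStateGroup(
        upstream={"Te1/1": True, "Te1/2": True},
        downstream=["Gi2/1", "Gi2/2"],
    )
    group.set_upstream("Te1/1", False)
    print(group.downstream_state())  # one uplink left, servers keep their links
    group.set_upstream("Te1/2", False)
    print(group.downstream_state())  # no uplinks left, server ports are taken down
```

Once the downstream ports drop, the servers see a real link failure, and their existing NIC teaming or multipathing can fail over without any extra logic on the host.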

Sorry, I can't have been clear enough: we only use 6509s (and Nexuses in the core), and the servers connect directly to the 6509s in this case. The problem with this option is that it assumes the switch loses its actual links rather than simply crashing. In this situation the switch can't take any action at all; it's dead. Thank you for your suggestion though.
– Chopper3
Commented Sep 20, 2010 at 15:14
I guess pinging the HSRP interface is a solution then, but it isn't very elegant. Another way to do this would be to traceroute to some host and expect the path to have at least X hops. If the trace doesn't reach the Xth hop, you can assume that interface is no good.
– Antoine Benkemoun
Commented Sep 20, 2010 at 15:25
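
A rough sketch of the ping idea, for a host where you can run your own scripts: probe the gateway through each interface and use the first path that answers. The interface names, the 192.0.2.1 gateway address, and the function names are assumptions for illustration; the ping flags are those of Linux iputils, and actually moving traffic to the chosen path is left to whatever the host already uses.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

# (local interface name, gateway or HSRP address to probe); values are hypothetical
Path = Tuple[str, str]


def ping_probe(interface: str, target: str) -> bool:
    """Send one ICMP echo out of the given interface (Linux iputils ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", "-I", interface, target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_healthy_path(
    paths: Sequence[Path],
    probe: Callable[[str, str], bool] = ping_probe,
) -> Optional[Path]:
    """Return the first path whose probe succeeds, or None if every probe fails."""
    for interface, target in paths:
        if probe(interface, target):
            return (interface, target)
    return None


if __name__ == "__main__":
    candidates = [("eth0", "192.0.2.1"), ("eth1", "192.0.2.1")]
    print(first_healthy_path(candidates))
```

Because the probe checks reachability beyond the local link, it would catch a switch whose ports are still up but which no longer forwards anything. The probe is passed in as a parameter so a traceroute-style hop check could be swapped in instead.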
The problem is that I want that 'pinging' ability built into four different platforms, two of which I can't really make changes to (ESXi and NetApp). Cheers.
– Chopper3
Commented Sep 20, 2010 at 15:49
