Dual ISP Failover With RPM IP Monitoring
The Internet isn't perfect, and links fail from time to time. How do we react to these
failures: manually, or automatically? In this post I would like to show how Junos can take
action when an upstream gateway becomes unreachable, and how SRX flow behaves in such
a scenario. To achieve this we will use a handful of features currently available on an SRX
box. Before getting started, check my test topology below in order to follow this post. It is a
simulated Internet environment with some fake public IP addresses. BranchC is our client-side
SRX device with two connected PCs, and we will do all the config magic on this BranchC
device.
Test Plan
1) Create a routing instance for each ISP (two in total) and cross-import the routes between
these two instances
2) Forward Debian1 traffic to ISP1 and HostC traffic to ISP2 by using filter-based
forwarding
3) Monitor each ISP by using the RPM (Real-Time Performance Monitoring) feature
5) If either ISP link fails, fail over the default route to the other ISP by using the IP
monitoring feature
[edit]
root@branchC# show routing-options
rib-groups {
    ISP1-to-ISP2 {
        import-rib [ ISP1.inet.0 ISP2.inet.0 ];
    }
    ISP2-to-ISP1 {
        import-rib [ ISP2.inet.0 ISP1.inet.0 ];
    }
}
Then create routing instances and activate rib-groups.
[edit]
root@branchC# show routing-instances
ISP1 {
    instance-type virtual-router;
    interface ge-0/0/0.951;
    routing-options {
        interface-routes {
            rib-group inet ISP1-to-ISP2;
        }
        static {
            route 0.0.0.0/0 next-hop 173.1.1.1;
        }
    }
}
ISP2 {
    instance-type virtual-router;
    interface ge-0/0/0.202;
    routing-options {
        interface-routes {
            rib-group inet ISP2-to-ISP1;
        }
        static {
            route 0.0.0.0/0 next-hop 212.44.1.1;
        }
    }
}
Now the routing tables should be ready, i.e. the routes from each instance should be cross-imported.
root@branchC> show route
inet.0: 2 destinations, 2 routes (...
+ = Active Route, - = Last Active, * = Both
...
                   *[Direct/0] 1d 01:58:44
...
                   *[Static/5] 1d 01:53:34
                    > to 173.1.1.1 via ge-0/0/0.951
173.1.1.0/24       *[Direct/0] 1d 01:54:14
                    > via ge-0/0/0.951
173.1.1.2/32       *[Local/0] 1d 01:54:14
                      Local via ge-0/0/0.951
173.1.1.10/32      *[Static/1] 1d 01:54:14
                      Receive
212.44.1.0/30      *[Direct/0] 1d 01:37:00   <<<< --- This is the route of ISP2
                    > via ge-0/0/0.202
212.44.1.2/32      *[Local/0] 1d 01:37:00
                      Local via ge-0/0/0.202

ISP2.inet.0: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[Static/5] 1d 01:54:14
                    > to 212.44.1.1 via ge-0/0/0.202
173.1.1.0/24       *[Direct/0] 1d 01:37:00   <<<< --- This is the route of ISP1
                    > via ge-0/0/0.951
173.1.1.2/32       *[Local/0] 1d 01:37:00
                      Local via ge-0/0/0.951
212.44.1.0/30      *[Direct/0] 1d 01:54:14
                    > via ge-0/0/0.202
212.44.1.2/32      *[Local/0] 1d 01:54:14
                      Local via ge-0/0/0.202
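To make the cross-import easier to picture, here is a small Python toy model of what the two rib-groups do with the interface routes. It is only a sketch and not how Junos implements it: the dictionaries and the build_tables helper are made up for illustration, and only the Direct/Local routes from the output above are modelled.

```python
# Toy model: each instance hands its interface routes to a rib-group, and the
# rib-group copies them into every table listed in its import-rib statement.
RIB_GROUPS = {
    "ISP1-to-ISP2": ["ISP1.inet.0", "ISP2.inet.0"],
    "ISP2-to-ISP1": ["ISP2.inet.0", "ISP1.inet.0"],
}

# Interface (Direct/Local) routes per instance, taken from the output above.
INTERFACE_ROUTES = {
    "ISP1": {"rib_group": "ISP1-to-ISP2",
             "routes": ["173.1.1.0/24", "173.1.1.2/32"]},
    "ISP2": {"rib_group": "ISP2-to-ISP1",
             "routes": ["212.44.1.0/30", "212.44.1.2/32"]},
}


def build_tables():
    """Return {table name: set of interface routes} after the cross-import."""
    tables = {}
    for instance in INTERFACE_ROUTES.values():
        for table in RIB_GROUPS[instance["rib_group"]]:
            tables.setdefault(table, set()).update(instance["routes"])
    return tables


tables = build_tables()
for name in sorted(tables):
    print(name, sorted(tables[name]))
```

Both tables end up with the interface routes of both ISPs, which is what lets one instance later use the other ISP's gateway as a next hop.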
We have completed the first task. Each routing instance is now aware of the other instance's
routes. Now we should route traffic from the clients to their respective ISPs.
2) Forward Debian1 traffic to ISP1 and HostC traffic to ISP2
Below, by using a firewall filter, we redirect each client's traffic to its routing instance.
[edit]
root@branchC# show firewall
family inet {
    filter redirect-to-isp {
        term to-isp1 {
            from {
                source-address {
                    173.63.1.100/32;
                }
            }
            then {
                routing-instance ISP1;
            }
        }
        term to-isp2 {
            from {
                source-address {
                    173.63.1.200/32;
                }
            }
            then {
                routing-instance ISP2;
            }
        }
        term default-allow {
            then accept;
        }
    }
}
But it isn't active until we apply it on the incoming interface:
[edit]
root@branchC# show interfaces ge-0/0/0.963
vlan-id 963;
family inet {
    filter {
        input redirect-to-isp;    <<< --- We are redirecting client traffic.
    }
    address 173.63.1.1/24;
}
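The way the redirect-to-isp filter picks a routing instance can be sketched in a few lines of Python. This is only an illustration of the term-by-term evaluation: the TERMS list and the select_routing_instance function are hypothetical names, and returning None stands in for the default-allow term, where the packet is simply accepted and no routing instance is set.

```python
import ipaddress

# (term name, source prefix, routing instance), in the same order as the filter.
TERMS = [
    ("to-isp1", "173.63.1.100/32", "ISP1"),
    ("to-isp2", "173.63.1.200/32", "ISP2"),
]


def select_routing_instance(source_ip):
    """Return the routing instance chosen for a packet, or None if only
    the final default-allow term matches."""
    addr = ipaddress.ip_address(source_ip)
    for _term, prefix, instance in TERMS:
        if addr in ipaddress.ip_network(prefix):
            return instance  # first matching term wins
    return None  # term default-allow: accept without redirecting


for src in ("173.63.1.100", "173.63.1.200", "173.63.1.50"):
    print(src, "->", select_routing_instance(src))
```

Only the two client addresses listed in the filter are redirected; any other source on the 173.63.1.0/24 segment falls through to the last term.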
Redirecting client traffic to routing instances is also completed. Now we will monitor ISP links.
3) Monitor each ISP by using RPM
Junos has a great real-time monitoring feature: you can continuously check link quality and
probe remote hosts. RPM really deserves a dedicated post, but in short, what we do below
is probe each ISP gateway with ICMP 5 times at 1-second intervals, and if all 5 probes of
a single test are lost, then the TEST FAILS. What does a test failure mean practically for us? It
means we can take an IP monitoring action for this failure.
[edit]
root@branchC# show services rpm
probe probe-isp1 {
    test test-isp1 {
        probe-type icmp-ping;
        target address 173.1.1.1;
        probe-count 5;
        probe-interval 1;
        test-interval 3;
        source-address 173.1.1.2;
        routing-instance ISP1;
        thresholds {
            total-loss 5;
        }
    }
}
probe probe-isp2 {
    test test-isp2 {
        probe-type icmp-ping;
        target address 212.44.1.1;
        probe-count 5;
        probe-interval 1;
        test-interval 3;
        source-address 212.44.1.2;
        routing-instance ISP2;
        thresholds {
            total-loss 5;
        }
    }
}
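The pass/fail decision of a single test can be pictured with a tiny Python sketch. The constants mirror probe-count and total-loss from the config above; the rpm_test_failed helper is a made-up name, and treating the threshold as "lost probes greater than or equal to total-loss" is my assumption about how the comparison works.

```python
PROBE_COUNT = 5  # probe-count 5: probes sent in one test
TOTAL_LOSS = 5   # thresholds total-loss 5: lost probes that fail the test


def rpm_test_failed(replies):
    """replies holds one boolean per probe of a single test (True = reply
    received). Assumption: the test fails once losses reach TOTAL_LOSS."""
    lost = sum(1 for got_reply in replies if not got_reply)
    return lost >= TOTAL_LOSS


# Gateway completely unreachable: all 5 probes lost.
print(rpm_test_failed([False] * PROBE_COUNT))
# One reply out of 5 still arrives: only 4 probes lost.
print(rpm_test_failed([True, False, False, False, False]))
```

With probe-count and total-loss both set to 5, a single reply within a test is enough to keep the test from failing, so only a complete loss triggers the failover.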
If we want to check the probe results:
root@branchC> show services rpm probe-resu...
Owner: probe-isp1, Test: tes...
Target address: 173.1.1.1, So...
I am running traceroute from each host, and the traffic follows a different ISP for each host.
This is what we wanted in the first place while both links are functional.
root@debian1:~# traceroute -n
traceroute to 87.1.1.6 (87.1.1.6)
1 173.63.1.1 3.857 ms 3.811
2 173.1.1.1 13.120 ms 13.130
[edit]
root@branchC# show services ip-monitoring
policy track-isp1 {
    match {
        rpm-probe probe-isp1;
    }
    then {
        preferred-route {
            routing-instances ISP1 {
                route 0.0.0.0/0 {
                    next-hop 212.44.1.1;
                }
            }
        }
    }
}
policy track-isp2 {
    match {
        rpm-probe probe-isp2;
    }
    then {
        preferred-route {
            routing-instances ISP2 {
                route 0.0.0.0/0 {
                    next-hop 173.1.1.1;
                }
            }
        }
    }
}
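The effect of these two policies on the default routes can be sketched in Python as follows. This is a simplified model, not the Junos implementation: the dictionaries and the active_default_routes function are hypothetical, and I am assuming that the preferred route simply wins over the configured static default while the matching probe is failing and is withdrawn again once the probe recovers.

```python
# Static defaults configured in each routing instance.
STATIC_DEFAULT = {"ISP1": "173.1.1.1", "ISP2": "212.44.1.1"}

# ip-monitoring policies: which probe they watch and what they inject on failure.
POLICIES = {
    "track-isp1": {"probe": "probe-isp1", "instance": "ISP1",
                   "next_hop": "212.44.1.1"},
    "track-isp2": {"probe": "probe-isp2", "instance": "ISP2",
                   "next_hop": "173.1.1.1"},
}


def active_default_routes(failed_probes):
    """Return {instance: next hop of the active 0.0.0.0/0 route}.
    Assumption: an injected preferred route overrides the static default."""
    active = dict(STATIC_DEFAULT)
    for policy in POLICIES.values():
        if policy["probe"] in failed_probes:
            active[policy["instance"]] = policy["next_hop"]
    return active


print(active_default_routes(set()))           # both ISPs healthy
print(active_default_routes({"probe-isp1"}))  # ISP1 gateway unreachable
```

Note that the injected next hop lives in the other ISP's subnet. It can be used inside the failed instance because the interface routes were cross-imported through the rib-groups in the first step.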
Now we will simulate a failure on ISP1, after which the Debian1 device will also be routed
through ISP2 instead of ISP1. Aha, link failed!
Nov 10 22:36:39 22:36:39.463629:CID-0:RT:<173
matched filter rpm-ff:
And once these two seconds have also passed, flow deletes the session from the session table.
Nov 10 22:36:45 22:36:45.1575
Nov 10 22:36:45 22:36:45.1575
Nov 10 22:36:45 22:36:45.1575