Spare Capacity Planning For Survivable Mesh Networks
Spare Capacity Planning For Survivable Mesh Networks
Spare Capacity Planning For Survivable Mesh Networks
Abstract. The design of survivable mesh based STM networks has re-
ceived considerable attention in recent years and is a complex multi-
constraint optimization problem. In this paper, a new spare capacity
planning methodology is proposed utilizing genetic algorithms. The method
is based on forcing flows/traffic which are on paths that are disjoint to
share backup spare capacity. The major advantages of the new approach
are a polynomial time complexity and the capability of incorporating
nonlinear variables such as nonlinear cost functions into the solution al-
gorithm. Numerical results illustrating the form of the genetic algorithm
solution and comparing the proposed methodology to existing techniques
from the literature are presented.
1 Introduction
imply that the network topology is a full mesh, but rather that the network
nodes are at least two connected. Operational backbone networks typically have
an average nodal degree in the range of 2.5 to 4.5 [5].
Previous research on spare capacity assignment in STM mesh networks adopts
the problem context above and uses either mathematical programming tech-
niques [1–7, 11] or heuristics[8–11] to determine the spare capacity allocation.
In both cases, algorithms have been developed for link restoration and path
restoration. In link restoration, the nodes adjacent to a failed link are respon-
sible for rerouting the affected traffic flows around the failed link. Thus, one
merely patches the hole in the original route. In contrast, for path restoration,
the source destination node pairs whose traffic traversed the failed device are
responsible for restoration and reroute over the entire path set between each
affected source destination pair. In general, path restoration is known to require
less spare capacity than link restoration [1, 3–6]. However, path restoration is
more complex to implement as many more nodes are involved in the restora-
tion process. Also, in the general case path restoration requires the release of
stub capacity (i.e., the capacity on the unaffected portion of a failed working
path) both upstream and downstream of the failed device. A variation on path
restoration which speeds up restoration and does not require stub release is path
restoration with link disjoint routes [11]. In path restoration with link disjoint
routes, no portion of the failed working path is used for backup routes, thus
the restoration process can begin quickly without additional signaling overhead
specifying exactly which component failed in the working path and confirming
the release of stub capacity. This is an important advantage in current STM and
WDM networks where fault identification signaling capabilities are limited.
In both link and path restoration one is interested in determining a set of
backup routes for the working traffic and the amount of spare capacity required
on the backup routes to achieve a desired level of traffic restoration for a specific
failure scenario. A typical goal is to determine the spare capacity placement
such that all normal traffic can be restored for any single link failure in the
network. Mathematical programming methods have been used to formulate the
spare capacity planning problem for link and path restoration approaches as
Integer Programming (IP) models [2–6, 11]. The objective function adopted is
to minimize the spare capacity required for achieving restoration from a specific
failure condition. Unfortunately, the resulting IP formulation is NP-hard and
in order to solve the model sub-optimal heuristic approaches have been tried.
One approach is to approximate the IP model by a Linear Programming (LP)
model which has a polynomial time bound solution and then round the solution
to integer values. An alternate approach is to solve the IP model exactly over a
reduced search space. Typically, the search space is reduced by limiting the set of
paths over which the backup routes are determined. For example, placing a hop
count limit on the path set equal to the diameter of the network or a limit on
the number of hops in addition to the shortest path hop count for each source-
destination pair. How to pick the path set within a general setting in order to
achieve good results while maintaining a computational feasible problem that
3
two members of the current generation are selected for breeding and are bred
via a crossover operation creating a pair of offspring to add to the population.
In mutation a portion of the population is randomly selected and stochastically
altered in order to maintain diversity of the genetic search. The new population
is evaluated (sorted according to the fitness function) and the new generation is
selected from the best members of the population with the remainder being elim-
inated. This process is repeated until a stopping criterion is met, usually either
a finite number of generations of the population or when the improvement in
the fitness function between successive generations is less than a specified value.
There are many variations of GAs based on the choice of reproduction,
crossover and mutation operators. Here we develop a GA for our problem do-
main based in part on the approach given in [13] for the solution of nonlinear IP
problems. Before applying the GA to determine spare capacity, the paths to be
used and the required capacity under normal conditions are determined based
on the traffic demands, topology and shortest path routing [15, 18]. We use a
two-stage algorithm to implement the GA method for spare capacity planning.
Stage I determines an initial population for the GA, in effect finding a set of fea-
sible solutions to the spare capacity assignment problem. Stage II then tries to
improve the spare capacity assignment resulting from Stage I via a GA search as
shown in Fig. 1. The two stages of the GA are briefly described below; detailed
information is given in [19].
NPop
Cr p y
Op osso
er ve
Co
ati r
on
sort
die
Mutation
Step 1 involves randomly choosing a link (jk) in the backup topology matrix
BT op then temporarily deleting it and checking if the topology of the virtual
backup network is still 2-connected. The term 2-connected means that there
are at least 2 disjoint paths between every node pair in the topology. If the
backup topology matrix BT op is no longer 2-connected then the deleted link
is returned and Step 1 is repeated.
Step 2 enacts the successful repetition of Step 1, until the number of deleted
links equals a specified input parameter delete.
Note, that Step 1 and 2 in effect randomly reduce the path set to be con-
sidered for routing backup connections, in contrast many IP formulations
reduce the path set by imposing hop count limits.
Step 3 determines the backup paths BP k and spare capacity requirements SAk
for a specific failure scenario and restoration level requirement. Here we
discuss the case of 100% restoration of all traffic affected by the failure of
any single link in the network; other scenarios are given in [19] and require
slight modifications. The BP k and SAk are found by applying the following
steps, with f = 1 initially:
Step 3-1. Fail link f in the network by removing it from the topology ma-
trix T op and determining which working paths in W P are affected by
the failure. Also remove the failed link from the current backup topology
matrix BT op.
Step 3-2. For a source destination pair ij affected by the failure, remove
the working path W Pij between node i and node j, by deleting all the
links of W Pij from the backup topology matrix BT op.
Step 3-3. Find backup path BPij between node i and node j using the
current backup topology matrix BT op to determine the possible path
set and shortest path routing to find the actual path. Determine the
spare capacity requirements for the links on the backup path.
Step 3-4. Return the links of the working path W Pij between node i and
node j to the current backup topology matrix BT op.
Step 3-5. Repeat Steps 3-2 to 3-4 for every source destination pair affected
by the failure of link f , and adjust the spare capacity requirements for
each link. That is, if a link (lm) is used on multiple backup paths for the
same link failure case then the spare capacity requirement at that link
6
Yes
Return Is BTop
BTopi, j No Add offspring to new
2-connected?
population SAN
step 5
step 2 Yes
Perform Mutation step 6
step 9
Is
Return the deleted links to BTop | CSA1t+1-CSA1t | < E
step 5 or (t+1)>rot ?
(BTop = Top)
step 6
Yes
is the sum of the spare capacities needed by each backup path passing
through the link.
Step 3-6. Repeat Steps 3-1 to 3-5 for every link in the network T op (i.e.,
increment f ) and adjust the spare capacity requirements. The final spare
capacity requirement on a link (lm) is the maximum spare capacity re-
quired by the backup paths using that link for any single link failure
f.
The steps above result in each backup path BPij between a pair of nodes
i and j being link disjoint with its corresponding working path W Pij thus
allowing implementation of path restoration with link disjoint routes.
Step 4 calculates the cost CSAk of the current spare capacity assignment SAk .
Step 5 returns the links deleted in Step 1 to BT op resulting in BT op = T op.
Step 6 repeats Steps 1-5, until the number of spare capacity assignments gen-
erated equals the specified population size N pop - an input parameter.
Step 1 sorts the N pop spare capacity assignments SA in ascending order ac-
cording to their costs CSA to form the initial generation SAt with cost
CSAt for t = 1.
Step 2 selects pairs of the N pop spare capacity assignments for breeding. The
pairs (SAy1 y2
t , SAt ) were selected according to a quadratic procedure by ap-
plying the following equation twice to get y1 and y2. √
y = x2 where x is a real number uniformly distributed between [0, N pop].
This approach is biased towards choosing the better solutions in the cur-
rent generation for breeding. Steps 3 and 4 perform breeding and are called
the crossover steps in GA terminology. These steps combine elements of two
existing spare capacity assignments (parents) to produce two new spare ca-
pacity assignments (offspring) that are added to the population for possible
continuation into the next generation. Several types of crossover operators
were tested to find the operator best suited for our problem domain [19].
The crossover operator that was selected was two-position random crossover
as it produced the highest rate of offspring which were feasible solutions.
Step 3 randomly selects two links from each parent, that is links (jk and lm)
for SAy1t and links (pq and rs) for SAt .
y2
offspring are discarded and Steps 3 and 4 are repeated. Otherwise the two
offspring are added to the population of new solutions SAN .
Step 5 repeats Steps 2 to Step 4 N bred times, where N bred is a input parameter
determining the number of children bred by each generation.
Step 6 performs mutation by first selecting a specific percentage of the current
population, set by mrate (the mutation rate in GA terminology). Then the
spare capacity values for the selected solutions are altered randomly with
mutation probability mprob. The resulting alternated solutions, if feasible,
are added to the population of new solutions SAN .
Step 7 calculates the cost CSAN for all the new spare capacity assignments
SAN .
Step 8 forms a new generation of spare capacity assignments SAt+1 . First, by
copying the N reprod best assignments from the current generation to the
new generation along with their cost. Note N reprod is a user input param-
eter. Second, the new solutions from SAN are added to the new generation
along with their cost CSAN . The new generation is then sorted in ascending
order according to the cost and the first N pop spare capacity assignments
are retained in the next generation with the rest being discarded.
Step 9 repeats Steps 2-8, until the improvement in the spare capacity cost be-
tween successive generations is less than E or the number of the generations
created equals rot. Both E and rot are input parameters.
3 Numerical Results
Extensive numerical experimentation was conducted with the GA approach pre-
sented above for a variety of network topologies, loading conditions, failure sce-
narios and restoration requirements with details in [19]. The experimentation had
two purposes, namely: (1) selection of parameter values and sensitivity analysis
for the GA method and (2) comparison with existing spare capacity planning
methods.
In the first set of experiments, the effects of different population sizes, num-
ber of links deleted in determining the initial population, reproduction rates,
crossover algorithms, mutation rates, mutation probabilities and stopping cri-
teria were studied. The parameters governing the performance of the Stage I
algorithm outlined above are the population size, N pop, and delete the number
of links deleted randomly from the virtual backup topology before determining
the backup paths. Consider a network topology with N nodes and L links. We
assume that the network nodes are at least two connected so the average net-
work node degree deg ranges from 2 for a ring to (N − 1) for a full mesh. The
corresponding number of links L ranges from N ≤ L ≤ (N (N −1)/2). Obviously,
as the number of links and the average nodal degree increases, the greater the
spare capacity assignment search space and therefore N pop and delete should
increase. Numerical experimentation showed that a population size in the range
N pop ∈ [(deg − 1)L, (deg)L] worked well. For the number of links to delete,
it was found that the formula delete = L − (N − 1) − (L − N )/(deg − 1) yielded
good results.
9
(L + N × log N )). Step 6 involves performing mutation and a test for feasibility
of the mutated solution resulting in an O(N pop × L × N 2 × (L + N × log N ))
complexity. Steps 7 and 8 consist of calculating the cost of the various spare
capacity assignments and forming a new generation and the complexity is O(L+
N pop). Step 9 involves testing if a stopping criterion is meet. If we assume the
stopping criterion is the maximum number of generations rot, then the overall
complexity of Stage II is O(rot × N pop × (log N pop + L3 × N 2 × (L + N × log N )).
Note that both stages of the genetic algorithm method have a polynomial time
complexity.
Fig. 4. Network 1 (13 nodes, 23 links) Fig. 5. Network 2 (15 nodes, 27 links)
12(9) 12(14)
13 9 7
7(12) 18(10) 21(14) 15(11)
13 9 7 20
10(6) 8(0) 10(10)
8(10) 15(0)
17 7(6) 4(8) 10(7)
24(0) 15(4) 13(0) 9(0)
11(0) 1 2 3 17 7(0)
9(5) 4(0) 31(0) 20(7) 4(15)
12(8) 1 2 3
12
11(7) 8(8)
9(5) 13(0) 6 12 13(11)
4(12) 11(5) 17(0) 8(6) 11(9)
19(10) 20(0) 6 19
5(14) 13(0) 13(0)
15(7) 12(0)
10(0) 4 5 19(12) 18(0) 8(9)
10 10(0) 4 5
7(0) 10
16 11(0) 6(6) 9(8) 13(0)
16 11(0) 7(8)
12(8) 14(11)
18(0) 10(7) 19(0) 10(8)
8 18
8 10(11)
11 10(7) 11 10(0)
10(8) 6(10) 14 13(11) 14
14(11)
15 15
Fig. 6. Network 3 (17 nodes, 31 links) Fig. 7. Network 4 (20 nodes, 37 links)
Four networks with topologies shown in Fig. 4–7 was used to evaluate the
above GA method. One traffic demand (e.g. STS-1) between each directed node
pair in the network was assumed. Table 1 summarizes the parameters of the four
networks. The allocated working capacity for each network was determined using
shortest path routing. The working capacity and the spare capacity required to
provide fault tolerance for any single link failure are marked beside each link as
the first and second number respectively in Fig. 4–7. The bold lines indicate links
with non-zero spare capacity. Notice that the spare capacity is concentrated on
a subset of the links in the network with some links having zero spare capacity.
In determining the results shown in Fig. 4–7, the GA tried to minimize the total
11
network cost when a nonlinear capacity cost function was employed. The total
link cost is the cost of the total capacity as determined by the summation of
both the working and the spare capacity in the link. The parameters for the
piece-wise cost function are selected from an actual network service provider
charges for bandwidth namely: 3.7, 5.8, 19.6, 64.2 for link units of OC-1, OC-3,
OC-12, OC-48 respectively [16]. In order to permit a fair comparison with other
approaches, the path set used for backup topologies in the GA was hop count
limited with a limit of 7.
In the second set of experiments the GA approach was compared with three
methods from the literature: (1) the Spare Link Placement Algorithm (SLPA)
heuristic [5]; (2) Link restoration using Integer Programming (Link IP) [2]; and
(3) Path restoration with link disjoint routes using Integer Programming (Path
IP) [3, 5]. The Spare Link Placement Algorithm (SLPA) is a heuristic approach
with polynomial time complexity, and has been used by several telecom network
operators for spare capacity planning. The Link IP approach reroutes all the
affected traffic demands locally around the failure. As noted earlier, the Path IP
approach is expected to provide the best results since it reroutes failed traffic
over the remaining paths between each source destination pair after a failure.
Both the Link IP and the Path IP techniques were solved using the commercial
optimization package CPLEX. Our implementations of the three methods above
were validated by reproducing published results from the literature. Note that
a lower bound on the spare capacity requirements can be found by taking the
Path IP formulation and relaxing the integer requirement, i.e., solving a Linear
Programming version of the problem.
In Table 2, the total spare network capacity for 100% restoration for any
single link failure as determined by the four algorithms using a hop count limit
at 7 is given. The lower bound on the spare capacity requirements is found by
solving the LP relaxation of the Path IP model is also shown. The % redundancy
as measured by the ratio of spare capacity to the working capacity is also given in
Table 2. For the Link IP and Path IP approaches the objective is to minimize the
total spare capacity in the network. Compared to the Path IP results, the genetic
algorithm has about 7-12% higher redundancy in the four networks. However, it
results in 7-23% less redundancy than the spare capacity assignments given by
the Link IP and SLPA approaches. Note, that the GA approach was designed
to minimize the cost not the amount of spare capacity.
Neither the Link IP nor Path IP methods can deal directly with a nonlinear
cost function. Adapting IP algorithms to nonlinear cost functions usually leads to
an explosion of the search space size. On the contrary, whether the cost function
is linear or not makes no difference to the GA approach. In order to compare the
Link IP, Path IP and SLPA approaches cost, we solve the problem in two steps.
The first step utilizes the algorithms to minimize the spare capacity. The second
step minimizes the total network cost by fixing the link capacity at the level
found in the first step and then finding the combination of capacity assignments
that minimizes the cost. For example if a link requires 9 units of capacity this
could be achieved be either 9 OC-1 or 3 OC-3 links, here the minimum cost
option of 3 OC-3 links would be selected. Table 3 shows the total network costs
from the above algorithms. The least cost solution is obtained by the Path IP
method. The Link IP and SLPA yield solutions with a higher cost than the GA
method or the Path IP method. Notice, that the GA approach can produce a
cost close to that given by the Path IP method (within 3-7%).
times shown below are not directly comparable as they are on different platforms
but the scalability of each scheme can be inferred. From the results one can see
that the execution time of the GA approach is the least affected by increases
in the network size. Additional, numerical results for other network topologies
including networks of more than 100 nodes (with average node degree 3.2) are
given in [19].
4 Conclusions
References
1. R. Doverspike and B. Wilson. Comparison of capacity efficiency of DCS net-
work restoration routing techniques. Journal of Network and System Management,
2(2):88–99, 1994.
2. M. Herzberg, S. Bye, and U. Utano. The hop-limit approach for spare-capacity
assignment in survivable networks. IEEE/ACM Transactions on Networking, pages
775–784, December 1995.
3. R. Iraschko, M. MacGregor, and W. Grover. Optimal capacity placement for path
restoration in STM or ATM mesh survivable networks. IEEE/ACM Transactions
on Networking, pages 325–336, June 1998.
4. M. Herzberg, D. Wells, and A. Herschtal. Optimal resource allocation for path
restoration in mesh-type self-healing networks. In Proceedings of 15th International
Teletraffic Congress, volume 15, Washington, D.C., June 1997.
5. W. D. Grover, R. R.Iraschko, and Y. Zheng. Comparative methods and issues
in design of mesh-restorable STM and ATM networks. In P. Soriano B.Sanso,
editor, Telecommunication Network Planning, pages 169–200. Kluwer Academic
Publishers, 1999.
14
6. Yijun Xiong and Lorne G. Mason. Restoration strategies and spare capacity re-
quirements in self-healing ATM networks. IEEE/ACM Transactions on Network-
ing, 7(1):98–110, February 1999.
7. H. Sakauchi, Y. Nishimura, and S. Hasegawa. A self-healing network with an
economical spare-channel assignment. In Proceeding of IEEE Global Conference
on Communications, pages 438–442, November 1990.
8. B. Venables, W. Grover, and W. MacGregor. Two strategies for spare capacity
placement in mesh restorable networks. In Proceeding of IEEE Global Conference
on Communications, pages 267–271, November 1993.
9. J. Yamada. A spare capacity design method for restorable networks. In Proceeding
of IEEE Global Conference on Communications, pages 931–935, November 1995.
10. B. Van Caenegem, N. Wauters, and P. Demeester. Spare capacity assignment for
different restoration strategies in mesh survivable networks. Proceeding of IEEE
International Conference on Communications, pages 288–292, June 1997.
11. B. Van Caenegem, W. Van Parys, F. De Turck, and P. M. Demeester. Dimensioning
of survivable WDM networks. IEEE Journal on Selected Areas of Communications,
16(7):1146–1157, September 1998.
12. B. Norman and J. Bean. Random keys genetic algorithm for complex scheduling
problems. Naval Research Logistics, 46:199–211, 1999.
13. B. Hadj-Alouane and J. Bean. A genetic algorithm for the multiple choice integer
program. Operations Research, 46:92–101, Jan.-Feb. 1997.
14. K.-T. Ko, K.-S. Tang, C.-Y. Chan, K.-F. Man, and S. Kwong. Using genetic
algorithms to design mesh networks. IEEE Computer, pages 56–60, August 1997.
15. D. Medhi and D. Tipper. Some approaches to solving a multi-hour broadband
network capacity design problem. to appear Telecommunication Systems.
16. E. McDysan and D. L. Spahn. Hands-on ATM. McGraw-Hill, 1998.
17. D. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley Publishing, 1989.
18. A. Kershenbaum. Telecommunication Network Design Algorithms. McGraw-Hill,
1993.
19. A. Al-Rumaih. A Spare Capacity Planning Methodology for Wide Area Survivable
Networks. Ph.D. dissertation, Department of Information Science and Telecom-
munications. University of Pittsburgh, May 1999.