PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY VOLUME 13 MAY 2006 ISSN 1307-6884
Dynamic Load Balancing Strategy
for Grid Computing
Belabbas Yagoubi and Yahya Slimani
Abstract— Workload and resource management are two essential
functions provided at the service level of the grid software infrastructure. To improve the global throughput of these software environments, workloads have to be evenly scheduled among the available
resources. To realize this goal, several load balancing strategies and
algorithms have been proposed. Most of these strategies were developed
under the assumption of a homogeneous set of sites linked by homogeneous
and fast networks. For computational grids, however, we must address
new issues, namely heterogeneity, scalability and adaptability.
In this paper, we propose a layered algorithm that achieves dynamic
load balancing in grid computing. Based on a tree model, our
algorithm has the following main features: (i) it is layered;
(ii) it supports heterogeneity and scalability; and (iii) it is totally
independent of any physical architecture of a grid.
Keywords— Grid computing, Load balancing, Workload, Tree-based model.
I. INTRODUCTION
The development of computational grids and the associated
middleware has been actively pursued in recent years to deal
with the emergence of greedy applications involving large computing
tasks and large amounts of data [6], [7]. Managing such applications
leads to complex problems for which traditional architectures are
insufficient. Grid architectures offer many potential advantages,
including the ability to solve large-scale advanced scientific and
engineering applications whose computational requirements exceed
local resources, and the reduction of job turnaround time through
workload balancing across multiple computing facilities [1]. In his
reference book, Foster [7] defined a computational grid as an emerging
computing infrastructure that enables effective access to high
performance computing resources. An important issue in such
systems is the efficient assignment of tasks and utilization of
resources, commonly referred to as the load balancing problem.
The main contributions of this paper are twofold. First, we
propose a tree-based model to represent a grid architecture
from the perspective of workload management. This model has
three main features: (1) it is hierarchical, in order to minimize
the overhead of the load balancing computation; (2) it is based on
a univocal transformation of any grid architecture into a tree with
at most four levels; (3) it is totally independent of any physical
grid architecture. The second contribution is the proposed load
balancing algorithm, which is layered in accordance with the
strategy built on the above model. We seek a load balancing that
favours the workload neighborhood, in order to reduce the
number of messages. Our strategy relies on a three-layer
algorithm (intra-site, intra-cluster and intra-grid).
The remainder of this paper is organized as follows. Section 2
describes the essential aspects of the load balancing problem.
The mapping of any grid architecture into a tree-based model
is explained in Section 3. Section 4 describes the main
steps of our proposed load balancing strategy and outlines
the layered algorithm developed over the model. The behavior
and performance of our algorithm are evaluated in Section
5. Finally, Section 6 concludes the paper and outlines some
directions for future research.
II. LOAD BALANCING PROBLEM
This problem has been discussed in the traditional distributed
systems literature for more than two decades. Various strategies
and algorithms have been proposed, implemented and
classified in a number of studies [4], [13], [15]. Load balancing
algorithms can be classified into two categories: static or
dynamic. In static algorithms, load balancing decisions are made
at compile time, when resource requirements are estimated.
In dynamic algorithms, resources are allocated and reallocated
at run time without a priori task information; the algorithm
determines when tasks are migrated and which tasks are concerned.
A good description of customized load balancing strategies for
a network of workstations can be found in
[15]. More recently, Houle et al. [9] considered algorithms for
static load balancing on trees, assuming that the total load is
fixed. In contrast to traditional distributed systems, for which
a plethora of algorithms have been proposed, few algorithms
have focused on grid computing. This is due to the novelty
and the specific characteristics of this infrastructure.
Load balancing algorithms can be defined by their implementation of the following policies [10]:
• Information policy: specifies what workload information
is to be collected, when it is to be collected and from where.
• Triggering policy: determines the appropriate moment to
start a load balancing operation.
• Resource type policy: classifies a resource as server or
receiver of tasks according to its availability status.
• Location policy: uses the results of the resource type
policy to find a suitable partner for a server or receiver.
• Selection policy: defines the tasks that should be migrated
from overloaded resources (source) to the most idle resources
(receiver).
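As an illustration of how these five policies fit together, the following Python sketch bundles them as plain functions and runs one balancing decision. It is only a sketch under stated assumptions: the names `Policies`, `balance_once` and `example_policies`, the "source"/"receiver"/"neutral" labels, the threshold of 2.0 and the choice to migrate just enough load to reach the average are all hypothetical and are not taken from [10] or from the algorithm proposed later in this paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Loads = Dict[str, float]  # resource name -> workload (hypothetical units)


@dataclass
class Policies:
    """Bundle of the five policies; every name here is illustrative."""
    information: Callable[[], Loads]                       # what is collected
    triggering: Callable[[Loads], bool]                    # when to balance
    resource_type: Callable[[str, Loads], str]             # source/receiver/neutral
    location: Callable[[str, List[str], Loads], Optional[str]]  # partner choice
    selection: Callable[[str, str, Loads], float]          # how much to migrate


def balance_once(p: Policies) -> Optional[Tuple[str, str, float]]:
    """Return (source, receiver, amount) for one migration, or None."""
    loads = p.information()
    if not p.triggering(loads):
        return None
    sources = [r for r in loads if p.resource_type(r, loads) == "source"]
    receivers = [r for r in loads if p.resource_type(r, loads) == "receiver"]
    if not sources or not receivers:
        return None
    src = max(sources, key=lambda r: loads[r])  # most overloaded resource
    dst = p.location(src, receivers, loads)
    if dst is None:
        return None
    return src, dst, p.selection(src, dst, loads)


def example_policies(snapshot: Loads) -> Policies:
    """Assumed policies: compare each load with the mean of the snapshot."""
    avg = mean(snapshot.values())
    return Policies(
        information=lambda: dict(snapshot),
        triggering=lambda loads: max(loads.values()) - min(loads.values()) > 2.0,
        resource_type=lambda r, loads: (
            "source" if loads[r] > avg else "receiver" if loads[r] < avg else "neutral"
        ),
        location=lambda src, receivers, loads: min(receivers, key=lambda r: loads[r]),
        selection=lambda src, dst, loads: min(loads[src] - avg, avg - loads[dst]),
    )
```

Keeping the policies as separate callables makes it possible to change, for example, the triggering rule without touching the location or selection logic.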
A. Performance parameters
The main objective of load balancing methods is to speed
up the execution of applications on resources whose workload
varies at run time in an unpredictable way [5]. Hence, it is
important to define metrics to measure the resource workload.
© 2006 WASET.ORG
Every dynamic load balancing method must estimate up-to-date
workload information for each resource. This is key
information in a load balancing system, which must answer
the following questions: (i) how is the resource workload
measured? (ii) which criteria are retained to define this
workload? (iii) how can the negative effects of resource
dynamicity on the workload be avoided? and (iv) how can
resource heterogeneity be taken into account in order to obtain an
instantaneous average workload that is representative of the system?
Several load indices have been proposed in the literature,
such as CPU queue length, average CPU queue length and CPU
utilization. The success of a load balancing algorithm
depends on the stability of the number of messages (small
overhead), the support environment, a low-cost update of the
workload, and a short mean response time, which is a significant
measurement for a user [2]. It is also essential to measure the
communication cost induced by a load balancing operation.
B. Problems specific to grid computing
Most of the proposed load balancing algorithms were developed
under the assumption of a homogeneous set of sites linked by
homogeneous and fast networks [11]. While this assumption
holds in traditional distributed systems, it is not realistic in grid
architectures because of the following properties that characterize
them [3]:
• Heterogeneity: A Grid involves multiple resources that are
heterogeneous in nature and might span numerous administrative domains across a potentially global expanse.
• Scalability: A Grid might grow from a few resources to
millions. This raises the problem of potential performance
degradation as the size of a Grid increases.
• Adaptability: In a Grid, a resource failure is the rule, not
the exception: the probability that some
resources fail is naturally high. Resource managers must
tailor their behaviour dynamically so that they can extract
the maximum performance from the available resources
and services.
These properties make the load balancing problem more
complex than in traditional parallel and distributed systems,
which offer homogeneity and stability of their resources.
In addition, the interconnection networks of grids have very disparate
performance, and the tasks submitted to the system can be very
diverse and irregular. These observations show that
it is very difficult to define a load balancing system that
integrates all these factors.
III. TREE-BASED BALANCING MODEL
In order to explain our model clearly, we first define the
topological structure of a grid.
A. Grid topology
We suppose that a grid (see Fig. 1) is a finite
set of G clusters Ck, interconnected by gates gtk, k ∈
{0, ..., G − 1}, where each cluster contains one or more sites
Sjk interconnected by switches SWjk, and every site contains
some Computing Elements CEijk and some Storage Elements
SEijk, interconnected by a local area network.
[Figure labels: Cluster Ck, Cluster Cl, Cluster Cm, Gate gtk, Switch SWjk, Site Sik, Site Sjk, Site Slk, CEijk, SEijk]
Fig. 1. Grid topology
B. Load balancing generic model
Our model is built incrementally as a tree. First, for each
site we create a two-level subtree. The leaves of this subtree
correspond to the computing elements of the site, and the root
is a virtual node associated with the site. The subtrees that
correspond to the sites of a cluster are then aggregated to form
a three-level subtree. Finally, these subtrees are connected
together to generate a four-level tree called the load balancing
generic model. This model is denoted by G/S/M, where G is
the number of clusters that compose the grid, S the number of
sites in the grid and M the number of Computing Elements.
This model can be transformed into three specific models:
G/S/M, 1/S/M and 1/1/M, depending on the values of G
and S. The generic model is an acyclic connected graph
in which each level has specific functions.
• Level 0: At this first level (top level of the tree), we
have a virtual node that corresponds to the root of
the tree. It is associated with the grid and performs two
main functions: (i) it manages the workload information of
the grid; (ii) it decides, upon receiving tasks from users,
where these tasks can be launched, based on the user
requirements and the current load of the grid.
• Level 1: This level contains G virtual nodes, each one
associated with a physical cluster of the grid. In our load
balancing strategy, each of these virtual nodes is responsible
for managing the workload of its sites.
• Level 2: At this third level, we find S nodes associated
with the physical sites of all clusters of the grid. The main
function of these nodes is to manage the workload of
their physical computing elements.
• Level 3: At this last level (leaves of the tree), we find
the M Computing Elements of the grid, linked to their
respective sites and clusters.
Figure 2 shows the generic tree model associated with a grid,
with its three variants: 1/1/M, 1/S/M and G/S/M.
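The mapping of a grid onto the four-level tree can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the input format (a nested dictionary cluster → site → list of CE names) and the names `Node`, `build_tree` and `model_name` are assumptions made for the example, and the sketch always keeps four levels, even for the 1/S/M and 1/1/M variants.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    level: int  # 0 = grid, 1 = cluster, 2 = site, 3 = computing element
    children: List["Node"] = field(default_factory=list)


def build_tree(grid: Dict[str, Dict[str, List[str]]]) -> Node:
    """Build the tree from a mapping cluster -> site -> list of CE names."""
    root = Node("grid", 0)  # virtual node associated with the whole grid
    for cluster_name, sites in grid.items():
        cluster = Node(cluster_name, 1)  # virtual node: sites manager
        for site_name, ces in sites.items():
            # virtual node: CE's manager, with its computing elements as leaves
            site = Node(site_name, 2, [Node(ce, 3) for ce in ces])
            cluster.children.append(site)
        root.children.append(cluster)
    return root


def model_name(root: Node) -> str:
    """Return the G/S/M denotation of a tree built by build_tree."""
    g = len(root.children)
    s = sum(len(c.children) for c in root.children)
    m = sum(len(site.children) for c in root.children for site in c.children)
    return f"{g}/{s}/{m}"


example = {
    "C0": {"S00": ["CE000", "CE100"], "S10": ["CE010"]},
    "C1": {"S01": ["CE001"]},
}
tree = build_tree(example)
```

Adding or removing a computing element, a site or a cluster amounts to adding or removing a leaf or a subtree of this structure.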
[Figure: four-level tree. Level 0: Grid (grid manager). Level 1: Cluster 1 ... Cluster G (sites managers). Level 2: Site 1,1 ... Site 1,j ... Site G,S (CE’s managers). Level 3: CE 1,1,1 ... CE G,S,M. Nested subtrees are labelled MODEL 1/1/M, MODEL 1/S/M and MODEL G/S/M.]
Fig. 2. Load balancing generic model
C. Characteristics of the proposed model
The main features of our proposed load balancing generic
model are listed below:
1) it is hierarchical: this characteristic facilitates the
information flow through the tree and clearly defines the
message traffic in our strategy;
2) it supports heterogeneity and scalability of grids: adding
or removing entities (computing elements, sites or clusters)
are very simple operations in our model (adding or
removing nodes or subtrees);
3) it is totally independent of any physical architecture
of a grid: the transformation of a grid into a tree is
univocal. Each grid corresponds to one
and only one tree.
IV. LOAD BALANCING STRATEGY
A. Principles
In accordance with the structure of the proposed model, the load
balancing strategy is also hierarchical. Hence, we distinguish
three load balancing levels:
1) Intra-site load balancing: At this first level, each site
decides, depending on its current load, whether to start a load
balancing operation. In this case, the site first tries to
balance its workload among its own computing
elements.
2) Intra-cluster load balancing: At this second level, load
balancing concerns only one cluster, Ck, among the clusters
of a grid. This kind of load balancing is performed only
if some sites Sjk fail to balance their workload among
their respective computing elements. In this case, they
need the assistance of their direct parent, namely cluster
Ck.
3) Intra-grid load balancing: Load balancing at this level
is used only if some clusters Ck fail to balance
their load among their associated sites.
The main advantage of this strategy is to prioritize local load
balancing first (within a site, then within a cluster and finally
on the whole grid). The goal of this neighborhood strategy
is to decrease the number of messages between computing
elements. As a consequence, the overhead induced
by our strategy is reduced. In terms of complexity, we can
easily prove that the number of messages is linear in
the number of computing elements.
B. Algorithms
According to the strategy described above, we define three
levels of load balancing algorithms: intra-site, intra-cluster
and intra-grid.
1) Notations: The notations used in the following algorithms
are defined in [14] as follows:
Ck = k-th cluster; Sjk = j-th site of the k-th cluster; CEijk = i-th
computing element of the j-th site of the k-th cluster; T = tolerance
factor, expressed in %; Lijk = actual workload of CEijk;
Ljk = actual workload of Sjk; Lk = actual workload of Ck; Avrg
= grid average load; sit-max = parameter to start an intra-cluster
load balancing; clu-max = parameter to start an inter-cluster
(intra-grid) load balancing.
2) Intra-site load balancing algorithm: This algorithm is
the kernel of our load balancing strategy. It
is executed when a CE’s manager finds an
imbalance between the computing elements under its control.
To detect this, the manager periodically receives local
workload information from its computing elements. Based on
this information and on the grid average workload Avrg,
the manager periodically analyzes the workload of its site.
Depending on the result of this analysis, it either
starts a local load balancing between the CE’s of the site, or
informs its own manager (the sites manager of the cluster) that
the site is currently overloaded.
Case of a site Sjk
BEGIN
  For Every CEijk AND Periodically do
    - Update actual workload Lijk of CEijk
    - Send load information to CE’s manager Sjk
  end For
  - Update current load Ljk of site Sjk
  - Send load information to sites manager Ck
  - Receive grid average load from Ck
  If (|Ljk − Avrg| / Ljk > T) then
    imbalance state
  else
    return
  end If
  [Partition CE’s of Sjk into overloaded, underloaded and balanced]
  CES ← ∅; CER ← ∅; CEN ← ∅
  For Every CEijk of Sjk do
    Switch
      - Lijk > Avrg + T:
        CES ← CES ∪ {CEijk}
      - Lijk < Avrg:
        CER ← CER ∪ {CEijk}
      - Avrg ≤ Lijk ≤ Avrg + T:
        CEN ← CEN ∪ {CEijk}
    end Switch
  end For
  While (CES ≠ ∅ AND CER ≠ ∅) do
    - Sort CES by descending order of Lijk
    - Sort CER by ascending order of Lijk
    - CEsjk ← most overloaded CE
    - CErjk ← most underloaded CE
    - Load offered by CErjk = Avrg − Lrjk
    - Tasks migration stage from CEsjk to CErjk
    - Update current workload of CEsjk, CErjk
    - Update sets CES, CER and CEN
  done
  If (card(CES) ≥ sit-max) then
    - Intra-site load balancing fails
    - Call intra-cluster load balancing algorithm
  else
    load balancing succeeds
  end If
END.
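The partition and migration loop of the intra-site algorithm can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: load is treated as a divisible quantity instead of discrete tasks, the amount migrated in each step is assumed to be the smaller of the load offered by the receiver and the excess of the source over Avrg, and the names `is_imbalanced` and `intra_site_balance` are hypothetical. The paper uses the same symbol T in the relative trigger test and in the additive threshold Avrg + T; the sketch keeps them apart as `t_percent` (a percentage) and `tol` (an absolute margin), which is also an assumption.

```python
from typing import Dict, List, Tuple


def is_imbalanced(site_load: float, avrg: float, t_percent: float) -> bool:
    """Trigger test |Ljk - Avrg| / Ljk > T, with T given as a percentage."""
    return abs(site_load - avrg) / site_load > t_percent / 100.0


def intra_site_balance(
    loads: Dict[str, float], avrg: float, tol: float, sit_max: int
) -> Tuple[Dict[str, float], List[Tuple[str, str, float]], bool]:
    """Return (new loads, list of (source, receiver, amount), success flag)."""
    loads = dict(loads)  # work on a copy, leave the caller's data untouched
    migrations: List[Tuple[str, str, float]] = []
    while True:
        ces = [ce for ce, l in loads.items() if l > avrg + tol]  # overloaded
        cer = [ce for ce, l in loads.items() if l < avrg]        # underloaded
        if not ces or not cer:
            break
        src = max(ces, key=lambda ce: loads[ce])  # most overloaded CE
        dst = min(cer, key=lambda ce: loads[ce])  # most underloaded CE
        offered = avrg - loads[dst]               # load the receiver can absorb
        excess = loads[src] - avrg                # load the source should shed
        if offered <= excess:
            amount = offered
            loads[dst] = avrg      # receiver reaches the average exactly
            loads[src] -= amount
        else:
            amount = excess
            loads[src] = avrg      # source reaches the average exactly
            loads[dst] += amount
        migrations.append((src, dst, amount))
    still_overloaded = sum(1 for l in loads.values() if l > avrg + tol)
    # Failure (escalation to the cluster level) when card(CES) >= sit-max.
    return loads, migrations, still_overloaded < sit_max
```

Each iteration brings either the source or the receiver exactly to Avrg, so at least one element leaves CES or CER per step and the loop terminates after at most as many iterations as there are computing elements in the site.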
3) Intra-cluster load balancing algorithm: This algorithm
is executed only when some CE’s managers fail to balance
the overload of their CE’s locally. Knowing the global state
of each site, the sites manager can evenly distribute the global
overload among its sites.
Case of a cluster Ck
BEGIN
  For Every site Sjk of Ck AND Periodically do
    Update current workload Ljk and send it to sites manager Ck
  end For
  - Update actual load Lk of Ck
  - Send it to grid manager
  - Receive grid average load from grid manager
  If (Overloaded sites number ≥ sit-max) then
    Ck is in imbalance state
  else
    Return
  end If
  [Partition sites of Ck into overloaded, underloaded and balanced sites]
  SS ← ∅; SR ← ∅; SN ← ∅
  For Every Sjk of Ck do
    Switch
      - Ljk > Avrg + T:
        SS ← SS ∪ {Sjk}
      - Ljk < Avrg:
        SR ← SR ∪ {Sjk}
      - Avrg ≤ Ljk ≤ Avrg + T:
        SN ← SN ∪ {Sjk}
    end Switch
  end For
  While (SS ≠ ∅ AND SR ≠ ∅) do
    - Sort SS by descending order of workload Ljk
    - Sort SR by ascending order of Ljk
    - Slk ← most overloaded site
    - Srk ← most underloaded site
    - Load offered by Srk = Avrg − Lrk
    - Tasks migration decision from Slk to Srk
    - Start intra-site load balancing algorithm
  done
  If (card(SS) ≥ clu-max) then
    - Intra-cluster load balancing fails
    - Call intra-grid load balancing algorithm
  else
    load balancing succeeds
  end If
END.
4) Intra-grid load balancing algorithm: This third algorithm
performs a global load balancing among all clusters of
a grid. It is executed only if the other two levels have failed to
achieve a complete load balance.
It is worth noting that, at this level, the algorithm
always succeeds in load balancing the clusters. It is thus
unnecessary to test whether some clusters are still imbalanced.
Case of the global grid
BEGIN
  For Every Ck AND Periodically do
    - Update current workload Lk
    - Send it to grid manager
  end For
  - Update global grid workload
  - Compute grid average load
  - Send it to all clusters
  If (overloaded clusters number ≥ clu-max) then
    Grid is imbalanced
  else
    Return
  end If
  [Partition the grid into overloaded, underloaded and balanced clusters]
  CS ← ∅; CR ← ∅; CN ← ∅
  For Every Ck do
    Switch
      - Lk > Avrg + T: CS ← CS ∪ {Ck}
      - Lk < Avrg: CR ← CR ∪ {Ck}
      - Avrg ≤ Lk ≤ Avrg + T: CN ← CN ∪ {Ck}
    end Switch
  end For
  While (CS ≠ ∅ AND CR ≠ ∅) do
    - Sort CS in decreasing order of workload Lk
    - Sort CR in increasing order of Lk
    - Cs ← most overloaded cluster
    - Cr ← most underloaded cluster
    - Tasks migration stage from Cs to Cr
    - Start intra-cluster load balancing algorithm
  done
END.
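The trigger and partition steps at grid level can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the grid average is taken as the plain mean of the cluster loads, T is treated as an absolute margin `tol`, and overloaded and underloaded clusters are paired in a single pass after sorting, whereas the pseudocode re-sorts the sets after every migration. The function name `intra_grid_plan` is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def intra_grid_plan(
    cluster_loads: Dict[str, float], tol: float, clu_max: int
) -> Tuple[float, List[Tuple[str, str]]]:
    """Return the grid average load and a list of (source, receiver) clusters."""
    avrg = mean(cluster_loads.values())  # assumed: unweighted average load
    # CS: overloaded clusters, most loaded first.
    cs = sorted(
        (c for c, l in cluster_loads.items() if l > avrg + tol),
        key=lambda c: cluster_loads[c],
        reverse=True,
    )
    # CR: underloaded clusters, least loaded first.
    cr = sorted(
        (c for c, l in cluster_loads.items() if l < avrg),
        key=lambda c: cluster_loads[c],
    )
    if len(cs) < clu_max:
        return avrg, []  # too few overloaded clusters: no grid-level action
    # Single-pass pairing: most overloaded with most underloaded, and so on.
    return avrg, list(zip(cs, cr))
```

For each returned pair, the intra-cluster algorithm would then be started inside the two clusters concerned, as in the last step of the While loop above.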
V. EXPERIMENTAL STUDY
A. Modelling parameters
In order to evaluate the practicability and the performance of
our model, we have developed a grid simulator. This simulator
was built in Java and uses the following parameters:
1) CE’s parameters: these parameters give information
about the CE’s available during the load balancing period, such
as: (i) number of sites; (ii) number of CE’s in each site;
(iii) CE’s speeds; (iv) date at which CE’s send their workload
information; and (v) tolerance factor.
2) Tasks parameters: these parameters include: (i) number
of tasks queued at every CE; (ii) task submission date;
(iii) number of instructions per task; (iv) task size; and
(v) priority.
3) Network parameter: bandwidth.
4) Workload index: as the workload of a Computing Element,
we use its occupation ratio, workload = inst / speed,
where inst denotes the total number of instructions
queued on a given CE and speed is its speed.
5) Performance parameters: in our experiments, we
focused on two performance parameters: average task
response time and communication cost.
B. Experimental results
All the experiments were performed on a Pentium IV PC at
2.8 GHz with 256 MB of RAM, running Windows
XP. In order to obtain reliable results, we repeated each
experiment more than ten (10) times.
In the sequel, we give the experimental results relating
to the response time according to the number of tasks and
according to the number of computing elements. Tables I and II
show the variation of the average
response time before and after execution of the intra-site load
balancing algorithm.

TABLE I
RESPONSE TIME VS NUMBER OF CE’S (NUMBER OF TASKS = 2000)
#CE’s    Before (sec)   After (sec)   Gain (%)   Cost (sec)
50       817            730           10.65      41
100      409            353           13.69      50
150      273            230           15.75      49
200      126            99            21.43      47
250      164            139           15.24      49

TABLE II
RESPONSE TIME VS NUMBER OF TASKS (NUMBER OF CE’S = 250)
#Tasks   Before (sec)   After (sec)   Gain (%)   Cost (sec)
1000     113            95            15.93      22
1250     134            114           14.93      33
1500     145            122           15.86      37
1750     156            132           15.38      47
2000     164            139           15.24      49

In Tables I and II, Before and After represent the mean
response time before and after load balancing is performed, and
Cost is the communication cost expressed in seconds.
From these tables, we observe that our strategy leads to a good
load balancing:
1) For a number of tasks fixed at 2000 and a number
of CE’s varying from 50 to 250 in steps of 50, we
obtain a gain varying between 10.65% and 21.43%, with
negligible cost of communication.
2) For a number of CE’s equal to 250 and a number
of tasks varying from 1000 to 2000 in steps of 250, the
gain varies from 14.93% to 15.93%.
3) During our experiments, we observed that the best
gains are obtained when the grid is in a stable state
(neither overloaded nor completely idle).
Besides the decrease in the average response times, we notice
that the communication time remains stable and, therefore, that
the neighborhood strategy induces a small number of messages.
Figure 3 illustrates the variation of the communication time according
to the number of tasks and according to the number of
computing elements.
[Figure: communication time plotted against the number of CE’s and the number of tasks]
Fig. 3. Communication time vs number of CE’s and tasks
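The workload index defined in Section V-A and the gain figures reported in the tables can be reproduced with the short Python sketch below. The workload formula is the one given in the text; the gain formula is not stated in the paper and is an assumption, namely the relative reduction of the mean response time, which matches the Gain (%) values listed in Tables I and II. The function names are hypothetical.

```python
def workload(inst: float, speed: float) -> float:
    """Occupation ratio of a CE: queued instructions divided by CE speed."""
    return inst / speed


def gain_percent(before: float, after: float) -> float:
    """Assumed gain formula: relative reduction of response time, in percent."""
    return 100.0 * (before - after) / before


# Response times (before, after) in seconds, taken from Tables I and II.
table_rows = [(817, 730), (126, 99), (113, 95), (164, 139)]
gains = [round(gain_percent(b, a), 2) for b, a in table_rows]
```

With this definition, `gains` evaluates to the reported values 10.65, 21.43, 15.93 and 15.24 for the four rows above.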
VI. CONCLUSION AND FUTURE WORK
In this paper, we addressed the problem of load balancing
in grid computing. We proposed a load balancing strategy
based on a tree model representation of a grid architecture. The
model allows the transformation of any grid architecture into
a unique tree with at most four levels. From this generic tree,
we can derive three sub-models depending on the elements that
compose a grid: one site, one cluster, or in the general case
multiple clusters. Using this model, we defined a hierarchical
load balancing strategy that gives priority to local load
balancing within sites. The proposed strategy leads to a layered
algorithm, a prototype of which was implemented and evaluated
on a grid simulator developed for this purpose. The first
results of our experiments show that the proposed model
can lead to a better load balancing between the CE’s of a grid
without high overhead. We observed a significant
improvement in mean response time together with a reduction
of the communication cost between clusters.
The model presented in this paper raises a number of challenges
for further research. First, we plan to test our model
on other grid simulators [8]. Second, we plan to evaluate
our model on real grid environments like Globus [6] and
XtremWeb [12], using a realistic grid application, in order
to validate the practicality of the model.
REFERENCES
[1] E. Deelman, A. Chervenak, et al. High performance remote access to
climate simulation data: a challenge problem for data grid technologies.
In Proc. of 22th Parallel Computing, volume 29(10), pages 13–35,
1997.
[2] E. Badidi. Architecture and services for load balancing in object
distributed systems. PhD thesis, Faculty of High Studies, University
of Montreal, May 2000.
[3] F. Berman, G. Fox, and T. Hey. Grid Computing: Making the Global
Infrastructure a Reality. Wiley Series in Communications Networking
& Distributed Systems, 2003.
[4] T.L. Casavant and J.G. Kuhl. A taxonomy of scheduling in general
purpose distributed computing systems. IEEE Transactions on Software
Engineering, 14(2):141–154, 1988.
[5] D.L. Eager, E.D. Lazowska, and J. Zahorjan. Adaptive load sharing in
homogeneous distributed systems. IEEE Trans. on Soft. Eng.,
12(5):662–675, 1986.
[6] I. Foster and C. Kesselman. Globus: a metacomputing infrastructure
toolkit. Int. Jour. of Super-Computer and High Performance Computing
Applications, 11(2):115–128, 1997.
[7] I. Foster and C. Kesselman. The Grid: Blueprint for a New Computing
Infrastructure. Morgan Kaufmann, 1998.
[8] GridSim. A grid simulation toolkit for resource modelling
and application scheduling for parallel and distributed computing.
www.buyya.com/gridsim/.
[9] M. Houle, A. Symvonis, and D. Wood. Dimension-exchange algorithms
for load balancing on trees. In Proc. of 9th Int. Colloquium on Structural
Information and Communication Complexity, pages 181–196, Andros,
Greece, June 2002.
[10] H.D. Karatza. Job scheduling in heterogeneous distributed systems.
Journal of Systems and Software, 56:203–212, 1994.
[11] W. Leinberger, G. Karypis, V. Kumar, and R. Biswas. Load balancing
across near-homogeneous multi-resource servers. In 9th Heterogeneous
Computing Workshop, pages 60–71, 2000.
[12] XtremWeb. A global computing experimental platform.
http://www.lri.fr/fedak/XtremWeb/introduction.php3.
[13] C.Z. Xu and F.C.M. Lau. Load Balancing in Parallel Computers: Theory
and Practice. Kluwer, Boston, MA, 1997.
[14] B. Yagoubi. Dynamic load balancing for beowulf clusters. In Proc.
of the 2005 International Arab Conference on Information Technology,
pages 394–401, Israa University, Jordan, December 6–8, 2005.
[15] M.J. Zaki, W. Li, and S. Parthasarathy. Customized dynamic load
balancing for a network of workstations. In Proc. of the 5th IEEE
Int. Symp. HPDC, pages 282–291, 1996.