Simulation of Dynamic Grid Replication Strategies in OptorSim
Abstract. Computational Grids normally deal with large computationally intensive problems on small data sets. In contrast, Data Grids mostly deal with large computational problems that in turn require evaluating and mining large amounts of data. Replication is regarded as one of the major optimisation techniques for providing fast data access.
Within this paper, several replication algorithms are studied. This is achieved using the Grid simulator OptorSim. OptorSim provides a modular framework within which optimisation strategies can be studied under different Grid configurations. The goal is to explore the stability and transient behaviour of selected optimisation techniques.
1 Introduction
Within the Grid community much work has been done on providing the basic infrastructure for a typical Grid environment. Globus [4], Condor [1] and recently the EU DataGrid [3] have contributed substantially to core Grid middleware services that are available as the basis for further application development. However, little effort has been made so far to optimise the use of Grid resources.
To use a Data Grid, users typically submit jobs. In order for a job to be executed, three types of resources are required: computing facilities, data access and storage, and network connectivity. The Grid must make scheduling decisions for each job based on the current state of these resources (workload and features of Computing Elements, location of data, network load). Complete optimisation is achieved when the combined resource impact of all jobs is minimised, allowing jobs to run as fast as possible.
File replication (i.e. the spread of multiple copies of files across the Grid) is an effective technique for reducing data access overhead. Maintaining an optimal distribution of replicas implies that the Grid optimisation service [7] must be able to modify the geographic location of data files. This is achieved by triggering both replication and deletion of data files. By reflecting the dynamic load on the Grid, such replica management will affect the migration of particular files toward sites that show an increased frequency of file-access requests.
In order to study the complex nature of a typical Grid environment and evaluate various replica optimisation algorithms, a Grid simulator (called OptorSim) was developed. In this paper the design concepts of OptorSim are discussed and preliminary results based on selected replication algorithms are reported.
The paper is structured as follows. Section 2 describes the design of the simulator OptorSim. Various replication algorithms are discussed in Section 3. After setting the simulation configuration in Section 4, Section 5 is dedicated to a description of simulation results. Section 6 highlights related work. Finally, Section 7 concludes the paper and reports on future work.
2 Simulation Design
OptorSim [2] is a simulation package written in Java. It was developed to study the behaviour of various replica optimisation algorithms within a typical Grid environment.
Fig. 1. The simulated Grid architecture: a Resource Broker, and sites containing Computing Elements, Storage Elements and a Replica Manager with its Replica Optimiser.
The simulation was constructed assuming that the Grid consists of several sites, each of which may provide computational and data-storage resources for submitted jobs. Each site consists of zero or more Computing Elements and zero or more Storage Elements. Computing Elements run jobs, which use the data in files stored on Storage Elements, and a Resource Broker controls the scheduling of jobs to Computing Elements. Sites without Storage or Computing Elements act as network nodes or routers.
The decision about data movement associated with jobs between sites is performed by a component called the Replica Manager. Within the Replica Manager the decision to create or delete replicas is controlled by a Replica Optimiser called Optor. At the heart of Optor is a replica optimisation algorithm, the properties of which are discussed in Section 3.
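The component structure described above can be summarised in a minimal class sketch, given here only as an illustration; the class and field names are hypothetical and do not correspond to the actual OptorSim classes.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the simulated Grid structure: sites hold zero or more
// Computing and Storage Elements plus a Replica Manager containing Optor.
class StorageElement {
    final long capacityGB;                          // storage space at this site (cf. Table 1)
    final List<String> storedPfns = new ArrayList<>();
    StorageElement(long capacityGB) { this.capacityGB = capacityGB; }
}

class ReplicaOptimiser { /* "Optor": decides on replica creation and deletion */ }

class ReplicaManager {                              // handles data movement for jobs at a site
    final ReplicaOptimiser optor = new ReplicaOptimiser();
}

class ComputingElement { /* runs at most one job at a time (see Section 2.2) */ }

class GridSite {
    final List<ComputingElement> computingElements = new ArrayList<>();
    final List<StorageElement> storageElements = new ArrayList<>();
    final ReplicaManager replicaManager = new ReplicaManager();

    // A site with neither Computing nor Storage Elements acts purely as a router.
    boolean isRouter() {
        return computingElements.isEmpty() && storageElements.isEmpty();
    }
}

public class GridModelSketch {
    public static void main(String[] args) {
        GridSite cern = new GridSite();
        cern.storageElements.add(new StorageElement(10_000)); // Storage Element only, no CE
        System.out.println("CERN acts as a router: " + cern.isRouter());
    }
}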
2.2 Internals
In the simulation each Computing Element is represented by a thread. Job submission to the Computing Elements is managed by another thread: the Resource Broker. The Resource Broker ensures every Computing Element continuously runs jobs by frequently attempting to distribute jobs to all the Computing Elements. When the Resource Broker finds an idle Computing Element, it selects a job to run on it according to the policy of the Computing Element, i.e. which type of jobs it will run and how often it will run each job.
At any time, a Computing Element will be running at most one job. As soon as the job finishes, another is assigned by the Resource Broker. So, although there is no explicit job scheduling algorithm, all Computing Elements process jobs for the duration of the simulation but are never overloaded. Currently, optimisation only occurs after a job has been scheduled to a Computing Element. The more complex scenario of optimising both job scheduling and data access will be part of future work.
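A minimal sketch of this threading model follows, assuming one thread per Computing Element and a Resource Broker thread that keeps them supplied with jobs; the names, the one-slot job queue and the placeholder job policy are illustrative, not the OptorSim implementation.

import java.util.List;
import java.util.Random;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the threading model: each Computing Element is a thread that runs
// one job at a time, and a Resource Broker thread keeps every element busy.
public class ThreadModelSketch {

    static class ComputingElementThread extends Thread {
        // Holds at most one pending job assigned by the broker.
        private final BlockingQueue<String> jobQueue = new ArrayBlockingQueue<>(1);

        void submit(String job) throws InterruptedException { jobQueue.put(job); }

        boolean isIdle() { return jobQueue.isEmpty(); }

        @Override public void run() {
            try {
                while (true) {
                    String job = jobQueue.take();   // wait for the broker to assign a job
                    Thread.sleep(100);              // stand-in for processing the job's file requests
                    System.out.println(getName() + " finished " + job);
                }
            } catch (InterruptedException e) { /* simulation over */ }
        }
    }

    static class ResourceBrokerThread extends Thread {
        private final List<ComputingElementThread> ces;
        private final Random rnd = new Random();
        ResourceBrokerThread(List<ComputingElementThread> ces) { this.ces = ces; }

        @Override public void run() {
            try {
                while (true) {
                    for (ComputingElementThread ce : ces) {
                        if (ce.isIdle()) {
                            // Select the next job according to the CE's policy
                            // (here: a random job type, as a placeholder).
                            ce.submit("jobType" + rnd.nextInt(6));
                        }
                    }
                    Thread.sleep(10);               // frequently attempt to distribute jobs
                }
            } catch (InterruptedException e) { /* simulation over */ }
        }
    }

    public static void main(String[] args) throws Exception {
        List<ComputingElementThread> ces =
                List.of(new ComputingElementThread(), new ComputingElementThread());
        ces.forEach(Thread::start);
        ResourceBrokerThread broker = new ResourceBrokerThread(ces);
        broker.start();
        Thread.sleep(1000);                         // run the toy simulation briefly
        broker.interrupt();
        ces.forEach(Thread::interrupt);
    }
}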
Each job has a set of files it may request. Two types of reference may be used for a file: a logical file name (LFN) and a physical file name (PFN). An LFN is an abstract reference to a file that is independent of both where the file is stored and how many replicas exist. A PFN refers to a specific replica of some LFN, located at a definite site. Each LFN will have one PFN for each replica in the Grid.
A job will typically request a set of LFNs for data access. The order in which those files are requested is determined by the access pattern. The following access patterns were considered: sequential (the set of LFNs is ordered, forming a list of successive requests), random (files are selected randomly from the set with a flat probability distribution), unitary random walk (the set is ordered and successive file requests are exactly one element away from the previous file request, direction is random) and Gaussian random walk (as with the unitary random walk, but files are selected from a Gaussian distribution centred on the previous file request).
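The four access patterns can be sketched as simple generators over the index of a job's ordered LFN set; the method names, the clamping at the ends of the set and the Gaussian width below are assumptions made purely for illustration.

import java.util.Random;

// Sketch of the four file-access patterns. Files are identified here by an
// index into the job's ordered set of LFNs.
public class AccessPatternSketch {
    private static final Random RND = new Random();

    // Sequential: the i-th request is simply the i-th LFN in the ordered set.
    static int sequential(int requestNumber, int setSize) {
        return requestNumber % setSize;
    }

    // Random: each file is chosen with a flat probability distribution.
    static int flatRandom(int setSize) {
        return RND.nextInt(setSize);
    }

    // Unitary random walk: exactly one element away from the previous request,
    // in a random direction (clamped to the ends of the set).
    static int unitaryRandomWalk(int previous, int setSize) {
        int next = previous + (RND.nextBoolean() ? 1 : -1);
        return Math.max(0, Math.min(setSize - 1, next));
    }

    // Gaussian random walk: the next index is drawn from a Gaussian distribution
    // centred on the previous request (width chosen arbitrarily here).
    static int gaussianRandomWalk(int previous, int setSize, double sigma) {
        int next = (int) Math.round(previous + sigma * RND.nextGaussian());
        return Math.max(0, Math.min(setSize - 1, next));
    }

    public static void main(String[] args) {
        int setSize = 20, current = setSize / 2;
        for (int i = 0; i < 5; i++) {
            current = gaussianRandomWalk(current, setSize, 2.0);
            System.out.println("Gaussian walk request: LFN index " + current);
        }
    }
}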
When a file is required by a job, the file's LFN is used to locate the best replica via the Replica Optimiser function getBestFile(LFN, destinationStorageElement), where destinationStorageElement is the Storage Element to which the replica may be copied. It is assumed that the Computing Element on which the job is running and the requested Storage Element are located at the same site.
getBestFile() checks the Replica Catalogue for copies of the file. The Replica Catalogue is a Grid middleware service currently implemented within the simulation as a table of LFNs and all corresponding PFNs. By examining the available bandwidth between destinationStorageElement and all sites on which a replica of the file is stored, getBestFile() can choose the PFN that will be accessed fastest and hence decrease the job running time.
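The selection step can be sketched as follows, with the Replica Catalogue and the bandwidth lookup stubbed by hard-coded tables; all names and values are placeholders rather than the actual OptorSim code.

import java.util.List;
import java.util.Map;

// Sketch of replica selection inside getBestFile(): look up all PFNs for the
// LFN in the Replica Catalogue and pick the one reachable with the highest
// available bandwidth.
public class ReplicaSelectionSketch {

    // Replica Catalogue: a table of LFNs and all corresponding PFNs (stubbed).
    static final Map<String, List<String>> REPLICA_CATALOGUE = Map.of(
            "lfn:run42.dat", List.of("pfn://cern/run42.dat", "pfn://ral/run42.dat"));

    // Available bandwidth (Mbit/s) from the site holding each PFN to the
    // destination Storage Element's site (placeholder values).
    static final Map<String, Double> AVAILABLE_BANDWIDTH = Map.of(
            "pfn://cern/run42.dat", 155.0,
            "pfn://ral/run42.dat", 622.0);

    static String getBestFile(String lfn, String destinationStorageElement) {
        String best = null;
        double bestBandwidth = -1.0;
        for (String pfn : REPLICA_CATALOGUE.getOrDefault(lfn, List.of())) {
            double bw = AVAILABLE_BANDWIDTH.getOrDefault(pfn, 0.0);
            if (bw > bestBandwidth) {       // fastest access => shortest transfer time
                bestBandwidth = bw;
                best = pfn;
            }
        }
        return best;                        // PFN of the replica that can be accessed fastest
    }

    public static void main(String[] args) {
        System.out.println(getBestFile("lfn:run42.dat", "se://glasgow"));
    }
}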
The simulated version of getBestFile() partially fulfils the functionality described in [7]. It is a blocking call that may cause replication to a Storage Element located in the site where the job is running. After any replication has completed, the PFN of the best available replica is returned to the job. If replication has not occurred, the best replica is located on a remote site and is accessed by the job using remote I/O.
Both the replication time (if replication occurs) and the file access time (if the file is read from a remote site) are dependent on the network characteristics over the duration of the connection. At any time, the bandwidth available to a transfer is limited by the lowest bandwidth along the transfer path. For transfers utilising a common network element, the bandwidth of that element is shared so that each transfer receives an equal share.
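This model reduces to a bottleneck calculation with equal sharing on each network element, as in the sketch below; the Link class and the example values are illustrative assumptions.

// Sketch of the bandwidth model: the rate available to a transfer is the
// minimum bandwidth along its path, and a network element used by several
// concurrent transfers gives each an equal share.
public class BandwidthModelSketch {

    static class Link {
        final double bandwidthMbit;   // nominal bandwidth of this network element
        final int concurrentTransfers; // transfers currently using this element
        Link(double bandwidthMbit, int concurrentTransfers) {
            this.bandwidthMbit = bandwidthMbit;
            this.concurrentTransfers = concurrentTransfers;
        }
        double shareMbit() {          // equal share for each transfer on this element
            return bandwidthMbit / Math.max(1, concurrentTransfers);
        }
    }

    // The transfer is limited by the lowest (shared) bandwidth along its path.
    static double availableBandwidth(Link[] path) {
        double min = Double.MAX_VALUE;
        for (Link link : path) {
            min = Math.min(min, link.shareMbit());
        }
        return min;
    }

    public static void main(String[] args) {
        // e.g. a 10 Gbit/s backbone shared by 4 transfers and a 155 Mbit/s access link
        Link[] path = { new Link(10_000, 4), new Link(155, 1) };
        System.out.println("Effective bandwidth: " + availableBandwidth(path) + " Mbit/s");
    }
}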
3 Optimisation Algorithms
Replica optimisation algorithms are the core of the Replica Optimiser. Over the duration of a submitted job, PFNs for each LFN are requested by calling getBestFile(). Optimisation algorithms implement getBestFile() so that it may copy the requested file from the remote site to a Storage Element on the same site as the requesting Computing Element. If all Storage Elements on this site are full then a file must be deleted for the replication to succeed.
The strategy used to decide which file should be deleted differentiates the optimisation algorithms. In the following, we briefly present three simple algorithms (a sketch of their deletion rules is given below) and a more sophisticated one in greater detail. These algorithms have been implemented in OptorSim.
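Two of the simple algorithms evaluated in Section 5 replicate unconditionally and differ only in the file they delete when local storage is full: the oldest file or the least accessed file. A minimal sketch of these two deletion rules follows; FileRecord and its fields are hypothetical, and "oldest" is interpreted here as the earliest creation time.

import java.util.Comparator;
import java.util.List;

// Sketch of the two simple deletion rules ("oldest file deleted" and "least
// accessed file deleted"). FileRecord is an illustrative name, not OptorSim code.
public class DeletionStrategySketch {

    static class FileRecord {
        final String pfn;
        final long creationTime;    // when this replica was created on the Storage Element
        final long accessCount;     // how often this replica has been read
        FileRecord(String pfn, long creationTime, long accessCount) {
            this.pfn = pfn;
            this.creationTime = creationTime;
            this.accessCount = accessCount;
        }
    }

    // "Unconditional replication, oldest file deleted"
    static FileRecord oldestFile(List<FileRecord> storedFiles) {
        return storedFiles.stream()
                .min(Comparator.comparingLong((FileRecord f) -> f.creationTime))
                .orElseThrow();
    }

    // "Unconditional replication, least accessed file deleted"
    static FileRecord leastAccessedFile(List<FileRecord> storedFiles) {
        return storedFiles.stream()
                .min(Comparator.comparingLong((FileRecord f) -> f.accessCount))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<FileRecord> storageElement = List.of(
                new FileRecord("pfn://local/a.dat", 1_000, 12),
                new FileRecord("pfn://local/b.dat", 5_000, 3));
        System.out.println("Delete (oldest): " + oldestFile(storageElement).pfn);
        System.out.println("Delete (least accessed): " + leastAccessedFile(storageElement).pfn);
    }
}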
Replication Decision. Within our economic model the Replica Optimiser decides whether a requested file f should be replicated to the local Storage Element. If the value of the least valuable file currently stored exceeds that of f then no replication occurs. Otherwise, the least valuable file is selected for deletion and a new replica of f is created on the Storage Element. If multiple files on the Storage Element share the minimum value, the file having the earliest last access time is deleted.
The evaluation function E(f, r, n) is defined by the equation

E(f, r, n) = \sum_{i=1}^{n} p_i(f),    (1)

where the probability p_i(f) that file f is requested at the i-th of the next n requests is given by

p_i(f) = \binom{2iS}{id(f) - s + iS} \frac{1}{2^{2iS}}  if |id(f) - s| \le iS,  and  p_i(f) = 0  otherwise,    (2)

where s is the mean value of the binomial distribution, S is the maximum value for the step, and id(f) is a unique file identifier (for instance, the LFN). Then, the most probable number of times file f will be requested during the next n requests is given by (1).
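Assuming the binomial form of (2) reconstructed above, the evaluation function can be computed directly, as in the sketch below; the method names and example parameter values are illustrative only.

// Sketch of the evaluation function E(f, r, n) as reconstructed in (1) and (2):
// the predicted number of requests for file f over the next n requests, for a
// bounded random walk in file-identifier space centred on the mean value s with
// maximum step S. The binomial form of (2) is an assumed reconstruction, not
// the definitive OptorSim formula.
public class EvaluationFunctionSketch {

    // Binomial coefficient C(n, k) in floating point (adequate for a sketch).
    static double binomial(long n, long k) {
        if (k < 0 || k > n) return 0.0;
        double result = 1.0;
        for (long i = 1; i <= k; i++) {
            result *= (double) (n - k + i) / i;
        }
        return result;
    }

    // Equation (2): probability that the i-th future request hits identifier id.
    static double pI(long id, int i, long s, int S) {
        long halfWidth = (long) i * S;              // the walk moves at most iS identifiers
        if (Math.abs(id - s) > halfWidth) return 0.0;
        return binomial(2 * halfWidth, id - s + halfWidth) / Math.pow(2.0, 2 * halfWidth);
    }

    // Equation (1): E(f, r, n) = sum over the next n requests of p_i(f).
    static double evaluate(long id, int n, long s, int S) {
        double e = 0.0;
        for (int i = 1; i <= n; i++) {
            e += pI(id, i, s, S);
        }
        return e;
    }

    public static void main(String[] args) {
        // A file whose identifier is close to the mean value s scores highly.
        System.out.printf("E(id=101) = %.4f%n", evaluate(101, 10, 100, 2));
        System.out.printf("E(id=140) = %.4f%n", evaluate(140, 10, 100, 2));
    }
}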
A time interval t describes how far back the history goes and thus determines the number r of previous requests which are considered in the prediction function. We assume that the mean arrival rate of requests is constant. Once t has been decided, n is obtained by

n = r \frac{t'}{t}    (3)

where t' is the future interval for which we intend to do the prediction.
The value for S in (2) depends on the value of r. The mean value s is obtained from the recent values of the step in the random walk. In particular, s is calculated as the weighted average of the last r steps, where the weights decrease over past time.
[Footnote: We assume a mapping between file names and identifiers that preserves file content similarity.]
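The prediction parameters can then be obtained as in the following sketch, which applies (3) and computes s as a weighted average of the last r steps; the linear weighting scheme and the method names are assumptions made for illustration.

// Sketch of how the prediction window and the mean value are obtained: n scales
// the r observed requests to the future interval (equation (3)), and s is a
// weighted average of the last r steps with weights that decrease for older steps.
public class PredictionParametersSketch {

    // Equation (3): n = r * (t' / t), where t is the history interval covered by
    // the last r requests and t' is the future interval being predicted.
    static int futureRequests(int r, double historyInterval, double futureInterval) {
        return (int) Math.round(r * futureInterval / historyInterval);
    }

    // Mean value s: weighted average of the last r steps of the random walk,
    // with the most recent step weighted highest (linear weights assumed here).
    static double meanStep(double[] lastRSteps) {
        double weightedSum = 0.0, weightTotal = 0.0;
        for (int j = 0; j < lastRSteps.length; j++) {
            double weight = j + 1;              // oldest step gets weight 1, newest gets r
            weightedSum += weight * lastRSteps[j];
            weightTotal += weight;
        }
        return weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        int n = futureRequests(50, 3600.0, 7200.0);         // predict twice as far ahead as the history
        double s = meanStep(new double[] {1, -2, 3, 1, 2}); // steps ordered oldest -> newest
        System.out.println("n = " + n + ", s = " + s);
    }
}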
4 Simulation Configuration
The simulated Grid was based on the EU DataGrid TestBed 1 sites and the approximate network geometry shown in Figure 2. Within this model, each site was allocated storage resources proportional to its actual hardware allocation. Each TestBed site, excluding CERN, was assigned a Computing and a Storage Element. CERN was allocated a Storage Element to hold all of the master files but was not assigned a Computing Element. Routers, as previously stated, were described by creating a site without Computing or Storage Elements. The sizes of the Storage Elements for each TestBed site are given in Table 1.
Fig. 2. The EU DataGrid TestBed 1 sites and the approximate network geometry. The numbers indicate the bandwidth between two sites. [The diagram connects CERN, RAL, Imperial College, NIKHEF, Lyon, Padova, Milano, Bologna, Torino and Catania with links ranging from 10 Mbit/s to 10 Gbit/s.]
Table 1. A list of the resources allocated to the TestBed 1 sites, from which the results in this paper were generated.

Site Name          Storage Element (GBytes)
Bologna            30
Catania            30
CERN               10000
Imperial College   80
Lyon               50
Milano             50
NIKHEF             70
NorduGrid          63
Padova             50
RAL                50
Torino             50
Each job type accessed a distinct data sample, so there was no overlap between the sets of files each job accessed. The total size of the files accessed by each job type was estimated in [12] and is summarised in Table 2. Each set of files was assumed to be composed of 10 GByte files.
There will be some distribution of the jobs each site performs. In the simulation, we modelled this distribution such that each site ran an equal number of jobs of each type except for a preferred job type, which ran twice as often. This job type was chosen for each site based on storage considerations; for the replication algorithms to be effective, the local storage on each site had to be able to hold all the files for the preferred job type.
Table 2. The total size of the data sample accessed by each job type.

Data Sample            Total Size (GBytes)
Central J/ψ            1200
High pt leptons        200
Inclusive electrons    5000
Inclusive muons        1400
High Et photons        5800
Z0 bb                  600
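The per-site job distribution described above (an equal number of jobs of each type, with the preferred type run twice as often) can be sketched as a weighted selection; the job type names follow Table 2, while the class and method names are illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of the per-site job distribution: every job type is run equally often,
// except the site's preferred job type, which runs twice as often.
public class JobDistributionSketch {

    static final List<String> JOB_TYPES = List.of(
            "Central J/psi", "High pt leptons", "Inclusive electrons",
            "Inclusive muons", "High Et photons", "Z0 bb");

    private final List<String> weightedChoices = new ArrayList<>();
    private final Random rnd = new Random();

    JobDistributionSketch(String preferredJobType) {
        for (String type : JOB_TYPES) {
            weightedChoices.add(type);                  // every type appears once...
            if (type.equals(preferredJobType)) {
                weightedChoices.add(type);              // ...and the preferred type twice
            }
        }
    }

    String nextJobType() {
        return weightedChoices.get(rnd.nextInt(weightedChoices.size()));
    }

    public static void main(String[] args) {
        // e.g. a site whose local storage can hold the "High pt leptons" sample (200 GB)
        JobDistributionSketch site = new JobDistributionSketch("High pt leptons");
        for (int i = 0; i < 5; i++) {
            System.out.println("Submit: " + site.nextJobType());
        }
    }
}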
5 Results
Fig. 3. A histogram of job duration (left) and the progression of job duration over the course of the simulation (right).
The left histogram in Figure 3 shows a typical spread of job duration for a single job type at a selected Computing Element over the course of a simulation run. The large spike near zero is due to the job requesting files that are available on the local site, hence no time-consuming file transfers need to take place. The longer durations are due to the job requesting some files not present at the local site. The spread is due to the network load, which can vary over time, affecting the file transfer times.
The variation of job duration over the simulation is shown in the right histogram in Figure 3 for the same job type and Computing Element as above. There is clearly a large variation in the job duration due to the factors already mentioned, but the general trend is for jobs to be executed more quickly over time, indicating the movement toward a more optimal replica configuration.
Fig. 4. Integrated running times for 10000 jobs using each access pattern and replica optimisation algorithm.
Further tests were conducted simulating 10000 jobs using each of the four algorithms:
1. No replication
2. Unconditional replication, oldest file deleted
3. Unconditional replication, least accessed file deleted
4. Economic Model
6 Related Work
Recently there has been great interest in modelling Data Grid environments. A simulator for modelling complex data access patterns of concurrent users in a distributed system is found in [13]. These studies were mainly conducted within the setting of scientific experiments such as the LHC, which finally resulted in the creation of the EU DataGrid project [3].
MicroGrid [18] is a simulation tool for designing and evaluating Grid middleware, applications and network services for the computational Grid. Currently, this simulator does not take data management issues into consideration. Further Grid simulators are presented in [11, 6].
In [15] an approach is proposed for automatically creating replicas in a typical decentralised Peer-to-Peer network. The goal is to create a certain number of replicas on a given site in order to guarantee some minimal availability requirements.
In Nimrod-G [8, 5] an economic model for job scheduling is introduced in which "Grid credits" are assigned to users in proportion to their level of priority. In this model, optimisation is achieved at the scheduling stage of a job. However, our approach differs by including both optimal replica selection and automated replica creation in addition to scheduling-stage optimisation.
Various replication and caching strategies within a simulated Grid environment are discussed in [16] and their combination with scheduling algorithms is studied in [17]. The replication algorithms proposed there are based on the assumption that popular files in one site are also popular in other sites. Replication from one site to another is triggered when the popularity of a file exceeds a threshold, and the destination site is chosen either randomly or by selecting the least loaded site. We take a complementary approach. Our replication algorithms are used by Grid sites when they need data locally and are based on the assumption that in computational Grids there are areas (so-called "data hot-spots") where particular sets of data are highly requested. Our algorithms have been designed to move data files toward these "data hot-spots".
Acknowledgements
The authors thank Erwin Laure, Heinz Stockinger and Ekow Otoo for valuable discussions during the preparation of this paper.
William Bell and David Cameron thank PPARC for funding as part of the GridPP (EDG) project and as an e-Science student respectively; Paul Millar thanks SHEFC for funding under the ScotGRID project.
References
1. The Condor Project. http://www.cs.wisc.edu/condor/.
2. OptorSim - A Replica Optimiser Simulation. http://grid-data-management.web.cern.ch/grid-data-management/optimisation/optor/.
3. The DataGrid Project. http://www.eu-datagrid.org.
4. The Globus Project. http://www.globus.org.
5. D. Abramson, R. Buyya, and J. Giddy. A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Broker. In Future Generation Computer Systems, to appear.
6. K. Aida, A. Takefusa, H. Nakada, S. Matsuoka, S. Sekiguchi, and U. Nagashima. Performance Evaluation Model for Scheduling in a Global Computing System. International Journal of High Performance Applications, 14(3), 2000.
7. W. H. Bell, D. G. Cameron, L. Capozza, P. Millar, K. Stockinger, and F. Zini. Design of a Query Optimisation Service. Technical report, CERN, 2002. WP2 - Data Management, EU DataGrid Project. http://edms.cern.ch/document/337977.
8. R. Buyya, H. Stockinger, J. Giddy, and D. Abramson. Economic Models for Management of Resources in Peer-to-Peer and Grid Computing. In Commercial Applications for High-Performance Computing, SPIE's International Symposium on the Convergence of Information Technologies and Communications (ITCom 2001), Denver, Colorado, USA, August 2001.
9. L. Capozza, K. Stockinger, and F. Zini. Preliminary Evaluation of Revenue Prediction Functions for Economically-Effective File Replication, June 2002.
10. M. Carman, F. Zini, L. Serafini, and K. Stockinger. Towards an Economy-Based Optimisation of File Access and Replication on a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at International Symposium on Cluster Computing and the Grid (CCGrid 2002), Berlin, Germany, May 2002. IEEE Computer Society Press. Also appears as IRST Technical Report 0112-04, Istituto Trentino di Cultura, December 2001.
11. H. Casanova, G. Obertelli, F. Berman, and R. Wolski. The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid. In Proc. of Super Computing 2002, Dallas, Texas, USA, November 2002.
12. B. T. Huffman et al. The CDF/D0 UK GridPP Project. CDF Internal Note 5858.
13. I. C. Legrand. Multi-Threaded, Discrete Event Simulation of Distributed Computing Systems. In Proc. of Computing in High Energy Physics (CHEP 2000), Padova, Italy, February 2000.
14. EU DataGrid Project. The DataGrid Architecture, 2001.
15. K. Ranganathan, A. Iamnitchi, and I. Foster. Improving Data Availability through Dynamic Model-Driven Replication in Large Peer-to-Peer Communities. In Global and Peer-to-Peer Computing on Large Scale Distributed Systems Workshop, Berlin, Germany, May 2002.
16. K. Ranganathan and I. Foster. Identifying Dynamic Replication Strategies for a High Performance Data Grid. In Proc. of the International Grid Computing Workshop, Denver, Colorado, USA, November 2001.
17. K. Ranganathan and I. Foster. Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications. In International Symposium of High Performance Distributed Computing, Edinburgh, Scotland, July 2002. To appear.
18. H. J. Song, X. Liu, D. Jakobsen, R. Bhagwan, X. Zhang, K. Taura, and A. Chien. The MicroGrid: a Scientific Tool for Modeling Computational Grids. Scientific Programming, 8(3):127-141, 2000.