SimGrid Tutorial
SimGrid for Research on Large-Scale Distributed Systems
Large-Scale Distributed Systems Research

Large-scale distributed systems are in production today
- Grid platforms for "e-Science" applications
- Peer-to-peer file sharing
- Distributed volunteer computing
- Distributed gaming

Researchers study a broad range of systems
- Data lookup and caching algorithms
- Application scheduling algorithms
- Resource management and resource sharing strategies

They want to study several aspects of their systems' performance
- Response time, throughput, scalability
- Robustness, fault-tolerance, fairness

Main question: comparing several solutions in relevant settings
Large-Scale Distributed Systems Science?

Requirements for a Scientific Approach
- Reproducible results: you can read a paper, reproduce a subset of its results, and improve on them
- Standard methodologies and tools: grad students can learn to use them and become operational quickly; experimental scenarios can be compared accurately
Agenda

Experiments for Large-Scale Distributed Systems Research
  Methodological Issues
  Main Methodological Approaches
    Real-world experiments
    Simulation
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models in SimGrid
  Analytic Models Underlying SimGrid
  Experimental Validation of the Simulation Models
  Platform Instantiation
    Platform Catalog
    Synthetic Topologies
Using SimGrid for Practical Grid Experiments
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
Conclusion
Analytical or Experimental?

Analytical works?
(+) Some purely mathematical models exist
    They allow a better understanding of principles in spite of dubious applicability:
    impossibility theorems, parameter influence, ...
(-) Theoretical results are difficult to achieve
    - Everyday practical issues (routing, scheduling) become NP-hard problems;
      most of the time, only heuristics whose performance has to be assessed are proposed
    - Models are too simplistic and rely on ultimately unrealistic assumptions
Running real-world experiments

(+) Eminently believable to demonstrate the applicability of the proposed approach
(-) Very time- and labor-consuming
    - The entire application must be functional; parameter sweeps and design alternatives multiply the work
(-) Choosing the right testbed is difficult
    - My own little testbed? Well-behaved, controlled, stable, but rarely representative of production platforms
    - Real production platforms? Not everyone has access to them; CS experiments are disruptive for users,
      and experimental settings may change drastically during an experiment
      (components fail; other users load resources; administrators change configurations)
(-) Results remain limited to the testbed
    - The impact of testbed specificities is hard to quantify ⇒ collections of testbeds...
    - Extrapolations and explorations of "what if" scenarios are difficult
      (what if the network were different? what if we had a different workload?)
(-) Experiments are uncontrolled and unrepeatable
    - No way to test alternatives back-to-back (even if disruption is part of the experiment)
Simulation

- Model: set of objects defined by a state ⊕ rules governing the state's evolution
- Simulator: program computing the evolution according to the rules
- Wanted features:
  - Accuracy: correspondence between simulation and the real world
  - Scalability: actually usable by computers (fast enough)
  - Tractability: actually usable by human beings (simple enough to understand)
  - Instantiability: can actually describe real settings (no magical parameters)
  - Relevance: captures the objects of interest
Simulation in Computer Science

Microprocessor design
- A few standard "cycle-accurate" simulators are used extensively
  http://www.cs.wisc.edu/~arch/www/tools.html
⇒ Possible to reproduce simulation results

Networking
- A few established "packet-level" simulators: NS-2, DaSSF, OMNeT++, GTNetS
- Well-known datasets for network topologies
- Well-known generators of synthetic topologies
- SSF standard: http://www.ssfnet.org/
⇒ Possible to reproduce simulation results
Simulation in Parallel and Distributed Computing
- Used for decades, but under drastic assumptions in most cases
Agenda

Experiments for Large-Scale Distributed Systems Research
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
    Possible designs
    Experimentation platforms: Grid'5000 and PlanetLab
    Emulators: ModelNet and MicroGrid
    Packet-level Simulators: ns-2, SSFNet and GTNetS
    Ad-hoc simulators: ChicagoSim, OptorSim, GridSim, ...
    Peer-to-peer simulators
    SimGrid
Resource Models in SimGrid
  Analytic Models Underlying SimGrid
  Experimental Validation of the Simulation Models
  Platform Instantiation
    Platform Catalog
    Synthetic Topologies
Using SimGrid for Practical Grid Experiments
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
Conclusion
Models of Large-Scale Distributed Systems

Model = set of objects defined by a state ⊕ set of rules governing the state evolution

Model objects:
- Evaluated application: does actions, stimuli to the platform
- Resources (network, CPU, disk): constitute the platform, react to stimuli
  - The application is blocked until its actions are done
  - Resources can sometimes "do actions" to represent external load

CPU
- Macroscopic: flows of operations in the CPU pipelines
- Microscopic: cycle-accurate simulation (fine-grain discrete-event simulation)
- Emulation: virtualization via another CPU / virtual machine

Applications
- Macroscopic: application = analytical "flow"
- Less macroscopic: set of abstract tasks with resource needs and dependencies
  - Coarse-grain discrete-event simulation
  - Application specification or pseudo-code API
- Virtualization: emulation of actual code, trapping application-generated events
Large-Scale Distributed Systems Simulation Tools
Grid'5000 (consortium – INRIA)
PlanetLab (consortium)

Open platform for developing, deploying, and accessing planetary-scale services
- Planetary scale: 852 nodes, 434 sites, >20 countries
ModelNet (UCSD/Duke)

Applications
- Emulation and virtualization: actual code executed on "virtualized" resources
- Key tradeoff: scalability versus accuracy

Resources: system calls are intercepted (gethostname, sockets)
MicroGrid (UCSD)

Applications
- Application supported by emulation and virtualization
- Actual application code is executed on "virtualized" resources
- Accounts for CPU and network
Packet-level simulators

ns-2: the most popular one
- Several protocols (TCP, UDP, ...), several queuing models (DropTail, RED, ...)
- Several application models (HTTP, FTP), wired and wireless networks
- Written in C++, configured using Tcl. Limited scalability (<1,000 nodes)
ChicagoSim, OptorSim, GridSim, ...

- Network simulators are not adapted; emulation solutions are too heavy
- PhD students just need a simulator to plug their algorithms into
  - Data placement/replication
  - Grid economy
⇒ Many simulators. Most are home-made and short-lived; some are released:

ChicSim: designed for the study of data replication (Data Grids), built on Parsec
  Ranganathan, Foster. Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications. HPDC'02.
OptorSim: developed for the European DataGrid
  DataGrid, CERN. OptorSim: Simulating Data Access Optimization Algorithms.
GridSim: focused on Grid economy
  Buyya et al. GridSim: A Toolkit for the Modeling and Simulation of Global Grids. CCPE'02.

⇒ Every [sub-]community seems to have its own simulator
The peer-to-peer community also has its own private collection of simulators: P2PSim, PlanetSim, and PeerSim.
SimGrid

History
- Created just like other home-made simulators (only a bit earlier ;)
- Original goal: scheduling research ⇒ need for speed (parameter sweeps)
- HPC community concerned by performance ⇒ accuracy not negligible

SimGrid in a Nutshell
- Simulation ≡ communicating processes performing computations
- Key feature: blend of mathematical simulation and coarse-grain discrete-event simulation
- Resources: defined by a rate (MFlop/s or Mb/s) + a latency
  - Also allows dynamic traces and failures
- Tasks can use multiple resources explicitly or implicitly
  - Transfers over multiple links; computations using disk and CPU
- Simple API to specify a heuristic or an application easily

Casanova, Legrand, Quinson. SimGrid: a Generic Framework for Large-Scale Distributed Experiments. EUROSIM'08.
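The rate + latency resource model above boils down to simple arithmetic. The sketch below is not SimGrid's actual API; the function and variable names are illustrative only:

```python
def task_time(amount, rate, latency=0.0):
    """Time for a task using a single resource alone: the resource is modeled
    macroscopically by a rate (MFlop/s for a CPU, Mb/s for a link) plus a
    latency, so the task finishes after latency + amount/rate."""
    return latency + amount / rate

# a 500 MFlop computation on a 100 MFlop/s host:
compute = task_time(500, 100)            # 5.0 s
# an 8 Mb message on a 10 Mb/s link with 10 ms latency:
transfer = task_time(8, 10, latency=0.01)
```

Contention between tasks sharing a resource is what the max-min sharing model presented later adds on top of this.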
Experimental tools comparison

Tool       CPU          Disk         Network      Application    Requirement    Settings      Scale
Grid'5000  direct       direct       direct       direct         access         fixed         <5,000
PlanetLab  virtualize   virtualize   virtualize   virtualize     none           uncontrolled  hundreds
ModelNet   -            -            emulation    emulation      much hardware  controlled    dozens
MicroGrid  emulation    -            fine d.e.    emulation      none           controlled    hundreds
ns-2       -            -            fine d.e.    coarse d.e.    C++ and Tcl    controlled    <1,000
SSFNet     -            -            fine d.e.    coarse d.e.    Java           controlled    <100,000
GTNetS     -            -            fine d.e.    coarse d.e.    C++            controlled    <177,000
ChicSim    coarse d.e.  -            coarse d.e.  coarse d.e.    C              controlled    few 1,000
OptorSim   coarse d.e.  amount       coarse d.e.  coarse d.e.    Java           controlled    few 1,000
GridSim    coarse d.e.  coarse d.e.  coarse d.e.  coarse d.e.    Java           controlled    few 1,000
P2PSim     -            -            -            state machine  C++            controlled    few 1,000
PlanetSim  -            -            cst. time    coarse d.e.    Java           controlled    100,000
PeerSim    -            -            -            state machine  Java           controlled    1,000,000
SimGrid    math/d.e.    (underway)   math/d.e.    d.e./emul.     C or Java      controlled    few 100,000
So which simulator should I use?

It really depends on your goals and resources
- Grid'5000 experiments are very good... if you have access and plenty of time
- PlanetLab does not enable reproducible experiments
- ModelNet, ns-2, SSFNet, GTNetS are meant for networking experiments (no CPU)
  - ModelNet requires some specific hardware setup
- MicroGrid simulations take a lot of time (although they can be parallelized)
- SimGrid's models have clear limitations (e.g., for short transfers)
  - SimGrid simulations are quite easy to set up (but a rewrite is needed)
  - SimGrid does not require that a full application be written
- Ad-hoc simulators are easy to set up, but their validity is still to be shown,
  i.e., the results obtained may be plainly wrong
  - Ad-hoc simulators are obviously not generic (difficult to adapt to your own needs)

The key trade-off seems to be accuracy vs. speed
- The more abstract the simulation, the faster it runs
- The less abstract the simulation, the more accurate it is

Does this trade-off really hold?
Simulation Validation
Simulation Validation: the FLASH example

FLASH project at Stanford
- Building large-scale shared-memory multiprocessors
- Went from conception, to design, to actual hardware (32 nodes)
- Used simulation heavily over 6 years

For FLASH, the simple simulator was all that was needed...

Gibson, Kunz, Ofelt, Heinrich. FLASH vs. (Simulated) FLASH: Closing the Simulation Loop. Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2000.
Conclusion
Agenda

Experiments for Large-Scale Distributed Systems Research
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models in SimGrid
  Analytic Models Underlying SimGrid
    Modeling a Single Resource
    Multi-hop Networks
    Resource Sharing
  Experimental Validation of the Simulation Models
  Platform Instantiation
    Platform Catalog
    Synthetic Topologies
Using SimGrid for Practical Grid Experiments
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
Conclusion
Analytic Models Underlying the SimGrid Framework

Main challenges for SimGrid's design
- Simulation accuracy:
  - Designed for the HPC scheduling community ⇒ don't mess with the makespan!
  - At the very least, understand the validity range
- Simulation speed:
  - Users conduct large parameter-sweep experiments over alternatives

Application to networks
- A constant-bandwidth model turns out to be "inaccurate" for TCP:
  B is not constant, but depends on the RTT, the packet loss ratio, the window size, etc.
- Several models have been proposed in the literature
Modeling TCP performance (single flow, single link)

Padhye, Firoiu, Towsley, Kurose. Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation. IEEE/ACM Transactions on Networking, Vol. 8, No. 2, 2000.

  B = min( Wmax / RTT ,  1 / ( RTT·√(2bp/3) + T0 · min(1, 3·√(3bp/8)) · p · (1 + 32p²) ) )

where p is the packet loss rate, b the number of packets acknowledged per ACK, Wmax the maximum congestion window, and T0 the initial retransmission timeout.

Model discussion
- Captures TCP congestion control (fast retransmit and timeout mechanisms)
- Assumes steady state (no slow start)
- Accuracy shown to be good over a wide range of values
- p and b are not known in general (model hard to instantiate)
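The formula above can be evaluated directly. This is a sketch: the function name and the default parameter values are illustrative, with throughput in packets per second:

```python
from math import sqrt

def pftk_throughput(p, rtt, wmax, b=2, t0=1.0):
    """Approximate steady-state TCP Reno throughput (packets/s) from the
    Padhye et al. model: p = loss rate, rtt = round-trip time (s),
    wmax = max congestion window (packets), b = packets acked per ACK,
    t0 = initial retransmission timeout (s)."""
    denom = rtt * sqrt(2 * b * p / 3) \
          + t0 * min(1.0, 3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    # throughput is also capped by the receiver window: wmax / rtt
    return min(wmax / rtt, 1 / denom)
```

As expected from the formula, the predicted bandwidth decreases as the loss rate p grows, and never exceeds Wmax/RTT.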
SimGrid model for single TCP flow, single link
Modeling Multi-hop Networks

[Figure: a source S sending a message over a three-link path l1, l2, l3]

Modeling Multi-hop Networks: Store & Forward

[Figure: the message is fully received on each link before being forwarded on the next one]

Modeling Multi-hop Networks: Wormhole

[Figure: the message is split into MTU-sized packets p_{i,j} that are pipelined along l1, l2, l3]
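The difference between the two schemes can be captured by back-of-the-envelope formulas. This is an idealized sketch assuming k identical links of latency L and bandwidth B and no contention; the function names are illustrative:

```python
def store_and_forward(S, k, L, B):
    # each hop waits for the whole message (size S) before forwarding it,
    # so the per-hop time L + S/B is paid k times
    return k * (L + S / B)

def wormhole(S, k, L, B, P):
    # the message is cut into S/P packets of size P that are pipelined:
    # the first packet crosses all k hops, the others stream behind it
    return k * (L + P / B) + (S / P - 1) * (P / B)

# 1000-unit message, 3 hops, unit bandwidth, zero latency, 10-unit packets
print(store_and_forward(1000, 3, 0.0, 1.0))   # 3000.0
print(wormhole(1000, 3, 0.0, 1.0, 10.0))      # 1020.0
```

Pipelining hides most of the per-hop retransmission cost, which is why wormhole-style models dominate among ad-hoc simulators.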
Macroscopic TCP modeling is a field

Notations
- L: set of links; F: set of flows; a flow f ∈ P(L) is the set of links it traverses
- C_l: capacity of link l (C_l > 0); λ_f: transfer rate of flow f
- n_l: number of flows using link l

Feasibility constraint
- Links deliver at most their capacity: ∀l ∈ L, Σ_{f∋l} λ_f ≤ C_l
Max-Min Fairness

Objective function: maximize min_{f∈F}(λ_f)
- Equilibrium is reached when increasing any λ_f would decrease some λ_{f'} with λ_{f'} < λ_f
- Very reasonable goal: gives a fair share to everyone
- Optionally, one can add priorities w_f for each flow f ⇒ maximize min_{f∈F}(w_f λ_f)

Bottleneck links
- For each flow f, one of its links l is the limiting one
  (with more capacity on that link l, the flow f would get a higher rate overall)
- The objective function implies that l is saturated and that f gets the biggest share on it:

  ∀f ∈ F, ∃l ∈ f :  Σ_{f'∋l} λ_{f'} = C_l  and  λ_f = max{λ_{f'}, f' ∋ l}
Implementation of Max-Min Fairness

Bucket-filling algorithm
- Set the bandwidth of all flows to 0
- Increase the bandwidth of every flow by ε. And again, and again, and again.
- When a link is saturated, all flows using it are limited (⇒ removed from the set)
- Loop until every flow has found a limiting link

Efficient algorithm
1. Search for the bottleneck link l such that C_l / n_l = min{ C_k / n_k , k ∈ L }
2. Set λ_f = C_l / n_l for every flow f ∋ l; update all C_k and n_k to remove these flows
3. Loop until all λ_f are fixed
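The efficient algorithm above can be written down directly. A sketch, where the data layout (a dict of link capacities, and one set of traversed links per flow) is an assumption of this example:

```python
def maxmin_fairness(capacities, flows):
    """Max-min fair rates. capacities: {link: C_l}; flows: list of sets of
    links each flow traverses. Returns the list of rates lambda_f."""
    C = dict(capacities)                                   # residual capacities
    n = {l: sum(1 for f in flows if l in f) for l in C}    # flows per link
    lam = [0.0] * len(flows)
    active = set(range(len(flows)))
    while active:
        links = {l for i in active for l in flows[i]}
        # 1. bottleneck: the link minimizing its fair share C_l / n_l
        l = min(links, key=lambda k: C[k] / n[k])
        share = C[l] / n[l]
        # 2. every flow through l is fixed at that share and removed
        for i in [i for i in active if l in flows[i]]:
            lam[i] = share
            active.remove(i)
            for k in flows[i]:
                C[k] -= share
                n[k] -= 1
    return lam                                             # 3. loop until done
```

On the examples of the following slides, this yields C/2 for all three flows of the homogeneous linear network, and λ_1 = 999, λ_2 = 1 on the backbone.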
Max-Min Fairness on a Homogeneous Linear Network

[Figure: flow 0 crosses both link 1 and link 2 (C_1 = C_2 = C); flow 1 uses only link 1, flow 2 only link 2]

- Initially n_1 = n_2 = 2, so link 1 is a bottleneck with share C/2: λ_0 = λ_1 = C/2
- Removing flows 0 and 1 leaves C_2 = C/2 and n_2 = 1, hence λ_2 = C/2

Result: λ_0 = λ_1 = λ_2 = C/2.
Max-Min Fairness on Backbone
[Figure: Flow 1 crosses links 1, 2 and 3; Flow 2 crosses links 0, 2 and 4]
Initially: C0 = 1, C1 = C2 = C3 = C4 = 1000; n0 = 1, n1 = 1, n2 = 2, n3 = 1, n4 = 1
I First bottleneck is link 0 (C0/n0 = 1): λ2 = 1 (then C2 = 999, C4 = 999)
I Second bottleneck is link 2 (C2/n2 = 999): λ1 = 999
Result: λ1 = 999, λ2 = 1
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 43/130
Side note: OptorSim 2.1 on Backbone
OptorSim (developed @CERN for DataGrid)
I One of the rare ad-hoc simulators not using wormhole
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 44/130
Proportional Fairness
I MaxMin gives more to long flows (resource-eager); TCP is known to do the opposite
I Objective function: maximize Σf∈F wf log(λf ) (instead of minf∈F wf λf for MaxMin)
I log favors short flows
Kelly, Charging and rate control for elastic traffic, in European Transactions on Telecommunications, vol. 8, 1997, pp. 33-37.
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 45/130
Implementing Proportional Fairness
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 46/130
Recent TCP implementation
More protocol refinement, more model complexity
I Every agent changes its window size according to its neighbors’ one
(selfish net-utility maximization)
I Computing a distributed gradient for Lagrange multipliers ; same updates
Low, S.H., A Duality Model of TCP and Queue Management Algorithms, IEEE/ACM
Transactions on Networking, 2003.
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 47/130
So, what is the model used in SimGrid?
“--cfg=network_model” command line argument
I CM02 ; MaxMin fairness
I Vegas ; Vegas TCP fairness (Lagrange approach)
I Reno ; Reno TCP fairness (Lagrange approach)
I By default in SimGrid v3.3: CM02
I Example: ./my_simulator --cfg=network_model:Vegas
Want more?
I network_model:gtnets ; use Georgia Tech Network Simulator for network
Accuracy of a packet-level network simulator without changing your code (!)
I Plug your own model in SimGrid!!
(usable as a scientific instrument in the TCP modeling field, too)
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 48/130
How are these models used in practice?
Simulation kernel main loop
Data: set of resources with working rate
1. Some actions get created (by application) and assigned to resources
2. Compute share of everyone (resource sharing algorithms)
3. Compute the earliest finishing action, advance simulated time to that time
4. Remove finished actions
5. Loop back to 2
[Figure: Gantt chart of the actions on each resource along the simulated time axis]
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 49/130
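The five steps above can be sketched as a tiny event loop. This is a toy single-resource version with equal sharing, using hypothetical types; the real kernel applies the sharing algorithms of the previous slides across many resources.

```c
#include <assert.h>

typedef struct {
  double remaining;   /* work left to do */
  double share;       /* current working rate */
  int alive;          /* still running? */
} action_t;

/* One resource of capacity 'cap', shared equally among alive actions.
   Returns the simulated date at which the last action finishes. */
static double simulate(action_t act[], int n, double cap) {
  double now = 0.0;
  for (;;) {
    /* 2. compute everyone's share */
    int alive = 0;
    for (int i = 0; i < n; i++) if (act[i].alive) alive++;
    if (!alive) return now;                  /* nothing left to simulate */
    for (int i = 0; i < n; i++)
      if (act[i].alive) act[i].share = cap / alive;
    /* 3. earliest finishing action; advance simulated time to that date */
    double dt = -1.0;
    for (int i = 0; i < n; i++)
      if (act[i].alive) {
        double t = act[i].remaining / act[i].share;
        if (dt < 0 || t < dt) dt = t;
      }
    now += dt;
    /* 4. remove finished actions (and update the others) */
    for (int i = 0; i < n; i++)
      if (act[i].alive) {
        act[i].remaining -= dt * act[i].share;
        if (act[i].remaining <= 1e-9) act[i].alive = 0;
      }
    /* 5. loop back to 2 */
  }
}
```

For example, two actions of 100 and 50 units on a resource of capacity 10 first share it 5/5; the second finishes at t=10, after which the first gets the full capacity and finishes at t=15.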
Adding Dynamic Availabilities to the Picture
Trace definition
I List of discrete events where the maximal availability changes
I t0 → 100%, t1 → 50%, t2 → 80%, etc.
[Figure: resource shares recomputed at each trace event along the simulated time axis]
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 50/130
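Such a trace can be queried with a simple lookup. This is a sketch with a hypothetical representation (a sorted array of date/availability events), not SimGrid's trace data structure; the kernel re-runs the sharing step at each such event.

```c
#include <assert.h>

typedef struct {
  double date;    /* simulated date of the event */
  double avail;   /* new maximal availability, in [0, 1] */
} trace_event_t;

/* Availability at simulated time t: the value set by the last event
   with date <= t (100% before the first event). */
static double availability_at(const trace_event_t ev[], int n, double t) {
  double a = 1.0;
  for (int i = 0; i < n && ev[i].date <= t; i++)
    a = ev[i].avail;
  return a;
}
```

With the trace of the slide (t0 → 100%, t1 → 50%, t2 → 80%), a query between t1 and t2 returns 0.5.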
Agenda
Experiments for Large-Scale Distributed Systems Research
Methodological Issues
Main Methodological Approaches
Tools for Experimentations in Large-Scale Distributed Systems
Resource Models in SimGrid
Analytic Models Underlying SimGrid
Experimental Validation of the Simulation Models
Single link
Dumbbell
Random platforms
Simulation speed
Platform Instantiation
Platform Catalog
Synthetic Topologies
Using SimGrid for Practical Grid Experiments
Overview of the SimGrid Components
SimDag: Comparing Scheduling Heuristics for DAGs
MSG: Comparing Heuristics for Concurrent Sequential Processes
GRAS: Developing and Debugging Real Applications
Conclusion
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 51/130
SimGrid Validation
Quantitative comparison of SimGrid with Packet-Level Simulators
I NS2: The Network Simulator
I SSFnet: Scalable Simulation Framework 2.0 (Dartmouth)
I GTNetS: Georgia Tech Network Simulator
Methodological limits
I Packet-level supposed accurate (comparison to real-world: future work)
I Max-Min only: other models were not part of SimGrid at that time
Challenges
I Which topology?
I Which parameters to consider? e.g. bandwidth, latency, size, all
I How to estimate performance? e.g. throughput, communication time
I How to estimate simulation response time slowdown?
I How to compute error? e.g. PerfSimGrid / PerfPacketLevel
Velho, Legrand, Accuracy Study and Improvement of Network Simulation in the SimGrid Framework, to appear in Second International Conference on Simulation Tools and Techniques, SIMUTools’09, Rome, Italy, March 2009.
(other publication by Velho and Legrand submitted to SimuTools’09)
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 52/130
SimGrid Validation
Experiments assumptions
I Topology: Single Link; Dumbbell; Random topologies (several)
I Parameters: data size, #flows, #nodes, link bandwidth and latency
I Performance: communication time and bandwidth estimation
I All TCP flows start at the same time
I All TCP flows are stopped when the first flow completes
I Bandwidth estimation is done based on communication remaining.
I Slowdown: Simulation time / Simulated time
Notations
I B, link nominal bandwidth ; L, link latency
I S, Amount of transmitted data
I Error: ε(TGTNetS , TSimGrid ) = log(TGTNetS ) − log(TSimGrid )
I Symmetrical for over and under estimations (thanks to logs)
I Average error: |ε| = (1/n) Σi |εi | ; Max error: |εmax | = maxi |εi |
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 53/130
Validation experiments on a single link (1/2)
Experimental settings
I One flow from a TCP source to a TCP sink through a single link
I Flow throughput as function of L and B
I Fixed size (S=100MB) and window (W=20KB)
Results
[Figure: throughput (KB/s) surface as a function of bandwidth (KB/s) and latency (ms)]
Legend
I Mesh: SimGrid results, i.e. S / (S/min(B, W/(2L)) + L)
I 2: GTNetS results
I #: NS2 results
I ×: SSFNet with TCP_FAST_INTERVAL=default
I +: SSFNet with TCP_FAST_INTERVAL=0.01
Conclusion
I SimGrid estimations close to packet-level simulators (when S=100MB)
I When B < W/(2L) (B=100KB/s, L=500ms), |εmax | ≈ |ε| ≈ 1%
I When B > W/(2L) (B=100KB/s, L=10ms), |εmax | ≈ |ε| ≈ 2%
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 54/130
Validation experiments on a single link (2/3)
Experimental settings
I One flow from a TCP source to a TCP sink through a single link
I Compute achieved bandwidth as function of S
I Fixed L=10ms and B=100MB/s
Observations
I Packet-level tools don’t completely agree
I SSFNet TCP_FAST_INTERVAL has a bad default value
I GTNetS is equally distant from the others
[Figure: throughput (KB/s) vs. data size (MB) for NS2, SSFNet (0.2), SSFNet (0.01), GTNetS and SimGrid]
|ε| |εmax |
S ∈ [100KB; 10MB] ≈ 17% ≈ 80%
S > 10MB ≈ 1% ≈ 1%
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 55/130
Validation experiments on a single link (3/3)
Experimental settings
I One flow from a TCP source to a TCP sink through a single link
I Compute achieved bandwidth as function of S
I Fixed L=10ms and B=100MB/s
Improvements
I Statistical analysis of GTNetS slow-start
I New SimGrid model (MaxMin based)
I Latency changed to 10.4 × L
I This dramatically improves the validity range
[Figure: throughput (KB/s) and error |ε| vs. data size (MB) for NS2, SSFNet (0.2), SSFNet (0.01), GTNetS and SimGrid]
|ε| |εmax |
S < 100KB ≈ 12% ≈ 162%
S > 100KB ≈ 1% ≈ 6%
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 56/130
Validation experiments on the dumbbell topology
Experimental settings
[Figure: dumbbell topology; Flow A and Flow B share a central link, the access links are 100MB/s with 10ms latencies, one of them having latency L]
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 57/130
Validation experiments on the dumbbell topology
Throughput as function of L when B=100MB/s (limited by latency)
[Figure: Flow A’s bandwidth share vs. latency (ms), in GTNetS and SimGrid]
I L < 10 ms ⇒ Flow A gets less bandwidth than B
I L = 10 ms ⇒ Flow A gets as much bandwidth as B
I L > 10 ms ⇒ Flow A gets more bandwidth than B
Negligible error
I |εmax | ≈ 0.1%
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 58/130
Validation experiments on the dumbbell topology
[Figure: bandwidth shares as a function of latency (ms), in GTNetS and SimGrid]
Analysis
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 59/130
Data fitting on the Bandwidth ratio in GTNetS
Approximation
I Ratio = Share of flow A / Share of flow B (in GTNetS)
I Data fitting: Ratio ≈ (Σi (Li + 8775/Bi )) / (Σj (Lj + 8775/Bj ))
[Figure: Ratio = f(B, L) surface over bandwidth and latency]
I Conclusion: MaxMin needs priorities
I CM02 uses latency only (wi = Li )
I bandwidth should be considered too
LV08 improvements
I Max-Min with priorities: wi = Σ (Lk + 8775/Bk ) over the links k used by flow i
I Improved results: |ε| ≈ 4% ; |εmax | ≈ 44%
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 60/130
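The LV08 priority above is straightforward to compute per flow. A sketch, where L and B hold the latency and bandwidth of each link crossed by the flow, and 8775 is the fitted constant from the data fitting above:

```c
#include <assert.h>
#include <math.h>

/* w_i = sum over the links k used by flow i of (L_k + 8775 / B_k) */
static double flow_weight(const double L[], const double B[], int nlinks) {
  double w = 0.0;
  for (int k = 0; k < nlinks; k++)
    w += L[k] + 8775.0 / B[k];
  return w;
}
```

A flow crossing a single link with L = 0.01 s and B = 10^7 gets weight 0.01 + 8775/10^7.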
Validation experiments on the dumbbell topology
[Figure: dumbbell topology; some flows are bandwidth-limited, some are latency-limited]
[Figure: error |ε| of the Old and Improved models for Flow A and Flow B over 1000 experiments]
Conclusion
I SimGrid uses an accurate yet fast sharing model
I Improved model is validated against GTNetS
I Accuracy has to be evaluated against more general topologies
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 61/130
Validation experiments on random platforms
[Figure: example random platform interconnecting a few dozen named hosts]
Pedro Velho, Arnaud Legrand. Accuracy Study and Improvement of Network Simulation in the SimGrid Framework. Submitted to SimuTools’09.
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 62/130
Validation experiments on random platforms
Summary of experiments
[Figure: error |ε| and estimation ratio of the Improved model over 160 experiments]
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 63/130
Simulation speed
200-nodes/200-flows network sending 1MB each

# of flows   GTNetS simulation time   GTNetS simulation/simulated   SimGrid simulation time   SimGrid simulation/simulated
10           0.661s                   0.856                         0.002s                    0.002
25           1.651s                   1.712                         0.008s                    0.010
50           3.697s                   3.589                         0.028s                    0.028
100          7.649s                   7.468                         0.137s                    0.140
200          15.705s                  11.515                        0.536s                    0.396

SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid 64/130
Conclusion
Models of “Grid” Simulators
I Most are overly simplistic (wormhole: slow and inaccurate at best)
I Some are plainly wrong (OptorSim’s unfortunate sharing policy)
Structural description
I Hosts list
I Links and interconnection topology
Peak Performance
I Bandwidth and Latencies
I Processing capacity
Background Conditions
I Load
I Failures
I Grid’5000: 9 sites, 25 clusters, 1,528 hosts
I DAS 3: 5 clusters, 277 hosts
I GridPP: 18 clusters, 7,948 hosts
I LCG: 113 clusters, 44,184 hosts
Two-step generators
1. Nodes are placed on a square (of side c) following a probability law
2. Each couple (u, v ) gets interconnected with a given probability
1. Node Placement
I Locality-aware: probability P(u, v ) = α if d < L × r, β if d > L × r
Zegura, Calvert, Donahoo, A quantitative comparison of graph-based models for
Internet topology, IEEE/ACM Transactions on Networking, 1997.
Stub Domains
[Figure: example topology with stub domains attached to a transit backbone]
Barabási-Albert algorithm
I Incremental growth
I Affinity connection
Barabási and Albert, Emergence of scaling in random networks, Science, vol. 286, 1999, pp. 509–512.
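The two ingredients above can be sketched in a few lines. This is a simplified version of the general model (one edge per new node, hypothetical fixed-size arrays): each new node attaches to an existing node chosen with probability proportional to its current degree.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXNODES 1024

/* Start from the edge 0-1; each new node v attaches to an existing node
   picked in proportion to its degree. edge_to[v] records where v attached. */
static void barabasi_albert(int n, int edge_to[], unsigned seed) {
  int deg[MAXNODES] = {0};
  int degree_sum = 2;               /* the initial edge has two endpoints */
  deg[0] = deg[1] = 1;
  edge_to[0] = 1;
  edge_to[1] = 0;
  srand(seed);
  for (int v = 2; v < n; v++) {
    int target = 0;
    int r = rand() % degree_sum;    /* degree-proportional pick */
    while (r >= deg[target]) {
      r -= deg[target];
      target++;
    }
    edge_to[v] = target;
    deg[v] = 1;
    deg[target]++;
    degree_sum += 2;                /* one new edge: two endpoints */
  }
}
```

Because attachment probability grows with degree, early and popular nodes accumulate edges, producing the heavy-tailed degree distributions checked on the next slide.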
SimGrid for Research on Large-Scale Distributed Systems Platform Instantiation 79/130
Checking two Power-Laws
[Figure: for five generated platforms, out-degree vs. rank and frequency vs. out-degree, both on log-log axes]
Methodological limits
I Necessary condition 6= sufficient condition
I Laws observed by Faloutsos brothers are correlated
I They could be irrelevant parameters
Barford, Bestavros, Byers, Crovella, On the Marginal Utility of Network Topology Measurements, 1st ACM SIGCOMM Workshop on Internet Measurement, 2001.
I They could even be measurement bias!
Lakhina, Byers, Crovella, Xie, Sampling Biases in IP Topology Measurements, INFOCOM’03.
Conclusion
I 10,000 nodes platform: Degree-based generators perform better
I 100 nodes platform
I Power-laws make no sense
I Structural generators seem more appropriate
Probabilistic Models
I Naive: experimental distributions of availability and unavailability intervals
I Weibull distributions:
Nurmi, Brevik, Wolski, Modeling Machine Availability in Enterprise and Wide-area Distributed Computing Environments, EuroPar 2005.
I Models by Feitelson et al.: job inter-arrival times (Gamma), amount of work requested (Hyper-Gamma), number of processors requested: Compounded (2p , 1, ...)
Feitelson, Workload Characterization and Modeling Book, available at http://www.cs.huji.il/~feit/wlmod/
Traces
I The Grid Workloads Archive (http://gwa.ewi.tudelft.nl/pmwiki/)
I Resource Prediction System Toolkit (RPS) based traces (http://www.cs.northwestern.edu/~pdinda/LoadTraces)
I Home-made traces with NWS
SimGrid for Research on Large-Scale Distributed Systems Platform Instantiation 86/130
Example Synthetic Grid Generation
Generate topology and networks
I Topology: Generate a 5,000 node graph with Tiers
I Latency: Euclidean distance (scaling to obtain the desired network diameter)
I Bandwidth: Set of end-to-end NWS measurements
Generate computational resources
I Pick 30% of the end-points
I Clusters at each end-point: Kee’s synthesizer for Year 2008
I Cluster load: Feitelson’s model (parameters picked randomly)
I Resource failures: based on the Grid Workloads Archive
All-in-one tools
I GridG
Lu and Dinda, GridG: Generating Realistic Computational Grids,
Performance Evaluation Review, Vol. 30::4 2003.
I Simulacrum tool
Quinson, Suter, A Platform Description Archive for Reproducible Simulation Experiments,
Submitted to SimuTools’09.
SimGrid for Research on Large-Scale Distributed Systems Using SimGrid for Practical Grid Experiments 88/130
Agenda
Experiments for Large-Scale Distributed Systems Research
Methodological Issues
Main Methodological Approaches
Tools for Experimentations in Large-Scale Distributed Systems
Resource Models in SimGrid
Analytic Models Underlying SimGrid
Experimental Validation of the Simulation Models
Platform Instantiation
Platform Catalog
Synthetic Topologies
Using SimGrid for Practical Grid Experiments
Overview of the SimGrid Components
SimDag: Comparing Scheduling Heuristics for DAGs
MSG: Comparing Heuristics for Concurrent Sequential Processes
GRAS: Developing and Debugging Real Applications
Conclusion
SimGrid for Research on Large-Scale Distributed Systems Using SimGrid for Practical Grid Experiments 89/130
User-visible SimGrid Components
I SimDag: Framework for DAGs of parallel tasks
I MSG: Simple application-level simulator
I GRAS: Framework to develop distributed applications
I AMOK: toolbox
I SMPI: Library to run MPI applications on top of a virtual environment
XBT: Grounding features (logging, etc.), usual data structures (lists, sets, etc.) and portability layer
SimGrid for Research on Large-Scale Distributed Systems Using SimGrid for Practical Grid Experiments 90/130
Argh! Do I really have to code in C?!
No, it is not necessary
I Some bindings exist: Java bindings to the MSG interface (new in v3.3)
I More bindings planned:
I C++, Python, and any scripting language
I SimDag interface
SimGrid for Research on Large-Scale Distributed Systems Using SimGrid for Practical Grid Experiments 91/130
SimDag: Comparing Scheduling Heuristics for DAGs
[Figure: a DAG (Root, tasks 1-6, End) and its schedule on two resources along the time axis]
Main functionalities
1. Create a DAG of tasks
I Vertices: tasks (either communication or computation)
I Edges: precedence relation
2. Schedule tasks on resources
3. Run the simulation (respecting precedences)
; Compute the makespan
SimGrid for Research on Large-Scale Distributed Systems Using SimGrid for Practical Grid Experiments 93/130
The SimDag interface
DAG creation
I Creating tasks: SD_task_create(name, data)
I Creating dependencies: SD_task_dependency_{add/remove}(src, dst)
Scheduling tasks
I SD_task_schedule(task, workstation_number, *workstation_list,
double *comp_amount, double *comm_amount,
double rate)
I Tasks are parallel by default; simply set workstation_number to 1 if not
I Communications are regular tasks, comm_amount is a matrix
I Both computation and communication in same task possible
I rate: to slow down non-CPU (resp. non-network) bound applications
I SD_task_unschedule, SD_task_get_start_time
Running the simulation
I SD_simulate(double how_long) (how_long < 0 ; until the end)
I SD_task_{watch/unwatch}: simulation stops as soon as the task’s state changes
MSG: Heuristics for Concurrent Sequential Processes
(historical) Motivation
I Centralized scheduling does not scale
I SimDag (and its predecessor) not adapted to study decentralized heuristics
I MSG not strictly limited to scheduling, but particularly convenient for it
The MSG master/workers example: the worker
The master has a large number of tasks to dispatch to its workers for execution
sprintf(mailbox, "worker-%d", id);
while (1) {
  errcode = MSG_task_receive(&task, mailbox);
  xbt_assert0(errcode == MSG_OK, "MSG_task_receive failed");
  if (!strcmp(MSG_task_get_name(task), "finalize")) {
    MSG_task_destroy(task);
    break;
  }
  INFO1("Processing '%s'", MSG_task_get_name(task));
  MSG_task_execute(task);   /* simulate the computation, then free the task */
  MSG_task_destroy(task);
}
INFO0("I'm done. See you!");
The MSG master/workers example: deployment file
Specifying which agent must be run on which host, and with which arguments
</platform>
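Only the closing tag of the listing survived conversion. For reference, a hedged sketch of what such a deployment file looks like: host names and counts are taken from the example output above, the argument order mirrors the Java Master's arguments, the task sizes are illustrative, and the DOCTYPE and version attribute vary with the SimGrid release.

```xml
<?xml version="1.0"?>
<!DOCTYPE platform SYSTEM "simgrid.dtd">
<platform version="2">
  <!-- one master on Tremblay: 6 tasks, compute/comm sizes, 3 workers -->
  <process host="Tremblay" function="master">
    <argument value="6"/>
    <argument value="50000000"/>
    <argument value="1000000"/>
    <argument value="3"/>
  </process>
  <!-- one worker per remaining host, each given its id -->
  <process host="Jupiter" function="worker"><argument value="0"/></process>
  <process host="Fafard"  function="worker"><argument value="1"/></process>
  <process host="Ginette" function="worker"><argument value="2"/></process>
</platform>
```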
The MSG master/workers example: the main()
The MSG master/workers example: raw output
[Tremblay:master:(1) 0.000000] [example/INFO] Got 3 workers and 6 tasks to process
[Tremblay:master:(1) 0.000000] [example/INFO] Sending ’Task_0’ to ’worker-0’
[Tremblay:master:(1) 0.147613] [example/INFO] Sending ’Task_1’ to ’worker-1’
[Jupiter:worker:(2) 0.147613] [example/INFO] Processing ’Task_0’
[Tremblay:master:(1) 0.347192] [example/INFO] Sending ’Task_2’ to ’worker-2’
[Fafard:worker:(3) 0.347192] [example/INFO] Processing ’Task_1’
[Tremblay:master:(1) 0.475692] [example/INFO] Sending ’Task_3’ to ’worker-0’
[Ginette:worker:(4) 0.475692] [example/INFO] Processing ’Task_2’
[Jupiter:worker:(2) 0.802956] [example/INFO] ’Task_0’ done
[Tremblay:master:(1) 0.950569] [example/INFO] Sending ’Task_4’ to ’worker-1’
[Jupiter:worker:(2) 0.950569] [example/INFO] Processing ’Task_3’
[Fafard:worker:(3) 1.002534] [example/INFO] ’Task_1’ done
[Tremblay:master:(1) 1.202113] [example/INFO] Sending ’Task_5’ to ’worker-2’
[Fafard:worker:(3) 1.202113] [example/INFO] Processing ’Task_4’
[Ginette:worker:(4) 1.506790] [example/INFO] ’Task_2’ done
[Jupiter:worker:(2) 1.605911] [example/INFO] ’Task_3’ done
[Tremblay:master:(1) 1.635290] [example/INFO] All tasks dispatched. Let’s stop workers.
[Ginette:worker:(4) 1.635290] [example/INFO] Processing ’Task_5’
[Jupiter:worker:(2) 1.636752] [example/INFO] I’m done. See you!
[Fafard:worker:(3) 1.857455] [example/INFO] ’Task_4’ done
[Fafard:worker:(3) 1.859431] [example/INFO] I’m done. See you!
[Ginette:worker:(4) 2.666388] [example/INFO] ’Task_5’ done
[Tremblay:master:(1) 2.667660] [example/INFO] Goodbye now!
[Ginette:worker:(4) 2.667660] [example/INFO] I’m done. See you!
[2.667660] [example/INFO] Simulation time 2.66766
The MSG master/workers example: colorized output
$ ./my_simulator | MSG_visualization/colorize.pl
[ 0.000][ Tremblay:master ] Got 3 workers and 6 tasks to process
[ 0.000][ Tremblay:master ] Sending ’Task_0’ to ’worker-0’
[ 0.148][ Tremblay:master ] Sending ’Task_1’ to ’worker-1’
[ 0.148][ Jupiter:worker ] Processing ’Task_0’
[ 0.347][ Tremblay:master ] Sending ’Task_2’ to ’worker-2’
[ 0.347][ Fafard:worker ] Processing ’Task_1’
[ 0.476][ Tremblay:master ] Sending ’Task_3’ to ’worker-0’
[ 0.476][ Ginette:worker ] Processing ’Task_2’
[ 0.803][ Jupiter:worker ] ’Task_0’ done
[ 0.951][ Tremblay:master ] Sending ’Task_4’ to ’worker-1’
[ 0.951][ Jupiter:worker ] Processing ’Task_3’
[ 1.003][ Fafard:worker ] ’Task_1’ done
[ 1.202][ Tremblay:master ] Sending ’Task_5’ to ’worker-2’
[ 1.202][ Fafard:worker ] Processing ’Task_4’
[ 1.507][ Ginette:worker ] ’Task_2’ done
[ 1.606][ Jupiter:worker ] ’Task_3’ done
[ 1.635][ Tremblay:master ] All tasks dispatched. Let’s stop workers.
[ 1.635][ Ginette:worker ] Processing ’Task_5’
[ 1.637][ Jupiter:worker ] I’m done. See you!
[ 1.857][ Fafard:worker ] ’Task_4’ done
[ 1.859][ Fafard:worker ] I’m done. See you!
[ 2.666][ Ginette:worker ] ’Task_5’ done
[ 2.668][ Tremblay:master ] Goodbye now!
[ 2.668][ Ginette:worker ] I’m done. See you!
[ 2.668][ ] Simulation time 2.66766
Agenda
Experiments for Large-Scale Distributed Systems Research
Methodological Issues
Main Methodological Approaches
Tools for Experimentation in Large-Scale Distributed Systems
Resource Models in SimGrid
Analytic Models Underlying SimGrid
Experimental Validation of the Simulation Models
Platform Instantiation
Platform Catalog
Synthetic Topologies
Using SimGrid for Practical Grid Experiments
Overview of the SimGrid Components
SimDag: Comparing Scheduling Heuristics for DAGs
MSG: Comparing Heuristics for Concurrent Sequential Processes
Motivations, Concepts and Example of Use
Java bindings
A Glance at SimGrid Internals
Performance Results
GRAS: Developing and Debugging Real Applications
Conclusion
MSG bindings for Java: master/workers example
import simgrid.msg.*;
public class BasicTask extends simgrid.msg.Task {
public BasicTask(String name, double computeDuration, double messageSize)
throws JniException {
super(name, computeDuration, messageSize);
}
}
public class FinalizeTask extends simgrid.msg.Task {
public FinalizeTask() throws JniException {
super("finalize",0,0);
}
}
public class Worker extends simgrid.msg.Process {
public void main(String[] args) throws JniException, NativeException {
String id = args[0];
while (true) {
Task t = Task.receive("worker-" + id);
if (t instanceof FinalizeTask)
break;
BasicTask task = (BasicTask)t;
Msg.info("Processing ’" + task.getName() + "’");
task.execute();
Msg.info("’" + task.getName() + "’ done ");
}
Msg.info("Received Finalize. I’m done. See you!");
}
}
MSG bindings for Java: master/workers example
import simgrid.msg.*;
public class Master extends simgrid.msg.Process {
public void main(String[] args) throws JniException, NativeException {
int numberOfTasks = Integer.valueOf(args[0]).intValue();
double taskComputeSize = Double.valueOf(args[1]).doubleValue();
double taskCommunicateSize = Double.valueOf(args[2]).doubleValue();
int workerCount = Integer.valueOf(args[3]).intValue();
Msg.info("Got "+ workerCount + " workers and " + numberOfTasks + " tasks.");
for (int i = 0; i < numberOfTasks; i++)  // dispatch loop; send() mirrors Task.receive() on the Worker side
  new BasicTask("Task_" + i, taskComputeSize, taskCommunicateSize).send("worker-" + (i % workerCount));
for (int i = 0; i < workerCount; i++)    // then tell every worker to stop
  new FinalizeTask().send("worker-" + i);
Msg.info("Goodbye now!");
}
}
Implementation of CSPs on top of simulation kernel
Idea
I Each process is implemented in a thread
I Blocking actions (execution and communication) reported into kernel
I A maestro thread unlocks the runnable threads (when action done)
Example
I Thread A:
I Send ”toto” to B
I Receive something from B
I Thread B:
I Receive something from A
I Send ”blah” to A
[Figure: sequence diagram — the simulation kernel's maestro asks "who's next?", resumes Thread A (Send "toto" to B) and Thread B (Receive from A), then unlocks each thread again as the matching action completes (Send "blah" to A, Receive from B), until both report (done)]
[Figure: the SimGrid software stack — SMPI, GRAS, SimDag, MSG and SMURF (SimIX network proxy) sit on top of SimIX, a "POSIX-like" API on a virtual platform, which relies on SURF, the virtual platform simulator; XBT underpins the whole stack]
Some Performance Results
Master/Workers on amd64 with 4 GB

                       #Workers
#tasks     Context     100    500    1,000  5,000  10,000 25,000
1,000      ucontext    0.16   0.19   0.21   0.42   0.74   1.66
           pthread     0.15   0.18   0.19   0.35   0.55   ?
           java        0.41   0.59   0.94   7.6    27.    ?
10,000     ucontext    0.48   0.52   0.54   0.83   1.1    1.97
           pthread     0.51   0.56   0.57   0.78   0.95   ?
           java        1.6    1.9    2.38   13.    40.    ?
100,000    ucontext    3.7    3.8    4.0    4.4    4.5    5.5
           pthread     4.7    4.4    4.6    5.0    5.23   ?
           java        14.    13.    15.    29.    77.    ?
1,000,000  ucontext    36.    37.    38.    41.    40.    41.
           pthread     42.    44.    46.    48.    47.    ?
           java        121.   130.   134.   163.   200.   ?

?: the number of semaphores reached the system limit
(2 semaphores per user process; system limit = 32k semaphores)
Goals of the GRAS project (Grid Reality And Simulation)
Ease development of large-scale distributed apps
Development of real distributed applications using a simulator
[Figure: without GRAS, separate research code (simulation) and development code (application), with a costly rewrite in between; with GRAS, a single code targets one API and runs either on the GRDK (simulation, on top of SimGrid) or on the GRE (real application)]
Emulation issues
I How does a process sleep? How does it get the current time?
I System calls are virtualized: gras_os_time, gras_os_sleep
I How is computation time reported into the simulator?
I Explicitly requested by the user, through provided macros
I The time to report can be benchmarked automatically
I What about global data?
I Agent state is placed in a dedicated structure, with an ad-hoc manipulation API
Example of code: ping-pong (1/2)
Code common to client and server
#include "gras.h"
XBT_LOG_NEW_DEFAULT_CATEGORY(test,"Messages specific to this example" );
static void register_messages(void) {
gras_msgtype_declare("ping", gras_datadesc_by_name("int" ));
gras_msgtype_declare("pong", gras_datadesc_by_name("int" ));
}
Client code
int client(int argc, char *argv[]) {
  gras_socket_t peer = NULL, from;
  int ping = 1234, pong;

  gras_init(&argc, argv);
  gras_os_sleep(1); /* Wait for the server startup */
  peer = gras_socket_client("127.0.0.1", 4000);
  register_messages();
  gras_msg_send(peer, gras_msgtype_by_name("ping"), &ping);       /* send the ping...        */
  gras_msg_wait(6000, gras_msgtype_by_name("pong"), &from, &pong); /* ...and block on the pong */
  gras_exit();
  return 0;
}
Example of code: ping-pong (2/2)
Server code
typedef struct { /* Global private data */
int endcondition;
} server_data_t;
int server(int argc, char *argv[]) {
  server_data_t *globals;

  gras_init(&argc, argv);
  globals = gras_userdata_new(server_data_t);
  globals->endcondition = 0;
  gras_socket_server(4000);
  register_messages();
  gras_cb_register("ping", &server_cb_ping_handler);
  while (!globals->endcondition) /* handle messages until the ping callback flips the flag */
    gras_msg_handle(600.0);
  gras_exit();
  return 0;
}
Exchanging structured data
GRAS wire protocol: NDR (Native Data Representation)
Avoid data conversion whenever possible:
I The sender writes data to the socket exactly as it is laid out in memory
I If the receiver's architecture matches, no conversion is needed
I The receiver is able to convert from any architecture
I Tested solutions
I GRAS
I PBIO (uses NDR)
I OmniORB (classical CORBA solution)
I MPICH (classical MPI solution)
I XML (Expat parser + handcrafted communication)
I Platform: x86, PPC, sparc (all under Linux)
Performance on a LAN
[Figure: per-message exchange time over a LAN (log scale), one panel per sender architecture (ppc, sparc, x86), comparing GRAS, MPICH, OmniORB, PBIO and XML; measured times range from 0.8 ms to 40 ms, with PBIO marked n/a on ppc]
I MPICH twice as fast as GRAS, but cannot mix little- and big-endian Linux
I PBIO broken on PPC
I XML much slower (extra conversions + verbose wire encoding)
GRAS offers the best compromise between performance and portability
Assessing API simplicity
Experiment: code-complexity measurements on the code from the previous experiment
Results discussion
I XML complexity may be an artifact of the Expat parser (though it is the fastest)
I MPICH: manual marshaling/unmarshaling
I PBIO: automatic marshaling, but manual type description
I OmniORB: automatic marshaling, IDL as type description
I GRAS: automatic marshaling & type description (the IDL is plain C)
Conclusion
GRAS is the least demanding solution from the developer's perspective
Conclusion: GRAS eases infrastructure development
Ongoing applications
I Comparison of P2P protocols (Pastry, Chord, etc.)
I Use emulation mode to validate SimGrid models
I Network mapper (ALNeM): capture platform descriptions for simulator
I Large scale mutual exclusion service
Future applications
I Platform monitoring tool (bandwidth and latency)
I Group communications & RPC; Application-level routing; etc.
Future Plans
I Improve usability
I Extreme scalability for P2P
I Model-checking of GRAS applications
I Emulation solution à la MicroGrid
[Figure: the SimGrid software stack — SMPI, GRAS, SimDag and MSG on top of SimIX ("POSIX-like" API on a virtual platform), SURF (virtual platform simulator) and XBT]
Large community
http://gforge.inria.fr/projects/simgrid/
I 130 subscribers to the users mailing list (40 on -devel)
I 40 scientific publications using the tool for their experiments
I 15 co-signed by one of the core-team members
I 25 purely external
I LGPL, 120,000 lines of code (half for examples and regression tests)
I Examples, documentation and tutorials on the web page