
Parallel Simulation Researchers Survey

2004

This survey identifies some of the prominent researchers in the field of modeling and simulation by means of their published papers. The selected researchers are Richard Fujimoto, David M. Nicol, and Philip A. Wilsey. The document summarizes some of their latest work in modeling and simulation; the topics were selected according to their relevance to our interest in the parallel simulation of complex networks.

1. Richard Fujimoto

Richard Fujimoto is a prominent researcher in the field of parallel and distributed simulation. He is currently a professor in the College of Computing at the Georgia Institute of Technology. He received his M.S. and Ph.D. degrees from the University of California at Berkeley in 1980 and 1983, respectively, and has been active in parallel and distributed simulation research since 1985. He has given tutorials and lectures on parallel simulation at leading conferences around the globe. He is also very active in U.S. Department of Defense (DoD) research activities; most recently he has been the technical lead for time management issues in the DoD's High Level Architecture (HLA). He is a contributing member of various IEEE societies, including the one on parallel and distributed simulation. Besides his IEEE work, he is an area editor for ACM Transactions on Modeling and Computer Simulation and chaired the steering committee for the Workshop on Parallel and Distributed Simulation (PADS) from 1990 to 1998. He was also a member of the Conference Committee for the Simulation Interoperability Workshop. In addition to numerous conference and journal contributions, he has co-authored books on parallel and distributed simulation. In the past decade he has published research on the parallel and distributed simulation of communication networks.
Position Statement of Richard Fujimoto [10]

Interoperable distributed simulations have been widely used for DoD activities, but the technology has yet to find widespread application for non-military purposes. Most importantly, the approach must be sufficiently attractive for a business to invest in the initial technology expenditures. The embedded computing industry offers good scope for modeling and simulation. Embedded computers are used to make “smart” devices. Networks of these smart devices add another dimension: the devices will be able to anticipate and adapt to future events. Distributed systems of embedded devices must be power efficient, and their modeling and simulation process must be automated. Interoperability issues among components from different manufacturers, or even from the same manufacturer, must be resolved. Perumalla et al. (2002) describe a simulation of a military network using ns2 and GloMoSim. In this case the network models an offshore landing and provides communication between troops on land and naval ships. The simulation models actual networks, and a similar simulation could be built for non-military purposes too.

(1) Experiences Parallelizing a Commercial Network Simulator

The following is an overview of a paper by Dr. Fujimoto, “Experiences Parallelizing a Commercial Network Simulator”. The paper presents a methodology that extends sequential simulators to run on parallel machines and applies it to the OPNET simulator. The results show that considerable speedup can be obtained for some OPNET models, provided that proper partitioning strategies are implemented and simulation attributes are adjusted appropriately. Constructing real test beds for huge networks is very expensive, time consuming, and in some cases impossible. It is also impractical to deploy new protocols throughout the Internet. Modeling and simulating such networks on a single processor is often too time consuming as well.
Parallel and distributed simulation provides one solution to this problem. A number of parallel simulators have been built over the past decade, yet sequential simulators are still widely used today because of the overhead of moving to new software written in different languages. The approach in this paper is therefore to parallelize sequential simulators: the system being modeled is decomposed into subsystems, and the subsystems run on different processors. The methodology specifically assumes that the source code of the simulator is not available, so only minimal changes are made to the original sequential simulator.

Parallel Network Simulation Architecture: Each federate runs a sub-network, which is executed by a sequential simulator. The RTI (runtime infrastructure) provides the communication interface between the sequential simulators running on different machines. A proxy model is added to each federate, which runs on a single processor, to provide an interface between the sequential simulator and the RTI.

Methodology for Parallel Simulation:
1. The whole network is partitioned into sub-networks.
2. Each sub-network runs on a different processor.
3. A proxy model is responsible for communication between one sub-model and the others.
4. Optimizations are applied to improve performance.

Data flow across federates: A network model consists of node objects and link objects. When a big network is broken down into sub-nets, some links are cut, so the end nodes of those links are no longer available locally (they are in a different federate). Proxy objects are used to communicate with these nodes, and they make use of the RTI functions to do so. Another important task of the proxy objects is to translate the native simulator message format into a well-known format used by the proxies, and vice versa. A proxy is divided into two parts:
1. gen_proxy, independent of the protocol, takes care of time and events.
2. pro_proxy, the protocol-dependent portion, processes protocol-specific packets.
Data channels between federates may be unidirectional or bidirectional. These channels are implemented with the HLA class-based publish/subscribe mechanism.

Simulation Time and Event Management: As this is a discrete event simulation, unprocessed events are stored in a queue and processed in time-stamp order. The local time of each simulator must be synchronized with the others, which is a challenging problem: it must be ensured that no federate receives an event in its past. For that purpose a lower bound on time stamp (LBTS) value is maintained, and no federate can advance its simulation beyond that value.

Performance Related Issues: Lookahead is used to improve parallelism and hence performance. The larger the lookahead, the more parallelism is available in the system. When a federate needs information beyond its sub-model, a ghost object is created that models that specific part of the network. This reduces the memory burden compared with defining the whole network in every federate.

Related work in the same field concerns a parallel OPNET simulation. OPNET consists of an event-based simulation engine, libraries for writing models in C, a drag-and-drop graphical interface, and a library of network components.

Implementation: FDK (Federated Simulations Development Kit), developed by Fujimoto et al. at Georgia Tech, was used for this project. An important task is to calculate the propagation delays at the link objects; the proxy model computes the real delays. OPNET models rely heavily on global state information. To resolve this issue, ghost objects implemented on each federate store information about the whole network. This process is static and not modifiable at run time.
OPNET also uses interrupts, which make interaction through the RTI very difficult, so a detailed analysis of the whole network is required to increase the lookahead.

Performance:
1. Performance increases as lookahead increases. To achieve this, either the network model is partitioned at links with low bandwidth, or the distance between federates connected by low-bandwidth links is increased.
2. An increase in event density improves performance.
3. An improvement in traffic locality reduces cost and increases performance.

Conclusions: This method is easy to apply if the sequential simulator does not make extensive use of global state information. Problems such as zero lookahead and global state make parallelization difficult. OPNET has recently introduced support for HLA, but this technique is presented as superior because it allows existing network models to be used.

(2) Generic Framework for Parallelization of Network Simulations

Another work by Richard Fujimoto is a study of a generic framework for the parallelization of network simulations. The goal of the research was to develop and demonstrate a practical, scalable approach to parallel and distributed simulation that enables widespread reuse of sequential discrete event simulation models and software. The focus was on an approach to parallelization in which an existing network simulator is used to build models of subnetworks that are then composed to create simulations of larger networks. Simulation tools have not been able to keep up with the rapid increase in the size, complexity, and speed of modern networks, which is why an approach that exploits parallel and distributed simulation is needed to improve network simulation performance. The approach used in the paper was to extend ns so that several instances can be interconnected to create parallel simulations. Each simulator is given the network topology and data flow characteristics that describe only a portion of the network being simulated.
Interactions between the different simulators were handled by a runtime infrastructure. A methodology for parallelization was described for simulations run on shared-memory symmetric multiprocessors (SMPs) and via distributed computing on several workstations. The basic steps required were:
1. Determine how many processes (threads) will be assigned to run the parallel simulation. Ideally, on a system with n CPUs, the work is divided into n processes.
2. Divide the state set into n partitions and create a one-to-one mapping between partitions and processes.
3. Maintain a separate event list for each physical process, so that each process is concerned only with the events that affect the states in its state set.
4. Distribute events among the physical processes during execution.
5. Add a synchronization/communication mechanism to ensure consistent state management between the processes.
6. Perform optimizations.

With the above steps a parallel simulation can be constructed on an SMP. However, distributed simulation on separate workstations raises several further issues, which concern defining the physical and logical connectivity between the sub-models of a divided simulation model. To define connectivity between sub-models that reside on different workstations, such as a source and a sink, the IP address and port number are used. The steps needed to create a distributed simulation are to determine routing paths, event time management, and event communication. Routing paths can be determined by having the simulators run existing, well-known routing protocols while the simulation is running, in order to exchange dynamic routing information between the sub-models. Event time management must also be implemented: each simulator must determine that no other simulator can create events at an earlier time before it is allowed to process its next event. This can be done using a lower bound on time stamp (LBTS).
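The LBTS rule described above can be illustrated with a minimal sketch. It is not the RTIKIT implementation: the `Federate` class, the `lbts` and `advance` functions, and the formula "peer clock plus peer lookahead" are simplifying assumptions made here for illustration, and messages still in transit are ignored.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Federate:
    """Hypothetical stand-in for one sequential simulator in the federation."""
    name: str
    lookahead: float   # assumed minimum delay on any message this federate sends
    clock: float = 0.0  # local simulation time
    events: list = field(default_factory=list)  # min-heap of (timestamp, label)

    def schedule(self, timestamp, label):
        heapq.heappush(self.events, (timestamp, label))


def lbts(federates, me):
    """Lower bound on the time stamp of any event `me` may still receive.

    Simplification: each peer promises to send nothing earlier than its
    clock plus its lookahead; in-transit messages are not modeled.
    """
    bounds = [f.clock + f.lookahead for f in federates if f is not me]
    return min(bounds) if bounds else float("inf")


def advance(fed, bound):
    """Process every local event strictly below `bound`, then move the clock up."""
    processed = []
    while fed.events and fed.events[0][0] < bound:
        timestamp, label = heapq.heappop(fed.events)
        fed.clock = timestamp
        processed.append(label)
    if bound != float("inf"):
        # Nothing earlier than `bound` can arrive, so the clock may jump to it.
        fed.clock = max(fed.clock, bound)
    return processed
```

In this sketch a federate never processes an event at or beyond its LBTS, so it cannot later receive an event in its past. A larger lookahead raises the bound seen by the peers, which is why lookahead translates into more parallelism.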
Both event communication and event time management are provided by a runtime library such as RTIKIT, which delivers these services using a multicast group management strategy known as MCAST. The event communication and management schemes were optimized by decreasing LBTS overhead and by polling the listener sockets used for communication only when it was certain that the call would not block forever. Experiments with an eight-node model on a distributed system, using TCP for communication, showed an increase in performance, indicating a successful parallel simulation.

2. David M. Nicol

Next is another researcher in the field of network simulation, David M. Nicol, who is currently a professor in the Department of Electrical and Computer Engineering at the University of Illinois. Professor Nicol's area of research is the parallel simulation of large-scale networks, either building tools for analysis or investigating the causes of the presence of certain phenomena (such as worm infestation).

(1) A Mixed Abstraction Level Simulation Model of Large-Scale Internet Worm Infestations

This paper, written by David Nicol along with other authors, appeared in the proceedings of the 10th IEEE International Symposium on Modeling, Analysis, & Simulation of Computer & Telecommunications Systems. Its purpose is to model large-scale worm infestations in order to assess their threat levels, evaluate countermeasures, and investigate their possible influence on the Internet infrastructure. The paper describes the simulation approach, the collection of data, and the modeling of certain essential model elements, such as topology, population distributions, and scanning traffic. The method used for modeling Internet worm infestations is a mixed abstraction simulation: selective abstraction through epidemiological models is combined with detailed protocol models.
The epidemiological model, originally developed for the study of biological diseases, greatly simplifies modeling the propagation of the worm in the network because it reduces the complexity of the model, and it is a better match for the limited data available on the events. The epidemiological model also helps in gathering information about worm propagation dynamics and their effect on the routing infrastructure. To improve the reliability of the simulation, the authors assumed that the worm scanning traffic induces an increase in BGP (Border Gateway Protocol) routing message traffic. Based on this assumption, three models are required for the simulation: a model of how the worm propagates and infects hosts in the Internet, a traffic model for the scans emitted by the worm, and a model of how the worm scans induce stress on routers. Furthermore, in order to study the system at the level of inter-domain routing, the system is decomposed spatially into autonomous systems (ASes). This helps in developing a stratified epidemic model for worm propagation in which the host population is stratified into ASes.

The underlying epidemic models come in both stochastic and deterministic versions. Since the population is sufficiently large, the stochastic models are approximated by a system of equations based on a continuous-state, continuous-time deterministic model. These equations rest upon the law of mass action, which incorporates the principle of homogeneous mixing. Unfortunately, due to limited time and memory, the number of BGP routers used in the mixed abstraction model was restricted. As a result the model was scaled down to simulating only a few hundred autonomous systems. In the future, however, the use of parallel execution techniques and judicious abstraction could make the simulation of a few thousand ASes possible, which would allow a better interpretation of the model output.
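A stratified, deterministic mass-action model of the kind described above can be sketched as follows. This is only an illustration under assumptions made here: a susceptible/infected model with uniform random scanning, Euler time stepping, and hypothetical names (`simulate_worm`, `contact_rate`). It is not the set of equations used in the paper.

```python
def simulate_worm(populations, initially_infected, contact_rate, dt, steps):
    """Euler integration of a susceptible/infected model stratified by AS.

    Assumed mass-action form (homogeneous mixing): the susceptible hosts
    s_k of AS k are infected at rate contact_rate * s_k * I / N, where I is
    the total number of infected hosts and N the total host population.
    Returns the per-AS infected counts and the history of the total.
    """
    susceptible = [float(n - i0) for n, i0 in zip(populations, initially_infected)]
    infected = [float(i0) for i0 in initially_infected]
    total_hosts = float(sum(populations))
    history = [sum(infected)]

    for _ in range(steps):
        total_infected = sum(infected)
        for k in range(len(populations)):
            newly = contact_rate * susceptible[k] * total_infected / total_hosts * dt
            newly = min(newly, susceptible[k])  # never infect more hosts than remain
            susceptible[k] -= newly
            infected[k] += newly
        history.append(sum(infected))
    return infected, history
```

For example, `simulate_worm([1000, 500, 200], [1, 0, 0], contact_rate=0.8, dt=0.1, steps=300)` starts with one infected host in the first AS and shows the familiar slow start, rapid growth, and saturation of an epidemic curve across all three strata.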
(2) Utility Analysis of Parallel Simulation

Another document published by David M. Nicol carries the title “Utility Analysis of Parallel Simulation”. We shall attempt to summarize the model and the partitioning analysis parts of the original document. The summary is divided into two parts: Model (a summary of the model) and Partitioning (a summary of the partitioning and analysis section).

1.0 Utility Analysis of Parallel Simulation summary:

1.1 Model: Recognizing that large problems are user dependent, the approach uses the notion of user-defined utility. The problem size, described by the variable $m$, is assumed to be expressible in problem units, and $\mu(m)$ denotes the user's utility of simulating a problem of size $m$. Although the size is discrete, treating it as a continuous quantity does not affect the results obtained. The purpose of the model is to capture the notion that the user's utility grows as the simulated problem size grows. A simple model that expresses a wide range of growth is $\mu(m) = c_m m^{\alpha}$, for some positive constant $c_m$. The exponent $\alpha$ expresses how rapidly the utility grows, and turns out to be a key determinant of the optimal system configuration. With large problem sizes, the simulation must be advanced further into simulation time to push the system to equilibrium. This implies a trade-off: is the added utility large enough to offset the added computational cost? A parallel machine with $N$ processors might be used in a variety of ways to execute the simulation. One extreme is to use all processors concurrently, each running a separate problem no larger than size $m_x$; the other extreme is to use the entire machine in parallel to simulate one problem of size no greater than $N m_x$. A utility rate can be associated with each partition of the system, and is calculated by dividing the utility gained by one experiment of the chosen size by the time needed to complete the experiment.
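The two extremes can be compared numerically with a small sketch. The utility function follows $\mu(m) = c_m m^{\alpha}$ from the text; the power-law execution time `c_t * gamma * m**(1 + eps) / n**beta`, the default parameter values, and all function names are assumptions chosen here for illustration. The rate of a whole configuration is taken as the sum of the rates of its partitions.

```python
def utility(m, c_m=1.0, alpha=1.0):
    """User utility of simulating a problem of size m: c_m * m**alpha."""
    return c_m * m ** alpha


def execution_time(m, n, c_t=1.0, gamma=1.0, eps=0.5, beta=0.8):
    """Assumed run time of a size-m problem on n processors (power-law model)."""
    return c_t * gamma * m ** (1.0 + eps) / n ** beta


def aggregate_rate(partitions, alpha, **model):
    """Sum of utility/time over partitions given as (problem_size, processors)."""
    return sum(
        utility(m, alpha=alpha) / execution_time(m, n, **model)
        for m, n in partitions
    )


def compare_extremes(n_procs, m_x, alpha, **model):
    """Return (serial_rate, parallel_rate) for the two extreme configurations."""
    # Fully serial: every processor runs its own problem of size m_x.
    serial = aggregate_rate([(m_x, 1)] * n_procs, alpha, **model)
    # Fully parallel: the whole machine runs one problem of size n_procs * m_x.
    parallel = aggregate_rate([(n_procs * m_x, n_procs)], alpha, **model)
    return serial, parallel
```

Under these assumed forms the ratio of the parallel rate to the serial rate works out to $N^{\alpha + \beta - \varepsilon - 2}$, so with the default `eps` and `beta` a slowly growing utility (`alpha=1.0`) favors the serial extreme, while a rapidly growing one (`alpha=3.0`) favors the parallel extreme.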
The aggregate utility rate is found by adding the utility rates of all the system's partitions, and can be used to compare different configurations of the system. The approach can be extended by adding a cost that varies with the number of processors of the parallel machine that are used. When the optimization problem is approached with a model that depends on the problem size and the number of processors used, it turns out that the maximized aggregate rate results from using one of the extremes: fully parallel or not parallel at all. “Determination of which extreme is best depends on the rate of utility increase (α) in problem size, the rate (ε) at which length of the simulation must grow to reach equilibrium as the problem size grows, and the rate of performance increase (β) as additional processors are used in the simulation. Of these only α is subjective, and the user’s perception of how utility increases in problem size effectively determines which of the extreme configurations optimizes the aggregate utility rate.”[1.3] The cost of using a machine with $N$ processors is assumed to be proportional to the execution time multiplied by $N^{b}$, for $b > 0$. For any value of $b > 0$ it is shown that the configuration that optimizes the utility rate per unit cost is an extreme.

The native application behavior is described by the problem size $m$ and the length of simulation time $T(m)$ needed for an interesting run of a problem of size $m$. The simulation length can be either dependent on or independent of the problem size. Two characteristics are used to describe the capabilities of the simulation. The first is the average execution time needed to evaluate one unit of the problem for one simulation second on one CPU. The dependence of both the simulation length and the execution time per problem unit on the problem size can be captured by modeling the execution time on one processor as

$$x(m,1) = c_t\,\gamma\, m^{\varepsilon_1} \cdot m \cdot m^{\varepsilon_2}$$

$$x(m,1) = c_t\,\gamma\, m^{1+\varepsilon}, \quad \text{where } \varepsilon = \varepsilon_1 + \varepsilon_2 .$$

The second characteristic describes the ability of the simulation to be parallelized.
Letting $a(n)$ be the speedup of execution on a parallel system using $n$ processors, the speedup is taken to be $a(n) = n^{\beta}$, for $\beta \in (0, 1)$ and $n \in [1, N_x]$. This model accounts for behavior where adding processors improves performance. Using these concepts, the execution time of an application is expressed as

$$x(m,n) = \frac{x(m,1)}{a(n)}$$

$$x(m,n) = \frac{c_t\,\gamma\, m^{1+\varepsilon}}{n^{\beta}}$$

Using the utility model $\mu(m)$, the rate at which utility is accrued when simulating a problem of size $m$ using $n$ processors is

$$\lambda_{\mu}(m,n) = \frac{\mu(m)}{x(m,n)}$$

$$\lambda_{\mu}(m,n) = K_{\mu}\, m^{\alpha-(1+\varepsilon)}\, n^{\beta}, \quad \text{where } K_{\mu} = \frac{c_m}{\gamma\, c_t} .$$

1.2 Partitioning: In a parallel system, many partitioning possibilities can be used to employ the system's resources. For example, ½ of the system can work on one problem, ¼ on a smaller problem, and the remaining ¼ on individual problems. An analysis was conducted using the utility function and the utility rate equation on different partitioning scenarios. To simplify the analysis, the equations were treated as continuous although they are discrete. The analysis proceeded by stating and proving a number of theorems and lemmas, and in all of them the analysis showed that the optimal partitioning is either fully parallel or fully serial.

2.0 Conclusion: When using parallel systems, partitioning is used to decide whether to run simulations using the entire machine in parallel, in serial, or in a mix of both. Using a utility function, and a utility rate function derived from the equations for simulation length and execution time, it was shown that the configuration that maximizes the aggregate rate at which the user's utility is accrued is an extreme: either using the machine fully in parallel, or using it fully in serial.

3. Philip A. Wilsey

Another prominent researcher in the field of network simulation and analysis is Philip A. Wilsey, whose work in the area of parallel simulation of complex networks has greatly benefited other researchers in the field. In the following overview of his work, we look at his analysis of the active networking architecture.

Active networking architecture enables the integration of embedded computational abilities within conventional networks, increasing the efficiency and capacity of current networks by allowing them to be customized for each specific computation. However, this comes with the side effect of increased complexity and a massive increase in size, which makes conventional analytical methods of modelling, simulation, and analysis inadequate. The answer was a discrete event simulation technique that simplifies and parallelizes the simulation process so as to maintain maximum efficiency.

The paper at hand, by Dhananjai M. Rao and Philip A. Wilsey, describes an integrated environment for the modelling and simulation (including parallel simulation) of active networks. The environment, the “Active Network Simulation Environment” (ANSE), incorporates the Time Warp synchronized simulation kernel of WARPED, which enables parallel simulation, and it also provides support for the Packet Language for Active Networks (PLAN). In this survey we shall attempt to shed light on the theory behind the architecture and on the construction of ANSE.

Theory behind the Architecture: ANSE was created with parallel simulation in mind from day one. This led to the development of a framework around a general-purpose discrete event simulation kernel, using an object-oriented rather than a structured infrastructure. This provided a much-needed “separation of concerns” and the ability to use various simulation kernels without changing the modules already created. As mentioned earlier, ANSE incorporates the Time Warp synchronized kernel WARPED.
WARPED is an API (application program interface) with various implementations, one being a Time Warp optimistic synchronization strategy. This implementation was chosen because an active network is based on the idea that the nodes constituting the network have a customizable computational ability on the datagrams flowing through them. The kernel organizes the simulation into asynchronously communicating logical processes (LPs). LPs communicate by exchanging events stamped with virtual time, while each process maintains its own local virtual time (LVT). This mechanism is prone to causality errors, which occur when an LP receives a so-called straggler event, that is, an event whose time stamp is earlier than the LP's LVT. A rollback feature is available to recover from the causality error: the LP is restored to a state prior to the error, the work that caused or resulted from the error is discarded, and execution then continues in the correct order. Each LP also maintains lists of its inputs and outputs and another list of its state transitions in order to perform rollbacks efficiently, discarding entries that are no longer needed.

Finally, the WARPED kernel provides an interface to build LPs according to Jefferson's definition of Time Warp. It also allows different LPs to have unique state definitions, and its notion of clusters adds further simplicity to the API without the hassle of having to synchronize clusters, since control is exchanged between the application and the simulation kernel through cooperative use of function calls.

Overview of the blocks constructing ANSE: We shall look at each module presented in figure 1 and describe its operation, in order to understand the construction of ANSE.

Topology Specification Language (TSL): The main input of the environment, the network to be simulated, is given in TSL.
The Backus Normal Form (BNF) grammar of TSL specifies a set of interconnected topology specifications, each consisting of three main sections:
1. The object definition section contains the details of the modules that will be used in the simulation.
2. The object instantiation section specifies the various nodes constituting the topology.
3. The netlist section defines the interconnectivity between the instantiated nodes.
The topology also makes use of labels to define related segments of the code.

TSL Parser: The parser converts (parses) the input topology into an object-oriented TSL intermediate format (TSL-IF). TSL-IF is implemented in C++ and is a set of cross-referenced classes. The parser is built with the Purdue Compiler Construction Tool Set (PCCTS). The intermediate format is produced by filling in the references in the various C++ classes with appropriate values.

Static elaborator: This part of the environment expands the specification of large networks that are built from smaller sub-networks, since “Hierarchical constructs provide convenient techniques to specify large networks by reusing the specification for smaller sub networks”. Elaboration is defined as the breaking down of large hierarchical constructs into their constituent components. The elaborated topology is in TSL-IF. The elaborator traverses the user-specified sub-topologies in the model, creating and instantiating objects and sub-topologies; as sub-topologies are instantiated they are merged into the enclosing topology. Elaboration is static because it is performed before the code generation step, as opposed to dynamic elaboration.

Code Generator: The code generator produces C++ code (a simulatable model) from the elaborated TSL-IF description supplied by the static elaborator. The generated code is compliant with the ANSE API. It is also worth mentioning that the code generator may be replaced to provide compatibility with other frameworks.
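The static elaboration step described above, in which hierarchical sub-topologies are expanded into one flat topology before code generation, can be sketched as follows. The dictionary layout, the dotted instance names, and the `elaborate` function are hypothetical and chosen here for illustration only; they do not reproduce TSL syntax or the TSL-IF classes.

```python
def elaborate(topologies, name, prefix=""):
    """Flatten topology `name` into (nodes, links), expanding sub-topologies.

    `topologies` maps a topology name to a dict with two keys:
      "instances": {instance_name: kind}, where kind is either a leaf node
                   type or the name of another topology (a sub-topology)
      "netlist":   list of (endpoint_a, endpoint_b) pairs; endpoints inside a
                   sub-topology are written "instance.inner_name"
    """
    spec = topologies[name]
    nodes, links = [], []
    for instance, kind in spec["instances"].items():
        full_name = prefix + instance
        if kind in topologies:
            # Sub-topology: expand it recursively and merge it into the parent.
            sub_nodes, sub_links = elaborate(topologies, kind, full_name + ".")
            nodes.extend(sub_nodes)
            links.extend(sub_links)
        else:
            nodes.append((full_name, kind))
    for a, b in spec["netlist"]:
        links.append((prefix + a, prefix + b))
    return nodes, links


# Hypothetical two-level model: a campus made of two identical LANs.
EXAMPLE = {
    "Lan": {
        "instances": {"router": "Router", "host": "Host"},
        "netlist": [("router", "host")],
    },
    "Campus": {
        "instances": {"lan1": "Lan", "lan2": "Lan"},
        "netlist": [("lan1.router", "lan2.router")],
    },
}
```

Calling `elaborate(EXAMPLE, "Campus")` yields four leaf nodes and three links, with the `Lan` specification reused twice, which is the reuse benefit the quoted sentence refers to.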
ANSE API and Library: As mentioned earlier, ANSE provides an interface to define logical processes (LPs). The processes are defined as entities with the ability to send, receive, and act upon events by applying a set of states internal to the LP. The LPs are created using an object-oriented infrastructure in which the class “NetworkNode” plays the role of a master (base) class from which all other classes are derived. The API also provides state support through classes such as “NetworkNodeState” and “ActiveNodeState” (bearing in mind the role of nodes in active network architecture, the importance of such classes is clear). The state classes hold state information for each node or component. This enables the simulation kernel (WARPED) to perform rollbacks, providing a recovery mechanism for the causality violations that might occur due to the optimistic nature of Time Warp simulation. The discrete event in the system is the packet, represented by the Packet class. Finally, it is worth mentioning that the API is written in C++, making use of its robust operation.

PLAN Library: “PLAN is a simple, functional programming language based on a subset of ML with some added primitives to express remote evaluation” [1]. In active network architecture, packets can contain PLAN programs, which help customize operation for various network tasks. Like the API libraries, PLAN makes use of an object-oriented infrastructure, enabling the use of master classes such as “PacketInjectors” to inject PLAN programs or packets into the simulation environment, to give one example. At run time, the PLAN support in ANSE enables the simulation of large, complex networks within limited hardware requirements, up to a certain level of complexity.
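The state saving and rollback recovery that the state classes make possible can be illustrated with a toy logical process. This is a sketch under assumptions made here, not the WARPED or ANSE API: the `LogicalProcess` class and its methods are hypothetical, the state is just a tuple of payloads in processing order, and the anti-messages that a real Time Warp kernel sends to cancel outputs are omitted.

```python
class LogicalProcess:
    """Toy Time Warp LP that saves state before every event it executes."""

    def __init__(self):
        self.lvt = 0.0        # local virtual time
        self.state = ()       # toy state: payloads in the order they were applied
        self.processed = []   # (timestamp, payload, state_before_event)
        self.rollbacks = 0

    def _execute(self, timestamp, payload):
        # Save the pre-event state so the event can be undone later.
        self.processed.append((timestamp, payload, self.state))
        self.state = self.state + (payload,)
        self.lvt = timestamp

    def receive(self, timestamp, payload):
        """Optimistically execute an event, rolling back first if it is a straggler."""
        redo = []
        if timestamp < self.lvt:
            # Straggler: undo every event executed with a later time stamp.
            self.rollbacks += 1
            while self.processed and self.processed[-1][0] > timestamp:
                ts, pl, saved_state = self.processed.pop()
                self.state = saved_state
                redo.append((ts, pl))
            self.lvt = self.processed[-1][0] if self.processed else 0.0
        self._execute(timestamp, payload)
        # Re-execute the undone events in time-stamp order.
        for ts, pl in sorted(redo):
            self._execute(ts, pl)
```

If events with time stamps 1, 3, and then 2 arrive, the third is a straggler: the LP restores the state saved before the event at time 3, executes the event at time 2, and then re-executes the event at time 3, so the final state reflects time-stamp order.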
Conclusion: In conclusion, it is the testimony of the respected researchers who wrote this paper that “it is better to have a simple, yet flexible language such as TSL, for modelling network Topologies. It is useful to have a clear delineation between the languages for developing the software modules for networking components and network modelling language.”[1] From my point of view, the interoperability between different types of models and simulators is certainly the greatest achievement of the ANSE project.

Glossary
Federation: In HLA, a parallel/distributed simulation.
Federate: An individual simulator.
Lookahead: “In parallel simulation, it is the minimum of the packet delivery delays in all the links of a sub-model that cross boundaries of partition”
LBTS: Lower bound on time stamp.

References:
[1] Dhananjai M. Rao and Philip A. Wilsey, Modeling and Simulation of Active Networks, Experimental Computer Laboratory.
[10] Distributed Simulation and Industry: Potentials and Pitfalls, Proceedings of the 2002 Winter Simulation Conference.