
Towards Hybrid Model Persistence

2018

Change-based persistence has the potential to support faster and more accurate model comparison, merging, as well as a range of analytics activities. However, reconstructing the state of a model by replaying its editing history every time the model needs to be queried or modified can get increasingly expensive as the model grows in size. In this work, we integrate change-based and state-based persistence mechanisms in a hybrid model persistence approach that delivers the best of both worlds. In this paper, we present the design of our hybrid model persistence approach and report on its impact on time and memory footprint for model loading, saving, and storage space usage.

Alfa Yohannis¹,³, Horacio Hoyos Rodriguez∗¹, Fiona Polack∗∗², and Dimitris Kolovos¹

¹ Department of Computer Science, University of York, United Kingdom
² School of Computing and Maths, Keele University, United Kingdom
³ Department of Computer Science, Institut Teknologi dan Bisnis Kalbis, Indonesia
{ary506, dimitris.kolovos}@york.ac.uk
∗ horacio hoyos [email protected]
∗∗ [email protected]

1 Introduction

Change-based persistence (CBP) of models [1] conforming to metamodelling architectures such as MOF/EMF [2,3] comes with notable advantages over state-based persistence (SBP): it provides support for fast comparison and differencing of versions of the same model [4,5,6,7], which can also substantially speed up incremental model management activities, and enables novel model analytics activities (e.g. pattern detection in the editing history to understand how modellers use modelling languages and tools) [8]. However, CBP comes at the cost of ever-growing model files [6,8] since all changes (even deleting model elements) are recorded in an editing log, which naturally leads to longer loading times [9]. In this work, we address the latter challenge by introducing the concept of hybrid persistence of models.
In hybrid model persistence, the change-based representation is augmented with a state-based representation of the latest state of the model (which may be derived from the change-based representation), and this state-based representation is used to speed up model loading and querying.

The paper is structured as follows. Section 2 introduces the concept of change-based model persistence and recent work on state-based model persistence. Sections 3 and 4 present our approach to hybrid model persistence and its implementation. Section 5 presents experimental results and evaluation, and Section 6 discusses them. Section 7 provides an overview of related work, and Section 8 concludes with a discussion on directions for future work.

2 Change and State-based Model Persistence

To explain the differences, benefits and drawbacks of CBP and SBP, consider a modelling activity on a UML model as presented in Fig. 1. Sub-figures 1a to 1f depict the evolution of the UML model at different time stamps. Classes are created and added to or removed from Package X. In SBP, for each session, only the final state of the model is persisted (the state of the previous session is overwritten by the state of the latest session). Thus, to represent the final state of the UML model, only the information about Package X and Class C needs to be persisted, as presented in Listing 1 (XMI format). In CBP, all the changes in the model are persisted. Thus, a list of all the events generated by the model editor is needed to represent the final state of the model.

Fig. 1: The states of the example model after certain changes and their corresponding lines in Listing 2. Panels: (a) Time stamp 1, (b) Time stamp 2, (c) Time stamp 3, (d) Time stamp 4, (e) Time stamp 5, (f) Time stamp 6.

A session is the set of changes made between two save events, i.e. a session comprises all the changes that happened since the last time that the model was persisted. The CBP representation is shown in Listing 2¹. Lines 1-7 represent the initial state (Fig. 1a), followed by lines 8 (Fig. 1b), 9 (Fig. 1c), 11 (Fig. 1d), 12 (Fig. 1e), and 13 (Fig. 1f).

¹ We use a natural language pseudo-code for CBP, introduced in [1,10].

Listing 1: The UML2 model of the example model in Fig. 1.

    <uml:Package xmi:id="1" name="X">
      <packagedElement xsi:type="uml:Class"
          xmi:id="3" name="C"/>
    </uml:Package>

Listing 2: The textual CBP for producing the state-based model in Listing 1. Its visual illustration is in Fig. 1.

     1  session 1
     2  create p1 type Package
     3  set p1.name to "X"
     4  create c1 type Class
     5  set c1.name to "A"
     6  create c2 type Class
     7  set c2.name to "B"
     8  add c1 to p1.packagedElement
     9  add c2 to p1.packagedElement
    10  session 2
    11  set c2.name to "C"
    12  remove c1 from p1.packagedElement
    13  delete c1

Table 1 summarises the benefits (+) and drawbacks (−) of change-based and state-based model persistence.

Table 1: Comparison of model persistence approaches.

    Dimension          Change-based   State-based
    Load Time               −              +
    Save Time               +              −
    Comparison Time         +              −
    Storage Space           −              +

To load an SBP model, only the elements that exist in the final state need to be loaded into memory. To load a CBP model, all the events that lead to the final state must be replayed. Loading times for SBP models are therefore proportional to the size of the model, whereas loading times for CBP models are proportional to the number of events. As a result, loading times of CBP models will always increase over time and are considerably longer than for SBP [10,9].

To store an SBP model, all the elements that exist in the final state must be persisted. To save a CBP model, only the change events in the last session need to be persisted. Storing times of SBP models are therefore proportional to the size of the model, whereas storing times of CBP models are proportional to the number of events in a session. As a result, storing times of CBP models can be considerably shorter than for SBP models [10].

Comparing and finding the differences between two versions of a state-based model is expensive [11] (O(N²) in the general case), which affects the efficiency of change visualisation and comprehension, and has a substantial impact on downstream activities such as incremental model transformation [12] and validation. By contrast, in CBP, changes are first-class entities in the persisted model file and, as such, model comparison and differencing are relatively inexpensive.

The main downsides of CBP are its model file sizes [8,6] and ever-increasing loading times [9]. Loading times can be reduced by around 50% by processing the change log to detect, memorise and subsequently ignore change events that have no impact on the final state of the model. Even so, loading times remain substantially longer than for state-based approaches: more than 6.4 times slower, and slower still as the number of persisted changes increases [10].

3 Hybrid Model Persistence

To achieve the best of both worlds, we introduce a hybrid model persistence approach which combines change-based and state-based model persistence so that they work side by side. An overview of the proposed approach is illustrated in Fig. 2. In the proposed approach, a hybrid model is stored in two representations at the same time: a change-based representation (e.g. using CBP) and a state-based representation (e.g. using XMI or a database-backed approach such as NeoEMF). The change-based representation is treated as the main representation of a model, while the state-based representation can be fully derived from the change-based representation.

Fig. 2: The mechanism of hybrid model persistence.

Loading a hybrid model. Models are loaded into in-memory object graphs that clients (e.g. editors, transformations) can then interact with².
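As a rough illustration of what populating an in-memory object graph from a change-based representation involves, the following Python sketch replays the thirteen events of Listing 2. It is not the Epsilon CBP implementation: the tuple encoding of events, the `Element` class and the `replay` function are hypothetical names introduced only for this example, and the remove event is assumed to target the same `packagedElement` feature as the add events.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str                                   # metaclass name, e.g. "Package"
    attrs: dict = field(default_factory=dict)   # single-valued features
    refs: dict = field(default_factory=dict)    # multi-valued features


def replay(events):
    """Rebuild the model state by applying every recorded event in order."""
    model = {}  # element id -> Element
    for event in events:
        op, args = event[0], event[1:]
        if op == "session":
            continue  # session markers carry no model state
        if op == "create":
            eid, kind = args
            model[eid] = Element(kind)
        elif op == "set":
            eid, feature, value = args
            model[eid].attrs[feature] = value
        elif op == "add":
            child, parent, feature = args
            model[parent].refs.setdefault(feature, []).append(child)
        elif op == "remove":
            child, parent, feature = args
            model[parent].refs[feature].remove(child)
        elif op == "delete":
            (eid,) = args
            del model[eid]
        else:
            raise ValueError(f"unknown event: {event!r}")
    return model


# Hypothetical tuple encoding of the events in Listing 2.
EVENTS = [
    ("session", 1),
    ("create", "p1", "Package"),
    ("set", "p1", "name", "X"),
    ("create", "c1", "Class"),
    ("set", "c1", "name", "A"),
    ("create", "c2", "Class"),
    ("set", "c2", "name", "B"),
    ("add", "c1", "p1", "packagedElement"),
    ("add", "c2", "p1", "packagedElement"),
    ("session", 2),
    ("set", "c2", "name", "C"),
    ("remove", "c1", "p1", "packagedElement"),
    ("delete", "c1"),
]

model = replay(EVENTS)
```

All thirteen events are processed even though only two elements survive (the package named X and the class named C, as in Listing 1), which is why replay cost grows with the length of the editing history rather than with the size of the final model.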
In the proposed hybrid approach, if the state-based counterpart already exists, the in-memory object graph is populated from it; otherwise, it is populated by replaying the complete editing history recorded in the change-based representation.

Changing a hybrid model. When an element in a loaded model is created, modified or deleted, the change is applied to the in-memory object graph and is also recorded in an in-memory list of changes (Editing session changes in Fig. 2). We use the term editing session for the period between loading a model and saving it back to disk.

Saving a hybrid model. The current version of the in-memory object graph is stored in the preferred state-based representation. The list of changes recorded in the current editing session (optionally processed to drop events that have no impact on the final state, as described in Section 2) is appended to the change-based representation.

Versioning a hybrid model. Since the state-based representation is fully derived from the change-based representation, if a model needs to be versioned (e.g. in a Git repository), only the change-based representation needs to be stored. The first time the model is loaded after being checked out or cloned, the state-based representation is computed and persisted locally, and it is used in subsequent model loading steps.

Comparing hybrid models. To compare two hybrid models³, their change-based representations are used, which is much more efficient than state-based comparison.

² Depending on the state persistence mechanism, the object graph may be loaded in its entirety at startup (e.g. XMI) or loaded progressively, in a lazy manner (e.g. NeoEMF/CDO).
³ The work on hybrid model comparison is still at a preliminary stage and is out of the scope of this paper.

4 Implementation

We have implemented the proposed hybrid model persistence approach in a prototype⁴ on top of the Eclipse Modeling Framework (EMF) [3].
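Before turning to the EMF-specific components, the load, change, save and versioning steps of Section 3 can be summarised in a small, framework-independent Python sketch. This is only an illustration under simplifying assumptions and not the prototype's code: the `HybridModel` class, the JSON-lines change log, the JSON snapshot and the file names `model.cbp` and `model.state.json` are all hypothetical.

```python
import json
import os
import tempfile


class HybridModel:
    """Toy hybrid store: an append-only change log plus a derived state snapshot."""

    def __init__(self, log_path, state_path):
        self.log_path = log_path      # change-based representation (the one that is versioned)
        self.state_path = state_path  # state-based representation (local, derivable)
        self.state = {}               # in-memory "object graph": element id -> features
        self.session = []             # changes recorded since the last load or save
        self.loaded_from = None

    def _apply(self, ev):
        if ev["op"] == "create":
            self.state[ev["id"]] = {}
        elif ev["op"] == "set":
            self.state[ev["id"]][ev["feature"]] = ev["value"]
        elif ev["op"] == "delete":
            del self.state[ev["id"]]
        else:
            raise ValueError(f"unknown event: {ev!r}")

    def _write_state(self):
        with open(self.state_path, "w") as f:
            json.dump(self.state, f)

    def load(self):
        if os.path.exists(self.state_path):
            # Fast path: populate the object graph from the state-based counterpart.
            with open(self.state_path) as f:
                self.state = json.load(f)
            self.loaded_from = "state"
        else:
            # Slow path: replay the complete editing history, then persist the
            # derived snapshot locally so that later loads can skip the replay.
            self.state = {}
            if os.path.exists(self.log_path):
                with open(self.log_path) as f:
                    for line in f:
                        self._apply(json.loads(line))
            self._write_state()
            self.loaded_from = "replay"
        self.session = []

    def change(self, op, eid, feature=None, value=None):
        ev = {"op": op, "id": eid, "feature": feature, "value": value}
        self._apply(ev)          # update the in-memory object graph
        self.session.append(ev)  # and record the change for this editing session

    def save(self):
        self._write_state()                  # store the current state
        with open(self.log_path, "a") as f:  # append only this session's changes
            for ev in self.session:
                f.write(json.dumps(ev) + "\n")
        self.session = []


workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "model.cbp")
snapshot = os.path.join(workdir, "model.state.json")

m = HybridModel(log, snapshot)
m.load()
m.change("create", "c2")
m.change("set", "c2", "name", "C")
m.save()

os.remove(snapshot)  # simulate a fresh clone: only the change log is versioned
fresh = HybridModel(log, snapshot)
fresh.load()         # first load after cloning replays the log
again = HybridModel(log, snapshot)
again.load()         # subsequent loads use the derived snapshot
```

In this sketch the snapshot is disposable: deleting it only costs one replay on the next load, which mirrors the idea that the change-based representation is the main one and the state-based representation is derived from it.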
The prototype makes use of an existing implementation of change-based model persistence, the Epsilon CBP [1], augmented with two state-based persistence implementations: NeoEMF [13] and XMI [14]. XMI has been selected as a standard state-based model persistence format (natively supported by EMF), and NeoEMF as a best-of-breed representative of database-backed state-based model persistence frameworks. The core components of the prototype are presented in Fig. 3.

⁴ The prototype is available under https://github.com/epsilonlabs/emf-cbp.

Fig. 3: Class diagram showing the core components of the hybrid model persistence implementation.

The Epsilon CBP provides a ChangeEventAdapter class [1] that extends Ecore's EContentAdapter class. This class collects changes made to the in-memory object graph of an EMF model in the form of a list of events, changeEvents. Based on this class, we derived an adapter class, HybridChangeEventAdapter, for the hybrid model persistence implementation. It is an abstract class so that it can be further specialised into adapter classes for different types of state-based persistence. HybridNeoEMFChangeEventAdapter is the adapter class for NeoEMF, and HybridXMIChangeEventAdapter is the adapter class for XMI. These classes override notifyChanged(Notification) in the ChangeEventAdapter class to handle events that are specific to NeoEMF and XMI, respectively.

We also created a resource class for hybrid persistence, HybridResource, derived from Ecore's ResourceImpl (a resource class is a class dedicated to interacting with a persistence mechanism, e.g. save, load, get contents). The class is again abstract so that it can be realised in different resource implementation classes for different state-based persistence mechanisms. The HybridResource class contains the stateBasedResource field, which refers to the state-based persistence that is being used, and the cbpOutputStream field that refers to an OutputStream (e.g.
file, in-memory) as the representation of the CBP for saving changes. HybridResource has an association with HybridChangeEventAdapter, so that the former can access the events collected by the latter, and the latter can use facilities provided by the former (e.g. getting the identity of an element in the resource, or saving changes to a change-based model representation). The resource implementation classes for NeoEMF and XMI are HybridNeoEMFResourceImpl and HybridXMIResourceImpl, respectively. HybridNeoEMFResourceImpl also implements NeoEMF's PersistenceResource interface so that NeoEMF-specific methods can be used (e.g. close(), to close a connection with a backend database).

5 Evaluation

In this section, we compare hybrid model persistence (Epsilon CBP with each of NeoEMF and XMI) against state-based persistence (NeoEMF or XMI only) on storage space usage, loading and saving time, and memory footprint, and demonstrate that hybrid model persistence can still perform fast model loading and saving. The evaluation was performed on a machine with an Intel® Core™ i7-6500U CPU @ 2.50GHz 2.59GHz, 12GB RAM, and the Java™ SE Runtime Environment (build 1.8.0_162-b12).

For the evaluation, we used models reverse-engineered from the Java source code of the Epsilon [15,16] and BPMN2 [17] projects. For the state-based representation of the models, we used the MoDisco tool [18] to generate XMI-based UML2 [19] models that reflect the classes, fields, and operation signatures of the source code of each project, and then imported the generated models into NeoEMF. We also derived MoDisco XML models [20] from the Wikipedia article on the United States [21]. We then used reverse-engineering to generate a CBP for each project based on the differences between consecutive versions of the models.

Table 2: Space usage for the Epsilon and BPMN2 projects, and Wikipedia's United States article.
    Case       Generated from    Type    Element count  Event count  Space size  Average space size
    Epsilon    940 commits       XMI     88,020         -            9.44 MB     112 bytes/element
    Epsilon    940 commits       NeoEMF  88,020         -            188 MB      2 KB/element
    Epsilon    940 commits       CBP     -              4.3 m        406 MB      98 bytes/event
    BPMN2      192 commits       XMI     62,062         -            6.55 MB     110 bytes/element
    BPMN2      192 commits       NeoEMF  62,062         -            134 MB      2 KB/element
    BPMN2      192 commits       CBP     -              1.2 m        109 MB      92 bytes/event
    Wikipedia  10,187 versions   XMI     13,112         -            1.28 MB     102 bytes/element
    Wikipedia  10,187 versions   NeoEMF  13,112         -            31.8 MB     2 KB/element
    Wikipedia  10,187 versions   CBP     -              62.3 m       5.85 GB     98 bytes/event

    m = million events, MB = Megabytes, KB = Kilobytes

5.1 Storage Space Usage

For the Epsilon project, we have successfully generated a CBP from version 1 up to version 940, and also CBPs for the BPMN2 project and the Wikipedia article up to versions 192 and 10,187, respectively. The details (element count, event count, space size, and average space size per element or event) of their models, when persisted in XMI, NeoEMF, and CBP, are shown in Table 2. The last column of the table derives an average space usage per element (for the SBPs) or per event (for the CBP). We can estimate the storage space usage of a hybrid model persistence as the sum of the CBP space usage and the space usage of the chosen SBP.

Table 3: Comparison of time and memory footprint for loading and saving models with the hybrid and state-based-only persistence.
                                                    Hybrid                State-based           Significance
    Dimension                 Case       Backend   mean       sd         mean       sd         W     p-value
    Loading Time              Epsilon    NeoEMF    0.292      0.061      0.279      0.023      258   0.72
                              Epsilon    XMI       0.317      0.006      0.270      0.018      26    < 0.05
                              BPMN2      NeoEMF    0.308      0.071      0.286      0.025      230   0.79
                              BPMN2      XMI       0.212      0.016      0.179      0.016      37    < 0.05
                              Wikipedia  NeoEMF    0.262      0.048      0.273      0.062      250   0.86
                              Wikipedia  XMI       0.045      0.001      0.040      0.001      0     < 0.05
    Saving Time               Epsilon    NeoEMF    0.0892     0.0421     0.0829     0.0494     216   0.55
                              Epsilon    XMI       0.411      0.023      0.397      0.015      78    < 0.05
                              BPMN2      NeoEMF    0.0777     0.0424     0.0775     0.0452     213   0.51
                              BPMN2      XMI       0.33       0.007      0.28       0.008      0     < 0.05
                              Wikipedia  NeoEMF    0.135      0.048      0.120      0.024      218   0.59
                              Wikipedia  XMI       0.024      0.048      0.020      0.002      42    < 0.05
    Loading Memory Footprint  Epsilon    NeoEMF    38.601     0.878      10.014     1.088      0     < 0.05
                              Epsilon    XMI       10.72018   0.00022    10.72009   0.00024    0     < 0.05
                              BPMN2      NeoEMF    40.78      1.29       27.20      1.05       0     < 0.05
                              BPMN2      XMI       6.73367    1.29305    6.73367    0.00056    101   < 0.05
                              Wikipedia  NeoEMF    35.91      1.03       27.25      0.54
                              Wikipedia  XMI       8.4079     0.0008     8.0933     0.0009     0     < 0.05
    Saving Memory Footprint   Epsilon    NeoEMF    2.64       1.29       2.61       0.78       283   0.34
                              Epsilon    XMI       1.56355    0.0005     1.56326    0.0018     408   < 0.05
                              BPMN2      NeoEMF    1.86       3.86       1.52       0.77       308   0.12
                              BPMN2      XMI       0.8378     0.00361    0.8375     0.00362    58    < 0.05
                              Wikipedia  NeoEMF    1.32       1.51       0.97       0.76       189   0.22
                              Wikipedia  XMI       0.0010     0.00044    0.0005     0.00001    0     < 0.05

    The time is in seconds, and the memory footprint is in MBs.

5.2 Time and Memory Footprint of Loading and Saving Models

We evaluated the performance of our hybrid persistence prototype against XMI and NeoEMF regarding time and memory footprint for loading and saving. We repeated our experiments 22 times for each dimension measured. Since the data were not normally distributed, we used the nonparametric Mann-Whitney U test [22] with a significance level of 5%. As can be seen in Table 3, all cases experience a slight slowdown on loading and saving time (hybrid approach's mean > state-based approach's mean).
However, for almost all NeoEMF cases, the slowdown is not statistically significant, which means that the side effect of the hybrid approach on loading and saving time is still acceptable. The hybrid approach also has a larger memory footprint than the state-based-only approach. Nevertheless, considering the cost of main memory, this is acceptable in almost all real-world scenarios.

6 Discussion

The use of state-based persistence in hybrid model persistence enables faster model loading, as shown by the loading time results in Section 5.2, without having to replay all the changes persisted in its CBP, which is the main challenge for the change-based approach [10,9]. Hybrid model persistence loads slightly slower than a state-based model (statistically significant for Hybrid XMI but insignificant for Hybrid NeoEMF). A slight slowdown also appears on model saving (again statistically significant for Hybrid XMI but insignificant for Hybrid NeoEMF, Section 5.2). The slowdown occurs because changes have to be persisted into two representations, state-based and change-based.

The main drawback of hybrid model persistence is that it consumes more memory when loading and saving, and more storage space for persisting models, than a state-based representation alone (Sections 5.2 and 5.1). However, considering the cost of main memory and storage, the trade-off can be acceptable in most real-world scenarios.

7 Related Work

There are several non-XMI approaches to state-based model persistence, using relational or NoSQL databases. For example, EMF Teneo [23] persists EMF models in relational databases, while Morsa [24] and NeoEMF [13] persist models in document and graph databases, respectively. None of these approaches provides built-in support for versioning, and models are eventually stored in binary files or folders, which are known to be a poor fit for text-oriented version control systems like Git and SVN.
Connected Data Objects (CDO) [25], which provides support for database-backed model persistence, also provides collaboration facilities, but CDO adoption necessitates the use of a separate version control system (e.g. a Git repository for code and a CDO repository for models), which introduces fragmentation and administration challenges [26]. Similar challenges arise in relation to other model-specific version control systems such as EMFStore [7].

8 Conclusions and Future Work

In this paper, we have proposed a hybrid model persistence approach and evaluated its impact on time and memory footprint for model loading and saving, and on storage space usage. Based on the evaluation results, hybrid model persistence provides benefits on model loading time with an acceptable trade-off on memory footprint and storage space usage.

We are still working on hybrid model comparison (see Comparing hybrid models in Section 3), and progress so far is promising. Based on our preliminary investigation, it can detect atomic changes of models faster than state-based model comparison, e.g. detecting elements that have been removed from older versions. In the future, we plan to evaluate hybrid model persistence on even larger models and perform experiments where software modellers are asked to construct change-based models. We also plan to develop a solution for the efficient merging of change-based and hybrid models.

Acknowledgements. This work was partly supported through a scholarship managed by Lembaga Pengelola Dana Pendidikan Indonesia (Indonesia Endowment Fund for Education).

References

1. Yohannis, A., Kolovos, D.S., Polack, F.: Turning models inside out.
In: Proceedings of MODELS 2017 Satellite Event: Workshops (ModComp, ME, EXE, COMMitMDE, MRT, MULTI, GEMOC, MoDeVVa, MDETools, FlexMDE, MDEbug), Posters, Doctoral Symposium, Educator Symposium, ACM Student Research Competition, and Tools and Demonstrations co-located with ACM/IEEE 20th International Conference on Model Driven Engineering Languages and Systems (MODELS 2017), Austin, TX, USA, September 17, 2017. (2017) 430–434
2. OMG: Metaobject Facility. http://www.omg.org/mof Accessed: 2018-02-21.
3. Steinberg, D., Budinsky, F., Merks, E., Paternostro, M.: EMF: Eclipse Modeling Framework. Eclipse Series. Pearson Education (2008)
4. Lippe, E., van Oosterom, N.: Operation-based merging. In: SDE 5: 5th ACM SIGSOFT Symposium on Software Development Environments, Washington, DC, USA, December 9-11, 1992. (1992) 78–87
5. Ignat, C., Norrie, M.C.: Operation-based merging of hierarchical documents. In: The 17th Conference on Advanced Information Systems Engineering (CAiSE '05), Porto, Portugal, 13-17 June, 2005, CAiSE Forum, Short Paper Proceedings. (2005)
6. Koegel, M., Herrmannsdoerfer, M., Li, Y., Helming, J., David, J.: Comparing state- and operation-based change tracking on models. In: Proceedings of the 14th IEEE International Enterprise Distributed Object Computing Conference, EDOC 2010, Vitória, Brazil, 25-29 October 2010. (2010) 163–172
7. Koegel, M., Helming, J.: EMFStore: a model repository for EMF models. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, ICSE 2010, Cape Town, South Africa, 1-8 May 2010. (2010) 307–308
8. Robbes, R., Lanza, M.: A change-based approach to software evolution. Electr. Notes Theor. Comput. Sci. 166 (2007) 93–109
9. Mens, T.: A state-of-the-art survey on software merging. IEEE Trans. Software Eng. 28(5) (2002) 449–462
10. Yohannis, A., Rodriguez, H.H., Polack, F., Kolovos, D.: Towards efficient loading of change-based models.
In: Modelling Foundations and Applications - 14th European Conference, ECMFA 2018, Held as Part of STAF 2018, Toulouse, France, June 25-29, 2018. Proceedings. (2018 (to be presented http://eventmall.info/ecmfa2018/program)) Accessed: 2018-04-19.
11. Kolovos, D.S., Di Ruscio, D., Pierantonio, A., Paige, R.F.: Different models for model matching: An analysis of approaches to support model differencing. In: Proceedings of the 2009 ICSE Workshop on Comparison and Versioning of Software Models. CVSM '09, Washington, DC, USA, IEEE Computer Society (2009) 1–6
12. Ogunyomi, B., Rose, L.M., Kolovos, D.S.: Property access traces for source incremental model-to-text transformation. In: Modelling Foundations and Applications - 11th European Conference, ECMFA 2015, Held as Part of STAF 2015, L'Aquila, Italy, July 20-24, 2015. Proceedings. (2015) 187–202
13. Daniel, G., Sunyé, G., Benelallam, A., Tisi, M., Vernageau, Y., Gómez, A., Cabot, J.: NeoEMF: A multi-database model persistence framework for very large models. Science of Computer Programming 149 (2017) 9–14. Special Issue on MODELS'16.
14. OMG: About the XML Metadata Interchange Specification Version 2.5.1. http://www.omg.org/spec/XMI Accessed: 2018-02-21.
15. Eclipse: Epsilon. https://www.eclipse.org/epsilon/ Accessed: 2018-02-12.
16. Eclipse: Epsilon Git. http://git.eclipse.org/c/epsilon/org.eclipse.epsilon.git/commit/?id=ebd0991c279a1f0df1acb529367d2ace5254fe87 Accessed: 2018-02-19.
17. Eclipse: MDT/BPMN2. http://wiki.eclipse.org/MDT/BPMN2 Accessed: 2018-01-15.
18. Brunelière, H., Cabot, J., Dupé, G., Madiot, F.: MoDisco: A model driven reverse engineering framework. Information & Software Technology 56(8) (2014) 1012–1032
19. Eclipse: MDT/UML2. http://wiki.eclipse.org/MDT/UML2 Accessed: 2018-01-15.
20. Eclipse: XML Metamodel. http://help.eclipse.org/neon/index.jsp?topic=%2Forg.eclipse.modisco.xml.doc%2Fmediawiki%2Fxml_metamodel%2Fuser.html Accessed: 2018-02-19.
21. Wikipedia: United States.
https://en.wikipedia.org/w/index.php?title=United_States&oldid=45118452 Accessed: 2018-02-19.
22. McKnight, P.E., Najab, J. In: Mann-Whitney U Test. American Cancer Society (2010) 1–1
23. Eclipse: Teneo. http://wiki.eclipse.org/Teneo Accessed: 2017-10-15.
24. Espinazo-Pagán, J., Cuadrado, J.S., Molina, J.G.: Morsa: A scalable approach for persisting and accessing large models. In: Model Driven Engineering Languages and Systems, 14th International Conference, MODELS 2011, Wellington, New Zealand, October 16-21, 2011. Proceedings. (2011) 77–92
25. Eclipse: CDO The Model Repository. https://www.eclipse.org/cdo/ Accessed: 2017-10-15.
26. Barmpis, K., Kolovos, D.S.: Evaluation of contemporary graph databases for efficient persistence of large-scale models. Journal of Object Technology 13(3) (2014) 3:1–26