
Managing service level agreement contracts in OGSA-based Grids

2008, Future Generation Computer Systems

Future Generation Computer Systems 24 (2008) 245–258
www.elsevier.com/locate/fgcs

Antonios Litke ∗, Kleopatra Konstanteli, Vassiliki Andronikou, Sotirios Chatzis, Theodora Varvarigou

Department of Electrical and Computer Engineering, National Technical University of Athens, 9, Heroon Polytechniou Str, 15773 Athens, Greece

Received 7 July 2006; received in revised form 11 June 2007; accepted 12 June 2007
Available online 30 June 2007

Abstract

Grids and mobile Grids can form the basis and the enabling technology for pervasive and utility computing due to their ability to be open, highly heterogeneous and scalable. However, the process of selecting the appropriate resources and initiating the execution of a job is not enough to provide quality in a dynamic environment such as a mobile Grid, where changes are numerous, highly variable and with unpredictable effects. In this paper we present a scheme for advancing and managing Quality of Service (QoS) attributes contained in Service Level Agreement (SLA) contracts of Grids that follow the Open Grid Services Architecture (OGSA). In order to achieve this, the execution environment of the Grid infrastructure establishes and exploits the synergies between the various modules of the architecture that participate in the management of the execution and the enforcement of the SLA contractual terms. We introduce an Execution Management Service which collaborates with both the application services and the network services in order to provide an adjustable quality of the requested services. The components that manage and control the execution in the Grid environment interact with the suite of SLA-related services, exchanging information that is used to provide the quality framework of the execution with respect to the agreed contractual terms. The described scheme has been implemented in the framework of the Akogrimo IST project.

© 2007 Elsevier B.V.
All rights reserved.

Keywords: Open Grid services architecture; Service level agreement; Execution management services

∗ Corresponding author. Tel.: +30 210 7722558.
0167-739X/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2007.06.004

1. Introduction

The Grid can be viewed as a distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources (computing systems, storage systems, instruments and other real-time data sources, human collaborators, communication systems) and provides common interfaces for all these resources, using standard, open, general-purpose protocols and interfaces [1]. However, it is also the basis and the enabling technology for pervasive and utility computing due to its ability to be open, highly heterogeneous and scalable. The coordinated execution of computationally intensive tasks and some cases of real-time distributed applications are among the paradigms that are of particular interest for deployment on Grid and mobile Grid infrastructures. In particular, if we consider that Grids can be used as dynamic information systems to enable business models for utility and pervasive computing, we can identify the significance of the design and creation of novel schemes and architectures for the efficient management of such systems. Such applications are candidates to migrate to Grid and mobile Grid solutions. In order to do this, the Grids should be equipped with the relevant mechanisms to achieve efficiency.
The specific architectures that will be designed for this purpose should include, as basic building blocks, mechanisms that guarantee the Quality of Service (QoS) attributes which are mandatory for the commercial exploitation of these applications through the establishment of Service Level Agreement (SLA) contracts.

The Grid, and especially the mobile Grid [2], is a dynamic system where environmental conditions are subject to unpredictable changes: system or network failures, system performance degradation, removal of machines, variations in the cost of resources, etc. These changes have a direct impact on various functionalities that should be managed through the involved execution management components in order to maintain continuous conformance to the contractual terms of SLAs. Selecting the appropriate resources and initiating the execution of the job requested by the client might be enough when we are dealing with a static environment where changes are highly unlikely to take place. In a dynamic environment such as a mobile Grid, however, where changes are numerous, highly variable and with unpredictable effects, it is not enough. It is critical that the Execution Management System (EMS) monitors and manages the execution of the job until its completion, so that it can detect unexpected failures as soon as they take place and take actions dynamically to rectify them in such a way as to meet the terms defined in the related SLA. For example, if there is an unexpected failure during the execution of a job or if the number of violations produced exceeds a certain limit, the EMS must be immediately informed and then, depending on the type of failure, decide whether to restart the execution at the same location or at a different available one.
The clients must be assured not only that the selected resources will live up to their expectations but also that the EMS will make provisions against erroneous conditions that may occur during the execution of a job.

In this paper we present the architecture of an EMS for advancing and managing QoS attributes of Open Grid Services Architecture (OGSA)-based [3,4] Grids. The EMS was developed within the framework of the Akogrimo IST project (“Access to KnOwledge through the GRid in a MObile world”) [5]. In order to address the various tasks mentioned above, the EMS establishes and exploits the synergies between the various modules of the architecture that participate in the management of the execution and the enforcement of the SLA contractual terms. The approach that is followed is based on a layered structure of a Grid infrastructure which positions the Grid services middleware in collaboration with both the application services and the network services in order to provide an adjustable QoS to the requests of the clients. The EMS interacts with the suite of SLA-related services, exchanging information that is used by the services to provide the quality framework of the execution with respect to the agreed contractual terms.

The rest of the paper is structured as follows: Section 2 surveys related work, with emphasis on similar approaches found in the literature and in other projects. Sections 3 and 4 provide a discussion on the OGSA-based requirements in general, the applied execution environment and the need for SLA in such Grids. Section 5 gives an architectural overview of the EMS and the overall layered approach that is followed and which is defined in the framework of the Akogrimo project, while Section 6 details the enforcement of the SLA contractual terms as well as the monitoring services.
The study proceeds with a detailed description of the EMS implementation with all the involved phases of operation in Section 7, and a description of the overall EMS activity in Section 8. Finally, Section 9 provides some indicative results of the system operation while Section 10 concludes the paper with a discussion on the outcome and future work.

2. Related work

There are several approaches in the design and development of execution management services that aim to handle SLAs in Grid environments. Below we briefly present the most relevant ones.

The G-QoSM framework [6] is an approach for SLA-based management in service oriented Grids. This approach is mainly aimed at service discovery, negotiation, reservation and monitoring based on QoS. QoS properties are associated with the Grid services through their interface definition, by extending UDDI to incorporate them in WSDL-based documents. In contrast to the G-QoSM framework, SLA management in our work does not require any extension of the WSDL; it is Web Services Resource Framework (WSRF)-compliant and uses GT4’s MDS instead of a static registry such as UDDI. Furthermore, service execution and management are not really supported within the G-QoSM framework. Instead, it relies on the existence of systems such as Globus to manage and control the execution of a service. In contrast, our work builds heavily on high-level services that are included in its middleware to guarantee QoS enforcement, not only during the discovery and negotiation phases, but also during the actual execution of the service.

The Web Service Level Agreement (WSLA) framework [7] is an IBM project that offers SLA management for Web services in the phases of negotiation and monitoring. It introduces an XML-based language for SLA definition along with a runtime environment that enables third parties to perform the monitoring of the services in an effort to achieve greater objectivity.
However, this framework does not support advance reservation and, although it claims to be adjustable to any environment, it is not clear whether it can support WSRF-compliant Grid services. Furthermore, although the WSLA framework is able to monitor the execution of the services and detect SLA violations, it only reports the violations and is not capable of taking any corrective actions. Execution management in our work is not limited to reporting the violations to the accounting system but is also able to reallocate resources on the fly, depending on the type and severity of the violation.

The Globus Architecture for Reservation and Allocation (GARA) [8] addresses the problem of achieving end-to-end QoS enforcement in Grid environments by introducing an architecture for discovery, reservation and allocation of resources that meet application QoS requirements. The execution process is tightly connected to the Globus Resource Allocation Manager (GRAM), which is able to handle the execution of executable files only; therefore the execution of Grid services is not supported. Although this architecture allows reservation and QoS monitoring of allocated resources, it does not support negotiation, and the discovery process is not detailed.

The Service Quality across Independently Managed Networks (SEQUIN) framework [9] introduces an approach for end-to-end QoS provisioning across multiple management domains and exploits a combination of link layer technologies. However, this work is mainly focused on the network layer and, although provisions are made, there is no effective monitoring system.

Designed to bring the Grid closer to the industrial reality, mainly with respect to security and business to business (B2B) services, the European project GRIA [10] designed and developed the GRIA middleware based on Web Services.
The resource allocation service in the existing GRIA middleware supports the confirmation of a service offer through the establishment of service level agreements as well as the extension of existing SLAs.

With a vision towards the industrial and business use of the Grid, the European project Grid-based Application Service Provision (GRASP) [11] focused on the study, design, development, implementation and validation of an advanced infrastructure. Aiming to incorporate business functionality into the GRASP framework, an SLA management subsystem has been developed which includes the service provision negotiation based on QoS criteria as well as the monitoring of the feasibility of the contract concerning each Grid service.

The existing Unicore Grid middleware [12] is the result of the design and development efforts of two German projects, Uniform Interface to Computing Resources (UNICORE) and UNICORE Plus. As a successor of the European Grid Interoperability Project (GRIP), a project that worked on the realization of the interoperability of Globus and Unicore, UniGrids focuses on extending Unicore towards a Grid services infrastructure in compliance with the OGSA. As service-oriented architectures and web services are being adopted and imported in Grid reality, followed by the development of a Web Services Agreement specification by the Global Grid Forum (GGF), UniGrids aims to develop an SLA framework and cross-Grid brokering services in order to support Grid economics [13] as well as to integrate a Web Services Agreement-based resource management framework into the Unicore Grid middleware [14].

3. Execution environment in OGSA-based Grids

The OGSA recommended by the GGF (recently renamed the Open Grid Forum, OGF) [15] addresses the need for standardization by defining, within a Service Oriented Architecture (SOA), a set of core capabilities and behaviors that address key concerns in Grid systems.
These concerns include authentication and authorization issues for access to the Grid resources, policy and negotiation issues, service discovery and lifecycle management, SLA, execution management, monitoring and management of collections of services, etc.

The Execution Management Services of OGSA (OGSA-EMS) are concerned with the problems of instantiating and managing tasks. They address issues such as: at which locations can a specific task be executed (considering various restrictions such as CPU type and availability, license, etc.), where is it efficient to execute the task, and finally how to prepare, manage and monitor the execution itself.

Fig. 1. The EMS components and their interactions.

Typically, an OGSA-EMS consists of the following modules:

Job manager: The Job Manager (JM) encapsulates all aspects of executing a job, or a set of jobs, from start to finish, and it may schedule them to resources and collect agreements and reservations.

Candidate set generator (CSG): The CSG is in charge of finding the computational resources where the job can be executed by applying a suitable match-making mechanism. In order to do this, the CSG takes into account all the static requirements but not those that are dynamic. Static resource attributes include the operating system, number of processors, available software (libraries, binaries), bandwidth, etc. Dynamic resource attributes include free disk space, available CPU, file transfer rate, etc.

Execution planning services (EPS): this is a high level scheduler that creates mappings called “schedules” between jobs and the resources that have been selected by the CSG. An EPS component will typically attempt to optimize some objective function such as execution time, cost, reliability, etc. (i.e. answering “where should the job execute?”), in order to improve system performance, provide QoS and meet the SLA terms.
The EPS should start from the match-making results provided by the CSG, combine them with the dynamic attributes, and run an algorithm to find the best resources for execution.

Advanced reservation services (ARS): the ARS is in charge of reserving resources for a specific period of time or permanently, depending on the type of reservation. Different types of reservation are possible: computational resource reservation, storage resource reservation, network resource reservation, service reservation.

Fig. 1 shows the basic modules of an OGSA-based EMS system and their interactions. The EMS described in this paper encapsulates much of the logic and the objectives of the aforementioned services. It is able to discover a candidate set of business services that meet the clients’ needs and restrictions, as expressed in their SLAs and, once the execution of the business service has started, it is able to manage and monitor it until completion, in order to achieve continuous conformance to the contractual terms of SLAs. Furthermore, two different types of advance reservation are supported: (1) service resource reservation and (2) network resource reservation. Its functionality is described in detail in the sections that follow.

Fig. 2. The SLA in the OGSA architecture.

4. OGSA components and SLA

In this study the basic requirement is the execution management of services based on concrete SLA terms as they are described in the contracts that are established between the service providers and the users. During execution of the requested service the QoS requirements expressed in the SLA contract must be fulfilled through the management of the execution and the associated Grid resources (application service, computational resources, storage facilities, policies, etc.).
In order to achieve this, it is necessary to provide mechanisms for monitoring the execution of the service and the QoS parameters. Moreover, the estimation of the resources’ capabilities and their utilization in specific time windows are factors that influence decisions for any planning and adjusting of the resources’ usage. In cases where a renegotiation of the agreement is required (so that new service requirements are addressed) the existing agreement must be updated and the respective resource requirements must be recalculated.

Fig. 2 presents the conceptual levels of an OGSA architecture in which the Execution Management services and the SLA Enforcement services are located, together with other services, in the foreground of the OGSA capabilities and at the level of OGSA specific functional services. Each group of services in this level is responsible for the management of a set of resources (both logical and physical) positioned at the Resource level through the Infrastructure services level. The latter forms the basis and framework for both manageability and management of resources in such an OGSA environment (for instance, using the specifications of the WSRF [16,17]). The circles in this figure represent the interfaces that each group of services exposes in order to communicate with the other groups of services and coordinate their tasks.

5. EMS architecture and design

Instead of designing a complicated service responsible for all tasks involved in the execution management process, the EMS is a set of sub-services, each of them addressing a specific task. Although from the client’s perspective it appears to be a single Grid service, in actuality the EMS is a composition of the Grid services listed below:

• Core service, which acts as a gateway service, offering a single point of access to the EMS whilst “hiding” its details and complexity from the clients.
It is therefore responsible for interpreting clients’ requests, delegating them to the appropriate EMS sub-services, orchestrating the interactions between the latter and returning the results to the clients.
• Advertisement service, which is used by the Service Providers (SPs) to advertise their services in the index service used by the EMS in the discovery process. The SPs run an EMS client application on their local machines, providing all information necessary to form the advertisement of the service.
• Negotiation service, which is responsible for interpreting requests for the negotiation of the terms of the client’s SLA contracts. The Negotiation service contains a resource factory¹ that creates Negotiation Resources (NRs). The NRs are persistent resources used to store information related to the negotiations. After a successful negotiation, the Endpoint Reference (EPR)² to the NR that was created is returned to the Core service, which in turn passes it to the client.
• Reservation service, which is in charge of performing advance reservation on the resources needed for the execution as well as for its monitoring. Upon successful reservation, a persistent Reservation Resource (RR) is created and all information related to the reservation is stored inside it. The EPR to the RR is returned to the Core service and through it to the client.
• Discovery service, which is responsible for finding available services, registered in the underlying Grid, that meet the QoS parameters defined in the SLA. It is designed to follow rules based on low-level performance parameters related to the execution of the service, such as CPU, memory, network bandwidth and disk capacity, as well as the availability of the service and the price that the client is willing to pay.
• Execution service, which is in charge of the coordination of the SLA Enforcement group of services and the actual execution of the reserved business resources, as well as being responsible for taking corrective actions when needed. At the end of the execution of the business service, the result is returned to the Core service. The Core service filters the result and sends it to the client.

¹ A mechanism encapsulated inside the Negotiation service able to create resource instances of the service. For more information refer to:
² The EPR is a pair of a Uniform Resource Identifier (URI) and a string key, which differentiates an instance of a resource from all other instances of the same resource.

Fig. 3. EMS services and their interactions.

The EMS was implemented using the Globus Toolkit version 4 (GT4) [18] and runs under the standalone container that the toolkit offers. GT4 is an open source Grid middleware that provides the necessary functionality required to build and deploy fully operational Grid services. It includes software for security, information infrastructure, resource management, data management, communication, fault detection and portability. It supports the core specifications that define the Web services architecture such as XML, SOAP and WSDL. It also supports and implements the WS-Security and other specifications relating to security, as well as the WSRF, WS-Addressing and WS-BaseNotification specifications used to define, name and interact with stateful resources. An overview of the EMS services and their interactions is depicted in Fig. 3.

GT4 includes a core set of high-level services and tools that address different tasks one can face when involved with Grid computing. Given the nature of the EMS, two of these high-level services were integrated into it:

1. WS-GRAM [19], the core of the GT execution management, providing services to submit jobs and monitor their execution on a GT4-based Grid. “Jobs” within the WS-GRAM framework are considered to be binary executable files.
However, not all applications are good candidates for exposure as a Web service. The WS-GRAM concept and the Web service concept are two mutually exclusive approaches. WS-GRAM is meant to be used in situations where the job requested by the client is not available as a Web service but in the general form of a binary executable file. Through WS-GRAM, the EMS is able to address this need for management of applications in the form of executable files, while at the same time it is able to manage the execution of application-specific Web services. From the client’s perspective, there is no difference in the EMS behaviour, whether it handles the execution of an executable file or a Web service. The use of WS-GRAM is transparent to the client of the EMS.
2. MDS4 Index Service [20], which provides the necessary functionality for GT service discovery, was integrated inside the Advertisement and Discovery services of the EMS. More specifically, through the Advertisement service, every resource “living” in machines inside the VO is registered to its local Index service. The Index service which is deployed in the same standalone container as the EMS acts as the central VO index to which all the Index services scattered throughout the underlying Grid are registered. In this way, the EMS maintains a registry of available resources for the entire VO. The Discovery service, in turn, queries the EMS’s Index service to find a list of candidates for the execution using the criteria defined in the client’s SLA contract.

6. SLA Enforcement and Monitoring services

EMS was developed on the basis of the OGSA and WSRF specifications.
It is therefore responsible for finding execution candidate locations, selecting the most suitable execution location, and preparing, initiating and managing/monitoring the execution of a Business Service (BS).³ In order to address these tasks and at the same time ensure continuous conformance to the terms of the SLA contract, EMS works closely with, and has numerous interactions with, the SLA Enforcement and Monitoring groups of services. The services involved in these two groups were developed within the framework of the Akogrimo project by other consortium partners. More information on their functionality and their implementation can be found in [21,22].

6.1. Monitoring group of services

The Monitoring group of services consists of the following three Grid services, developed on the GT4 platform:

• Metering service, which maintains run-time information on the performance parameters related to the business service execution, defined in the SLA. In particular, it measures low-level performance parameters such as memory, CPU and disk usage.
• QoS Broker service, which is a bandwidth broker that can provide three bundles of network services, each one of them corresponding to a specific usage profile: audio, video or data. It handles QoS network requests, keeps records of information related to the client, such as the network QoS levels the client is allowed to use according to the SLA contract, and is also responsible for monitoring network parameters such as network bandwidth throughout the execution of a BS.
• Monitoring service, which is the link between the Monitoring group of services and the SLA Enforcement group of services. It receives notifications about the values of the QoS parameters related to the execution of the BSs from the Metering and QoS Broker services and notifies the SLA Enforcement group of services.
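The relay role of the Monitoring service can be illustrated with a small publish/subscribe sketch. This is not the Akogrimo implementation (which relies on GT4 and WS-BaseNotification); the class names `Measurement` and `MonitoringHub`, the source labels and the numeric values are all hypothetical and chosen only to show measurements from the Metering and QoS Broker services being forwarded to a subscriber that stands in for the SLA Enforcement group.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Measurement:
    source: str      # "metering" or "qos_broker" (labels assumed for this sketch)
    parameter: str   # e.g. "memory" or "networkBandwidth"
    value: float     # measured value; units are left unspecified here


class MonitoringHub:
    """Hypothetical stand-in for the Monitoring service: relays measurements."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Measurement], None]] = []

    def subscribe(self, callback: Callable[[Measurement], None]) -> None:
        # Register a consumer of measurements (e.g. an SLA enforcement component).
        self._subscribers.append(callback)

    def notify(self, measurement: Measurement) -> None:
        # Forward one measurement to every registered subscriber, in order.
        for callback in self._subscribers:
            callback(measurement)


# A plain list plays the role of the SLA Enforcement group of services.
received: List[Measurement] = []
hub = MonitoringHub()
hub.subscribe(received.append)
hub.notify(Measurement("metering", "memory", 0.8))
hub.notify(Measurement("qos_broker", "networkBandwidth", 95.0))
```

In this sketch the hub does no filtering or threshold checking; it only forwards values, which matches the description of the Monitoring service as a link between the two groups.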
³ The term Business Service (BS) is used throughout this paper to refer to business applications that are available in the VO in the form of Grid services or executable files.

6.2. SLA Enforcement group of services

The SLA Enforcement group is composed of a number of Grid services communicating with each other and exchanging information prior to and during the execution of a client request:

• SLA-Access service, which provides access to all existing SLA contracts for the system.
• SLA-Controller service, which is responsible for comparing measured QoS parameters against the thresholds defined in the SLA contract and communicates violations to the SLA-Decider service.
• SLA-Decider service, which is responsible for the management of the QoS violations that may occur during the business service execution. The SLA-Decider service checks the corresponding policy in order to decide on the action that should be taken.

Although EMS and the Monitoring group of services were developed on GT4, all the services in the SLA Enforcement group were developed using the WSRF.NET platform [23], another popular middleware used in the development of Grid services that implements the WSRF and WS-family of specifications. For this reason, the EMS-SLA Enforcement interaction was challenging not only in terms of design and efficiency but also in terms of interoperability, since it tested whether these two popular Grid development tools implement the WSRF and WS-related specifications in a transparent and interoperable way. During the establishment of communication between them, several inconsistencies were detected and were handled in different ways and at different levels, with some of them even requiring changes to the form of the SOAP messages that were generated by the services.

7. EMS implementation

The various tasks that the EMS covers are encapsulated into four distinct phases: (i) the Negotiation and Discovery phase, (ii) the Reservation phase, (iii) the Execution and Monitoring phase and (iv) the Advertisement phase. It should be stressed that each of the first three phases requires the preceding ones, i.e. the client must first negotiate an SLA contract, then proceed to reservation and finally request the execution of the reserved BS. On the other hand, SPs can advertise their services with QoS capabilities or update existing advertisements at any time, using the Advertisement service of EMS. All phases are presented in detail in the corresponding sections that follow, with the exception of the Advertisement phase, which does not involve any interactions with the SLA Enforcement and Monitoring services. An overview of the EMS-SLA Enforcement-Monitoring interactions is depicted in Fig. 4.

Fig. 4. Overview of EMS-SLA Enforcement-Monitoring interaction.

7.1. Negotiation and Discovery phase

During this phase EMS is in charge of negotiating SLA contracts with its clients. This involves the discovery of the resources needed for the execution that meet the client’s criteria. The phase begins with a client asking EMS to check for service and network availability by invoking the checkResourcesAvailability operation on the Core service, providing a service ID,⁴ a client ID⁵ as well as the desired values and thresholds for the QoS parameters. A typical example of the QoS parameters used is presented in Table 1. These low-level QoS parameters are the ones that can currently be monitored by the framework, and each SP has to specify their values when advertising a service. The EMS Negotiation service makes use of the QoS parameters to locate the most suitable, available service running within the VO.
More specifically, it delegates the task to the EMS Discovery service to see if there are any resources registered in the central VO-wide Index service that meet the specified QoS criteria. This is done by invoking the findSuitableResources operation exposed by the Discovery service and passing the QoS parameters. If the query returns more than one BS, then the Discovery service finds the one that is closest to the client’s requirements. In case there are BSs with identical attributes, EMS selects the cheapest one by default. After selecting a service from the list, the Discovery service checks the network availability by invoking the QoS Broker service. In case the network availability is not verified, EMS moves on to the second service on the list and this sequence is repeated until the QoS Broker is able to confirm that a candidate service is available over a network that meets the client’s criteria. After a successful negotiation, EMS creates an NR.

⁴ Each service registered in the EMS Index service bears an ID, specified by the SP during the advertisement process.
⁵ This is a unique identifier used by the EMS system for authentication/authorization and accounting purposes.

Table 1
Example of the low level QoS parameters

    <QoSParams>
      <QoSItems>
        <QoSItem>
          <param>cpuSpeed</param>
          <paramValue>3.5</paramValue>
          <paramType>MHz</paramType>
          <threshold>40%</threshold>
        </QoSItem>
        <QoSItem>
          <param>diskSpace</param>
          <paramValue>1</paramValue>
          <paramType>GB</paramType>
          <threshold>20%</threshold>
        </QoSItem>
        <QoSItem>
          <param>memory</param>
          <paramValue>1</paramValue>
          <paramType>GB</paramType>
          <threshold>50%</threshold>
        </QoSItem>
        <QoSItem>
          <param>networkBandwidth</param>
          <paramValue>GOLD</paramValue>
          <paramType></paramType>
          <threshold></threshold>
        </QoSItem>
      </QoSItems>
    </QoSParams>

Fig. 5. Sequence diagram of the Discovery and Reservation phase.
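The selection procedure of this phase (rank the candidates by closeness to the request, prefer the cheapest among equally close ones, then walk down the list until the network check succeeds) can be sketched as follows. The paper does not define how "closest" is measured, so the `distance` function below is an assumption (sum of relative deviations over numeric parameters, with positive requested values). `Candidate`, `select_service` and the `network_ok` callback, which stands in for the QoS Broker check, are hypothetical names, and the candidates are assumed to have already passed the Index service query.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    service_id: str
    qos: Dict[str, float]   # advertised numeric QoS values, e.g. {"memory": 1.0}
    price: float


def distance(requested: Dict[str, float], offered: Dict[str, float]) -> float:
    """Assumed closeness metric: sum of relative deviations per parameter.

    Requested values must be positive; every requested parameter must be offered.
    """
    return sum(abs(offered[name] - value) / value for name, value in requested.items())


def select_service(
    requested: Dict[str, float],
    candidates: List[Candidate],
    network_ok: Callable[[Candidate], bool],
) -> Optional[Candidate]:
    # Closest candidate first; equal distance is broken by the lower price.
    ranked = sorted(candidates, key=lambda c: (distance(requested, c.qos), c.price))
    for candidate in ranked:
        # Stand-in for asking the QoS Broker whether the network meets the criteria.
        if network_ok(candidate):
            return candidate
    return None  # no match: corresponds to returning a null EPR


requested = {"cpuSpeed": 3.5, "memory": 1.0}
candidates = [
    Candidate("bs-a", {"cpuSpeed": 3.5, "memory": 1.0}, price=12.0),
    Candidate("bs-b", {"cpuSpeed": 3.5, "memory": 1.0}, price=9.0),
    Candidate("bs-c", {"cpuSpeed": 4.0, "memory": 2.0}, price=5.0),
]
# The cheapest exact match (bs-b) fails the network check, so bs-a is chosen.
chosen = select_service(requested, candidates, lambda c: c.service_id != "bs-b")
```

Non-numeric parameters such as the `GOLD` network bundle of Table 1 are deliberately left out of the metric; in this sketch they would be handled by the `network_ok` callback.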
All information related to the negotiation request and the discovered resources is stored in the NR and its EPR is returned to the client. In case no match is found, EMS returns a null EPR to the client, along with a list of the available services and their affiliated QoS properties. Fig. 5 shows a high-level view of the interactions performed by the EMS during this phase.

7.2. Reservation phase

The phase begins with a client requesting advance service and network reservation on the resources that were discovered during the negotiation phase by invoking the performAdvanceReservation operation of the Core service, providing the EPR to the NR, which was obtained at the end of the negotiation phase. The Core service delegates this task to the EMS Reservation service by invoking the reserveResources operation on the latter. As a first step, the Reservation service locates the NR and retrieves all information stored inside it. EMS contacts the SLA-Access service and requests the SLA of the specific client. The SLA-Access service queries its database and returns the QoS parameters specified in the client’s SLA contract to the Reservation service. Afterwards, the Reservation service creates an instance resource of the discovered BS and manages its life cycle by using the start time and end time defined in the SLA. In order for EMS to perform life cycle management, each Business Resource (BR) exposes two operations: one responsible for setting the start time and one for setting the end time of the life of the resource. If the EMS tries to invoke the BR earlier than the start time, it will receive a message from the BR informing it of the exact date and time the service will become available.
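The life cycle management of a BR can be illustrated with a small sketch. It assumes, as the paper's footnote on time synchronization indicates, that the EMS passes time periods rather than date-time objects and that the BR converts them to local start and end instants. The class name, method names and returned messages are hypothetical and do not reproduce the Akogrimo interfaces.

```python
import time
from typing import Callable, Optional


class BusinessResourceLifetime:
    """Tracks the validity window of a BR from two time periods."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock              # injectable so the sketch can be tested
        self._start: Optional[float] = None
        self._end: Optional[float] = None

    def set_start_in(self, seconds: float) -> None:
        """Operation 1: period until the SLA becomes valid."""
        self._start = self._clock() + seconds

    def set_end_in(self, seconds: float) -> None:
        """Operation 2: period until the SLA stops being valid."""
        self._end = self._clock() + seconds

    def check(self) -> str:
        """Decide whether an invocation arriving now may proceed."""
        now = self._clock()
        if self._start is not None and now < self._start:
            return f"not yet available; starts in {self._start - now:.0f} s"
        if self._end is not None and now >= self._end:
            return "resource no longer exists"
        return "ok"
```

Because both instants are computed against the BR's own clock, the result does not depend on the clocks of the EMS host and the BR host agreeing with each other.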
If the client attempts to invoke the BR after the expiration of the validity period of the affiliated SLA, an error message stating that the requested BR no longer exists will be returned.⁶ In case of failure during the creation or the lifetime management of the BR, the Core service queries the Discovery service for candidate services in different locations and repeats the above sequence. Once the advance service reservation has been performed, the Reservation service proceeds with the creation of an SLA-Controller, a Metering and a Monitoring resource. Afterwards, it contacts the QoS Broker service and requests the reservation of a network bundle using the bandwidth value that is defined in the SLA. As the last step of the Reservation phase, the Reservation service creates an RR and stores in it all information related to the reservation along with the EPRs to the resources that have been created in previous steps. If the reservation is successful, the EPR to the RR is returned to the client via the Core service. In case of failure, a null EPR is returned to the client. Fig. 6 shows a high-level view of the interactions performed by the EMS during this phase. This sequence diagram shows only the basic behavior of the EMS during the Reservation phase. Alternative sequences that may take place, for example rediscovery of a candidate location in case the already discovered BS is not available, have been omitted for simplicity.

7.3. Execution and Monitoring phase

The Execution and Monitoring phase begins with a client initiating the execution of a BR. The client invokes the performExecution method on the Core service, providing the EPR to the RR obtained at the end of the Reservation phase and specifying the input needed for the execution of the specific Business Service. The Core service invokes the execute operation on the Execution service. The latter uses the EPR to the RR to locate it and retrieve all information related to the reservation.
As a first step, the Execution service uses the EPR to the SLA-Controller resource that was created during the Reservation phase to locate and activate the specific SLA-Controller resource. Having activated the SLA-Controller resource, the Execution service activates the Monitoring and Metering resources on the machine hosting the requested service. Once the set-up of the SLA-Enforcement and Monitoring group of services has been completed, the Execution service continues with the execution of the actual BR.

⁶ In order to avoid problems related to lack of time synchronization between the machines that the services run on, the EMS does not pass on date-time objects but rather absolute time periods (until the SLA becomes valid and until it stops being valid). The BR uses these two values to calculate the actual start and termination times.

During the execution, the Metering service calculates performance parameters and notifies the Monitoring service using a WS-Notification mechanism established between these two services. When the Monitoring service is activated by the Execution service, it subscribes to specific notification topics that the Metering service exposes, representing the performance parameters that are calculated. In order to limit SOAP message traffic, notifications are generated by the Metering service only if a significant change (e.g., |previous CPU usage value − current CPU usage value| > 10%) in the performance parameters has taken place. The Monitoring service filters the notifications and sends them to the SLA-Controller resource. The SLA-Controller resource compares the values of the performance parameters that have just been received with the thresholds specified by the client in the SLA contract. If a violation is detected, the SLA-Controller service notifies the SLA-Decider. The SLA-Decider “decides” on the appropriate action that should be taken, depending on the type of the violation and its affiliated parameters.
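The notification filter and the threshold comparison can be sketched in a few lines. Two assumptions are made here that the paper does not state: the "previous" value is taken to be the last value that was actually published, and a violation is taken to mean that a measured usage percentage exceeds the SLA threshold. The names `MeteringFilter` and `violates` are hypothetical.

```python
from typing import Dict, Optional


class MeteringFilter:
    """Publishes a reading only when it changed significantly."""

    def __init__(self, min_change: float = 10.0) -> None:
        self.min_change = min_change       # percentage points, as in the 10% example
        self._last: Dict[str, float] = {}  # last published value per parameter

    def update(self, name: str, value: float) -> Optional[float]:
        """Return the value if it should be notified, otherwise None."""
        previous = self._last.get(name)
        # Assumption: the first reading of a parameter is always published.
        if previous is None or abs(previous - value) > self.min_change:
            self._last[name] = value
            return value
        return None  # change too small: no SOAP notification is sent


def violates(measured: float, threshold: float) -> bool:
    """Assumed rule: usage above the SLA threshold counts as a violation."""
    return measured > threshold
```

With a 40% CPU threshold, readings of 30, 35 and 41 would produce notifications for 30 and 41 only, and only the 41 reading would be passed on to the SLA-Decider as a violation.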
In case of severe SLA violations, the SLA-Decider notifies the Execution service that the execution should be reallocated. For this purpose a WS-Notification mechanism is established between the Execution service and the SLA-Decider. Once this type of notification is received, the Execution service destroys the BR that produced the violation as well as the Monitoring and Metering resources that are connected to it. Following that, the Execution service contacts the Discovery service and tries to find new available services in the Grid that satisfy the client’s SLA. In case a match is found, the Execution service updates the RR and repeats the steps described above. After the successful completion of the execution of the BR, the Execution service stops the execution of all related services, destroys all affiliated resources and returns the result of the service execution to the client. Fig. 7 depicts a high-level view of the interactions performed by the EMS during the Execution and Monitoring phase.

Fig. 6. Sequence diagram of the Reservation phase.

8. EMS in Akogrimo testbed

The business services that can be supported by the EMS might come from various scientific and business domains, including among others bio-medics, construction, engineering and finance. Within the Akogrimo project framework, the EMS has been used to support a suite of e-health services that comprise an e-Health application, offering its clients a Heart Monitoring and Emergency Response Process. The Akogrimo demonstrator scenario consists of a university hospital, regional hospitals, medical specialists, general practitioners, emergency medical services and an emergency dispatch center establishing a regional health network. The health network is headed by the university hospital and provides telemedicine services to the partners and the patients attended by partners of the network. As depicted in Fig.
8, the EMS, along with the SLA Enforcement, Monitoring and e-Health services, is positioned in the Akogrimo Grid Infrastructure Services Layer, which is responsible for providing the core Grid functionalities. The EMS acts as the central point of access to the Grid layer functionalities, responsible for the coordination of all services that reside in the Akogrimo Grid. Fig. 8 shows a simplified high-level view of the Akogrimo architecture [24]. Although a detailed description would be out of the scope of this paper, for consistency reasons it should be mentioned that in the Akogrimo environment, each business service requested by a client is modeled as a business process instance. The business process represents a set of one or more linked procedures or activities that collectively realize a business objective, normally within the context of an organizational structure defining functional roles and relationships. Behind the business process there are one or more workflows representing the automation of the business process. Each workflow coordinates and manages component services or entities involved in the automation of the business process.

Fig. 7. Sequence diagram for the Execution and Monitoring phase.

Fig. 8. EMS in the Akogrimo Grid Infrastructure.

9. Experimental results

In this section, experimental results on the performance of the EMS are presented. Two sets of experiments were conducted. In the first one, we focused on the overall performance of EMS. Time measurements were performed for each of the three distinct phases of EMS, with and without the activation of the SLA Enforcement/Monitoring services. In the second experiment, the focus is shifted towards the EMS reallocation technique. The time measurements that were performed in this phase were strictly related to the reallocation mechanism.

Four PCs with identical technical characteristics (Pentium V 1.7 GHz, 1 GB of memory, 10 Mb/s Ethernet segment) were used during the experiments. Three out of the four PCs were configured to behave as a GT4 cluster whereas the remaining one served as a host for the SLA Enforcement group services. All EMS services were deployed on the same PC (part of the GT4 cluster) and ran under the same GT4 container. The tests were performed against a WSRF service that was developed on the GT4 platform. This was the single service that was offered to the clients of the EMS and it performed trivial mathematical operations. This service, along with the Metering service, was deployed in the two GT4 containers of the cluster that did not host the EMS.

In the first set of experiments, we measured the time spent to perform negotiation, reservation and execution with SLA Enforcement/Monitoring deactivated and activated respectively. Since the key aim was to evaluate the overhead that the SLA Enforcement group added to the overall performance of the EMS, the thresholds in the SLA contract that was used were given extremely low values to ensure that no violation requiring reallocation would occur. Also, the EMS was handling one execution at a time. A set of one hundred tests was performed for each of the three phases.⁷ Table 2 shows the average results of the measurements in seconds.

Table 2
Results of the first experiments set (measured in secs)

    SLA Enforcement/Monitoring   Negotiation   Reservation   Execution
    Deactivated                  0.16          0.38          0.22
    Activated                    0.25          0.75          1.03

As we can see, the SLA Enforcement adds a significant overhead to the EMS performance, especially during the execution phase.
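To make the size of this overhead concrete, the per-phase difference can be computed directly from the averages in Table 2. The short sketch below assumes that the flattened values of the table are read column by column (negotiation 0.16/0.25, reservation 0.38/0.75, execution 0.22/1.03), which is the only reading in which activation never makes a phase faster. The function and variable names are illustrative.

```python
from typing import Tuple

# Average times in seconds from Table 2 (column-wise reading assumed).
deactivated = {"negotiation": 0.16, "reservation": 0.38, "execution": 0.22}
activated = {"negotiation": 0.25, "reservation": 0.75, "execution": 1.03}


def overhead(phase: str) -> Tuple[float, float]:
    """Return (absolute overhead in s, relative overhead in %) for a phase."""
    base, with_sla = deactivated[phase], activated[phase]
    extra = with_sla - base
    return extra, 100.0 * extra / base


for phase in deactivated:
    extra, pct = overhead(phase)
    print(f"{phase:12s} +{extra:.2f} s  ({pct:.0f}%)")
```

Under that reading, the relative overhead is roughly 56% for negotiation, 97% for reservation and 368% for execution, which is consistent with the observation that the execution phase is affected most.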
After further investigation, we came to the conclusion that a significant percentage of the added overhead is due to the relatively slow performance of the SLA Enforcement group of services, whereas the Monitoring group of services responds very quickly, posing only trivial overheads on the performance of the overall system. As the workload on EMS and consequently on the SLA Enforcement services increases, the overall performance of the system might be considered too slow.

⁷ The containers were restarted after each test to ensure greater reliability.

In the second set of experiments, the time measurements were strictly related to the generation of a reallocation notification and the reallocation mechanism itself. In contrast to the ones used for the previous tests, the thresholds defined in the client’s SLA contract were given relatively high values to ensure that reallocation would be required. Each time a request for reallocation was generated, the EMS reallocated execution between the two PCs in the cluster that hosted the business service. Three sets of tests were performed, each time specifying a different number of executions handled in parallel. The number of services registered in the EMS Index service remained the same throughout the tests, in order to achieve a greater degree of independence from the inherent functionality of GT4’s Index service, which is used by EMS during the discovery process. Table 3 shows the results of the measurements in seconds.

Table 3
Results of the second experiments set (measured in secs)

    # services   t_deact   t_disc   t_act   t_init   t_total
    1            0.32      0.03     0.45    0.10     0.90
    10           0.75      0.09     2.05    0.21     3.10
    100          1.98      0.12     2.96    0.26     5.32
where:

$t_{\mathrm{deact}}$ = elapsed time between receiving a reallocation request and deactivating the SLA Enforcement and Monitoring group,
$t_{\mathrm{disc}}$ = time spent on discovering a new location,
$t_{\mathrm{act}}$ = time spent on activating the SLA Enforcement and Monitoring group on the new location,
$t_{\mathrm{init}}$ = elapsed time between the creation of a duplicate business resource on the new location and its initiation,

and consequently:

$t_{\mathrm{total}} = t_{\mathrm{deact}} + t_{\mathrm{disc}} + t_{\mathrm{act}} + t_{\mathrm{init}}$ = total time the reallocation mechanism is active.

By comparing the results shown in Table 3, some valuable conclusions can be drawn. On the one hand, the actions that are handled internally by EMS and do not require any interaction with external-to-EMS services scale relatively well when the workload increases (discovery takes 0.03 s in the first test set and only 0.12 s in the third). On the other hand, actions such as the activation and deactivation of the SLA Enforcement group, which involve a number of interactions with services external to EMS, impose an increased overhead on the overall EMS performance (activation rises from 0.45 s to 2.96 s and deactivation from 0.32 s to 1.98 s). Apart from the obvious solution of improving the performance of the SLA Enforcement services, and since the cost of discovery is but a fraction of the one related to the activation/deactivation of the SLA Enforcement services, one other way to improve the overall performance of the system is to introduce new instances of the SLA Enforcement services within the Grid and perform a discovery process for the SLA Enforcement services as well. This will allow for better workload balancing within the Grid in terms of SLA management and will eventually result in improved overall performance.

10. Conclusions and discussion

In this paper we have presented the architectural design and the implementation of an OGSA-compliant EMS that enables the management and enforcement of SLA contracts in the execution environment of OGSA-based Grids.
In order to achieve this, core components of the execution environment of the Grid infrastructure establish and exploit synergies between the various SLA-related modules of the architecture. The approach that is followed is based on a layered structure of a Grid infrastructure which positions the Grid services middleware in collaboration with both the application services and the network services in order to provide an adjustable QoS to the requests of the clients. The paper provides a detailed description of the participating components and the extensive set of exchanged messages and interfaces between them. The interaction between the Execution Management Services and the SLA enforcement mechanism is also presented through three discrete phases: the discovery and negotiation phase, the reservation phase for the needed resources that will be deployed, and the execution and monitoring phase, where a set of services is activated in order to support and control the execution in the Grid environment as specified.

The main contribution of this work lies in the design and prototype implementation of the respective services in a manner that enables the adaptive management of the Grid execution based on contractual SLA terms. A WSRF-compliant EMS has been implemented with the capability to enforce an SLA-compliant execution of services while at the same time monitoring any violations that occur and taking corrective actions. The paper discusses the need for dynamically manageable SLAs in Grids and formulates the Grid middleware participation in conjunction with other services (application, business flow, other network middleware) in a seamless and modular approach. The current work, through its implementation, provides the feasibility framework for structuring Grid services for commercial exploitation through service providers and telecom operators. Finally, it opens the discussion of a variety of issues for future research in the domain of Grid services and SLAs.
An interesting aspect is to deploy other commercial applications with complex and extensive offered services and business models, something which would complicate the set of negotiated terms and the way of monitoring their violation. Since failures to meet the clients’ requirements will occur in such a dynamic environment, it is also interesting to show how these failures affect the efficiency of the Grid infrastructure in terms of deviation from deadlines. We can assume that a failure to successfully execute a user’s job results in a cost for the system. This cost may be due to the consumption of resources or the price (penalty) that has to be paid back to the user due to deadline and SLA violation. It is an interesting topic to examine how a viable economic model could be developed for estimating this cost in such a dynamic environment. The idea of preserving the QoS attributes of the system adds an important aspect to various Business Models, because dynamic SLA mechanisms can make dynamic workflow techniques feasible.

Acknowledgments

The work has been partially supported by the Akogrimo Integrated Project (FP6-2003-IST-004293). The authors would like to thank the members of the consortium for their collaboration, and especially Jesus Movilla, Francesco D’Andria, Nuno Inacio and Nadia Romano.

References

[1] I. Foster, What is the Grid? A three point checklist, GRID Today (2002).
[2] A. Litke, D. Skoutas, T. Varvarigou, Mobile Grid computing: Changes and challenges of resource management in a mobile Grid environment, in: “Access to Knowledge through the Grid in a Mobile World” Workshop, held in conjunction with 5th Int. Conf. on Practical Aspects of Knowledge Management, PAKM 2004, Vienna. Available at: www.mobilegrids.org/docs/pakm2004/papers/3 Mobile-Grid-Computing.pdf.
[3] Open Grid Services Architecture. http://www.globus.org/ogsa/.
[4] OGSA Working Group. http://forge.gridforum.org/projects/ogsa-wg.
[5] Access to knowledge through the Grid in a mobile world (AKOGRIMO) integrated project FP6-2003-IST-004293. http://www.akogrimo.org/.
[6] R. Al-Ali, O. Rana, D. Walker, G-QoSM: Grid service discovery using QoS properties, Grid Computing, Computing and Informatics Journal 21 (2002) (special issue).
[7] A. Keller, H. Ludwig, The WSLA framework: Specifying and monitoring service level agreements for web services, E-Business Management, Journal of Network and Systems Management 11 (2003) (special issue).
[8] I. Foster, C. Kesselman, C. Lee, R. Lindell, K. Nahrstedt, A. Roy, A distributed resource management architecture that supports advance reservations and co-allocation, in: International Workshop on Quality of Service, 1999.
[9] C. Bouras, M. Campanella, et al., QoS and SLA aspects across multiple management domains: The SEQUIN approach, Future Generation Computer Systems 19 (2003).
[10] M. Surridge, St. Taylor, D. De Roure, Ed Zaluska, Experiences with GRIA - Industrial applications on a Web Services Grid, in: Proceedings of the First International Conference on e-Science and Grid Computing, 2005, pp. 98–105.
[11] GRASP. http://eu-grasp.net/english/default.htm.
[12] UNICORE. www.unigrids.org.
[13] A. Streit, D. Erwin, Th. Lippert, D. Mallmann, R. Menday, M. Rambadt, M. Riedel, M. Romberg, B. Schuller, Ph. Wieder, UNICORE - From project results to production Grids, in: L. Grandinetti (Ed.), Grid Computing: The New Frontiers of High Performance Processing, in: Advances in Parallel Computing, vol. 14, Elsevier, 2005.
[14] M. Riedel, V. Sander, P. Wieder, J. Shan, Web services agreement-based resource negotiations in UNICORE. http://whitepapers.silicon.com/0,39024759,60142543p,00.htm, 2005.
[15] Open Grid Forum. http://www.ogf.org.
[16] Web Services Resource Framework. http://www-106.ibm.com/developerworks/library/ws-resource/.
[17] OASIS Web Services Resource Framework (WSRF). http://www.oasis-open.org.
[18] Globus Toolkit version 4. http://www.globus.org/toolkit/.
[19] GT 4.0 WS-GRAM. http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/.
[20] Information Services (MDS): Key concepts. http://www.globus.org/toolkit/docs/4.0/info/key-index.html.
[21] Akogrimo Deliverable D4.3.1, Architecture of the infrastructure services layer V1. Publicly available at: http://www.akogrimo.org/download/Deliverables/Akogrimo D431 Version1 Final.pdf.
[22] Akogrimo Deliverable D4.1.1, Mobile network architecture design & implementation. Publicly available at: http://www.akogrimo.org/download/Deliverables/D4.1.1.pdf.
[23] WSRF.net. http://www.cs.virginia.edu/~gsw2c/wsrf.net.html.
[24] Akogrimo Deliverable D3.1.3, Overall architecture definition and layer integration. https://bscw.hlrs.de/bscw/bscw.cgi/0/109536.

Dr. Antonios Litke received his diploma from the Dept. of Computer Engineering and Informatics of the University of Patras, Greece in 1999, and his Ph.D. from the Electrical and Computer Engineering Department of the National Technical University of Athens in 2006. Currently, he is working in the Telecommunications Laboratory of the Electrical and Computer Engineering Department of the National Technical University of Athens as a researcher participating in numerous EU and nationally funded projects. His research interests include Grid computing, resource management in heterogeneous systems, Web services and information engineering.

Kleopatra Konstanteli was born in Athens, Greece in 1981. She received her M.Sc. in Electrical and Computer Engineering in 2004 from the National Technical University of Athens. She is currently pursuing her doctoral-level research in computer science while at the same time working as a research associate in the Telecommunications Laboratory of Electrical and Computer Engineering of NTUA and participating in EU-funded projects. Her research interests are mainly in the field of Grid computing.
Vassiliki Andronikou obtained her M.Sc. in Electrical and Computer Engineering in 2004 from the National Technical University of Athens. She is a Ph.D. candidate in the same department and is currently working as a Research Associate in the Telecommunications Laboratory of Electrical and Computer Engineering of NTUA, participating in EU-funded projects such as AKOGRIMO, BEinGRID, POLYMNIA and FIDIS. Her research interests are mainly in the fields of Grid computing, biometrics and identity-based profiling.

Sotirios Chatzis was born in Athens in 1982. He received his M.Sc. (5-year diploma) from the Electrical and Computer Engineering Department of the National Technical University of Athens in 2005, majoring in Computer Science. He is currently pursuing his Ph.D. in the Electrical and Computer Engineering Department of the National Technical University of Athens. His major research interests are Artificial Intelligence, Machine Learning, Pattern Analysis and Recognition and their applications in Computer Vision tasks.

Theodora Varvarigou received the B.Tech degree from the National Technical University of Athens, Athens, Greece in 1988, the M.S. degrees in Electrical Engineering (1989) and in Computer Science (1991) from Stanford University, Stanford, California, and the Ph.D. degree from Stanford University as well in 1991. She worked at AT&T Bell Labs, Holmdel, New Jersey between 1991 and 1995. Between 1995 and 1997 she worked as an Assistant Professor at the Technical University of Crete, Chania, Greece. Since 1997 she has been working as an Associate Professor at the National Technical University of Athens. Her research interests include Grid technologies, parallel algorithms and architectures, fault-tolerant computation, optimisation algorithms and content management.