Future Generation Computer Systems 24 (2008) 245–258
www.elsevier.com/locate/fgcs
Managing service level agreement contracts in OGSA-based Grids
Antonios Litke ∗ , Kleopatra Konstanteli, Vassiliki Andronikou,
Sotirios Chatzis, Theodora Varvarigou
Department of Electrical and Computer Engineering, National Technical University of Athens, 9, Heroon Polytechniou Str, 15773 Athens, Greece
Received 7 July 2006; received in revised form 11 June 2007; accepted 12 June 2007
Available online 30 June 2007
Abstract
Grids and mobile Grids can form the basis and the enabling technology for pervasive and utility computing due to their ability to be open,
highly heterogeneous and scalable. However, the process of selecting the appropriate resources and initiating the execution of a job is not enough
to provide quality in a dynamic environment such as a mobile Grid, where changes are numerous, highly variable and with unpredictable effects.
In this paper we present a scheme for advancing and managing Quality of Service (QoS) attributes contained in Service Level Agreement (SLA)
contracts of Grids that follow the Open Grid Services Architecture (OGSA). In order to achieve this, the execution environment of the Grid
infrastructure establishes and exploits the synergies between the various modules of the architecture that participate in the management of the
execution and the enforcement of the SLA contractual terms. We introduce an Execution Management Service which is in collaboration with
both the application services and the network services in order to provide an adjustable quality of the requested services. The components that
manage and control the execution in the Grid environment interact with the suite of the SLA-related services exchanging information that is used
to provide the quality framework of the execution with respect to the agreed contractual terms. The described scheme has been implemented in
the framework of the Akogrimo IST project.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Open Grid services architecture; Service level agreement; Execution management services
1. Introduction
The Grid can be viewed as a distributed, high performance
computing and data handling infrastructure that incorporates
geographically and organizationally dispersed, heterogeneous
resources (computing systems, storage systems, instruments
and other real-time data sources, human collaborators,
communication systems) and provides common interfaces for
all these resources, using standard, open, general-purpose
protocols and interfaces [1]. However, it is also the basis and the
enabling technology for pervasive and utility computing due to
the ability to be open, highly heterogeneous and scalable.
∗ Corresponding author. Tel.: +30 210 7722558.
0167-739X/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.future.2007.06.004
The coordinated execution of computationally intensive tasks
and some cases of real-time distributed applications are some
of the paradigms that are of particular interest to be potentially
deployed on Grid and mobile Grid infrastructures. In particular,
if we consider that Grids can be used as dynamic information
systems to enable business models for utility and pervasive
computing, we can identify the significance of the design and
creation of novel schemes and architectures for the efficient
management of such systems. Such applications are candidates
to migrate to Grid and mobile Grid solutions. In order to do
this, the Grids should be equipped with the relevant mechanisms
to achieve efficiency. The specific architectures that will be
designed for this purpose should involve as basic building
blocks those mechanisms that will guarantee Quality of Service
(QoS) attributes which are mandatory for the commercial
exploitation of these applications through the establishment of
Service Level Agreement (SLA) contracts.
The Grid, and especially the mobile Grid [2], is a
dynamic system where environmental conditions are subject
to unpredictable changes: system or network failures, system
performance degradation, removal of machines, variations in
the cost of resources, etc. This problem causes a direct impact
on various functionalities that should be managed through the
involved execution management components in order to have
continuous conformance to the contractual terms of SLAs.
Selecting the appropriate resources and initiating the execution of the job requested by the client might be enough when we
are dealing with a static environment where changes are highly
unlikely to take place. In a dynamic environment such as, for instance, a mobile Grid, however, where changes are numerous,
highly variable and with unpredictable effects, it is not enough.
It is very critical that the Execution Management System (EMS)
monitors and manages the execution of the job until its completion, so that it will be able to detect the unexpected failures as
soon as they take place and take actions dynamically to rectify
them in such a way as to meet the terms defined in the related
SLA. For example, if there is an unexpected failure during the execution of a job or if the number of violations produced exceeds a
certain limit, the EMS must be immediately informed and then,
depending on the type of failure, decide whether to restart the
execution on the same location or at a different available one.
The clients must be assured not only that the selected resources
will live up to their expectations but also that the EMS will make
provisions against erroneous conditions that may occur during
the execution of a job.
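The recovery behaviour described above (restart in place or move to another location, depending on the type of failure and the number of violations) can be illustrated with a minimal sketch. The failure categories, the `ABORT` outcome and all names below are assumptions made for illustration; the paper does not specify the EMS decision rules at this level of detail.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    TRANSIENT = "transient"   # hypothetical: the job crashed, the host is still usable
    HOST_DOWN = "host_down"   # hypothetical: the execution location itself is lost


class Action(Enum):
    CONTINUE = "continue"
    RESTART_SAME = "restart_same_location"
    RESTART_ELSEWHERE = "restart_different_location"
    ABORT = "abort"           # assumption: nothing left to try


@dataclass
class JobState:
    location: str
    violations: int = 0


def decide(job, failure, max_violations, alternatives):
    """Return (action, location) for a monitored job.

    `failure` is a FailureKind or None; `alternatives` lists other
    locations that are currently available.
    """
    needs_move = failure is FailureKind.HOST_DOWN or job.violations > max_violations
    if needs_move:
        for candidate in alternatives:
            if candidate != job.location:
                return Action.RESTART_ELSEWHERE, candidate
        if failure is FailureKind.HOST_DOWN:
            return Action.ABORT, None
        # Too many violations but nowhere else to go: retry in place.
        return Action.RESTART_SAME, job.location
    if failure is FailureKind.TRANSIENT:
        return Action.RESTART_SAME, job.location
    return Action.CONTINUE, job.location
```

For instance, `decide(JobState("nodeA", violations=5), None, 3, ["nodeB"])` selects a restart on `nodeB`, because the violation count exceeds the limit even though no failure was reported.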
In this paper we present the architecture of an EMS,
for advancing and managing QoS attributes of Open Grid
Services Architecture (OGSA)-based [3,4] Grids. EMS was
developed within the framework of the Akogrimo IST project
(“Access to KnOwledge through the GRid in a MObile
world”) [5]. In order to address the various tasks mentioned
above, the EMS establishes and exploits the synergies between
the various modules of the architecture that participate in
the management of the execution and the enforcement of
the SLA contractual terms. The approach that is followed is
based on a layered structure of a Grid infrastructure which
positions the Grid services middleware in collaboration with
both the application services and the network services in order
to provide an adjustable QoS to the requests of the clients.
The EMS interacts with the suite of the SLA-related services
exchanging information that is used by the services to provide
the quality framework of the execution with respect to the
agreed contractual terms.
The rest of the paper is structured as follows: in Section 2
a related work survey is given on the examined topic with
emphasis on similar approaches that are met in the literature
and other projects. Sections 3 and 4 provide a discussion
on the OGSA-based requirements in general, the applied
execution environment and the need for SLA in such Grids.
Section 5 gives an architectural overview of EMS and the
overall layered approach that is followed and which is defined
in the framework of the Akogrimo project, while Section 6
details the enforcement of the SLA contractual terms as well
as the monitoring services. The study proceeds with a detailed
description of the EMS implementation with all the involved
phases of operation in Section 7, and a description of the overall
EMS activity in Section 8. Finally, Section 9 provides some
indicative results of the system operation while Section 10
concludes the paper with a discussion on the outcome and
future work.
2. Related work
There are several approaches in the design and development
of execution management services that aim to handle SLAs in
Grid environments. Below we briefly present the most
relevant ones.
The G-QoSM framework [6] is an approach for SLA-based management in service-oriented Grids. This approach
is mainly aimed at service discovery, negotiation, reservation
and monitoring based on QoS. QoS properties are associated
with the Grid services through their interface definition,
by extending UDDI to incorporate them in WSDL-based
documents. In contrast to the G-QoSM framework, SLA
management in our work does not require any extension of
the WSDL; it is Web Services Resource Framework (WSRF)-compliant and uses GT4’s MDS instead of a static registry such
as UDDI. Furthermore, service execution and management are
not really supported within the G-QoSM framework. Instead, it
relies on the existence of systems such as Globus to manage
and control the execution of a service. On the contrary, our
work builds heavily on high-level services that are included in
its middleware to guarantee QoS enforcement, not only during
the discovery and negotiation phases, but during the actual
execution of the service.
The Web Service Level Agreement (WSLA) framework [7]
is an IBM project that offers SLA management for Web services
in the phases of negotiation and monitoring. It introduces
an XML-based language for SLA definition along with a
runtime environment that enables third parties to perform the
monitoring of the services in an effort to achieve greater
objectivity. However, this framework does not support advance
reservation and although it claims to be adjustable to any
environment, it is not clear whether it can support WSRF-compliant Grid services. Furthermore, although the WSLA
framework is able to monitor the execution of the services
and detect SLA violation, it only reports the violations and
is not capable of taking any corrective actions. Execution
management in our work is not limited to reporting the
violations to the accounting system but is also able to reallocate
resources on the fly, depending on the type and severity of the
violation.
The Globus Architecture for Reservation and Allocation
(GARA) [8] addresses the problem of achieving end-to-end QoS enforcement in Grid environments, by introducing
an architecture for discovery, reservation and allocation
of resources that meet application QoS requirements. The
execution process is tightly connected to Globus Resource
Allocation Manager (GRAM), which is able to handle the
execution of executable files only; therefore the execution
of Grid services is not supported. Although this architecture
allows reservation and QoS monitoring of allocated resources,
it does not support negotiation whereas the discovery process
is not detailed. The Service Quality across Independently
Managed Networks (SEQUIN) framework [9] introduces an
approach for end-to-end QoS provisioning across multiple
management domains and exploits a combination of link layer
technologies. However, this work is mainly focused on the
network layer and although provisions are made, there is no
effective monitoring system.
Aiming to bring the Grid closer to
industrial reality, mainly with respect to security and business-to-business
(B2B) services, the European project GRIA [10]
designed and developed the GRIA middleware based on Web
Services. The resource allocation service in the existing GRIA
middleware supports the confirmation of a service offer through
the establishment of service level agreements as well as the
extension of existing SLAs. With a vision towards the industrial
and business use of the Grid, the European project Grid-based
Application Service Provision (GRASP) [11] focused on the
study, design, development, implementation and validation of
an advanced infrastructure. Aiming to incorporate business
functionality into the GRASP framework, an SLA management
subsystem has been developed which includes the service
provision negotiation based on QoS criteria as well as the
monitoring of the feasibility of the contract concerning each
Grid service.
The existing Unicore Grid middleware [12] is the result of
the design and development efforts of two German projects,
Uniform Interface to Computing Resources (UNICORE) and
UNICORE Plus. As a successor of the European Grid
Interoperability Project (GRIP), a project that worked on
the interoperability of Globus and Unicore,
UniGrids focuses on extending Unicore towards a Grid services
infrastructure in compliance with the OGSA. As service-oriented architectures and web services are being adopted
in Grid practice, followed by the development of a
Web Services Agreement specification by the Global Grid
Forum (GGF), UniGrids aims to develop an SLA framework
and cross-Grid brokering services in order to support Grid
economics [13] as well as to integrate a Web Services
Agreement-based resource management framework into the
Unicore Grid middleware [14].
3. Execution environment in OGSA-based Grids
The OGSA recommended by the GGF (recently renamed
the Open Grid Forum, OGF) [15] addresses the need
for standardization by defining, within a Service Oriented
Architecture (SOA), a set of core capabilities and behaviors
that address key concerns in Grid systems. These concerns
include authentication and authorization issues for access on the
Grid resources, policy and negotiation issues, service discovery
and lifecycle management, SLA, execution management,
monitoring and management of services’ collections, etc.
The Execution Management Services of OGSA (OGSA-EMS) are concerned with the problems of instantiating and
managing tasks. They address questions such as: at which
locations can a specific task be executed (considering
various restrictions such as CPU type and availability, license,
etc.), where is it efficient to execute the task, and how should
the execution itself be prepared, managed and monitored. Typically,
an OGSA-EMS consists of the following modules:
Job manager: The Job Manager (JM) encapsulates all
aspects of executing a job, or a set of jobs, from start to finish,
and it may schedule them to resources and collect agreements and
reservations.
Candidate set generator (CSG): The CSG is in charge of
finding the computational resources where the job can be
executed by applying a suitable match-making mechanism.
In order to do this, the CSG takes into account all the
static requirements but not those that are dynamic. Static
resource attributes could be the operating system, number of
processors, available software (libraries, binaries), bandwidth,
etc. Dynamic resource attributes could be free disk space,
available CPU, file transfer rate, etc.
Execution planning services (EPS): This is a high-level
scheduler that creates mappings called “schedules” between
jobs and resources, namely the resources that have been selected by
the CSG. An EPS component will typically attempt to optimize
some objective function such as execution time, cost, reliability,
etc. (i.e. answering “where should the job execute?”), in order
to improve system performance, provide QoS and meet the
SLA terms. The EPS should start from the match-making
results provided by the CSG, combine them with the dynamic
attributes, and run an algorithm to find the best resources for
execution.
Advanced reservation services (ARS): The ARS is in
charge of reserving resources for a specific period of time or
permanently, depending on the type of reservation. Different
types of reservation are possible: computational resource
reservation, storage resource reservation, network resource
reservation, service reservation.
Fig. 1 shows the basic modules of an OGSA-based EMS
and their interactions.
Fig. 1. The EMS components and their interactions.
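The division of labour between the CSG (static match-making) and the EPS (dynamic attributes plus an objective function) can be illustrated with a short sketch. The attribute names, the choice of cost as the objective and the tie-break on available CPU are assumptions made for this example only; OGSA does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    os: str                # static attribute
    processors: int        # static attribute
    free_disk_gb: float    # dynamic attribute
    available_cpu: float   # dynamic attribute, fraction in [0, 1]
    cost: float            # hypothetical price per job, arbitrary units


def candidate_set(resources, required_os, min_processors):
    """CSG step: match on static requirements only."""
    return [r for r in resources
            if r.os == required_os and r.processors >= min_processors]


def plan(candidates, min_free_disk_gb):
    """EPS step: apply dynamic attributes, then optimise an objective.

    The objective here is lowest cost, with ties broken in favour of the
    resource that has the most available CPU. Returns None if nothing fits.
    """
    feasible = [r for r in candidates if r.free_disk_gb >= min_free_disk_gb]
    return min(feasible, key=lambda r: (r.cost, -r.available_cpu), default=None)


resources = [
    Resource("a", "linux", 4, 10.0, 0.2, 5.0),
    Resource("b", "linux", 8, 0.5, 0.9, 1.0),    # cheap, but too little free disk
    Resource("c", "windows", 8, 50.0, 0.9, 1.0), # wrong operating system
    Resource("d", "linux", 4, 20.0, 0.7, 5.0),
]
chosen = plan(candidate_set(resources, "linux", 4), min_free_disk_gb=1.0)
```

Keeping the static filter separate means the candidate set can be computed once per job, while the dynamic step is re-run whenever monitored values change.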
The EMS described in this paper encapsulates a large
part of the logic and the objectives of the aforementioned
services. It is able to discover a candidate set of business
services that meet the clients’ needs and restrictions, as
expressed in their SLAs and, once the execution of the business
service has started, it is able to manage and monitor it until
completion, in order to achieve continuous conformance to
the contractual terms of SLAs. Furthermore, two different
types of advance reservation are supported: (1) service
resource reservation and (2) network resource reservation. Its
functionality is described in detail in the sections that follow.
Fig. 2. The SLA in the OGSA architecture.
4. OGSA components and SLA
In this study the basic requirement is the execution
management of services based on concrete SLA terms as they
are described in the contracts that are established between
the service providers and the users. During execution of
the requested service the QoS requirements expressed in the
SLA contract must be fulfilled through the management of
the execution and the associated Grid resources (application
service, computational resources, storage facilities, policies,
etc.). In order to achieve this, it is necessary to provide
mechanisms for monitoring the execution of the service and
the QoS parameters. Moreover, the estimation of the resources’
capabilities and their utilization in specific time windows are
factors that influence decisions for any planning and adjusting
of the resources’ usage. In cases where a renegotiation of the
agreement is required (so that new service requirements are
addressed) the existing agreement must be updated and the
respective resource requirements must be recalculated.
Fig. 2 presents the conceptual levels of an OGSA
architecture in which the Execution Management services and
the SLA Enforcement services are located, together with other
services, in the foreground of the OGSA capabilities and at
the level of OGSA specific functional services. Each group of
services in this level is responsible for the management of a
set of resources (both logical and physical) positioned at the
Resource level through the Infrastructure services level. The
latter forms the basis and framework for both manageability and
management of resources in such an OGSA environment (for
instance, using the specifications of the WSRF [16,17]). The
circles in this figure represent the interfaces that each group of
services exposes in order to communicate with the other groups
of services and coordinate their tasks.
5. EMS architecture and design
Instead of designing a complicated service responsible for
all tasks involved in the execution management process, the
EMS is a set of sub-services, with each of them addressing
a specific task. Although from the client’s perspective it
appears to be a single Grid service, in actuality the EMS is a
composition of the six Grid services listed below:
• Core service, which acts as a gateway service, offering a
single-point of access to the EMS whilst “hiding” its details
and complexity from the clients. It is therefore responsible
for interpreting client’s requests, delegating them to the
appropriate EMS sub-services, orchestrating the interactions
between the latter and returning the results to the clients.
• Advertisement service, which is used by the Service
Providers (SPs) to advertise their services in the index
service used by the EMS in the discovery process.
The SPs run an EMS client application on their local
machines, providing all information necessary to form the
advertisement of the service.
• Negotiation service, which is responsible for interpreting
requests for the negotiation of the terms of the client’s
SLA contracts. The Negotiation service contains a resource
factory¹ that creates Negotiation Resources (NRs). The NRs
are persistent resources used to store information related to
the negotiations. After a successful negotiation, the Endpoint
Reference (EPR)² to the NR that was created is returned to
the Core service, which in turn passes it to the client.
• Reservation service, which is in charge of performing
advance reservation on the resources needed for the
execution as well as the monitoring of it. Upon successful
reservation, a persistent Reservation Resource (RR) is
created and all information related to the reservation is
stored inside it. The EPR to the RR is returned to the Core
service and through it to the client.
• Discovery service, which is responsible for finding available
services, registered in the underlying Grid, that meet the QoS
parameters defined in the SLA. It is designed so as to follow
rules based on low-level performance parameters related to
the execution of the service, such as CPU, memory, network
bandwidth and disk capacity as well as the availability of the
service and the price that the client is willing to pay.
• Execution service, which is in charge of the coordination
of the SLA Enforcement group of services and the actual
execution of the reserved business resources as well as being
responsible for taking corrective actions when needed. At
the end of the execution of the business service, the result
is returned to the Core service. The Core service filters the
result and sends it to the client.
¹ A mechanism encapsulated inside the Negotiation service able to create
resource instances of the service. For more information refer to:
² The EPR is a pair of a Uniform Resource Identifier (URI) and a string key,
which differentiates an instance of a resource from all other instances of the
same resource.
Fig. 3. EMS services and their interactions.
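The relationship between a resource factory, the resources it creates and the EPRs handed back to the client can be sketched as follows. This is a simplified in-memory illustration with hypothetical class names, not the GT4 or WSRF API; in the described system the resources are persistent and are managed by the service container.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EndpointReference:
    service_uri: str   # address of the service that owns the resource
    key: str           # distinguishes this instance from other instances


@dataclass
class NegotiationResource:
    client_id: str
    qos: dict = field(default_factory=dict)


class NegotiationResourceFactory:
    """Creates Negotiation Resources and hands out EPRs that point to them."""

    def __init__(self, service_uri):
        self._service_uri = service_uri
        self._resources = {}   # assumption: in-memory store instead of persistence

    def create(self, client_id, qos):
        epr = EndpointReference(self._service_uri, uuid.uuid4().hex)
        self._resources[epr] = NegotiationResource(client_id, dict(qos))
        return epr

    def lookup(self, epr):
        return self._resources[epr]


factory = NegotiationResourceFactory("https://example.org/ems/NegotiationService")
epr_a = factory.create("client-1", {"memory": "1 GB"})
epr_b = factory.create("client-2", {"memory": "2 GB"})
```

Both EPRs share the same service URI and differ only in their key, which is what allows a later phase to retrieve exactly the resource created for a given client.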
The EMS was implemented using the Globus Toolkit version
4 (GT4) [18] and runs under the standalone container that
the toolkit offers. GT4 is an open source Grid middleware
that provides the necessary functionality required to build and
deploy fully operational Grid services. It includes software for
security, information infrastructure, resource management, data
management, communication, fault detection and portability. It
supports the core specifications that define the Web services
architecture such as XML, SOAP and WSDL. It also supports
and implements the WS-Security and other specifications
relating to security, as well as the WSRF, WS-Addressing
and WS-BaseNotification specifications used to define, name
and interact with stateful resources. An overview of the EMS
services and their interactions is depicted in Fig. 3.
The GT4 includes a core set of high-level services and tools
that address different tasks one can face when involved with
Grid computing. Given the specific nature of the EMS,
two of these high-level services were integrated into
it:
1. WS-GRAM [19], the core of the GT execution management,
providing services to submit jobs and monitor their
execution on a GT4-based Grid. “Jobs” within the WS-GRAM framework are considered to be binary executable
files. However, not all applications are good candidates
for exposure as a Web service. The WS-GRAM concept
and the Web service concept are two mutually exclusive
approaches. WS-GRAM is meant to be used in situations
where the job requested by the client is not available as a
Web service but in the general form of a binary executable
file. Through WS-GRAM, the EMS is able to address
this need for management of applications in the form of
executable files, while at the same time being able to manage
the execution of application-specific Web services. From
the client’s perspective, there is no difference in the EMS
behaviour, whether it handles the execution of an executable
file or a Web service. The use of WS-GRAM is transparent
to the client of the EMS.
2. MDS4 Index Service [20], which provides the necessary
functionality for GT service discovery, was integrated inside
the Advertisement and Discovery services of the EMS.
More specifically, through the Advertisement service, every
resource “living” on machines inside the Virtual Organization (VO) is registered
to its local Index service. The Index service that is
deployed in the same standalone container as the EMS acts
as the central VO index to which all the Index services
scattered throughout the underlying Grid are registered. In
this way, the EMS maintains a registry of available resources
for the entire VO. The Discovery service, in turn,
queries the EMS’s Index service to find a list of candidates
for the execution using the criteria defined in the client’s
SLA contract.
6. SLA Enforcement and Monitoring services
EMS was developed on the basis of the OGSA and WSRF
specifications. It is therefore responsible for finding execution
candidate locations, selecting the most suitable execution
location, preparing, initiating and managing/monitoring the
execution of a Business Service (BS).³ In order to address these
tasks and at the same time ensure continuous conformance to
the terms of the SLA contract, EMS works closely with the
SLA-Enforcement and Monitoring groups of services, which
involves numerous interactions. The services involved in
these two groups were developed within the framework of
the Akogrimo project by other consortium partners. More
information on their functionality and their implementation can
be found in [21,22].
6.1. Monitoring group of services
The Monitoring group of services consists of the following
three Grid services, developed on the GT4 platform:
• Metering service, which maintains run-time information on
the performance parameters related to the business service
execution, defined in the SLA. In particular, it measures low-level performance parameters such as memory, CPU and
disk usage.
• QoS Broker service, which is a bandwidth broker that can
provide three bundles of network services, each one of them
corresponding to a specific usage profile: audio, video or
data. It handles QoS network requests, keeps records of
information related to the client, such as network QoS levels
the client is allowed to use according to the SLA contract and
is also responsible for monitoring network parameters such
as network bandwidth throughout the execution of a BS.
• Monitoring service, which is the link between the
Monitoring group of services and the SLA Enforcement
group of services. It receives notifications about the values
of the QoS parameters related to the execution of the BSs
from the Metering and QoS Broker services and notifies the
SLA Enforcement group of services.
³ The term Business Service (BS) is used throughout this paper to refer to
business applications that are available in the VO in the form of Grid services
or executable files.
6.2. SLA Enforcement group of services
The SLA Enforcement group is composed of a number of
Grid services communicating with each other and exchanging
information prior to and during the execution of a client
request:
• SLA-Access service, which provides access to all existing
SLA contracts for the system.
• SLA-Controller service, which is responsible for comparing
measured QoS parameters against the thresholds defined in
the SLA contract and for communicating violations to the SLA-Decider service.
• SLA-Decider service, which is responsible for the management of the QoS violations that may occur during the business service execution. The SLA-Decider service checks the
corresponding policy in order to decide on the action that
should be taken.
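The split between the SLA-Controller (threshold comparison) and the SLA-Decider (policy lookup) can be sketched as below. The interpretation of a threshold as a tolerated percentage shortfall from the agreed value, the policy table and the action names are all assumptions for illustration; the paper does not define these semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaTerm:
    param: str
    agreed: float          # agreed value, e.g. 1.0 (GB)
    tolerance_pct: float   # ASSUMPTION: allowed shortfall, in percent


def find_violations(terms, measured):
    """SLA-Controller step: list the parameters whose measured value is too low."""
    violated = []
    for term in terms:
        floor = term.agreed * (1 - term.tolerance_pct / 100)
        # A parameter with no measurement is treated as 0, i.e. as violated.
        if measured.get(term.param, 0.0) < floor:
            violated.append(term.param)
    return violated


# Hypothetical policy table consulted by the SLA-Decider step.
POLICY = {"memory": "reallocate", "diskSpace": "notify"}


def decide_actions(violations, policy=POLICY, default="notify"):
    """SLA-Decider step: map each violated parameter to a corrective action."""
    return {param: policy.get(param, default) for param in violations}


terms = [SlaTerm("memory", 1.0, 50), SlaTerm("diskSpace", 1.0, 20)]
measured = {"memory": 0.4, "diskSpace": 0.9}
actions = decide_actions(find_violations(terms, measured))
```

Here only `memory` is reported, since 0.4 is below the 0.5 floor, while `diskSpace` at 0.9 stays above its 0.8 floor.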
Although EMS and the Monitoring group of services were
developed on GT4, all the services in the SLA Enforcement
group were developed using the WSRF.NET platform [23],
another popular middleware used in the development of
Grid services that implements the WSRF and WS-family of
specifications. For this reason, the EMS-SLA Enforcement
interaction was challenging not only in terms of design and
efficiency but also in terms of interoperability, since it
tested whether these two popular Grid development
tools implement the WSRF and WS-related specifications in
a transparent and interoperable way. During the establishment
of communication between them, several inconsistencies were
detected and were handled in different ways and at different
levels with some of them even requiring changing the form of
the SOAP messages that were generated by the services.
7. EMS implementation
The various tasks that the EMS covers are encapsulated
into four distinct phases: (i) the Negotiation and Discovery
phase, (ii) the Reservation phase, (iii) the Execution and
Monitoring phase and (iv) the Advertisement phase. It should
be stressed that each of the first three phases requires the
preceding ones, i.e. the client must first negotiate an SLA
contract, then proceed to reservation and finally request the
execution of the reserved BS. On the other hand, SPs can
advertise their services with QoS capabilities or update existing
advertisements at any time, using the Advertisement service of
EMS. All phases are presented in detail in the corresponding
sections that follow, with the exception of the Advertisement
phase that does not include any interactions with the SLA
Enforcement and Monitoring services. An overview of EMS-SLA Enforcement-Monitoring interactions is depicted in Fig. 4.
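The ordering constraint on the first three phases can be expressed as a small state check. The sketch below uses hypothetical names and deliberately leaves out the Advertisement phase, since SPs may advertise at any time.

```python
from enum import IntEnum


class Phase(IntEnum):
    NEGOTIATION = 1   # Negotiation and Discovery phase
    RESERVATION = 2
    EXECUTION = 3     # Execution and Monitoring phase


class ClientSession:
    """Tracks how far one client has progressed through the EMS phases."""

    def __init__(self):
        self.completed = 0

    def enter(self, phase):
        # A phase may only be entered once all preceding phases are done.
        if int(phase) > self.completed + 1:
            raise RuntimeError(
                f"{phase.name} requires phase {Phase(self.completed + 1).name} first"
            )
        self.completed = max(self.completed, int(phase))


session = ClientSession()
session.enter(Phase.NEGOTIATION)
session.enter(Phase.RESERVATION)
session.enter(Phase.EXECUTION)
```

A client that calls `enter(Phase.EXECUTION)` on a fresh session gets an error naming the phase that is still missing.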
7.1. Negotiation and Discovery phase
During this phase EMS is in charge of negotiating SLA
contracts with its clients. This involves the discovery of the
resources needed for the execution that meet the client’s
criteria. The phase begins with a client asking EMS to
check for service and network availability by invoking the
checkResourcesAvailability operation on the Core service,
providing a service ID,⁴ a client ID⁵ as well as the desired
values and thresholds for the QoS parameters. A typical
example of the QoS parameters used is presented in Table 1.
These low-level QoS
parameters are the ones that can be currently monitored by
the framework and each SP has to specify their values when
advertising a service.
Fig. 4. Overview of EMS-SLA Enforcement-Monitoring interaction.
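A QoS parameter document of the kind shown in Table 1 could be read into typed records as sketched below, using only the Python standard library. The record class and the treatment of empty unit and threshold fields as "not specified" are assumptions; the element names follow Table 1.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

SAMPLE = """
<QoSParams>
  <QoSItems>
    <QoSItem>
      <param>diskSpace</param>
      <paramValue>1</paramValue>
      <paramType>GB</paramType>
      <threshold>20%</threshold>
    </QoSItem>
    <QoSItem>
      <param>networkBandwidth</param>
      <paramValue>GOLD</paramValue>
      <paramType></paramType>
      <threshold></threshold>
    </QoSItem>
  </QoSItems>
</QoSParams>
"""


@dataclass
class QoSItem:
    param: str
    value: str                      # kept as text: may be "1" or "GOLD"
    unit: Optional[str]             # None when <paramType> is empty
    threshold_pct: Optional[float]  # None when <threshold> is empty


def parse_qos(xml_text):
    """Parse a <QoSParams> document into a list of QoSItem records."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for node in root.iter("QoSItem"):
        unit = (node.findtext("paramType") or "").strip()
        threshold = (node.findtext("threshold") or "").strip()
        items.append(QoSItem(
            param=(node.findtext("param") or "").strip(),
            value=(node.findtext("paramValue") or "").strip(),
            unit=unit or None,
            threshold_pct=float(threshold.rstrip("%")) if threshold else None,
        ))
    return items


items = parse_qos(SAMPLE)
```

Values are kept as strings because the same field carries both numeric quantities and named network profiles such as `GOLD`.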
The EMS Negotiation service makes use of the QoS parameters to locate the most suitable, available service running within
the VO. More specifically, it delegates the task to the EMS Discovery service to see if there are any resources registered in
the central VO-wide Index service that meet the specified QoS
criteria. This is done by invoking the findSuitableResources operation exposed by the Discovery service and passing the QoS
parameters. If the query returns more than one BSs, then the
Discovery service finds the one that is closest to the client’s requirements. In case there are BSs with identical attributes, EMS
selects the cheapest one by default. After selecting a service
from the list, the Discovery service checks the network availability by invoking the QoS Broker service. In case the network
availability is not verified, EMS moves on to the second service
on the list and this sequence is repeated until the QoS Broker
is able to confirm that a candidate service is available over a
network that meets the client’s criteria. After a successful negotiation, EMS creates an NR. All information related to the
negotiation request and the discovered resources is stored in
the NR and its EPR is returned to the client. In case no match is
found, EMS returns a null EPR to the client, along with a list of
the available services and their affiliated QoS properties. Fig. 5
shows a high-level view of the interactions performed by the
EMS during this phase.
4 Each service registered in the EMS Index service bears an ID, specified by
the SP during the advertisement process.
5 This is a unique identifier used by the EMS system for authentication/authorization and accounting purposes.
Table 1
Example of the low level QoS parameters
<QoSParams>
<QoSItems>
<QoSItem>
<param>cpuSpeed</param>
<paramValue>3.5</paramValue>
<paramType>MHz</paramType>
<threshold>40%</threshold>
</QoSItem>
<QoSItem>
<param>diskSpace</param>
<paramValue>1</paramValue>
<paramType>GB</paramType>
<threshold>20%</threshold>
</QoSItem>
<QoSItem>
<param>memory</param>
<paramValue>1</paramValue>
<paramType>GB</paramType>
<threshold>50%</threshold>
</QoSItem>
<QoSItem>
<param>networkBandwidth</param>
<paramValue>GOLD</paramValue>
<paramType></paramType>
<threshold></threshold>
</QoSItem>
</QoSItems>
</QoSParams>
Fig. 5. Sequence diagram of the Discovery and Reservation phase.
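The selection logic of the negotiation phase can be illustrated with a minimal Python sketch. It is not the Akogrimo implementation: the `Candidate` type, the `distance` metric (sum of relative deviations) and the `network_ok` callback standing in for the QoS Broker are all hypothetical names introduced here, and the candidates are assumed to have already passed the Index service query with positive numeric QoS values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    """A Business Service (BS) entry as it might be returned by the Index query."""
    service_id: str
    qos: Dict[str, float]  # advertised numeric QoS values, e.g. {"memory": 1.0}
    price: float


def distance(requested: Dict[str, float], offered: Dict[str, float]) -> float:
    """Hypothetical closeness metric: sum of relative deviations per parameter.

    Assumes every requested value is positive and present in the offer.
    """
    return sum(abs(offered[name] - value) / value for name, value in requested.items())


def select_service(
    requested: Dict[str, float],
    candidates: List[Candidate],
    network_ok: Callable[[Candidate], bool],
) -> Optional[Candidate]:
    """Rank candidates by closeness (cheapest first on ties) and return the
    first one whose network availability is confirmed by the callback."""
    ranked = sorted(candidates, key=lambda c: (distance(requested, c.qos), c.price))
    for candidate in ranked:
        if network_ok(candidate):  # stands in for the QoS Broker check
            return candidate
    return None  # corresponds to returning a null EPR to the client
```

Sorting on the pair (distance, price) covers both rules from the text in one pass: the closest match comes first, and among services with identical attributes the cheapest one is tried first.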
7.2. Reservation phase
The phase begins with a client requesting advance
service and network reservation on the resources that were
discovered during the negotiation phase by invoking the
performAdvanceReservation operation of the Core service,
providing the EPR to the NR, which was obtained at the
end of the negotiation phase. The Core service delegates
this task to the EMS’ Reservation service by invoking the
reserveResources operation on the latter. As a first step, the
Reservation service locates the NR and retrieves all information
stored inside it. EMS contacts the SLA-Access service and
requests the SLA of the specific client. The SLA-Access
service queries its database and returns the QoS parameters
specified in the client’s SLA contract to the Reservation service.
Afterwards, the Reservation service creates an instance resource of the discovered
BS and manages its life cycle by using the start time and end
time defined in the SLA. In order for EMS to perform life
cycle management, each Business Resource (BR) exposes two
operations; one responsible for setting the start time and one
for setting the end time of the life of the resource. If the EMS
tries to invoke the BR earlier than the start time, it will receive a
message from the BR informing it of the exact date and time the
service will become available. If the client attempts to invoke
the BR after the expiration of the validity period of the affiliated
SLA, an error message stating that the requested BR no longer
exists will be returned.6 In case of failure during the creation or
the lifetime management of the BR, the Core service queries the
Discovery service for candidate services in different locations
and repeats the above sequence.
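The life cycle management described above, together with the use of time periods instead of date-time objects (see footnote 6), can be sketched as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the periods are expressed in seconds, and the "not yet available" reply reports the remaining waiting time rather than a calendar date.

```python
import time
from typing import Callable, Optional


class BusinessResource:
    """Toy Business Resource (BR) whose validity window is set via two operations."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self._end: Optional[float] = None

    def set_start_time(self, seconds_until_valid: float) -> None:
        # The caller passes a period, not a timestamp; the BR anchors it
        # to its own clock, so no clock synchronization is needed.
        self._start = self._clock() + seconds_until_valid

    def set_end_time(self, seconds_until_expiry: float) -> None:
        self._end = self._clock() + seconds_until_expiry

    def invoke(self) -> str:
        if self._start is None or self._end is None:
            raise RuntimeError("lifetime has not been configured")
        now = self._clock()
        if now < self._start:
            return f"not yet available: starts in {self._start - now:.0f} s"
        if now >= self._end:
            raise LookupError("the requested BR no longer exists")
        return "executing"
```

Injecting the clock as a parameter is only a convenience for testing; a real resource would rely on the local system clock of the hosting machine.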
Once the advance service reservation has been performed,
the Reservation service proceeds with the creation of an SLA-Controller, a Metering and a Monitoring resource. Afterwards,
it contacts the QoS Broker service and requests the reservation
of a network bundle using the bandwidth value that is defined
in the SLA.
At the last step of the Reservation phase, the Reservation
service creates an RR and stores in it all information related
to the reservation along with the EPRs to the resources that have
been created in previous steps. If the reservation is successful,
the EPR to the RR is returned to the client via the Core service.
In case of failure, a null EPR is returned to the client. Fig. 6
shows a high-level view of the interactions performed by the
EMS during this phase. This sequence diagram shows only
the basic behavior of the EMS during the Reservation phase.
Alternative sequences that may take place, for example rediscovery of a candidate location in case the already discovered
BS is not available, have been omitted for simplicity.
7.3. Execution and Monitoring phase
The Execution and Monitoring phase begins with a client
initiating the execution of a BR. The client invokes the
performExecution method on the Core service, providing the
EPR to the RR obtained at the end of the Reservation
phase and specifying the input needed for the execution of
the specific Business Service. The Core service invokes the
execute operation on the Execution service. The latter uses
the EPR to the RR to locate it and retrieve all information
related to the reservation. At the first step, the Execution
service uses the EPR to the SLA-Controller resource that was
created during the Reservation phase to locate and activate
the specific SLA-Controller resource. Having activated the
SLA-Controller resource, the Execution service activates the
Monitoring and Metering resources on the machine hosting the
requested service.
6 In order to avoid problems related to lack of time synchronization between
the machines that the services run on, the EMS does not pass on date-time
objects but rather, absolute time periods (until the SLA becomes valid and until
it stops being valid). The BR uses these two values to calculate the actual start
and termination times.
Once the set-up of the SLA-Enforcement and Monitoring
group of services has been completed, the Execution service
continues with the execution of the actual BR. During
the execution, the Metering service calculates performance
parameters and notifies the Monitoring service using a
WS-Notification mechanism established between these two
services. When the Monitoring service is activated by
the Execution service, it subscribes to specific notification
topics that the Metering Service exposes, representing the
performance parameters that are calculated. In order to
limit SOAP message traffic, notifications are generated
by the Metering service only if a significant change
(e.g., |previous CPU usage value − current CPU usage value| > 10%)
in the performance parameters has taken place. The
Monitoring service filters the notifications and sends them
to the SLA-Controller resource. The SLA-Controller resource
compares the values of the performance parameters that have
just been received with the thresholds specified by the client in
the SLA contract. If a violation is detected, the SLA-Controller
service notifies the SLA-Decider. The SLA-Decider “decides”
on the appropriate action that should be taken, depending
on the type of the violation and its affiliated parameters. In
case of severe SLA violations, the SLA-Decider notifies the
Execution service that the execution should be reallocated.
For this purpose a WS-Notification mechanism is established between the
Execution service and the SLA-Decider.
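The notification chain described in this paragraph can be sketched in a few lines of Python. The sketch makes several assumptions that are not taken from the implementation: plain callbacks replace WS-Notification, the class names `Meter` and `SlaController` are hypothetical, readings are percentages, and a violation is taken to mean that a value exceeds its threshold.

```python
from typing import Callable, Dict, Optional


class Meter:
    """Publishes a reading only when it has moved by more than `min_delta`
    percentage points since the last published value."""

    def __init__(self, publish: Callable[[str, float], None], min_delta: float = 10.0) -> None:
        self._publish = publish
        self._min_delta = min_delta
        self._last: Dict[str, float] = {}

    def sample(self, param: str, value: float) -> bool:
        previous: Optional[float] = self._last.get(param)
        if previous is not None and abs(previous - value) <= self._min_delta:
            return False  # insignificant change: suppress the notification
        self._last[param] = value
        self._publish(param, value)
        return True


class SlaController:
    """Compares notified values with the per-parameter thresholds of the SLA."""

    def __init__(
        self,
        thresholds: Dict[str, float],
        on_violation: Callable[[str, float], None],
    ) -> None:
        self._thresholds = thresholds
        self._on_violation = on_violation

    def notify(self, param: str, value: float) -> None:
        limit = self._thresholds.get(param)
        if limit is not None and value > limit:
            self._on_violation(param, value)  # hand over to the decider
```

Note that the meter compares each sample with the last published value rather than with the immediately preceding sample. This is a design choice of the sketch, made so that a slow drift in small steps is still reported once it accumulates.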
Once this type of notification is received, the Execution
service destroys the BR that produced this violation as well as
the Monitoring and Metering resources that are connected to
it. Following that, the Execution service contacts the Discovery
service and tries to find new available services in the Grid that
satisfy the client’s SLA. In case a match is found, the Execution
service updates the RR and repeats the steps described above.
After the successful completion of the execution of the BR, the
Execution service stops the execution of all related services,
destroys all affiliated resources and returns the result of the
service execution to the client. Fig. 7 shows a high-level view of
the interactions performed by the EMS during the Execution
and Monitoring phase.
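The reallocation behavior can be summarized as a retry loop. The following Python sketch is only an illustration: the reservation record is modeled as a dictionary, `discover` and `run` are hypothetical callbacks standing in for the Discovery service and the BR execution, and the `max_moves` bound is an assumption added here to guarantee termination.

```python
from typing import Any, Callable, Dict, List, Optional, Set


class SevereViolation(Exception):
    """Raised by the (hypothetical) run callback when reallocation is requested."""


def execute_with_reallocation(
    reservation: Dict[str, Any],
    discover: Callable[[Set[str]], Optional[str]],
    run: Callable[[str], str],
    max_moves: int = 3,
) -> Optional[str]:
    """Run at reservation['location']; on a severe violation look for another
    location, record it in the reservation record and try again."""
    rejected: Set[str] = set()
    history: List[str] = []
    location: Optional[str] = reservation["location"]
    result: Optional[str] = None
    for _ in range(max_moves + 1):
        history.append(location)
        try:
            result = run(location)
            break
        except SevereViolation:
            rejected.add(location)         # the old BR and its monitors are torn down
            location = discover(rejected)  # ask discovery for a new match
            if location is None:
                break                      # nothing left that satisfies the SLA
            reservation["location"] = location  # update the RR
    reservation["history"] = history
    return result
```

Keeping the set of rejected locations prevents the loop from bouncing back to a host that has just violated the SLA.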
8. EMS in Akogrimo testbed
The business services that can be supported by the EMS
might come from various scientific and business domains,
including, among others, bio-medics, construction, engineering
and finance. Within the Akogrimo project framework, the EMS
has been used to support a suite of e-health services that
comprise an e-Health application, offering to its clients a Heart
Monitoring and Emergency Response Process. The Akogrimo
demonstrator scenario consists of a university hospital, regional
hospitals, medical specialists, general practitioners, emergency
medical services and an emergency dispatch center establishing
a regional health network. The health network is headed by the
university hospital and provides telemedicine services to the
partners and the patients attended by partners of the network.
Fig. 6. Sequence diagram of the Reservation phase.
As depicted in Fig. 8, the EMS along with the SLA
Enforcement, Monitoring and e-Health services are positioned
in the Akogrimo Grid Infrastructure Services Layer which is
responsible for providing the core Grid functionalities. The
EMS acts as the central point of access to the Grid layer
functionalities, responsible for the coordination of all services
that reside in the Akogrimo Grid.
Fig. 8 shows a simplified high-level view of the
Akogrimo architecture [24]. Although a detailed description
would be out of the scope of this paper, for consistency reasons
it should be mentioned that in the Akogrimo environment,
each business service requested by a client is modeled as a
business process instance. The business process represents a set
of one or more linked procedures or activities that collectively
realize a business objective, normally within the context
of an organizational structure defining functional roles and
relationships. Behind the business process there are one or
more workflows representing the automation of the business
process. Each workflow coordinates and manages component
services or entities involved in the automation of the business
process.
Fig. 7. Sequence diagram for the Execution and Monitoring phase.
9. Experimental results
In this section, experimental results on the performance
of the EMS are presented. Two sets of experiments were
conducted. In the first one, we focused on the overall
performance of EMS. Time measurements were performed for
each of the three distinct phases of EMS, with and without
the activation of the SLA Enforcement/Monitoring services.
In the second one, the focus shifted towards the
EMS reallocation technique. The time measurements that were
performed in this set were strictly related to the reallocation
mechanism.
Fig. 8. EMS in the Akogrimo Grid Infrastructure.
Four PCs with identical technical characteristics (Pentium
V 1.7 GHz, 1 GB of memory, 10 Mb/s Ethernet segment)
were used during the experiments. Three out of the four
PCs were configured to behave as a GT4 cluster whereas the
remaining one served as a host for the SLA Enforcement group
services. All EMS services were deployed on the same PC (part
of the GT4 cluster) and ran under the same GT4 container.
The tests were performed against a WSRF service that was
developed on the GT4 platform. This was the single service
that was offered to the clients of the EMS and performed trivial
mathematical operations. This service along with the Metering
service was deployed into two of the three GT4 containers
in the cluster with the exception of the one hosting the
EMS.
In the first set of experiments, we measured the time
spent to perform negotiation, reservation and execution
with SLA Enforcement/Monitoring deactivated and activated
respectively. Since the key aim was to evaluate the overhead
that the SLA Enforcement group added to the EMS’ overall
performance, the thresholds in the SLA contract that was
used were given extremely low values to ensure that no
violation requiring reallocation would occur. Also, the EMS
was handling one execution at a time. A set of one hundred
tests was performed for each of the three phases.7 Table 2
shows the average results of the measurements in seconds.
As we can see, the SLA Enforcement adds a significant
overhead to the EMS performance, especially during the
execution phase. After further investigation, we came to the
conclusion that a significant percentage of the added overhead
is due to the relatively slow performance of the SLA-Enforcement group of services whereas the Monitoring group
of services has very quick responses, posing only trivial
overheads in the performance of the overall system. As the
workload on EMS and consequently on the SLA Enforcement
services increases, the overall performance of the system might
be considered too slow.
7 The containers were restarted after each test to ensure greater reliability.
Table 2
Results of the first experiments set (measured in secs)
SLA Enforcement/Monitoring   Negotiation   Reservation   Execution
Deactivated                  0.16          0.38          0.22
Activated                    0.25          0.75          1.03
Table 3
Results of the second experiments set (measured in secs)
# services   tdeact   tdisc   tactiv   tinit   ttotal
1            0.32     0.03    0.45     0.10    0.90
10           0.75     0.09    2.05     0.21    3.10
100          1.98     0.12    2.96     0.26    5.32
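The overhead reported in Table 2 can be quantified with a short calculation. The Python sketch below assumes that the six values of Table 2 are read phase by phase, i.e. (deactivated, activated) pairs of 0.16/0.25 for negotiation, 0.38/0.75 for reservation and 0.22/1.03 for execution; the helper name `overhead` is introduced here for illustration only.

```python
# Average times in seconds as read from Table 2: (deactivated, activated).
table2 = {
    "negotiation": (0.16, 0.25),
    "reservation": (0.38, 0.75),
    "execution": (0.22, 1.03),
}


def overhead(deactivated: float, activated: float) -> tuple:
    """Return (absolute overhead in seconds, slowdown factor), rounded to 2 digits."""
    return round(activated - deactivated, 2), round(activated / deactivated, 2)


overheads = {phase: overhead(*times) for phase, times in table2.items()}

for phase, (extra, factor) in overheads.items():
    print(f"{phase:<12} +{extra:.2f} s  x{factor:.2f}")
```

Under this reading the execution phase shows by far the largest slowdown (roughly 4.7 times), compared with about 1.6 times for negotiation and about 2 times for reservation, which agrees with the observation that the overhead is most pronounced during execution.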
In the second set of experiments, the time measurements
were strictly related to the generation of a reallocation
notification and the reallocation mechanism itself. In contrast
to the ones used for the previous tests, the thresholds defined
in the client’s SLA contract were given relatively high values
to ensure that reallocation would be required. Each time a
request for reallocation was generated, the EMS reallocated
execution between the two PCs in the cluster that hosted the
business service. Three sets of tests were performed, each
time specifying a different number of executions handled in
parallel. The number of services registered in the EMS Index
service remained the same throughout the tests, in order to
achieve a greater degree of independence from the inherent
functionality of GT4’s Index service, which is used by EMS
during the discovery process. Table 3 shows the results of the
measurements in seconds.
where:
tdeact = elapsed time between receiving a reallocation request and deactivating the SLA Enforcement and Monitoring group,
tdisc = time spent for the process of discovering a new location,
tactiv = time spent for the process of activating the SLA Enforcement and Monitoring group on the new location,
tinit = elapsed time between the creation of a duplicate business resource on the new location and its initiation,
and consequently:
ttotal = tdeact + tdisc + tactiv + tinit = total time the reallocation mechanism is active.
By comparing the results shown in Table 3, some
useful conclusions can be drawn. On the one hand, the
actions that are handled internally by EMS and do not require
any interaction with external-to-EMS services scale
relatively well when the workload increases. On the other hand,
actions such as the activation and deactivation of the SLA
Enforcement group, which involve a number of interactions with
services external to EMS, impose an increased overhead on the
overall EMS performance. Apart from the obvious solution of
improving the performance of the SLA Enforcement services,
and since the cost of discovery is but a fraction of the one related
to the activation/deactivation of the SLA Enforcement services,
another way to improve the overall performance of the
system is to introduce new instances of the SLA Enforcement
services within the Grid and perform a discovery process for
the SLA Enforcement services as well. This will allow for
better workload balancing within the Grid in terms of SLA
management and will eventually result in improved overall
performance.
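The figures of Table 3 can be checked against the definition of ttotal, and the relative weight of each component can be made explicit. The Python sketch below uses the values of Table 3 as reported; the helper name `breakdown` and the 0.005 s tolerance (half of the last reported digit) are assumptions of this illustration.

```python
# Rows of Table 3: first column -> (tdeact, tdisc, tactiv, tinit, ttotal) in seconds.
table3 = {
    1: (0.32, 0.03, 0.45, 0.10, 0.90),
    10: (0.75, 0.09, 2.05, 0.21, 3.10),
    100: (1.98, 0.12, 2.96, 0.26, 5.32),
}


def breakdown(row: tuple) -> dict:
    """Check ttotal against the sum of its parts and compute component shares."""
    tdeact, tdisc, tactiv, tinit, ttotal = row
    computed = tdeact + tdisc + tactiv + tinit
    return {
        "consistent": abs(computed - ttotal) < 0.005,
        "external_share": (tdeact + tactiv) / ttotal,  # SLA Enforcement (de)activation
        "discovery_share": tdisc / ttotal,
    }


summary = {n: breakdown(row) for n, row in table3.items()}

for n, info in summary.items():
    print(n, info["consistent"],
          f"{info['external_share']:.0%}", f"{info['discovery_share']:.0%}")
```

In all three rows the reported total equals the sum of the four components, activation plus deactivation accounts for more than 85% of the reallocation time, and discovery stays below 4%. This supports the argument that adding a discovery step for the SLA Enforcement services themselves would cost little compared with the potential gain.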
10. Conclusions and discussion
In this paper we have presented the architectural design
and the implementation of an OGSA compliant EMS that
enables the management and enforcement of SLA contracts in
the execution environment of OGSA-based Grids. In order to
achieve this, core components of the execution environment of
the Grid infrastructure establish and exploit synergies between
the various SLA related modules of the architecture. The
approach that is followed is based on a layered structure
of a Grid infrastructure which positions the Grid services
middleware in collaboration with both the application services
and the network services in order to provide an adjustable
QoS to the requests of the clients. The paper provides a
detailed description of the participating components and the
extensive set of exchanged messages and interfaces between
them. The interaction between the Execution Management
Services and the SLA enforcement mechanism is also presented
through three discrete phases: the discovery and negotiation
phase, the reservation phase for the resources that will be
deployed, and the execution and monitoring phase, in which a set
of services is activated in order to support and control the
execution in the Grid environment as specified.
The main contribution of this work lies in the design and
prototype implementation of the respective services in a manner
that enables the adaptive management of the Grid execution
based on contractual SLA terms. A WSRF-compliant EMS
has been implemented with the capability to enforce an
SLA-compliant execution of services while at the same time
monitoring any violations that occur and taking corrective actions.
The paper discusses the need for dynamically manageable SLAs in
Grids and formulates the Grid middleware participation in
conjunction with other services (application, business flow,
other network middleware) in a seamless and modular
approach. The current work, through its implementation,
provides the feasibility framework for structuring Grid services
for commercial exploitation through service providers and
telecom operators.
Finally, this work opens the discussion on a variety of issues
for future research in the domain of Grid services and
SLAs. An interesting direction is to deploy other commercial
applications with complex and extensive offered services and
business models, which would complicate the set
of negotiated terms and the way their violation is monitored.
Since failures to meet the clients’ requirements are to be expected in
such a dynamic environment, it is also interesting to show how
these failures affect the efficiency of the Grid infrastructure
in terms of deviation from deadlines. We can assume
that a failure to successfully execute a user’s job results in a
cost for the system. This cost may be due to the consumption
of resources or the price (penalty) that has to be paid back to
the user due to deadline and SLA violation. It is an interesting
topic to examine how a viable economic model for estimating
this cost could be developed in such a dynamic environment.
The idea of preserving the QoS attributes of the system adds
an important aspect to various Business Models, because
dynamic SLA mechanisms can make dynamic
workflow techniques feasible.
Acknowledgments
The work has been partially supported by the Akogrimo
Integrated Project (FP6-2003-IST-004293). The authors would
like to thank the members of the consortium for their collaboration, and especially Jesus Movilla, Francesco D’Andria, Nuno
Inacio and Nadia Romano.
References
[1] I. Foster, What is the Grid? A three point checklist, GRID Today (2002).
[2] A. Litke, D. Skoutas, T. Varvarigou, Mobile Grid computing: Changes
and challenges of resource management in a mobile Grid environment,
in: “Access to Knowledge through the Grid in a Mobile World” Workshop,
held in conjunction with 5th Int. Conf. on Practical Aspects of Knowledge
Management, PAKM 2004, Vienna. Available at: www.mobilegrids.org/
docs/pakm2004/papers/3 Mobile-Grid-Computing.pdf.
[3] Open Grid Services Architecture. http://www.globus.org/ogsa/.
[4] OGSA Working Group. http://forge.gridforum.org/projects/ogsa-wg.
[5] Access to knowledge through the Grid in a mobile world (AKOGRIMO)
integrated project FP6-2003-IST-004293. http://www.akogrimo.org/.
[6] R. Al-Ali, O. Rana, D. Walker, G-QoSM: Grid service discovery using
QoS properties, Grid Computing, Computing and Informatics Journal 21
(2002) (special issue).
[7] A. Keller, H. Ludwig, The WSLA framework: Specifying and monitoring
service level agreements for web services, E-Business Management,
Journal of Network and Systems Management 11 (2003) (special issue).
[8] I. Foster, C. Kesselman, C. Lee, R. Lindell, K. Nahrstedt, A. Roy,
A distributed resource management architecture that supports advance
reservations and co-allocation, in: International Workshop on Quality of
Service, 1999.
[9] C. Bouras, M. Campanella, et al., QoS and SLA aspects across multiple
management domains: The SEQUIN approach, Future Generation
Computer Systems 19 (2003).
[10] M. Surridge, St. Taylor, D. De Roure, Ed Zaluska, Experiences with GRIA
— Industrial applications on a Web Services Grid, in: Proceedings of the
First International Conference on e-Science and Grid Computing, 2005,
pp. 98–105.
[11] GRASP. http://eu-grasp.net/english/default.htm.
[12] UNICORE. www.unigrids.org.
[13] A. Streit, D. Erwin, Th. Lippert, D. Mallmann, R. Menday, M.
Rambadt, M. Riedel, M. Romberg, B. Schuller, Ph. Wieder, UNICORE
— From project results to production Grids, in: L. Grandinetti (Ed.),
Grid Computing: The New Frontiers of High Performance Processing,
in: Advances in Parallel Computing, vol. 14, Elsevier, 2005.
[14] M. Riedel, V. Sander, P. Wieder, J. Shan, Web services agreement-based
resource negotiations in UNICORE. http://whitepapers.silicon.com/0,
39024759,60142543p,00.htm, 2005.
[15] Open Grid Forum. http://www.ogf.org.
[16] Web Services Resource Framework. http://www-106.ibm.com/
developerworks/library/ws-resource/.
[17] OASIS Web Services Resource Framework (WSRF). http://www.
oasis-open.org.
[18] Globus Toolkit version 4. http://www.globus.org/toolkit/.
[19] GT 4.0 WS-GRAM. http://www-unix.globus.org/toolkit/docs/4.0/
execution/wsgram/.
[20] Information Services (MDS): Key concepts. http://www.globus.org/
toolkit/docs/4.0/info/key-index.html.
[21] Akogrimo Deliverable D4.3.1, Architecture of the infrastructure services
layer V1. Publicly available at: http://www.akogrimo.org/download/
Deliverables/Akogrimo D431 Version1 Final.pdf.
[22] Akogrimo Deliverable D4.1.1, Mobile network architecture design &
implementation. Publicly available at: http://www.akogrimo.org/download/
Deliverables/D4.1.1.pdf.
[23] WSRF.net. http://www.cs.virginia.edu/∼gsw2c/wsrf.net.html.
[24] Akogrimo Deliverable D3.1.3, Overall architecture definition and layer
integration. https://bscw.hlrs.de/bscw/bscw.cgi/0/109536.
Dr. Antonios Litke received his diploma from the
Dept. of Computer Engineering and Informatics of the
University of Patras, Greece in 1999, and his Ph.D.
from Electrical and Computer Engineering Department
of National Technical University of Athens in 2006.
Currently, he is working in the Telecommunication
Laboratory of Electrical and Computer Engineering
Department of National Technical University of Athens
as a researcher participating in numerous EU and
nationally funded projects. His research interests include
Grid computing, resource management in heterogeneous systems, Web services
and information engineering.
Kleopatra Konstanteli was born in Athens, Greece
in 1981. She received her M.Sc. in Electrical and
Computer Engineering in 2004 from the National
Technical University of Athens. She is currently
pursuing her doctoral-level research in computer
science while at the same time working as a research
associate in the Telecommunications Laboratory of
Electrical and Computer Engineering of NTUA and
participating in EU-funded projects. Her research
interests are mainly in the field of Grid computing.
Vassiliki Andronikou obtained her M.Sc. in Electrical
and Computer Engineering in 2004 from the National
Technical University of Athens. She is a Ph.D. candidate in the same department and is currently working as a Research Associate in the Telecommunications
Laboratory of Electrical and Computer Engineering
of NTUA participating in EU-funded projects, such
as AKOGRIMO, BEinGRID, POLYMNIA and FIDIS.
Her research interests are mainly in the field of Grid
computing, biometrics and identity-based profiling.
Sotirios Chatzis was born in Athens in 1982. He
received his M.Sc. (5-year diploma) from the Electrical
and Computer Engineering Department of the National
Technical University of Athens in 2005, majoring in
Computer Science. Currently, he is pursuing his Ph.D. in
the Electrical and Computer Engineering Department
of the National Technical University of Athens. His
major research interests are Artificial Intelligence,
Machine Learning, Pattern Analysis and Recognition
and their applications in Computer Vision tasks.
Theodora Varvarigou received the B.Tech degree
from the National Technical University of Athens,
Athens, Greece in 1988, the M.S. degrees in Electrical
Engineering (1989) and in Computer Science (1991)
from Stanford University, Stanford, California, and
the Ph.D. degree from Stanford University as well
in 1991. She worked at AT&T Bell Labs, Holmdel,
New Jersey between 1991 and 1995. Between 1995
and 1997 she worked as an Assistant Professor at the
Technical University of Crete, Chania, Greece. Since
1997 she has been working as an Associate Professor at the National Technical
University of Athens. Her research interests include Grid Technologies,
parallel algorithms and architectures, fault-tolerant computation, optimisation
algorithms and content management.