SLA Management Handbook: Concepts and Principles
Volume 2
Release 2.5
GB 917-2
Executive Summary
The objective of the SLA Management Handbook series is to assist two parties in
developing a Service Level Agreement (SLA) by providing a practical view of the
fundamental issues. The parties may be an “end” Customer, i.e., an Enterprise,
and a Service Provider (SP) or two Service Providers. In the latter case one
Service Provider acts as a Customer buying services from the other Service
Provider. For example, one provider may supply network operations services to
the provider that supplies leased line services to its customers. These relationships
are described as the Customer-SP interface and the SP-SP interface.
The perspective of the SLA Management Handbook series is that the end
Customer, i.e., an Enterprise, develops its telecommunication service
requirements based on its Business Applications. These requirements are
presented to a Service Provider and the two parties begin negotiating the specific
set of SLA parameters and parameter values that best serves both parties. For
the SP, the agreed-upon SLA requirements flow down through its organization and
become the basis for its internal management and control of its Quality of Service
(QoS) processes. For the Enterprise Customer, the SLA requirements serve as a
foundation or a component of its internal network services or business services.
This volume of the Handbook contains three tools that provide the foundation for
clarifying management roles, processes, responsibilities and expectations. These
are the Key Quality Indicator Methodology, the Life Cycle of the Service and the
SLA Parameter Framework.
The move towards service-focused management leads to a requirement for a new
'breed' of indicators that are focused on service quality rather than network
performance. These new indicators, or Key Quality Indicators (KQIs), provide a
measurement of a specific aspect of the performance of the product, product
components (e.g., services) or service elements, and draw their data from a number
of sources including the Key Performance Indicators (KPIs).
A service and its associated SLA are divided into six Life Cycle Stages to clarify
the roles of the Customer and the SP. The six Life Cycle Stages are as follows:
product/service development, negotiation and sales, implementation, execution,
assessment and decommissioning. Each life cycle stage addresses specific
operations processes in the enhanced Telecom Operations Map (eTOM) [GB
912]. The SLA Life Cycle provides a complete process description by delineating
interactions between well-defined stages.
Many performance parameters exist that have similar names yet have drastically
different definitions. The SLA Parameter Framework is a useful tool for
categorizing parameters. The framework organizes SLA parameters into six
categories based upon service and delivery technology and upon measures of
individual instance and average performance. The specification of specific values
for service performance parameters is part of a specific contract negotiation and is
beyond the scope of the Handbook.
The SLA Management Handbook series incorporates earlier work that appears in
the Performance Reporting Concepts and Definitions Document [TMF 701], in the
Service Provider to Customer Performance Reporting Business Agreement [NMF
503] and in the Service Quality Management Business Agreement [TMF 506].
1 Introduction
The interest in Service Level Agreements (SLA) has significantly increased due to
the changes taking place in the telecommunications industry. The liberalization of
the telecommunications market has been an important event leading to change
and competition. SLAs represent one of the responses to this newly competitive
environment where a variety of providers enter the marketplace and compete for
Customers. New entrants that are striving to gain market share and to establish
themselves as viable suppliers can use SLAs to provide one means of attracting
Customers. By committing to provide specified levels of service with compensation
or administrative responses if such commitments are not met, a new Service
Provider (SP) can begin to establish its credibility. Likewise, incumbent SPs are
increasingly offering SLAs in response.
Enterprise IT managers, who often commit to internal SLAs with their business
units, take steps to ensure that their infrastructure and staff can meet these
internal SLAs. The IT staff will typically use the SLA
commitments from their SPs when planning the growth and evolution of their own
systems. SLAs are also advantageous for smaller organizations that do not have
an IT department and networking specialists. In these cases, SLAs can provide the
needed performance assurance.
All SPs are seeking new ways to distinguish the quality of their services from those
of their competitors. The use of SLAs provides an excellent mechanism to achieve
this goal. However, SPs frequently encounter difficulties in preparing and
managing SLAs. For example, since SLAs have developed in an ad hoc way,
there may not be a set of mutually understood SLA terms to use when creating a
Customer-specific agreement. In addition, many of the commonly used
performance parameters focus on network and network element performance,
whereas a Customer is concerned with assessing service performance levels. All
of these factors require that network-related performance measures be mapped
into service level metrics that are relevant to the service being provided and that
best reflect the Customer’s service expectations.
When multiple SPs are involved in providing a service to an end Customer, the
value chains become more complex. SLAs need to account for this value chain so
that the end Customer can be given the required level of service by its SP.
Consequently, all SPs supporting a service require a common understanding of
the service performance requirements and must follow a consistent approach to
SLA management in order to support the commitments to the end Customer.
The methodology and tools defined in this document have been developed to
support a customer-centric approach. One key element is that not all of the factors
required to measure the customer's experience can be derived from network
component data. Indeed, the perception that customers have of a service is not
solely based on traditional quantitative measures related to network performance,
but may include more qualitative factors, for example, billing accuracy and the
perceived timeliness of problem resolution. This expanded view of customer
satisfaction provides a foundation for service modeling that can account for metrics
beyond traditional network performance measures.
The methodology and tools herein can be used to manage service quality
throughout the Customer Experience Lifecycle. This means managing service
quality across the whole lifecycle, not only the in-use phase: point of sale,
provisioning, the in-use phase and service cessation are all included. It should
also be noted that the in-use phase includes service components such as
customer services and billing.
The customer-centric approach described herein recognizes the need for
extending service quality management beyond individual services to the product
layer of the business. The basis for this is that the products the customer buys are
typically composed of one or more services, and it is this collection of services that
is the subject of SLAs.
This document accounts for the fact that in real business situations there exists the
need for end-to-end service management especially when part of the service
delivery chain is outside of the customer-facing service provider's own business.
This is a situation that is becoming more prevalent in the industry with the
expansion of data services that rely on external content providers.
Scope
Volume 2 of the SLA Handbook series addresses SLA Principles. It is part of the
TeleManagement Forum’s (TMF) four volume series on SLA Management. Figure
1-2 describes the relationship among the four volumes. Annexes B through D
contain the Executive Summaries from the other three volumes. The SLA
Handbook series was cooperatively developed by the TMF and the Open Group.
[Figure 1-2: Relationships among the four volumes of the SLA Management
Handbook, including GB 917 Volume 2 (Concepts and Principles) and GB 917
Volume 4 (Enterprise View Through SLA Negotiations), which expands on and
applies the concepts of Volume 2.]
This volume defines Service Level Agreements (SLA), the circumstances under
which they are used, and their relationship to other service and business
constructs. It contains a model for telecommunications services that is useful for
SLA development. The important concepts of Service Availability (SA) and
Emergency Telecommunications Service (ETS) are specified in terms of this
service model. The service reporting process is addressed in detail. Finally, this
volume contains assistance in structuring the complex process of SLA
management by providing three tools, viz., the Key Quality Indicator (KQI)
methodology, the six-stage Service Life Cycle and the SLA Parameter Framework.
These tools are the key technical contributions of Volume 2.
The move towards service-focused management leads to a requirement for a new
'breed' of indicators that are focused on service quality rather than network
performance. These new indicators, or Key Quality Indicators (KQIs), provide a
measurement of a specific aspect of the performance of the product, product
components (e.g., services) or service elements, and draw their data from a number
of sources including the KPIs.
The development of an SLA must consider the complete life cycle of a service. Six
life cycle stages are identified in this volume. They are product/service
development, negotiation and sales, implementation, execution, assessment and
decommissioning. When the Customer-Provider interactions address each of
these stages, the resulting agreements will be better aligned and the relationship
enhanced. The term Customer refers to the business that consumes a service. A
Customer may also be another SP. An end user is considered to be an individual
within the Customer organization (see Note 1).
Many performance parameters exist that have similar names yet have drastically
different definitions. The SLA Parameter Framework is a useful tool for
categorizing parameters. The framework organizes SLA parameters into six
categories based upon service and delivery technology and upon measures of
individual instance and average performance. The specification of specific values
for service performance parameters is part of a specific contract negotiation and is
beyond the scope of the Handbook.
This volume of the Handbook addresses SLA Management from the perspective
of the Customer - SP interface by focusing on service level and service quality
issues. It does not prescribe SLA management from a network operations or
service implementation perspective. An SLA between a Customer and a SP can
refer to one or more services that the Customer is ordering from the SP. It is not
necessarily restricted to one service or even one type of service.
Note 1: The main focus of this Handbook is on business and government Customers as opposed to SPs as customers of other
SPs.
Agreement on terms and definitions for service performance parameters and their
measurement and reporting is a key aspect for constructing and managing SLAs.
Annex A of this document contains definitions for the key terms used in SLA
specifications.
Apart from functionality and price, service quality is an increasingly important factor
for Customers in the competitive telecommunication market. Some would say that
service performance is now the prime factor in selecting telecom services,
particularly for business Customers. This Handbook primarily addresses service
performance related SLA parameters. There are numerous business parameters
contained in a contract that are not covered by this Handbook. For example, while
a rebate process is described in terms of the parameters that may drive it,
parameters for payment and terms are not included herein.
Handbook Benefits
The SLA Handbook provides valuable information to three classes of readers, viz.,
Service Providers, Service Customers, and Telecommunications and Data
Equipment and Software Providers. Specific benefits to these readers are
discussed in the following sections.
1.1.1 Service Providers
This Handbook will assist SPs in developing new services with associated SLA
parameters, aligning SLA parameters to meet Customer requirements and an
SP’s internal processes, assigning internal processes according to SLAs, and
responding to SLA requirements from other SPs.
This Handbook will assist Service Providers to quantify Service Levels (and
therefore the SLA) of any new service in terms of Key Performance Indicators
(KPIs) and Key Quality Indicators (KQIs). Additionally, by defining the level of
service that the SLA provides, the SP is provided with the means to develop a
mechanism for managing the customer’s expectation.
1.1.2 Customers
This Handbook will assist Customers in negotiating an SLA contract that satisfies
their application requirements and in understanding the possibilities, limitations and
expectations related to performance and cost of a service. The SLA parameter
framework documents the needs of individual customers.
The customer will benefit from an SLA that reflects the service as the customer
sees it, rather than in terms of the underlying network technology.
1.1.3 Equipment And Software Vendors
The Handbook helps Vendors, SPs and Customers create uniform cost-effective
SLA Management solutions. Such solutions consist of products and services
needed to satisfy requirements. Equipment and software vendors will gain insights
into their clients' needs through the Volume 3 examples of services to be managed
and the performance parameters to be measured, reported and processed, and
through the SLA Application Notes document series.
The handbook will help vendors of OSS management solutions to understand the
value of a uniform method of determining KPIs and KQIs.
Notes For Readers
This volume restricts the term enterprise to business entities that do not provide
Information and Communications services to the public.
Many of the figures in this volume use the Unified Modeling Language (UML)
notation. As this usage is limited to simple cases, familiarity with UML is not a
prerequisite for reading this volume. However, UML knowledgeable readers may
gain a deeper understanding of the information contained in the figures.
Overview
The following paragraphs briefly describe the objectives and content of the
remaining Chapters of Volume 2 of the Handbook.
Business drivers for SLA management are identified and business benefits for
both Customers and SPs are described in this chapter.
This chapter introduces a conceptual service model that is used to identify service
components and to specify the role of such components within a SP’s service
offering to Customers. It provides examples of separating service offerings into
elementary building blocks, i.e., elementary services and service access points.
The examples also illustrate the relationships between these building blocks.
The chapter also summarizes the various network and service performance
concepts used in the industry. It then partitions service performance measures into
Level Of Service (LoS) and Quality Of Service (QoS) parameters and relates these
two concepts.
This Chapter also introduces Key Performance Indicators (KPIs) and Key Quality
Indicators (KQIs). KPIs are inherently network-based and provide little direct
indication of the end-to-end service performance. The service focus leads to a
requirement for new indicators that capture service quality rather than network
performance. These indicators are the KQIs. The KQIs provide a measure of a
specific aspect of a service and are based on a number of sources including the
KPIs.
The six-stage Service Life Cycle and the SLA Parameter Framework are
introduced and explained in this chapter. These two tools provide assistance in
structuring the complex process of SLA specification, negotiation, and
management.
When developing an SLA, consideration must be given to the complete service life
cycle as the various life cycle stages may affect SLA requirements. For example,
different combinations of processes are required to support the six phases of the
SLA life cycle. Different SLA parameters and values may apply during
product/service development, negotiation and sales, implementation, execution,
assessment, and decommissioning. The relevant use cases supporting SLA
management are described and related to the enhanced Telecom Operations Map
(eTOM) processes.
The SLA Parameter Framework is used to organize the large number of SLA
parameters that are in use in the industry. Specifically, the framework defines six
SLA parameters categories. These categories provide direction for developing the
SLA specification. The Framework is the basis for a structured process for SLA
negotiation that can reduce the time required to reach a mutually satisfactory
contract.
Along one dimension, the framework places parameters into one of three groups:
a parameter may be technology specific, service specific, or technology and
service independent. Some services may contain both technology-specific and
service-specific parameters; some may contain only one or the other. Many
services contain technology- and service-independent parameters.
Certain SLA parameters may be of more interest to the SP than the Customer and
vice versa. This framework provides an organized approach to jointly defining
pertinent SLA parameters and ultimately their values.
This chapter briefly discusses the relationships between the work on SLA
Management and the TMF's New Generation Operations Systems and Software
(NGOSS), enhanced Telecom Operations Map (eTOM), and Shared Information and
Data (SID) initiatives.
Annex A
Agreement on terms and definitions for service performance parameters, and their
measurement and reporting, is required for constructing and managing SLAs.
Apart from functionality and price, service quality is an increasingly important factor
for Customers in the competitive telecommunication market. This Annex describes
SLA and QoS parameters, their classification and use.
Annexes B - D
These annexes contain the Executive Summaries from the other three volumes in
the SLA Management Handbook.
Annex E
This appendix provides an overview of work being undertaken on SLA and QoS
issues by various standards bodies and other organizations.
Annex F
This appendix reviews the data sources available to Service Providers that may be
used for performance reporting.
Annex G
Annex H
Annex I
Annex J
Annex K
This annex provides a detailed breakdown of the derivation and application of
service-based measurements based on the eTOM framework.
2 Business Considerations
Introduction
Government and Corporate Customers are seeking SLAs that offer more
extensive verification and analysis of the performance they receive from their
various telecommunications services. In addition to requirements such as service
accessibility and availability during periods of network congestion, the ability to
monitor services on an increasingly granular level is imperative. For example, to
efficiently manage their information and communications costs, these Customers
need information on service access rates during emergency conditions, the
number of successfully delivered packets in a given time period, the amount of
time a user was connected to a server, notification of what traffic type is using the
most bandwidth, what percentage of the traffic is going to a specific sub-network or
server, etc.
These issues were identified by previous work of the TMF Service Quality
Management (SQM) Project Team and others such as ETSI in [ETR 003].
It is recognized by this analysis and others such as ETSI ETR 003 that the
computation of service quality is based on network-related parameters and
non-network-related parameters, for example, help desk answer time.
While it is well understood that network-related parameters are based on
performance usage and trends from Network Data Management, it is not
yet known how to furnish the non-network-related parameters (criteria).
It concludes that additional network parameters together with new parameters for
service measurements need to be identified to capture a more accurate picture of
the customer's perception.
SLAs can cover many aspects of the relationship between the Customer and the
Service Provider such as service performance, billing, provisioning etc. Service
performance reports normally use the SLA as the specification for the data to be
provided to the Customer. Service parameters that are not explicitly included in the
SLA may be provided via the SP’s standard reports.
SLAs, also known as Service Level Guarantees (SLGs), are an excellent tool for
establishing a Customer and Service Provider business relationship. A well-crafted
SLA sets expectations for all elements of the service to which it refers and provides
a basis for managing these expectations. It helps the Service Provider assess the
value of operational changes and of improved internal measurement and reporting
procedures. It also provides trending data, promotes improved Customer relations
and provides a vehicle for a SP to differentiate itself from its competitors.
The SLA Handbook makes use of the TM Forum’s enhanced Telecom Operations
Map (eTOM) [GB 912] business relationship model. The SLA management
aspects of this model are briefly reviewed in the following paragraphs.
Figure 2-2 presents an example of the roles and relationships involved in a value
network that is providing service products to the customer. Note that the figure
shows only those relationships that are relevant to the value network. It is very
likely, for example, that the Customer also has a relationship with Hardware,
Software and Solution Vendors, etc. Since such a relationship is not part of the value
network, it is out of scope for the eTOM Business Relationship Context Model.
However, for those involved in supplying a service product to a Customer, the
relationships to the Hardware, Software, Solution, etc., Suppliers are relevant since
these relationships are involved in supplying a product to a Customer.
Figure 2-2 depicts the value network from the perspective of the customer-facing
service provider at the core of the value network. It explicitly includes the service
provider’s use of the eTOM Business Process Framework. Other parties may or
may not use the eTOM Business Process Framework. Note that the Handbook is
primarily focused on the SLA between the highlighted Service Provider (SP) and
the Government – Business Service Customer.
Figure 2-2 reflects the type of context that occurs in government and commercial
business environments where service relationships evolve constantly and where
these relationships need to be flexibly and rapidly reconfigured. The roles are
intended to represent the kind of parties that might be present in such an
environment. The model is generic and adaptable to many different contexts
without attempting to reflect all possible lower level details.
In any value chain comprising a number of trading partners, the main challenge is
to establish an effective end to end set of processes that deliver to the end user
and customer a seamless service that is indistinguishable from the same service
provided by a single supplier.
[Figure 2-2: eTOM Business Relationship Context Model. A customer-facing
Service Provider, through its Customer Relationship Management (CRM) and
Supplier/Partner Relationship Management (S/PRM) processes, interacts with its
Customer and with Third Party Service Providers, each pair acting in provider and
customer roles.]
Business Benefits
The customer has a holistic view of the product based on a wide range of
interactions with the service provider. The combined perception of these
interactions across the whole product life cycle determines the customer's
perception of total service quality.
A consistent approach to SLAs and SLA management will provide the following
benefits to service Customers. It:
1) Helps Customers develop the baseline, establish the requirements,
customize an SLA contract, validate the ongoing performance compliance
to end-to-end service level objectives, and review and refine the SLA as
business needs evolve.
2) Helps Customers establish parameters, measurement methods, reports
and exception handling procedures.
3) Helps Customers define high-level common terms and definitions for end-
to-end service performance in the form of network technology independent
performance parameters and reports. These include items such as mean
time to provision, mean time to identify, repair and resolve malfunction,
service availability, end-to-end throughput, delays and errors.
4) Helps Customers evaluate the relationship between the technology-
specific service and network parameters and the technology/service-
independent parameters of the SLA. This includes considering how, within
each of the multiple SP administrative domains, these parameters capture
or approximate the performance perceived by the Customer.
5) Helps Customers validate the performance of the service as defined in the
SLA by receiving scheduled and on-exception reports. This includes a
SP’s responses to performance inquiries. These reports include
notifications of SLA violations, of any developing capacity problems, and of
changes in usage patterns. The reports enable the Customer to compare
the delivered performance as defined in the SLA to its perception of the
service’s performance.
A consistent approach to SLAs and SLA management will provide the following
benefits to Service Providers. It:
1) Helps introduce operational changes, improve internal measurements and
reporting, enrich Customer relations and differentiate the SP from its
competitors.
2) Helps create more knowledgeable Customers who can better express their
needs to the SP, reducing the time devoted to the negotiating process.
3) Helps create a common language and understanding with the Customer
on characterizing network and operational parameters.
4) Helps create SP internal recognition of the Customer’s perception of
network errors and service interruptions.
5) Helps prioritize service improvement opportunities.
6) Helps create common performance goals across multiple technology
domains.
7) Helps standardize performance-gathering practices across multiple internal
domains.
A consistent approach to SLAs and SLA management will provide the following
benefits to hardware and software suppliers. It:
1) Helps suppliers understand Customer and SP requirements for SLAs.
2) Helps equipment suppliers agree on the mapping of technology-specific
parameters and measurement methods into service-specific parameters
and measurement methods.
3) Helps software suppliers agree on common interface definitions for SLA
management.
An SLA should contain values for the Level of Service and the Quality of Service
parameters that the Service Provider commits to deliver. The parameters defined
in the SLA should be measurable. The methodology or process used for
quantifying a specific performance parameter should be clearly described. The
delivered performance should be periodically, e.g., annually, reviewed with the
Customer. Based on these reviews, the Customer, if specified in the SLA, may
have the right to terminate the service contract, receive compensation for missed
commitments, require the SP to initiate administrative corrective actions related to
the delivered service, or pay an incentive for exceeding committed performance.
There are many mature documents and tools that address Performance Reporting
for individual network elements and transport technology connections. However,
the Handbook series is the first open document to address end-to-end service
performance issues.
Note 2: See Chapter 3 for a detailed discussion of service parameters.
It is important that Customers and Service Providers both work together to define
the common service performance metrics that can be measured and collected by
the network providers and made available to the Customers. A major problem is
that a single Service Provider often cannot control end-to-end performance
because multiple providers are involved in supplying the service. In order to verify
or monitor the delivered performance, it is important to identify measurable
network-based parameters and to implement an effective measurement scheme
that is capable of assessing SLA compliance.
The measurement methodology for a service performance metric should have the property that it is repeatable; i.e.
if the methodology is used multiple times under identical conditions, it should result
in consistent measurements (though not necessarily the same values since a
metric may vary with respect to time). For a given set of performance metrics, a
number of distinct measurement methodologies may exist. Typical schemes
include direct measurement of a performance metric using injected test traffic,
calculation of a metric from lower-level measurements, and estimation of a
constituent metric from a set of more aggregated measurements. The
measurement methodologies should strive to minimize and quantify measurement
uncertainties and errors.
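To make repeatability concrete, the following minimal Python sketch (illustrative only; the metric, the crude percentile rule and the sample values are hypothetical) summarizes repeated one-way delay measurements and quantifies the uncertainty of the mean via the standard error:

import statistics

def summarize_delay_samples(samples_ms):
    # Repeatability: re-running the same methodology under identical
    # conditions should yield consistent summaries, even though the
    # individual samples vary over time.
    n = len(samples_ms)
    mean = statistics.mean(samples_ms)
    stdev = statistics.stdev(samples_ms)
    std_error = stdev / (n ** 0.5)  # uncertainty of the mean estimate
    p95 = sorted(samples_ms)[max(0, int(0.95 * n) - 1)]  # crude 95th percentile
    return {"mean_ms": mean, "p95_ms": p95, "std_error_ms": std_error}

# Hypothetical one-way delay samples (ms) from injected test traffic.
print(summarize_delay_samples([41.2, 39.8, 44.1, 40.5, 42.3, 43.0, 39.9, 41.7]))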
Anticipating Change
Since it is highly likely that a Customer’s service requirements will change during
the period of the service contract, some method for managing change should be
specified in the contract. An SLA provision should be made for liaison and dispute
resolution and for opportunities to resolve service performance shortfalls using
formal notification processes, responses, and verification. The service contract
should contain penalty clauses or administrative remedial actions to be taken for a
failure to meet the SLA performance commitments as well as a provision for
cancellation fees or compensation or incentive payments for exceeding
performance targets. The expected performance levels and SP-Customer
responsibilities must be clearly defined to avoid misunderstandings. However, a
service contract and its associated SLA should not be so thorough in specifying
the performance parameters that it unnecessarily complicates the contracting
process. It is very important that the Service Provider has the capability to detect
any degradation in service performance, alert the Customer, and respond to
performance events affecting the Customer.
3 Telecommunications Services
Service Characterization
Figure 3-1 illustrates the relationship between service functions and service
resources. Service functions are composed of three distinct types of elemental
functions, viz., Primary Functions, Enabling Functions, and Operations,
Administration, and Maintenance (OAM) Functions. As shown in Figure 3-1,
service resources enable service functions. The three fundamental types of service
resources are hardware/software resources, staff resources, and intellectual
property/licenses resources.
Figure 3-2 illustrates the Enterprise view of the relationships among Enterprise
Business Applications, Enterprise Business Services and Network Services. The
Enterprise View is described in detail in Volume 4 of the Handbook. Volume 4 also
contains additional information on deriving Network Service performance
requirements from Business Application requirements.
The Enterprise’s external network services are obtained from Information and
Communications Service Providers (ICSP). The Enterprise may also outsource
various Business Services and Business Applications. See Volume 4 for details.
Service Elements
Each Service Element (SE) may be realized from network or service resources at the disposal of the
Customer facing Service Provider (Service Provider 1 in Figure 3-3), or may be
provided using services obtained by Service Provider 1 from other Service
Providers (e.g. Service Provider 2 in Figure 3-3). In order to support the service
commitments given to the Customer in the SLA, corresponding commitments are
required from the Service Providers contributing SEs used to construct the service.
Note 3: See Section 0 for a discussion of Service Availability.
Note 4: See Section 0 for a discussion of SAPs and SAP Groups.
The “integrating” Service Provider (Service Provider 1) must then ensure that the
negotiated SE commitments are sufficient to support the commitments made to
the Customer for the overall service. This will require that the overall commitment
can be computed from the individual SE commitments, e.g., in the simplest case,
the commitments are of the same kind. It also requires that the values of the
service parameters assigned to the SEs are adequate to ensure that the SLA
commitments to the Customer are in fact achievable.
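As a worked illustration of this check, the sketch below assumes the simplest composition rule: the SEs are chained in series with independent failures, so the end-to-end availability is the product of the SE availability commitments. The figures and the composition rule are illustrative assumptions, not prescriptions of the Handbook.

def end_to_end_availability(se_availabilities):
    # Series composition with independent failures: the service is up
    # only when every SE is up, so availabilities multiply.
    result = 1.0
    for availability in se_availabilities:
        result *= availability
    return result

# Service Provider 1 integrates its own SE with one bought from
# Service Provider 2 (hypothetical commitments).
se_commitments = [0.999, 0.9995]
overall = end_to_end_availability(se_commitments)
print(f"End-to-end availability: {overall:.5f}")
print("99.9% SLA commitment achievable:", overall >= 0.999)

With these figures the computed end-to-end availability (about 99.85%) would not support a 99.9% commitment to the Customer, illustrating why the negotiated SE values must be verified against the overall commitment.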
The Customer may use the interface for the control and management of its
communication services. Since many Customers conduct business in a multiple
service, multiple Service Provider environment, Customers require their Service
Providers to supply service information at the Customer contact point in a
common, understandable manner.
3.1.1 Definition
The SLA applies to the service and its associated service elements that are provided at the specified
SAPs.
The SAP is the concept used to distinguish the boundary between the Customer
domain and the Service Provider domain.
The Service Provider delivers a contracted service to the SAP(s). Each service is
associated with at least one SAP. A SAP may only be associated with one service.
The actual location of the SAP can depend on the ownership of the CPE. In the
case of a fully “managed service,” the CPE is owned and maintained by the
Service Provider. Here the SAP is on the Customer side of the CPE. For a leased
line service, the CPE may be owned and maintained by the Customer or by a third
party. In this situation, the SAP is effectively on the network side of the CPE. A
third case, such as the interface between two networks, is also possible. In this
instance the SAP, sometimes called a Network Access Point (NAP) or a Point Of
Presence (POP), is between two Service Providers where one party is effectively a
Service Provider for the other.
Figure 3-6 illustrates another service implementation. Note that in this example the
Service Provider’s responsibility may, where permitted by law, include all or parts
of the CPE, the access lines and the Provider’s backbone network/service
platform.
The ownership of the CPE is often a regulatory issue. In some countries, the
Service Provider is not permitted to own and operate electronic equipment on the
Customer premises. In other countries, where the Service Provider is a legal
monopoly, the CPE may be provided, owned and operated by the Service
Provider.
Performance Perspectives
The six service performance factors illustrated in Figure 3-7 are the main
contributors to the overall service performance perceived by a telecommunication
service user. These factors are high level concepts that are in turn characterized
by many parameters. See ITU-T Recommendations E.800 and E.801 for additional
discussion of these performance factors and their parameters.
The service performance factors are defined in the following paragraphs. See
Section 0 for a discussion of the relationship between performance factors and
Quality of Service (QoS).
Performance events are effectively instantaneous phenomena that occur within the
network supporting a service or within the service’s general environment that affect
the properties of the delivered service. Examples of events are Errored Seconds
(ESs), Severely Errored Seconds (SESs), Severely Errored Periods (SEPs), and
lost or misinserted cells.
Note 5: Time to first yield is defined as the time interval between initiating service and the first reportable service-impacting
event.
Note 6: See Section 3.8 for the E.800 definition of QoS.
As shown in Figure 3-8 the values of the GoS parameters are allocated to network
components in the Grade of Service Tasks. These values are then used in the
Traffic Control and Dimensioning Tasks to determine the required capacity of the
various network components / network elements. Typical GoS parameters include
parameters such as calling rate, call set-up delay, answer-signal delay, and
internal and external blocking.
Two classes of GoS objectives are commonly used. The first class contains the
accessibility objectives, i.e., the call and connection objectives. In this case, GoS
objectives are governed mainly by the End-to-end Connection Blocking Probability
(ECBP). The second class contains the retainability and integrity objectives, i.e.,
the information transfer objectives. For ATM, these are the maximum queuing
delay (defined as a remote quantile of the cell delay distribution) and the mean
queuing delay, both based on Cell Transfer Delay (CTD) and the Cell Delay
Variation (CDV), and the Cell Loss Ratio (CLR).
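As an illustration of how accessibility objectives drive dimensioning, the sketch below uses the standard Erlang B recursion (a textbook formula, not defined in this Handbook; the load and blocking target are hypothetical) to find the smallest trunk group meeting a blocking objective:

def erlang_b(offered_erlangs, circuits):
    # Standard recursion: B(0) = 1; B(n) = A*B(n-1) / (n + A*B(n-1)).
    blocking = 1.0
    for n in range(1, circuits + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking

# Dimensioning: smallest trunk group meeting a 1% blocking target
# for a hypothetical offered load of 20 erlangs.
target, offered = 0.01, 20.0
circuits = 1
while erlang_b(offered, circuits) > target:
    circuits += 1
print(f"{circuits} circuits give blocking {erlang_b(offered, circuits):.4f}")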
The move towards service-focused management leads to a requirement for a new
'breed' of indicators that capture service quality rather than network performance.
These new indicators, or Key Quality Indicators (KQIs), provide a measurement of a
specific aspect of the performance of the product, product components (e.g.,
services) or service elements, and draw their data from a number of sources
including the KPIs.
Two main types of KQI need to be considered. At the highest level, a KQI or
group of KQIs are required to monitor the quality of the product offered to the end-
user. These KQIs will often form part of the contractual SLA between the provider
and the customer. Product KQIs will derive some of their data from the lower level
Service KQIs, with the latter focused on monitoring the performance of individual
product components. In its simplest form a KQI may have one single KPI as its
data source. More commonly a Service KQI will aggregate multiple KPIs to
calculate the service element quality and Product KQIs will aggregate multiple
Service KQIs.
[Figure 3-9: Key Indicator Hierarchy. Customer SLAs are supported by Product
KQIs, which draw on Service KQIs plus additional data; internal and
Supplier/Partner SLAs for product components are supported by Service KQIs,
which draw on KPIs plus additional data; KPIs are derived from performance-related
data for service resources (grouped by type), collected from service resource
instances, e.g., network elements.]
To summarise, starting from the network elements, the network performance data
is aggregated to provide KPIs that are an indication of service resource
performance. The KPIs are then used to produce the Service KQIs that are the
key indicators of the service element performance. Service KQIs are then used as
the primary input for management of internal or supplier / partner SLAs that
calculate actual service delivery quality against design targets or in the case of
supplier / partner, contractual agreements. Service KQIs provide the main source
of data for the Product KQIs that are required to manage product quality and
support the contractual SLAs with the customer.
Figure 3-9 illustrates the mapping of the key indicators onto the service
decomposition described in the TM Forum SLA Handbook.
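The direction of this data flow can be sketched in a few lines of Python. The KPI names, aggregation rules and weights below are purely hypothetical; the point is only that KPIs roll up into Service KQIs, which roll up, together with additional non-network data, into Product KQIs:

# Hypothetical KPIs, aggregated per service resource type.
kpis = {
    "radio_access": {"setup_success": 0.985, "drop_rate": 0.012},
    "core_network": {"setup_success": 0.998, "drop_rate": 0.003},
}

def service_kqi(kpi_group):
    # Service KQI for one service element (illustrative rule:
    # setup success discounted by the drop rate).
    return kpi_group["setup_success"] * (1.0 - kpi_group["drop_rate"])

service_kqis = {name: service_kqi(group) for name, group in kpis.items()}

def product_kqi(service_kqis, billing_accuracy):
    # Product KQI: Service KQIs combined with additional non-network
    # data (here a hypothetical billing-accuracy figure).
    network_quality = min(service_kqis.values())  # worst service element
    return 0.8 * network_quality + 0.2 * billing_accuracy

print(f"Product KQI: {product_kqi(service_kqis, billing_accuracy=0.997):.4f}")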
This section describes how the indicators in the measurement hierarchy are
derived and how they are used to manage the various SLAs that the operator
needs to manage.
Figure 3-10 shows, at a high level, the steps that are necessary to identify the
correct quality indicators to support the products and services. It also illustrates
how they relate to the Customer, Product, Supplier / Partner and Internal SLAs.
The diagram uses solid arrows to show the definition of the key indicators and SLA
templates and uses dashed arrows to show the flow of the resultant generated
indicators.
[Figure 3-10: Defining key quality indicators and SLA templates. Product KQIs,
Service KQIs and measurable KPIs are defined hierarchically, together with
templates for individual Customer SLAs, Product SLAs, Internal SLAs and
Supplier/Partner SLAs.]
The process starts with the ‘birth’ of the product concept and flows through the
design of the commercial product, the decomposition of that product into the
individual services and down to the service resources themselves. At each stage
of the total process, the key quality indicators necessary to manage quality at that
level and the template for the appropriate SLA are defined in a hierarchical
manner. The methodology ensures that ultimately all of the measurements
necessary for the management of the various SLAs are defined at the service
resource level.
Quality Of Service
Factors that are independent of network performance include, for example, the
time to provide a service and the response times and competence of the customer
service centre.
Network-dependent factors for a mobile service may include Core Network
performance, Radio Access performance, Call Detail Records, transmission
system performance data, Value Added Systems performance data, network
probes (e.g., signaling data), drive survey information, planned works information,
service fault reports and customer reports.
In order to see the limitations of the E.800 definition, the next two sections discuss
user perception and customer satisfaction.
At the heart of a Service Provider’s success is rapid response to the service needs
of the customer. As the market develops the key objectives are ‘more for less’ --
faster service introduction, improved quality of service at a lower cost.
The factors that influence this perception will include at least the following:
1) Pre-sale material; the initial expectation of the service on which the initial
decision to purchase or participate was made
2) Billing accuracy
3) Security of data and transactions; m-commerce and privacy considerations
4) Price sensitivity; perceived value
5) Network quality; this is increasingly seen as 'a given', but requires
significant engineering to achieve.
6) Experience of the customer-supplier relationship; all stages of the
customer lifecycle, through provisioning and problem handling, where
players in the supply chain may be third parties with complex
interrelationships.
There are several conceptual difficulties in assessing user perception of service
performance. Measurements made by a SP of network level performance may not
capture the user perception of the performance of a service. Additionally, with user
perception there is the question of whether a user’s perception of QoS is relative or
absolute. For example, is the user’s perception of a data service over a WAN the
same as for a data service over a LAN, of a voice service over an IP-based
network the same as over a PSTN, or of a video service over a
telecommunications network the same as for broadcast TV?
For voice services, characterizing user perception may involve specifying and
measuring performance both objectively using measuring devices, and
subjectively using test subjects. Objective measurement often involves In-service
Non-intrusive Measuring Devices (INMDs) that measure call clarity, noise and
echo. Analysis and interpretation of the results obtained is also specified.
Subjective methods often use standardized signal generators and human listeners
from whom Mean Opinion Scores (MOS) are derived.
However, to date the main service parameters used in SLAs for voice services are
the accessibility of the service, i.e., hours of operation and the availability of dial
tone during this period, and the accuracy of billing. There have been no guarantees
as such on call completion rates or noise levels, echo, delay, and distortion.
However, networks have traditionally been engineered to deliver acceptable
performance under a broad range of offered traffic and network states. By keeping
toll circuit delay below 100 milliseconds, using echo cancellers, and maintaining
low error rates on digital connections, voice telephony performance is normally not
an issue. Another network metric frequently used is the measure of Customer
Affecting Incidents in terms of Blocking Defects Per Million Attempts.
For data services, new users often do not understand error conditions, and
perfection is expected. Higher level protocols have made LANs appear “flawless,”
and modern physical layer network transport makes this almost true. Experienced
users want WANs to behave like LANs. User expectations on response times to
query/request type messages escalate as technology improves.
Other real-time services such as video (MPEG and broadcast TV), sound program
and CD music are much more demanding than voice services. For IP-based
networks, a given level of IP transport performance results in a level of user-
perceived audio/video performance that depends in part on the effectiveness of
the methods used to overcome transport problems. Bit errors on a packet-based
network generally are either corrected at a lower functional layer, or result in
packet loss. Packet loss requires the receiving application to compensate for lost
packets in a fashion that conceals errors to the maximum possible extent. For data
and control, retransmission at the transport layer is used. For audio and video,
retransmission approaches have not been used.
The devices attached to the network and people such as Customer Care
representatives involved in delivering the service have a big impact on user
perceptions. Increasingly, high levels of performance at acceptable cost will be the
differentiating factor between SPs in a competitive service provision environment.
With respect to estimating customer satisfaction, the well known “Pareto 80/20”
rule often applies, i.e., 80% of the problems can be identified from 20% of the
measures. For example, 80% of Customer complaints might refer to a small
number of parameters such as unexpectedly long delays for new installations,
excessive service repair and restoration times, poor support information and
service, high congestion in the busy hour, etc. One challenge is to identify
“Customer-sensitive measures” and particularly the values that satisfy the
Customer, which may vary from one target group (market segment) to another.
The usual method of measuring Customer satisfaction is using a survey, the value
of which is naturally subject to the type of questions asked, and the quality of the
data obtained from them. A full survey covers many aspects of the
telecommunication business such as Pre-sales, Post-sales, Development,
QoS/NP, Fraud and Security, etc. In constructing and managing SLAs, some
means of measuring Customer satisfaction should be implemented. Cultural
issues and human factors need to be taken into account. Customer satisfaction
with, or perception of, a given service or selected parameters can be assessed, for
example, by the Mean Opinion Score (MOS) rating method on a five-point scale:
bad (1), poor (2), fair (3), good (4) and excellent (5).
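A minimal sketch of turning survey responses into a MOS figure on this five-point scale (the responses shown are hypothetical):

MOS_LABELS = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}

def mean_opinion_score(ratings):
    # MOS is the arithmetic mean of the individual 1-5 opinion ratings.
    return sum(ratings) / len(ratings)

survey_responses = [4, 3, 5, 4, 4, 2, 5, 3]  # hypothetical survey responses
mos = mean_opinion_score(survey_responses)
print(f"MOS = {mos:.2f} ({MOS_LABELS[round(mos)]})")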
Figure 3-11 is a class diagram that shows the four principal uses of QoS concepts.
The following sections will discuss these areas and provide the background and
justification for the definition of QoS used in the Handbook.
Figure 3-12 illustrates the interrelationships between the various viewpoints of
QoS.
[Figure 3-12: QoS viewpoints. The Customer's QoS Requirements drive the QoS
Offered by the Service Provider; network performance objectives and
measurements, together with non-network-related criteria, link the QoS Achieved
to the QoS Perceived by the Customer, with feedback between the viewpoints.]
The QoS requirements used for Traffic Engineering applications are high level
specifications that represent composite satisfaction levels for a broad set of
services and users. These requirements are established by a SP based on a
combination of its business objectives, expected service demand, current state of
its network, its financial condition, etc. This set of QoS requirements is not within
the scope of the Handbook.
The E.800 QoS definition quoted in Section 0 does not provide guidance for
specifying quantifiable QoS targets. To address this need, ITU-T Recommendation
E.860 has modified the E.800 definition as follows: quality of service is "the degree
of conformance of the service delivered to a user by a provider with an agreement
between them."
The SLA Handbook uses the E.860 QoS definition. See Section 3.1.15 for a
discussion of how this concept of QoS is used for performance specifications in
SLAs.
The delivered QoS may or may not meet the commitments contained in the SLA.
Typically, the SLA will specify actions to be taken in these cases. Possible actions
include administrative changes by the SP to improve the service, payment of
penalties for not meeting quality objectives, or the payment of incentives for
exceeding quality objectives.
Figure 3-13 describes the performance aspects of a generic service. The white
boxes in the figure represent abstract classes or service templates whereas the
shaded boxes represent specific instances of the service. The white boxes can be
thought of as containing lists of parameter names, whereas the shaded boxes
contain these same lists but also associate a value with each of the items in the
lists.
Qualitative factors are the non-measurable aspects of a service. Examples are the
manners of service installers, the style used in correspondence with the Customer,
sensitivity to a Customer’s personality, etc. Although possibly important to
Customer retention, such factors are not elements of SLAs.
The LoS specification is the set of parameters that specify how the service will
function. Examples include payload rate, error rate, delay, etc.
The QoS specification is a set of parameters that specify how much variation in the
LoS parameters the Customer could experience. That is, the QoS parameters
provide bounds for the delivered LoS parameters. If the delivered LoS lies within
the QoS bounds, the Service Provider has met its SLA quality commitments. If the
delivered LoS falls outside of the QoS bounds, an SLA violation has occurred.
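This relationship can be expressed directly in code. In the minimal sketch below, the parameter names and bounds are hypothetical; a delivered LoS value falling outside its QoS bound is flagged as an SLA violation:

# QoS specification: (lower bound, upper bound) per LoS parameter.
qos_bounds = {
    "payload_rate_mbps": (9.5, None),   # at least 9.5 Mbps
    "error_rate": (None, 1e-6),         # at most 1 error per 10^6
    "delay_ms": (None, 50.0),           # at most 50 ms
}

def sla_violations(delivered_los, bounds):
    # A violation occurs when a delivered LoS value falls outside
    # the QoS bounds agreed in the SLA.
    violations = []
    for name, (low, high) in bounds.items():
        value = delivered_los[name]
        if (low is not None and value < low) or (high is not None and value > high):
            violations.append(name)
    return violations

delivered = {"payload_rate_mbps": 9.8, "error_rate": 3e-6, "delay_ms": 41.0}
print("SLA violations:", sla_violations(delivered, qos_bounds))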
The delivered QoS values provide a measure of how well the delivered service
matches the contracted service. As discussed in Section 0, service performance is
characterised by the combined aspects of service support, operability,
accessibility, retainability, integrity, and security performance. QoS and LoS
objectives may be set for each of these service performance components.
Figure 3-14 provides a conceptual view of the main factors contributing to service
performance. Each of the applicable factors associated with a given service would
be specified in an SLA. The parameters chosen as contributors to the QoS factors
may be service specific, technology specific, or service and technology
independent parameters. The parameters selected are those that are fundamental
to the service and that affect the Customer’s experiences.
Service Availability
Feedback from the above-mentioned interview process as well as TMF members’
experience shows that “Service Availability” is the key parameter of interest to
Customers. Although standard industry definitions exist for network and network
element availability, cf., Figure 3-7 and Sections 3.1.6 and 3.1.7, service availability
has no generally agreed technical definition. This leads to misunderstandings and
Customer dissatisfaction. For example, if the Customer believes that "availability"
means their application is running without a problem, and the Service Provider
uses the same term to mean that the service is working (even if impaired), then a
mismatch in expectations is inevitable. Similarly, if a Service Provider contracts
with other providers for components of a service, the lack of common performance
terms makes it virtually impossible to construct a picture of end-to-end
performance.
To address this, the Handbook draws on ITU-T Recommendation E.800 and its
related, less well known term serveability. The Handbook series defines service
availability as E.800 serveability.
Figure 3-15 shows the relationship between service availability and the service
performance factors described in Figure 3-7 and in Section 3.1.6. The relative
importance of Accessibility Performance, Retainability Performance, and Integrity
Performance to Service Availability must be determined on a case-by-case basis.
Service Unavailability (SUA) is computed as the ratio of accumulated outage time
to activity time:
SUA% = (Σ Outage Interval / Activity Time) × 100%
This basic formula treats every outage as a complete loss of service, even when
the service is only degraded.
In order to address this issue, a Service Degradation Factor (SDF) can be used in
the SUA calculation, weighting each outage interval by the degree of degradation:
SUA% = (Σ (SDF × Outage Interval) / Activity Time) × 100%
A list of SDF values with the corresponding event type can be defined in the SLA.
This procedure characterises the service in the SLA and can be stated according
to the Customer’s business need. A possible list of SDFs is shown in Table 3-3.
Table 3-4 presents the relationship between Billing, Availability, and Degraded
Performance. SDF enables the definition and orderly application of measures for
degraded performance.
Each SAP may be assigned a weight that indicates its importance to the
Customer. A service may be provided at many SAPs, not all of which are weighted
equally. For example, a service consisting of a server site connected to multiple
client sites is available at a number of SAPs, i.e., one for the server and one for
each client. In this example, the weighting of the server SAP may be higher than
all of its client SAPs, indicating that a problem with the server SAP has a more
significant impact on the service than a problem at any of the client SAPs.
If the Customer’s business need requires a weighting of SAPs, the SUA% formula can be extended with a SAP weighting factor, weighting both the outage and the activity terms of each SAP by its SAP Weight (SW):

SUA% = (Σ over SAPs [SW × Σ (SDF × Outage Interval)] / Σ over SAPs [SW × Activity Time]) × 100%
In those cases where the SLA is based on SAP Cover Time rather than full time (SAP Activity Time), the same formula may be used with SAP Cover Time (SCT) substituted for SAP Activity Time.
Table 3-5 provides the essential SAP-related attributes and parameters that may
be used in SLAs.
Reporting Period: the period over which Performance Reports are generated. It is defined independently for each SAP Group within the SLA. (Source: Ordering or an SLA template)

SAP Activity Interval: a SAP Activity Interval (SAI) represents the duration of a specific active period (i.e. when the Customer requires service from the SAP) within a specified Reporting Period. It must be measured separately for each active period of every SAP within a SAP Group. (Source: Ordering or an SLA template)

SAP Activity Time: SAP Activity Time (SAT) represents the total duration (sum) of all SAP Activity Intervals of a specific SAP within a defined Reporting Period. (Source: Ordering or an SLA template)

SAP Cover Time: SAP Cover Time (SCT) represents the interval for which a Service Provider is responsible for the agreed level of service at the SAP. (Source: Ordering or an SLA template)

SAP Start Time: SAP Start Time (SST) represents a SAP Activity Interval starting time. (Source: Ordering or an SLA template)

SAP End Time: SAP End Time (SET) represents the time at the end of a SAP Activity Interval. (Source: Ordering or an SLA template)

SAP Outage Interval: SAP Outage Interval (SOI) represents the duration of a specific outage period within a defined Reporting Period. An outage period is a period of service unavailability (e.g. due to a fault) which occurs during a SAP Activity Interval for a given SAP. Outages due to causes excluded by the associated SLA (e.g. force majeure) are not included in this interval. An SOI is measured for each outage period for every SAP within a SAP Group. (Source: Service Problem Resolution)

SAP Outage Time: SAP Outage Time (SOT) represents the duration (sum) of all SAP Outage Intervals for a specific SAP within a defined Reporting Period. (Source: Service Problem Resolution; exclusion conditions from an SLA template)

SAP Weight: SAP Weight (SW) is a number that represents the relative priority attached to a SAP and influences the relative significance placed on the SAP by the Service Provider (and eventually the Customer). It is defined independently for each SAP within an SLA. (Source: Ordering or an SLA template)

Fault Report Timestamp: Fault Report Timestamp (FRT) represents the time recorded by a service management agent when a fault is reported to it by an external actor (e.g. Customer, Management System, etc.). (Source: Service Problem Resolution)

Fault Fixed Timestamp: Fault Fixed Timestamp (FFT) represents the time recorded by a service management agent when a fault is reported as resolved. (Source: Service Problem Resolution)
One of the main difficulties with the Service Availability formula is identifying the
appropriate information sources for all of the required data elements. These data
elements are:
1) SAP Outage Interval,
2) SAP Activity Time,
3) SAP Cover Time,
4) SAP Weighting,
5) SDF.
While SAP weights and SDFs can be found in an SLA, event related information
such as Outage Intervals and Activity Times must be obtained from Network
Element Managers (NEM) and/or Trouble Ticketing Systems (TTS). A Service
Provider will need both types of systems to manage the services provided to its
Customers. Usually these two systems operate independently. Therefore it is likely
that the Service Availability and other performance metrics computed from data in
one system will be different when data from the other system is used. This is a
potential source of confusion unless consistent Customer reporting schemes are
established.
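To make the preceding availability formulas concrete, the following is a minimal Python sketch of the weighted SUA% calculation. The record layout, field names, and sample figures are illustrative assumptions only; in practice the outage and activity data would come from the NEM and TTS sources discussed above.

    # Minimal sketch of a weighted Service Unavailability (SUA%) calculation.
    # Data structures and values are illustrative assumptions; real outage and
    # activity data would come from Network Element Managers and/or Trouble
    # Ticketing Systems.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class SapRecord:
        weight: float                     # SAP Weight (SW) from the SLA
        activity_time: float              # SAP Activity Time (SAT), minutes
        # Each outage: (SAP Outage Interval in minutes, Service Degradation Factor)
        outages: List[Tuple[float, float]] = field(default_factory=list)

    def weighted_sua_percent(saps: List[SapRecord]) -> float:
        """SUA% = sum(SW * SDF * SOI) / sum(SW * SAT) * 100."""
        weighted_outage = sum(s.weight * sdf * soi
                              for s in saps
                              for soi, sdf in s.outages)
        weighted_activity = sum(s.weight * s.activity_time for s in saps)
        return 100.0 * weighted_outage / weighted_activity

    # Example: one server SAP (weight 10) and two client SAPs (weight 1),
    # over a one-week (10,080 minute) activity time.
    saps = [
        SapRecord(weight=10, activity_time=10_080, outages=[(30, 1.0)]),
        SapRecord(weight=1,  activity_time=10_080, outages=[(120, 0.5)]),
        SapRecord(weight=1,  activity_time=10_080),
    ]
    print(f"SUA = {weighted_sua_percent(saps):.3f}%")

Substituting SAP Cover Time for SAP Activity Time in the denominator yields the cover-time variant of the formula.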
When possible, it is useful to map services onto the ISO/OSI seven layer protocol
model. One example is a Service Provider to Network Provider SLA, where the
Service Provider buys transmission facilities and capacity (trunks) to build its own
network. Another example is a Customer to Service Provider SLA, where the
Customer out-sources its inter-company communication needs to the Service
Provider. In the first example, the service maps to the OSI layer 1 and in the
second example, to the application level, i.e., layer 7.
When dealing with lower layer services as defined in the ISO/OSI layer model, the
underlying technology determines the choice of the data source for the Service
Availability calculation. In this case, the specific Network Management System and
the parameters to use depend on the service, i.e., PL-LL, ATM, Frame Relay etc.
In the case of upper layer services it is not possible to refer to a specific delivery
technology, since the Customer is not interested in how the Service Provider
delivers the service, as long as it is available. As a guiding principle, it can be
assumed that the final goal of Performance Reporting is to state the provided level
of service as close as possible to the Customer’s perception.
Service Availability values (SA%) are based on business requirements and should
be specified in the SLA. Factors that influence the SA% value include:
1) Type of service (e.g. PL-LL, PVC, SVC),
2) Specific protocols used (e.g. X.25, Frame Relay, ATM, etc.),
3) Network/service configuration (e.g. single point of failure vs. redundancy,
reliable network components, etc.),
4) Service Provider’s policy.
The following are some specific SA parameters that may be included in SLAs and
factors to consider when specifying these parameters.
1) Specification of a service applicability or service cover time, e.g., 08:00 -
18:00 Monday to Friday inclusive; 24 hours Monday to Sunday, should be
included in the SLA.
2) End-to-end Service Availability measurements/calculations should not be a
summarization of network element performance measures. To reduce the
complexity and to shield the single network element performance values,
Performance Reporting can use Trouble Ticket information to determine
and calculate the end-to-end Service Availability.
3) Service Availability measurements and calculations should also reflect
degraded service / degraded performance. Service degradation and
service availability are two related but in some cases separate matters. A
Service Provider may establish criteria and performance credits for service
degradation. Service degradation criteria relate to the service offered to the
Customer.
4) Where applicable, bandwidth availability should be specified in an SLA.
When specified, it should be measured and reported.
5) SAP availability should be calculated on a SAP by SAP basis to assist
reporting granularity. If this is not done, it is possible, in a large population
and/or over a long reporting period, to have a major failure at a single SAP
but report the service as exceeding performance requirements. For
example, global indicators should be reported, but individual failures should
also be noted. SLA commitments at an individual SAP must be explicitly set.
These values are statistically lower than Overall Service Availability.
6) Service Availability calculations must account for the fact that a Customer
may add and/or delete SAPs during a reporting period. One of the
following options can be used in these cases:
a) The average number of SAPs or a cut off date for determining the
number of SAPs may be used in the calculation. In either case, the
mechanism must be agreeable to both parties and noted in an
attachment to the appropriate reports.
b) The service availability formula can take into account the time in which
every single SAP has been active within the reporting period.
The need for international standardization of the priority use of services during
emergencies has been recognized in ITU-T Recommendation E.106, Description
Of An International Emergency Preference Scheme (IEPS). E.106 focuses on
voice band services. More recently, Recommendation Y.1271 (Note 7), Framework(s) On
Network Requirements And Capabilities To Support Emergency Communications
Over Evolving Circuit-Switched And Packet-Switched Networks, presents an
overview of the basic requirements, features, and concepts for emergency
communications that evolving networks are capable of providing.
Note 7: Y.1271 is expected to be approved at the February 2004 Meeting of ITU-T Study Group 13.
Priority service treatment requires that ETS users be identifiable, e.g., by the use of
end user identification and authentication methods or by using specified access
mechanisms. The service provider may support this service by employing
signaling and control protocols that identify the invocation of ETS, by exempting
the ETS from restrictive management controls and applying expansive
management controls, or by invoking a special emergency operating mode.
Table 3-6 contains a high level description of ETS Functional Requirements. See
[Y.1271] and [TR-79] for additional information on a generic description of ETS
Functional Requirements.
ETS as defined in the previous sections uses the term service in a different sense
than that used in the Handbook. In Handbook terms, ETS is a collection of
performance parameters that can be associated with specific telecommunications
services. For example, GETS is a priority PSTN voice band service. Applying the
ETS requirements to a video service would be an example of a priority video
service.
Table 3-7 illustrates a mapping of the ETS requirements into the six service
performance factors defined in Section 3.1.6.
[Table 3-7: ETS requirements mapped onto the service performance factors: 1. Priority Treatment, 2. Secure, 3. Location, 4. Restorability, 5. Network Connectivity, 6. Interoperable, 7. Mobile, 8. Ubiquitous, 9. Survivable, 10. Voice, 11. Scaleable, 12. Reliability.]
This section addresses a number of service performance topics that are of high
interest to many service customers.
Time to Restore Service (TTRS) represents the time interval between the Fault
Report Time (FRT) and Service Restoration Time (SRT), i.e., the actual time
interval required to restore the service. Some Service Providers use the time when
the fault occurred, as opposed to when it is reported, to compute the TTRS. Other Service Providers use the time a trouble ticket is opened, but require that the fault condition be confirmed before opening the trouble ticket.
ITU-T Recommendation E.800 defines Time to Restore as the time interval during
which an item is in a down state due to a failure. A time diagram in Figure 3-16
illustrates this definition. Note that “failure” is not the same as “fault.” For example, a service could be said to have failed due to agreed maintenance actions, such as a scheduled maintenance window, even though no fault is present.
It should be noted that there is a difference between Time to Repair (TTR) and
Time to Restore Service (TTRS). TTRS is only completed when the repair is
verified. The relationships are found in Figure 3-16 for the case of manual service
restoration. Other cases of services protected by other means may result in
different relationships.
The layered service concept illustrated in Figure 3-3 can be used to define Service
Provisioning as the implementation of different Service Elements, within or external
to the Service Provider’s domain, in order to offer a certain service at the Service
Provider to Customer interface, i.e., the SAP or SAP group.
Another useful concept is the SAP life cycle, or more precisely the SAP Activity
Interval (SAI). This time interval begins when a defined service is accepted, i.e.,
considered ready for use, and available to the Customer even if service is not
guaranteed within a specified Service Cover Time. The beginning of the SAI,
which may be defined as SAP Start Time (SST), is indirectly used in determination
of outages and Service Availability. The ordering process creates this information.
The sequence diagram for this process is shown in Figure 3-17.
When a service order is confirmed, i.e., the Order Confirmation Time (OCT), both
the Customer and the Service Provider agree on service qualities, quantities and
configuration. The Service Provider must commit to provide the defined service by
a specified time, the Committed Time to Provide Service (CTPS).
Typically a Service Provider maintains a list of the CTPS for the various service
classes it offers. The CTPS is thus based on the:
1) Service Type, e.g., data, voice, application etc.,
2) Implementation Type, e.g., change request, configuration management
etc.,
3) Location Type, e.g., domestic, international, urban, rural, etc.
A CTPS may also be negotiated and defined on a SAP-by-SAP basis depending
on the Customer’s needs and the Service Provider’s resources and processes.
The Ordering Process initiates the service implementation phases. This includes
Service and Network Element Configuration, Resource Activation, Network
Element Internal Tests, a Bringing-Into-Service (BIS) Test, and a Customer
Acceptance Test. The latter test is considered to be the end of the Ordering
Process. The SAP Start Time (SST) or Ready For Service (RFS) time is
considered to be the completion time of a successful Customer Acceptance Test.
In many Service Provider organizations, this represents the time when billing for the service commences.
∆T is defined as the difference between the SST and the CTPS (∆T = SST − CTPS) for each SAP. See Figure 3-18 for an illustration of this definition.
Obviously the target value for ∆T is 0. If ∆T is greater than zero, then this information must be provided to the billing process to determine if rebates or other actions are applicable. (See the following paragraphs for exceptions.) If ∆T is negative, the service fee is charged only if the Customer agrees to early service activation.
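As an illustration of this sign convention, the following sketch evaluates ∆T for a single SAP and flags the billing consequence. The function name, dates, and result wording are assumptions for the example, not prescribed behaviour.

    # Sketch: evaluating delta T = SST - CTPS for one SAP. The rebate and
    # early-activation rules are placeholders; actual rules come from the SLA.
    from datetime import datetime

    def provisioning_delta(ctps: datetime, sst: datetime,
                           early_activation_agreed: bool) -> str:
        delta = sst - ctps                    # target value is zero
        if delta.total_seconds() > 0:
            return f"Late by {delta}: notify billing for possible rebate"
        if delta.total_seconds() < 0 and not early_activation_agreed:
            return f"Early by {-delta}: no service fee before the CTPS date"
        return "On time (or agreed early activation): normal billing applies"

    print(provisioning_delta(datetime(2004, 3, 1), datetime(2004, 3, 3),
                             early_activation_agreed=False))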
There are supporting network technology-specific issues that affect the delay
experienced at the service level. For example, ATM transport will affect the delay
experienced at the service level especially if the contracted service is a pure ATM
cell delivery service. In IP services it is known that delay may become a critical
factor for real-time services such as voice or video and for command/response
type services.
Delay has an impact on the perceived QoS at the application layer versus
delivered QoS at the network layer. Response Delay could be seen as Round Trip
Delay (RTD) from SAP to SAP excluding the application response function. This is
a technology-dependent parameter. For example, one could ping a network
access router to obtain just the Network RTD or could ping a host within a private
network and obtain information that includes congestion performance of the private
network and of applications running on the host. Measuring true end-to-end delay
at the application level is rather difficult and depends on the service type, the
supporting network technology and the desired measurement resolution and
accuracy. It may require synchronization of the two measuring devices at both
ends of the service connection or flow.
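As a simple illustration of the round trip measurements discussed above, the sketch below estimates a network-level RTD using TCP connect time, a common stand-in where ICMP ping is unavailable. The target host is a placeholder; note that this measures one network round trip only and excludes the application response function.

    # Sketch: rough network RTD estimate via TCP connect time. This excludes
    # any application response function, so it approximates Network RTD only.
    # The host and port are placeholders; substitute a reachable target.
    import socket
    import time

    def tcp_rtd_ms(host: str, port: int = 80, timeout: float = 2.0) -> float:
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass                              # handshake done = one round trip
        return (time.perf_counter() - start) * 1000.0

    samples = [tcp_rtd_ms("example.com") for _ in range(5)]
    print(f"min / avg RTD: {min(samples):.1f} / {sum(samples)/len(samples):.1f} ms")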
An open user group describes a service that is available to all users at all network
SAPs and supports a unidirectional or bidirectional network connection or data flow
between an arbitrary set of distinct users or SAPs. Examples of this type of access
are the public switched telephone network (PSTN) and the public Internet.
A closed access group and open destination group describes a service that can be
initiated by a predefined set of users or at a predefined set of SAPs and can
support a unidirectional or bidirectional network connection or data flow to an
arbitrary set of network users or SAPs. An example of this type of access group is
the Emergency Telecommunications Services described in Section 3.10.
An open access and closed destination group describes a service that can be initiated by any service user at any SAP and can support a unidirectional or bidirectional network connection or data flow to a predefined set of network users or SAPs. Examples of this type of access are free phone services such as information, 411 services, or emergency services, e.g., 911 or 999 calls.
3.1.32 Throughput
There is a general agreement in the industry that throughput is related to delay but
not exclusively dependent on it. From a message point of view, throughput relates
to frames, protocol data units (PDU) etc. per unit time. FR and ATM services
specify a committed information rate (CIR). This parameter is a measure of the
efficiency with which the available bandwidth is used. CIR in FR is essentially the
same as basic bandwidth in a digital leased line.
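A small worked example may help: throughput measured as delivered frames per unit time, compared against a contracted CIR. All figures are invented for illustration.

    # Sketch: frame-level throughput versus a contracted CIR (Frame Relay
    # style). All numbers are invented for illustration.
    delivered_frames = 40_000        # frames delivered in the interval
    avg_frame_bits = 1_600 * 8       # average frame size of 1600 octets
    interval_s = 300                 # 5-minute measurement interval
    cir_bps = 2_000_000              # contracted CIR of 2 Mbit/s

    throughput_bps = delivered_frames * avg_frame_bits / interval_s
    print(f"throughput = {throughput_bps / 1e6:.2f} Mbit/s "
          f"({100 * throughput_bps / cir_bps:.0f}% of CIR)")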
3.1.33 Errors
For detailed definition and discussion of error performance, please refer to the relevant ITU-T Recommendations.
Content Recommendations
The recommendations in this section are concerned with the SLA negotiation and
engineering processes that take place during the fulfillment phase of the eTOM
business process model.
The requirements in this section are concerned with the assurance processes after
the service has been provisioned and is being delivered to the Customer. They
affect mainly the monitoring of service quality levels and the reporting of
information to the Customer as specified in the SLA.
In the case of failure, Customers should be notified of the approximate time that the
service will be resumed and should be kept informed about the progress of
problem resolution.
Recommendation 12: The SP should have soft thresholds for every parameter to
provide an early warning of impending trouble. To the degree specified in the SLA,
Customers should be kept informed of any service degradation that could lead to
possible SLA violations.
These requirements are concerned with the interface between the Customer and
the SP and with how the SP should respond to Customer inquiries concerning a
service and its SLA.
The following recommendations are not easily associated with the eTOM process
model but are provided for completeness.
When considering the terms Product and Service, it is important to consider how
the distinction is applied within an operator’s business. Figure 4-2 and the
following text identify the different types of SLA that may exist within an
organization.
[Figure 4-2: SLA types within an operator’s business, spanning the Customer, Product SLA, and Network levels.]
While individual services may be the focus for internal management purposes, in
most cases it is a combination of these services that are sold to the customer.
Figure 4-3 depicts a hypothetical product offering to the end customer. From this it
can be seen that services form components of products and an individual service
may form a component of one or many products. The way in which these
components are packaged with pricing and terms and conditions of the contractual
agreement form the basis of the offer, or proposition, to the customer. The scope
and terms of the offer are determined by marketing and it is this complete package
that the customer orders and on which the expectation of service is set. Service
level agreements are related to the offer.
[Figure 4-3: A hypothetical product offering built from a Mobile Access Network domain (GPRS/UMTS, with DNS, DHCP and RADIUS elements), an IP Transport Network domain (ISP, with firewall and DNS) and an IT domain (DNS, RADIUS), each covered by its own SLA (Mobile Access SLA, IP Transport SLA, IT Network SLA).]
It is also evident that products are customer based whereas services are more network based, although there is another level of granularity, the service component, that may be used by the operations function for ring-fencing roles and responsibilities. For example, an email service requires several service components such as the email servers, RAN, GPRS core, ATM etc. Some of these may be managed independently of each other within the operations environment and may have internal SLAs (SLOs) against these components. There may also be a service manager type role that will manage the end-to-end service against an internal SLA that is effectively an aggregation of the service components.
There are three distinct areas associated with the application of Service Level
Agreements (SLAs) within the operator’s business. The most obvious, and in
many cases the area that is currently under focus, is that of internal SLAs. Internal
SLAs are focused on managing components of the service delivery chain and are
aggregated to form the end-to-end service measurables. By their nature these
SLAs are often not truly customer focused, inasmuch as they are not couched in terms that are understandable to the end customer. The key users of these SLAs are
the functions responsible for managing the service components and for managing
those components against agreed quality objectives. Other parts of the operator’s
business rely on internal SLAs to drive improved efficiencies and for understanding
the service delivery performance within the functions of the business. It should
also be noted that service level management is not restricted to network based
service components but must also be applied to non-network services e.g. billing
availability and accuracy.
As operators become more and more reliant on 3rd parties for the delivery of
services and content, the need to implement SLAs against those suppliers /
partners grows in importance. Indeed the implementation of SLAs in this area will
almost certainly improve the quality of content delivery. It will also enable the
operator to share the financial risk of service degradation with their 3rd parties.
These SLAs are similar in nature to the internal SLA discussed above and form
part of the end-to-end SLA but may also have an impact on the accounting
processes for paying for the content services.
The final area that needs to be considered is the end customer or external SLA. In
considering this area, the nature of the products sold to the customer needs to be
taken into account. While internally the focus may be on individual services, the
offering to the customer tends to be a product that consists of a number of services
which may be either network or non-network based.
[Figure 4-4: Example product components. Network: Voice, SMS, Voicemail, WAP. Non-Network: Handset, Customer Care, Itemised Billing, Per Second Billing, Handset Replacement, Insurance.]
Figure 4-4 illustrates the range of product components which could make up a
product offering. As can be seen, some of these components are network derived; however, there are a number of product components that have little or no
dependence on network resources. The SLA with the customer is likely to
encompass all aspects of the product including the non-network aspects. The
external SLA will consist of a wide ranging set of parameters worded in terms that
are understood by the customer. Deriving this set of SLA criteria requires a clear
understanding of what is important to the end user and is therefore best achieved
by discussion and negotiation with the users.
Much as the SLA management processes are an integral part of a SP’s total
service management solution, so too is the actual SLA an integral part of the
service provider’s product offerings - depicting exact details of the service, the
quality expectations, and the delivery process. This section introduces the major
components and relationships associated with an SLA. The goal of Section 0 is to:
1) Help SPs and Customers identify the components comprising SLAs and
the role of these components within a SP service offering.
2) Identify different aspects of SLAs at each link within the supply chain.
3) Help SPs find a mapping between SLAs and QoS parameters.
4) Provide inputs for evaluating impacts and requirements in different
operational process areas when a new service offering and its associated
SLA are designed.
SLAs may be defined between an SP’s internal administrative units. This includes
allocating performance objectives to the internal administrative units for their part of
the network connection. The approach used to model SLAs with external
Customers can be applied to these intra-company administrative units.
Section 0 does not contain a prescription for creating contracts and SLAs. Rather,
it presents a conceptual model for analyzing the different roles, types of SPs and
Customers, and types of services. This model supports and structures the complex
task of mapping a Customer’s quality expectations into the SP’s terms.
A product is a package that contains a collection of commercial offers, e.g., an xDSL access with
email and web hosting. The SP may package different commercial offers to meet
different Customer requirements. This is illustrated in Figure 4-5.
Service resources are the base level building blocks for composing the basic
service elements and are usually not visible to the Customer. The service
resources are the key elements over which the SP has control to manage the
levels of service offered and to measure the levels of service achieved. A service
decomposition example is described in Figure 4-9.
The service contract (Note 9) lies at the heart of the relationship between the Customer and the Service Provider. The role of a service contract is illustrated in Figure 4-6.
Note 8: The level of Customer visibility will depend on the SP’s policy and on the “nature” of the service.
Note 9: The actual relationship between the service contract and the SLA can be SP-specific; for instance, the SLA can be considered part of the contract, or the SLA can be considered to be the contract.
The role of an SLA template is to capture the set of service objectives, the actions
to be taken for exceeding or for not meeting the specified objectives, and any
conditions under which the SLA does not apply. The composition of an SLA
template and its relation to an SLA is shown in Figure 4-7.
Once a Customer enters into negotiations to purchase a service, the SLA template
is used as the blueprint for the SLA. Depending on the type of service offered,
there may be extensive negotiations of the SLA terms between the Customer and
the SP. The SLA Template provides a baseline for these negotiations.
SLA templates effectively form part of the SP’s product catalogue. For the
Customer, the choice of service levels may be explicit in the service order, or may
be implicit in the commercial bundle description, e.g., a VIP bundle which
combines a set of gold level services.
Customers may order multiple service instances, e.g., five ATM PVCs. In these
cases, the SLA template may contain both parameters for each service instance
and for the group as a whole. For example, an SLA template related to a service
offering which includes a PDH access to a backbone network with a set of PVCs
crossing that network, could include parameters with threshold values set for each
PVC (e.g. Cell Loss Ratio on a single PVC in one direction) and also general
parameters calculated on the group of PVCs (e.g. Maximum Cell Loss Ratio for all
the PVCs).
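The multi-instance case described above can be captured in a simple data structure, sketched below with hypothetical names and threshold values: per-instance thresholds apply to each PVC, while group thresholds apply to the set as a whole.

    # Sketch: an SLA template covering multiple service instances (e.g. PVCs)
    # with per-instance and group thresholds. Names and values are illustrative.
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class SlaTemplate:
        per_instance: Dict[str, float]   # applies to each PVC individually
        group: Dict[str, float]          # applies to the set of PVCs as a whole

    def check_clr(template: SlaTemplate, measured_clr: List[float]) -> List[str]:
        violations = []
        for i, clr in enumerate(measured_clr):
            if clr > template.per_instance["cell_loss_ratio"]:
                violations.append(f"PVC {i}: CLR {clr:g} exceeds per-PVC threshold")
        if max(measured_clr) > template.group["max_cell_loss_ratio"]:
            violations.append("group: maximum CLR over all PVCs exceeds threshold")
        return violations

    tmpl = SlaTemplate(per_instance={"cell_loss_ratio": 3e-7},
                       group={"max_cell_loss_ratio": 1e-6})
    print(check_clr(tmpl, [1e-7, 5e-7, 2e-6]) or "no violations")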
One of the key aspects of an SLA is the association of the required levels of
service to the specific details of a Service Instance. As defined in Chapter 3, the
Service Access Point (SAP) represents a logical point located between the
Customer and the SP domains. In the case of services supplied by different SPs,
the SAP characterizes a point on the boundary between the SP domains.
The relationship between the SLA and the SAPs for the service instance provides
the mapping between the service level objectives and the physical service
resources that comprise the service. This gives the context in which to monitor the
delivered service performance in order to determine if the service level objectives
defined in the SLA are being met. Figure 4-8 illustrates these relationships and
specifies the role of Measurement Access Points (MAP). The activity diagram
shown in the figure describes how the measured data is processed. Note that the
MAPs are not required to be coincident with the SAPs and that the data used for
SLA validation may be derived from the MAP data.
[Figure 4-5: Example decomposition. Commercial Offers: Residential Basic Internet, SOHO Basic Internet, SOHO PRO Internet. SLA Templates: Basic Email, Premium Email, Residential Access, SOHO Access, SOHO PRO Access, Web Hosting Silver, Web Hosting Gold. Service Resources: Email Server, Authentication Server, DHCP Server, Access Device, Network Elements, Web Server.]
[Figure: An end-to-end service composed of Services 1-6 spanning the Customer, an Internet Service Provider, an Access Provider, and a Network Service Provider.]
[Figures: End-to-end service SLA relationships. In the first arrangement Customers hold SLAs a, b and c with Service Provider 1, which is supported by SLAs 1, 2 and 3 with Service Provider 2. A second arrangement shows Customers holding SLAs a and c across the end-to-end service.]
The second type of SLA is illustrated in Figure 4-12. This type may be required
when Service Provider 2 cannot provide a service to Service Provider 1 at the
same level of granularity as Service Provider 1 provides to the end Customer. This
case can occur when a defined capacity within Service Provider 2’s backbone
network is reserved for Service Provider 1’s Customers. Given these decisions
that were made before the Customers’ requirements were known, there may not be a direct relationship between SLAs a, b, and c and SLA 4. In particular, it may not be possible to specify performance parameters in SLA 4 that are based on the parameters related to a single customer service. It may, however, be possible to
use statistical indicators of the entire bundle provided by Service Provider 2 to
Service Provider 1.
This chapter provides three tools that are useful for structuring the complex
process of SLA preparation and management. These tools are the six-stage
Service Life Cycle, the KQI Development Methodology, and the SLA Parameter
Framework.
The development of an SLA should span the complete life cycle of a service. The
six stage Life Cycle is described in detail in Section 0. When the Customer-
Provider interactions address each of these stages, the resultant expectations will
be better aligned and the relationship enhanced.
The KQI Development Methodology identifies the customer-based metrics that capture the customer’s perception of quality of service. A top-down approach with regard to the service offered is combined with a bottom-up approach to assess the transactions necessary to deliver the service. This leads to a framework which
facilitates the identification of the mapping between network and service data from
which network and non-network data may be consolidated and aggregated into the
key quality indicators for the service.
There are many performance parameters in common use that have similar names
yet have drastically different definitions. The SLA Parameter Framework organizes
performance parameters into six categories based upon service and delivery
technology and upon performance measures for an individual service instance and
for averages across all service instances. The SLA Parameter Framework is
described in Section 0. Note that the specific values for service performance
parameters are arrived at through the contract negotiation process and are beyond
the scope of the Handbook.
SLA management requires interaction between many of the eTOM processes [GB
921]. In order to analyze these interactions more thoroughly, various lifecycle
stages or phases must be considered separately. The life cycle of an SLA is
composed of the six phases shown in Figure 5-1.
The life cycle phases are introduced in the following subsections. A set of use
cases and the corresponding eTOM processes are presented in Annex H. These
use cases are not intended to be prescriptive but serve to illustrate one approach
to the process flows involved in SLA management.
The exit criteria for this phase are new product descriptions with the corresponding
SLA Templates.
The Negotiation and Sales Phase includes negotiating service options, Level of
Service and Quality of Service parameters, and potentially, changes in the values
of service parameters from those specified in the template. The scope of the
negotiation and sales phase may depend on the service and the type of Customer.
For example, a SP may offer only pre-defined service levels to residential or
SOHO Customers, but may enter into negotiations with large corporate
Customers.
The model presented here assumes that the Customer is contracting for multiple
service instances that will be delivered during the contract period. In general, this
phase will be approximately the same for each individual service sale that involves
an SLA.
3) The costs incurred by the SP when an SLA violation occurs or the amount
of incentive payments when the committed service levels are exceeded.
4) Definition of reports associated with product. Note that the time and
frequency of report generation depends on the relevant SLA parameters,
e.g., availability over a unit of time, such as one day, week, month, quarter,
or year.
The exit criterion for this phase is a signed contract.
The Implementation Phase of the life cycle covers the activities associated with
enabling and activating service instances. This may involve deploying new network
and/or service resources or configuring existing equipment. Network resources are
configured to support the required level and quality of service specified in the SLA.
The Implementation Phase processes will be executed differently by every SP but
the overall results will be the same.
The exit criterion for this phase is the instantiated, tested, and accepted product.
The Execution Phase covers the operation of the services specified in the SLA.
These include:
1) Normal in-service execution and monitoring,
2) Real-time reporting and service quality validation,
3) Real-time SLA violation processing.
The Assessment Phase has two parts. The first assessment focuses on a single
Customer’s SLA and on the QoS delivered to this Customer. The second
assessment occurs during the period when the SP is reviewing its overall quality
goals, objectives, and risk management procedures. This later assessment is part
of an overall internal business review.
[Figure 5-2: KQI Development Methodology. Step 1: identify Service Scenarios. Step 2: analyse the Timeline. Step 3: identify the Service Topology. Steps 5 & 6: identify Measurements and develop KQIs.]
Starting with the business problem and working down the stack, the methodology derives the Key Quality Indicators and the required measurements and metrics to calculate them.
The service scenarios or range of services offered will be unique to the particular service provider according to business case drivers. Similarly, the range of applicable KQIs will vary according to the service provider’s chosen or preferred range of services and the availability of applicable measurements across the service delivery components and systems needed to deliver the service.
The scenarios are set against a timeline, developed to represent both the
customer’s and supplier’s actions from the initial point of contact through to
cessation of the service.
To each of these scenarios the timeline is applied as shown in Figure 5-3 resulting
in more specific Customer Experience Timelines. The bottom half of the timeline maps the customer’s interactions with the service at different points in the life cycle.
The top half of the timeline considers the effect of those actions on the supplier of
service and the options or impacts that need to be considered in the analysis.
[Figure 5-3: Customer Experience Timeline. Customer actions: find out about service, decide to buy, request service, first user experience, personalise service, subsequent usage, experience problem, report problem, receive bill, change personalisation, cease service. Supplier actions: provision service for customer, service activated, problem resolved, customer advised.]
Applying each scenario to the timeline provides more specific timeline actions and
prompts review of the interactions from a customer perspective which may need to
be measured. By taking a top-down approach, the full range of both network and non-network related interactions is identified. At this stage a business judgement is made as to which are the significant interactions; these are mapped in detail in the next step.
Before the KPIs and measurements can be derived, the complete service topology
needs to be understood.
The service delivery path must describe the service resources which comprise the end-to-end topology. This will cover the network components; supporting system components, for example billing systems; key process components; and reliance on 3rd-party supplier relationships, as well as the relationships between the service elements.
New services will include components that already exist in other services. These
components or service resources may be considered as reusable components. An
individual service resource may simultaneously be a service offered to the
customer and a component in another service.
Some of these service elements may be sourced from a third party, inside or
outside an enterprise.
Each of the actions from the individual timelines can then be broken down to
discrete customer actions, then to the technical actions and represented as a
transaction matrix.
The matrix is developed by mapping the columns of the matrix to the service
resources that were identified previously. For clarity they should follow the primary
service delivery path.
The rows of the matrix are the user interactions which were selected for analysis in
step 2. The sub-actions are listed and their path through the service resources
plotted.
The service resource row of the table identifies the resource availability KQIs that
should be considered. A degradation of availability of any of the resources in the
transaction table is likely to affect the service quality. An aggregation of these
KQIs may be used to form a combined KQI (C-KQI) for a single indication of the
end-to-end service availability. At the highest level of aggregation, this KQI is
sometimes referred to as the Service Index.
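The aggregation idea can be sketched as follows; the resource names, KQI values, and the weighted-mean policy are illustrative assumptions, since real aggregation rules would be chosen per service.

    # Sketch: aggregating per-resource availability KQIs into a combined KQI
    # (C-KQI), sometimes called the Service Index. Resource names, values, and
    # the weighted-mean aggregation policy are illustrative assumptions.
    resource_kqis = {                 # availability KQI per resource, percent
        "email_server": 99.95,
        "radius": 99.99,
        "gprs_core": 99.90,
        "billing": 99.80,
    }
    weights = {"email_server": 3, "radius": 1, "gprs_core": 2, "billing": 1}

    service_index = (sum(resource_kqis[r] * weights[r] for r in resource_kqis)
                     / sum(weights.values()))
    bottleneck = min(resource_kqis, key=resource_kqis.get)
    print(f"Service Index = {service_index:.3f}%, worst resource: {bottleneck}")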
The transaction flow arrows identify the points where information is passed
between service resources. These are potential points for identifying accuracy
KQIs and can again be aggregated to form one or more KQIs that provide a higher
view of service quality.
Working down the worksheet, the transaction arrows flowing from left to right, and the corresponding ‘response’ arrows flowing in the opposite direction, provide an indication of where speed (time) based KQIs may be required for measuring the service quality. The aggregated KQIs may span multiple transactions.
A further KQI type may be extracted from the table although these are not as
obvious as those discussed above. These KQIs are measurements of ‘simplicity’
or ease of use. For example, given a service that involves presenting the user with a number of options, a measurement of the time between the system issuing the menu or prompt and the user responding may provide an indication of the clarity of the menu or prompt options.
The Transaction Matrix allows the identification of the metrics and measurements that are required to monitor the KQIs.
This section introduces and defines the service parameter framework. The intent
here is not to specify actual parameter values since these are commercially
sensitive and subject to negotiation between the SP and its Customer. The
objective is rather to define a method of classifying service parameters and listing
and defining them in a consistent manner within the SLA.
The set of SLA service parameter values is an element of a contract between two
parties. Whether the SLA and its service parameter values are considered part of
the contract or an adjunct to it varies from provider to provider.
There are numerous business parameters contained in a contract that are not
covered by this Handbook. For example, while SLA violations and remedies are
discussed in terms of the parameters that may identify the violation, mechanisms
for payment and/or other required actions are not included since these are unique
to each contract and the commercial agreement between the parties concerned.
Similarly, proprietary underlying systems and processes used to support SLAs are
not described. However, see Annex H for selected use cases describing SLA
support within the eTOM process framework.
Figure 5-7 illustrates a typical relationship between the service functions defined in Section 0 and the three parameter categories. In most cases the Service Specific parameters refer to a service’s primary functions.
[Figure 5-7: Service Functions-Parameter Categories Relationship. Primary Functions map to all three parameter categories; Enabling Functions and OAM Functions each map to two of them.]
The rows in the service parameter table distinguish between the individual user
view of the service and the aggregated view. The individual user view is
associated with a SAP or SAP Group and covers service properties such as the
service interface or the maximum service down-time that an individual service user
could experience during a specified time period. The aggregated view essentially
captures the average performance over all service users during a specified time
period and includes items such as service billing and aggregate availability. It is
important to distinguish between single event requirements and aggregated
requirements from the user’s perspective. For example, if only aggregated
availability is specified over a billing period, it would be possible for a single user
outage to shut down that user entirely while the aggregated service requirement is
met. The single user view parameters can be used to define the maximum down
time for a single event and the minimum up-time between events. This detail could
be lost at the aggregated requirements level.
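The masking effect described above is easy to quantify; the sketch below uses invented numbers for a 1000-user service over a 30-day reporting period.

    # Sketch: how an aggregated availability target can mask a total outage
    # for a single user. All numbers are invented for illustration.
    users = 1000
    period_hours = 30 * 24            # 30-day reporting period
    aggregate_target = 99.9           # percent, averaged over all users

    # One user down for the whole period, every other user at 100%:
    aggregate = 100.0 * (users - 1) / users
    print(f"aggregate availability = {aggregate:.2f}% "
          f"(meets {aggregate_target}% target: {aggregate >= aggregate_target})")
    print(f"...yet one user saw {period_hours} hours of continuous downtime")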
The SP's perspective is not the same as the Customer's, and internally at least,
the SP must consider issues such as revenue generation/continuity, differentiated
services, and the cost to maintain networks and services. These issues might
feature in the “internal SLA” between departments of a SP or between one SP
(retailer) and another (wholesaler). Note that availability performance can be
specified in all six categories.
Figure 5-8 illustrates some typical parameters that may be found in the Service
Parameter Table.
Service/technology independent service parameters are those which are often (if
not always) specified in an SLA. Examples include Percentage Availability, MTBO
or MTBF, OI, MTPS, MTRS, time to first yield, average call response time, etc.
These are sometimes referred to as “operational performance criteria (Note 10),” and some
are required to be reported to regulatory authorities by SPs on a regular basis,
e.g., time to first yield. Included in this set might be integrated or consolidated
billing for a basket of services, accuracy of billing and payment terms.
Other service/technology independent factors are billing period, security (of both
service access and information transfer/delivery) and the specification of alternate
routing/redundancy of network connections providing the service, possibly
including avoidance of Common Points of Failure (CPoFs). For many mission-
critical services, these factors are very important and need consideration. In an
e-business environment, they may be critical to the survival of the business,
particularly for large companies and financial institutions.
Note 10: This terminology is not aligned with the service performance terminology used in E.800 and in this document.
Time of day synchronization across multiple SPs, NOs and time zones can be difficult. The ITU has standardized time stamping of events and information exchange in terms of UTC, Coordinated Universal Time. ITU-T Study Group 2 has developed a Network Management Recommendation that tracks the time from the occurrence/detection of a network defect through all the stages of reporting, remedial actions taken, etc., to the time when the physical repair is completed. This time recording scheme is discussed further in Section 0.
The availability measures have been further refined and expanded in Section 0 to
account for SAP weighting.
1) Voice telephony: error performance, premature release, release failure and delay, and CLI reliability.
Note: with the increasing use of digital network technology, control and
management of echo has become increasingly important, even for quite
close destinations from a caller. For Voice over IP (VoIP), delay and echo
are two of the major hurdles to overcome.
2) Data: BER, % EFS, errored PDUs, lost PDUs, UAS, Transport Parameters
such as loss, attenuation, group delay distortion, noise, impulse noise, and
analog parameters.
3) Facsimile: image quality, character error rate, call cut-off, modem speed
reduction, transaction time and availability.
4) Mobile telephony: call completion rate, call dropout rate, noise, echo,
distortion and availability.
5) Sound program: noise, crosstalk, stereo channel interference, distortion
and availability.
6) Support & maintenance: service penetration (e.g. telephones per 100
population), supplying service instruction and information, access to SP
(e.g. answering requests, response times), service supply and removal
(e.g. MTPS), and service repair (e.g. MTTR).
Network and service performance will inevitably vary over time as equipment ages
and external influences take effect. Unexpectedly large variations in traffic levels in
the supporting network may also impair service. Unforeseen events such as
floods, hurricanes, earthquakes, or intentional acts may cause severe service
disruption. The provisioning of network resources alone is not sufficient to ensure
that the contracted performance is sustained. Monitoring and management of the
performance factors specified in the SLA are also important to retaining Customer
satisfaction and loyalty, and avoiding any SLA violations and penalties. Monitoring
is required to detect and locate the source of the service degradation. This requires
that the performance levels be monitored at various points along a connection, not
simply at its origin and destination. Although the latter is of most interest to the
service end user, the former provides valuable information on the location of the
source of the degradation. This requires monitoring of real-time traffic flows in
different network segments. Detailed description of monitoring methods is outside
the scope of this Handbook.
Note that for international networks and services, the performance objectives are
specified by ITU-T Recommendations. In many cases, these Recommendations
also allocate the performance objectives to different portions of the network in such
a way that each network provider is aware of its responsibilities.
The effective management of all the related Service and Customer data is the
principal challenge to providing truly integrated SLA Management. SLA
Management functions need to combine, correlate, assess, and act on a wide
variety of data sources to effectively meet the fulfillment and assurance levels
specified in SLAs. There is a need to draw on many data sources within a SP’s
Operations Support System (OSS) environment, such as information on
Customers, services, and service performance as shown in Figure 5-12.
[Figure 5-12: SLA Management data sources, including Customer Details, Customer SLA Performance Data, SLA Reports, Service Instances, SLA Contracts and Raw Performance Data, with inputs from Product & Service Development and Implementation.]
As shown in Figure 5-13, raw process data and service configuration details
related to operational process parameters are aggregated to provide service level
quality metrics from the technology, service and process performance data. QoS
Analysis & Reporting and SLA Proactive Monitoring use the service performance
data to correlate service performance against set operating targets and SLA
guarantees.
[Figure 5-13: Service performance data aggregation. At the Service Layer, Service Performance Data is aggregated and feeds QoS Analysis & Reporting and SLA Proactive Monitoring; at the Customer/SLA Layer these support Customer SLA & QoS Reporting.]
Figure 5-14 represents implied relationships between the engineered level for the
NP and commitments made in an SLA.
Figure 6-1: Time Line For Reporting Network And Service Incidents
The significant time points shown in Figure 6-1 are defined in Table 6-1.
T0: The time that a defect first appears in the network, regardless of whether or not any Customer service is impacted.
T2: The time that the network problem (defect) is detected by the Network Manager. The detection can be via a system or verbal, e.g., a Customer trouble report. If the detection is from an OS, the detection time is the time when the exception occurred, not when it was first seen.
The measurement and recording of the time points and intervals shown in Figure
6-1 have been suggested as a way of characterizing service performance and NP
in terms of “best in class,” “world class,” and “average” bands.
As noted earlier, many of the performance parameters are time-related. There are
two aspects to this - the sampling or measurement period over which each
parameter is calculated, and the reporting interval over which the parameters are
averaged and reported. The sampling periods that have been traditionally used are
every second, every minute, every 15 minutes or every 24 hours. The reporting
period is typically one month. Thus, real-time performance parameters are typically
measured over the sampling period, stored in a database and reported once a
month. The choice of Customer reporting interface implementation will be
influenced by a number of factors, including the contracted reporting interval.
Performance Reporting
The Performance Reporting Process provides the service Customer with all of the
performance reports specified in the SLA. Routine reports include reports of the
performance of a service vis-à-vis the SLA specifications, reports of developing
capacity problems, and reports of Customer usage patterns.
The process functions that are addressed within this document include:
1) Scheduling the Customer reports,
2) Collecting performance data,
3) Compiling the Customer’s reports,
4) Delivering reports to the Customer.
Figure 6-3 shows only the interrelationships required for Performance Reporting.
The remaining interfaces, although essential for providing information to other
processes, are out of scope of the Performance Reporting interface.
Reporting Scenarios
The second scenario is illustrated in Figure 6-5. This scenario describes the case
where a slower than real time reporting service is used (e.g. e-mail, loading an
HTML file on a Web server, etc.) between the Service Provider and the Customer.
The Customer’s access to the retrieval service and to the data in the “mailbox” is not under the Service Provider’s control, i.e., it is provided by a third party. Note
that in some cases the Service Provider could act as the third party.
The third scenario is illustrated in Figure 6-6. This scenario describes a Service
Provider operated data management service from which the performance reports
can be retrieved. The Customer is provided with access to this data management
service. This access is typically on-line.
After the Customer acknowledges receipt of the requested reports, these reports
are deleted from the Service Provider’s data management service.
State Model
At the end of each reporting period, the report for the just-completed period is compiled and made available to the Customer. There is,
therefore, a need for two reporting processes to be active for a relatively short time
interval. Figure 6-8 provides a high level view of the reporting process state
model.
The following table shows the relationship between Performance Reporting Events
and the states for the Performance Reporting Process.
Reporting Intervals
There are a number of different time intervals associated with the performance
reporting process. The following sections define these intervals.
A Data Collection Interval refers to the frequency with which performance statistics (parameter values) are retrieved from network equipment. For example, data may be collected on a weekly, biweekly, or monthly basis. This interval does not have to be the same as the measurement interval because network devices typically provide a data storage capability. In any case, the data collection interval must contain a whole number of measurement intervals.
An Aggregate Interval is the time interval over which the raw data is summarized in a particular report. For example, raw 15-minute data may be aggregated into one-hour or one-day periods for reporting purposes. The data aggregation period must contain a whole number of measurement intervals or data collection intervals.
6.1.13 Example
A Frame Relay network measures traffic statistics and stores them in 15-minute
intervals. A statistics collection process retrieves this data once a day. Once a
calendar quarter, the Service Provider delivers three reports to the Customer
based on this data. Each report covers one month of traffic data summarized by
day.
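The nesting constraint described above (each interval must contain a whole number of the smaller one) can be checked mechanically. The sketch below mirrors the Frame Relay example, with the month simplified to 30 days.

    # Sketch: checking that reporting-chain intervals nest cleanly, mirroring
    # the Frame Relay example (15-minute measurements, daily collection and
    # aggregation, monthly reports). Values are illustrative.
    MINUTE = 60                            # seconds

    measurement = 15 * MINUTE
    collection = 24 * 60 * MINUTE          # collected once a day
    aggregation = 24 * 60 * MINUTE         # summarized by day
    report = 30 * 24 * 60 * MINUTE         # one month, taken as 30 days here

    checks = [("collection vs measurement", collection, measurement),
              ("aggregation vs measurement", aggregation, measurement),
              ("report vs aggregation", report, aggregation)]
    for name, outer, inner in checks:
        print(f"{name}: {outer // inner} intervals, "
              f"nests cleanly: {outer % inner == 0}")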
Types Of Reports
Basically, Performance Reporting provides two types of reports. These are the
LoS/QoS reports and the Traffic Data/Utilization reports.
The LoS/QoS reports provide overall assessment of service levels and service
quality as specified by the SLA parameters. The Traffic Data report provides usage
measurement information for the service.
The following basic information is provided in text, graph, and/or table format.
1) Summary information,
2) Trends,
3) Exceptions,
4) Comparisons,
5) Snapshots.
All reports should have a common report header providing the following
information:
1) Customer Information
a) Customer Name, e.g., name of the entity on the billing record,
b) Customer ID, an internally assigned identifier for the Customer,
c) Customer Contact Name, i.e., name of the Customer contact for performance reporting matters and the contact’s organizational unit,
d) Customer Contact Information, e.g., address, phone numbers, e-mail,
etc. for the Customer contact person.
2) Service Provider Information
a) Service Provider Name, e.g., name of the Service Provider for the
specific service,
b) Service Provider Contact Name, e.g., the contact person or
organization of the Service Provider for the specific service,
c) Service Provider Contact Information, e.g., address, phone
numbers, e-mail, etc. for the Service Provider contact person.
3) Service Information
a) Service Identifier, a unique identifier for a Customer service. This
identifier is a key to request performance data and outage
information,
b) Service Type Code, an enumerated code which identifies a service
type, e.g., FR PVC, CR PVC, DS-1, E-1, etc.,
c) Service Descriptions ID, an identifier which points to a textual
description of the service,
d) Service Profile Descriptions, e.g., descriptions of configuration
parameters and corresponding values,
e) SAP Information, e.g., SAP’s address, SAP’s weighting, etc.
4) Report Information
a) Report Type, LoS/QoS report or Traffic Data report,
b) Reporting Period, start and end time of the reported interval,
c) Boundary Conditions, e.g., information on addressed boundary
conditions, e.g., where outages cross reporting interval boundaries,
d) Suspect Flag, the flag is set if the report is suspected to contain
unreliable or incomplete data.
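The common header can be represented as a simple record, as sketched below. The field names follow the list above, but the types and example values are assumptions.

    # Sketch of the common report header fields listed above. Types and the
    # example values are assumptions for illustration.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class ReportHeader:
        customer_name: str               # entity on the billing record
        customer_id: str                 # internally assigned identifier
        contact_name: str
        contact_info: str
        sp_name: str
        sp_contact_name: str
        sp_contact_info: str
        service_id: str                  # key for performance/outage queries
        service_type_code: str           # e.g. "FR PVC", "DS-1"
        report_type: str                 # "LoS/QoS" or "Traffic Data"
        period_start: datetime
        period_end: datetime
        suspect: bool = False            # set if data may be unreliable

    hdr = ReportHeader("Acme Corp", "CUST-0042", "J. Doe", "j.doe@example.com",
                       "Example SP", "NOC Desk", "noc@example.net",
                       "SVC-17", "FR PVC", "LoS/QoS",
                       datetime(2004, 1, 1), datetime(2004, 1, 31))
    print(hdr.report_type, hdr.suspect)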
Traffic Data Reports enable Customers to determine how the contracted services
are being used. This allows the Customer and the Service Provider to plan for
future services based on traffic patterns and to verify if they are over/under using
the services they are currently subscribing to. Typical services for which traffic
reports can be provided include:
1) Leased Line Service,
2) ATM Cell Relay PVC Service,
3) Frame Relay PVC Service,
4) IP Service.
NGOSS
SLAs and their supporting tools fall into the Business View quadrant of the NGOSS framework shown below, defining the business need for Service Measurement.
eTOM
Figure 7-2 is based on the TM Forum eTOM and shows the processes involved in
defining the various types of SLAs and relating them to the organizational functions
that typically own them: Customer / Marketing and Operations.
[Figure 7-2: eTOM-based view of SLA processes. The Customer / Marketing Domain includes Product & Offer Development and Retirement, Order Handling, Customer QoS/SLA Management and Manage Internal SLA; the Operations Domain includes Supply Chain Capability Management, Supply Chain Development & Change Management, S/P Performance Management, S/P SLA Management, Service Development & Retirement, Resource Data Collection and Service Quality Analysis & Reporting. The flows between them carry SLA requirements, KQI/KPI mappings, SLA templates, target product KQIs, target service KPIs, target resource data, actual KQIs and SLA violations.]
The following text provides a summary of the derivation and application of service
based measurements based on the eTOM process framework version 3.
It should be noted that in the latest version of eTOM, version 3.0, an additional
Level 2 process Resource Quality Analysis Action & Reporting has been added.
As the eTOM team will further elaborate on the significance of this new process,
we will update the WSMT methodology mapping in future releases of this
document.
The flow of service data shown in this section is based on a hierarchy of key
indicators, as described above, across the end-to-end process, starting from
the product development process through to managing the Service Level
Agreements associated with the offering. As indicated in the figure above,
there are three distinct types of SLA:
Customer SLAs
• aimed at the total product offering to the customer and usually
subject to a formal contract between the customer and the
operator,
• written in terms that the customer understands and based on the
end-to-end delivery of the product components.
Supplier / Partner SLAs
• aimed at the third-party service delivery components of the product
and defining the quality objectives agreed with the partner or supplier.
Internal SLAs
• also known as Service Level Objectives (SLOs); set against the internal
business functions responsible for managing the service components
within the operator's business.
Fundamental to the process flow is the fact that the SLA and its component
clauses and requirements are designed at the very beginning of the product and
service design, i.e., within the Product Offer process. These SLA requirements
then form part of the design requirements for the product development process,
and flow onwards as KQIs and KPIs into the service and resource development
processes. Thus Customer Perceived Quality criteria are traceable throughout
the development processes; in other words, 'quality is designed in'.
Additionally, the recommended process flow encompasses the quality objectives
for third-party service delivery components and thus defines the SLA with the
partner/supplier. The top-down process flow also enables the operator to
define internal SLAs (or SLOs) against the internal business functions
responsible for managing the service components within the operator's
business, and to ensure that these quality objectives reflect the SLA clauses
that are offered to the end user.
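The top-down derivation described above can be made concrete with a small
sketch. The KQI/KPI names, weights and the simple weighted-sum mapping are
assumptions for illustration; as noted elsewhere in this Handbook, the real
KQI/KPI relationship may be empirical or considerably more complex:

    # Illustrative top-down mapping: a product-level KQI is derived
    # from service-level indicators, which are in turn drawn from
    # resource-level KPIs. All names and weights are invented.
    kqi_definitions = {
        "end_to_end_availability": {      # product-level KQI
            "access_availability": 0.5,   # contributing indicators and weights
            "core_availability": 0.5,
        },
    }

    def derive_kqi(kqi_name: str, indicator_values: dict) -> float:
        """Compute a KQI as a weighted combination of its contributors."""
        weights = kqi_definitions[kqi_name]
        return sum(w * indicator_values[name] for name, w in weights.items())

    print(derive_kqi("end_to_end_availability",
                     {"access_availability": 99.90, "core_availability": 99.95}))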
SID
The Shared Information/Data Model (SID) underpins the NGOSS framework by
defining the data entities and relationships used by the business. Figure 7-3
illustrates the main areas where interaction with the SID team has focused.
[Figure 7-3: SID entities relevant to SLA management. A Customer holds one or
more (1..*) Commercial Contracts; the Commercial Offer uses an SLA, which is
defined from a Product's SLA Template. The SLA uses (P)KQIs defined by the
Product, (P)KQIs use (S)KQIs defined by the Service, and (S)KQIs use KPIs
defined by Service Elements and measured on Service Resources.]
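The relationships sketched in Figure 7-3 can be expressed as simple types. The
following is a minimal, non-normative sketch; class and attribute names are
assumptions, not SID definitions:

    # Non-normative sketch of the SLA-related entity chain: an SLA is
    # defined from an SLA Template and uses product-level (P)KQIs,
    # which use service-level (S)KQIs, which use KPIs measured on
    # service resources. Names are illustrative only.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class KPI:
        name: str                  # defined against a service element/resource

    @dataclass
    class ServiceKQI:
        name: str                  # (S)KQI defined by a Service
        uses: List[KPI] = field(default_factory=list)

    @dataclass
    class ProductKQI:
        name: str                  # (P)KQI defined by a Product
        uses: List[ServiceKQI] = field(default_factory=list)

    @dataclass
    class SLA:
        template_id: str           # the SLA Template this SLA is defined from
        uses: List[ProductKQI] = field(default_factory=list)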
References
[GB 910] Telecom Operations Map, GB 910, Approved Version 2.1, TeleManagement
Forum, Morristown, NJ, March 2000.
[GB 917-1] SLA Management Handbook - Executive Overview, Member Reviewed Version
2, TeleManagement Forum, Morristown, NJ, January 2005.
[GB 917-3] SLA Management Handbook - Service and Technology Examples, Member
Reviewed Version 2, TeleManagement Forum, Morristown, NJ, January 2005.
[GB 917-4] SLA Management Handbook – Enterprise and Applications, The Open Group,
2004.
[GB 921] enhanced Telecom Operations Map, GB 921, Approved Version 3.0,
TeleManagement Forum, Morristown, NJ, June 2002.
[TMF 044] TM Forum Glossary, TMF 044, Public Version 0.2, March 2003.
[TMF 506] Service Quality Management Business Agreement, TMF 506, Evaluation
Version Issue 1.5, TeleManagement Forum, Morristown, NJ, May 2001.
[TMF 701] Performance Reporting Concepts & Definitions Document, TMF 701, Version
2.0, TeleManagement Forum, Morristown, NJ, November 2001.
Acronyms
2G 2nd Generation
3G 3rd Generation
3GPP Third Generation Partnership Project
ABR Available Bit Rate
ABT ATM Block Transfer
ACA Australian Communications Authority
AIP Application Infrastructure Provider
ANSI American National Standards Institute
ANT Access Network Transport
API Application Programming Interface
ARPU Average Revenue Per User
ASP Application Service Provider
ASPIC Application Service Provider Industry Consortium
ASR Answer/Seize Ratio
ATC ATM Transfer Capability
ATIS Alliance for Telecommunications Industry Solutions
ATM Asynchronous Transfer Mode
BBE Background Block Error
BBER Background Block Error Ratio
BER Bit Error Ratio
BICC Bearer-Independent Call Control
B-ISDN Broadband Integrated Services Digital Network
CATV Cable Television (Community Antenna Television)
CBR Constant Bit Rate
CCR Call Completion Record
CD Cell Delay
CDR Call Data Record
CDV Cell Delay Variation
CDVT Cell Delay Variation Tolerance
CE Cell Error
CEM Customer Experience Management
FDD Frequency Division Duplex
FR Frame Relay
FRAD Frame Relay Access Device
FSA Framework Study Area
FTD Frame Transfer Delay
FTP File Transfer Protocol
G-CDR Gateway GPRS Support Node – Call Detail Record
GERAN GSM/EDGE Radio Access Network
GFR Guaranteed Frame Rate
GGSN GPRS Gateway Support Node
GII Global Information Infrastructure
GoS Grade of Service
GPRS General Packet Radio Service
GSM Global System for Mobile communication
HCPN Hybrid Circuit-switched/Packet-based Network
HTTP Hyper Text Transfer Protocol
IAB Internet Architecture Board
ICSP Information And Communications Service Provider
IESG Internet Engineering Steering Group
IETF Internet Engineering Task Force
IMT International Mobile Telecommunications
IMT-2000 International Mobile Telecommunications-2000
IN Intelligent Network
INMD In-service Non-intrusive Measuring Device
IntServ Integrated Services
IOPS.ORG Internet Operators Group
IP Internet Protocol
IPDV IP Packet Delay Variation
IPER IP Packet Error Ratio
IPLR IP Packet Loss Ratio
IPPM IP Performance Metrics
IPTD IP Packet Transfer Delay
IRTF Internet Research Task Force
NM Network Management
NMC Network Management Centre
NMDG Network Measurement Development Group
NMF Network Management Forum
NNI Network-Node Interface
NO Network Operator
NOC Network Operations Centre
NP Network Performance
NP&D Network Planning and Development
NPC Network Parameter Control
NSP Network Service Provider
NTE Network Terminating Equipment (Element)
OAM Operations, Administration, and Maintenance
Oftel Office of Telecommunications (British)
OI Outage Intensity
ONP Open Network Provision
OS Operations System
OSI Open Systems Interconnection
OSS Operations Support System
OTN Optical Transport Network
PC Personal Computer
PCR Peak Cell Rate
PDA Personal Digital Assistant
PDH Plesiochronous Digital Hierarchy
PDN Public Data Network
PDP Packet Data Protocol
PDU Protocol Data Unit
PEI Peak Emission Interval
PHB Per Hop Behavior
PIB Policy Information Base
PIN Personal Identification Number
PIR Peak Information Rate
PLM Product Line Management
PNO Public Network Operator
Terms and definitions used in this Handbook are mainly based on the TM Forum
Glossary [TMF 044], on the Performance Reporting Concepts and Definitions
Document [TMF 701], and on internationally agreed definitions published by the
ITU. For example, ITU-T Recommendation M.60 [M.60] contains Maintenance
Terminology and Definitions. In this chapter, only some key terms related to SLAs,
QoS and performance are defined and their source identified. For the purposes of
this Handbook, some definitions have been modified and/or extended. Where
multiple definitions are in use within the industry, all are given. Not all of the terms
and definitions in this Annex are necessarily used in this volume of the Handbook
but are included as they are used in other volumes or are expected to be relevant
in later versions of the Handbook series11. For further information, please consult
the previously referenced documents and the ITU-T’s Sector Abbreviations and
defiNitions for a teleCommunications tHesaurus Oriented (SANCHO) database at
http://www.itu.int/sancho/index.asp.
Active (condition) – condition (e.g. alarm) not cleared (ITU-T Rec. M.2140).
*Aggregate Interval – the time period over which raw data is summarized in a
particular report. For example, raw 15-minute data may be aggregated into one-
hour or 24-hour intervals for reporting purposes (TMF 701 modified).
11
Terms and definitions not yet used are marked with a *.
Clear – the end of a fault; the termination of a standing condition (ITU-T Rec.
M.2140).
Commercial Offer - Commercial Offers are sold to customers. They are the
marketing mix and consist of the product, pricing, the contract, the SLA, etc.
Customer Premises Equipment (CPE) - equipment located at the Customer's
premises that may be owned and operated by the Customer or the Service
Provider (extracted from TMF 701).
*Data Collection Interval – the period over which performance parameters are
accumulated to compute each stored measurement and to detect maintenance
threshold crossings (ITU-T Rec. M.2140).
The time interval when statistics are retrieved from the network. This interval does
not have to be the same as the measurement interval because the network
devices may buffer statistics so that a number of them may be collected later (TMF
701).
*Event Report Message – a message sent from one physical system to another
that contains information about an event (ITU-T Rec. M.2140).
*Event Set – the set of all events that are grouped by a selection process for direct
comparison or patterning (ITU-T Rec. M.2140).
Grade of Service (GoS) – GoS is the minimum level of service quality designed into the network supporting
the service and maintained by traffic planning and engineering management
actions depending on traffic densities over the duration the service is offered or
used. As such, GoS represents a guaranteed expected level of service quality for
any connection in the same QoS class of a specified service at any instant, and
may in fact be improved upon depending on traffic loading of the network.
Mean Time Between Failures (MTBF) – the average time between failures of the
service.
Mean Time Between Outages (MTBO) – the average time between outages of
the service.
Mean Time to Provide Service (MTPS) – the average time to actually provide a
specified service from the date of signing a contract to provide service. This may or
may not be specified in the SLA.
Mean Time To Repair (MTTR) – the average time to repair service resources.
Mean Time to Restore Service (MTRS) – the average time to restore service
after reporting a fault; this time includes the time to sectionalize and locate the
fault. This may or may not be specified in the SLA.
Quality of Service Reports – reports generated from the service quality and
performance data to report the performance of the service as a whole.
Raw Performance Data – raw performance data collected from various data
sources including the network and service resources, such as network elements,
network and element management systems, and network and application servers.
In addition, it includes data collected from the SP's OSSs, such as trouble
ticketing, order processing, maintenance and support, Customer care, etc.
*Ready for Service Date (RFSD) – the specified date in the contract at which the
contracted service is ready for operation.
Service Availability (SA) – the proportion of time during which a service is
usable at its SAP(s). Note that the calculation of the SA may include weighting
of the SAPs as noted above. The detailed formula is contained in TMF 701 and
ITU-T Rec. M.1539.
*Service Degradation Factor (SDF) – a factor agreed between the Customer and
the Service Provider used to weight the service availability calculation when the
service is still available, but degraded from its contracted QoS (extracted
from TMF 701).
Service Descriptions - details of the service product catalogue offered by the SP.
Service Level Agreement (SLA) – a formal negotiated agreement between two
parties, sometimes called a Service Level Guarantee. It is a contract (or part
of one) that exists between the Service Provider and the Customer, designed to
create a common understanding about services, priorities, responsibilities,
etc. (TMF 701 modified).
Service Level Agreement Reports – reports generated from the Customer SLA
quality and performance data to report the performance of the specific service
instance for a specific Customer against an SLA.
Service Provider (SP) – an organization that provides services to Customers.
Note that the term Service Provider is now being used generically and may include
Telecom Service Providers (TSPs), Internet Service Providers (ISPs), Application
Service Providers (ASPs) and other organizations that provide services, e.g.,
internal IT organizations that need or have SLA capabilities or requirements.
*Standing Condition – a condition that has duration, beginning with a failure and
ending with a clear (ITU-T Rec. M.2140).
Supplier - Suppliers interact with the Enterprise in providing goods and services,
which are assembled by the Enterprise in order to deliver its products and services
to the Customer.
Supply Chain - ’Supply Chain’ refers to entities and processes (external to the
Enterprise) that are used to supply goods and services needed to deliver products
and services to customers.
Third Party Service Provider - The Third Party Service Provider provides
services to the Enterprise for integration or bundling as an offer from the enterprise
to the Customer. Third party service providers are part of an enterprise’s
seamless offer. In contrast, a complementary service provider is visible in the offer
to the enterprise’s customer, including having customer interaction.
Time to First Yield – the time interval between initiating service and the first
reportable service-impacting event.
Value Network - The enterprise as the hub of a value network is a key concept
of e-business. The value network is the collaboration of the enterprise, its
suppliers, complementors and intermediaries with the customer to deliver value
to the customer and provide benefit to all the players in the value network. A
measure of e-business success, and therefore part of the definition of a value
network, is that the value network works almost as a vertically integrated
organization to serve the customer.
In the modern business world, e-business demonstrably has either a direct or an
indirect impact on all business enterprises. Companies are increasingly dependent on
telecommunication services as a core component of business strategy. The quality of
telecommunication services is therefore rapidly becoming a significant factor in the success or
failure of businesses, particularly with regard to availability and reliability. It is the Service Level
Agreement (SLA) that defines the availability, reliability and performance quality of delivered
telecommunication services and networks to ensure the right information gets to the right person
in the right location at the right time, safely and securely. The rapid evolution of the
telecommunications market is leading to the introduction of new services and new networking
technologies in ever-shorter time scales. SLAs are tools that help support and encourage
Customers to use these new technologies and services as they provide a commitment from SPs
for specified performance levels.
The TM Forum SLA Handbook (GB 917 Version 1.5) was released by the TM Forum in June
2001. The objective of GB 917 was to assist Customers and Telecommunication Service
Providers (SP) with understanding the fundamental issues involved with developing
telecommunication services SLAs and SLA management. GB 917 Version 1.5 incorporates the
concepts within the Performance Reporting Concepts and Definitions Document (TMF 701) and
the Telecommunication SP to Customer Performance Reporting Business Agreement (NMF 503)
documents. These two documents offer a valuable extension to GB 917.
GB 917 Version 1.5 was based on the traditional telecommunications SP to Customer relations
embedded within the International Telecommunication Union (ITU) Telecommunication
Management Network (TMN) Framework model and the TM Forum generated
Telecom Operations Map (TOM) functional processes. Since GB 917 Version 1.5
there have been many significant advances within the telecommunications SP industry. There is
now a need to address these recent advances in the context of how telecommunication SP to
Customer relations are managed and the impacts on SLAs and SLA management.
The more important external and internal telecommunication SP advances that have
had a significant impact and need to be taken into consideration are, inter alia:
• A new breed and types of Service Providers such as, Internet, Application and
Content;
The TM Forum SLA/QoS Handbook Team also realized that the first GB 917 SLA Management
Handbook lacks sufficient detail and management processes to cater for the breadth of SLAs
needed within a modernized Telecommunication SP industry. GB 917 Version 1.5 has therefore
been revised to include the SLA management processes necessary to handle recent
telecommunication advances as well as the TMF instigated eTOM.
The revised TM Forum SLA Handbook is designated GB 917 Version 2 and structured as a four
Volume suite. Volumes 1, 2 and 3 focus on telecommunication services SPs, Suppliers and
Customers whereas Volume 4 focuses on Enterprise business applications and associated
telecommunication services. The four volumes are as follows:
Volume 1 is written for Chief Executive Officers (CEO) and Board of Directors members. It is a
concise introduction to SLA Concepts, Business Case, Benefits, and Consequences for
telecommunication service customers, SPs, and hardware and software suppliers. Volume 1 also
addresses where SLAs reside within the modern market place.
Volume 2 is written for the telecommunication and supplier managers. It provides the detail behind
SLA principles such as Service Access Point (SAP), Service Delivery Point (SDP), SLA
management process mapping onto eTOM, service parameter framework, and measurement and
reporting strategies.
Volume 3 is written for Telecommunication and Supplier implementers. It describes how to apply
the SLA principles defined in Volume 2 to a representative set of technologies. Volume 3 also
includes a checklist of items typically included within telecommunication service SLAs.
Volume 4 is written for enterprise managers and implementers. It addresses business application
and services as well as internal and external network services. In this context it generically
describes enterprise performance requirements for end-to-end services. A number of enterprise
business applications of SLAs are described in detail.
For business enterprises, be they end Customers, SPs or Suppliers, to embrace SLAs a business
case must be made. Volume 1 approaches the business case from the point of view that
product-line information, and the transfer of that information enabled by telecommunication
services and networks, is the lifeblood of business enterprises. It is business enterprise
information that must be properly understood, managed and assured for business enterprises to
sustain growth and not fail.
For end Customer, SP and Supplier CEOs and Board members who accept that adopting SLAs
is a progressive adjunct to business enterprise strategy, Volume 1 provides suggested
next steps towards implementing, embracing and supporting SLAs. These next steps
range from a review of business strategies, gap analysis, information content, processes, cost-
benefit analysis, current services and supporting SLAs, training needs analysis to future service
plans, marketing strategies and outsourcing strategies.
For convenience Volumes 2, 3 and 4 Executive Summaries are at Annexes A, B and C to this
Volume 1 Executive Overview.
The objective of the SLA Management Handbook series is to assist two parties in
developing a Service Level Agreement (SLA) by providing a practical view of the
fundamental issues. The parties may be an “end” Customer, i.e., an Enterprise,
and a Service Provider (SP) or two Service Providers. In the latter case one
Service Provider acts as a Customer buying services from the other Service
Provider. For example, one provider may supply network operations services to
the provider that supplies leased line services to its customers. These relationships
are described as the Customer-SP interface and the SP-SP interface.
This volume of the Handbook briefly reviews the essential elements of the
concepts and principles presented in Volume 2. Using this as a base, a checklist
of items for potential inclusion in SLAs is presented. This is then followed by seven
examples of the application of the SLA concepts and principles.
The SLA checklist contains numerous lists of items that may be included in an
SLA. These lists were derived from TMF member contributions, information
provided by user groups and by standardization bodies. Not all of these items will
be relevant to a specific SLA. The order of the items in the list does not imply a
priority. These items may be consolidated or may be disaggregated as needed.
Topics covered include service descriptions, service level and service quality
specification, service monitoring and reporting, and tariffs and billing issues.
This volume concludes with high level examples of how the principles and
concepts defined in Volume 2 can be applied. It should be noted that the services
used in the examples can become quite complex in particular instances. The intent
in this document is to retain only the essential aspects of the services while
illustrating the use of the SLA Parameter Framework.
Note that all parameter values that appear in this document are for illustrative
purposes only and are not intended to represent industry agreements or
recommendations.
The examples herein include leased line services, emergency and disaster relief
services, ATM Cell Delivery and IP-based virtual private networks (VPN).
The SLA Management Handbook series incorporate earlier work that appears in
the Performance Reporting Concepts and Definitions Document [TMF 701], in the
Service Provider to Customer Performance Reporting Business Agreement [NMF
503] and in Service Quality Management Business Agreement [TMF506].
This volume addresses the enterprise issues in the provision of end-to-end Service
Level Agreements (SLA) and comes as a collaboration between the Open Group,
representing the enterprise, and the TeleManagement Forum, addressing the
service provider markets. This work was further inspired by survey data gathered
on behalf of the Open Group by Sage Research which indicated great interest in
SLAs in the enterprise but a large gap between where enterprises are considering
SLAs and where standards bodies, such as the IETF, are currently concentrating
their efforts.
The scope of the market addressed by 'enterprise' is very broad and its business
practices diverse. It was therefore necessary to generalize the applications used
by an enterprise so that SLA metrics could be applied, measured and reported in a
contractual manner.
This work uses the concept of key quality and performance indicators (KQI/KPI)
developed by the TMF Wireless Services Measurement Handbook (GB 923). The
importance of the KQI/KPI concept is that it allows the provider of the service to
concentrate on the quality, rather than the performance of a service as in the past.
The relationship between the KQI and the performance metrics, KPI could be
identical for the simple case or complex, derived empirically or inferred from a
number of KPI and other data. The mapping between the KQI and KPI forms an
important part of the SLA negotiation.
For each of the generic business services discussed, the KQIs are determined and
then the KPIs for the services and measurement methods are tabulated.
The form of an SLA is discussed, and special attention is paid to the differences
between an SLA between internal parties and one between external parties,
especially in terms of penalties. A monitoring and reporting process is then
discussed, allowing for real-time, near-real-time and historical reporting of both
asynchronous events and polled parameters.
A number of use cases are considered to validate the approach. The first is a
common example where Voice over IP (VoIP) is used to connect remote sites of
an enterprise to a corporate HQ. Data is also supported but only on a best effort
basis. The second scenario is from the Open Group's Mobile Management
Forum's work on the 'Executive on the Move', where an executive is considered to
have voice and data connectivity wherever they are: in the office, in transit (car,
airplane), at home or at a remote site. Voice, data and video (conferencing) are
also supported for the executive. The final scenario is a naval application where
voice, data and video applications are supported in highly distributed and
arduous environments. The VoIP and naval scenarios envisage a common IP
infrastructure and use a differentiated services (DiffServ) marking scheme to
separate the different service domains for prioritization.
The following sections reference other documents that address elements related to
the subject of Performance Reporting. Where possible the issue of the document
has been identified. This list is not exhaustive. Additions should be communicated
to the Editor.
Unless otherwise noted, all of the documents in the following table are ITU-T
documents. Documents noted [*] remain to be reviewed.
Document Title
E.540  Overall GoS of the international part of an international connection
E.541  Overall GoS for international connections (subscriber-to-subscriber)
E.543  GoS in digital international telephone exchanges
E.550  GoS and new performance criteria under failure conditions in international telephone exchanges
E.720  ISDN GoS concept
E.721  Network GoS parameters in ISDN
E.771  GoS for land mobile services
E.775  UPT GoS concept
E.724  GoS parameters and target GoS objectives for IN-based services
E.800  Terms and Definitions Related to Quality of Service and Network Performance Including Dependability (08/94)
E.810  Model for the serviceability performance on a basic call in the telephone network [*]
E.830  Models for the allocation of international telephone connection retainability, accessibility and integrity [*]
E.845  Connection accessibility objective for the international telephone service [*]
E.850  Connection retainability objective for the international telephone service [*]
E.855  Connection integrity objective for international telephone service [*]
I.350  General aspects of QoS and network performance in digital networks, including ISDN [*]
I.351  Recommendations in other series concerning network performance objectives that apply at reference point T of ISDN [*]
I.352  Network performance objectives for connection processing delays in an ISDN
I.360 series
M.3100  Generic network information model
M.3400  TMN management functions
Q.822  Performance Reporting Management functions
X.140  General Quality of Service Parameters for Communication via Public Data Networks (09/92)
X.144  User Information Transfer Performance Parameters for public frame relay data networks
X.145  Performance for Data Networks Providing International Frame Relay SVC Service
X.161  Definition of Customer Network Management Service for Public Data Networks
Y.1271  Framework(s) on Network Requirements and Capabilities to Support Emergency Communications over Evolving Circuit-Switched and Packet-Switched Networks
ISO/IEC JTC21/SC21-N9309  QoS Basic Framework 01/95 (will become CD 113236-2 in Aug. 95)
RACE  Service Management documents
RFC 1604  Service management requirements (Frame Relay Forum)
CCITT  Handbook on Quality of Service and Network Performance
Frame Relay Forum  Quality of Service Working Group (working document, August 1993)
T1A1.3/93-011R2  T1 Performance Standards contribution document
Note: E.3xx, E.5xx and E.8xx documents remain to be assigned.
E.771
The Grade of Service (GoS) parameters for land mobile services are parameters
which primarily describe the interface between Service Providers and/or a
Supplier. The mobile telephone subscriber/end user normally does not ask for GoS
parameters. The probability of end-to-end blocking and connection cut-off due to
unsuccessful handover is highly dependent on the network resources.
E.800
M.3100
M.3400
Q.822
X.140
A primary value of this document is Figure 1/X.140 which defines the three phases
of a communication event: Call Establishment, Data Transfer, and
Disengagement. Performance is measured during each phase by three key
parameters: Speed, Accuracy and Dependability. This 3 by 3 matrix appears to
provide a strong foundational model for other documents addressing performance
and availability.
Specific emphasis is placed on switched data services such as X.25 and X.21.
Performance measures are specified at the point where a Public Data Network
terminates at the user site at physical layer one. The OSI user perceived QoS
X.144
This document is very well developed for detailing the performance of PVC
operation in the data transfer phase. The reader is drawn to X.140 for the basic
construction of the three phases of a data connection: Call Establishment, Data
Transfer, and Disengagement. During the PVC data transfer phase various
parameters are discussed as measures of performance and availability. Of
particular value are Figure 5/X.144 that shows statistical populations used in
defining selected accuracy and dependability parameters, Table 1/X.144 that
shows the four Outage criteria for the availability decision parameters, and
Appendix I/X.144 that describes sampling techniques and sampling periods for
availability.
X.145
This document is a companion to X.144 and addresses the call establishment and
disengagement phases for SVC operation. Two outage criteria for the availability
decision parameters for SVCs are described in Table 9/X.145 in addition to the
four criteria described in Table 1/ X.144 for PVCs.
X.161
X.161 defines the management services and supporting functions for CNM. This
recommendation is intended to complement TMN specifications and provide a
specification for the non-TMN environment. X.161 is related to:
1) X.160 - Architecture for CNM for public data networks
2) X.162/X.163 - Definition of Management Information for CNM. The CNM
services defined in X.161 are classified into six groups, viz., Fault,
Accounting, Configuration, Performance and Security Management
(FCAPS), and CNM supporting services.
Y.1271
Y.1271 provides core requirements and capabilities to support emergency
communications. In particular, Section 8 of the Recommendation describes core
requirements from a technology neutral perspective.
FRF-QoS
This document contains definitions for transfer delay on Frame Relay circuits and
networks. Suggested measures are defined.
RFC 1604
The document is a very useful MIB for service management of Frame Relay.
However, it does not contain definitions of performance measures. The information
may be useful, but the reader must abstract the MIB field contents to gain useful
QoS measures.
T1A1.3/93-011R2
This contribution contains definitions for PVC availability and Mean time between
service outages. It uses the same definitions as X.144.
ISO/IEC JTC21/SC21-N9309
This document is an excellent reference for terms and definitions related to QoS. It
is an extension of X.200 and provides a basic framework for QoS in terms of:
1) Characterization,
2) Requirements specification,
3) Management.
It does not address detailed specification of QoS mechanisms. The writers have
developed text that is specifically directed to the larger issues of systems
management (not the specific requirements of networks). Readers should benefit
from the higher level thinking and terminology. However, it should be read from the
users’ point of view and the end-to-end character of the network application. The
attempt to define QoS at generic interfaces between systems (or elements) has
been accomplished. The reader should be alert to specific details in their own
application that may modify the approach of 9309.
Critique: The subject of statistics is addressed from the point of view of necessity
to compress the data resulting from measurements. In very large networks the
amount of data potentially available will be staggering, therefore statistical
reduction and summarization of data is important.
E.4 Directory
This subsection provides a mapping of topics onto the various source documents.
Management References
The calculation of SAP outage intervals may be done in conjunction with Trouble
Ticketing systems, specifically by passing the Trouble Ticket information on
affected SAPs from the Trouble Ticketing system to Performance Reporting
process. If this is done by notifying the Performance Reporting process of open
and closed Trouble Tickets, an issue arises with unclosed Trouble Tickets which
will distort the derived Availability calculations for affected SAPs.
Some service providers wish to report on additional QoS related parameters such
as:
1) TTR: Time to Restore for a specific SAP,
2) MTTR: Mean Time to Restore for a specific SAP/SAP Group,
3) MTBF: Mean Time Between Failure for a specific SAP/SAP Group.
It is not within the scope of this document to standardize the above mentioned
parameters for the customer to service provider SLAs. However, by using the
proposed Service Availability calculation formula, service providers will have all
information available to perform the respective calculations.
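As an illustration only (the record structure and the MTBF approximation are
assumptions, and definitions of these parameters vary), TTR, MTTR and MTBF for
a SAP can be derived from closed Trouble Ticket outage intervals as follows:

    # Illustrative derivation of TTR, MTTR and MTBF for a SAP from
    # closed Trouble Ticket outage intervals (start, end). The record
    # structure is an assumption for this sketch.
    from datetime import datetime, timedelta

    outages = [
        (datetime(2004, 1, 3, 10, 0), datetime(2004, 1, 3, 11, 30)),
        (datetime(2004, 1, 17, 2, 0), datetime(2004, 1, 17, 2, 45)),
    ]

    ttrs = [end - start for start, end in outages]    # Time to Restore per outage
    mttr = sum(ttrs, timedelta()) / len(ttrs)         # Mean Time to Restore
    # Approximate MTBF as the mean time from one restoration to the
    # next failure; exact definitions vary between operators.
    gaps = [outages[i + 1][0] - outages[i][1] for i in range(len(outages) - 1)]
    mtbf = sum(gaps, timedelta()) / len(gaps) if gaps else None

    print(mttr, mtbf)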
A detailed specification of the PM-OSF is beyond the scope of this document. The Performance Reporting process assumes
the following functionality is supported in the PM-OSF:
1) The PM-OSF in the NML must be able to maintain and/or retrieve PM
history data for a particular network connection on an end-to-end basis.
2) The number of instances of performance history data and aggregation
intervals can be set by the Performance Reporting process based on the
performance reporting aggregation intervals for each service.
3) The PM-OSF supports a registration capability that permits the
Performance Reporting process to register a list of Managed Object
Instances for periodical reporting. The registration process should allow the
specification of the managed object instance and the set of performance
monitoring data attributes. At the end of a collection interval, the PM-OSF
should report the data to the Performance Reporting process (see the sketch
after this list).
4) The PM-OSF must allow ad hoc query of history data.
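The registration capability of item 3 might be exercised as in the following
minimal sketch; the interface and all names are assumptions, not a standardized
PM-OSF API:

    # Minimal sketch of the registration capability described in item 3:
    # the Performance Reporting process registers managed object
    # instances and attribute sets for periodic reporting.
    class PMOSF:
        def __init__(self):
            self.registrations = []

        def register(self, managed_object_instance: str,
                     attributes: list, collection_interval_s: int):
            self.registrations.append(
                (managed_object_instance, attributes, collection_interval_s))

        def end_of_interval(self, report_callback):
            # At the end of a collection interval, report the collected
            # data to the Performance Reporting process (data stubbed here).
            for moi, attrs, _interval in self.registrations:
                report_callback(moi, {a: None for a in attrs})

    pm_osf = PMOSF()
    pm_osf.register("connection-42", ["IPTD", "IPLR"], collection_interval_s=900)
    pm_osf.end_of_interval(lambda moi, data: print(moi, data))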
The Performance Reporting process may also make use of common objects to:
1) Identify all SLAs in force for a Customer,
2) Identify service characteristics which may be referenced by individual
SAPs,
3) Identify all SAPs for a given Service, e.g., for a multi-party connection.
F.6 Data Collection
At the service activation time, the Performance Reporting process will initiate the
data collection process. There are two sources for service performance data, the
Trouble Ticketing system for service availability and the PM-OSF on the NML for
network impairment and traffic data.
The data collection process involves client registration and establishing the
periodical reporting mechanism in the Trouble Ticketing system and PM-OSF if
supported. In the absence of a registration and reporting functionality, the
Performance Reporting process will set up a data polling mechanism.
After the receipt of data from the PM-OSF or the Trouble Ticketing system, the
delivered service performance levels will be computed. For example, the Trouble
Ticketing system will provide the outage duration associated with each Trouble
Ticket for a particular service. The PR-OSSF will calculate the Service Availability
level, and if desired, TTR, MTTR, and MTBF, for a particular aggregation interval.
The customer may choose to receive reports that span several aggregation
intervals. The service provider may choose an “atomic” aggregation interval for
each report type. For example, as QoS metrics are typically defined over longer
time periods than traffic data metrics, a service provider may choose a month as
the atomic aggregation interval for QoS reports and an hour as the atomic
aggregation interval for traffic data reports. Reports based on longer aggregation
intervals can be derived from the atomic aggregation intervals, e.g., quarterly and
annual reports for QoS; daily and monthly reports for traffic data.
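Deriving a longer report from atomic aggregation intervals can be as simple as
the following sketch (the interval choice and data values are assumptions for
illustration):

    # Illustrative roll-up of atomic aggregation intervals into a longer
    # reporting interval: hourly traffic counts into a daily report.
    hourly_octets = [123_000] * 24        # one atomic (hourly) value per hour

    daily_total = sum(hourly_octets)      # daily traffic report value
    daily_peak_hour = max(hourly_octets)  # busiest atomic interval

    print(daily_total, daily_peak_hour)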
The number of history reports available for customer access for each report type is
a local policy issue.
Numerous debates have arisen over the difference between quality as perceived
by the end user and the delivered quality as measured by the Service Provider.
This annex provides one approach to defining measures that can be verified by
both the end user and the Service Provider. Perceived performance12 and
delivered performance are indeed two separate, though highly related measures.
These measures are derived from user data while the user is on the line, thereby
providing an accurate indication of actual performance. Increasing demands for
high availability levels require in-service measures. The process described in this
annex is applicable to services that are operational (and might be degraded below
the SLA threshold). Services that have failed completely are not addressed. There
is usually less debate over hard service outages (faults).
G.1 Application
The method described in this annex focuses on the transport phase of the
connection between user SAPs. The method applies in cases where the
connection was established manually (PVC) and in cases where various semi-
automated or automated mechanisms were used13.
This method uses in-service measurements. It does not interrupt the user data
flow. In fact the measures are based upon the user data. User messages are
selected for analysis because they flow end-to-end. The same assessment can be
made at any point along the path (even though the transport mechanisms may
vary)14. Although some lower layer protocols place message traffic into packets,
frames, or cells with possibly fixed payload capacities, the technique is most easily
implemented by analysis of user messages in their original form.
12
Perceived performance includes the combined performances of the application, the communications
link(s), and any servers being addressed by the application.
13
Call establishment and release (e.g. Figure 3/X.140) is not addressed herein.
14
Where several SPs comprise the end-to-end connection, each SP can make the same measure (as the
user).
Measurements are made at the appropriate protocol layer where the (user) system
at the SAP controls the message error recovery mechanism using a
retransmission procedure.
G.2 Process
Messages flowing from SAP A to SAP Z will be subject to a finite number of errors.
Current technology utilizes various schemes, e.g., the use of various Cyclic
Redundancy Check (CRC) methods, to detect if one or more message bits have
been corrupted in the transmission process. A single bit error may cause the
message to be discarded.
Receipt of a single errored message may cause not only the errored message to
be retransmitted, but it may require that all intervening messages be
retransmitted to maintain the proper message sequencing. Frequently, the industry
uses a “repeat all after” process15. The common industry term for the maximum
number of outstanding messages is the window size. Typical window sizes range
from 8 to 128. Under extreme, but not impossible, circumstances over 100
messages could be retransmitted to recover from a single bit error16.
15
Although some protocols provide for a selective reject mechanism, implementation has been minimal.
Nevertheless, the same process would be applicable and the perceived performance might converge
toward the delivered performance.
16
Satellite links have large delays and use a large window size to increase throughput.
17
Calculation of errored messages may vary depending on the specific error recovery process.
18
Calculation of repeated messages may vary depending on the specific error recovery process.
G.3 Delivered SA
SA Delivered = (1 − Errored Messages / Total Messages) × 100%
G.4 Perceived SA
A common availability level used for digital data services is 99.98% error free
seconds19,20. This equates to two “bad” seconds in 10,000 seconds. A comparable
message objective is two “bad” messages in 10,000 messages. If one message in
10,000 is “bad”, the delivered performance is twice as good as the objective.
However, if there are seven outstanding messages, then eight messages21 must
be repeated. A perceived performance of eight in 10,000 is four times worse than
the objective. The Service Provider has an opportunity to provide the higher
availability level required to achieve the perceived objective of the user (albeit at a
premium price).
19
The values are provided for illustration and do not constitute a recommendation.
20
99.98% error free seconds equals 17 errored seconds in a 24 hour day
21
Seven repeated messages plus the original errored message
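The worked example above can be checked numerically as follows; the window size
and error counts are the example's illustrative values, not recommendations:

    # Numeric sketch of the delivered vs. perceived SA example: one
    # errored message in 10,000 with seven outstanding messages forces
    # eight retransmissions under a "repeat all after" scheme.
    total_messages = 10_000
    errored_messages = 1
    outstanding = 7                    # messages in flight behind the error

    sa_delivered = (1 - errored_messages / total_messages) * 100
    repeated = errored_messages * (1 + outstanding)        # 8 messages
    sa_perceived = (1 - repeated / total_messages) * 100

    print(f"delivered: {sa_delivered:.2f}%")   # 99.99%
    print(f"perceived: {sa_perceived:.2f}%")   # 99.92%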
Users will not always be transmitting message traffic during the measurement
intervals. Further work is needed to develop sampling techniques. At various
network technology layers, in-service monitoring techniques are already provided
to estimate errors and availability. Sampling techniques should be developed at
the application layer to optimize the use of measurement and traffic bandwidth
resources. Confidence levels should be established that correlate with the sample
frequency, duration, time of day, and user traffic encountered22.
Note: U.S. Patent 5,481,548 may cover certain implementations of the methods
described in this contribution. Hekimian Laboratories is the owner of the Patent
and has provided a “Patent License Availability” statement for TeleManagement
Forum members.
22
This is not intended to be an exhaustive list of parameters.
The following use cases present a high level view of the data flows and process
actions required to support the SLA life cycle phases. By examining each phase,
the detailed ramifications of SLA management on the baseline eTOM processes
can be evaluated. The inputs and outputs listed below are selected based on their
relevance to SLA/QoS management and do not include flows that are required to
support other management objectives. Note that eTOM processes may participate
in more than one life cycle phase.
The use cases which follow relate the life cycle phases discussed earlier to
the eTOM [GB 921] processes that support SLA management.
In Figure H-1 the information flow of the first part of the creation of a service is
depicted.
1) Customer requirements are gathered either over time or as a result of a
Customer RFP. These requirements are for services with SLAs that do not
currently exist in the sense that the service is not yet offered, an SLA for
the service is not defined, or the Customer requirements exceed the
current SLA parameter definitions. This information flow includes the
service description for the base service (SAP, SAP technology, access
technology, speeds, overall circuit parameters (e.g. CIR), etc) and the QoS
parameters required by the Customer23.
SALES: would take the disparate requests from separate customers, examine the
current catalogue of services for close matches, estimate the business potential for
new Customers, additional revenue from existing Customers, plus the value of
retained revenue, and pass the requests to service planning and development.
Part of this function is sometimes performed in a group termed Product Line
Management or Marketing.
2) The Customer requirements combined with the potential market value for
this new service and the estimated service lifetime is passed to Service
Planning and Development.
SP&D: splits the requirements into operations-specific requirements and
network/technology requirements. (Service) Architectural alternatives for providing
23
See Chapter 3 for level of service and quality of service definitions.
the new service are weighed, and the operations impacts of the different
architectures are balanced against the different potential impacts of changing
service needs (emerging technologies that could drive Customer demand in a
different direction, e.g., VoIP, or Soft Switch). This creates preferences or priorities
being placed on the underlying technology requested from the network. These
requests are then sent to the Network Planning and Development process block.
Flows 3 and 5 may occur in parallel.
3) Detailed network requirements are forwarded to the Network Planning and
Development process block to obtain network costs (capital and expense),
and the time frame that would be required to implement the new service
with the technical SLAs as specified. This flow indicates all the technical
parameters (service-specific and technology-specific) needed for the SLA
support, including technology preferences (not hard and fast rules),
performance parameters, geographic requirements (potentially including
phasing), and time frame requirements.
NP&D: analyzes the new service requirements against its existing network
inventory and operations support structure including both network and operations
capacity and geographic coverage. During this analysis, NP&D examines the
current network experience for performance and reliability, and formulates a cost
to upgrade the network, field operations, and support systems to provide the new
service, including a realistic time frame. NP&D may need to flow data to all other
network layer process blocks to arrive at the estimate. It may also need to issue
RFI/Ps to get new equipment costs.
4) Cost (expense and capital) and time estimates are returned to SP&D.
These may include specific technology recommendations if during the
NP&D investigation it was determined that one technology was not mature
enough to support the QoS or Service Availability requested.
5) Optional: Query to SQM to obtain current network quality figures for
intended infrastructure and geography. May request Customer QoS
reports if responding to RFP or periodic Customer review.
SP&D: analyzes all returned data and determines the possible SLAs that will meet
the company's risk model.
7) SP&D returns the permissible SLA parameters with values to Sales (PLM)
with ranges of values, required financial returns necessary to cover risks,
geographic restrictions, and lead time before the SLAs may be offered.
8,9,10,11,12) Notices of new service parameters go to most of the rest of the
organization for inclusion into standard business practices.
Figure H-4 depicts the data flow during the Negotiation and Sales phase of an SLA
product. The end of this phase is a signed SLA with both Customer and SP
knowing exactly what is expected of each other, what is covered and when, and
what recourse is available.
2. Selling requests information about the Customer from Retention & Loyalty
(whether an existing Customer or not, Customer history, etc.).
4/5.Order Handling checks availability and feasibility with Service Configuration &
Activation and, if external service components are required, with S/P Buying.
7. The check is completed and Order Handling confirms the product availability to
Selling.
8. Selling enters into negotiations with the Customer and an offer with SLA details
is made to the Customer.
9. The Customer responds to the offer and further negotiations may take place
with Selling.
10. Selling may request additional details from Service Configuration & Activation.
11. Service Configuration & Activation may request further details from Resource
Provisioning.
12. Negotiations are completed, and the Customer signs a contract containing
agreed details of the QoS and SLA parameters.
2. The order enters the service configuration process. This may be through a
direct customer web interface, or through other electronic ordering channels
(i.e. this and step 1 may be the same step) or passed on from Order Handling
to Service Configuration & Activation.
3/4.The order, along with the SLA parameters, enters provisioning, starting the
applicable installation and provisioning timers. Service Configuration &
Activation configures the requested service instance(s) and sends the
appropriate requests both to Resource Provisioning for internal resources as
well as to S/P Buying for external service components required.
5. S/P Buying passes the order to S/P Purchase Order Management for S/P
delivery.
6. S/P Purchase Order Management issues orders for Suppliers and Partners.
11. Service Configuration & Activation tests the additional service instance(s) on
an end-to-end basis and ensures that service KQIs are supported. It updates
Manage Service Inventory with the new service instance(s) and their KQIs.
14. S/P Performance Management informs Service Quality Analysis, Action &
Reporting that monitoring has been initialized for external service components.
15. Service Quality Management informs Service Configuration & Activation that
monitoring has been initialized.
16. Service Quality Management sends details of the newly ordered service
instance(s) to Customer QoS/SLA Management for later SLA processing.
17. Service Configuration & Activation informs Order Handling that the service
instance(s) have been tested and activated and are ready for use.
18. Order Handling informs the Customer of the activated service instance(s), and
the Customer indicates acceptance. This may be electronic from the same
systems used in step 1.
19. After the Customer has indicated acceptance, Order Handling informs
Retention & Loyalty, Billing & Collections Management, and Customer
QoS/SLA Management that the service instance(s) have been activated and
are ready for use by the Customer.
Normal execution, also known as steady state, is the phase where the Customer
receives service on all the contracted and instantiated service instances. This
section first analyzes, in Case A, a situation where no service outages or other
alerts occur and the Customer is billed for the service used (Figure H-6), and then
analyzes in Cases B and C, the situation where although service outages occur,
no outage exceeds either the individual or aggregated parameters set in the SLA
(Figure H-7 and H-8). In the first case of normal operation, a Supplier/Partner is
also involved; in the second case, the outages are within the Service Provider
enterprise and so do not involve a Supplier/Partner.
The steps shown in Figure H-7 for Cases B and C are as follows:
2C. Alarms that represent the failure of a component that affects the service of
one or more Customers. Resource Data Collection & Processing sends data
on alarms to Resource Trouble Management for further action.
10. If Resource Trouble Management has not been able to trigger automatic
resource restoration, Service Problem Management requests Service
Configuration & Activation to undertake the required corrective actions. (Steps
10 to 17 are therefore carried out only if automatic resource restoration did not
take place).
11. As the problems have been notified in the resource layer, Service
Configuration & Activation will require changes to be made to the underlying
infrastructure per contractual agreements. This requirement is sent to
Resource Provisioning for activation.
14. Resource Provisioning reports the results of the changes as well as the time
taken and all other infrastructure and operational parameters to Service
Configuration & Activation.
15. Service Configuration & Activation generates updates for Manage Service
Inventory.
18. Notifications and performance data are collected from the service-providing
infrastructure by Resource Data Collection & Processing.
19. Resource Data Collection & Processing sends performance data to Resource
Performance Management for further analysis.
23. Service Quality Management analyzes the resource performance reports and
sends a rectification report to Service Problem Management when it is
established that the troubles causing the Threshold Crossing Alerts or Alarms
have been resolved and that the service is meeting its KQIs.
24. Service Quality Management sends overall service quality reports to Customer
QoS/SLA Management so that it can monitor and report aggregate technology
and service performance.
From time to time, service conditions will exceed the parameters specified in the
SLA. At least two cases need to be examined, one where the SP detects the
outage first, and one where the Customer detects and reports it first. The second
case is depicted in Figures H-9 and H-10.
Service Problem Management then carries out one of the three following
alternatives:
Alternative a
9a. If there is no problem, Service Problem Management sends the actual service
performance to Problem Handling.
10a. Problem Handling informs the Customer of the actual service performance as
well as Retention & Loyalty for future reference and Customer QoS/SLA
Management so that any steps initiated can be terminated.
Alternative b
Alternative c
10c. Service Configuration & Activation will require changes to be made to the
underlying infrastructure per contractual agreements. This requirement will be
sent to Resource Provisioning for activation.
13c. Resource Provisioning reports the results of the changes as well as the time
taken and all other infrastructure and operational parameters to Service
Configuration & Activation.
14c. Service Configuration & Activation generates updates for Manage Service
Inventory.
17. Notifications and performance data are collected from the service-providing
infrastructure by Resource Data Collection & Processing.
18. Resource Data Collection & Processing sends performance data to Resource
Performance Management for further analysis.
20. Service Quality Management analyzes the resource performance reports and
sends a rectification report to Service Problem Management when it
establishes that the problem has been resolved and that the service is meeting
its KQIs.
21. Service Problem Management reports that the problem has been resolved to
Problem Handling.
22. Problem Handling informs the Customer and receives acknowledgement from
the Customer that the problem is resolved.
24. Customer QoS/SLA Management reports the violation rebate to Billing &
Collections Management for billing adjustment and to Retention & Loyalty for
future reference.
25. The Customer is notified in semi real-time about the actions taken on their
behalf.
26. Billing & Collections Management bills the Customer at the end of the billing
cycle with the SLA agreed treatment included.
H.6 Assessment
During the assessment phase, SLAs are examined to determine if they still fit the
business needs. There are several triggers for the assessment, including periodic
review (either per service or overall), Customer-triggered reevaluation, Customer
exit, etc.
Figure H-11 shows Case A where Customer SLA needs have changed, because
the Customer’s business needs have changed and there is no SLA meeting these
needs, leading to an assessment of the potential for an enhanced product SLA.
Figure H-12 shows Cases B and C where internal assessments at the CRM and
service layers lead to a realignment of infrastructure support for SLA parameters
and service KQIs respectively. In these flows, Level 3 processes from the
Operations Support & Readiness vertical are included for increased clarity.
2. Selling checks the significance of the Customer with Retention & Loyalty.
1B. Enable Customer Quality Management receives SLA reports for trend analysis
(mainly from Customer QoS/SLA Management). Enable Customer Quality
Management establishes that given SLAs are being violated too often, require
excessive rebates, and that the service KQIs are not supporting the Product
KQIs.
1C. Enable Service Quality Management receives service quality reports for trend
analysis (mainly from Service Quality Analysis, Action & Reporting). Enable
Service Quality Management establishes that the service being provided is not
meeting the required levels on an average basis.
10. Resource Data Collection & Processing sends performance data to Resource
Performance Management for further analysis.
13. Service Quality Management sends service quality reports to Enable Service
Quality Management for trend analysis, where it is established that the service
being provided is now meeting the required levels on an average basis.
This annex uses a sample business scenario to provide an illustration of the way in
which the method may be applied.
A customer uses her PDA to order flowers for delivery to her Aunt in hospital. To
achieve this she:
1) Subscribes to the service
2) Customizes the service
3) *Accesses the service
4) *Selects the appropriate option from the menu
5) *Chooses a bouquet
6) *Enters the delivery instructions and her message
7) *Enters her security PIN
8) *Receives confirmation of the transaction.
9) *Tracks progress on the florist’s delivery tracking service
10) Receives notification from the delivery service that the flowers have been
delivered and accepted and the cost has been added to her phone bill.
11) Checks her phone bill
[Figure: Customer experience timeline for the Flower Service: decide to buy;
find out about service; provision service for customer; service activated;
personalise service (first user experience); request service usage and change
personalisation (subsequent experience); receive and check bills (phone or
credit card); report problem, problem resolved, customer advised; cease
service.]
For the purposes of the following steps in the analysis example, only the in-life
experience aspects of the timeline will be considered. However, the methodology
for the analysis of the other phases of the timeline is the same. From the timeline
therefore we are able to identify the main activities of a subscriber using the Flower
Service.
Based on the timeline and the scenario, each of the service resources in the
service delivery path required to deliver the Flower Service, together with the
associated transaction flows, is identified to build an end-to-end service
delivery diagram, which is used to form the x and y axes of the transaction
matrix.
It should be noted that the level of detail contained in the service delivery path
diagram (and hence in the transaction matrix) is implementation specific and is
driven by the level of granularity required for managing the service. The
granularity does not have to be consistent across the entire path: it will
typically be greater for parts of the path that are managed within the service
provider's business, whereas components of the path provided by a third party
may be managed at a much higher level.
The following diagram provides a possible service path to support this example.
[Figure: A possible service delivery path for the Flower Service, comprising the
BTS, BSC, RXCDR and MSC/VLR (connected over a CSD bearer), DCNs, the WAP Gateway
and the http Server.]
This diagram identifies the key service resources for delivering the flower service.
In this case the components have been aggregated to a higher level and are
identified as:
1) GSM Radio Access Network
2) GSM Core Network
3) Content Access Network
4) Content Provider
In practice it is likely that a lower level of modeling would be used for some (or all)
of these components but in the interests of clarity these higher level aggregated
service resources are used in the following analysis.
Availability KQIs
Having completed the first pass of the matrix, it is now possible to identify a
number of KQIs that *may* be required for managing the service quality. Typically
the service resources across the x-axis indicate the components in the service
delivery path whose availability will impact service delivery. Therefore availability
KQIs need to be developed for these components. In this example these KQIs
are:
• Access_network_Availability
• Core_network_Availability
• Content_access_Availability
These in turn are aggregated to form a single KQI:
• fs_Availability
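As an illustration only, the following Python sketch shows one way in which the
component availability KQIs might be aggregated into fs_Availability. The
handbook does not prescribe an aggregation formula; the serial-composition
assumption and the numeric values below are hypothetical.

```python
# Illustrative sketch only. This example assumes the three aggregated
# service resources sit in series in the delivery path, so the end-to-end
# availability is the product of the component availabilities.

def fs_availability(component_availabilities):
    """Combine per-component availability KQIs (fractions in 0..1) for
    components assumed to be in a serial delivery path."""
    result = 1.0
    for availability in component_availabilities:
        result *= availability
    return result

# Hypothetical measured values for the three aggregated service resources.
component_kqis = {
    "Access_network_Availability": 0.995,
    "Core_network_Availability": 0.999,
    "Content_access_Availability": 0.990,
}
print(f"fs_Availability = {fs_availability(component_kqis.values()):.4f}")
# -> fs_Availability = 0.9841
```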
Accuracy KQIs
Using the same matrix, the accuracy KQIs may be derived by analyzing the
transaction arrows. Each arrow represents a potential point for measuring the
accuracy of the data transfer between service resources, or between a service
resource and the end user. In this example, the following accuracy KQIs are
considered necessary to support the flower service (a computational sketch
covering both these and the 'success' KQIs follows the success-rate list below):
• service_menu_transfer_accuracy
• flower_service_menu_transfer_accuracy
• picture_transfer_accuracy
• message_data_transfer_accuracy
• PIN_transfer_accuracy
• fs_menu_transfer_accuracy
In a similar way it is possible to identify from the transaction flows (arrows) the
‘success’ KQIs for the service. In this example these have been identified in the
matrix as:
• service_menu_access_success_rate
• transaction_success_rate
• session_completion_rate
• fs_access_rate
• status_enquiry_success_rate
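As an illustration of how the ratio-type KQIs above (accuracy and success rates)
might be computed, the following Python sketch derives them from a hypothetical
transaction log. The record fields, transaction kinds and sample data are
assumptions made for this example.

```python
# Illustrative sketch only: deriving ratio-type KQIs (accuracy and success
# rates) from a hypothetical transaction log.
from dataclasses import dataclass

@dataclass
class Transaction:
    kind: str         # e.g. "menu_access", "buy", "status_check"
    completed: bool   # the transaction ran to completion
    error_free: bool  # the transferred data matched what was sent

def ratio(records, predicate):
    """Fraction of records satisfying the predicate (None if no records)."""
    if not records:
        return None
    return sum(1 for r in records if predicate(r)) / len(records)

log = [
    Transaction("menu_access", completed=True, error_free=True),
    Transaction("menu_access", completed=False, error_free=True),
    Transaction("buy", completed=True, error_free=False),
]

menu = [t for t in log if t.kind == "menu_access"]
service_menu_access_success_rate = ratio(menu, lambda t: t.completed)  # 0.5
transaction_success_rate = ratio(log, lambda t: t.completed)           # ~0.67
fs_menu_transfer_accuracy = ratio(menu, lambda t: t.error_free)        # 1.0
```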
Through examination of the y-axis, the critical request and response timings may
be identified. The analysis of this axis therefore provides the speed or time
KQIs; these again may be aggregated to form higher-level KQIs. For the Flower
Service, the following KQIs and their contributing KQIs have been identified (the
high-level KQIs are shown as the major bullet items and their contributing KQIs
as sub-bullets):
• fs_connection_establishment_time
  • service_menu_access_time
  • service_menu_user_response_time (simplicity)
  • fs_access_time
• fs_buy_transaction_time
  • fs_menu_download_time
  • flower_picture_transfer_time
  • availability_confirmation_time
  • delivery_instructions_confirmation_time
  • PIN_confirmation_time
  • order_confirmation_time
  • fs_session_completion_time
• fs_status_check_transaction_time
  • fs_access_time
  • fs_menu_download_time
  • fs_status_access_time
  • fs_status_request_time
  • fs_status_session_completion_time
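The roll-up of contributing time KQIs into a high-level time KQI can be
illustrated with a short sketch. It assumes, purely for the example, that the
contributing measurements are sequential stages of a single transaction and may
therefore be summed; means or percentiles over many transactions are equally
valid aggregation choices. All values are hypothetical.

```python
# Illustrative sketch only: rolling contributing time KQIs (in ms) up into
# the high-level fs_buy_transaction_time KQI by summation.

contributing_kqis_ms = {
    "fs_menu_download_time": 1200,
    "flower_picture_transfer_time": 3400,
    "availability_confirmation_time": 600,
    "delivery_instructions_confirmation_time": 800,
    "PIN_confirmation_time": 400,
    "order_confirmation_time": 700,
    "fs_session_completion_time": 300,
}

# High-level KQI rolled up from its contributing KQIs.
fs_buy_transaction_time = sum(contributing_kqis_ms.values())
print(f"fs_buy_transaction_time = {fs_buy_transaction_time} ms")  # 7400 ms
```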
[Figure: The Flower Service measurement hierarchy. The Service_Index is computed
from E2E_Service_Availability % (aggregating SRa_Availability %,
SRb_Availability % and SRc_Availability %), E2E_Service_Accuracy % (aggregating
SRa-SRb, SRa-SRc, SRb-SRc, SRb-SRa, SRc-SRa and SRc-SRb accuracies),
Service_Transaction_completion_Time* (aggregating Action1_ave_response_time*,
Action2_ave_response_time* and Action3_ave_response_time*),
User_Action1_Response_Time* and the Action1_Response_Simplicity_Indicator.]
Note the use of some KQIs in the computation of more than one higher-level KQI.
This is quite deliberate and is a key advantage of the methodology, as it
provides for the reuse of computed data. The measurements are shown in Figure
I-4 in an alternative format for clarity.
In some cases the speed KQIs may also provide some indication of the simplicity
of the service. For example, the time taken for a user to respond to a prompt or
menu may be an indication of the clarity of that prompt or of the menu structure.
Service Index
It can be useful to derive a single indicator for the overall service quality. A
service index of this form is an aggregation of all of the higher-level KQIs. It
should, however, be noted that the service index is often computed from KQIs of
various types (e.g., times and percentages); it is therefore necessary to
normalise this data as part of the aggregation algorithm. Normalisation is a
complex subject and is beyond the current scope of this document.
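Nevertheless, a minimal sketch can illustrate the principle. The following
example assumes a simple min-max normalisation of each KQI against hypothetical
worst-case and target values, followed by a weighted average; the scheme, the
weights and the figures are illustrative only and are not defined by the
handbook.

```python
# Illustrative sketch only: one possible normalisation and aggregation
# scheme for a service index.

def normalise(value, worst, target):
    """Map a raw KQI onto 0..1 (1 = at or better than target). Setting
    target < worst handles lower-is-better KQIs such as times."""
    score = (value - worst) / (target - worst)
    return max(0.0, min(1.0, score))

# KQI name: (measured value, worst case, target, weight) -- all hypothetical.
kqis = {
    "fs_Availability":            (0.984, 0.90,  0.999, 0.5),
    "E2E_Service_Accuracy":       (0.97,  0.80,  0.995, 0.3),
    "fs_buy_transaction_time_ms": (7400,  15000, 5000,  0.2),  # lower is better
}

service_index = sum(weight * normalise(value, worst, target)
                    for value, worst, target, weight in kqis.values())
print(f"Service_Index = {service_index:.2f}")  # -> 0.84
```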
Based on the description of a KQI it is proposed that a KQI will be defined by the
following parameters:
Parameter               | Type          | Mandatory/Optional | Weighting Factor | Remarks
Unique Identifier       | Alpha-numeric | M                  | NA               |
Measurement Parameter 1 | Ditto         | M                  | >0 - 1           |
Measurement Parameter 2 | Ditto         | O                  | >0 - 1           |
Measurement Parameter n | Ditto         | O                  | >0 - 1           |
Note 2: The Weighting Factor is optional. When applied, it must be included for
all parameters. If omitted, the effective weighting factors for all parameters
are to be considered equal.
Note 3: If the time zone for parameter 1 is not stated, then local time is
assumed. If the time zone for a subsequent parameter is not stated, then the
time zone of parameter 1 is applied.
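For illustration, the proposed parameter set and the rules in Notes 2 and 3
might be represented as a record along the following lines; the field names and
the validation are assumptions of this sketch, not part of the proposal itself.

```python
# Illustrative sketch only: a record type for the KQI definition parameters
# proposed above, enforcing Note 2's all-or-nothing weighting rule.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MeasurementParameter:
    name: str
    weighting_factor: Optional[float] = None  # in (0, 1] when supplied
    time_zone: Optional[str] = None  # Note 3: defaults to local time, or to
                                     # parameter 1's time zone

@dataclass
class KqiDefinition:
    unique_identifier: str  # mandatory, alphanumeric
    parameters: List[MeasurementParameter] = field(default_factory=list)

    def effective_weights(self):
        """Note 2: weights must be given for all parameters or for none;
        if omitted, all parameters are weighted equally."""
        weights = [p.weighting_factor for p in self.parameters]
        if not weights:
            return []
        if all(w is None for w in weights):
            return [1.0 / len(weights)] * len(weights)
        if any(w is None for w in weights):
            raise ValueError("weighting factor must be supplied for all "
                             "parameters or for none (Note 2)")
        return weights
```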
[Figure: KQI reporting example, showing successive samples of Parameter 1
(samples 1 to 6) and Parameter 2 (samples 1 to 3) being combined into KQI
reports 1 to 4.]
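A short sketch can illustrate how successive parameter samples might be batched
into KQI reports, as the figure suggests. The fixed reporting interval and the
simple averaging used here are assumptions of the example.

```python
# Illustrative sketch only: batching successive parameter samples into KQI
# reports on a fixed reporting interval.

def build_reports(samples, interval):
    """samples: list of (timestamp_seconds, value) pairs. Returns one
    averaged value per reporting window that contains at least one sample."""
    windows = {}
    for timestamp, value in samples:
        windows.setdefault(int(timestamp // interval), []).append(value)
    return {w: sum(vals) / len(vals) for w, vals in sorted(windows.items())}

# Parameter 1 sampled every 10 s and reported every 30 s (hypothetical).
parameter_1 = [(5, 0.99), (15, 0.97), (25, 0.98),
               (35, 1.00), (45, 0.95), (55, 0.96)]
print(build_reports(parameter_1, interval=30))
# -> approximately {0: 0.98, 1: 0.97}
```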
The following text and figures provide a detailed breakdown of the derivation
and application of service-based measurements, based on the process framework of
the eTOM (version 3).
[Figure: Derivation of service measurements across the eTOM (version 3)
processes. Product & Offer Capability Delivery passes SLAs and requirements to
Product Proposition Development & Retirement, which issues SLA templates,
resource usage templates and target product KQIs. Via Service Planning &
Commitment and Service Development & Retirement, these are mapped (KQI/KPI
mapping, target service KQIs) into the required service element KPIs and target
KPI definitions for service resources. Resource & Ops Capability Delivery,
Supply Chain Capability Availability and Resource Development identify the
resource measurements and the Partner/Supplier SLA objectives, and Resource Data
Collection, Analysis & Control returns actual service measurements to Service
Quality Analysis & Reporting.]
The starting point for the flow is the Selling process, which captures the
customer expectations and drives these into the Product & Offer Capability
Delivery process, where the outline SLA requirements for the product are derived
and provided to the Product Development & Retirement process. This process
defines the SLA templates for the product and provides them to the Order
Handling process. These SLA templates form the outlines for accepting customer
orders.
The product target KQIs that define the quality objectives of the product are
derived by the Product Development & Retirement process from the SLA
requirements and passed to both Order Handling and Sales Channel
Management to assist with the product definition used for customer interaction.
Product Development & Retirement also delivers the product target KQIs to the
Service Strategy & Policy process, which in turn provides the associated policy
for the definition of technical services to Service Development & Retirement via
the Service Planning & Commitment process. The same product target KQIs are
provided to the Service & Ops Capability Delivery process to enable it to
determine the technical measures and external SLA requirements to be provided to
Service Development & Retirement.
Target KQIs are also provided directly to the Service Development & Retirement
process. This process 'decomposes' the product KQIs into discrete service KQIs
and identifies the KPIs and other KQIs that form each service KQI. This process
will map the product KQIs using, where possible, existing KQIs and KPIs and
will, where necessary, define new key indicators. Where new KPIs are required,
these are sent to the Resource & Ops Capability Delivery process, which
apportions the performance indicators between internal and external service
resources and drives the Resource Development and Supply Chain Development &
Change Management processes respectively, where they are decomposed into service
resource measurements. These measurements are provided to Resource Data
Collection, Analysis & Control for the collection of data from the service
resources.
Additional information provided to the Service Development & Retirement process
from Service Planning & Commitment consists of service planning and forecasting
information that is used to ensure that the service objectives are sustainable
over the longer term, i.e., as service utilisation increases.
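The decomposition performed by Service Development & Retirement can be pictured
as a mapping tree from product KQIs to service KQIs to KPIs. The following
sketch, with hypothetical names throughout, shows such a mapping and how the set
of KPIs required from the resource layer might be derived from it.

```python
# Illustrative sketch only: the KQI/KPI mapping produced by Service
# Development & Retirement, modelled as a simple tree.

kqi_kpi_mapping = {
    "flower_service_quality": {                  # product KQI
        "fs_Availability": [                     # service KQI
            "access_network_availability_kpi",   # resource-layer KPIs
            "core_network_availability_kpi",
            "content_access_availability_kpi",
        ],
        "fs_buy_transaction_time": [
            "fs_menu_download_time_kpi",
            "order_confirmation_time_kpi",
        ],
    },
}

def required_kpis(mapping):
    """Flatten the tree to list every KPI the resource layer must deliver
    (these drive Resource Development and Supply Chain Development &
    Change Management)."""
    return sorted({kpi
                   for service_kqis in mapping.values()
                   for kpis in service_kqis.values()
                   for kpi in kpis})

print(required_kpis(kqi_kpi_mapping))
```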
[Figure: Resource-level assurance flows. Achieved KPIs from the resources flow
to SM&O Readiness, Service Configuration & Activation and Service Quality
Analysis & Reporting; KPI violations trigger Resource Restoration, and KQI
violations are exchanged between Service Quality Analysis & Reporting processes
for root cause analysis.]
The data that is processed by Resource Data Collection, Analysis & Control is
defined by the target KPI definitions from the Service Development & Retirement
process and by the resource measurements information identified by Resource
Development. This enables the process to aggregate the data and to pass it to
the Resource Quality Analysis, Action & Reporting process, which evaluates the
real-time performance and usage data against the target KPIs. The process then
provides achieved KPI data to the RM&O Readiness, Service Configuration &
Activation, Service Quality Analysis & Reporting and SM&O Readiness processes.
The KPI violations are also provided to the Manage Internal SLA process (not
explicitly identified in the eTOM) to provide real-time proactive management of
SLAs focussed on the service resources. Data extracted by this process is also
passed to the S/P Performance Management process for the evaluation of the
quality levels of those parts of the service delivery chain that are outside of
the provider's business boundary.
[Figure: Service-level assurance flows. Service Quality Analysis & Reporting
receives achieved KPIs from Resource Data Collection, Analysis & Control,
service usage from Service Configuration & Activation, and target KQIs for each
service from Service Development & Retirement. It supplies analyses of
aggregated KQI performance to SM&O Support & Process Management and SM&O
Readiness, real-time KQI performance and KQI violations to Customer QoS/SLA
Management, and KQI violations to Service Problem Management and the Manage
Internal SLAs process; SLA violations pass to Problem Handling, KPI violations
to Resource Problem Management, and Supplier/Partner SLA violations arrive from
S/P Performance Management.]
Service Quality Analysis & Reporting is the hub of the service assurance process
model. By mapping the KPI data from Resource Data Collection, Analysis &
Control and using the mapping and target information from Service Development
& Retirement, the process is able to calculate the actual values for each KQI and
to compare these against the objectives for that KQI. Additional information from
Supplier/Partner Performance Management (where appropriate) provides the
complete picture of end-to-end service quality. The two key outputs of the
Service Quality Analysis & Reporting process are:
• KQI performance data (the real-time nature of this data enables proactive SLA
management)
• KQI violations
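The comparison step at the heart of Service Quality Analysis & Reporting can be
sketched as follows; the KQI names, values and the simple minimum-threshold
objectives are assumptions of the example, not prescriptions of the handbook.

```python
# Illustrative sketch only: comparing actual KQI values, computed from the
# KPI data, against their objectives, yielding the two outputs listed above.

def evaluate_kqis(actuals, targets):
    """Return (performance report, violations), where targets maps each
    KQI to its minimum acceptable value."""
    performance, violations = {}, []
    for kqi, actual in actuals.items():
        target = targets[kqi]
        performance[kqi] = {"actual": actual, "target": target}
        if actual < target:
            violations.append(kqi)
    return performance, violations

performance, violations = evaluate_kqis(
    actuals={"fs_Availability": 0.984, "transaction_success_rate": 0.99},
    targets={"fs_Availability": 0.990, "transaction_success_rate": 0.98},
)
# violations == ["fs_Availability"]: forwarded to Customer QoS/SLA
# Management, Service Problem Management and Problem Handling.
```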
[Figure: Customer-level assurance flows. CRM Readiness (Ordering Readiness)
supplies SLA templates, and Order Handling supplies customer-specific SLAs, to
Customer QoS/SLA Management. Using the analysis of aggregated KQI performance
and the real-time achieved KQIs and violations, Customer QoS/SLA Management
derives SLA performance, reported to the customer via Customer Interface
Management and to Retention & Loyalty; SLA violations are passed in real time to
Problem Handling, and on to Service Problem Handling for root cause analysis.]
The KQI performance data and violations provided to the Customer QoS/SLA
Management process enable real-time proactive management of specific customer
SLAs. By mapping the KQIs against the SLAs provided via the CRM Readiness
(Ordering Readiness) and Order Handling processes, this process is able to
calculate actual SLA performance and to present this information to the customer
via Customer Interface Management. SLA violations or degradations are handled
through the interface with the Problem Handling and Sales & Channel Management
processes and, where appropriate, corrections to customer billing are invoked.
[Figure: Supplier/Partner assurance flows. Service Development & Retirement and
Resource & Ops Capability Delivery derive the Partner/Supplier SLA objectives
and KQIs; S/P Performance Management combines these with data from Resource Data
Collection, Analysis & Control and with SLA performance data from the
Supplier/Partner's own Service Quality Analysis & Reporting (external to the
organisation) to detect Supplier/Partner SLA violations, which are reported to
S/P Problem Reporting & Management and to the provider's Service Quality
Analysis & Reporting.]
Where service resources outside of the provider's business boundary are used to
complete the service delivery chain, Supplier/Partner service levels are managed
based on the KQIs provided by Service Development & Retirement and processed, by
Supply Chain Capability Availability and Supply Chain Development & Change
Management, into Supplier/Partner SLA objectives. From the data provided by
Resource Data Collection, Analysis & Control and by S/P Service Quality Analysis
& Reporting, S/P Performance Management (and SLA Management) is able to track
Supplier/Partner service levels and to provide performance data to
Supplier/Partner Problem Reporting for the tracking of service level problems,
to Supplier/Partner Settlements & Billing Management for accounting (rebate)
purposes and to Service Quality Analysis & Reporting for overall end-to-end
service level management.
Administrative Appendix
Document History
Version History
<This section records the changes between this and the previous document
version as it is edited by the team concerned. Note: this is an incremental number
which does not have to match the release number>
Version                       | Date         | Modified by     | Description of changes
Member Review Version 2.0     | April 2004   |                 | Submitted for TMF Member Review.
Team Version 2.2              | January 2005 |                 | Modified template for next version generation.
Team Version 2.4              | 10 June 2005 | Tina O'Sullivan | Updated logo & address, other minor cosmetic changes. Version to be sent to Approvals Committee.
Member Evaluation Version 2.5 | 20 July 2005 | Tina O'Sullivan | Final modification prior to web posting for Member Evaluation.
Release History
<This section records the changes between this and the previous Official
document release>
This document version is available from the TMF Central Web Site.
Readers are encouraged to provide comments to the SLA Management Team via
email.
In order to reduce file size, comments should reference only the relevant
paragraphs, identifying the applicable paragraph headers.
1) Be specific. Consider that you might be on a team trying to produce a
single text through the process of evaluating numerous comments. We
appreciate significant, specific input. We are looking for more than
'word-smith' input; however, structural help is greatly appreciated where it
benefits clarity.
2) What to look for: errors, omissions, lack of clarity and missing references to
other accomplished work (please cite exact applicable section(s) of other
documents).
3) Suggestions to further develop or expand a specific area are welcome.
However, the reader should be aware that this work is intended to be a
Acknowledgments