ITU-T E.840 (06/2018)
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
INTERNATIONAL OPERATION
Definitions E.100–E.103
General provisions concerning Administrations E.104–E.119
General provisions concerning users E.120–E.139
Operation of international telephone services E.140–E.159
Numbering plan of the international telephone service E.160–E.169
International routing plan E.170–E.179
Tones in national signalling systems E.180–E.189
Numbering plan of the international telephone service E.190–E.199
Maritime mobile service and public land mobile service E.200–E.229
OPERATIONAL PROVISIONS RELATING TO CHARGING AND ACCOUNTING IN THE
INTERNATIONAL TELEPHONE SERVICE
Charging in the international telephone service E.230–E.249
Measuring and recording call durations for accounting purposes E.260–E.269
UTILIZATION OF THE INTERNATIONAL TELEPHONE NETWORK FOR NON-
TELEPHONY APPLICATIONS
General E.300–E.319
Phototelegraphy E.320–E.329
ISDN PROVISIONS CONCERNING USERS E.330–E.349
INTERNATIONAL ROUTING PLAN E.350–E.399
NETWORK MANAGEMENT
International service statistics E.400–E.404
International network management E.405–E.419
Checking the quality of the international telephone service E.420–E.489
TRAFFIC ENGINEERING
Measurement and recording of traffic E.490–E.505
Forecasting of traffic E.506–E.509
Determination of the number of circuits in manual operation E.510–E.519
Determination of the number of circuits in automatic and semi-automatic operation E.520–E.539
Grade of service E.540–E.599
Definitions E.600–E.649
Traffic engineering for IP-networks E.650–E.699
ISDN traffic engineering E.700–E.749
Mobile network traffic engineering E.750–E.799
QUALITY OF TELECOMMUNICATION SERVICES: CONCEPTS, MODELS, OBJECTIVES
AND DEPENDABILITY PLANNING
Terms and definitions related to the quality of telecommunication services E.800–E.809
Models for telecommunication services E.810–E.844
Objectives for quality of service and related concepts of telecommunication services E.845–E.859
Use of quality of service objectives for planning of telecommunication networks E.860–E.879
Field data collection and evaluation on the performance of equipment, networks and services E.880–E.899
OTHER E.900–E.999
INTERNATIONAL OPERATION
Numbering plan of the international telephone service E.1100–E.1199
NETWORK MANAGEMENT
International network management E.4100–E.4199
Summary
Recommendation ITU-T E.840 is the first in a series covering benchmarking of end-to-end network
performance. Recommendation ITU-T E.840 presents a framework for the statistical analysis
underlying performance benchmarking of networks and services. The framework describes
benchmarking scenarios, use cases, as well as procedures and statistical techniques for ranking end-
to-end key performance indicators (KPIs) or key quality indicators (KQIs). Recommendation ITU-T
E.840 refers to mobile services and benchmarking campaigns performed using mobile agents (devices)
in drive or walk tests, as well as fixed agents or devices placed at fixed locations (e.g., within shopping
malls, office buildings or stadia).
History
Edition Recommendation Approval Study Group Unique ID*
1.0 ITU-T E.840 2018-06-13 12 11.1002/1000/13621
Keywords
End-to-end performance, network performance benchmarking and ranking, statistical framework.
* To access the Recommendation, type the URL http://handle.itu.int/ in the address field of your web browser, followed by the Recommendation's unique ID. For example, http://handle.itu.int/11.1002/1000/11830-en.
NOTE
In this Recommendation, the expression "Administration" is used for conciseness to indicate both a
telecommunication administration and a recognized operating agency.
Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain
mandatory provisions (to ensure, e.g., interoperability or applicability) and compliance with the
Recommendation is achieved when all of these mandatory provisions are met. The words "shall" or some other
obligatory language such as "must" and the negative equivalents are used to express requirements. The use of
such words does not suggest that compliance with the Recommendation is required of any party.
ITU 2018
All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior
written permission of ITU.
1 Scope
This Recommendation specifies a statistical framework, as well as the benchmarking scenarios and
conditions within which it can be applied, for use by operators and regulators when
qualifying and quantifying performance differences between end-to-end key performance indicators
(KPIs) or key quality indicators (KQIs) affecting the user experience.
The need for this Recommendation arises because, in the intense race to satisfy increasingly
demanding existing users while expanding customer bases at optimal cost, operators have improved
network performance to such an extent that the differences between networks have become smaller
and smaller.
2 References
The following ITU-T Recommendations and other references contain provisions which, through
reference in this text, constitute provisions of this Recommendation. At the time of publication, the
editions indicated were valid. All Recommendations and other references are subject to revision;
users of this Recommendation are therefore encouraged to investigate the possibility of applying the
most recent edition of the Recommendations and other references listed below. A list of the currently
valid ITU-T Recommendations is regularly published. The reference to a document within this
Recommendation does not give it, as a stand-alone document, the status of a Recommendation.
[ITU-T E.800] Recommendation ITU-T E.800 (2008), Definitions of terms related to quality of
service.
[ITU-T E.804] Recommendation ITU-T E.804 (2014), KPI aspects for popular services in mobile
networks.
3 Definitions
None.
5 Conventions
5.2.1 StatScore: The statistical score, i.e., the relative overall quality of each network or
operator against the best performing network. StatScore is calculated per service.
6 Benchmarking scenarios
Network benchmarking generally has two main use cases: internal and competitive. Internal
benchmarking is focused on continuous cost-efficient network performance assurance and
improvements requiring evaluation on initial roll-out of a network as well as during its development,
as well as new service and new device launches. Internal benchmarking is also performed in well-
established and mature networks. In addition, regions with highways and cities, as well as areas of
interest (e.g., workplaces, shopping malls, stadia and residential premises) require consideration
during evaluation campaigns. Competitive benchmarking, performed by operators themselves (or by
service companies on behalf of operators), as well as by regulators to check competition and for
self-ranking, is generally used across regions, transport routes (highways, railways) and cities,
and even countries in the case of multinational operator groups and for mature networks.
A summary of these use cases, recommended types of tools and techniques are presented in Figure 1.
Areas (such as shopping malls, stadia and workplaces, generally indoors) are often benchmarked
using walk tests. Besides traditional route-based drive or walk testing campaigns, internal, and to
some extent competitive, benchmarking, as well as indoor scenarios, benefit from fixed probe-based
tools. The latter have the advantages of fast and remote scalability and of device independence.
Therefore, these tools are very suitable for indoor test scenarios and for new services launched in
areas of interest, and to some extent in cities. In addition, either a posteriori or a-priori
analysis techniques can be applied. In the first case, mostly used in benchmarking, data are collected
and statistical significance is used to evaluate and rank end-to-end KPI or KQI performance; the
measurement accuracy is, by default, embedded in the statistical significance level. A-priori
techniques involve advance calculation of the number of test probes needed for a specified statistical
significance and measurement accuracy. This technique is generally used when test probes are costly
or testing time is limited.
7 Benchmarking conditions
Regardless of the use case, the benchmarking framework needs to rely on a set of prerequisites that
should ensure consistency, validity, reliability and repeatability. Table 1 presents these prerequisites
for each benchmarking phase: equipment set-up, test configuration, data collection, data processing
and analysis. It should be noted that Table 1 refers to the minimum required prerequisites in order to
ensure a fully controlled test environment, as well as a valid statistical analysis. Specific details of
measurements are given in other ITU-T Recommendations (e.g., [ITU-T E.804]).
8 Benchmarked services
The list of traditionally benchmarked mobile services and their KPIs or KQIs, along with their
triggering points, lies outside the scope of this Recommendation. See [ITU-T E.800] and [ITU-T E.804]
for more details.
If the scope of mobile benchmarking is to perform a detailed comparative analysis per service, a task
generally undertaken during the internal benchmarking use case (e.g., scenarios such as new device,
new technology add-on), then it is recommended that a comprehensive set of end-to-end KPIs and
KQIs be used for analysis (here, KQIs are measurements obtained by using quality-estimation
models, such as [b-ITU-T P.863] for voice or [b-ITU-T P.1203] for video streaming). In addition, it
is recommended that the main root causes of possible poor performance be analysed based on this
set.
On the other hand, if end-to-end KPI or KQI performance ranking is the goal of mobile benchmarking,
a task generally undertaken during the comparative benchmarking use case, as well as in some internal
benchmarking scenarios (such as market comparisons, periodic market performance evaluation), then
scoring and ranking can consider a smaller set of KPIs or KQIs impacting quality of experience (QoE;
see [b-ITU-T P.10/G.100]) per service and across all benchmarked services.
This Recommendation refers to the latter case of competitive benchmarking. Other ITU-T
Recommendations in the benchmarking series cover details of sets of such KPIs or KQIs.
9 Statistical framework
The recommended framework aims to score and rank network end-to-end performance from a user
perspective and it can be used for both competitive and internal benchmarking. The framework
defines procedures for data validation, statistical evaluation metrics and significance testing, as well
as general guidelines for ranking and scoring.
9.3 Statistical performance metrics, standard errors and statistical significance of the
benchmarking results
9.3.1 Statistical performance metrics
Benchmarking analysis should be based on statistical performance metrics that reflect the average
network performance (represented by mean values, m) or its consistency (represented by the
probability Pth to be above a pre-defined threshold value). This Recommendation refers to the mean
statistical performance metric, as an example. Similar techniques can be applied for consistency.
9.3.2 Standard error
The standard error at the 95% confidence level for the mean or Pth is calculated assuming a Gaussian
distribution of the measured KPIs or KQIs (see clause 9.2).
Therefore, depending on the type of KPI or KQI (continuous scores such as radio frequency (RF)
parameters or MOS, or discrete ones such as the success-to-failure ratio r), the standard error at
the 95% confidence level is given by:
StdError(m) = z_95% * std/sqrt(N) = 1.96 * std/sqrt(N)
StdError(r) = z_95% * sqrt(r*(1-r)/N) = 1.96 * sqrt(r*(1-r)/N)
If fewer than 30 samples are available, then the Gaussian quantile z_95% should be replaced by the
tabulated Student t_95%(N-1) value, where N represents the number of available samples.
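The two standard-error formulas above can be sketched in Python; this is a minimal illustration, and the function names are ours, not defined by this Recommendation:

```python
import math
from statistics import NormalDist

# Two-sided 95% confidence level: z_95% is the inverse CDF at 0.975 (about 1.96)
Z_95 = NormalDist().inv_cdf(0.975)

def std_error_mean(std, n):
    """StdError(m) = z_95% * std / sqrt(N), for continuous KPIs/KQIs (RF, MOS)."""
    return Z_95 * std / math.sqrt(n)

def std_error_ratio(r, n):
    """StdError(r) = z_95% * sqrt(r*(1-r)/N), for success/failure ratios."""
    return Z_95 * math.sqrt(r * (1.0 - r) / n)

# For fewer than 30 samples, Z_95 should be replaced by the Student t_95%(N-1)
# quantile, as noted above (e.g., scipy.stats.t.ppf(0.975, n - 1)).
```

For example, a MOS sample with std = 0.5 over N = 100 measurements yields a standard error of about 0.098 MOS at the 95% confidence level.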
NOTE – Standard errors represent the measurement accuracy. Therefore, if a specific accuracy is required and
an estimate of the standard deviation is known, then the minimum number of samples required to meet that
accuracy at a selected confidence level can be determined from the equations above. This can
be used for the a-priori technique shown in Figure 1 and is also used in [b-ITU-T E.802] for the calculation
of the minimum number of samples.
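The a-priori sample-size calculation described in this NOTE amounts to inverting the standard-error formulas. A minimal sketch, with function names and target accuracies of our choosing:

```python
import math

Z_95 = 1.96  # Gaussian quantile at the 95% confidence level

def min_samples_mean(std_estimate, target_accuracy):
    """Smallest N with z_95% * std / sqrt(N) <= target_accuracy (mean-type KPI)."""
    return math.ceil((Z_95 * std_estimate / target_accuracy) ** 2)

def min_samples_ratio(r_estimate, target_accuracy):
    """Smallest N with z_95% * sqrt(r*(1-r)/N) <= target_accuracy (ratio-type KPI)."""
    return math.ceil(Z_95 ** 2 * r_estimate * (1 - r_estimate) / target_accuracy ** 2)
```

For instance, estimating a success ratio expected to be around 90% to within ±2 percentage points at the 95% confidence level requires `min_samples_ratio(0.9, 0.02)`, i.e., 865 test samples.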
This kind of analysis applied to detailed benchmarking results may be extended to various services
as well as a larger set of KPIs or KQIs per service, as mentioned in clause 8. In addition, based on
statistical significance results (Z statistics @ 95%CL, Table 2), individual KPIs or KQIs may be
ranked across networks as described in clause 9.4.
It must be noted that claiming that one network or service configuration can be considered as "better"
than another requires – besides statistical significance – a KPI- or KQI-specific relevant difference-
threshold definition and measurement accuracy information, as defined in other ITU-T
Recommendations in the benchmarking series.
9.3.4 Results reporting
The benchmarking statistical analysis and results must be reported along with the detailed description
of the test scenarios and conditions used for benchmarking; otherwise the interpretation of results can
be wrong and consequently meaningless.
Table 3 – Example of statistically significant end-to-end KPI or KQI scoring and ranking
Benchmarking analysis refers to the comparison of KPIs or KQIs describing the performance
of various operator networks. Meaningful comparison should rely on statistical significance tests
(hypothesis tests) that depend on the type of KPI or KQI compared: continuous (e.g., MOS, RF
parameters) or ratio (e.g., completion or failure ratios).
In the first case, Equation A-1 determines the significant difference [b-ITU-T P.1401]:
Z = StatDiff/sqrt(std1^2/N1 +std2^2/N2) > Zth (A-1)
where StatDiff denotes the difference between the compared metrics, std1 and std2 their standard
deviations, and N1 and N2 the total numbers of samples used in the comparison for each metric. In
other words, if Z is higher than Zth (based on a Gaussian distribution for more than 30 samples, at
the CL% confidence level), then StatDiff is a statistically significant difference at the CL% confidence level.
In the second case, the ratio-type KPI or KQI is described by the proportion p of successes or failures out of
the total number of samples. The significant difference is given by Equation A-2 [b-ITU-T P.1401]:
Z = StatDiff/sqrt(p1*(1-p1)/N1 + p2*(1-p2)/N2) > Zth (A-2)
where p1 and p2 represent the proportions of successes or failures for each of the compared metrics.
Table A.1 shows the mapping between the significance thresholds Zth and the different confidence
levels.
If fewer than 30 samples are available, then a Student t distribution should be used, with
t-Student(n), where n = N-1 is the number of degrees of freedom and N is the total number of test samples.
It should be noted that, along with statistical significance, KPI- or KQI-specific relevant difference-
thresholds must be used to check whether the differences are irrelevant or possibly within the
measurement accuracy of each KPI or KQI. KPI- or KQI-specific relevant difference-thresholds (THrelv)
are defined in other ITU-T Recommendations in the benchmarking series.
This annex describes the algorithm used to score and rank end-to-end performance of networks used
in the calculations in Table 3.
– Calculate end-to-end KPIs (KQIs can also be used) for the service analysed for each network
or operator:
• KPI_1 ... KPI_i ... KPI_N, i = 1, N, which can be averages, medians or proportions
(ratios).
– Build a benchmarking matrix for j=1,M networks (operators) per service described by N KPI
or KQI metrics – see Table B.1.
Sometimes it may be desirable that an overall network statistical score be determined. To this end,
network performance per service type is often used as an underlying criterion. Such a score should
be reported for each area as well as aggregated across regions, as shown in Figure 1.
The statistical score per service may be defined by all considered end-to-end KPI_i or KQI_i (i=1,N)
metrics affecting the overall end-to-end quality of the service analysed. Thus, the score can be defined
by a weighted sum of the StatDiff_i (see Annex B) of each KQI versus those best performing, as
described in clause 9.4. If available, the StatDiff_i value is corrected in terms of the relevant
difference threshold values. The final outcome, the StatScore, describes end-to end performance of
the compared networks against the best performing network.
StatScore = Σ_i (w_i * StatDiff_i), i = 1, N
Here, w_i is the weight allocated to each KPI or KQI metric contributing to the quality of the service.
The lower the StatScore, the better the performance (or closer to the best performing network) and
the corresponding rank.
Table I.1 is a new version of Table 3, in which some examples of weightings have been added; these
are just informative, since weight definition lies outside the scope of this Recommendation. The
weightings can also be unitary if it is decided that all KPIs or KQIs have equal importance in the
overall statistical score of network performance. However, it should be noted that, even if equal
unitary weights are considered, if the number of KPIs or KQIs is increased or decreased, the statistical
score of the network can change and provide different statistical results.
Therefore, the statistical scoring and ranking of a network in this Recommendation is only valid with
a detailed description and motivation of the selection and underlying selected weights of the KPIs or
KQIs. Without this transparency, the statistical scoring and ranking of the network is not valid.
For the example in Table I.1, network 1 receives the best rank, 1, with the minimum score of 2.15, based
on the given weights. In addition, it can be noted that in all cases the differences between the KPIs
or KQIs of the two networks are higher than the appropriate THrelv, meaning that the
results of the statistical significance analysis are valid.
[b-ITU-T E.802] Recommendation ITU-T E.802 (2007), Framework and methodologies for
the determination and application of QoS parameters.
[b-ITU-T P.10/G.100] Recommendation ITU-T P.10/G.100 (2017), Vocabulary for performance,
quality of service and quality of experience.
[b-ITU-T P.863] Recommendation ITU-T P.863 (2018), Perceptual objective listening
quality prediction.
[b-ITU-T P.1203] Recommendation ITU-T P.1203 (2017), Parametric bitstream-based quality
assessment of progressive download and adaptive audiovisual streaming
services over reliable transport.
[b-ITU-T P.1401] Recommendation ITU-T P.1401 (2012), Methods, metrics and procedures
for statistical evaluation, qualification and comparison of objective quality
prediction models.
[b-ETSI TR 102 581] ETSI TR 102 581, V1.2.1 (2015), Speech processing, transmission and
quality aspects (STQ); A study on the minimum additional required
attenuation on the antenna path of the field test equipment.
https://www.etsi.org/deliver/etsi_tr/102500_102599/102581/01.02.01_60/tr_102581v010201p.pdf
Printed in Switzerland
Geneva, 2018