A New Approach For Testing Voice Quality
What is sQLEAR?
More questions?
WHITE PAPER
Why sQLEAR?
To meet the needs of today’s evolving mobile networks,
there is a growing need for flexible, real-time, automated
QoE-centric service evaluation, troubleshooting, and
optimization. This has been driven by a number of
factors. First, the volume of 4G subscribers has grown
dramatically. Second, the range of 4G services has also
increased. Finally, 5G network rollout is underway, bringing
significantly increased network complexity, an even greater
number and a larger variety of devices as well as more
service diversity.
[…] deployments, presenting a persistent threat to MNOs’ voice service revenue. The MNOs’ most powerful counter to this threat is the expansion of the carriers’ VoLTE services. The GSA report of April 2018 states that VoLTE has now been launched in more than 145 networks in over 70 countries across all regions. The Ericsson Mobility Report of June 2018 shows that, at the end of 2017, VoLTE subscriptions exceeded 610 million. Ericsson also projects that the number of VoLTE subscriptions will reach 5.4 billion by the end of 2023, accounting for around 80 percent of combined LTE and 5G subscriptions. Figure 1 shows the predicted VoLTE subscriptions per region.

Obtaining accurate, easy-to-implement, and controlled voice QoE predictors, as well as securing the ability to act on them in real time, is thus crucial for enabling cost-efficient, optimized network operations that will meet and maintain customer expectations and demands.

Infovista’s sQLEAR voice QoE predictor is a new and unique solution, specifically designed to address these concerns and goals, thus benefiting MNOs and regulators alike. It provides cost-effective and accurate evaluation of voice quality trends and enables predictions of future QoE delivery. In addition, it can be used to perform monitoring, benchmarking, and troubleshooting of MNO voice services. sQLEAR ensures operational efficiency for MNOs through effective troubleshooting of IP networking and the underlying transport layers.
[Figure 2: information resources used by sQLEAR. Labels: network centric prediction; reference voice sample; machine learning; QoE/MOS predictor]
sQLEAR uses state-of-the-art machine learning to build a model that describes the speech quality perceived by users based on all of these information resources (Figure 2).

The machine learning techniques offer three clear advantages. First, the complexity of the interdependencies between all network/codec/client parameters, as well as their significance in impacting the speech quality, is better described and processed by machine learning algorithms than by the multi-dimensional optimization techniques required to estimate new coefficients of the multi-variable non-linear functions that are generally used by parametric voice QoE evaluation algorithms. Second, whenever changes that emerge from the introduction of new codecs/clients need to be accounted for, machine learning techniques are more flexible and quicker to tune. This provides a significant advantage from the perspective of implementing the algorithm and ensuring operational efficiency. Third, there is no need for additional calibration to the MOS scale using first- or third-order polynomials, because the machine learning-based algorithm “learns” the precise MOS scale that it needs to predict.

What is the sQLEAR […]

[…] obtained from the analysis of real-life field data, collected during a significant number, and broad diversity, of drive tests in different locations and conditions, and in a number of MNO networks. The standardized EVS VoIP client ensures that all devices with embedded EVS exhibit the same behavior. As a result, sQLEAR is completely transparent to and independent of the devices used in testing.

Figure 3 depicts the simulation chain. A reference audio file is injected into the simulation process and coded with different EVS codec settings for bandwidth, codec rate, and channel-aware mode. The resulting VoIP file contains the coded audio packets, with an ideal arrival time that increases by 20 ms for each packet. Network errors, in the form of jitter and packet loss patterns, are applied to the coded audio to simulate degradations that may occur in an all-IP network.

To simulate the jitter and packet loss behavior of a radio and core network, jitter files are created using a combination of simulations and drive test data. A large set of databases, spanning approximately 120,000 samples and covering a broad range of conditions that generate voice degradations across the entire voice quality range, has been generated. These conditions include:
•• “Live (drive test) data modulated with simulations”,
•• “Random packet loss and random jitter”, to handle reordering of packets; and
•• “Manually designed packet loss”, to simulate mobile devices that move in and out of coverage, which results in long and short runs of consecutive packet loss.

By applying the jitter files and simulating network degradations, EVS frames are removed where there is packet loss, and the arrival times of the frames are changed according to the jitter file. The new jittered VoIP file is then submitted to the EVS jitter buffer. It is decoded and time scaled, which produces a degraded audio file. Finally, the degraded audio file is graded using ITU-T P.863 by comparing it to the original reference, resulting in a MOS score.

As a result, each simulated jitter file that describes a network condition has a corresponding degraded audio file and an associated MOS score. The 120,000 samples form the databases used for sQLEAR learning and evaluation, with a 50%-50% split, as recommended by current academic research in machine learning (see “Handbook of Statistical Analysis and Data Mining Applications”, Elsevier, 2009).

sQLEAR uses a combination of two categories of machine learning algorithm: bagged decision trees and support vector machines (SVM). These proved to provide the best performance (in terms of correlation and prediction error) when compared to ITU-T P.863, a point addressed under the question of accuracy later in this paper.

As a machine learning-based technique, sQLEAR takes as its inputs features that are aggregated from basic network parameters. sQLEAR uses machine learning both for the creation and selection of features and for the QoE prediction, which in turn is based on the selected features. There are two sources of features. The first is the information derived from the RTP stream generated by the simulated jitter buffer implementation. The second is statistical measures built from the RTP stream, which proved through extensive testing to have a significant impact on the accuracy of the algorithm in comparison to ITU-T P.863.

A number of factors play an important role in the MOS estimation process. These include speech content and frequency, and the duration of silence as well as its distribution within voice samples. These have a significant impact on the performance of sQLEAR and, as a result, on the accuracy of QoE prediction. Therefore, to improve sQLEAR performance further, audio reference-based features are also used.

These features are weighted based on the location of the feature (“position-based feature”) in the reference voice sample. The position is described either by a feature giving the position of, for example, a dip in packet loss, or by weighting the number of frame erasures. The most successful weighting function proved to be the RMS (root mean square) of each 20 ms voice frame in the reference voice. It should be noted that information on the reference voice is used, and not the recording. In addition, […]
[Figure 3: simulation chain. Reference audio → EVS coding → network (radio and core) with packet loss and jitter → jitter buffer → EVS decoding → time scaling → ITU-T P.863 → MOS]
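The step of the simulation chain in which a jitter file is applied to the coded packets can be sketched as follows. This is a minimal illustration and not the sQLEAR implementation: the representation of a jitter file as a per-packet delay list plus a set of lost sequence numbers, and the function names, are assumptions made for this example. Only the 20 ms ideal packet spacing comes from the text.

```python
FRAME_MS = 20  # ideal spacing between consecutive packets, as stated in the text

def ideal_arrivals(n_packets):
    """Ideal arrival time in ms of each packet: 0, 20, 40, ..."""
    return [seq * FRAME_MS for seq in range(n_packets)]

def apply_jitter_file(n_packets, jitter_ms, lost):
    """Drop lost packets and shift the arrival time of the others.

    jitter_ms: assumed per-packet extra delay in ms (hypothetical format)
    lost:      assumed set of lost sequence numbers (hypothetical format)
    Returns (seq, arrival_ms) tuples in order of arrival, so packets that
    are delayed past their successors appear reordered.
    """
    arrivals = ideal_arrivals(n_packets)
    received = [
        (seq, arrivals[seq] + jitter_ms[seq])
        for seq in range(n_packets)
        if seq not in lost
    ]
    return sorted(received, key=lambda packet: (packet[1], packet[0]))

# Packet 3 is lost; packet 2 is delayed by 60 ms and arrives after packet 4.
print(apply_jitter_file(5, [0, 5, 60, 0, 10], {3}))
# [(0, 0), (1, 25), (4, 90), (2, 100)]
```

The output of this step corresponds to the jittered VoIP file that is handed to the jitter buffer for decoding and time scaling.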
In order to ensure that the jitter files are independent of codec and reference voice sample, and to simplify feature creation, sQLEAR performs two pre-processing operations: DTX cleaning and the addition of codec information (e.g. rate, mode). The output of the pre-processing is a new, “DTX cleaned” jitter file, which also contains the codec audio payload size and Channel Aware mode data (see Figure 4).

The DTX cleaning pre-processing handles the fact that DTX periods, which occur during silence, do not impact the perceived voice quality. Consequently, the pre-processing operation creates a new jitter file with no packet loss and no jitter during DTX periods, which greatly simplifies feature creation.

Adding codec information is the second pre-processing step. The codec information is added to the “DTX cleaned” jitter file. It consists of the audio payload size and the Channel Aware mode indication. The packet payload size indirectly provides the codec rate and indicates whether DTX was used. This information is given for every packet, since it can change from packet to packet.

To summarize, the pre-processing handles codec-specific operations, such as DTX, codec rate, and Channel Aware mode, and consequently ensures that the jitter files are independent of both codec and reference voice sample.

As a QoE predictor of MOS, sQLEAR has to meet ITU-T requirements for test set-up, run-time, and/or measurement [P.863.1]. However, in order to ensure the best performance of the predictor against ITU-T P.863, some specifics need to be considered.

Reference voice samples. The machine learning-based QoE predictor has been trained using one or more reference files. Therefore, during run-time, when the algorithm is deployed, the same reference file(s) must be injected into the network under test. The measurement methodology and reference file requirements follow the ITU-T P.863/P.863.1 specifications.

Pre-processing during run-time. The pre-processing phase is performed in the same way as in simulations, as described above. In addition, during run-time, the pre-processing synchronizes the reference voice and the IP/RTP stream to ensure that the position-based features reflect the reference voice sample(s). This is performed by correlating the pattern of DTX and voice frames with the reference voice sample, which is stored at the receiving side. It should be noted that no recorded voice is needed.

Measurement procedure. sQLEAR uses the same test set-up as ITU-T P.863, but without the need for the recorded degraded voice sample, as has already been explained. The recorded speech can nevertheless be saved for further offline analysis, if needed or desired.
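The run-time synchronization of the reference voice with the IP/RTP stream can be illustrated with a small Python sketch. The paper states only that the pattern of DTX and voice frames is correlated with the reference; the exhaustive offset search, the match-ratio score, and the name `best_offset` below are assumptions chosen for clarity.

```python
def best_offset(reference, received, max_offset):
    """Find the frame offset at which the received pattern best matches.

    Both patterns hold one value per 20 ms frame: 1 for a voice frame,
    0 for a DTX (silence) frame. The score is the fraction of matching
    frames in the overlap. Ties are resolved in favor of the smallest offset.
    """
    best, best_score = 0, -1.0
    for offset in range(max_offset + 1):
        overlap = min(len(reference), len(received) - offset)
        if overlap <= 0:
            break
        matches = sum(
            1 for i in range(overlap) if reference[i] == received[offset + i]
        )
        score = matches / overlap
        if score > best_score:
            best, best_score = offset, score
    return best

# The received stream starts with three extra DTX frames before the sample.
reference = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
received = [0, 0, 0] + reference
print(best_offset(reference, received, max_offset=5))  # 3
```

In practice the voice/DTX pattern of the received stream could be derived from the packet payload size, which the pre-processing already records for every packet.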
[Figure 4: pre-processing of jitter files before applying them to the VoIP file. Labels: added to each packet; VoIP file with EVS coded packets in VoIP file format; VoIP file with EVS packets with some packets missing and changed time stamps]
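The two pre-processing operations, DTX cleaning and the addition of codec information, can be sketched as follows. The `JitterEntry` record, its field names, and the payload sizes in the example are hypothetical; the paper does not describe the actual jitter file format.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class JitterEntry:
    """One packet of a jitter file (hypothetical layout for this sketch)."""
    seq: int
    jitter_ms: float
    lost: bool
    payload_bytes: int = 0       # filled in by pre-processing
    channel_aware: bool = False  # filled in by pre-processing

def preprocess(entries, dtx_frames, payload_sizes, channel_aware_flags):
    """Return a DTX-cleaned jitter file with codec information per packet.

    Step 1: packets in DTX periods get no loss and no jitter, because
            degradations during silence do not affect perceived quality.
    Step 2: audio payload size and Channel Aware mode are attached to
            every packet, since both can change from packet to packet.
    """
    cleaned = []
    for entry in entries:
        if entry.seq in dtx_frames:
            entry = replace(entry, jitter_ms=0.0, lost=False)
        cleaned.append(
            replace(
                entry,
                payload_bytes=payload_sizes[entry.seq],
                channel_aware=channel_aware_flags[entry.seq],
            )
        )
    return cleaned

# Packet 1 falls in a DTX period, so its loss and jitter are removed.
raw = [
    JitterEntry(0, 0.0, False),
    JitterEntry(1, 30.0, True),
    JitterEntry(2, 15.0, False),
]
# Payload sizes are illustrative values only.
for entry in preprocess(raw, {1}, [33, 6, 33], [False, False, True]):
    print(entry)
```

Keeping the entries immutable and returning a new list mirrors the description above, in which pre-processing produces a new jitter file and leaves the original untouched.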
[Figure: run-time processing chain. Reference audio → network and devices → IP stream (arrival time, sequence number, size) → pre-process (DTX-clean, synchronize) → DTX cleaned jitter file → calculate features → pretrained ML model → predicted MOS]
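The feature calculation stage can be illustrated with a minimal sketch that aggregates a DTX-cleaned jitter file into a few statistical measures. The paper does not list the features sQLEAR actually uses; the four below (loss rate, longest loss burst, mean and standard deviation of jitter) are generic examples chosen as assumptions, as is the input format.

```python
import statistics

def basic_features(jitter_ms, lost):
    """Aggregate per-packet jitter and loss flags into example features.

    jitter_ms: per-packet delay variation in ms (assumed input format)
    lost:      per-packet booleans, True where the packet was lost
    """
    n = len(lost)
    loss_rate = sum(lost) / n if n else 0.0

    # Length of the longest run of consecutive lost packets.
    longest = current = 0
    for is_lost in lost:
        current = current + 1 if is_lost else 0
        longest = max(longest, current)

    # Jitter statistics are computed over received packets only.
    received = [j for j, is_lost in zip(jitter_ms, lost) if not is_lost]
    return {
        "loss_rate": loss_rate,
        "longest_loss_burst": longest,
        "mean_jitter_ms": statistics.fmean(received) if received else 0.0,
        "jitter_std_ms": statistics.pstdev(received) if received else 0.0,
    }

features = basic_features(
    [0, 10, 0, 0, 20, 10],
    [False, False, True, True, False, False],
)
print(features)
```

A feature dictionary of this kind would then be passed to the pretrained model to obtain the predicted MOS.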
However, because both P.863 and sQLEAR support VoLTE service evaluation, the quality trends provided by sQLEAR and P.863, when determined from a large number of samples collected using the same voice references and identical network conditions, are expected to be the same with a high level of statistical confidence. It should be noted, however, that test devices which exhibit a particularly strong voice frequency characteristic could be significantly penalized or favored by perceptual models such as ITU-T P.863. These effects originate outside the network and therefore do not represent the network-centric voice quality performance that sQLEAR is designed to predict. Consequently, in these special scenarios, depending on the strength of the device’s voice frequency characteristic, expected and legitimate differences can be detected between sQLEAR and ITU-T P.863.

What are the differences […]

[…] speech-based, like perceptual intrusive and non-intrusive algorithms (e.g. ITU-T P.863, P.563), nor solely parametric-based and non-intrusive, such as ITU-T P.564. In addition, sQLEAR has been designed for the evaluation of VoLTE service quality, while ITU-T P.563 and P.564 have not.

sQLEAR is uniquely based on machine learning, which ensures faster and easier adaptation to new environments, such as new codecs/clients and network parameters. This can be accomplished without the new and costly subjective training sequences that are required by perceptual intrusive models (e.g. ITU-T P.863).

sQLEAR avoids device-specific degradations caused by the audio path of a mobile device and focuses on the packet-based radio and core network. In addition, it does not use recorded speech, since that would also reflect the specific device used as a measurement unit instead of network performance. Therefore, sQLEAR predicts speech quality efficiently and effectively under the following circumstances and conditions:
•• Independently of device acoustical characteristics (unlike P.863);
•• […] each device (unlike P.863), which eliminates costly […] high quality voice services since it renders speech quality scores comparable between different device models.

As previously mentioned, the sQLEAR measurement procedure is the same as for ITU-T P.863, in the sense
that it sends a reference speech sample to the system under test and predicts voice QoE from the combination of the output from the device and the sent reference sample. However, the output is different. In the case of sQLEAR, the output from the device is the RTP stream, while in the case of ITU-T P.863, it is the recorded audio.

What are the similarities […]

[…] evaluation and are part of ITU work. sQLEAR is based on the ongoing ITU work item ITU-T P.VSQMTF, “Voice service quality monitoring and troubleshooting framework for intrusive parametric voice QoE prediction”.

More questions?
For more questions contact us at:
About Infovista
Infovista, the leader in modern network performance, provides complete visibility and unprecedented
control to deliver brilliant experiences and maximum value with your network and applications. At the core
of our approach are data and analytics, to give you real-time insights and make critical business decisions.
Infovista offers a comprehensive line of solutions from radio network to enterprise to device throughout
the lifecycle of your network. No other provider has this completeness of vision. Network operators
worldwide depend on Infovista to deliver on the potential of their networks and applications to exceed user
expectations every day. Know your network with Infovista.