J Real-Time Image Proc
DOI 10.1007/s11554-017-0685-4
ORIGINAL RESEARCH PAPER
A unified architecture for fast HEVC intra-prediction coding
Damian Ruiz¹ · Gerardo Fernández-Escribano¹ · José Luis Martínez¹ · Pedro Cuenca¹
Received: 5 July 2016 / Accepted: 23 March 2017
© Springer-Verlag Berlin Heidelberg 2017
Abstract High efficiency video coding (HEVC) is the new video coding standard, which obtains over 50% bit rate savings compared with H.264/AVC for the same perceptual quality. Intra-prediction coding in HEVC achieves high coding performance at the expense of high computational complexity, due, among other new features, to the exhaustive evaluation of all available coding unit (CU) sizes, with up to 35 prediction modes for each CU, selecting the one with the lowest rate-distortion cost. This paper presents a Unified Architecture to form a novel fast HEVC intra-prediction coding algorithm, denoted as fast partitioning and mode decision. This approach combines a fast partitioning decision algorithm, based on decision trees trained using machine learning techniques, and a fast mode decision algorithm, based on a novel texture orientation detection algorithm that computes the mean directional variance along a set of co-lines with rational slopes using a sliding window over the prediction unit. Both proposed algorithms apply a similar approach, exploiting the strong correlation between several image features and both the optimal CTU partitioning and the optimal prediction mode. The key point of the combined approach
& Damian Ruiz
[email protected]
Gerardo Fernández-Escribano
[email protected]
José Luis Martínez
[email protected]
Pedro Cuenca
[email protected]
¹ Instituto de Investigación en Informática de Albacete, Universidad de Castilla-La Mancha, Av. España S/N, 02071 Albacete, Spain
is that both algorithms compute the image features with low complexity, and the partitioning decision and the mode decision can also be taken with low complexity, using decision trees (if-else statements) and by selecting the minimum directional variance among a reduced set of directions. The approach can be implemented using any combination of nodes, obtaining a wide range of time savings, from 44 to 67%, with coding penalties ranging from 1.1 to 4.6%. Comparisons with similar state-of-the-art works show that the proposed approach achieves the best trade-off between complexity reduction and rate distortion.
Keywords HEVC · Intra-prediction · Machine learning · Texture orientation · Directional variance
1 Introduction
The new video coding standard, known as high efficiency video coding (HEVC) [1], has been approved by the Joint Collaborative Team on Video Coding (JCT-VC) working group of the ITU and ISO organizations. HEVC has already replaced the successful H.264/AVC standard [2], especially for high-resolution formats beyond HD, such as the ultra-high definition formats termed 4K and 8K. The new HEVC video coding tools make it possible to achieve bit rate savings of over 50% compared to H.264/AVC for the same objective video quality [3]. Furthermore, HEVC also outperforms other video codecs widespread on the Internet, such as VP8 and VP9 [4].
The high performance of HEVC when exclusively using intra-picture tools for still image coding has aroused special interest, showing gains of around 40% compared with the successful JPEG-2000 standard [5]. There are several particular scenarios in which intra-picture coding is the
optimal choice, such as still-picture photography storage, live TV interviews where low latency is needed for natural communication between the interlocutors, and professional editing and post-production tasks commonly used in the TV and cinema industries, where high quality and fast access to the individual pictures are required. These production codecs, named mezzanine codecs, are highly relevant for the audio-visual industry, which demands two things: very high compression efficiency and a low computational burden. For this reason, the HEVC standard has approved a set of specific profiles that exclusively use the intra-prediction scheme, known as the "Main Still Picture" and "Main Intra" profiles, with support for different bit depths and chroma sampling formats.
The high HEVC intra-coding performance is mainly attributable to two novel tools: the new flexible quad-tree picture partitioning [6], named the Coding Tree Unit (CTU), and the new high density of angular predictors for the mode decision [7]. Pictures are divided into CTUs, which can be recursively split to form a quad-tree structure with three new unit types: the coding unit (CU), the prediction unit (PU) and the transform unit (TU). The CUs, PUs and TUs cover a size range from 64 × 64 to 4 × 4, adapting the size of the coding units to the local image complexity by using the largest sizes for homogeneous regions and smaller sizes for complex textured areas. With the aim of achieving the best intra-coding performance, an exhaustive evaluation of all possible CU, PU and TU sizes and the full set of available prediction modes is carried out using the well-known Rate Distortion Optimization (RDO) technique [8]. While such flexibility leads to high compression efficiency, it comes at the expense of a huge computational burden, primarily due to the large number of available directional predictors and PU sizes [9], which is hindering the rapid adoption of HEVC by the professional market [10, 11].
Multiple approaches have been proposed in the literature to reduce HEVC complexity, many of which focus on reducing the number of block sizes to be evaluated by using a tree pruning scheme. Other approaches are centred on the detection of the most probable intra-direction, in order to avoid the evaluation of the full range of prediction modes by the RDO.
Speeding up intra-prediction coding can be achieved by applying advanced techniques that allow the decisions needed in the different stages of intra-prediction to be taken with low complexity. This approach constitutes the basis of this paper, which addresses the complexity reduction of HEVC intra-coding by using techniques not traditionally applied in video coding standards, such as machine learning (ML) and image processing algorithms for texture orientation detection. This paper
presents a Unified Architecture to form a novel fast HEVC
intra-prediction coding algorithm, denoted as Fast Partitioning and Mode Decision (FPMD). This approach combines, in a first stage, a Fast Partitioning Decision (FPD) algorithm, based on decision trees trained using ML techniques, and, in a second stage, a Fast Mode Decision (FMD) algorithm, based on a novel texture orientation detection algorithm, which computes the mean directional variance along a set of co-lines with rational slopes using a sliding window over the PU. Both algorithms apply a similar approach, exploiting the strong correlation between several image features and both the optimal PU partitioning and the optimal prediction mode. The key point of the combined approach is that both algorithms compute the image features with low complexity, and the partitioning decision and the mode decision can also be taken with low complexity, using decision trees and by selecting the minimum directional variance among a reduced set of directions.
The rest of the paper is organized as follows. An overview of the HEVC intra-prediction scheme is introduced in Sect. 2. Section 3 presents a review of the fast intra-prediction approaches recently proposed in the literature. Section 4 presents the details of our Fast Partitioning Decision (FPD) algorithm and our Fast Mode Decision (FMD) algorithm, which form the Unified Architecture proposed for fast HEVC intra-prediction coding, while the experimental results are shown in Sect. 5. We summarize the conclusions in Sect. 6.
2 Technical background
HEVC can be considered an evolution of H.264/AVC, since it maintains the same block-based "hybrid" architecture used in previous video compression standards, applying inter-picture prediction for temporal decorrelation and intra-picture prediction for spatial image decorrelation. In addition, new tools have been introduced in HEVC that increase its coding efficiency compared to H.264/AVC, such as a new coding unit partitioning scheme named the Coding Tree Unit (CTU), a new angular intra-prediction algorithm, new transform sizes of 16 × 16 and 32 × 32, and a new filter in the decoding loop termed Sample Adaptive Offset (SAO). A detailed description of those tools and a general overview of the HEVC architecture can be found in [12].

The CTUs in intra-prediction can be recursively partitioned into four square sub-blocks of half resolution; thus, a CTU can be considered a hierarchical tree where each branch ends in a node, which determines the CUs. Each CU is itself a new root of two new trees that contain the PUs and TUs. The maximum CTU size is 64 × 64 pixels, allowing CU sizes in the range of 64 × 64 to 8 × 8 pixels.
The PU is the basic entity in intra-prediction; it takes the same size as its CU, and only for the smallest CU size (8 × 8) can it also be split into 4 × 4 sub-PUs. Consequently, the PUs cover the widest range of sizes, from 64 × 64 to 4 × 4 pixels. Finally, the TUs can be partitioned and transformed using a tree structure, termed the Residual Quad Tree (RQT), with a maximum of three depth levels, allowing TU sizes from 32 × 32 to 4 × 4.

Intra-prediction achieves optimal coding performance through an exhaustive evaluation of the 35 prediction modes over all possible partition sizes from 64 × 64 to 4 × 4, which means the evaluation of 341 different blocks per CTU.
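The figure of 341 blocks per CTU follows directly from the quad-tree geometry: each halving of the block size quadruples the block count. A quick sanity check (a sketch of the counting argument, not encoder code):

```python
# Number of square blocks evaluated per 64x64 CTU when every size from
# 64x64 down to 4x4 is checked: depth level d holds 4**d blocks
# (depth 0: one 64x64 block, ..., depth 4: 256 blocks of size 4x4).
def blocks_per_ctu(max_size=64, min_size=4):
    total, depth, size = 0, 0, max_size
    while size >= min_size:
        total += 4 ** depth
        depth += 1
        size //= 2
    return total

print(blocks_per_ctu())  # 1 + 4 + 16 + 64 + 256 = 341
```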
HEVC exploits the high spatial correlation between PU pixels and the pixels from the top row and left column of the neighbouring PUs. Those samples, denoted as Pref, are used for the construction of the directional predictors. Detailed information on intra-prediction coding can be found in [13].
The intra-prediction modes include two non-directional modes, namely DC and Planar, which achieve high efficiency in smooth gradient areas, and 33 angular modes for image areas with edge patterns. The HEVC angular modes are defined with a fractional precision of 1/32 between two integer pixel positions of Pref, and they are clustered into 16 horizontal modes, named H2 to H17, and 17 vertical modes, named V18 to V34.
The angular predictors can also be classified into two categories. The first one is composed of the five modes whose orientations match the integer positions of the reference pixels; we call this first category the Integer Position Modes (IPM). These are the horizontal H10, the vertical V26 and the three diagonal modes H2, V18 and V34. The second category includes the rest of the modes, whose orientations fall between two reference samples, so their predictors are computed by interpolating the two nearest Pref samples; we name this second category the Fractional Position Modes (FPM).
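The IPM/FPM distinction can be stated compactly: a mode is an IPM exactly when its slope is 0, ±1 or infinite, so its direction passes through integer reference positions. A small illustration using slope values from Table 1 (the `category` helper is ours, for exposition only):

```python
from fractions import Fraction

# Slopes of a few HEVC angular modes (from Table 1); None encodes the
# infinite slope of the purely vertical mode V26.
slopes = {
    "H2": Fraction(-1), "H9": Fraction(-1, 16), "H10": Fraction(0),
    "V18": Fraction(1), "V25": Fraction(16, 1), "V26": None,
    "V34": Fraction(-1),
}

def category(slope):
    """IPM when the slope is 0, +/-1 or infinite; FPM otherwise."""
    if slope is None or abs(slope) in (Fraction(0), Fraction(1)):
        return "IPM"
    return "FPM"

for mode, r in slopes.items():
    print(mode, category(r))
```

Note that V25, despite its integer slope of 16, is an FPM: its per-row displacement is the fractional 1/16, so its predictor still needs interpolation.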
Table 1 collects the details of the 33 angular modes, denoted as Mi, where ri is the angular mode orientation, defined as a rational slope r = ry/rx, and θi is the orientation angle, such that θi = arctan(ri). As can be noted, the IPMs have an integer slope ri, whereas the FPMs have a non-integer slope ri. Lines with rational slopes are commonly used in digital image processing, since they can be defined in the discrete space ℤ², denoted as the integer lattice Λ ⊂ ℤ². A continuous line with rational slope r = ry/rx can be represented by (1):

y = r·x + d,  ∀ r ∈ ℚ    (1)

Only if x takes integer values that are multiples of rx, that is, x = k·rx, ∀ k ∈ ℤ, does y reach an integer position in Λ.
Table 1 Orientations and slopes of angular modes in HEVC

Mode Mi | Angle θi | Slope ri | Mode Mi | Angle θi | Slope ri
H2      | 5π/4     | -1       | V18     | 3π/4     | 1
H3      | 39π/32   | -13/16   | V19     | 23π/32   | 16/13
H4      | 38π/32   | -21/32   | V20     | 22π/32   | 32/21
H5      | 37π/32   | -17/32   | V21     | 21π/32   | 32/17
H6      | 36π/32   | -13/32   | V22     | 20π/32   | 32/13
H7      | 35π/32   | -9/32    | V23     | 19π/32   | 32/9
H8      | 34π/32   | -5/32    | V24     | 18π/32   | 32/5
H9      | 33π/32   | -1/16    | V25     | 17π/32   | 16/1
H10     | π        | 0        | V26     | π/2      | ∞
H11     | 31π/32   | 1/16     | V27     | 15π/32   | -16/1
H12     | 30π/32   | 5/32     | V28     | 14π/32   | -32/5
H13     | 29π/32   | 9/32     | V29     | 13π/32   | -32/9
H14     | 28π/32   | 13/32    | V30     | 12π/32   | -32/13
H15     | 27π/32   | 17/32    | V31     | 11π/32   | -32/17
H16     | 26π/32   | 21/32    | V32     | 10π/32   | -32/21
H17     | 25π/32   | 13/16    | V33     | 9π/32    | -16/13
        |          |          | V34     | π/4      | -1
Accordingly, the distance between two integer positions belonging to a line with rational slope ri is determined by the rx and ry parameters.

As can be observed in Table 1, the FPM modes H3, V19, H9, V25, H11, V27, H17 and V33 have rational slopes of ±m/16 and ±16/m, with m ∈ {1, 13}, which means those lines are defined with at least two points in integer positions only for PU sizes larger than 16 × 16, that is, the 32 × 32 and 64 × 64 PU sizes, but not for 16 × 16, 8 × 8 and 4 × 4. The other 20 FPM modes in HEVC have rational slopes of ±m/32 and ±32/m, with m ∈ {5, 9, 13, 17, 21}; thus, their lines are defined with two points in integer positions only for PU sizes larger than 32 × 32, which applies only to the 64 × 64 PU size.
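This size constraint can be checked directly: consecutive integer hits of a line with slope ry/rx are rx apart horizontally and ry apart vertically, so both spacings must fit inside the block. A small sketch (the `integer_hits` helper is ours, assuming the line is anchored at a lattice point in the block corner):

```python
from fractions import Fraction

# Count the integer-lattice points a line with rational slope ry/rx hits
# inside an N x N block anchored at the origin: successive hits are rx
# apart in x and ry apart in y, so the effective step is max(rx, ry).
def integer_hits(slope, n):
    rx, ry = slope.denominator, abs(slope.numerator)
    step = max(rx, ry) if ry else 1  # horizontal lines hit every column
    return 1 + (n - 1) // step

# Slope -13/16 (mode H17-like family): two hits only from 32x32 upward.
for n in (16, 32, 64):
    print(n, integer_hits(Fraction(-13, 16), n))
```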
With the aim of reducing the computational complexity, the HEVC reference model [14], version HM16.6, implements a low complexity intra-prediction algorithm based on the scheme of Piao et al. [15]. The algorithm workflow consists of two processes repeated for each PU: the Rough Mode Decision (RMD) and the RDO. The RMD evaluates the 35 prediction modes by computing a low complexity Lagrangian cost function (JHAD), which uses the Sum of Absolute Hadamard Transformed Differences (SATD). The N modes with the lowest JHAD cost are selected as candidate modes to be evaluated by the RDO stage, with N equal to 3 for the PU sizes of 64 × 64, 32 × 32 and 16 × 16, and N equal to 8 for the other PU sizes. The RDO carries out exhaustive PU encoding and decoding in order to compute the Lagrangian Jmode cost, and the mode with the lowest cost is selected as the optimal prediction mode for each PU.
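The RMD pre-selection described above can be sketched as follows. This is a minimal illustration, not HM code: it works on 4 × 4 blocks, omits the rate term of JHAD, and takes caller-supplied predictions as stand-ins for the 35 directional predictors built from Pref:

```python
# Sketch of the RMD stage: rank candidate predictions by SATD (sum of
# absolute Hadamard-transformed differences) and keep the n cheapest
# as RDO candidates.

def hadamard4(block):
    """4x4 Hadamard transform of a 4x4 list-of-lists (rows, then columns)."""
    def h4(v):
        a, b, c, d = v
        s0, s1, d0, d1 = a + b, c + d, a - b, c - d
        return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]
    rows = [h4(r) for r in block]
    return [list(c) for c in zip(*[h4(list(col)) for col in zip(*rows)])]

def satd(orig, pred):
    """Distortion term of JHAD: SATD of the residual orig - pred."""
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    return sum(abs(x) for row in hadamard4(diff) for x in row)

def rmd_candidates(orig, predictions, n):
    """Indices of the n modes with the lowest SATD cost."""
    order = sorted(range(len(predictions)), key=lambda m: satd(orig, predictions[m]))
    return order[:n]
```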
3 Related work
Recently, many proposals have been presented by the research community to alleviate the high computational burden of HEVC intra-coding by avoiding the exhaustive evaluation of the full set of size-mode combinations in the rate distortion optimization stage.
According to the algorithmic approach, proposals can be classified into three categories. The first one evaluates all the prediction modes but reduces the number of CU sizes required to be checked by both the RMD and the RDO stages, mostly by limiting the depth of the CTU tree; thus, they are commonly named tree pruning or early termination algorithms. The second category reduces the number of angular prediction modes to be checked in the RMD stage as well as, eventually, the number of candidates in the rate distortion optimization stage, mainly based on content features. The last category combines both approaches, reducing both the number of directional modes and the candidate CU sizes to be checked.

In the first group, the proposals can be classified into two different sub-categories. The first sub-category contains those in which the partitioning decision is taken exclusively on the basis of the RD cost value of the different CU sizes [16–18]. The approaches in the second sub-category [19, 20] mainly use the same scheme, but the optimal CTU partitioning prediction is based on content features extracted from the CUs.
The most popular approaches in the second group are based on pixel gradient detection in the spatial domain using the Sobel filter, followed by the computation of the Histogram of Oriented Gradients (HOG). In [21], a few directional modes are selected as candidates for the RDO based on the HOG. Chen et al. [22] use a 2 × 2 filter to select the strong primary gradient direction of the PU, introducing a nonparametric approach to estimate the distribution density of the gradient histogram. Yan et al. [23] apply a pixel-based edge detection algorithm based on the sum of absolute differences along the angles of the prediction modes, using a two-tap interpolation method. In [24], a reduction in intra-modes was proposed using five different filters to detect the dominant edge of the 4 × 4 PUs, and a set of 11 prediction modes closest to the dominant edge is evaluated. The proposal of Yao et al. [25] reduces the number of prediction modes to be evaluated to eleven modes, or to only two modes, DC and Planar, depending on the standard deviation associated with the dominant edge.
Finally, the proposals presented in [26, 27] fall into the last group. Shen [26] suggested a fast partitioning decision algorithm that uses the correlation between the content and the optimal CTU tree depth, limiting the minimum and
maximum depth levels. The algorithm is based on the observed evidence that small CUs tend to be chosen for richly textured regions, whereas large CUs are chosen for homogeneous regions. It computes a depth predictor using the neighbouring tree blocks, and applies two early terminations for prediction modes based on the statistics of neighbouring blocks and the RD cost of the candidates. The authors reported a 21% time reduction with a rate penalty of 1.7%. Using a similar approach, the fast intra-prediction algorithm presented in [27] is based on the RD cost difference between the first and second candidate modes computed in the RMD stage, which reduces the number of RDO candidates from N to one mode (Best Candidate) or three modes (Best Candidate, DC and MPM). In addition, if the RD cost of the best mode of the RDO stage is under a threshold, the algorithm applies an early termination pruning of the tree to avoid the evaluation of smaller CU sizes. The algorithm achieves a computational complexity reduction of 30.5% with an average performance drop of 1.2%.
4 Unified architecture for fast HEVC intra-prediction coding
As mentioned in Sect. 2, HEVC intra-prediction can achieve a high level of performance at the expense of a huge computational burden, due to the high number of combinations involved in the intra-prediction process, comprising the evaluation of all possible prediction modes and the full range of CU sizes. With the aim of reducing the HEVC intra-prediction complexity, this section presents our Unified Architecture to form a novel fast HEVC intra-prediction coding algorithm, denoted as Fast Partitioning and Mode Decision (FPMD). This approach combines, in a first stage, a Fast Partitioning Decision (FPD) algorithm for the CTU coding decision, based on decision trees trained using ML techniques, and, in a second stage, a Fast Mode Decision (FMD) algorithm, based on a novel texture orientation detection algorithm.
4.1 Stage 1: The fast partitioning decision (FPD) algorithm
4.1.1 Observations and motivation
The first stage of the Unified Architecture proposed in this paper is derived from an analysis of the computational complexity of the intra-prediction algorithm implemented in the HM reference software [14]. In some preliminary tests, one sequence from each class of the JCT-VC test sequences [28] was encoded using the HM 16.6 reference model, and the computing time of each intra-prediction stage was collected. The results for two QPs, QP22 and QP37, show that the complexity of the RMD and RDO stages makes up over 80% of the total intra-prediction computation, and the remainder of the time is spent in the RQT stage and other auxiliary tasks. This fact has motivated the design of our FPD algorithm, which replaces the brute force scheme used in HEVC through the RMD and RDO with a low complexity algorithm based on a fast CU size classifier, previously trained using an ML methodology. The classifier selects a sub-optimal CTU partitioning, thus avoiding the exhaustive evaluation of all available CU sizes.
The proposed FPD approach is based on the fact that the CU partitioning decision can be taken using local CU features, considering that there exists a strong correlation between the optimal partitioning size and the texture complexity of the CU. It is well known that homogeneous blocks achieve the best performance when they are encoded with large block sizes. Conversely, highly textured blocks can be efficiently encoded using small sizes that adjust to the details of the image.

The proposed algorithm uses a binary classifier based on a decision tree with two classes, Split and Non-Split, at each of the top three depths of the CTU tree. The algorithm starts with the largest CU size, 64 × 64, and the decision tree takes the partitioning decision using several attributes of the CU.
4.1.2 Training data set
With the aim of obtaining a training set covering a wide
range of content complexities, the Spatial Information (SI)
and Temporal Information (TI) metrics were computed for
all JCT-VC test sequences [28], according to the ITU-T
P.910 recommendation [29]. Figure 1 shows the Spatio-Temporal (ST) information of the selected training set sequences. Only the first picture of each selected sequence is used to train the classifier, which is considered sufficient to obtain a large number of representative CTU samples of blocks with homogeneous areas and blocks with highly detailed textured areas.

[Fig. 1 ST information of the training sequences: Temporal Index (TI) versus Spatial Index (SI) for sequences including PeopleOnStreet, Traffic, SlideEditing, BQTerrace, BasketballDrillText, Cactus, BasketballDrive and ParkScene]

4.1.3 Attribute selection

The training sequences were used for the extraction of attributes from the CUs, and they were also encoded with the HM encoder in order to obtain the optimal partitioning of each CTU. The CTU partitioning was obtained for the four distortion levels, QP22, QP27, QP32 and QP37, recommended in the Common Test Conditions and Software Reference Configurations (CTCs) of the JCT-VC [28]. This optimal CTU partitioning allows us to label each CU with a binary class that takes the Split or Non-Split value.

Many features can be extracted from a CU to describe its content using first- and second-order metrics in the spatial domain, such as the mean, standard deviation, skewness, kurtosis, entropy, autocorrelation, inertia or covariance [30]. There are also useful attributes describing the CU content that can be computed in the Fourier domain, such as the DC energy, the number of nonzero AC coefficients, or the mean and variance of the AC coefficients. Initially, we extracted a large number of such statistics, commonly used in image processing and image classification.

The attribute selection was carried out using the open-source WEKA tool, an effective ML workbench [31] that provides a set of strategies to rank the usefulness of the attributes, denoted as Attribute Evaluators, in conjunction with different search algorithms. Considering both factors, attribute ranking and computational complexity, we finally selected the three attributes from the spatial domain with the best performance in terms of ranking evaluation, which are based on the variance and mean computation of the CU. They are the following: (1) the variance of the 2N × 2N CU, denoted as σ²2N; (2) the variance of the variances of its four N × N sub-CUs, denoted as σ²[σ²N]; and (3) the variance of the means of its four N × N sub-CUs, denoted as σ²[μN].

In order to reduce the complexity of the variance computation, it is implemented with the one-pass ("textbook") variance algorithm analysed by Chan et al. [32]. This approach does not require passing through the data twice, once to calculate the mean and again to compute the sum of squared deviations from the mean.
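The three attributes and the one-pass variance can be sketched as follows (the function names are ours and the HM integration is omitted; this is an illustration of the computation, not the reference implementation):

```python
# One-pass ("textbook") variance, as analysed by Chan et al.: accumulate
# the sum and the sum of squares in a single pass over the samples, then
# var = E[x^2] - E[x]^2. Used to build the three FPD attributes of a
# 2N x 2N CU: its variance, plus the variance of the variances and of
# the means of its four N x N sub-CUs.

def one_pass_mean_var(samples):
    s = sq = 0.0
    for x in samples:
        s += x
        sq += x * x
    n = len(samples)
    mean = s / n
    return mean, sq / n - mean * mean

def fpd_attributes(cu):
    """cu: 2N x 2N list-of-lists of luma samples."""
    n = len(cu) // 2
    _, var_cu = one_pass_mean_var([p for row in cu for p in row])
    sub_means, sub_vars = [], []
    for by in (0, n):
        for bx in (0, n):
            sub = [cu[y][x] for y in range(by, by + n) for x in range(bx, bx + n)]
            m, v = one_pass_mean_var(sub)
            sub_means.append(m)
            sub_vars.append(v)
    return var_cu, one_pass_mean_var(sub_vars)[1], one_pass_mean_var(sub_means)[1]
```

A flat CU yields (0, 0, 0), while a CU whose quadrants differ in brightness yields a large variance of the sub-CU means, which is what pushes the classifier towards a Split decision.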
4.1.4 Decision tree specification
Prior to the training of the decision tree, and considering that classifier selection is one of the key factors in ML, we proceeded to select the classifier. Among several candidates, C4.5 [33] is a well-known classifier for general-purpose classification problems [34] as well as for video coding purposes [35]. The most common mechanism for measuring the accuracy of a trained decision tree is the tenfold cross-validation process available in WEKA, which provides the prediction error measured in terms of misclassified instances or the percentage of Correctly Classified Instances (CCI).

The first node, denoted as Node64, is the most critical one, because a wrong Non-Split decision at the highest partitioning level can cause a 64 × 64 CTU that should be divided into several smaller CUs not to be split, and therefore the compression efficiency will be reduced. We used 4231 instances for the training of Node64, corresponding to the 4231 64 × 64 CTUs of the 8 frames taken from the 8 test sequences. These instances suffer from the well-known imbalance problem, because there are significantly more instances belonging to Split, over 80%, than to Non-Split, with just 8% for QP22. To address the imbalance issue, prior to the training of the decision tree for Node64, an unsupervised random instance sub-sampling filter [36] available in WEKA was applied.

The training results show that the CCI after the tenfold cross-validation step for the decision trees of Node64 is over 90%, which can be considered a high-accuracy classification. The decision tree for Node64 is shown in Fig. 2. Node64 is defined with three inner nodes, three rules, and one condition for each rule with a specific threshold Thi (i = 1, 2, 3) that determines the binary decision within the inner nodes.
Node32 processes the split 32 × 32 CUs coming from Node64 and takes the Split or Non-Split decision for them, forwarding the CUs classified as Split to Node16, and sending the Non-Split CUs to the intra-prediction stage for an exhaustive evaluation of the 32 × 32 and 16 × 16 PU sizes. The total training data set size for this node is 16,924 instances and, in this case, only the class distributions for QP22 and QP27 were imbalanced, because there were significantly more instances belonging to Split, over 60%, than to Non-Split, with around 30%.

The training results show that the CCI after the tenfold cross-validation step for the decision trees of Node32 is in the range of 83–89%, which can still be considered a high-accuracy classification. The decision tree for Node32 is shown in Fig. 3. Based on the variance of the 32 × 32 CU and the variance of the variances of its four 16 × 16 sub-CUs, the decision tree classifies each CU into the Split or Non-Split class. Node32 is defined with two inner nodes and one condition for each rule with a specific threshold Thi (i = 4, 5) that determines the binary decision within the inner nodes.
Node16 processes the 16 × 16 CUs split from Node32, and a new Split or Non-Split decision is taken. The 4231 CTUs that comprise the data set, belonging to the 8 test sequences, are divided into 16 × 16 CUs for the Node16 training, and therefore the total training data set size for this node is 67,696 instances.

The training results obtained after the tenfold cross-validation step for the decision trees of Node16 are in the CCI range of 72–79%. The decision tree for Node16 is shown in Fig. 4. Node16 is defined with three inner nodes and one condition for each rule with a specific threshold Thi (i = 6, 7, 8), which determines the binary decision within the inner nodes.
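Because each node reduces to a handful of threshold comparisons, a trained node costs only a few if-else statements at encoding time. A sketch of Node64 following the structure of Fig. 2 (the threshold values below are illustrative placeholders, not the trained WEKA values):

```python
# Sketch of the Node64 classifier: var_means is the variance of the four
# 32x32 sub-CU means, var_vars the variance of their variances, and
# var_cu the variance of the whole 64x64 CTU.
TH1, TH2, TH3 = 120.0, 400.0, 900.0  # hypothetical thresholds

def node64_decision(var_means, var_vars, var_cu):
    """Return 'Split' (forward to Node32) or 'Non-Split' (evaluate PU64/PU32)."""
    if var_means > TH1:
        return "Split"
    if var_vars > TH2:
        return "Split"
    if var_cu > TH3:
        return "Split"
    return "Non-Split"
```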
The proposed algorithm can be implemented in a scalable way, using only the top node (Node64) or combining it with the other two nodes, as follows:

1. Node64 The fast classifier replaces the RDO for the 64 × 64 CU size, so the four 32 × 32 CUs are exhaustively evaluated by the RDO only if the classifier decision is "Split". This configuration achieves the minimum speed-up.
2. Node64 + Node32 The fast classifier replaces the RDO for the 64 × 64 and 32 × 32 CUs. If the classifier decision in Node32 is "Split", the four 16 × 16 CUs are evaluated by the RDO in order to obtain the optimal partitioning into 8 × 8 and 4 × 4.
3. Node64 + Node32 + Node16 The fast classifier replaces the RDO for the 64 × 64, 32 × 32 and 16 × 16 CUs. If the classifier decision in Node16 is "Split", the four 8 × 8 CUs are evaluated by the RDO; this configuration achieves the maximum speed-up.

[Fig. 2 Decision tree for Node64: the root tests σ²[μ32] against Th1; CTUs above Th1 go to Node32; otherwise σ²[σ²32] is tested against Th2 and then σ²64 against Th3, with CTUs below Th3 sent to the RMD + RDO + RQT evaluation of the PU64 and PU32 sizes and the remainder forwarded to Node32]

[Fig. 3 Decision tree for Node32: σ²[σ²16] is tested against Th4; CUs above Th4 go to Node16; otherwise σ²32 is tested against Th5, with CUs below Th5 sent to the RMD + RDO + RQT evaluation of the PU32 and PU16 sizes and the remainder forwarded to Node16]

[Fig. 4 Decision tree for Node16: the root tests σ²[σ²8] against Th6; below Th6, σ²16 is tested against Th7, and above Th6, σ²[σ²8] is tested against Th8, each branch ending in an RMD + RDO + RQT evaluation of either the PU16 and PU8 sizes or the PU8 and PU4 sizes]

4.2 Stage 2: The fast mode decision (FMD) algorithm

4.2.1 Observations and motivation

Many proposals have been presented in the literature for the fast optimal mode decision in HEVC [21–25]. Most of them are based on pixel gradient detection in the spatial domain, computing the gradient of the image with the Sobel filter [37] or other similar filters. This technique has proved robust when high-energy edges are present in the image, but natural images often have wide areas with weak edges, or even no edges at all, so this approach can be inefficient for the intra-prediction mode decision.

This fact has motivated the algorithm presented for the second stage of our Unified Architecture, denoted as the Fast Mode Decision (FMD) algorithm. In this paper, a novel texture orientation detection algorithm is proposed, which computes the Mean Directional Variance (MDV) using a Sliding Window (SW) along a set of co-lines with rational slopes, denoted as MDV-SW. The key point of the proposed algorithm is the hypothesis that pixel correlation is maximal along the texture orientation, and consequently the variance computed in that direction will be low compared with the variance computed in other directions. Another noteworthy feature of this proposal is the use of a set of rational slopes that are exclusively defined at integer positions of the discrete lattice Λ; thus, no pixel interpolation is required. Moreover, it was observed that there exists a strong dependence between the optimal mode selected by the RDO and the distortion applied, set by the QP parameter. Therefore, the directional variance computation for each N × N PU is expanded to an (N + 1) × (N + 1) window, so that the neighbouring pixels used as reference samples for the construction of the predictor are also included in the calculation of the directional variance.

In order to reduce the computational complexity of the gradient detection, we need to define lines whose points are located at integer positions of the lattice Λ ⊂ ℤ², so that the problem of discretizing a direction in the discrete space ℤ² can be described as the sub-sampling of an integer lattice.

4.2.2 Sub-sampling of the integer lattice

The directional variance metric described in [38] uses the traditional definition of a "digital line" for the discrete space
ℤ², such that L(r, n) is a digital line with rational slope r, ∀ r ∈ ℚ, and offset n, ∀ n ∈ ℤ, meaning that every pixel at position (x, y) ∈ ℤ² is associated exclusively with one digital line, defined as:

L(r, n) = {(x, y) ∈ ℤ² : n = y − ⌊r·x⌋},  |r| ≤ 1    (2)

L(r, n) = {(x, y) ∈ ℤ² : n = x − ⌊y/r⌋},  |r| > 1    (3)
Therefore, for each rational slope r ¼ ry =rx 8 rx ; ry [ -Z-a set of integers, denoted as n, can be found, which define
the digital lines L(r, n). Digital lines have been widely used
in different fields, including image processing, image
coding and artificial vision, among others. However, they
do not provide enough accuracy for gradient detection in
images or pixel blocks of small size. Figure 5a depicts an example of an image composed of four equally spaced bars (grey bars) with a slope of 1/3, over which we have plotted the set of digital lines with the same slope, L(1/3, n), covering the image. As can be observed, although the digital lines and the plotted bars have the same slope, the digital lines do not represent the orientation of the bars with high accuracy. In some digital lines only two out of every three pixels belong to a bar, while in the remaining digital lines two out of every three pixels fall outside the bars and only one pixel lies inside, reducing the pixel correlation along the digital lines.
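The exclusive-membership property of the traditional digital lines can be checked numerically. A small sketch follows; the floor-based rounding convention is an assumption for illustration, not the paper's exact formula:

```python
import math
from fractions import Fraction

def digital_line_index(x, y, r):
    """Index n of the digital line L(r, n) containing pixel (x, y), using a
    floor-based naive digital line convention for |r| <= 1 (an assumed
    rounding choice for illustration)."""
    return y - math.floor(r * x)

# Cover an 8x8 pixel block with the digital lines of slope 1/3: every pixel
# falls on exactly one line, i.e. the lines partition the block.
r = Fraction(1, 3)  # exact rational slope avoids floating-point rounding
lines = {}
for x in range(8):
    for y in range(8):
        lines.setdefault(digital_line_index(x, y, r), []).append((x, y))

assert sum(len(p) for p in lines.values()) == 64  # each pixel counted once
assert len(lines) == 10                           # indices n = -2, ..., 7
```

Because every pixel maps to exactly one index n, the lines L(1/3, n) partition the block, which is the property the variance computation relies on.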
This is because the digital lines are not straight; instead, they cover an area whose width depends on the slope factors (rx, ry). Figure 5b shows an example of two digital lines, L(1/3, n) and L(3/4, n), where the digital line areas have been shaded and the digital line width is denoted as l. Using simple trigonometric functions, it can be demonstrated that l = ry(rx − 1)/√(rx² + ry²), and consequently l increases as rx and ry increase, as shown in Fig. 5b for the rational slopes 1/3 and 3/4. This feature of the digital lines, their width, limits their orientation detection accuracy, mainly for directions represented by rational slopes that require high rx, ry factors.
For this reason, the concept of a down-sampling lattice in two-dimensional space described in [39] has been used, together with the co-line definition used in lattice theory [40]. According to [39], in a 2D system an integer lattice Λ can be obtained by down-sampling the cubic integer lattice Z² in two directions, d1 and d2.
Hence, the sub-lattice Λ ⊂ Z² can be formally represented by a non-singular 2 × 2 integer matrix, denoted as the sub-sampling matrix or generator matrix, M_Λ, such that dx1, dy1, dx2, dy2 ∈ Z, that is:

$$M_\Lambda = [\,\mathbf{d}_1\ \mathbf{d}_2\,] = \begin{pmatrix} d_{x1} & d_{x2} \\ d_{y1} & d_{y2} \end{pmatrix} \qquad (4)$$
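The coset structure induced by such a generator matrix can be verified numerically: two pixels lie in the same coset of the sub-lattice exactly when the adjugate of the generator matrix maps their difference to zero modulo the determinant. A minimal sketch (pure Python; the direction vectors are taken from Fig. 6 and Table 2, and the function names are ours):

```python
def coset_key(x, y, m):
    """Coset label of pixel (x, y) for the sub-lattice generated by the
    columns of the 2x2 integer matrix m = [[a, b], [c, d]]. Two pixels share
    a coset iff adj(m) @ (p - q) = 0 (mod |det(m)|)."""
    a, b, c, d = m[0][0], m[0][1], m[1][0], m[1][1]
    det = abs(a * d - b * c)
    # adjugate of m is [[d, -b], [-c, a]]; reduce its action modulo |det|
    return ((d * x - b * y) % det, (-c * x + a * y) % det)

def coset_count(m, size=16):
    """Number of distinct cosets hit by a size x size block of pixels."""
    return len({coset_key(x, y, m) for x in range(size) for y in range(size)})

# Generator matrix of Fig. 6: direction vectors [1, 2]^T and [2, -1]^T.
assert coset_count([[1, 2], [2, -1]]) == 5   # |det| = |-1 - 4| = 5
assert coset_count([[4, 1], [-1, 4]]) == 17  # |det| = 17, as in Table 2
```

The counts match the "Cosets" column of Table 2: the block of pixels is split into exactly |det(M_Λ)| classes.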
The use of the rational slopes as sub-sampling directions in M_Λ, in their vector form d1 = [dx1, dy1]^T and d2 = [dx2, dy2]^T, makes it possible to obtain a sub-lattice Λ which describes the points of the lines with slopes d1 and d2, and thus facilitates the variance computation along those orientations. Lattice theory states that, given a generator matrix M_Λ, the lattice Z² is partitioned into |det(M_Λ)|
Fig. 5 a Example of four bars with orientation r = 1/3, and the set of digital lines L(1/3, n), indexed n = −2, …, 6. b Area covered by two digital lines with rational slopes of 1/3 and 3/4
Fig. 6 a Example of co-lines and their respective cosets for the slope r1, using as direction vectors r1 = [1, 2]^T and r2 = [2, −1]^T of the sub-sampling matrix M_Λ (|det(M_Λ)| = 5). b Example of co-lines and their respective cosets for the slope r2, using the same direction vectors
number of cosets that are shifted versions of the sub-lattice Λ. Each coset can be obtained by a shifting vector s_k = [s_kx, s_ky]^T, ∀ k = 0, 1, …, det(M_Λ) − 1. In [40] the concept of co-line, denoted as CL_{s_k}(r, n), is introduced, defined as the intersection of the kth coset of the lattice Λ and a digital line L(r, n). Therefore, the pixels (x, y) belonging to the co-lines with slopes d1 and d2 can be obtained by the linear combination of both vectors d1, d2, ∀ c1, c2 ∈ Z, such that:

$$\begin{pmatrix} x \\ y \end{pmatrix} = c_1 \begin{pmatrix} d_{x1} \\ d_{y1} \end{pmatrix} + c_2 \begin{pmatrix} d_{x2} \\ d_{y2} \end{pmatrix} + \begin{pmatrix} s_{kx} \\ s_{ky} \end{pmatrix} \qquad (5)$$

Our proposal computes the variance along the pixels of the co-lines with a set of rational slopes ri, which can be obtained using the generator matrix M_Λ, where any couple of rational slopes r1 = ry1/rx1 and r2 = ry2/rx2 can be used as the vector directions d1 and d2 in M_Λ. Equation (5) can be rewritten as two independent equations using the r1 and r2 slopes, such as:

$$x = c_1 r_{x1} + c_2 r_{x2} + s_{kx} \qquad (6)$$

$$y = c_1 r_{y1} + c_2 r_{y2} + s_{ky} \qquad (7)$$

By isolating the variable c1 from (6) and substituting it into (7), the expression of the co-lines with orientation r1 is obtained, as shown in (8) and (9), ∀ c2 ∈ Z, k = 0, …, det(M_Λ) − 1:

$$y = r_1 x + n,\ \forall\,|r_1| \le 1, \quad n = c_2 (r_{y2} - r_1 r_{x2}) - r_1 s_{kx} + s_{ky} \qquad (8)$$

$$x = y/r_1 + n,\ \forall\,|r_1| > 1, \quad n = c_2 (r_{x2} - r_{y2}/r_1) - s_{ky}/r_1 + s_{kx} \qquad (9)$$

With the aim of clarifying this process for the co-lines with slope r1, Fig. 6a shows an example using the direction vectors r1 = [1, 2]^T and r2 = [2, −1]^T. Given that |det(M_Λ)| = 5, there are five cosets, determined by the shifting vectors s0 = [−1, 0], s1 = [0, −1], s2 = [0, 0], s3 = [0, 1], s4 = [1, 0], which are represented by the dotted, black, grey, white and lined points, respectively. Setting c2 = 0, we obtain the expression of the co-lines for the first five cosets, and setting c2 = 1, the next five co-lines are likewise obtained; these are depicted as solid lines and dashed lines, respectively, in Fig. 6a.

Following the same reasoning, the expression of the co-lines with orientation r2 is obtained by isolating the variable c2 from (6) and substituting it into (7), as shown in (10) and (11), ∀ c1 ∈ Z, k = 0, …, det(M_Λ) − 1:

$$y = r_2 x + n,\ \forall\,|r_2| \le 1, \quad n = c_1 (r_{y1} - r_2 r_{x1}) - r_2 s_{kx} + s_{ky} \qquad (10)$$

$$x = y/r_2 + n,\ \forall\,|r_2| > 1, \quad n = c_1 (r_{x1} - r_{y1}/r_2) - s_{ky}/r_2 + s_{kx} \qquad (11)$$
Figure 6b shows the same example as Fig. 6a, but now for the expression of the co-lines with slope r2. Now, setting c1 = 1, the expression of the co-lines for the first five cosets is obtained, and the next set of five co-lines is likewise obtained by setting c1 = 2; these are depicted as solid lines and dashed lines, respectively.

As can be observed in Fig. 6a, b, the distance between two consecutive pixels belonging to one co-line is always the same, and their positions are free of any type of interpolation process. Consequently, the co-lines are able to represent the geometrical orientation with higher accuracy than the traditional digital lines, particularly for the small block sizes, as demanded in HEVC.

Table 2 Six generator matrices defining the twelve rational slopes and the respective cosets

| M_Λi | Vector directions | Generator matrix | Cosets |
|---|---|---|---|
| M_Λ0 | r3, r9 | [1 0; 0 1] | 1 |
| M_Λ1 | r0, r6 | [1 1; −1 1] | 2 |
| M_Λ2 | r1, r7 | [2 1; −1 2] | 5 |
| M_Λ3 | r5, r11 | [2 1; 1 −2] | 5 |
| M_Λ4 | r2, r8 | [4 1; −1 4] | 17 |
| M_Λ5 | r4, r10 | [4 1; 1 −4] | 17 |

4.2.3 Selection of the co-lines orientation

The decision on the co-line orientations that best match the mode directionality in HEVC is one of the key points of our approach. In order to estimate the dominant texture orientation in each PU, twelve rational slopes ri (∀ i = 0, …, 11) have been selected. Four of them have the same slope as IPM modes, that is, the horizontal, vertical and two diagonal orientations (the diagonals H2 and V34 are considered the same in terms of slope), and the other eight slopes are defined close to some of the IPM modes, but with rational slopes of ±1/2, ±1/4, ±2 and ±4. Table 2 summarizes the set of six generator matrices M_Λi, the vector directions used for the integer lattice sub-sampling, and the respective number of cosets defined by det(M_Λi). The chosen orientations have a maximum slope factor of 4, instead of the 32 used by the HEVC modes (Table 1).

Consequently, there are always two points that determine a co-line with rational slope ri, except for the smallest PU size of 4 × 4 and the largest slopes (r = ±1/4 and r = ±4), which use a pixel of a neighbouring block. Figure 7 shows the set of twelve rational slopes.

With the aim of comparing the orientations of the selected slopes with the HEVC modes' directionality, Fig. 8 depicts both the 33 angular intra-prediction modes defined in HEVC (grey solid lines) and the twelve proposed co-lines with rational slopes (blue dashed lines), which have been chosen for the analysis of the dominant gradient.
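The angles later tabulated for these co-lines (Table 3) are rational-fraction approximations of the exact arctangents of the slopes. A numeric check, under an assumed angle convention θ = π − arctan r, folded by π into the table's (5π/14, 5π/4] range:

```python
import math

# Rational slopes r0..r11 and the approximate angles of Table 3 (fractions of pi).
slopes = [-1, -1/2, -1/4, 0, 1/4, 1/2, 1, 2, 4, math.inf, -4, -2]
table_angles = [5/4, 8/7, 27/25, 1, 23/25, 6/7, 3/4, 9/14, 29/50, 1/2, 21/50, 5/14]

def coline_angle(r):
    """Angle of a co-line with slope r, assuming theta = pi - atan(r),
    folded back by pi when it exceeds 5*pi/4 (the table's upper bound)."""
    theta = math.pi - math.atan(r)
    return theta - math.pi if theta > 5 * math.pi / 4 + 1e-9 else theta

# The tabulated fractions approximate the exact angles to within ~1 degree.
for r, frac in zip(slopes, table_angles):
    assert abs(coline_angle(r) - frac * math.pi) < math.radians(1.0)
```

The largest deviations (about 0.85°) occur for the ±1/2 and ±2 slopes, consistent with the table values being convenient fractions rather than exact angles.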
Fig. 7 Co-lines r0 to r11 selected to compute the directional variance
Fig. 8 Set of angular intra-predictions in HEVC (grey), and the co-lines defined with rational slopes r0 to r11 (dotted blue lines) for the dominant gradient analysis
As can be observed, the co-lines r0, r3, r6 and r9 overlap the angular modes H2, H10, V18, V26 and V34; thus, such modes can be estimated with high accuracy from their respective co-lines. The co-lines with slopes r1, r5, r7 and r11 mostly overlap the angular modes H5, H15, V21 and V31, so these co-lines can also be considered a good estimation of the co-located modes.
Finally, the four co-lines with slopes r2, r4, r8 and r10 are located near the middle of two modes: H7 and H8 for the slope r2, H12 and H13 for the slope r4, V23 and V24 for the slope r8, and V28 and V29 for the slope r10.
With the aim of considering the remaining angular modes that are not covered by the twelve proposed slopes ri, twelve classes denoted as Ci have been defined. Each class selects a set of candidate modes Mi on the left and right side of the respective rational slope ri. Table 3 shows the features of the proposed co-lines ri. It should be noted that the co-line r0 is the only slope which selects four angular modes in its class. This is because modes H2 and V34 of HEVC are the same in terms of orientation, so the two horizontal modes, H2 and H3, and the two vertical modes, V33 and V34, are selected as candidates.

Based on empirical simulations, one candidate mode on the left and right side of each class has been added for the smaller PU sizes of 8 × 8 and 4 × 4, except for the horizontal, r3, and vertical, r9, slopes. This is motivated by the fact that the co-lines with non-Cartesian orientations in those block sizes are defined by a low number of pixels, so the accuracy of the orientation detection is also lower. In addition, the two non-angular modes, DC and Planar, are included as candidates for all the classes, because they match weak-edge images quite well.
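The class-to-candidate-mode mapping can be sketched as a simple lookup: the class whose slope yields the minimum MDV supplies the angular candidates, and DC and Planar are always appended. The mode lists below are transcribed from Table 3 (PU64/PU32/PU16 column); the MDV values in the example are hypothetical:

```python
# Candidate angular modes per class (Table 3), for PU64, PU32 and PU16.
CLASS_MODES = {
    "r0": ["H2", "H3", "V33", "V34"], "r1": ["H4", "H5", "H6"],
    "r2": ["H7", "H8"],               "r3": ["H9", "H10", "H11"],
    "r4": ["H12", "H13"],             "r5": ["H14", "H15", "H16"],
    "r6": ["H17", "V18", "V19"],      "r7": ["V20", "V21", "V22"],
    "r8": ["V23", "V24"],             "r9": ["V25", "V26", "V27"],
    "r10": ["V28", "V29"],            "r11": ["V30", "V31", "V32"],
}

def mode_list(mdv):
    """Build the RMD candidate list: the class of the slope with the lowest
    mean directional variance, plus the DC and Planar modes."""
    best = min(mdv, key=mdv.get)
    return CLASS_MODES[best] + ["DC", "Planar"]

# Hypothetical MDV values where the horizontal slope r3 has the lowest variance.
mdv = {f"r{i}": 100.0 for i in range(12)}
mdv["r3"] = 5.0
assert mode_list(mdv) == ["H9", "H10", "H11", "DC", "Planar"]
```

Only these few candidates, instead of the 35 HEVC modes, are then passed to the RMD stage.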
4.2.4 Computation of mean directional variance (MDV)
along co-lines
Cumulative variance along digital lines is proposed in [38], where it is proven to be an efficient metric for texture orientation detection in large images. However, due to the constraints imposed by factors such as the PU sizes, the high density of the intra-prediction modes in HEVC, and the need to reduce the encoder's computational burden, the following modifications and novel features are proposed:

1. The variance is computed along the co-lines according to (8)–(11), instead of using the digital line expression.

2. In order to reduce the computational complexity of the variance, an approximation of the variance, denoted the textbook one-pass algorithm, proposed by Chan et al. [32], is used. Equation (12) shows the directional variance expression using the textbook one-pass approach, p_j(ri, n) being the pixels belonging to CL(ri, n) and N the number of pixels of the nth co-line with slope ri:

$$\sigma^2[CL(r_i, n)] = \frac{1}{N} \sum_{j=0}^{N} p_j^2(r_i, n) - \left[ \frac{1}{N} \sum_{j=0}^{N} p_j(r_i, n) \right]^2 \qquad (12)$$
3. Finally, instead of calculating the cumulative variance of the digital lines as proposed in [38], the MDV is
Table 3 Slope, angle and candidate modes assigned to the defined co-lines

| Co-line | θi | Slope (ry/rx) | PU class (Ci) | Candidate modes Mi for PU64, PU32, PU16 | Additional candidates Mi for PU8, PU4 |
|---|---|---|---|---|---|
| r0 | 5π/4 | −1/1 | I | H2, H3, V33, V34 | H4, V32 |
| r1 | 8π/7 | −1/2 | II | H4, H5, H6 | H3, H7 |
| r2 | 27π/25 | −1/4 | III | H7, H8 | H6, H9 |
| r3 | π | 0 | IV | H9, H10, H11 | H8, H12 |
| r4 | 23π/25 | 1/4 | V | H12, H13 | H11, H14 |
| r5 | 6π/7 | 1/2 | VI | H14, H15, H16 | H13, H17 |
| r6 | 3π/4 | 1/1 | VII | H17, V18, V19 | H16, V20 |
| r7 | 9π/14 | 2/1 | VIII | V20, V21, V22 | V19, V23 |
| r8 | 29π/50 | 4/1 | IX | V23, V24 | V22, V25 |
| r9 | π/2 | ∞ | X | V25, V26, V27 | V24, V28 |
| r10 | 21π/50 | −4/1 | XI | V28, V29 | V27, V30 |
| r11 | 5π/14 | −2/1 | XII | V30, V31, V32 | V29, V33 |
computed as the average of the individual variances obtained along the L co-lines with the same ri orientation, as shown in (13):

$$MDV(r_i) = \frac{1}{L} \sum_{n=1}^{L} \sigma^2[CL(r_i, n)] \qquad (13)$$

It is worth noting that for a rational slope (ry/rx), the distance between two pixels belonging to that orientation can be computed as √(rx² + ry²), and therefore it has a strong dependence on the rational values of the slope. From a statistical point of view, for most natural images the variance computed along slopes with a large pixel distance could be penalized, compared to the variance computed along slopes with a smaller distance, such as the Cartesian orientations.

However, for highly textured images, such as the image depicted in Fig. 5a, the correlation between the pixels with the same orientation (r = 1/3) is high, even if the pixel distance along the slope is large (√(3² + 1²)).

The simulation results shown in Sect. 5 reveal that this algorithm achieves a high performance for a wide variety of sequences, containing a large set of patterns and orientations.
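Equations (12) and (13) amount to a one-pass variance per line followed by an average. The sketch below groups pixels with a simple floor-based line index for slopes |r| ≤ 1, which is an illustrative stand-in for the co-line grouping of Sect. 4.2.2:

```python
import math

def one_pass_variance(pixels):
    """Textbook one-pass variance of Eq. (12): mean of squares minus squared mean."""
    n = len(pixels)
    s1 = sum(pixels)
    s2 = sum(p * p for p in pixels)
    return s2 / n - (s1 / n) ** 2

def mdv(block, r):
    """Mean directional variance (Eq. (13)) along lines of slope |r| <= 1.
    For illustration, pixels are grouped with a floor-based line index; the
    paper groups them along the co-lines instead."""
    lines = {}
    for y, row in enumerate(block):
        for x, p in enumerate(row):
            lines.setdefault(y - math.floor(r * x), []).append(p)
    variances = [one_pass_variance(px) for px in lines.values() if len(px) > 1]
    return sum(variances) / len(variances)

# A block with constant rows: zero variance along the horizontal (r = 0),
# non-zero along the diagonal, so the horizontal is chosen as dominant.
block = [[row * 10] * 8 for row in range(8)]
assert mdv(block, 0.0) == 0.0
assert mdv(block, 1.0) > 0.0
```

The orientation with the lowest MDV is taken as the dominant texture orientation, exactly as the class selection of Sect. 4.2.3 requires.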
4.2.5 Computation of mean directional variance using
sliding window (MDV-SW)
As described previously, the MDV is computed for each PU, in a scalable manner, using the pixels of the co-segments CSm(ri, n) belonging to that PU. However, the angular modes in HEVC intra-prediction use as reference samples the neighbouring pixels of decoded PUs, denoted as PUd(x, y), which have previously been distorted by the quantization parameter QP. Based on empirical simulations, we observed a strong dependence between the optimal mode selected by the RDO and the QP parameter, especially for high QP values, which cause a strong smoothing of the reference pixels; thus, the correlation between the reference samples and the PU's pixels, which are not yet distorted, can be modified. Consequently, an enhancement to the MDV algorithm is proposed in this subsection. The main idea of this approach is to expand the window of the MDV computation for an N × N PU to a window of (N + 1) × (N + 1) pixels, overlapping the left column and top row of the window with the left and top decoded pixels of the neighbouring PUs. Figure 9a depicts an example of
the computation of the MDV for a 4 × 4 PU(i, j) along the slope r6, where the reference samples (Pref) belonging to decoded PUs (PUd) are lined. As can be observed, five co-segments with lengths of 2, 3 and 4 pixels are defined for such a slope. Figure 9b presents the new MDV approach, where the new (N + 1) × (N + 1) window allows the MDV computation to use the reference pixels from the neighbouring PUs, and two new co-segments are now available.

Accordingly, the MDV is computed on five overlapping window sizes of 65 × 65, 33 × 33, 17 × 17, 9 × 9 and 5 × 5 pixels; thus, this novel approach is named Mean Directional Variance with Sliding Window (MDV-SW). Figure 9c illustrates an example of the MDV-SW computation of four consecutive 4 × 4 PUs {PU(i + k, j + l) | ∀ k, l = 0, 1}, showing that the sliding windows overlap each other in order to capture the neighbouring reference samples.

The MDV-SW implementation needs a slight computation increase compared with the MDV, due to the extended window size, since the border pixels are computed twice for adjacent PUs.
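The (N + 1) × (N + 1) window of MDV-SW is just the PU plus its decoded top row and left column. A minimal slicing sketch; the frame coordinates and the border-handling assumption are ours:

```python
def mdv_sw_window(frame, top, left, n):
    """(n+1) x (n+1) analysis window for the n x n PU whose top-left pixel is
    (top, left): the PU plus the decoded row above and the column to the left.
    Assumes the PU does not touch the frame border."""
    return [row[left - 1:left + n] for row in frame[top - 1:top + n]]

# 4x4 PU at (4, 4) inside a 16x16 frame -> a 5x5 window whose first row and
# first column are the neighbouring reference pixels.
frame = [[16 * y + x for x in range(16)] for y in range(16)]
win = mdv_sw_window(frame, top=4, left=4, n=4)
assert len(win) == 5 and all(len(r) == 5 for r in win)
assert win[0][0] == frame[3][3]  # reference pixel from the decoded neighbour
assert win[1][1] == frame[4][4]  # top-left pixel of the PU itself
```

Sliding this window over adjacent PUs recomputes the shared border pixels, which is the slight extra cost noted above.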
J Real-Time Image Proc
Fig. 9 a Example of MDV computation of a 4 × 4 PU. b Example of MDV-SW computation over the expanded window for a 4 × 4 PU. c Example of MDV-SW using overlapped windows for the evaluation of 4 × 4 PUs
4.3 The fast partitioning and mode decision
architecture (FPMD)
Finally, this subsection presents the proposed unified
architecture to form a novel fast HEVC intra-prediction
coding algorithm, denoted as Fast Partitioning and Mode
Decision (FPMD). This approach combines the FPD
algorithm proposed in Sect. 4.1, and the FMD algorithm
proposed in Sect. 4.2. The FPMD algorithm presented in this paper achieves a considerable complexity reduction of over 67%, at the expense of a slight penalty in terms of rate increase, due to the sub-optimal partitioning and mode decisions given by the FPMD.
The architecture of the FPMD is depicted in Fig. 10,
which shows the new functional algorithms introduced by
FPMD shaded in grey. The FPMD workflow operates at the CTU level, as described in Sect. 4.1, evaluating the CTU attributes that are used by the decision trees in the CTU classifier stage, which selects the different PU sizes. With the aim of organizing the set of PUs that partition the CTU, a Partition Map is arranged by depth levels (∀ d = 0, …, 4). For each depth level d, the k PUs PU_{d,k} belonging to that level are recorded in a list.
Intra-prediction is computed by evaluating all the PUs included in the Partition Map lists, which are processed in depth-level order. Following the fast partitioning algorithm described in Sect. 4.1, for every PU_{d,k}, the four sub-PUs (PU_{d+1,4k+i}, ∀ i = 0, …, 3) are also evaluated by the RMD, RDO and RQT stages. Consequently, five evaluations are always performed for each element of the Partition Map.
Then, the MDV-SW algorithm is run for each PU in the Partition Map, and the PU is assigned a class Ci, which includes a set of three or four candidate angular modes in addition to the two non-directional modes, DC and Planar, as described in Sect. 4.2.
Those modes are arranged in a Mode List, and they are the only modes evaluated by the RMD, instead of the 35 modes evaluated by the original RMD stage of the HM reference software.
As a result of the RMD process, a set of three candidate modes is selected to be checked by the RDO stage, and the mode with the lowest cost is selected to be further evaluated by the RQT, which selects the optimal TU size. Finally, by comparing the cost of PU_{d,k} with the sum of the costs of its four sub-PUs, the best option is selected: Non-Split PU_{d,k} or the four Split PU_{d+1,4k+i} (∀ i = 0, …, 3).
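The FPMD flow of this subsection can be condensed into a schematic loop in which the classifier, MDV-SW classification, RMD and RDO/RQT stages are injected as stand-in callables (all names and toy costs below are illustrative, not the HM implementation):

```python
def fpmd_encode_ctu(ctu, classify, mdv_sw_class, rmd, rdo_rqt):
    """Schematic FPMD loop: the classifier builds the Partition Map, MDV-SW
    shortens the mode list per PU, and RMD/RDO/RQT pick the best mode.
    All four callables are stand-ins for the real encoder stages."""
    partition_map = classify(ctu)            # {depth: [PU, ...]}
    best = {}
    for depth in sorted(partition_map):      # process in depth-level order
        for pu in partition_map[depth]:
            modes = mdv_sw_class(pu) + ["DC", "Planar"]  # reduced Mode List
            candidates = rmd(pu, modes)[:3]  # RMD keeps the 3 best candidates
            best[pu] = min(candidates, key=lambda m: rdo_rqt(pu, m))
    return best

# Toy stages: one depth level, two PUs, costs favouring the "DC" mode.
out = fpmd_encode_ctu(
    "ctu",
    classify=lambda ctu: {0: ["PU_a", "PU_b"]},
    mdv_sw_class=lambda pu: ["H9", "H10", "H11"],
    rmd=lambda pu, modes: sorted(modes, key=lambda m: 0 if m == "DC" else 1),
    rdo_rqt=lambda pu, m: 0 if m == "DC" else 1,
)
assert out == {"PU_a": "DC", "PU_b": "DC"}
```

The key point is visible in the loop: both the set of PUs and the set of modes are pruned before the expensive RDO/RQT stage is ever invoked.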
5 Performance evaluation
With the aim of evaluating the proposed Fast Partitioning
and Mode Decision (FPMD) algorithm, the FPD algorithm
and the FMD algorithm for intra-prediction in HEVC were
implemented in the HEVC HM 16.6 reference software [14].
The non-modified HM 16.6 reference software was used as the anchor, with the same test sequences and encoding
parameters. The simulations were independently run for the
three-node configurations of the FPD algorithm, in order to
report the results by applying the algorithm in a scalable
way, starting with solely Node64, then Node64 + Node32, and finally Node64 + Node32 + Node16. FMD was evaluated based on the MDV and MDV-SW proposals. Then, all the combinations of FPD + MDV-SW (Node64 + MDV-SW, Node64 + Node32 + MDV-SW and Node64 + Node32 + Node16 + MDV-SW) were also activated, to show the simulation results for the FPMD algorithm, which combines both proposals.
Fig. 10 Fast partitioning and mode decision (FPMD) algorithm proposed (the CTU classifier, built from the decision-tree nodes, produces the Partition Map; each PU is then processed by MDV-SW, RMD over its Mode List, RDO over the three best candidate modes, and RQT, until the best {PU size, TU size, mode} is selected for the whole CTU)

5.1 Encoding parameters and metrics

The experiments were conducted under the "Common Test Conditions and Software Reference Configurations" (CTC) recommended by the JCT-VC [28] for the "All-Intra" mode configuration and the Main profile (AI-Main). That recommendation specifies the use of four QPs (QP22, QP27, QP32 and QP37) and a set of 22 test sequences classified into five classes, named from A to E, which cover a wide range of resolutions and frame rates. All the sequences use 4:2:0 chroma sub-sampling and a bit-depth of 8 bits.

The algorithm performance was evaluated in terms of Computational Complexity Reduction (CCR) and Rate Distortion (RD) performance, and both were compared to the HM results. For the CCR measure, the Time Saving metric was computed following (14):

$$\text{Time Saving } (\%) = \frac{\text{Enc.Time}(HM16.6) - \text{Enc.Time}(Prop)}{\text{Enc.Time}(HM16.6)} \times 100 \qquad (14)$$

Concerning the RD performance, the average Peak Signal to Noise Ratio (PSNR) metric was calculated for the luma (Y_PSNR) and chroma components (U_PSNR, V_PSNR). The YUV_PSNRs for the four QPs were used for the computation of the RD performance by means of the Bjøntegaard Delta-Rate metric (BD-rate) defined by the ITU [41] and recommended in the CTC [28].

In order to obtain the increase in BD-rate and the increase in Time Saving introduced by the fast mode decision algorithm when it is combined in the architecture of each node, the ΔBD-rate and ΔT.Saving metrics are also used, following (15) and (16), where N = 64, 64+32, 64+32+16:

$$\Delta BDrate = BDrate_{Node_N+MDV\text{-}SW} - BDrate_{Node_N} \qquad (15)$$

$$\Delta T.Saving = T.Saving_{Node_N+MDV\text{-}SW} - T.Saving_{Node_N} \qquad (16)$$

5.2 Simulation results
Table 4 shows the experimental results of the FPD algorithm compared with the HM 16.6 reference software. It can be observed that, for the Node64 implementation, all the sequences achieve a negligible 0.1% penalty in terms of BD-rate, with an average time saving of around 12%. The results for the Node64 + Node32 decision tree implementation report much better time savings, over 29%, with a bit rate increase lower than 1%. Finally, the overall algorithm (Node64 + Node32 + Node16) shows a computational reduction of over 53%, increasing the bit rate penalty to around 2.2%. It should be noted that we used only eight frames for training the decision trees, which is 0.081% of the total frames comprising the 22 simulated sequences (9780 frames). The overall experimental results confirm that the proposed FPD algorithm can reduce the
Table 4 Performance results of the fast partitioning decision (FPD) algorithm

| Classification | Sequence | Frames | N64 T. saving (%) | N64 BD-rate (%) | N64+N32 T. saving (%) | N64+N32 BD-rate (%) | N64+N32+N16 T. saving (%) | N64+N32+N16 BD-rate (%) |
|---|---|---|---|---|---|---|---|---|
| Class A (2560 × 1600) | Traffic | 150 | 11.60 | 0.0 | 29.39 | 0.9 | 56.61 | 1.6 |
| | PeopleOnStreet | 150 | 12.19 | 0.0 | 28.03 | 0.6 | 50.09 | 1.1 |
| Class B (1920 × 1080) | BasketballDrive | 500 | 14.36 | 0.2 | 36.87 | 2.4 | 59.19 | 3.1 |
| | BQTerrace | 600 | 14.15 | 0.1 | 30.96 | 0.8 | 52.57 | 1.3 |
| | Cactus | 500 | 13.18 | 0.1 | 29.77 | 1.0 | 56.95 | 2.1 |
| | Kimono | 240 | 15.43 | 0.1 | 33.90 | 5.1 | 62.05 | 5.3 |
| | ParkScene | 240 | 14.51 | 0.0 | 30.74 | 0.9 | 58.75 | 1.6 |
| Class C (832 × 480) | BasketballDrill | 500 | 15.40 | 0.0 | 27.63 | 0.5 | 53.50 | 2.3 |
| | BQMall | 600 | 10.27 | 0.0 | 26.88 | 0.8 | 50.76 | 2.5 |
| | PartyScene | 500 | 10.89 | 0.0 | 22.08 | 0.1 | 45.48 | 2.1 |
| | RaceHorses | 300 | 10.18 | 0.0 | 26.41 | 0.7 | 53.14 | 1.7 |
| Class D (416 × 240) | BasketballPass | 500 | 10.25 | 0.0 | 23.54 | 0.6 | 45.66 | 2.1 |
| | BQSquare | 600 | 5.98 | 0.0 | 21.11 | 0.3 | 35.61 | 1.1 |
| | BlowingBubbles | 500 | 8.82 | 0.0 | 18.58 | 0.0 | 40.55 | 1.7 |
| | RaceHorses | 300 | 6.80 | 0.0 | 19.75 | 0.4 | 43.20 | 1.8 |
| Class E (1280 × 720) | FourPeople | 600 | 8.13 | 0.0 | 33.88 | 0.8 | 56.64 | 2.0 |
| | Johnny | 600 | 12.48 | 0.3 | 45.54 | 3.4 | 63.00 | 4.4 |
| | KristenAndSara | 600 | 26.81 | 0.3 | 44.39 | 1.7 | 61.19 | 2.6 |
| Class A average | | | 11.89 | 0.00 | 28.71 | 0.75 | 53.35 | 1.35 |
| Class B average | | | 14.32 | 0.10 | 32.44 | 2.04 | 57.90 | 2.68 |
| Class C average | | | 10.40 | 0.00 | 25.75 | 0.53 | 50.72 | 2.15 |
| Class D average | | | 7.43 | 0.00 | 20.74 | 0.33 | 41.25 | 1.68 |
| Class E average | | | 18.22 | 0.20 | 41.27 | 1.97 | 60.28 | 3.00 |
| Average | | | 12.42 | 0.1 | 29.82 | 1.2 | 53.08 | 2.2 |
computational complexity of HEVC intra-picture prediction by over 53% with a slight bit rate increase, favouring real-time software and hardware implementations.
Table 5 shows the simulation results for the novel FMD algorithm, based on the MDV and MDV-SW proposals, compared with the HM 16.6 reference software. As can be observed, with the MDV-SW scheme the average time saving is slightly reduced to 29.7%; regarding the bit rate penalty, it should be noted that the average BD-rate drops to 0.4%.
Table 6 reports the simulation results for each possible combination of the final Unified Architecture based on the FPMD algorithm, in terms of BD-rate and Time Saving. As can be noted, the average speed-up is now improved from the range of 12–53%, obtained for the FPD algorithm without the MDV-SW approach (Table 4), to the range of 41–67%. This enhancement comes at the expense of a bit rate increase, where the BD-rate is practically doubled compared with the FPD algorithm, due to the error introduced by the FMD algorithm. An initial conclusion that can be drawn from this observation is that the penalty of the fast mode decision is not additive to the penalty of the fast partitioning decision; instead, the error due to a wrong mode decision is amplified when a wrong PU size classification is given.
The first node, N64 + MDV-SW, achieves an average time saving of around 40%, which is quite similar for classes A, B, C and D; only class E reaches a speed-up of over 45%. In terms of rate penalty, the results are also quite uniform, around 1%. Regarding the second node, N64 + N32 + MDV-SW, the time saving increases by around 15 percentage points with respect to the N64 + MDV-SW implementation, achieving a notable average complexity reduction of 55.7%. The average BD-rate penalty is nearly doubled, 2.3%, compared with the previous node. Finally, the results for the overall node implementation including the MDV-SW approach, N64 + N32 + N16 + MDV-SW, show a considerable complexity reduction of 67%. In terms of rate penalty, the sequences of class A obtain the best performance, with a 2.5% BD-rate, which is nearly half the average rate increase of 4.6%.
Table 5 Performance results of the fast mode decision (FMD) algorithm

| Classification | Sequence | Frames | MDV T. saving (%) | MDV BD-rate (%) | MDV-SW T. saving (%) | MDV-SW BD-rate (%) |
|---|---|---|---|---|---|---|
| Class A (2560 × 1600) | Traffic | 150 | 30.87 | 0.5 | 30.43 | 0.3 |
| | PeopleOnStreet | 150 | 30.34 | 0.9 | 29.84 | 0.4 |
| Class B (1920 × 1080) | BasketballDrive | 500 | 31.34 | 0.6 | 31.14 | 0.2 |
| | BQTerrace | 600 | 30.47 | 0.6 | 30.30 | 0.3 |
| | Cactus | 500 | 30.33 | 0.8 | 30.12 | 0.4 |
| | Kimono | 240 | 31.69 | 0.2 | 31.16 | 0.1 |
| | ParkScene | 240 | 30.98 | 0.2 | 30.53 | 0.1 |
| Class C (832 × 480) | BasketballDrill | 500 | 28.71 | 1.5 | 28.41 | 0.6 |
| | BQMall | 600 | 29.69 | 1.0 | 29.21 | 0.5 |
| | PartyScene | 500 | 29.16 | 1.0 | 28.64 | 0.7 |
| | RaceHorses | 300 | 30.34 | 0.9 | 29.65 | 0.3 |
| Class D (416 × 240) | BasketballPass | 500 | 30.39 | 1.3 | 29.11 | 0.6 |
| | BQSquare | 600 | 28.27 | 1.4 | 28.27 | 1.0 |
| | BlowingBubbles | 500 | 27.96 | 1.3 | 27.96 | 0.8 |
| | RaceHorses | 300 | 29.04 | 1.4 | 28.10 | 0.6 |
| Class E (1280 × 720) | FourPeople | 600 | 30.53 | 0.8 | 30.33 | 0.3 |
| | Johnny | 600 | 30.90 | 0.8 | 30.85 | 0.4 |
| | KristenAndSara | 600 | 30.60 | 0.9 | 30.31 | 0.4 |
| Class A average | | | 30.61 | 0.7 | 30.14 | 0.3 |
| Class B average | | | 30.96 | 0.5 | 30.65 | 0.2 |
| Class C average | | | 29.48 | 1.1 | 28.98 | 0.6 |
| Class D average | | | 28.92 | 1.3 | 28.36 | 0.7 |
| Class E average | | | 30.68 | 0.8 | 30.50 | 0.4 |
| Average | | | 30.10 | 0.9 | 29.70 | 0.4 |
Table 7 summarizes the results in terms of ΔBD-rate and ΔTime Saving for the three nodes. In terms of complexity reduction, the MDV-SW algorithm obviously provides the highest additional reduction for the first node, N64, of about 30%, which is practically the speed-up achieved when MDV-SW is applied alone.

However, for the full node implementation, namely N64 + N32 + N16, the additional speed-up due to MDV-SW is just around 15%, because the fast partitioning has already reduced the complexity by over 50%, so the 30% speed-up due to the fast mode decision only affects the remaining 50% of the computational burden. Therefore, the benefits of the fast mode decision are masked when a high complexity reduction is already achieved by the fast partitioning decision.

The behaviour of the rate penalty is quite different from that of the time saving. Unexpectedly, for the first two nodes, N64 + MDV-SW and N64 + N32 + MDV-SW, the rate increase due to the fast mode decision is practically the same, 1%. The BD-rate obtained for the MDV-SW stand-alone implementation, 0.4%, is practically doubled when it is computed jointly with the fast partitioning mode, for both nodes.
Nevertheless, in the overall implementation, namely N64 + N32 + N16 + MDV-SW, the BD-rate increase due to MDV-SW is multiplied by six compared with MDV-SW alone: an increase of 2.3% with respect to the rate penalty of the same node without the MDV-SW approach.
5.3 Comparison with other fast intra-prediction algorithms

In Sect. 3, several fast intra-prediction algorithms were described. In this subsection, a performance comparison between those proposals and the FPMD proposed in this paper is made. The simulation results are reported in Table 8, using the same JCT-VC test sequences, CTCs and performance metrics, in order to provide a fair comparison. The Sun et al. algorithm [16] can be considered the best-performing prior algorithm in the balance of time saving, 50%, and bit rate penalty, 2.3%. The Sun et al. proposal outperforms the FPMD (N64 + MDV-SW) implementation in terms of encoder time reduction, with 50% instead of 41.67%, but its bit rate penalty is also higher, by over 1.2 percentage points.
Table 6 Performance results of the fast partitioning and mode decision (FPMD) algorithm compared to HM 16.6

| Classification | Sequence | Frames | N64+MDV-SW T. saving (%) | N64+MDV-SW BD-rate (%) | N64+N32+MDV-SW T. saving (%) | N64+N32+MDV-SW BD-rate (%) | N64+N32+N16+MDV-SW T. saving (%) | N64+N32+N16+MDV-SW BD-rate (%) |
|---|---|---|---|---|---|---|---|---|
| Class A (2560 × 1600) | Traffic | 150 | 41.09 | 0.9 | 55.98 | 2.0 | 69.55 | 2.0 |
| | PeopleOnStreet | 150 | 41.09 | 0.9 | 55.98 | 1.6 | 66.88 | 3.0 |
| Class B (1920 × 1080) | BasketballDrive | 500 | 42.56 | 0.9 | 59.31 | 3.2 | 70.55 | 8.0 |
| | BQTerrace | 600 | 42.86 | 0.7 | 56.70 | 1.4 | 67.41 | 6.5 |
| | Cactus | 500 | 42.00 | 1.0 | 55.67 | 2.1 | 69.66 | 3.2 |
| | Kimono | 240 | 44.08 | 1.1 | 57.92 | 6.7 | 72.23 | 7.6 |
| | ParkScene | 240 | 43.20 | 0.8 | 56.82 | 1.8 | 70.64 | 2.0 |
| Class C (832 × 480) | BasketballDrill | 500 | 39.41 | 1.0 | 54.09 | 1.5 | 67.11 | 2.5 |
| | BQMall | 600 | 40.46 | 1.2 | 54.65 | 2.1 | 66.29 | 4.7 |
| | PartyScene | 500 | 40.43 | 1.3 | 52.47 | 1.4 | 63.68 | 3.7 |
| | RaceHorses | 300 | 39.60 | 0.7 | 53.74 | 1.5 | 67.15 | 5.0 |
| Class D (416 × 240) | BasketballPass | 500 | 37.97 | 1.2 | 50.61 | 1.9 | 62.35 | 3.7 |
| | BQSquare | 600 | 39.01 | 1.6 | 50.30 | 1.8 | 57.62 | 5.5 |
| | BlowingBubbles | 500 | 38.36 | 1.3 | 48.65 | 1.4 | 60.02 | 3.4 |
| | RaceHorses | 300 | 39.20 | 1.1 | 49.34 | 1.5 | 61.69 | 3.5 |
| Class E (1280 × 720) | FourPeople | 600 | 42.14 | 1.1 | 58.34 | 1.9 | 69.74 | 4.3 |
| | Johnny | 600 | 50.23 | 1.3 | 64.72 | 4.8 | 72.96 | 8.8 |
| | KristenAndSara | 600 | 43.73 | 1.4 | 63.63 | 2.8 | 71.74 | 4.6 |
| Class A average | | | 41.78 | 1.0 | 55.98 | 1.8 | 68.24 | 2.5 |
| Class B average | | | 42.94 | 0.9 | 57.30 | 3.0 | 70.14 | 5.5 |
| Class C average | | | 39.97 | 1.1 | 53.75 | 1.6 | 66.09 | 4.0 |
| Class D average | | | 38.64 | 1.3 | 49.74 | 1.7 | 60.46 | 4.0 |
| Class E average | | | 45.48 | 1.3 | 62.33 | 3.2 | 71.51 | 5.9 |
| Average | | | 41.67 | 1.1 | 55.71 | 2.3 | 67.34 | 4.6 |
Finally, the reported results prove that the proposed FPMD algorithm, in its overall implementation (N64 + N32 + N16 + MDV-SW), achieves the highest time saving for intra-prediction coding, over 67% compared with the HEVC reference model, and that it outperforms the best previous proposal in the balance between complexity reduction and bit rate penalty.
6 Conclusions

In this paper, we have presented a Unified Architecture to form a novel fast HEVC intra-prediction coding algorithm, denoted as Fast Partitioning and Mode Decision (FPMD), which combines a Fast Partitioning Decision (FPD) algorithm and a Fast Mode Decision (FMD) algorithm. The FPD algorithm comprises a three-node decision tree using low-complexity attributes, allowing an early CU classification in terms of optimal CTU partitioning and thereby reducing the number of PU sizes to be checked by the RMD and RDO stages. Moreover, the algorithm can be implemented in a scalable way by combining the first node, the first two nodes, or all three nodes, achieving different levels of coding performance. The FMD algorithm is based on texture orientation detection, analyzing the MDV along the digital co-lines. Instead of the 33 angular directions defined in HEVC, we have used twelve co-lines with rational slopes. The orientation with the lowest variance is selected as the dominant texture orientation, and a reduced number of directional candidate modes is selected to be further processed by the RDO stage. These approaches can be combined to form the Unified Architecture using any combination of nodes, obtaining a wide range of time savings, from about 42 to 67%, and light BD-rate penalties, from 1.1 to 4.6%, with respect to HM 16.6. Comparisons with similar state-of-the-art works show that the proposed architecture achieves the best trade-off between complexity reduction and rate distortion.
Table 7 Performance differences between the non-combined proposals and the combined proposals for each node

                            N64 vs.             N64+N32 vs.         N64+N32+N16 vs.
                            N64+MDV-SW          N64+N32+MDV-SW      N64+N32+N16+MDV-SW
Class / Sequence            ΔT.Sav(%)  ΔBD(%)   ΔT.Sav(%)  ΔBD(%)   ΔT.Sav(%)  ΔBD(%)

Class A (2560 × 1600)
  Traffic                   29.49      0.90     26.59      1.02     12.94      0.40
  PeopleOnStreet            30.27      0.99     27.95      1.04     16.79      1.88
Class B (1920 × 1080)
  BasketballDrive           28.20      0.72     22.44      0.79     11.35      4.93
  BQTerrace                 28.71      0.65     25.75      0.68     14.84      5.22
  Cactus                    28.82      0.91     25.89      1.03     12.71      1.07
  Kimono                    28.65      0.96     24.02      1.52     10.18      2.31
  ParkScene                 28.68      0.78     26.08      0.91     11.89      0.37
Class C (832 × 480)
  BasketballDrill           29.14      1.04     26.46      1.06     13.62      0.15
  BQMall                    29.57      1.19     27.77      1.27     15.52      2.26
  PartyScene                30.24      1.33     30.39      1.34     18.19      1.61
  RaceHorses                29.35      0.73     27.34      0.78     14.01      3.31
Class D (416 × 240)
  BasketballPass            31.98      1.19     27.08      1.24     16.69      1.59
  BQSquare                  30.19      1.56     29.21      1.57     22.01      4.42
  BlowingBubbles            31.55      1.32     30.07      1.35     19.46      1.67
  RaceHorses                31.07      1.00     29.59      1.11     18.49      1.67
Class E (1280 × 720)
  FourPeople                29.66      1.08     24.45      1.13     13.10      2.36
  Johnny                    23.42      1.03     19.18      1.39      9.96      4.39
  KristenAndSara            28.36      1.12     19.24      1.15     10.54      1.94
Average
  Class A                   29.88      0.95     27.26      1.03     14.78      1.14
  Class B                   28.61      0.81     24.80      0.99     12.12      2.78
  Class C                   29.58      1.07     27.96      1.11     15.26      1.83
  Class D                   31.20      1.28     28.97      1.32     19.09      2.34
  Class E                   27.02      1.08     20.83      1.23     11.14      2.90
  Average                   29.25      1.03     25.89      1.13     14.26      2.31
Table 8 Performance comparison between FPMD and the related works

Proposal                          Time saving (%)   BD-rate (%)
Sun [16]                          50                2.30
Huang [17]                        20                0.50
Cen [18]                          16                2.80
Tian [19]                         29                0.50
Khan [20]                         42                1.20
Jiang [21]                        20                0.74
Chen [22]                         37.6              1.65
Yan [23]                          23.5              1.30
Silva [24]                        20                0.90
Yao [25]                          36.2              1.86
Shen [26]                         21                1.70
Kim [27]                          30.5              1.20
FPMD (N64+MDV-SW)                 41.67             1.10
FPMD (N64+N32+MDV-SW)             55.71             2.30
FPMD (N64+N32+N16+MDV-SW)         67.30             4.60
Acknowledgements This work was supported by the MINECO and European Commission (FEDER funds) under the Projects TIN2012-38341-C04-04 and TIN2015-66972-C5-2-R.
References
1. High Efficiency Video Coding, Rec. ITU-T H.265 and ISO/IEC
23008-2 (2013)
2. Advanced Video Coding for Generic Audiovisual Services, Rec.
ITU-T H.264 and ISO/IEC 14496-10 (MPEG-4 AVC) (2012)
3. Ohm, J.-R., Sullivan, G.J., Schwarz, H., Tan, T.K., Wiegand, T.: Comparison of the coding efficiency of video coding standards—including high efficiency video coding (HEVC). IEEE Trans. Circuits Syst. Video Technol. 22(12), 1669–1684 (2012)
4. Prabhakar, B., Reddy, D.K.: Analysis of video coding standards
using PSNR and bit rate saving. In: International Conference on
Signal Processing and Communication Engineering Systems
(SPACES), pp. 306–308 (2015)
5. Nguyen, T., Marpe, D.: Performance analysis of HEVC-based
intra coding for still image compression. In: Picture Coding
Symposium (PCS), pp. 233–236 (2012)
6. Kim, I.-K., Min, J., Lee, T., Han, W.-J., Park, J.: Block partitioning structure in the HEVC standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1697–1706 (2012)
7. Min, J., Lee, S., Kim, I., Han, W.-J., Lainema, J., Ugur, K.:
Unification of the directional intra prediction methods in TMuC.
In: JCTVC-B100, Geneva, Switzerland (2010)
8. Sullivan, G.J., Wiegand, T.: Rate-distortion optimization for video
compression. IEEE Signal Process. Mag. 15(6), 74–90 (1998)
9. Bossen, F., Bross, B., Sühring, K., Flynn, D.: HEVC complexity
and implementation analysis. IEEE Trans. Circuits Syst. Video
Technol. 22, 1685–1696 (2012)
10. Correa, G., Assuncao, P., Agostini, L., da Silva Cruz, L.A.: Complexity control of high efficiency video encoders for power-constrained devices. IEEE Trans. Consum. Electron. 57(4), 1866–1874 (2011)
11. Khan, M., Shafique, M., Grellert, M., Henkel, J.: Hardware-software collaborative complexity reduction scheme for the emerging HEVC intra encoder. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 125–128, 18–22 March 2013
12. Sullivan, G.J., Ohm, J.-R., Han, W.-J., Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
13. Lainema, J., Bossen, F., Han, W.-J., Min, J., Ugur, K.: Intra
coding of the HEVC standard. IEEE Trans. Circuits Syst. Video
Technol. 22(12), 1792–1801 (2012)
14. Joint Collaborative Team on Video Coding Reference Software,
ver. HM 16.6. https://hevc.hhi.fraunhofer.de/
15. Piao, Y., Min, J.H., Chen, J.: Encoder improvement of unified intra prediction. In: JCTVC-C207, JCT-VC of ISO/IEC and ITU-T, Guangzhou, China (2010)
16. Sun, H., Zhou, D., Goto, S.: A low-complexity HEVC intra
prediction algorithm based on level and mode filtering. In: IEEE
International Conference on Multimedia and Expo (ICME),
pp. 1085–1090
17. Huang, H., Zhao, Y., Lin, C., Bai, H.: Fast bottom-up pruning for
HEVC intraframe coding. In: Visual Communications and Image
Processing (VCIP), pp. 1–5 (2013)
18. Cen, Y., Wang, W., Yao, X.: A fast CU depth decision mechanism for HEVC. Inf. Process. Lett. 115(9), 719–724 (2015)
19. Tian, G., Goto, S.: Content adaptive prediction unit size decision
algorithm for HEVC intra coding. In: Picture Coding Symposium
(PCS), pp. 405–408 (2012)
20. Khan, M., Shafique, M., Henkel, J.: An adaptive complexity
reduction scheme with fast prediction unit decision for HEVC
intra encoding. In: IEEE International Conference on Image
Processing (ICIP), pp. 1578–1582 (2013)
21. Jiang, W., Hanjie, M., Chen, Y.: Gradient based fast mode
decision algorithm for intra prediction in HEVC. In: International
Conference on Consumer Electronics, Communications and
Networks (CECNet), pp. 1836–1840 (2012)
22. Chen, G., Liu, Z., Ikenaga, T., Dongsheng, W.: Fast HEVC intra
mode decision using matching edge detector and kernel density
estimation alike histogram generation. In: IEEE International
Symposium on Circuits and Systems (ISCAS), pp. 53–56 (2013)
23. Yan, S., Hong, L., He, W., Wang, Q.: Group-based fast mode
decision algorithm for intra prediction in HEVC. In: International
Conference on Signal Image Technology and Internet Based
Systems (SITIS), pp. 225–229 (2012)
24. da Silva, T.L., Agostini, L.V., da Silva Cruz, L.A.: Fast HEVC intra prediction mode decision based on edge direction information. In: European Signal Processing Conference (EUSIPCO), pp. 1214–1218 (2012)
25. Yao, Y., Xiaojuan, L., Yu, L.: Fast intra mode decision algorithm
for HEVC based on dominant edge assent distribution. Multimed.
Tools Appl. J. 75, 1–19 (2014)
26. Shen, L., Zhang, Z., An, P.: Fast CU size decision and mode
decision algorithm for HEVC intra coding. IEEE Trans. Consum.
Electron. 59(1), 207–213 (2013)
27. Kim, Y., Jun, D., Jung, S., Soo Choi, J., Kim, J.: A fast intra-prediction method in HEVC using rate-distortion estimation based on Hadamard transform. ETRI J. 35(2), 270–280 (2013)
28. Bossen, F.: Common test conditions and software reference configurations, document JCTVC-L1100, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC), 12th Meeting: Geneva, CH, 14–23 Jan 2013
29. ITU-T Recommendation P.910: Subjective Video Quality
Assessment Methods for Multimedia Applications. International
Telecommunication Union, Geneva (1999)
30. Pratt, W.K.: Digital Image Processing: PIKS Inside, 3rd edn.
Wiley, New York (2001)
31. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
32. Chan, T.F., Golub, G.H., LeVeque, R.J.: Updating formulae and a pairwise algorithm for computing sample variances. Technical Report. Stanford University, Stanford, CA, USA (1979)
33. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan
Kaufmann, San Francisco (1993)
34. Chen, L., Lin, J.: A study on review manipulation classification
using decision tree. In: International Conference on Service Systems and Service Management (ICSSSM), pp. 680–685 (2013)
35. Fernández-Escribano, G., Kalva, H., Cuenca, P., Orozco-Barbosa,
L., Garrido, A.: A fast MB mode decision algorithm for MPEG-2
to H.264 P-frame transcoding. IEEE Trans. Circuits Syst. Video
Technol. 18(2), 172–185 (2008)
36. Hulse, J.V., Khoshgoftaar, T.M., Napolitano, A.: Experimental
perspectives on learning from imbalanced data. In: Proceedings
of the 24th International Conference on Machine Learning,
pp. 935–942 (2007)
37. Gupta, S., Mazumdar, S.G.: Sobel edge detection algorithm. Int.
J. Comput. Sci. Manag. Res. 2(2), 1578–1583 (2013)
38. Jayachandra, D., Makur, A.: Directional variance: a measure to
find the directionality in a given image segment. In: IEEE
International Symposium on Circuits and Systems (ISCAS),
pp. 1551–1554 (2010)
39. Lei, Z., Makur, A.: Enumeration of downsampling lattices in two-dimensional multirate systems. IEEE Trans. Signal Process. 56(1), 414–418 (2008)
40. Velisavljevic, V., Beferull-Lozano, B., Vetterli, M., Dragotti,
P.L.: Directionlets: anisotropic multidirectional representation
with separable filtering. IEEE Trans. Image Process. 15(7),
1916–1933 (2006)
41. Bjøntegaard, G.: Calculation of average PSNR differences
between RD-curves. ITU-T SG16 Q.6 Document, VCEG-M33,
Austin, US (2001)
Damian Ruiz received his B.S. and M.S. degrees in Electrical Engineering from the Universidad Politécnica de Madrid (UPM), Spain, and the Ph.D. degree from the University of Castilla-La Mancha (UCLM), Albacete, Spain, in 2000 and 2016, respectively. In 2012, he joined the Mobile Communication Group (MCG) at the Polytechnic University of Valencia (UPV), Valencia, Spain. In 2017, he joined the Department of Signal and Communications Theory at the King Juan Carlos University, Madrid, Spain, where he is currently Associate Ph.D. Professor. His research interests include image and video coding, machine learning and perceptual video quality. He has over 25 publications in these areas in international refereed journals and conference proceedings. He has also been a visiting researcher at the Florida Atlantic University, Boca Raton (USA).
Gerardo Fernández-Escribano received the M.Sc. degree in Computer Engineering and the Ph.D. degree from the University of Castilla-La Mancha (UCLM), Albacete, Spain, in 2003 and 2007, respectively. In 2008, he joined the Department of Computer Systems at the UCLM, where he is currently an Associate Ph.D. Professor at the School of Industrial Engineering. His research interests include multimedia standards, video transcoding, video compression, video transmission and machine learning mechanisms. He has also been a visiting researcher at the Florida Atlantic University, Boca Raton (USA), and at the Friedrich Alexander Universität, Erlangen-Nuremberg (Germany).
José Luis Martínez (M'07) received his M.S. and Ph.D. degrees in Computer Science and Engineering from the University of Castilla-La Mancha, Albacete, Spain, in 2007 and 2009, respectively. In 2005, he joined the Department of Computer Engineering at the University of Castilla-La Mancha, where he was a researcher with the Computer Architecture and Technology group at the Albacete Research Institute of Informatics (I3A). In 2010, he joined the Department of Computer Architecture at the Complutense University in Madrid, where he was an assistant lecturer. In 2011, he rejoined the Department of Informatics Systems of the University of Castilla-La Mancha, where he is currently an assistant lecturer. His research interests include video coding, video standards, video transcoding and parallel video processing. He has also been a visiting researcher at the Florida Atlantic University, Boca Raton (USA), and at the Centre for Communication Systems Research (CCSR) at the University of Surrey, Guildford (UK). He has over 70 publications in these areas in international refereed journals and conference proceedings.
Pedro Cuenca received his M.Sc. degree in Physics (Electronics and Computer Science, extraordinary award) from the University of Valencia in 1994. He received his Ph.D. degree in Computer Engineering in 1999 from the Polytechnic University of Valencia. In 1995, he joined the Department of Computer Engineering at the University of Castilla-La Mancha (UCLM), where he is currently a Full Professor of Computer Architecture and Dean of the Faculty of Computer Engineering. His research topics are centred in the area of video compression, QoS video transmission and video applications for multicore and GPU architectures. He has published over 100 papers in international journals and conferences. He has also been a visiting researcher at Nottingham Trent University, the University of Ottawa and the University of Surrey. He has served in the organization of international conferences as Chair and Technical Program Chair. He was the Chair of the IFIP 6.8 Working Group during the 2006–2012 period.