Software description and configuration
Christine Guillemot, Laurent Guillo, Marco Cagnazzo, Giuseppe Valenzise,
Béatrice Pesquet-Popescu
To cite this version:
Christine Guillemot, Laurent Guillo, Marco Cagnazzo, Giuseppe Valenzise, Béatrice Pesquet-Popescu.
Software description and configuration. 2013, pp.11. hal-00935612
HAL Id: hal-00935612
https://hal.archives-ouvertes.fr/hal-00935612
Submitted on 23 Jan 2014
PERSEE Project
Schémas Perceptuels et Codage vidéo 2D et 3D
ANR-09-BLAN-0170
Deliverable D5.3
26/09/2013
Software description and configuration
Christine GUILLEMOT, IRISA
Laurent GUILLO, IRISA
Marco CAGNAZZO, LTCI
Giuseppe VALENZISE, LTCI
Béatrice PESQUET-POPESCU, LTCI
Contents
1 DCR
1.1 Overview
1.2 Matlab functions
1.3 Modified encoder lencode.exe
2 WTM
2.1 Quick overview
2.2 Some implementation details
2.2.1 WTM activation
2.2.2 Template shape signalling
2.2.3 Number and dimensions of search windows
2.3 Configuration
2.4 Example of use
References
Software final, D5.3
Introduction
This document gathers configuration information and instructions for use of two software tools developed during the Persee project: DCR and WTM.
1 DCR
The Don't Care Region (DCR) approach is based on the idea that in multiple-view-plus-depth video, depth maps are not directly viewed, but are only used to provide geometric information for view synthesis at the decoder. Thus, as long as the resulting geometric error does not lead to unacceptable quality for the synthesized view, each depth pixel only needs to be reconstructed coarsely at the decoder, within a tolerable range. We first formalize the notion of tolerable range per depth pixel as a Don't Care Region (DCR), by studying the sensitivity of the synthesized view distortion to the pixel value: a sensitive depth pixel will have a narrow DCR, and vice versa.
1.1 Overview
We now define per-pixel DCRs for depth map $D_n$, assuming the target synthesized view is $n$. In the following, we will rather refer to the disparity field $d_n$, which can be obtained from the depth once the camera parameters are known. A pixel $v_n(i,j)$ in texture map $v_n$, with associated disparity value $d_n(i,j)$, can be mapped to a corresponding pixel in view $n+1$ through a view synthesis function $s(i,j;d_n(i,j))$. In the simplest case, where the views are captured by purely horizontally shifted cameras, $s(i,j;d_n(i,j))$ corresponds to a pixel in texture map $v_{n+1}$ of view $n+1$ displaced in the x-direction by an amount proportional to $d_n(i,j)$. The view synthesis error, $\varepsilon(i,j;d)$, can thus be defined as the absolute error between reconstructed and original pixel value, given disparity $d$ for pixel $(i,j)$; i.e., $\varepsilon(i,j;d) = |s(i,j;d) - v_n(i,j)|$. If $d_n$ is compressed, the reconstructed disparity value $\tilde{d}_n(i,j)$ employed for view synthesis may differ from $d_n(i,j)$ by an amount $e(i,j) = \tilde{d}_n(i,j) - d_n(i,j)$, resulting in a (generally larger) view synthesis error $\varepsilon(i,j;d_n(i,j)+e(i,j)) > \varepsilon(i,j;d_n(i,j))$. We define the Don't Care Region $DCR(i,j) = [DCR_{low}(i,j), DCR_{up}(i,j)]$ as the largest contiguous interval of disparity values containing the ground-truth disparity $d_n(i,j)$, such that the view synthesis error for any point of the interval is smaller than $\varepsilon(i,j;d_n(i,j)) + \tau$, for a given threshold $\tau > 0$. Note that DCR intervals are defined per pixel, thus giving precise information about how much error can be tolerated in the disparity maps.
The DCR information can then be used to perform a more effective motion estimation, to encode the prediction residual, and to enhance the use of the SKIP mode. We have implemented in Matlab a function that can generate the DCRs of an MVD sequence and save them into a binary file, which can in turn be used by a modified H.264/MPEG-4 AVC encoder (JM reference software v. 18.0).
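The DCR definition can be illustrated for a single pixel with a short Python sketch. It is not the delivered Matlab code: the function name dcr_interval is hypothetical, and the sketch assumes that candidate disparities lie on an integer grid and that the synthesis error has already been evaluated for every candidate.

```python
def dcr_interval(errors, d_true, tau):
    """Return (DCR_low, DCR_up) for one pixel.

    errors[d] is the view synthesis error eps(i, j; d) for candidate
    disparity d (assumption: integer disparities 0..len(errors)-1).
    d_true is the ground-truth disparity, tau > 0 the tolerance.
    """
    bound = errors[d_true] + tau
    # Grow the interval to the left while the error stays below the bound.
    low = d_true
    while low > 0 and errors[low - 1] < bound:
        low -= 1
    # Grow the interval to the right in the same way.
    up = d_true
    while up < len(errors) - 1 and errors[up + 1] < bound:
        up += 1
    return low, up


errors = [9, 3, 1, 0, 2, 4, 8, 1]
print(dcr_interval(errors, d_true=3, tau=5))  # (1, 5)
```

In this example the last candidate (error 1) is below the bound but is excluded, because the interval must be contiguous around the ground-truth disparity.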
1.2 Matlab functions
[DCR_low DCR_up] = generate_DCR(isLeft, thres, depth_scale)
Inputs
isLeft: binary flag indicating whether the DCR is to be computed for the left or for the right view. E.g., if you have views 3 and 5 in Kendo, and you have to synthesize view 4, you shall use isLeft = 1 to generate the DCR needed to encode the disparity of view 3, and isLeft = 0 to generate the DCR relative to the disparity of view 5. Note that the DCR is defined as the worst-case error when you have texture and depth from one same view (e.g. texture and depth from view 3) and you want to generate the other view (view 5).
thres: the τ in the report (and in our PCS paper), i.e. the highest tolerated threshold used to define the DCR. We advise using τ = 5 or similar values.
depth_scale: the scaling factor from depth to disparity. It must be known a priori. It depends on the sequence, which should in any case be rectified. For Kendo, the value to use is 0.204.
Outputs
DCR_low and DCR_up, respectively the lower and upper bound of the DCR interval per pixel. In order to be used by the encoder, they must be converted into a binary file by using write_DCR_bin.m.
Note that this function generates a DCR for a single image, so you need to call it image-by-image for an entire sequence. The inputs of the script are the texture and the depth of the right and left views, which should be passed as png files with names (left_depth.png, left_texture.png, etc.)
write_DCR_bin.m
This script writes a matrix (M × N × F, where M × N is the spatial resolution and F the number of frames) into a file that can be used by the encoder. The DCR must be stored into variables named DCR_low and DCR_up; they will be written into two files named DCR_low.bin and DCR_up.bin. Please note that generate_DCR only generates one 2D matrix. It is up to the user to generate as many matrices as needed and to stack them into a 3D matrix.
1.3 Modified encoder lencode.exe
This is the modified version of H.264 (JM v.18.0) that uses the files DCR_low.bin and DCR_up.bin produced by the write_DCR_bin.m function in order to perform the DCR-based encoding described in deliverable D4.3 and in the PCS paper. It works in baseline mode, with some restrictions, as one can see from the encoder_baseline.cfg file. In particular, the RDO should be set to high complexity and the motion estimation to full search.
2 WTM
WTM is an intra prediction method based on a linear combination of template matching predictors. The method was previously described in [1]. After a quick reminder, the following sections present how some details peculiar to this method were implemented and how to configure WTM. An example of use is then given.
2.1 Quick overview
WTM aims at providing an intra prediction for blocks of 4x4, 8x8, 16x16 and 32x32 sizes. This prediction is based on a linear combination of template matching predictors belonging to the causal neighbourhood.
Figure 1: Search regions from causal neighbourhood.
Then, $N$ blocks $B_i$ surrounded by the best matching areas are used to compute predictors $P_i$, which are then averaged to get the prediction $P$ of the block $B$:
$$P = \frac{1}{N} \sum_{i=1}^{N} P_i \qquad (1)$$
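Equation (1) amounts to a pixel-wise mean of the N predictors. A minimal Python sketch follows; the function name average_predictors and the representation of a block as a list of rows are assumptions made for illustration, not part of the WTM software.

```python
def average_predictors(predictors):
    """Pixel-wise mean of N equally sized blocks (each a list of rows)."""
    n = len(predictors)
    rows = len(predictors[0])
    cols = len(predictors[0][0])
    return [
        [sum(p[r][c] for p in predictors) / n for c in range(cols)]
        for r in range(rows)
    ]


p1 = [[10, 20], [30, 40]]
p2 = [[14, 22], [30, 44]]
print(average_predictors([p1, p2]))  # [[12.0, 21.0], [30.0, 42.0]]
```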
WTM relies on this general approach but there are three main enhancements:
• it uses 4 different template shapes whatever the block size: the traditional L-shape, which is 1 pixel wide, and three other shapes in which the left part, the top part or both can be 4 pixels wide. However, only one template shape is used to determine all template predictors.
• the correlation factor is based on the dot product between the template and the template predictors.
• template predictors are not searched within the whole causal neighbourhood but within only two or three search windows. The number of search windows is related to the rank of the block to be predicted within the prediction unit (PU), and their size depends on the size of the block to be predicted.
For more details about these three characteristics see [1]. The following sections give information about how they have been used and implemented.
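The dot-product criterion can be sketched in Python as follows. This is only an illustration under stated assumptions: templates are given as flattened pixel lists, the dot product is normalised by the vector norms (the text only says the correlation factor is based on the dot product), and the names correlation and best_candidates are hypothetical.

```python
import math


def correlation(template, candidate):
    """Normalised dot product between two flattened templates (assumed form)."""
    dot = sum(t * c for t, c in zip(template, candidate))
    norm = math.sqrt(sum(t * t for t in template)) * math.sqrt(
        sum(c * c for c in candidate)
    )
    return dot / norm if norm > 0 else 0.0


def best_candidates(template, candidates, n):
    """Indices of the n candidates most correlated with the template."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda k: correlation(template, candidates[k]),
        reverse=True,
    )
    return ranked[:n]


template = [1, 2, 3, 4]
candidates = [[4, 3, 2, 1], [2, 4, 6, 8], [1, 2, 3, 5], [0, 0, 0, 0]]
print(best_candidates(template, candidates, 2))  # [1, 2]
```

The blocks attached to the selected candidates would then serve as the predictors that are averaged into the final prediction.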
Figure 2: Shape of templates.
Figure 3: Search window positions relative to block B.
2.2 Some implementation details
The distinctive features listed in the previous section lead to the following implementation choices.
2.2.1 WTM activation
WTM is not activated for all PU sizes. Its activation depends on the class of the video, within the corpus provided by JCT-VC, and on the PU size.
The cases for which WTM is activated are listed in Table 1.
Table 1: Activation of WTM according to video classes and PU sizes
HE
LC
4x4 8x8 16x16 32x32 64x64 4x4 8x8 16x16 32x32 64x64
Class A
X
X
X
X
X
Class B X
X
X
X
X
X
X
Class C X
X
X
X
X
Class D X
X
X
X
X
Class E X
X
X
X
X
X
X
Class F X
X
X
X
X
X
X
Table 2: Relation between intra mode and shape of template
INTRA mode   Shape
10           UDL
11           U
12           L
13           UL
2.2.2 Template shape signalling
Four template shapes are available. Consequently, two pieces of information must be signalled to the decoder: whether a block is predicted with WTM, and which template shape was used. To do so, four direction modes have been overloaded, from mode 10 up to mode 13. Each is associated with a shape (cf. Fig. 2) as listed in Table 2. An extra bit is added for all four of these modes and set to true if WTM is used, as described in Fig. 4.
Figure 4: Signalling taking into account WTM.
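The mapping of Table 2 and the role of the extra bit can be summarised with a small Python sketch. It assumes that a false extra bit means the regular directional mode is used; the names MODE_TO_SHAPE and interpret_intra_mode are hypothetical and do not come from the WTM source code.

```python
# Table 2: overloaded intra direction modes and their template shapes.
MODE_TO_SHAPE = {10: "UDL", 11: "U", 12: "L", 13: "UL"}


def interpret_intra_mode(mode, wtm_flag=False):
    """Return ("WTM", shape) or ("DIRECTIONAL", mode).

    The extra bit (wtm_flag) is only meaningful for modes 10 to 13.
    """
    if mode in MODE_TO_SHAPE and wtm_flag:
        return ("WTM", MODE_TO_SHAPE[mode])
    return ("DIRECTIONAL", mode)


print(interpret_intra_mode(12, wtm_flag=True))   # ('WTM', 'L')
print(interpret_intra_mode(12, wtm_flag=False))  # ('DIRECTIONAL', 12)
```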
Table 3: Search area characteristics
Block B size   Number of search windows   Search window width   Search window height
4x4            3                          12                    4
8x8            2                          20                    8
16x16          2                          8                     16
32x32          2                          4                     32
2.2.3 Number and dimensions of search windows
The number of search windows and their size depend on the size of the block to be predicted.
The characteristics of the search areas are summarized in Table 3.
2.3 Configuration
The WTM algorithm is written on top of the test model of HEVC, release 4.0. So, the configuration files dedicated to WTM are based on the HTM-4.0 all intra encoding configuration file.
A section is added to specify the parameters related to WTM. In particular, this section indicates whether:
• WTM is activated or not
• WTM prediction is activated for 4x4, 8x8, 16x16 and 32x32 blocks
An optional parameter, STMObserver, can be set to generate statistics or prediction maps.
The following excerpt of a configuration file lists the parameters related to WTM, which carry the prefix STM.
#============ WTM ================
STM         : 1 # 0: unused, 1: activated
STM4x4      : 1 # Prediction activated for 4x4 block size
STM8x8      : 1 # Prediction activated for 8x8 block size
STM16x16    : 1 # Prediction activated for 16x16 block size
STM32x32    : 1 # Prediction activated for 32x32 block size
STMObserver : 3 # 0: unused, 1: % of selection, 2: stats files, 3: output frames
The other sections of the configuration file are kept unchanged.
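To check which WTM options a configuration file enables, the "Key : value # comment" layout can be read with a few lines of Python. This sketch is independent of the encoder's own configuration parser, which is not described here; the helper names parse_config and enabled_block_sizes are hypothetical.

```python
def parse_config(text):
    """Parse 'Key : value # comment' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        params[key.strip()] = value.strip()
    return params


def enabled_block_sizes(params):
    """Block sizes for which WTM prediction is switched on."""
    if params.get("STM") != "1":
        return []
    sizes = ("4x4", "8x8", "16x16", "32x32")
    return [s for s in sizes if params.get("STM" + s) == "1"]


EXCERPT = """\
#============ WTM ================
STM         : 1 # 0: unused, 1: activated
STM4x4      : 1 # Prediction activated for 4x4 block size
STM8x8      : 1
STM16x16    : 0
STM32x32    : 1
STMObserver : 3 # 0: unused, 1: % of selection, 2: stats files, 3: output frames
"""

params = parse_config(EXCERPT)
print(enabled_block_sizes(params))  # ['4x4', '8x8', '32x32']
print(params["STMObserver"])        # 3
```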
2.4 Example of use
To build the software, refer to the "how-to" provided in deliverable 3.4 and uploaded to the website of the Persee project (http://persee.irccyn.ec-nantes.fr/prive/).
Once built, the encoding is launched with the following command:
TAppEncoder -c tests.cfg
where "tests.cfg" is the configuration file.
To decode the encoded video, just enter the following command:
TAppDecoder -b str.bin -o dec.yuv
where "str.bin" is the encoded video (the name was specified in the encoding configuration file) and "dec.yuv" is the name of the decoded video.
References
[1] Persee, 2D coding tools final report, ANR-09-BLAN-0170, Deliverable D 3.4, July 2013.