ABSTRACT
VP8 is an open source video compression format supported by a
consortium of technology companies. This paper provides a
technical overview of the format, with an emphasis on its unique
features. The paper also discusses how these features benefit VP8
in achieving high compression efficiency and low decoding
complexity at the same time.
Index Terms: VP8, WebM, Video Codec, Web Video
1. INTRODUCTION
In May 2010, Google announced the start of a new open media
project, WebM, which is dedicated to developing a high-quality,
open media format for the web that is freely available to everyone.
At the core of the project is a new open source video compression
format, VP8. The VP8 format was originally developed by a small
research team at On2 Technologies, Inc. as a successor to its VPx
family of video codecs. Compared to other video coding formats,
VP8 has many distinctive technical features that help it to achieve
high compression efficiency and low computational complexity for
decoding at the same time. Since the WebM announcement, not
only has VP8 gained strong support from a long list of major
industry players, but it has also started to attract broad interest in
the video coding research community from both industry and
academia.
This paper aims to provide a technical overview of the VP8
compression format, with an emphasis on VP8's unique features.
Section 2 briefly reviews VP8's design assumptions and overall
architecture; sections 3 to 7 describe VP8's key technical
features: transform and quantization scheme, reference frame
types, prediction techniques, adaptive loop filtering, entropy
coding and parallel-processing-friendly data partitioning; section 8
provides a short summary with experimental results and some
thoughts on future work.
2. DESIGN ASSUMPTIONS AND FEATURE HIGHLIGHTS
From the very beginning of VP8's development, the developers
were focused on Internet/web-based video applications. This focus
has led to a number of basic assumptions in VP8's overall design:
Low bandwidth requirement: One of the basic design
assumptions is that for the foreseeable future, available network
bandwidth will be limited. With this assumption, VP8 was
specifically designed to operate mainly in a quality range from
"watchable video" (~30 dB in the PSNR metric) to "visually
lossless" (~45 dB).
Heterogeneous client hardware: There is a broad spectrum of
client hardware connected to the web, ranging from low power
mobile and embedded devices to the most advanced desktop
computers with many processor cores. It must, therefore, be
Where X and Y are the 4x4 size input and output and H is defined
as:
Fig. 3. Illustration of VP8 inter prediction mode SPLITMV, panels (a) and (b)
In Fig. 3 (a), "New" represents a 4x4 block coded with a new motion
vector, and "Left" and "Above" represent a 4x4 block coded using the
motion vector from the left and above, respectively. This example
effectively partitions the 16x16 macroblock into three different
segments with three different motion vectors (represented by 1, 2
and 3), as seen in Fig. 3 (b).
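
The way "New", "Left" and "Above" labels resolve into per-block motion vectors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 4x4 label grid and the motion vector values are hypothetical and do not reproduce the actual contents of Fig. 3. The handling of blocks on the left and top edges (taking vectors from the neighbouring macroblocks) is likewise an assumption of this sketch.

```python
def resolve_split_mvs(modes, new_mvs, left_mvs, above_mvs):
    """Resolve a 4x4 grid of SPLITMV labels into per-block motion vectors.

    modes     : 4x4 list of "New" | "Left" | "Above", visited in raster order
    new_mvs   : motion vectors consumed, in order, by the "New" blocks
    left_mvs  : 4 vectors from the macroblock to the left (assumed edge rule)
    above_mvs : 4 vectors from the macroblock above (assumed edge rule)
    """
    new_iter = iter(new_mvs)
    mvs = [[None] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            mode = modes[r][c]
            if mode == "New":
                mvs[r][c] = next(new_iter)
            elif mode == "Left":
                mvs[r][c] = mvs[r][c - 1] if c > 0 else left_mvs[r]
            elif mode == "Above":
                mvs[r][c] = mvs[r - 1][c] if r > 0 else above_mvs[c]
            else:
                raise ValueError(f"unknown sub-block mode: {mode}")
    return mvs


# Hypothetical layout: three "New" blocks seed three segments.
modes = [
    ["New",   "Left",  "New",   "Left"],
    ["Above", "Above", "Above", "Above"],
    ["New",   "Left",  "Left",  "Left"],
    ["Above", "Above", "Above", "Above"],
]
zero = [(0, 0)] * 4
mvs = resolve_split_mvs(modes, [(1, 0), (0, 2), (-3, 1)], zero, zero)
segments = {mv for row in mvs for mv in row}
```

With this layout only three motion vectors are signalled, yet all sixteen 4x4 blocks receive one, which is how the labels partition the macroblock into three segments.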
5.3 Sub-pixel Interpolation
VP8's motion compensation uses quarter-pixel accurate motion
vectors for luma pixels. The sub-pixel interpolation of VP8
features a single-stage interpolation process and a set of high
performance six-tap interpolation filters. The filter taps used for
the six-tap filters are:
[3, -16, 77, 77, -16, 3]/128 for 1/2 pixel positions
[2, -11, 108, 36, -8, 1]/128 for 1/4 pixel positions
[1, -8, 36, 108, -11, 2]/128 for 3/4 pixel positions
Chroma motion vectors in VP8 are calculated from their luma
counterparts by averaging motion vectors within a macroblock, and
have up to one-eighth pixel accuracy. VP8 uses four-tap bicubic
filters for the 1/8, 3/8, 5/8 and 7/8 pixel positions. Overall, the VP8
interpolation filtering process achieves optimal frequency response
with high computational efficiency.
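
As a rough illustration of how the six-tap filters above are applied, the following Python sketch interpolates one sample in a single row of 8-bit luma pixels. The tap values come from the text. The function name, the assignment of taps to the pixels x-2 to x+3, the rounding (add 64, then shift right by 7) and the clamping of indices at the row borders are assumptions of this sketch, not a description of any particular decoder.

```python
# Six-tap filters from the text, keyed by quarter-pixel fraction.
TAPS = {
    1: (2, -11, 108, 36, -8, 1),   # 1/4 pixel position
    2: (3, -16, 77, 77, -16, 3),   # 1/2 pixel position
    3: (1, -8, 36, 108, -11, 2),   # 3/4 pixel position
}


def interpolate(row, x, frac):
    """Return the sample at horizontal position x + frac/4.

    row  : list of 8-bit pixel values
    x    : integer pixel index
    frac : quarter-pixel fraction, 0..3
    """
    if frac == 0:
        return row[x]  # full-pixel position, no filtering needed
    acc = 0
    for k, tap in enumerate(TAPS[frac]):
        # Taps cover pixels x-2 .. x+3; indices are clamped at the borders
        # (an assumption made for this sketch).
        idx = min(max(x - 2 + k, 0), len(row) - 1)
        acc += tap * row[idx]
    value = (acc + 64) >> 7          # round, then divide by 128 (assumed)
    return min(max(value, 0), 255)   # clamp to the 8-bit range


ramp = [0, 10, 20, 30, 40, 50, 60, 70]
half = interpolate(ramp, 3, 2)      # between 30 and 40
quarter = interpolate(ramp, 3, 1)   # closer to 30
```

Each filter's taps sum to 128, so a flat region passes through unchanged. On the ramp above, the half-pixel sample comes out as 35, the midpoint of its two neighbours.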
6. ADAPTIVE LOOP FILTERING
Loop filtering is a process of removing blocking artifacts
introduced by quantization of the DCT coefficients from block
transforms. VP8 brings several loop-filtering innovations that
speed up decoding by not applying any loop filter at all in some
situations. VP8 also supports a method of implicit segmentation
where different loop filter strengths can be applied for different
parts of the image, according to the prediction modes or reference
frames used to encode each macroblock. For example, it would be
possible to apply stronger filtering to intra-coded blocks and at the
same time specify that inter-coded blocks that use the Golden
Frame as a reference and are coded using a (0,0) motion vector
should use a weaker filter. The choice of loop filter strengths in a
variety of situations is fully adjustable on a frame-by-frame basis,
so the encoder can adapt the filtering strategy in order to get the
best possible results. In addition, similar to the region-based
adaptive quantization in section 3, VP8 supports the adjustment of
loop filter strength for each segment. Fig. 4 shows an example
where the encoder can adapt the filtering strength based on content.
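
One way to picture the strength selection described in this section is as a frame-level base strength adjusted by offsets for the reference frame, the prediction mode and the segment. The Python sketch below follows that idea. All names, the offset values and the 0 to 63 strength range are assumptions chosen for illustration, and treating a strength of zero as "skip the filter" is this sketch's reading of the text's remark that the loop filter is not applied in some situations.

```python
# Hypothetical per-frame offsets; an encoder would choose and signal its own.
REF_DELTA = {"intra": 2, "last": 0, "golden": -2}
MODE_DELTA = {"zero_mv": -2, "split_mv": 2}

MAX_LEVEL = 63  # assumed upper bound on filter strength


def loop_filter_level(frame_level, ref_frame, mode, segment_delta=0):
    """Combine frame, segment, reference-frame and mode adjustments."""
    level = frame_level + segment_delta
    level += REF_DELTA.get(ref_frame, 0)
    level += MODE_DELTA.get(mode, 0)
    return min(max(level, 0), MAX_LEVEL)


def should_filter(level):
    """A level of zero means the macroblock is left unfiltered."""
    return level > 0


intra_level = loop_filter_level(20, "intra", "intra_pred")
golden_zero_level = loop_filter_level(20, "golden", "zero_mv")
```

With these example offsets, an intra-coded macroblock is filtered more strongly than one predicted from the Golden Frame with a (0,0) motion vector, matching the example in the text. Because the tables can change per frame, the same macroblock type may be filtered differently from one frame to the next.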
7. ENTROPY CODING AND DATA PARTITIONING
Except for very few header bits that are coded directly as raw
values, the majority of compressed VP8 data values are coded
[Figure: comparison of VP8 and H.264 High Profile on the Night, Sheriff and Tulip 720p clips at 2000 kbps; plot values and caption not recoverable.]
ACKNOWLEDGEMENTS
The authors would like to thank Prof. Bastiaan Kleijn at the School
of Electrical Engineering at KTH (the Royal Institute of
Technology) in Stockholm, Sweden for reviewing early drafts; the
paper has benefited a great deal from his valuable feedback. This
paper has also benefited from helpful comments by our colleagues
at Google, Dr. Pascal Massimino and Mr. John Luther, as well as
insightful comments from four anonymous reviewers.
Fig. 7. Encoding quality test results. Panels: (b) Highway, (c) Pamphlet, (d) Deadline