Video Processing and Communications (Yao Wang), Chapter 13b


Video Coding Standards

(Chapter 13)

Yao Wang, Polytechnic University, Brooklyn, NY 11201

Outline

- Overview of Standards and Their Applications
- ITU-T (International Telecommunication Union) Standards for Audio-Visual Communications
  - H.261, H.263, H.263+, H.263++
- ISO (International Organization for Standardization) Standards: MPEG
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7

MPEG

- International Electrotechnical Commission (IEC), founded 1906
- International Organization for Standardization (ISO), founded 1947
- Joint ISO/IEC Technical Committee 1 (JTC1) on Information Technology
  - Subcommittee 24: Computer Graphics and Image Processing (VRML)
  - Subcommittee 29: Coding of audio, picture, and multimedia information (MPEG)

http://www.iso.org/iso/en/ISOOnline.frontpage

Multimedia Communications Standards and Applications

| Standard | Application | Video Format | Raw Data Rate | Compressed Data Rate |
| H.320 (H.261) | Video conferencing over ISDN | CIF, QCIF | 37 Mbps (CIF), 9.1 Mbps (QCIF) | >= 384 Kbps (CIF), >= 64 Kbps (QCIF) |
| H.323 (H.263) | Video conferencing over the Internet | 4CIF / CIF / QCIF | - | >= 64 Kbps |
| H.324 (H.263) | Video over phone lines / wireless | QCIF | 9.1 Mbps | >= 18 Kbps |
| MPEG-1 | Video distribution on CD / WWW | CIF | 30 Mbps | 1.5 Mbps |
| MPEG-2 | Video distribution on DVD / digital TV | CCIR 601 4:2:0 | 128 Mbps | 3-10 Mbps |
| MPEG-4 | Multimedia distribution over Inter-/Intranet | QCIF / CIF | - | 28-1024 Kbps |
| GA-HDTV | HDTV broadcasting | SMPTE 296/295 | <= 700 Mbps | 18-45 Mbps |
| MPEG-7 | Multimedia databases (content description and retrieval) | - | - | - |

MPEG-1 Overview

- Target: near-VHS-quality audio/video on CD-ROM (1.5 Mbps, the CD-ROM access rate at the time).
- Implementation of VCR-like interactivity (fast forward, random access, etc.).
- Work started in late 1988; tests in 10/89; Committee Draft 9/90.
- ISO/IEC 11172-1~5 (parts: systems, video, audio, compliance, software).
- Prompted an explosion of digital video applications: MPEG-1 Video CD and downloadable video over the Internet.
- Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market.
- MPEG-1 Audio: offers 3 coding options (layers); higher layers give higher coding efficiency at the cost of more computation.
  - MP3 = MPEG-1 Layer 3 audio.

MPEG-1 Video vs. H.261

- Developed at about the same time.
- Must enable random access (fast forward/rewind): uses a GOP structure with periodic I-pictures and P-pictures.
- Not intended for interactive applications: does not have as stringent a delay requirement.
- Fixed rate (1.5 Mbps), good quality (VHS equivalent).
- SIF video format (similar to CIF): CIF is 352x288, SIF is 352x240.
- More advanced motion compensation:
  - Half-pel-accuracy motion estimation (see the interpolation sketch below), with a range up to +/- 64.
  - No loop filter (not needed because of the higher MC accuracy).
- Bi-directional temporal prediction (P and B frames): important for handling uncovered regions.
- Perceptual-based quantization matrix for I-blocks (same as JPEG); DC coefficients are coded predictively.
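Half-pel motion compensation relies on interpolating the reference picture at fractional positions. Below is a minimal sketch of the bilinear averaging used to form a half-pel prediction block; the array layout and function name are mine, not the standard's normative text, and the motion vector is assumed to point inside the reference picture.

```python
import numpy as np

def half_pel_predict(ref, top, left, mv_y2, mv_x2, h=16, w=16):
    """Fetch an h x w prediction block from `ref` (2-D uint8 array) for a
    macroblock at (top, left), with a motion vector given in half-pel units
    (mv_y2, mv_x2).  Half-pel positions are formed by averaging neighbours."""
    int_y, frac_y = divmod(top * 2 + mv_y2, 2)   # integer / half-pel parts
    int_x, frac_x = divmod(left * 2 + mv_x2, 2)
    # Take an (h+1) x (w+1) patch so the shifted neighbours are available.
    patch = ref[int_y:int_y + h + 1, int_x:int_x + w + 1].astype(np.int32)
    a = patch[:h, :w]           # integer-position samples
    b = patch[:h, 1:w + 1]      # right neighbours
    c = patch[1:h + 1, :w]      # bottom neighbours
    d = patch[1:h + 1, 1:w + 1] # diagonal neighbours
    if frac_y == 0 and frac_x == 0:
        pred = a
    elif frac_y == 0:                      # horizontal half-pel
        pred = (a + b + 1) // 2
    elif frac_x == 0:                      # vertical half-pel
        pred = (a + c + 1) // 2
    else:                                  # diagonal half-pel
        pred = (a + b + c + d + 2) // 4
    return pred.astype(np.uint8)
```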

MPEG-1 Video

- Constrained Parameter Set (CPS): a special subset of the coding parameters, defined because of the large range of choices supported by MPEG-1.
- CPS: a limited set of sampling and bit-rate parameters.
- Limits decoder complexity, buffer size, and memory bandwidth (e.g., 4 Mb of RAM for the decoder).
- A flag in the bit stream indicates whether the stream is CPS-compliant (a sketch of such a check follows below).
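The commonly quoted CPS limits (for example, at most 396 macroblocks per picture and a peak bit rate of about 1.86 Mbps) can be tested as in the rough sketch below; the exact numbers and the function name are quoted from memory for illustration, not copied from the standard text.

```python
def within_constrained_parameters(width, height, frame_rate, bit_rate_bps):
    """Rough check against commonly quoted MPEG-1 Constrained Parameter Set
    limits (illustrative values, not the normative text)."""
    mbs = ((width + 15) // 16) * ((height + 15) // 16)   # macroblocks per picture
    return (width <= 768 and height <= 576 and
            mbs <= 396 and
            mbs * frame_rate <= 396 * 25 and              # macroblock rate limit
            frame_rate <= 30 and
            bit_rate_bps <= 1_856_000)

# SIF video at 1.15 Mbps satisfies the constraints:
print(within_constrained_parameters(352, 240, 30, 1_150_000))   # True
```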

Group of Picture Structure in MPEG

[Figure: one GOP; pictures are numbered in display order and labelled with their I-, P-, or B-picture type. Because each B-picture is predicted from a future I- or P-picture, the encoding (transmission) order differs from the display order, as illustrated in the sketch below.]
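Because a B-picture depends on a future reference picture, that reference must be transmitted first. The small sketch below (my own illustration, not from the standard) derives the encoding order from the display order for a GOP pattern given as a string.

```python
def encoding_order(gop_pattern):
    """Reorder a GOP given in display order (e.g. 'IBBPBBP') so that each
    B-picture follows the future reference picture it depends on."""
    out = []
    pending_b = []
    for display_idx, ptype in enumerate(gop_pattern):
        if ptype in 'IP':
            out.append((display_idx, ptype))   # reference picture goes first
            out.extend(pending_b)              # then the B-pictures before it
            pending_b = []
        else:                                   # 'B'
            pending_b.append((display_idx, ptype))
    out.extend(pending_b)   # trailing B-pictures (would need the next GOP's I)
    return out

print(encoding_order('IBBPBBP'))
# [(0, 'I'), (3, 'P'), (1, 'B'), (2, 'B'), (6, 'P'), (4, 'B'), (5, 'B')]
```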

MPEG-1 Video Encoder

MPEG-2 Overview

- A/V broadcast (TV, HDTV; terrestrial, cable, satellite, high-speed Inter/Intranet) as well as DVD video.
  - 4-8 Mbps for TV quality; 10-15 Mbps for better quality at SDTV resolution (BT.601).
  - 18-45 Mbps for HDTV applications.
  - MPEG-2 video (Main Profile at High Level) is the video coding standard used in HDTV.
- Tests in 11/91, Committee Draft 11/93.
- ISO/IEC 13818-1~6 (systems, video, audio, compliance, software, DSM-CC).
- Consists of various profiles and levels.
- Backward compatible with MPEG-1.
- MPEG-2 Audio
  - Supports 5.1 channels.
  - MPEG-2 AAC requires about 30% fewer bits than MPEG-1 Layer 3.

MPEG-2 vs. MPEG-1 Video

- MPEG-1 handles only progressive sequences (SIF); MPEG-2 is targeted primarily at interlaced sequences and at higher resolutions (BT.601 = 4CIF).
- More sophisticated motion estimation methods (frame/field prediction modes) improve prediction accuracy for interlaced sequences.
- Different DCT modes and scanning methods are defined for interlaced sequences.
- MPEG-2 provides various scalability modes.
- MPEG-2 defines various profiles and levels, with each combination targeted at different applications.

MPEG-2 Video

Frame vs. Field Picture


Motion Compensation for Interlaced Video


New prediction and coding modes for interlaced video:
- Field prediction for field pictures
- Field prediction for frame pictures
- Field DCT
- MPEG-2 alternate scan

Field Prediction for Field Pictures

Each field is predicted individually from reference fields:
- A P-field is predicted from one previously coded field.
- A B-field is predicted from two fields chosen from two reference pictures.

Field Prediction for Frame Pictures


In a frame picture, the two fields of a macroblock can also be predicted separately, each from a reference field with its own motion vector.

DCT Modes

Two DCT modes and two scan patterns are defined for interlaced material:
- Frame DCT: divides a macroblock into 4 luminance blocks, as usual.
- Field DCT: reorders the lines of a macroblock into top-field and bottom-field blocks before the DCT (a sketch of the reordering follows below).
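A minimal sketch of the field-DCT line reordering; the array shapes and names are illustrative.

```python
import numpy as np

def field_dct_reorder(mb):
    """Permute a 16x16 luminance macroblock so that rows 0-7 contain the
    top-field (even) lines and rows 8-15 the bottom-field (odd) lines; the
    four 8x8 blocks are then DCT-coded as usual."""
    assert mb.shape == (16, 16)
    return np.vstack([mb[0::2, :], mb[1::2, :]])

mb = np.arange(256).reshape(16, 16)
reordered = field_dct_reorder(mb)
# Row 8 of the reordered macroblock is original line 1 (first bottom-field line).
assert (reordered[8] == mb[1]).all()
```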


MPEG-2 Scalability

- Data partitioning
  - All headers, MVs, and the first few DCT coefficients go in the high-priority base partition; the remaining DCT coefficients go in the low-priority enhancement partition.
  - Can be implemented at the bit-stream level; simple.
- SNR scalability
  - The base layer carries coarsely quantized DCT coefficients; the enhancement layer further quantizes the base-layer quantization error (see the sketch after this list).
  - Relatively simple.
- Spatial scalability
  - Requires multiple complete encoders; complex (skipped here).
- Temporal scalability
  - The base layer uses a lower frame rate; the enhancement layer adds pictures for a higher frame rate; simple.
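A rough illustration of the SNR-scalability idea above, using scalar requantization of the base-layer error; the quantizer step sizes and function name are illustrative, not the standard's.

```python
import numpy as np

def snr_scalable_quantize(dct_coeffs, q_base=16, q_enh=4):
    """Two-layer SNR-scalable coding of a block of DCT coefficients: the base
    layer uses a coarse quantizer, the enhancement layer requantizes the
    base-layer quantization error with a finer step."""
    base_levels = np.round(dct_coeffs / q_base).astype(int)
    base_recon = base_levels * q_base
    enh_levels = np.round((dct_coeffs - base_recon) / q_enh).astype(int)
    enh_recon = base_recon + enh_levels * q_enh
    return base_levels, enh_levels, base_recon, enh_recon

coeffs = np.array([103.0, -42.0, 7.0, 0.0])
_, _, base_rec, enh_rec = snr_scalable_quantize(coeffs)
# A base-only decoder reconstructs base_rec; a decoder that also receives the
# enhancement layer reconstructs enh_rec, which is closer to coeffs.
print(base_rec, enh_rec)   # [96 -48 0 0] vs [104 -40 8 0]
```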


Data Partitioning Codec

SNR Scalability Encoder

Spatial Scalability Codec

Temporal Scalability: Option 1

Temporal Scalability: Option 2

Profiles and Levels in MPEG-2


- Profiles extend the MPEG-1 concept of a constrained parameter set:
  - Profiles define sets of tools; levels define parameter ranges for a given profile (representative level limits are sketched below).
- Main Profile at Main Level (MP@ML) is the most popular combination, used for digital TV.
- Main Profile at High Level (MP@HL): HDTV.
- The 4:2:2 Profile at Main Level (4:2:2@ML) is used for studio production.
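To make the profile/level idea concrete, the sketch below encodes commonly quoted Main Profile level limits (approximate values from memory, for illustration only) and finds the lowest level that fits a given source.

```python
# Commonly quoted upper bounds for MPEG-2 Main Profile levels
# (approximate values, for illustration only).
MAIN_PROFILE_LEVELS = {
    "low":      {"width": 352,  "height": 288,  "fps": 30, "mbps": 4},
    "main":     {"width": 720,  "height": 576,  "fps": 30, "mbps": 15},
    "high1440": {"width": 1440, "height": 1152, "fps": 60, "mbps": 60},
    "high":     {"width": 1920, "height": 1152, "fps": 60, "mbps": 80},
}

def smallest_level(width, height, fps, mbps):
    """Return the lowest Main Profile level whose limits accommodate the
    given picture size, frame rate, and bit rate."""
    for name, lim in MAIN_PROFILE_LEVELS.items():   # dict preserves order
        if (width <= lim["width"] and height <= lim["height"]
                and fps <= lim["fps"] and mbps <= lim["mbps"]):
            return name
    return None

print(smallest_level(720, 480, 30, 6))      # 'main' -> typical digital TV
print(smallest_level(1920, 1080, 30, 19))   # 'high' -> HDTV broadcasting
```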


MPEG-4 Overview

Functionalities beyond MPEG-1/2:
- Interaction with individual objects
  - The displayed scene can be composed by the receiver from coded objects
- Scalability of content
- Error resilience
- Coding of both natural and synthetic audio and video

The displayed scene is composed by the receiver based on the desired view angle and the objects of interest.

Object-Based Coding

- The entire scene is decomposed into multiple objects.
  - Object segmentation is the most difficult task, but it does not need to be standardized.
- Each object is specified by its shape, motion, and texture (color).
  - Shape and texture both change in time, as specified by the motion.
- MPEG-4 assumes the encoder has a segmentation map available; the standard specifies how to code (actually, how to decode!) shape, motion, and texture.

Object Description Hierarchy in MPEG-4

[Figure: a video object (VO) contains VOL1 and VOL2; each VOL contains a sequence of VOPs (VOP1-VOP4).]

- VO: video object
- VOL: video object layer (VOLs can be different parts of a VO, or different rate/resolution representations of the VO)
- VOP: video object plane

Example of Scene Composition

[Figure: VOPs belonging to two video object layers (VOL1, VOL2) composed into a single displayed scene.]

The decoder can compose a scene by choosing which VOPs from the available VOLs to include; a compositing sketch follows below.
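A toy compositing sketch, purely illustrative (real MPEG-4 composition is controlled by the scene description), that pastes decoded VOPs onto a background using their gray-scale alpha maps.

```python
import numpy as np

def compose_scene(background, vops):
    """Alpha-blend decoded VOPs onto a background image.

    `background` is an (H, W) luminance image; each VOP is a dict with a
    'texture' array, an 'alpha' array (0-255 gray-scale alpha map), and the
    (y, x) position where it should be pasted."""
    scene = background.astype(np.float64).copy()
    for vop in vops:
        tex = vop["texture"].astype(np.float64)
        alpha = vop["alpha"] / 255.0
        y, x = vop["pos"]
        h, w = tex.shape
        region = scene[y:y + h, x:x + w]
        scene[y:y + h, x:x + w] = alpha * tex + (1.0 - alpha) * region
    return scene.astype(np.uint8)

bg = np.zeros((72, 88), dtype=np.uint8)
logo = {"texture": np.full((16, 16), 200, np.uint8),
        "alpha":   np.full((16, 16), 128, np.uint8),   # semi-transparent
        "pos": (8, 8)}
print(compose_scene(bg, [logo])[10, 10])   # ~100: blended value
```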

Shape Coding Methods

- Shape is specified by alpha maps:
  - Binary alpha map: specifies whether a pel belongs to the object.
  - Gray-scale alpha map: a pel belonging to the object can have a transparency value in the range 0-255.
- Bitmap coding
  - Run-length coding
  - Pel-wise coding using context-based arithmetic coding
  - Quadtree coding
- Contour coding
  - Chain coding (a sketch follows after this list)
  - Fourier descriptors
  - Polygon approximation
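A minimal sketch of 8-connected chain coding and its differential form; the direction numbering and function names are mine, not the MPEG-4 syntax.

```python
# 8-connected chain code: direction index -> (dy, dx) step
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_code(contour):
    """Encode a contour (list of (y, x) pels, each adjacent to the next)
    as a start point plus one 3-bit direction symbol per step."""
    start = contour[0]
    codes = []
    for (y0, x0), (y1, x1) in zip(contour, contour[1:]):
        codes.append(DIRECTIONS.index((y1 - y0, x1 - x0)))
    return start, codes

def differential_chain_code(codes):
    """Differential chain code: code the change of direction modulo 8,
    which compresses better because smooth contours give long runs of a
    single symbol."""
    return [codes[0]] + [(c - p) % 8 for p, c in zip(codes, codes[1:])]

square = [(0, 0), (0, 1), (1, 1), (1, 0)]          # a tiny closed contour
start, codes = chain_code(square + [square[0]])     # close the loop
print(codes)                               # [0, 6, 4, 2]
print(differential_chain_code(codes))      # [0, 6, 6, 6]
```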


Quadtree Shape Coding

Chain Coding and Differential Chain Coding

MPEG-4 Shape Coding

- Uses a block-based approach (block = MB):
  - Boundary blocks contain both object and background pels.
  - Non-boundary blocks belong entirely to either the object or the background.
- A boundary block's binary alpha map (binary alpha block) is coded using context-based arithmetic coding:
  - Intra mode: context pels come from the same frame.
  - Inter mode: context pels include pels from the previous frame, displaced by a motion vector.
  - The shape MV is separate from the texture MV and is predictively coded using the texture MV.
- Gray-scale alpha maps are coded using the DCT.
- Texture in boundary blocks is coded using padding followed by a conventional DCT (a padding sketch follows below), or using a shape-adaptive DCT.
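A simplified padding sketch for boundary blocks. MPEG-4 prescribes repetitive padding; the version below simply fills background pels with the mean of the object pels, so treat it as an illustration of the idea rather than the normative procedure.

```python
import numpy as np

def pad_boundary_block(texture, alpha):
    """Fill background pels of a boundary block before the DCT.

    `texture` is an 8x8 (or 16x16) block; `alpha` is the binary alpha map
    (nonzero = object pel).  Background pels are replaced by the mean of the
    object pels so they cost few bits; the decoder discards them anyway,
    since it knows the shape from the decoded alpha map."""
    mask = alpha > 0
    if not mask.any():
        return texture.copy()            # fully transparent block: nothing to pad
    padded = texture.astype(np.float64).copy()
    padded[~mask] = texture[mask].mean()
    return np.rint(padded).astype(texture.dtype)

tex = np.array([[10, 12], [200, 220]], dtype=np.uint8)   # toy 2x2 "block"
alp = np.array([[1, 1], [0, 0]], dtype=np.uint8)          # only top row is object
print(pad_boundary_block(tex, alp))    # bottom row becomes 11 (mean of 10, 12)
```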

MPEG-4 Video Coder Overview

Details of Parameter Coding

Still Texture Coding

- MPEG-4 defines a still texture coding method for intra frames, sprites, or the texture map of a mesh object.
- A wavelet-based coding method is used.

Mesh Animation

- An object can be described by an initial mesh and the motion vectors of its nodes in the following frames.
- MPEG-4 defines the coding of the mesh geometry, but not mesh generation.

Body and Face Animation

- MPEG-4 defines a default 3-D body model (including its geometry and possible motions) through body definition parameters (BDP).
- The body can be animated using body animation parameters (BAP).
- Similarly, face definition parameters (FDP) and face animation parameters (FAP) are specified for a face model and its animation.

Face Animation

Face Animation Through FAP

Face Player

M. C. Shin, D. Goldgof, C. Kim, Jialin Zhong, and Dongbai Guo (Univ. of South Florida, Tampa, FL), "Customizable MPEG-4 face player using real-time 2D image sequence," Proc. Fifth IEEE Workshop on Applications of Computer Vision (WACV 2000), Palm Springs, CA, Dec. 4-6, 2000, pp. 91-97. DOI: 10.1109/WACV.2000.895408.

Abstract: This paper presents a framework for a customizable MPEG-4 face player using a FAP (Facial Animation Parameters) sequence recovered from a real-time image sequence. First, the 3D non-rigid motion and structure of the facial features are recovered from a 2D image sequence and a person-specific model of the face. The model consists of the intensity and range image of the face. Then, the FAPs are computed from the recovered 3D structure. The customizable MPEG-4 face animation is generated using a FAP sequence and a model of a specific person. A dataset of four face image sequences from three different face orientations, equipped with range images, is used. Ground-truth (GT) results are generated using the 3D structure provided by the range images. The results are evaluated quantitatively by comparing recovered FAP values, and qualitatively by comparing the generated MPEG-4 animations. FAPs are recovered to within 8% (relative) and 16 FAP units (absolute) accuracy, and the animation is nearly identical to the MPEG-4 animation using GT FAPs.

Text-to-Speech Synthesis with Face Animation

MPEG-4 Profiles

An elaborate structure of profiles; see pages 452-453 of the text for some sample profiles.

MPEG-4 vs. MPEG-1 Coding Efficiency

MPEG-7 Overview

- MPEG-1/2/4 make content available, whereas MPEG-7 lets you find the content you need!
  - Enables multimedia document indexing, browsing, and retrieval.
  - Defines the syntax for the metadata (e.g., index and summary) attached to a document.
  - Generation of the index and summary is not part of the standard!
- Content description in MPEG-7:
  - Descriptors (D): describe low-level features.
  - Description Schemes (DS): combine Ds to describe high-level features and structures.
  - Description Definition Language (DDL): defines how Ds and DSs can be defined or modified.
  - System tools.

Multimedia Description Scheme Overview


Content description: segment tree and event tree

Segment tree = table of contents

Event tree = index

MPEG-7 Visual Descriptors

- Color
  - Histogram, dominant color, etc. (a histogram sketch follows after this list).
- Texture
  - Homogeneity: energy in different orientation and frequency bands (Gabor transform).
  - Coarseness, directionality, regularity.
  - Edge orientation histogram.
- Motion
  - Camera motion.
  - Motion trajectories of feature points in a non-rigid object.
  - Motion parameters of a rigid object.
  - Motion activity.
- Shape
  - Boundary-based vs. region-based descriptors.
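As a toy example of a low-level color descriptor in the spirit of the list above (a generic coarse color histogram, not the normative MPEG-7 ScalableColor or DominantColor descriptor):

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Compute a coarse joint RGB histogram, L1-normalised so that images of
    different sizes can be compared; `image` is an (H, W, 3) uint8 array."""
    quantised = (image.astype(np.uint32) * bins_per_channel) // 256
    index = ((quantised[..., 0] * bins_per_channel + quantised[..., 1])
             * bins_per_channel + quantised[..., 2])
    hist = np.bincount(index.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()

def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms (smaller = more similar)."""
    return np.abs(h1 - h2).sum()

a = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
print(histogram_distance(color_histogram(a), color_histogram(b)))
```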


MPEG-21

- The MPEG-21 standard (ISO/IEC 21000), from the Moving Picture Experts Group, aims at defining an open framework for multimedia applications.
- Specifically, MPEG-21 defines a Rights Expression Language as a means of conveying digital rights, permissions, and restrictions for digital content from the content creator to the content consumer. As an XML-based standard, MPEG-21 is designed to communicate machine-readable license information in a "ubiquitous, unambiguous and secure" manner.
- Among the industry's aspirations for the standard, which it hopes will put an end to illicit file sharing, is that it constitute "a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain."
- MPEG-21 is based on two essential concepts: the definition of a fundamental unit of distribution and transaction, the Digital Item, and the concept of users interacting with Digital Items. At its most basic level, MPEG-21 provides a framework in which one user interacts with another, and the object of that interaction is a Digital Item.


Summary

- H.261: the first video coding standard, targeted at video conferencing over ISDN.
  - Uses the block-based hybrid coding framework with integer-pel MC.
- H.263: improved quality at lower bit rates, to enable video conferencing/telephony below 64 Kbps (modems or Internet access, desktop conferencing).
  - Half-pel MC and other improvements.
- MPEG-1 video: video on CD and video on the Internet (good quality at 1.5 Mbps).
  - Half-pel MC and bidirectional MC.
- MPEG-2 video: TV / HDTV / DVD (4-15 Mbps).
  - Extended from MPEG-1, considering interlaced video.

Summary

- MPEG-4: enables object manipulation and scene composition at the decoder -> interactive TV / virtual reality.
  - Object-based video coding: shape coding.
  - Coding of synthetic video and audio: animation.
- MPEG-7: enables search and browsing of multimedia documents.
  - Defines the syntax for describing structural and conceptual content.
- Newer standards:
  - H.264: improved coding efficiency (by providing more options for optimization).
  - MPEG-21: beyond MPEG-7, considering intellectual property protection, etc.
