Video Processing and Communications (Yao Wang), Chapter 13b


Video Coding Standards

(Chapter 13)

Yao Wang, Polytechnic University, Brooklyn, NY 11201

Outline

- Overview of Standards and Their Applications
- ITU-T (International Telecommunication Union) Standards for Audio-Visual Communications
  - H.261, H.263, H.263+, H.263++
- ISO (International Organization for Standardization) Standards: MPEG
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7

MPEG

- International Electrotechnical Commission (IEC), founded 1906
- International Organization for Standardization (ISO), founded 1947
- Joint ISO/IEC Technical Committee 1 (JTC1) on Information Technology
  - Subcommittee 24: Computer Graphics and Image Processing (VRML)
  - Subcommittee 29: Coding of audio, picture, and multimedia information (MPEG)

http://www.iso.org/iso/en/ISOOnline.frontpage

Multimedia Communications Standards and Applications

| Standard | Application | Video Format | Raw Data Rate | Compressed Data Rate |
| H.320 (H.261) | Video conferencing over ISDN | CIF, QCIF | 37 Mbps (CIF), 9.1 Mbps (QCIF) | >= 384 Kbps (CIF), >= 64 Kbps (QCIF) |
| H.323 (H.263) | Video conferencing over the Internet | 4CIF / CIF / QCIF | - | >= 64 Kbps |
| H.324 (H.263) | Video over phone lines / wireless | QCIF | 9.1 Mbps | >= 18 Kbps |
| MPEG-1 | Video distribution on CD / WWW | CIF | 30 Mbps | 1.5 Mbps |
| MPEG-2 | Video distribution on DVD / digital TV | CCIR 601 4:2:0 | 128 Mbps | 3-10 Mbps |
| MPEG-4 | Multimedia distribution over Inter-/Intranet | QCIF / CIF | - | 28-1024 Kbps |
| GA-HDTV | HDTV broadcasting | SMPTE 296/295 | <= 700 Mbps | 18-45 Mbps |
| MPEG-7 | Multimedia databases (content description and retrieval) | - | - | - |

MPEG-1 Overview

- Target: near-VHS-quality audio/video on CD-ROM (1.5 Mbps, the CD-ROM access rate at the time).
- Implementation of VCR-like interactivity (fast forward, random access, etc.).
- Work started in late 1988; tests in 10/89; Committee Draft 9/90.
- ISO/IEC 11172-1~5 (parts: systems, video, audio, compliance, software).
- Prompted an explosion of digital video applications: MPEG-1 Video CD and downloadable video over the Internet.
- Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market.
- MPEG-1 Audio: offers 3 coding options (layers); higher layers give higher coding efficiency at the cost of more computation.
  - MP3 = MPEG-1 Layer 3 audio.

MPEG-1 Video vs. H.261

- Developed at about the same time.
- Must enable random access (fast forward/rewind): uses a GOP structure with periodic I-pictures and P-pictures.
- Not intended for interactive applications: does not have as stringent a delay requirement.
- Fixed rate (1.5 Mbps), good quality (VHS equivalent).
- SIF video format (similar to CIF): CIF is 352x288, SIF is 352x240.
- More advanced motion compensation:
  - Half-pel-accuracy motion estimation (see the interpolation sketch below), with a range up to +/- 64.
  - No loop filter (not needed because of the higher MC accuracy).
- Bi-directional temporal prediction (P and B frames): important for handling uncovered regions.
- Perceptual-based quantization matrix for I-blocks (same as JPEG); DC coefficients are coded predictively.
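Half-pel motion compensation relies on interpolating the reference picture at fractional positions. Below is a minimal sketch of the bilinear averaging used to form a half-pel prediction block; the array layout and function name are mine, not the standard's normative text, and the motion vector is assumed to point inside the reference picture.

```python
import numpy as np

def half_pel_predict(ref, top, left, mv_y2, mv_x2, h=16, w=16):
    """Fetch an h x w prediction block from `ref` (2-D uint8 array) for a
    macroblock at (top, left), with a motion vector given in half-pel units
    (mv_y2, mv_x2).  Half-pel positions are formed by averaging neighbours."""
    int_y, frac_y = divmod(top * 2 + mv_y2, 2)   # integer / half-pel parts
    int_x, frac_x = divmod(left * 2 + mv_x2, 2)
    # Take an (h+1) x (w+1) patch so the shifted neighbours are available.
    patch = ref[int_y:int_y + h + 1, int_x:int_x + w + 1].astype(np.int32)
    a = patch[:h, :w]           # integer-position samples
    b = patch[:h, 1:w + 1]      # right neighbours
    c = patch[1:h + 1, :w]      # bottom neighbours
    d = patch[1:h + 1, 1:w + 1] # diagonal neighbours
    if frac_y == 0 and frac_x == 0:
        pred = a
    elif frac_y == 0:                      # horizontal half-pel
        pred = (a + b + 1) // 2
    elif frac_x == 0:                      # vertical half-pel
        pred = (a + c + 1) // 2
    else:                                  # diagonal half-pel
        pred = (a + b + c + d + 2) // 4
    return pred.astype(np.uint8)
```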

MPEG-1 Video

- Constrained Parameter Set (CPS): a special subset of the coding parameters, defined because of the large range of choices supported by MPEG-1.
- CPS: a limited set of sampling and bit-rate parameters.
- Limits decoder complexity, buffer size, and memory bandwidth (e.g., 4 Mb of RAM for the decoder).
- A flag in the bit stream indicates whether the stream is CPS-compliant (a sketch of such a check follows below).
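The commonly quoted CPS limits (for example, at most 396 macroblocks per picture and a peak bit rate of about 1.86 Mbps) can be tested as in the rough sketch below; the exact numbers and the function name are quoted from memory for illustration, not copied from the standard text.

```python
def within_constrained_parameters(width, height, frame_rate, bit_rate_bps):
    """Rough check against commonly quoted MPEG-1 Constrained Parameter Set
    limits (illustrative values, not the normative text)."""
    mbs = ((width + 15) // 16) * ((height + 15) // 16)   # macroblocks per picture
    return (width <= 768 and height <= 576 and
            mbs <= 396 and
            mbs * frame_rate <= 396 * 25 and              # macroblock rate limit
            frame_rate <= 30 and
            bit_rate_bps <= 1_856_000)

# SIF video at 1.15 Mbps satisfies the constraints:
print(within_constrained_parameters(352, 240, 30, 1_150_000))   # True
```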

Group of Picture Structure in MPEG

[Figure: one GOP; pictures are numbered in display order and labelled with their I-, P-, or B-picture type. Because each B-picture is predicted from a future I- or P-picture, the encoding (transmission) order differs from the display order, as illustrated in the sketch below.]
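Because a B-picture depends on a future reference picture, that reference must be transmitted first. The small sketch below (my own illustration, not from the standard) derives the encoding order from the display order for a GOP pattern given as a string.

```python
def encoding_order(gop_pattern):
    """Reorder a GOP given in display order (e.g. 'IBBPBBP') so that each
    B-picture follows the future reference picture it depends on."""
    out = []
    pending_b = []
    for display_idx, ptype in enumerate(gop_pattern):
        if ptype in 'IP':
            out.append((display_idx, ptype))   # reference picture goes first
            out.extend(pending_b)              # then the B-pictures before it
            pending_b = []
        else:                                   # 'B'
            pending_b.append((display_idx, ptype))
    out.extend(pending_b)   # trailing B-pictures (would need the next GOP's I)
    return out

print(encoding_order('IBBPBBP'))
# [(0, 'I'), (3, 'P'), (1, 'B'), (2, 'B'), (6, 'P'), (4, 'B'), (5, 'B')]
```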

MPEG-1 Video Encoder

MPEG-2 Overview

- A/V broadcast (TV, HDTV; terrestrial, cable, satellite, high-speed Inter/Intranet) as well as DVD video.
  - 4-8 Mbps for TV quality; 10-15 Mbps for better quality at SDTV resolution (BT.601).
  - 18-45 Mbps for HDTV applications.
  - MPEG-2 video (Main Profile at High Level) is the video coding standard used in HDTV.
- Tests in 11/91, Committee Draft 11/93.
- ISO/IEC 13818-1~6 (systems, video, audio, compliance, software, DSM-CC).
- Consists of various profiles and levels.
- Backward compatible with MPEG-1.
- MPEG-2 Audio
  - Supports 5.1 channels.
  - MPEG-2 AAC requires about 30% fewer bits than MPEG-1 Layer 3.

MPEG-2 vs. MPEG-1 Video

- MPEG-1 handles only progressive sequences (SIF); MPEG-2 is targeted primarily at interlaced sequences and at higher resolutions (BT.601 = 4CIF).
- More sophisticated motion estimation methods (frame/field prediction modes) improve prediction accuracy for interlaced sequences.
- Different DCT modes and scanning methods are defined for interlaced sequences.
- MPEG-2 provides various scalability modes.
- MPEG-2 defines various profiles and levels, with each combination targeted at different applications.

MPEG-2 Video

Frame vs. Field Picture


Motion Compensation for Interlaced Video


New prediction and coding modes for interlaced video:
- Field prediction for field pictures
- Field prediction for frame pictures
- Field DCT
- MPEG-2 alternate scan

Field Prediction for Field Pictures

Each field is predicted individually from reference fields:
- A P-field is predicted from one previously coded field.
- A B-field is predicted from two fields chosen from two reference pictures.

Field Prediction for Frame Pictures


In a frame picture, the two fields of a macroblock can also be predicted separately, each from a reference field with its own motion vector.

DCT Modes

Two DCT modes and two scan patterns are defined for interlaced material:
- Frame DCT: divides a macroblock into 4 luminance blocks, as usual.
- Field DCT: reorders the lines of a macroblock into top-field and bottom-field blocks before the DCT (a sketch of the reordering follows below).
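A minimal sketch of the field-DCT line reordering; the array shapes and names are illustrative.

```python
import numpy as np

def field_dct_reorder(mb):
    """Permute a 16x16 luminance macroblock so that rows 0-7 contain the
    top-field (even) lines and rows 8-15 the bottom-field (odd) lines; the
    four 8x8 blocks are then DCT-coded as usual."""
    assert mb.shape == (16, 16)
    return np.vstack([mb[0::2, :], mb[1::2, :]])

mb = np.arange(256).reshape(16, 16)
reordered = field_dct_reorder(mb)
# Row 8 of the reordered macroblock is original line 1 (first bottom-field line).
assert (reordered[8] == mb[1]).all()
```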


MPEG-2 Scalability

- Data partitioning
  - All headers, MVs, and the first few DCT coefficients go in the high-priority base partition; the remaining DCT coefficients go in the low-priority enhancement partition.
  - Can be implemented at the bit-stream level; simple.
- SNR scalability
  - The base layer carries coarsely quantized DCT coefficients; the enhancement layer further quantizes the base-layer quantization error (see the sketch after this list).
  - Relatively simple.
- Spatial scalability
  - Requires multiple complete encoders; complex (skipped here).
- Temporal scalability
  - The base layer uses a lower frame rate; the enhancement layer adds pictures for a higher frame rate; simple.
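A rough illustration of the SNR-scalability idea above, using scalar requantization of the base-layer error; the quantizer step sizes and function name are illustrative, not the standard's.

```python
import numpy as np

def snr_scalable_quantize(dct_coeffs, q_base=16, q_enh=4):
    """Two-layer SNR-scalable coding of a block of DCT coefficients: the base
    layer uses a coarse quantizer, the enhancement layer requantizes the
    base-layer quantization error with a finer step."""
    base_levels = np.round(dct_coeffs / q_base).astype(int)
    base_recon = base_levels * q_base
    enh_levels = np.round((dct_coeffs - base_recon) / q_enh).astype(int)
    enh_recon = base_recon + enh_levels * q_enh
    return base_levels, enh_levels, base_recon, enh_recon

coeffs = np.array([103.0, -42.0, 7.0, 0.0])
_, _, base_rec, enh_rec = snr_scalable_quantize(coeffs)
# A base-only decoder reconstructs base_rec; a decoder that also receives the
# enhancement layer reconstructs enh_rec, which is closer to coeffs.
print(base_rec, enh_rec)   # [96 -48 0 0] vs [104 -40 8 0]
```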


Data Partitioning Codec

SNR Scalability Encoder

Spatial Scalability Codec

Temporal Scalability: Option 1

Temporal Scalability: Option 2

Profiles and Levels in MPEG-2


- Profiles extend the MPEG-1 concept of a constrained parameter set:
  - Profiles define sets of tools; levels define parameter ranges for a given profile (representative level limits are sketched below).
- Main Profile at Main Level (MP@ML) is the most popular combination, used for digital TV.
- Main Profile at High Level (MP@HL): HDTV.
- The 4:2:2 Profile at Main Level (4:2:2@ML) is used for studio production.
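To make the profile/level idea concrete, the sketch below encodes commonly quoted Main Profile level limits (approximate values from memory, for illustration only) and finds the lowest level that fits a given source.

```python
# Commonly quoted upper bounds for MPEG-2 Main Profile levels
# (approximate values, for illustration only).
MAIN_PROFILE_LEVELS = {
    "low":      {"width": 352,  "height": 288,  "fps": 30, "mbps": 4},
    "main":     {"width": 720,  "height": 576,  "fps": 30, "mbps": 15},
    "high1440": {"width": 1440, "height": 1152, "fps": 60, "mbps": 60},
    "high":     {"width": 1920, "height": 1152, "fps": 60, "mbps": 80},
}

def smallest_level(width, height, fps, mbps):
    """Return the lowest Main Profile level whose limits accommodate the
    given picture size, frame rate, and bit rate."""
    for name, lim in MAIN_PROFILE_LEVELS.items():   # dict preserves order
        if (width <= lim["width"] and height <= lim["height"]
                and fps <= lim["fps"] and mbps <= lim["mbps"]):
            return name
    return None

print(smallest_level(720, 480, 30, 6))      # 'main' -> typical digital TV
print(smallest_level(1920, 1080, 30, 19))   # 'high' -> HDTV broadcasting
```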


MPEG-4 Overview

Functionalities beyond MPEG-1/2:
- Interaction with individual objects
  - The displayed scene can be composed by the receiver from coded objects
- Scalability of content
- Error resilience
- Coding of both natural and synthetic audio and video

The displayed scene is composed by the receiver based on the desired view angle and the objects of interest.

Object-Based Coding

- The entire scene is decomposed into multiple objects.
  - Object segmentation is the most difficult task, but it does not need to be standardized.
- Each object is specified by its shape, motion, and texture (color).
  - Shape and texture both change in time, as specified by the motion.
- MPEG-4 assumes the encoder has a segmentation map available; the standard specifies how to code (actually, how to decode!) shape, motion, and texture.

Object Description Hierarchy in MPEG-4

[Figure: a video object (VO) contains VOL1 and VOL2; each VOL contains a sequence of VOPs (VOP1-VOP4).]

- VO: video object
- VOL: video object layer (VOLs can be different parts of a VO, or different rate/resolution representations of the VO)
- VOP: video object plane

Example of Scene Composition

[Figure: VOPs belonging to two video object layers (VOL1, VOL2) composed into a single displayed scene.]

The decoder can compose a scene by choosing which VOPs from the available VOLs to include; a compositing sketch follows below.
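A toy compositing sketch, purely illustrative (real MPEG-4 composition is controlled by the scene description), that pastes decoded VOPs onto a background using their gray-scale alpha maps.

```python
import numpy as np

def compose_scene(background, vops):
    """Alpha-blend decoded VOPs onto a background image.

    `background` is an (H, W) luminance image; each VOP is a dict with a
    'texture' array, an 'alpha' array (0-255 gray-scale alpha map), and the
    (y, x) position where it should be pasted."""
    scene = background.astype(np.float64).copy()
    for vop in vops:
        tex = vop["texture"].astype(np.float64)
        alpha = vop["alpha"] / 255.0
        y, x = vop["pos"]
        h, w = tex.shape
        region = scene[y:y + h, x:x + w]
        scene[y:y + h, x:x + w] = alpha * tex + (1.0 - alpha) * region
    return scene.astype(np.uint8)

bg = np.zeros((72, 88), dtype=np.uint8)
logo = {"texture": np.full((16, 16), 200, np.uint8),
        "alpha":   np.full((16, 16), 128, np.uint8),   # semi-transparent
        "pos": (8, 8)}
print(compose_scene(bg, [logo])[10, 10])   # ~100: blended value
```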

Shape Coding Methods

- Shape is specified by alpha maps:
  - Binary alpha map: specifies whether a pel belongs to the object.
  - Gray-scale alpha map: a pel belonging to the object can have a transparency value in the range 0-255.
- Bitmap coding
  - Run-length coding
  - Pel-wise coding using context-based arithmetic coding
  - Quadtree coding
- Contour coding
  - Chain coding (a sketch follows after this list)
  - Fourier descriptors
  - Polygon approximation
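A minimal sketch of 8-connected chain coding and its differential form; the direction numbering and function names are mine, not the MPEG-4 syntax.

```python
# 8-connected chain code: direction index -> (dy, dx) step
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_code(contour):
    """Encode a contour (list of (y, x) pels, each adjacent to the next)
    as a start point plus one 3-bit direction symbol per step."""
    start = contour[0]
    codes = []
    for (y0, x0), (y1, x1) in zip(contour, contour[1:]):
        codes.append(DIRECTIONS.index((y1 - y0, x1 - x0)))
    return start, codes

def differential_chain_code(codes):
    """Differential chain code: code the change of direction modulo 8,
    which compresses better because smooth contours give long runs of a
    single symbol."""
    return [codes[0]] + [(c - p) % 8 for p, c in zip(codes, codes[1:])]

square = [(0, 0), (0, 1), (1, 1), (1, 0)]          # a tiny closed contour
start, codes = chain_code(square + [square[0]])     # close the loop
print(codes)                               # [0, 6, 4, 2]
print(differential_chain_code(codes))      # [0, 6, 6, 6]
```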


Quadtree Shape Coding

Chain Coding and Differential Chain Coding

MPEG-4 Shape Coding

- Uses a block-based approach (block = MB):
  - Boundary blocks contain both object and background pels.
  - Non-boundary blocks belong entirely to either the object or the background.
- A boundary block's binary alpha map (binary alpha block) is coded using context-based arithmetic coding:
  - Intra mode: context pels come from the same frame.
  - Inter mode: context pels include pels from the previous frame, displaced by a motion vector.
  - The shape MV is separate from the texture MV and is predictively coded using the texture MV.
- Gray-scale alpha maps are coded using the DCT.
- Texture in boundary blocks is coded using padding followed by a conventional DCT (a padding sketch follows below), or using a shape-adaptive DCT.
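A simplified padding sketch for boundary blocks. MPEG-4 prescribes repetitive padding; the version below simply fills background pels with the mean of the object pels, so treat it as an illustration of the idea rather than the normative procedure.

```python
import numpy as np

def pad_boundary_block(texture, alpha):
    """Fill background pels of a boundary block before the DCT.

    `texture` is an 8x8 (or 16x16) block; `alpha` is the binary alpha map
    (nonzero = object pel).  Background pels are replaced by the mean of the
    object pels so they cost few bits; the decoder discards them anyway,
    since it knows the shape from the decoded alpha map."""
    mask = alpha > 0
    if not mask.any():
        return texture.copy()            # fully transparent block: nothing to pad
    padded = texture.astype(np.float64).copy()
    padded[~mask] = texture[mask].mean()
    return np.rint(padded).astype(texture.dtype)

tex = np.array([[10, 12], [200, 220]], dtype=np.uint8)   # toy 2x2 "block"
alp = np.array([[1, 1], [0, 0]], dtype=np.uint8)          # only top row is object
print(pad_boundary_block(tex, alp))    # bottom row becomes 11 (mean of 10, 12)
```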

MPEG-4 Video Coder Overview

Details of Parameter Coding

Still Texture Coding

- MPEG-4 defines a still texture coding method for intra frames, sprites, or the texture map of a mesh object.
- A wavelet-based coding method is used.

Mesh Animation

- An object can be described by an initial mesh and the motion vectors of its nodes in the following frames.
- MPEG-4 defines the coding of the mesh geometry, but not mesh generation.

Body and Face Animation

- MPEG-4 defines a default 3-D body model (including its geometry and possible motions) through body definition parameters (BDP).
- The body can be animated using body animation parameters (BAP).
- Similarly, face definition parameters (FDP) and face animation parameters (FAP) are specified for a face model and its animation.

Face Animation

Face Animation Through FAP

Face Player

M. C. Shin, D. Goldgof, C. Kim, Jialin Zhong, and Dongbai Guo (Univ. of South Florida, Tampa, FL), "Customizable MPEG-4 face player using real-time 2D image sequence," Proc. Fifth IEEE Workshop on Applications of Computer Vision (WACV 2000), Palm Springs, CA, Dec. 4-6, 2000, pp. 91-97. DOI: 10.1109/WACV.2000.895408.

Abstract: This paper presents a framework for a customizable MPEG-4 face player using a FAP (Facial Animation Parameters) sequence recovered from a real-time image sequence. First, the 3D non-rigid motion and structure of the facial features are recovered from a 2D image sequence and a person-specific model of the face. The model consists of the intensity and range image of the face. Then, the FAPs are computed from the recovered 3D structure. The customizable MPEG-4 face animation is generated using a FAP sequence and a model of a specific person. A dataset of four face image sequences from three different face orientations, equipped with range images, is used. Ground-truth (GT) results are generated using the 3D structure provided by the range images. The results are evaluated quantitatively by comparing recovered FAP values, and qualitatively by comparing the generated MPEG-4 animations. FAPs are recovered to within 8% (relative) and 16 FAP units (absolute) accuracy, and the animation is nearly identical to the MPEG-4 animation using GT FAPs.

Text-to-Speech Synthesis with Face Animation

MPEG-4 Profiles

An elaborate structure of profiles; see pages 452-453 of the text for some sample profiles.

MPEG-4 vs. MPEG-1 Coding Efficiency

MPEG-7 Overview

- MPEG-1/2/4 make content available, whereas MPEG-7 lets you find the content you need!
  - Enables multimedia document indexing, browsing, and retrieval.
  - Defines the syntax for the metadata (e.g., index and summary) attached to a document.
  - Generation of the index and summary is not part of the standard!
- Content description in MPEG-7:
  - Descriptors (D): describe low-level features.
  - Description Schemes (DS): combine Ds to describe high-level features and structures.
  - Description Definition Language (DDL): defines how Ds and DSs can be defined or modified.
  - System tools.

Multimedia Description Scheme Overview


Content description: segment tree and event tree

Segment tree = table of contents

Event tree = index

MPEG-7 Visual Descriptors

- Color
  - Histogram, dominant color, etc. (a histogram sketch follows after this list).
- Texture
  - Homogeneity: energy in different orientation and frequency bands (Gabor transform).
  - Coarseness, directionality, regularity.
  - Edge orientation histogram.
- Motion
  - Camera motion.
  - Motion trajectories of feature points in a non-rigid object.
  - Motion parameters of a rigid object.
  - Motion activity.
- Shape
  - Boundary-based vs. region-based descriptors.
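As a toy example of a low-level color descriptor in the spirit of the list above (a generic coarse color histogram, not the normative MPEG-7 ScalableColor or DominantColor descriptor):

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Compute a coarse joint RGB histogram, L1-normalised so that images of
    different sizes can be compared; `image` is an (H, W, 3) uint8 array."""
    quantised = (image.astype(np.uint32) * bins_per_channel) // 256
    index = ((quantised[..., 0] * bins_per_channel + quantised[..., 1])
             * bins_per_channel + quantised[..., 2])
    hist = np.bincount(index.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()

def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms (smaller = more similar)."""
    return np.abs(h1 - h2).sum()

a = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
print(histogram_distance(color_histogram(a), color_histogram(b)))
```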


MPEG-21

- The MPEG-21 standard (ISO/IEC 21000), from the Moving Picture Experts Group, aims at defining an open framework for multimedia applications.
- Specifically, MPEG-21 defines a Rights Expression Language as a means of conveying digital rights, permissions, and restrictions for digital content from the content creator to the content consumer. As an XML-based standard, MPEG-21 is designed to communicate machine-readable license information in a "ubiquitous, unambiguous and secure" manner.
- Among the industry's aspirations for the standard, which it hopes will put an end to illicit file sharing, is that it constitute "a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain."
- MPEG-21 is based on two essential concepts: the definition of a fundamental unit of distribution and transaction, the Digital Item, and the concept of users interacting with Digital Items. At its most basic level, MPEG-21 provides a framework in which one user interacts with another, and the object of that interaction is a Digital Item.


Summary

- H.261: the first video coding standard, targeted at video conferencing over ISDN.
  - Uses the block-based hybrid coding framework with integer-pel MC.
- H.263: improved quality at lower bit rates, to enable video conferencing/telephony below 64 Kbps (modems or Internet access, desktop conferencing).
  - Half-pel MC and other improvements.
- MPEG-1 video: video on CD and video on the Internet (good quality at 1.5 Mbps).
  - Half-pel MC and bidirectional MC.
- MPEG-2 video: TV / HDTV / DVD (4-15 Mbps).
  - Extended from MPEG-1, considering interlaced video.

Summary

- MPEG-4: enables object manipulation and scene composition at the decoder -> interactive TV / virtual reality.
  - Object-based video coding: shape coding.
  - Coding of synthetic video and audio: animation.
- MPEG-7: enables search and browsing of multimedia documents.
  - Defines the syntax for describing structural and conceptual content.
- Newer standards:
  - H.264: improved coding efficiency (by providing more options for optimization).
  - MPEG-21: beyond MPEG-7, considering intellectual property protection, etc.
