Video Processing Communications Yao Wang Chapter13b
(Chapter 13)
Outline
Overview of Standards and Their Applications
ITU-T (International Telecommunication Union) Standards for Audio-Visual Communications
  H.261
  H.263
  H.263+, H.263++
MPEG
International Electrotechnical Commission (IEC), 1906
International Organization for Standardization (ISO), 1947
Joint ISO/IEC Technical Committee 1 (JTC1) on Information Technology
  Subcommittee 24: Computer Graphics and Image Processing (VRML)
  Subcommittee 29: Coding (MPEG)
http://www.iso.org/iso/en/ISOOnline.frontpage
Standard | Application | Video Format | Compressed Data Rate
H.320 (H.261) | Video conferencing over ISDN | CIF / QCIF | >=384 Kbps / >=64 Kbps
H.323 (H.263) | Video conferencing over Internet | 4CIF / CIF / QCIF | >=64 Kbps
H.324 (H.263) | Video over phone lines / wireless | QCIF | >=18 Kbps
MPEG-1 | Video distribution on CD / WWW | CIF | 1.5 Mbps
MPEG-2 | Video distribution on DVD / digital TV | CCIR601 4:2:0 | 3-10 Mbps
MPEG-4 | Multimedia distribution over Inter/Intranet | QCIF / CIF | 28-1024 Kbps
GA-HDTV | HDTV broadcasting | SMPTE296/295 (raw rate <=700 Mbps) | 18-45 Mbps
MPEG-7 | Multimedia databases (content description and retrieval) | |
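To put the compressed rates above in perspective, it can help to compute the raw (uncompressed) rate of a format and the resulting compression ratio. The sketch below is illustrative only: the function names are hypothetical, and it assumes 8 bits per sample, 30 frames per second, and 4:2:0 chroma subsampling (1.5 samples per pixel on average).

```python
# Luma frame sizes (width, height) for the common formats named in the table.
FORMATS = {
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
}

# Average number of samples per pixel for each chroma subsampling scheme.
SAMPLES_PER_PIXEL = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}


def raw_bit_rate(width, height, fps=30.0, bits_per_sample=8, chroma="4:2:0"):
    """Raw video bit rate in bits per second (assumed parameters, see lead-in)."""
    return width * height * SAMPLES_PER_PIXEL[chroma] * bits_per_sample * fps


def compression_ratio(fmt, compressed_bps, fps=30.0):
    """Ratio of raw rate to compressed rate for a named format."""
    width, height = FORMATS[fmt]
    return raw_bit_rate(width, height, fps) / compressed_bps


if __name__ == "__main__":
    for name, (w, h) in FORMATS.items():
        print(f"{name}: {raw_bit_rate(w, h) / 1e6:.1f} Mbps raw")
    print(f"CIF at 384 Kbps: about {compression_ratio('CIF', 384e3):.0f}:1")
```

Under these assumptions, CIF comes out at roughly 36.5 Mbps raw and QCIF at roughly 9.1 Mbps raw, so H.261 at 384 Kbps corresponds to a compression ratio of about 95:1.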
MPEG-1 Overview
Target: near-VHS quality audio/video on CD-ROM (1.5 Mbps, the CD-ROM access rate at the time).
Implementation of VCR-like interactivity (fast forward, random access, etc.).
Started in late 1988, tested in 10/89, Committee Draft in 9/90.
ISO/IEC 11172-1~5 (Parts: Systems, Video, Audio, Compliance, Software).
Prompted an explosion of digital video applications: MPEG-1 Video CD and downloadable video over the Internet.
Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market.
MPEG-1 Audio
  Offers 3 coding options (3 layers); higher layers have higher coding efficiency at the cost of more computation.
  MP3 = MPEG-1 Layer 3 audio.
MPEG-1 video
Bidirectional temporal prediction (B-pictures): important for handling uncovered regions.
Perceptual-based quantization matrix for I-blocks (same as JPEG).
DC coefficients coded predictively.
Constrained Parameter Set (CPS): a special subset of the coding parameters, defined because MPEG-1 supports a large range of choices.
  CPS: a limited set of sampling and bit rate parameters.
  Limits decoder complexity, buffer size, and memory bandwidth (e.g. 4 Mb of RAM for the decoder).
  A flag in the bit stream indicates whether the stream is CPS.
[Figure: one GOP with P- and B-pictures numbered in display order, and the corresponding encoding order.]
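Because a B-picture is predicted from both a past and a future anchor picture, each anchor (I or P) has to be encoded before the B-pictures that precede it in display order. A minimal sketch of that reordering follows; the function name `encoding_order` is hypothetical, the GOP is given as a string of picture types in display order, and trailing B-pictures with no following anchor are simply emitted last (an assumption made for the sketch).

```python
def encoding_order(display_types):
    """Return display indices (0-based) in the order they would be encoded.

    display_types: sequence of 'I', 'P', 'B' characters in display order.
    B-pictures are held back until the next anchor (I or P) has been emitted.
    """
    order = []
    pending_b = []  # B-pictures waiting for their future anchor
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(index)
        elif picture_type in ("I", "P"):
            order.append(index)      # the anchor goes first
            order.extend(pending_b)  # then the B-pictures that needed it
            pending_b = []
        else:
            raise ValueError(f"unknown picture type: {picture_type!r}")
    order.extend(pending_b)  # assumption: leftover B-pictures go last
    return order


if __name__ == "__main__":
    # Display order 1..8 with types I B B P B B B P
    print([i + 1 for i in encoding_order("IBBPBBBP")])
```

For the display sequence I B B P B B B P, this yields the encoding order 1, 4, 2, 3, 8, 5, 6, 7 (1-based picture numbers).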
MPEG2 Overview
A/V broadcast (TV, HDTV, terrestrial, cable, satellite, high-speed Inter/Intranet) as well as DVD video.
4-8 Mbps for TV quality, 10-15 Mbps for better quality at SDTV resolutions (BT.601).
18-45 Mbps for HDTV applications.
  MPEG-2 video main profile at high level (MP@HL) is the video coding standard used in HDTV.
Tested in 11/91, Committee Draft in 11/93.
ISO/IEC 13818-1~6 (Systems, Video, Audio, Compliance, Software, DSM-CC).
Consists of various profiles and levels.
Backward compatible with MPEG-1.
MPEG-2 Audio
  Supports 5.1 channels.
  MPEG-2 AAC requires 30% fewer bits than MPEG-1 Layer 3.
MPEG-1 only handles progressive sequences (SIF).
MPEG-2 is targeted primarily at interlaced sequences and at higher resolution (BT.601 = 4CIF).
More sophisticated motion estimation methods (frame/field prediction modes) were developed to improve estimation accuracy for interlaced sequences.
Different DCT modes and scanning methods were developed for interlaced sequences.
MPEG-2 has various scalability modes.
MPEG-2 has various profiles and levels, each combination targeted at a different application.
MPEG2 Video
Field prediction for field pictures
Field prediction for frame pictures
Field DCT
MPEG-2 alternative scan
DCT Modes
New prediction modes for better motion compensation related to interlaced video
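To make the frame DCT versus field DCT choice concrete: in a frame picture of interlaced video, the 16x16 luminance macroblock can be split into four 8x8 blocks either from consecutive lines (frame DCT) or after regrouping the lines by field (field DCT), so that each block contains lines from one field only. The sketch below shows only this line regrouping, not the DCT itself; `luma_blocks` and `vertical_activity` are hypothetical helper names, and the activity measure is just an illustrative stand-in for how much vertical high-frequency energy a block would carry.

```python
def luma_blocks(macroblock, field_dct):
    """Split a 16x16 luma macroblock (16 rows of 16 samples) into four 8x8 blocks.

    Frame DCT: blocks are cut from the rows as stored.
    Field DCT: even rows (top field) form the upper two blocks and
    odd rows (bottom field) form the lower two blocks.
    """
    if len(macroblock) != 16 or any(len(row) != 16 for row in macroblock):
        raise ValueError("expected a 16x16 macroblock")
    rows = macroblock[0::2] + macroblock[1::2] if field_dct else macroblock
    upper, lower = rows[:8], rows[8:]
    return [
        [row[:8] for row in upper],
        [row[8:] for row in upper],
        [row[:8] for row in lower],
        [row[8:] for row in lower],
    ]


def vertical_activity(block):
    """Sum of absolute differences between vertically adjacent samples."""
    return sum(
        abs(a - b)
        for upper_row, lower_row in zip(block, block[1:])
        for a, b in zip(upper_row, lower_row)
    )


if __name__ == "__main__":
    # Two fields that differ strongly, as happens with fast motion:
    # top-field lines are bright, bottom-field lines are dark.
    mb = [[100] * 16 if y % 2 == 0 else [0] * 16 for y in range(16)]
    print([vertical_activity(b) for b in luma_blocks(mb, field_dct=False)])
    print([vertical_activity(b) for b in luma_blocks(mb, field_dct=True)])
```

In this synthetic case the frame-organized blocks alternate between bright and dark lines, while the field-organized blocks are flat, which is the situation in which field DCT is the better choice.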
MPEG-2 Scalability
Data partition
  All headers, MVs, and the first few DCT coefficients go in the high-priority base layer.
  The rest of the DCT coefficients go in the low-priority enhancement layer(s).
  Can be implemented at the bit stream level.
  Simple.
SNR scalability
  Base layer includes coarsely quantized DCT coefficients.
  Enhancement layer further quantizes the base layer quantization error.
  Relatively simple.
Spatial scalability
  Multiple complete encoders, complex (skip).
Temporal scalability
  Base layer uses a lower frame rate.
  Enhancement layer uses a higher frame rate.
  Simple.
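The SNR scalability idea can be sketched in a few lines: the base layer quantizes the DCT coefficients coarsely, and the enhancement layer quantizes what the base layer got wrong with a finer step. This is an illustrative sketch using plain uniform quantizers; the function names and step sizes are assumptions, not the MPEG-2 quantizer.

```python
def quantize(values, step):
    """Uniform quantization to integer levels."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Reconstruct values from integer levels."""
    return [level * step for level in levels]


def snr_scalable_encode(coeffs, base_step, enh_step):
    """Return (base_levels, enhancement_levels) for a list of coefficients."""
    base_levels = quantize(coeffs, base_step)
    base_recon = dequantize(base_levels, base_step)
    # The enhancement layer codes the base layer's quantization error.
    residual = [c - r for c, r in zip(coeffs, base_recon)]
    enh_levels = quantize(residual, enh_step)
    return base_levels, enh_levels


def snr_scalable_decode(base_levels, enh_levels, base_step, enh_step):
    """Decode with the base layer only (enh_levels=None) or with both layers."""
    recon = dequantize(base_levels, base_step)
    if enh_levels is None:
        return recon
    refinement = dequantize(enh_levels, enh_step)
    return [b + e for b, e in zip(recon, refinement)]


if __name__ == "__main__":
    coeffs = [100.0, -37.0, 12.0, 5.0]
    base, enh = snr_scalable_encode(coeffs, base_step=16, enh_step=4)
    print(snr_scalable_decode(base, None, 16, 4))  # base quality
    print(snr_scalable_decode(base, enh, 16, 4))   # enhanced quality
```

A receiver that gets only the base layer still decodes a usable (coarser) picture; one that also gets the enhancement layer reduces the reconstruction error to at most half the finer step size.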
Profiles: sets of coding tools.
Levels: parameter ranges for a given profile.
Main profile at main level (MP@ML) is the most popular, used for digital TV.
Main profile at high level (MP@HL): HDTV.
4:2:2 profile at main level (4:2:2@ML) is used for studio production.
MPEG-4 Overview
Scalability of contents
Error resilience
Coding of both natural and synthetic audio and video
The displayed scene is composed by the receiver based on the desired view angle and the objects of interest.
Object-Based Coding
MPEG-4 assumes the encoder has a segmentation map available, and specifies how to code (actually decode!) shape, motion, and texture.
[Figure: hierarchy of a video object (VO), its video object layers (VOL1, VOL2), and their video object planes (VOP1 to VOP4).]
Bitmap coding
  Run-length coding
  Pel-wise coding using context-based arithmetic coding
  Quadtree coding
Contour coding
  Chain coding
  Fourier descriptors
  Polygon approximation
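Of the contour coding methods listed, chain coding is the easiest to illustrate: the contour is stored as a start point plus one small direction code per step to the next boundary pel. The sketch below uses the common 8-direction (Freeman) convention with image coordinates (x to the right, y downward); the function names and the direction numbering are assumptions for illustration, not the syntax of any standard.

```python
# Direction codes 0..7 as (dx, dy) steps: 0 = right, then turning
# counter-clockwise as seen on screen (y grows downward).
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]
STEP_TO_CODE = {step: code for code, step in enumerate(DIRECTIONS)}


def chain_encode(contour):
    """Encode an ordered, 8-connected list of (x, y) points.

    Returns (start_point, list_of_direction_codes).
    """
    if not contour:
        raise ValueError("empty contour")
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in STEP_TO_CODE:
            raise ValueError(f"points {(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(STEP_TO_CODE[step])
    return contour[0], codes


def chain_decode(start, codes):
    """Rebuild the list of contour points from a start point and codes."""
    x, y = start
    points = [(x, y)]
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]
    start, codes = chain_encode(square)
    print(start, codes)
    assert chain_decode(start, codes) == square
```

Each step costs at most 3 bits before entropy coding, compared with a full coordinate pair per point, and long straight runs produce repeated codes that compress well.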
In boundary blocks, the binary alpha map (binary alpha block) is coded using context-based arithmetic coding.
  Intra mode: context pels come from within the same frame.
  Inter mode: context pels also include pels from the previous frame, displaced by the MV.
  The shape MV is separate from the texture MV.
  The shape MV is predictively coded using the texture MV.
Grayscale alpha maps are coded using DCT.
Texture in boundary blocks is coded using either
  padding followed by conventional DCT, or
  shape-adaptive DCT.
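The core of context-based arithmetic coding is that each binary alpha pel is coded with a probability chosen by the values of already-coded neighbouring pels (the context). The sketch below illustrates the intra-mode idea only. Several things in it are assumptions: the 10-pel causal template is illustrative rather than copied from the standard, pels outside the block are treated as 0, and the per-context probabilities are estimated adaptively from counts, which is a stand-in for the probability tables an actual codec would use. Instead of running an arithmetic coder, it reports the ideal code length in bits.

```python
import math
from collections import defaultdict

# Causal neighbour offsets (dx, dy) relative to the current pel:
# two pels to the left, five in the row above, three in the row above that.
# Assumption: an illustrative 10-pel template, giving 2**10 contexts.
TEMPLATE = [
    (-1, 0), (-2, 0),
    (2, -1), (1, -1), (0, -1), (-1, -1), (-2, -1),
    (1, -2), (0, -2), (-1, -2),
]


def context_index(mask, x, y):
    """Pack the template neighbours of pel (x, y) into an integer context."""
    height, width = len(mask), len(mask[0])
    context = 0
    for bit, (dx, dy) in enumerate(TEMPLATE):
        xx, yy = x + dx, y + dy
        inside = 0 <= xx < width and 0 <= yy < height
        value = mask[yy][xx] if inside else 0  # assumption: outside pels are 0
        context |= value << bit
    return context


def adaptive_code_length(mask):
    """Ideal code length in bits under an adaptive per-context binary model."""
    counts = defaultdict(lambda: [1, 1])  # smoothed counts of 0s and 1s
    bits = 0.0
    for y, row in enumerate(mask):
        for x, pel in enumerate(row):
            c = counts[context_index(mask, x, y)]
            bits -= math.log2(c[pel] / (c[0] + c[1]))
            c[pel] += 1  # update the model after coding the pel
    return bits


if __name__ == "__main__":
    # A 16x16 binary alpha block: object on the left half, background on the right.
    block = [[1] * 8 + [0] * 8 for _ in range(16)]
    print(f"{adaptive_code_length(block):.1f} bits versus 256 bits uncoded")
```

Because the context predicts most pels almost perfectly for a smooth boundary like this one, the ideal code length is far below one bit per pel.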
MPEG-4 defines a still texture coding method for an intra frame, a sprite, or the texture map of a mesh object.
It uses a wavelet-based coding method.
Mesh Animation
An object can be described by an initial mesh and the MVs of its nodes in the following frames.
MPEG-4 defines the coding of mesh geometry, but not mesh generation.
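One way to see why node MVs are enough to animate a mesh: inside each triangle, the motion of any interior point can be interpolated from the motion of the three nodes, which is equivalent to an affine warp of that triangle. The sketch below uses barycentric weights for the interpolation; the function names are hypothetical and this is an illustration of the principle, not the decoding procedure defined by MPEG-4.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (x, y), (xa, ya), (xb, yb), (xc, yc) = p, a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    return wa, wb, 1.0 - wa - wb


def warp_point(p, nodes, node_mvs):
    """Move point p by the node MVs interpolated with barycentric weights.

    nodes: three (x, y) node positions in the current frame.
    node_mvs: three (dx, dy) motion vectors, one per node.
    """
    weights = barycentric(p, *nodes)
    dx = sum(w * mv[0] for w, mv in zip(weights, node_mvs))
    dy = sum(w * mv[1] for w, mv in zip(weights, node_mvs))
    return (p[0] + dx, p[1] + dy)


if __name__ == "__main__":
    nodes = [(0, 0), (4, 0), (0, 4)]
    mvs = [(2, 0), (0, 4), (4, 4)]
    print(warp_point((1, 1), nodes, mvs))  # an interior point
    print(warp_point((4, 0), nodes, mvs))  # a node moves by its own MV
```

A node is moved exactly by its own MV, and interior points move by a weighted mix of the three node MVs, so neighbouring triangles that share an edge stay connected after the warp.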
MPEG-4 defines a default 3-D body model (including its geometry and possible motion) through body definition parameters (BDP).
The body can be animated using body animation parameters (BAP).
Similarly, face definition parameters (FDP) and face animation parameters (FAP) are specified for a face model and its animation.
Face Animation
Face player
Shin, M.C., Goldgof, D., Kim, C., Jialin Zhong, Dongbai Guo (Univ. of South Florida, Tampa, FL), "Customizable MPEG-4 face player using real-time 2D image sequence," in Fifth IEEE Workshop on Applications of Computer Vision, Palm Springs, CA, USA, 12/04/2000 - 12/06/2000, pp. 91-97. ISBN: 0-7695-0813-8. Digital Object Identifier: 10.1109/WACV.2000.895408.
Abstract: This paper presents a framework for a customizable MPEG-4 face player using a FAPs (Facial Animation Parameters) sequence recovered from a real-time image sequence. First, the 3D nonrigid motion and structure of the facial features is recovered from a 2D image sequence and a person-specific model of the face. The model consists of the intensity and range image of the face. Then, the FAPs are computed from the recovered 3D structure. The customizable MPEG-4 face animation is generated using a FAP sequence and a model of a specific person. The dataset of four face image sequences from three different face orientations equipped with range images are used. The GT (ground truth) results are generated using the 3D structure provided by range images. The results are evaluated quantitatively by comparing recovered FAP values, and qualitatively by comparing the generated MPEG-4 animations. FAPs are recovered up to 8% (relative) and 16 FAP units (absolute) accuracy and the animation is nearly identical to the MPEG-4 animation using GT FAPs.
MPEG-4 Profiles
Elaborate structure of profiles: see pages 452-453 for some sample profiles.
MPEG-7 Overview
MPEG-1/2/4 make content available, whereas MPEG-7 allows you to find the content you need!
Enables multimedia document indexing, browsing, and retrieval.
Defines the syntax for the metadata (e.g. index and summary) attached to the document.
Generation of the index and summary is not part of the standard!
Color
  Histogram, dominant color, etc.
Texture
  Homogeneity: energy in different orientation and frequency bands (Gabor transform)
  Coarseness, directionality, regularity
  Edge orientation histogram
Motion
  Camera motion
  Motion trajectory of feature points in a non-rigid object
  Motion parameters of a rigid object
  Motion activity
Shape
  Boundary-based vs. region-based
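As an illustration of how a descriptor of this kind supports retrieval, the sketch below builds a simple color histogram and compares two histograms by intersection. It is a generic example, not the normative MPEG-7 color descriptor: the uniform RGB quantization, the bin count, the similarity measure, and the function names are all assumptions chosen for brevity.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over a uniformly quantized RGB cube.

    pixels: iterable of (r, g, b) tuples with 8-bit components.
    Returns a list of bins_per_channel**3 values that sum to 1.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels")
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        ri, gi, bi = r * n // 256, g * n // 256, b * n // 256
        hist[(ri * n + gi) * n + bi] += 1
    return [count / len(pixels) for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    red_image = [(255, 0, 0)] * 4
    mixed_image = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
    blue_image = [(0, 0, 255)] * 4
    query = color_histogram(red_image)
    for name, image in [("mixed", mixed_image), ("blue", blue_image)]:
        score = histogram_intersection(query, color_histogram(image))
        print(name, score)
```

A retrieval system would store one such descriptor per document and rank documents by similarity to the query descriptor, without decoding or re-analyzing the content at search time.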
MPEG-21
The MPEG-21 standard (ISO/IEC 21000), from the Moving Picture Experts Group, aims at defining an open framework for multimedia applications.
Specifically, MPEG-21 defines a "Rights Expression Language" standard as a means of sharing digital rights, permissions, and restrictions for digital content from content creator to content consumer.
As an XML-based standard, MPEG-21 is designed to communicate machine-readable license information, and to do so in a "ubiquitous, unambiguous and secure" manner.
The industry hopes that this standard will put an end to illicit file sharing. Among its aspirations is that it will constitute "a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain."
MPEG-21 is based on two essential concepts: the definition of a fundamental unit of distribution and transaction, the Digital Item, and the concept of users interacting with Digital Items.
At its most basic level, MPEG-21 provides a framework in which one user interacts with another, and the object of that interaction is a Digital Item.
Summary
H.261
  First video coding standard, targeted at video conferencing over ISDN.
  Uses a block-based hybrid coding framework with integer-pel MC.
H.263
  Improved quality at lower bit rates, to enable video conferencing/telephony below 54 kbps (modems or Internet access, desktop conferencing).
  Half-pel MC and other improvements.
MPEG-1 video
  Video on CD and video on the Internet (good quality at 1.5 Mbps).
  Half-pel MC and bidirectional MC.
MPEG-2 video
  TV/HDTV/DVD (4-15 Mbps).
  Extended from MPEG-1, considering interlaced video.
MPEG-4
  To enable object manipulation and scene composition at the decoder -> interactive TV/virtual reality.
  Object-based video coding: shape coding.
  Coding of synthetic video and audio: animation.
MPEG-7
  To enable search and browsing of multimedia documents.
  Defines the syntax for describing the structural and conceptual content.
Newer standards
  H.264: improved coding efficiency (by having more options for optimization).
  MPEG-21: beyond MPEG-7, considering intellectual property protection, etc.