
INTRODUCTION TO VIDEO PROCESSING

Dr. P. MUKILAN
VIDEO SIGNAL

• A video signal is a sequence of time-varying images.

• A still image is a spatial distribution of intensities that remains constant with time, whereas a time-varying image has
a spatial intensity distribution that varies with time.

• A video signal is treated as a series of images called frames.

• An illusion of continuous video is obtained by displaying frames in rapid succession; the number of frames displayed
per second is termed the frame rate.
• Despite advances in digital video technology, the most common consumer display mechanism for video still uses
analogue display devices such as the CRT.

• Until all terrestrial and satellite broadcasts become digital, analogue video formats will remain significant.

• The three principal Analogue Video Signal formats are: NTSC (National Television Systems Committee), PAL (Phase
Alternate Line) and SECAM (Sequential Color with Memory).

• All three are television video formats in which the picture information, captured by a CCD or a camera tube, is
scanned from left to right to create a sequential intensity signal.

• The formats take advantage of the persistence of human vision by using an interlaced scanning pattern, in which the odd
and even lines of each picture are read out in two separate scans of the odd and even fields respectively.

• This allows good reproduction of movement in the scene at the relatively low field rate of 50 fields/sec for PAL and
SECAM and 60 fields/sec for NTSC.
Progressive and Interlaced Scan Pattern
• Progressive scan patterns are used for high-resolution displays such as computer CRT monitors and
digital cinema projection.

• In progressive scan, each frame of picture information is scanned completely to create the
video signal.

• In interlaced scan pattern, the odd and even lines of each picture are read out in two
separate scans of the odd and even fields respectively.

• This allows good reproduction of movement in the scene at relatively low field rate.

• The progressive and interlaced scan patterns are shown in figure 1.
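As a minimal sketch of the interlaced pattern described above, the Python below splits a progressive frame into its odd and even fields and weaves them back together. It assumes a frame is modelled as a list of pixel rows with row 0 as scan line 1; the function names are illustrative only.

```python
def split_fields(frame):
    """Split a frame (list of rows) into its odd and even fields.

    Row index 0 is scan line 1, so the odd field holds lines 1, 3, 5, ...
    and the even field holds lines 2, 4, 6, ...
    """
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field


def weave(odd_field, even_field):
    """Interleave two fields back into a full frame (simple weave de-interlacing)."""
    frame = []
    for i in range(len(odd_field)):
        frame.append(odd_field[i])
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


# A toy 6-line frame in which every pixel on scan line n has value n.
frame = [[n] * 4 for n in range(1, 7)]
odd, even = split_fields(frame)
restored = weave(odd, even)
```

Each field carries only half the lines, so a PAL signal at 50 fields/s delivers 25 complete frames per second.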


Digital Video
• In a digital video, the picture information is digitized both spatially and temporally and the resultant pixel
intensities are quantized.

• The block diagram depicting the process of obtaining digital video from continuous natural scene is shown in
figure 2.

• The demand for digital video is increasing in areas such as video teleconferencing, multimedia authoring
systems, education, and video-on-demand systems.
Spatial Sampling

• The sensitivity of Human Visual System (HVS) varies according to the spatial frequency of an image.

• In the digital representation of the image, the value of each pixel needs to be quantized using some finite precision.
In practice, 8 bits are used per luminance sample.
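To make the finite-precision step concrete, here is a minimal sketch that assumes each sampled intensity has already been normalized to the range 0.0 to 1.0 (an assumption of this example, not part of any standard). It maps each value to one of the 256 levels available with 8 bits.

```python
def quantize_sample(intensity, bits=8):
    """Map a normalized intensity in [0.0, 1.0] to an integer code with `bits` bits."""
    max_code = (1 << bits) - 1            # 255 when bits == 8
    clamped = min(max(intensity, 0.0), 1.0)
    return int(clamped * max_code + 0.5)  # round half up to the nearest level


codes = [quantize_sample(v) for v in (0.0, 0.25, 0.5, 1.0)]  # [0, 64, 128, 255]
```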

Temporal Sampling
• A video consists of a sequence of images, displayed in rapid succession, to give an illusion of continuous motion.

• If the time gap between successive frames is too large, the viewer will observe jerky motion.

• The sensitivity of HVS drops off significantly at high frame rates.

• In practice, most video formats use temporal sampling rates of 24 frames per second and above.
Video Formats
• Digital video consists of video frames that are displayed at a prescribed frame rate.

• A frame rate of 30 frames/sec is used in NTSC video.

• The frame format specifies the size of individual frames in terms of pixels.

• The Common Intermediate Format (CIF) has 352 x 288 pixels, and the Quarter CIF (QCIF) format has 176 x 144
pixels.

• Some of the commonly used video formats are given in table 1.

• Each pixel is represented by three components: the luminance component Y, and the two chrominance components
Cb and Cr.
Frame Type
• Three types of video frames are I-frame, P-frame and B-frame.

• ‘I’ stands for Intra coded frame, ‘P’ stands for Predictive frame and ‘B’ stands for Bidirectional predictive frame.

• ‘I’ frames are encoded without any motion compensation and are used as a reference for future predicted ‘P’ and ‘B’
type frames.

• ‘I’ frames however require a relatively large number of bits for encoding.

• ‘P’ frames are encoded using motion-compensated prediction from a reference frame, which can be either an ‘I’ or a ‘P’
frame.

• ‘P’ frames are more efficient in terms of number of bits required compared to ‘I’ frames, but still require more bits than
‘B’ frames.

• ‘B’ frames require the fewest bits of the three frame types but incur higher computational complexity.
• Frames between two successive ‘I’ frames, including the leading ‘I’ frame, are collectively called a group of
pictures (GOP).

• The GOP is illustrated in figure 3.

• The illustrated figure has one ‘I’ frame, two ‘P’ frames and six ‘B’ frames.

• Typically, multiple ‘B’ frames are inserted between two consecutive ‘P’ frames or between an ‘I’ and a ‘P’ frame.

• The existence of GOPs facilitates the implementation of features such as random access, fast forward, and fast and
normal reverse playback.
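The GOP pattern, and the reason ‘B’ frames add complexity, can be sketched in Python. The parameters `n` (GOP length) and `m` (spacing between anchor ‘I’/‘P’ frames) are conventional names assumed here, and the sketch assumes an open GOP in which the last ‘B’ frames also reference the next GOP's ‘I’ frame. Because a ‘B’ frame needs its future anchor before it can be decoded, frames are transmitted out of display order.

```python
def gop_display_order(n=9, m=3):
    """Frame types of one GOP in display order.

    n: number of frames in the GOP (distance between successive I frames)
    m: distance between successive anchor (I or P) frames
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


def coding_order(display):
    """Display-order indices rearranged into transmission order.

    Each B frame is sent only after the anchor that follows it. B frames at the
    end of the GOP wait for the next GOP's I frame, shown here as index len(display).
    """
    order, pending_b = [], []
    for idx, frame_type in enumerate(display):
        if frame_type == "B":
            pending_b.append(idx)
        else:
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    if pending_b:
        order.append(len(display))  # next GOP's I frame
        order.extend(pending_b)
    return order


gop = gop_display_order(9, 3)  # 'IBBPBBPBB': one I, two P and six B frames
sent = coding_order(gop)       # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
```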
Video Processing
• Video processing technology has revolutionized the world of multimedia with products such as the Digital Versatile Disc
(DVD), the Digital Satellite System (DSS), high-definition television (HDTV), and digital still and video cameras.

• The different areas of video processing include (i) video compression, (ii) video indexing, (iii) video segmentation
and (iv) video tracking.

Video Indexing

• Video indexing is necessary to facilitate efficient content-based retrieval and browsing of visual information stored in
large multimedia databases.

• To create an efficient index, a set of representative key frames is selected that captures and encapsulates the entire
video content.
Subsampling
• The basic concept of subsampling is to reduce the dimension of the input video (horizontal dimension and / or
vertical dimension) and thus the number of pels to be coded prior to encoding process.

• At the receiver the decoded images are interpolated for display.

• This technique may be considered one of the most elementary compression techniques; it also makes use of
specific physiological characteristics of the human eye and thus removes subjective redundancy contained in the
video data.

• This concept is also used to exploit subjective redundancies contained in chrominance data, since the human eye is
more sensitive to changes in brightness than to changes in chromaticity.

• RGB format is not preferred because R, G, B components are correlated and transmitting R,G,B components
separately is redundant.
• To overcome this, the input image is converted into YUV components (one luminance and two chrominance
components).

• Next, the chrominance components are subsampled relative to luminance component with a Y:U:V ratio specific to
particular applications.

• Subsampling is denoted in the format X:X:X, where the first digit represents the number of luminance samples, used
as a reference and typically “4”.

• The second and third digits are the number of chrominance samples, with respect to the number of Y samples.

• For example, 4:1:1 means that for every four Y samples there is one U sample and one V sample. The 4:4:4 chrominance
format is shown in figure 4.
• The choice of subsampling depends on the application. Figure 5 illustrates the concept of 4:2:2 chrominance
subsampling.
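The effect of each scheme on data volume can be sketched by counting samples per frame. Because the digit notation does not describe 4:2:0 literally (it halves chroma both horizontally and vertically), this sketch uses a table of horizontal and vertical decimation factors instead; the table and function names are illustrative, and 8 bits per sample is assumed.

```python
# Chroma decimation factors (horizontal, vertical) relative to luminance.
CHROMA_FACTORS = {
    "4:4:4": (1, 1),  # no chroma subsampling
    "4:2:2": (2, 1),  # half horizontal chroma resolution
    "4:1:1": (4, 1),  # quarter horizontal chroma resolution
    "4:2:0": (2, 2),  # half horizontal and half vertical chroma resolution
}


def samples_per_frame(width, height, scheme):
    """Return (luma_samples, chroma_samples) for one frame; chroma counts Cb and Cr together."""
    fx, fy = CHROMA_FACTORS[scheme]
    luma = width * height
    chroma = 2 * (width // fx) * (height // fy)
    return luma, chroma


def bits_per_pixel(width, height, scheme, bits_per_sample=8):
    """Average number of bits per pixel with the given subsampling scheme."""
    luma, chroma = samples_per_frame(width, height, scheme)
    return bits_per_sample * (luma + chroma) / luma


for scheme in CHROMA_FACTORS:
    print(scheme, samples_per_frame(352, 288, scheme), bits_per_pixel(352, 288, scheme))
```

For a CIF frame, 4:2:0 and 4:1:1 both halve the total sample count relative to 4:4:4, giving 12 bits per pixel instead of 24.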
Video Compression
• Video compression plays an important role in many digital video applications such as digital libraries, video on
demand, and high definition television.

• A video sequence with a frame size of 176 x 144 pixels at 30 frames per second and 24 bits per pixel would require about
18.25 Mbit/s, making it impractical to transmit over standard telephone lines, where data rates are typically restricted
to 56,000 bits per second.

• This example illustrates the need for video compression.
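The arithmetic behind this example is easy to check. The sketch below multiplies pixels per frame, frames per second and bits per pixel, then compares the result with a 56 kbit/s line.

```python
def raw_bit_rate(width, height, fps, bits_per_pixel):
    """Uncompressed bit rate in bits per second."""
    return width * height * fps * bits_per_pixel


qcif_rate = raw_bit_rate(176, 144, 30, 24)  # 18,247,680 bit/s, about 18.25 Mbit/s
modem_rate = 56_000                         # bit/s
required_ratio = qcif_rate / modem_rate     # about 326:1
```

In other words, the raw QCIF stream would have to be compressed by a factor of roughly 326 to fit the telephone line.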

• Effective video compression can be achieved by minimizing both spatial and temporal redundancy.

• A video sequence consists of a series of frames. In order to compress the video for efficient storage and transmission,
the temporal redundancy among adjacent frames must be exploited.
• Temporal redundancy implies that adjacent frames are similar whereas spatial redundancy implies that neighboring
pixels are similar.

• Video coding translates video sequences into an efficient bitstream.

• This translation involves the removal of redundant information from video sequence.

• A video sequence contains two kinds of redundancy: spatial and temporal.

• Removal of spatial redundancy is generally termed intraframe coding, and removal of temporal redundancy is
termed interframe coding.

• Video compression algorithms can be broadly classified into two types (i) Lossless video compression and (ii) Lossy
video compression.

• Due to its importance in multimedia applications, most video compression algorithms have centered on lossy
video compression.

• Lossless video compression is important for applications that cannot tolerate any degradation in video quality,
such as video archiving and the compression of medical and satellite video.
Intraframe Coding
• Removing the spatial redundancy within a frame is generally termed intraframe coding.

• The spatial redundancy within a frame is minimized by using a transform.

• The commonly used transform is the Discrete Cosine Transform (DCT).
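A direct, deliberately slow implementation of the orthonormal 2-D DCT-II shows why it suits intraframe coding: the total energy of the block is preserved, and for a smooth block almost all of it lands in a few low-frequency coefficients. This is an illustrative sketch on a synthetic 8 x 8 block; practical codecs use fast algorithms rather than this quadruple loop.

```python
import math


def dct2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


# A smooth 8 x 8 block: a gentle horizontal ramp of pixel values.
block = [[100 + 2 * col for col in range(8)] for _ in range(8)]
coeffs = dct2d(block)

energy_pixels = sum(p * p for row in block for p in row)
energy_coeffs = sum(c * c for row in coeffs for c in row)  # equal, up to rounding
dc_share = coeffs[0][0] ** 2 / energy_coeffs                # over 99% in one coefficient
```

Because every row of this block is identical, all coefficients outside the first row are zero (up to floating-point error), which is the energy compaction that later quantization exploits.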

Interframe Coding
• The temporal redundancy between successive frames is removed by interframe coding.

• Interframe coding exploits the interdependencies of video frames. Interframe coding relies on the fact that adjacent
pictures in a video sequence have high temporal correlation.

• To exploit this temporal correlation, a frame is selected as a reference, and subsequent frames are predicted from the
reference.

• The general block diagram of a video encoder is shown in figure 6. The different blocks are explained below.
Subsampling
• The basic concept of subsampling is to reduce the dimension of the input video (horizontal dimension and / or vertical
dimension) and thus the number of pels to be coded prior to encoding process.

• At the receiver the decoded images are interpolated for display.
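The subsample-then-interpolate idea can be sketched on a single image plane. This sketch assumes a 2:1 reduction in both directions, 2 x 2 averaging as the pre-filter, and pixel replication as the receiver's interpolator; practical systems use better filters, so treat these filter choices as illustrative assumptions.

```python
def downsample_2x(plane):
    """Halve width and height by averaging each 2x2 block (dimensions assumed even)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def upsample_2x(plane):
    """Restore the original size by pixel replication (nearest-neighbour interpolation)."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


plane = [[10, 12, 50, 52],
         [14, 16, 54, 56],
         [90, 90, 20, 20],
         [90, 90, 20, 20]]
small = downsample_2x(plane)     # 4 samples to code instead of 16
restored = upsample_2x(small)    # approximation of the original plane
```

Smooth regions survive the round trip exactly, while fine detail (the 10 to 16 variation in the top-left block) is averaged away, which is the loss the technique accepts.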

Motion Estimation and Compensation


• Motion estimation describes the process of determining the motion between two or more frames in an image sequence.

• Motion compensation refers to the technique of predicting and reconstructing a frame using a given reference frame and
a set of motion parameters.

• Motion compensation can be performed once an estimate of motion is available.

• Motion estimation/compensation is not only used in the field of video compression but also in the field of spatio-temporal
segmentation, scene cut detection, frame rate conversion, de-interlacing, object tracking etc.

• Motion estimation and compensation have traditionally been performed using block-based methods.
Transform Coding
• Transform coding has been widely used to remove redundancy between data samples.
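Once motion vectors are available, motion compensation itself is a simple copy. The sketch below assumes block-based compensation with one vector per square block and vectors that keep every block inside the reference frame; it also forms the residual that the encoder would go on to code. The function names are illustrative.

```python
def motion_compensate(ref, motion_vectors, block_size):
    """Predict a frame from `ref` using one motion vector per block.

    motion_vectors maps a block's top-left corner (bx, by) in the current frame
    to a displacement (dx, dy); the block is copied from ref at (bx + dx, by + dy).
    """
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for (bx, by), (dx, dy) in motion_vectors.items():
        for y in range(block_size):
            for x in range(block_size):
                pred[by + y][bx + x] = ref[by + dy + y][bx + dx + x]
    return pred


def residual(cur, pred):
    """Prediction error that remains to be transform-coded."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur, pred)]


# A 2 x 2 object moves two pixels to the right between the frames.
ref = [[0, 0, 0, 0], [0, 0, 0, 0], [9, 9, 0, 0], [9, 9, 0, 0]]
cur = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9]]
vectors = {(0, 0): (0, 0), (2, 0): (0, 0), (0, 2): (2, 0), (2, 2): (-2, 0)}

pred = motion_compensate(ref, vectors, block_size=2)
err = residual(cur, pred)  # all zeros: the motion is predicted perfectly
```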

• In transform coding, a set of data samples is first linearly transformed into a set of transform coefficients.

• These coefficients are then quantized and entropy coded.

• A proper linear transform can de-correlate the input samples, and hence remove the redundancy.

• Another way to look at this is that a properly chosen transform can concentrate the energy of input samples into a
small number of transform coefficients, so that the resulting coefficients are easier to encode than the original
samples.

• The most commonly used transform for video coding is the discrete cosine transform.
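The quantize-then-entropy-code step can be illustrated with the zigzag scan and run-length representation used in JPEG- and MPEG-style coders. This is a simplified sketch: it assumes a single uniform step size (real standards use quantization matrices and scale factors) and stops at the (run, level) symbols that a variable-length coder would then compress.

```python
def zigzag_indices(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    def key(rc):
        row, col = rc
        diagonal = row + col
        # Odd diagonals are walked with the row increasing, even ones with it decreasing.
        return (diagonal, row if diagonal % 2 else -row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def quantize_block(coeffs, step):
    """Uniform quantization: divide by the step size and round to the nearest integer."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def run_length(levels):
    """Encode a scanned sequence as (zero_run, level) pairs plus an end-of-block marker."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs


coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[1][1] = 80.0, -21.0, 14.0, 30.0

levels = quantize_block(coeffs, step=10)
scanned = [levels[r][c] for r, c in zigzag_indices(8)]
symbols = run_length(scanned)  # [(0, 8), (0, -2), (0, 1), (1, 3), 'EOB']
```

Sixty-four coefficients collapse to five symbols because the transform has already pushed the energy into the low-frequency corner that the zigzag scan visits first.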
• The DCT is a unitary transform, that is, the transformation preserves the energy of the signal.

• Unitary transforms pack a large portion of the energy of the image into relatively few components of the transform
coefficients.

• When the transform is applied to a block of highly correlated pixels, as is the case for a block of an image, the
transform coefficients tend to be uncorrelated.

• Block processing yields good results when the number of bits allocated to encoding the frame is enough to guarantee a
good reconstruction in the decoder.

• However, if the bit budget is limited, as in low data rate applications, blocking artifacts may be evident in the
reconstructed frame.

• This problem can be reduced by performing pre- and post-processing on the sequence.

• However, the visual quality can only be improved to a certain degree, and additional processing requires additional
resources from the encoder and decoder.

• Another approach to solving this problem is to use a non-block-based transform, such as the wavelet transform.


Predictive Coding
• In interframe coding, the temporal redundancy of a video sequence is reduced by using motion estimation and motion
compensation techniques.

• There are two types of frames used in interframe coding: predictive-coded (P) frames, which are coded relative to a
temporally preceding ‘I’ or ‘P’ frame, and bidirectionally predictive-coded (B) frames, which are coded relative to the
nearest previous and/or future ‘I’ or ‘P’ frames.

• The forward motion-compensated prediction and bidirectional motion compensated prediction are illustrated in figure 7
and 8 respectively.

• In forward prediction, one motion vector per macroblock is obtained. For bidirectional prediction, two motion vectors are
found.

• Each motion vector specifies where to retrieve the macroblock from the corresponding reference frame.
• Block-based methods offer the advantage of being fast, easy to implement and fairly effective over a wide range of video content.
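For a bidirectionally predicted macroblock, the two motion vectors fetch one block from the past reference and one from the future reference. A common choice, assumed in this sketch, is to predict with their rounded average; the two blocks are assumed to have already been fetched with those vectors.

```python
def bidirectional_prediction(past_block, future_block):
    """Rounded average of the forward (past) and backward (future) predictions."""
    return [[(p + f + 1) // 2 for p, f in zip(past_row, future_row)]
            for past_row, future_row in zip(past_block, future_block)]


past = [[10, 20], [30, 40]]     # block fetched from the past I/P frame
future = [[12, 20], [31, 44]]   # block fetched from the future I/P frame
prediction = bidirectional_prediction(past, future)  # [[11, 20], [31, 42]]
```

Averaging two predictions tends to cancel noise and handles uncovered background, which is one reason B frames need the fewest bits.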

• Block-based motion estimation is the most practical approach to obtain motion compensated prediction frames.

• It divides frames into equally sized rectangular blocks and, for each block in the current frame, searches a window of
the previous frame for the best-matching block; the displacement of that block is taken as the motion vector.

• Based on block distortion measure or other matching criteria, the displacement of the best matched block will be
described as the motion vector to the block in the current frame.

• The best match is evaluated by a cost function such as Mean Square Error (MSE), Mean Absolute Error (MAE), or
Sum of Absolute Differences (SAD).
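A full (exhaustive) search block matcher using SAD as the cost function is the simplest correct instance of the block-based approach described above. The block size of 8 and search range of ±7 pixels are assumptions chosen for illustration; the sketch ignores sub-pixel accuracy and the fast search strategies practical encoders use.

```python
import random


def sad(cur, ref, bx, by, dx, dy, block_size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(block_size):
        for x in range(block_size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, block_size=8, search_range=7):
    """Test every displacement within +/- search_range; return (motion_vector, cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = by + dy, bx + dx
            # Skip candidate blocks that would fall outside the reference frame.
            if top < 0 or left < 0 or top + block_size > height or left + block_size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, block_size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic test: the current frame is the reference shifted so that
# cur[y][x] == ref[y + 2][x - 3] wherever both positions exist.
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [[ref[y + 2][x - 3] if y + 2 < 32 and x - 3 >= 0 else 0 for x in range(32)]
       for y in range(32)]

mv, cost = full_search(cur, ref, bx=8, by=8)  # mv == (-3, 2), cost == 0
```

Full search always finds the minimum-SAD block within the window, but it evaluates (2p + 1)² candidates per block, which is why faster search patterns are widely used in practice.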
Motion Estimation Algorithm
• Compression techniques which reduce temporal redundancies are referred to as interframe techniques while those
reducing spatial redundancies are referred to as intraframe techniques.

• Motion estimation (ME) algorithms have been applied for the reduction of temporal redundancies.

• ME algorithms were originally developed for applications such as computer vision, image sequence analysis and video
coding.

• They can be categorized in the following main groups: gradient techniques, pel-recursive techniques, block matching
techniques, and frequency-domain techniques.

• Gradient techniques have been developed in the framework of image sequence coding.
Video Compression Standards
• Video coding standards define the bitstream syntax, the language that the encoder and the decoder use to
communicate.

• Besides defining the bitstream syntax, video coding standards are also required to be efficient, in that they should
support good compression algorithms as well as allow the efficient implementation of the encoder and decoder.

• Standardization of video compression has become a high priority because only a standard can reduce the
high cost of video compression codecs and resolve the critical problem of interoperability between equipment from
different vendors.

• Standardization of compression algorithms for video was first initiated by the CCITT (now ITU-T) for teleconferencing and
videotelephony.
H.120:
• H.120 is the first international digital video coding standard.

• H.120 was developed by the CCITT, the predecessor of the ITU-T organization.

• ITU-T stands for the International Telecommunication Union Telecommunication Standardization Sector.

• H.120 got its approval in 1984. In 1988, a second version of H.120 added motion compensation
and background prediction.
H.261:

•H.261 was approved by ITU-T in early 1991.

•It was later revised in 1993 to include backward-compatible high resolution graphics transfer mode.

• It is a coding standard targeted at video conference and video telephone applications operating at bit rates between
64 kbit/s and 2 Mbit/s.

•This bit rate was chosen because of the availability of ISDN (Integrated Services Digital Network) transmission lines that
could be allocated in multiples of 64 Kbit/s.

•The colour space used by H.261 is YCbCr with 4:2:0 chrominance subsampling.
H.263:

• The H.263 standard is intended for video telecommunication.

• It was approved in early 1996. Its key features included variable block-size motion compensation and overlapped
block motion compensation.

• H.263 can achieve better video quality at 18-24 kbit/s than H.261 at 64 kbit/s, enabling video telephony over regular
phone lines or wireless modems.

• The H.263 standard supports five resolutions: SQCIF, QCIF, CIF, 4CIF and 16CIF.
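For reference, the luminance dimensions of the five formats can be tabulated and checked in a few lines; 4CIF and 16CIF have four and sixteen times the CIF picture area, and QCIF a quarter of it. The dictionary and helper names are illustrative.

```python
# Luminance picture sizes (width, height) of the H.263 picture formats.
H263_FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def luma_samples(fmt):
    """Number of luminance samples in one picture of the given format."""
    width, height = H263_FORMATS[fmt]
    return width * height


for fmt in H263_FORMATS:
    print(fmt, H263_FORMATS[fmt], luma_samples(fmt))
```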

H.263+:

•H.263+ standard offers a high degree of error resilience for wireless or packet-based transport networks.

•It was approved by ITU-T in 1998.


MPEG-1:

• The main objective of the MPEG-1 standard was to compress 4:2:0 CIF digital video sequences to a target bit rate of
1.5 Mbit/s.

•The standard defined a generic decoder but left the implementation of the encoder open to the individual design.

• MPEG-1 was designed for non-interlaced (progressive) video sources, which are common in computer displays.

• Although it can be used with interlaced video streams such as television signals, its compression efficiency on such
material is lower than that of other techniques because of its non-interlaced, frame-based processing.

MPEG-2:

•MPEG-2 forms the heart of broadcast quality digital television for both standard definition and high definition television.

• MPEG-2 incorporates various features from H.261 and MPEG-1.

• MPEG-2 can be seen as a superset of MPEG-1, and it was designed to be backward compatible with MPEG-1. MPEG-2
supports various modes of scalability, including spatial, temporal, and SNR scalability.
MPEG-4:

•MPEG-4 became an international standard in 1998. MPEG-4 is designed to address the requirement of the interactive
multimedia applications, while simultaneously supporting traditional applications.

•Bit rates targeted for MPEG-4 video standard range between 5-64 Kbits/s for mobile or PSTN (Public Switched Telephone
Network) video applications and up to 2 Mbit/s for TV/Film applications so that this standard supersedes MPEG-1 and
MPEG-2 for most applications.

•Video object coding is one of the most important features introduced by MPEG-4.

• By compressing an arbitrarily shaped video object rather than a rectangular frame, MPEG-4 makes it possible to
manipulate and interact with objects after they are created and compressed.

•The compression of an arbitrarily shaped video object includes the compression of its shape, motion and texture.
