2010 6th International Conference on Emerging Technologies (ICET)
An Implementation of MPEG in Multi-Terminal
Video Coding
Saima Shaheen, M. Younas Javed
Department of Computer Engineering
College of Electrical & Mechanical Engineering, NUST
Islamabad, Pakistan
Abstract—In a multi-camera setup, inter-sensor statistical redundancy must be eliminated to achieve high compression efficiency. This, in turn, saves the bandwidth required for the information coming out of a camera sensor network. In this research work, the Moving Picture Expert Group (MPEG) video compression standard has been extended to eliminate inter-sensor statistical redundancy in a multi-camera setup. To formulate the results from all aspects, two strategies have been designed for inter-sensor communication. Moreover, the system is designed to work under two working modes (named Scenario-A and Scenario-B). Results presented in graphical form show the system performance for both communication strategies under both working modes. Overall, higher compression gains are obtained at lower bitrates.

Keywords- Multi-Camera Setup; Multi-Terminal Video Coding; Distributed Source Coding; Group of Pictures; Moving Picture Expert Group

I. INTRODUCTION
Video compression is a pressing need in today's world of multimedia computing. In a camera sensor network, the huge volume of transmission data coming out of all the camera nodes requires a significant amount of channel bandwidth. Usually, standard video compression algorithms are deployed on each video sensor independently; they eliminate temporal as well as spatial redundancies within each video sequence. But there is still great repetition in the resultant transmission data due to the inter-sequence statistical redundancy. Eliminating this type of correlation, which exists among the outputs of all the camera nodes, improves the compression efficiency to a great extent and achieves maximum bandwidth saving. Since camera sensor networks are power and energy constrained, this compression efficiency should be achieved while keeping the communication among the camera nodes at a minimum; otherwise, the bandwidth saved through compression will be consumed by inter-sensor communication, and the net effect on bandwidth saving will be zero. To address these and other requirement issues of Multi-Terminal Video Coding (MTVC), the concept of Distributed Source Coding (DSC) has been deployed in a number of ways. In a DSC encoder-decoder deployment, the encoders are designed to be simple and fast, while the decoder is comparatively complex and strong enough to perform fast and efficient joint decoding.
This encoder-decoder complexity allocation is the essence of the DSC concept. Addressing all these requirement issues of MTVC together with the DSC concept, a system has been designed to achieve higher compression gain at lower bitrates. The proposed system is an extension of MPEG, a standard video codec, to a multi-terminal environment. In the proposed multi-camera setup, the camera nodes are allowed to communicate under two different strategies: strategy-1 allows minimum communication among the sensor nodes, while strategy-2 allows the sensor nodes to communicate somewhat more. Moreover, the system is designed to work under two working modes (named Scenario-A and Scenario-B), which differ in the design of their basic video compression steps. Scenario-A takes more processing time but gives larger PSNR values (better-quality video frames), while Scenario-B can be deployed in situations where a quick system response is required and degraded video quality can be tolerated. Both working modes are thus suitable for deployment in environments with different sets of requirements. An appropriate system architecture has been designed to achieve the following performance objectives: (a) attain the highest compression gain; (b) at the same time, obtain the highest-quality reconstructed videos at the receiver end; and (c) achieve all of this while keeping the communication among the camera nodes at a minimum. Critical analysis of the results shows that higher system performance gains are achieved at lower bitrates.

II. PREVIOUS WORK
A considerable amount of research has been done in the multi-terminal video compression domain, and different issues of multi-terminal video compression have been addressed by different techniques using the DSC approach. In some techniques (e.g., [1], [2], [3]), the DSC concept has only been deployed to exploit temporal correlation within a single video sequence at each encoder independently; the resulting encoder has low complexity and better error resilience is achieved. Another approach [4] utilizes an image resolution technique to exploit inter-sensor statistical redundancy: it encodes the images at low resolution and decodes them using super-resolution techniques, but the correlation between the low-resolution images is not exploited using DSC. Therefore, the higher coding gains that can be obtained by applying multi-terminal source coding theory [5], [6], [7] are not achieved.
Another research work [8], [9] deploys a distributed image coding technique for a multi-terminal setup. It imposes certain restrictions on the camera and object settings, e.g., the cameras are located along a horizontal line and the objects are assumed to lie at a certain known range or distance from the cameras, and it defines a lower bound on the minimum number of cameras needed to regenerate a video scene. Another similar approach [10] makes certain assumptions about the sensor locations to obtain the correlation among the various video sequences. Other approaches, such as [11], require complex computation, like depth estimation, at the decoder; this is done to obtain the correspondence between the two camera views under consideration. A Wyner-Ziv coding based approach [12] has also been proposed in which the encoder is designed to be simple and fast while the decoder is complex; it is a good example for image-based rendering applications, but the geometrical relationship among the camera positions is not taken into account. Likewise, inter-sensor redundancy is exploited by deploying the idea of feedback [13], where the central decoder is allowed to provide feedback to one of the encoders. The most recent work in this domain focuses on the concept of MTVC [14], [15]. In the first approach [14], the correspondence between the two camera views is obtained by applying the epipolar geometry concept. The second approach [15] uses a model-based method to obtain the corresponding points of the two camera views. In the most recent approach [16], epipolar geometry is again exploited to obtain corresponding points; an MTVC framework is proposed, but for two cameras only, although system performance gains at low bitrates are shown. Another important contribution is the simulation of epipolar geometry [17], which can be used to pick epipolar lines and other related parameters for pinhole cameras.
III. MOVING PICTURE EXPERT GROUP (MPEG)

MPEG is an ISO/IEC working group, established in 1988 to develop standards for digital audio and video formats. There are a number of MPEG standards in use or under development, each designed for a specific application and bitrate. The algorithm employed by MPEG-1 does not provide a lossless coding scheme; however, the standard supports a variety of input formats and can be applied to a wide range of applications. The main purpose of MPEG-1 video is to code moving image sequences or video signals. To achieve a high compression ratio, both interframe and intraframe redundancies should be exploited. MPEG is based on transform coding of motion-compensated video frames: Discrete Cosine Transform (DCT) coding is used to remove the intraframe redundancy, and motion compensation is used to remove the interframe redundancy. Intraframe coding is also implemented to satisfy the requirement of random access (for I frames). In MPEG coding, the video sequence is first divided into Groups of Pictures (GOP); each GOP includes three types of pictures: intra-coded (I), predictive-coded (P) and bidirectional predictive-coded (B) pictures. The GOP size is chosen dynamically.

MPEG coding is thus based on intraframe and interframe coding. Interframe compression uses one or more earlier or later frames in a sequence to compress the current frame, while intraframe compression uses only the current frame, which is the approach used in image compression. Interframe predictive coding is used to eliminate the large amount of temporal and spatial redundancy that exists in video sequences and helps to compress them through motion estimation. The MPEG encoding steps include discrete cosine transformation, quantization, zigzag scan, run-length encoding and Huffman-type entropy encoding. The MPEG decoding steps are simply the inverse of the encoding steps and are carried out in exactly the reverse order: inverse discrete cosine transformation, de-quantization, inverse zigzag scan, run-length decoding and Huffman-type entropy decoding. In [18], important concepts and algorithms regarding image and video compression are assembled in an accessible and comprehensive way.
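To make the intraframe part of this pipeline concrete, the following minimal sketch applies an 8x8 block DCT, uniform quantization, a zigzag scan and a simple run-length code to one luminance block. It is an illustrative sketch only, not the codec used in this work; the flat quantization step and the toy block values are assumptions, and NumPy/SciPy are assumed to be available.

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def zigzag_order(n=8):
    """(row, col) pairs of an n x n block in zigzag (anti-diagonal) order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def encode_block(block, q_step=16):
    """DCT, quantize, zigzag-scan and run-length code one 8x8 block."""
    coeffs = np.round(dct2(block.astype(np.float64) - 128.0) / q_step).astype(int)
    scan = [coeffs[r, c] for r, c in zigzag_order()]
    rle, run = [], 0
    for v in scan:                      # collect (zero-run, value) pairs
        if v == 0:
            run += 1
        else:
            rle.append((run, v))
            run = 0
    rle.append((0, 0))                  # end-of-block marker
    return rle

if __name__ == "__main__":
    block = (np.arange(64).reshape(8, 8) % 32) + 100   # toy luminance block
    print(encode_block(block))
```

In the actual standard, standardized quantization matrices and Huffman tables are used instead of the flat step and ad hoc run-length code shown here; the sketch only illustrates the order of the operations.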
IV. PROPOSED SYSTEM DESIGN

This research work is based on a lossy compression algorithm that encapsulates an extension of MPEG for a camera sensor network. The extension is used to eliminate inter-sensor statistical redundancy, while the spatial and temporal redundancies within each video sequence are removed by transform coding of motion-compensated video frames at each camera node. The camera sensor nodes are allowed to communicate at different levels in the two proposed strategies, named strategy-1 and strategy-2 (see Fig. 1 and 2); the sensor nodes communicate in order to perform motion estimation at the encoder and decoder ends.

Figure 1. Inter-sensor communication strategy-1
Figure 2. Inter-sensor communication strategy-2
Figure 3. Proposed scheme architecture

A. System Architecture
To attain all the performance objectives, the system architecture comprises the following components (see Fig. 3); a sketch of the resulting data flow is given after the list:
• Random Selector: This module randomly picks any sensor node within the camera sensor network to initiate the proposed algorithm. The encoding process then starts in that sensor node, following the standard MPEG encoding steps. Finally, this node transmits its I-frame as a reference frame to its nearest neighbor.
• Encoder Layer: The encoder layer comprises a number of encoders, one per camera node. Each encoder has its own encoding algorithm implemented on its camera node independently. Throughout the encoder layer, reference I-frames are transmitted from one node to the next. Finally, the compressed bit streams are sent to the channel for transmission.
• Joint Decoder: All encoders send their compressed findings to a central decoder, where the standard MPEG decoding process takes place, but in a shared environment; the frames of the video sequences are the output.
• Post-processor: The frames of each video sequence are finally fed to the post-processor, which is logically designed to reconstruct the individual video sequences.
• Analyzer: This module runs in parallel to the system algorithm and is specialized for capturing the values of the various compression parameters that are used for result analysis.
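The following sketch shows, under stated assumptions, how data could flow through these components for strategy-1. The node methods (intra_code, motion_compensate, bitstream) and the joint_decoder.decode call are hypothetical placeholders introduced only for illustration; they are not an API defined by the proposed system.

```python
import random

def run_session(camera_nodes, joint_decoder):
    """Illustrative strategy-1 data flow: a randomly selected node intra-codes
    its reference I-frame, the I-frame is forwarded through the encoder layer,
    and the joint decoder decodes all resulting bit streams together."""
    # Random Selector: pick the node that initiates the algorithm.
    start = random.randrange(len(camera_nodes))
    reference_iframe = camera_nodes[start].intra_code()    # hypothetical method

    # Encoder Layer: each node encodes its GOP against the forwarded I-frame.
    streams = []
    for node in camera_nodes[start:] + camera_nodes[:start]:
        node.motion_compensate(reference_iframe)           # hypothetical method
        streams.append(node.bitstream())                   # hypothetical method

    # Joint Decoder (followed by the Post-processor): decode all streams in a
    # shared environment and return the reconstructed per-camera sequences.
    return joint_decoder.decode(streams)
```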
B. Algorithmic Steps
The proposed algorithm can be deployed in a multi-camera setup comprising a number of cameras arbitrarily located in space. These camera nodes capture the same video scene from different angles, depending upon their locations. The encoding steps are as follows (a sketch of the block matching and motion compensation involved, and of the reference selection in the two strategies, is given after these steps):
a) The algorithm is initiated by randomly picking any camera node from the network, and its GOP is generated. The first frame is picked as the I-reference frame and is intra-coded. This I-reference frame is then used to perform motion compensation for the remaining frames of the GOP; as a result, motion vectors and frame residues are generated. The motion vectors can be generated using any of the block matching algorithms provided in [19].
b) This camera node passes its I-reference frame to its nearest camera node to perform motion compensation for the first frame of that node's GOP. Next, this motion-compensated frame (P21) is used to perform motion compensation for the remaining frames of this GOP. The motion compensation process yields frame residues and the respective motion vectors, as for the first camera node.
c) The above encoding steps are carried out when communication strategy-1 is followed. In strategy-2, only the first frame of the second camera node's GOP is motion compensated using the first camera node's I-reference frame, while the remaining frames of this GOP undergo motion compensation using their neighboring P frames (see Fig. 2).
d) After applying the remaining video compression steps to the frame residues at each camera node, the resultant bit streams are fed to the transmission link. Each camera node is deployed with its own encoding algorithm, while the central decoder is strong and efficient enough to perform joint decoding of the resultant bit stream, where this bit stream is the sum of all the bit streams coming out of the camera sensor network.
At the receiver end, the following decoding steps are applied to the received bitstream by the fast central decoder:
a) The frame residues of each video sequence are reconstructed by applying the MPEG decoding steps, which are the inverse of the encoding steps applied in exactly the reverse order.
b) Next, the transmitted motion vectors and the respective I-reference frame are used to reconstruct a part of the frame similar to the reference frame; this process is called motion compensation. The frame residues generated in the previous step are then added to this frame part to reconstruct the full video frame for each video sequence.
c) After applying identification steps using the bit stream ids, the individual video sequence for each camera node is reconstructed.
Useful reference material on these video compression steps is available on the MathWorks website [21].
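The following minimal sketch illustrates full-search block matching, motion compensation and the per-GOP reference selection described in steps a)-c). The block size, search radius and function names are illustrative assumptions, and the prediction frame stands in for the reconstructed P frame; this is not the exact implementation from [19] or from the proposed system.

```python
import numpy as np

def full_search(ref, cur, block=16, radius=8):
    """Full-search block matching: for each block of `cur`, find the motion
    vector (dy, dx) into `ref` that minimizes the sum of absolute differences."""
    h, w = cur.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = cur[by:by + block, bx:bx + block].astype(np.int32)
            best_sad, best_v = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        cand = ref[y:y + block, x:x + block].astype(np.int32)
                        sad = int(np.abs(target - cand).sum())
                        if best_sad is None or sad < best_sad:
                            best_sad, best_v = sad, (dy, dx)
            vectors[by // block, bx // block] = best_v
    return vectors

def motion_compensate(ref, vectors, block=16):
    """Predict a frame from `ref` using the motion vectors; the residue
    (current frame minus this prediction) is what gets transform coded."""
    pred = np.zeros_like(ref)
    h, w = ref.shape
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            dy, dx = vectors[by // block, bx // block]
            pred[by:by + block, bx:bx + block] = \
                ref[by + dy:by + dy + block, bx + dx:bx + dx + block]
    return pred

def encode_gop(frames, iref, strategy=1):
    """Reference selection for the GOP of a neighboring camera node, as read
    from steps b) and c): the first frame is predicted from the forwarded
    I-reference frame; in strategy-1 the remaining frames are predicted from
    that first motion-compensated frame (P21), while in strategy-2 each frame
    is predicted from its neighboring (previous) P frame."""
    residues, ref = [], iref
    for i, frame in enumerate(frames):
        mv = full_search(ref, frame)
        pred = motion_compensate(ref, mv)
        residues.append(frame.astype(np.int32) - pred.astype(np.int32))
        if i == 0 or strategy == 2:
            ref = pred   # P21 after the first frame; chained P frames in strategy-2
    return residues
```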
V. RESULTS EVALUATION AND DISCUSSION
System performance has been analyzed from several aspects. The reconstructed video quality and the compression efficiency achieved are compared graphically across the proposed communication strategies and the system working modes. Similarly, the values of different compression parameters are analyzed with respect to each other and shown through graphs.
Figure 4. Strategy-1 under working mode-1 results showing movie frames. Above are original frames (each pair from two different views) and below are the decoded frames, with respective compression parameter values.
A. Communication Strategy Comparison
An experiment has been conducted capturing a scene from two different views; the results obtained through strategy-1 under working mode-1 are shown in Fig. 4. Comparing the two communication strategies is a trade-off between camera-node communication and video quality. The results can be analyzed using different performance metrics (see Fig. 5 and 6). These figures show that both communication strategies give the same results up to movie-1, but there is a remarkable difference in parameter values from the beginning of movie-2. Strategy-1 gives better results than strategy-2 in terms of quality (peak signal to noise ratio, PSNR), whereas strategy-2 gives more saving (compression ratio, CR).
B. System Working Modes Comparison
The proposed system is capable of switching between two working modes or scenarios, depending upon the particular situation and its requirements. In both scenarios, system performance can be analyzed on the basis of the compression results obtained, as shown in Fig. 7 and 8. These figures show that working mode-1 gives better-quality reconstructed videos than working mode-2, but working mode-2 is better in terms of the saving ratio achieved.

C. System Performance Against Various Compression Parameters
This aspect of system performance shows the system response against different compression parameters (such as PSNR, distortion (mean squared error) and saving ratio) at various bitrates. Fig. 9 shows that, as the movie frames progress for both strategies, the PSNR values fall as the bitrate increases. Both strategies start from the same point, but with the beginning of the movie-2 frames the two graphs separate: strategy-1 ends with a PSNR of 20 dB at a bitrate of 420 kbps, while strategy-2 ends with a PSNR of 17 dB at 490 kbps. So, for both strategies, the best quality of the decoded video frames is found at the lower bitrates for Scenario-A. The same relationship can be observed for Scenario-B, as shown in Fig. 10. One major difference between the two scenarios is that Scenario-B ends at comparatively low bitrates: for Scenario-B, strategy-1 ends at 150 kbps with a PSNR of 18.3 dB, and strategy-2 ends at 185 kbps with the same PSNR value. Fig. 11 and 12 show the saving ratio obtained at various bitrates for Scenario-A and B respectively. Both scenarios show the same relationship: a prominent decline in the saving ratio as the bitrate increases.
Figure 5. Compression ratio comparison of communication strategies
Figure 6. Peak signal to noise ratio comparison of communication strategies
Figure 7. Compression ratio comparison for Scenario A and B
Figure 9. Peak signal to noise ratio vs. Bitrate (Scenario-A)
For Scenario-A, the saving ratio decreases from 93.5% to 89.5% as the bitrate increases from 265 to 360 kbps for strategy-1, and for strategy-2 this decline is from 93.5% to 88% as the bitrate increases from 265 to 435 kbps. Likewise, for Scenario-B there is a fall in the saving ratio as the bitrate increases for both strategies, but at comparatively lower bitrates, as shown in Fig. 12. According to Fig. 13, more gain is obtained at lower bitrates, and the video frames then start distorting as the bitrate increases. Fig. 14 shows the same relationship between distortion and bitrate for Scenario-B: for both strategies, distortion is low at lower bitrates, then gradually increases and reaches its maximum value near 800. The only difference is that this rise in distortion is reached earlier for strategy-1, near a bitrate of 145 kbps, while for strategy-2 it occurs at 190 kbps.
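For reference, the compression parameters used throughout this section can be computed as in the sketch below, which assumes 8-bit frames and the usual textbook definitions; the paper does not state its exact formulas, so these definitions are assumptions.

```python
import numpy as np

def distortion_mse(original, decoded):
    """Distortion as the mean squared error between two frames."""
    diff = original.astype(np.float64) - decoded.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr_db(original, decoded, peak=255.0):
    """Peak signal to noise ratio in dB (peak = 255 for 8-bit frames)."""
    mse = distortion_mse(original, decoded)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def compression_ratio(raw_bits, coded_bits):
    """Compression ratio CR = uncompressed size / compressed size."""
    return raw_bits / coded_bits

def saving_ratio(raw_bits, coded_bits):
    """Saving ratio as the fraction of bandwidth saved (e.g. 0.935 = 93.5%)."""
    return 1.0 - coded_bits / raw_bits
```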
VI. CONCLUSIONS AND FUTURE WORK

To attain higher compression gains and, consequently, greater bandwidth saving, inter-sensor statistical redundancy should be removed in addition to the spatial and temporal correlation within a single video sequence. Keeping all the requirement issues of a camera sensor network in mind, an algorithm has been designed as an extension of MPEG for a multi-camera setup. The camera nodes are allowed to communicate in two ways, called communication strategy-1 and strategy-2, and for each communication strategy the system is capable of working in two different modes or scenarios. Each working mode has its benefits and can be deployed in environments with different requirements. Results have been computed for both strategies under each working mode. The results, shown in graphical format, demonstrate that the highest system performance gains are obtained at lower bitrates and that there is remarkable distortion in video quality as the bitrate increases. Overall, of the two communication strategies, strategy-1 should be adopted to obtain the correspondence among the video camera frames while keeping the communication among the sensor nodes at a minimum. Strategy-1 with working mode-1 gives good results, so this combination should be exploited in the future with more advanced and sophisticated compression techniques. Additionally, one drawback of working mode-1 is that it takes too much processing time; this should also be addressed in the future by improving the basic video compression steps.
Figure 8. Peak signal to noise ratio comparison of Scenario A and B
Figure 10. Peak signal to noise ratio vs. Bitrate (Scenario-B)
Figure 11. Saving ratio vs. Bitrate (Scenario-A)
Figure 12. Saving ratio vs. Bitrate (Scenario-B)
Figure 13. Distortion vs. Bitrate (Scenario-A)
Figure 14. Distortion vs. Bitrate (Scenario-B)

REFERENCES
[1] Girod, A. Margot, S. Rane, and D. Rebollo-Monedero, "Distributed video coding," Proceedings of the IEEE, vol. 93, no. 1, pp. 71-83, January 2005.
[2] R. Puri and K. Ramchandran, "PRISM: A video coding architecture based on distributed compression principles," submitted to IEEE Transactions on Image Processing.
[3] Sehgal, A. Jagmohan, and N. Ahuja, "Wyner-Ziv Coding of Video: An Error-Resilient Compression Framework," IEEE Transactions on Multimedia, vol. 6, no. 2, pp. 249-258, April 2004.
[4] R. Wagner, R. Nowak, and R. Baranuik, "Distributed Image Compression for sensor networks using correspondence analysis and super resolution," in ICIP 2003.
[5] T. Burger, "Multi-terminal source encoding," in The Information Theory Approach to Communications, G. Longo, Ed., CISM Courses and Lectures 229, Springer, New York, 1978.
[6] Y. Oohama, "Gaussian multi-terminal source coding," IEEE Transactions on Information Theory, vol. 43, no. 6, pp. 1912-1923, November 1997.
[7] Slepian and J. K. Wolf, "Noiseless Coding of Correlated Information Sources," IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471-480, July 1973.
[8] N. Gehrig and P. L. Dragotti, "DIFFERENT: Distributed and Fully Flexible image EncodeRs for camEra sensor NeTworks," in ICIP 2005.
[9] N. Gehrig and P. L. Dragotti, "Distributed Sampling and Compression of Scenes with Finite Rate of Innovation in Camera Sensor Networks," in Data Compression Conference 2006.
[10] X. Zhu, A. Aaron, and B. Girod, "Distributed Compression for large camera arrays," in IEEE Workshop on Statistical Signal Processing, September 2003.
[11] J. C. Dagher, M. W. Marcellin, and M. A. Neifeld, "A Method for Coordinating the Distributed Transmission of Imagery," in IEEE Transactions on Image Processing, pp. 1705-1717, July 2006.
[12] Y. Yang, V. Stankovic, W. Zhao, and Z. Xiong, "Multiterminal video coding," in Information Theory and Applications Workshop, San Diego, CA, January 2007.
[13] M. Flierl and B. Girod, "Coding of multi-view image sequences with video sensors," in IEEE Intl. Conf. on Image Processing, 2006.
[14] B. Song, A. Roy-Chowdhury, and E. Tuncel, "Towards a multi-terminal video compression algorithm using epipolar geometry," in IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, 2006.
[15] B. Song, A. Roy-Chowdhury, and E. Tuncel, "A multi-terminal model-based video compression algorithm," in IEEE Intl. Conf. on Image Processing, 2006.
[16] B. Song, A. Roy-Chowdhury, and E. Tuncel, "Towards a Multi-Terminal Video Compression Algorithm by Integrating Distributed Source Coding with Geometrical Constraints," in Journal of Multimedia, vol. 2, no. 3, June 2007.
[17] Gian Luca Mariottini and Domenico Prattichizzo, "The Epipolar Geometry Toolbox," 1070-9932, 2005.
[18] Yun Q. Shi and Huifang Sun, Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms and Standards.
[19] http://www.mathworks.com/matlabcentral/fileexchange/8761-block-matching-algorithms-for-motion-estimation
[20] http://en.wikipedia.org/wiki/Moving_Picture_Experts_Group
[21] http://www.mathworks.com/matlabcentral/fileexchange/13020