
An Implementation of MPEG in Multi-Terminal Video Coding

2010 6th International Conference on Emerging Technologies (ICET)

In a multi-camera setup, inter-sensor statistical redundancy must be eliminated to achieve higher compression efficiency; this in turn reduces the bandwidth required for the information transmitted by a camera sensor network. In this research work, the Moving Picture Expert Group (MPEG) video compression standard has been extended to eliminate inter-sensor statistical redundancy in a multi-camera setup. To evaluate the results from all aspects, two strategies have been designed for inter-sensor communication. Moreover, the system is designed to work under two working modes, named Scenario-A and Scenario-B. Results presented in graphical form show the system performance for both communication strategies under both working modes. Overall, higher compression gains are obtained at lower bitrates.

Saima Shaheen, M. Younas Javed
Department of Computer Engineering, College of Electrical & Mechanical Engineering, NUST, Islamabad, Pakistan
[email protected], [email protected]

Keywords— Multi-Camera Setup; Multi-Terminal Video Coding; Distributed Source Coding; Group of Pictures; Moving Picture Expert Group

I. INTRODUCTION

Video compression is a primary need of today's world of multimedia computing. In a camera sensor network, the large amount of data transmitted by all the camera nodes requires a significant share of channel bandwidth. Standard video compression algorithms are usually deployed on each video sensor independently; they eliminate the temporal and spatial redundancies within each video sequence. Even so, considerable repetition remains in the transmitted data because of the inter-sequence statistical redundancy among the camera views.

Eliminating this correlation among the outputs of all camera nodes improves compression efficiency considerably and maximizes the bandwidth saving. Since camera sensor networks are power and energy constrained, this compression efficiency must be achieved while keeping the communication among the camera nodes to a minimum; otherwise the bandwidth saved through compression is consumed by inter-sensor communication and the net saving is zero. To address these and other requirement issues of Multi-Terminal Video Coding (MTVC), the concept of Distributed Source Coding (DSC) has been deployed in a number of ways. In a DSC deployment, the encoders are designed to be simple and fast, while the decoder is comparatively complex and strong enough to perform fast and efficient joint decoding. This asymmetric split of encoder and decoder complexity is the essence of the DSC concept.

Building on the requirement issues of MTVC and on this DSC concept, a system has been designed to achieve higher compression gain at lower bitrates. The proposed system is an extension of MPEG, a standard video codec, to a multi-terminal environment. In the proposed multi-camera setup, camera nodes are allowed to communicate under two different strategies: strategy-1 allows minimum communication among the sensor nodes, while strategy-2 allows the sensor nodes to communicate somewhat more. Moreover, the system is designed to work under two working modes, named Scenario-A and Scenario-B (the two modes differ in the design of the basic video compression steps). Scenario-A takes more processing time but gives larger PSNR values (good-quality video frames), while Scenario-B can be deployed where a quick system response is required and degraded video quality can be tolerated. Both working modes are therefore suitable for environments with different sets of requirements. An appropriate system architecture has been designed to achieve the following performance objectives: (a) attain the highest compression gain; (b) at the same time, obtain the highest-quality reconstructed videos at the receiver end; and (c) achieve all of this while keeping the communication among the camera nodes at a minimum. Critical analysis of the results shows that higher system performance gains are achieved at lower bitrates.
II. PREVIOUS WORK

A lot of research has been done in the multi-terminal video compression domain, and its different issues have been addressed by different techniques using the DSC approach. In some techniques (e.g. [1], [2], [3]), the DSC concept has only been deployed to exploit temporal correlation within a single video sequence at each encoder independently; the resulting encoder is a low-complexity encoder and better error resilience is achieved. Another approach [4] uses image resolution to exploit inter-sensor statistical redundancy: images are encoded at low resolution and decoded using super-resolution techniques, but the correlation between the low-resolution images is not exploited using DSC. Therefore, the higher coding gains obtainable by applying multi-terminal source coding theory [5], [6], [7] are not achieved.

Other research [8], [9] has deployed a distributed image coding technique for a multi-terminal setup, but it poses certain restrictions on camera and object settings: for example, the cameras are located along a horizontal line and the objects are assumed to lie at a certain known range or distance from the cameras. A lower bound is also defined on the minimum number of cameras needed to regenerate a video scene. A similar approach [10] makes certain assumptions about sensor locations in order to obtain the correlation among the video sequences. Other approaches, such as [11], require complex computation like depth estimation at the decoder to establish the correspondence between the two camera views under consideration. A Wyner-Ziv coding based approach [12] has also been proposed in which the encoder is simple and fast while the decoder is complex; it is a good example for image-based rendering applications, but the geometrical relationship among camera positions is not taken into account. Likewise, inter-sensor redundancy is exploited through the idea of feedback in [13], where the central decoder is allowed to provide feedback to one of the encoders.

The most recent work in this domain focuses on the concept of MTVC [14], [15]. In the first approach [14], correspondence between the two camera views is obtained by applying the epipolar geometry concept, while the second approach [15] uses a model-based approach to obtain corresponding points of the two camera views. A more recent approach [16] again exploits epipolar geometry to obtain corresponding points; it proposes an MTVC framework, although for two cameras only, and shows system performance gains at low bitrates.
Another important building block is the simulation of epipolar geometry [17], which can be used to compute epipolar lines and other related parameters for pinhole cameras.

III. MOVING PICTURE EXPERT GROUP (MPEG)

MPEG is an ISO/IEC working group, established in 1988 to develop standards for digital audio and video formats. A number of MPEG standards are in use or under development, and each compression standard was designed with a specific application and bitrate in mind. The algorithm employed by MPEG-1 does not provide a lossless coding scheme; however, the standard supports a variety of input formats and can be applied to a wide range of applications. The main purpose of MPEG-1 video is to code moving image sequences or video signals. To achieve a high compression ratio, both interframe and intraframe redundancies should be exploited. MPEG is based on transform coding of motion-compensated video frames: the Discrete Cosine Transform (DCT) is used to remove intraframe redundancy, and motion compensation is used to remove interframe redundancy. Intraframe coding is also implemented to satisfy the requirement of random access (for I frames). In MPEG coding, the video sequence is first divided into Groups of Pictures (GOP). Each GOP includes three types of pictures: intra-coded (I), predictive-coded (P) and bidirectionally predictive-coded (B) pictures. The GOP size is chosen dynamically.

MPEG coding is thus based on intraframe and interframe coding. Interframe compression uses one or more earlier or later frames in a sequence to compress the current frame, while intraframe compression uses only the current frame, which is the approach used in image compression. Interframe predictive coding eliminates the large amount of temporal and spatial redundancy that exists in video sequences and compresses them by means of motion estimation. The MPEG encoding steps are discrete cosine transformation, quantization, zigzag scan, run-length encoding and Huffman-type entropy encoding. Likewise, the MPEG decoding steps are simply the inverse of the encoding steps, carried out in exactly the reverse order: Huffman-type entropy decoding, run-length decoding, inverse zigzag scan, de-quantization and inverse discrete cosine transformation. In [18], important concepts and algorithms regarding image and video compression are assembled in an accessible and comprehensive way.
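The intraframe portion of this pipeline can be illustrated with a short sketch. The following Python fragment is only an illustration of the DCT, quantization, zigzag-scan and run-length stages applied to a single 8x8 block; the flat quantization step (q = 16), the helper names and the omission of the Huffman entropy stage are simplifying assumptions, not the codec configuration used in this work.

```python
import numpy as np
from scipy.fftpack import dct, idct

def zigzag_order(n=8):
    """Return the (row, col) pairs of an n x n block in zigzag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def encode_block(block, q=16):
    """DCT, uniformly quantize, zigzag scan and run-length encode one 8x8 block."""
    coeffs = dct(dct(block.astype(float), axis=0, norm='ortho'), axis=1, norm='ortho')
    quant = np.round(coeffs / q).astype(int)
    scanned = [quant[r, c] for r, c in zigzag_order()]
    rle, run = [], 0
    for v in scanned:                 # (run-of-zeros, nonzero value) pairs
        if v == 0:
            run += 1
        else:
            rle.append((run, v))
            run = 0
    rle.append(('EOB', 0))            # end-of-block marker for the trailing zeros
    return rle

def decode_block(rle, q=16):
    """Inverse of encode_block: expand runs, inverse zigzag, de-quantize, inverse DCT."""
    scanned = []
    for run, v in rle:
        if run == 'EOB':
            break
        scanned.extend([0] * run + [v])
    scanned += [0] * (64 - len(scanned))
    quant = np.zeros((8, 8), int)
    for (r, c), v in zip(zigzag_order(), scanned):
        quant[r, c] = v
    return idct(idct((quant * q).astype(float), axis=1, norm='ortho'), axis=0, norm='ortho')
```

A full MPEG encoder would instead use the standard quantization matrices and variable-length (Huffman-type) tables for the run/value pairs, as described above; the sketch only shows how the stages chain together.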
IV. PROPOSED SYSTEM DESIGN

This research work is based on a lossy compression algorithm that extends MPEG to a camera sensor network. The extension is used to eliminate inter-sensor statistical redundancy, while the spatial and temporal redundancies within each video sequence are removed by transform coding of motion-compensated video frames at each camera node. The camera sensor nodes are allowed to communicate at different levels in the two proposed strategies, named strategy-1 and strategy-2 (see Fig. 1 and 2); sensor nodes communicate in order to perform motion estimation at the encoder and decoder ends.

Figure 1. Inter-sensor communication strategy-1
Figure 2. Inter-sensor communication strategy-2
Figure 3. Proposed scheme architecture

A. System Architecture

To attain all the performance objectives, the system architecture comprises the following components (see Fig. 3):

- Random Selector: this module randomly picks any sensor node within the camera sensor network to initiate the proposed algorithm. The encoding process then starts in that sensor node, following the standard MPEG encoding steps. Finally, this node transmits its I-frame as reference frame to its nearest neighbor.
- Encoder Layer: the encoder layer comprises a number of encoders, one per camera node, each with its own encoding algorithm implemented independently. Throughout the encoder layer, reference I-frames are passed from one node to the next. Finally, the compressed bit streams are sent to the channel for transmission.
- Joint Decoder: all encoders send their compressed output to a central decoder, where the standard MPEG decoding process takes place, but in a shared environment; the video sequence frames are the output.
- Post-processor: the frames of each video sequence are finally fed to the post-processor, which is logically designed to reconstruct the individual video sequences.
- Analyzer: this module runs in parallel with the system algorithm and is specialized in capturing the values of the various compression parameters used for result analysis.

B. Algorithmic Steps

The proposed algorithm can be deployed in a multi-camera setup comprising a number of cameras arbitrarily located in space, all capturing the same video scene from different angles depending on their locations. The encoding steps are as follows:

a) The algorithm is initiated by randomly picking a camera node from the network and generating its GOP. The first frame is picked as the I-reference frame and is intra-coded. This I-reference frame is then used for motion compensation of the remaining frames of the GOP, producing motion vectors and frame residues. The motion vectors can be generated using any of the block-matching algorithms described in [19].

b) This camera node passes its I-reference frame to its nearest camera node, which uses it for motion compensation of the first frame of its own GOP. This motion-compensated frame (P21) is then used for motion compensation of the remaining frames of that GOP. The motion compensation process yields frame residues and the respective motion vectors, as for the first camera node (a sketch of this block-matching step is given after this section).

c) The above encoding steps apply when communication strategy-1 is followed. Under strategy-2, only the first frame of the second camera node's GOP is motion compensated using the first camera node's I-reference frame, while the remaining frames of this GOP undergo motion compensation using their neighboring P frames (see Fig. 2).

d) After the remaining video compression steps are applied to the frame residues at each camera node, the resulting bit streams are fed to the transmission link. Each camera node runs its own encoding algorithm, while a central decoder is strong and efficient enough to perform joint decoding of the resulting bit stream, which is the aggregate of all the bit streams coming out of the camera sensor network.

At the receiver end, the fast central decoder applies the following decoding steps to the received bitstream:

a) The frame residues of each video sequence are reconstructed by applying the MPEG decoding steps, which are the inverse of the encoding steps applied in reverse order.

b) Next, the transmitted motion vectors and the respective I-reference frame are used to reconstruct the predicted part of each frame (motion compensation), and the frame residue generated in the previous step is added to this prediction to reconstruct the full video frame of each video sequence.

c) After identification using the bit stream IDs, the individual video sequence of each camera node is reconstructed.

Useful reference implementations of these video compression steps are available on the MathWorks website [21].
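Steps a) and b) above rely on block-matching motion estimation against an I-reference frame, which may be the frame received from a neighboring camera node. The sketch below illustrates this idea in Python under simplifying assumptions (grayscale frames whose dimensions are multiples of the block size, 8x8 blocks, an exhaustive ±7-pixel search with a sum-of-absolute-differences criterion); it is not the authors' implementation, which uses the block-matching algorithms of [19].

```python
import numpy as np

def motion_estimate(frame, reference, block=8, search=7):
    """Full-search block matching: return motion vectors and the prediction residue."""
    h, w = frame.shape
    prediction = np.zeros((h, w), dtype=float)
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur = frame[by:by + block, bx:bx + block].astype(float)
            best, best_sad = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ry, rx = by + dy, bx + dx
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue                     # candidate block outside the frame
                    cand = reference[ry:ry + block, rx:rx + block].astype(float)
                    sad = np.abs(cur - cand).sum()   # sum of absolute differences
                    if sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            dy, dx = best
            vectors[(by, bx)] = best
            prediction[by:by + block, bx:bx + block] = \
                reference[by + dy:by + dy + block, bx + dx:bx + dx + block]
    residue = frame.astype(float) - prediction       # residue that is then transform coded
    return vectors, residue

def motion_compensate(reference, vectors, residue, block=8):
    """Decoder side: rebuild a frame from the reference, motion vectors and residue."""
    recon = residue.copy()
    for (by, bx), (dy, dx) in vectors.items():
        recon[by:by + block, bx:bx + block] += \
            reference[by + dy:by + dy + block, bx + dx:bx + dx + block]
    return recon
```

Read against steps b) and c): under strategy-1 the remaining frames of the neighboring node's GOP would all take P21 as their reference, while under strategy-2 each frame would take its immediately preceding P frame as reference.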
V. RESULTS EVALUATION AND DISCUSSION

System performance has been analyzed from different aspects. The reconstructed video quality and the compression efficiency achieved are compared graphically for the proposed communication strategies and system working modes. Similarly, the values of the different compression parameters are analyzed with respect to each other and shown through graphs.

Figure 4. Strategy-1 under working mode-1: original frames (each pair from two different views) above and decoded frames below, with the respective compression parameter values.

A. Communication Strategy Comparison

An experiment was conducted capturing a scene from two different views; the results obtained with strategy-1 under working mode-1 are shown in Fig. 4. Comparing the two communication strategies is a trade-off between camera node communication and video quality. The results can be analyzed using different performance metrics (see Fig. 5 and 6). These figures show that both communication strategies give the same results up to movie-1, but there is a marked difference in parameter values from the beginning of movie-2. Strategy-1 gives better results than strategy-2 in terms of quality (peak signal to noise ratio, PSNR), but strategy-2 gives more saving (compression ratio, CR).

Figure 5. Peak signal to noise ratio comparison of communication strategies
Figure 6. Compression ratio comparison of communication strategies

B. System Working Modes Comparison

The proposed system is capable of switching between two working modes, or scenarios, depending on the particular situation and its requirements. For both scenarios, system performance can be analyzed on the basis of the compression results shown in Fig. 7 and 8: working mode-1 gives better-quality reconstructed videos than working mode-2, but working mode-2 is better in terms of the saving ratio achieved.

Figure 7. Compression ratio comparison for Scenario A and B
Figure 8. Peak signal to noise ratio comparison of Scenario A and B
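The compression parameters reported in the figures of this section can be computed with the conventional definitions sketched below in Python. The paper does not spell out its formulas, so the exact forms used here (in particular the saving-ratio and bitrate expressions and the 25 fps frame-rate assumption) are illustrative assumptions.

```python
import numpy as np

def distortion_mse(original, decoded):
    """Distortion as mean squared error between an original and a decoded frame."""
    diff = original.astype(float) - decoded.astype(float)
    return float(np.mean(diff ** 2))

def psnr(original, decoded, peak=255.0):
    """Peak signal to noise ratio in dB for 8-bit frames."""
    mse = distortion_mse(original, decoded)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def compression_ratio(original_bits, compressed_bits):
    """Compression ratio: original size over compressed size."""
    return original_bits / compressed_bits

def saving_ratio(original_bits, compressed_bits):
    """Bandwidth saved, in percent (the form of the 93.5% figures quoted for Scenario-A)."""
    return 100.0 * (1.0 - compressed_bits / original_bits)

def bitrate_kbps(compressed_bits, num_frames, fps=25.0):
    """Average bitrate of the compressed stream in kbit/s, assuming a fixed frame rate."""
    return compressed_bits * fps / num_frames / 1000.0
```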
C. System Performance Against Various Compression Parameters

This aspect of system performance shows the system response in terms of different compression parameters (PSNR, distortion measured as mean squared error, and saving ratio) at various bitrates. Fig. 9 shows that, as the movie frames progress for both strategies, PSNR falls as the bitrate increases. Both strategies start from the same point, but from the beginning of the movie-2 frames the two graphs separate: strategy-1 ends with a PSNR of 20 dB at a bitrate of 420 kbps, while strategy-2 ends with a PSNR of 17 dB at 490 kbps. So, for both strategies, the best quality of the decoded video frames is found at the lower bitrates in Scenario-A. The same relationship can be observed for Scenario-B, as shown in Fig. 10; the major difference is that Scenario-B ends at comparatively low bitrates: strategy-1 ends at 150 kbps with a PSNR of 18.3 dB and strategy-2 ends at 185 kbps with the same PSNR value.

Fig. 11 and 12 show the saving ratio obtained at various bitrates for Scenario-A and B respectively. Both scenarios show the same relationship: a marked decline in saving ratio as the bitrate increases. For Scenario-A, the saving ratio decreases from 93.5% to 89.5% as the bitrate increases from 265 to 360 kbps for strategy-1, and from 93.5% to 88% as the bitrate increases from 265 to 435 kbps for strategy-2. Likewise, for Scenario-B the saving ratio falls as the bitrate increases for both strategies, but at comparatively lower bitrates, as shown in Fig. 12. According to Fig. 13, more gain is obtained at lower bitrates and the video frames then begin to distort as the bitrate increases. Fig. 14 shows the same relationship between distortion and bitrate for Scenario-B: for both strategies distortion is low at lower bitrates and then gradually increases, reaching its maximum value near 800. The only difference is that this rise in distortion is reached earlier for strategy-1, near a bitrate of 145 kbps, while for strategy-2 it occurs at 190 kbps.

Figure 9. Peak signal to noise ratio vs. bitrate (Scenario-A)
Figure 10. Peak signal to noise ratio vs. bitrate (Scenario-B)
Figure 11. Saving ratio vs. bitrate (Scenario-A)
Figure 12. Saving ratio vs. bitrate (Scenario-B)
Figure 13. Distortion vs. bitrate (Scenario-A)
Figure 14. Distortion vs. bitrate (Scenario-B)

VI. CONCLUSIONS AND FUTURE WORK

To attain higher compression gains, and consequently greater bandwidth saving, inter-sensor statistical redundancy should be removed in addition to the spatial and temporal correlation within a single video sequence. Keeping the requirement issues of a camera sensor network in mind, an algorithm has been designed that extends MPEG to a multi-camera setup. Camera nodes are allowed to communicate in two ways, called communication strategy-1 and strategy-2, and for each communication strategy the system is capable of working in two different modes or scenarios. Each working mode has its benefits and can be deployed in environments with different requirements. Results have been computed for both strategies under each working mode. The results, presented graphically, show that the highest system performance gains are obtained at lower bitrates and that there is a marked distortion in video quality as the bitrate increases. Overall, of the two communication strategies, strategy-1 should be adopted to obtain the correspondence among the camera video frames while keeping the communication among the sensor nodes at a minimum. Strategy-1 with working mode-1 gives good results, so this combination should be exploited in future work with more advanced and sophisticated compression techniques. One drawback of working mode-1 is that it takes a long processing time; this should also be addressed in future work by improving the basic video compression steps.

REFERENCES

[1] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, "Distributed video coding," Proceedings of the IEEE, vol. 93, no. 1, pp. 71-83, January 2005.
[2] R. Puri and K. Ramchandran, "PRISM: A video coding architecture based on distributed compression principles," submitted to IEEE Transactions on Image Processing.
[3] A. Sehgal, A. Jagmohan, and N. Ahuja, "Wyner-Ziv coding of video: An error-resilient compression framework," IEEE Transactions on Multimedia, vol. 6, no. 2, pp. 249-258, April 2004.
[4] R. Wagner, R. Nowak, and R. Baraniuk, "Distributed image compression for sensor networks using correspondence analysis and super-resolution," in ICIP 2003.
[5] T. Berger, "Multi-terminal source encoding," in The Information Theory Approach to Communications, G. Longo, Ed., CISM Courses and Lectures no. 229, Springer, New York, 1978.
[6] Y. Oohama, "Gaussian multiterminal source coding," IEEE Transactions on Information Theory, vol. 43, no. 6, pp. 1912-1923, November 1997.
[7] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471-480, July 1973.
[8] N. Gehrig and P. L. Dragotti, "DIFFERENT: Distributed and Fully Flexible image EncodeRs for camEra sensor NeTworks," in ICIP 2005.
[9] N. Gehrig and P. L. Dragotti, "Distributed sampling and compression of scenes with finite rate of innovation in camera sensor networks," in Data Compression Conference, 2006.
[10] X. Zhu, A. Aaron, and B. Girod, "Distributed compression for large camera arrays," in IEEE Workshop on Statistical Signal Processing, September 2003.
[11] J. C. Dagher, M. W. Marcellin, and M. A. Neifeld, "A method for coordinating the distributed transmission of imagery," IEEE Transactions on Image Processing, pp. 1705-1717, July 2006.
[12] Y. Yang, V. Stankovic, W. Zhao, and Z. Xiong, "Multiterminal video coding," in Information Theory and Applications Workshop, San Diego, CA, January 2007.
[13] M. Flierl and B. Girod, "Coding of multi-view image sequences with video sensors," in IEEE International Conference on Image Processing, 2006.
[14] B. Song, A. Roy-Chowdhury, and E. Tuncel, "Towards a multiterminal video compression algorithm using epipolar geometry," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2006.
[15] B. Song, A. Roy-Chowdhury, and E. Tuncel, "A multi-terminal model-based video compression algorithm," in IEEE International Conference on Image Processing, 2006.
[16] B. Song, A. Roy-Chowdhury, and E. Tuncel, "Towards a multi-terminal video compression algorithm by integrating distributed source coding with geometrical constraints," Journal of Multimedia, vol. 2, no. 3, June 2007.
[17] G. L. Mariottini and D. Prattichizzo, "The Epipolar Geometry Toolbox," 1070-9932, 2005.
[18] Y. Q. Shi and H. Sun, Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms and Standards.
[19] http://www.mathworks.com/matlabcentral/fileexchange/8761-block-matching-algorithms-for-motion-estimation
[20] http://en.wikipedia.org/wiki/Moving_Picture_Experts_Group
[21] http://www.mathworks.com/matlabcentral/fileexchange/13020