
506 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 7, NO. 4, APRIL 1998

Stereo Image Coding: A Projection Approach


Haluk Aydinoglu, Student Member, IEEE, and Monson H. Hayes, III, Fellow, IEEE

Abstract--Recently, due to advances in display technology, three-dimensional (3-D) imaging systems are becoming increasingly popular. One way of stimulating 3-D perception is to use stereo pairs, a pair of images of the same scene acquired from different perspectives. Since there is an inherent redundancy between the images of a stereo pair, data compression algorithms should be employed to represent stereo pairs efficiently. This paper focuses on the stereo image coding problem. We begin with a description of the problem and a survey of current stereo coding techniques. A new stereo image coding algorithm that is based on disparity compensation and subspace projection is described. This algorithm, the subspace projection technique (SPT), is a transform domain approach with a space-varying transformation matrix and may be interpreted as a spatial-transform domain representation of the stereo data. The advantage of the proposed approach is that it can locally adapt to the changes in the cross-correlation characteristics of the stereo pairs. Several design issues and implementations of the algorithm are discussed. Finally, we present empirical results suggesting that the SPT approach outperforms current stereo coding techniques.

Index Terms--Adaptive incomplete transform, data compression, disparity compensation, orthogonal decomposition, polynomial approximation.

I. INTRODUCTION

THE HUMAN brain can process subtle differences between the images that are presented to the left and right eyes to perceive a three-dimensional (3-D) outside world. This ability is called stereo vision. A stereoscopic system may be used to stimulate the stereo vision ability artificially. A stereo pair, a pair of images of the same scene acquired from two different perspectives, is presented to the observer so that the right image is seen by the right eye and the left image is seen by the left eye. The human observer then perceives the scene in depth by processing the relative displacement, i.e., the disparity, of the objects between the two images of a stereo pair.

Recently, applications of 3-D vision systems have appeared in telecommunication [1], geoscience [2], [3], robotics [4], [5], and navigation [6], [7]. To accommodate these applications, it is necessary to transmit and store stereo pairs. Without exploiting the inherent redundancy in a stereo pair, twice as many bits will be required to represent a stereo pair compared to a single image. Therefore, there has been an increasing interest in stereo image coding in the literature. The proposed techniques generally employ block-based disparity compensation (DC), which is similar to motion compensation [8]-[11]. As is discussed later in this paper, however, the stereo image coding problem is substantially different from the video coding problem.

In this paper, a new approach to stereo image coding is presented. We propose to combine disparity compensation and transform coding of the residual into a single framework by using an adaptive incomplete transform. The proposed technique has the unique advantage of adapting to the local changes in the cross-correlation statistics of the stereo data, as well as correcting for systematic luminance differences. In addition, empirical results have suggested that the proposed algorithm, namely the subspace projection technique (SPT), performs better, in the rate-distortion sense, than other disparity compensated coding techniques such as fixed block size DC, variable block size DC (VDC), and windowed DC (WDC), especially at very low bit rates.

The organization of the paper is as follows. Section II provides background concerning the assumed stereo geometry and the concept of disparity estimation. Section III summarizes issues associated with stereo image coding. In particular, a mathematical model for stereo pairs is presented, current coding techniques are discussed, and the differences between stereo pairs and image pairs from a video sequence are explained. Section IV focuses on the proposed SPT algorithm, and experimental results are presented in Section V. Finally, results are summarized in Section VI.

II. BACKGROUND

Two cameras with parallel optical axes, i.e., the image planes of the cameras are coplanar and collinear, and with identical optical characteristics, are employed to acquire a stereo pair. This camera geometry is presented in Fig. 1.

The parallel optical axes geometry is convenient for several reasons. First, it ensures that the vertical component of the disparity vectors is zero, and that the horizontal component, d, is x_r - x_l (see Fig. 1 for notation). The depth of an object, Z, is inversely proportional to d and is given by

    Z = -FB/d    (1)

Manuscript received April 8, 1996; revised May 16, 1997. This work was supported in part by the Joint Services Electronic Program under Grant DAAH 04-96-1-0161. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Christine Podilchuk.
H. Aydinoglu was with the Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA 30339 USA. He is now with Retek Information Systems, Atlanta, GA 30305 USA (e-mail: [email protected]).
M. H. Hayes, III, is with the Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA 30339 USA.
Publisher Item Identifier S 1057-7149(98)02458-0.

1057-7149/98 $10.00 (c) 1998 IEEE
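As a concrete illustration of the depth relation in (1), the following minimal sketch computes depth from disparity under the parallel-axis geometry; the camera parameters are hypothetical, not taken from the paper:

```python
def depth_from_disparity(d, focal_length, baseline):
    """Depth Z = -F*B/d of a point from its horizontal disparity d = x_r - x_l.

    For the parallel optical axis geometry of Fig. 1, d is negative for all
    visible objects, so the returned depth is positive.
    """
    if d >= 0:
        raise ValueError("disparity must be negative for visible objects")
    return -focal_length * baseline / d

# Hypothetical parameters: focal length in pixels, baseline in meters.
F, B = 1000.0, 0.1
near = depth_from_disparity(-20.0, F, B)  # large |d|: nearer object
far = depth_from_disparity(-5.0, F, B)    # small |d|: farther object
```

Note the inverse relation: halving the disparity magnitude doubles the recovered depth.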



Fig. 1. Parallel optical axis binocular stereo geometry.

where F is the focal length of the cameras, and B is the distance between the centers of the two cameras. Note that d is always negative for the objects in the scene.

Second, the projections of a rectangular surface lying in a plane that is parallel to both image planes have the same area and aspect ratio, and the relative displacement of this surface is purely translational. Rectangular surfaces that are not parallel to the image plane, however, suffer from perspective distortion (deformation of the aspect ratio), and purely translational models are no longer valid [12]. Finally, the parallel optical axes geometry is appropriate to model the human visual system [13], [14].

Estimating the disparity field is an important problem in stereo image analysis. In vision applications, for example, this field provides depth information. In coding applications, on the other hand, the disparity field represents the redundant information between two images. There are two types of disparity estimation algorithms: feature based and intensity based. Feature-based algorithms attempt to match significant abstract features, such as edges, in order to estimate the disparity field. Selected features are generally those that are not sensitive to noise and calibration errors. Matching pairs (for selected features) are found only if the same features are extracted in both images. An interpolation stage is necessary to obtain a dense disparity field [15]. Since this is a nontrivial procedure, these algorithms are typically used only in applications where exact disparity information is necessary. If a feature-based disparity estimation method is used for coding, then it is necessary to transmit both the coordinates of the features and the corresponding disparities. Since this is inefficient for coding applications, intensity-based methods are generally employed. With an intensity-based method, it is necessary to match corresponding points on the basis of the similarity of the intensity of the corresponding areas in the right and left images. A disparity value is assigned to the corresponding points, even if a real match does not exist. Assuming the disparity values are locally stationary, a block-by-block disparity estimation is sufficient within the limits of the human visual system (HVS) [15]. The block sizes are either fixed (therefore, no side information is necessary for the location of the blocks) or variable.

III. STEREO IMAGE CODING

In this section, we investigate the problem of stereo image coding and compare it to the problem of video compression. This section is composed of three subsections. First, a mathematical model for stereo sources is developed and the rate-distortion characteristics of the model are described. Then, an overview of current stereo coding techniques is presented. In addition, advantages and disadvantages of these approaches are described. Finally, a comparison between stereo pairs and an image pair from a video sequence is presented. This comparison provides the insight used to develop the coding algorithm that is presented in Section IV.

A. Theory of Stereo Image Coding

This section investigates the theoretical basis of coding correlated sources (stereo pair sources in particular). A stereo pair can be modeled as a vector-valued outcome (X, Y) of two correlated discrete random processes. First, let us consider the lossless encoding of a stereo pair. According to Shannon's theorem, a rate that is greater than the joint entropy H(X, Y), which is equal to H(X) + H(Y|X), is sufficient to recover the stereo source with arbitrarily low probability of error if the images are encoded together. An optimal lossless coding strategy, which is referred to as the conditional coder (CONCOD) for a stereo pair, is, then, to code one image and then code the second image given the coded first image. On the other hand, another optimal coding scheme, known as the Slepian-Wolf coder, may be employed such that the two sources are coded independently of each other and decoded by a common decoder [16]. The basic theorem of the Slepian-Wolf encoder is given by the following theorem.

Theorem 1 (Slepian-Wolf): Correlated sources X and Y can be separately described at rates R_X and R_Y and recovered with arbitrarily low probability of error by a common decoder if and only if

    R_X >= H(X|Y),  R_Y >= H(Y|X),  R_X + R_Y >= H(X, Y).    (2)
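The rate region in (2) can be checked numerically for any joint distribution. A minimal sketch (the joint pmf below is a hypothetical example; entropies are in bits):

```python
from math import log2

def entropies(p):
    """Return H(X), H(Y), H(X, Y) for a joint pmf given as {(x, y): prob}."""
    hxy = -sum(q * log2(q) for q in p.values() if q > 0)
    px, py = {}, {}
    for (x, y), q in p.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    hx = -sum(q * log2(q) for q in px.values() if q > 0)
    hy = -sum(q * log2(q) for q in py.values() if q > 0)
    return hx, hy, hxy

def slepian_wolf_admissible(rx, ry, p):
    """True if (rx, ry) satisfies all three inequalities of (2).

    Uses H(X|Y) = H(X,Y) - H(Y) and H(Y|X) = H(X,Y) - H(X).
    """
    hx, hy, hxy = entropies(p)
    return rx >= hxy - hy and ry >= hxy - hx and rx + ry >= hxy

# Maximally correlated binary sources: Y is an exact copy of X.
p_copy = {(0, 0): 0.5, (1, 1): 0.5}
# One bit total suffices: code X fully, and Y at rate H(Y|X) = 0.
ok = slepian_wolf_admissible(1.0, 0.0, p_copy)
```

This mirrors the stereo intuition: the more redundant the pair, the smaller H(Y|X) and the less extra rate the second image needs.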

One can refer to [17] for a proof of the theorem and an actual implementation of the Slepian-Wolf coder.

An important question, however, is how to design an optimum lossy coder, in the rate-distortion sense, for correlated sources. Specifically, what is the minimum number of bits necessary to code the stereo source (X, Y) such that the distortion for X is D_X and the distortion for Y is D_Y? One possible solution is to use the CONCOD structure, which has been shown to be suboptimal for some region of the rate-distortion surface [9]. Another candidate is the Slepian-Wolf coder. However, the general rate-distortion region remains unknown for the Slepian-Wolf problem with distortion in both X and Y [17]. It is useful to examine the performance of the CONCOD structure with respect to the optimal coder. It can be shown that

    (3)

where R_Y(D_Y) is the rate-distortion function for Y, R_XY(D_X, D_Y) is the joint rate-distortion function for X and Y (the optimal coder), and R_X|Y(D_X|D_Y) is the conditional rate-distortion function for X given the encoded Y (the optimal conditional coder). As is discussed later in this section, for practical reasons we would like to allocate only a small percentage of the total bit rate to the second image. Let us consider the extreme case, in which we allocate zero bits for the second image [the distortion D_Y for the second image is equal to its maximum value, i.e., D_Y = D_Y,max]. For this case, the CONCOD structure is optimal, since the lower and upper bounds of (3) are identical. As a result of the continuity of the rate-distortion functions, for any epsilon > 0 there exists a delta > 0 such that if D_Y,max - D_Y < delta, then

    (4)

However, since

    (5)

rewriting the left side of (5) using (3) and substituting it into (4) yields the following inequality:

    (6)

Therefore, the CONCOD structure performs arbitrarily close to the optimal solution, provided the rate allocated to the second image is small. In addition, conditional coders are backward compatible with monocular image coding systems. In the next subsection, we discuss several algorithms that have been proposed based on this observation.

Finally, it should be pointed out that, for most coding systems, the human observer should be able to perceive the 3-D scene when the coded stereo pair is presented to him. Therefore, instead of the distortion pair (D_X, D_Y), one may want to use a distortion measure D_SP that quantifies the distortion in depth perception.(1) Unfortunately, no such distortion measure has been developed.

(1) One candidate distortion measure is D_SP = D_X + D_Y [13]. However, a human observer can perceive a stereo pair if one of the images is low resolution and the other one is high resolution [18]. The distortion measure D_SP cannot explain this phenomenon.

B. Overview of Data Compression Schemes for Stereo Pairs

An obvious but inefficient way to compress a stereo pair is to code the images independently. This inefficiency stems from the fact that it does not exploit the redundancy in a stereo pair. The first proposed stereo compression algorithm was to code the sum and difference of the two images [13]. The theoretical reasoning behind this approach is that the sum and difference images obtained from a stereo pair are uncorrelated if the right and left images have the same first- and second-order statistics. Three-dimensional discrete cosine transform (DCT) coding of stereo pairs [11] is equivalent to sum-difference coding in the transform domain. However, according to our experimental results, the performance of these techniques decreases with increased disparity values.

A modification to sum-difference coding is to shift one of the images horizontally to the point where the cross-correlation between the images of the stereo pair reaches its maximum value (the shift approximately equals the mean disparity of the scene). The shifted image is then subtracted from its partner image to remove the redundancy, and the difference is encoded [8]. This method, referred to as the global translation method, assumes that objects in the scene have similar disparity values. However, since objects in the scene generally have different disparity values, this method is not particularly efficient. Another approach is to translate row blocks instead of the whole image. The amount of translation is determined either by cross-correlation statistics (the correlation enhanced technique) [8] or by using the most interesting feature in the row block [19].

The global translation and correlation enhanced techniques are the simplest forms of disparity compensation. The aim of disparity compensation is to estimate the disparities between the objects in a stereo pair, and to use these estimates to remove the stereo redundancy. The idea is to use a disparity-corrected version of one image as a predictor of its partner image [20]. Since the disparity information must be transmitted, in applications where the compressed stereo pairs will be viewed by a human, computing a sparse disparity field is the key to compression [21].

The methodology of block-based disparity estimation is as follows. First, the left image is independently coded. Then, the right image is divided into blocks. Either fixed or variable size blocks are used. Each of these blocks is shifted horizontally and compared to the corresponding blocks in the coded left

image, using some measure to determine the similarity between the two blocks. The most similar block is the disparity-compensated prediction, and the corresponding translation is the disparity for the block. The most commonly used similarity measures are the mean-square error (MSE) and the mean absolute difference (MAD) [22]. The mean absolute difference is generally preferred due to its simplicity.

Given the right image that is to be encoded and a disparity vector field, there is a variety of coding strategies that may be used. For example, the residual of the disparity compensated prediction may be encoded and transmitted. This method is referred to as disparity compensated residual coding (DC-RC) [8], [23]. An alternative to encoding the residual was proposed by Dinstein et al. [11]. In this system, each block is either encoded using disparity compensated prediction (only the disparity for the block is transmitted), or the block is independently coded using an adaptive DCT. Which approach is used for a given block is determined by the accuracy of the disparity compensated prediction.

Perkins proposed a disparity compensated transform domain predictive coder (DCTDP) that estimates the transform of each right image block from the transform of the matching left image block using an elementwise affine predictor, where the array multiplication operation is performed by multiplying the (i, j)th element of the first matrix by the (i, j)th element of the second matrix. The matrices A and B are chosen based on the statistics obtained from a training set, so that each coefficient is predicted using a minimum variance predictor. After estimating the transform domain coefficients, the transform domain error is quantized and sent through the channel. The minimum variance predictor performs better than the simple predictor proposed in [8], [23] only for bit rates higher than 0.5 b/pixel [9].

As previously mentioned, disparity compensation can be implemented using either a fixed block size or variable block sizes. Fixed block size schemes have the unique advantage of eliminating the overhead that is required to specify the block locations. However, if multiple objects or an occluded object exist in a block, these schemes cannot fully exploit the stereo redundancy. One solution is to decrease the block size, which will increase the overall bit rate for the disparity field. Another approach is to use segmentation. However, not all segmentation techniques are suitable, due to the excessive number of bits required to describe the shape and the location of each region. Quadtree decomposition provides a relatively economical and effective solution to this problem. The idea is to decrease the block size adaptively when the prediction fails. A regular quadtree approach, where a block can be divided only at the midpoint of its sides, has been implemented in [12] and [24]. Also, a generalized quadtree can be considered, where a block can be divided at one of 2^n candidate locations per side (n is the number of bits that can be allocated per side per node) [15]. The division takes place at the permitted location that lies closest to a sharp intensity discontinuity. In general, variable block size schemes outperform fixed block size schemes.

Block matching cannot handle nonlinear deformations such as perspective distortion. Perspective distortion occurs when the object surface is not perpendicular to the camera axis. A generalized block matching algorithm that approximates the deformations of objects, by deforming the corresponding blocks in the picture, has been proposed to overcome this problem [12]. The generalized block matching algorithm is used in conjunction with the quadtree decomposition. Note that perspective distortion is not severe for outdoor images and can sometimes be properly handled if we only employ a quadtree decomposition.

Conventional disparity compensation algorithms using rectangular blocks often give discontinuities between neighboring disparity compensated blocks in the predicted images. A windowed disparity compensation (WDC) scheme has been proposed to overcome this effect [24]. This approach is similar to windowed motion compensation [25].

C. A Comparison of Stereo Pairs and Video Sequences

Disparity and motion compensation are closely related. In fact, previously proposed disparity compensation techniques are generally adapted from approaches that are used for motion compensation in video coding. Stereo imaging is generally considered to be an optional extension to a basic monocular system. Therefore, we should allow only a small distortion in the quality of the mono image, i.e., only a small percentage of the overall bit rate should be allocated for the second image. The problem of coding the second image of a stereo pair given the first one appears to be similar to the video coding problem of encoding a P frame given the I frame. However, important differences do exist between the two problems [26].

For example, for typical video sequences, where low bit rate coding algorithms are employed, background objects do not generally move from one frame to the next, and only a small percentage of the scene undergoes motion. The displacement of the moving objects, which may be modeled as purely translational, usually does not exceed a few pixels [27]. In fact, motion-compensated video coding algorithms exploit these properties for higher compression rates [28]. By contrast, every object in a stereo pair is displaced, and the displacements may be very large compared to video sequences [29]. As a consequence, the performance of disparity compensation for stereo images is generally lower than the performance of motion compensation for interframe video coding, especially at very low bit rates [11], [26].

Standard block matching algorithms, using similarity measures such as MSE or MAD, assume the axiom of constant intensity along the displacement trajectory. For a video sequence, this axiom states that object points maintain the same luminance values from one frame to the next. For a stereo pair, the axiom states that the intensity values recorded by the right and left cameras are identical for all object points. While this axiom is generally valid for video sequences, it rarely is true for stereo pairs. The light intensity that is reflected from objects in a scene and recorded by the camera depends on the position of the camera relative to the scene [30]. This is a primary source of photometric variations. Other variations, however, may arise from noise and different photometric characteristics of the cameras. In fact, the corresponding regions may have different intensity
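The block-based disparity search with MAD as the similarity measure can be sketched as follows. This is a minimal fixed-block-size illustration (the exhaustive horizontal search and the toy data are assumptions of this sketch):

```python
import numpy as np

def disparity_for_block(right_block, left_strip, x0, max_disp):
    """Exhaustive horizontal search minimizing the mean absolute difference
    (MAD) between a right-image block at column x0 and the coded left image."""
    n = right_block.shape[1]
    best_d, best_mad = 0, float("inf")
    for d in range(-max_disp, max_disp + 1):
        x = x0 + d                                # candidate left-image column
        if x < 0 or x + n > left_strip.shape[1]:
            continue                              # candidate falls off the image
        mad = float(np.mean(np.abs(right_block - left_strip[:, x:x + n])))
        if mad < best_mad:
            best_mad, best_d = mad, d
    return best_d, best_mad

# Toy row strip with an aperiodic pattern, so the best match is unique.
left = ((np.arange(4)[:, None] * 7 + np.arange(20)[None, :] * 3) % 11).astype(float)
right_block = left[:, 5:9]                        # true disparity: 5 - 8 = -3
best_d, best_mad = disparity_for_block(right_block, left, x0=8, max_disp=5)
```

Note that, as the text observes, a disparity is returned even when no real match exists; the search simply reports the minimum-MAD candidate.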

values, and area-based disparity estimation may yield inaccurate correspondences. Histogram modification [11], global calibration [31], and normalized similarity measures [32] have been proposed to solve the false estimation problem in the presence of photometric variations. The first two techniques are global approaches and cannot handle local variations, whereas the third method is computationally intensive.

The third difference between a stereo pair and a video sequence is the source of occlusion. In video sequences, occlusion occurs due to moving objects, as in the case of background covering. For stereo pairs, occlusion occurs when some part of the scene can be captured by only one of the cameras (Fig. 3), either due to the finite viewing area, which is referred to as the framing error, or due to depth discontinuity [29]. In some applications, by suitable horizontal cropping of the images, the effects of occlusion due to the finite viewing area can be minimized.

IV. THE SUBSPACE PROJECTION TECHNIQUE

Conventional stereo image coders employ disparity compensated prediction followed by transform coding of the residual to encode the second image of a stereo pair. Disparity compensation removes the stereo redundancy between the images of a stereo pair, whereas the transform coder exploits the correlation between the samples of the residual signal. Although transform coders, such as the DCT, are generally efficient in terms of packing a large fraction of the energy of an image into relatively few transform coefficients, this is not always the case for residual coding. Therefore, in order to improve the coding gain, a new technique, the subspace projection technique (SPT), has been proposed for combining disparity compensation and residual coding into a single framework [26]. The idea is to apply a transform to each block (or, equivalently, each N-dimensional vector b_r in the Euclidean space R^N) in the right image in such a way that it exploits the stereo and spatial redundancy simultaneously.

A. The Algorithm

As with many data compression schemes, the motivation is to find an efficient representation for each vector b_r. The transformation is chosen to be a reduced order operator that projects the block onto a subspace S of R^N. This subspace is spanned by a set of vectors. The first vector in the spanning set, b_0, is a block-dependent vector and is equal to the estimate of the block that is obtained from the left image through disparity compensation. The remaining vectors, f_1, ..., f_K, are fixed and selected so that they approximate the low-frequency components of the block by forming a polynomial approximation to the vector b_r. The transformation that is applied to each image block is motivated by the following two observations. First, the correlation between the low-frequency components in a pair of matching blocks is greater than the correlation between the high-frequency components [10]. Second, if one of the images of the stereo pair is encoded with little distortion, thereby resulting in a high-resolution decoded image, then the low-frequency components of the second image are perceptually more important than the high-frequency components [9].

The number of these polynomial vectors defines the order of the approximation. Generally, either one (zero-order approximation), three (first-order approximation), or six (second-order approximation) fixed vectors are used. For example, the six fixed vectors of a second-order approximation are sampled versions of the monomials 1, x, y, x^2, xy, and y^2 over the block grid.

The choice of these vectors is motivated by the smooth variations that are characteristic of the image pixel intensities found in most natural images. The image intensity at a given pixel is generally close to the average intensity in a small neighborhood of that pixel. The fixed vector f_1 is used to represent the average intensity or dc component of each block. Similarly, the coefficients of f_2 and f_3 provide a measure of the slope in image pixel intensities in the vertical and horizontal directions, respectively. If f_1 is the only vector in the spanning set, then the fixed subspace approximation will be a zero-order approximation to b_r. If we add f_2 and f_3 into the vector set, then it will be a first-order approximation. If all six vectors are used in the spanning set, then a second-order approximation will be obtained. Higher order approximations can be constructed in a similar way. An advantage of polynomial fixed vectors is that, even if the estimate obtained from disparity compensation fails, the fixed subspace model will be a good approximation for a highly correlated grey-value image [33].

The spanning vectors {b_0, f_1, ..., f_K} are orthogonalized using the Gram-Schmidt procedure to obtain a basis {u_0, u_1, ..., u_K} as follows:

    u_1 = f_1 / ||f_1||    (7)

    u_i = (f_i - sum_{j<i} <f_i, u_j> u_j) / ||f_i - sum_{j<i} <f_i, u_j> u_j||,  i = 2, ..., K    (8)

    u_0 = b_0 - sum_{i=1}^{K} <b_0, u_i> u_i    (9)

where <., .> is the inner product. Note that the vector u_0 is the component of b_0 that is orthogonal to the fixed subspace spanned by {u_1, ..., u_K}. Since fixed vectors and only one varying vector are used in the spanning set, the orthogonalization procedure is computationally efficient. The vectors u_1, ..., u_K are calculated once and then stored at both sides of the channel. In addition, the vector u_0 for each block can be calculated at the decoder without a transmission burden. The subspace representation b~_r for the vector b_r is given by

    b~_r = alpha u_0 + sum_{i=1}^{K} beta_i u_i    (10)

where alpha and the beta_i's are the projection coefficients, i.e., the coordinates of the vector b~_r with respect to the basis, and they can be obtained by

    alpha = <b_r, u_0> / <u_0, u_0>    (11)

    beta_i = <b_r, u_i>    (12)

There are two orthogonal components in (10): the spatial domain term alpha u_0 and the fixed subspace estimate sum_i beta_i u_i. The first component exploits the stereo similarity. The coefficient alpha controls the multiplicative gain in the spatial domain. Moreover, it adapts to the local changes in the cross-correlation statistics of the stereo data. On the other hand, the fixed subspace estimate is a reduced-order linear transform domain representation. In addition, the dc term adjusts to the local changes in the mean statistics. In short, the SPT algorithm presents a mixed spatial-transform domain representation and can compensate for photometric variations in terms of first- and second-order statistics.

Note that the stereo redundancy can be further exploited in the subspace representation by employing disparity compensation for the fixed subspace coefficients. Since b_0 lies in S, it can be exactly represented in terms of the basis vectors as follows:

    b_0 = u_0 + sum_{i=1}^{K} gamma_i u_i    (13)

where

    gamma_i = <b_0, u_i>    (14)

Since b_0 is the disparity compensated prediction for b_r, there exists a stereo redundancy between their projections on the fixed subspace and the respective coordinates, i.e., between the projection coefficients beta_i and gamma_i. In fact, the prediction error e_i, given by e_i = beta_i - gamma_i, can be calculated and transmitted in order to achieve further bit rate reduction. Note that, since the left image and the disparity field are available at the decoder, it is straightforward to obtain the coefficient gamma_i at the decoder.

There are several properties of the SPT algorithm that are worth noting. First, exact reconstruction cannot be achieved in general, since we are performing a projection operation onto a lower dimensional subspace. However, among all the vectors in S, the estimate b~_r is the closest to the vector b_r. Also, since we adaptively change the subspace S, the estimation error, b_r - b~_r, which is orthogonal to S, is, in general, very small. The orthogonality and the best estimate properties are direct consequences of the properties of the projection operator. In Fig. 2, the subspace approximation generated by the fixed vector set F and the block varying vector b_0 is presented.

Fig. 2. Approximation of the vector b_r by an element of the subspace generated by the fixed vector set F and the block varying vector b_0.

Fig. 3. Occlusion problem due to depth discontinuity and finite viewing area.

There are several design issues that can modify the performance of the proposed algorithm. For example, if the fixed subspace approximation is good enough, then the norm of the vector u_0 will be small, and the term alpha u_0 will not contribute any additional information to the estimate. In order to improve the rate-distortion performance, the significance of the alpha u_0 term is tested locally. The test criterion is the norm of the block-varying estimate alpha u_0. If this norm is less than a predefined threshold, then the coefficient alpha is considered to be insignificant and is set to one, i.e., standard disparity compensation is applied to these blocks.

The subspace projection technique does not specify which type of disparity estimation algorithm should be used. One can obtain the disparity compensated prediction b_0 using windowed DC (WDC), fixed block size DC, or variable block size DC (VDC). In addition, in some applications, the projection error may be encoded and transmitted.

B. Coefficient Quantization

This subsection describes the design of the quantizer. After the left image is encoded and transmitted, for each block in the right image it is necessary to encode and transmit the disparity vector, the residual coefficients e_i, and the projection coefficient alpha.

The disparity vector field is coded using lossless differential pulse code modulation followed by a modular warping scheme, as proposed in the Motion Picture Expert Group (MPEG) coding standard [34]. Two near-lossless schemes have also been considered for coding the disparity field [35], [36], but these approaches require further study and are not discussed in this paper.

The coefficients alpha and e_i are quantized. Previous empirical work has shown that efficient quantization of the coefficients is the key to higher compression rates [29]. At the decoder, we have

    (15)

where the quantized coefficients replace alpha and e_i in the reconstruction.
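A minimal end-to-end sketch of the block representation in Section IV-A (4x4 block, first-order fixed vectors). The Gram-Schmidt orthogonalization and least-squares projection coefficients used here are assumptions consistent with the stated properties, not the authors' exact implementation:

```python
import numpy as np

def spt_basis(b0, fixed):
    """Orthonormalize the fixed polynomial vectors (precomputable), then
    take u0 as the component of b0 orthogonal to the fixed subspace."""
    basis = []
    for f in fixed:
        v = f - sum(np.dot(f, u) * u for u in basis)
        basis.append(v / np.linalg.norm(v))
    u0 = b0 - sum(np.dot(b0, u) * u for u in basis)
    return u0, basis

def spt_coefficients(br, b0, u0, basis):
    """Projection coefficients of the right block br, plus the residuals
    e_i between its fixed-subspace coordinates and those of b0."""
    alpha = float(np.dot(br, u0) / np.dot(u0, u0))    # spatial-domain gain
    beta = np.array([np.dot(br, u) for u in basis])   # fixed-subspace coords
    gamma = np.array([np.dot(b0, u) for u in basis])  # computable at decoder
    return alpha, beta, beta - gamma                  # e_i = beta_i - gamma_i

# First-order fixed vectors on a 4x4 block: dc plus vertical/horizontal ramps.
n = 4
yy, xx = np.mgrid[0:n, 0:n]
fixed = [np.ones(n * n), (yy - 1.5).ravel(), (xx - 1.5).ravel()]
b0 = np.arange(16.0) ** 2              # hypothetical DC prediction of the block
br = b0 + 0.3 * np.arange(16.0) + 1.0  # right block: gain/offset differences
u0, basis = spt_basis(b0, fixed)
alpha, beta, e = spt_coefficients(br, b0, u0, basis)
approx = alpha * u0 + sum(b * u for b, u in zip(beta, basis))
residual = br - approx                 # orthogonal to the whole subspace S
```

The residual is orthogonal to every basis vector, which is the best-estimate property noted in the text; only alpha and the e_i need to be transmitted, since the gamma_i are recomputable at the decoder.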

An analysis of the quantization error is helpful at this point.
As previously mentioned, the SPT cannot achieve exact
reconstruction even without the quantization step. The modeling
error, or the distortion of the coding scheme, is

(16)

This error can be decomposed as

(17)

where the second term is the quantization error. Since we have
an orthogonal decomposition, we get the equality

(18)

If the projection block size is small enough, then the projection
error vector is very small. We can then construct an
approximate operational rate-distortion function for the
right image, given by

(19)

Since the distortion is additive in each term, we can find a
locally optimal bit allocation scheme using the operational
rate-distortion curves, as proposed in [37]. The proposed
allocation scheme is described below.
Each block is represented in terms of its
projection coefficients. We construct a coefficient image for
each coefficient, and a set of quantizers is used for each
coefficient image. The quantizers are ordered so that the
distortion is decreasing and the rate is increasing along
the quantizer sequence.
The distortion for each component of the error term is given
by

(20)

We add the distortion components of each term to find
the actual distortion for the image,

(21)

Let us define the rate of a coefficient as the actual bit rate
(summed over all blocks) using its assigned quantizer. Our aim
is to find the quantizer assignment that minimizes the total
distortion subject to the constraint that the total rate does not
exceed the bit budget. We follow an iterative four-step
algorithm to achieve this objective.
1) Determine the initial bit assignment using the first
quantizer in each sequence.
2) Calculate all possible values of the slope function

(22)

for each class, evaluated at the current bit assignment. For
each class, find the quantizer for which the slope function is
maximum.
3) Determine the class for which the maximum value of
the slope function is largest, and assign to this class the
quantizer for which this maximum is obtained.
4) Calculate the new bit rate and distortion and check
whether the bit budget has been reached; if so, stop the
algorithm. If not, go back to step 2.
The bit allocation algorithm is locally optimal [37]. That is,
given the points of the operational rate-distortion curves, one
can find the best bit allocation among the coefficients.
Standard transform coders employ zigzag scanning, uniform
quantization, and entropy-coding schemes for encoding
the transform coefficients. These schemes do not consider
the interblock dependencies between the coefficients of
neighboring blocks. However, simulation results indicate that,
if a smaller transform block size is used, there exists a
correlation between the adjacent pixels of the coefficient
images. The correlation is higher for the dc (zero-order)
coefficient than for the first-order coefficients, and it continues
to decrease with increasing approximation order. A subband
coder is therefore employed to quantize the projection
coefficients so that the spatial correlation between the
coefficients of neighboring blocks is exploited. Each coefficient
image is decomposed into four subbands using 8-tap quadrature
mirror filters (QMFs). A Laplacian distribution is assumed for
the quantizer design and the bit allocation among the subbands.
Finally, nonadaptive Huffman coding is used for entropy
coding. Lookup tables of operational rate-distortion curves
are prepared for each coefficient image using the described
coding scheme; these curves drive the bit allocation. The gain
achieved with the proposed quantizer becomes significant at
very low bit rates. In addition, it has been demonstrated that
exploiting and alleviating the interblock correlation of transform
domain coefficients diminishes the visibility of blocking
artifacts [38].
A second implementation of the SPT uses a larger block
size, in which case the residual is no longer negligible;
however, (18) is still valid. The distortion function is now
given by

(23)

Unfortunately, an optimal bit allocation algorithm between
the projection coefficients and the residual does not exist.
However, in practice, only a small percentage of the total bit
budget is spent on the projection coefficients, and an optimal
bit allocation scheme is not crucial. The encoding strategy
is straightforward. The projection coefficients are quantized
using near-lossless schemes; uniform quantizers are used in
the quantization process. The quantized signal is then entropy
coded. The projection error is encoded using standard
coding schemes. This implementation is referred to as the
subspace projection technique followed by residual coding
(SPT-RC).
TABLE I
ENCODING AND DECODING ALGORITHMS FOR THE SPT

We summarize the encoding and decoding algorithms in
Table I. In the next section, the experimental results and the
effects of several design parameters on the performance of the
proposed scheme are presented.
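Every scheme compared in the next section builds on block-based disparity compensation, so a minimal block-matching sketch may help fix ideas. It is an illustration only: the block size, search range, and the synthetic 4-pixel shift are assumptions, not the paper's settings.

```python
import numpy as np

def disparity_compensate(left, right, block=8, max_disp=16):
    """Block-matching DC sketch: for each right-image block, search a horizontal
    range in the coded left image for the best SAD match (for a rectified pair,
    disparity is essentially horizontal)."""
    h, w = right.shape
    disp = np.zeros((h // block, w // block), dtype=int)
    pred = np.zeros_like(right)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            target = right[by:by + block, bx:bx + block].astype(np.int64)
            best_d, best_sad = 0, None
            for d in range(0, max_disp + 1):
                sx = bx - d                       # candidate column in the left image
                if sx < 0:
                    break
                cand = left[by:by + block, sx:sx + block].astype(np.int64)
                sad = np.abs(target - cand).sum()
                if best_sad is None or sad < best_sad:
                    best_d, best_sad = d, sad
            disp[by // block, bx // block] = best_d
            pred[by:by + block, bx:bx + block] = \
                left[by:by + block, bx - best_d:bx - best_d + block]
    return disp, pred

# Synthetic test: the right image is the left image shifted 4 pixels to the right.
rng = np.random.default_rng(1)
left = rng.integers(0, 256, size=(16, 32), dtype=np.uint8)
right = np.empty_like(left)
right[:, 4:] = left[:, :-4]
right[:, :4] = left[:, :4]   # pad the border columns

disp, pred = disparity_compensate(left, right, block=8, max_disp=8)
# Away from the left border, each block is matched at disparity 4 exactly.
```

The purely horizontal search is what distinguishes disparity compensation for a rectified stereo pair from general two-dimensional motion compensation.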

V. EXPERIMENTAL RESULTS
This section presents a comparison of several stereo image
coding algorithms. In the experiments, we code the left image
independently (at a high quality) and the right image based on
the coded left image. The comparison criterion is the quality
of the coded right image when a fixed rate is allocated to
encode it. Previous stereo image coding literature used the
peak-signal-to-noise-ratio (PSNR) as a distortion measure of
the coded right image. Although PSNR cannot quantify the
perceptual 3-D quality of the coded images, we follow the
same approach since no other measure has been proposed.
Three different stereo pairs are used for the experiments.
The first pair is obtained from the flower garden video
sequence (frame numbers 0 and 2). These images are 352 × 240.
The second is the lab pair (512 × 480). This pair is obtained by
shifting a video camera and taking two pictures of a stationary
scene from two horizontally shifted locations. Finally, the
third one is the room pair (256 × 256), obtained using
the same procedure as the lab pair. The test images are
presented in Fig. 4. The flower garden pair is an outdoor scene.
The images contain texture and small details. Occluded regions
and photometric variations are not significant. The remaining
two pairs are indoor scenes. Both pairs contain occluded
and textured objects. In addition, photometric variations are
significant for both pairs. All three pairs can be fused as stereo
by using free-viewing techniques.

Fig. 4. Test pairs that are used for the experimental results.

First, the performance of the stereo image coding algorithms
at very low bit rates, i.e., around 0.05 b/pixel, is compared.
Operating at this bit rate is very challenging. The bits spent
on the disparity field constitute an important portion of the
total bit allocation for the second image. On the other hand,
the prediction suffers from blocking artifacts and mismatches
(especially in occlusion regions). Coding the prediction error
is not possible under the bit rate constraint. The advantage of
this lower rate is that the transmission cost for the stereo pair
is only slightly more than that of a single mono image. The
following algorithms are compared.
- Fixed block size (8 × 8) disparity compensation (DC).
- Windowed (overlapped) DC (WDC).
- Variable block size (quadtree implementation) DC (VDC).
- Subspace projection technique (8 × 8 projection block
  size) using the fixed block size (16 × 16) DC estimates
  (SPT).
- SPT using the variable block size estimates (V-SPT).
- SPT using the windowed DC estimates (W-SPT).
The bit rate is fixed (0.06 b/pixel for the lab and the room
stereo pairs and 0.05 b/pixel for the flower garden pair) and
the PSNRs of the encoded images are compared.² The results
are given in Table II. The following observations are obtained
from the simulation results.
- All implementations of the SPT are found to be superior
  to DC-based techniques in the rate-distortion sense.
- WDC improves the PSNR by 0.32 dB on average
  over DC for the test data. Moreover, blocking artifacts
  are removed by the windowing operation.

² First, the bit rates for fixed block size DC are calculated. Then, the bit
rates for the other algorithms are adjusted to this rate.
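PSNR, the comparison criterion used above, follows directly from the mean squared error against the original right image. A short sketch (the tiny 4 × 4 test arrays are invented for illustration):

```python
import numpy as np

def psnr(original, coded, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((original.astype(np.float64) - coded.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

orig = np.full((4, 4), 100, dtype=np.uint8)
coded = orig.copy()
coded[0, 0] = 110                 # a single 10-level error: MSE = 100/16
print(round(psnr(orig, coded), 2))
```

Because PSNR depends only on pixelwise error, it cannot capture the perceptual 3-D quality of a fused stereo pair, which is exactly the caveat stated above.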
TABLE II
RATE-DISTORTION PERFORMANCE OF THE STEREO IMAGE CODING ALGORITHMS.
THE PSNRS OF THE ENCODED IMAGES (IN dB) ARE PRESENTED AT A FIXED BIT
RATE (0.06 B/PIXEL FOR THE LAB AND ROOM PAIRS AND 0.05 B/PIXEL
FOR THE FLOWER GARDEN PAIR)

TABLE III
RATE-DISTORTION PERFORMANCE OF THE STEREO IMAGE CODING ALGORITHMS.
THE PSNRS OF THE ENCODED IMAGES (IN dB) ARE PRESENTED AT A FIXED
BIT RATE (0.13 B/PIXEL FOR THE LAB PAIR, 0.16 B/PIXEL FOR THE
FLOWER GARDEN PAIR, AND 0.15 B/PIXEL FOR THE ROOM PAIR)

- Variable block size DC is very efficient compared to the
  fixed block size implementation. An average gain of 0.75
  dB is achieved for the given pairs. Occlusion regions are
  better estimated by this method.
- If an improved disparity compensation scheme, such as
  WDC or VDC, is used to obtain the block-varying vector,
  the rate-distortion performance of the SPT improves
  correspondingly.
- The quality of the images that are coded by the SPT can
  be improved by either decreasing the projection block
  size or increasing the number of fixed vectors, if a small
  increase in the bit rate is allowed. Similar gains are not
  possible for disparity compensation based methods.
- As a rule of thumb, the projection block size should
  be smaller than or equal to the disparity block size to
  avoid blocking artifacts.

Second, the performance of the coding algorithms at low
bit rates (around 0.15 b/pixel) is investigated. This problem
is less challenging than the previous one. However, at
this bit rate, other possible coding schemes do exist, such
as disparity-compensated residual coding. In this experiment,
we compare the performance of the following algorithms:
disparity compensated (16 × 16 block size) residual coding
(DC-RC), variable block size DC (VDC), windowed DC-RC
(WDC-RC), the subspace projection technique (16 × 16
projection block size) followed by residual coding (SPT-RC),
and SPT using the windowed DC estimates (16 × 16 projection
block size) followed by residual coding (W-SPT-RC). The
results are presented in Table III. Again, the subspace projection
technique is found superior to disparity compensation based
techniques. It is also observed that it is easier to code the
residual for windowed techniques. Simulation results also
indicate that it is better practice to increase the bits spent on
the residual than to increase the bits spent on the disparity
field (the bit rate increase can be due to decreasing the fixed
block size or using a smaller threshold as the splitting criterion
in the quadtree implementation). However, slightly increasing
the bit rate for the disparity field using the VDC method and
then coding the residual field is a feasible solution.

Finally, the operational rate-distortion curves for the DC,
SPT, and SPT-RC algorithms are compared for the lab stereo
pair. For the SPT, a 4 × 4 projection block size is used, no
residual is coded, the projection coefficients are quantized
using a subband coder, and a locally optimal bit allocation
scheme is employed as described in the previous section. For
the SPT-RC, a 16 × 16 block size is used for both disparity
estimation and subspace projection. The residual is coded
using a subband coder. Finally, DC is implemented using
16 × 16 blocks for estimation. The residual due to the disparity
compensation is coded with a subband coder. The rate-distortion
curves are presented in Fig. 5. For low bit rates, the SPT is
3 dB better than disparity compensation and 2 dB better than
the SPT-RC. Another interpretation of this result is that we
can achieve a bit rate reduction of 60% over DC. For bit rates
higher than 0.15 b/pixel, the SPT-RC is preferable.

The minimum distortion achieved by the SPT is bounded
from below, since it cannot perform exact reconstruction. The
maximum achievable PSNR may be increased by decreasing
the block size or increasing the number of fixed vectors.
The rate-distortion performances of these different
implementations (different block sizes or numbers of fixed
vectors) are, however, almost identical at low bit rates.

The coded images are shown in Figs. 6 and 7. The first
coded image (Fig. 6) is generated using the SPT with coefficient
quantization. For this experiment we use three
fixed vectors and one block-varying vector. An optimal bit
allocation scheme is employed to determine the quantization
levels. The bit rate is 0.05 b/pixel and the PSNR of the
coded image is 31.10 dB. The second coded image (Fig. 7)
is obtained using the SPT-RC. The projection block size is
16 × 16. The residual is coded using a subband coder. The bit
rate for this reconstruction is 0.11 b/pixel and the PSNR is
32.85 dB.

Three observations are worth mentioning. First, the SPT
cannot preserve texture in some cases. This is due to
the smoothing effect of the projection operation. Second, the
projection coefficients have different statistics for occluded and
nonoccluded regions. The ringing effect in Fig. 6 is due to this
property. In particular, the performance of the algorithm can be
improved if the framing error is independently coded. Finally,
a smaller block size implementation of the SPT is preferable
to a larger block size one in order to achieve better perceptual
quality.

VI. CONCLUSION

This paper summarizes our recent work on stereo image
coding. We propose a new coding technique based on subspace
projection. The novelty of the approach is that the transformation
matrix of the projection operation adaptively changes to

Fig. 5. Comparison of the rate-distortion performance of the stereo coding
algorithms. The lab stereo pair is used for comparison. Three algorithms,
DC-RC, SPT-RC, and SPT, are compared.

Fig. 6. Coded right lab image at 0.05 b/pixel with a PSNR of 31.10 dB. The
SPT is implemented using a 4 × 4 projection block size, no residual is coded,
the projection coefficients are quantized using a subband coder, and a locally
optimal bit allocation scheme is employed.

Fig. 7. Coded right lab image at 0.11 b/pixel with a PSNR of 32.85 dB. The
coding technique is SPT-RC.
exploit the inherent stereo redundancy and the nonstationary
cross-correlation characteristics between the images of a stereo
pair. In addition, we used a combined transform-subband coding
scheme that is very efficient for coding transform domain
coefficients. The subspace projection technique is appealing
since its performance at very low bit rates was found to be
superior to that of standard stereo coding algorithms. The
proposed coder is flexible, can operate over a broad range of
the rate-distortion curve, and does not require training. In fact,
it is compatible with current standards such as MPEG or the
Joint Photographic Experts Group (JPEG) standard. Further
work is required to test the perceptual quality of the coded
images and the effects of coding on depth map estimation. In
addition, a methodology should be developed to choose certain
design parameters. Our results suggest that smaller block sizes
may be required to improve the perceptual quality when the
image sizes are small, texture information is important, and
outdoor images or indoor images that contain small objects
are coded. Another improvement that we recently considered
is near-lossless encoding of the disparity field. Our initial
results showed that the overall rate-distortion performance can
be improved by using this technique, especially at very low
bit rates. Two extensions of stereo image coding are multiview
image and stereo
video coding. Our belief is that the proposed algorithm can
be successfully modified to address these problems if the
special characteristics of these sequences are considered.

REFERENCES

[1] M. Waldowski, "A new segmentation algorithm for videophone applications based on stereo image pairs," IEEE Trans. Commun., vol. 39, pp. 1856-1868, Dec. 1991.
[2] D. Kauffman and S. Wood, "Digital elevation model extraction from stereo satellite images," in Proc. Int. Geoscience and Remote Sensing Symp., 1987, vol. 1, pp. 349-352.
[3] J. Rodríguez and J. Aggarwal, "Matching aerial images to 3-D terrain maps," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, pp. 1138-1150, Dec. 1990.
[4] K. Nishihara and T. Poggio, "Stereo vision for robotics," in Proc. Robotics Research, First Int. Symp., Cambridge, MA, 1984, pp. 489-505.
[5] N. Ayache and F. Lustman, "Trinocular stereo vision for robotics," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, pp. 73-85, Jan. 1991.
[6] H. Antonisse, "Active stereo vision routines using PRISM3," in Proc. SPIE-Int. Soc. Opt. Eng., 1993, vol. 1825, pp. 745-756.
[7] W. Poelzleitner, "Robust spacecraft motion estimation and elevation modeling using a moving binocular head," in Proc. SPIE-Int. Soc. Opt. Eng., 1993, vol. 1829, pp. 46-57.
[8] H. Yamaguchi, Y. Tatehira, K. Akiyama, and Y. Kobayashi, "Stereoscopic images disparity for predictive coding," in Proc. ICASSP, 1989, pp. 1976-1979.
[9] M. G. Perkins, "Data compression of stereopairs," IEEE Trans. Commun., vol. 40, pp. 684-696, Apr. 1992.
[10] T. Ozkan and E. Salari, "Coding of stereoscopic images," Image Video Process., vol. 1903, pp. 228-235, 1993.
[11] I. Dinstein et al., "On stereo image coding," in Proc. Int. Conf. Pattern Recognition, 1988, pp. 357-359.
[12] V. Seferidis and D. Papadimitriou, "Improved disparity estimation in stereoscopic television," Electron. Lett., vol. 29, pp. 782-783, Apr. 1993.
[13] M. Perkins, "Data compression of stereopairs," Ph.D. dissertation, Stanford Univ., Stanford, CA, 1988.
[14] V. Grinberg, G. Podnar, and M. Siegel, "Geometry of binocular imaging," in Proc. IS&T/SPIE Symp. Electronic Imaging, Stereoscopic Displays and Applications, 1994, vol. 2177.
[15] S. Sethuraman, A. Jordan, and M. Siegel, "A multiresolutional region based hierarchical segmentation scheme for stereoscopic image compression," in Proc. Conf. Digital Video Compression: Algorithms and Technologies, 1995, vol. 2419.
[16] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. IT-19, pp. 471-480, July 1973.
[17] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[18] B. Julesz, Foundations of Cyclopean Perception. Chicago, IL: Univ. Chicago Press, 1971.
[19] E. Salari and W. Whyte, "Compression of stereoscopic image data," in Proc. Data Compression Conf., 1991, p. 425.
[20] M. E. Lukacs, "Predictive coding of multi-viewpoint image sets," in Proc. ICASSP, 1986, pp. 521-524.
[21] S. Sethuraman, A. Jordan, and M. Siegel, "Multiresolution based hierarchical disparity estimation for stereo image compression," in Proc. Symp. Application of Subbands and Wavelets, 1994.
[22] P. Burt, C. Yen, and X. Xu, "Local correlation measures for motion analysis: A comparative study," in Proc. PRIP Conf., 1982, pp. 269-274.
[23] H. Yamaguchi, Y. Tatehira, K. Akiyama, and Y. Kobayashi, "Data compression and depth shape reproduction of stereoscopic images," Syst. Comput. Jpn., vol. 22, pp. 53-63, 1991.
[24] H. Aydınoğlu and M. H. Hayes, "Performance analysis of stereo image coding algorithms," in Proc. ICASSP, 1996, vol. IV, pp. 2191-2195.
[25] H. Watanabe and S. Singhal, "Windowed motion compensation," in Proc. Conf. Visual Communications and Image Processing, 1991, vol. 1605, pp. 582-589.
[26] H. Aydınoğlu and M. H. Hayes, "Compression of multi-view images," in Proc. Int. Conf. Image Processing, Nov. 1994, vol. II, pp. 385-389.
[27] A. Kopernik and D. Pele, "Improved disparity estimation for the coding of stereoscopic television," in Proc. SPIE Conf. Visual Communication and Image Processing, 1992, vol. 1818, pp. 1155-1166.
[28] A. Puri and H. Hang, "Adaptive schemes for motion-compensated coding," in Proc. SPIE Conf. Visual Communication and Image Processing, 1988, vol. 1001, pp. 925-934.
[29] H. Aydınoğlu and M. H. Hayes, "Stereo image coding," in Proc. ISCAS, Apr. 1995, vol. I, pp. 247-250.
[30] S. D. Cochran and G. Medioni, "3-D surface description from binocular stereo," IEEE Trans. Pattern Anal. Machine Intell., vol. 14, pp. 981-994, Oct. 1992.
[31] K. Diamantaras, N. Grammalidis, and M. Strintzis, "KLT-based coding of I-frames in stereoscopic image sequences," J. Commun., vol. 45, pp. 20-23, May/June 1994.
[32] C. Torras, Ed., Computer Vision: Theory and Applications. New York: Springer-Verlag, 1992.
[33] P. Strobach, "Quadtree-structured recursive plane decomposition coding of images," IEEE Trans. Signal Processing, vol. 39, pp. 1380-1397, June 1991.
[34] R. Jonsson, "Adaptive subband coding of video using probability distribution models," Ph.D. dissertation, Georgia Inst. Technol., Atlanta, 1994.
[35] H. Aydınoğlu, F. Kossentini, and M. H. Hayes, "A new framework for multi-view image coding," in Proc. ICASSP, May 1995, pp. 2173-2176.
[36] H. Aydınoğlu, F. Kossentini, Q. Jiang, and M. H. Hayes, "Region-based stereo image coding," in Proc. ICIP, Oct. 1995, vol. II, pp. 57-61.
[37] A. Gersho and R. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer, 1992.
[38] K. Lim, K. Chun, and J. Ra, "Improvements on image transform coding by reducing interblock correlation," IEEE Trans. Image Processing, vol. 4, pp. 1146-1150, Aug. 1995.

Haluk Aydınoğlu (S'95) was born in Istanbul, Turkey, on October 15, 1969.
He received the B.S. degree in electrical engineering and mathematics (with
honors) from Boğaziçi University, Istanbul, in 1991, the M.S.E.E. degree from
Georgia Tech Lorraine, Metz, France, in 1992, and the Ph.D. degree from the
Georgia Institute of Technology, Atlanta, in 1997.
From 1989 to 1990, he was a Lecturer in the Department of Mathematics,
Boğaziçi University. From 1991 to 1992, he was a Rockwell International
and French government scholar. He has also been a Research Assistant in the
Center for Signal and Image Processing at Georgia Tech. He is currently a
Staff Scientist with Retek Information Systems, Atlanta, GA. His research
interests include signal and system modeling, image and video coding, and
applications of DSP to finance and economics.

Monson H. Hayes, III (S'76-M'81-SM'84-F'92) received the B.S. degree
in physics from the University of California, Berkeley, in 1971, and the
S.M., E.E., and Sc.D. degrees, all in electrical engineering, from the
Massachusetts Institute of Technology (MIT), Cambridge, in 1978, 1978, and
1981, respectively.
He worked as a Systems Engineer in IR systems technology at Aerojet
Electrosystems from 1972 to 1974. From 1975 to 1979, he was a Teaching
Assistant in the Department of Electrical Engineering, MIT, and from 1979
to 1981, he was a Research Assistant at MIT Lincoln Laboratories, Lexington,
in the Multidimensional Digital Signal Processing Group. Since 1981, he has
been with the Georgia Institute of Technology, Atlanta, where he is currently
a Professor of Electrical and Computer Engineering. His research interests
include signal modeling, image and video coding, stereo and multiview image
processing, and modeling of facial movements. He has contributed more than
100 articles to journals and conference proceedings and has written chapters
for two edited books: Advances in Computer Vision and Image Processing
(Greenwich, CT: JAI, 1984) and Image Recovery: Theory and Application
(New York: Academic, 1986), and is the author of the textbook Statistical
Digital Signal Processing and Modeling (New York: Wiley, 1996).
Dr. Hayes has been an Associate Editor in signal processing for the IEEE
TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Chairman of
the IEEE ASSP Society Technical Committee on Digital Signal Processing,
Secretary-Treasurer and Chairman of the ASSP Publications Board, Chairman
of the DSP Technical Committee, a member of the ASSP Administrative
Committee, and General Chairman of ICASSP '96. He received the Senior
Award from the IEEE Acoustics, Speech, and Signal Processing Society in
1983 and the Presidential Young Investigator Award in 1984, and was elected
to the grade of IEEE Fellow in 1992.
