

Digital Video Transcoding

JUN XIN, MEMBER, IEEE, CHIA-WEN LIN, SENIOR MEMBER, IEEE, AND
MING-TING SUN, FELLOW, IEEE

Invited Paper

Video transcoding, due to its high practical value for a wide range of networked video applications, has become an active research topic. In this paper, we outline the technical issues and research results related to video transcoding. We also discuss techniques for reducing the complexity, and techniques for improving the video quality, by exploiting the information extracted from the input video bit stream.

Keywords—Error resilience, motion estimation, rate control, transcoding architecture, video transcoding.

Manuscript received January 16, 2004; revised July 9, 2004.
J. Xin is with Mitsubishi Electric Research Laboratories, Cambridge, MA 02139 USA (e-mail: [email protected]).
C.-W. Lin is with the Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 612, Taiwan, R.O.C. (e-mail: [email protected]).
M.-T. Sun is with the Department of Electrical Engineering, University of Washington, Seattle, WA 98195 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/JPROC.2004.839620

I. INTRODUCTION

Video transcoding is the operation of converting a video from one format into another. A format is defined by characteristics such as the bit rate, frame rate, spatial resolution, coding syntax, and content, as shown in Fig. 1.

One of the earliest applications of transcoding is to adapt the bit rate of a precompressed video stream to a channel bandwidth. For example, a TV program may be originally compressed at a high bit rate for studio applications, but later needs to be transmitted over a channel at a much lower bit rate.

In universal multimedia access [1], different terminals may have different accesses to the Internet, including local area network (LAN), digital subscriber line (DSL), cable, wireless networks, integrated services digital network (ISDN), and dial-up. The different access networks have different channel characteristics such as bandwidths, bit error rates, and packet loss rates. At the users' end, network appliances including handheld computers, personal digital assistants (PDAs), set-top boxes, and smart cellular phones are slated to replace personal computers as the dominant terminals for accessing the Internet. These network terminals vary significantly in resources such as computing power and display capability. To flexibly deliver multimedia data to users with different available resources, access networks, and interests, the multimedia content may need to be adapted dynamically according to the usage environment [2]. Transcoding is one of the key technologies to fulfill this challenging task. Transcoding is also useful for content adaptation for peer-to-peer networking over shared multihop communication links [3].

There are many other transcoding applications besides universal multimedia access. In statistical multiplexing [4], multiple variable-bit-rate video streams are multiplexed together to achieve the statistical multiplexing gain. When the aggregated bit rate exceeds the channel bandwidth, a transcoder can be used to adapt the bit rates of the video streams to ensure that the aggregated bit rate always satisfies the channel bandwidth constraint. A transcoder can also be used to insert new information, including company logos, watermarks, and error-resilience features, into a compressed video stream. Transcoding techniques are also shown to be useful for supporting VCR trick modes, i.e., fast forward, reverse play, etc., for on-demand video applications [8]–[10]. In addition, object-based transcoding techniques are discussed in [7] for adaptive video content delivery. A general utility-based framework is introduced in [11] to formulate some transcoding and adaptation issues as resource-constrained utility maximization problems. In [12], a utility-function prediction is performed using automatic feature extraction and regression for MPEG-4 video transcoding. Several rate-distortion models for transcoding optimization are introduced in [13] to facilitate the selection of transcoding methods under a rate constraint. Envisioning the need for transcoding, the emerging MPEG-7 standard [5], which standardizes a framework for describing audiovisual contents, has defined "transcoding hints" to facilitate the transcoding of compressed video contents [6], [7].

Dynamic changes of coding parameters such as bit rates, frame rates, and spatial resolutions could also be achieved to a limited extent by scalable coding [14], [15].

0018-9219/$20.00 © 2005 IEEE

84 PROCEEDINGS OF THE IEEE, VOL. 93, NO. 1, JANUARY 2005


Fig. 1. Format conversion using a video transcoder.

Fig. 2. Block diagram of a standard video encoder.

However, in the current scalable video coding standards, the enhancement layers are generated by coding the prediction residuals between the original video and the base-layer video. In many applications, the network bandwidth may fluctuate wildly with time. Therefore, it may be difficult to set the base-layer bit rate. If the base-layer bit rate is set low, the base-layer video quality will be relatively low, and the overall video quality degradation may be severe, since the prediction becomes less effective. On the other hand, if the base-layer bit rate is set high, the base-layer video may not get through the network completely. In general, the achievable quality of scalable coding is significantly lower than that of nonscalable coding. In addition, scalable video coding demands additional complexity at both encoders and decoders. These inherent weaknesses have kept scalable coding from being widely deployed in practical applications. Nevertheless, scalable coding is still an active research area, and a new coding standard is being developed to overcome these drawbacks [16]. With these problems addressed, scalable coding schemes may become more suitable for streaming-video applications in which a large number of users require different levels of format adaptation, since less computation is involved.

In this paper, the input to the transcoder is a compressed video produced by a standard video encoder. The current major video coding standards—MPEG-1 [17], MPEG-2 [18], MPEG-4 [19], H.263 [20], and the emerging H.264 [21]—all use hybrid discrete cosine transform (DCT) and block-based motion compensation (MC) schemes. (Note that H.264 uses an integer transform that approximates the DCT.) A block diagram of a standard video encoder is shown in Fig. 2. MC reduces the temporal redundancy. The DCT reduces the spatial redundancy and achieves energy compaction. Quantization is performed to achieve a higher compression ratio. Variable-length coding (VLC) is applied after quantization to reduce the remaining redundancy. A decoder is embedded in the encoder to reconstruct the video frames, which are stored in the frame memory for the prediction of future frames.

A straightforward realization of a transcoder is to cascade a decoder and an encoder: the decoder decodes the compressed input video, and the encoder reencodes the decoded video into the target format. This approach is computationally very expensive. Therefore, reducing the complexity of the straightforward decoder-encoder implementation is a major driving force behind many research activities on transcoding.

What makes transcoding different from video encoding is that the transcoder has access to many coding parameters and statistics that can be easily obtained from the input compressed video stream. They may be used not only to simplify the computation, but also to improve the video quality. Transcoding can be considered a special two-pass encoding: the "first-pass" encoding produces the input compressed video stream, and the "second-pass" encoding in the transcoder can use the information obtained from the first pass to do a better encoding. Therefore, it is possible for the transcoder to achieve better video quality than the straightforward implementation, where the encoding is single pass. The challenge of transcoding research is then how to intelligently utilize the coding statistics and parameters extracted from the input to achieve the best possible video quality at the lowest possible computational complexity.

In this paper, we discuss the issues and research results related to the transcoding of video streams compressed using the hybrid MC/DCT schemes.
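For illustration only, the hybrid coding loop described above (prediction from the previously reconstructed reference, quantization of the residual, and an embedded decoder that rebuilds the reference) can be sketched as a one-dimensional toy model. This is our sketch, not any standard's algorithm; it omits the DCT, motion search, and VLC:

```python
def encode_sequence(frames, q_step):
    """Toy hybrid coder: each 'frame' is a list of sample values.
    The residual against the previously *reconstructed* frame is
    quantized; an embedded decoder rebuilds the reference exactly
    as the real decoder will, so encoder and decoder stay in sync."""
    reference = [0] * len(frames[0])
    bitstream = []
    for frame in frames:
        residual = [x - r for x, r in zip(frame, reference)]   # prediction (zero motion)
        levels = [round(e / q_step) for e in residual]         # quantization
        bitstream.append(levels)                               # entropy coding omitted
        # embedded decoder: dequantize and update the frame memory
        reference = [r + lv * q_step for r, lv in zip(reference, levels)]
    return bitstream

def decode_sequence(bitstream, q_step, frame_len):
    """Stand-alone decoder: mirrors the embedded decoder above."""
    reference = [0] * frame_len
    decoded = []
    for levels in bitstream:
        reference = [r + lv * q_step for r, lv in zip(reference, levels)]
        decoded.append(reference)
    return decoded
```

Because the encoder predicts from the same reconstructed frames the decoder produces, no mismatch accumulates; this is exactly the property that open-loop transcoders violate.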



Fig. 3. Transcoding using requantization.
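The requantization of Fig. 3 amounts to rescaling each quantized DCT level to a coarser step size. A minimal sketch (ours), assuming a plain uniform quantizer; real MPEG-2 quantization also involves weighting matrices and a dead zone:

```python
import math

def requantize(levels, q_in, q_out):
    """Open-loop requantization: dequantize each DCT coefficient level
    with the input step size, then requantize with the (coarser) output
    step size, rounding half away from zero."""
    out = []
    for lv in levels:
        coeff = lv * q_in                                  # reconstructed coefficient
        out.append(int(math.copysign(abs(coeff) / q_out + 0.5, coeff)))
    return out
```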

Fig. 4. CPDT architecture.

An overview of transcoding architectures and techniques has been given in [22], which presents many of the fundamentals in this area. This paper is intended to provide a more in-depth view of architectures and techniques, and to cover such topics as quality optimization, complexity-reduction techniques, and related applications such as logo and watermark insertion.

The remainder of this paper is organized as follows. In Section II, we first review the transcoding techniques used for bit-rate reduction. In Section III, we discuss the transcoding techniques for spatial and temporal resolution reductions. Section IV discusses the issues associated with standards conversion. Section V addresses transcoding quality optimization. Section VI discusses transcoding for information insertion. Finally, Section VII concludes this paper.

II. BIT-RATE TRANSCODING

Generally, there exist three transcoding architectures for bit-rate transcoding: open-loop transcoders [23]–[25], cascaded pixel-domain transcoders (CPDTs) [24], [26], and DCT-domain transcoders (DDTs) [27]–[29]. The open-loop architectures include selective transmission [23], [24], where the high-frequency DCT coefficients are discarded, and requantization [24], [25], where the DCT coefficients are requantized. Fig. 3 shows a requantization transcoder. The open-loop transcoders are computationally efficient, since they operate directly on the DCT coefficients. However, they suffer from the drift problem.

The drift problem is explained as follows. A video picture is predicted from its reference pictures, and only the prediction errors are coded. For the decoder to work properly, the reference pictures reconstructed and stored in the decoder predictor must be the same as those in the encoder predictor. The open-loop transcoders change the prediction errors and, therefore, make the reference pictures in the decoder predictor different from those in the encoder predictor. The differences accumulate and cause the video quality to deteriorate with time until an intrapicture is reached. The error accumulation caused by the encoder/decoder predictor mismatch is called drift, and it may cause severe degradation of the video quality [26], [30]. It should be noted that, in the following discussions, many transcoder architectures are not strictly drift free. However, the degree of video quality degradation caused by the drift varies with the architecture. In addition, the drift will be terminated by an intrapicture. In applications where the number of coded pictures between two consecutive intrapictures is small and the quality degradation caused by the drift is acceptable, these architectures, although not drift free, can still be quite useful due to their potentially lower cost in terms of computation and required frame memory.

Fig. 4 illustrates the drift-free CPDT [24], a concatenation of a decoder and a simplified encoder. Rather than performing the full-scale motion estimation, as in a stand-alone video encoder, the encoder reuses the motion vectors along with other information extracted from the input video bitstream. Thus, the motion estimation, which usually accounts for 60%–70% of the encoder computation [31], is omitted. To address another major source of computational complexity, the DCT, the end-of-block locations of the output video are predicted in [26] using those of the input video, and only a partial DCT is performed.
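The drift accumulation described above can be mimicked with a toy scalar model (our illustration; actual drift depends on motion, content, and the size of the requantization error):

```python
def simulate_drift(num_frames, intra_period, requant_error=1.0):
    """Scalar toy model of open-loop drift: every frame suffers a fresh
    requantization error; P-pictures additionally inherit the error already
    present in the decoder's reference, so the mismatch grows until an
    intrapicture resets the prediction chain."""
    drift = []
    err = 0.0
    for n in range(num_frames):
        if n % intra_period == 0:
            err = requant_error            # intrapicture: no inherited error
        else:
            err += requant_error           # P-picture: inherited + fresh error
        drift.append(err)
    return drift
```

With `intra_period = 3`, the mismatch ramps up within each GOP and is reset by the next intrapicture, matching the observation that short GOPs keep open-loop drift tolerable.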



Fig. 5. SDDT architecture.

The simplified DCT-domain transcoder (SDDT) is derived based on the assumption that the DCT, IDCT, and MC are linear operations. It was first derived in [27] and [28], and then further simplified in [29], as shown in Fig. 5. SDDT eliminates the DCT/IDCT and reduces the number of frame buffers by half. The DCT-domain MC (DCT-MC) [32] is the major computation-intensive operation in SDDT. As shown in Fig. 6, the goal is to compute the DCT coefficients of the target DCT block from the coefficients of its four overlapping DCT blocks. To speed up the DCT-MC, several fast schemes have been proposed. In [33], the DCT-MC is simplified through a matrix decomposition. A fast algorithm utilizing shared information in a macroblock (MB) is proposed in [34]. In [35], the fact that the energy of a DCT block is concentrated in its low-frequency coefficients is exploited to perform an efficient approximation of the MC operation. Another approach is to separate the MC into two one-dimensional (1-D) operations [36], which are further simplified by a lookup-table scheme [37].

Fig. 6. DCT-MC.

With these fast algorithms, SDDT may require less computation and memory than CPDT. However, the linearity assumptions on which the derivation is based are not strictly true, since there are clipping functions performed in the video encoder/decoder, and rounding operations performed in the interpolation for fractional-pixel MC. The failed assumptions may cause drift in the transcoded video. Detailed analyses of the causes and impacts of drift can be found in [30].

In addition, SDDT can only be applied to bit-rate transcoding, since it assumes that the frame memories in the encoder and the decoder have the same spatial/temporal resolution and that the output video uses the same frame coding types, motion vectors, and coding modes as the input video. In contrast, CPDT enjoys the flexibility to allow changes in these coding parameters.

The cascaded DCT-domain transcoder (CDDT) [38], shown in Fig. 7, can be used for spatial/temporal resolution downscaling and other coding-parameter changes. However, compared to SDDT, its flexibility is achieved using an additional DCT-MC and frame memory, which results in a significantly higher cost in computation and storage. It is thus often adopted for downscaling applications, where the encoder-side DCT-MC and memory will not cost much, since the encoder operates at a reduced resolution [39], [40].

Table 1 compares the computational complexity of five different MPEG-2 transcoders: CPDT, SDDT, CDDT, CPDT using full-scale full-search motion reestimation (DEC-ENC1), and CPDT using three-step fast-search motion reestimation (DEC-ENC2). Two 300-frame CIF (352 × 288) test sequences, "Foreman" and "Mobile & Calendar," coded with two group-of-pictures (GOP) structures, (15, 1) (i.e., IPPP…) and a structure with B-frames (i.e., IBBP…), are used in the simulations. The experiments are performed on a Pentium-IV 1.8-GHz PC. The two DCT-domain transcoders are significantly faster than the pixel-domain transcoders in our implementations. It should be noted that the speed comparison might depend on the specific implementations of the architectures. There exist different low-level optimization methods that can be used to reduce the computational complexities of CPDT, SDDT, and CDDT. For example, the method in [90] can significantly speed up CPDT, while the methods in [29] and [33]–[35] can speed up SDDT and CDDT.

Our experiments show that the pixel-domain transcoders (except DEC-ENC2) usually have better peak signal-to-noise ratio (PSNR) performance, as illustrated in Figs. 8 and 9. In Fig. 8, the test video is first encoded at QP (quantization parameter) = 7 with GOP = (15, 1), and the bit rate is then reduced by transcoding at different QPs with each of the five transcoders. In Fig. 9, the incoming video is encoded at QP = 5 and transcoded at QP = 11 for five different GOP sizes.
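The DCT-MC operation of Fig. 6 is easiest to see in the pixel domain: the motion-compensated target block straddles four reconstructed blocks, and the linearity of the DCT allows the same extraction to be carried out directly on DCT coefficients with windowing matrices. A pixel-domain sketch (our illustration; the block size `n` is a parameter):

```python
def extract_target_block(b1, b2, b3, b4, dx, dy, n=8):
    """Pixel-domain counterpart of DCT-MC (Fig. 6): extract the n-by-n
    target block at offset (dx, dy), 0 <= dx, dy < n, from its four
    overlapping neighbors b1 (top-left), b2 (top-right), b3 (bottom-left),
    and b4 (bottom-right), each an n-by-n matrix (list of lists)."""
    # stitch the four blocks into a 2n-by-2n composite, then crop
    comp = [[0] * (2 * n) for _ in range(2 * n)]
    for y in range(n):
        for x in range(n):
            comp[y][x] = b1[y][x]
            comp[y][x + n] = b2[y][x]
            comp[y + n][x] = b3[y][x]
            comp[y + n][x + n] = b4[y][x]
    return [row[dx:dx + n] for row in comp[dy:dy + n]]
```

The DCT-domain methods cited above ([33]–[37]) avoid forming the composite explicitly and instead exploit sparsity and decompositions of the windowing matrices.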



Fig. 7. CDDT architecture.

Table 1. Runtime Complexity Comparison of Five Different Transcoders. The Video Sequences Are Encoded at QP = 7 and Then Transcoded at QP = 15.

The number associated with each operation point indicates the generated bit rate. We can observe from Figs. 8 and 9 that the drift caused by the two DCT-domain transcoders is not serious for small GOP sizes. However, the performance degradation, especially for SDDT, can become rather significant with large GOP sizes. Such large GOP sizes may be used in applications such as networked video streaming and wireless video that demand high coding efficiency.

III. SPATIAL AND TEMPORAL TRANSCODING

The heterogeneity of communication networks and network access terminals often demands the conversion of compressed video not only in bit rate, but also in spatial/temporal resolution. One of the challenging tasks in spatial/temporal transcoding is how to efficiently reestimate (or map) the target motion vectors from the input motion vectors.

Many works on motion reestimation for spatial transcoding consider the simple case of 2 : 1 downscaling. Fig. 10 illustrates a case of the motion-mapping problem, where the input MBs have four motion vectors while the target output MB has a single motion vector. Several strategies have been proposed to compose the target motion vector from the input motion vectors. One strategy is to randomly choose one of the four input motion vectors [41], [42]. A weighted average taking the prediction errors into account is presented in [43]. Different methods are compared in [31] and [41], including median, majority, average, and random selection; the median method is shown to achieve the best performance. The work in [44] selects the motion vector using a likelihood score based on the statistical characteristics of the MBs associated with the best-matching motion vectors. Note that the motion vectors formed by the above algorithms need to be downscaled to the target spatial resolution.

Recent works extend these strategies to tackle transcoding with arbitrary down-sampling ratios [45], [46] by taking care of the unequal contributions of the related input motion vectors. When the down-sampling ratio is large and one target MB is down-sampled from a number of input MBs, the motion vectors of the input MBs are more likely to be inconsistent. A multicandidate approach is proposed in [47] to address this issue. The transcoding of interlaced video is discussed in detail in [46], where the motion mapping is further complicated by the various types of frame and field motion vectors.

For transcoding with temporal resolution changes, due to the frame dropping, one has to derive a new set of motion vectors that do not exist in the input video. This issue is addressed in [48], where a technique called forward dominant vector selection (FDVS) is proposed. The FDVS scheme is illustrated in Fig. 11. The best-match area referenced by the motion vector of the current MB overlaps with at most four MBs in its reference frame. The motion vector of the MB with the largest overlapping portion is called the dominant motion vector and is selected for composing the target motion vector. This process is repeated for all the dropped frames, and the final target motion vector is formed by adding all the dominant motion vectors together, followed by a motion vector refinement. In [49], the dominant motion vector is selected based on the activity of the overlapping MBs, instead of the overlapping area as in FDVS. Another method, telescopic vector composition (TVC) [31], accumulates all the motion vectors of the current MB's colocated MBs in the dropped frames and adds the resulting motion vector to the current MB's motion vector. For typical videos with small motion vectors, TVC can achieve performance similar to FDVS. It is shown in [31] that a 2-pixel refinement around the composed motion vector can achieve performance similar to that of the full-scale full-search motion reestimation. For spatial resolution reduction, a half-pixel refinement is typically enough to achieve good quality [31], [46]. For temporal resolution reduction, as the number of skipped frames increases, more refinement may be desirable. The refinement range may be dynamically decided based on the motion vector magnitudes and the number of skipped frames [50]. In [48], the refinement range is determined based on the input/output quantization scales and the prediction errors.
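As an illustration (ours, not the implementation of [48]), the FDVS dominant-vector selection and composition can be sketched for a 16 × 16 MB grid:

```python
def dominant_overlap(mv, mb=16):
    """A block displaced by mv = (dx, dy) overlaps at most four MB-aligned
    blocks in the reference frame. Return the grid position of the MB with
    the largest overlap, together with its overlap area in pixels."""
    dx, dy = mv
    ax, ay = dx % mb, dy % mb          # offsets into the MB grid
    gx, gy = dx // mb, dy // mb        # top-left overlapped MB (grid units)
    areas = {
        (gx,     gy):     (mb - ax) * (mb - ay),
        (gx + 1, gy):     ax * (mb - ay),
        (gx,     gy + 1): (mb - ax) * ay,
        (gx + 1, gy + 1): ax * ay,
    }
    return max(areas.items(), key=lambda kv: kv[1])

def compose_fdvs(current_mv, dominant_mvs):
    """FDVS composition: add the dominant motion vector selected in each
    dropped frame to the current MB's motion vector (refinement omitted)."""
    x, y = current_mv
    for dx, dy in dominant_mvs:
        x, y = x + dx, y + dy
    return (x, y)
```

In a full transcoder, `dominant_overlap` would be applied per dropped frame to pick which MB's motion vector to accumulate, followed by the small refinement search discussed above.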



Fig. 8. Performance comparison of average PSNR for five transcoders. The "Foreman" sequence is encoded at QP = 7, and then transcoded at different QPs, respectively. GOP = (15, 1).

Fig. 9. Performance comparison of average PSNR for CPDT, SDDT, and CDDT for different GOP sizes. The "Mobile & Calendar" sequence is encoded at QP = 5 using five different GOP sizes, and then transcoded at QP = 11.
In [51], a fast motion vector refinement scheme is proposed for DCT-domain spatial downscaling, where the speedup is achieved by exploiting the redundancies in the DCT-MC computations of two adjacent checkpoints.

Fig. 10. In 2 : 1 spatial transcoding, the target motion vector for the output MB is highly correlated with the four input motion vectors.

Due to the spatial/temporal resolution reduction, the drift problem is usually significant in open-loop transcoding architectures [52]. Therefore, the drift-free CPDT architecture is more favorable in terms of quality. In [53] and [54], the drift in spatial transcoding is analyzed. Based on the analyses, several drift-compensation architectures providing different levels of complexity-quality tradeoffs are proposed. In-depth analyses and performance comparisons of these alternative architectures and CPDT are provided in [55].

A DCT-domain architecture is proposed in [56] for temporal transcoding, where the reencoding errors are reduced using the direct addition of DCT coefficients and signals from an error compensation feedback loop.
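The small-window motion vector refinement discussed in Section III can be sketched as a SAD search around the composed vector (an illustrative sketch of ours: `current` is a block anchored at the frame origin, `reference` is a full frame, and out-of-frame samples are treated as zero):

```python
def refine_mv(current, reference, mv, radius=2):
    """Refine a composed motion vector with a small full search (+/- radius
    around mv), minimizing the sum of absolute differences (SAD); far
    cheaper than a full-range motion search."""
    h, w = len(current), len(current[0])
    rh, rw = len(reference), len(reference[0])

    def sad(vx, vy):
        s = 0
        for y in range(h):
            for x in range(w):
                ry, rx = y + vy, x + vx
                ref = reference[ry][rx] if 0 <= ry < rh and 0 <= rx < rw else 0
                s += abs(current[y][x] - ref)
        return s

    candidates = [(mv[0] + dx, mv[1] + dy)
                  for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)]
    return min(candidates, key=lambda v: sad(*v))
```

With `radius = 2`, this is the 2-pixel refinement reported in [31] to approach full-search quality when applied to a composed vector.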



Fig. 11. FDVS. V and V are dominant motion vectors.

Fig. 12. Example of MPEG-2 to MPEG-4 simple profile transcoding.

In [57], a hybrid DCT/pixel-domain transcoder architecture is proposed for video downscaling. It contains a DCT-domain decoder followed by a pixel-domain encoder, where a modified DCT-domain inverse transformation and down-sampling method is developed to convert a DCT block into a downscaled pixel block.

IV. STANDARDS TRANSCODING

In many applications, video coded in one coding standard (e.g., MPEG-2) may need to be converted to another standard (e.g., MPEG-4) besides undergoing changes in bit rate and resolution. In what follows, we use two examples to illustrate how the information obtained from the input video sequence may be used to help the standards transcoding process.

A. MPEG-2 to MPEG-4 Simple Profile (SP) Transcoding

MPEG-4 SP is aimed at low-complexity and low-bit-rate video applications. Compared to MPEG-2 video, it supports neither B-frames nor interlaced video. In addition, it usually operates at lower spatial resolutions and frame rates than MPEG-2 video. Fig. 12 illustrates a typical scenario: an interlaced MPEG-2 video of 720 × 480 resolution at 30 frames/s is transcoded to a progressive MPEG-4 SP video of 176 × 144 resolution at 15 frames/s. This involves conversions of the video format and frame coding types besides the spatial and temporal resolution conversions. The new challenge is that the motion vectors of an incoming video frame may not use the same reference frame as the target frame.

The motion reestimation problem in the case of frame-type conversion is first discussed in [31], where the target motion vector is chosen from several candidate motion vectors that are formed by using the motion information from the current and adjacent frames. The work in [58] and [59] introduces an intermediate, virtual layer of video, which has the same frame rate and frame type as the target video and the same spatial resolution as the input video. The motion reestimation process consists of two steps. In the first step, one or more intermediate motion vectors are formed for each MB in the intermediate video frame using the motion information of the input video. In the second step, these motion vectors of the intermediate-layer video are used to compose the motion vectors for the target video. This step also takes care of the mismatch of the motion vector types caused by the interlaced input and the progressive output. Effectively, the first step handles the frame-rate reduction and the frame-type conversion, and the second step deals with the spatial-resolution reduction and the interlaced-to-progressive processing. This two-step process has low complexity, since all operations are performed on the motion vectors and, therefore, the computationally expensive block matching is not needed.

B. MPEG-2 to MPEG-4 Advanced Simple Profile (ASP) Transcoding

Aiming at providing high-quality video coding, MPEG-4 ASP incorporates several new coding tools. One of the tools is global MC (GMC), which can improve the coding performance for scenes with global motion [60]. No previous video coding standard, including MPEG-2, supports GMC. Therefore, in the transcoding of MPEG-2 to MPEG-4 ASP, global motion (GM) parameters may be estimated to take advantage of this tool. The estimation is referred to as global motion estimation (GME). Direct GME methods operate in the pixel domain [61], [62]. They are computationally expensive due to the iterative processes in the nonlinear estimations and the number of pixels involved when the general perspective model is used.



A much more efficient algorithm is presented in [63], which performs the GME based on the input MB motion vectors instead of estimating the pixel-wise motion vector for each decoded pixel. The motion vectors in the input video stream are obtained from a block-matching motion estimation process. They contain GM information plus local motion and noise due to the inaccurate block-matching process. The local motion and the block-matching noise are modeled as a zero-mean Gaussian distribution. The GM parameters are obtained by iteratively minimizing the fitting error between the input motion vectors and the sampled motion vectors generated from the estimated motion model, using the Newton–Raphson method with outlier rejection. This compressed-domain GME is shown to be fast (requiring less than 0.2% of the processing of the pixel-domain GME implemented in the MPEG-4 reference software), robust, and accurate, so the computationally expensive pixel-domain GME can be avoided.

C. Transcoding Between Other Standards

The recently developed scalable coding standard MPEG-4 fine granularity scalability (FGS) [15] has attracted interest in transcoding between FGS and single-layer video. An efficient architecture is derived in [64] for transcoding FGS video to single-layer video. In [65], it is pointed out that at the same bit rate, transcoding from FGS to a single layer yields better video quality than simply truncating the FGS bit-stream. Various methods for transcoding a single-layer MPEG stream to an FGS stream, with both open-loop and closed-loop structures, are investigated in [66].

The syntax translation between different formats with minimal quality loss is addressed in [67], where mapping techniques for the syntactic and semantic elements of H.263 and MPEG-4 video are presented. Other format transcoding techniques not covered in this section can be found in [68]–[70].

V. TRANSCODING QUALITY OPTIMIZATION

As mentioned earlier, transcoding can be considered the second pass of a two-pass video encoding process, where the input video bit-stream is the result of the first pass. Many useful statistics, such as the quantization step sizes, coding modes, coded bits of each MB and frame, and motion vectors, can be easily obtained from the input video bit-stream to help the second-pass encoding. Therefore, it is possible for the transcoder to achieve better video quality than direct one-pass encoding from the original source. Although the video is encoded twice in transcoding, the degradation in the first encoding pass may be negligible compared to the degradation in the second encoding pass. In what follows, we discuss the technologies related to video quality optimization: requantization, rate control, and mode decision.

A. Requantization

Quantization is the only operation in current video coding standards that introduces quality loss. Video coding standards specify the representation levels of the quantization, not the decision levels. Not considering other constraints, the optimal quantizer simply maps an input value to its nearest representation level [71]. In [72], optimal requantization strategies for transcoding MPEG-2 intraframes (I-frames) in the probabilistic sense are proposed based on two principles: minimization of the MSE (MMSE) cost function and maximum a posteriori (MAP) estimation. Both methods require knowledge of the original quantization method and the original DCT coefficient distribution, which may be carried in the input video stream as user data or may be estimated from the input video with minor additional complexity. Methods for estimating the model parameters for this purpose are discussed in [73]. The MMSE and MAP requantization strategies are shown to achieve improved performance compared to those designed for a video encoder. The requantization distortion is especially significant for certain ratios between the output and input quantization scales [72], [74]. This leads to the selective requantization scheme [74], which simply avoids those critical input/output quantization-scale ratios in the decision of quantization scales. In [75], the quantization is optimized jointly with deblocking.

B. Rate Control

Rate control determines the quantization parameters and is responsible for maintaining consistent video quality while satisfying bandwidth, delay, and memory constraints. Generally, all rate-control algorithms designed for video coding are applicable to transcoding, for instance, MPEG-2 TM5 [76]. In practice, however, functional limitations need to be considered. For example, a rate-control algorithm usually needs to know the GOP configuration. However, in transcoding, the output GOP configuration is often determined by the input one, since transcoding typically does not change the frame coding types in order to keep the complexity low. In real-time transcoding, the input GOP configuration is usually unknown to the transcoder. For these applications, in order to use GOP-based rate-control algorithms, the input GOP configuration needs to be estimated. One solution is to predict the current GOP configuration based on the configuration of the immediately preceding GOP and adjust the rate-control algorithm accordingly when new GOP parameters are detected [77]. Another solution is to simply scale the bits of each frame according to the rate conversion ratio [78]. In [79], in addition to scaling the input frame bit number, relatively more bits are allocated to I-frames, since I-frames need a larger portion of the available bit budget to achieve a quality consistent with other frames [46], [59], [79], especially when the target bit rate is low.

For applications where a delay of one GOP time is allowed, such that the coding statistics of the current input GOP can be collected before transcoding, improved rate control can be achieved by making use of these coding statistics [46]. A rate-control algorithm allocates bits to frames in proportion to their complexities.

XIN et al.: DIGITAL VIDEO TRANSCODING 91


Fig. 13. X
Performance of the bit-allocation using output or input complexity ( ).

their complexities. In video coding, however, the current and both MPEG-2, the bit rate is reduced from 10 to 4 Mb/s. The
future frame complexities are usually unknown prior to en- above techniques can be adapted to the joint transcoding of
coding. Rate-control algorithms designed for encoding, such multiple preencoded video streams (statistical multiplexing)
as MPEG-2 TM5, based on the stationary assumption, es- [4], [81], [82].
timate the complexity of the current frame using the com- MB-layer rate control adjusts the quantization parameters
plexity of the previous frame of the same type. It is well based on the encoder buffer feedback and is particularly de-
known that this estimation is poor when the stationary as- sirable for low-delay transcoding. Research works on this
sumption fails. In transcoding, intuitively it is possible to topic can be found in [59], [79], and [83]. In [7], a joint
compute the frame complexities from the input bit-stream rate-control scheme taking into account the various spatio-
(since the quantization step sizes and the number of bits in- temporal tradeoffs among all objects in a scene for MPEG-4
formation are available), and then use these complexities for object-based transcoding is proposed. Dynamic rate-control
the bit allocation in transcoding. The approach of scaling algorithms tailored for multipoint video conferencing are dis-
down the input frame bits is one special case of such algo- cussed in [84]–[86].
rithms where the number of bits is taken as the complexity
measure. It is found in [46] and [80] that complexity mea- C. Mode Decision
sures depend on the coding bit rate. Therefore, the com-
plexity measures calculated from the input video bit-stream There are various levels of mode decisions, including
at the input bit rate may not be suitable to directly serve as the MB-level, frame-level, and object-level. Rate-distortion
complexity measure for coding the frames at the output bit optimized mode decision techniques are explained in details
rate. Instead, the correlations between the complexity mea- in [87]. These techniques are also applicable to transcoding.
sures of the input and output videos are utilized to provide However, suboptimal but simple mode decision strategies
a more accurate estimation of the output frame complexity, are often desirable in complexity-constrained transcoding.
which leads to improved bit-allocation and video quality as In bit-rate transcoding, typically the modes of the input
shown in Fig. 13, where the transcoder input and output are video are reused by the transcoder [28].
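The complexity-driven bit allocation discussed above can be sketched in a few lines. The sketch below is illustrative only: it assumes per-frame bit counts and quantization step sizes have already been parsed from the input bit-stream, and it uses the simple TM5-style complexity measure (bits times average quantization step) with the budget derived from the rate-conversion ratio; it is not the exact algorithm of [46] or [80].

```python
def allocate_bits(frame_bits, frame_qsteps, rate_ratio):
    """Allocate output bits per frame in proportion to complexities
    estimated from the input bit-stream (illustrative sketch).

    frame_bits   -- bits spent on each frame in the input stream
    frame_qsteps -- average quantization step of each frame
    rate_ratio   -- target bit rate / input bit rate (e.g., 4/10)
    """
    # TM5-style complexity: bits * quantization step. A frame that
    # needed many bits even at a coarse step is "hard" to code.
    complexities = [b * q for b, q in zip(frame_bits, frame_qsteps)]
    budget = rate_ratio * sum(frame_bits)  # total output bit budget
    total_c = sum(complexities)
    # Distribute the budget proportionally to the complexities.
    return [budget * c / total_c for c in complexities]

# Hypothetical three-frame example (I, P, P) at a 10-to-4 Mb/s conversion.
targets = allocate_bits([400_000, 150_000, 120_000], [8, 12, 12], 0.4)
```

Refining these input-derived complexities with input/output correlation models, as done in [46] and [80], is what closes the gap shown in Fig. 13.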

92 PROCEEDINGS OF THE IEEE, VOL. 93, NO. 1, JANUARY 2005


Fig. 14. Logo insertion in: (a) the pixel domain and (b) the DCT domain.
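The pixel-domain path of Fig. 14(a) amounts to a per-frame weighted blend of the decoded picture with the logo signal. A minimal sketch follows; the function name and the use of flat 8-bit luminance sample lists are illustrative assumptions (a real logo normally covers only a small picture region), not the implementation of the cited works.

```python
def insert_logo(picture, logo, alpha, beta):
    """Blend a logo into a decoded picture: out = alpha*logo + beta*picture.

    picture, logo -- equally sized lists of 8-bit luminance samples
    alpha, beta   -- per-frame scaling factors controlling logo visibility
    """
    blended = []
    for x, l in zip(picture, logo):
        v = int(round(alpha * l + beta * x))
        blended.append(max(0, min(255, v)))  # clip to the 8-bit range
    return blended

# Example with the weights used in Fig. 15: alpha = 0.2, beta = 0.8.
out = insert_logo([100, 200, 50], [255, 255, 0], 0.2, 0.8)
# -> [131, 211, 40]
```

The DCT-domain architectures of Fig. 14(b) realize the same linear combination directly on DCT coefficients, which is possible because the blend is linear in the pixel values.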

In spatial transcoding, the MB-type (inter/intra) decision usually follows the majority of the input MB-types [31], [46]. Various heuristic strategies for making this decision in the open-loop architectures are discussed in [54]. MB prediction-mode decision techniques, including the frame/field prediction for interlaced transcoding, are discussed in [46]. A good overview of MB-level mode decision techniques can be found in [22].

Dynamic frame skipping based on the accumulated magnitude of motion vectors is proposed in [50] and [86]. In [7], strategies are proposed to drop less relevant objects if the scene has been coded as a set of objects. In [6], it is demonstrated that the transcoding hints of MPEG-7 are valuable in improving mode decisions at various levels.

VI. INFORMATION INSERTION TRANSCODING

In general, any operation that changes the content of a compressed video stream may be regarded as transcoding. In this section, we discuss two information insertion examples.

A. Logo/Watermark Insertion

For copyright protection, video watermarks and company logos can be inserted into the compressed video stream [88]. In the pixel-domain transcoders, the logo insertion can be implemented as illustrated in Fig. 14(a):

x̃n = αn·ln + βn·xn,

where ln, xn, and x̃n are the pixel values of the logo signal, the decoded picture, and the logo-inserted picture for frame n, respectively. αn and βn are the scaling factors for frame n, controlling the intensity of the logo in order to provide uniform visibility [89], [90]. Efficient architectures performing this operation in the compressed domain, as illustrated in Fig. 14(b), are proposed in [88] and [90]. These architectures realize the same function as their pixel-domain counterpart.

Fig. 15 shows a logo and a sample picture after logo insertion in the DCT domain [90]. The logo is inserted into an MPEG-2 encoded bit-stream whose bit rate is reduced from 8 to 4 Mb/s. The approach of [89] inserts the information in a simple, open-loop manner, where only the area affected by the inserted information needs to be modified. However, it is subject to drift. The insertion of new information may also affect the optimality of the existing coding parameters of the affected picture area, including the motion vectors and coding modes, as discussed in [91], where techniques to modify these coding parameters are presented.

B. Error-Resilience Transcoding

In practical applications where video contents are compressed and stored for future delivery, the encoding process is typically performed without enough prior knowledge about the channel characteristics of the network hops between the encoder and the decoder. In addition, the heterogeneity



of client networks also makes it difficult for the encoder to adapt the video contents to a wide range of different channel conditions, especially for wireless client terminals. To overcome these problems, a video transcoder can be placed in a network node (e.g., a mobile switch/base station, proxy server, or video gateway) connected to a high-loss network (e.g., a wireless network or a highly congested network) to insert error-resilience features into the video bit-stream to achieve robust video transmission over wireless channels.

Fig. 15. DCT-domain logo insertion (α = 0.2; β = 0.8). (a) Logo. (b) Logo-inserted sample picture.

Fig. 16. System framework of an error-resilience video transcoder.

Fig. 16 shows a typical example of an error-resilience transcoder with feedback [92]–[94]. The transcoder first extracts the video features (e.g., locations of video data which are likely to result in more serious error propagation if lost) from the incoming bit-stream, and also estimates the client channel conditions according to the feedback channel statistics. The extracted features and the estimated channel conditions are then used to determine the error-resilience policy that guides the joint allocation of source/channel coding resources. The features of the video contents can also be precomputed in the front-end encoding process and sent to the transcoder as auxiliary data to assist the transcoding.

Commonly used error-resilience source coding tools [95] include data partitioning, synchronization markers, reversible variable length codes (RVLC), error-resilience entropy coding (EREC), multiple-description coding (MDC), reference frame selection (RFS), adaptive intra refresh (AIR), and so on. Forward error correction (FEC) and automatic retransmission request (ARQ) are two major schemes for channel protection.

An error-resilience MPEG-2 transcoding scheme based on EREC is proposed in [96]. In this method, the incoming bit-stream is reordered without adding redundancy, such that longer VLC blocks fill up the spaces left by shorter blocks within the set of VLC blocks that forms a fixed-length EREC frame. Such fixed-length EREC frames of VLC codes are then used as synchronization units, so that only one EREC frame, rather than all the codes between two synchronization markers, is dropped should any VLC code in the EREC frame be corrupted by transmission errors. In [92], the authors propose a rate-distortion framework with analytical models that characterize the error propagation of a corrupted video bit-stream subjected to bit errors. These models are then used to guide the use of spatial and temporal localization tools (synchronization markers and intra refresh, respectively) to compute the optimal bit allocation among spatial error resilience, temporal error resilience, and the source rate. The work in [93] proposes an error-resilience transcoder for general packet radio service (GPRS) mobile-access networks, with the transcoding process performed at a video proxy that can be located at the edge of two or more networks. Two error-resilience tools, the AIR and RFS methods with feedback control signaling (FCS), are used adaptively to reduce error effects while preserving the transmission-rate adaptation feature of the video transcoders. In [97], a rate-distortion optimized GOP-based bit-allocation scheme is proposed based on models accounting for the interframe dependence in both video source requantization and error propagation of motion-compensated video. In [94], the authors propose a multiple-description FEC (MD-FEC)-based transcoding scheme, which uses the (n, k) Reed–Solomon erasure-correction block code to protect the kth layer of an n-layer scalable video. The multiple-description packetization method is specially designed to allow the kth layer to be decodable when k or more descriptions arrive at the decoder. The scheme in [98] proposes to implement an ARQ proxy at the base station of a wireless communication system to handle ARQ requests and track errors, reducing retransmission delays as well as enhancing the error resilience. The ARQ proxy resends important lost packets (e.g., packets with header information and motion vectors) detected through the retransmission requests from wireless client terminals, while dropping less important packets (e.g., packets carrying DCT coefficients) to meet the bandwidth limit. A transcoder is used to compensate for the mismatch between the front-end video encoder and the client decoders caused by the dropped packets.

VII. CONCLUSION

In this paper, we provide an overview of issues related to transcoding, including transcoding applications, transcoder architectures, techniques for reducing the computation, and techniques for improving the video quality. Transcoding is an active research topic. The challenge is how to use the



information from the input video bit-stream to reduce the complexity of a transcoder and improve the quality of the output video. Video transcoding is also related to the research activities on compressed-domain video processing, since the incoming video stream to a transcoder is in the compressed domain. As new video coding standards continue to be developed, the need for transcoding will continue to exist. For example, MPEG-2 video is currently widely used in digital TV, DVD, and HDTV applications. However, newer coding standards such as H.264/MPEG-4 AVC have been developed which perform much better than MPEG-2. A transcoder will be useful for solving format incompatibility problems for universal multimedia access.

ACKNOWLEDGMENT

The authors would like to thank Dr. A. Vetro and Prof. S.-F. Chang for their valuable suggestions to improve this manuscript.

REFERENCES

[1] R. Mohan, J. R. Smith, and C.-S. Li, "Adapting multimedia Internet content for universal access," IEEE Trans. Multimedia, vol. 1, no. 1, pp. 104–114, Mar. 1999.
[2] S.-F. Chang and A. Vetro, "Video adaptation: Concepts, technologies, and open issues," Proc. IEEE, vol. 93, no. 1, pp. 148–158, Jan. 2005.
[3] Y. Wu, A. Vetro, H. Sun, and S. Y. Kung, "Intelligent multi-hop video communications," presented at the IEEE Pacific-Rim Conf. Multimedia, Beijing, China, 2001.
[4] J. Xin, M.-T. Sun, and K.-S. Kan, "Bit allocation for joint transcoding of multiple MPEG coded video streams," in Proc. IEEE Int. Conf. Multimedia and Expo, 2001, pp. 8–11.
[5] "Information technology—Multimedia content description interface—Part 5: Multimedia description schemes (MPEG-7 MDS)," Int. Standards Org./Int. Electrotech. Comm. (ISO/IEC), ISO/IEC CD 15 938-5, 1st ed., 2001.
[6] P. M. Kuhn, T. Suzuki, and A. Vetro, "MPEG-7 transcoding hints for reduced complexity and improved quality," presented at the Int. Packet Video Workshop 2001, Kyongju, Korea.
[7] A. Vetro, H. Sun, and Y. Wang, "Object-based transcoding for adaptable video content delivery," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 3, pp. 387–401, Mar. 2001.
[8] S. Wee, "Reversing motion vector fields," in Proc. IEEE Int. Conf. Image Processing, vol. 2, 1998, pp. 209–212.
[9] S. Wee and B. Vasudev, "Compressed-domain reverse play of MPEG video streams," in Proc. SPIE Int. Symp. Voice, Video, and Data Communications, 1998, pp. 237–248.
[10] Y.-P. Tan, Y.-Q. Liang, and J. Yu, "Video transcoding for fast forward/reverse video playback," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 2002, pp. 713–716.
[11] S.-F. Chang, "Optimal video adaptation and skimming using a utility-based framework," presented at the Tyrrhenian Int. Workshop Digital Communications, Capri Island, Italy, 2002.
[12] J.-G. Kim, Y. Wang, and S.-F. Chang, "Content-based utility function prediction for real-time MPEG-4 video transcoding," in Proc. IEEE Int. Conf. Image Processing, 2003, pp. 189–192.
[13] P. Yin, A. Vetro, and B. Liu, "Rate-distortion models for video transcoding," in Proc. SPIE Conf. Image Video Communications Processing, 2003, pp. 479–488.
[14] M. Ghanbari, "Two-layer coding of video signals for VBR networks," IEEE J. Sel. Areas Commun., vol. 7, no. 5, pp. 771–781, Jun. 1989.
[15] W. Li, "Overview of fine granularity scalability in MPEG-4 video standard," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 3, pp. 301–317, Mar. 2001.
[16] "Applications and requirements for scalable video coding," Int. Standards Org./Int. Electrotech. Comm. (ISO/IEC), ISO/IEC/JTC1/SC29/WG11/N5540, Mar. 2003.
[17] "Information technology—Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s—Part 2: Video (MPEG-1 video)," Int. Standards Org./Int. Electrotech. Comm. (ISO/IEC), ISO/IEC 11 172-2, 1st ed., 1993.
[18] "Information technology—Generic coding of moving pictures and associated audio information: Video (MPEG-2 video)," Int. Standards Org./Int. Electrotech. Comm. (ISO/IEC), ISO/IEC 13 818-2, 2nd ed., 2000.
[19] "Coding of audio-visual objects—Part 2: Visual (MPEG-4 video)," Int. Standards Org./Int. Electrotech. Comm. (ISO/IEC), ISO/IEC 14 496-2:2001, 2nd ed., 2001.
[20] "Video coding for low bit rate communication," Int. Telecommun. Union-Telecommun. (ITU-T), Geneva, Switzerland, Recommendation H.263, 1998.
[21] "Draft text of final draft international standard for advanced video coding," Int. Telecommun. Union-Telecommun. (ITU-T), Geneva, Switzerland, Recommendation H.264 (draft), Mar. 2003.
[22] A. Vetro, C. Christopulos, and H. Sun, "Video transcoding architectures and techniques: An overview," IEEE Signal Process. Mag., vol. 20, no. 2, pp. 18–29, Mar. 2003.
[23] A. Eleftheriadis and D. Anastassiou, "Constrained and general dynamic rate shaping of compressed digital video," in Proc. IEEE Int. Conf. Image Processing, vol. 3, 1995, pp. 396–399.
[24] H. Sun, W. Kwok, and J. W. Zdepski, "Architectures for MPEG compressed bitstream scaling," IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 2, pp. 191–199, Apr. 1996.
[25] Y. Nakajima, H. Hori, and T. Kanoh, "Rate conversion of MPEG coded video by re-quantization process," in Proc. IEEE Int. Conf. Image Processing, vol. 3, 1995, pp. 408–411.
[26] J. Youn, M.-T. Sun, and J. Xin, "Video transcoder architectures for bit rate scaling of H.263 bit streams," in Proc. ACM Multimedia, Nov. 1999, pp. 243–250.
[27] D. G. Morrison, M. E. Nilson, and M. Ghanbari, "Reduction of the bit-rate of compressed video while in its coded form," in Proc. 6th Int. Workshop Packet Video, 1994, pp. D17.1–D17.4.
[28] G. Keesman, R. Hellinghuizen, F. Hoeksema, and G. Heideman, "Transcoding of MPEG bitstreams," Signal Process. Image Commun., vol. 8, no. 6, pp. 481–500, Sep. 1996.
[29] P. A. A. Assuncao and M. Ghanbari, "A frequency-domain video transcoder for dynamic bitrate reduction of MPEG-2 bit streams," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 8, pp. 953–967, Dec. 1998.
[30] J. Youn and M.-T. Sun, "Video transcoding with H.263 bit streams," J. Visual Commun. Image Represent., vol. 11, pp. 385–404, Dec. 2000.
[31] T. Shanableh and M. Ghanbari, "Heterogeneous video transcoding to lower spatial-temporal resolutions and different encoding formats," IEEE Trans. Multimedia, vol. 2, no. 2, pp. 101–110, Jun. 2000.
[32] S.-F. Chang and D. G. Messerschmitt, "Manipulation and compositing of MC-DCT compressed video," IEEE J. Sel. Areas Commun., vol. 13, no. 1, pp. 1–11, Jan. 1995.
[33] N. Merhav and V. Bhaskaran, "Fast algorithms for DCT-domain image downsampling and for inverse motion compensation," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 3, pp. 468–476, Jun. 1997.
[34] J. Song and B.-L. Yeo, "A fast algorithm for DCT-domain inverse motion compensation based on shared information in a macroblock," IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 5, pp. 767–775, Aug. 2000.
[35] C.-W. Lin and Y.-R. Lee, "Fast algorithms for DCT-domain video transcoding," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 2001, pp. 421–424.
[36] S. Acharya and B. Smith, "Compressed domain transcoding of MPEG," in Proc. IEEE Int. Conf. Multimedia Computing and Systems, 1998, pp. 295–304.
[37] S. Liu and A. C. Bovik, "Local bandwidth constrained fast inverse motion compensation for DCT-domain video transcoding," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 5, pp. 309–319, May 2002.
[38] W. Zhu, K. Yang, and M. Beacken, "CIF-to-QCIF video bitstream down-conversion in the DCT domain," Bell Labs. Tech. J., vol. 3, no. 3, pp. 21–29, Jul.–Sep. 1998.
[39] R. Dugad and N. Ahuja, "A fast scheme for image size change in the compressed domain," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 4, pp. 461–474, Apr. 2001.
[40] Y.-R. Lee, C.-W. Lin, and Y.-W. Chen, "Computation reduction in cascaded DCT-domain video downscaling transcoding," in Proc. IEEE Int. Symp. Circuits and Systems, 2003, pp. 860–863.
[41] N. Bjork and C. Christopoulos, "Transcoder architecture for video coding," IEEE Trans. Consum. Electron., vol. 44, no. 1, pp. 88–98, Feb. 1998.



[42] S. Wee, J. G. Apostolopoulos, and N. Feamster, "Field-to-frame transcoding with spatial and temporal downsampling," in Proc. IEEE Int. Conf. Image Processing, vol. 4, 1999, pp. 271–275.
[43] B. Shen, I. K. Ishwar, and V. Bhaskaran, "Adaptive motion-vector re-sampling for compressed video downscaling," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 6, pp. 929–936, Sep. 1999.
[44] S.-H. Jang and N. S. Jayant, "An adaptive nonlinear motion vector resampling algorithm for down-scaling video transcoding," in Proc. IEEE Conf. Multimedia and Expo, vol. 2, 2003, pp. 229–232.
[45] G. Shen, B. Zeng, Y.-Q. Zhang, and M.-L. Liou, "Transcoder with arbitrarily resizing capability," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 5, 2001, pp. 25–28.
[46] J. Xin, M.-T. Sun, B. S. Choi, and K. W. Chun, "An HDTV to SDTV spatial transcoder," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 11, pp. 998–1008, Nov. 2002.
[47] J. Xin, M.-T. Sun, and T.-D. Wu, "Motion vector composition for MPEG-2 to MPEG-4 video transcoding," in Proc. Workshop and Exhibition MPEG-4, 2002, pp. 9–12.
[48] J. Youn, M.-T. Sun, and C.-W. Lin, "Motion vector refinement for high-performance transcoding," IEEE Trans. Multimedia, vol. 1, no. 1, pp. 30–40, Mar. 1999.
[49] M.-J. Chen, M.-C. Chu, and C.-W. Pan, "Efficient motion estimation algorithm for reduced frame-rate video transcoder," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 4, pp. 269–275, Apr. 2002.
[50] J.-N. Hwang, T.-D. Wu, and C.-W. Lin, "Dynamic frame-skipping in video transcoding," in Proc. IEEE Workshop Multimedia Signal Processing, 1998, pp. 616–621.
[51] K.-D. Seo and J.-K. Kim, "Motion vector refinement for video downsampling in the DCT domain," IEEE Signal Process. Lett., vol. 9, no. 11, pp. 356–359, Nov. 2002.
[52] P. Yin, M. Wu, and B. Liu, "Video transcoding by reducing spatial resolution," in Proc. IEEE Int. Conf. Image Processing, vol. 1, Sep. 2000, pp. 972–975.
[53] P. Yin, A. Vetro, H. Sun, and B. Liu, "Drift compensation architectures and techniques for reduced resolution transcoding," Proc. SPIE, Visual Commun. Image Process., vol. 4671, pp. 180–191, Jan. 2002.
[54] P. Yin, A. Vetro, B. Liu, and H. Sun, "Drift compensation for reduced spatial resolution transcoding," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 11, pp. 1009–1020, Nov. 2002.
[55] A. Vetro, T. Hata, N. Kuwahara, H. Kalva, and S. Sekiguchi, "Complexity-quality analysis of transcoding architectures for reduced spatial resolution," IEEE Trans. Consum. Electron., vol. 48, no. 3, pp. 515–521, Aug. 2002.
[56] K.-T. Fung, Y.-L. Chan, and W.-C. Siu, "New architecture for dynamic frame-skipping transcoder," IEEE Trans. Image Process., vol. 11, no. 8, pp. 886–900, Aug. 2002.
[57] T. Shanableh and M. Ghanbari, "Hybrid DCT/pixel domain architecture for heterogeneous video transcoding," Signal Process. Image Commun., vol. 18, no. 8, pp. 601–620, Sep. 2003.
[58] J. Xin, M.-T. Sun, and K. Chun, "Motion re-estimation for MPEG-2 to MPEG-4 simple profile transcoding," presented at the Int. Packet Video Workshop, Pittsburgh, PA, 2002.
[59] J. Xin, "Improved standard-conforming video transcoding techniques," Ph.D. dissertation, Univ. Washington, Seattle, 2002.
[60] H. Jozawa, K. Kamikura, A. Sagata, H. Kotera, and H. Watanabe, "Two-stage motion compensation using adaptive global MC and local affine MC," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 75–85, Feb. 1997.
[61] "MPEG-4 video verification model version 18.0," Int. Standards Org./Int. Electrotech. Comm. (ISO/IEC), ISO/IEC JTC1/SC29/WG11, Jan. 2001.
[62] F. Dufaux and J. Konrad, "Efficient, robust and fast global motion estimation for video coding," IEEE Trans. Image Process., vol. 9, no. 3, pp. 497–501, Mar. 2000.
[63] Y. Su, M.-T. Sun, and V. Hsu, "Global motion estimation from coarsely sampled motion vector field and the applications," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 2, 2003, pp. 628–631.
[64] Y.-C. Lin, C.-N. Wang, T. Chiang, A. Vetro, and H. Sun, "Efficient FGS-to-single layer transcoding," in Proc. IEEE Int. Conf. Consumer Electronics, 2002, pp. 134–135.
[65] Y.-P. Tan and Y.-Q. Liang, "Methods and need for transcoding MPEG-4 fine granularity scalability video," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 4, 2002, pp. 719–722.
[66] E. Barrau, "MPEG video transcoding to a fine-granular scalable format," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 2002, pp. 717–720.
[67] S. Dogan, A. H. Sadka, and A. M. Kondoz, "Efficient MPEG-4/H.263 video transcoder for interoperability between heterogeneous multimedia networks," Electron. Lett., vol. 35, no. 11, pp. 863–864, May 1999.
[68] H. Kato, H. Yanagihara, Y. Nakajima, and Y. Hatori, "A fast motion estimation algorithm for DV to MPEG-2 conversion," in Proc. IEEE Int. Conf. Consumer Electronics, 2002, pp. 140–141.
[69] N. Feamster and S. Wee, "An MPEG-2 to H.263 transcoder," presented at the SPIE Int. Symp. Voice, Video, and Data Communication Conf., Boston, MA, 1999.
[70] J.-L. Wu, S.-J. Huang, Y.-M. Huang, C.-T. Hsu, and J. Shiu, "An efficient JPEG to MPEG-1 transcoding algorithm," IEEE Trans. Consum. Electron., vol. 42, no. 3, pp. 447–457, Aug. 1996.
[71] A. Gersho and R. Gray, Vector Quantization and Signal Compression. Norwell, MA: Kluwer, 1991.
[72] O. Werner, "Requantization for transcoding of MPEG-2 intraframes," IEEE Trans. Image Process., vol. 8, no. 2, pp. 179–191, Feb. 1999.
[73] Z. Guo, O. C. Au, and K. B. Letaief, "Parameter estimation for image/video transcoding," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 2, 2000, pp. 269–272.
[74] H. Sorial, W. Lynch, and A. Vincent, "Selective requantization for transcoding of MPEG compressed video," in Proc. IEEE Int. Conf. Multimedia and Expo, 2000, pp. 217–220.
[75] B. Shen, "Efficient deblocking and optimal quantizer selection for video transcoding," in Proc. IEEE Int. Conf. Image Processing, 2003, pp. 193–196.
[76] "Test Model 5," Int. Standards Org./Int. Electrotech. Comm. (ISO/IEC), ISO/IEC JTC1/SC29/WG11, N0400, Apr. 1993.
[77] L. Wang, A. Luthra, and B. Eifrig, "Rate control for MPEG transcoding," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 2, pp. 222–234, Feb. 2001.
[78] P. N. Tudor and O. H. Werner, "Real-time transcoding of MPEG-2 video bit streams," in Proc. IEE Int. Broadcasting Conv., 1997, pp. 286–301.
[79] P. Assunção and M. Ghanbari, "Buffer analysis and control in CBR video transcoding," IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 1, pp. 83–92, Feb. 2000.
[80] J. Xin, M.-T. Sun, and K. W. Chun, "Bit allocation for transcoding of pre-encoded video streams," Proc. SPIE: Visual Commun. Image Process., vol. 4671, pp. 164–171, Jan. 2002.
[81] I. Koo, P. Nasiopoulos, and R. Ward, "Joint MPEG-2 coding for multi-program broadcasting of pre-recorded video," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 4, 1999, pp. 2227–2230.
[82] H. Sorial, W. E. Lynch, and A. Vincent, "Joint transcoding of multiple MPEG video bitstreams," in Proc. IEEE Int. Symp. Circuits and Systems, 1999, pp. 251–254.
[83] H. Kasai, T. Hanamura, W. Kamayama, and H. Tominaga, "Rate control scheme for low-delay MPEG-2 video transcoder," in Proc. IEEE Int. Conf. Image Processing, 2000, pp. 964–967.
[84] M.-T. Sun, T.-D. Wu, and J.-N. Hwang, "Dynamic bit allocation in video combining for multipoint video conferencing," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 45, no. 5, pp. 644–648, May 1998.
[85] T.-D. Wu and J.-N. Hwang, "Dynamic bit rate conversion in multipoint video transcoding," in Proc. IEEE Int. Conf. Image Processing, 1999, pp. 817–821.
[86] C.-W. Lin, Y.-C. Chen, and M.-T. Sun, "Dynamic region of interest transcoding for multipoint video conferencing," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 10, pp. 982–992, Oct. 2003.
[87] G. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74–90, Nov. 1998.
[88] F. Hartung and B. Girod, "Watermarking of uncompressed and compressed video," Signal Process., vol. 66, no. 3, pp. 283–301, May 1998.
[89] J. Meng and S.-F. Chang, "Embedding visible video watermarks in the compressed domain," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 1998, pp. 474–477.
[90] J. Youn, J. Xin, and M.-T. Sun, "Fast video transcoding architectures for networked multimedia applications," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 4, 2000, pp. 25–28.
[91] K. Panusopone, X. Chen, and F. Ling, "Logo insertion in MPEG transcoder," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, 2001, pp. 981–984.
[92] G. de los Reyes, A. R. Reibman, S.-F. Chang, and J. C.-I. Chuang, "Error-resilient transcoding for video over wireless channels," IEEE J. Sel. Areas Commun., vol. 18, no. 6, pp. 1063–1074, Jun. 2000.



[93] S. Dogan, A. Cellatoglu, M. Uyguroglu, A. H. Sadka, and A. M. Kondoz, "Error-resilient video transcoding for robust internetwork communications using GPRS," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 453–464, Jun. 2002.
[94] R. Puri, K.-W. Lee, K. Ramchandran, and V. Bhargavan, "An integrated source transcoding and congestion control paradigm for video streaming in the internet," IEEE Trans. Multimedia, vol. 3, no. 1, pp. 18–32, Mar. 2001.
[95] Y. Wang and Q.-F. Zhu, "Error control and concealment for video communication: A review," Proc. IEEE, vol. 86, no. 5, pp. 974–997, May 1998.
[96] R. Swann and N. Kingsbury, "Transcoding of MPEG-II for enhanced resilience to transmission errors," in Proc. IEEE Int. Conf. Image Processing, vol. 2, 1996, pp. 813–816.
[97] M. Xia, A. Vetro, B. Liu, and H. Sun, "Rate-distortion optimized bit allocation for error resilient video transcoding," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 3, 2004, pp. III-945–III-948.
[98] T.-C. Wang, H.-C. Fang, and L.-G. Chen, "Low delay and error robust wireless video transmission for video communication," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 12, pp. 1049–1058, Dec. 2002.

Jun Xin (Member, IEEE) received the B.S. degree in electrical engineering from Southeast University, Nanjing, China, in 1993, the M.S. degree in electrical engineering from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 1996, and the Ph.D. degree in electrical engineering from the University of Washington, Seattle, WA, in 2002.
From 1996 to 1998, he was a Software Engineer with Motorola-ICT Joint R&D Laboratory, Beijing. From January 2003 to August 2003, he was a Senior Software Engineer with Broadware Technologies, Cupertino, CA. Since 2003, he has been with Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, as a Member of Technical Staff. His research interests include digital video processing and multimedia communication.

Chia-Wen Lin (Senior Member, IEEE) received the M.S. and Ph.D. degrees in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, R.O.C., in 1992 and 2000, respectively.
He was a Section Manager of the Customer Premise Equipment (CPE) and Access Technologies Department, Computer and Communications Research Laboratories (CCL), Industrial Technology Research Institute (ITRI), Taiwan, R.O.C. From April 2000 to August 2000, he was a Visiting Research Scholar with the Information Processing Laboratory, Department of Electrical Engineering, University of Washington, Seattle. In August 2000, he joined the Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan, R.O.C., where he is currently an Assistant Professor. From July 2002 to August 2002, he was also a Visiting Professor with Microsoft Research Asia, Beijing, China. He has authored or coauthored over 50 technical papers. He holds ten patents with more pending. His research interests include video coding and networked multimedia technologies.
Dr. Lin was the recipient of the 2000 Research Achievement Award presented by ITRI. He was also the recipient of the 2000 and 2001 Best Ph.D. Thesis Awards presented by the Acer Foundation and the Ministry of Education, Taiwan, R.O.C., respectively.

Ming-Ting Sun (Fellow, IEEE) received the B.S. degree from National Taiwan University, Taipei, in 1976, the M.S. degree from the University of Texas, Arlington, in 1981, and the Ph.D. degree from the University of California, Los Angeles, in 1985, all in electrical engineering.
He was the Director of the Video Signal Processing Research Group at Bellcore (now Telcordia). He joined the University of Washington, Seattle, in August 1996, where he is now a Professor. He holds nine patents and has published over 140 technical papers, including ten book chapters in the area of video technology.
Prof. Sun received an Award of Excellence from Bellcore for his work on the digital subscriber line in 1987. He received the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (TCSVT) Best Paper Award in 1993 and an IEEE Circuits and Systems Society Golden Jubilee Medal in 2000. He was a Distinguished Lecturer of the Circuits and Systems Society from 2000 to 2001. He was the General Cochair of the Visual Communications and Image Processing 2000 Conference. From 1988 to 1991, he was the Chairman of the IEEE Circuits and Systems Standards Committee and established the IEEE Inverse Discrete Cosine Transform Standard. He was the Editor-in-Chief of TCSVT from 1995 to 1997 and was the Editor-in-Chief of the IEEE TRANSACTIONS ON MULTIMEDIA (TMM) from 1999 to 2001.

