DIVP unit-5 unit-4
Analog Video:
Today most video recording, storage, and transmission are still in analog
form.
For example, images that we see on TV are recorded in the form of
analog electrical signals, transmitted over the air by means of analog amplitude
modulation, and stored on magnetic tape by video cassette recorders as
analog signals.
Motion pictures are recorded on photographic film, which is a high-
resolution analog medium, or on laser discs as analog signals using optical
technology.
The analog video signal refers to a one-dimensional (1-D) electrical
signal f(t) of time that is obtained by sampling f(x1, x2, t) in the vertical (x2) and
temporal coordinates. This periodic sampling process is called scanning. The
signal f(t), then, captures the time-varying image intensity f(x1, x2, t) only
along the scan lines, such as those shown in Figure 1.1. It also contains the
timing information and the blanking signals needed to align the pictures
correctly.
The most commonly used scanning methods are
1) progressive scanning:
A progressive scan traces a complete picture, called a frame, every Δt
sec. The computer industry uses progressive scanning with Δt = 1/72 sec
for high-resolution monitors.
2) Interlaced scanning:
The TV industry uses 2:1 interlaced scanning, where the odd-numbered
and even-numbered lines, called the odd field and the even field, respectively,
are traced in turn. In Figure 1.1, the solid line and the dotted line represent the
odd and the even fields, respectively. The spot snaps back from point B to C,
called the horizontal retrace, and from D to E, and from F to A, called the
vertical retrace.
An analog video signal f(t) is shown in Figure 1.2. Blanking pulses
(black) are inserted during the retrace intervals to blank out retrace lines
on the receiving CRT. Sync pulses are added on top of the blanking
pulses to synchronize the receiver’s horizontal and vertical sweep
circuits. The sync pulses ensure that the picture starts at the top left corner
of the receiving CRT. The timing of the sync pulses are, of course,
different for progressive and interlaced video.
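As a quick sanity check on these scan parameters, the following sketch works out the line and field timing of a 525-line, 2:1 interlaced system (NTSC-like numbers, assumed here only for illustration):

```python
# Scan-timing arithmetic for a 525-line, 2:1 interlaced system.
LINES_PER_FRAME = 525
FRAME_RATE = 30.0                 # frames/sec (nominal; NTSC uses 29.97)

field_rate = 2 * FRAME_RATE                      # two fields per frame
lines_per_field = LINES_PER_FRAME / 2            # 262.5 lines per field
line_period_us = 1e6 / (LINES_PER_FRAME * FRAME_RATE)

print(f"field rate:      {field_rate:.0f} fields/sec")
print(f"lines per field: {lines_per_field}")
print(f"line period:     {line_period_us:.2f} microseconds")
```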
Digital Video:
The main advantages of digital representation and transmission are
robustness and the ease of providing a diverse range of services over the
same network.
Digital video on the desktop brings computers and communications
together in a truly revolutionary manner.
Almost all digital video systems use component representation of the
color signal. Most color video cameras provide RGB outputs which are
individually digitized. Component representation avoids the artifacts that result
from composite encoding, provided that the input RGB signal has not been
composite-encoded before.
In digital video, there is no need for blanking or sync pulses, since a
computer knows exactly where a new line starts as long as it knows the number
of pixels per line. Thus, all blanking and sync pulses are removed in the A/D
conversion.
Even if the input video is a composite analog signal, e.g., from a
videotape, it is usually first converted to component analog video, and the
component signals are then individually digitized.
It is also possible to digitize the composite signal directly using one A/D
converter with a clock rate high enough to leave the color subcarrier components
free from aliasing, and then perform digital decoding to obtain the desired RGB
or YIQ component signals. This requires sampling at a rate three or four times
the color subcarrier frequency, which can be accomplished by special-purpose
chip sets. Such chips do exist in some advanced TV sets for digital processing
of the received signal for enhanced image quality.
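For a concrete sense of the rates involved, the short sketch below evaluates three and four times the color subcarrier frequency; the NTSC subcarrier value is assumed for illustration:

```python
# Sampling rates for direct digitization of a composite signal.
F_SC = 3.579545e6                 # NTSC color subcarrier frequency, Hz

for multiple in (3, 4):
    fs = multiple * F_SC
    print(f"{multiple} x f_sc = {fs / 1e6:.3f} MHz")
# 3 x f_sc = 10.739 MHz, 4 x f_sc = 14.318 MHz
```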
The horizontal and vertical resolution of digital video is related to the
number of pixels per line and the number of lines per frame.
The artifacts in digital video due to lack of resolution are quite different
from those in analog video.
In analog video the lack of spatial resolution results in blurring of the
image in the respective direction.
In digital video, we have pixellation (aliasing) artifacts due to lack of
sufficient spatial resolution.
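A minimal NumPy sketch of how the blocky look arises (the image and the zoom factor are hypothetical): enlarging a low-resolution image by pixel replication makes each original sample visible as a square block:

```python
import numpy as np

def upsample_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    # Repeat each pixel 'factor' times along both axes (pixel replication).
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

low_res = np.array([[0, 128],
                    [128, 255]], dtype=np.uint8)   # 2x2 "image"
print(upsample_nearest(low_res, 4))                # 8x8 blocky result
```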
The arrangement of pixels and lines in a contiguous region of the memory
is called a bitmap.
There are five key parameters of a bitmap:
1) the starting address in memory,
2) the number of pixels per line,
3) the pitch value,
4) the number of lines, and
5) number of bits per pixel.
The pitch value specifies the distance in memory from the start of one
line to the next.
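These five parameters suffice to locate any pixel in memory, as the minimal addressing sketch below shows (the function name and example values are hypothetical). Note that the pitch may exceed the bytes actually occupied by one line's pixels when lines are padded for alignment, which is why it is listed separately:

```python
def pixel_address(start: int, pitch: int, bits_per_pixel: int,
                  x: int, y: int) -> int:
    """Byte address of pixel (x, y) in a bitmap.

    start          -- starting address of the bitmap in memory
    pitch          -- distance in bytes from one line start to the next
    bits_per_pixel -- pixel depth (assumed a multiple of 8 here)
    """
    return start + y * pitch + x * (bits_per_pixel // 8)

# Example: 640 pixels/line, 8 bits/pixel, pitch padded to 1024 bytes.
print(pixel_address(start=0, pitch=1024, bits_per_pixel=8, x=10, y=2))  # 2058
```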
After two vertical scans (the two fields of an interlaced frame), a
“composite frame” is formed in a contiguous region of the memory.
Each component signal is usually represented with 8 bits per pixel to
avoid “contouring artifacts.” Contouring appears as visible bands in slowly
varying regions of image intensity and results from insufficient bit resolution.
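A small NumPy sketch of the effect (the 4-bit depth is chosen only for illustration): requantizing a smooth gray ramp to too few levels leaves a staircase of visible bands:

```python
import numpy as np

ramp = np.linspace(0, 255, 256).astype(np.uint8)   # slowly varying region
bits = 4                                           # too few bits per pixel
step = 256 // (1 << bits)                          # quantization step = 16
contoured = (ramp // step) * step                  # staircase of bands
print(np.unique(contoured).size)                   # only 16 gray levels left
```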
Color mapping techniques exist to map 2^24 distinct colors to 256 colors
for display on 8-bit color monitors without noticeable loss of color resolution.
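As one possible illustration of such a mapping, Pillow's palette quantizer (median cut by default) reduces a 24-bit RGB image to a 256-color palette image; the file names are hypothetical:

```python
from PIL import Image

rgb = Image.open("frame.png").convert("RGB")   # up to 2^24 distinct colors
paletted = rgb.quantize(colors=256)            # adaptive 256-color palette
paletted.save("frame_8bit.png")                # 8-bit palette image
```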
The major factor in preventing the widespread use of digital video today
has been the huge storage and transmission bandwidth requirements. For
example, digital video requires much higher data rates and transmission
bandwidths as compared to digital audio.
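A back-of-the-envelope comparison makes the gap concrete (CCIR 601 525-line parameters assumed for the video, CD parameters for the audio):

```python
# Raw (uncompressed) data rates of digital video versus digital audio.
video_bps = 720 * 480 * 30 * 16    # 4:2:2 sampling, 16 bits/pixel average
audio_bps = 44_100 * 16 * 2        # CD audio: 44.1 kHz, 16-bit, stereo

print(f"video: {video_bps / 1e6:.0f} Mbps")    # ~166 Mbps
print(f"audio: {audio_bps / 1e6:.2f} Mbps")    # ~1.41 Mbps
```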
Exchange of digital video between different applications and products
requires digital video format standards. Video data needs to be exchanged in
compressed form, which leads to compression standards.
CCITT Group 3 and 4 codes were developed for fax image transmission,
and are presently used in all fax machines.
JBIG has been developed to fix some of the problems with the CCITT
Group 3 and 4 codes, mainly in the transmission of halftone images.
JPEG is a still-image (monochrome and color) compression standard, but
it also finds use in frame-by-frame video compression, mostly because of its
wide availability in VLSI hardware. CCITT Recommendation
H.261 is concerned with the compression of video for videoconferencing
applications over ISDN lines.
Typically, videoconferencing using the CIF format requires 384 kbps.
MPEG-1 targets 1.5 Mbps for storage of CIF-format digital video on CD-ROM
and hard disk. MPEG-2 was developed for the compression of higher-definition
video at 10-20 Mbps, with HDTV as one of the intended applications.
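The implied compression ratios are substantial, as a rough calculation for CIF videoconferencing shows (4:2:0 sampling and 30 frames/sec are assumed):

```python
# Compression ratio needed to fit CIF video into 384 kbps.
raw_bps = 352 * 288 * 1.5 * 8 * 30     # ~36.5 Mbps uncompressed
target_bps = 384_000                   # typical ISDN videoconferencing rate
print(f"compression ratio: {raw_bps / target_bps:.0f}:1")   # ~95:1
```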
Interoperability of various digital video products requires not only
standardization of the compression method but also the representation (format)
of the data. There is an abundance of digital video formats/standards besides
the CCIR 601 and CIF standards.
3-D Motion Models:
Let

X = [X1  X2  X3]^T   and   X' = [X1'  X2'  X3']^T

denote the coordinates of an object point at times t and t' with respect to the
center of rotation, respectively. The 3-D motion between the two instants can
then be modeled as

X' = R X + T

where R is a 3x3 rotation matrix and T = [T1 T2 T3]^T is a translation vector.
That is, the 3-D displacement can be expressed as the sum of a 3-D
rotation and a 3-D translation.
The Rotation Matrix:
Three-dimensional rotation in the Cartesian coordinates can be
characterized either by the Eulerian angles of rotation about the three coordinate
axes, or by an axis of rotation and an angle about this axis. The two descriptions
can be shown to be equivalent under the assumption of infinitesimal rotation.
Eulerian angles in the Cartesian coordinates: An arbitrary rotation in the
3-D space can be represented by the Eulerian angles, θ, ψ, and ɸ, of rotation
about the X1, X2, and X3 axes, respectively.
The matrices that describe clockwise rotation about the individual axes
are given by

Rθ = [ 1       0       0
       0     cos θ   sin θ
       0    -sin θ   cos θ ]

Rψ = [ cos ψ   0   -sin ψ
         0     1      0
       sin ψ   0    cos ψ ]

Rɸ = [ cos ɸ   sin ɸ   0
      -sin ɸ   cos ɸ   0
         0       0     1 ]

Assuming that the rotation from frame to frame is infinitesimal, i.e., ɸ ≈ 0,
etc., and thus approximating cos ɸ ≈ 1 and sin ɸ ≈ ɸ, and so on, these
matrices simplify, and the composite rotation matrix becomes (neglecting
terms that are second order in the angles)

R = Rθ Rψ Rɸ = [  1    ɸ   -ψ
                 -ɸ    1    θ
                  ψ   -θ    1 ]
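The quality of the small-angle approximation is easy to check numerically. The sketch below (frame-to-frame angles hypothetical) builds the exact composite rotation from the three matrices above and compares it with the linearized form:

```python
import numpy as np

def rot_x(t):   # clockwise rotation about X1
    return np.array([[1, 0, 0],
                     [0, np.cos(t), np.sin(t)],
                     [0, -np.sin(t), np.cos(t)]])

def rot_y(p):   # clockwise rotation about X2
    return np.array([[np.cos(p), 0, -np.sin(p)],
                     [0, 1, 0],
                     [np.sin(p), 0, np.cos(p)]])

def rot_z(f):   # clockwise rotation about X3
    return np.array([[np.cos(f), np.sin(f), 0],
                     [-np.sin(f), np.cos(f), 0],
                     [0, 0, 1]])

theta, psi, phi = 0.01, 0.02, 0.03        # small frame-to-frame angles
R_exact = rot_x(theta) @ rot_y(psi) @ rot_z(phi)
R_small = np.array([[1, phi, -psi],
                    [-phi, 1, theta],
                    [psi, -theta, 1]])
print(np.abs(R_exact - R_small).max())    # second order in the angles
```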
In the homogeneous coordinates, the displacement model X' = R X + T can be
written as a single matrix multiplication,

X̃' = Ã X̃,   where   Ã = [ r11  r12  r13  T1
                            r21  r22  r23  T2
                            r31  r32  r33  T3
                             0    0    0   1 ]

X̃ = [X1  X2  X3  1]^T denotes the point in the homogeneous coordinates,
and rij denote the elements of the rotation matrix R in the Cartesian coordinates.
Zooming in the Homogeneous Coordinates:
The effect of zooming can be incorporated into the 3-D motion model as

X̃' = S̃ X̃,   where   S̃ = [ s  0  0  0
                             0  s  0  0
                             0  0  s  0
                             0  0  0  1 ]

and s denotes the zoom factor.
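A short NumPy sketch, consistent with the matrix forms above (all numerical values hypothetical), shows how rotation, translation, and zoom chain into a single homogeneous transformation:

```python
import numpy as np

R = np.eye(3)                          # rotation (identity for brevity)
T = np.array([1.0, 2.0, 3.0])          # translation vector
s = 1.2                                # zoom factor

A = np.eye(4)                          # homogeneous displacement matrix
A[:3, :3] = R
A[:3, 3] = T
S = np.diag([s, s, s, 1.0])            # homogeneous zoom matrix

X = np.array([0.5, -0.5, 4.0, 1.0])    # point in homogeneous coordinates
X_new = S @ A @ X
print(X_new[:3] / X_new[3])            # back to Cartesian: [1.8 1.8 8.4]
```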
Perspective Projection:
Under perspective projection, the image-plane coordinates of the point
(X1, X2, X3) are given by

x1 = f X1 / (f - X3)   and   x2 = f X2 / (f - X3)

or, equivalently,

x1 / f = X1 / (f - X3)   and   x2 / f = X2 / (f - X3)
where f denotes the distance from the center of projection to the image plane.
If we move the center of projection to coincide with the origin of the
world coordinates, a simple change of variables yields the following equivalent
expressions:
x1 = f X1 / X3   and   x2 = f X2 / X3
The configuration and the similar triangles used to obtain these expressions are
shown in Figure 2.5, where the image plane is parallel to the (X1, X2) plane of
the world coordinate system.
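A minimal NumPy sketch of this projection (focal length and points hypothetical); doubling the depth of a point halves its image coordinates, as the similar triangles suggest:

```python
import numpy as np

def project(points: np.ndarray, f: float) -> np.ndarray:
    """Perspective projection x1 = f X1 / X3, x2 = f X2 / X3
    for an N x 3 array of world points; returns N x 2 image points."""
    X1, X2, X3 = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * X1 / X3, f * X2 / X3], axis=1)

pts = np.array([[1.0, 2.0, 10.0],
                [1.0, 2.0, 20.0]])     # same direction, twice the depth
print(project(pts, f=1.0))             # [[0.1 0.2] [0.05 0.1]]
```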
Photometric Effects of 3-D Motion:
Under the Lambertian surface assumption, the image intensity is proportional
to the inner product of the unit surface normal N and the unit vector L pointing
toward the light source,

f(x1, x2) = ρ N · L

where ρ denotes the surface albedo and N = [-p  -q  1]^T / sqrt(p^2 + q^2 + 1),
with p and q the partial derivatives of the depth X3 with respect to X1 and X2,
respectively.
The rate of change of the normal vector N at the point (X1, X2, X3) can
be approximated by

dN/dt ≈ ΔN/Δt

where ΔN denotes the change in the direction of the normal vector due to
the 3-D motion from the point (X1, X2, X3) to (X1', X2', X3') within the
period Δt. Since the normal vector rotates with the object, this change can
be expressed as

ΔN = N(X1', X2', X3') - N(X1, X2, X3)
   = R N(X1, X2, X3) - N(X1, X2, X3)
   = (R - I) N(X1, X2, X3)
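A sketch consistent with the reconstruction above (rotation angle and normal vector hypothetical): the normal rotates with the object, and (R - I)N gives its first-order change:

```python
import numpy as np

phi = 0.03                             # small clockwise rotation about X3
R = np.array([[1, phi, 0],
              [-phi, 1, 0],
              [0, 0, 1]])              # linearized rotation matrix
N = np.array([0.0, 1.0, 0.0])          # unit surface normal along X2

dN = (R - np.eye(3)) @ N               # Delta N = (R - I) N
print(dN)                              # [0.03 0. 0.]: normal tilts toward X1
```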
Observation Noise:
Image capture mechanisms are never perfect. As a result, images
generally suffer from graininess due to electronic noise, photon noise, film-
grain noise, and quantization noise.
In video scanned from motion picture film, streaks due to possible
scratches on film can be modeled as impulsive noise.
Speckle noise is common in radar image sequences and biomedical cine-
ultrasound sequences.
The available signal-to-noise ratio (SNR) varies with the imaging devices
and image recording media.
Even if the noise may not be perceived at full-speed video due to the
temporal masking effect of the eye, it often leads to poor-quality “freeze-
frames.”
The observation noise in video can be modeled as additive or
multiplicative noise, signal-dependent or signal-independent noise, and white or
colored noise.
For example, photon and film-grain noise are signal-dependent, whereas
CCD sensor and quantization noise are usually modeled as white, Gaussian
distributed, and signal independent.
Ghosts in TV images can also be modeled as signal-dependent noise.
We will assume a simple additive noise model given by
g(x1, x2, t) = f(x1, x2, t) + v(x1, x2, t)

where f(x1, x2, t) and v(x1, x2, t) denote the ideal video and the noise at
time t, respectively, and g(x1, x2, t) is the observed video.
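A small NumPy sketch of this model (frame contents and noise level hypothetical), which also computes the resulting SNR in decibels:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.uniform(0, 255, size=(480, 720))     # "ideal" video frame
v = rng.normal(0, 10.0, size=f.shape)        # additive noise, sigma = 10
g = f + v                                    # observed frame

snr_db = 10 * np.log10(np.var(f) / np.var(v))
print(f"SNR = {snr_db:.1f} dB")              # ~17 dB for these values
```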
The SNR is an important parameter for most digital video processing
applications, because noise hinders our ability to effectively process the data.
For example, in 2-D and 3-D motion estimation, it is very important to
distinguish the variation of the intensity pattern due to motion from that of the
noise.
In image resolution enhancement, noise is the fundamental limitation on
our ability to recover high frequency information.
Furthermore, in video compression, random noise increases the entropy,
hindering effective compression.
The SNR of video imagery can be enhanced by spatio-temporal filtering,
also called noise filtering.
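As a minimal illustration of temporal filtering (a stationary scene and co-registered frames assumed, all values hypothetical), averaging K noisy frames reduces the noise variance by a factor of about K:

```python
import numpy as np

def temporal_mean(frames: np.ndarray) -> np.ndarray:
    """frames: K x H x W stack of co-registered noisy frames."""
    return frames.mean(axis=0)

rng = np.random.default_rng(1)
clean = np.full((16, 16), 100.0)                        # static scene
noisy = clean + rng.normal(0, 10.0, size=(8, 16, 16))   # 8 noisy frames

filtered = temporal_mean(noisy)
print(noisy.var(axis=0).mean())    # per-pixel noise variance, ~100
print(filtered.var())              # residual variance, ~100/8
```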