Module 1
The field of digital image processing refers to processing digital images by means of a digital
computer. Note that a digital image is composed of a finite number of elements, each of which has a
particular location and value. These elements are called picture elements, image elements, pels, and
pixels. Pixel is the term used most widely to denote the elements of a digital image.
Image Definition: An image is a two-dimensional function, f(x, y), where x and y are spatial
coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point.
Applications of Digital Image Processing: Digital image processing has a broad range of
applications, extending across the entire electromagnetic spectrum, from gamma to radio waves.
Unlike humans, who perceive only the visual band, imaging machines can process images from
sources like ultrasound, electron microscopy, and computer-generated images.
Overlapping Fields: There’s no clear distinction between image processing, image analysis, and
computer vision. Image processing typically involves both inputs and outputs as images, while
computer vision aims to emulate human vision and intelligence, often using AI. Image analysis sits
between these, involving tasks like segmentation and object recognition.
Processing Levels: The continuum from image processing to computer vision can be categorized as
follows:
• Low-level processing: Basic operations like noise reduction, contrast enhancement, and
sharpening, where inputs and outputs are images.
• Mid-level processing: Involves tasks like segmentation, description, and classification, where
inputs are images, and outputs are attributes like edges and contours.
• High-level processing: Involves making sense of recognized objects and performing cognitive
functions.
The Origins of Digital Image Processing
Early Applications:
Digital images were first used in the newspaper industry in the 1920s, with the Bartlane cable system
transmitting pictures between London and New York. This reduced the transmission time from over a
week to less than three hours; the early systems coded images in five distinct levels of gray. One of the
transmitted images is shown in the figure below.
Technological Improvements:
Early challenges involved improving image quality through better printing techniques. By 1929, the
Bartlane system could encode 15 levels of gray, enhancing image reproduction. One such image is
shown below.
Dependence on Computers:
Digital image processing truly began with the development of computers in the 1940s. The
introduction of key concepts like stored memory and conditional branching by John von Neumann,
along with technological advances like the transistor, integrated circuits, and microprocessors, paved
the way for digital image processing.
First Image Processing Tasks:
In 1964, NASA's Jet Propulsion Laboratory used computers to improve space images from Ranger 7,
marking the beginning of meaningful image processing tasks. This led to advancements in image
enhancement techniques for space exploration. One such image is shown below.
Expanding Applications:
By the 1960s and 1970s, digital image processing extended to fields like medical imaging, remote
sensing, astronomy, and more. Techniques like image enhancement, restoration, and machine
perception became widely used in various domains, including industry, defense, and science.
Machine Perception:
Machine vision uses image processing techniques for automated tasks like character recognition,
product inspection, and military applications. These tasks often rely on information such as statistical
moments or Fourier transform coefficients, which may not resemble human interpretation of images.
Continued Growth:
With advancements in computing power, networking, and data transmission, digital image processing
continues to grow, finding applications in fields as diverse as archeology, law enforcement,
environmental monitoring, and space exploration.
1. Nuclear Medicine:
o Involves injecting patients with a radioactive isotope that emits gamma rays.
o Bone Scan: Used to detect bone pathology (e.g., tumors or infections) by capturing
gamma-ray emissions with detectors, as shown in the figure below.
Nuclear Reactor: Figure below depicts gamma radiation from a valve in a reactor, highlighting
areas of strong radiation.
X-RAY IMAGING
X-rays, among the oldest forms of EM radiation used for imaging, have applications in medicine,
industry, and astronomy:
1. Medical Imaging:
o Chest X-ray: Generated by placing the patient between an X-ray source and film,
showing areas where X-rays are absorbed differently, as in Fig. below,
o Angiography: A catheter is used to inject a contrast medium to highlight blood
vessels for clearer imaging, as seen in Figure.
2. Industrial Applications:
o X-rays: Used to inspect electronic circuit boards for defects, shown in Figure.
Industrial CAT scans can examine larger objects like plastic assemblies or rocket
motors.
3. Astronomy:
o X-ray imaging is also used to capture celestial objects, like the Cygnus Loop,
in Figure below, revealing cosmic phenomena.
Ultraviolet (UV) light has a wide range of applications, including lithography, industrial inspection,
microscopy, lasers, biological imaging, and astronomy. Two key areas where UV imaging is prevalent
are microscopy and astronomy:
Each of these examples illustrates the vast potential of digital image processing in the visual and
infrared spectrum across different fields.
• Radar Imaging: The dominant use of microwave imaging is radar, which can capture data in
any weather and lighting conditions, making it invaluable for regions where traditional
imaging fails. Some radar waves can penetrate clouds, vegetation, ice, and dry sand,
providing a unique way to explore otherwise inaccessible areas.
• Functionality: Imaging radar operates similarly to a flash camera but uses microwave pulses
instead of light for illumination. It captures the energy reflected back toward the radar
antenna, which is processed to create an image.
• Advantages: Radar can provide clear and detailed images regardless of weather conditions or
time of day. It’s used for mapping rugged terrains, monitoring agricultural regions, and
studying remote areas.
• Example: A spaceborne radar image of southeast Tibet illustrates the technology’s capacity
to capture detailed images of mountainous terrain and valleys, unaffected by clouds or
atmospheric disturbances. This allows for detailed study of regions like the Lhasa River
valley.
Radar's ability to bypass traditional barriers like cloud cover makes it a powerful tool for
environmental and geographical studies.
1. Image Acquisition:
o First step in image processing; involves capturing or receiving an image in digital form.
o Pre-processing may include tasks like scaling or noise reduction.
2. Image Enhancement:
o Improves image quality for specific applications.
o Enhancement is subjective, depending on the problem (e.g., X-rays vs. satellite images).
o Introduced in early chapters for newcomers to understand processing techniques.
3. Image Restoration:
o Focuses on correcting image degradation based on mathematical models.
o Unlike enhancement, which is subjective, restoration is objective.
4. Color Image Processing:
o Essential due to the rise of digital images on the internet.
o Concepts of color models and basic color processing are covered in Chapter 6.
5. Wavelets and Multiresolution Processing:
o Wavelets are the foundation for representing images at various degrees of resolution.
6. Compression:
o Image compression reduces storage and bandwidth needs (e.g., JPEG format).
7. Morphological Processing:
o Deals with extracting image components for shape representation and description.
8. Segmentation:
o Divides an image into parts or objects for analysis.
o Critical for tasks like object recognition; weak segmentation leads to failure.
9. Representation and Description:
o Converts raw pixel data into a form suitable for computer processing.
o Focuses on external (boundary) or internal (region) characteristics of objects.
10. Recognition:
o Assigns labels to objects based on extracted features (e.g., “vehicle”).
11. Knowledge Base:
o Encodes domain knowledge into the system to guide processing.
o Controls interaction between processing modules.
6. Image Displays:
o Color flat-screen monitors are commonly used, driven by image display cards.
o Specialized applications may require stereo displays embedded in goggles.
7. Hardcopy Devices:
o Include laser printers, film cameras, inkjet printers, and digital storage like optical disks
and CD-ROMs.
o Film offers the highest resolution, while paper is preferred for written materials.
8. Networking:
o Networking is critical for image transmission, with bandwidth being a major concern,
especially over the Internet.
o Advances in broadband technology (e.g., optical fiber) are improving transmission
efficiency.
The cornea is a transparent tissue that covers the eye's front, while the sclera is an opaque membrane
covering the rest of the eye. The choroid, located beneath the sclera, contains blood vessels essential
for eye nourishment and is heavily pigmented to reduce light scatter.
At the front, the choroid splits into the ciliary body and the iris, which controls the amount of light
entering the eye through the pupil. The lens, suspended by fibers from the ciliary body, is made of
fibrous cells and contains water and protein. It slightly filters visible light and protects against infrared
and ultraviolet light, which can damage the eye. Excessive clouding of the lens (cataracts) can impair
vision.
The retina, the innermost membrane, lines the back portion of the eye. It contains photoreceptors,
which are crucial for vision. Light entering the eye is focused on the retina, where two types of
receptors, cones and rods, are distributed. Cones (6-7 million) are concentrated in the fovea at the
retina’s center, enabling high-resolution, color vision in bright light (photopic vision). Rods (75-150
million) are distributed over the retina and provide overall, low-resolution vision, especially in low
light (scotopic vision). While rods help detect shapes in dim light, they are not sensitive to color, which
explains why objects lose their color in low-light conditions.
The human eye contains two types of photoreceptors: rods and cones. There are approximately 75 to
150 million rods distributed over the retinal surface, which are responsible for low-light vision
(scotopic vision). Rods are not sensitive to color and provide a general view of the field, but their
connection to multiple nerve ends reduces the detail they can discern. This is why objects seen under
low light, like moonlight, appear colorless and less detailed.
Cones, on the other hand, number between 6 and 7 million and are highly concentrated in the fovea, the
central region of the retina, measuring about 1.5 mm in diameter. Cones are responsible for high-
resolution, color vision (photopic vision) and are most dense in the center of the fovea. As the distance
from the fovea increases, rod density rises, peaking at 20° off-axis, then decreases toward the periphery
of the retina.
The absence of receptors where the optic nerve exits the eye creates a blind spot. The fovea can be
considered roughly as a square sensor array of 1.5 mm by 1.5 mm, containing about 337,000 cones,
with a density of 150,000 elements per mm². Comparatively, modern CCD imaging sensors of medium
resolution can have a similar number of elements in arrays as small as 5 mm by 5 mm. Despite the
superficial nature of such comparisons, the resolving power of the human eye is quite similar to that
of current electronic imaging sensors.
In an ordinary photographic camera, focusing is achieved by adjusting the distance between the lens
and the imaging plane, where the film or digital imaging chip is located. In contrast, the human eye
maintains a fixed distance between the lens and the retina, with the focal length adjusted by altering
the lens's shape. The ciliary body fibers facilitate this change, flattening the lens for distant objects and
thickening it for nearby objects. The distance from the lens center to the retina along the visual axis is
approximately 17 mm, with the eye's focal lengths ranging from about 14 mm to 17 mm. The longest
focal length occurs when the eye is relaxed and focused on objects more than 3 meters away.
To illustrate image formation, consider a scenario where a person views a 15-meter-high tree from a
distance of 100 meters. The geometry shows that the height h of the tree in the retinal image can be
calculated from the similar-triangle relation 15/100 = h/17, giving h = (15 × 17)/100 ≈ 2.55 mm.
The focused retinal image primarily occurs in the fovea, where perception is
based on the relative excitation of light receptors. These receptors transform radiant energy into
electrical impulses, which are then decoded by the brain for visual perception.
Figure above illustrates this relationship, with the long solid curve indicating the intensity range the
visual system can adapt to. In photopic vision, the adaptable range is about 10^6. The transition between
scotopic (dim light) and photopic (bright light) vision occurs gradually, approximately between 0.001
to 0.1 millilambert, as shown by the double branches of the adaptation curve.
Brightness Discrimination-
The ability of the eye to discriminate between changes in light intensity at any specific adaptation
level is brightness discrimination.
A Classic Experiment -
• The subject looks at a flat, uniformly illuminated area (a diffuser, i.e., a device that distributes
light evenly) large enough to occupy the entire field of view.
• It is illuminated from behind by a light source whose intensity, I, can be varied.
• To this field, an increment of illumination, ΔI, is added in the form of a short-duration flash that
appears as a circle in the centre of the uniformly illuminated field, as shown in the figure.
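The quantity usually reported for this experiment is the Weber ratio, ΔIc/I, where ΔIc is the increment of illumination discriminable 50% of the time against background intensity I; that ratio is not spelled out above, so the sketch below simply assumes it, with made-up measurement values:

import numpy as np

def weber_ratio(delta_ic, intensity):
    """Weber ratio: just-discriminable increment divided by background intensity."""
    return delta_ic / intensity

# Hypothetical measurements: background intensities and the increments a
# subject could just detect (illustrative numbers, not experimental data).
backgrounds = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
increments  = np.array([0.005, 0.02, 0.03, 0.25, 2.0])

for I, dI in zip(backgrounds, increments):
    print(f"I = {I:>6}: Weber ratio = {weber_ratio(dI, I):.3f}")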
The second phenomenon, known as simultaneous contrast, highlights that a region’s perceived
brightness depends on its surrounding areas, not just its own intensity. For example, in Figure below,
all center squares have the same intensity but appear darker as the background lightens. A common
experience of this phenomenon occurs when a piece of paper looks white on a desk but can appear
black when held up against a bright sky.
Additionally, optical illusions showcase how the human visual system can misinterpret visual
information. Figure below presents several such illusions:
To create a 2-D image using a single sensor, there must be relative movement between the sensor and
the area being imaged in both the x- and y-directions. Figure below illustrates one approach, in which
a film negative is placed on a rotating drum, providing displacement in one dimension. Meanwhile,
the single sensor is mounted on a lead screw that moves perpendicularly to the drum's motion,
enabling scanning in the other dimension. This arrangement allows for high-precision scanning, and
though it is slow, it is a cost-effective way to achieve high-resolution images due to the precise
mechanical control.
Image Acquisition Using Sensor Strips
A more common geometry for imaging than single sensors involves an in-line sensor strip, as shown
in Figure. This strip provides imaging elements in one direction, and motion perpendicular to the strip
completes the imaging in the second dimension. This configuration is widely used in flatbed
scanners and airborne imaging systems. In these applications, the imaging strip captures one line of
the image, while the movement of the object or the sensor completes the two-dimensional image.
In medical and industrial imaging, sensor strips are arranged in a ring configuration to
obtain cross-sectional (slice) images of 3D objects, as depicted in Figure. In this setup, a rotating X-
ray source provides illumination, and sensors opposite the source collect the energy that passes
through the object. This forms the basis for computerized axial tomography (CAT). However, the
data from the sensors require extensive processing via reconstruction algorithms to transform them
into meaningful cross-sectional images. As the object moves perpendicularly to the sensor ring,
multiple cross-sectional images are captured and stacked to form a 3D digital volume.
Image Acquisition Using Sensor Arrays
Figure above illustrates individual sensors arranged in a 2D array format, which is commonly used
in digital cameras and other electromagnetic or ultrasonic sensing devices. A well-known example is
the CCD (Charge-Coupled Device) array, widely utilized for its low-noise properties in applications
such as astronomy. CCD arrays can have configurations of 4000 × 4000 elements or more, making
them highly effective for high-resolution image capture.
The key feature of a 2D sensor array is that it can capture a complete image in one exposure, as the
energy pattern is projected onto the surface of the array. This eliminates the need for motion or
scanning, unlike the single sensor or sensor strip arrangements discussed previously.
1. Illumination energy (such as light) reflects from a scene element and is collected by an
imaging system.
2. An optical lens focuses the incoming energy onto a focal plane, where the 2D sensor array is
located.
3. The sensor array produces outputs proportional to the amount of light energy received at each
sensor.
4. Analog circuitry sweeps these outputs and converts them into an analog signal, which is
then digitized by the imaging system, resulting in a digital image.
To convert a continuous image into digital form, two key processes are
required: sampling and quantization. Sampling involves digitizing the x- and y-coordinates of the
image, while quantization digitizes the amplitude values.
• Sampling: This process captures discrete points from a continuous image along both the x-
and y-axes. As illustrated in Figure above, the amplitude (intensity) values of the image are
plotted along a line (AB). Sampling occurs by taking measurements at equally spaced intervals,
as shown by the small white squares superimposed on the function in Figure. These spatial
locations give rise to a discrete set of coordinate values, but the amplitude remains continuous.
• Quantization: To fully digitize the image, the amplitude values must also be converted into
discrete levels. Figure shows that the intensity range is divided into discrete intervals (in this
case, eight levels from black to white). Each sampled amplitude is assigned a value based on
its proximity to these intervals. This process is called quantization.
• Final Digital Image: Once both sampling and quantization are completed, the image consists
of discrete values for both the coordinates and amplitude, as shown in Figure. Repeating this
process line by line for the entire image results in a two-dimensional digital image.
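As a concrete illustration of these two steps, the sketch below samples an arbitrary continuous 1-D intensity profile (standing in for the scan line AB) at equally spaced locations and then quantizes each sample into eight levels; the profile function and the level count are illustrative choices, not values taken from the figure.

import numpy as np

# A continuous intensity profile along a scan line (arbitrary smooth function).
def f(x):
    return 0.5 + 0.4 * np.sin(2 * np.pi * x) * np.exp(-x)

# Sampling: take measurements at equally spaced x locations.
num_samples = 16
x = np.linspace(0.0, 2.0, num_samples)
samples = f(x)                      # amplitudes are still continuous here

# Quantization: map each amplitude to one of 8 discrete intensity levels.
levels = 8
q = np.round(samples * (levels - 1)) / (levels - 1)

for xi, s, qi in zip(x, samples, q):
    print(f"x = {xi:.2f}  sampled = {s:.3f}  quantized = {qi:.3f}")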
1. Image plotted as a surface – here, two axes determine spatial location and the third axis gives the
values of f as a function of the two spatial variables x and y, as shown below.
2. Image displayed as a visual intensity array – here, the intensity of each point is proportional to the
value of f at that point, as shown below.
3. Image represented as a 2-D numerical array – here, the numerical values of f(x, y) (e.g., 0, 0.5, and 1
representing dark, gray, and white) are displayed as an array (matrix), as shown below.
In equation form, we write the representation of an M × N numerical array as

f(x, y) = [ f(0, 0)       f(0, 1)       …   f(0, N−1)
            f(1, 0)       f(1, 1)       …   f(1, N−1)
            …             …                 …
            f(M−1, 0)     f(M−1, 1)     …   f(M−1, N−1) ]
Both sides of this equation are equivalent ways of expressing a digital image quantitatively. The right
side is a matrix of real numbers. Each element of this matrix is called an image element, picture
element, pixel, or pel.
It is advantageous to use a more traditional matrix notation to denote a digital image and its elements,
A = [a_ij], where a_ij = f(i, j), M is the number of rows, and N is the number of columns. There are no
restrictions on choosing the values of M and N other than that they must be positive integers. However,
due to storage and quantizing hardware considerations, the number of intensity levels, L, typically is an
integer power of 2, i.e., L = 2^k.
Sometimes, the range of values spanned by the gray scale is referred to informally as the dynamic
range. This is a term used in different ways in different fields. Here, we define the dynamic range of
an imaging system to be the ratio of the maximum measurable intensity to the minimum detectable
intensity level in the system. As a rule, the upper limit is determined by saturation and the lower limit
by noise. Closely associated with this concept is image contrast, which we define as the difference in
intensity between the highest and lowest intensity levels in an image. When an appreciable number of
pixels in an image have a high dynamic range, we can expect the image to have high contrast.
Conversely, an image with low dynamic range typically has a dull, washed-out gray look.
The number, b, of bits required to store a digitized image is b = M × N × k; when M = N, this becomes b = N²k.
Table below shows the number of bits required to store square images with various values of N and k.
The number of intensity levels corresponding to each value of k is shown in parentheses. When an
image can have 2^k intensity levels, it is common practice to refer to the image as a “k-bit image.” For
example, an image with 256 possible discrete intensity values is called an 8-bit image. Note that storage
requirements for 8-bit images of size 1024 * 1024 and higher are not insignificant.
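A quick numeric check of the storage relation b = M × N × k for a few square image sizes (a minimal sketch; the sizes chosen are arbitrary):

# Bits needed to store an M x N image with k bits per pixel: b = M * N * k.
def storage_bits(M, N, k):
    return M * N * k

for N in (32, 128, 1024):
    for k in (1, 8):
        b = storage_bits(N, N, k)
        print(f"{N:>5} x {N:<5} k={k}: {b:>12,d} bits  ({b // 8:>10,d} bytes)")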
The images in Figs. 2.20(a) through (d) are shown at 1250, 300, 150, and 72 dpi, respectively. In order
to facilitate comparisons, all the smaller images were zoomed back to the original size. This is
somewhat equivalent to “getting closer” to the smaller images so that we can make comparable
statements about visible details. There are some small visual differences between Fig(a) and (b), the
most notable being a slight distortion in the large black needle. For the most part, however, Fig.(b) is
quite acceptable. In fact, 300 dpi is the typical minimum image spatial resolution used for book
publishing, so one would not expect to see much difference here. Figure(c) begins to show visible
degradation (see, for example, the round edges of the chronometer and the small needle pointing to 60
on the right side). Figure (d) shows degradation that is visible in most features of the image; this is
called the checkerboard effect.
INTENSITY RESOLUTION – due to quantization
Intensity resolution is a measure of the smallest discernible change in intensity level.
Figures (a) through (h): the same CT projection image displayed with 256, 128, 64, 32, 16, 8, 4, and 2 intensity levels, respectively.
Here, we keep the number of samples constant and reduce the number of intensity levels from 256 to
2, in integer powers of 2. Figure(a) is a 452 * 374 CT projection image, displayed with k= 8 (256
intensity levels). Figures(b) through (h) were obtained by reducing the number of bits k= 7 to k= 1
while keeping the image size constant at 452 * 374 pixels. The 256-, 128-, and 64-level images are
visually identical for all practical purposes. The 32-level image in Fig. (d), however, has an almost
imperceptible set of very fine ridge-like structures in areas of constant or nearly constant intensity
(particularly in the skull). This effect, caused by the use of an insufficient number of intensity levels
in smooth areas of a digital image, is called false contouring. False contouring generally is quite
visible in images displayed using 16 or fewer uniformly spaced intensity levels, as the images in Figs.
(e) through (h) show.
False contouring effect – caused by the use of an insufficient number of intensity levels in smooth areas
of a digital image.
Checkerboard effect – caused by undersampling of the image (an insufficient number of pixels).
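Both effects are easy to reproduce. The sketch below reduces the number of intensity levels (false contouring) and the number of samples (checkerboard effect) of a synthetic smooth image; the radial-gradient test image and the specific factors are assumptions made for illustration only.

import numpy as np

# Synthetic smooth 8-bit image: a radial gradient (stand-in for a real photo).
yy, xx = np.mgrid[0:256, 0:256]
img = (255 * np.hypot(xx - 128, yy - 128) / np.hypot(128, 128)).astype(np.uint8)

def reduce_intensity_levels(image, k):
    """Requantize an 8-bit image to 2**k levels (small k shows false contouring)."""
    step = 256 // (2 ** k)
    return (image // step) * step

def reduce_spatial_resolution(image, factor):
    """Subsample then replicate pixels (undersampling shows the checkerboard effect)."""
    small = image[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

contoured = reduce_intensity_levels(img, k=3)      # 8 intensity levels
blocky = reduce_spatial_resolution(img, factor=8)  # 32 x 32 samples blown back up
print(np.unique(contoured).size, blocky.shape)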
Image Interpolation
It is a basic tool used in image processing tasks such as zooming, shrinking, rotating, and geometric
correction; zooming and shrinking an image are also called image resampling.
Interpolation is the process of using known data to estimate values at unknown locations.
Three types of interpolation
• Nearest Neighbour interpolation
• Bilinear interpolation
• Bicubic interpolation
Bilinear interpolation –
Here, we use the four nearest neighbours to estimate the intensity at a given location. Let (x, y) denote
the coordinates of the location to which we want to assign an intensity value (think of it as a point of
the grid described previously), and let v(x, y) denote that intensity value. For bilinear interpolation,
the assigned value is obtained using the equation v(x, y) = ax + by + cxy + d,
where the four coefficients are determined from the four equations in four unknowns that can be written
using the four nearest neighbours of point (x, y). Bilinear interpolation gives much better results than
nearest neighbour interpolation, with a modest increase in computational burden.
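A minimal NumPy sketch of bilinear interpolation at a single non-integer location, written in the weighted-average form that is equivalent to the equation above on a unit pixel grid (the tiny image and the query point are made up for illustration):

import numpy as np

def bilinear(image, x, y):
    """Bilinear interpolation of `image` at fractional coordinates (x, y),
    where x indexes rows and y indexes columns."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, image.shape[0] - 1)
    y1 = min(y0 + 1, image.shape[1] - 1)
    dx, dy = x - x0, y - y0
    # Weighted average of the four nearest neighbours.
    return ((1 - dx) * (1 - dy) * image[x0, y0] +
            (1 - dx) * dy       * image[x0, y1] +
            dx       * (1 - dy) * image[x1, y0] +
            dx       * dy       * image[x1, y1])

img = np.array([[10, 20],
                [30, 40]], dtype=float)
print(bilinear(img, 0.5, 0.5))   # 25.0, the average of the four neighbours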
Bicubic interpolation –
It involves the sixteen nearest neighbours of a point. The intensity value assigned to point
(x, y) is obtained using the equation v(x, y) = Σ_{i=0}^{3} Σ_{j=0}^{3} a_ij x^i y^j,
where the sixteen coefficients are determined from the sixteen equations in sixteen unknowns that can
be written using the sixteen nearest neighbours of point (x, y). Generally, bicubic interpolation does a
better job of preserving fine detail than its bilinear counterpart. Bicubic interpolation is the standard
used in commercial image editing programs, such as Adobe Photoshop and Corel Photopaint.
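In practice these interpolators are rarely coded by hand. As a usage sketch (assuming the opencv-python package is installed; any 8-bit grayscale array would do), the three methods can be compared when zooming an image:

import numpy as np
import cv2  # pip install opencv-python

# Any grayscale image would do; a random array keeps the sketch self-contained.
img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

zoomed_nn = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_NEAREST)
zoomed_bl = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_LINEAR)   # bilinear
zoomed_bc = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)    # bicubic

print(zoomed_nn.shape, zoomed_bl.shape, zoomed_bc.shape)  # all (256, 256)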
Neighbors of a Pixel
A pixel p at coordinates (x, y) has four horizontal and vertical neighbors whose coordinates are
(x+1, y), (x−1, y), (x, y+1), and (x, y−1). This set of pixels, called the 4-neighbors of p, is denoted by
N4(p). Each pixel is a unit distance from (x, y), and some of the neighbour locations of p lie outside
the digital image if (x, y) is on the border of the image.
The four diagonal neighbors of p have coordinates (x+1, y+1), (x+1, y−1), (x−1, y+1), and (x−1, y−1),
and are denoted by ND(p). These points, together with the 4-neighbors, are called the 8-neighbors of
p, denoted by N8(p). As before, some of the neighbour locations in ND(p) and N8(p) fall outside the
image if (x, y) is on the border of the image.
Adjacency
Let V be the set of intensity values used to define adjacency.
a. 4-adjacency – Two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
b. 8-adjacency – Two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
Connectivity
Let S represent a subset of pixels in an image. Two pixels p and q are said to be connected in S if
there exists a path between them consisting entirely of pixels in S as shown below.
Regions
Let R be a subset of pixels in an image, we call R a region of the image if R is a connected set as
shown below.
Two regions, Ri and Rj are said to be adjacent if their union forms a connected set as shown below.
Regions that are not adjacent are said to be disjoint.
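Connected regions can be extracted automatically. The sketch below uses scipy.ndimage.label on a made-up binary image whose two blobs touch only diagonally, so they form one region under 8-connectivity but two under 4-connectivity:

import numpy as np
from scipy import ndimage

# Toy binary image: two blobs that touch only diagonally.
img = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 1, 1]])

# 4-connectivity: only horizontal/vertical neighbours count (default cross structure).
labels4, n4 = ndimage.label(img)
# 8-connectivity: diagonal neighbours count as well.
labels8, n8 = ndimage.label(img, structure=np.ones((3, 3)))

print("regions with 4-connectivity:", n4)   # 2
print("regions with 8-connectivity:", n8)   # 1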
Boundaries
The boundary of a region R is the set of points that are adjacent to points in the complement of R, as
shown below.
Distance Measures
For pixels p, q, and z, with coordinates (x, y), (s, t), and (v, w), respectively, D is a distance function
or metric if
(a) D(p, q) ≥ 0 (with D(p, q) = 0 if and only if p = q),
(b) D(p, q) = D(q, p), and
(c) D(p, z) ≤ D(p, q) + D(q, z).
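The specific distance functions are not listed above, so the Euclidean, city-block (D4), and chessboard (D8) forms used in this sketch are the standard definitions, stated here as an assumption:

import numpy as np

# Pixels p and q with coordinates (x, y) and (s, t).
p = np.array([2, 3])
q = np.array([7, 9])

# Common pixel distance measures (assumed standard definitions):
d_euclidean = np.sqrt(np.sum((p - q) ** 2))        # straight-line distance
d_cityblock = np.sum(np.abs(p - q))                # D4: |x - s| + |y - t|
d_chessboard = np.max(np.abs(p - q))               # D8: max(|x - s|, |y - t|)

print(d_euclidean, d_cityblock, d_chessboard)      # ~7.81, 11, 6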