
Thermographic Decision Support

– Detecting and Classifying Faults


in Infrared Images

MAGNUS SMEDBERG

Master of Science Thesis


Stockholm, Sweden 2006
Thermographic Decision Support
– Detecting and Classifying Faults
in Infrared Images

MAGNUS SMEDBERG

Master’s Thesis in Computer Science (20 credits)


at the School of Electrical Engineering
Royal Institute of Technology year 2006
Supervisor at CSC was Eric Hayman
Examiner was Jan-Olof Eklundh

TRITA-CSC-E 2006:112
ISRN-KTH/CSC/E--06/112--SE
ISSN-1653-5715

Royal Institute of Technology


School of Computer Science and Communication

KTH CSC
SE-100 44 Stockholm, Sweden

URL: www.csc.kth.se
Abstract

Thermography is the process of using a camera capable of detecting infrared radiation to


determine the temperature of an object. The technique has a multitude of possible uses,
although traditionally it has been limited to industrial inspections due to the high cost
of the necessary equipment. Lately, however, developments in the field have made the
cameras much cheaper, and the range of uses, and of users, of the technique has therefore
grown. With the lowered cost of the cameras come new, inexperienced users who may need
help in interpreting the results produced by the cameras. For these users, a decision
support system that could guide them in their use of the camera would be of great
importance.
This thesis discusses how such a system could be implemented and also presents a pro-
posed system that can be used to identify faults in infrared images of electrical installations.
The proposed system works in two stages: identification of regions of interest (ROIs), and
classification of the ROIs. The identification is conducted using an algorithm that detects re-
peating objects in an image by extracting SIFT features and determining the most probable
translation between these features. The classification of the identified regions is done using
a feed-forward neural network.
This is a novel field of research, and to the best of my knowledge, this is the first work
done on decision systems based on infrared images of electrical installations.
Termografibeslutsstöd: Att finna och klassificera
fel i infraröda bilder

Master's project at CSC

Summary

Thermography is the measurement technique in which a camera is used to detect and
measure infrared radiation. There is a wide range of possible areas in which thermography
can be used. Traditionally, however, the technique has been limited to the inspection and
monitoring of industrial processes and installations, largely because of the high cost that
has been associated with the technology for many years. Recently, though, the technology
has developed dramatically, which among other things means that considerably cheaper
cameras can be produced. With these cheaper cameras come new areas of use and, above
all, new users. These new users are often inexperienced and untrained in thermography
and in the handling of infrared cameras. For such users, a decision support system that
can help interpret the results from the cameras would be of great use.
This Master's project discusses how such a system could be constructed, and also
describes the development of one. The newly developed system uses recent computer
vision techniques to identify faults in infrared images of electrical installations. The system
is built in two parts: identification of regions of interest, and classification of regions. The
identification of regions is done with a newly developed algorithm that finds repetitions in
images using SIFT interest points and a Hough-transform-like voting procedure. The
classification is performed by a two-layer Artificial Neural Network trained on image data
from electrical installations.
This is a new field of research and, as far as I am aware, this is the first published work
of its kind dealing with decision support for infrared images of electrical installations.
Acknowledgements
This thesis was initiated and sponsored by FLIR Systems™ in Danderyd, Sweden,
and supervised by the CVAP group of the School of Computer Science and Com-
munication, KTH. I would like to thank my supervisors Lars-Åke Tunell, Malin
Ingerhed and Anton Grönholm at FLIR for their help and support during the work
on this thesis. I would also like to thank my supervisor at KTH, Eric Hayman, for
his brilliant ideas and great feedback.
Furthermore, the work presented in this thesis was done in collaboration with
David Wretman, and much of his work, presented in [1], is also present in this thesis.
Last, I would like to thank David Wretman for the great cooperation during these
20 weeks!

Magnus Smedberg, June 2006, Danderyd, Sweden


Contents

1 Introduction
1.1 Infrared imaging and thermography
1.1.1 Infrared imaging
1.1.2 Thermography
1.1.3 The infrared image
1.1.4 Thermography of electrical applications
1.2 Problem description
1.2.1 Identification of regions
1.2.2 Information extraction
1.2.3 Decision making
1.2.4 The image database
1.3 Report outline

2 Previous work
2.1 The segmentation-classification approach
2.2 Segmentation by grouping repetitive structure
2.3 Decision making

3 Segmentation and Region-finding
3.1 Bottom-Up segmentation
3.2 Finding repeating structures
3.2.1 A review of the SIFT algorithm
3.2.2 Matching features
3.2.3 Finding most probable translation
3.2.4 Finding complementary features
3.2.5 Grouping image features
3.2.6 Results
3.2.7 Implementation

4 Classification and Decision-making
4.1 Information representation
4.1.1 Temperature values
4.1.2 Histograms
4.1.3 Gradients
4.2 The dataset
4.3 Classification of data
4.3.1 Artificial Neural Networks
4.3.2 Constructing the Neural Net
4.4 Results
4.4.1 The network parameters
4.4.2 Manual or automatic segmentation
4.4.3 The feature parameters
4.4.4 Histogram distance
4.5 Discussion
4.6 Other possible solutions
4.7 Implementation

5 Summary and Conclusions
5.1 Summary
5.2 The demonstration system
5.3 Future work
5.3.1 Finding and grouping features
5.3.2 The data set
5.3.3 Classification
5.4 Conclusions

Bibliography
Chapter 1

Introduction

Infrared thermography has a number of different uses. Some of the more prominent
fields of use are control and maintenance of industrial applications, research and
development, and human and veterinary medicine. Infrared thermography is used
in situations ranging from monitoring oil levels of tanks in oil-refineries to measuring
the body temperature of a zoo lion without having to use sedation. In addition to
these fields where thermography already is a natural component, there is an ever-
growing number of new applications and uses that appear as the technology gets
cheaper and more readily available [2].
One of the most common uses of infrared thermography is the inspection of
electrical installations and applications. The reason for this is that it is one of the
fields that are best suited for using non-destructive testing in general and infrared
thermography in particular. One reason it is so well suited is that many
industrial electrical installations are either too dangerous to test while running, too
expensive to turn off for testing, or a combination of both. Electrical installations are
also, on a very basic level, suitable for infrared thermography simply because heat is
very closely related to electricity. If an electrical component is exposed to too much
current, or has too high resistance, its temperature will rise. This temperature rise
is an excellent target for thermography and infrared cameras.
This Master’s project was initiated and sponsored by FLIR Systems. FLIR
Systems is the market leader in infrared thermography; they develop and manu-
facture infrared cameras in a multitude of varieties for many different applications.
FLIR Systems has initiated this project in its ambition to learn more about how
computer vision can be applied to its business.
The work of this Master's thesis has been performed in close collaboration with
David Wretman and his thesis [1] is closely related to this one. Portions of the
material for both [1] and this thesis are based on joint work, and are thus featured
in both theses. These portions include this chapter, chapter 2 and parts of chapter
3.
This introductory chapter contains the background for the subject matter, it
describes the problem to be solved, and gives the outline for the rest of the report.

Section 1.1 serves as an introduction to the area of infrared imaging and thermogra-
phy, section 1.2 defines the problem to be solved and section 1.3 presents the report
outline.

1.1 Infrared imaging and thermography


This section will introduce how an infrared image can be produced and how it
should be interpreted. The knowledge of how to interpret an infrared image will
prove to be important in the understanding of the rest of the thesis. The section is
based on material from [3] and [4].

1.1.1 Infrared imaging


An infrared imaging system detects radiation in the infrared part of the electro-
magnetic spectrum and produces images from that radiation. All objects emit
infrared radiation and the amount of emitted radiation increases with temperature.
Therefore, infrared imaging allows us to see variations in temperature.
Infrared radiation is the part of the electromagnetic spectrum that lies between visible
light and microwaves. There is no unambiguous definition of which wavelengths
constitute infrared radiation but according to [3] a reasonable interval is between
0.7 µm and 1 mm.
In infrared imaging the infrared spectrum is often subdivided into three bands
(this division is not unambiguous either but according to [4] a common subdivision
is):

• Near IR or short wave (SW): 0.7 – 2 µm

• Mid wave (MW): 2 – 5 µm

• Long wave (LW): 8 – 14 µm

These bands do not cover the full infrared spectrum because not all parts of the
spectrum are suitable for infrared imaging. The reason for this is that the atmo-
spheric transmission of infrared radiation is low in some ranges of the spectrum.
This means that the atmosphere will block infrared radiation in these ranges thus
making these wavelengths unsuitable for infrared imaging. Most infrared cameras
today work in the MW or LW ranges.

1.1.2 Thermography
Thermography is the use of infrared imaging systems to not only see differences in
temperature but also to measure the temperature of the depicted object. Thermo-
graphic cameras convert the measured intensity of infrared radiation to temperature.
The outgoing intensity of radiation from an object can in general have three
sources. The radiation can be emitted from the object itself, it can be reflected on

2
the object from a source in front of the object and it can be transmitted through the
object from a source behind the object. How much of the outgoing radiation from
an object that is emitted, reflected or transmitted can be described by three object
specific constants: the emissivity, ε, the reflectivity, ρ, and the transmittance, τ .
For any object these three constants sum to one:

ε + ρ + τ = 1    (1.1)

In reality very few materials are transparent in the infrared spectrum, i.e. have
a transmittance τ > 0. An example of a material that is transparent to infrared
radiation is germanium, from which lenses to infrared cameras are made. The very
common plastic polyethylene is also partially transparent. Because of the rareness
of materials transparent to infrared radiation the transmittance factor τ can often
be neglected. Non-transparent objects are termed opaque.
The property of interest in thermography is the emissivity. The emissivity tells
us how much of the thermal radiation from an object that is emitted due to the
temperature of the object. An object with an emissivity ε = 1 is called a black
body.
The relation between temperature and radiation is Stefan–Boltzmann’s law and
it states that the total energy radiated per unit surface area of a black body in unit
time is directly proportional to the fourth power of the absolute temperature of the
body:
j* = εσT⁴    (1.2)
The irradiance, j*, is the power density and is measured in W/m². The temperature,
T, is measured in Kelvin. The constant of proportionality, σ, is Stefan–Boltzmann's
constant and has the value 5.67 × 10⁻⁸ W m⁻² K⁻⁴.
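As a quick numerical illustration, equation 1.2 can be evaluated directly; the small Python sketch below assumes a perfect black body (ε = 1) at roughly room temperature:

    # Stefan-Boltzmann's law (equation 1.2) for a black body.
    SIGMA = 5.67e-8  # Stefan-Boltzmann constant in W m^-2 K^-4

    def irradiance(temperature_kelvin, emissivity=1.0):
        """Radiated power density j* in W/m^2 at the given absolute temperature."""
        return emissivity * SIGMA * temperature_kelvin ** 4

    print(irradiance(300.0))  # a black body at 300 K radiates about 459 W/m^2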
If the irradiance, j*, could be measured with some instrument the temperature
of a black body could be calculated. Unfortunately, it is not j* that is measured
by the thermographic camera but rather a portion of the thermal radiation defined
by the spectral sensitivity of the specific camera. Therefore, each camera must be
calibrated and given a relationship between registered thermal radiation and the
temperature of the depicted object [5].
The calibration is done as follows: Images are taken of a number of reference
sources, i.e. simulated black bodies, calibrated to give a radiation corresponding to
black bodies of certain temperatures. The output signal of the camera is registered
for each reference source. To these temperature-signal pairs a curve can be fitted
that gives a continuous relationship between output signal and temperature. Now
the camera can measure the temperature of a perfect black body.
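A minimal sketch of this calibration step is given below; the reference temperatures and output signals are made-up values, and a simple piecewise-linear fit stands in for whatever curve model a real camera calibration would use:

    import numpy as np

    # Hypothetical temperature-signal pairs from imaging simulated black bodies.
    reference_temps_c = np.array([0.0, 30.0, 60.0, 100.0, 150.0])
    camera_signals = np.array([1200.0, 1900.0, 2900.0, 4600.0, 7400.0])

    def signal_to_temperature(signal):
        """Continuous relationship between output signal and black-body temperature."""
        return float(np.interp(signal, camera_signals, reference_temps_c))

    print(signal_to_temperature(2400.0))  # 45.0 with these made-up calibration points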
If the object to be examined is not a perfect black body (and it never is), we have
to determine how much radiation originates from the object itself and how much of
the radiation has another source. The result we get if measuring the temperature
of a real object while assuming that it is a perfect black body is called the apparent
temperature, TA . This result has to be adjusted to get a correct temperature
reading.

Material    Specification              Emissivity
Aluminum    Polished                   0.04–0.06
Aluminum    Rough surface              0.06–0.07
Copper      Polished                   0.02–0.03
Steel       Polished                   0.14–0.38
Steel       Heavily rusted             0.69
Carbon      Graphite, filed surface    0.98
Cloth       Black                      0.98
Skin        Human                      0.98
Ice         Smooth                     0.98
Rubber      Hard                       0.95
Wood        Planed, oak                0.90

Table 1.1. Examples of different materials and their approximate emissivity (extract
from emissivity table in [4]).

Assuming that we are considering an opaque object we now have to determine


the emissivity (and thus implicitly the reflectivity because of equation 1.1) and the
apparent temperature of surrounding objects that will be reflected in the object of
interest, TArefl. If these two parameters are known the true temperature can be
calculated as:
T = (TA − ρ TArefl) / ε    (1.3)
On site, the apparent temperature of reflecting objects can be measured by
different means, but it is difficult to obtain an accurate result robustly. If TArefl is not
measured on site it cannot be reconstructed afterwards and an accurate temperature
reading cannot be obtained.
Remaining is the problem of finding the emissivity of the object to be examined.
Determining the emissivity of an object has to be done for each object considered.
The emissivity is dependent on such properties as material, surface structure and
geometry and it is therefore difficult to construct emissivity tables and general
guidelines for the determination of this constant.
Because of the difficulties in determining the reflected apparent temperature
and the emissivity accurately it is common to try to find surfaces which are known
to have high emissivity and measure the temperature on those. In such a situa-
tion the second term of equation 1.3 can be neglected and since the emissivity, ε,
is approximately 1 the true temperature can be approximated with the apparent
temperature.
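To make the correction concrete, the sketch below simply evaluates equation 1.3 as written, assuming an opaque object so that ρ = 1 − ε (equation 1.1); the numbers are illustrative only:

    def true_temperature(apparent_temp, emissivity, reflected_apparent_temp):
        """Equation 1.3: T = (TA - rho * TA_refl) / eps, with rho = 1 - eps (opaque object)."""
        reflectivity = 1.0 - emissivity
        return (apparent_temp - reflectivity * reflected_apparent_temp) / emissivity

    # High-emissivity surface (e.g. a ceramic fuse body): the correction is small.
    print(true_temperature(60.0, 0.95, 25.0))  # about 61.8
    # Low-emissivity shiny metal: a similar reading hides a much hotter surface.
    print(true_temperature(30.0, 0.04, 25.0))  # 150.0

The second call also illustrates why readings on low-emissivity surfaces are treated with caution: small errors in the assumed emissivity or reflected apparent temperature change the corrected result drastically.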
Examples of materials which in general have high emissivity are rubber, cloth,
carbon, and wood. Therefore, surfaces of such materials are well suited for temper-
ature measurement with thermographic cameras. Examples of materials with low
emissivity are almost all kinds of shiny metals. More examples of materials and
their approximate emissivity can be seen in table 1.1.

1.1.3 The infrared image
Figure 1.1 shows an example of a typical infrared image. The image depicts three
fuses in a three phase electrical installation. Dark areas represent surfaces that
radiate the least and in general those are the coolest areas. Lighter shades mean
the opposite: more radiation and thus in general a higher temperature. To every
grayscale value there belongs an apparent temperature, i.e. the true temperature of
the object if it was a perfect black body. The scale in Figure 1.1 shows the apparent
temperature for the image grayscale.
What has to be considered when analyzing an infrared image is that the
temperature of the object is not the only thing that influences the amount of outgoing
radiation. The emissivity, reflectivity and transmittance of the depicted object, and
the radiation of surrounding objects, will also influence what we see. Even if we
assume that objects are opaque and thus ignore the transmittance, a number of
factors have to be considered. Therefore, we have to be careful when drawing
conclusions about which areas in an image are hotter than others.
If we look at the example image we see that the lightest area is located at the
bottom of the body of the leftmost fuse. According to the temperature scale this
area has a temperature of a little more than 60◦ C. Since the fuse body can be
assumed to be made from a ceramic material with a high emissivity, this reading is
probably not too far from the truth.
If we look just below the hottest area we see an approximately rectangular
region with significantly lower apparent temperature, the average temperature is
approximately 35◦ C in this region. This is the connection of the fuse which is made
of metal. In reality this area is probably at least as hot as the hottest segment of
the ceramic part of the fuse, but due to the lower emissivity of the metallic surface
the area appears cooler than it really is.
Although the connection appears cool we can conclude that this part of the fuse
is the source of heat by noting that the direction in which the temperature increases
on the fuse body is towards the connection. Also, if we look at the cable that is
connected to this fuse (the narrow curved object below the fuse), this also has a
temperature that increases towards the connection.
In conclusion we have to be careful when analyzing an infrared image and not
interpret the apparent temperature as the truth. On the other hand, if we use our
knowledge about the depicted objects and of how the image has been produced a
significant amount of information can be extracted from an infrared image.

1.1.4 Thermography of electrical applications


This thesis will mainly concentrate on thermography of electrical installations.
Therefore, this section will introduce some issues that have to be considered when
working with infrared images depicting electrical installations.
Figure 1.1. A typical infrared image depicting three fuses in a three phase electri-
cal installation (image H0925-04.img, a THV 550 image from 1997-09-25, displayed
temperature span 17.9°C – 61.0°C). In the grayscale representation light areas radiate
more than dark areas. The temperature scale shows apparent temperature, i.e. the
temperature if the depicted object was a perfect black body, in °C.

To understand the principle of thermography of electrical installations we have
to answer the following question: What kind of faults are there in electrical appli-
cations, and how do you detect them with an infrared camera? Unfortunately,
in reality not every electrical part that is broken appears warm when seen through an
infrared camera, nor will every hot region in an infrared image correspond to a
broken electrical part. On the contrary, there are a number of contributing factors
that need to be taken into consideration.
Firstly, there are a number of different reasons why an electrical part might
be malfunctioning, and some of these do not even result in a rise in temperature.
Some examples of reasons for faults are: bad connections, overload and
loss of contact. In [6], a checklist has been developed to help determine whether an
electrical component is malfunctioning, or about to break down. One of the initial
criteria is temperature. According to [6], any electrical component that has an
absolute temperature above 94◦ C (200◦ F) should be subject to a closer investigation.
Here we have to remember what is said about absolute temperature in section
1.1.2. Because of the shifting emissivity of the materials of components, a good
measurement of absolute temperature can be very difficult to achieve. In addition,
the components that are most likely to be subject to high temperatures, such as
bad connections, are often made of shiny metal. From Table 1.1 we see that shiny
copper, a common material where good connection is wanted, has an emissivity of
about 0.03. This is a very low number, and a quick example using equation 1.3
shows that a piece of shiny copper at 97°C will only appear as about 30°C in the camera,
given a reflected temperature of 27°C. In fact, in [6], the recommendation is given
that any measurement on a surface with an emissivity below 0.6 is to be regarded as
risky.
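The number is easy to check by running equation 1.3 backwards (with ε = 0.03 and thus ρ = 0.97; the relation can be applied directly in °C since ε + ρ = 1):

TA = ε · T + ρ · TArefl = 0.03 · 97 + 0.97 · 27 ≈ 29°C

which agrees with the roughly 30°C reading quoted above.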
There is another side to the problem of low emissivity surfaces, and that is that
they will have a high reflectivity (remember equation 1.1). The problem with objects
with high reflectivity is, just as with visually reflective objects, that they reflect their
surroundings. Consider the case of two adjacent objects; the first is flat and has a
high reflectivity and a low temperature, and the second object has a high temperature
and high emissivity. From some angle, the radiation from the second object will
be reflected off the surface of the first object, into the camera. From the camera’s
point of view, the first object will then appear to have almost as high temperature
as the second object. As soon as the camera is moved into another position, the
first object will appear cold again.
In addition to the ever present problem with emissivity, there is also the problem
of correct focus. Even though it is easy to forget, the optics of an infrared camera
work very much in the same way as the optics of an ordinary camera. This means
that it is very important to set the correct focus, in fact it is even more important
than in an ordinary camera. This is because in an infrared camera you do not
only get a blurry picture when the lens is out of focus, but you also get erroneous
temperature measurements. The radiation from a small hot spot gets scattered over
a larger area on the detector and is therefore interpreted as a lower temperature.
The problem of focusing is, along with the reflection problem, one of the two
problems that cannot be corrected after the image has been taken. Because of this,
it is always up to the user to make sure that these two problems are taken care of
at the time of the image capturing.
The nature of a problem can also be the cause of some extra complications. In
the case of a loss of contact, for example, the detached component soon gets the
same temperature as the environment, since no heating current passes through it.
This is a case which can be very difficult to detect using thermography, especially
if there are a lot of warmer objects around the detached component. In that case,
the cool, detached component is very likely to blend in with the background. This
happens because of range limitations of the camera. Although most cameras have
a continuous range of temperatures (a standard range typically lies somewhere be-
tween -40°C and +200°C [7]) that is measured constantly, only a small part of this
range is shown on the camera's monitor at one time. This is done to facilitate the
interpretation of an infrared image, since both the screen used to show the image
and the human eye have limitations on the number of displayable/perceivable inten-
sities. This is also why the spectrum of many infrared images, although originally
one dimensional (grayscale), is transformed to a higher dimensionality (color).¹
Regardless of whether the image is shown in grayscale or color, there are only a cer-
tain number of different colors/intensities to choose from, and these are distributed
over the temperature range that has been chosen for use. If the temperature range
is large, small differences (like those of the background) will be represented by a
small number of adjacent colors/intensities, which can make them very difficult to
perceive.

¹ In this thesis all images are displayed in grayscale because of printing limitations.
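This effect is easy to see numerically. A minimal sketch is given below, assuming a simple linear mapping of the selected temperature span onto 256 gray levels (real cameras offer various palettes, but the principle is the same):

    def to_gray_level(temp_c, span_min_c, span_max_c, levels=256):
        """Map a temperature to one of `levels` display intensities over the chosen span."""
        fraction = (temp_c - span_min_c) / (span_max_c - span_min_c)
        fraction = min(max(fraction, 0.0), 1.0)   # clamp temperatures outside the span
        return round(fraction * (levels - 1))

    # Over the full -40 to +200 C range a 2 C background variation spans ~2 gray levels...
    print(to_gray_level(22, -40, 200) - to_gray_level(20, -40, 200))  # 2
    # ...while over the narrow 18 to 61 C span of Figure 1.1 it spans ~12 gray levels.
    print(to_gray_level(22, 18, 61) - to_gray_level(20, 18, 61))      # 12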
In the case of a disconnected or broken component the problem of it blending
with the background is often accompanied by the problem of adjacent components
getting overheated. This is typical for a three phase system, where the remaining
two phases get overloaded, and consequently overheated, when one phase fails.
This of course makes the true fault more difficult to find, since the
overheating of the adjacent phases might be interpreted as faults instead.
If the objective is not to find the erroneous phase, but rather to find a general
error, the issue described above is not a problem. Instead it can be used as a tool for
finding errors. The temperature difference between the phases is actually a very re-
liable sign of an error. In fact, temperature comparison between phases or otherwise
identical components is probably the most reliable indication of faults in electrical
applications. This is because it is a relative measurement and therefore more ro-
bust to emissivity errors than an absolute measurement. In [6], it is recommended
that any phase-to-phase temperature difference larger than 9°C (20°F) should be
considered an anomaly and be inspected closer. The phase-to-phase temperature
difference will prove to be of high interest later on in this report.
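As an illustration of how such a relative criterion could look in code, the sketch below flags any pair of supposedly identical phases differing by more than the 9°C threshold from [6]; the per-phase temperatures are assumed to come from some earlier measurement or region-extraction step:

    from itertools import combinations

    def phase_anomaly(phase_temps_c, threshold_c=9.0):
        """Return (flag, worst difference) for a set of supposedly identical phases."""
        worst = max(abs(a - b) for a, b in combinations(phase_temps_c, 2))
        return worst > threshold_c, worst

    print(phase_anomaly([41.0, 39.5, 40.2]))  # (False, 1.5): normal variation
    print(phase_anomaly([41.0, 62.5, 40.2]))  # (True, ~22.3): inspect closer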
Temperature differences between phases can of course also be a result of normal,
non-faulty variations in load. Such load variations can prove to be a difficult issue in
the inspection of electrical applications. Not only is there the problem of variation
between phases, but general load variations that affect all three phases can cause
difficulties in an inspection as well. For example: a small error is found (e.g. a hot
spot due to slightly loose connection) but considered not to be in need of immediate
repair. It then turns out that the application was inspected during idle load (say
40% of max load), and when the application is at full load (100%), the inspected
connection breaks down because of overheating. The change from 40% load to 100%
load can sometimes lead to temperature rises of up to 100°C.
So, how do we avoid this problem? In [7] there is an equation that can be
used to calculate the expected temperature at different loads. The equation is not
presented here since it is not regarded to be inside the scope of this thesis. It is
however important to remember that this problem exists, and that it is another
issue that needs to be addressed at the site of inspection, and cannot be accurately
compensated for afterwards.
In this section a lot of different issues have been presented that can make the
inspection of an electrical application harder. However, there are also some tech-
niques that can be used to facilitate an inspection. Comparing temperatures between
similar items is one such technique; another one is to look for gradients in the image.
In the case of a real hot spot (i.e. not a reflection), there will be a smooth
temperature decrease as you move away from the hot spot in the image. This
gradient is a good sign of an overheating somewhere in the image. As described in
section 1.1.3, a gradient can also be used to detect the true location of an error when
the true source of the heat is not visible in the image. This is especially helpful
when the heat source has very low emissivity values or when there is an internal
error, i.e. when the heat source is inside another object.

1.2 Problem description


As mentioned, one of the most important applications of infrared thermography
is the inspection of electrical installations. Such inspections can be of great value
since they have the potential of predicting failure. For example, this can prevent
unnecessary stopping of the activity in a production facility, thus saving money.
For an untrained operator of a thermographic camera it can be a difficult task
to interpret the infrared image. Many factors influence what is shown in the image
and a significant amount of training and experience is required to make accurate
interpretations. It can thus be difficult to tell whether temperature differences within
or between different parts of the installation are normal or due to some error or mal-
function. Therefore, a well-functioning decision support system, which would guide
the operator in making the right decision, would be of great value. Such a system
should capture expert knowledge and experience from competent thermographers
and use it to guide less experienced users.
A decision support system based on imagery can in general be divided into three
major steps:

1. The identification of interesting image regions.

2. The extraction of relevant and discriminative information from these regions.

3. The decision or diagnosis of the image, based on the information extracted
   from the interesting image regions.

This thesis will discuss these three steps, and will also describe the implemen-
tation of a demo system based on these steps. The demo system will be developed
without any attempts optimize the code for speed or efficiency. The emphasis will
lie on the development of robust methods for solving the problem. A more thorough
discussion of step one is available in [1], where different approaches to the issue will
be evaluated.
The basic requirements on the three steps above will be presented in the following
three sections.

1.2.1 Identification of regions


Since images depicting electrical installations show a high variation in appearance,
the identification has to be general enough to handle these variations. Also, the
region identification has to deliver regions from which it is possible,
and preferably simple, to extract information for use in the further processing.

1.2.2 Information extraction
The second step of the decision support system will be the extraction of relevant
information from the identified regions. The extracted information should be repre-
sented in such a way that it can be readily utilized by the next step of the system.
The representation should be as concise as possible to ensure the compactness of the
system, and at the same time be descriptive and discriminative to ensure that as little of
the relevant information as possible is lost in the process.

1.2.3 Decision making


The final part of the system will be the making of a decision or a diagnosis based on
the information extracted in the previous step. The decision made should answer
the question of whether these regions contain a possible electrical failure or not.
The decision making should have a good ability to generalize in order to be able to
handle the wide variety of situations that can be found among electrical installations
and applications.

1.2.4 The image database


The set of images used in this thesis contains 74 infrared images, of slightly varying
quality and size, all depicting some sort of electrical installation. A sample of nine
images from the data set is shown in Figure 1.2. The majority of the images have
the dimension 240 × 320 pixels but other dimensions exist as well.
Most of the images depict three-phase fuses or different forms of connections.
The vast majority of the images contain some sort of abnormal heating due to electrical
problems. Only a few contain completely healthy installations.

1.3 Report outline


The remainder of this thesis is described by chapters as follows:

Chapter 2 - Previous work Here, some previous work done on areas connected
to this thesis is discussed. This chapter is joint work with David Wretman.

Chapter 3 - Segmentation and region-finding An approach to finding and ex-


tracting interesting regions from an infrared image is presented. Parts of this
chapter are joint work with David Wretman.

Chapter 4 - Classification and decision-making The classification of regions


is discussed, and a classifier using neural networks is presented.

Chapter 5 - Summary and conclusion The work done in this thesis is summa-
rized and concluded. Furthermore, proposals for further work and improve-
ments are presented.

Figure 1.2. A sample of nine out of the 74 images used as data set (the panels shown
include Connect.mat, Gsm1.mat, H092527.mat, H121516.mat, H092511.mat, Img008.mat,
Img3.mat and Overhead.mat). The whole set consists of images depicting electrical
installations.

Chapter 2

Previous work

This chapter will serve as an introduction to the subject and establish a foundation
for the main material of the thesis.
To the best of my knowledge, no prior work on the subject of decision support
for thermography of electrical installations has been published. This thesis (along
with [1]) is thus the first of its kind in this particular field of work. Because of this,
little previous work has been available for review. However, there exists plenty of
work in adjacent fields, and some of this will be reviewed in this chapter.
The project of developing a decision support system for infrared images can
be divided into two sub-problems. The first sub-problem is that of how to find
and segment interesting regions or objects in the infrared image. This is a typical
Computer Vision problem. The second sub-problem is that of how to deal with
these regions or objects, how to classify them as belonging to any of a number of
different classes. This is a typical Pattern Recognition and Classification problem.
The classes can be distinguished by the objects’ shape, size, color, condition or any other
property of the object.
Previous related work on these two sub-problems will be discussed in this chap-
ter; the former in sections 2.1 and 2.2, and the latter in section 2.3.
The Computer Vision problem of segmentation can also be addressed in two
different ways. The first way or approach is called the Bottom-Up approach, and
the second the Top-Down approach. In the Bottom-Up approach, the image is
segmented using only information obtained from the image; the information process
starts from a low level with the pixels of the image, and proceeds to forming higher
levels of information, the segmented regions.
For the Top-Down approach, certain a priori information about the contents of
the image is used to facilitate the segmentation. The a priori information can be of
the kind that the image contains circular objects, and therefore the segmentation
will only try to find round objects. High order information (circular objects) is
passed down to a low level task (the segmentation).

2.1 The segmentation-classification approach
As has been stated, the most straightforward approach to the solution of the initial
stage of our problem, identifying relevant parts of the image, is a segmentation-
classification approach. In this approach the image can first be divided into smaller
pieces, by the means of bottom-up segmentation. The pieces can then be fed to a
classification algorithm that tries to identify them as belonging to a class of objects
or not. Alternatively, an object in the image can first be identified by some means
and the knowledge of if and where in the image the object is identified can guide
the segmentation. This type of segmentation is henceforth referred to as top-down
segmentation.
In the literature there exist numerous approaches to image segmentation. In
[8], [9], [10] and [11] four different approaches to bottom-up segmentation are pre-
sented. These segmentation techniques use color (or grayscale value), texture and
boundary consistency as cues for the segmentation. In our application grayscale and
boundary consistency are valid cues, while texture is less likely to provide
reliable information since texture in infrared images is rare and of little value. Both
the advantage and the problem of these four methods is that they rely strictly on
image information to segment the image. It is an advantage because no additional
information has to be provided to guide the segmentation. This is beneficial if, as in
this case, the result of the segmentation is supposed to be used to extract informa-
tion about the image contents. It can also be a disadvantage because, naturally, the
less information about the image contents is available, the harder the segmentation
task becomes.
Although the bottom-up methods above will have no problem in dividing a sim-
ple image into correct regions, complex images can be wrongly segmented. There are
two mistakes a segmentation algorithm can make: dividing an object into multiple
regions and merging object parts with the background. Both these problems have
the potential of making a segmented image useless. All the bottom-up approaches
therefore have little potential of succeeding in the segmentation of the interesting
images since infrared images in general have fewer cues to guide the segmentation
algorithm than visual imagery does (less contrast, more noise, etc).
If a well segmented object with characteristic shape can be produced, the identi-
fication method “Shape Context” of [12] will be a strong candidate for the identifica-
tion task. This method tries to match points from a sampled contour from the image
with a sampled contour from a model of the shape of an object. The method has
been successful in character identification and is robust enough to identify distorted
shapes in cluttered surroundings.
Unfortunately, there are two major problems with the Shape Context method
in our application. Firstly, it requires a robust segmentation that can extract the
full contour of an object, something we have concluded will be very hard, especially
without a-priori knowledge of the object to be segmented. Secondly, a problem that
will occur in all model-based identification tasks arises: a model has to be created
for each class of objects that shall be identified by the system. These models have
to be created manually for Shape Context and this is a disadvantage if the system
shall be flexible and easily extendible. Furthermore, if the number of object classes
is large the search for the best fitting model has the potential of being very time
consuming.
Two other methods for identifying objects in images are described in [13] and
[14]. Both of these methods use part-based approaches where automatically learned
parts are flexibly combined to identify if and where an object of a specific class is
present in an image. Such an approach could be attractive for an object identifica-
tion without prior segmentation but also has the disadvantage of having to test the
image against multiple object models.
A method that has the potential of identifying and segmenting objects belong-
ing to specific classes is described in [15]. This method combines the bottom-up
segmentation from [16] with the top-down method from [17] in a system that can
segment objects of specific classes quite robustly.
The bottom-up segmentation in [16] takes into account image properties in mul-
tiple scales and in addition to information on pixel intensities it considers boundary
integrity and texture differences. In infrared images the boundary integrity measure
is probably of value since objects are often separated by weak yet consistent edges
while the texture measure will have no importance as concluded earlier.
The top-down segmentation scheme in [17] uses a fragment-based object model
with the different regions of fragments labeled as foreground or background. The
fragments from a database are combined to fit an object in the considered image
and to cover it as completely as possible. The figure ground labelings that were
previously assigned to the isolated fragments can then be applied to the complete
object. In [18] the method is extended to automatically learn the object models,
something that is also used in the combined algorithm.
An advantage with the combined method is that it uses the coarse segmentation
from the top-down model to identify the object and the bottom-up results to find
the true boundaries of the object. A disadvantage is again the time aspect: The
search for the best fitting model is potentially tedious if a large number of object
classes are to be considered.

2.2 Segmentation by grouping repetitive structure


Instead of using the classical bottom-up approach discussed in section 2.1, a more
top-down oriented approach can be used which can take advantage of a-priori in-
formation about possible image contents. This second approach can be very useful
since a successful bottom-up segmentation can be very difficult to achieve, and quite
often you have a certain knowledge about what kind of objects might be repre-
sented in the image. In the case of electrical installations, for example, it is known
that in a majority of the images, there will be certain objects that are repeated one
or more times (e.g. fuses in a three-phase system). Since electrical installations
tend to be made in a systematic, and often symmetrical, order there will also be
a high degree of spatial regularity in the images.
The problem of detecting repeated or regular objects in an image can be seen
as a special case of the problem of finding irregularities in an image. In [19], the
problem of detecting irregular objects in images and video is addressed. In this
paper the image is divided into sets (ensembles) of patches, where the patches can
have different scale and size, and statistical models are used to determine whether these
sets of patches exist elsewhere in the image. The method can also be used to detect
irregular or unfamiliar objects or behavior given a pre-specified database of images.
To increase the individuality of each feature or patch, the relative locations of the
patches in each set are stored together with the gradient information of the set’s
patches. With slight alterations, the method may be used to detect patterns or
repeated objects in an image.
In [20] an approach to finding symmetry, even under perspective distortion,
within an image is presented. Since a repeated structure imposes symmetry in the
image this work is relevant to ours.
In another approach to solving the problem of finding repeated objects, the task
can be broken down into two separate steps: (i) finding interesting features in the
image and describing these using pre-specified descriptors, and (ii) comparing all
the features and looking for matches.
In [21], it is concluded that the SIFT algorithm [22] features one of the best
descriptors today. The SIFT algorithm also includes feature matching, which makes
it highly interesting for detecting repeated objects. The SIFT algorithm searches
for distinctive features among the extrema of multi-scale Differences of Gaussians,
and then describes these features with descriptors built from gradient direction histograms.
These descriptors are both scale and rotation invariant, and are compared to each
other using simple Nearest-Neighbor classification [23]. In the original paper, all the
features in one image are compared to the features of a database image in order to
find any occurrences of objects from the database image in the first image. Similarly,
the comparison can also be done internally in an image by simply comparing the
features in an image with each other. It is then possible to find repetitions of
features, something that might indicate that multiple similar objects exist in the
image.
Applying SIFT features within images to detect symmetry has very recently been done
in [24]. This simultaneous and independent work uses strategies similar to the
ones presented in this thesis and presents efficient methods for detecting bilaterally
and rotationally symmetric figures. [25] is also simultaneous and explores texture
regularity in real images.

2.3 Decision making


Since the ultimate goal of the project is to develop a decision support system, differ-
ent approaches regarding machine learning, pattern recognition and decision making
have to be considered. An overview of these topics is given in [26]. In this book,
decision support systems are referred to as Expert Systems and defined as “a system
that uses human knowledge captured in a computer to solve problems that ordinar-
ily require human expertise”. Although the book mostly concentrates on rule-based
systems, it also discusses the use of learning systems such as neural networks. It
also features information on knowledge acquisition and general tips on building and
maintaining expert systems.
As concluded earlier in this chapter, using infrared images to make decisions
about the condition of electrical installations is a fairly original subject. However,
infrared images are used for decision support systems in a few other situations.
Among the categories where decision systems using infrared images are being used
are different military applications and medical diagnostics systems. The biggest of
these uses is probably the military field of Automatic Target Recognition or ATR.
In ATR, the objective is to automatically find targets and classify these as
friendly or enemy. ATR is a rather large area of research and a search on Google
for the subject yields about 150 000 hits (www.google.com, May 3rd 2006). However, since the research on ATR is
mostly performed as military research, the amount of information available through
the Internet is probably not proportional to the actual amount of research done on
the subject. The use of infrared images is quite widespread in ATR, probably much
because of the frequent use of infrared cameras in military applications [27]. A few
examples of ATR performed on infrared images can be found in [28] and [29]. The
classification work that is done within ATR is, however, in most cases quite far from
the problem of deciding whether a fuse in an electrical installation is broken or not.
A field that lies much closer to the work done in this thesis is the use of infrared
images for decision support systems in medicine.
Infrared imaging is sporadically used in several disciplines of medicine in appli-
cations as different as breast cancer scans and analysis of burn trauma [30]. Among
these, the discipline where IR is most frequently used is the detection of breast
cancer [31]. In this field there have also been attempts to implement decision support
systems to make the diagnosis of patients easier. In [32] a general suggestion for
the design of a decision support system is discussed, and in [33], a specific system
is presented which uses a neural net for the decision making. Both these works
are possibly some of the most closely related works to the electrical thermography
decision support system of this thesis.
The description of the properties of a region will typically be represented in a
multidimensional vector (constructed from temperature histograms, region gradient
information or similar). The classification of such a vector as belonging to one of
several possible classes is the essence of the decision.
The classification can be conducted in a number of different ways. If the clas-
sification problem is fairly simple and the different clusters are well separated (i.e.
the descriptors are successful) an algorithm like Nearest-Neighbor (NN) [23] will be
sufficient. If the classification problem is more complex and the data is not linearly
separable a more powerful classifier must be used. One such classifier is the Support
1
www.google.com, on May 3rd 2006.

17
vector machine (SVM) as described in [34] and [35]. The SVMs in their original
forms are a linear classifier but can be extended with a transform of the feature
space so that non-linearly separable clusters can be separated. Another possible
method for classification is Artificial Neural Networks (ANN) [36]. ANNs can be
taught to construct an arbitrarily complex separation surface. Another popular
state-of-the-art classifier is AdaBoost [37]. AdaBoost is short for Adaptive Boosting
and is adaptive in the sense that the classifier is iteratively adjusted in favor of those
data points that were misclassified in previous iterations.
All the above classification methods assume that training vectors are labeled as
belonging to one of several classes. If such a division is not possible to do or not
suitable, unsupervised clustering of the data can be applied. Such clustering can be
performed through for example K-means [23] or Self-organizing maps (SOM) [36].
Another technique often applied in computerized decision making is Bayesian
networks [38]. Such networks are based on graphs where each node represents a
state and the arcs specify the independence assumptions between the states. There
exist efficient computation techniques to calculate the probability of a certain state
given observations of other states in the network. The problem is that posterior
probabilities between connected nodes have to be known or guessed.
In this project the posterior probabilities were not known and guessing them
would be hard since the image database was not large enough to make statistically
sound estimations. Because of this limitation Bayesian networks were rejected as a
decision mechanism.

Chapter 3

Segmentation and Region-finding

This chapter will discuss the identification and extraction of regions of interest from
images. The first section will discuss the traditional Bottom-Up approach to image
segmentation. In the second section, a different approach to finding regions of
interest will be discussed. This approach builds on the assumption that there will
exist repeating objects or features in most of the images used, and can be classified
as a Top-Down approach.
Large portions of the work in this chapter were performed in collaboration with
David Wretman, both as joint work and as individual work pieced together. Sections
3.2.2 and 3.2.3 were performed by David Wretman, and sections 3.1, 3.2.4 and 3.2.6
were performed as joint work together with David Wretman. This is also stated at
the end of each of these sections.

3.1 Bottom-Up segmentation


The most obvious and perhaps most naive way to try to divide an image into the
two categories, object and background, is via a so called bottom-up segmentation.
This approach does not rely on any a priori information of the possible objects in
the image but relies strictly on image data, i.e. pixel intensities, to divide the image
into segments.
To evaluate the possibility to use a bottom-up strategy to divide the images of
interest into object and background, three state-of-the-art segmentation algorithms
with implementations available on the Internet were evaluated. Figure 3.1 shows
samples of the results obtained during this evaluation: the original image in figure
3.1(a) and segmentation results on this image in figure 3.1(b) – 3.1(d). Figure
3.1(b) shows the segmentation results obtained with the JSEG system described in
[10]. Figure 3.1(c) shows the result with the EDISON system, which implements the
Mean Shift segmentation described in [8]. Finally figure 3.1(d) shows results from
a Normalized Cuts segmentation described in [9].
Figure 3.1. Segmentation experiments: (a) the original image; (b) the image seg-
mented with JSEG; (c) the image segmented with EDISON; (d) the image segmented
with Normalized Cuts.

As can be seen, a strict bottom-up segmentation of the image into object and
background is not very successful. As has been stated, there are two typical mis-
takes a segmentation algorithm can make: dividing an object into multiple regions
and merging object parts with the background. Both these mistakes are obvi-
ous in the examples. Specifically, the majority of pixels belonging to the middle,
colder, fuse are in all the three example segmentations grouped with the background.
Hence, it was concluded that a pure bottom up segmentation of the images would
not produce results that were accurate enough to use as a base for further analysis.
The work described in this section was performed together with David Wretman,
and a similar section can be found in his thesis [1].

3.2 Finding repeating structures


As mentioned in section 2.2 a potential way of finding interesting regions in the
considered images is to find repeating structures. Repeating structures are present
in almost every image of electrical installations. This is due to the fact that electrical
installations often are made in a very structured and symmetrical way. Another
factor is the presence of a three-phase system, where there will be at least three

similar copies of most components. Such repeating structures can be considered as
an indication of the existence of multiple similar objects in the image. If working
properly, similar objects ought to have the same temperature. If not, you can
conclude that the function of the installation is not optimal. Thus, by comparing
temperature properties of regions with the same geometric appearance, conclusions
on the function of the installation can be made.
To be able to find a repeating pattern in an image one approach is to identify
distinctive features in the image, describe the features and compare them with each
other to find similar regions within the image. If a number of such matches are
found in the image you can further investigate the mutual properties between the
matches to strengthen the hypothesis of a repeating structure. An indication of a
repeating structure from matching feature points is if pairs of matching points have
the same relative translation. In the following sections a stable feature detector and
descriptor, a simple matching procedure and a voting procedure to find a probable
translation between objects are described.
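As a rough sketch of what such a vote over relative translations could look like (the actual procedure used in this work is described in section 3.2.3 and in [1], and may differ in its details), each matched pair of points casts a vote for its translation vector in a coarse two-dimensional histogram:

    def most_probable_translation(matched_pairs, bin_size=10.0):
        """Hough-like vote: every matched point pair (p, q) votes for the translation q - p;
        the centre of the densest bin is returned as the dominant translation."""
        votes = {}
        for (px, py), (qx, qy) in matched_pairs:
            key = (round((qx - px) / bin_size), round((qy - py) / bin_size))
            votes[key] = votes.get(key, 0) + 1
        (bx, by), count = max(votes.items(), key=lambda item: item[1])
        return (bx * bin_size, by * bin_size), count

    pairs = [((10, 20), (92, 22)), ((40, 55), (121, 56)),
             ((15, 80), (95, 79)), ((60, 10), (30, 90))]   # the last pair is an outlier
    print(most_probable_translation(pairs))                # ((80.0, 0.0), 3)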

3.2.1 A review of the SIFT algorithm


There exist numerous algorithms to find and describe features in images. The
purpose of all these algorithms is to find interesting (in some respect) features and
describe them as efficiently as possible. The way in which this is done is also what
distinguishes the algorithms from each other. The goal of the process is to detect as
distinctive features as possible, and to represent them in a way that is both compact
and invariant to transformations. This representation of features, the end product,
is called a descriptor.
In 2003, Mikolajczyk and Schmid studied some of the most common descriptors
to see if any one performed better than the others [21]. The result of this study
was that Lowe’s SIFT descriptors [22] generally outperformed the other ones. This
played a great role in the choice of SIFT as the algorithm used in this project.
The SIFT algorithm is also one of the most commonly used algorithms for image
matching, another strong reason why it was chosen for this problem. There are
also implementations of it available on the Internet, a reason that should not be
forgotten.
The SIFT algorithm is normally used to find occurrences of specified objects in
an image. The features in one picture are compared to the features of a database
image, containing the object that one wishes to find in the new picture. This is
done by first identifying the interesting features in each picture, calculating the
descriptors for these, and then comparing the descriptors. If enough matches are
found, the object in question is concluded to be present in the second image. The
SIFT algorithm is quite successful at performing this kind of matching, largely because
it is very robust to transformations of the object that is to be found. The algorithm
can be modified to search for multiple occurrences of the same object in one image;
how this is done is explained in the following sections.
The original SIFT algorithm is best described by breaking it down into the three

separate steps mentioned above: identifying features, computing the descriptors for
the features and matching the features with each other.
The identification of features is done on multiple scales in accordance with Lin-
deberg’s theory on scale-space [39]. However, SIFT does not use Laplacians of
Gaussians as described by Lindeberg; instead it uses Difference of Gaussians
(DoG), a good approximation of the former [22].
The DoGs are calculated by blurring the image with Gaussian kernels of two
different variances, thus producing two slightly differently blurred images. The
difference of these two pictures, the DoG, is then stored. The original image is then
subsampled, producing a smaller image, and another DoG is produced from this
image. This procedure is then repeated a number of times, creating a pyramid of
DoGs. To find the desired features, the local extrema of each level of the pyramid
are found and then compared to the corresponding pixels in the two adjacent levels
of the pyramid. Only those points that are extrema in all three levels are considered
to be well defined features.
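As an illustration, a minimal Matlab sketch of such a DoG pyramid could look as follows. The number of levels, kernel sizes and sigmas are illustrative assumptions, not the values used by the SIFT implementation in this project; the Image Processing Toolbox functions fspecial, imfilter and imresize are assumed to be available.

    im = double(inputImage);                 % inputImage: a grayscale intensity image
    nLevels = 4;
    sigma1 = 1.0;  sigma2 = 1.6;             % two slightly different amounts of blur
    dog = cell(1, nLevels);
    for lev = 1:nLevels
        g1 = imfilter(im, fspecial('gaussian', 2*ceil(3*sigma1)+1, sigma1), 'replicate');
        g2 = imfilter(im, fspecial('gaussian', 2*ceil(3*sigma2)+1, sigma2), 'replicate');
        dog{lev} = g2 - g1;                  % the DoG at this level of the pyramid
        im = imresize(g2, 0.5);              % subsample before building the next level
    end

The feature points are then taken as the local extrema of each dog{lev} that are also extrema with respect to the corresponding pixels in the two adjacent pyramid levels, as described above.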
The next step is calculating the descriptors for each of the features. Much of
the SIFT algorithm’s strength lies in its descriptors. The descriptors are invariant
to scaling and rotation and are also robust to other affine transformations. To
achieve scale-invariance, the information for the descriptors is taken from the same
scale that it was detected in. An area of 16 by 16 pixels around the feature’s central
point is chosen, and the gradients of the subsampled image (of the same scale as the
feature) are calculated. The area is then divided into 16 squares arranged in a 4 by
4 pattern. The gradients within each square (16 gradients, one per pixel) are then
binned into 8 different direction-bins, creating a direction histogram. The histogram
is then normalized to further increase the robustness against intensity differences.
This procedure is repeated for each feature, creating an array of descriptors, where
each descriptor is 128 elements long.
To find the best match for each feature, the nearest neighbor is simply chosen as
the feature with the smallest Euclidean distance, i.e. the feature whose descriptor is
closest to the first feature's descriptor (in the 128-dimensional descriptor space). To
get a good measure of whether the match is good enough, the two best matches for
each feature are compared, and a relative measure is computed (the distance to the best
match divided by the distance to the next best match). Only the matches that have a
relative value below a certain level (typically 0.8) are considered to be distinct enough. To find out whether
the selected matches represent one of the objects in the database, or if they are
just random matches from reasonably similar objects, a Hough-like transform is
performed where each match votes for the object it “belongs” to. By selecting the
object that got the highest number of votes, a highly probable object match is made.
In this project, the SIFT algorithm is used to find reliable features and calculate
robust descriptors for these features, i.e. step one and two from the paragraphs
above. The matching from step three is done in a slightly different way, which will
be described in the following sections.

3.2.2 Matching features
To be able to use the detected SIFT features to find repeating structures in the image
the descriptors of the points must be compared to detect similar image regions. In
this project the comparison is done in the simplest possible way: The Euclidean
distances between the descriptor vectors are calculated, and pairs of features with
distances below a certain threshold are considered similar.
Although the SIFT descriptors are scale invariant only features of the same
scale are compared. This is due to the fact that in our problem, in contrast to
the original intended use of the SIFT features, features from the same image are
compared. Thus, if a detail in the image is detected as a feature at a certain scale
the corresponding detail in a similar object will be detected at the same scale (if it is
detected at all). Unfortunately, this is not always true. Because of the discreteness
of the concept of scale, nearly identical details in an image can appear as features of
adjacent scales. Therefore, one improvement of the system would be to also match
features of adjacent scales, thus making the system more robust.
The strategy of simply considering pairs of features with a sufficiently small
mutual distance as being similar is not very reliable. It will generate many false
matches if the threshold is chosen too high and will miss many true matches if the
threshold is chosen too low. Fortunately, this is not a problem since at this stage
it is sufficient to find possibly matching features. False matches will be removed in
following stages of the system. Because of this the threshold is chosen generously
not to miss any interesting matches.
Since the descriptor vectors are normalized to unit length, the maximum distance
between a pair is 2.0. In the system the threshold is set to 1.0. Experiments
have shown that this is a suitable threshold that removes clearly false matches. In
practice the system will manage without this removal of false matches but it reduces
the computation time of the following steps.
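A minimal sketch of this matching step, assuming the descriptors are collected in an N × 128 matrix D (one unit-length row per feature) and the detection scale of each feature in a vector scl — both hypothetical variable names — could look like this:

    thr = 1.0;                               % generous threshold, as motivated above
    pairs = [];                              % one [m n] row per candidate match
    for m = 1:size(D,1)-1
        for n = m+1:size(D,1)
            if scl(m) == scl(n) && norm(D(m,:) - D(n,:)) < thr
                pairs = [pairs; m n];        % features m and n are possibly matching
            end
        end
    end

Only features detected at the same scale are compared, in line with the discussion above.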
The work described in this section was performed by David Wretman, and this
section is borrowed, with permission, from his thesis [1].

3.2.3 Finding most probable translation


The next step towards finding repeating structures is to try to identify the most
probable translation between repeated objects in the image. Since pairs of matched
features have been identified it is possible to compare the positions of these and
find the most common difference in position between the pairs. This difference in
position can thus be concluded to be a probable translation between similar objects
in the image.
The identification of the most probable translation is conducted via a Hough-
transform-like voting. Consider the image I of size w × h with an image pixel
denoted as i(x, y). The translation t between two image points im(xm, ym) and
in(xn, yn) is thus defined as tmn = (xn − xm, yn − ym). Define a matrix A of size
2w × 2h and let each element auv of A represent a specific translation t.

The voting procedure proceeds as follows: A is initialized to contain all zeros.
Each pair of possibly matching feature points is considered in turn. The translation
between the two points is calculated and the element of the matrix
A corresponding to this translation is increased by one. The pair has now "voted"
for its translation and the next pair is considered. When each such pair has been
considered, the maximum of A is located and thus the most common translation
between two matching points is known. This translation is considered as a possible
translation between repeating structures in the image. This information, together
with the information of which pairs voted for the winning translation, can now
be used to cluster the image features and find interesting regions of the image; more
on this in section 3.2.5.
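As a sketch of this voting, with feats an N × 2 matrix of feature positions [x y], pairs the list of candidate matches from the previous step (hypothetical names), and w and h the image width and height as above, the procedure could be written as:

    A = zeros(2*h, 2*w);                                   % one bin per possible translation
    for p = 1:size(pairs, 1)
        d = feats(pairs(p,2), :) - feats(pairs(p,1), :);   % translation (dx, dy)
        r = d(2) + h;  c = d(1) + w;                       % map the translation to an element of A
        A(r, c) = A(r, c) + 1;                             % vote for t ...
        A(2*h - r, 2*w - c) = A(2*h - r, 2*w - c) + 1;     % ... and for -t (direction is ignored, as motivated later in this section)
    end
    [rBest, cBest] = find(A == max(A(:)), 1);
    tBest = [cBest - w, rBest - h];                        % the most common translation (dx, dy)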
The above procedure is very sensitive to small variations in feature location, since
a precision of one pixel is needed for two pairs of matches to vote for the
same translation. This is a very hard constraint, since features are located in images
with different levels of subsampling and blurring and thus with varying precision in po-
sition. Additionally, the actual objects depicted in the image might not be perfectly
aligned and hence cause slight variations in the actual translations between objects.
The risk in such a situation is that very few matches that actually represent the
same conceptual, or sought after, translation are found and considered. The scenario
where multiple matching features with the same approximate translation in similar
objects do exist without any of them voting for the same exact translation is not
uncommon. This risk must of course be eliminated, or at least reduced considerably.
One means of doing this is to reduce the necessary precision for two translations
to be considered as equal. This can be achieved by reducing the number of “bins”
in the voting matrix A and letting each bin represent a small number of different
translations. This approach gives significantly better and more robust results than
the original procedure but it is still not optimal. Because of the discrete nature
of this approach, pairs with very similar actual translations can still put their
votes in different bins. Therefore a less discrete voting procedure is desirable.
To keep the idea of lowering the precision while reducing the discreteness of the
voting, another concept that proved to be efficient was introduced. The concept
was to keep the original finely divided matrix A but to blur the votes with an
approximation of the two dimensional Gaussian function. This meant that each
voter voted mainly for its actual translation but also voted partially for nearby
translations. The height of the Gaussian was always set to 1.0 to make the center
vote of all voters be of the same importance. To make the larger uncertainty of
position at higher scale influence the voting, the variance of the Gaussian was set
to increase by a factor of 1.5 for each scale. This is the same factor the image is
subsampled and smoothed with at each new scale of the feature extraction process
in the SIFT implementation used in this project.
To make the uncertainty of position increase with a factor of 1.5 for each scale
the standard deviation, σ, of the Gaussian was calculated according to

$\sigma = 0.5\sqrt{1.5^{(n+1)}}$   (3.1)

where n is the scale. This will increase the variance of the blurring function by a
factor 1.5 for each scale. The factor 0.5 proved to give an initial uncertainty which
was appropriate in this particular situation.

Figure 3.2. Results from running the SIFT feature detector, a Nearest-Neighbor feature matching, and the Hough-like voting procedure on two test images: (a), (b) SIFT features of the first and second test images; (c), (d) the winning features after voting, where the lines illustrate which features are considered as being pairs. In the voting procedure, translations between features that are considered similar enough according to the Nearest-Neighbor matching are examined; the most common translation is identified, and only pairs of features with this approximate mutual translation are allowed to proceed to the next step in the processing. The images both depict different types of electrical installations.
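A sketch of the blurred vote, replacing the single +1 increment inside the loop of the sketch given earlier in this section, could look as follows; featScale is a hypothetical vector holding the scale n of each feature.

    n = featScale(pairs(p,1));                   % scale of the matched feature pair
    sigma = 0.5 * sqrt(1.5^(n+1));               % standard deviation according to eq. (3.1)
    rad = ceil(3*sigma);
    [X, Y] = meshgrid(-rad:rad, -rad:rad);
    G = exp(-(X.^2 + Y.^2) / (2*sigma^2));       % Gaussian weights with height 1.0 at the center
    rows = r-rad : r+rad;   cols = c-rad : c+rad;
    okR = rows >= 1 & rows <= size(A,1);         % clip the stamp at the matrix borders
    okC = cols >= 1 & cols <= size(A,2);
    A(rows(okR), cols(okC)) = A(rows(okR), cols(okC)) + G(okR, okC);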
After the blurred voting, the maximum of A is extracted as before and the element
with the highest number of votes (no longer an integer number) is considered as the
winning translation. These blurred votes give results that are stable and insensitive
to small differences in translation between matching objects. Figure 3.2 illustrates
the results of the blurred voting. Figures 3.2(a) and 3.2(b) show extracted features
from two test images, and figures 3.2(c) and 3.2(d) show the features that voted
for the winning translation. The lines in figures 3.2(c) and 3.2(d) illustrate the
winning translations.
The larger variance of the Gaussian blurring function at higher scale gives a
larger total amount of voting power to features at higher scale. This can be inter-
preted as giving more importance to features at higher scale and thus producing
biased results. On the other hand, as only one single element of A is picked as
winning, the total voting power of a single feature is of small importance.
The alternative to a larger total voting power for features of larger scale would be
to normalize the sum of the weights in the blurring function to unity. This, although
sounding attractive, will cause a large bias towards giving more importance to
features of fine scales. Consider a situation where multiple feature pairs on a coarse
scale have the same translation and one single feature pair has another translation.
If the sum of the weights were equal for all feature pairs, the single feature pair
of the fine scale could beat the features of the coarse scale, since its center vote has
a much bigger value than the multiple center votes of feature pairs on coarser scales.
Thus, the alternative of setting all center votes to 1.0 is more attractive.
Another fact that has to be considered is that the direction of the translation
is of no importance. The translation t from (xm, ym) to (xn, yn) is the same as
the translation −t from (xn, yn) to (xm, ym). Therefore, for every translation t,
voting is done both at the location corresponding to t and at the one corresponding
to −t in A. A will thus become diagonally symmetric around the point corresponding
to the translation vector t = (0, 0). In looking for the maxima of A we therefore only have to consider
half the matrix. Figure 3.3 shows the voting matrix A for the image features in
Figure 3.2(a).
The symmetry of the matrix in Figure 3.3 is clearly visible. The peak labeled
1 is the highest one and corresponds to the translation illustrated in Figure 3.2(c).
The peak labeled 2 corresponds to the translation between the leftmost and the
rightmost fuse in Figure 3.2(a) and the peak labeled 3 corresponds to matches
between features in the upper and lower parts of the same fuse. The peaks labeled
4 and 5 correspond to diagonal matches between features in upper and lower parts
of different fuses.
The values of the six highest peaks can be seen in table 3.1. The large difference
in value between the highest and the second highest value (more than a factor 2)
indicates that the method is robust.
The work described in this section was performed by David Wretman, and this
section is borrowed, with permission, from his thesis [1].

3.2.4 Finding complementary features


As can be seen from figure 3.2 the same features are not always detected in all
the repeated objects. For the grouping of features belonging to the same object
and the extraction of interesting image regions it would be advantageous if they
were. Therefore, an attempt to find complementary features to complete the feature
extraction is made.

Figure 3.3. The voting matrix A for the image features in Figure 3.2(a). The maxima of the matrix correspond to the most commonly occurring translations between matching features in the image. The symmetry of the matrix about the center is due to the fact that the translations t and −t are in fact the same; therefore, every pair of matching features increases the matrix value at two symmetric positions. The values of the six highest peaks can be seen in table 3.1. The grayscale has been transformed according to s = ln(0.1 + r) for illustrational purposes.

Peak number 1 2 3 4 5 6
Value 8.38 3.82 1.66 1.94 1.31 1.66
Table 3.1. The values of the peaks labeled 1–6 in figure 3.3.

The first step towards finding complementary features is to group the identified
pairs of features into “rows”. This is accomplished with algorithm 1.
Once the row has been constructed, one can follow the row and look to the sides
of it to see if additional features can be found that match the ones already in the
row. This is done by calculating the SIFT descriptor (of the appropriate scale) for
the region surrounding the possible new feature point one translation to the side of
the features at the end of each row. Next, the descriptor vector of the new region is
compared to each of the descriptor vectors of the features already in the row. The
new region is considered as a feature if the mean distance between its descriptor
vector and the descriptor vectors of the features already in the row is smaller than
some threshold. If the region is similar, it is added to the row and the search continues
for additional features to the side of this one.

Algorithm 1 Construct rows
construct a list of all pairs, giving each feature a unique number
while the list is not empty do
    remove the first pair in the list and define this pair as a new row
    while true do
        search the list and find any of the newly added members of the row still
        present in the list with another partner
        if occurrence is found then
            add the partner to the row
        else
            break
        end if
    end while
end while
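A minimal Matlab sketch of algorithm 1, assuming pairs is a P × 2 matrix where each row holds the indices of two matched features (a hypothetical representation), could look like this:

    rows = {};                                   % each cell will hold the feature indices of one row
    remaining = pairs;
    while ~isempty(remaining)
        row = remaining(1, :);                   % start a new row from the first pair in the list
        remaining(1, :) = [];
        growing = true;
        while growing
            hits = find(ismember(remaining(:,1), row) | ismember(remaining(:,2), row));
            if isempty(hits)
                growing = false;                 % no member of the row has another partner left
            else
                newMembers = remaining(hits, :);
                row = unique([row, newMembers(:)']);   % add the partners to the row
                remaining(hits, :) = [];
            end
        end
        rows{end+1} = row;
    end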

The threshold for a region to be added as a feature is chosen more conservatively
than was the case in the previous matching. This is due to the desire to suppress
false matches so that the rows are not extended if no true object exists to the side
of the ones already identified. The threshold was experimentally calibrated and a
value of 0.8 was found to give satisfactory results.
Figures 3.4(a) and 3.4(b) show complementary features found in the two
test images from Figure 3.2. As can be seen, these features complement the previ-
ously found ones in a satisfactory way.
To make the finding of complementary features more robust an experiment was
conducted that attempted to locate such features not only at the exact position
one translation away but also at positions close to this one. A descriptor was
calculated first at the most probable location and, if this descriptor did not give
a match, new descriptors surrounding this position were calculated. Unfortunately,
these experiments did not improve the situation much. As it turned out, almost no
new matches were found on positions close to the original one if the original position
did not match. Therefore, this addition was not included in the final implementation
of the test system.
Another attempt to make the finding of complementary features more robust was
also tried. This time, the computational costs were considered from the beginning,
and instead of calculating new descriptors for each new position, a simple cross-
correlation measure was used. A window was chosen around the proposed position
of the new feature, to be used as the search area. A smaller area of approximately the
size of the last feature in the row was picked as the template for the comparison.
Both the search window and the template were chosen from the subsampled and
blurred image used when calculating the original descriptors (see section 3.2.1). The
normalized cross-correlation between the feature template and the search window
was then calculated, and the point with the highest correlation was picked to be the
most probable position of the new feature. A new descriptor was then calculated
at this position, and the proposed new feature was then submitted to the same
thresholding criteria as described earlier in this section.

Figure 3.4. Results from finding complementary features and grouping the features, using the same test images as in figure 3.2: (a), (b) complementary features of the first and second test images; these are found by looking one translation to the side of the end features of each row and calculating a feature descriptor at this location, which is then compared to the rest of the feature descriptors in the row and, if considered similar, added to the row; this is repeated until no more similar features are found. (c), (d) the final regions and corresponding features, showing how the features can be grouped and the regions that are extracted.
This new method proved to be more successful in finding new features, and much
less computationally demanding. It increased the chances of finding complementary
features significantly, with hardly noticeable computational costs.
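A sketch of this cross-correlation search, assuming the Image Processing Toolbox function normxcorr2 and hypothetical variables L (the blurred, subsampled image at the feature scale), xEnd, yEnd (the position of the last feature in the row) and t = [dx dy] (the winning translation), could look as follows; the template and window sizes are illustrative choices.

    tmplHalf = 8;  winHalf = 16;                          % illustrative template and search-window radii
    tmpl = L(yEnd-tmplHalf : yEnd+tmplHalf, xEnd-tmplHalf : xEnd+tmplHalf);
    xGuess = xEnd + t(1);   yGuess = yEnd + t(2);         % one translation to the side of the row
    win = L(yGuess-winHalf : yGuess+winHalf, xGuess-winHalf : xGuess+winHalf);
    c = normxcorr2(tmpl, win);                            % normalized cross-correlation
    [ypk, xpk] = find(c == max(c(:)), 1);                 % position of the best correlation
    yNew = yGuess - winHalf + (ypk - tmplHalf) - 1;       % peak mapped back to image coordinates
    xNew = xGuess - winHalf + (xpk - tmplHalf) - 1;

A new SIFT descriptor is then computed at (xNew, yNew) and accepted or rejected with the same threshold as before.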
The work described in this section was performed together with David Wretman,
and a similar section can be found in his thesis [1].

3.2.5 Grouping image features
After successfully finding a set of matching features, there still remains the problem of
grouping them together as belonging to the same, or different, objects. The problem
might seem simple at first glance, mostly because this is a problem humans are
very good at solving, but it can be harder than it seems.
It is commonly accepted that people, more or less, automatically group similar
or proximal objects together, something that was discovered by the German Gestalt
group of scientists in the early 20th century [40]. The Law of Similarity and the Law
of Proximity, as they called them, state that objects that are perceived as similar
or close to each other are seen by our mind as belonging together. Computers,
however, do not have the same natural talent for grouping objects, so this has to
be done explicitly.
There exist a number of problems that have to be overcome in order to succeed
in grouping the features. Firstly, one does not know how many repeating objects
there are in the image. Secondly, it is not, initially, obvious in which order the
features match each other, or in which direction they are to be ordered. And
thirdly, one cannot be positively sure that every feature is present in each object;
some features may be absent in some objects, e.g. because of occlusion or poor
focusing. Altogether these problems make the grouping task non-trivial.
Fortunately, most of these problems are relatively easily solved individually. By
connecting all the matches that have one feature in common, the relative order of
the features can be found (in fact this is already done to find the end-nodes when
searching for complementary features, see previous section). The absolute order
(given a “normal” direction) can be found by comparing the direction of a set of
matches to the direction of the most probable translation. With all the sets of
matching features ordered, one can make a qualified guess on how many occur-
rences there are of the sought after object. If the feature completion is reasonably
successful, it is quite probable that the most common number of repeated features
is the same as the number of repeated objects in the image. If only the sets of
features with as many features as there are (presumably) objects in the image are
considered, and the sets have been ordered in the same order, the features can be
grouped into objects by simply numbering them, starting from the same “side”.
Unfortunately, this method is not foolproof. If the feature completion process
is unsuccessful, an erroneous guess for the number of objects can be made. If
the wrong number of objects is assumed, one or more objects might be missed in
the grouping, or the grouping might fail altogether and group features of different
objects together. The latter can, for example, happen if there is one feature missing,
alternately to the right and to the left in each row of features for a set of objects,
as in Figure 3.5. This will result in a zig-zag shaped object being constructed, since
the method will assume that the first feature in each row belongs to the first object,
which will not be the case then.
In an attempt to reduce the frequency of this problem, a few different approaches
to determining the number of objects were tried.

Figure 3.5. The danger of underestimating the number of objects in an image. If different features are absent in the different objects, this can lead to severe misinterpretation of the object's form. The dashed line represents the believed object's form (but not the actual chosen regions).

The first approach was to assume that the highest number of repeating features would always be the correct
number of objects. This would take care of the problem of underestimating the
number of objects, e.g. in the case of there being 4 rows with 2 features in each,
and 3 rows with 3 features each (the actual number of objects in the image
being 3). In this case, picking the highest number of repeating features would yield
the correct answer (3), while using the most common number of features (2) would
result in an erroneous answer.
However, this would also lead to severe misinterpretations in the case of over-
estimating the number of objects, e.g. with 1 row of 4 features, and 4 rows of 3
features (with three actual objects in the image). For these cases the highest num-
ber approach would result in an entirely wrong set of regions (because of too many
repeating features being found).
To combine the most common number and the highest number approaches, the
dominant number approach was proposed. The idea behind the dominant number
approach is to weight the importance of the repeated features so that rows with
a high number of proposed objects will be regarded as more important than rows
with a lower number of proposed objects. This is based on the assumption that
it is less probable to overestimate the number of objects than the other way around.
The weight, wdom, for each candidate count is obtained by weighting the number of rows
with the number of proposed objects, nobj:

$w_{dom} = n_{rows} \times n_{obj}$   (3.2)

where nrows is the number of rows with that same number of features. The number
with the highest wdom is the dominant number, and the proposed number of objects
in the image. This technique of guessing the number of objects in the image works
sufficiently well in most cases. However, the question still remains of how to treat
situations where there are two guesses with the same weight (e.g. 4 rows of 3
features, and 3 rows of 4 features). With the method used here, the guess with the
highest number of features is chosen (in this case, 3 rows with 4 features each), but
how to deal with this kind of problem still remains an open question.

Figure 3.6. Erroneous grouping of features can sometimes lead to overlapping ROIs.
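As a small sketch of the dominant number rule, assume counts is a vector where counts(j) holds the number of rows containing exactly j features (a hypothetical representation of the grouping result):

    w = counts(:)' .* (1:numel(counts));         % eq. (3.2): w_dom = n_rows * n_obj for every candidate
    nObjects = find(w == max(w), 1, 'last');     % on a tie, prefer the higher number of features

The 'last' option implements the tie-breaking rule described above.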
In [1], Wretman proposes another approach to the grouping of features. This,
the grid approach, groups the features by assuming that the objects are oriented
perpendicular to the chosen translation. In tests, this approach is approximately
equal to the dominant number approach in terms of performance. However, the
two approaches differ in terms of limitations, the dominant number approach has
problems with groupings that the grid approach does not, and vice versa. A possible
fusion of these two methods is discussed in section 5.3, Future work. The results
from the tests done with the two feature-grouping methods can be seen along with
the results of the entire repeating structures algorithm in Table 3.2 of section 3.2.6.
After having grouped the features as belonging to certain objects, an area sur-
rounding the features is chosen as a ROI. To achieve a good fit around the objects
independent of the objects' orientation, the features are assumed to be circular in
shape and of a size corresponding to the scale in which they were found. The ROI
is then chosen as the convex hull of all the features in an object, see Figure 3.4(c)
and Figure 3.4(d).
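A sketch of this region extraction, with pts a K × 2 matrix of feature centres [x y] in one group and rad a vector of their radii (hypothetical names), could be written as:

    theta = linspace(0, 2*pi, 12);
    xs = [];  ys = [];
    for k = 1:size(pts, 1)
        xs = [xs, pts(k,1) + rad(k)*cos(theta)];  % sample each feature as a circle of its own size
        ys = [ys, pts(k,2) + rad(k)*sin(theta)];
    end
    idx = convhull(xs, ys);                       % indices of the points on the convex hull
    roiX = xs(idx);  roiY = ys(idx);              % polygon describing the ROI
    % mask = poly2mask(roiX, roiY, h, w);         % optional pixel mask (Image Processing Toolbox)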
When using the grouping method described above, there is always the possibility
that one or more features that do not belong to an object are grouped together with
the features of that object. When this happens, the convex hulls of two adjacent
objects can become overlapping, i.e. two different objects are partially described by
the same area of the image, as seen in Figure 3.6. This phenomenon occurred in
15 of the 74 images, or approximately 20 percent of the time. This is of course an
unwanted situation, which might make future classifications and decisions based on
the regions more difficult.

Figure 3.7. Sometimes overlapping regions are caused by features that are not inside another region. Since only features inside other regions are considered for removal, this situation will lead to a deletion of the wrong feature. In this case, the middle feature from each row will be removed, leading to regions covering more than one object.

To resolve this kind of situation, overlap between regions is monitored by calculating
the union of all regions. If an overlap is detected (i.e. the intersection of two
regions is non-empty), the feature (or features) responsible for the overlap is removed.
The most basic case of this is when only one feature overlaps another region, as
is the case in Figure 3.6. In this case, the overlapping feature is removed, along
with the other features in the same row. When more than one feature overlaps
another region, the row of features to be deleted is selected on the criterion of which
one shortens the circumference of the region's convex hull the most. If the removal
of this feature cancels the overlap, the process is stopped; if not, the process is
repeated until there is no overlap.
This method for avoiding overlap has one disadvantage, however; features that
are not overlapping another region will never be removed, even if they might be the
real reason for the overlap. When this happens, as in Figure 3.7, correctly classified
features will be removed until there is no overlap.
This removal of the wrong features happened in 3 of the 15 registered cases of
overlap. Since the number of images is so small, it is not possible to say that this
is a representative rate. However, if it were representative, it would mean that this
phenomenon happened in 20 percent of the cases of overlap, a quite high number.
For this thesis, 3 cases were not considered enough for the problem to be dealt with;
instead it was considered as a candidate for future work.

3.2.6 Results
The algorithm described in this chapter was tested for performance using the set of
74 images mentioned in 1.2.4. The tests were performed with all of the parts of the

algorithm put together.
Two tests were conducted, one using the dominant number approach to the
grouping of features (see section 3.2.5), and one using the grid approach to grouping
described in [1]. The outputs from the tests were the regions of interest, meant to
indicate where in the image the repeating objects were located. The success with
which these regions were chosen by the algorithm was classified into one of five
grades: Good, Borderline, Failure mode 1, Failure mode 2 and Failure mode 3, where Good is the best
grade and Failure mode 3 the worst. The grade given to each result is meant to
reflect how well the regions correspond to the actual objects of the image. Good
means that the regions are chosen in a satisfactory way, containing all the relevant
parts of each object in the image. Results labeled Borderline have well-chosen
regions, but one or more objects are not covered. For Failure mode 1, all the
objects are found, but the regions do not cover them satisfactorily. Results classified
as Failure mode 2 have erroneous regions, e.g. regions perpendicular to the real
objects, and results classed as Failure mode 3 are the ones where no regions
are found at all. Examples of regions and images that have been labeled with the
different classes can be seen in Figure 3.8.
The results of these two tests can be seen in Table 3.2.

                   Dominant number           Grid
                   # images  % of total      # images  % of total
Good                   35        47              32        43
Borderline              7         9               8        11
Failure mode 1          9        12              11        15
Failure mode 2         22        29              15        20
Failure mode 3          1         1               8        11

Table 3.2. The results from tests done with the "finding repeating objects" algorithm described in this chapter. The table contains the results from using both the "Dominant number" grouping approach described in section 3.2.5 and the "Grid" grouping approach from [1]. The grades on the left-hand side of the table refer to how well the ROIs were chosen, and the results refer to how many of the 74 images used were classified as belonging to each grade.

The results classified as Good and Borderline are both considered possible to use
in the classification in chapter 4. Thus, if these two groups are considered as one,
the system returns satisfactory results for approximately 57 and 54 percent of
the images used, respectively.

Figure 3.8. Examples of four of the five classes used in section 3.2.6: (a) Good (regions of image A05); (b) Borderline (regions of image B030811); (c) Failure mode 1 (regions of image G050108); (d) Failure mode 2 (regions of image Bec11). The fifth class, Failure mode 3, is only used on images where no regions are found, hence this class is not shown here.

The work described in this section was performed together with David Wretman,
and a similar section can be found in his thesis [1].

3.2.7 Implementation
All work done on the finding of repeating structures was implemented using Matlab
Student Edition 7.0 with Matlab's Image Processing Toolbox. The code for finding
SIFT-features was based on code by Scott Ettinger [41], and the code for calculating
SIFT-descriptors was based on code by Daniel Eaton [42].
The repeating region finding algorithm took on average 11 seconds per image to
compute, running on a Pentium 4 3.0 GHz PC with 1 GB of memory.

Chapter 4

Classification and Decision-making

In this chapter, the final step of the system, the decision making, is described in a
number of steps. Firstly, a discussion about what information can be used to build
a solid ground for the decision is followed by a discussion on how the decision can
be carried out. After this, the construction of an Artificial neural network along
with the results of the network will be presented. Last, some problems that have
been encountered will be explained, followed by a discussion about other possible
solutions to the decision-making problem.

4.1 Information representation


In any decision system, one of the most important questions is how to represent the
knowledge on how the decisions are to be made. In many older decision support
systems, knowledge is most often represented in the form of rules [26], something
that is known as a rule-based system. In rule-based systems, the knowledge is
formulated as if-then sentences; if the sun is shining, then the weather is nice. This
is a very logical and intuitive way to represent knowledge, and it is very frequent
in our societies; one example is in laws: if you steal something, then you are a
thief. However, rule-based systems make for very strict and unforgiving systems,
and generally have no grey-zones whatsoever. This is one of the reasons why we
do not have absolute law systems, and why we use judges or juries to determine
the guilt of a person instead of computer programs. For some applications this
type of system can be good because you always know how the system will react,
since it can only act from the set of rules you built into it. For many applications,
though, a more adaptive system is needed. Such a system needs to have the ability
to generalize, to handle noisy data, and to make decisions in situations which it has
not been programmed for. A typical class of systems that meets these requirements
is the class of learning systems. In the case of learning systems, the focus is not
so much on how to formulate the rules, but rather on how to formulate actions
and reactions that help to explain the meaning of the rules. Instead of telling the
system; if the stoplight is red, then stop, you show the system a red stoplight and

tell it to stop, and if you repeat the same procedure for long enough, the system
will learn to stop at the red light. The two procedures can be compared to human
learning. The rule-based procedure can be likened to educational learning, where
someone tells you how you are supposed to react, and the learning system procedure
can be likened to empirical learning, where you learn by doing.
So, with learning systems, the representation of knowledge is not really the
problem; instead, the representation of information is what matters: how do we
present the system with the information needed for it to learn what we want it to
do? Here, the objective is thus how to represent the thermal image in a simple but
meaningful way?
The thermal image contains a very large amount of information. If every pixel is
regarded as a variable, each image is a set of 76800 variables (for a 320 x 240 image),
and the amount of possible combinations is astronomical. A system that could
handle that much information would have to be very large. It is necessary to reduce
the amount of variables drastically. The first step is taken already in section 3.2,
where the image is segmented into smaller regions. These smaller regions of interest
can of course vary significantly in size, but they still only cover somewhere between
a fifth and a tenth of the original image, meaning that the amount of information is
cut down a good bit. However, this is not nearly good enough, as this still leaves
about 10000–15000 variables to process. Additional, and more radical, ways to
quantize the data are needed.

4.1.1 Temperature values


So, how to represent the information from the ROIs in the simplest and shortest
way? One way is to start from the real world and what values and measurements
a thermographer would use. In [4], the general methodology of thermography is
explained. The most common measurements that are available in most cameras
are spot-meters, and min, max and average of areas. Of these values, max and
average value are of the most interest for this project. The first value used as
a feature parameter is the maximum temperature of a ROI, here named Tmax.
As pointed out in section 1.1.4, the absolute max temperature is an important
measurement, but it can also be misleading. A max temperature of 70 °C is quite
high for an electrical component, but it is much more alarming if the surrounding
temperature is 20 °C than if the surrounding temperature is 50 °C. To address this
issue, another measurement was used as well, here called TmaxRelative. For this, a
background temperature is calculated as the largest peak of the histogram of the
whole image. TmaxRelative is then calculated as the normalized difference between the
background temperature and Tmax.
In this thesis, it was decided that a separate decision was to be made for each
region in the image (other possible options would have been to make one decision
for the entire image or to judge pairs of regions). Tmax and TmaxRelative are both
individual measurements that only apply to the region where they are measured, but it is
also of interest to somehow measure the temperature difference between regions. In

section 1.1.4, the phase-to-phase temperature difference is mentioned as an impor-
tant measurement, and in most cases where repeating objects are found, the objects
are related to the different phases. Here, average temperatures are used to calcu-
late the temperature difference between objects/phases. Since there often are more
than two objects in an image, there will be more than one difference value. To make
sure that the number of parameters does not change with the number of objects, the
median of the difference values is calculated and used as a feature parameter. This
parameter is here called ∆Tavg.
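As an illustration, the three temperature parameters could be computed as in the following sketch, where roiTemps is a hypothetical cell array holding the temperature values of the pixels inside each ROI and imgTemps the temperatures of the whole image. The exact normalization of TmaxRelative is an assumption, since only a normalized difference is specified above.

    [counts, centers] = hist(imgTemps(:), 50);              % histogram of the whole image
    [mx, idx] = max(counts);
    Tbackground = centers(idx);                             % largest peak taken as the background temperature

    nR = numel(roiTemps);
    avgT = zeros(1, nR);
    for k = 1:nR
        avgT(k) = mean(roiTemps{k}(:));                     % average temperature of each ROI
    end

    k = 1;                                                  % feature values for the first ROI, as an example
    Tmax = max(roiTemps{k}(:));
    TmaxRelative = (Tmax - Tbackground) / Tbackground;      % one possible normalization (assumption)
    dTavg = median(abs(avgT(k) - avgT(setdiff(1:nR, k))));  % median difference to the other ROIs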

4.1.2 Histograms
A popular way to determine the similarity between two images is the use of his-
tograms and histogram distances. The idea is that images that have similar his-
tograms are, to some extent, similar also when it comes to content and appearance.
To determine exactly how similar, or how close to each other, two separate his-
tograms are, a histogram distance measure is used. There exist a number of differ-
ent histogram distance measures, starting with the Euclidean norm as the simplest,
and with the Earth Mover's Distance, which involves solving an optimization problem,
among the most advanced.
For this project, histogram distances were deemed to be a suitable measurement
of how much one ROI differs from the other regions found in an image. A histogram
was computed for each detected region in the image, using the same number of bins
and the same range for all the histograms. The number of bins for the histograms
was empirically set to 20, and the range of the bins was adjusted for each image
so that there would be as few empty bins as possible.
A variety of different histogram distance measurements were implemented and
evaluated, starting with the Euclidean Norm (here abbreviated to Euc).
$D_{Euc}(H_1, H_2) = \sqrt{\sum_{i=1}^{n} \left(H_1(i) - H_2(i)\right)^2}$   (4.1)

The variables used in equation 4.1 are DEuc, H1 and H2, where DEuc is the Euclidean
norm, and H1 and H2 are the two compared histograms. H1(i) denotes the
i:th bin in the first histogram.
Two other histogram distance measures are featured in [43], the Bhattacharyya
distance (bha, eq. 4.2) and the Matusita distance (mat, eq. 4.3). Both these distance
measurements are presented as giving significantly better results compared to the
Euclidean norm.
$D_{bha}(H_1, H_2) = -\ln \sum_{i=1}^{n} \sqrt{H_1(i) \times H_2(i)}$   (4.2)

$D_{mat}(H_1, H_2) = \sqrt{\sum_{i=1}^{n} \left(\sqrt{H_1(i)} - \sqrt{H_2(i)}\right)^2}$   (4.3)

The Bhattacharyya distance has the disadvantage of being unstable for very small
histogram values and for the case of two disjoint histograms. In the latter case
(where, for every bin, at least one of the two histograms is zero), it even results in an
infinite distance value.
One of the most popular histogram distances is the χ2 distance [44] which is
described in equation 4.4.
$D_{\chi^2}(H_1, H_2) = \sum_{i=1}^{n} \frac{\left(H_1(i) - H_2(i)\right)^2}{H_1(i) + H_2(i)}$   (4.4)

The χ2 distance has a similar instability problem to the Bhattacharyya distance,
although here it is enough that both H1(i) and H2(i) are zero for a single i for the
corresponding term of Dχ2 to become undefined through division by zero. To take care
of these problems, a small bias, ε, is added to each histogram.
ε is set to be small enough not to make any difference to the correct values, but
large enough to cancel the instability problems (a typical value is 0.0000001).
Another quite popular, and in many ways different, distance measurement is
the Earth Mover’s Distance [45] or EMD. What sets EMD apart from the other
methods is, firstly, that it was developed with computer vision and similar fields in
mind. Secondly, and what really sets it apart, is that it does not only measure the
bin-wise distance between the histograms, i.e. the difference of the same bin in two
histograms; EMD also takes the distribution of the histograms into account, i.e. the
distance between different bins in different histograms.
In the EMD algorithm, the first histogram is seen as a number of piles of earth,
each bin a separate pile, and the second histogram as a number of holes. The
object of the EMD algorithm is to find out how much “earth” needs to be
moved, and how far, to fill the holes and remove the piles. This problem is a
variation of the transportation problem, a classical linear optimization problem.
Thus, to calculate the EMD measure between two histograms, an optimization
problem needs to be solved, which of course means that the EMD has a significantly
higher computational complexity than the other, ordinary, distance measures. On
the other hand, it is also an intuitive and logical way to calculate histogram distances,
and has many documented cases of success.
As in the case with ∆Tavg , the number of histogram distance measurements will
increase with the number of regions in the image. To make sure that only one value
is used, the distances to all other histograms in the image are calculated for each
region, and the median of these is used as the parameter histdist.
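The simpler distance measures above are straightforward to compute; a sketch for two histograms H1 and H2 over the same bins (row vectors), with the small bias ε added as described above, could be:

    epsBias = 1e-7;                                   % the small bias that removes the instabilities
    H1 = H1 + epsBias;   H2 = H2 + epsBias;           % (the histograms are assumed to be comparable, e.g. built with the same bins)
    dEuc = sqrt(sum((H1 - H2).^2));                   % eq. (4.1), the Euclidean norm
    dBha = -log(sum(sqrt(H1 .* H2)));                 % eq. (4.2), the Bhattacharyya distance
    dMat = sqrt(sum((sqrt(H1) - sqrt(H2)).^2));       % eq. (4.3), the Matusita distance
    dChi = sum(((H1 - H2).^2) ./ (H1 + H2));          % eq. (4.4), the chi-square distance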

4.1.3 Gradients
Another idea for obtaining suitable values for the decision part was to make use
of the gradient of the segmented part of the image. As is already mentioned in
section 1.1, the gradient can be a valuable clue to whether a part is broken or not,
or where in an image the true fault is located. As stated, a smooth gradient in the
direction away from the error position is a good indication of an overheating. With

this knowledge, it is easy to assume that the gradient might be an interesting source
for a feature vector.
However, after numerous attempts, the idea was abandoned. This was much
due to the difficulty of finding the true gradient. True in this case refers to the slow,
stable temperature change within the boundaries of an object, as opposed to the
quick, short temperature change that appears on the boundaries. The problem of
finding the true gradient originates from two separate, but connected, issues.
The first issue stems from the way the gradient is computed. When computing
the gradient of an image (which is a discrete set), a discrete approximation of the
two derivatives (dx and dy) is made. This is most often done by convolving the
image with two Sobel kernels [46], one for the x-axis and one for the y-axis. Since the
Sobel kernel is larger than one pixel, the operation also spreads the response spatially,
which accentuates and widens sharp edges. Even though infrared images have low
contrast, there still exist sharp boundaries, especially between warm objects and the
background. These boundaries will be accentuated, and enlarged when the gradient
is computed, typically with about one pixel on each side of the original boundary
(for a 3 by 3 kernel). This means that the areas within the boundaries of objects
will be diminished. This is not significant in the case of larger objects, but in the
case of small or thin objects, typically for wires that can be only a couple of pixels
wide, the effect can be that the gradient of the insides of the objects disappears
completely. This is a significant problem since the edge-gradient is mostly of much
less interest than the internal gradient.
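For reference, the discrete gradient referred to here is the standard Sobel approximation; a minimal sketch (im is a hypothetical temperature image) is:

    sx = [-1 0 1; -2 0 2; -1 0 1];                % Sobel kernel approximating the x-derivative
    sy = sx';                                     % and the y-derivative
    gx = conv2(double(im), sx, 'same');
    gy = conv2(double(im), sy, 'same');
    gmag = sqrt(gx.^2 + gy.^2);                   % gradient magnitude; dominated by object edges

As discussed above, the strong edge responses in gmag tend to drown the weak internal gradients that would actually indicate an overheating.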
The second issue is closely connected with the previous one and stems from
the fact that the true gradients are not dominant in a normal image. In fact, the
magnitude of such a slow temperature change is comparable to that of the background
noise, and much lower than the magnitude of the changes of the edges of objects.
This means that even in those cases where the effects of the first issue are not
significant, it will still be difficult to find the true gradient, since it has about the
same magnitude as the background noise.
Because of these issues, the gradient was never implemented as a feature vector.
However, in [1], Wretman successfully extracts the true gradients by using image
profiles, and then proceeds to use these gradients as an indication of errors. Further
discussions about the possibilities of this technique can be found in section 5.3,
Future work.

4.2 The dataset


As mentioned in section 4.1, the input to the decision system is a number of ex-
tracted values from regions of interest in the image. The regions are meant to be
chosen as the parts of the image that contain the relevant information. One way to
do this is described in section 3.2. However as is also mentioned in the same section,
the kind of region selection used here is not perfect, and sometimes no regions, or
erroneous regions can be selected. Because of this, and because of the low number
of suitable images available, the regions used for retrieving the data set for the
decision system were selected by hand. This meant that all of the images could be
used as input data, instead of approximately half of the images (see section 3.2.6).
The regions were selected by specifying a polygon hull around the object of interest
in the image. To make sure that the regions include all relevant parts of the object,
the images were inspected in color and the span of viewed temperatures adjusted, in

order to detect the edges of the object (see section 1.1.4). The hand-selection of the
regions also led to a higher consistency, knowing that the true region of interest
(e.g. the actual hot spot, but not necessarily the entire object) was always included
in the regions. An example of this, and of the difference between selecting regions by
hand and by automation, can be seen in Figure 4.1 (this is an illustrative example;
not all automatically selected regions are this bad).

Figure 4.1. An example of the difference between manually and automatically selected ROIs for image Ir0075: (a) automatically selected regions; (b) manually selected regions.
After the manual selection of regions, 233 regions were available as potential
data sources. Since 233 is still a rather low number of samples if the data is to be used for
both training and testing, the same data was used a number of times, but with
the testing subset selected differently each time. A fifth of the available data was
chosen as the testing subset; the system was then trained and the result evaluated.
After this, another fifth of the data was chosen as the test set, and the procedure
was repeated. By using this technique and choosing the test set differently, every
image in the data set can be used for the training of the system, and a more reliable
value for how the system performs can be obtained. To lower the risk of selecting
regions of the same kind, the test subset was chosen by selecting every fifth sample.
The first test set would thus contain the 1st, 6th, 11th, 16th, and so on, region,
and the next set would contain the 2nd, 7th, 12th, 17th, and so on, region.
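A sketch of this evaluation scheme, with X an N × 4 matrix of feature vectors and y the corresponding N × 1 target vector (hypothetical names), could be:

    N = size(X, 1);
    for fold = 1:5
        testIdx  = fold:5:N;                      % every fifth sample, shifted by the fold number
        trainIdx = setdiff(1:N, testIdx);
        Xtrain = X(trainIdx, :);   ytrain = y(trainIdx);
        Xtest  = X(testIdx, :);    ytest  = y(testIdx);
        % ... train the classifier on (Xtrain, ytrain) and evaluate it on (Xtest, ytest)
    end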
Even though the method of selecting every fifth sample gives a nice spread to the
regions chosen for the test set, it also means that, quite frequently, regions from
the same image will be present in both the test set and the training set. This can
lead to problems with the reliability of the results. The impact this problem may

have is discussed in section 4.5.
Additionally, the systematic selection of every fifth sample from the data set
does not pay any heed to the target value associated with the sample. This would
not have mattered if the data set had been well balanced between positive (faulty
part) and negative (no fault) samples. However, since the dataset is based on all
objects found in the images, there are more negative samples than positive (158 of
233). This may cause problems with the accuracy of the decision. This issue is
discussed further in sections 4.5 and 5.3.

4.3 Classification of data


As has been mentioned earlier, there exist a number of different methods or algo-
rithms that are used to classify, or (as it is often described here) make a decision
on, data. In section 4.1 the distinction was made between learning systems and
rule-based systems, where learning systems are explained to be, in most cases, su-
perior to rule-based systems. There is also a separate distinction between the
different available classification methods, namely whether
the algorithm is said to be Generative or Discriminative [47].
In generative classification methods, a model of the system is constructed from
the data, often using probability density functions (PDFs). The model can be
used to create, or generate, artificial data, hence the name generative. Generative
classifiers have the advantages of being able to handle missing data, and being
able to handle the addition of new classes without affecting the old classes. Some
examples of generative classifiers are: Bayesian networks, Fuzzy logic classifiers and
Gaussian Mixture Models.
Discriminative classifiers, on the other hand, construct no models or other infor-
mation about the underlying system of the data. Instead, discriminative classifiers
only try to find the optimal mapping from the provided input to the provided
output, often using planes or areas to separate classes from each other. The classifi-
cation is generally very discriminative; no object or input can belong to two classes,
as indicated by the name. Compared to generative classifiers, discriminative classi-
fiers have the advantages of being very fast at classifying new samples, and generally
giving better performance. The reason behind the better performance lies in the
discriminative nature of the classifiers: they are trained to tell classes apart, and nothing
else. Among the discriminative classifiers are: Nearest-neighbor classifying, Support
Vector Machines and Neural Networks.
For this thesis, Artificial neural networks were chosen as the classifier to be
evaluated. The choice of ANN was based on the fact that ANNs are a very well-tested
method that usually gives good results in most applications. The speed of
classifying new data can also be of high importance in the case of a real decision
system. Another reason for choosing neural networks is the existence of the Neural
Network Matlab Toolbox, which makes the process of creating and training a neural
net much easier than if one had to create the network from scratch.

The other classifiers mentioned above are discussed further in section 4.6.

4.3.1 Artificial Neural Networks


As mentioned in the previous section, Artificial neural networks were chosen as the
classifier, and implemented using the Matlab toolbox. If the reader is familiar with
neural networks and Matlab’s Neural Network Toolbox he or she may skip to the
next section. Otherwise, here follows a brief explanation of both.

Neural Networks
Artificial neural networks are, as the name implies, an artificial model of real neural
networks. The term real neural networks refer to the vast networks of interconnected
neural cells that exist in the brains of animals and humans.
These neural cells, or neurons, act as summations, collecting the sum of the
inputs to the neuron and sending a corresponding output onwards to the next
neurons [36]. The connections between neurons can vary in size. Typically the
size is proportional to how often the connection is used; a connection that is used
often is larger than one that is used more rarely. The difference in size
of the connections leads to varying levels of the signals being transmitted between
cells, thereby giving varying relevance to the signals transmitted. It is commonly
believed that our ability to learn and remember is related to the shifting sizes of
these connections.
Artificial neural networks try to copy this behavior using variables and oper-
ations in computers. The neurons of the brain are replaced by nodes, and the
connections by weights. A network can consist of an arbitrary number of nodes,
arranged in any kind of pattern. In the case of this thesis, a feed-forward backprop-
agation network is used, which consists of three or more layers of nodes. The first
layer is the input layer, which consists of as many nodes as there are inputs to the
system. The last layer is the output layer which, in turn, consists of as many nodes
as there are outputs from the system. In between these layers are the hidden layers
(typically one or two) which consist of an arbitrary number of nodes. The number
of nodes in the hidden layers can affect the behavior of the net greatly and must
therefore be chosen individually for each network.
The weights of each node are updated during a training phase, just as the size
of the connections of the neurons. The updating of the nodes can be done in many
different ways, known as training algorithms. Common to all training algorithms
that use supervised learning (as do the ones used in this thesis) is that the output
of the system is compared to a specified target output, and the error between these
two is used to update the weights. The objective of these training algorithms is
thus to make the output error converge, hopefully to the global minimum.

Matlab’s Neural Networks Toolbox
The Matlab Neural Networks Toolbox provides a simple tool for creating and train-
ing neural networks. A number of different network configurations, as well as train-
ing algorithms, are featured, and the user basically only needs to specify which
combination of features he or she wants.

4.3.2 Constructing the Neural Net


Even though the use of the Matlab Neural Network Toolbox makes the creation of
neural nets easy, there are still a number of parameters that need to be
set for the net to work.
Firstly, the type of network needs to be determined. For this problem a Feed-
forward Back-propagation network was chosen on the basis that it is the most
general type of network for data classification. Next, the number of hidden layers
needs to be decided upon. Since one hidden layer is enough to approximate any
continuous mapping [36], one hidden layer was deemed to be sufficient for this
project. The number of inputs to the network is given by the number of input
variables available. In this case four variables were used (section 4.1), and thus four
input nodes were defined in the network. The output was chosen as a single node since
the target data is binary, with "1" indicating a defective part and "0" indicating a
correctly working part.
This leaves two more parameters that need to be determined: the number of
nodes in the hidden layer and the type of training algorithm. Two different train-
ing algorithms were considered, Levenberg-Marquardt (LM, or trainlm in Matlab)
and Resilient Backpropagation (RBP, or trainrp in Matlab). These two were cho-
sen on the basis of being among the fastest and most efficient training algorithms
[48]. Levenberg-Marquardt is regarded as the fastest algorithm for training
feed-forward networks up to a size of about 100 weights, especially for function ap-
proximation problems, and it is also very well implemented in Matlab [48]. Resilient
Backpropagation, on the other hand, is regarded as fast and efficient for pattern
recognition problems, and also handles larger networks better than LM.
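As an illustration, a network of this kind could be created with the toolbox roughly as
follows. This is only a minimal sketch: the variable names (P for a 4 x N matrix of
inputs, T for the 1 x N binary targets), the 12 hidden nodes and the transfer functions
are assumptions made for the example, not a description of the exact implementation used.

    % Minimal sketch: feed-forward back-propagation network with four inputs,
    % one hidden layer and a single output node, trained with Levenberg-Marquardt.
    net = newff(minmax(P), [12 1], {'tansig', 'logsig'}, 'trainlm');

    % The same network trained with Resilient Backpropagation instead:
    % net = newff(minmax(P), [12 1], {'tansig', 'logsig'}, 'trainrp');

    net = train(net, P, T);   % train on the training data
    Y   = sim(net, P);        % simulate; outputs lie approximately in [0, 1]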
Since the amount of training data was rather limited in this project, and be-
cause of the many possible different scenarios that the system is supposed to handle,
the network will need to have a good ability to generalize. Two different methods
designed to increase the generalization of a network were evaluated, Early Stopping
(ES) and Bayesian-Regularization (BR). In early stopping, the idea is to only keep
training as long as the generalization increases, and then stop before any overfitting
occurs [36]. In order to achieve this, a second set of test data, called validation
data, is chosen from the initial data set. During training, the mean square error is
computed for both the training data and the validation data. As long as the error
for the validation set is decreasing, the training continues, but when the validation
error has increased for a number of consecutive iterations, this is taken as a sign
that the network is starting to overfit, and the training is stopped. Bayesian Reg-
ularization achieves the same thing as ES, but using a different procedure. In BR,
the performance function is modified so that it forces the network to have smaller
weights. This in turn makes the chances of overfitting in the network smaller [48].
The modification of the performance function is done automatically in the Matlab
function trainbr, which is based on the Levenberg-Marquardt training algorithm.
Since the LM algorithm is incorporated by default in the trainbr function, BR was
never tested with the Resilient Backpropagation training algorithm.
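To make the two strategies concrete, a rough sketch of how they could be set up with
the toolbox is shown below. The split into training and validation data (Ptrain/Ttrain
and Pval/Tval) and the remaining parameters are assumptions for the example.

    % Early stopping: a validation set is passed to train, and training stops
    % when the validation error has increased for several consecutive epochs.
    VV.P  = Pval;   VV.T = Tval;
    netES = newff(minmax(Ptrain), [12 1], {'tansig', 'logsig'}, 'trainlm');
    netES = train(netES, Ptrain, Ttrain, [], [], VV);

    % Bayesian regularization: trainbr modifies the performance function so
    % that large weights are penalized; no validation set is needed.
    netBR = newff(minmax(Ptrain), [12 1], {'tansig', 'logsig'}, 'trainbr');
    netBR = train(netBR, Ptrain, Ttrain);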
Numerous tests were carried out with these different training methods to deter-
mine which one yields the best results, and how many nodes are needed in the
network. The results of these tests are presented in section 4.4.

4.4 Results
In this section, a number of tests are performed on the networks described in section
4.3.2. In order to be able to use the results from these tests, a good measure
of how the network performs is needed. The Matlab standard is to display the
performance of a network as the mean squared error (MSE). The error in
question is the distance between the output and the target, i.e. how much the
output signal differs from the target data of the test set. This is a very good
measure of how well a network approximates a target function, and thus also
a good measure of how the network performs. However, it is not a very
intuitive measure when it comes to classification problems, since it does not give
any indication of how large a part of the data set is actually correctly classified.
Therefore, another measure was computed to complement the MSE measurement.
For this measure, the network was fed with input from the test data set. The
output of the network (which lies approximately within the range 0 to 1) was
thresholded at 0.5, creating a binary signal. This signal was then compared to
the target data set, and the number of mismatches was counted. The number of
mismatches was then presented as a percentage value, the error rate.
Since it is also of interest to know how accurate the classifications are, the
number of uncertain or inconclusive classifications was also counted. In this case,
the inconclusive classifications are defined as having a value between 0.3 and 0.7, i.e.
these are classifications that may be correct, but that are less confident (remember
that the values are supposed to be 0 or 1). This value is also presented as a
percentage, the inconclusive rate.
Because the weights of neural networks are initiated randomly, the results from
two otherwise identical networks can differ to a certain degree. In an attempt to
overcome this problem, and to get a more stable value of the performance of a
network, each separate setting of a network was initiated, trained and tested a
number of times. The number of times this was done is in this thesis referred to as
rounds. The means of the values from these rounds were then computed, and these
are the values that are used and displayed in this thesis. The standard deviations
of these values were also calculated and are presented here.
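The procedure can be sketched as follows. The 0.5 error threshold and the 0.3-0.7 interval
for inconclusive outputs are the ones described above, while the variable names and the
network settings are illustrative assumptions.

    % Repeat initialization, training and testing for a number of rounds and
    % collect the error rate and inconclusive rate of each round.
    rounds  = 10;
    errRate = zeros(1, rounds);
    incRate = zeros(1, rounds);
    for r = 1:rounds
        net = newff(minmax(Ptrain), [12 1], {'tansig', 'logsig'}, 'trainlm');
        net = train(net, Ptrain, Ttrain);
        Y   = sim(net, Ptest);                        % outputs roughly in [0, 1]

        errRate(r) = 100 * mean((Y > 0.5) ~= Ttest);  % thresholded mismatches
        incRate(r) = 100 * mean(Y > 0.3 & Y < 0.7);   % uncertain classifications
    end
    meanErr = mean(errRate);   stdErr = std(errRate); % values reported in the tables
    meanInc = mean(incRate);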

4.4.1 The network parameters
As mentioned earlier, there are a couple of parameters that need to be set in order
for a neural network to function. Some of these parameters can be chosen with
common sense and some knowledge about the data. However, some of them,
primarily the number of nodes and the training algorithm, are not as
easily determined. Although there exist some guidelines and rules of thumb as to
how they are to be chosen, these guidelines are also subject to criticism [49]. In
this project, a more thorough method of determining the parameters is used:
testing the different options for performance.
Three different tests are carried out; the first two use early stopping for in-
creased generalization, one with Levenberg-Marquardt training and the other
with Resilient Backpropagation. The third test was carried out using Levenberg-
Marquardt training with Bayesian Regulation. In all three tests, all parameters
were held fixed, except for the number of nodes in the hidden layer. The results
were calculated for 2, 4, 6, 8, 12, 16, 32, 64 and 128 nodes, and plotted in the
graphs in Figure 4.2(a) - 4.2(f).
As seen in these graphs, the most interesting part of the scale is somewhere
between 2 and 20 nodes. This is consistent with what one would expect given
the number of inputs and outputs. It is also easily visible that, contrary to the theory
mentioned earlier, the RBP training algorithm gives a significantly higher error rate
than the LM algorithm. The reason for the poor results of the RBP training method
seems to be that it never reaches the global minimum of the system; instead it gets
stuck in a local minimum. This also seems to be confirmed when using other training
methods as a reference: steepest-descent methods also fail, while the quasi-Newton
methods (which are quite similar to LM) have higher rates of success.
Because of the failure of the RBP algorithm to achieve a global minimum, it
was discarded for the later tests. Instead, focus was put on the range 2 to 20 nodes
for the LM algorithm. In order to highlight the behavior in this region another set
of tests were carried out, this time with the number of nodes set to 2, 4, 5, 6, 7,
8, 9, 10, 12, 14 and 16. The results from these tests can be seen in Figure 4.3(a) -
4.3(d).
In these "close-ups", the behavior of the network can be seen more clearly.
Clearly, the number of nodes does not influence the general performance to any
great degree, since the error rate only fluctuates by a couple of percentage points,
for both the LM-ES and the LM-BR algorithms. The fluctuations of the error rate are also much
smaller than the standard deviation, and can therefore not be considered entirely
reliable. However, there is a clear tendency in the LM-ES mean squared errors, with
the lowest values around 12 and 14 nodes. In the LM-BR case, the results are not
as clear, but a small decrease in the standard deviation and MSE is notable for 6
and 9 nodes. Since a specific number of nodes needed to be chosen, a comparison
between the four configurations (two for each training algorithm) was made. For
this test, a higher number of rounds was used, 100 instead of the previous 10, in
order to determine a more robust performance value. The outcome of these tests is

[Six plots omitted: (a) LM-ES (MSE), (b) LM-ES (Error rate), (c) RBP-ES (MSE),
(d) RBP-ES (Error rate), (e) LM-BR (MSE), (f) LM-BR (Error rate). Horizontal axes:
number of nodes; vertical axes: mean squared error and error rate (%) with standard deviation.]

Figure 4.2. The results of tests done with networks to determine the optimal training
algorithm and number of nodes. The left-hand figures show the mean squared error of
the training and testing of the networks, and the right-hand figures show the error rate
in percent. Note that Figures 4.2(c) and 4.2(d) have different scales compared to the rest;
this is because the RBP training algorithm used in these tests failed to reach a global
minimum. (LM = Levenberg-Marquardt, RBP = Resilient Backpropagation, ES =
Early Stopping, BR = Bayesian Regulation, MSE = Mean Squared Error, Error rate
= percent erroneous classifications)

[Four plots omitted: (a) LM-ES (MSE), (b) LM-ES (Error rate), (c) LM-BR (MSE),
(d) LM-BR (Error rate). Horizontal axes: number of nodes (2-16); vertical axes: mean
squared error and error rate (%) with standard deviation.]

Figure 4.3. Test results of the second test to determine the optimal training algorithm
and number of nodes. These plots are "close-ups" of the ones in Figure 4.2. (LM
= Levenberg-Marquardt, ES = Early Stopping, BR = Bayesian Regulation, MSE =
Mean Squared Error, Error rate = percent erroneous classifications)

shown in Table 4.1.

Algorithm Nodes MSE Error rate (%) Std. dev. (%) Inconcl. rate (%)
LM-ES 12 0.079 9.13 3.84 13.18
LM-ES 14 0.087 9.51 4.13 12.72
LM-BR 6 0.084 10.00 7.15 16.19
LM-BR 9 0.090 10.86 7.69 17.68
Table 4.1. Resulting values of the four best combinations of training algorithms and
number of nodes.

As can be seen in the table, Levenberg-Marquardt with early stopping yields the
best results, with an error rate just above 9 percent and a standard deviation below
4 percent. Given the similarity of the results, it is reasonable to say that the two
methods are equally good from a result perspective. Early stopping, however, has
the advantage of being faster than Bayesian Regularization, which would perhaps
make it a better candidate in those cases where training time is important. Early
stopping also has an overall lower standard deviation than BR.

4.4.2 Manual or automatic segmentation


To determine how much difference the selection of regions makes to the results,
another test was conducted using both regions that had been segmented by the
algorithm described in section 3.2 and manually segmented regions. The question
was whether it might be possible to automate the training procedure, and thus make
it easier for the user to set up the system. For this test, two sets of segmented images
were created, both containing the same images, but with one having regions selected
manually and the other having regions selected automatically. The same number
of objects were detected and segmented in each image, and the same diagnosis
(target) was used for both sets. Two identical networks were created, using the
parameters determined earlier in this chapter, i.e. 12 nodes and LM-ES training.
The first network was trained with the data from the manually segmented images
while the other network was trained with data from the automatically segmented
images. Each network was then tested with data from both the manually segmented
image set and the automatically segmented image set. To improve the
reliability of the test values, this procedure was repeated 100 times (rounds). The
mean values from this test are displayed in Table 4.2.
The reason why the result for the manually trained, manually tested network
(i.e. the normal case) differs from the result presented earlier is that only a
subset of the original data set was used in this test: only those images that the
algorithm in section 3.2 segmented sufficiently well have been used, meaning that
only 103 of the original 233 data entries (objects) were used.

                       Manual train set   Automatic train set
Manual test set        8.98%              10.74%
Automatic test set     13.76%             14.81%

Table 4.2. Mean error rates for networks trained and tested with manually or
automatically segmented images.

As could be expected, the results for the network with manually segmented train-
ing data and test data are better than those for the network using automatically
segmented data. The low value obtained when testing the "auto-trained" network
with the data from the manually segmented set is rather surprising. According to
these results, it seems to be more important to have well segmented images when
using the network than when training it. Unfortunately, this is not a very prob-
able scenario in real life, where the system is supposed to act as automatically as
possible. However, the results do show a slight, but not very big, decrease in the er-
ror rate for the networks that are trained with the manually segmented data. In the
realistic case, i.e. with automatically segmented test data, the result only differed
by little more than one percentage point between automatically and manually segmented
image data. The error rate for the worst-case scenario, i.e. using automatic training
as in the question asked earlier, was only approximately 15 percent. The conclusion
from this test is thus that it could be possible to implement an "automatic" training
of the system, although using manually segmented regions will give slightly better
results.

4.4.3 The feature parameters


In an attempt to determine the relevance of each of the feature parameters used
as input to the network, a number of test-rounds were computed with different
inputs. Four set-ups were created, each one using only one of the parameters as
input to both training and testing. Apart from this, the same settings were used as
in the earlier tests, and the mean error values from 100 rounds were computed along
with the standard deviations. The idea of the test is that the feature parameter
that contains the most information should render the lowest error rate, and vice
versa for the parameter with the least information. The results from this test are
displayed in Table 4.3.

                 Tmax    HistDist   Tmax Relative   ∆Tavg
MSE              0.28    0.25       0.16            0.33
Error rate (%)   23.6    36.4       15.1            31.0
Std. dev. (%)    6.5     7.1        4.8             5.5

Table 4.3. Results from individual tests of parameters to determine relevance.

As can be seen in the table, using only Tmax Relative as input yields the lowest error
rate, followed by Tmax. This indicates that the max temperature of an object is the
most important feature in this classification system, which might not be all that
surprising, given that most electrical errors are connected with over-heating. This
is true to an even higher degree for the set of images used as data in this project;
almost all the images depicting errors have an over-heated object in them. However,
the results also show that none of the feature parameters is superfluous; they all
have an error rate well below the 50 percent that could be expected from an input
with no relevant information.
This result is also consistent with a test conducted using the Matlab Neural
Network Toolbox's Principal Component Analysis (PCA) function prepca. The
prepca function performs a PCA, trying to reduce the dimensionality of the input
data, and returns the data represented in this new feature space. Along with
the input data, it is also possible to specify how large a part of the original information
the function is allowed to remove in the reduction of dimensionality. By modifying
this input value, and monitoring the output of the function, approximate values
for each input parameter's contribution have been found. The feature parameter with
the least individual information contributes approximately 9 percent, while
the second least informative parameter contributes approximately 16 percent
of the total information. The third least (or second most) informative parameter
contributes approximately 30 percent and the most informative with more than
30 percent (since it is not possible to reduce the dimensionality to anything lower than
one, it is not possible to obtain a more precise value for the last parameter).
This again demonstrates the legitimacy of all of the parameters.
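A minimal sketch of this test is given below; the normalization step and the 9 percent
threshold are illustrative, and P is assumed to be the 4 x N matrix of feature parameters.

    % Normalize the feature parameters to zero mean and unit variance, then run
    % the toolbox PCA and check how many dimensions survive the chosen threshold.
    [pn, meanp, stdp]  = prestd(P);
    [ptrans, transMat] = prepca(pn, 0.09);   % drop components contributing < 9 %
    remainingDims      = size(ptrans, 1);    % number of principal components kept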

4.4.4 Histogram distance


The performances of the five different histogram distance measures described in
section 4.1.2 were evaluated using a network with the same parameters as in the earlier
tests. The histogram distance method used to compute the input values to the
system was varied, using each of the five algorithms in turn. The tests were carried
out both using the three temperature feature parameters described in section 4.1.1
together with the histogram distance, and using only the histogram distance as
input. The results from the tests can be seen in Tables 4.4 and 4.5, and the ROC
curve for the test done using only the histogram distance as input can be seen in
Figure 4.4.
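For reference, the sketch below shows how such distances could be computed for two
normalized histograms p and q. The formulas are the standard textbook definitions and
may differ in detail from the exact expressions in section 4.1.2; the Earth Mover's
Distance is omitted since it requires solving a small transportation problem.

    % p and q are assumed to be normalized histograms of equal length (sum = 1).
    dEuc = sqrt(sum((p - q).^2));               % Euclidean distance
    dBha = -log(sum(sqrt(p .* q)) + eps);       % Bhattacharyya distance
    dMat = sqrt(sum((sqrt(p) - sqrt(q)).^2));   % Matusita distance
    dChi = sum(((p - q).^2) ./ (p + q + eps));  % chi-square distance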

                 Euc     Bha     Mat     χ2      EMD
Error rate (%)   9.8%    9.7%    9.4%    9.4%    9.5%
Std. dev. (%)    5.0%    4.2%    4.6%    4.2%    4.8%

Table 4.4. Resulting values from tests performed to determine the optimal histogram
distance measure. Tests performed using all four feature parameters as input.

                 Euc     Bha     Mat     χ2      EMD
Error rate (%)   37.6%   36.0%   36.1%   37.4%   31.2%
Std. dev. (%)    9.6%    7.8%    8.6%    8.2%    5.8%

Table 4.5. Resulting values from tests performed to determine the optimal histogram
distance measure. Tests performed using only the histogram distance as input.

As can be seen in the tables above, the choice of histogram distance measure has
very little or no effect on the outcome of the system, especially when all
four feature parameters are used. This result is consistent with the results obtained
in section 4.4.3. However, in the second test case, where only the histogram measure
is used, some difference between the methods can be observed. The Euc measure
results in a slightly higher error rate and standard deviation, and the EMD measure
results in a lower error rate and standard deviation. This tendency can also be
seen in the ROC curve in Figure 4.4, where the EMD measure is represented by the
topmost curve and the Euc measure by the lowest. The EMD curve lies significantly
further away from the diagonal, while the Euc curve is very close to it, indicating
[ROC plot omitted: true positive rate versus false positive rate for the Bha, Mat, Norm,
χ2 and EMD histogram distance measures.]

Figure 4.4. The ROC curves of tests done with the histogram distances as only
input to a neural network.

that the Euc measure in this case is only marginally better than using random
values for this problem.
The ROC curve has been created by adjusting the error threshold in ten steps
from 0.95 to 0.05. The error threshold is the value above which an output from
the network is considered to be an error. The network output lies approximately
between 0 and 1 and the standard error threshold for the rest of the tests is 0.5.
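A sketch of how such a curve can be produced from the network outputs is shown below;
Y denotes the outputs for the test set and Ttest the binary targets, both assumed names.

    % Sweep the error threshold and compute one (FPR, TPR) point per threshold.
    thresholds = 0.95:-0.10:0.05;
    TPR = zeros(size(thresholds));
    FPR = zeros(size(thresholds));
    for k = 1:length(thresholds)
        pred   = Y > thresholds(k);                          % classified as faulty
        TPR(k) = sum(pred & Ttest == 1) / sum(Ttest == 1);   % true positive rate
        FPR(k) = sum(pred & Ttest == 0) / sum(Ttest == 0);   % false positive rate
    end
    plot(FPR, TPR); xlabel('False positive rate'); ylabel('True positive rate');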
The conclusion that can be drawn regarding histogram distance measures from
these tests is that in this case, the choice of method does not matter significantly.

However, if optimal performance in terms of precision is wanted, the EMD measure
should be used, and if speed is the issue, any of the Bhattacharyya, Matusita or χ2
methods should be used on account of their lower computational complexity.

4.5 Discussion
During the work with the neural networks, some minor complications and disad-
vantages of neural networks were discovered.
One of the disadvantages of working with a small data set is that the reliability
of the results can sometimes be questionable. This is due not only to the system
being insufficiently trained, but also to the fact that the amount of data limits
the number of possible result values. Since the test set used to evaluate the system and
produce the result values consists of a finite number of data samples, the result values
will also form a discrete set. In this case, where the test set consists of one fifth of the
original 233 data samples (i.e. 46 or 47 samples in the test set), the
result value (error rate) can only take on around 47 different values. This means that the
result will have a resolution of approximately 2 percent! This problem is addressed

[Plot omitted: test error rate (%) with standard deviation for six repetitions of the same
network configuration.]

Figure 4.5. The varying result of a network, initiated and trained six times, all
using the same parameters.

and partially solved by making repeated tests and trainings of the network, and
using the mean of the resulting values. The problem with the discrete percent
values is also a contributing factor to the high standard deviation values of the
results.
Another contributing reason for the high standard deviation values of the net-
works is the general randomness of artificial neural networks. In the training of
each new network, the weights of the network are initiated to random values. This
means that networks with identical parameters can still produce different outputs,
given the same inputs. The effect of this problem, together with
the effects from the problem with the discrete percent values can be seen in Figure
4.5. What can be seen in the figure is the results from one network that has been
initiated and trained six times (using the same parameters), each time resulting in
a slightly different output.
As is mentioned in section 4.2, there is a high possibility that regions from the
same image can be present in both the test set and the training set. This is of
course a serious reliability problem, since such regions are very likely to be quite
similar to each other. Having two similar samples in the training and test data is not
as serious as having two identical samples, but it can still influence the results to a
certain degree. How big an influence it will have depends on how similar the feature
vectors from regions of the same image are, and how large the differences are within
the entire dataset. In an attempt to quantify this similarity/difference, the following
test was performed: the standard deviations of the values from each image were
calculated. The mean of these values was then calculated, and compared to the
standard deviation of the values of the entire dataset. The results from this test
can be seen in Table 4.6. These results show that the standard deviation within
an image is on average about half that of the entire dataset. One clear exception
from this is the Tmax Relative values, where both standard deviations are almost
identical. This is of high interest since the Tmax Relative value is the most important

                              Tmax (◦K)   HistDist   Tmax Relative   ∆Tavg (◦K)
Mean std. dev. per image      12.09        0.005      0.28            1.56
Std. dev. of entire dataset   22.47        0.013      0.29            4.68

Table 4.6. Standard deviations of the values of the entire dataset, and mean stan-
dard deviations of the values per image. Test performed to determine how large the
variations within an image are compared to the variations of the whole set.

value according to the tests of section 4.4.3. The conclusion from this test is that
the influence of having regions from the same image in both test and training set
may be significant but probably not very large.
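The comparison can be sketched as follows, assuming that the feature values are stored
in a cell array with one 4 x n_i matrix per image; the variable names are illustrative.

    % featuresPerImage{i} is assumed to hold the 4 x n_i feature values for the
    % regions of image i.
    nImages  = length(featuresPerImage);
    stdPerIm = zeros(4, nImages);
    for i = 1:nImages
        stdPerIm(:, i) = std(featuresPerImage{i}, 0, 2);  % std within image i
    end
    meanStdPerImage = mean(stdPerIm, 2);                  % mean over all images

    allFeatures = [featuresPerImage{:}];                  % 4 x N, whole dataset
    stdWholeSet = std(allFeatures, 0, 2);                 % std of entire dataset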
A third possible reason behind the high standard deviations of the results is
the issue with the unbalanced dataset described in section 4.2.
Since the test set is chosen as every fifth sample from the dataset, there is no
control over how many positive or negative samples there are in each test set. This may
lead to different rates of positive/negative samples in the five test sets, something
that could lead to large differences in the results from each test set. For example,
imagine a network that has a tendency to underestimate faults. In the case of
two sets of data, one with 60 percent positive samples (faults), and one with 30
percent positive samples, the network is likely to have a higher error rate (more
erroneous classifications) for the set with 60 percent positive samples than for the
other set. This is of course because there are more faults (positive samples) to be
underestimated in the 60 percent set than in the 30 percent set. A behavior like
this will lead to high variations in the test results.
The unbalanced data set will also affect the ROC curves. For a balanced dataset,
one would expect the mean of the x-value (false positive rate) and one minus the
y-value (true positive rate) from the ROC curve to be equal to the error rate
for that threshold value. In the case of an unbalanced dataset (as this one), this does
not hold, since the FPR and the TPR are calculated over sets of different sizes.

4.6 Other possible solutions


In section 4.3, several methods for the decision making are mentioned. In this thesis,
only Artificial Neural Networks have been evaluated, but there exist a number of
other methods that can be considered for use.
The simplest of the discriminative classifying methods is the Nearest Neighbor
method. In nearest neighbor classification, the training data is represented as vectors
in a feature space, as in most other classifiers. For each new sample
that is to be classified, the distance from the new sample's feature vector to those
of the training set is computed. The sample is then classified as having the same
class as its closest neighbor in feature space. For larger training sets, the amount
of computation for each new classification can become quite significant. Nearest
neighbor classification can therefore not be recommended for any more advanced
system that requires a large set of training samples.
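As a minimal sketch, a 1-nearest-neighbor classifier for a new sample x (a 4 x 1 feature
vector) could look as follows, with Ptrain and Ttrain assumed to hold the training
features and labels.

    % Classify a new sample by the label of its closest training sample in
    % feature space (squared Euclidean distance).
    dists = sum((Ptrain - repmat(x, 1, size(Ptrain, 2))).^2, 1);
    [dummy, idx] = min(dists);
    label = Ttrain(idx);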
Support vector machines (SVM) are a fairly new method that has gained a lot of
attention during the last ten years [35][34]. They have been used in a wide variety of
classification and pattern recognition problems, and are known to give good
generalization results. Just like neural networks, SVM belong to the group
of discriminative learning systems, but where neural networks are inspired by the
human brain, SVM take their inspiration from statistical learning theory. In contrast
to ANN, SVM do not try to fit a complicated decision boundary to separate the
provided data. Instead, SVM transform the data to a sufficiently high-dimensional
space where a simple hyperplane can separate the clusters of data. Support vector
machines also have the advantage of being easier to set up than, for example, neural
networks. Given the good generalization ability and the other positive qualities of
SVM, they can be a feasible alternative to ANN here.
Another alternative to neural networks is Gaussian Mixture Models (GMM).
GMM systems belong to the group of generative classifiers, meaning that they
calculate estimations of probability distributions to model the different scenarios of
a system. The probability distributions are created by combining Gaussians with
different covariance matrices. The positions and variances of the Gaussians are
adjusted by training algorithms (e.g. Expectation Maximization) to fit the training
data as well as possible. GMMs share all the positive properties of generative
classifiers mentioned in section 4.3 and can therefore be regarded as another possible
alternative to ANNs.
Propositions on how to evaluate the performance of these algorithms for the
case of classifying infrared images of electrical applications are presented in section
5.3, Future work.

4.7 Implementation
All work done on the neural networks described in this section was implemented
using Matlab Student Edition 7.0 with Matlab’s Neural Networks Toolbox.
The training of a network with 12 hidden nodes took approximately 0.3 seconds
using the Levenberg-Marquardt, Early Stopping training algorithm. Training the
same network using Bayesian Regulation instead of ES took on average 1.3 seconds.
In both cases, 186 samples were used as training data (4/5 of the original 233).
The calculation of the feature parameters took less than 0.1 seconds per
image, except when the Earth Mover's Distance algorithm was used.
Calculating the histogram distance using the EMD algorithm took on average about
1 second per image.
All calculations were run on a Pentium 4 3.0 GHz PC with 1 GB of memory.

Chapter 5

Summary and Conclusions

This chapter will summarize the work presented in this thesis, and draw conclusions
on the work and results. Furthermore, possible improvements suitable for future
work will be discussed.

5.1 Summary
This thesis has discussed the design considerations of a decision support system to
be used with infrared images of electrical installations. Furthermore, it has described
the construction of such a system, which can, with decent accuracy, find and classify
regions in infrared images.
The described system works in two parts; first a set of interesting regions is
extracted from the image, and secondly, these regions are classified on the basis of
information extracted from them.
The process of finding the regions of interest (ROIs) draws on the assumption
that in most images of electrical applications there exist repeating objects. Based
on this assumption, the system tries to find repetitions of features in an image. The
features are first extracted using the SIFT algorithm, and are then compared
and matched to each other. To determine how the features are related to each
other spatially, the translations between pairs of features are calculated. The most
common translation is then selected using a Hough-transform-like voting scheme. After the
voting, the features that are related to each other by the chosen translation are
selected as indications of the underlying objects in the image. To recreate
the objects' locations from the features, the features are grouped together into what
is believed to be reasonably correct representations of the objects. Two grouping
approaches were discussed, the dominant number approach presented in this thesis,
and the grid approach taken from Wretman's work [1]. The regions of interest (the
repeated objects) are then chosen as the convex hulls of the grouped features.
The regions of interest that result from the first part of the system are
then used as the basis for the following classification. Each region of interest is classified
separately using an artificial neural net. For the classification, four parameters are
extracted from each region. The four parameters are: Absolute max temperature,
Relative max temperature, Mean temperature difference compared to the other
regions of the image, and Histogram distance to the other regions of the image.
These parameters have been discussed and evaluated thoroughly, and five differ-
ent ways to calculate the histogram distance have been presented. The performances
of the histogram distance measures have also been evaluated in different tests.
The four parameters are used as input to the neural network of the system, and
using these parameters, each region is classified as OK or not OK. A discussion
of how to build the neural network has been presented. Design considerations
for the architecture of the network, the number of nodes in the network, and the
algorithm used to train the network have all been given. Several tests have also
been performed to determine which combination of these parameters gives the
best results.
Finally, the results of the system have been presented, along with considerations
on the limitations of the system.

5.2 The demonstration system


For the demonstration of the system described in this thesis, a special demonstration
system was constructed. The function of this system is thoroughly described by
Wretman in his work [1], along with documentation of the different scripts and
methods used.

5.3 Future work


During the work of this thesis a number of subjects have been identified as suitable
for future work. Among these are some ideas for improvement of parts of the
proposed system, and also some minor errors that might need attending to.

5.3.1 Finding and grouping features


During the work with finding repeating patterns in images, several ways to improve
the algorithm were considered.

Improved feature extraction


The first part that could be improved with additional work is the feature extraction.
The infrared images used in this project have very little detail, mostly due to the lack
of texture and the less-than-perfect focusing. This lack of detail has an influence on
the number of features that can be extracted from an image. While Lowe mentions
2000 features as a typical number for an image of 500 by 500 pixels [22], the average
number of SIFT features for a 320 by 240 pixel infrared image is between 50 and
100, i.e. roughly a factor of ten fewer features.

By using another feature extraction algorithm together with SIFT, it could be
possible to extract more features from the images, something that would result in
a more robust and efficient detection of repeating objects. For the new feature
detector to complement the SIFT detector as well as possible, it should use a
distinctly different way of detecting features. The SIFT algorithm detects features
in difference-of-Gaussian representations of the images; the new detector would have
to use another approach to find features not already found by the SIFT detector.
One very promising detector is Matas' "maximally stable extremal regions"
(MSER) algorithm [50]. The MSER algorithm looks for features that stay sta-
ble through a number of different thresholdings. The technique is related to the
watershed segmentation algorithm and uses only the raw intensity image. Since
it uses the raw image instead of gradients or Laplacians of the image, it has the
possibility of detecting a different set of features than SIFT.
Another possible way to increase the number of detected features is to use the
Harris detector [51] in addition to the DoG detector used in this thesis. The Harris
detector searches for corners in the image, while the DoG detector searches for
blobs. Thus the Harris detector might find features that were not found using the
DoG detector.
If new features are found (e.g. with the MSER algorithm), these can be used
to complement the old features found with the SIFT detector, resulting in more
features to base the matching on. This could lead to increased stability for the
matching and grouping of objects. It could also lead to better choices of ROIs,
since the selection of these is only based on the locations of the used features.

Multiples of translations
A second target for improvement could be the process of finding the most probable
translation. In the current system only one winning translation is considered. A
possible improvement could be to extend the search for maxima in A to also include
multiples of this winning translation. This would be beneficial since features can
match not only with adjacent similar objects but also with objects further away.
It would be especially beneficial in the case of occlusion, for example when the
third of four objects is occluded (e.g. 0 0 X 0). Here, only two objects
would be detected using the present technique, since all translations between the
remaining objects are different. Allowing multiples of translations would enable finding all
three objects, since the translation from the second object to the fourth would be a
multiple of the translation from the first object to the second.

More precise completion of features


Another possible target for improvement is the method for finding complementary
features. Although the improved method described in section 3.2.4 has been shown to
increase the method's success rate, there is still room for improvement. Since the
images used for comparison are blurred and subsampled to a degree corresponding
to the scale of the closest feature in the row, the accuracy of the guessed position
is also dependent on scale. This is because the search window is chosen to be of
the same size (the same number of pixels) in every scale, which translates to very
different sizes in the original image. A feature of large scale is therefore allowed to
differ much more in position in the original image than a feature of smaller scale.
Another disadvantage of using subsampled, blurred images is the loss of information
they entail. If, instead, the original image was used when calculating the cross-correlation,
the resolution would increase with the size of the features, and thus the accuracy
would actually increase with scale, instead of the opposite.
However, there is no proof that using the original image in the cross-correlation
would only lead to improvements. There is a risk that using the original image
could lead to a lower robustness of the method, meaning that features of larger
scales would fail to be correctly positioned since they are not similar enough to
the template feature. Such a loss of robustness could lead to fewer correctly found
complementary features. Using the original image would also lead to higher compu-
tational costs, since larger windows with a higher number of pixels would have
to be used for higher scales of features. Which of these two methods would yield the
best result is left for future investigation, outside the scope of this thesis.

More robust grouping


As is mentioned in section 3.2.5, the Grid approach to grouping proposed in [1] has
a different set of limitations than the Dominant number approach presented in this
thesis. Because of this, a fusion of the two grouping methods could be of interest.
The two methods could then complement each other in the cases where either of
them fails. For example, the overlap of the regions produced by the two methods
could be calculated, and in those cases where it is below a certain relative value
(i.e. the two sets of regions are sufficiently disjoint) the better of the two region
selections could be chosen. This way it could be possible to reduce the number of
incorrect region selections (those labeled Failure mode 1, 2 and 3 in section 3.2.6).
Another quite simple way to improve the performance of the grouping is to
calculate the Fourier transform of the image. Looking in the very low range of the
spectrum, there should be a peak corresponding to the number of repeating objects
in the image. This information could be used to strengthen the hypothesis of how
many objects are present in an image, as sketched below.
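A rough sketch of the idea, assuming that the repetition is mainly horizontal and that
at most a handful of objects is expected, could look as follows.

    % Sum the image columns to get a horizontal 1-D profile, then look for the
    % dominant low-frequency peak in its Fourier spectrum; the peak index is an
    % estimate of the number of repetitions across the image.
    profile  = sum(double(img), 1);          % one value per image column
    profile  = profile - mean(profile);      % remove the DC component
    spectrum = abs(fft(profile));
    maxObjects = 10;                         % illustrative upper limit
    [dummy, k] = max(spectrum(2:maxObjects+1));
    estimatedObjects = k;                    % peak at k cycles over the image width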
A more robust grouping could also be achieved by solving the problem with
overlapping regions described at the end of section 3.2.5. It might, for example,
be possible to determine which feature is responsible for the overlap based on the
compactness of the region (i.e. removing features and measuring the difference in
circumference of the region).

5.3.2 The data set
A larger dataset would be beneficial for all parts of the system.
For future work, the data chosen as the test set should also be selected more carefully.
To ensure more accurate results, the test set should be chosen randomly, but with
respect to the rate of positive/negative samples. If the test sets are chosen with the
same rate of positive samples, while at the same time being as random as possible,
this would hopefully result in lower standard deviations of the results; a sketch of
such a selection is given below.
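One way to achieve this is a stratified random split, as in the following sketch; the
one-fifth test fraction follows the split used in this thesis, and T is assumed to be the
1 x N binary target vector.

    % Draw the test set randomly, but separately from the positive and negative
    % samples, so that the positive/negative rate is preserved.
    pos = find(T == 1);   neg = find(T == 0);
    pos = pos(randperm(length(pos)));
    neg = neg(randperm(length(neg)));
    nPosTest = round(length(pos) / 5);
    nNegTest = round(length(neg) / 5);
    testIdx  = [pos(1:nPosTest), neg(1:nNegTest)];
    trainIdx = setdiff(1:length(T), testIdx);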
In the case of a larger dataset being available, another improvement would be
to make the distinction between training data and test data based on images. That
way, regions from the same image cannot be present in both sets, which can be the
case in the tests performed for this thesis. Making this distinction would lead to
more reliable results.

5.3.3 Classification
There remains a lot of work that can be done on the classification of the ROIs. In
this thesis, only one method for the classification has been evaluated, and only four
feature vectors have been used. In both these fields, future work may reveal new
and possibly better results.

Additional feature vectors


The choice of feature vectors is often the key to success in classification problems.
In this thesis, only four different feature parameters have been used and evaluated.
This is not necessarily too few, but since the feature parameters play such an impor-
tant role in the classification, it would be of high interest to implement and evaluate
additional feature parameters.
One proposal for an additional feature parameter is the use of image gradients,
which is also discussed in section 4.1.3. Unfortunately, no satisfactory implementa-
tion of this proposal could be achieved in this thesis; however, in [1], a successful
approach for the extraction of gradients is presented. Using this approach to extract
additional features could prove to be an interesting alternative to the information
extraction method used here.

Alternative classifiers
As mentioned in section 4.6, there exist a large number of classifiers, apart
from neural networks, that can be applied to the classification problem posed in
this thesis. A few of these are mentioned in the same section. It would be of high
interest to evaluate some of these alternative classifiers, in order to determine which
one is best suited for this problem.
It would also be of very high interest to combine the results from the gradient
finding method of [1] with the results from the neural network. A possible way of
doing this would be to create a hypothesis on the location of an error based on the
results from the network, and then use the results from the gradient finding method
to strengthen (or weaken) this hypothesis.

Better visual feedback


The tests performed on neural nets in this thesis have all been strictly statistical,
meaning that there has been no visual feedback from the outcome of the tests. For
future tests it could be of interest to have the option of visually inspecting the re-
gions that were wrongly classified. In this way it might be possible to see if there
are any tendencies in the system's classification of regions, for example if a large
part of the wrongly classified regions feature occlusions of the original object.

5.4 Conclusions
For the growing number of inexperienced users of thermographic cameras, a decision
support system would be of great help. In this thesis, it has been shown that such
a system can be constructed, at least for a limited set of applications.
A successful system has been implemented that can, fully automatically, find
and classify interesting regions in infrared images of electrical installations. As a
part of the system, a novel approach to detecting repeating objects or structures in
an image has been developed. The approach, which is based on feature matching
and a Hough-transform-like voting scheme, has been shown to be reasonably successful in
extracting ROIs from low-contrast infrared images. The approach also shows great
potential for improvement.
It has also been shown that an artificial neural network can be used to classify
the regions extracted by the aforementioned approach. By using only relatively
simple feature parameters from the regions, a success rate of approximately 90
percent has been achieved.

Bibliography

[1] D. Wretman, “Finding regions of interest in a decision support system for analysis of infrared images,” Master's thesis, KTH (Royal Institute of Technology), 2006.

[2] R. P. Madding and G. L. Orlove, “Twenty-five years of thermosense: an historical and technological retrospective,” in Proceedings of SPIE Thermosense XXV, vol. 5073, 2003, pp. 1–16.

[3] E. Daniels, E. Fagerlund, D. Glansholm, B. Hartmann, B. Kleman, and P. Lindberg, “FOA orienterar om infraröd teknik,” 1975.

[4] “Termografi Nivå 1 Kursmaterial,” Infrared Training Center, FLIR Systems AB, Rinkebyvägen 19, S-18211 Danderyd, Sweden, 2006.

[5] G. R. Peacock, “Temperature uncertainty of IR thermal imager calibration,” in Proceedings of SPIE Thermosense XXVIII, vol. 6205, 2006.

[6] J. Snell and R. W. Spring, “A new approach to prioritizing anomalies found during thermographic electrical inspections,” in Proceedings of SPIE Thermosense XXV, vol. 5073, 2003, pp. 222–230.

[7] “Infracam, User's manual,” FLIR Systems, P.O. Box 3, SE-182 11 Danderyd, Sweden, 2006.

[8] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, 2002.

[9] J. Shi and J. Malik, “Normalized cuts and image segmentation,” in CVPR '97: Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97), 1997, p. 731.

[10] Y. Deng and B. S. Manjunath, “Unsupervised segmentation of color-texture regions in images and video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 8, pp. 800–810, 2001.

[11] E. Sharon, A. Brandt, and R. Basri, “Fast multiscale image segmentation,” in CVPR, 2000, pp. 1070–1077.

[12] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 509–522, 2002.

[13] S. Agarwal, A. Awan, and D. Roth, “Learning to detect objects in images via a sparse, part-based representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1475–1490, 2004.

[14] E. J. Bernstein and Y. Amit, “Part-based statistical models for object classification and detection,” in CVPR (2), 2005, pp. 734–740.

[15] E. Borenstein, E. Sharon, and S. Ullman, “Combining top-down and bottom-up segmentation,” in CVPR Workshop, 2004, p. 46.

[16] E. Sharon, A. Brandt, and R. Basri, “Segmentation and boundary detection using multiscale intensity measurements,” in CVPR (1), 2001, pp. 469–476.

[17] E. Borenstein and S. Ullman, “Class-specific, top-down segmentation,” in ECCV (2), 2002, pp. 109–124.

[18] E. Borenstein and S. Ullman, “Learning to segment,” in ECCV (3), 2004, pp. 315–328.

[19] O. Boiman and M. Irani, “Detecting irregularities in images and in video,” in ICCV, 2005, pp. 462–469.

[20] T. Tuytelaars, A. Turina, and L. V. Gool, “Noncombinatorial detection of regular repetitions under perspective skew,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 4, pp. 418–432, April 2003.

[21] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” in CVPR (2), 2003, pp. 257–263.

[22] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[23] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. Wiley-Interscience Publication, 2000.

[24] G. Loy and J.-O. Eklundh, “Detecting symmetry and symmetric constellations of features,” in Proceedings of ECCV 2006, vol. 2, 2006, pp. 508–521.

[25] J. Hays, M. Leordeanu, A. A. Efros, and Y. Liu, “Discovering texture regularity as a higher-order correspondence problem,” in Proceedings of ECCV 2006, 2006.

[26] E. Turban, J. E. Aronson, and T.-P. Liang, Decision Support Systems and Intelligent Systems, 7th ed. Pearson Prentice Hall, 2000.

[27] L. Becker, “Influence of IR sensor technology on the military and civil defense,” in Proc. of SPIE, Quantum Sensing and Nanophotonic Devices III, 2006, 61270S.

[28] B. Correia and R. C. Nunes, “Grouping multiple neural networks for automatic target recognition in infrared imagery,” in Proc. of SPIE, Automatic Target Recognition XI, vol. 4379, 2001.

[29] F. Zi, K. Zhang, and D. Zhao, “Infrared target recognition based on support vector machine,” in Proc. of SPIE, International Conference on Space Information Technology, vol. 5985, 2005, 59853M.

[30] J. F. Head and R. L. Elliot, “Infrared imaging: making progress in fulfilling its medical promise,” IEEE Engineering in Medicine and Biology Magazine, vol. 21, no. 6, pp. 80–85, 2002.

[31] W. Cockburn, “Nondestructive testing of human breast,” in Proceedings of SPIE Thermosense XXI, vol. 3700, 1999, pp. 312–323.

[32] C. Herry and M. Fritze, “Design considerations for a medical thermographic expert system,” in IEEE EMBS '03: Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2, 2003, pp. 1252–1255.

[33] J. Koay, C. Herry, and M. Fritze, “Analysis of breast thermography with an artificial neural network,” in IEEE EMBS '04: Proceedings of the 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 1, 2004, pp. 1159–1162.

[34] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Transactions on Neural Networks, 1999.

[35] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

[36] L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Prentice-Hall, 1994.

[37] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in EuroCOLT, 1995, pp. 23–37.

[38] E. Charniak, “Bayesian networks without tears,” AI Magazine, vol. 12, no. 4, pp. 50–63, 1991.

[39] T. Lindeberg, “Scale-space theory: A basic tool for analysing structures at different scales,” Journal of Applied Statistics, vol. 21, no. 2, pp. 224–270, 1994.

[40] J. R. Anderson, Cognitive Psychology and its Implications, 6th ed. Worth Publishers, 2004.

[41] S. Ettinger, “SIFT Matlab Implementation,” Intel, 2002. [Online; downloaded 21-Feb-2006]. Available: http://robots.stanford.edu/cs223b/MatlabSIFT.zip

[42] D. Eaton, “SIFT Descriptors, Matlab Implementation,” 2005. [Online; downloaded 24-Feb-2006]. Available: http://www.cs.ubc.ca/~deaton/files/siftVects.m

[43] B. Huet and E. R. Hancock, “Cartographic indexing into a database of remotely sensed images,” in Third IEEE Workshop on Applications of Computer Vision (WACV96), Dec 1996, pp. 8–14.

[44] E. Wahl, “Histogram-similarity criteria,” CVOnline, 2003. [Online; downloaded 15-May-2006]. Available: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/WAHL1/node5.html

[45] Y. Rubner, C. Tomasi, and L. J. Guibas, “A metric for distributions with applications to image databases,” in ICCV, 1998, pp. 59–66.

[46] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Addison-Wesley Longman Publishing Co., Inc., 2001.

[47] I. Ulusoy and C. M. Bishop, “Generative versus discriminative methods for object recognition,” in Proceedings of IEEE CVPR 2005, vol. 2, 2005, pp. 258–265.

[48] H. Demuth, M. Beale, and M. Hagan, Neural Network Toolbox User's Guide, 5th ed., The MathWorks Inc., March 2006.

[49] W. S. Sarle, “Neural Network FAQ,” Newsgroup, 2002. [Online; downloaded 24-April-2006]. Available: ftp://ftp.sas.com/pub/neural/

[50] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide baseline stereo from maximally stable extremal regions,” in BMVC, 2002.

[51] C. Harris and M. Stephens, “Combined corner and edge detector,” in Proc. Fourth Alvey Vision Conference, 1988, pp. 147–151.