6-Spatial Data Analysis (E-next.in)
The discussion up until this point has sought to prepare the reader for the ‘data
analysis’ phase. So far, we have discussed the nature of spatial data, georefer-
encing, notions of data acquisition and preparation, and issues relating to data
quality and error.
Before we move on to discuss a range of analytical operations, we should begin
with some clarifications. We know from preceding discussions that the analyt-
ical capabilities of a GIS use spatial and non-spatial (attribute) data to answer
questions and solve problems that are of spatial relevance. It is important to
make a distinction between analysis (or analytical operations) as discussed in
Section 3.3.3, and analytical models (often referred to just as ‘models’). By
analysis we mean only a subset of what is usually implied by the term: we do
not specifically deal with statistical analysis (such as cluster detection, for exam-
ple). These are advanced concepts and techniques which are outside the scope
of this book.
All knowledge of the world is based on models of some kind, whether they
are simple abstractions, culturally-based stereotypes or complex equations that
describe a physical phenomenon. We have already seen in Section 1.2.1 that there
are different types of model, and that the word itself means different things in
different contexts. Section 2.1 noted that even spatial data is itself a kind of
‘model’ of some part of the real world.
In this chapter we will focus on analytical functions that can form the build-
ing blocks for application models. It will hopefully become clear to the reader
that these operations can be combined in various ways for increasingly complex
analyses. Later in the chapter we present an overview of different types of ana-
lytical models and related concepts of which the user should be aware, as well
as an examination of how various errors may degrade the results of our models
or analyses.
6.1 Classification of analytical GIS capabilities
There are many ways to classify the analytical functions of a GIS. The classification
used for this chapter is essentially the one put forward by Aronoff [3]. It
makes the following distinctions, which are addressed in subsequent sections of
the chapter:
ample, we might generalize fields where potato or maize, and possi-
bly other crops, are grown as ‘food produce fields’.
• Measurement functions allow the calculation of distances, lengths, or
areas.
• The potato fields on clay soils (select the ‘potato’ cover in the crop
data layer and the ‘clay’ cover in the soil data layer and perform an
intersection of the two areas found),
• The fields where potato or maize is the crop (select both areas of
‘potato’ and ‘maize’ cover in the crop data layer and take their union),
• The potato fields not on clay soils (perform a difference operator of
areas with ‘potato’ cover with the areas having clay soil),
• The fields that do not have potato as crop (take the complement of the
potato areas).
3. Neighbourhood functions. Whereas overlays combine features at the same
location, neighbourhood functions evaluate the characteristics of an area
surrounding a feature’s location. A neighbourhood function ‘scans’ the
neighbourhood of the given feature(s), and performs a computation on it.
• Search functions allow the retrieval of features that fall within a given
search window. This window may be a rectangle, circle, or polygon.
• Buffer zone generation (or buffering) is one of the best known neigh-
bourhood functions. It determines a spatial envelope (buffer) around
(a) given feature(s). The created buffer may have a fixed width, or a
variable width that depends on characteristics of the area.
• Interpolation functions predict unknown values using the known val-
ues at nearby locations. This typically occurs for continuous fields,
like elevation, when the data actually stored does not provide the di-
rect answer for the location(s) of interest. Interpolation of continuous
data was discussed in Section 5.4.2.
• Topographic functions determine characteristics of an area by looking
at the immediate neighbourhood as well. Typical examples are slope
computations on digital terrain models (i.e. continuous spatial fields).
The slope in a location is defined as the plane tangent to the topogra-
phy in that location. Various computations can be performed, such
as:
– determination of slope angle,
– determination of slope aspect,
– determination of slope length,
– determination of contour lines. These are lines that connect points
with the same value (for elevation, depth, temperature, baromet-
ric pressure, water salinity etc).
Details are discussed in Section 6.5.
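The four overlay selections listed earlier in this classification (potato on clay, potato or maize, potato not on clay, and not potato) correspond to set intersection, union, difference and complement. The following is a minimal sketch, assuming that each category is reduced to the set of raster cells it covers; the 3 x 3 study area and all cell memberships are invented for illustration.

```python
# Hypothetical 3 x 3 study area; each cell is identified by (row, col).
all_cells = {(r, c) for r in range(3) for c in range(3)}

# Invented example layers: the cells covered by each category.
potato = {(0, 0), (0, 1), (1, 0)}
maize = {(1, 1), (2, 2)}
clay = {(0, 1), (1, 0), (1, 1)}

potato_on_clay = potato & clay        # intersection of the two areas
potato_or_maize = potato | maize      # union of the two areas
potato_not_on_clay = potato - clay    # difference
not_potato = all_cells - potato       # complement within the study area

print(sorted(potato_on_clay))
```

A real GIS applies the same logic to polygon geometry rather than to cell identifiers, which is considerably more involved.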
6.2 Retrieval, classification and measurement
6.2.1 Measurement
Measurements on vector data
The primitives of vector data sets are point, (poly)line and polygon. Related
geometric measurements are location, length, distance and area size. Some of
these are geometric properties of a feature in isolation (location, length, area
size); others (distance) require two features to be identified.
The location property of a vector feature is always stored by the GIS: a single
coordinate pair for a point, or a list of pairs for a polyline or polygon boundary.
Occasionally, there is a need to obtain the location of the centroid of a polygon;
some GISs store these also, others compute them ‘on-the-fly’.
Length is a geometric property associated with polylines, by themselves, or in
their function as polygon boundary. It can obviously be computed by the GIS—
as the sum of lengths of the constituent line segments—but it quite often is also
stored with the polyline.
Area size is associated with polygon features. Again, it can be computed, but
usually is stored with the polygon as an extra attribute value. This speeds up
the computation of other functions that require area size values.
The attentive reader will have noted that, where these values are stored, none of the
above ‘measurements’ actually requires computation, but only retrieval of stored data.
Measuring distance between two features is another important function. If both
features are points, say p and q, the computation in a Cartesian spatial reference
system is given by the well-known Pythagorean distance function:
\[ \mathrm{dist}(p, q) = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}. \]
If one of the features is not a point, or both are not, we must be precise in defin-
ing what we mean by their distance. All these cases can be summarized as com-
putation of the minimal distance between a location occupied by the first and a
location occupied by the second feature. This means that features that intersect
or meet, or of which one contains the other, have a distance of 0. We leave a further
case analysis, including polylines and polygons, to the reader as an exercise. It
is not possible to store all distance values for all possible combinations of two
features in any reasonably sized spatial database. As a result, the system must
compute ‘on the fly’ whenever a distance computation request is made.
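As an illustration of the minimal-distance definition, the sketch below computes the Pythagorean distance between two points and the minimal distance between a point and a polyline. It assumes planar (Cartesian) coordinates; the function names are hypothetical and do not come from any particular GIS package.

```python
import math

def dist(p, q):
    """Pythagorean distance between two points given as (x, y) pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_segment_distance(p, a, b):
    """Minimal distance between point p and the line segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return dist(p, a)
    # Parameter of the orthogonal projection of p, clamped to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return dist(p, (a[0] + t * dx, a[1] + t * dy))

def point_polyline_distance(p, vertices):
    """Minimal distance between point p and a polyline given as a vertex list."""
    return min(point_segment_distance(p, a, b)
               for a, b in zip(vertices, vertices[1:]))

print(dist((0, 0), (3, 4)))
print(point_polyline_distance((3, 1), [(0, 0), (2, 0), (2, 2)]))
```

A point lying on the polyline yields a distance of 0, in line with the rule that features that meet or intersect have distance 0.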
Another geometric measurement used by the GIS is the minimal bounding box
computation. It applies to polylines and polygons, and determines the minimal
rectangle—with sides parallel to the axes of the spatial reference system—that
covers the feature. This is illustrated in Figure 6.1. Bounding box computation
is an important support function for the GIS: for instance, if the bounding boxes
of two polygons do not overlap, we know the polygons cannot possibly intersect
each other. Since polygon intersection is a complicated function, but bounding
box computation is not, the GIS will typically apply the latter first as a test to see
whether it must perform the former.
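The bounding box test can be sketched as follows. This is a minimal illustration with invented coordinates and hypothetical function names; a box is represented as (xmin, ymin, xmax, ymax).

```python
def bounding_box(vertices):
    """Return (xmin, ymin, xmax, ymax) of a polyline or polygon vertex list."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    """True if two axis-parallel boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

triangle = [(0, 0), (4, 0), (2, 3)]
square = [(5, 1), (7, 1), (7, 3), (5, 3)]
rectangle = [(3, 2), (6, 2), (6, 5), (3, 5)]

# Disjoint boxes: the polygons cannot intersect, so the costly test is skipped.
print(boxes_overlap(bounding_box(triangle), bounding_box(square)))
# Overlapping boxes: the polygons may or may not intersect; a full test is needed.
print(boxes_overlap(bounding_box(triangle), bounding_box(rectangle)))
```

Note that overlapping boxes do not prove that the polygons intersect; the test only rules out intersection cheaply.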
For practical purposes, it is important to be aware of the measurement unit that
applies to the spatial data layer that one is working on. This is determined by
the spatial reference system that has been defined for it during data preparation.
A common use of area size measurements is when one wants to sum up the
area sizes of all polygons belonging to some class. This class could be crop type:
What is the size of the area covered by potatoes? If our crop classification is in a
stored data layer, the computation would include (a) selecting the potato areas,
and (b) summing up their (stored) area sizes. Clearly, little geometric computation
is required in the case of stored features. This is not the case when we are
interactively defining our vector features in GIS use, and we want measurements
to be performed on these interactively defined features. Then, the GIS will have
to perform complicated geometric computations.
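The contrast between retrieving stored area sizes and computing them can be sketched as below. The field records and the drawn polygon are invented for illustration, and the area computation uses the standard shoelace formula for a simple polygon, which is one common choice rather than the method of any specific GIS.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its boundary vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical stored features: (crop class, stored area size).
fields = [("potato", 1200.0), ("maize", 800.0), ("potato", 450.0)]

# (a) select the potato areas, (b) sum their stored area sizes.
potato_area = sum(area for crop, area in fields if crop == "potato")

# An interactively drawn polygon has no stored area, so it must be computed.
drawn = [(0, 0), (40, 0), (40, 30), (0, 30)]
drawn_area = polygon_area(drawn)

print(potato_area, drawn_area)
```

The resulting numbers are in the square of whatever measurement unit the spatial reference system of the layer uses.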
Measurements on raster data
Measurements on raster data layers are simpler because of the regularity of the
cells. The area size of a cell is constant, and is determined by the cell resolution.
Horizontal and vertical resolution may differ, but typically do not. Together with
the location of a so-called anchor point, this is the only geometric information
stored with the raster data, so all other measurements by the GIS are computed.
The anchor point is fixed by convention to be the lower left (or sometimes upper
left) location of the raster.
Location of an individual cell derives from the raster’s anchor point, the cell reso-
lution, and the position of the cell in the raster. Again, there are two conventions:
the cell’s location can be its lower left corner, or the cell’s midpoint. These conventions
are set by the software in use, and it becomes more important to be aware of them
when working with low resolution data.
The area size of a selected part of the raster (a group of cells) is calculated as the
number of cells multiplied by the cell area size.
The distance between two raster cells is the standard distance function applied
to the locations of their respective mid-points, obviously taking into account
the cell resolution. Where a raster is used to represent line features as strings
of cells through the raster, the length of a line feature is computed as the sum
of distances between consecutive cells. This computation is prone to error, as
already discovered in Chapter 2 (Question 11).
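The raster measurements described above can be sketched as follows. The sketch assumes square cells, a lower left anchor point, and rows counted from the top of the raster; these conventions, like the function names, are assumptions made for illustration and differ between software packages.

```python
import math

def cell_midpoint(row, col, anchor, cell_size, n_rows):
    """Midpoint of a cell, assuming a lower left anchor and rows counted from the top."""
    x0, y0 = anchor
    x = x0 + (col + 0.5) * cell_size
    y = y0 + (n_rows - row - 0.5) * cell_size
    return x, y

def cell_distance(cell_a, cell_b, cell_size):
    """Distance between the midpoints of two cells given as (row, col)."""
    return cell_size * math.hypot(cell_a[0] - cell_b[0], cell_a[1] - cell_b[1])

def selection_area(n_cells, cell_size):
    """Area of a group of cells: number of cells times the cell area size."""
    return n_cells * cell_size * cell_size

def raster_line_length(cells, cell_size):
    """Length of a line stored as a string of cells: sum of consecutive distances."""
    return sum(cell_distance(a, b, cell_size) for a, b in zip(cells, cells[1:]))

print(cell_midpoint(0, 0, (1000.0, 2000.0), 10.0, 4))
print(raster_line_length([(0, 0), (1, 1), (1, 2)], 10.0))
```

The line length obtained this way depends on how the line was rasterized, which is one reason why the computation is prone to error.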
6.2.2 Spatial selection queries
When exploring a spatial data set, the first thing one usually wants is to select
certain features, to (temporarily) restrict the exploration. Such selections can be
made on geometric/spatial grounds, or on the basis of attribute data associated
with the spatial features. We discuss both techniques below.
Interactive spatial selection
Figure 6.2: All city wards that overlap with the selection object (here a circle) are selected (left), and their corresponding attribute records are highlighted (right, only part of the table is shown). Data from an urban application in Dar es Salaam, Tanzania. Data source: Dept. of Urban & Regional Planning and Geo-information Management, ITC.

Area            Perimeter      Ward_id  Ward_nam       District   Pop88  Pop92
65420380.0000   41654.940000   1        KUNDUCHI       Kinondoni  22106  27212.00
24813620.0000   30755.620000   2        KAWE           Kinondoni  32854  40443.00
18698500.0000   26403.580000   3        MSASANI        Kinondoni  51225  63058.00
81845610.0000   49645.160000   4        UBUNGO         Kinondoni  47281  58203.00
4468546.00000   13480.130000   5        MANZESE        Kinondoni  59467  73204.00
4999599.00000   10356.850000   6        TANDALE        Kinondoni  58357  71837.00
4102218.00000   8951.096000    7        MWANANYAMALA   Kinondoni  72956  89809.00
3749840.00000   9447.420000    8        KINONDONI      Kinondoni  42301  52073.00
2087509.00000   7502.250000    9        UPANGA WEST    Ilala      9852   11428.00
2268513.00000   9028.788000    10       KIVUKONI       Ilala      5391   6254.00
1400024.00000   6883.288000    11       NDUGUMBI       Kinondoni  32548  40067.00
888966.900000   4589.110000    12       MAGOMENI       Kinondoni  16938  20851.00
1448370.00000   5651.958000    13       UPANGA EAST    Ilala      11019  12782.00
6214378.00000   14552.080000   14       MABIBO         Kinondoni  43381  53402.00
2496622.00000   7121.255000    15       MAKURUMILA     Kinondoni  54141  66648.00
1262028.00000   4885.793000    16       MZIMUNI        Kinondoni  23989  29530.00
35362240.0000   28976.090000   17       KINYEREZI      Ilala      3044   3531.00
1010613.00000   5393.771000    18       JANGIWANI      Ilala      15297  17745.00
475745.500000   3043.068000    19       KISUTU         Ilala      8399   9743.00
1754043.00000   7743.187000    20       KIGOGO         Kinondoni  21267  26180.00
29964950.0000   36964.000000   21       KIGAMBONI      Temeke     23203  27658.00
1291479.00000   5187.690000    22       MICHIKICHINI   Ilala      14852  17228.00
720322.100000   4342.732000    23       MCHAFUKOGE     Ilala      8439   9789.00
9296131.00000   16321.530000   24       TABATA         Ilala      18454  21407.00
483620.700000   3304.072000    25       KARIAKOO       Ilala      12506  14507.00
3564653.00000   9586.751000    26       BUGURUNI       Ilala      48286  56012.00
2639575.00000   6970.186000    27       ILALA          Ilala      35372  41032.00
912452.800000   4021.937000    28       GEREZANI       Ilala      7490   8688.00
6735135.00000   13579.590000   29       KURASINI       Temeke     26737  31871.00
Spatial selection by attribute conditions
Figure 6.3: Spatial selection using the attribute condition Area < 400000 on land use areas in Dar es Salaam. Spatial features on left, associated attribute data (in part) on right. Data source: Dept. of Urban & Regional Planning and Geo-information Management, ITC.

Attribute data shown in the figure (columns: area size, record number, land use class):
174308.70    2   30
2066475.00   3   70
214582.50    4   80
29313.86     5   80
73328.08     6   80
53303.30     7   80
614530.10    8   20
1637161.00   9   80
156357.40    10  70
59202.20     11  20
83289.59     12  80
225642.20    13  20
28377.33     14  40
228930.30    15  30
986242.30    16  70
Figure 6.3 shows an example of selection by attribute condition. The query expression
is Area < 400000, which can be interpreted as “select all the land use
areas of which the size is less than 400,000.” The polygons in red are the selected
areas; their associated records are also highlighted in red. We can use this selected
set of features as the basis of further selection. For instance, if we are interested
in land use areas of size less than 400,000 that are of land use type 80, the selected
features of Figure 6.3 are subjected to a further condition, LandUse = 80.
The result is illustrated in Figure 6.4. Such combinations of conditions are fairly
common in practice, so we devote a short section to the theory of combining
conditions.
Figure 6.4: Further spatial selection from the already selected features of Figure 6.3 using the additional condition LandUse = 80 on land use areas. Observe that fewer features are now selected. Data source: Dept. of Urban & Regional Planning and Geo-information Management, ITC.

Attribute data shown in the figure (columns: area size, record number, land use class):
174308.70    2   30
2066475.00   3   70
214582.50    4   80
29313.86     5   80
73328.08     6   80
53303.30     7   80
614530.10    8   20
1637161.00   9   80
156357.40    10  70
59202.20     11  20
83289.59     12  80
225642.20    13  20
28377.33     14  40
228930.30    15  30
986242.30    16  70
Combining attribute conditions
When multiple criteria have to be used for selection, we need to carefully express
all of these in a single composite condition. The tools for this come from a field
of mathematical logic, known as propositional calculus.
Above, we have seen simple, atomic conditions such as Area < 400000 and LandUse = 80.
Atomic conditions use a predicate symbol, such as < (less than) or = (equals).
Other possibilities are <= (less than or equal), > (greater than), >= (greater than
or equal) and <> (does not equal). Any of these symbols is combined with an
expression on the left and one on the right. For instance, LandUse <> 80 can be
used to select all areas with a land use class different from 80. Expressions are
either constants like 400000 and 80, attribute names like Area and LandUse, or
possibly composite arithmetic expressions like 0.15 × Area, which would compute
15% of the area size.
Atomic conditions can be combined into composite conditions using logical connectives.
The most important ones are AND, OR, NOT and the bracket pair (· · ·). If
we write a composite condition like
Area < 400000 AND LandUse = 80
we can use it to select areas for which both atomic conditions hold true. This is
the meaning of the AND connective. If we had written
Area < 400000 OR LandUse = 80
instead, the condition would have selected areas for which either condition holds,
so effectively those with an area size less than 400,000, but also those with land
use class 80. (Included, of course, will be areas for which both conditions hold.)
The NOT connective can be used to negate a condition. For instance, the condition
NOT (LandUse = 80) would select all areas with a different land use class
than 80. (Clearly, the same selection can be obtained by writing LandUse <> 80,
but this is not the point.) Finally, brackets can be applied to force grouping
amongst atomic parts of a composite condition. For instance, the condition
(Area < 30000 AND LandUse = 70) OR (Area < 400000 AND LandUse = 80)
will select areas of class 70 less than 30,000 in size, as well as class 80 areas less
than 400,000 in size.
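The effect of the connectives can be checked on the attribute rows printed with Figure 6.3. The sketch below transcribes those rows and evaluates the composite conditions with Python's own `and`, `or` and `not`; the key name `Id` and the helper `select` are hypothetical, introduced only for this illustration.

```python
# Rows transcribed from the attribute table of Figure 6.3:
# (area size, record number, land use class).
rows = [
    (174308.70, 2, 30), (2066475.00, 3, 70), (214582.50, 4, 80),
    (29313.86, 5, 80), (73328.08, 6, 80), (53303.30, 7, 80),
    (614530.10, 8, 20), (1637161.00, 9, 80), (156357.40, 10, 70),
    (59202.20, 11, 20), (83289.59, 12, 80), (225642.20, 13, 20),
    (28377.33, 14, 40), (228930.30, 15, 30), (986242.30, 16, 70),
]
records = [{"Area": a, "Id": i, "LandUse": u} for a, i, u in rows]

def select(condition):
    """Return the record numbers of all records for which the condition holds."""
    return [r["Id"] for r in records if condition(r)]

both = select(lambda r: r["Area"] < 400000 and r["LandUse"] == 80)
either = select(lambda r: r["Area"] < 400000 or r["LandUse"] == 80)
not_80 = select(lambda r: not (r["LandUse"] == 80))
grouped = select(lambda r: (r["Area"] < 30000 and r["LandUse"] == 70)
                 or (r["Area"] < 400000 and r["LandUse"] == 80))

print(both)
print(either)
```

On this partial table no class 70 area is smaller than 30,000, so the bracketed condition selects the same records as the AND condition.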
Spatial selection using topological relationships
Selecting features that are inside selection objects. This type of query uses the
containment relationship between spatial objects. Obviously, polygons can contain
polygons, lines or points, and lines can contain lines or points, but no other
containment relationships are possible.
Figure 6.5 illustrates a containment query. Here, we are interested in finding the
location of medical clinics in the area of Ilala District. We first selected all areas of
Ilala District, using the technique of selection by attribute condition District =
“Ilala”. Then, these selected areas were used as selection objects to determine
which medical clinics (as point objects) were within them.
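A containment query of points against a polygon can be sketched with the classical ray-casting test, shown below. This is one common technique, not necessarily what a given GIS uses internally; the district outline and clinic coordinates are invented, and points lying exactly on the boundary are not treated specially.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon (vertex list)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # one more crossing to the right of the point
    return inside

district = [(0, 0), (10, 0), (10, 6), (0, 6)]        # hypothetical outline
clinics = {"A": (2, 3), "B": (12, 1), "C": (9, 5)}   # hypothetical clinic points

selected = [name for name, pt in clinics.items() if point_in_polygon(pt, district)]
print(selected)
```

An odd number of boundary crossings along the ray means the point is inside; an even number means it is outside.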
Selecting features that intersect. The intersect operator identifies features that
are not disjoint in the sense of Figure 2.15, but now extended to include points
and lines. Figure 6.6 provides an example of spatial selection using the intersect
relationship between lines and polygons. We selected all roads intersecting Ilala
District.
Figure 6.5: Spatial selection using containment. In dark green, all wards within Ilala District as the selection objects. In red, all medical clinics located inside these areas, and thus inside the district. Data source: Dept. of Urban & Regional Planning and Geo-information Management, ITC.
Selecting features based on their distance. One may also want to use the distance
function of the GIS as a tool in selecting features. Such selections can be
searches within a given distance from the selection objects, at a given distance, or
even beyond a given distance. There is a whole range of applications to this type
of selection, e.g.:
• Which roads are within 200 metres of a medical clinic? (These roads must
have a high road maintenance priority.)
Figure 6.8 illustrates a spatial selection using distance. Here, we executed the
selection of the second example above. Our selection objects were all clinics,
and we selected the roads that pass by a clinic within 200 metres.
In situations in which we know the distance criteria to use—for selections within,
at or beyond that distance value—the GIS has many (straightforward) compu-
tations to perform. Things become more complicated if our distance selection
condition involves the word ‘nearest’ or ‘farthest’. The reason is that not only
must the GIS compute distances from a selection object A to all potentially selectable
features F, but it must also find the feature F that is nearest to (or
farthest away from) object A. So, this requires an extra computational step to
determine minimum (maximum) values. Most GIS packages support this type
of selection, though the mechanics (‘the buttons to use’) differ.
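The difference between a fixed-distance selection and a ‘nearest’ or ‘farthest’ selection can be sketched as follows. For brevity all features are reduced to points with invented names and coordinates in metres; a real selection on roads would use a point-to-line distance instead.

```python
import math

def dist(p, q):
    """Pythagorean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical features, reduced to points: name -> (x, y) in metres.
clinics = {"clinic_1": (0, 0), "clinic_2": (1000, 0)}
road_points = {"road_a": (150, 0), "road_b": (500, 0), "road_c": (1000, 120)}

def within(features, selection_objects, max_dist):
    """Features lying within max_dist of at least one selection object."""
    return [name for name, pt in features.items()
            if any(dist(pt, obj) <= max_dist for obj in selection_objects.values())]

def nearest(features, selection_object):
    """Feature nearest to one selection object: the extra minimum step."""
    return min(features, key=lambda name: dist(features[name], selection_object))

def farthest(features, selection_object):
    """Feature farthest from one selection object: the extra maximum step."""
    return max(features, key=lambda name: dist(features[name], selection_object))

print(within(road_points, clinics, 200))
print(nearest(road_points, clinics["clinic_2"]))
```

The fixed-distance selection tests each feature independently, whereas `nearest` and `farthest` must compare the distances of all candidate features with each other.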
ally also be combined. Any set of selected features can be used as the input for
a subsequent selection procedure. This means, for instance, that we can select
all medical clinics first, then identify the roads within 200 metres, then select
from them only the major roads, then select the nearest clinics to these remain-
ing roads, as the ones that should receive our financial support. In this way, we
are combining various techniques of selection.
6.2.3 Classification
In classification of vector data, there are two possible results. In the first, the
input features may become the output features in a new data layer, with an ad-
ditional category assigned. In other words, nothing changes with respect to the
spatial extents of the original features. Figure 6.9(a) is an illustration of this first
type of output. A second type of output is obtained when adjacent features with
the same category are merged into one bigger feature. Such post-processing
functions are called spatial merging, aggregation or dissolving. An illustration of
this second type is found in Figure 6.9(b). Observe that this type of merging is
only an option in vector data, as merging cells in an output raster on the basis
of a classification makes little sense. Vector data classification can be performed
on point sets, line sets or polygon sets; the optional merge phase is sensible
only for lines and polygons. Below, we discuss two kinds of classification: user-
controlled and automatic.
User-controlled classification
Automatic classification
1. Equal interval technique: The minimum and maximum values v_min and v_max
of the classification parameter are determined and the (constant) interval
size for each category is calculated as (v_max − v_min)/n, where n is the number
of classes chosen by the user. This classification is useful in revealing
the distribution patterns as it determines the number of features in each
category.
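The equal interval technique can be sketched in a few lines. The function name and the sample values are chosen for illustration; the sketch assigns the maximum value to the last class, which is one reasonable convention for the upper boundary.

```python
from collections import Counter

def equal_interval_classes(values, n):
    """Assign each value a class 1..n using intervals of constant width."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:  # all values identical: a single class suffices
        return [1 for _ in values]
    classes = []
    for v in values:
        # Position of v within the range, scaled to 0..n.
        k = int((v - vmin) * n / (vmax - vmin)) + 1
        classes.append(min(k, n))  # vmax itself falls in class n
    return classes

# Sample values, here with v_min = 1, v_max = 10 and n = 5 (interval size 1.8).
values = [1, 1, 1, 2, 8, 4, 4, 5, 4, 9, 4, 3, 3, 2, 10, 4, 5, 6, 8, 8]
classes = equal_interval_classes(values, 5)
counts = Counter(classes)  # number of features per category

print(classes)
print(sorted(counts.items()))
```

The per-class counts show how unevenly the values are distributed over the equally wide intervals.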
achieve) as well as the characteristics of the data itself. The reader is encouraged
to experiment with their data and compare the results given by each method.
Other (and possibly better) techniques exist.
While these two types of classification can be used in spatial analysis, they are
also frequently used to develop visualizations of the same phenomena. In terms
of analytical operations we refer to some kind of calculation or function which
will use these categories. In terms of visualization, we refer to the graphical
representation of the data using these classifications. Just as either technique
yields different results in numeric terms, it will do the same in visual terms.
Please refer to Chapter 7 for more discussion on issues relating to mapping and
visualization.
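Input raster:
1 1 1 2 8
4 4 5 4 9
4 3 3 2 10
4 5 6 8 8
Classified raster, consistent with the equal interval technique using five classes:
1 1 1 1 4
2 2 3 2 5
2 2 2 1 5
2 3 3 4 4
Classified raster, a second five-class result for the same input:
1 1 1 2 5
3 3 4 3 5
3 2 2 2 5
3 4 4 5 5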
6.3 Overlay functions
6.3.1 Vector overlay operators
Figure 6.12: The polygon intersect (overlay) operator. Two polygon layers A and B produce a new polygon layer (with associated attribute table) that contains all intersections of polygons from A and B. Figure after [8].

Attribute table of the resulting vector data layer C:
C   A   B
C1  A1  B1
C2  A1  B2
C3  A2  B4
C4  A2  B2
C5  A2  B3
C6  A1  B3
The standard overlay operator for two layers of polygons is the polygon intersec-
tion operator. It is fundamental, as many other overlay operators proposed in
the literature or implemented in systems can be defined in terms of it. The prin-
ciples are illustrated in Figure 6.12. The result of this operator is the collection of
all possible polygon intersections; the attribute table result is a join, in the relational
database sense of Chapter 3, of the two input attribute tables. This output
attribute table only contains one tuple for each intersection polygon found, and
this explains why we call this operator a spatial join.
A more practical example is provided in Figure 6.13, which was produced by
polygon intersection of the ward polygons with land use polygons classified as
in Figure 6.10. This has allowed us to select the residential areas in Ilala District.
Two more polygon overlay operators are illustrated in Figure 6.14. The first is
known as the polygon clipping operator. It takes a polygon data layer and restricts
its spatial extent to the generalized outer boundary obtained from all (selected)
polygons in a second input layer. Besides this generalized outer boundary, no
other polygon boundaries from the second layer play a role in the result.
A second overlay operator is polygon overwrite. The result of this binary operator
is defined as a polygon layer with the polygons of the first layer, except where
polygons existed in the second layer, as these take priority. The principle is
illustrated in the lower half of Figure 6.14. Most GISs do not force the user to
apply overlay operators to the full polygon data set. One is allowed to first select
relevant polygons in the data layer, and then use the selected set of polygons as
an operator argument.
The fundamental operator of all these is polygon intersection. The others can be
defined in terms of it, usually in combination with polygon selection and/or
classification. For instance, the polygon overwrite of A by B can be defined as
polygon intersection between A and B, followed by a (well-chosen) classification
that prioritizes polygons in B, followed by a merge. The reader is asked to verify
this.
Vector overlays are usually also defined for point or line data layers. Their defi-
nition parallels the definitions of operators discussed above. Different GISs use
different names for these operators, and one is advised to carefully check the
documentation before applying any of these operators.
6.3.2 Raster overlay operators
Vector overlay operators are useful, but geometrically complicated, and this
sometimes results in poor operator performance. Raster overlays do not suf-
fer from this disadvantage, as most of them perform their computations cell by
cell, and thus they are fast.
GISs that support raster processing—as most do—usually have a language to
express operations on rasters. These languages are generally referred to as map
algebra [54], or sometimes raster calculus. They allow a GIS to compute new
rasters from existing ones, using a range of functions and operators. Unfortunately,
not all implementations of map algebra offer the same functionality.
The discussion below is to a large extent based on general terminology, and
attempts to illustrate the key operations using a logical, structured language.
Again, the syntax often differs for different GIS software packages.
When producing a new raster we must provide a name for it, and define how it
is computed. This is done in an assignment statement of the following format:
Output raster name := Map algebra expression
The expression on the right is evaluated by the GIS, and the raster in which it
results is then stored under the name on the left. The expression may contain
references to existing rasters, operators and functions; the format is made clear
below. The raster names and constants that are used in the expression are called
its operands. When the expression is evaluated, the GIS will perform the calculation
on a pixel by pixel basis, starting from the first pixel in the first row, and
continuing until the last pixel in the last row. There is a wide range of operators
and functions that can be used in map algebra, which we discuss below.
Arithmetic operators
Various arithmetic operators are supported. The standard ones are multiplication
(×), division (/), subtraction (−) and addition (+). Obviously, these arithmetic
operators should only be used on appropriate data values, and not, for instance,
on classification values.
Other arithmetic operators may include modulo division (MOD) and integer division
(DIV). Modulo division returns the remainder of division: for instance,
10 MOD 3 will return 1 as 10 − 3 × 3 = 1. Similarly, 10 DIV 3 will return 3.
More operators are trigonometric: sine (sin), cosine (cos), tangent (tan), and their
inverse functions asin, acos, and atan, which return angles in radians as real values.
Figure 6.15: Examples of arithmetic map algebra expressions.

    A            B
    5 5 2 2      4 4 8 8
    5 5 5 2      4 4 4 8
    6 2 2 2      1 1 1 8
    6 6 6 6      1 1 8 8

    C1 := A + 10     C2 := A + B      C3 := ((A - B)/(A + B))*100
    15 15 12 12      9  9 10 10       11 11 -60 -60
    15 15 15 12      9  9  9 10       11 11  11 -60
    16 12 12 12      7  3  3 10       71 33  33 -60
    16 16 16 16      7  7 14 14       71 71 -14 -14
Some simple map algebra assignments are illustrated in Figure 6.15. The assignment:
C1 := A + 10
will add a constant value of 10 to all cell values of raster A and store the result
as output raster C1. The assignment:
C2 := A + B
will add the values of A and B cell by cell, and store the result as raster C2.
Finally, the assignment
C3 := (A − B)/(A + B) × 100
will create output raster C3, as the result of the subtraction (cell by cell, as usual)
of B cell values from A cell values, divided by their sum. The result is multiplied
by 100. This expression, when carried out with A holding AVHRR channel 2 (near
infrared) and B holding AVHRR channel 1 (red) of NOAA satellite imagery, is known
as the NDVI (Normalized Difference Vegetation Index), here scaled by 100. It has
proven to be a good indicator of the presence of green vegetation.
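The three assignments can be mimicked outside a GIS in a few lines of Python. This is only a minimal sketch, not the map algebra implementation of any particular package: rasters are plain nested lists holding the values of Figure 6.15, and the helper name `local_op` is a hypothetical one introduced here.

```python
# Rasters A and B as shown in Figure 6.15, stored row by row (top row first).
A = [[5, 5, 2, 2],
     [5, 5, 5, 2],
     [6, 2, 2, 2],
     [6, 6, 6, 6]]
B = [[4, 4, 8, 8],
     [4, 4, 4, 8],
     [1, 1, 1, 8],
     [1, 1, 8, 8]]


def local_op(func, *rasters):
    """Apply func cell by cell to one or more rasters of equal size."""
    return [[func(*cells) for cells in zip(*rows)] for rows in zip(*rasters)]


# C1 := A + 10
C1 = local_op(lambda a: a + 10, A)
# C2 := A + B
C2 = local_op(lambda a, b: a + b, A, B)
# C3 := ((A - B)/(A + B))*100, rounded to whole numbers as in the figure.
# A cell where A + B equals 0 would raise ZeroDivisionError in this sketch.
C3 = local_op(lambda a, b: round((a - b) / (a + b) * 100), A, B)

# MOD and DIV correspond to Python's % and // operators.
remainder = 10 % 3
quotient = 10 // 3
```

The cell-by-cell evaluation order described earlier is implicit in the nested list comprehension: rows are visited from first to last, and cells within a row from left to right.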
Comparison and logical operators
Map algebra also allows the comparison of rasters cell by cell. To this end, we
may use the standard comparison operators (<, <=, =, >=, > and <>) that we
introduced before.
A simple raster comparison assignment is:
C := A <> B.
It will store truth values—either true or false—in the output raster C. A cell
value in C will be true if the cell’s value in A differs from that cell’s value in B.
It will be false if they are the same.
Logical connectives are also supported in most implementations of map algebra.
We have already seen the connectives of AND, OR and NOT in Section 6.2.2. An-
other connective that is commonly offered in map algebra is exclusive OR (XOR).
The expression a XOR b is true only if either a or b is true, but not both. Examples
of the use of these comparison operators and connectives are provided in
Figure 6.16 and Figure 6.17. The latter figure provides various raster computations
in search of forests at specific elevations. In the figure, raster D1 indicates forest
below 500 m, D2 indicates areas that are forest or below 500 m (or both), raster D3
areas that are either forest or below 500 m (but not at the same time), and raster D4
indicates forests above 500 m.
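The connectives can be tried out on two tiny rasters. The sketch below is illustrative only: the land use and elevation values are invented for the example (they are not those of Figure 6.17), and `local_op` is a hypothetical helper that applies a function cell by cell.

```python
import operator

# Hypothetical 2 x 3 input rasters: land use classes and elevation in metres.
landuse = [["forest", "forest", "grass"],
           ["forest", "lake",   "grass"]]
elevation = [[450, 620, 480],
             [510, 300, 700]]


def local_op(func, *rasters):
    """Apply func cell by cell to one or more rasters of equal size."""
    return [[func(*cells) for cells in zip(*rows)] for rows in zip(*rasters)]


# Comparison operators produce rasters of truth values.
is_forest = local_op(lambda a: a == "forest", landuse)
is_low = local_op(lambda b: b < 500, elevation)

# Logical connectives combine truth-value rasters cell by cell.
forest_and_low = local_op(lambda f, l: f and l, is_forest, is_low)
forest_or_low = local_op(lambda f, l: f or l, is_forest, is_low)
forest_xor_low = local_op(operator.xor, is_forest, is_low)
forest_and_not_low = local_op(lambda f, l: f and not l, is_forest, is_low)
```

Keeping the comparison step separate from the connective step mirrors the way a map algebra expression such as (A = "forest") AND (B < 500) is built from two truth-value operands.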
[Figure 6.16: Examples of logical expressions in map algebra. Green cells represent
true values, white cells represent false values. Panels: inputs A, B and C; results
A and B, A or B, A xor B, A and not B, (A and B) or C, A and (B or C).]
[Figure 6.17: Input raster A (F = forest) with output rasters D1 and D2, where
D1 := (A = "forest") AND (B < 500).]
Conditional expressions
The above comparison and logical operators produce rasters with the truth values
true and false. In practice, we often need a conditional expression with
them that allows us to test whether a condition is fulfilled. The general format
is:
Output raster := CON(condition, then expression, else expression)
For each cell where the condition is true, the output takes the value of the then
expression; elsewhere it takes the value of the else expression. (We have already
noted that specific software packages may differ in the specifics of the syntax that
make up an expression. This extends to the actual commands, with some packages
using “IFF” instead of “CON”.)
Figure 6.18: Examples of conditional expressions in map algebra. Here A is a
classified raster holding land use data (F = forest, . = not forest), and B is an
elevation value raster (7 = 700 m, 6 = 600 m, 4 = 400 m).

    A              B
    F F F . .      7 7 7 7 4
    F F . . .      7 7 7 7 4
    . F F . F      4 4 4 4 4
    . . F F F      6 6 4 4 4
    . . . F F      6 6 6 6 6

    C1 := CON (A = "F", B, 0)      C2 := CON ((A = "F") AND (B = 7), 10, 0)
    7 7 7 0 0                      10 10 10 0 0
    7 7 0 0 0                      10 10  0 0 0
    0 4 4 0 4                       0  0  0 0 0
    0 0 4 4 4                       0  0  0 0 0
    0 0 0 6 6                       0  0  0 0 0
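A conditional expression of this kind can be sketched in Python as follows. The function `con` below is a hypothetical stand-in for a package's CON or IFF command, not a real GIS API, and the forest positions in raster A are an assumption inferred from the non-zero cells of C1 in Figure 6.18.

```python
# Raster A as read from Figure 6.18; "." marks a cell that is not forest
# (assumed from where C1 is zero). Raster B holds elevation codes.
A = ["FFF..",
     "FF...",
     ".FF.F",
     "..FFF",
     "...FF"]
B = [[7, 7, 7, 7, 4],
     [7, 7, 7, 7, 4],
     [4, 4, 4, 4, 4],
     [6, 6, 4, 4, 4],
     [6, 6, 6, 6, 6]]


def cell_value(operand, r, c):
    """Return operand[r][c] for a raster, or the operand itself for a constant."""
    return operand[r][c] if isinstance(operand, list) else operand


def con(condition, then_operand, else_operand):
    """Cell by cell: take then_operand where condition is true, else else_operand."""
    return [[cell_value(then_operand, r, c) if condition[r][c]
             else cell_value(else_operand, r, c)
             for c in range(len(condition[0]))]
            for r in range(len(condition))]


is_forest = [[ch == "F" for ch in row] for row in A]
is_700 = [[b == 7 for b in row] for row in B]
forest_at_700 = [[f and h for f, h in zip(frow, hrow)]
                 for frow, hrow in zip(is_forest, is_700)]

C1 = con(is_forest, B, 0)        # C1 := CON (A = "F", B, 0)
C2 = con(forest_at_700, 10, 0)   # C2 := CON ((A = "F") AND (B = 7), 10, 0)
```

Allowing either a raster or a constant as the then and else operands matches the two expressions in the figure, where B is a raster and 0 and 10 are constants.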
6.3.3 Overlays using a decision table
Conditional expressions are powerful tools in cases where multiple criteria must
be taken into account. A small example may illustrate this. Consider a
suitability study in which a land use classification and a geological classification
must be used. The respective rasters are illustrated in Figure 6.19 on the left.
Domain expertise dictates that some combinations of land use and geology result
in suitable areas, whereas other combinations do not. In our example, forests
on alluvial terrain and grassland on shale are considered suitable combinations,
while the others are not.
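We could produce the output raster of Figure 6.19 with a map algebra expression
such as:
Suitability := CON (((Landuse = "Forest") AND (Geology = "Alluvial")) OR
((Landuse = "Grass") AND (Geology = "Shale")), "Suitable", "Unsuitable")
and consider ourselves lucky that there are only two ‘suitable’ cases. In practice,
many more cases must usually be covered, and then writing up a complex CON
expression is not an easy task.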
To this end, some GISs accommodate setting up a separate decision table that
will guide the raster overlay process. This extra table carries domain expertise,
and dictates which combinations of input raster cell values will produce which
output raster cell value. This gives us a raster overlay operator using a decision
table, as illustrated in Figure 6.19. The GIS will have supporting functions to
generate the additional table from the input rasters, and to enter appropriate
values in the table.

Figure 6.19: The use of a decision table in raster overlay. The overlay is computed
in a suitability study, in which land use and geology are important factors. The
meaning of values in both input rasters, as well as the output raster, can be
understood from the decision table.

    Land use    Geology: Alluvial    Geology: Shale
    Forest      Suitable             Unsuitable
    Grass       Unsuitable           Suitable
    Lake        Unsuitable           Unsuitable
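A decision table maps naturally onto a lookup keyed by pairs of input values. The following sketch assumes two small, invented input rasters (not those of Figure 6.19); the table entries follow the suitability rules stated in the text, and `overlay_with_table` is a hypothetical helper name.

```python
# Hypothetical 2 x 3 input rasters for land use and geology.
landuse = [["forest", "forest", "grass"],
           ["lake",   "grass",  "grass"]]
geology = [["alluvial", "shale",    "shale"],
           ["alluvial", "alluvial", "shale"]]

# Decision table: (land use, geology) -> suitability. Forest on alluvial
# terrain and grassland on shale are suitable; all other combinations are not.
DECISION_TABLE = {
    ("forest", "alluvial"): "suitable",
    ("forest", "shale"): "unsuitable",
    ("grass", "alluvial"): "unsuitable",
    ("grass", "shale"): "suitable",
    ("lake", "alluvial"): "unsuitable",
    ("lake", "shale"): "unsuitable",
}


def overlay_with_table(first, second, table):
    """Look up each pair of input cell values in the decision table."""
    return [[table[(a, b)] for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]


suitability = overlay_with_table(landuse, geology, DECISION_TABLE)
```

Adding a further suitable combination then means changing one table entry rather than rewriting a nested CON expression.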
6.4 Neighbourhood functions
1. State which target locations are of interest to us, and define their spatial
extent,
For instance, our target might be a medical clinic. Its neighbourhood could be
defined as:
• All roads within 500 m travel distance, or
• All residential areas, for which the clinic is the closest clinic.
The alert reader will note the increasingly complex definitions of ‘neighbour-
hood’ used here. This is to illustrate that different ways of measuring neighbour-
hoods exist, and some are better (or more representative of real neighbourhoods)
than others, depending on the purpose of the analysis.
Then, in the third step we indicate what it is we want to discover about the
phenomena that exist or occur in the neighbourhood. This might simply be its
spatial extent, but it might also be statistical information like:
The above are typical questions in an urban setting. When our interest is more in
natural phenomena, different examples of locations, neighbourhoods and neigh-
bourhood characteristics arise. Since raster data are the more commonly used in
this case, neighbourhood characteristics often are obtained via statistical sum-
mary functions that compute values such as average, minimum, maximum, and
standard deviation of the cells in the identified neighbourhood.
Determining neighbourhood extent To select target locations, one can use the
selection techniques that we discussed in Section 6.2.2. To obtain characteristics
from an eventually identified neighbourhood, the same techniques apply. So
what remains to be discussed here is the proper determination of a neighbour-
hood.
One way of determining a neighbourhood around a target location is by making
use of the geometric distance function. We discuss some of these techniques in
Section 6.4.1. Geometric distance does not take direction into account, yet certain
phenomena can only be studied by doing so: for example, pollution spread by
rivers, ground water flow, or prevailing weather systems.
The more advanced techniques for computation of diffusion and flow are discussed
in Sections 6.4.2 and 6.4.3. Diffusion functions are based on the assumption that
the phenomenon spreads in all directions, though not necessarily equally easily
in all directions. Hence, they use local terrain characteristics to compute the
local resistance against diffusion. In flow computations, the assumption is that
the phenomenon will choose a least-resistance path, and not spread in all directions.
This, as we will see, involves the computation of preferred local direction
of spread. Both flow and diffusion computations take local characteristics into
account, and are therefore more easily performed on raster data.
6.4.1 Proximity computations
Buffer zone generation
The principle of buffer zone generation is simple: we select one or more target
locations, and then determine the area around them, within a certain distance.
In Figure 6.20(a), a number of main and minor roads were selected as targets,
and buffers of 75 m and 25 m, respectively, were computed from them. In some case
studies, zonated buffers must be determined, for instance in assessments of traffic
noise effects. Most GISs support this type of zonated buffer computation. An
illustration is provided in Figure 6.20(b).
Buffer generation on rasters is a fairly simple function. The target location or lo-
cations are always represented by a selection of the raster’s cells, and geometric
distance is defined, using cell resolution as the unit. The distance function ap-
plied is the Pythagorean distance between the cell centres. The distance from a
non-target cell to the target is the minimal distance one can find between that
non-target cell and any target cell.
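The raster buffer described above can be sketched with a brute-force distance computation. This is an illustration under stated assumptions, not how a GIS implements it: the function names and the `cell_size` parameter are hypothetical, and every cell is simply compared against every target cell.

```python
import math


def distance_to_targets(targets, cell_size=1.0):
    """Distance from each cell centre to the nearest target cell centre.

    targets is a raster of truth values; at least one cell must be a target.
    """
    rows, cols = len(targets), len(targets[0])
    target_cells = [(r, c) for r in range(rows) for c in range(cols)
                    if targets[r][c]]
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in target_cells) * cell_size
             for c in range(cols)]
            for r in range(rows)]


def buffer_raster(targets, distance, cell_size=1.0):
    """True for every cell whose centre lies within distance of a target."""
    return [[d <= distance for d in row]
            for row in distance_to_targets(targets, cell_size)]


# One target cell in the middle of a 3 x 3 raster with 25 m cells.
targets = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
distances = distance_to_targets(targets, cell_size=25.0)
within_30m = buffer_raster(targets, 30.0, cell_size=25.0)
```

With 25 m cells the four edge neighbours lie 25 m from the target and fall inside a 30 m buffer, whereas the diagonal neighbours lie about 35.4 m away and fall outside it.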
Thiessen polygon generation
Thiessen polygon partitions make use of geometric distance for determining neigh-
bourhoods. This is useful if we have a spatially distributed set of points as target
locations, and we want to know for each location in the study to which target
it is closest. This technique will generate a polygon around each target location
that identifies all those locations that ‘belong to’ that target. We have already
seen the use of Thiessen polygons in the context of interpolation of point data, as
discussed in Section 5.4.1. Given an input point set that will be the polygons’
midpoints, it is not difficult to construct such a partition. It is even easier
to construct if we already have a Delaunay triangulation for the same input point
set (see Section 2.3.3 on TINs).
Figure 6.21 repeats the Delaunay triangulation of Figure 2.9(b). The Thiessen
polygon partition constructed from it is on the right. The construction first creates
the perpendicular bisectors of all the triangle sides; observe that the perpendicular
bisector of a triangle side that connects point A with point B is the divide between
the area closer to A and the area closer to B. The bisectors become part of the
boundary of each Thiessen polygon.
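The ‘closest target’ idea behind Thiessen polygons can be illustrated on a raster without constructing any bisectors. The sketch below is a swapped-in, brute-force approximation rather than the Delaunay-based construction described above: each cell is labelled with the nearest target point. The function name and the target names P and Q are hypothetical.

```python
import math


def thiessen_raster(rows, cols, targets):
    """Label every cell with the name of its nearest target.

    targets maps a target name to its (row, column) cell position.
    Ties go to the target listed first.
    """
    return [[min(targets,
                 key=lambda name: math.hypot(r - targets[name][0],
                                             c - targets[name][1]))
             for c in range(cols)]
            for r in range(rows)]


# Two hypothetical target points on a 2 x 4 raster.
targets = {"P": (0, 0), "Q": (1, 3)}
partition = thiessen_raster(2, 4, targets)
```

The boundary between the P cells and the Q cells approximates the perpendicular bisector of the line segment joining the two targets.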
6.4.2 Computation of diffusion
Figure 6.22: Computation of diffusion on a raster. The lower left green cell is the
source location, indicated in the local resistance raster (a). The raster in (b) is
the minimal total resistance raster computed by the GIS. (The GIS will work in
higher precision real arithmetic than what is illustrated here.)

    (a)                 (b)
    1  1  1  2  8       14.50  14.95  15.95  17.45  22.45
    4  4  5  4  9       12.00  12.45  14.61  16.66  21.44
    4  3  3  2 10        8.00   8.95  11.95  13.66  19.66
    4  5  6  8  8        4.00   6.36   8.00  10.00  11.00
    4  2  1  1  1        0.00   3.00   4.50   5.50   6.50
for cell c, the GIS computes the total incurred resistance for diffusion from
$c_{src}$ to $c_e$ as $\frac{1}{2}(val(c_{src}) + val(c_e))$, while the same for
$c_{src}$ to $c_{ne}$ is $\frac{1}{2}(val(c_{src}) + val(c_{ne})) \times \sqrt{2}$.
The accumulated resistance along a path of cells is simply the sum of these
incurred resistances from pairwise neighbour cells.
Since ‘source material’ has the habit of taking the easiest route to spread, we
must determine at what minimal cost (i.e. at what minimal resistance) it may
have arrived in a cell. Therefore, we are interested in the minimal cost path. To
determine the minimal total resistance along a path from the source location $c_{src}$
to an arbitrary cell $c_x$, the GIS determines all possible paths from $c_{src}$ to $c_x$,
and then determines which one has the lowest total resistance. This value is found,
for each cell, in the raster of Figure 6.22(b).
For instance, there are three paths from the green source location to its northeast
neighbour cell (with local resistance 5). We can define them as path 1 (N–E),
path 2 (E–N) and path 3 (NE), using compass directions to define the path from
the green cell. For path 1, the total resistance is computed as:
$\frac{1}{2}(4 + 4) + \frac{1}{2}(4 + 5) = 8.5.$
Path 2, in similar style, gives us $\frac{1}{2}(4 + 2) + \frac{1}{2}(2 + 5) = 6.5$.
For path 3, we find
$\frac{1}{2}(4 + 5) \times \sqrt{2} = 6.36,$
and thus it is the minimal cost path. The reader is asked to verify one
or two other values of minimal cost paths that the GIS has produced.
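The verification can also be left to a short program. Rather than enumerating all possible paths, the sketch below uses Dijkstra's shortest-path algorithm, a standard technique swapped in here that yields the same minimal totals; whether a given GIS works this way is not stated in the text. The function name is hypothetical, and the resistance values are those of Figure 6.22(a).

```python
import heapq
import math

# Local resistance raster of Figure 6.22(a); the source is the lower left cell.
RESISTANCE = [[1, 1, 1, 2, 8],
              [4, 4, 5, 4, 9],
              [4, 3, 3, 2, 10],
              [4, 5, 6, 8, 8],
              [4, 2, 1, 1, 1]]


def minimal_total_resistance(resistance, source):
    """Minimal accumulated resistance from source to every cell (Dijkstra)."""
    rows, cols = len(resistance), len(resistance[0])
    total = [[math.inf] * cols for _ in range(rows)]
    total[source[0]][source[1]] = 0.0
    queue = [(0.0, source)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if cost > total[r][c]:
            continue  # outdated queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr == 0 and dc == 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                # Average of the two local resistances, times sqrt(2) on diagonals.
                step = 0.5 * (resistance[r][c] + resistance[nr][nc])
                if dr != 0 and dc != 0:
                    step *= math.sqrt(2)
                if cost + step < total[nr][nc]:
                    total[nr][nc] = cost + step
                    heapq.heappush(queue, (cost + step, (nr, nc)))
    return total


total = minimal_total_resistance(RESISTANCE, source=(4, 0))
```

Rounded to two decimals, `total` should agree with the raster of Figure 6.22(b); for example, `total[3][1]` corresponds to the northeast neighbour of the source discussed above.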
6.4.3 Flow computation
6.4.4 Raster based surface analysis
Applications
• Slope aspect calculation The calculation of the aspect (or orientation) of the
slope in degrees (between 0 and 360 degrees), for any or all locations.
• Three-dimensional map display With GIS software, three-dimensional views
of a DEM can be constructed, in which the location of the viewer, the angle
under which s/he is looking, the zoom angle, and the amplification fac-
tor of relief exaggeration can be specified. Three-dimensional views can
be constructed using only a predefined mesh, covering the surface, or us-
ing other rasters (e.g. a hillshading raster) or images (e.g. satellite images)
which are draped over the DEM.
are increasingly used in GIS-based dynamic modelling, such as the computation
of surface run-off and erosion, groundwater flow, the delineation
of areas affected by pollution, and the computation of areas that will be covered
by processes such as debris flows and lava flows.
Some of the more important computations mentioned above are further dis-
cussed below. All of them apply a technique known as filtering, so we will first
examine this principle in more detail.
Filtering
$$\sum_{i,j} (w_{ij} \cdot r_{ij}) \Big/ \sum_{i,j} |w_{ij}|,$$
where one should observe that we divide by the sum of absolute weights.
(Please refer to Chapter Five of Principles of Remote Sensing for a discussion of
image-related filter operations.)
Figure 6.24: Moving window rasters for filtering. (a) raster for a regular averaging
filter; (b) raster for an x-gradient filter; (c) raster for a y-gradient filter.

    (a)          (b)           (c)
    1 1 1         0 0 0        0  1 0
    1 1 1        -1 0 1        0  0 0
    1 1 1         0 0 0        0 -1 0
Since the $w_{ij}$ are all equal to 1 in the case of Figure 6.24(a), the formula can be
simplified to
$$\frac{1}{9} \sum_{i,j} r_{ij},$$
which is nothing but the average of the nine input raster cell values. An ‘all-1’
filter thus computes a local average value, and its application amounts to
moving window averaging. More advanced filters have been devised to extract
other types of information from raster data. We will look at some of these in the
context of slope computations.
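The filter formula translates directly into code. The sketch below applies a 3 × 3 moving window and divides by the sum of absolute weights, using the three filters of Figure 6.24. Leaving border cells as `None` is an assumption made here for simplicity, and `apply_filter` is a hypothetical name; the elevation values are invented, apart from the 1540 m and 1552 m used in the text's x-gradient example.

```python
AVERAGE = [[1, 1, 1],
           [1, 1, 1],
           [1, 1, 1]]
X_GRADIENT = [[0, 0, 0],
              [-1, 0, 1],
              [0, 0, 0]]
Y_GRADIENT = [[0, 1, 0],
              [0, 0, 0],
              [0, -1, 0]]


def apply_filter(raster, weights):
    """Apply a 3 x 3 moving-window filter; border cells are left as None."""
    rows, cols = len(raster), len(raster[0])
    norm = sum(abs(w) for weight_row in weights for w in weight_row)
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = sum(weights[i][j] * raster[r - 1 + i][c - 1 + j]
                      for i in range(3) for j in range(3))
            out[r][c] = acc / norm
    return out


# Elevations in metres; the first row is taken to be the northernmost one.
elevation = [[1546, 1546, 1546],
             [1540, 1546, 1552],
             [1546, 1546, 1546]]

local_average = apply_filter(elevation, AVERAGE)[1][1]
x_gradient = apply_filter(elevation, X_GRADIENT)[1][1]
y_gradient = apply_filter(elevation, Y_GRADIENT)[1][1]
```

For the gradient filters the sum of absolute weights is 2, which is why the result is an elevation gain per single cell width rather than per two cell widths.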
Computation of slope angle and slope aspect
A different choice of weight factors may provide other information. Special fil-
ters exist to perform computations on the slope of the terrain. Before we look at
these filters, let us define various notions of slope.
Slope angle, which is also known as slope gradient, is the angle α, illustrated in
Figure 6.25, between a path p in the horizontal plane and the sloping terrain.
The path p must be chosen such that the angle α is maximal. A slope angle can
be expressed as elevation gain in a percentage or as a geometric angle, in degrees
or radians. The two respective formulas are:
$$\mathit{slope\_perc} = 100 \cdot \frac{\delta f}{\delta p} \quad \text{and} \quad \mathit{slope\_angle} = \arctan\left(\frac{\delta f}{\delta p}\right).$$
The path p must be chosen to provide the highest slope angle value, and thus
it can lie in any direction. The compass direction, converted to an angle with
the North, of this maximal down-slope path p is what we call the slope aspect.
Let us now look at how to compute slope angle and slope aspect in a raster
environment.
From an elevation raster, we cannot ‘read’ the slope angle or slope aspect di-
rectly. Yet, that information can be extracted. After all, for an arbitrary cell, we
have its elevation value, plus those of its eight neighbour cells. A simple ap-
proach to slope angle computation is to make use of x-gradient and y-gradient
filters. Figure 6.24(b) and (c) illustrate an x-gradient filter and a y-gradient filter,
respectively. The x-gradient filter determines the slope increase ratio from west
to east: if the elevation to the west of the centre cell is 1540 m and that to the
east of the centre cell is 1552 m, then apparently along this transect the elevation
increases 12 m per two cell widths, i.e. the x-gradient is 6 m per cell width. The
y-gradient filter operates entirely analogously, though in south-north direction.
Observe that both filters express elevation gain per cell width. This means that
we must divide by the cell width (given in metres, for example) to obtain
(approximations to) the true derivatives δf/δx and δf/δy. Here, f stands for the
elevation field as a function of x and y, and δf/δx, for instance, is the elevation
gain per unit of length in the x-direction.
To obtain the real slope angle α along path p, observe that both the x- and
y-gradient contribute to it. This is illustrated in Figure 6.26. A not-so-simple
geometric derivation can show that always
$$\tan(\alpha) = \sqrt{(\delta f/\delta x)^2 + (\delta f/\delta y)^2}.$$
Now what does this mean in the practice of computing local slope angles from
an elevation raster? It means that we must perform the following steps:
1. Compute from (input) elevation raster R the non-normalized x- and y-
gradients, using the filters of Figure 6.24(b) and (c), respectively.
$$\tan(\psi) = \frac{\delta f/\delta x}{\delta f/\delta y},$$
so slope aspect can also be computed from the normalized gradients. We must
warn the reader that this formula should not trivially be replaced by using
$$\psi = \arctan\left(\frac{\delta f/\delta x}{\delta f/\delta y}\right),$$
the reason being that the latter formula does not account for the southeast and
southwest quadrants, nor for cases where δf/δy = 0. (In the first situation,
one must add 180° to the computed angle to obtain an angle measured from
North; in the latter situation, ψ equals either 90° or −90°, depending on the sign
of δf/δx.)
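In most programming languages the quadrant problem is avoided by using a two-argument arctangent. The sketch below computes slope and aspect for a single 3 × 3 elevation window. Several things in it are assumptions: the function name, the 30 m cell width, the convention that the first row is the northernmost one, and the choice to report aspect as the down-slope compass direction (which is why both gradients are negated before calling `atan2`).

```python
import math


def slope_and_aspect(window, cell_width):
    """Slope (percent, degrees) and aspect (degrees from North) of the centre cell.

    window is a 3 x 3 list of elevations, first row north; cell_width is in
    the same length unit as the elevations.
    """
    # Normalized x- and y-gradients: gain per cell width divided by cell width.
    dfdx = (window[1][2] - window[1][0]) / 2 / cell_width   # west to east
    dfdy = (window[0][1] - window[2][1]) / 2 / cell_width   # south to north
    tan_alpha = math.hypot(dfdx, dfdy)
    slope_perc = 100 * tan_alpha
    slope_deg = math.degrees(math.atan(tan_alpha))
    if tan_alpha == 0:
        aspect_deg = None  # flat terrain has no aspect
    else:
        # atan2 handles all four quadrants and a zero y-gradient.
        aspect_deg = math.degrees(math.atan2(-dfdx, -dfdy)) % 360
    return slope_perc, slope_deg, aspect_deg


# Terrain rising from 1540 m in the west to 1552 m in the east, 30 m cells.
window = [[1546, 1546, 1546],
          [1540, 1546, 1552],
          [1546, 1546, 1546]]
slope_perc, slope_deg, aspect_deg = slope_and_aspect(window, cell_width=30.0)
```

Here the terrain rises towards the east, so the down-slope direction is due west, an aspect of 270 degrees; a one-argument arctangent would fail on this window because δf/δy is zero.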
6.5 Network analysis
These may involve the splitting of overpassing lines at the intersection vertex
and the creation of four lines out of the two original lines. Without further attention,
the network will then allow one to make a turn onto another line at this
new intersection node, which in reality would be impossible. In some GISs we
can associate a cost with turning at a node (see our discussion on turning costs
below), and that cost, in the case of the overpass, can be made infinite to ensure
the turn is prohibited. But, as mentioned, this is a workaround to fit a non-planar
situation into a data layer that presumes planarity.
The above is a good illustration of geometry not fully determining the network’s
behaviour. Additional application-specific rules are usually required to define
what can and cannot happen in the network. Most GISs provide rule-based
tools that allow the definition of these extra application rules.
Various classical spatial analysis functions on networks are supported by GIS
software packages. The most important ones are:
Optimal path finding
Optimal path finding techniques are used when a least-cost path between two
nodes in a network must be found. The two nodes are called origin and desti-
nation, respectively. The aim is to find a sequence of connected lines to traverse
from the origin to the destination at the lowest possible cost.
The cost function can be simple: for instance, it can be defined as the total length
of all lines on the path. The cost function can also be more elaborate and take into
account not only length of the lines, but also their capacity, maximum transmission
(travel) rate and other line characteristics, for instance to obtain a reasonable
approximation of travel time. There can even be cases in which the nodes visited
add to the cost of the path as well. These may be called turning costs, which are
defined in a separate turning cost table for each node, indicating the cost of turning
at the node when entering from one line and continuing on another. This is
illustrated in Figure 6.27.
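A least-cost path search with turning costs can be sketched as a shortest-path search in which the search state is the pair (node, line by which we arrived), so that the cost of a turn can be looked up. The code below uses Dijkstra's algorithm on an invented four-node network; the data layout, the function name and the example costs are all assumptions made for illustration, not the turning cost table of Figure 6.27.

```python
import heapq
import itertools
import math


def least_cost_path(lines, turn_costs, origin, destination):
    """Return (total cost, list of line ids) of a least-cost path.

    lines: {line_id: (node_u, node_v, cost)}, traversable in both directions.
    turn_costs: {(node, from_line, to_line): cost}; missing entries cost 0,
    and math.inf marks a prohibited turn.
    """
    adjacent = {}
    for line_id, (u, v, cost) in lines.items():
        adjacent.setdefault(u, []).append((line_id, v, cost))
        adjacent.setdefault(v, []).append((line_id, u, cost))
    counter = itertools.count()  # tie-breaker so the heap never compares paths
    queue = [(0.0, next(counter), origin, None, [])]
    done = set()
    while queue:
        cost, _, node, via, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if (node, via) in done:
            continue
        done.add((node, via))
        for line_id, nxt, line_cost in adjacent.get(node, []):
            turn = 0.0 if via is None else turn_costs.get((node, via, line_id), 0.0)
            if math.isinf(turn):
                continue  # prohibited turn
            heapq.heappush(queue, (cost + line_cost + turn, next(counter),
                                   nxt, line_id, path + [line_id]))
    return math.inf, []


# Hypothetical network: two routes from A to D, via B or via C.
lines = {"a": ("A", "B", 2), "b": ("B", "D", 2),
         "c": ("A", "C", 3), "d": ("C", "D", 3)}
without_turns = least_cost_path(lines, {}, "A", "D")
with_turns = least_cost_path(lines, {("B", "a", "b"): 5}, "A", "D")
```

Without turning costs the route via B is cheapest; once turning from line a onto line b at node B costs 5, the longer route via C becomes the least-cost path.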
The attentive reader will notice that it is possible to travel on line b in Figure 6.27,
then take a U-turn at node N, and return along a to where one came from. The
question is whether doing this makes sense in optimal path finding. After all,
to go back to where one comes from will only increase the total cost. In fact,
there are situations where it is optimal to do so. Suppose it is node M that
is connected by line b with node N, and that we actually want to travel from M to
another node L. Turning towards L at M when arriving via another line
may be prohibitively expensive, whereas turning towards L at M when returning to
M along b may not be so expensive.
Problems related to optimal path finding are ordered optimal path finding and
unordered optimal path finding. Both have an extra requirement that a number
of additional nodes needs to be visited along the path. In ordered optimal
path finding, the sequence in which these extra nodes are visited matters; in
unordered optimal path finding it does not. An illustration of both types is provided
in Figure 6.28. Here, a path is found from node A to node D, visiting nodes
B and C. Obviously, the length of the path found under unordered requirements
is at most as long as the one found under ordered requirements. Some
GISs provide support for these more complicated path finding problems.
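The difference between the two problems can be made concrete on a small invented network (not the one of Figure 6.28). The sketch below first computes shortest lengths between all pairs of nodes with the Floyd-Warshall algorithm, then evaluates the ordered route directly and the unordered route by trying every visiting order. Trying all orders is a brute-force assumption that is only practical for a handful of extra nodes; all names are hypothetical.

```python
from itertools import permutations


def shortest_lengths(nodes, lines):
    """All-pairs shortest path lengths (Floyd-Warshall) on an undirected network."""
    dist = {(u, v): (0 if u == v else float("inf")) for u in nodes for v in nodes}
    for u, v, length in lines:
        dist[u, v] = dist[v, u] = min(dist[u, v], length)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist


def route_length(dist, stops):
    """Total length of the route that visits the stops in the given order."""
    return sum(dist[a, b] for a, b in zip(stops, stops[1:]))


def ordered_route(dist, origin, via, destination):
    """Visit the extra nodes in exactly the order given."""
    stops = [origin, *via, destination]
    return route_length(dist, stops), stops


def unordered_route(dist, origin, via, destination):
    """Visit the extra nodes in whichever order gives the shortest route."""
    candidates = ([origin, *order, destination] for order in permutations(via))
    return min((route_length(dist, stops), stops) for stops in candidates)


nodes = ["A", "B", "C", "D"]
lines = [("A", "C", 1), ("C", "B", 1), ("B", "D", 1), ("A", "B", 5), ("C", "D", 5)]
dist = shortest_lengths(nodes, lines)
ordered = ordered_route(dist, "A", ["B", "C"], "D")
unordered = unordered_route(dist, "A", ["B", "C"], "D")
```

In this example the ordered route A, B, C, D has length 5, while the unordered search finds A, C, B, D with length 3, illustrating that the unordered path is never longer than the ordered one.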
[Figure 6.28: Ordered (a) and unordered (b) optimal path finding.]
Network partitioning
In network partitioning, the purpose is to assign lines and/or nodes of the network,
in a mutually exclusive way, to a number of target locations. Typically,
the target locations play the role of service centre for the network. This may be
any type of service: medical treatment, education, water supply. This type of
network partitioning is known as a network allocation problem.
Another problem is trace analysis. Here, one wants to determine that part of the
network that is upstream (or downstream) from a given target location. Such
problems exist in pollution tracing along river/stream systems, but also in network
failure chasing in energy distribution networks.
• The capacity with which a centre can produce the resources (whether they
are medical operations, school pupil positions, kilowatts, or bottles of milk),
and
• The consumption of the resources, which may vary amongst lines or line segments.
After all, some streets have more accidents, more children who
live there, more industry in high demand of electricity or just more thirsty
workers.
The service area of any centre is a subset of the distribution network, in fact, a
connected part of the network. Various techniques exist to assign network lines,
or their segments, to a centre. In Figure 6.29(a), the green star indicates a pri-
mary school and the GIS has been used to assign streets and street segments
to the closest school within 2 km distance, along the network. Then, using de-
mographic figures of pupils living along the streets, it was determined that too
many potential pupils lived in the area for the school’s capacity. So in part (b),
the already selected part of the network was reduced to accommodate precisely
the school’s pupil capacity for the new year.
known as the trace origin. For a node or line to be conditionally connected, it
means that a path exists from the node/line to the trace origin, and that the
connecting path fulfills the conditions set. What these conditions are depends
on the application, and they may involve direction of the path, capacity, length,
or resource consumption along it. The condition typically is a logical expression,
as we have seen before, for instance:
• The path must be directed from the node/line to the trace origin,
• Its capacity (defined as the minimum capacity of the lines that constitute
the path) must be above a given threshold, and
Tracing is the computation that the GIS performs to find the paths from the trace
origin that obey the tracing conditions. It is a rather useful function for many
network-related problems.
In Figure 6.30 our trace origin is indicated in red. In part (a), the tracing conditions
were set to trace all the way upstream; part (b) traces all the way downstream,
and in part (c) there are no conditions on direction of the path, thereby
tracing all connected lines from the trace origin. More complex conditions are
certainly possible in tracing.
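The three tracing variants can be sketched as a breadth-first search over a directed network in which only the direction condition is applied. The stream network, its node names and the function name below are invented for illustration; conditions on capacity or length would need additional checks that this sketch leaves out.

```python
from collections import deque


def trace(lines, origin, direction="downstream"):
    """Return the set of nodes connected to origin under a direction condition.

    lines is a list of (from_node, to_node) pairs giving the flow direction;
    direction is "upstream", "downstream" or "both".
    """
    neighbours = {}
    for u, v in lines:
        if direction in ("downstream", "both"):
            neighbours.setdefault(u, set()).add(v)
        if direction in ("upstream", "both"):
            neighbours.setdefault(v, set()).add(u)
    reached, queue = {origin}, deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


# Hypothetical stream network: two springs join at j, a tributary t joins at m.
lines = [("s1", "j"), ("s2", "j"), ("j", "m"), ("t", "m"), ("m", "sea")]
upstream = trace(lines, "j", "upstream")
downstream = trace(lines, "j", "downstream")
connected = trace(lines, "j", "both")
```

Note that the tributary t is reached only when no direction condition is set: it is neither upstream nor downstream of the trace origin j.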
6.6 GIS and application models
We have discussed the notion that real world processes are often highly complex.
Models are simplified abstractions of reality representing or describing its most
important elements and their interactions. Modelling and GIS are more or less
inseparable, as GIS is itself a tool for modelling ‘the real world’ (or at least some
part of it).
The solution to a (spatial) problem usually depends on a (large) number of pa-
rameters. Since these parameters are often interrelated, their interaction is made
more precise in an application model.
Here we define application models to include any kind of GIS based model (in-
cluding so-called analytical and process models) for a specific real-world appli-
cation. Such a model, in one way or other, describes as faithfully as possible
how the relevant geographic phenomena behave, and it does so in terms of the
parameters.
The nature of application models varies enormously. GIS applications for famine
relief programs, for instance, are very different from earthquake risk assessment
applications, though both can make use of GIS to derive a solution. Many kinds
of application models exist, and they can be classified in many different ways.
Here we identify five characteristics of GIS-based application models:
4. Its dimensionality - i.e. whether the model includes spatial, temporal or
spatial and temporal dimensions, and
5. Its implementation logic - i.e. the extent to which the model uses existing
knowledge about the implementation context.
It is important to note that the categories above are merely different characteristics
of any given application model. Any model can be described according to these
characteristics. Each is briefly discussed below.
Models for planning and site selection are usually prescriptive, in that they
quantify environmental, economic and social factors to determine ‘best’ or
optimal locations. So-called predictive models focus upon the “what is likely to be”
questions, and predict outcomes based upon a set of input conditions. Examples
of predictive models include forecasting models, such as those attempting
to predict landslides or sea-level rise.
include hydrological flow and pollution models, where the ‘effect’ can often be
described by numerical methods and differential equations.
Rule-based models attempt to model processes by using local (spatial) rules.
Cellular Automata (CA) are examples of models in this category. These are often
used to study systems whose overall behaviour is not well understood, but
whose local processes are well known. For example, the characteristics of
neighbourhood cells (such as wind direction and vegetation type) in a
raster-based CA model might be used to model the direction of spread of a fire over
several time steps.
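The fire-spread example can be sketched as a very small cellular automaton. The cell states, the `SPREAD` table and the rule itself are assumptions made purely for illustration: a burning cell burns out after one time step and ignites neighbouring fuel cells downwind and sideways, but never upwind. Real fire models use far richer rules.

```python
# Hypothetical cell states for a raster-based fire-spread CA.
EMPTY, FUEL, BURNING, BURNT = 0, 1, 2, 3

# (row, column) offsets along which fire may move in one time step.
# Assumption: fire spreads downwind and sideways, never upwind.
SPREAD = {
    "E": [(0, 1), (-1, 0), (1, 0)],
    "W": [(0, -1), (-1, 0), (1, 0)],
    "N": [(-1, 0), (0, -1), (0, 1)],
    "S": [(1, 0), (0, -1), (0, 1)],
}


def step(grid, wind):
    """Apply the local rule once to every cell and return the new grid."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy, so all cells update together
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != BURNING:
                continue
            new[r][c] = BURNT  # a burning cell burns out
            for dr, dc in SPREAD[wind]:
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and grid[nr][nc] == FUEL:
                    new[nr][nc] = BURNING  # neighbouring fuel ignites
    return new


# One burning cell on the western edge, one non-flammable cell, wind to the east.
start = [
    [FUEL, FUEL, FUEL, FUEL],
    [BURNING, FUEL, EMPTY, FUEL],
    [FUEL, FUEL, FUEL, FUEL],
]
after_two_steps = step(step(start, "E"), "E")
```

Note that the rule only looks at a cell and its immediate neighbours; the overall shape of the burnt area emerges from repeated application over several time steps.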
Agent-based models (ABM) attempt to model movement and development of mul-
tiple interacting agents (which might represent individuals), often using sets of
decision-rules about what the agent can and cannot do. Complex agent-based
models, which also incorporate stochastic components, have been developed to
understand aspects of travel behaviour and crowd interactions.
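A toy agent-based model illustrates the combination of decision rules and a stochastic component. Everything in this sketch is hypothetical (the grid, the `simulate` function, the rules): an agent may stay where it is or move to one of its four neighbouring cells, but only if that cell lies inside the grid, is not blocked, and is not occupied by another agent. Among the allowed options it chooses at random.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # four-neighbour moves


def simulate(agents, blocked, size, steps, seed=0):
    """Move agents on a size x size grid and return their final positions.

    agents  : list of distinct (x, y) start cells
    blocked : set of (x, y) cells that no agent may enter
    """
    rng = random.Random(seed)  # seeded, so a run can be repeated
    positions = list(agents)
    occupied = set(positions)
    for _ in range(steps):
        for i, (x, y) in enumerate(positions):
            options = [(x, y)]  # staying put is always allowed
            for dx, dy in MOVES:
                cell = (x + dx, y + dy)
                inside = 0 <= cell[0] < size and 0 <= cell[1] < size
                # Decision rules: stay on the grid, avoid obstacles,
                # and never step onto another agent.
                if inside and cell not in blocked and cell not in occupied:
                    options.append(cell)
            choice = rng.choice(options)  # stochastic component
            occupied.discard((x, y))
            occupied.add(choice)
            positions[i] = choice
    return positions
```

The agents interact only by getting in each other's way, which is about the simplest form of interaction one can model; travel-behaviour and crowd models replace the random choice with much more elaborate decision rules.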
Scale refers to whether the components of the model are individual or aggre-
gate in nature. Essentially this refers to the ‘level’ at which the model operates.
Individual-based models are based on individual entities, such as the agent-based
models described above, whereas aggregate models deal with ‘grouped’ data,
such as population census data. Aggregate models may operate on data at the
level of a city block (for example, using population census data for particular
social groups), at the regional, or even at a global scale.
operate in some geographically defined space. Some models are aspatial, mean-
ing they have no direct spatial reference.
Models can also be static, meaning they do not incorporate a notion of time or
change. In dynamic models, time is an essential parameter (see Section 2.5).
Dynamic models include various types of models referred to as process models or
simulations. These types of models aim to generate future scenarios from
existing scenarios, and might include deterministic or stochastic components, or
some kind of local rule (for example, to drive a simulation of urban growth and
spread). The fire spread example given above is a good example of an explicitly
spatial, dynamic model which might incorporate both local rules and stochastic
components.
Implementation logic refers to how the model uses existing theory or knowl-
edge to create new knowledge. Deductive approaches use knowledge of the over-
all situation in order to predict outcome conditions. This includes models that
have some kind of formalized set of criteria, often with known weightings for
the inputs, and existing algorithms are used to derive outcomes. Inductive
approaches, on the other hand, are less straightforward, in that they try to
generalize (often based upon samples of a specific data set) in order to derive more
general models. While an inductive approach is useful if we do not know the
general conditions or rules which apply in a given domain, it is typically a trial-
and-error approach which requires empirical testing to determine the parame-
ters of each input variable.
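The contrast between the two approaches can be made concrete with a small sketch. All names and numbers here are hypothetical. In the deductive case the weightings of two input factors are assumed to be known, and a suitability score follows directly from them. In the inductive case the weightings are unknown and are estimated by trial and error from sample observations, here with a deliberately crude search over candidate weights.

```python
def suitability(factors, weights):
    """Weighted sum of factor scores (a formalized set of criteria)."""
    return sum(w * f for w, f in zip(weights, factors))


# Deductive: the weightings are assumed to be known in advance.
KNOWN_WEIGHTS = (0.7, 0.3)


def fit_weights(samples, step=0.1):
    """Inductive: estimate weights (w, 1 - w) from observed samples.

    samples: list of ((factor_1, factor_2), observed_score) pairs.
    Tries w = 0, step, 2*step, ... 1 and keeps the value with the
    smallest sum of squared errors.
    """
    best_w, best_error = None, float("inf")
    n = round(1 / step)
    for k in range(n + 1):
        w = k / n
        error = sum(
            (suitability(factors, (w, 1 - w)) - observed) ** 2
            for factors, observed in samples
        )
        if error < best_error:
            best_w, best_error = w, error
    return best_w, 1 - best_w


# Hypothetical field observations used for empirical testing.
SAMPLES = [
    ((1.0, 0.0), 0.7),
    ((0.0, 1.0), 0.3),
    ((0.5, 0.5), 0.5),
    ((0.2, 0.8), 0.38),
]
estimated = fit_weights(SAMPLES)
```

Because the sample scores were constructed from the weights 0.7 and 0.3, the search recovers them (up to floating-point rounding); with real observations the fit would be approximate and would need to be validated against independent data.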
Most GIS only come equipped with a limited range of tools for modelling. For
complex models, or functions which are not natively supported in our GIS,
external software environments are frequently used. In some cases, GIS and models
can be fully integrated (known as embedded coupling) or linked through data and
interface (known as tight coupling). If neither of these is possible, the external
model might be run independently of our GIS, and the output exported from
our model into the GIS for further analysis and visualization. This is known as
loose coupling.
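Loose coupling usually comes down to agreeing on an exchange file. As a sketch, the functions below write a model's raster output in the Esri ASCII grid layout, a plain-text format that many GIS packages can import, and read it back. The choice of this format, the function names and the use of `None` for missing cells are assumptions made for illustration; an in-memory buffer stands in for the file on disk.

```python
import io


def write_ascii_grid(rows, stream, xll=0.0, yll=0.0, cellsize=1.0, nodata=-9999):
    """Write a list of rows (top row first) as an Esri ASCII grid."""
    stream.write(f"ncols {len(rows[0])}\n")
    stream.write(f"nrows {len(rows)}\n")
    stream.write(f"xllcorner {xll}\n")
    stream.write(f"yllcorner {yll}\n")
    stream.write(f"cellsize {cellsize}\n")
    stream.write(f"NODATA_value {nodata}\n")
    for row in rows:
        values = (str(nodata if v is None else v) for v in row)
        stream.write(" ".join(values) + "\n")


def read_ascii_grid(stream, header_lines=6):
    """Read the grid back; returns (header dict, list of rows)."""
    header = {}
    for _ in range(header_lines):
        key, value = stream.readline().split()
        header[key.lower()] = float(value)
    nodata = header["nodata_value"]
    rows = []
    for line in stream:
        if line.strip():
            cells = [float(token) for token in line.split()]
            rows.append([None if v == nodata else v for v in cells])
    return header, rows


# The external model writes its output ...
model_output = [[1.5, 2.0], [None, 4.0]]
exchange = io.StringIO()
write_ascii_grid(model_output, exchange, cellsize=10.0)

# ... and the GIS side reads it independently.
exchange.seek(0)
header, grid = read_ascii_grid(exchange)
```

The model and the GIS share nothing except the file, which is what makes the coupling loose: either side can be replaced as long as the exchange format stays the same.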
It is important to compare our model results with previous experiments and to
examine the possible causes of inconsistency between the output of our models
and the expected results. The following section discusses these aspects further.
6.7 Error propagation in spatial data processing
6.7.1 How errors propagate
[Figure 6.31: Error propagation in spatial data handling. Diagram labels:
spatial data sets; spatial data analysis & modelling; produced geoinformation;
planning & management; human decision-making; process error; use error.]
the source data and errors arising from some form of computer processing, for
example, rasterization. During the process of spatial overlay, all the errors in the
individual data layers contribute to the final error of the output. The amount of
error in the output depends on the type of overlay operation applied. For exam-
ple, errors in the results of overlay using the logical operator AND are not the
same as those created using the OR operator.
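A small simulation can make the difference between AND and OR overlays tangible. The sketch below rests on strong simplifying assumptions that are ours, not the text's: two Boolean layers, each cell true with probability `p_true`, each cell independently misclassified with probability `p_flip`, and no spatial correlation in the errors. It counts how often the overlay of the erroneous layers disagrees with the overlay of the true layers.

```python
import random


def overlay_error_rates(n_cells=100_000, p_true=0.2, p_flip=0.1, seed=1):
    """Return (error rate of AND overlay, error rate of OR overlay)."""
    rng = random.Random(seed)
    and_errors = 0
    or_errors = 0
    for _ in range(n_cells):
        # True cell values in the two input layers.
        a = rng.random() < p_true
        b = rng.random() < p_true
        # Observed values: each is flipped with probability p_flip.
        a_obs = a != (rng.random() < p_flip)
        b_obs = b != (rng.random() < p_flip)
        # Compare the overlay of observed layers with the true overlay.
        and_errors += (a_obs and b_obs) != (a and b)
        or_errors += (a_obs or b_obs) != (a or b)
    return and_errors / n_cells, or_errors / n_cells


and_rate, or_rate = overlay_error_rates()
```

With these particular settings the expected error rates work out to roughly 4% for AND and 15% for OR, even though both input layers have the same 10% error. When true cells are rare, an OR result is wrong whenever either layer reports a false positive, whereas an AND result requires both layers to agree. Changing `p_true` shifts the balance, which is why the error of an overlay cannot be judged from the input errors alone.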
Table 6.2 lists common sources of error introduced into GIS analyses. Note that
these are from a wide range of sources, and include various common tasks relat-
ing to both data preparation and data analysis. It is the combination of different
errors that are generated at each stage of preparation and analysis which may
bring about various errors and uncertainties in the eventual outputs.
Consider another example. A land use planning agency is faced with the prob-
lem of identifying areas of agricultural land that are highly susceptible to ero-
sion. Such areas occur on steep slopes in areas of high rainfall. The spatial data
used in a GIS to obtain this information might include:
• A land use map produced five years previously from 1:25,000 scale aerial
photographs,
The reader is invited to assess what sort of errors are likely to occur in this anal-
ysis.
Referring back to Figure 6.31, the reader is also encouraged to reflect on errors
introduced in components of application models discussed in the previous section,
specifically in the methodological aspects of representing geographic phenomena.
What might be the consequences of using a random function in an urban trans-
portation model (when, in fact, travel behaviour is not purely random)?
Table 6.2: Some of the most common causes of error in spatial data handling.
Source: Hunter & Beard [23].

Coordinate adjustments                  Generalization
  rubber sheeting/transformations         linear alignment
  projection changes                      line simplification
  datum conversions                       addition/deletion of vertices
  rescaling                               linear displacement

Feature editing                         Raster/vector conversions
  line snapping                           raster cells to polygons
  extension of lines to intersection      polygons to raster cells
  reshaping                               assignment of point attributes to raster cells
  moving/copying
  elimination of spurious polygons        post-scanner line thinning

Attribute editing                       Data input and management
  numeric calculation and change          digitizing
  text value changes/substitution         scanning
  re-definition of attributes             topological construction / spatial indexing
  attribute value update                  dissolving polygons with same attributes

Boolean operations                      Surface modelling
  polygon on polygon                      contour/lattice generation
  polygon on line                         TIN formation
  polygon on point                        draping of data sets
  line on line                            cross-section/profile generation
  overlay and erase/update                slope/aspect determination

Display and analysis                    Display and analysis
  cluster analysis                        class intervals choice
  calculation of surface lengths          areal interpolation
  shortest route/path computation         perimeter/area size/volume computation
  buffer creation                         distance computation
  display and query                       spatial statistics
  adjacency/contiguity                    label/text placement
6.7.2 Quantifying error propagation
Chrisman [13] noted that “the ultimate arbiter of cartographic error is the real
world, not a mathematical formulation”. It is an unavoidable fact that we will
never be able to capture and represent everything that happens in the real world
perfectly in a GIS. Hence there is much to recommend the use of testing proce-
dures for accuracy assessment.
Various perspectives, motives and approaches to dealing with uncertainty have
given rise to a wide range of conceptual models and indices for the description
and measurement of error in spatial data. All these approaches have their ori-
gins in academic research and have strong theoretical bases in mathematics and
statistics. Here we identify two main approaches for assessing the nature and
amount of error propagation:
1. Testing the accuracy of each state by measurement against the real world, and
Modelling of error propagation has been defined by Veregin [56] as: “the ap-
plication of formal mathematical models that describe the mechanisms whereby
errors in source data layers are modified by particular data transformation
operations.” In other words, we would like to know how errors in the source data
behave under manipulations that we subject them to in a GIS. If we are able
to quantify the error in the source data as well as their behaviour under GIS
manipulations, we have a means of judging the uncertainty of the results.
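As a simple illustration of what such a model looks like, consider a cell-by-cell sum of two numerical raster layers. The symbols are assumptions introduced here: x and y are the input cell values, with error standard deviations \sigma_x and \sigma_y and error correlation \rho, and z is the output value. Standard variance propagation then gives the first expression below; for a general differentiable operation z = f(x, y), a first-order (Taylor) approximation gives the second.

```latex
% Sum of two layers: z = x + y
\sigma_z^2 = \sigma_x^2 + \sigma_y^2 + 2\rho\,\sigma_x\sigma_y

% General operation z = f(x, y), first-order approximation
\sigma_z^2 \approx
  \left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2
+ \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2
+ 2\rho\,\frac{\partial f}{\partial x}\frac{\partial f}{\partial y}\,\sigma_x\sigma_y
```

If the errors in the two layers are uncorrelated (\rho = 0), the variance of the sum is simply the sum of the input variances, so the output is always less certain than either input.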
Error propagation models are very complex and valid only for certain data types
(e.g. numerical attributes). Initially, they described only the propagation of at-
tribute error [21, 56]. More recent research has addressed the spatial aspects of
error propagation and the development of models incorporating both attribute
and locational components. These topics are outside the scope of this book, and
readers are referred to [2, 27] for more detailed discussions. Rather than
explicitly modelling error propagation, it is often more practical to test the results of each
step in the process against some independently measured reference data.
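One common way to carry out such a test for numerical values is the root mean square error (RMSE) between the output of a processing step and the reference measurements at the same locations. The sketch below is a minimal illustration; the function name and the sample values are hypothetical, and RMSE is only one of several possible accuracy measures.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equally long value sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical elevations (m) at four check points: values produced by
# two successive processing steps, and independently surveyed values.
reference = [102.0, 98.5, 110.2, 95.0]
after_interpolation = [101.5, 99.0, 110.0, 95.5]
after_resampling = [100.0, 100.5, 108.0, 97.0]

error_by_step = {
    "interpolation": rmse(after_interpolation, reference),
    "resampling": rmse(after_resampling, reference),
}
```

Comparing the values in `error_by_step` shows at which step the error grows most, without requiring a formal model of how the error propagated.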
Summary
This chapter has examined various ways of manipulating both raster and vector
based spatial data sets. It is certainly true that some types of manipulations are
better accommodated in one, and not so well in the other. Usually, one chooses
the format to work with on the basis of many more parameters, including the
availability of source data.
We have identified several classes of data manipulations or functions. The first
of these does not generate new spatial data, but rather extracts—i.e. ‘makes
visible’—information from existing data sets. Amongst these are the measure-
ment functions. These allow us to determine scalar values such as length, dis-
tance, and area size of selected features. Spatial selections allow us to selectively
identify features on the basis of conditions, which may be spatial in character.
A second class of spatial data manipulations generates new spatial data sets.
Classification functions assign a new characteristic value to each feature in a set
of (previously selected) features. Spatial overlay functions go a step further and
combine two spatial data sets by location. What is produced as an output spa-
tial data set depends on user requirements, and the data format with which one
works. Most of the vector spatial overlays are based on polygon/polygon inter-
section, or polygon/line intersections. In the raster domain, we have seen the
powerful tool of raster calculus, which allows all sorts of spatial overlay condi-
tions and output expressions, all based on cell by cell comparisons and compu-
tations.
Going beyond spatial overlays are the neighbourhood functions. Their principle
is not ‘equal location comparison’ but they instead focus on the definition of the
vicinity of one or more features. This is useful for applications that attempt to
assess the effect of some phenomenon on its environment. The simplest neigh-
bourhood functions are insensitive to direction, i.e. will deal with all directions
equally. Good examples are buffer computations on vector data. More advanced
neighbourhood functions take into account local context, and therefore are sen-
sitive to direction. Since such local factors are more easily represented in raster
data, this is then the preferred format. Flow and diffusion functions are exam-
ples.
We also looked at a special type of spatial data, namely (line) networks, and the
functions that are used on these. Optimal path finding is one such function, use-
ful in routing problems. The use of this function can be constrained or uncon-
strained. Another function often used on networks is network partitioning: how
to assign respective parts of the network to resource locations.
Various combinations of the analytical functions discussed above can be used
in an application model to simulate a given geographical process or phenomenon.
The output generated by these models can then be used in various ways, includ-
ing decision support and planning. Many different kinds of models exist, and
the type of model used will depend on the process or phenomenon under study,
the nature of the data, and the type of output desired from the model.
The final section of this chapter discussed the issue of error propagation. It was
noted that at each stage of working with spatial data, errors can be introduced
which can propagate through the different operations. These errors can range
from simple mistakes in data entry through to inappropriate estimation tech-
niques or functions in operational models, and can serve to degrade the ‘end
result’ of our analyses significantly.
Questions
2. On page 352, we mentioned that two polygons can only intersect when
their minimal bounding boxes overlap. Provide a counter-example of the
inverted statement, in other words, show that if their minimal bounding
boxes overlap, the two polygons may still not intersect (or meet, or have
one contained in the other).
5. Observe that the equal frequency technique applied on the raster of Figure 6.11
does not really produce categories with equal frequencies. Explain why
this is. Would we expect a better result if our raster had been 5,000 × 5,000
cells?
6. When discussing vector overlay operators, we observed that the one fun-
damental operator was polygon intersection, and that other operators were
expressible in terms of it. The example we gave showed this for poly-
gon overwrite. Draw up a series of sketches that illustrates the procedure.
Then, devise a technique of how polygon clipping can be expressed and
illustrate this too.
8. In Figure 6.22(b), each cell was assigned the minimum total resistance of
a path from the source location to that cell. Verify the two values of 14.50
and 14.95 of the top left cells by doing the necessary computations.