Advanced GIS - Review
ADVANCE GIS
Chairman
Vice Chancellor
Uttarakhand Open University, Haldwani
Convener
Professor P.D. Pant
School of Earth and Environment Science
Uttarakhand Open University, Haldwani
Programme Coordinator
Dr. Ranju J. Pandey
Department of Geography & NRM
Department of Remote Sensing and GIS
School of Earth and Environment Science
Uttarakhand Open University, Haldwani
GIS-506/DGIS-506
Course Editor
Dr. Ranju J. Pandey
Department of Geography & NRM
Department of Remote Sensing and GIS
School of Earth and Environment Science
Uttarakhand Open University, Haldwani
CONTENTS
1.1 OBJECTIVES
1.2 INTRODUCTION
1.3 GIS DATABASE
1.4 SUMMARY
1.5 GLOSSARY
1.6 ANSWER TO CHECK YOUR PROGRESS
1.7 REFERENCES
1.8 TERMINAL QUESTIONS
1.1 OBJECTIVES
After reading this chapter, the student will understand:
The various GIS or spatial data types, such as raster, vector, and attribute (non-spatial) data;
The topological relationships of vector data, which are very important for minimizing
errors in a GIS database;
The concept of layers and coverage in GIS.
1.2 INTRODUCTION
A geographic information system (GIS), also called a geospatial information system or geomatics,
is an integration of tools that captures, stores, analyzes, manages, and presents data related to
location(s). In the simplest terms, GIS is the merging of cartography, statistical analysis, and
database systems with information technology. GIS is used in cartography, remote
sensing, land surveying, public utility management, natural resource management, precision
agriculture, photogrammetry, geography, urban planning, emergency management, navigation,
aerial video, localized search engines, and many more areas. Therefore, in a general sense,
the term describes any information system that integrates, stores, edits, analyzes, shares, and
displays geographic information for informing decision making. GIS applications are tools that
allow users to create interactive queries (user-created searches), analyze spatial information, edit
data and maps, and present the results of all these operations. Geographic information science is
the science underlying the geographic concepts, applications and systems.
Although the above definitions cover a wide range of subjects and activities that refer to
geographical information, GIS is sometimes also termed a Spatial Information System, as it deals
with located data for objects positioned in any space, not just geographical space (a term for
world space). Similarly, the term 'spatial data' is often used to refer to location data in space
and time. The discipline that deals with all aspects of spatial data handling is known as
geospatial science, geomatics, or geoinformatics.
During the initial phases of its development, GIS was used extensively for data conversion
and digitization of paper maps, and for storing and generating map prints, with little focus on
spatial analysis. Over time the scenario has changed drastically, and spatial
analysis has taken the pivotal role in much location-specific planning and decision making. GIS
also facilitates modeling to arrive at locale-specific solutions by integrating spatial and
non-spatial data such as thematic layers and socio-economic data. With the simultaneous
development of communication networks, data storage boundaries have been erased and new areas
such as collaborative mapping and web map services have developed. Present GIS
technology enables users to ‘map anywhere and serve anytime’. Recent developments have brought a
leap in spatial analysis tools and logical processing methods. This has enabled the
development of numerous spatial algorithms, spatial modeling techniques, and better display and
visualization of data.
Modern GIS technologies use digital information, for which various digitized data creation
methods are used. The most common method of data creation is digitization, where a hard copy
map or survey plan is transferred into a digital medium through the use of a computer-aided
design (CAD) program and geo-referencing capabilities. With the availability of ortho-rectified
images from both satellite and aerial sources, on-screen digitization is becoming the main avenue
through which geographic data is extracted. On-screen digitization involves tracing
geographic data directly on top of the images, instead of the traditional method of tracing the
geographic form on a separate digitizing tablet.
Discrete Data:
Discrete data is also known as thematic, categorical, or discontinuous data. Discrete
data often represents objects in both the feature (vector) and raster data storage systems (please
refer to the following section for raster and vector data models). A discrete object has known
and definable boundaries in the geography. For a discrete object it is easy to define precisely
where the object begins and where it ends. Typically, a house is a discrete object within the
surrounding landscape. Its boundaries and corners can be easily marked. Other examples of discrete
objects are lakes, roads, and ward boundaries in a city. In Figure 1.1 the Nainital Lake is shown
as a discrete object.
Continuous Data:
A continuous surface represents a geographic phenomenon in which each location on the surface
has a value, or a relationship to a fixed point in space or to an emitting source.
Continuous data is also referred to as field, non-discrete, or surface data. One type of continuous
surface is derived from those characteristics that define a surface, in which each location is
measured from a fixed registration point. These include elevation (the fixed point being sea
level) and aspect (the fixed point being direction: north, east, south, and west). Another type of
continuous surface includes phenomena that progressively vary as they move across a surface.
The most suitable examples of progressively varying continuous data are fluid and air movement.
These surfaces are characterized by the type or manner in which the phenomenon moves.
Continuous and categorical data in a map representation are shown in Figure 1.2.
Traditionally, two broad methods are used to store data in a GIS for both kinds of
abstraction: raster images and vector. Points, lines, and polygons are the
basic elements of mapped location attribute references. A newer hybrid method of storing data is
the point cloud, which combines three-dimensional points with red, green, and blue
(RGB) information at each point, returning a "3D color image". GIS thematic maps are thus
becoming more and more realistic in visually describing what they set out to show or
determine.
of a particular wavelength of light. In GIS, one set of cells and their associated values is known
as a LAYER. Raster models are simple, and spatial analysis with them is easier and faster.
Raster images consist of rows and columns of cells, with each cell storing a single value. Raster
data can be images (raster images) with each pixel (or cell) containing a color value. Additional
values recorded for each cell may be a discrete value, such as land use, a continuous value, such
as temperature, or a null value if no data is available. While a raster cell stores a single value, it
can be extended by using raster bands to represent RGB (red, green, blue) colors, color maps (a
mapping between a thematic code and an RGB value), or an extended attribute table with one row
for each unique cell value. The resolution of a raster data set is its cell width in ground units.
Raster data is stored in various formats, from standard file-based structures such as TIF and JPEG
to binary large object (BLOB) data stored directly in a relational database management system
(RDBMS), similar to vector-based feature classes. Database storage, when properly
indexed, typically allows for quicker retrieval of the raster data but can require storage of
millions of significantly sized records. A sample raster representation is shown in Figure 1.3.
Figure 1.3- A simple raster image of a 10 x 10 array, and a sample satellite image (LISS IV) in raster form
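The cell-based structure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the storage format of any GIS package: the land-use codes, the colors in `color_map`, and the helper `cell_color` are assumptions made for this example.

```python
# A tiny single-band raster stored as rows and columns of cell values.
# The land-use codes (1, 2, 3) are hypothetical; None marks a null (no data) cell.
NODATA = None

land_use = [
    [1, 1, 2],
    [1, 3, 2],
    [NODATA, 3, 3],
]

# A color map: a mapping between a thematic code and an RGB value (hypothetical colors).
color_map = {
    1: (34, 139, 34),   # code 1 drawn in green
    2: (0, 0, 255),     # code 2 drawn in blue
    3: (255, 255, 0),   # code 3 drawn in yellow
}


def cell_color(raster, row, col):
    """Return the RGB color of the cell at (row, col), or None for a null cell."""
    value = raster[row][col]
    if value is NODATA:
        return None
    return color_map[value]


print(cell_color(land_use, 0, 2))  # color of the cell in row 0, column 2
print(cell_color(land_use, 2, 0))  # a null cell has no color
```

Each cell still stores a single value; the color map only controls how that value is displayed.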
The raster data model consists of a uniform series of square pixels and is referred to as a
grid-based system. Typically, a single data value is assigned to each grid location. Each cell in
a raster carries a single value, which represents the characteristic of the spatial phenomenon at
a location denoted by its row and column, and is known as the digital number or pixel value. The
data type for that cell value can be either integer or floating-point. Advances in database
management systems allow multiple attribute tables to be linked with raster graphics.
The raster data model averages all values within a given pixel to produce a single value for the
region. Therefore, the more area covered per pixel, the less accurate the associated data values.
The area covered by each pixel determines the spatial resolution of the raster model from which
it is derived. Specifically, resolution is determined by measuring one side of the square pixel. A
raster model with pixels representing 10 m by 10 m (or 100 square meters) in the real world
would be said to have a spatial resolution of 10 m; a raster model with pixels measuring 1 km by
1 km (1 square kilometer) in the real world would be said to have a spatial resolution of 1 km;
and so forth.
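The relationship between cell size, spatial resolution, and ground area can be sketched as follows. The class `RasterGrid` and its field names are hypothetical and chosen only for this illustration; the upper-left origin convention is also an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RasterGrid:
    """Hypothetical minimal description of a raster grid (not from any GIS library)."""
    origin_x: float   # x coordinate of the upper-left corner, in ground units
    origin_y: float   # y coordinate of the upper-left corner, in ground units
    cell_size: float  # length of one side of a square cell = spatial resolution
    n_rows: int
    n_cols: int

    def cell_area(self) -> float:
        """Ground area covered by one cell, in square ground units."""
        return self.cell_size ** 2

    def cell_center(self, row: int, col: int) -> Tuple[float, float]:
        """Ground coordinates of the center of the cell at (row, col)."""
        x = self.origin_x + (col + 0.5) * self.cell_size
        y = self.origin_y - (row + 0.5) * self.cell_size  # rows count downward
        return (x, y)


# A 10 x 10 grid with a spatial resolution of 10 m.
grid = RasterGrid(origin_x=0.0, origin_y=100.0, cell_size=10.0, n_rows=10, n_cols=10)
print(grid.cell_area())        # 100.0 square meters per cell
print(grid.cell_center(0, 0))  # (5.0, 95.0)
```

Doubling `cell_size` quadruples the area averaged into each cell, which is why coarser resolution gives less accurate values.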
Figure 1.4- Vector points defined by X and Y coordinate values
Points are the most basic geometric type, having no length or area. However, the symbols used to
represent point features on a map have both area and shape (e.g. circle, square, plus signs). We
seem capable of interpreting such symbols as points, but there may be instances when such
interpretation may be ambiguous (e.g. is a round symbol delineating the area of a round feature
on the ground, such as a large oil storage tank, or is it representing the point location of that
tank?).
Lines or Polylines: One-dimensional lines or polylines are used for linear features such as rivers,
roads, railroads, trails, and topographic lines. As with point features, linear features
displayed at a small scale will be represented as lines rather than as polygons. Lines
are composed of multiple, explicitly connected points. Lines have the property of length. Lines
that directly connect two nodes are sometimes referred to as chains, edges, segments, or arcs. For
a line feature, the measurement of distance is possible but area calculation is not. A polyline is
composed of a sequence of two or more coordinate pairs called vertices. A vertex is defined by
a coordinate pair just like a point, but what differentiates a vertex from a point is its explicitly
defined relationship with neighboring vertices. A vertex is connected to at least one other vertex.
Like a point, a true line cannot be seen since it has no area. And like a point, a line is
symbolized using shapes that have a color, width and style (e.g. solid, dashed, dotted, etc.). A
sample coordinate representation of a line feature is shown in Figure 1.5.
Polygons: Polygons are two-dimensional geometries used for geographical features that cover a
particular area of the earth's surface. Such features may include lakes, park boundaries,
buildings, city boundaries, or land uses.
A polygon is composed of one or more lines whose starting and ending coordinate pairs are the
same. Polygons have topological relations such as inside and outside; in fact, the area that a
polygon encloses is explicitly defined and, in some GIS software, calculated automatically. If you
are working with a feature that looks like a closed area but for which area calculation is not
possible, then it is certainly a polyline. If this does not seem intuitive, think of three
connected lines defining a triangle: they can represent three connected road segments (thus
polyline features), or the grassy strip enclosed by the connected roads (in which case an ‘inside’
is implied, thus defining a polygon). A sample polygon representation in X, Y coordinates is shown
in Figure 1.6.
Figure 1.6- A simple polygon object defined by an area enclosed by connected vertices
Of the three geometry types, polygons convey the most information. For polygon features, both
perimeter and area can be measured. Each of these geometries is linked to a row in a database that
describes its attributes. For example, a database that describes lakes may contain a lake's
depth, water quality, and pollution level. This information can be used to make a map that
describes a particular attribute of the dataset.
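The triangle example above, and the length and area measurements available for each geometry type, can be sketched with plain coordinate lists. The function names and the sample coordinates are assumptions made for this illustration; the area is computed with the standard shoelace formula, which is one common way (not necessarily the one a given GIS uses) to obtain the area of a simple polygon from its vertices.

```python
import math


def polyline_length(vertices):
    """Sum of the straight-line distances between consecutive vertices."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:])
    )


def polygon_area(ring):
    """Area of a simple polygon whose first and last vertices are identical
    (shoelace formula)."""
    if ring[0] != ring[-1]:
        raise ValueError("a polygon ring must start and end at the same vertex")
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2.0


# The same four coordinate pairs, read in two different ways.
triangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 0.0)]

# Read as three connected road segments (a polyline): only length is meaningful.
print(polyline_length(triangle))  # 12.0

# Read as the grassy strip enclosed by the roads (a polygon): area is defined too.
print(polygon_area(triangle))     # 6.0
```

The coordinates are identical in both readings; what changes is whether an 'inside' is implied, which is exactly the polyline versus polygon distinction.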
The raster model has evolved to model such continuous features. A raster image comprises a
collection of grid cells rather like a scanned map or picture. Both the vector and raster models for
storing geographic data have unique advantages and disadvantages. Modern GIS packages are
able to handle both models. The representation of the same feature in vector geometry and raster
cells is shown in Figure 1.7.
Figure 1.7- Vector and Raster representation of Point and line features
Vector features can be made to respect spatial integrity through the application of topology rules
such as 'polygons must not overlap'. Vector data can also be used to represent continuously
varying phenomena. Contour lines and triangulated irregular networks (TIN) are used to
represent elevation or other continuously changing values. TINs record values at point locations,
which are connected by lines to form an irregular mesh of triangles.
Topology:
Topology is the mathematical relationship between earth objects. It is the method used to
structure data based on the principles of feature adjacency and feature connectivity. It is in
fact the mathematical method used to define spatial relationships. Without a topological data
structure in a vector-based GIS, most data manipulation and analysis functions would not be
practical or feasible.
An important aspect of vector-based models is that they enable individual
components to be isolated for the purpose of carrying out measurements of, for example, area
and length, and for determining the spatial relationships between the components. Spatial
relationships of connectivity and adjacency are examples of topological relationships, and a GIS
spatial model in which these relationships are explicitly recorded is described as topologically
structured. In a fully topologically structured data set, wherever lines or areas cross each other,
nodes will be created at the intersections and new areal subdivisions defined. In two dimensions,
this may be regarded as part of the process of planar enforcement. In GIS,
topology is implemented through the data structure.
Topological structure is important in keeping track of the components of complex objects
and in determining the spatial relationships of connectivity and adjacency between recorded
phenomena. Thus if two lines cross each other they will share a common node. If two areas are
adjacent to each other, such as two neighboring counties, they will share a common boundary
arc. If the boundary of a county coincides with the path of a river they might also share the same
arc. The inclusion of one area in another, such as a specific type of forest within a county, will
result in their sharing common polygons. The presence of these various spatial relationships can
be determined by relatively simple comparisons of the identifiers of their topological
components, rather than requiring possibly computationally demanding geometric calculations
based on coordinates. It may also be noted that because shared spatial objects are only stored
once, though perhaps referenced many times, storage space is saved by avoiding duplication of
the same geometric data. This in turn assists in the maintenance of the integrity of the database
by avoiding the possibility of two different versions of the same geometric components. The
topology of the tourist destinations and road network on a tourist map of Almora is shown in Figure 1.8.
Figure 1.8- Topology of tourist destinations and Road network of Almora (Map source-
http://www.uttarakhand-tourism.com/)
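The idea that adjacency and connectivity can be found by comparing identifiers, rather than by geometric calculation, can be sketched with a small arc table. The table layout (end nodes plus left and right polygons for each arc), the identifiers, and the function names are assumptions made for this illustration, not the format of any particular GIS.

```python
# Hypothetical arc table: each arc records its end nodes and the polygons on its
# left and right sides. None means the arc borders the outside of the mapped area.
arcs = [
    {"id": "a1", "from_node": "n1", "to_node": "n2", "left": "County A", "right": "County B"},
    {"id": "a2", "from_node": "n2", "to_node": "n3", "left": "County A", "right": None},
    {"id": "a3", "from_node": "n2", "to_node": "n4", "left": "County B", "right": None},
]


def adjacent_polygons(arc_table):
    """Return the pairs of polygons that share at least one boundary arc."""
    pairs = set()
    for arc in arc_table:
        left, right = arc["left"], arc["right"]
        if left is not None and right is not None and left != right:
            pairs.add(frozenset((left, right)))
    return pairs


def arcs_at_node(arc_table, node):
    """Return the ids of the arcs that start or end at the given node."""
    return [
        arc["id"]
        for arc in arc_table
        if node in (arc["from_node"], arc["to_node"])
    ]


print(adjacent_polygons(arcs))   # County A and County B share arc a1
print(arcs_at_node(arcs, "n2"))  # all three arcs meet at node n2
```

No coordinates are needed for either question, and the shared boundary `a1` is stored only once although both counties reference it.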
Topology errors:
There are different types of topological errors, and they can be grouped according to whether the
vector feature types are polygons or polylines. Topological errors with polygon features can
include unclosed polygons, gaps between polygon borders, or overlapping polygon borders. A
common topological error with polyline features is that they do not meet perfectly at a point
(node). This type of error is known as an undershoot if a gap exists between the lines, and
an overshoot if a line ends beyond the line it should connect to. Slivers are created when
polygons are digitized and the vertices of neighboring polygons do not match up on their borders.
Examples of these three topological errors are shown in Figure 1.9.
Figure 1.9- Undershoots (1) occur when digitized vector lines that should connect to each other
don’t quite touch. Overshoots (2) happen if a line ends beyond the line it should connect to.
Slivers (3) occur when the vertices of two polygons do not match up on their borders.
The results of overshoot and undershoot errors are so-called ‘dangling nodes’ at the ends of the
lines. Dangling nodes are acceptable in special cases, for example if they are attached to
dead-end streets.
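A simple way to look for dangling nodes is to count how many line ends meet at each end point. The sketch below is a deliberately simplified illustration: it assumes lines are lists of coordinate pairs, it compares coordinates exactly (real tools use a snapping tolerance), and it only considers line end points, so it assumes nodes already exist wherever lines are meant to join. The function name and sample coordinates are hypothetical.

```python
from collections import Counter


def dangling_nodes(lines):
    """Return the end points that are used by exactly one line end."""
    end_counts = Counter()
    for line in lines:
        end_counts[line[0]] += 1    # start node of the line
        end_counts[line[-1]] += 1   # end node of the line
    return sorted(point for point, count in end_counts.items() if count == 1)


# Four digitized sides of a block. The last side should close the loop at (0, 0)
# but stops short at (0, 0.3): an undershoot.
block = [
    [(0, 0), (10, 0)],
    [(10, 0), (10, 10)],
    [(10, 10), (0, 10)],
    [(0, 10), (0, 0.3)],
]

print(dangling_nodes(block))  # [(0, 0), (0, 0.3)]
```

Both ends of the gap are reported. Whether a reported node is an error or a legitimate dead end still has to be decided by the person editing the data.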
Topological errors break the relationships between features. These errors need to be fixed
in order to analyze vector data with procedures like network analysis (e.g. finding the
best route across a road network) or measurement (e.g. finding out the length of a river). In
addition to topology being useful for network analysis and measurement, there are other reasons
why it is important and useful to create or have vector data with correct topology. Just imagine
that you digitize a municipal boundaries map for your province and the polygons overlap or show
slivers. If such errors were present, you would still be able to use the measurement tools, but the
results would be incorrect. You would not know the correct area of any municipality, and you
would not be able to define exactly where the borders between the municipalities are.
It is important to create and have topologically correct data not only for your own analysis, but
also for the people you pass data on to. They will expect your data and analysis results to
be correct!
Topology rules:
Fortunately, many common errors that can occur when digitizing vector features can be
prevented by topology rules that are implemented in many GIS applications. Except for some
special GIS data formats, topology is usually not enforced by default. Many common GIS
applications, such as QGIS, define topology as relationship rules and let the user choose the
rules, if any, to be implemented in a vector layer. The following list shows some examples of
where topology rules can be defined for real-world features in a vector map:
Area edges of a municipality map must not overlap.
Area edges of a municipality map must not have gaps (slivers).
Polygons showing property boundaries must be closed. Undershoots or overshoots of the
border lines are not allowed.
Contour lines in a vector line layer must not intersect (cross each other).
Vector data is more compatible with relational database environments, where it can be
part of a relational table as a normal column and processed using a multitude of
operators.
Vector file sizes are usually smaller than raster data, which can be 10 to 100 times larger
than vector data (depending on resolution).
Vector data allows much more analysis capability, especially for "networks" such as
roads, power, rail, telecommunications, etc. Examples: Best route, largest port, airfields
connected to two-lane highways. Raster data will not have all the characteristics of the
features it displays (Figure 1.10).
In GIS, additional non-spatial data (sometimes referred to as attribute data) can also be stored
along with the spatial data represented by the coordinates of a vector geometry or the position of
a raster cell. In vector data, the additional data contains attributes of the feature. For
example, a forest inventory polygon may also have an identifier value and information about tree
species. In raster data the cell value can store attribute information, but it can also be used
as an identifier that relates to records in another table.
To understand this better, let us say we have a spatial data model that stores the location of
a Community Service Center (CSC) in your locality. For each CSC, to represent the object, we
would store the location (position) of the CSC. In addition to the positional information, we will
also store attributes that describe the various services available at the CSC. In this example, we
are storing the net banking service, the revenue service (such as checking of land records), and
the generation of certificates as three attributes that describe this particular CSC at this
particular position in your locality. The location, net banking service, revenue service and
generation of certificates will be stored as one row in an attribute table that contains four
columns, because there are four descriptors for this CSC.
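The CSC example can be sketched as one row of an attribute table. The column names and the coordinate values below are hypothetical and serve only to show the structure: one row per CSC, one column per descriptor.

```python
# One row of a hypothetical attribute table for a Community Service Center (CSC).
# The coordinates and service values are made-up illustration data.
csc_row = {
    "location": (79.51, 29.22),      # position of the CSC (x, y), hypothetical
    "net_banking": True,             # net banking service available?
    "revenue_service": True,         # revenue service (e.g. checking land records)?
    "certificate_generation": False  # generation of certificates available?
}

# The attribute table is a collection of such rows, one per CSC.
attribute_table = [csc_row]

# The four descriptors become the four columns of the table.
columns = list(csc_row.keys())
print(columns)

# A simple attribute query: locations of all CSCs that offer net banking.
with_net_banking = [row["location"] for row in attribute_table if row["net_banking"]]
print(with_net_banking)
```

The first column carries the spatial part of the record; the other three are non-spatial attributes attached to that position.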
Attributes can store all kinds of descriptive and statistical information, which can be broken
down into four different categories: nominal, ordinal, interval, and ratio. Nominal attribute
data provides descriptive information about the object, such as the name of an object (for
instance a city name) or the type of an object. What is important here is that this descriptive
information does not imply any order, size, or any other quantitative information. That means
that you cannot state that one attribute is greater than or less than another attribute, and you
cannot multiply attributes together; for instance, it does not make sense to multiply the city
name by the district. The only comparison you can make with nominal attributes is to check whether
two attributes are equal or not equal.
In addition to text descriptions, the nominal attribute category includes descriptive information
such as images, movies, and sounds. What could be an example of this?
The next attribute category is ordinal attribute data, which implies a ranking or order based on
its values. These values can be descriptive text or numbers. For example, we can describe an
object as having a high/medium/low ranking, or a ranking of 100/50/1. In either case, these
ordinal attributes allow us to specify rank only, and not scale. So for instance, we can state that
high is ordered higher than low, that high is ordered higher than medium, and that low is ordered
lower than high, but we cannot say that high is twice as high as medium, or that medium is twice as
high as low. Likewise, if numerical attributes are of the ordinal attribute category,
we can say that 50 is ordered higher than 20 and 20 is ordered higher than 10, but we cannot say
that 50 is twice as high as 25 or that 25 is twice as high as 12 ½. Even though we are using
numbers to describe a rank, do not let that confuse you into thinking that a scale is implied.
The third attribute category is interval attribute data. Interval attributes imply a rank order
and a magnitude or scale. Interval attributes use numbers; however, those numbers do not have a
natural zero, and use an arbitrary zero point instead. For instance, if we look at temperature on
the Fahrenheit scale, 0°F is not a natural zero point for temperature; it is a human-defined zero
point. Therefore, while we can say that 50°F is 10°F more than 40°F, we cannot say that 50°F is
twice as hot as 25°F, because 0°F is a human-created zero and not a natural phenomenon. With
an interval attribute, addition and subtraction make sense but multiplication does not, since
values are relative to that arbitrary zero.
The fourth and final category is ratio attribute data. A ratio attribute implies both rank order
and magnitude about a natural zero. Ratio data, unlike interval attribute data, support
addition, subtraction, multiplication, and division, because there is an absolute natural
zero. For example, if we are measuring speed in kilometers per hour, then a car that is not moving
at all is moving at zero kilometers per hour. In terms of temperature, the commonly used scale with
a natural zero is Kelvin, which starts at absolute zero.
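The difference between interval and ratio data can be checked with the temperature example above. The sketch below uses the standard Fahrenheit to Kelvin conversion; the function name is chosen only for this illustration.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert a temperature from the Fahrenheit scale (interval data, arbitrary
    zero) to the Kelvin scale (ratio data, natural zero)."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


# Differences are meaningful on an interval scale.
difference_f = 50.0 - 40.0
print(difference_f)  # 10.0 degrees Fahrenheit

# The raw Fahrenheit numbers suggest a factor of two...
naive_ratio = 50.0 / 25.0
print(naive_ratio)   # 2.0

# ...but on the Kelvin scale, where zero is natural, the ratio is close to 1.
true_ratio = fahrenheit_to_kelvin(50.0) / fahrenheit_to_kelvin(25.0)
print(round(true_ratio, 3))
```

The factor of two is an artifact of where the Fahrenheit zero happens to sit, which is why multiplication and division are reserved for ratio data.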
Now that you know the four different attribute categories, let us take a look at an example data
set and its related attribute table, and try to identify each column as holding nominal, ordinal,
interval, or ratio data.
Let us finish by talking about attribute data types. Computers fundamentally “think” differently
from humans. While humans see numbers, letters, pictures, and sounds, a computer only sees zeros
and ones, or binary data. Therefore, we need a way to translate the numbers, sounds, and
videos, as humans know them, into a form that a computer can understand and store.
Computer scientists have created data structures, called data types, that we can use to translate
information into a format which the computer can store in its memory. There
are four typical data types that we use in GIS: integer, float/real, text/string, and date. It is
important to specify which data type we are going to use to store information in the
computer’s memory, so that we use the memory in the most efficient manner and let the
computer know which operations are allowed for each data value stored in that memory location
using that data type.
The first data type is the integer, which is a whole number, such as the number one, the number
2458, and the number -54. Integers can be used for mathematical calculations; however, any
resulting fraction of a whole number will be rounded, or truncated.
The float, or real, data type holds a decimal number such as the number 1.452, the number
254,783.1, or -845.157. Like the integer data type, the float or real data type can be used for
mathematical calculations. Fractional parts are retained when using float or real
numbers, up to the number of significant digits you have specified.
The text, or string, data type contains characters such as the character “A”, the characters
“GIS”, the characters “House No. 61 Kalidas Road.”, or the number “61”. Even though the text may
contain numbers, it is important to note that they cannot be used for mathematical calculations.
However, strings can be manipulated to find substrings, or to cut strings at given locations.
The last common data type is date. The date data type holds time and date information such as
12/10/2018, or 10/12/18, or December 10, 2018. The date data type cannot be used for general
mathematical calculations; however, it can be used to determine and calculate lengths of time
between two different dates or times. Additionally, the computer stores the date information in
its own internal data structure, but it can be formatted to output the date in many different
ways, as shown in these examples.
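The behavior of the four data types can be tried out in a few lines of Python. This is a sketch using Python's built-in types and its standard `datetime` module; a GIS database has its own field types, which may differ in detail (for example in how integer division is rounded or truncated).

```python
from datetime import date

# Integer: whole numbers; integer division discards the fractional part.
whole_result = 7 // 2
print(whole_result)   # 3

# Float / real: the fractional part is kept.
real_result = 7 / 2
print(real_result)    # 3.5

# Text / string: characters, even when they look like numbers.
address = "House No. 61 Kalidas Road."
house_number_text = "61"
print(address[:5])                 # a substring: "House"
print(house_number_text + "1")     # text is joined, not added: "611"
print(int(house_number_text) + 1)  # converted to an integer first: 62

# Date: stored internally, formatted in different ways on output.
survey_date = date(2018, 12, 10)
later_date = date(2018, 12, 25)
print(survey_date.strftime("%m/%d/%Y"))  # 12/10/2018
print(survey_date.strftime("%d/%m/%y"))  # 10/12/18
print((later_date - survey_date).days)   # length of time between the dates: 15
```

The same date prints as 12/10/2018 or 10/12/18 depending only on the chosen output format, while the stored value is unchanged.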
1.4 SUMMARY
A geographic information system (GIS), or geomatics, is an integration of three major disciplines,
viz. geography, information technology and mathematics. Technically it has emerged as a set of
tools that captures, stores, analyzes, manages, and presents data related to location(s). In the
simplest terms, GIS is the merging of cartography, statistical analysis, and database systems with
information technology. GIS also provides an abstract representation of geographical
features in the computer system for their better understanding and analysis. Geographical
features are represented using GIS data models. Two basic types of data exist in GIS, i.e.
discrete and continuous data. There are two basic data models, viz. the raster and vector data
models, which are used to represent geographical features in GIS. In the raster representation of
geographical features, raster cells or pixels are used as the unit to store information. The
vector data model uses three major geometries, i.e. point, line and polygon, to represent GIS
data. Topology plays a critical role in establishing mathematical relationships between earth
objects in vector data models. While creating GIS data in vector data models, topological errors
must be handled very carefully. The characteristics of geographical features are stored as
attribute data, which is also known as non-spatial data. Non-spatial data is organized and managed
in a database management system using standard data types.
1.5 GLOSSARY
Raster Data- A raster consists of a matrix of cells or pixels organized into rows and
columns (a grid), where each cell contains a value representing information.
Vector Data- Data in a format consisting of points, lines or polygons.
Spatial Data- Comprises the relative geographic information about the earth and its
features.
Non-spatial data- Data that is independent of geographic location.
2.1 OBJECTIVES
2.2 INTRODUCTION
2.3 CHARACTERISTICS OF SPATIAL & NON SPATIAL DATA
2.4 SUMMARY
2.5 GLOSSARY
2.6 ANSWER TO CHECK YOUR PROGRESS
2.7 REFERENCES
2.8 TERMINAL QUESTIONS
2.1 OBJECTIVES
After studying this unit you will be able to:
2.2 INTRODUCTION
Information systems play an important role in any decision making. A common user who wishes
to find the path and direction to a desired destination, a politician concerned with
prioritizing developmental activity in an area, a business community interested in finding the
optimum location for a market, and a city planner who wants to know the areas of population
concentration all rely on a set of information. Every such set of information is concerned
with geographic locations, patterns of change and processes on the surface of the Earth.
Information which pertains to space other than the human body, representing all that surrounds
us, is geographic information. In this unit we are interested in how geographic information
can be described, measured and stored in different forms so that it can support decision
making tools.
The word 'data' is closely associated with the Latin word 'datum', meaning 'something given'.
Technically, 'datum' is singular and 'data' is plural, so one writes 'datum is' and 'data are'.
Spatial data thus relate to the things on which conceptualization, analysis and inference are
based in order to understand real world phenomena.
aspect hence becomes an important aspect to understand the nature of data. The illustration
given in the figure 2.1 can help understand this.
Information can be grouped as Spatial Information when its characteristics describe where
things are, using a location or reference system; the relationships between those locations,
which represent spatial interactions; and the qualitative and quantitative associations that
form the pattern or form of phenomena.
The difference between the terms 'data' and 'information' is more easily understood with the
help of the illustration in figure 2.2. Map A shows the study area with samples, each
representing a single observation/measurement. These points refer to one type of detail, in
this case the vegetation type found in the area. These are referred to as 'data'. Now look at
Map B, where the data have been combined to provide detail about the area by building
vegetation zones; this is 'information'. The zones were identified by plotting lines separating
one vegetation type from another on the basis of the corresponding data. Further, in Map C,
data and information are combined to illustrate the zones of vegetation as Mangrove (Red and
Black Mangrove), Palm (Coconut and Nut) and Citrus (Lime and Orange). This can be both data
and information depending upon the end user.
Types of Data:
All facts and figures collected for specific purposes can be grouped into several categories.
On the basis of the method of collection, data can be grouped as Primary data (data collected
directly from the source) and Secondary data (data which have already been collected and are
made available). Data can also be grouped on the basis of their characteristics as Categorical
data (representing character, like name, gender, language etc.) and Numerical data
(represented by numeric values, like age, population numbers, income etc.). Data and
information related to a specific location on the Earth's surface are generally identified as
Geospatial data and can be grouped into two broad categories for storage, analysis and
manipulation: spatial and non-spatial data (Fig 2.3). These data sets are primarily used to
catalogue and create databases for computer based applications, to be used further for data
processing and analysis.
Geospatial data can be identified with basic geographical structures. They can be represented
in the form of a precise location, the connectedness between locations, or an enclosed section
of locations in space, in reference to any theme of information, with a label added stating
what they are and describing their character. A detailed note on spatial and non-spatial data
follows in the sections below.
Spatial Data:
Spatial Data are data that are connected to a place on the Earth. The dictionary defines
spatial data as data that occupy cartographic (mappable) space and that usually have a specific
location according to some geographic referencing system (latitude/longitude), which enables
them to be located in two-dimensional or three-dimensional space.
Spatial data defined by physical characteristics usually include location and position,
representing a known location on Earth.
Spatial data can simply give an address (precise location) and can give magnitude.
Spatial data are data/information about the location and shape of, and relationships among,
geographic features, generally stored as co-ordinates and topology (spatial proximity of
objects).
ii) Time is an important part of spatial data. The date of the data becomes meaningful when
temporal change is determined.
iii) Spatial data also depict spatial characteristics in the form of the shape of features,
where dimensions like area and perimeter become significant.
iv) Spatial relationships between and among features also become important, where
distance becomes a characteristic. The distance from one feature to another, obtained
through simple measurements, describes proximity, nearness or connectedness in spatial
relations.
Geographic analysis allows us to study and understand real world processes. The method of
representing spatial data is central to its analysis, as it enables the user to adopt models to
analyze, describe and map real world phenomena. Computer based tools like GIS (Geographic
Information System) enhance the process of spatial analysis by combining operations in
meaningful sequences to reveal new or unidentified relationships between datasets, which helps
to better understand real world phenomena. The scope of spatial analysis ranges from simple
queries about spatial phenomena to complex combinations of original and derived data sets.
UNIT 2 - CHARACTERISTICS OF SPATIAL & NON SPATIAL DATA Page 28 of 216
GIS-506/DGIS-506 Advance GIS Uttarakhand Open University
Data Forms - Computer based tools recognize three data forms to represent spatial data, as
shown in Figures 2.4A and 2.4B:
A Point indicates the specific location of a feature. It can also represent non-physical
entities like the address of a location or the point location of an accident. It is shown by a
convenient visual symbol on the map, viz. a dot or an X mark. In terms of dimension, a point
has no real length or width.
A Line depicts a linear feature. It is one-dimensional, meaning 'length only'. It has a
beginning and an end. It can also be seen as a line joining two point locations, e.g. roads,
canals, rivers or administrative boundaries.
A Polygon is a two-dimensional feature and gives spatial information its magnitude. It is an
enclosed area comprising at least three sides. Concepts like area and perimeter become
functional and add detail to its analysis.
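The three data forms can be illustrated with a minimal Python sketch. The co-ordinates below are hypothetical values in arbitrary planar units, and the function names are chosen only for this illustration; the sketch simply shows that a line carries length while a polygon carries perimeter and area.

```python
import math

# Hypothetical coordinates in arbitrary planar units (not taken from the text).
point = (2.0, 3.0)                                           # location only
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]                 # has length
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # has area and perimeter


def line_length(coords):
    """Sum of the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_perimeter(ring):
    """Length of the boundary, including the closing side back to the start."""
    return line_length(ring + [ring[0]])


def polygon_area(ring):
    """Planar area computed with the shoelace formula."""
    closed = ring + [ring[0]]
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(closed, closed[1:]))
    return abs(twice_area) / 2.0


print(line_length(line))           # 5 + 6 = 11.0
print(polygon_perimeter(polygon))  # 4 + 3 + 4 + 3 = 14.0
print(polygon_area(polygon))       # 4 * 3 = 12.0
```

The point has no measure to compute, which reflects its lack of real length or width.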
Conceptual Model - Spatial data are organized and processed within GIS as objects, networks
or fields.
Object based models are preferred for entities having a well defined boundary. Such an entity
can be studied as an individual phenomenon provided it can be separated conceptually from
neighboring phenomena as a discrete entity, e.g. river, building, forest, utility centre, roads
etc. It can also be evaluated as having specific relations with other objects.
Network based models are a subset of object based models, but the emphasis is on the specific
characteristics of interaction within and across multiple objects. The discrete flow of
connectivity is important rather than the shape of the phenomena, e.g. the flow in a gas
pipeline, air traffic routes or sea navigation routes.
Field models emphasize phenomena that vary continuously across some region of space. They may
have a two- or three-dimensional extent, e.g. the extent of air pollution, direction of wind
flow etc.
Data Structure - In order to store and display data in the computer, data structures are
framed and data models are created. Two data models or data structures are adopted for
representing spatial data in GIS: raster data and vector data. Raster and vector data
structures are ways of defining spatial data in the computer.
i) Raster Spatial Data Model – The most commonly adopted data structure is the grid cell
tessellation, which regards space as divided into the cells of a grid. The raster data
structure represents real world phenomena as a matrix of grid cells. Each cell in the
grid carries a value, usually a code number, which refers to a specific attribute
measure, e.g. the specific vegetation type in a forest land use, the amount of rainfall
at a station or its elevation. It should be noted that the single value at a given cell
represents one specific criterion, and the overall representation of the landscape
includes several such codes to represent its varied characteristics.
The raster model also uses a layered approach. Each layer indicates a specific theme, and
the value of an individual cell in a layer represents a category or class within that
theme. Each cell is also known as a pixel (picture element). The size of the pixel,
which follows from the number of matrix divisions of the layer, relative to the size of
the depicted feature of interest determines how effectively the feature is represented.
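A raster layer can be sketched in Python as a matrix of code numbers. The legend, the grid values and the 30 m cell size below are assumptions made only for this illustration; the sketch shows how counting cells per code, together with the pixel size, gives the area of each class in the theme.

```python
from collections import Counter

# Hypothetical legend and cell size for one raster layer (one theme).
LEGEND = {1: "forest", 2: "water", 3: "built-up"}
CELL_SIZE_M = 30  # assumed pixel edge length in metres

# Each row of the matrix is a row of grid cells; each value is a class code.
layer = [
    [1, 1, 2, 2],
    [1, 1, 2, 3],
    [1, 3, 3, 3],
]


def class_areas(grid, cell_size):
    """Return the area covered by each code, in square metres."""
    counts = Counter(code for row in grid for code in row)
    return {code: n * cell_size * cell_size for code, n in counts.items()}


areas = class_areas(layer, CELL_SIZE_M)
for code, area in sorted(areas.items()):
    print(LEGEND[code], area)
```

A smaller cell size would describe the same features with more cells, at the cost of a larger matrix.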
ii) Vector Spatial Data Model – When objects need to be represented as accurately as
possible as they occur in the real world, vector data models are used. Vector features
are defined by 'co-ordinate' points. The term co-ordinate refers to the X-Y plane of
reference in which a position can be defined precisely. In the spherical co-ordinate
system these co-ordinates correspond to longitude and latitude.
The vector data model treats phenomena as sets of spatial entities, each defined
precisely by a set of co-ordinates. A vector point is expressed as a single X-Y
co-ordinate position and is represented by a dot or any other symbol for visual
convenience. A vector line has two nodes, a specific beginning point and a specific
ending point. A straight line has no intermediate vertex, whereas a complex line has
vertices, each with an X-Y co-ordinate pair. When a set of X-Y co-ordinate pairs lies on
a boundary and the beginning and ending nodes are the same point, the line is
self-enclosed and represents a polygon (Figure 2.5).
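The distinction between point, line and polygon described above can be sketched as a small Python function. The function name and the sample co-ordinates are hypothetical; the sketch assumes that a feature is given as an ordered list of X-Y pairs and that a polygon repeats its first node as its last.

```python
def classify_feature(coords):
    """Classify a vector feature from its ordered list of (x, y) pairs."""
    if not coords:
        raise ValueError("a feature needs at least one coordinate pair")
    if len(coords) == 1:
        return "point"
    # A closed ring needs three distinct vertices plus the repeated start node.
    if len(coords) >= 4 and coords[0] == coords[-1]:
        return "polygon"
    return "line"


print(classify_feature([(5, 5)]))                          # point
print(classify_feature([(0, 0), (2, 1), (4, 0)]))          # line with one vertex
print(classify_feature([(0, 0), (4, 0), (4, 3), (0, 0)]))  # polygon
```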
The raster and vector methods of representing spatial data are mutually exclusive structures.
As seen in the figure, spatial data are stored and displayed differently in the two
representations, and the choice of method depends upon the identified real world problem and
the spatial analysis required.
Raster methods
Advantages | Disadvantages
Simple data structure | Volumes of graphic data
Vector methods
Advantages | Disadvantages
Good representation of data structure | Complex data structures
Accurate graphics with network linkages | Simulation is difficult as each unit has a different topological form
Updating and generalization of information is possible | Expensive technology
The problem of choosing between raster and vector data structures disappears once it is
realized that both are valid methods of spatial data representation and that the two structures
are inter-convertible. Still, the uses for which each structure best represents spatial data
can be listed:
Vector data are best suited to soil type, land use and digital terrain mapping.
Network analysis, such as for communication and transport networks, is best represented
by the vector spatial data model.
The raster data structure is chosen for quick map overlay, map combinations and spatial
analysis.
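Why raster data suit quick map overlay can be seen in a short Python sketch: two layers on the same grid are combined cell by cell. The two layers and their meanings (1 = suitable, 0 = unsuitable) are hypothetical and serve only as an illustration.

```python
# Two hypothetical raster layers on the same grid: 1 = suitable, 0 = unsuitable.
slope_ok = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
soil_ok = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]


def overlay_and(a, b):
    """Cell-by-cell overlay: a cell is 1 only where both layers are 1."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("layers must share the same grid dimensions")
    return [[ca & cb for ca, cb in zip(ra, rb)] for ra, rb in zip(a, b)]


suitable = overlay_and(slope_ok, soil_ok)
for row in suitable:
    print(row)
```

Because both layers share one grid, no geometric intersection has to be computed, which is the reason overlay is simple in the raster structure.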
Q. What are different data structures used to represent spatial data in a geographical analysis?
Non-spatial data are also known as attribute data. An attribute is a description of a feature
represented in the spatial data. It does not take geometric considerations into account. There
are many forms of non-spatial data, including text descriptions, numbers indicating quantities
of some sort, codes or short descriptions of character etc.
An illustration helps to explain this. Non-spatial data are generated by asking common generic
questions which are exclusive of spatial information. E.g. in a city where every area is coded
with a ward number, a simple form of non-spatial data would be the answer to a query about
which land use type a ward belongs to, its population and the land value within that land use
type. Non-spatial data are independent of the location based identity of features. In the
example cited above, the descriptions of land use, land value and population are not dependent
on their location identities.
Attribute data/non-spatial data can be explained in terms of their qualitative and quantitative
characteristics.
Qualitative Non Spatial data- The data in this category do not have any numeric description.
They are devoid of any measurement or magnitude. Names, explanations and labels serve as
descriptions, and letter or number codes are proxies for word descriptions and do not possess
any mathematical meaning. Such codes have no role in statistical analysis, and averages of the
numeric codes are meaningless.
As the illustration shows, map A presents a classification of high, medium and low with the
numerals 1, 2 and 3. These numbers are merely codes, used to represent the categories in the
legend, which would otherwise have taken much space and made the map look visually cluttered.
Similarly, map B uses the codes 1, 2 and 3 to represent city names which would otherwise have
been difficult to write on the map.
Quantitative Non Spatial Data – In contrast to qualitative non-spatial data, whose codes have
no mathematical meaning, the numbers here are measurements of the magnitude of the feature
which they represent. As given in the illustration, map C depicts the land value of an area
and map D gives the city population for a point location.
Hence data serve as the raw material from which the information base is built up. They are
collected and amassed into records and files. A database is of vital importance as it is a
collection of data which can be used further by different users. The data are structurally
organized, and the categories include quantitative and qualitative data sets.
Data are a vital tool of any analysis. A large body of data is an asset, and utmost care has to
be taken to avoid redundancy, loss or damage. A Data Base Management System (DBMS) can be
described as a tool for representing a real world oriented model of data on computers. The
entire process of data entry, classification, abstraction and representation is associated
with it.
Data Storage - Non-spatial data are stored in GIS in attribute tables. Each row in the table
represents a spatial feature, and each column represents one of its characteristics.
Technically, a row is called a record or tuple, whereas a column is called a field or item.
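An attribute table can be sketched in Python as a list of records, each holding the same fields. The ward numbers, land use types, populations and land values below are invented for illustration, and the helper name select is hypothetical.

```python
# Hypothetical attribute table: one record (row) per ward, one field per column.
attribute_table = [
    {"ward_no": 1, "land_use": "residential", "population": 12000, "land_value": 450},
    {"ward_no": 2, "land_use": "commercial", "population": 3500, "land_value": 900},
    {"ward_no": 3, "land_use": "residential", "population": 8000, "land_value": 400},
]

# The field (column) names are shared by every record.
fields = list(attribute_table[0].keys())


def select(table, field, value):
    """Return the records whose given field equals the given value."""
    return [record for record in table if record[field] == value]


residential = select(attribute_table, "land_use", "residential")
print(fields)
print([record["ward_no"] for record in residential])
```

The ward number plays the role of the identifier that links each record to its spatial feature on the map.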
Queries to the database (finding a desired dataset in the computer) require database
management software to find the named data or classes of data items. Hence it is necessary to
arrange the data so that the entities and attributes follow some conceptual model, in a set
structure, so that retrieval becomes an easy task. This theoretical foundation helps in the
storage, organization and manipulation of the datasets. The following data models are
generally used for non-spatial information, depending on the analysis required:
Data Models -
Hierarchical Data Model – Based on a tree structure in which one-to-many relationships are
implied. Common generic GIS questions of what, who, where and how can be asked, and the data
retrieved can be evaluated to show the connection. The model is composed of a hierarchy of
nodes (entities of data) in which each lower node is connected to a higher node called its
parent, up to the primary node called the root.
[Figure: example of a hierarchical data model with the node labels Job Description, Department,
Educational Qualification and Experience required]
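The tree structure of the hierarchical model can be sketched in Python by recording, for each node, its single parent. The administrative hierarchy used here is only an illustrative assumption and is not the example shown in the figure.

```python
# Hypothetical hierarchy: each child node records exactly one parent node.
parent_of = {
    "Uttarakhand": "India",
    "Nainital": "Uttarakhand",
    "Almora": "Uttarakhand",
    "Haldwani": "Nainital",
}


def path_from_root(node):
    """Follow parent links upward and return the path from the root to the node."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return list(reversed(path))


def children_of(node):
    """One-to-many relationship: a parent may have several children."""
    return sorted(child for child, parent in parent_of.items() if parent == node)


print(path_from_root("Haldwani"))
print(children_of("Uttarakhand"))
```

Retrieval always travels along the parent links, which is why a record in this model can be reached only through its branch of the tree.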
Network Data Model – Relies on the principle that an item in the data set can be linked to any
other item. Each entry in the data set is treated as a node, and relationships are represented
as linkages using pointers; the relationships can be one-to-many and many-to-many. The generic
questions pertain to the analysis of patterns and relationships.
Relational Model – This model is designed to relate one set of data to another. Records which
meet a condition on one field are selected, and the selection is then carried on to the next
field. In this model, data are organized in two-dimensional tables, which are easy for users
both to develop and to understand. The relations can further be described mathematically.
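The relational model can be sketched with Python's built-in sqlite3 module. The two tables, their column names and their values are hypothetical; the sketch shows two-dimensional tables related through a common field and a query that selects records meeting a condition.

```python
import sqlite3

# In-memory database with two hypothetical tables related by a land use code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE land_use (code TEXT PRIMARY KEY, description TEXT);
CREATE TABLE ward (ward_no INTEGER PRIMARY KEY, land_use_code TEXT, population INTEGER);
INSERT INTO land_use VALUES ('R', 'residential'), ('C', 'commercial');
INSERT INTO ward VALUES (1, 'R', 12000), (2, 'C', 3500), (3, 'R', 8000);
""")

# Relate the two tables on the shared code and keep wards above a population threshold.
rows = conn.execute("""
SELECT ward.ward_no, land_use.description, ward.population
FROM ward JOIN land_use ON ward.land_use_code = land_use.code
WHERE ward.population > 5000
ORDER BY ward.ward_no
""").fetchall()
conn.close()

print(rows)
```

Storing the land use description once, in its own table, avoids repeating it in every ward record.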
Object Oriented Model – This model treats data as objects belonging to classes of real world
objects, and describes each object through its attributes and through the procedures or
methods which operate on it. Messages are sent to an object, identified by its object
identifier, to query its properties, e.g. a message may ask a polygon for its co-ordinates,
area or perimeter. Objects can further be grouped into classes, and new classes may be created
which combine sub-classes.
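A minimal Python sketch of the object oriented idea follows. The class names, the object identifier 101 and the attribute value_per_unit_area are hypothetical; method calls stand in for the messages sent to an object.

```python
import math


class SpatialObject:
    """Base class: every object carries an identifier, geometry and attributes."""

    def __init__(self, oid, coords, **attributes):
        self.oid = oid
        self.coords = list(coords)
        self.attributes = attributes


class PolygonObject(SpatialObject):
    """A polygon answers messages (method calls) about its own geometry."""

    def _closed_ring(self):
        return self.coords + [self.coords[0]]

    def perimeter(self):
        ring = self._closed_ring()
        return sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))

    def area(self):
        ring = self._closed_ring()
        twice_area = sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
        return abs(twice_area) / 2.0


class Parcel(PolygonObject):
    """A sub-class that adds its own behaviour to the inherited geometry methods."""

    def total_value(self):
        return self.area() * self.attributes["value_per_unit_area"]


parcel = Parcel(oid=101, coords=[(0, 0), (4, 0), (4, 3), (0, 3)],
                land_use="residential", value_per_unit_area=10)
print(parcel.oid, parcel.area(), parcel.perimeter(), parcel.total_value())
```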
2.4 SUMMARY
In this unit you have learnt the following:
Geographic Data refers to any data on or near the Earth's surface, and Geospatial Data
has greater precision in terms of reference to a location on and/or near the Earth's
surface, e.g. latitude/longitude, GPS location.
Geospatial data can be grouped into two broad categories: Spatial and Non-spatial data.
Spatial Data are data that are connected to a place on the Earth and that occupy
cartographic (mappable) space. They have a specific location according to some
geographic referencing system (latitude/longitude), which enables them to be located in
two-dimensional or three-dimensional space. Non-spatial data are also known as attribute
data. An attribute is a description of a feature represented in the spatial data. It
does not take geometric considerations into account.
Both spatial and non-spatial data are required to understand phenomena in space and
are used in geographical analysis.
2.5 GLOSSARY
Spatial data- is used to describe any data related to or containing information about a
specific location on the Earth’s surface.
Non spatial data- is data that is independent of geographic location.
Q2. Differentiate between data structures of spatial data and discuss its advantages.
2.7 REFERENCES
Fischer, M. and Wang, J (2011) Spatial Data Analysis: Models, Methods and
Techniques, Springer Publication, USA.
Chou, H.Y (1997) Exploring Spatial Analysis in Geographical Information, Onward
press, USA
Davis, E. B (1996) GIS: A visual Approach, Onward press, USA.
Dalamagas T., Sellis T., Sinos L. (1998) A Visual Database System for Spatial and
Non-spatial Data Management. In: Ioannidis Y., Klas W. (eds) Visual Database
Systems 4 (VDB4). VDB 1998. IFIP — The International Federation for Information
Processing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-35372-2_6
Q2. Write a detailed note on the non-spatial database for geographical analysis.
3.1 OBJECTIVES
3.2 INTRODUCTION
3.3 TOPOLOGY CREATION AND DATA QUERY
3.4 SUMMARY
3.5 GLOSSARY
3.6 ANSWER TO CHECK YOUR PROGRESS
3.7 REFERENCES
3.8 TERMINAL QUESTIONS
3.1 OBJECTIVES
After reading this unit you will be able to know about:
i. Spatial relationship between different entities.
ii. Errors associated with point, line & polygon.
iii. Importance of topological file format.
3.2 INTRODUCTION
The study of geometric properties that do not change when forms are bent, stretched, or
undergo similar transformations is known as topology. Because the list of neighbors of any
given polygon does not change during geometric stretching or bending, polygon adjacency is
an example of a topologically invariant property. Topology deals with spatial properties that
do not change under specific transformations: a) the relationships between neighbors persist
and the boundary lines keep the same beginning and end nodes; and b) the areas are still
bounded by the same borders, and only the shapes and lengths of their perimeters change.
In topological data files, topological relationships such as adjacency and connectivity
are explicitly recorded. These relationships can be recorded independently of the coordinate
data and thus do not change when the data are stretched or bent, as when converting between
coordinate systems. Topology is the branch of mathematics used to determine spatial
connections between entities (ESRI, 1999). Topology is a specific component of the vector
representation model. Topology is said to be present in a vector layer if the layer contains
the spatial relations between its features. Topology is required for certain analyses and
alters how some GIS operations, such as geometry editing, operate. GIS conveys information
through graphic symbolization (points, lines and polygons) and retains relationships
mathematically through the concept of topology. For example, when you stand on a hill and look
over the countryside you can easily identify crossing streets and adjacent properties.
Topology is the mathematical logic a computer uses to identify these links. Topology can be
stored in a topological data model (which supports geometric data correction), but it can also
be used in analyses of non-topological data. Creating and storing topological relationships
has a variety of benefits: a) data are stored efficiently, allowing fast processing of large
datasets; b) the computer can quickly determine and analyze the spatial relations of all
features; c) the data are ensured to be geometrically correct; d) data quality is improved,
since digitizing errors are detected and corrected and data are validated to ensure accuracy;
and e) some types of spatial analysis (selections, network analysis) can be carried out.
Figure 3.1 depicts the following spatial relationships: disjoint, meets, equals, inside, covered
by, contains, covers, and overlaps. What are the applications of spatial relationships? These
relationships can be used in queries on a spatial database. Topological relationships can also
be used to ensure the topological consistency of space.
Figure: 3.1. Spatial relationships between two regions derived from the topological invariants
of intersections of boundary and interior.
containment, and connectivity. The geometric relationships that exist between area features
are described by adjacency and containment. Containment is a subset of adjacency that
describes area features that are entirely contained within another area feature. These three
topological relationships will make certain of the following:
Figure: 3.2. Topological spatial relationships
Connectivity:
Connectivity is a topological property used to describe the connections between line features,
such as a road network. It allows you to identify a route to the airport, connect streams to
rivers, or follow a path from the water treatment plant to a house. This is the basis for
many network tracing and path finding operations. In the arc-node data structure, arcs connect
to each other at nodes. Each arc has a from-node, at which the arc begins, and a to-node, at
which the arc ends. This is called arc-node topology. Connected arcs are determined by
searching through the list for common node numbers. In Figure 3.3, arcs 1, 2 and 3 all
intersect, since they share node 11. It is possible to pass from arc 1 to arc 3 because they
meet at the common node 11. On the other hand, it is not possible to turn directly from arc 1
onto arc 5, because the two arcs have no common node. For network analysis, connectivity
answers the question "Which line segments are connected?"
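The search for common node numbers can be sketched in Python. Only the fact that arcs 1, 2 and 3 share node 11, and that arcs 1 and 5 share no node, comes from the text; the remaining node numbers and arc 4 are assumptions made for illustration.

```python
# Arc-node list: arc id -> (from_node, to_node).
# Node 11 is taken from the text; all other node numbers are hypothetical.
arcs = {
    1: (10, 11),
    2: (11, 12),
    3: (11, 13),
    4: (13, 15),
    5: (13, 14),
}


def connected_arcs(arc_id):
    """Return the arcs that share at least one node with the given arc."""
    nodes = set(arcs[arc_id])
    return sorted(other for other, ends in arcs.items()
                  if other != arc_id and nodes & set(ends))


def can_turn(arc_a, arc_b):
    """Two arcs are directly connected only if they have a common node."""
    return bool(set(arcs[arc_a]) & set(arcs[arc_b]))


print(connected_arcs(1))  # arcs meeting arc 1 at node 11
print(can_turn(1, 3))     # True: common node 11
print(can_turn(1, 5))     # False: no common node
```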
Area definition/containment:
Containment is an extension of adjacency that describes area features that may be wholly
contained within another area feature. For example, an island defines an inner boundary (or
hole) of a polygon. In the vector model, an island is a polygon lying entirely inside another
polygon. The arc-node structure represents a polygon as an ordered list of arcs rather than as
a closed loop of (x, y) co-ordinates; this is known as polygon-arc topology.
In Figure 3.4, polygon F is made up of arcs 8, 9, 10 and 7 (the 0 before the 7 indicates that
this arc creates an island in the polygon). Each arc appears in the lists of two polygons (in
the illustration, arc 6 appears in the list for polygons B and C). Because a polygon is simply
a list of the arcs forming its border, the arc co-ordinates are stored only once, which reduces
the amount of data and prevents the borders of adjacent polygons from overlapping. Containment
answers the question "Which spatial features are contained within which?"
Figure: 3.4. Polygon-Arc Topology example
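Polygon-arc topology can be sketched in Python as a table of arc lists. The list for polygon F and the sharing of arc 6 by polygons B and C follow the text; the other arc numbers given for B and C are hypothetical.

```python
# Polygon -> ordered list of arcs; a 0 marks that the arcs after it form an island.
polygon_arcs = {
    "B": [2, 6, 5],          # hypothetical, except for arc 6
    "C": [3, 4, 6],          # hypothetical, except for arc 6
    "F": [8, 9, 10, 0, 7],   # as described in the text
}


def outer_and_island_arcs(arc_list):
    """Split an arc list into the outer boundary arcs and the island arcs."""
    if 0 in arc_list:
        marker = arc_list.index(0)
        return arc_list[:marker], arc_list[marker + 1:]
    return list(arc_list), []


def polygons_using(arc_id):
    """Return the polygons whose arc list contains the given arc."""
    return sorted(name for name, arc_list in polygon_arcs.items() if arc_id in arc_list)


print(outer_and_island_arcs(polygon_arcs["F"]))
print(polygons_using(6))
```

Since polygons refer to arcs by number, the co-ordinates of a shared arc such as arc 6 need to be stored only once.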
Contiguity or adjacency:
Contiguity is the topological concept that allows the vector data model to determine the
adjacency of features that share a border. This is the basis for many operations on
neighboring areas. Areas can be described as adjacent when they share a common boundary. Each
arc is defined by a from-node and a to-node. Because arcs have direction, each arc has a left
and a right side, so that the polygons on both sides can be determined. In Figure 3.5, polygon
B is on the left side of arc 6 and polygon C is on the right; hence the two polygons are
adjacent. The universe polygon, which represents the area outside the study area, ensures that
every arc has both a left and a right side. Contiguity answers the question "Which polygons are
contiguous to which polygon on the ground?" and is used for spatial analysis of areal data.
Figure: 3.5. Topology contiguity example
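The left-right description of arcs can be sketched in Python. Only the entry for arc 6 (polygon B on the left, polygon C on the right) comes from the text; the other arcs and the label "U" for the universe polygon are assumptions for illustration.

```python
# Arc id -> (left_polygon, right_polygon); "U" stands for the universe polygon.
# Only arc 6 is taken from the text; arcs 4 and 5 are hypothetical.
arc_sides = {
    6: ("B", "C"),
    5: ("U", "B"),
    4: ("C", "U"),
}


def neighbours(polygon):
    """Return the polygons lying on the other side of any arc bounding the polygon."""
    found = set()
    for left, right in arc_sides.values():
        if left == polygon:
            found.add(right)
        elif right == polygon:
            found.add(left)
    found.discard("U")  # the universe polygon is not a real neighbour
    return sorted(found)


print(neighbours("B"))
print(neighbours("C"))
```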
Topological rules express the interrelationships among different features. The main function
of these topological rules is to define the relationships between different objects drawn in
the form of vector data. The rules are managed through the geodatabase format, which helps us
to fix the diverse errors associated with vector data. "Must not overlap" is a rule used to
maintain the integrity of features in the same feature class. As an example, when the
geometries of two features overlap, they are highlighted in red (as shown by the red overlap
of nearby polygons and the red shared segment of two lines). For example, suppose you have two
types of road features: normal roads (which are connected to other roads at both nodes) and
dead-end roads (which terminate at a dead-end node). A topology rule may require that road
features be connected to other road features at both ends, with the exception of roads of the
Dead End subtype. Topology rules can be divided into three categories based on the type of
geometry. These categories are as follows:
Must be disjoint (Point)-: It is fundamental to vector data that points of the same
feature class or subtype must not overlap under any condition. Violating this rule
always creates an error in the database, and it is necessary to eliminate these errors.
Figure 3.6: Must be disjoint (Source-ESRI)
Must be covered by endpoint of (Point)-: An error is created when a point of one feature
class is not covered by the endpoint of a line of another feature class or subtype. This
rule is useful for marking and correcting points which do not lie on the end of a line.
Point must be covered by line (Point)-: This rule is appropriate for eliminating errors
where a point lies outside every line feature, i.e. is not covered by a line feature.
Must be properly inside polygons (Points)-: Each point must lie inside a polygon; a
point located outside every polygon is considered an error in the vector data.
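As a simple illustration of how a point rule can be checked, the following Python sketch tests the "Must be disjoint" rule for one feature class. The function name, the tolerance parameter and the sample points are hypothetical; this is not the validation routine of any particular GIS package.

```python
import math
from itertools import combinations


def disjoint_violations(points, tolerance=0.0):
    """Return index pairs of points that coincide within the given tolerance."""
    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(points), 2)
            if math.dist(p, q) <= tolerance]


# Hypothetical point feature class: the first and third points coincide.
wells = [(1.0, 1.0), (2.0, 3.0), (1.0, 1.0), (5.0, 5.0)]

print(disjoint_violations(wells))
```

Each reported pair marks a location where two points of the same class overlap and one of them has to be moved or deleted.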
Must Be Larger than Cluster tolerance (Line) -: The cluster tolerance is the minimum
distance between the vertices that make up a feature. It is used to determine which
vertices are treated as coincident. This rule applies to all polyline feature classes
and is mandatory for a topology. A polyline feature that would collapse when the
topology is validated is an error. Features that violate the rule are left unchanged.
Must Not Overlap (Line)-: This rule identifies errors in which a line shares part of its
length with another line of the same feature class or subtype. Lines may touch and
intersect each other, and may overlap themselves. The rule is applied in situations
where two lines must never share the same space on the ground. A line error occurs
wherever two lines overlap.
Must Not Intersect (Line)-: This error is generated when a line crosses or overlaps
another line, or any arc of a line, of the same feature class or subtype. The rule is
useful where no line or line segment should cross or be drawn over the same space as
another line. Line errors occur where lines overlap and point errors occur where lines
cross.
Must Not Intersect With (Line)-: Lines of one feature class or subtype must not cross or
overlap any part of a line of another feature class or subtype. Use this rule with
lines whose segments should never cross or occupy the same space as the lines of the
other class, for instance where lines of the two classes cannot cross or overlap each
other. Line errors occur where lines overlap and point errors occur where lines cross.
Figure 3.14: Must not intersect with (Source-ESRI)
Must Not Have Dangles (Line)-: The end of a line must touch some part of another line, or
some part of itself, within a feature class or subtype. Use this rule where, for example, the
lines in a feature class or subtype must connect to one another. Exceptions to the rule can
be set for segments ending in a cul-de-sac or dead end. Point errors occur at the end of a
line that does not touch any other line or itself.
Must Not Have Pseudo Nodes (Line)-: The end of a line cannot touch the end of only one
other line within a feature class or subtype. The end of a line may touch any part of
itself. Use this rule for cleaning data with unnecessarily subdivided lines. Segments of
a river system, for example, can be constrained to have nodes only at ends or junctions
for hydrological analysis. Point errors occur where the end of a line touches the end of
only one other line.
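Dangles and pseudo nodes can both be found by counting how many line ends meet at each endpoint. The Python sketch below is a simplification under stated assumptions: the line network is hypothetical, only end-to-end contacts are considered (an end touching the interior of another line is not detected), and closed loops are not handled.

```python
from collections import Counter

# Hypothetical line feature class: line id -> ordered list of (x, y) vertices.
lines = {
    "a": [(0, 0), (5, 0)],
    "b": [(5, 0), (10, 0)],    # a and b meet only each other at (5, 0)
    "c": [(10, 0), (10, 5)],
    "d": [(10, 0), (15, 0)],   # b, c and d meet at (10, 0): a true junction
}


def endpoint_counts(line_features):
    """Count how many line ends fall on each endpoint location."""
    counts = Counter()
    for coords in line_features.values():
        counts[coords[0]] += 1
        counts[coords[-1]] += 1
    return counts


def dangles(line_features):
    """Endpoints used by a single line end only."""
    return sorted(pt for pt, n in endpoint_counts(line_features).items() if n == 1)


def pseudo_nodes(line_features):
    """Endpoints where exactly two line ends meet."""
    return sorted(pt for pt, n in endpoint_counts(line_features).items() if n == 2)


print(dangles(lines))
print(pseudo_nodes(lines))
```

In this network the pseudo node at (5, 0) indicates that lines a and b could be merged into one line, while the three dangles would be either errors or marked exceptions such as dead ends.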
Must not intersect or Touch Interior (Line)-: Lines can touch their ends only and
cannot overlap in a class or subtype of features. Use this rule only when touching the
line at its ends instead of crossing or overlapping the line, for example if lots cannot
cross or overlap the line and only connect at the end of each line function. Line mistakes
occur when two or more lines overlap and when two or more lines cross or touch, dot
errors arise.
Must not intersect or Touch interior with (Line)-: Lines in one feature class or subtype
may touch lines in another feature class or subtype only at their ends and cannot overlap
them. Use this rule when lines should touch the lines of another feature class or subtype only
at their ends, without intersecting or overlapping them, such as when lot lines cannot
intersect or overlap block lines. A line error occurs where two or more lines overlap, and a
point error occurs where two or more lines cross or touch.
Must not overlap with (Line)-: Lines in one feature class or subtype must not overlap any
part of a line in another feature class or subtype. This rule applies to lines that should never
share space with lines from another feature class or subtype. For instance, roads may cross
rivers and run close to them, but road segments cannot overlap river segments. Line errors
occur where lines from the two feature classes or subtypes overlap.
Must be covered by Feature Class of (Line)-: Lines in one feature class or subtype must be
covered by lines in another feature class or subtype. Use this rule when you have multiple
groups of lines describing the same geographic location, such as when bus routes must lie on
top of road lines. Line errors are created on the lines of the first feature class that are not
covered by lines of the second feature class.
Must be covered by boundary of (Line)-: Lines in one feature class or subtype must be
covered by the boundaries of polygons in another feature class or subtype. Use this rule to
model lines that coincide with polygon boundaries, such as polyline features used to display
block and lot boundaries, which must be covered by parcel boundaries. Line errors occur
where lines are not covered by polygon boundaries.
Must be inside (Line)-: Lines of one feature class or subtype must be contained within
polygons of another feature class or subtype. Use this rule when lines must lie within
polygons, for example when streams must lie within watersheds. Line errors are created
where lines are not within polygons.
Must not self-intersect (Line)-: Within a feature class or subtype, a line must not cross or
overlap itself. A line can touch itself, and it can intersect, cross, or overlap other lines. This
rule is used, for example, when contour lines cannot cross themselves. Line errors occur
where a line overlaps itself and point errors occur where a line crosses itself.
Must not overlap (Polygon)-: Within a feature class or subtype, polygons must not
overlap. Polygons can share a point or an edge. This rule ensures that no polygon feature
overlaps another polygon feature in the same feature class or subtype, such as when
administrative boundaries (for example ZIP Codes or voting districts) or mutually exclusive
area classifications (for example landform types) cannot overlap. Polygon errors occur
where polygons overlap.
Must not have gaps (Polygon)-: There must be no gaps between polygons within a feature
class or subtype. Use this rule when all of your polygons must form a continuous surface
with no voids or gaps, such as when soil polygons must form a continuous fabric. Line errors
are created from the outlines of void areas within a single polygon, and from polygon
boundaries that are not coincident with other polygon boundaries.
Must not overlap with (Polygon)-: The polygons of the first feature class or subtype
must not overlap the polygons of the second feature class or subtype. Apply this rule
when polygons from one feature class or subtype must not overlap polygons from
another feature class or subtype, such as lakes and land parcels from two different
feature classes. Polygon errors occur where polygons from the two feature classes or
subtypes overlap.
Must be covered by feature class of (Polygon)-: The polygons of the first feature class or
subtype must be covered by the polygons of the second feature class or subtype. Use this
rule when each polygon in one feature class or subtype must be covered by all the polygons
of another feature class or subtype, such as when states are covered by counties. Polygon
errors are created from the uncovered areas of the polygons in the first feature class or
subtype.
Must cover each other (Polygon)-: All polygons of the first feature class and all polygons
of the second feature class must cover each other. That is, the first feature class must be
covered by the second feature class, and the second must be covered by the first. Use this
rule when polygons from two feature classes or subtypes must cover the same area, such as
when vegetation and soil polygons must cover each other. A polygon error occurs wherever
part of a polygon is not covered by one or more polygons of the other feature class or subtype.
Must be covered by (Polygon)-: Each polygon in one feature class or subtype must be
covered by a single polygon in another feature class or subtype. Use this rule when one set
of polygons must be covered by some part of a single polygon in another feature class or
subtype, such as when counties must be covered by states. Polygon errors are created from
polygons of the first feature class or subtype that are not covered by a single polygon of the
second feature class or subtype.
Contain one point (Polygon)-: Each polygon must contain exactly one point, and each
point must fall within a polygon. Use this rule to ensure a one-to-one correspondence
between the features of a polygon feature class and those of a point feature class, for example
when each parcel must have exactly one address point. Polygon errors are created from
polygons that do not contain exactly one point.
The topology rules explained above are used to detect errors in vector data, and topology also
allows us to remove these unintentional errors and make the data error free. Error-free data
is beneficial for every kind of analysis carried out in GIS. Initially, when most vector data
were geo-relational in nature, finding errors was very difficult; at that time only the coverage
file format allowed errors to be found on the basis of topology. With the passage of time and
the emergence of the object-based data model (the geodatabase), topological editing has
become very common for vector databases. Hence topological file formats play an important
role in GIS analysis.
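To make the idea of rule checking concrete, the following minimal Python sketch tests a small set of lines against two of the line rules described above, Must Not Have Dangles and Must Not Have Pseudo Nodes. It is only an illustration under simplifying assumptions: the feature names and coordinates are hypothetical, only endpoint-to-endpoint contact is considered (an endpoint touching the interior of another line is not detected), and coordinates must match exactly, whereas GIS software normally applies a tolerance.

```python
from collections import Counter

# Hypothetical line features: each is a list of (x, y) vertices.
lines = {
    "main_st": [(0, 0), (5, 0)],
    "oak_ave": [(5, 0), (5, 5)],
    "elm_rd": [(5, 0), (10, 0)],
    "cul_de_sac": [(5, 5), (8, 8)],
}

def endpoint_counts(features):
    """Count how many line ends meet at each endpoint coordinate."""
    counts = Counter()
    for vertices in features.values():
        counts[vertices[0]] += 1    # start of the line
        counts[vertices[-1]] += 1   # end of the line
    return counts

def find_dangles(features):
    """Endpoints touched by only one line end (candidate dangle errors)."""
    return sorted(pt for pt, n in endpoint_counts(features).items() if n == 1)

def find_pseudo_nodes(features):
    """Endpoints where exactly two line ends meet (candidate pseudo nodes)."""
    return sorted(pt for pt, n in endpoint_counts(features).items() if n == 2)

dangles = find_dangles(lines)
pseudo_nodes = find_pseudo_nodes(lines)
print("Dangles:", dangles)
print("Pseudo nodes:", pseudo_nodes)
```

In this sketch the free ends of the streets are reported as dangles, which an editor could then either correct or mark as exceptions (for example the cul-de-sac), and the point where only two line ends meet is reported as a pseudo node.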
3.4 SUMMARY
Topology is the study of geometric properties that do not change when forms are bent or
stretched; polygon adjacency is an example of a topological relationship. It is the branch of
mathematics used to determine spatial relationships between entities. GIS conveys
information through graphic symbolization (points, lines and polygons) and retains the
relationships between them mathematically through the concept of topology. A person
standing on a hill and looking over the countryside can easily identify crossing streets and
adjacent properties; a computer needs topology to do the same. Storing data in a topological
data model helps to keep the data geometrically correct, stores it efficiently so that large
datasets can be processed quickly, and enables the computer to determine and analyze the
spatial relations of all features rapidly. Geometric relationships between spatial entities and
their attributes are critical for spatial analysis and integration in GIS. Because topology is
included in the data model, a single line can represent a shared boundary and record which
polygon lies on each side of it. Although most vector layer operations can be performed
without topology, some, such as network analysis, cannot. If we consider a roads layer, there
is no way to build a network from it if it only contains lines representing roads but no
information about how they are connected.
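The point about network analysis can be sketched in a few lines of Python. The example below assumes that each road segment already stores the identifiers of its start and end nodes, which is exactly the connectivity information that topology provides; the node names and segments are hypothetical. With that information a network can be built and searched; without it the segments are only unrelated lines.

```python
from collections import defaultdict, deque

# Hypothetical road segments stored as (from_node, to_node) pairs.
roads = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]

def build_network(segments):
    """Build an adjacency list from the nodes shared by the segments."""
    adjacency = defaultdict(set)
    for start, end in segments:
        adjacency[start].add(end)
        adjacency[end].add(start)
    return adjacency

def reachable(adjacency, origin):
    """Return every node that can be reached from origin (breadth-first search)."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

network = build_network(roads)
from_a = reachable(network, "A")
print(sorted(from_a))
```

Here nodes E and F are not reachable from A because the segment joining them shares no node with the rest of the network.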
3.5 GLOSSARY
Topology-: The spatial relationships between entities that remain unchanged when the
entities are bent or stretched.
Node-: The starting or ending point of an edge, linked topologically to all the edges that meet
at that point.
3.6 ANSWER TO CHECK YOUR PROGRESS
Ans-: Topological editing is a type of editing that constrains coincident geometry to a
topologically connected graph of edges and nodes.
Ans-: Connectivity is a geometric property that describes how line features, such as a road
network, are connected.
Ans-: Topology rules can also be defined between feature subtypes within one or more feature
classes.
Ans-: Topology is used to detect errors in vector data; it also allows us to remove these
unintentional errors and make the data error free.
4.1 OBJECTIVES
4.2 INTRODUCTION
4.3 DATA MANIPULATION
4.4 SUMMARY
4.5 GLOSSARY
4.6 ANSWER TO CHECK YOUR PROGRESS
4.7 REFERENCES
4.8 TERMINAL QUESTIONS
4.1 OBJECTIVES
After going through this unit, the learner will be able to learn:
4.2 INTRODUCTION
Data manipulation is the process of modifying or reorganising data so that it is more orderly
and easier to read. It is carried out with a data manipulation language (DML), which lets
us add, delete, and change records in a database. In other words, data manipulation changes
how information is arranged so that it is easier to understand; for example, a data log sorted
in alphabetical order makes individual entries easier to find. Manipulating web server logs
likewise lets a website administrator identify the most popular pages and traffic sources.
What is Data?
Data are facts, quantities or statistics that are collected and stored together for analysis,
which in turn provides information. Over time, data have come to be used in scientific
research, financial and business matters, and e-governance. The uses of data are often
divided into four categories -
Prescriptive- it describes the 'what should happen' condition of the data, and helps in
prescribing the measures needed to improve the outcomes or correct the problem.
Today, data are transforming corporate operations. Everything depends on data, from
corporate decision-making to daily operations. None of this can be done without transforming
raw data into accessible information, particularly when there are large volumes of data from
various sources. This is where data manipulation, as defined in the introduction, comes in.
Data manipulation is essential for enterprise processes and optimisation. To make better use
of data and turn it into usable knowledge, such as analysis of financial data, customer
behaviour and trends, you must be able to work with the data in the required format.
Data manipulation thus offers many advantages to a business, including:-
- Consistent data: Data can be organised in a standardised format that makes it readable and
understandable. Data taken from multiple sources may not share a unified structure, but data
manipulation and its commands ensure that the data is organised and processed consistently.
- Project data: Particularly in finance, it is critical for businesses to use historical
information to predict the future and to carry out more thorough analysis. Data manipulation
makes this possible.
- Create more value from the data: Converting, modifying, removing and adding data in a
database lets you put the data to work. Information that remains stagnant is worthless, but if
you know how to use your data, you can gain strong insight for better business choices.
- Remove or ignore unwanted data: Data that cannot be used still gets in the way of what is
essential. Inaccurate or unnecessary data must be removed and cleaned. Through data
manipulation you can quickly clear such records so that you deal only with the records you
need.
DML, or data manipulation language, is used to make data more organised or readable. It is a
computer programming language used to insert, delete and modify data in a database, and it
makes cleansing and mapping data for further analysis simple. Structured Query Language
(SQL) is a widely used language for data manipulation. We use SQL to communicate with the
database, and four commands are available during this communication:
Select
Update
Insert
Delete
Through these commands we tell the machine what to do with the data, or at least with a
chosen portion of it.
- SELECT: The SELECT statement lets users retrieve a selection of data from the database to
work with. You tell the computer what to select and where to select it from.
- UPDATE: The UPDATE statement modifies data that already exist. You instruct the database
which records to update and what new values to enter, for one or multiple records at a time.
- INSERT: The INSERT statement adds new records to a table, including records copied from
another table.
- DELETE: The DELETE statement removes existing records from a table. You instruct the
machine which records to remove.
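The four commands can be tried out with a short Python sketch that uses the sqlite3 module from the Python standard library and an in-memory database. The table name, column names and values are hypothetical and chosen only for illustration; the CREATE TABLE statement is a data definition command and is needed only to set up the example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Set-up step (data definition, not DML): a hypothetical table of wells.
cur.execute("CREATE TABLE wells (id INTEGER PRIMARY KEY, village TEXT, depth_m REAL)")

# INSERT: add new records to the table.
cur.executemany(
    "INSERT INTO wells (village, depth_m) VALUES (?, ?)",
    [("Haldwani", 42.0), ("Nainital", 30.5), ("Almora", 55.0)],
)

# UPDATE: modify a record that already exists.
cur.execute("UPDATE wells SET depth_m = ? WHERE village = ?", (45.0, "Haldwani"))

# DELETE: remove the records that match a condition.
cur.execute("DELETE FROM wells WHERE depth_m < ?", (35.0,))

# SELECT: retrieve the remaining records.
rows = cur.execute("SELECT village, depth_m FROM wells ORDER BY village").fetchall()
conn.commit()
conn.close()
print(rows)
```

Each statement names the table to work on and, where needed, a WHERE condition that limits the command to a chosen portion of the data.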
Since SQL on its own cannot import or export data from external sources, certain vendors offer
tools that store the data and provide the resources needed to manipulate it for your business
needs.
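Data manipulation language (DML) is a database language that helps users to view or modify
information in a structured data model. Basically, there are two categories of data
manipulation language: procedural DML, which explains how a problem is to be solved, and
declarative DML, which states what problem is to be solved without specifying the exact
steps.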
Data manipulation is a key part of process optimisation because it makes data available for
producing insights, including financial information analysis, consumer behaviour analysis
and pattern analysis. During data integration, the technique is commonly used to make data
compatible with the target system. For example, accountants manipulate raw data collected
from retailers and marketing in order to understand product costs, pricing rates or future
tax needs. Likewise, stock market analysts manipulate data to predict market patterns so that
they can plan their investment portfolios accordingly.
Data manipulation has a number of uses. Some further ways in which it can be useful for
organisations are:
Data Consistency: It is easier to organise, read and interpret data in a consistent
format. When data are derived from various sources, they need to be transformed and
manipulated into a single format. After the format has been standardised, it is easier to
load the data into the enterprise system or use it for reporting.
Data Projection: Data manipulation permits the use of historical data for future
projections and systematic study, especially in the field of urban planning.
Value Generation: One can update, alter, delete and input data in a database by
using data manipulation. This ensures that one can use data for in-depth insights and
smarter business choices.
Redundant Data Removal: Data from sources often contain redundant, incorrect or
unnecessary records. Before these data can be used, they need to be checked for accuracy
and filtered to extract the information that is important to your business. You can
quickly clean the data using data manipulation so that the data that matter can be filtered
out.
The best way to manipulate data is to use software with integrated, automatic
data management features, including data cleaning, mapping, aggregation and storage. These
tools spare one the difficulty of manually entering data and repeating low-value
tasks. In addition, they support workflow functions for producing and delivering
reports without human interaction.
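The five main steps used for the efficient handling of data are given below:
1. Defining why the data analysis is needed.
2. Collecting data from sources.
3. Cleaning unnecessary data.
4. Analysing the data.
5. Interpreting the results and their applications.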
Data manipulation in Python and in R are key elements of the subject. Before going
through the more detailed concepts of Python and R data handling, let us consider how to
handle data in a spreadsheet.
Most readers will already know how to use MS Excel. Some tips for manipulating data in
Excel are provided here.
1. Formulas and functions – One of the strengths of Excel is that one can rely on its
built-in mathematical functions to make the data more useful.
2. Autofill in Excel- This function is helpful if one chooses to use the same formula in
several cells. One way to do this is to re-type the formula. A quicker method is to drag
the fill handle at the bottom-right corner of the cell downwards, which applies the same
formula to several rows at the same time.
3. Sort and Filter- When reviewing results, users can save a lot of time by using the sort
and filter options in Excel.
4. Removing duplicates- In the course of data collection and assimilation there is
always a chance of data being duplicated. The Remove Duplicates function in Excel
helps to clear duplicate table entries.
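The same four spreadsheet operations can also be expressed in Python, which the unit names as a tool for data manipulation. The sketch below uses only built-in Python features; the site names and readings are hypothetical, and the duplicate record is included deliberately.

```python
# Hypothetical survey records; the duplicate entry is deliberate.
records = [
    {"site": "C", "reading": 12.5},
    {"site": "A", "reading": 7.0},
    {"site": "B", "reading": 15.2},
    {"site": "A", "reading": 7.0},
]

# Removing duplicates: keep the first occurrence of each record.
seen = set()
unique = []
for rec in records:
    key = (rec["site"], rec["reading"])
    if key not in seen:
        seen.add(key)
        unique.append(rec)

# Sort: order the records by site name.
sorted_records = sorted(unique, key=lambda rec: rec["site"])

# Filter: keep only the records with a reading above 10.
high = [rec for rec in sorted_records if rec["reading"] > 10]

# Formula: the mean reading, comparable to an AVERAGE formula in Excel.
mean_reading = sum(rec["reading"] for rec in unique) / len(unique)

print([rec["site"] for rec in sorted_records])
print([rec["site"] for rec in high])
print(round(mean_reading, 2))
```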
Now that data manipulation has been discussed, one should also learn about data
modification. Although these two terms sound similar, they cannot be interchanged. Data
manipulation typically derives new and more sophisticated data from raw data through logic
or calculation. Modification of data, on the other hand, means that the data values
themselves are changed. For example, assume we have the value X=5. We can present the
value as X=2+3, X=1+4, X=6-1 and so on. This represents data manipulation: the given value
is expressed in different ways through logic, but remains the same. Data modification means
changing the value itself, for example to X=7.
Now, how can data modification and data manipulation together support business choices?
Data modification can be used, for example, to calculate financial targets once several data
sources have been processed through manipulation.
4.4 SUMMARY
In this chapter we have learned that data manipulation is a process for organising data and
making it easy to understand and analyse. This is accomplished with a data manipulation
language (DML), which is divided into declarative DML, which states what problem is to be
solved without specifying the exact steps, and procedural DML, which explains how to solve
the problem. Structured Query Language is an example of declarative DML and works through
four commands: select, update, insert and delete. Languages such as FORTRAN, COBOL and
ALGOL, on the other hand, follow the procedural approach. Data manipulation is a useful and
important function for analysing financial data, for business purposes and for research
analysis. The chapter also explains the basic steps of data manipulation: defining why one
needs data analysis, collecting data from sources, cleaning unnecessary data, analysing the
data, and finally interpreting the results and their applications.
4.5 GLOSSARY
DATABASE - Put as simply as possible, this is a storage space for data. We mostly use
databases with a Database Management System (DBMS), like PostgreSQL or
MySQL. These are computer applications that allow us to interact with a database to
collect and analyze the information inside.
Q.1 In SQL, which of the following is not a data manipulation language command?
Q.2 The language used by application programs to request data from the DBMS is referred to
as?
4.7 REFERENCES
https://www.jigsawacademy.com/blogs/data-science/data-manipulation/amp/#
https://www.computerhope.com/jargon/d/datamani.htm
https://www.astera.com/type/blog/data-manipulation-tools/
https://whatagraph.com/blog/articles/data-manipulation
https://www.digitalvidya.com/blog/data-manipulation/amp/
Chang, Kang-tsung, Introduction to Geographic Information Systems, 5th edition,
McGraw-Hill, 2009.
5.1 OBJECTIVES
5.2 INTRODUCTION
5.3 RASTER DATA MANIPULATION AND
RECLASSIFICATION
5.4 SUMMARY
5.5 GLOSSARY
5.6 ANSWER TO CHECK YOUR PROGRESS
5.7 REFERENCES
5.8 TERMINAL QUESTIONS
5.1 OBJECTIVES
After going through this unit, the learner will be able to:
1. Understand the meaning of data and their types.
2. Learn about raster data manipulation tools and techniques.
3. Learn about reclassification of raster data.
5.2 INTRODUCTION
Data is a Latin word that refers to information expressed in the form of digits/numbers,
symbols, or letters and used to reflect the status of any geographical object, as well as its
behaviour or outcome. The position and attributes of spatial features on the surface of the
earth are defined by data that are geographically referenced to the earth's surface. Position
refers to the location on the earth's surface, while attributes refer to characteristics such as
the name of the location, the number of people going to or visiting it, the form of settlement,
the transportation and communication options, and so on. Geoinformatics considers two
types of data: spatial data and non-spatial data. Spatial data provide information about the
location, shape and size of objects, while non-spatial data, also called attribute data, provide
information about the characteristics of spatial features. Non-spatial data are independent of
the geometrical information of objects.
in Raster data manipulation. The raster data model covers the space with a regular grid, and
each grid cell's value represents the characteristics of a spatial phenomenon at that cell's
position. This simple data structure with fixed cell positions is not only
computationally efficient, but also facilitates a large number of data analysis operations.
Raster data analysis is based on cells and rasters, as opposed to vector data analysis, which
is based on geometric objects such as points, lines, and polygons. Raster data analysis can be
carried out on individual cells, groups of cells, or all the cells within an entire raster. Some
raster data operations use only one raster, while others use two or more. The type of cell
value is an important factor in raster data analysis. Statistics such as mean and standard
deviation are designed for numeric values, while majority (the most frequently occurring cell
value) is designed for both numeric and categorical values.
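The dependence of the statistics on the type of cell value can be illustrated with a small Python sketch. It stores two hypothetical 3 x 3 rasters as nested lists, one numeric (elevation) and one categorical (land cover), and uses only the Python standard library; the cell values are invented for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical 3 x 3 rasters that share the same grid.
elevation = [[10, 12, 14],
             [11, 13, 15],
             [12, 14, 16]]                      # numeric cell values
landcover = [["forest", "forest", "water"],
             ["forest", "urban", "water"],
             ["forest", "forest", "urban"]]     # categorical cell values

def cells(raster):
    """Flatten a raster (a list of rows) into a single list of cell values."""
    return [value for row in raster for value in row]

# Mean and standard deviation are meaningful only for numeric cell values.
elev_mean = mean(cells(elevation))
elev_std = pstdev(cells(elevation))

# Majority works for categorical (and numeric) cell values.
majority_class, majority_count = Counter(cells(landcover)).most_common(1)[0]

print(elev_mean, round(elev_std, 3))
print(majority_class, majority_count)
```

Asking for the mean of the land cover raster would have no meaning, which is why the statistic must be chosen to suit the type of cell value.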
Raster format is used to store a variety of data types. Raster data analysis, however,
operates only on software-specific raster data, such as ESRI grids in ArcGIS. As a result, to
use DEMs and other raster data in data analysis, they must first be processed and converted
to software-specific raster data.
The general tools for raster data manipulation are covered in this chapter. The raster data
analysis environment, including the area for analysis and the output cell size, is described
in the following section.
Management of Raster Data - We often need to clip or combine raster data found
online to match the study area in a GIS project. To clip a raster, we can use the larger raster
as the input and specify, as the analysis environment, an analysis mask or the minimum and
maximum x-y coordinates of a rectangular area. Mosaic is a technique for merging multiple
input rasters into a single raster. If the input rasters overlap, most GIS packages provide
options for assigning the cell values in the overlapping areas. For overlapping areas, ArcGIS
lets the user choose to use the data from the first input raster or to blend the data from the
input rasters. If there are small gaps between the input rasters, one option is to fill in the
missing values using a neighbourhood mean operation.
The Extract by Attribute function generates a new raster containing the cell values that
satisfy the query expression. We may, for example, generate a new raster within a specific
elevation zone. On the output, cells outside the elevation zone have no data.
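The logic of extracting cells by a query expression can be sketched in Python as follows. This is not the ArcGIS tool itself but a minimal illustration of the idea: the function name, the elevation values and the use of None to represent no data are all assumptions made for the example.

```python
NODATA = None  # assumed marker for cells that have no data

def extract_by_attribute(raster, predicate, nodata=NODATA):
    """Keep the cells that satisfy the query; set every other cell to no data."""
    return [
        [value if value is not nodata and predicate(value) else nodata
         for value in row]
        for row in raster
    ]

# Hypothetical elevation raster in metres.
elevation = [[900, 1100, 1600],
             [1200, 1450, 1000],
             [800, 1500, 1700]]

# Query expression: elevation between 1000 m and 1500 m inclusive.
zone = extract_by_attribute(elevation, lambda v: 1000 <= v <= 1500)
for row in zone:
    print(row)
```

The output raster has the same number of rows and columns as the input; only the cells inside the elevation zone keep their values.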
Figure 5.2: A circle, shown in white, is used to extract cell values from the input raster.
The output raster covers the same extent as the input raster, but has no data outside the
circular area.
Source: Chang, K. T. (2019).
Figure 5.3: An Aggregate operation generates a lower-resolution raster (b) from the input (a). The
operation uses the mean statistic and a factor of two. For example, the cell value of 4 in (b)
is the mean of the cell values in the corresponding two-by-two block in (a).
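An aggregate operation with the mean statistic can be sketched in Python as shown below. The input values are hypothetical and are not those of Figure 5.3; the function simply averages each block of factor-by-factor cells to produce one output cell.

```python
def aggregate_mean(raster, factor):
    """Build a lower-resolution raster by averaging factor x factor blocks of cells."""
    rows, cols = len(raster), len(raster[0])
    output = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect the cells of one block (blocks at the edge may be smaller).
            block = [raster[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(sum(block) / len(block))
        output.append(out_row)
    return output

# Hypothetical 4 x 4 input raster.
fine = [[1, 3, 5, 7],
        [1, 3, 5, 7],
        [2, 2, 4, 4],
        [4, 4, 2, 2]]

coarse = aggregate_mean(fine, 2)
for row in coarse:
    print(row)
```

With a factor of two, the 4 x 4 input becomes a 2 x 2 output in which each cell covers four input cells.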
Figure 5.4: Each cell in the output (b) has a unique number that identifies the
connected region to which it belongs in the input (a). For example, the connected region
that has the same cell value of 3 in (a) has a unique number of 4 in (b).
Source: Chang, K. T. (2019).
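The numbering of connected regions described in the caption of Figure 5.4 can be sketched in Python as follows. The sketch assumes that cells are connected only through their four edge neighbours (not diagonally) and that regions are numbered in the order in which they are first met when scanning row by row; GIS packages may offer other options, and the input values are hypothetical.

```python
from collections import deque

def region_group(raster):
    """Give every connected region of equal-valued cells its own number."""
    rows, cols = len(raster), len(raster[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 means "not yet labelled"
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                i, j = queue.popleft()
                # Visit the four edge neighbours of the current cell.
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not labels[ni][nj]
                            and raster[ni][nj] == raster[i][j]):
                        labels[ni][nj] = next_label
                        queue.append((ni, nj))
    return labels

# Hypothetical input raster.
zones = [[1, 1, 2],
         [3, 1, 2],
         [3, 3, 1]]

regions = region_group(zones)
for row in regions:
    print(row)
```

The cell with value 1 in the lower-right corner receives its own number because it touches the other cells with value 1 only diagonally.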
operation, on the other hand, associates each input raster with a single set of cell
values. In other words, a raster-based local operation requires one raster for each
attribute in order to query or analyse the same stand and soil attributes as above. If the data
sets to be analysed have a large number of attributes that share the same geometry, a
vector-based overlay operation is more efficient than a raster-based local operation.
Buffering - A vector-based buffering operation and a raster-based physical distance
measure operation both compute distances from selected features. However, they differ in at
least two ways. First, a buffering operation measures distances using x and y coordinates,
while a raster-based operation measures physical distances using cells. As a result, a
buffering operation can create more precise buffer zones than a raster-based operation. This
difference in precision can be critical, for example, when implementing riparian zone
management. Second, a buffering operation is more flexible. It can, for example, produce
multiple rings (buffer zones), while a raster-based operation produces continuous distance
measures, and additional data processing is needed to derive buffer zones from them. A
buffering operation can also create separate buffer zones for each selected feature or a
dissolved buffer zone for all selected features. With a raster-based operation, it would be
difficult to create and manipulate separate distance measures.
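The raster side of this comparison can be sketched in Python. The example computes a continuous distance raster from one source cell and then carries out the additional processing step that turns the distances into buffer rings. The grid size, the 30 m cell size, the source position and the ring limits are all hypothetical, and distances are measured between cell centres.

```python
import math

def distance_raster(rows, cols, sources, cell_size):
    """Straight-line distance from each cell centre to the nearest source cell."""
    return [
        [min(math.hypot(r - sr, c - sc) for sr, sc in sources) * cell_size
         for c in range(cols)]
        for r in range(rows)
    ]

def to_rings(distances, breaks):
    """Reclassify continuous distances into ring numbers (0 = beyond the last ring)."""
    def ring(distance):
        for number, limit in enumerate(breaks, start=1):
            if distance <= limit:
                return number
        return 0
    return [[ring(d) for d in row] for row in distances]

# Hypothetical 3 x 4 raster with 30 m cells and one source cell at row 1, column 0.
dist = distance_raster(3, 4, sources=[(1, 0)], cell_size=30)
rings = to_rings(dist, breaks=[30, 60])   # ring 1: 0-30 m, ring 2: 30-60 m

for row in rings:
    print(row)
```

Because distances are known only at cell centres, the edge of each ring follows the cell boundaries, which is why the resulting zones are less precise than vector buffer zones.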
MOSAIC:
A mosaic is composed of two or more images that have been combined or merged. By
mosaicking multiple raster datasets together in ArcGIS, a single raster dataset can be
generated. A mosaic dataset and a virtual mosaic can also be created from a collection of
raster datasets.
Figure 5.8: The figure illustrates how six adjacent raster datasets are mosaicked into a
single raster dataset.
Source: https://desktop.arcgis.com
In some instances, the edges of the raster datasets being mosaicked together overlap.
Figure 5.9: The figure illustrates how the edges of the raster datasets being mosaicked
together can overlap. Source: https://desktop.arcgis.com
These overlapping areas can be handled in a variety of ways, including keeping only the
raster data from the first or last dataset, using a weight-based algorithm to blend the
overlapping cell values, taking the mean of the overlapping cell values, or taking the
minimum or maximum value. The First, Minimum, or Maximum options produce the most
meaningful results when mosaicking discrete data. Continuous data is better served by the
Blend and Mean options. The output is floating point if all of the input rasters are floating
point. The output is integer if all of the inputs are integer and First, Minimum, or Maximum
is used.
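The overlap options can be illustrated with a short Python sketch. It is not the ArcGIS implementation: it assumes that the input rasters have already been placed on the same grid, uses None for cells that a raster does not cover, and leaves out the weight-based Blend option. The raster names and values are hypothetical.

```python
def mosaic_cell(values, method):
    """Combine the values that the input rasters hold for one cell."""
    valid = [v for v in values if v is not None]   # None marks no data
    if not valid:
        return None
    if method == "first":
        return valid[0]
    if method == "last":
        return valid[-1]
    if method == "minimum":
        return min(valid)
    if method == "maximum":
        return max(valid)
    if method == "mean":
        return sum(valid) / len(valid)
    raise ValueError("unknown method: " + method)

def mosaic(rasters, method):
    """Mosaic rasters that share the same grid, cell by cell."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [
        [mosaic_cell([raster[r][c] for raster in rasters], method)
         for c in range(cols)]
        for r in range(rows)
    ]

# Two hypothetical rasters on a common grid; they overlap in the middle column.
west = [[10, 20, None],
        [10, 20, None]]
east = [[None, 30, 40],
        [None, 30, 40]]

print(mosaic([west, east], "first"))
print(mosaic([west, east], "last"))
print(mosaic([west, east], "mean"))
```

Note that the Mean option returns floating-point values even for integer inputs, whereas First, Last, Minimum and Maximum return one of the original cell values unchanged.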
Using the mosaic dataset, you can apply a number of other mosaicking methods to a
dynamic mosaic or to an exported, mosaicked raster dataset. These include sorting by
attributes, using a seamline, and other techniques. If the raster dataset contains a colour
map, there are several options for handling it. You may use the colour map from the first or
last raster dataset in the mosaic, or ensure that all of the colours in the final colour map
match. You may also choose not to mosaic any rasters that contain a colour map.
You may also perform colour corrections on the raster datasets being mosaicked by
choosing to colour balance or colour match them. In colour balancing, the colour correction
is done with a dodging technique. For each band, a global gamma value and contrast
adjustment are calculated, and these values are then used to calculate the final value of each
output pixel. This option is available when displaying a raster catalogue in ArcMap or on a
mosaic dataset, and it can be applied permanently when using the Raster Catalog to Raster
Dataset tool. Colour matching matches the pixel values of the overlap areas between the
reference raster and the source rasters. Once the matching algorithm has been computed in
the overlap areas, it is applied to the source rasters. To interpolate the correct colour match
from the reference raster to the source rasters, colour matching can use one of three methods:
Statistics Matching - The statistical differences between the reference overlap region and
the source overlap region are matched, and the resulting colour transformation is then
applied to the source datasets.
Histogram Matching - The histogram from the reference overlap region is matched with the
histogram from the source overlap region, and the resulting colour transformation is then
applied to the source datasets.
Linear Correlation - Overlapping pixels are matched and the result is interpolated to the
remainder of the source pixels; pixels that do not have a one-to-one relationship are
matched using a weighted average.
Colour matching can be performed when viewing a raster catalogue in ArcMap, when applying
mosaicking methods, or when displaying a mosaic dataset.
The schema of a mosaicked raster dataset is the same as that of any other raster
dataset.
The number of bands in all of the input raster datasets and in the mosaicked output raster
must be the same; otherwise, the mosaic cannot be generated.
Two or more rasters with the same spatial reference and pixel size can be mosaicked into a
single raster. If the spatial reference of the second raster dataset differs from
that of the first raster dataset, the spatial reference of the second raster dataset is ignored,
and its data are converted into the first raster dataset's spatial reference. In this
situation, the Project Raster function is recommended to ensure that the data are not
affected.
Reclassification:
In classification, we group or assign attributes to a class on the basis of attribute values
(vector data), or we group pixels into a class on the basis of pixel values (raster data).
When we group or classify data that have already been grouped or classified, the process is called
reclassification.
Classification is a technique of purposefully removing detail from input data in order to reveal
a pattern in the data.
Reclassification likewise removes detail from an input dataset in order to reveal
important spatial patterns. It reduces the number of classes and eliminates
detail. If the input dataset is itself the result of a classification, the process is called
reclassification. Data can be reclassified under different schemes for different
purposes. New codes can also be assigned on the basis of specific attribute values.
Example: A soil map.
A soil map is already classified data, divided into a few classes for the different types of soil.
This soil map can be reclassified for a soil suitability analysis for a particular crop.
The soil types are then reclassified into two classes:
1. Soil suitability class – suitable for that particular crop.
2. Soil unsuitability class – unsuitable for that particular crop.
Classification and reclassification can be carried out either automatically or manually
(user-controlled classification).
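The soil example can be sketched as a one-to-one lookup. The soil names, the codes 1 and 0, and the decision about which soils suit the crop are all hypothetical values chosen for illustration; in practice they would come from the soil survey and the crop requirements.

```python
# Hypothetical suitability of each soil type for one particular crop.
SUITABLE, UNSUITABLE = 1, 0
suitability_by_soil = {
    "loam": SUITABLE,
    "clay loam": SUITABLE,
    "sand": UNSUITABLE,
    "rock outcrop": UNSUITABLE,
}

# A small classified soil map (one soil type per cell).
soil_map = [
    ["loam", "sand", "sand"],
    ["clay loam", "loam", "rock outcrop"],
]

# Reclassification: every cell is replaced by the code of its soil type.
suitability_map = [
    [suitability_by_soil[soil] for soil in row] for row in soil_map
]

print(suitability_map)  # [[1, 0, 0], [1, 1, 0]]
```

Four soil classes collapse into two suitability classes, which is the loss of detail that reclassification is meant to achieve.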
Figure 5.11: Polygons belonging to one class can be merged to make a single feature
Source: Study material IIRS Outreach Programme
5.4 SUMMARY
In this unit, we have discussed data and their types, including the types of spatial data. We then
discussed raster data manipulation tools and techniques such as the data analysis environment,
in which an analysis mask restricts analysis to cells that do not contain
the cell value "no data". We also learned about different raster data
operations such as raster data management, raster data extraction and raster data
generalization. After this, we learned about physical distance measure operations,
including allocation and direction, and about the applications of physical distance measure
operations.
Further, we compared vector-based and raster-based data analysis
on the basis of overlay and buffering operations. After this, we learned about
the mosaic, which is composed of two or more images that have been combined or
merged. By mosaicking multiple raster datasets together in ArcGIS, you can generate a single
raster dataset. We also learned about the characteristics of mosaicked raster data.
In the end, we discussed reclassification and its types, and learned
about the merge process in reclassification.
5.5 GLOSSARY
Analysis mask: A mask that restricts the analysis of raster data to cells that do not
carry the cell value of no data.
Mosaic: A raster data operation that compiles multiple input rasters into a
single raster.
Physical distance: The straight-line distance between
cells.
5.7 REFERENCES
1. Chang, K. T. (2009). Introduction to Geographic Information Systems, 5e.
2. Chang, K. T. (2019). Introduction to Geographic Information Systems, 9e.
3. Tomlin, C. D. (1990). Geographic information systems and cartographic
modelling (No. 910.011 T659g). New Jersey, US: Prentice-Hall.
4. Beguería, S., & Vicente-Serrano, S. M. (2006). Mapping the hazard of extreme
rainfall by peaks over threshold extreme value analysis and spatial regression
techniques. Journal of applied meteorology and climatology, 45(1), 108-124.
5. Lillesand, T., Kiefer, R. W., & Chipman, J. (2015). Remote sensing and image
interpretation. John Wiley & Sons.
6. https://desktop.arcgis.com/en/arcmap/10.3/manage-data/raster-and-
images/what-is-a-
mosaic.htm#:~:text=A%20mosaic%20is%20a%20combination,a%20collection
%20of%20raster%20datasets.
7. https://eclasscms.iirs.gov.in/cms_admin/projectFile/23%20April%202020_Spatia
l%20Analysis%20%E2%80%93%20Introductory%20Concept%20and%20Ove
rview%20by%20Shri.%20Prabhakar%20Alok%20Verma.pdf
8. https://8945e053-a-62cb3a1a-s-
sites.googlegroups.com/site/ignouhelpbooks302/Block-
2%20Concept%20of%20Geospatial%20Data.zip?attachauth=ANoY7copxVti-
tN0OhwyM6gyE6j153NLKVnDQimQLI1KjBxW8_I97XfZ0aaLOGlvgXaHaAn
xNEl8SgRCw15APiYqbBhzpA69jAd7YN3WHSabckIgjZCbM1f2F4CYFpDNEp
yCpdLsadTFcXGZ69Y01WibVb7Dp5Ru6z2ePJy9JzgBzDDNFuqLXhsb067fFRl
4PgQX-urtjdMjdcIbvcfC0imzXcAusOcRey-jb73kq6BjQJVfop-4kF37-
iDqv09aUsxjPoMGdFs-&attredirects=0&d=1
9. Chang, Kang-tsung (2016). Introduction to Geographic Information Systems, 9th edition.
McGraw-Hill.
10. Lillesand, Thomas M., Ralph W. Kiefer, and Jonathan W. Chipman, 2004
Q.3 Compare the vector and raster based data analysis on the basis of different operations?
6.1 OBJECTIVES
6.2 INTRODUCTION
6.3 RASTER DATA ANALYSIS-LOCAL, FOCAL, ZONAL AND
GLOBAL
6.4 SUMMARY
6.5 GLOSSARY
6.6 ANSWER TO CHECK YOUR PROGRESS
6.7 REFERENCES
6.8 TERMINAL QUESTIONS
UNIT 6 - RASTER DATA ANALYSIS-LOCAL, FOCAL, ZONAL AND GLOBAL Page 89 of 216
GIS-506/DGIS-506 Advance GIS Uttarakhand Open University
6.1 OBJECTIVES
After reading this unit you will be able to understand:
i. Different analyses based on raster data.
ii. How to find and execute different forms of raster data analysis based on requirements.
iii. Raster data processing based on local, focal, zonal and global operations.
6.2 INTRODUCTION
Raster analysis is similar to vector analysis in many ways. There are, however, some significant
differences. The primary distinctions between raster and vector modelling are determined by the
nature of the data models. In both raster and vector analysis, all operations are possible
because datasets are stored in a common coordinate framework. Every coordinate in the planar
section falls within, or close to, an existing object.
Raster analysis, on the other hand, enforces its spatial relations solely through the location of
the cell. Raster operations carried out on multiple raster datasets generally produce output
values that are the result of cell-by-cell computations. The output value for a single cell is
typically independent of the value or location of other input or output cells. In some cases,
however, output cell values are influenced by the surrounding cells or cell groups. Raster data are
particularly suitable for continuous data, which change smoothly across a landscape or surface.
Phenomena such as chemical concentration, slope, elevation and aspect are handled far better by
raster data structures than by vector data structures. This makes many analyses more appropriate
or possible only with raster data. This section and the following section explain the basics and
some of the most common analytical tools for raster data processing. Spatial operations can be
applied to raster data. Although the actual calculations are substantially different from their
vector counterparts, their conceptual basis is similar. The analysis of raster data covers
geoprocessing techniques with single as well as multiple layer operations.
Single Layer Analysis-: Reclassifying, or recoding, a dataset is often one of the first steps in
raster analysis. Reclassification is basically a single-layer process in which all pixels are assigned
a new class or range value based on their original values. For example, an elevation grid usually
stores a different value for almost every cell within its extent. These values can be simplified by
aggregating each pixel value into a few discrete classes (e.g., 0–100 = "1," 101–200 = "2,"
201–300 = "3," and so on). This simplification allows for fewer unique values and less expensive
storage.
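A minimal sketch of this range-based reclassification is given below, using the class limits from the example (0–100, 101–200, 201–300). The elevation values are invented for illustration, whole-number elevations are assumed, and values outside the listed ranges are returned as `None`.

```python
from bisect import bisect_left

# Upper limit of each class: 0-100 -> 1, 101-200 -> 2, 201-300 -> 3.
UPPER_BOUNDS = [100, 200, 300]


def reclassify(value):
    """Return the class number of an elevation value, or None if out of range."""
    if value < 0 or value > UPPER_BOUNDS[-1]:
        return None
    return bisect_left(UPPER_BOUNDS, value) + 1


elevation = [
    [12, 100, 101],
    [250, 180, 300],
]
classes = [[reclassify(z) for z in row] for row in elevation]

print(classes)  # [[1, 1, 2], [3, 2, 3]]
```

The input grid holds six different values while the output holds only three, which is why the reclassified raster is cheaper to store.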
Vector data analysis commonly uses several operations that play an important role when
analysing different datasets. Buffering and overlay analysis are two of them, and they enable
researchers and professionals to obtain precise results. Buffering is the process of
producing an output dataset with a zone (or zones) of a specified width surrounding an input feature.
In raster datasets, these input features are represented as a grid cell or a group of grid cells with a
uniform value (e.g., buffer all cells with a value of 1). Buffers come in handy when determining
the area of influence around specific features. Vector data buffers produce a precise area of
influence at a specified distance from the target feature, whereas raster buffers are
approximations representing those cells that fall within the specified distance of the target feature
(Figure 6.2 "Raster Buffer around a Target Cell(s)"). The majority of geographic information
systems (GIS) calculate raster buffers by creating a grid of distances from the centre of the
target cell(s) to the centre of the neighbouring cells and then reclassifying those distances so that "1"
represents the cells that make up the original target, "2" represents the cells within the
user-defined buffer area, and "0" represents the cells outside the target and buffer areas. These
cells could be further classified to represent multiple ring buffers by including values of "3,"
"4," "5," and so on.
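The two steps of a raster buffer (build a distance grid, then reclassify it) can be sketched as follows. The function name `raster_buffer`, the grid size and the buffer distance are hypothetical, and the output coding of 1 for the target, 2 for buffer cells and 0 for all other cells is an assumption of this sketch.

```python
import math


def raster_buffer(target_cells, n_rows, n_cols, buffer_distance, cell_size=1.0):
    """Return a grid coded 1 = target, 2 = inside the buffer, 0 = outside."""
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Centre-to-centre distance from this cell to the nearest target cell.
            nearest = min(
                math.hypot(r - tr, c - tc) * cell_size for tr, tc in target_cells
            )
            if nearest == 0:
                row.append(1)
            elif nearest <= buffer_distance:
                row.append(2)
            else:
                row.append(0)
        grid.append(row)
    return grid


# One target cell in the middle of a 5 x 5 grid, buffered by 1.5 cell widths.
buffered = raster_buffer({(2, 2)}, 5, 5, buffer_distance=1.5)
for row in buffered:
    print(row)
```

With a buffer of 1.5 cell widths the eight cells surrounding the target are coded 2, which shows why a raster buffer is a blocky approximation of the circular buffer that vector analysis would produce.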
Multiple Layer Analysis: - A raster dataset, like a vector dataset, can also be clipped (Figure 6.3 "Clipping a
Raster to a Vector Polygon Layer"). The input raster is overlaid with a vector polygon clip
layer. The raster clip process produces a raster that is identical to the input raster but shares the
extent of the polygon clip layer.
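A minimal sketch of a raster clip is shown below. It assumes that the clip polygon has already been rasterised onto the same grid as a True/False mask, so the vector-to-raster step is not shown; `clip_raster` and the `None` NoData marker are hypothetical.

```python
NODATA = None  # assumed marker for cells outside the clip polygon


def clip_raster(raster, mask):
    """Keep cells where mask is True and crop the result to the mask's extent."""
    rows = [r for r, mask_row in enumerate(mask) if any(mask_row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return []
    return [
        [raster[r][c] if mask[r][c] else NODATA
         for c in range(cols[0], cols[-1] + 1)]
        for r in range(rows[0], rows[-1] + 1)
    ]


raster = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, False, False],
]

print(clip_raster(raster, mask))  # [[6, 7], [10, None]]
```

The cell values that survive are unchanged; only the extent of the raster shrinks to that of the clip layer.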
Scale of Analysis:
Raster analyses are basically of four types, which differ from each other in their basic
operations: local, focal (neighbourhood), zonal and global. Each type provides a distinct
set of functions to the GIS analyst.
Local Operations:
In many common image processing operations, the output pixel is a weighted combination of the
grey values of the pixels in the vicinity of the input pixel, hence the term local neighbourhood
operation. The neighbourhood size and the pixel weights determine the operator's action.
This concept was already introduced when the discussion of realistic picture sampling considered
pre-filtering. The discussion of transformations in the image will now be formalized and serves
as a basis. Local operations can be performed on a single raster or on multiple rasters. A single
raster's local operation usually entails applying a mathematical transformation to each individual
cell in the grid. For example, a researcher could obtain a digital elevation model (DEM) in which
each cell value represents elevation in feet. If these elevations are preferred in metres,
the cell values can be converted locally with a simple arithmetic transformation (original
elevation in feet * 0.3048 = new elevation in metres).
Using multiple rasters, analyses such as change over time become possible. Given
two rasters with groundwater depth information for a parcel of land in 2000 and 2010, it is simple to
subtract these values and place the difference in an output raster that shows the change in
groundwater between the two dates (Figure 8.5 "Local Operation on a Raster Dataset").
However, as the number of input rasters grows, such local analyses can become more
complicated. The Universal Soil Loss Equation (USLE), for example, applies a local mathematical
formula to several rasters such as rainfall intensity, soil erodibility, and slope.
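Both kinds of local operation can be written with one small helper, sketched below. The helper name `local_op` and all cell values are invented for illustration; the only figure taken from the text is the conversion factor of 0.3048 metres per foot.

```python
FEET_TO_METRES = 0.3048


def local_op(func, *rasters):
    """Apply func cell by cell to one or more rasters that share the same grid."""
    return [
        [func(*cells) for cells in zip(*rows)]
        for rows in zip(*rasters)
    ]


# Single raster: convert a DEM from feet to metres.
dem_feet = [[1000, 1500], [2000, 2500]]
dem_metres = local_op(lambda z: z * FEET_TO_METRES, dem_feet)

# Two rasters: change in groundwater depth between 2000 and 2010.
depth_2000 = [[12.0, 15.0], [9.0, 20.0]]
depth_2010 = [[14.0, 15.0], [8.5, 26.0]]
change = local_op(lambda later, earlier: later - earlier, depth_2010, depth_2000)

print(change)  # [[2.0, 0.0], [-0.5, 6.0]]
```

In both cases the output cell depends only on the input cells at the same row and column, which is the defining property of a local operation.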
Local operations are cell-by-cell operations, which form the core of raster data analysis. A local
operation can create a new raster from one or more input rasters. The cell values of the new raster
are computed by a function relating the input to the output or are assigned by a classification table.
Local Operations with a Single Raster - With a single raster as input, the local operation computes
each cell value in the output raster as a function of the cell value in the input raster at the
same location. As shown in Figure 6.4, a large number of mathematical operators are available on
GIS platforms. The function can be a GIS tool or a mathematical operator.
For example, converting a floating-point raster into an integer raster is a simple local operation
that uses the integer operator to truncate each cell value at the decimal point, cell by cell.
Converting a slope raster measured in percent to one measured in degrees is also a local operation,
but it requires a more complex mathematical expression.
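The two single-raster examples can be sketched as follows. The conversion uses the standard trigonometric relation, degrees = arctan(percent / 100) expressed in degrees, and the slope values are invented for illustration.

```python
import math


def percent_to_degrees(slope_percent):
    """Convert a slope from percent rise to degrees."""
    return math.degrees(math.atan(slope_percent / 100.0))


slope_percent = [[0.0, 10.0], [50.0, 75.0]]

# Local operation 1: percent slope to degrees, cell by cell.
slope_degrees = [[percent_to_degrees(s) for s in row] for row in slope_percent]

# Local operation 2: truncate each floating-point value at the decimal point.
slope_integer = [[int(s) for s in row] for row in slope_degrees]

print(slope_integer)  # [[0, 5], [26, 36]]
```

Note that `int` truncates rather than rounds, so a slope of about 26.57 degrees becomes 26.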
Reclassification is a local operation that creates a new raster by classification. It is also known
as recoding or transforming through lookup tables (Tomlin 1990). Two methods of reclassification
may be used. The first method is a one-to-one change, in which a cell value in the input raster is
assigned a new value in the output raster. For example, irrigated cropland in a land-use raster is
assigned the value of 1 in the output raster. The second method assigns a new value to a range of
cell values in the input raster. Reclassification serves three main purposes. First, it can create a
simplified raster. For example, rather than continuous slope values, a raster may have 1 for slopes
of 0 to 10 percent, 2 for 10 to 20 percent, and so on. Second, it can create a new raster that
contains a single category or value, such as slopes of 10 to 20 percent. Third, it can create a new
raster that shows the ranking of cell values in the input raster. A reclassified raster, for instance,
can show a ranking of 1 to 5, with 1 being least suitable and 5 being most suitable.
some local operations. A local operation known as Combine assigns a unique output value to each
unique combination of input values. Suppose a slope raster has three cell values (0 to 20, 20 to
40, and greater than 40 percent slope) and an aspect raster has four cell values (north, east, south,
and west aspects). Combine creates an output raster with one value for every unique combination of
slope and aspect, such as 1 for greater than 40 percent slope and southern aspect, 2 for 20 to 40
percent slope and southern aspect, and so on. Local operations in GIS can easily be carried out in
different software packages. ArcGIS can be used to carry out these operations, and the results can
be used for several other analyses.
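The idea behind Combine can be sketched in a few lines. This is an illustration of the concept and not the ArcGIS tool itself: the function name `combine` is hypothetical, and the rule that codes are handed out in order of first appearance is an assumption of the sketch.

```python
def combine(*rasters):
    """Give every unique combination of input values its own output code."""
    codes = {}
    output = []
    for rows in zip(*rasters):
        out_row = []
        for combo in zip(*rows):
            # A combination seen for the first time receives the next unused code.
            code = codes.setdefault(combo, len(codes) + 1)
            out_row.append(code)
        output.append(out_row)
    return output, codes


slope = [[">40", "20-40"], ["0-20", ">40"]]
aspect = [["south", "south"], ["north", "south"]]

combined, legend = combine(slope, aspect)
print(combined)  # [[1, 2], [3, 1]]
print(legend)
```

The returned legend records which slope and aspect pair each code stands for, which is needed to interpret the output raster.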
Figure 6.4: Common Neighborhood Types around Target Cell “x”: (a) 3 by 3, (b) Circle, (c)
Annulus, (d) Wedge
In a neighbourhood (focal) operation, the cell values within the neighbourhood are typically used
in the calculation, and the calculated value is then assigned to the focal cell at the centre. To
complete a neighbourhood operation on a raster, the focal cell is moved from one cell to another
until all the cells have been visited. Different rules designed by the developers of GIS software
are applied to focal cells on the margin of a raster, where a neighbourhood such as a 3-by-3
rectangle cannot be used. A simple rule is to use only the cell values available within the
neighbourhood for computation (e.g. 6 rather than 9). Although a neighbourhood operation works
on a single raster, its process is similar to that of a local operation with multiple rasters.
The results of a neighbourhood operation can show summary statistics including maximum,
minimum, range, sum, mean, median and standard deviation, as well as tabulations of measures
such as majority, minority and variety. These statistics and measures are the same as those for
local operations with multiple rasters.
On raster datasets, neighbourhood operations are commonly used for data
simplification. Because the averaging process reduces the influence of outlying data values, an
analysis that averages neighbourhood values results in a smoothed output raster with
dampened highs and lows. Neighbourhood analyses, on the other hand, can also be used to exaggerate
differences in a dataset. Edge enhancement is a type of neighbourhood analysis that examines the
range of values in the moving window. A large range indicates the presence of an edge within
the window, whereas a small range indicates the absence of an edge.
A block operation is a neighbourhood operation that uses a rectangle (block) and assigns the
calculated value to all of the block's cells in the output raster.
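A 3-by-3 moving window can be sketched as below. The function name `focal` and the sample raster are hypothetical; at the raster margin the sketch follows the simple rule of using only the cells that exist. The same function produces a smoothed raster (mean) and an edge-enhancement raster (range) depending on the statistic passed in.

```python
from statistics import mean


def focal(raster, statistic):
    """Apply a statistic over a 3-by-3 window centred on every cell."""
    n_rows, n_cols = len(raster), len(raster[0])
    output = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            # On the margin the window is cut off, so fewer than 9 cells are used.
            window = [
                raster[i][j]
                for i in range(max(r - 1, 0), min(r + 2, n_rows))
                for j in range(max(c - 1, 0), min(c + 2, n_cols))
            ]
            out_row.append(statistic(window))
        output.append(out_row)
    return output


# A flat area (value 1) next to a sharp rise (value 9) in the last column.
raster = [
    [1, 1, 1, 9],
    [1, 1, 1, 9],
    [1, 1, 1, 9],
]

smoothed = focal(raster, mean)
value_range = focal(raster, lambda window: max(window) - min(window))

print(value_range)  # [[0, 0, 8, 8], [0, 0, 8, 8], [0, 0, 8, 8]]
```

The range is 0 in the flat area and 8 wherever the window straddles the rise, so the edge stands out, while the mean raster replaces the jump from 1 to 9 with intermediate values.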
Neighbourhood operations can also be important for studies that need to select cells by their
neighbourhood characteristics. For instance, installing a gravity sprinkler irrigation system requires
information about the elevation drop within a circular neighbourhood of a cell. Assume that a system
requires an elevation drop of 130 feet (40 metres) within a distance of 0.5 mile
(about 805 metres) to be financially viable. A neighbourhood operation on an elevation raster can
answer the question by using a circle with a radius of 0.5 mile as the neighbourhood and the
(elevation) range as the statistic. A query of the output raster
can then show which cells fulfil the criterion.
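The irrigation example can be sketched with a circular neighbourhood and a range statistic followed by a query. The elevation values and the cell size of 1,320 feet (a quarter mile) are assumptions chosen so that the 0.5 mile radius (2,640 feet) spans two cells; only the 130 feet threshold and the 0.5 mile radius come from the text.

```python
import math


def circular_range(elevation, radius, cell_size):
    """Range of elevation within a circular neighbourhood of every cell."""
    n_rows, n_cols = len(elevation), len(elevation[0])
    reach = int(radius // cell_size)  # half-width of the search window in cells
    output = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            values = [
                elevation[i][j]
                for i in range(max(r - reach, 0), min(r + reach + 1, n_rows))
                for j in range(max(c - reach, 0), min(c + reach + 1, n_cols))
                # Keep only cells whose centre lies inside the circle.
                if math.hypot(i - r, j - c) * cell_size <= radius
            ]
            out_row.append(max(values) - min(values))
        output.append(out_row)
    return output


elevation_ft = [
    [500, 520, 560, 640],
    [500, 510, 540, 600],
    [495, 500, 505, 510],
]

drop = circular_range(elevation_ft, radius=2640, cell_size=1320)

# Raster query: which cells offer at least 130 feet of elevation drop?
feasible = [[d >= 130 for d in row] for row in drop]
print(feasible)
```

The query result is itself a raster of True and False values, which can be mapped directly to show where the system would be viable.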
Zonal Operations:-
A zonal operation works on groups of cells of similar value or with similar features, which are,
not surprisingly, referred to as zones (for example, land parcels, political/municipal units, water
bodies, types of soil/vegetation). Such zones can be conceptualized as raster versions of polygons.
Zonal rasters are often created by reclassifying an input raster into only a few categories. A zonal
operation basically works with groups of cells that have the same value or belong to the same
feature in an image. This operation is very useful when values in a raster are homogeneous. Zonal
operations can be performed on a single raster or on two overlaid rasters. Given a single input
raster, zonal operations measure the geometry of each zone, such as its area, perimeter, thickness
and centroid. Given two rasters, an input raster and a zonal raster, a zonal operation produces an
output raster that summarizes the cell values of the input raster within each zone of the zonal
raster.
A zone may be contiguous or noncontiguous. A contiguous zone includes cells that are spatially
connected, whereas a noncontiguous zone includes separate regions of cells. An example of a
contiguous zone is a watershed raster, in which the cells belonging to the same watershed are
spatially connected. An example of a noncontiguous zone is a land-use raster in which a
particular type of land use may occur in different parts of the raster. The area of a zone is the
number of cells falling within the zone times the cell size.
The perimeter of a contiguous zone is the length of its boundary, while the perimeter of a
noncontiguous zone is the sum of the lengths of its parts. The thickness is the radius (in cells) of
the largest circle that can be drawn within each zone. The centroid is the geometric centre of a
zone, located at the intersection of the major axis and the minor axis of the ellipse that best
approximates the zone. Summary statistics and measures for a zonal operation with two rasters
include area, minimum, sum, range, median, minority, variety and standard
deviation.
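A zonal operation with two rasters can be sketched as below, together with the area of each zone. The zone codes, the elevation values and the cell edge length of 30 metres are invented for illustration, and `zonal_values` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

CELL_SIZE = 30.0  # assumed length of one cell edge, in metres


def zonal_values(zone_raster, value_raster):
    """Group the cells of the value raster by the zone they fall in."""
    groups = defaultdict(list)
    for zone_row, value_row in zip(zone_raster, value_raster):
        for zone, value in zip(zone_row, value_row):
            groups[zone].append(value)
    return groups


zones = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]
elevation = [
    [10, 12, 20],
    [14, 22, 24],
    [5, 7, 26],
]

groups = zonal_values(zones, elevation)
zonal_mean = {zone: mean(values) for zone, values in groups.items()}
zonal_area = {zone: len(values) * CELL_SIZE ** 2 for zone, values in groups.items()}

# Output raster: every cell carries the summary value of its zone.
zonal_mean_raster = [[zonal_mean[zone] for zone in row] for row in zones]

print(zonal_mean)  # {1: 12, 2: 23, 3: 6}
print(zonal_area)  # {1: 2700.0, 2: 3600.0, 3: 1800.0}
```

Replacing `mean` with `min`, `max`, `sum` or another statistic gives the other zonal summaries without changing the grouping step.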
Zonal geometry measures such as area, perimeter, thickness and centroid are
especially useful for landscape ecology studies (Forman and Godron 1986; McGarigal and
Marks 1994). Many other geometric measures can be derived from area and perimeter. The
perimeter/area ratio, for example, is a simple measure of shape complexity used in landscape
ecology. In fields like landscape ecology, where the geometry and spatial arrangement of
habitat patches can have a significant effect on the type and number of species that may reside
there, zonal operations and analyses can be valuable. In addition, zonal analyses can effectively
quantify the narrow habitat corridors that are important for the regional movement of flightless
and migratory animals through densely urbanized areas.
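The perimeter/area ratio can be sketched by counting, for each cell of a zone, the sides that border a different zone or the edge of the raster. The function name and the two habitat patches are hypothetical; both patches have the same area so that only their shape differs.

```python
def zone_perimeter_and_area(zone_raster, zone, cell_size=1.0):
    """Return (perimeter, area) of one zone measured on the raster grid."""
    n_rows, n_cols = len(zone_raster), len(zone_raster[0])
    cells = 0
    edges = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if zone_raster[r][c] != zone:
                continue
            cells += 1
            # A cell side counts towards the perimeter when the neighbour on
            # that side lies outside the raster or belongs to another zone.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                i, j = r + dr, c + dc
                outside = not (0 <= i < n_rows and 0 <= j < n_cols)
                if outside or zone_raster[i][j] != zone:
                    edges += 1
    return edges * cell_size, cells * cell_size ** 2


patches = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 2, 2, 2, 2],
]

for zone in (1, 2):
    perimeter, area = zone_perimeter_and_area(patches, zone)
    print(zone, perimeter / area)  # 2.0 for the compact patch, 2.5 for the strip
```

The narrow strip has the higher ratio, which is how a corridor-like patch can be told apart from a compact one of equal area.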
Global Operation:-
Global operations are similar to zonal operations in which the whole raster dataset is treated as
one zone. Typical global operations compute basic statistical values for the raster as a whole. For
example, the minimum, maximum, average, range, and so on can be quickly calculated for the
whole extent of the input raster.
Figure 6.6 Global Operations on a Raster Dataset
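A global operation can be sketched as a single pass over every cell. The value -9999 used as a NoData marker and the sample raster are assumptions of this sketch; cells carrying the marker are skipped so that they do not distort the statistics.

```python
from statistics import mean

NODATA = -9999  # assumed NoData marker


def global_statistics(raster):
    """Summarise all valid cells of a raster as if it were a single zone."""
    values = [v for row in raster for v in row if v != NODATA]
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean(values),
        "range": max(values) - min(values),
    }


raster = [
    [3, 7, NODATA],
    [5, 9, 6],
]

print(global_statistics(raster))
# {'minimum': 3, 'maximum': 9, 'mean': 6, 'range': 6}
```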
6.4 SUMMARY
Raster analysis is similar to vector analysis in many ways, but there are some significant
differences. In vector analysis, features in one layer are located by their position in explicit
relation to existing features in other layers. As a corollary, containment and overlap are inherent
relations between layers. For instance, a point on one layer is located on one side of an
arc in another layer, or inside or outside a polygon. In raster analysis, reclassifying or recoding
a dataset is often one of the first steps. Reclassification is essentially a single-layer process in
which all pixels are assigned a new class or range value based on their original values. This
simplification makes it possible to achieve fewer unique values and cheaper storage. A zonal
operation works on groups of cells of similar value or with similar features, which are referred to
as zones. Given one input raster, zonal operations measure the geometry of each zone, such as its
area, perimeter, thickness and centroid. Zonal rasters are often created by reclassifying an input
raster into only a few categories.
6.5 GLOSSARY
Raster Data-: Spatial data represented as a grid of cells (pixels) in
tabular format.
Raster Data Operation-: Cell-based analysis performed to extract information that is not
directly available from raster data such as satellite images.
Block Operation-: A form of neighbourhood operation that uses a rectangular block and
assigns the calculated value to all block cells in the output raster.
Local Operation-: Cell-by-cell raster data analysis.
Zonal Operation-: A form of raster analysis that works on groups of cells containing the same
values.
Reclassification-: A local operation that reclassifies cell values of an input raster to
create a new raster.
Ans-: The raster clip process produces a raster that is identical to the input raster but shares the
extent of the polygon clip layer.
Ans-: Edge enhancement is a type of neighbourhood analysis that examines the range of values in
the moving window.
6.7 REFERENCES
1. Begueria, S., and S. M. Vicente- Serrano. 2006. Mapping the Hazard of Extreme Rainfall by
Peaks over Threshold Extreme Value Analysis and Spatial Regression Techniques.
Journal of Applied Meteorology and Climatology 45:108–24.
2. Brennan, J., and E. Martin. 2012. Spatial proximity is more than just a distance measure.
7.1 OBJECTIVES
7.2 INTRODUCTION
7.3 RASTER DATA ANALYSIS- ARITHMETIC OPERATIONS
AND DECISION RULE BASED
7.4 SUMMARY
7.5 GLOSSARY
7.6 ANSWER TO CHECK YOUR PROGRESS
7.7 REFERENCES
7.8 TERMINAL QUESTIONS
UNIT 7 - RASTER DATA ANALYSIS- ARITHMETIC OPERATIONS AND DECISION RULE BASED Page 104 of 216
GIS-506/DGIS-506 Advance GIS Uttarakhand Open University
7.1 OBJECTIVES
After reading this unit you should be able to understand:
I. Various functions of raster data analysis.
II. Functions of arithmetic operation and its implications with respect to GIS data.
III. Various arithmetic operators in GIS
IV. Significance of decision rule based analysis for raster data.
7.2 INTRODUCTION
Operational procedures and quantitative methods for the analysis of spatial data in raster format
are always important to understand and discuss. In raster analysis, geographic units are regularly
spaced, and the location of each unit is referenced by row and column positions. Because
geographic units are of equal size and identical shape, area adjustment of geographic units is
unnecessary and spatial properties of geographic entities are relatively easy to trace. All cells in
a grid have a positive position reference, following the left-to-right and top-to-bottom data scan.
Every cell in a grid is an individual unit and must be assigned a value. Depending on the nature
of the grid, the value assigned to a cell can be an integer or a floating point. When data values
are not available for particular cells, they are described as NODATA cells. NODATA cells
differ from cells containing zero in the sense that zero value is considered to be data. The
regularity in the arrangement of geographic units allows for the underlying spatial relationships
to be efficiently formulated. For instance, the distance between orthogonal neighbors (neighbors
on the same row or column) is always a constant whereas the distance between two diagonal
units can also be computed as a function of that constant. Therefore, the distance between any
pair of units can be computed from differences in row and column positions. Furthermore,
directional information is readily available for any pair of origin and destination cells as long as
their positions in the grid are known.
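The distance and direction between two cells can be derived from their row and column positions alone, as sketched below. The sketch assumes a north-up grid in which row 0 is the top row, and it reports direction as a compass bearing; the function names and the cell size of 30 units are hypothetical.

```python
import math


def cell_distance(origin, destination, cell_size=1.0):
    """Straight-line distance between two cell centres given (row, column)."""
    d_row = destination[0] - origin[0]
    d_col = destination[1] - origin[1]
    return math.hypot(d_row, d_col) * cell_size


def cell_bearing(origin, destination):
    """Compass bearing in degrees (0 = north, 90 = east) from origin to destination."""
    d_row = destination[0] - origin[0]  # rows increase towards the south
    d_col = destination[1] - origin[1]  # columns increase towards the east
    return math.degrees(math.atan2(d_col, -d_row)) % 360


CELL = 30.0  # assumed cell size

print(cell_distance((0, 0), (0, 1), CELL))  # orthogonal neighbour: the constant itself
print(cell_distance((0, 0), (1, 1), CELL))  # diagonal neighbour: the constant times sqrt(2)
print(cell_distance((0, 0), (3, 4), CELL))  # any pair: from row and column differences
print(cell_bearing((2, 2), (1, 2)))         # destination directly to the north
```

The diagonal distance is the orthogonal distance multiplied by the square root of 2, which is the function of the constant that the text refers to.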
The mathematics that underpins all geographical analysis involves the application of rules, most
of which are straightforward. Mathematics makes use of symbols. We will add and explain
other symbols later. In this chapter we will consider arithmetic, which is concerned with
numerical calculations such as adding, subtracting, multiplying, and dividing.
The whole of arithmetic is based essentially on seven axioms, as shown in Box 2.1. Outside
arithmetic, these axioms may not apply, for instance, when two rain drops running
down a window pane come together to make one rain drop, so that in symbolic form: 1 + 1 = 1.
Furthermore, computer programmers often write “N = N + 1,” meaning “Take the number in
the box labeled N, add one to that number and put it back in the box labeled N”; although
partially an arithmetic operation, the use of the “=” sign has a different meaning from that
which we are considering here.
Arithmetic map operations are very common procedures used in GIS to combine raster maps
resulting in a new and improved raster map. It is essential that this new map be accompanied by
an assessment of uncertainty. This paper shows how we can calculate the uncertainty of the
resulting map after performing some arithmetic operation. Actually, the propagation of
uncertainty depends on a reliable measurement of the local accuracy and local covariance, as
well. In this sense, the use of the interpolation variance is proposed because it takes into
account both data configuration and data values. Taylor series expansion is used to derive the
mean and variance of the function defined by an arithmetic operation. We show exact results
for means and variances for arithmetic operations involving addition, subtraction and
multiplication and that it is possible to get approximate mean and variance for the quotient of
raster maps.
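How uncertainty propagates through an arithmetic operation on two raster cells can be sketched with the standard textbook formulas. These are not necessarily the exact expressions used in the study summarised above: the sum and difference results are exact, the quotient result is a first-order Taylor approximation, and all function names and input numbers are hypothetical.

```python
def sum_stats(mean_x, var_x, mean_y, var_y, cov_xy=0.0):
    """Exact mean and variance of X + Y."""
    return mean_x + mean_y, var_x + var_y + 2.0 * cov_xy


def difference_stats(mean_x, var_x, mean_y, var_y, cov_xy=0.0):
    """Exact mean and variance of X - Y."""
    return mean_x - mean_y, var_x + var_y - 2.0 * cov_xy


def quotient_stats(mean_x, var_x, mean_y, var_y, cov_xy=0.0):
    """First-order Taylor approximation of the mean and variance of X / Y."""
    ratio = mean_x / mean_y
    variance = (
        var_x / mean_y ** 2
        + mean_x ** 2 * var_y / mean_y ** 4
        - 2.0 * mean_x * cov_xy / mean_y ** 3
    )
    return ratio, variance


# One cell from each of two maps: local mean, local variance and local covariance.
cell = dict(mean_x=10.0, var_x=4.0, mean_y=5.0, var_y=1.0, cov_xy=0.5)

print(sum_stats(**cell))         # (15.0, 6.0)
print(difference_stats(**cell))  # (5.0, 4.0)
print(quotient_stats(**cell))    # variance is approximately 0.24
```

Applying these functions cell by cell to the input maps and their variance maps yields an uncertainty map to accompany the output map. The quotient approximation becomes unreliable where the mean of the divisor is close to zero.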
Advantages of using the Raster Format in Spatial Analysis-:
Efficient processing: Because geographic units are regularly spaced with identical
spatial properties, multiple layer operations can be processed very efficiently.
Numerous existing sources: Grids are the common format for numerous sources of
spatial information including satellite imagery, scanned aerial photos, and digital elevation
models, among others. These data sources have been adopted in many GIS projects and have
become the most common sources of major geographic databases.
Different feature types organized in the same layer: For instance, the same grid
may consist of point features, line features, and area features, as long as different features are
assigned different values.
Disadvantages of using the Raster Format in Spatial Analysis:
Data redundancy: When data elements are organized in a regularly spaced system,
there is a data point at the location of every grid cell, regardless of whether the data element is
needed or not. Although several compression techniques are available, the advantages of
gridded data are lost whenever the gridded data format is altered through compression. In
most cases, the compressed data cannot be directly processed for analysis. Instead, the
compressed raster data must first be decompressed in order to take advantage of spatial
regularity.
Resolution confusion: Gridded data give an unnatural look and an unrealistic
presentation unless the resolution is sufficiently high. Conversely, spatial resolution dictates spatial
properties. For instance, some spatial statistics derived from a distribution may be different if
spatial resolution varies, which is the result of the well-known scale problem.
Cell value assignment difficulties: Different methods of cell value assignment may
result in quite different spatial patterns.
Map algebra is an informal and commonly used scheme for manipulating continuously sampled
(i.e. raster) variables defined over a common area. It is also a term used to describe calculations
within and between GIS data layers, according to some mathematical expression, to produce a
new layer; it was first described and developed by Tomlin. Map algebra can also be used to
manipulate vector map layers, sometimes resulting in the production of a raster output.
Although no new capabilities are brought to GIS, map algebra provides an elegant way to
describe operations on GIS datasets. It can be thought of simply as algebra applied to spatial
data which, in the case of raster data, are facilitated by the fact that a raster is a geo-referenced
numerical array.
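Because a raster is a numerical array, a map algebra expression can be written as ordinary array arithmetic. The following is a minimal sketch using NumPy; the layer names and cell values are hypothetical, and both layers are assumed to share the same extent and cell size.

```python
import numpy as np

# Two hypothetical layers that cover the same 2 x 3 area, cell for cell.
layer_a = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])
layer_b = np.array([[3.0, 2.0, 1.0],
                    [0.0, 5.0, 2.0]])

# A map algebra expression is evaluated at every cell and yields a new layer.
new_layer = (layer_a + layer_b) / 2.0
```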
Map Algebra models the surface of the earth as a multitude of independent, coincident layers or
themes. The layers interact according to mathematical models and are typically based on real
world observations. Planners develop layers on development and population (Steinitz et al.
1976). Social scientists develop layers on demographics, ethnicity, and economic factors
(McHarg 1969). Applying a Map Algebra model to input layers produces a new layer, which may be a
physical map sheet, a vision perceived through a stack of mylars on a light table, or an
electronic dataset displayed on a computer screen. Regardless of mechanism, the result allows
its users to explain complex phenomena, predict trends, or make adjustments to the model.
However, it is the mechanism which bounds the usability of Map Algebra. How easy is it for
scientists to perform simple tasks? Can complex models be developed and tested? Historically,
layers were plotted on individual transparent maps which, when superimposed and registered,
provide a visually integrated view of the data. The manual process of map overlay is slow and
tedious.
The ability to express problems in a formal manner is a necessary part of solving problems with
computers. A Geographic Information System (GIS) is the best computer tool for solving
geographical problems. Basic functionality for visualizing, managing, and manipulating
spatially referenced data is provided by such systems. A computer language is used to express
problem solving, either one provided by the system or one that interoperates with it. Not only
for advanced spatial analysis problems, but also for many ad hoc queries, GIS users are faced
with the task of writing programmes as a concrete formulation of their particular problem.
Given that the majority of GIS users are land-related professionals rather than programmers, it
is critical to make the language interface to a GIS as simple and intuitive as possible. Writing
programmes can be difficult for inexperienced users. Even if the problem being solved has a
systematic or scientific approach, expressing it in a programme is a difficult intellectual task.
The following are two primary reasons for this:
The data associated with any grid cell can be of any type whatsoever. It is conceptually useful
to divide data types into several classes, however. These include:
Categorical data: These are non-numerical data. Grids that classify land use or land
cover exemplify this category. Other examples are proximity grids (values identify the
nearest object) and feature grids (only two values are possible: one value for cells where
features occur, another value, typically zero or No Data, where features do not occur).
Integral data: These data may be relative ranks or preferences or they may be
counts of occurrences or observations, for example. Thus, what they measure is inherently
integral.
Vector data: These are ordered tuples of real values that represent fields of
directions. For example, hydraulic gradients (for two-dimensional groundwater models), wind
velocities (again for two-dimensional models), and ocean currents are two-dimensional vector
fields. Vector data may have more than two dimensions, even though they are defined over a
strictly two-dimensional domain. For example, models using
astronomical data, such as climate models, may make use of information about the three-
dimensional location (on the earth's surface) of each grid point.
(Scientific visualization systems usually have built-in support for vector data, whereas most
GISes require the modeler to represent vector data as an ordered collection of floating-point
grids).
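Representing vector data as an ordered collection of floating-point grids can be sketched as follows. This is only an illustration: the wind components u and v and their values are hypothetical, and the direction convention (degrees counter-clockwise from east) is an assumption made for the example.

```python
import numpy as np

# Hypothetical wind field stored as two floating-point grids
# (east and north components, in m/s).
u = np.array([[3.0, 0.0], [-1.0, 1.0]])
v = np.array([[4.0, 2.0], [0.0, 1.0]])

wind = np.stack([u, v])              # ordered collection, shape (2, rows, cols)
speed = np.hypot(wind[0], wind[1])   # magnitude of the vector at each cell
direction = np.degrees(np.arctan2(wind[1], wind[0]))  # counter-clockwise from east
```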
An essential part of map algebra or spatial analysis is the coding of data in such a way as to
eliminate certain areas from further contribution to the analysis. For instance, if the existence of
low-grade land is a prerequisite for a site selection procedure, we then need to produce a layer
in which areas of low-grade land are coded distinctively so that all other areas can be removed.
One possibility is to set the areas of low-grade land to a value of 1 and the remaining areas to 0.
Any process involving multiplication, division or geometric mean that encounters the zero
value will then also return a zero value and that location (pixel) will be removed from the
analysis. The opposite is true if processing involves addition, subtraction or arithmetic mean
calculations, since the zero value will survive through to the end of the process. The second
possibility is to use a null or No Data value instead of a zero. The null is a special value which
indicates that there is no digital numerical value. In general, unlike zero, any expression will
produce a null value if any of the corresponding input pixels have null values. Many functions
and expressions simply ignore null values, however, and in some circumstances this may be
useful, but it also means that a special kind of function must be used if we need to test for the
presence of (or to assign) null values in a dataset. For instance, within ESRI's ArcGIS, the
function ISNULL is used to test for the existence of null values and will produce a value of 1 if
null, or 0 if not. Using ER Mapper's formula editor, null values can easily be assigned, set to
other values, made visible or hidden. Situations where the presence of nulls is disadvantageous
include instances where there are unknown gaps in the dataset, perhaps produced by
measurement error or failure. Within map algebra, however, the null value can be used to great
advantage since it enables the selective removal or retention of values and locations during
analysis.
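The difference between zero coding and null coding can be illustrated with NumPy. In this minimal sketch NaN is assumed to stand in for the No Data value, and the layer names and values are hypothetical; the last lines mimic an ISNULL-style test rather than calling any GIS package.

```python
import numpy as np

land_value = np.array([[12.0, 7.0], [9.0, 4.0]])   # hypothetical layer
low_grade = np.array([[1, 0], [1, 1]])             # 1 = low-grade land, 0 = other

# Zero coding: multiplication removes the cell, addition lets it survive.
product = land_value * low_grade
total = land_value + low_grade

# Null coding: NaN propagates through any arithmetic expression.
null_mask = np.where(low_grade == 1, 1.0, np.nan)
masked = land_value * null_mask

# ISNULL-style test: 1 where the value is null, 0 elsewhere.
is_null = np.isnan(masked).astype(int)

# Functions that ignore nulls have to be chosen explicitly.
mean_ignoring_nulls = np.nanmean(masked)
```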
Table 7.1 Operations categorized according to their spatial or non-spatial nature.
(Source: After Bonham-Carter, 2002)
Logical and conditional processing are quite similar; both provide a means of controlling what
happens during some function. They allow us to evaluate some criterion and to specify what happens
next if the criterion is satisfied or not. Logical processing describes the tracking of true and
false values through a procedure. Normally, in map algebra, a non-zero value is always
considered to be a logical true, and zero, a logical false. Some operators and functions may
return either logical true values (1) or logical false values (0), for example relational and
Boolean operators. The return of a true or false value acts as a switch for one or other
consequence within the procedure. Conditional processing allows that a particular action can be
specified, according to the satisfaction of various conditions; if the conditions are evaluated as
true then one action is taken, and an alternative action is taken when the conditions are
evaluated as false. The conventional if–then–else statement is a simple example of a conditional
statement:
Conditional processing is especially useful for creating analysis 'masks'. In Fig. 7.1, each
input pixel value is tested for the condition of having a slope equal to or less than 15º. If the
value tests true (slope angle is 15º or less), a value of 1 is assigned to the output pixel. If it tests
false (exceeds 15º), a null value is assigned to the output pixel. The output could then be used as
a mask to exclude areas of steeper slopes and allow through all areas of gentle slopes, such as
might be required in fulfilling the prescriptive criteria for a site selection exercise.
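The slope test just described is an if-then-else rule applied to every cell. A minimal NumPy sketch is given below; the slope values and the second layer are hypothetical, and NaN is assumed to represent the null value.

```python
import numpy as np

slope = np.array([[ 5.0, 12.0, 18.0],
                  [15.0, 22.0,  9.0],
                  [30.0, 14.0, 16.0]])   # hypothetical slope angles in degrees

# if slope <= 15 then 1 else null
slope_mask = np.where(slope <= 15.0, 1.0, np.nan)

# Applying the mask keeps gentle slopes and nulls out steeper ones.
elevation = np.full(slope.shape, 100.0)  # hypothetical second layer
gentle_only = elevation * slope_mask
```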
A relational operator enables the construction of logical functions and tests by comparing two
numbers and returning a true value (1) if the comparison holds (for instance, if the values are
equal) or false (0) if not. For example,
this operator can be used to find locations within a single input layer with DN values
representing a particular class of interest. These are particularly useful with discrete or
categorical data.
A Boolean operator, for example AND, OR or NOT, also enables sequential logical functions
and tests to be performed. Like relational operators, Boolean operators also return true (1) and
false (0) values. They are performed on two or more input layers to select or remove values and
locations from the analysis. For example, to satisfy criteria within a slope stability model,
Boolean operators could be used to identify all locations where values in one input representing
slope are greater than 40º AND where values in an elevation model layer are greater than
2000 m (as in Fig. 7.2a).
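The slope and elevation test can be sketched with relational and Boolean operators on arrays. The cell values are hypothetical; the NumPy operators &, |, ~ and ^ are used here as stand-ins for AND, OR, NOT and XOR.

```python
import numpy as np

slope = np.array([[45.0, 30.0], [50.0, 41.0]])              # degrees (hypothetical)
elevation = np.array([[2500.0, 2600.0], [1800.0, 2001.0]])  # metres (hypothetical)

steep = slope > 40.0        # relational operator: True/False per cell
high = elevation > 2000.0

steep_and_high = (steep & high).astype(int)    # A AND B
steep_or_high = (steep | high).astype(int)     # A OR B
steep_not_high = (steep & ~high).astype(int)   # A NOT B
steep_xor_high = (steep ^ high).astype(int)    # A XOR B
```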
Logical operators involve the logical comparison of the two inputs and assign a value according
to the type of operator. For instance, for two inputs (A and B), A DIFF B assigns the value from A
to the output pixel if the values are different or a zero if they are the same. An expression
A OVER B assigns the value from A if a non-zero value exists; if not then the value from B is
assigned to the output pixel. A combinatorial operator finds all the unique combinations of
values among the attributes of multiple input rasters and assigns a unique value to each
combination in the output layer. The output attribute will contain fields and attributes from all
the input layers.
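The DIFF, OVER and combinatorial operators can be imitated on small arrays. This is a minimal sketch with hypothetical inputs; the choice to number the output codes from 1 is arbitrary, and the array unique_pairs plays the role of the output attribute table.

```python
import numpy as np

a = np.array([[1, 0, 3], [2, 2, 0]])
b = np.array([[1, 5, 4], [0, 2, 7]])

a_diff_b = np.where(a != b, a, 0)   # value of A where the inputs differ, else 0
a_over_b = np.where(a != 0, a, b)   # value of A where A is non-zero, else B

# Combinatorial operator: one output code per unique (a, b) pair.
pairs = np.stack([a.ravel(), b.ravel()], axis=1)
unique_pairs, codes = np.unique(pairs, axis=0, return_inverse=True)
combined = codes.reshape(a.shape) + 1   # codes numbered from 1 (arbitrary choice)
```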
All these operators can be used, with care, alone or sequentially, to test, process, retain
or remove values (and locations) selectively from datasets alone or from within a spatial
analysis procedure.
Fig. 7.1. Logical test of slope angle data, for the condition of being no greater in value than 15º:
(a) slope angle raster and (b) slope mask (pale grey blank cells indicate null values).
(Source: Liu and Mason, 2009)
[Table of map algebra operators not recovered; surviving entries: /; >, GT (greater than); !, XOR (logical); MOD (modulus); DIFF (logical difference)]
Fig. 7.2. Use of Boolean rules and set theory within map algebra; here the circles represent the
feature classes A, B and C, illustrating how simple Boolean rules can be applied to geographic
data sets, and especially rasters, to extract or retain values, to satisfy a series of criteria: (a) A AND
B (intersection or minimum); (b) A NOT B; (c) (A AND C) OR B; (d) A OR B (union or
maximum); (e) A XOR B; and (f) A AND (B OR C). (Source: Liu and Mason, 2009)
Local Operations:
A local operation involves the production of an output value as a function of the value(s) at the
corresponding locations in the input layer(s). These operations can be considered point
operations when performed on raster data, i.e. they operate on a pixel and its matching pixel
position in other layers, as opposed to groups of neighbouring pixels. They can be grouped into
those which derive statistics from multiple input layers (e.g. mean, median, minority), those
which combine multiple input layers, those which identify values that satisfy specified criteria
or the number of occurrences that satisfy specified criteria (e.g. greater than or less than), or
those which identify the position in an input list that satisfies a specified criterion. All types of
operator
previously mentioned can be used in this context. Commonly they are subdivided according to
the number of input layers involved at the start of the process. They include primary operations
where nothing exists at the start, to n-ary operations where n layers may be involved; they are
summarized in Table 7.3 and illustrated in Fig. 7.3.
Fig. 7.3. Classifying map algebra operations in terms of the number of input layers and some
examples. (Source: Liu and Mason, 2009)
Primary operations:
This description refers primarily to operations used to generate a layer, conceptually from
nothing, for example the creation of a raster of constant value, or containing randomly
generated numbers, such as could be used to test for error propagation through some analysis.
An output pixel size, extent, data type and output DN value (either constant or random between
set limits) must be specified for the creation of such a new layer.
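A primary operation can be sketched as the creation of an array from nothing but its specification. The extent, cell size, data type and value limits below are hypothetical; a bare NumPy array does not store the pixel size, so it is kept here only as separate metadata.

```python
import numpy as np

rows, cols = 4, 5     # extent in cells (hypothetical)
cell_size = 30.0      # pixel size in metres, kept as metadata only

# Raster of constant value with an explicit data type.
constant = np.full((rows, cols), 10, dtype=np.int32)

# Raster of random values between set limits; seeded so the run is repeatable.
rng = np.random.default_rng(seed=42)
random_layer = rng.uniform(low=0.0, high=1.0, size=(rows, cols))
```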
Unary operations:
These operations act on one layer to produce a new output layer and they include tasks such as
rescaling, negation, comparison with other numbers, application of functions and
reclassification. Rescaling is especially useful in preparation for multi-criteria analysis where
all the input layers should have consistent units and value range: for instance, in converting
from byte data, with a 0 to 255 value range, to a percentage scale (0-100) or a range of between
0 and 1, and vice versa.
Negation is used in a similar context, in modifying the value range of a dataset from being
entirely positive to entirely negative and vice versa. Comparisons create feature grids: the
places where the comparison is true can be considered features on the earth's surface. They map
the regions where a logical condition (the comparison) holds. These could be regions where,
say, ozone concentrations exceed a threshold, ocean depths are below a certain target, or land
use equals a given code. Mathematical functions are useful for changing the visualization of a
grid. An equal interval classification using the square roots of the values will differ from an
equal interval classification of the values themselves, for instance. Functions are also important
as intermediate steps in many models. Reclassification is especially significant in data
preparation for spatial analysis, and so deserves rather more in-depth description, but all these
activities can be and are commonly carried out in image processing systems.
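The unary operations listed above (rescaling, negation, comparison, application of a function and reclassification) can each be written as a single array expression. The sketch below assumes a hypothetical byte layer; the class breaks used for reclassification are invented for the example.

```python
import numpy as np

byte_layer = np.array([[0, 51], [102, 255]], dtype=np.uint8)

unit_range = byte_layer / 255.0                  # rescale 0-255 to 0-1
percent = unit_range * 100.0                     # rescale to 0-100
negated = -unit_range                            # negation
feature_grid = (byte_layer > 100).astype(int)    # comparison with a constant
root = np.sqrt(byte_layer.astype(float))         # application of a function

# Reclassification into three classes; the break values are hypothetical.
breaks = [85, 170]
classes = np.digitize(byte_layer, breaks) + 1    # 1: below 85, 2: 85-169, 3: 170 and above
```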
To illustrate different applications succinctly, suppose that three grids appear in the current
view: "Integer" is an integer grid, "Float" is a floating-point grid, and "Indicator" is an integer
grid containing only 0, 1, and No Data values. A value of 0 can be interpreted as a logical
"false" and a value of 1 as a logical "true". In practice, of course, we will replace these names
by the names of our themes.
Rescale a grid: that is, multiply all its values by a constant value.
all values
Compare a grid to a constant value. The result of a comparison is 1 where the comparison is true,
0 where the comparison is false, and No Data where the original value is No Data.
[Float] < 1 Returns 1 where values are less than 1, otherwise returns 0.
radians).
[Float].Int Rounds all values and converts the result to an integer grid.
[Float].Sqrt Computes the square root of each value. Negative values return No Data
(because the square root is not defined for negative values).
[Float].IsNull Returns 1 at all cells with No Data values, otherwise returns 0.
Binary operations:
Binary numeric operations act on ordered pairs of numbers. Likewise, binary grid operations
act on the pairs of numbers obtained in each set of matching cells. The resulting grid is defined
only where the two input grids overlap.
Suppose there are several floating-point grids represented by themes named "Float",
"Float1", "Float2", and so on; with a similar supposition for integer and logical grids.
Mathematical operators
[Float] + [Integer] Converts the values in [Integer] to floats, then performs the additions.
Logical operators
[Float1] < [Float2] Returns 1 in each cell where [Float1]'s value is less than [Float2]'s value; otherwise, returns 0.
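The two binary requests above have close array analogues. In this minimal sketch the grid values are hypothetical and NaN is assumed to mark No Data; NumPy is used as an illustration and is not the Spatial Analyst implementation.

```python
import numpy as np

float_grid = np.array([[1.5, np.nan], [2.0, 0.5]])   # NaN marks No Data
integer_grid = np.array([[2, 3], [1, 1]])

# Addition: the integers are promoted to floats before the sum is formed.
total = float_grid + integer_grid

# Comparison: 1 where true, 0 where false, No Data where the input is No Data.
less = np.where(np.isnan(float_grid), np.nan,
                (float_grid < integer_grid).astype(float))
```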
This description refers to operations in which there are two input layers, leading to the
production of a single output layer. Overlay refers to the combination of more than one layer of
data, to create one new layer. The example shown in Fig. 7.4 illustrates how a layer
representing average rainfall, and another representing soil type, can be combined to produce a
simple, qualitative map showing optimum growing conditions for a particular crop. Such
operations are equivalent to the application of formulae to multiband images, to generate ratios,
differences and other inter-band indices and as mentioned in relation to point operations on
multi-spectral
images, it is important to consider the value ranges of the input bands or layers, when
combining their values arithmetically in some way. Just as image differencing requires some
form of stretch applied to each input layer, to ensure that the real meaning of the differencing
process is revealed in the output, here we should do the same. Either the inputs must be scaled
to the same value range, or if the inputs represent values on an absolute measurement scale then
those scales should have the same units.
The example shown in Fig. 7.4 represents two inputs with relative values on arbitrary nominal
or ordinal (Fig. 7.4a) and interval (Fig. 7.4b) scales. The resultant values are also given on an
interval scale and this is acceptable providing the range of potential output values is understood,
having first understood the value ranges of the inputs, since they may mean nothing outside the
scope of this simple exercise.
Another example could be the combination of two rasters as part of a cost-weighted analysis
and possibly as part of a wider least cost pathway exercise. The two input rasters may represent
measures of cost, as produced through reclassification of, for instance, slope angle and land
value, cost here being a measure of friction or the real cost of moving or operating across the
area in question. These two cost rasters are then aggregated or summed to produce an output
representing total cost for a particular area (Fig. 7.5).
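The aggregation of two friction rasters can be sketched as a reclassification followed by a cell-wise sum. All values and the slope class breaks below are hypothetical and are not taken from Fig. 7.5.

```python
import numpy as np

slope = np.array([[3.0, 12.0], [27.0, 41.0]])   # degrees (hypothetical)
f2 = np.array([[1, 3], [2, 1]])                 # ranked land value (hypothetical)

# Reclassify slope into ranks 1-4 to form the first friction input.
f1 = np.digitize(slope, [10.0, 20.0, 30.0]) + 1

# Total cost is the cell-wise sum of the two friction rasters.
total_cost = f1 + f2
```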
Fig. 7.4. An example of a simple overlay operation involving two input rasters: (a) an integer
raster representing soil classes (class 2, representing sandy loam, is considered optimum); (b) a
floating-point raster representing average rainfall, in metres per year (0.2 is considered
optimum); and (c) the output raster derived by addition of a and b to produce a result
representing conditions for a crop; a value of 2.2 (2 + 0.2), on this rather arbitrary scale,
represents optimum growing conditions and it can be seen that there are five pixel positions
which satisfy this condition. (Source: Liu and Mason, 2009)
Fig. 7.5 (a) Slope gradient in degrees; (b) ranked (reclassified) slope gradient constituting the
first cost or friction input; (c) ranked land value (produced from a separate input land-use
raster) representing the second cost or friction input; and (d) total cost raster produced by
aggregation of the input friction rasters (f1 and f2). This total cost raster could then be used
within a cost-weighted distance analysis exercise. (Source: Liu and Mason, 2009)
Local statistics:
When we have many related grids defined in the same region, we often want to assess change:
at each cell, how varied are the grid results? How large do they get? How small? What is the
average? These questions make sense for numerical data.
For grids with ordinal data--that is, values that can be ordered, but which may not have any
absolute meaning--you can still ask about order statistics. These are the relative rankings of
values within the ordered collections of values observed at each cell. For grids with categorical
data, you might want to know at each cell whether one category predominates throughout the
collection of grids and how many different categories actually appear at the cell's location. (Liu
and Mason, 2009)
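Local statistics of this kind can be sketched by stacking the grids and reducing along the stacking axis. The three grids are hypothetical; the helper majority is written for this example and resolves ties to the smallest value, which is one possible convention among several.

```python
import numpy as np

# Three hypothetical grids of the same region, stacked along axis 0.
stack = np.array([
    [[1, 4], [2, 2]],
    [[3, 4], [2, 5]],
    [[1, 6], [7, 5]],
])

local_mean = stack.mean(axis=0)
local_min = stack.min(axis=0)
local_max = stack.max(axis=0)
local_range = local_max - local_min
local_median = np.median(stack, axis=0)   # an order statistic

def majority(values):
    """Most frequent value; ties resolve to the smallest value."""
    uniques, counts = np.unique(values, return_counts=True)
    return uniques[np.argmax(counts)]

local_majority = np.apply_along_axis(majority, 0, stack)
local_variety = np.apply_along_axis(lambda v: np.unique(v).size, 0, stack)
```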
The Spatial Analyst syntax for some of these requests is strange, because it wants to force
expressions into the form "aGrid.Request (list of other grids)". This is inherently asymmetric
because it singles out one grid in the collection to play the role of the object ("aGrid") to which
the calculation is applied and leaves the other grids in the role of a list of arguments ("list of
other grids"). Despite this syntax, for some requests, such as the local statistics, there is no
asymmetry in the calculation itself: all the grids are equivalent. For some other requests, there is
an asymmetry in the calculation: one grid plays a special role.
Spatial Analyst constructs lists with curly braces {} and separates the elements by commas.
The Majority statistic evidently is not very useful when many ties occur: that is, when there are
many cells where two or more values occur equally often.
[Float].GridsGreaterThan ({[Float1], [Float2], [Float3], [Float4]}) For each base cell in
[Float], computes the number of times corresponding cells from [Float1], ..., [Float4] exceed (and
do not equal) the base cell's value. There is a corresponding GridsLessThan operator.
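The counting behaviour of this request can be imitated with a strict comparison followed by a sum over the list of grids. This is a sketch with hypothetical values and only three grids in the list; it illustrates the calculation and does not reproduce the Spatial Analyst code.

```python
import numpy as np

base = np.array([[2.0, 5.0], [1.0, 4.0]])   # plays the role of [Float]
others = np.array([                          # plays the role of the grid list
    [[3.0, 5.0], [0.0, 4.5]],
    [[1.0, 6.0], [2.0, 4.0]],
    [[2.5, 4.0], [1.0, 9.0]],
])

# Strictly greater (ties are not counted), summed over the list of grids.
greater_count = (others > base).sum(axis=0)
less_count = (others < base).sum(axis=0)
```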
The Con request is especially useful. The result of Con, by default, is the second grid ([Float2]
or [Mosaic] in the examples). However, at cells where [Indicator] is true, the values of the first
grid ([Float1] or [Average]) are "painted" over the default values. Thus the Con request is a
natural vehicle for selectively editing grids.
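The behaviour of Con described above corresponds to a cell-wise selection. In the minimal sketch below, hypothetical arrays stand in for [Indicator], [Float1] and [Float2], and np.where is used as an analogue of the request rather than the request itself.

```python
import numpy as np

indicator = np.array([[1, 0], [0, 1]])
first = np.array([[10.0, 20.0], [30.0, 40.0]])    # painted where indicator is true
second = np.array([[-1.0, -2.0], [-3.0, -4.0]])   # default values

result = np.where(indicator != 0, first, second)
```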
Logic Programming
Functional Programming
Rule-Based Programming
Object-Oriented Programming
Logic Programming:
To solve problems, logic programming employs rules of exact logic, or to be more precise,
rules of first order predicate logic. Problems are expressed as statements to represent beliefs
about the world that we hold. A set of logical terms and logical connectors make up the
statements. A truth table contains the rules for evaluating statements.
In first order predicate logic all objects belong to a single universe. This leads to a characteristic
of “flatness” in pure logical languages. All objects are universal and so are the axioms by which
they are related. There is no procedural abstraction in first order predicate logic.
In practice, logic programming languages use some procedural mechanisms to interpret logical
statements. The most popular of these programming languages is PROLOG (Bratko, 1990). A
logical statement is expressed as a Horn clause consisting of a conclusion head “C” and several
conditional terms “B” in the body. They have the form:
“B1 and B2 and B3 … and BN implies C”
Different combinations of a head and body create three types of clauses: queries, rules and
facts. The fundamental form of programming control is a query that is answered by searching
for matching facts, or rules whose heads match the query and whose body may be proven. This
ability to search through a set of facts and to further deduce relations from rules gives PROLOG
its deductive capability.
Figure 1: Truth Table.
p      q      p ∧ q (AND connector)   p ∨ q (OR connector)   p → q (implication)
true   true   true                    true                   true
true   false  false                   true                   false
false  true   false                   true                   true
false  false  false                   false                  true
The power of PROLOG-like languages to express both spatial
queries and spatial models has been well demonstrated. LOBSTER is an early example of a
queries and spatial models has been well demonstrated. LOBSTER is an early example of a
prototype system that used PROLOG as the language interface to query a spatial DBMS
(Egenhofer, 1990). The prototype provided a high-level language to manipulate symbolic
representations of spatial features. This was possible because the DBMS was able to handle
complex record structures, and user-defined functions could be programmed as built-ins to the
PROLOG interpreter. Spatial data types for points, lines, areas, and surfaces were defined in the
DBMS and manipulated at a semantic level by the rules and facts expressed in Horn clauses.
All low-level access to spatial data and spatial manipulation is handled by the built-in functions.
This ability to include declarative expressions of spatial queries within a logic language is
viewed as a key requirement by other researchers (Abdelmoty et al., 1993).
Functional Programming:
Functional programming is based upon mathematical concepts of mapping functions. A
function maps object values from one domain to another. This is expressed formally as f: X → Y;
the function f maps object values from the domain X to the domain Y. The object returned by
a function depends only on its arguments. In addition, functions do not induce any side effects
so all state information evolves in an explicit and controlled way. This trait is known as
referential transparency. Any transformations on objects are handled by explicitly returning
new objects. This has a bearing on the data and procedural abstractions used by functional
languages. Both rely upon mapping functions to express structural and behavioural
relationships.
A GIS database perceived and manipulated by a functional language is viewed as a collection of
objects together with a collection of functions. This has not proven to be a very attractive
quality for feature-based GIS applications as there is not sufficient selective distinction between
the different operations permitted on various types of spatial features (i.e. point, linear, and area
features). However, GIS applications that use a simple image-based structure are more
predisposed to this type of manipulation. Map algebra is an example of a function-oriented
language used in GIS for manipulating and analysing surface data (Tomlin, 1991). Map
algebra uses a set of conventions to provide finer interpretation of the geographic locations (i.e.
local, neighbourhood, zonal) but these are still manipulated by functional transformations. Map
algebra has the advantage of a straightforward notation and is very useful for developing
models of spatial interpretations.
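The functional view of map algebra can be sketched with three small functions, one for each interpretation of location (local, neighbourhood and zonal). The function names, the 3 x 3 window with repeated edge values, and the sample data are assumptions made for this illustration. Each function returns a new array and leaves its arguments unchanged, which is the referential transparency described above.

```python
import numpy as np

def local_add(a, b):
    """Local function: each output cell depends on the matching input cells."""
    return a + b

def focal_mean(a):
    """Neighbourhood function: mean of the 3 x 3 window, edges padded by repetition."""
    padded = np.pad(a, 1, mode="edge")
    rows, cols = a.shape
    windows = [padded[r:r + rows, c:c + cols] for r in range(3) for c in range(3)]
    return np.mean(windows, axis=0)

def zonal_mean(a, zones):
    """Zonal function: every cell receives the mean of its zone."""
    out = np.empty(a.shape, dtype=float)
    for zone in np.unique(zones):
        out[zones == zone] = a[zones == zone].mean()
    return out

surface = np.array([[1.0, 2.0], [3.0, 4.0]])
zones = np.array([[1, 1], [2, 2]])

# Functions compose; none of them modifies its arguments.
result = zonal_mean(local_add(surface, focal_mean(surface)), zones)
```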
Rule-Based Programming:
several ad hoc system developments for decision support (Lowes and Bellamy, 1994) (Davis and
McDonald, 1993).
Object-Oriented Programming:
Object-oriented programming (OOP) is based on concepts for objects, classes, and the
inheritance mechanism between classes. An object is an instance of a class to hold all related
state information. Since objects can reference other objects, it is possible to build compositions
of more complex objects. The classes in a program define categories of objects which share the
same state information and procedural interfaces. Inheritance provides a relationship between
classes based upon a taxonomy hierarchy. These organizing principles are formally based upon
classification theory.
OOP has become very popular as it provides mental leverage for designers to encapsulate the
structure and behaviour of design problems as objects. Data abstraction is supported through
associative references to express structural relationships between objects, and class inheritance.
Procedural abstractions are provided in two ways. The permissible actions on an object, and a
configuration of objects, are integrated as part of the object class description. But the final
implementation code still uses low level procedural mechanisms to perform operations in
sequence, by conditional branching, or within an iteration. A disadvantage is that these control
constructs involve the introduction of state variables to hold computational values between
operations and procedures.
Writing a program in an OOP language does not necessarily make the program object-oriented.
But in general programs incorporate object-oriented design principles (Rumbaugh et al., 1991).
OOP is especially suited to problems where there is a large number of entities to be modelled,
each with complex structural relationships and operational semantics. In recent years OOP has
made a significant impact on graphical user interfaces (GUIs) and the application programming
environment. Desktop GISs often use object-oriented concepts in the user interface and
application programming environment. But in most cases spatial data handling is still based
upon a geo-relational model, and so data abstractions such as association and inheritance are
not applied to the spatial data. Morehouse (1990) discusses the implications and difficulty of
having true object-oriented
modelling semantics for spatial databases. The Open GIS Specification (OGC, 1997)
incorporates object-oriented geo-processing concepts. The full development of models to allow
user defined schemas will require information representation specified by data dictionaries,
schematic catalogues, geometry rules, etc. This technology specification will have an important
impact on the adoption of object-oriented data abstractions within GIS programming languages.
Decision Tables:
Rule-sets are difficult to interpret for any reasonably sized knowledge base. An alternative
technique for representing decision rules is as decision trees (Giarratano and Riley, 1994) or
decision tables (Reilly et al., 1987). The different forms for representing rules can be shown by
example. The example describes rules for choosing the best wine to have with a meal.
7.4 SUMMARY
In raster analysis, geographic units are regularly spaced, and the location of each unit is
referenced by its row and column position. All cells in a grid have a positive position reference,
following the left-to-right and top-to-bottom data scan. The regularity in the arrangement of
geographic units allows the underlying spatial relationships to be formulated efficiently.
Arithmetic map operations are very common procedures used in GIS to combine raster maps. It
is essential that the new map produced be accompanied by an assessment of uncertainty. Exact
results can be derived for the means and variances of arithmetic operations involving addition,
subtraction and multiplication. The use of the interpolation variance is proposed because it takes
into account both data configuration and data values. Novice users often find writing programs to be
a daunting task. Users of GIS are faced with the task of writing programs as a concrete
formulation of their particular problem. A necessary part of solving problems with
computers is to express them in a formal way. The appropriate computer tool to solve
geographical problems is a Geographic Information System (GIS). Such systems provide the
basic functionality for visualising, managing and manipulating spatially referenced data. It is
important to make the language interface to a GIS as easy to use and intuitive as possible. Map
algebra can be thought of simply as algebra applied to spatial data which, in the case of raster data, is
facilitated by the fact that a raster is a georeferenced numerical array. A difficulty arises
when an application presents information in one way but the programming environment used to
access and manipulate that information presents it differently. A popular way to present information in
GIS is as a map organised into thematic layers. In a programming environment the user is
presented with tables containing records.
7.5 GLOSSARY
Map algebra: The most common scheme for manipulating continuously sampled (i.e.
raster) variables defined over a common area.
Primary operations: Operations used to generate a layer, conceptually from nothing.
Binary operations: Binary grid operations act on the pairs of numbers obtained in each set
of matching cells.
Local statistics: Statistics computed when many related grids are accumulated over the same region.
Integral data: These data may be relative ranks or preferences of the main databases.
7.7 REFERENCES
1. Abdelmoty A.I., Williams M.H. and Paton N.W. (1993) Deduction and Deductive Databases
for Geographic Data Handling. 3rd International Symposium on Large Spatial Databases,
SSD’93, Singapore, pp.443-464
2. Arentze T.A., Borgers A. and Timmermans H. (1995) The Integration of Expert Knowledge in
Decision Support Systems for Facility Location Planning. Computers, Environment and Urban
Systems 19(4), pp.227-247
3. Bratko Ivan (1990) Prolog Programming for Artificial Intelligence. Addison Wesley.
4. Davis J.R. and McDonald G. (1993) Applying a Rule-Based Decision Support System to Local
Government Planning. In: Expert Systems in Environmental Planning, Editors J.R. Wright, et al.
Springer-Verlag, pp.23-45
5. ESRI (1994) Avenue - Customization and Application Development for ArcView.
Environmental Systems Research Institute Inc, Redlands, CA.
6. Egenhofer M. and Frank A. (1990) LOBSTER: Combining AI and Database Techniques in
GIS. Photogrammetric Engineering & Remote Sensing 56(1), pp.919-926
7. Frank A.U. and Kuhn W. (1995) Specifying Open GIS with Functional Languages. Advances
in Spatial Information Systems, Proceedings SSD’95, Portland, pp.184-195
8. Giarratano J. and Riley G. (1994) Expert Systems - Principles and Programming. PWS Publ.
Co., Boston.
9. Jian Guo Liu, Philippa J. Mason, “Essential Image Processing and GIS for Remote Sensing,”
Imperial College London, UK, 261-280 (2009).
10. Lowes, D. and Bellamy J.A. (1994) Object Orientation in Spatial Decision Support System for
Grazing Land Management. AI Applications 8(3), pp.55-66
11. Morehouse S.D. (1985) ARC/INFO - A Geo-Relational Model for Spatial Information.
Proceedings Auto-Carto 7, Washington, pp.388-397
12. Morehouse S.D. (1990) The Role Of Semantics In Geographic Data Modelling. Proceedings
4th International Symposium on Spatial Data Handling, Zurich, pp.689-698
13. Newell A. and Simon H. (1972) Human Problem Solving, Prentice-Hall.
14. Paulson L.C (1996) ML For Working Programmers. Cambridge University Press.
8.1 OBJECTIVES
8.2 INTRODUCTION
8.3 RASTER DATA FORMATS
8.4 SUMMARY
8.5 GLOSSARY
8.6 ANSWER TO CHECK YOUR PROGRESS
8.7 REFERENCES
8.8 TERMINAL QUESTIONS
8.1 OBJECTIVES
After going through this unit the learner will be able to:
1. Learn about elements of raster data model.
2. Learn about Raster Data Structure and Data compression.
3. Get to know about the format of raster data.
4. Understand what the raster data in a GIS is and how it can be used.
8.2 INTRODUCTION
Features of the Earth's surface, both natural and man-made, are represented in the GIS
environment as spatial data. Raster and vector are the two
data formats used to represent spatial data in geographic information systems and remote sensing.
Point, line, and area geometric objects are used to represent spatial features in vector data.
Vector data is ideal for discrete features with well-defined positions and shapes, but it is
poorly suited to spatial phenomena that vary continuously over space, such as soil erosion,
precipitation, and elevation. For representing continuous phenomena, raster data is the better
choice. To cover space, raster data use a regular grid. The value of each grid cell corresponds to
the characteristic of a spatial phenomenon at the cell location, and the variation in cell values
reflects the spatial variation of the phenomenon.
GIS software used to be either raster-based or vector-based, but today the
majority of GIS software can handle data in both formats. In GIS applications, advances in
computing have largely removed the practical barrier between raster and vector data.
Many opportunities exist in combining raster and vector data, where one may
apply the mathematical and simulation approaches suited to each format.
This unit deals with the raster data format.
Raster data:
Raster data are increasingly used in a number of GIS applications and have become the primary
source of spatial data in geographic databases. Raster data is defined as a grid (cell) format that
represents features of the earth's surface, both natural and man-made. Raster data are
therefore also referred to as image, cell, or grid data. All satellite images are recorded in
raster format, which is one of its most important characteristics. The raster representation can be
described as follows: the study area is divided into regular cells of uniform dimensions, and each
cell's measurement or attribute is expressed by a digital code, i.e. a digital number (DN). The
locations of raster cells are inferred from their positions in the grid rather than being explicitly
recorded. Raster data is typically represented as a matrix (2D array) in which each cell is indexed
by its row and column numbers.
In the raster format, each cell has a value that is either an integer or a floating-point number.
Integers are commonly used to represent discrete (categorical) data, such as forest area,
agricultural land or built-up area, while floating-point numbers are typically used to represent
continuous data, such as temperature, average annual precipitation, and elevation.
Fig. 8.1 A continuous raster with darker shades for higher altitude
Raster data are usually organised into layers, which are also called bands, themes, or grids.
Each layer has a feature-based theme, such as irrigation, soil type, topography, land use, or
vegetation cover. Raster data models are better suited to continuous phenomena, but they
can also be used to describe discrete features.
Source: gisoutlook.com
Fig 8.2 Illustration of point, line, and area features: raster format on the right side,
vector format on the left side.
Raster data represents a point with a single cell, a line with a series of adjacent cells, and an area
with a set of contiguous cells (Fig. 8.2). While it lacks the precision of the vector data model in
describing the position of spatial features, the raster data model has the distinct advantage of
having fixed cell locations. In computational algorithms a raster can be treated as a matrix with
rows and columns, and its cell values can be stored in a two-dimensional array. Arrayed
variables are conveniently handled by all widely used programming languages. As a result, raster
data is significantly easier to process, aggregate, and analyse than vector data.
Cell Value:
Each raster cell contains a value that corresponds to the spatial characteristic at the location
given by its row and column. Based on the coding of its cell values, a raster can be
either an integer raster or a floating-point raster. Integer values do not have decimal digits,
whereas floating-point values do. Integer cell values usually represent categorical data, which
may or may not be ordered. A land cover raster, for example, can use 1 for urban built-up areas, 2
for forestland areas, 3 for water bodies, and so on. Floating-point cell values represent
continuous numerical data. For instance, a raster of precipitation could have values such as 20.15
and 12.23 (millimetres).
A floating-point raster needs more memory, or working space, than an integer
raster, and this difference can be significant in GIS projects that cover a large area. Another
difference is that the cell values of an integer raster can be viewed in an attribute table.
Because of its potentially vast number of distinct cell values, a floating-point raster typically
does not have an attribute table.
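The difference in storage between the two raster types can be sketched in a few lines of Python. This is only an illustration: the raster dimensions, the choice of a short integer and a double-precision float, and the small land cover grid are assumptions made for the example, not values taken from any particular GIS package.

```python
from array import array
from collections import Counter

def raster_bytes(typecode, rows, cols):
    """Bytes needed to hold rows x cols cell values of one array type."""
    return array(typecode).itemsize * rows * cols

rows, cols = 1000, 1000                      # assumed raster dimensions
int_bytes = raster_bytes("h", rows, cols)    # "h": signed short integer
float_bytes = raster_bytes("d", rows, cols)  # "d": double-precision float

# Hypothetical land cover raster: 1 = built-up, 2 = forest, 3 = water.
land_cover = [[1, 1, 2],
              [2, 2, 3],
              [3, 3, 3]]

# A value/count attribute table is practical because only a few
# distinct integer codes exist in the raster.
attribute_table = Counter(value for row in land_cover for value in row)

print(int_bytes, float_bytes, dict(attribute_table))
```

The same counting approach applied to a floating-point raster would usually produce almost as many table rows as there are cells, which is why such rasters normally have no attribute table.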
Cell Size:
The raster data model's resolution is determined by the cell size. A cell size of 30 metres means
that each cell covers 900 square metres (30 x 30 metres). A cell size of 10 metres means
that each cell covers 100 square metres (10 x 10 metres). A raster with a 10-metre cell size
therefore has a higher resolution than one with a 30-metre cell size. A large cell cannot represent
the exact position of spatial features, and it is more likely to contain a mixture of features such
as woodland, grassland, and water. The most common approach is to assign to the cell the category
that occupies the largest percentage of the cell area. These issues are reduced when a raster uses
a smaller cell size.
A small cell size, on the other hand, increases data volume and processing time.
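The trade-off between cell size and data volume, and the majority rule for mixed cells, can be illustrated with a short Python sketch. The 3 km by 3 km study area and the cover fractions are hypothetical values chosen only for the example.

```python
import math

def cell_area(cell_size_m):
    """Ground area covered by one square cell, in square metres."""
    return cell_size_m ** 2

def cell_count(width_m, height_m, cell_size_m):
    """Number of cells needed to cover a rectangular study area."""
    return math.ceil(width_m / cell_size_m) * math.ceil(height_m / cell_size_m)

def dominant_category(fractions):
    """Majority rule: pick the category with the largest share of the cell."""
    return max(fractions, key=fractions.get)

print(cell_area(30), cell_area(10))
print(cell_count(3000, 3000, 30), cell_count(3000, 3000, 10))

# Hypothetical mixed cell: half woodland, the rest grassland and water.
mixed_cell = {"woodland": 0.5, "grassland": 0.3, "water": 0.2}
print(dominant_category(mixed_cell))
```

Reducing the cell size from 30 metres to 10 metres multiplies the number of cells for the same area by nine, which is the source of the extra data volume and processing time.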
Raster Bands:
A raster can have one band or several bands (multi-band). In a multi-band raster, each cell has
several values associated with it. A satellite image with five, seven, or more bands at each cell
position is an example of a multi-band raster. A single-band raster, on the other hand, has only
one value per cell. An elevation raster, where each cell
position has one elevation value, is an example of a single-band raster.
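As a minimal sketch, a multi-band raster can be thought of as a stack of single-band grids that share the same rows and columns. The three tiny bands below and the function name `cell_values` are hypothetical and serve only to show how one cell position yields several values.

```python
def cell_values(bands, row, col):
    """Return the values of every band at one cell position."""
    return [band[row][col] for band in bands]

# Three hypothetical 2 x 2 bands of brightness values.
band_1 = [[10, 12], [14, 16]]
band_2 = [[20, 22], [24, 26]]
band_3 = [[30, 32], [34, 36]]
multi_band = [band_1, band_2, band_3]

# A single-band raster (for example elevation) is a stack of one grid.
elevation = [[[250.5, 251.0], [249.8, 252.3]]]

print(cell_values(multi_band, 1, 0))
print(cell_values(elevation, 1, 0))
```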
Spatial Reference:
In order to align spatially with other data sets in a GIS, raster data must provide spatial
reference information. For instance, to superimpose an elevation raster on a vector-based forest
cover layer, we must first ensure that the two data sets are in the same
coordinate system. A raster that has been processed to match a projected coordinate
system is usually called a georeferenced raster.
Two points need attention when a raster is matched to a projected coordinate system. First, the
origin of the raster is at the top-left corner, so row numbers increase downward while projected
y-coordinates increase upward. Second, the projected coordinates must be matched to the
raster rows and columns.
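The relationship between rows, columns and projected coordinates can be sketched as follows, assuming a north-up raster with square cells whose top-left corner has the projected coordinates (x_min, y_max). The function names and the coordinate values are hypothetical and used only for illustration.

```python
def cell_centre(row, col, x_min, y_max, cell_size):
    """Projected coordinates of the centre of cell (row, col).

    Rows are counted downward from the top-left corner, so y decreases
    as the row number increases.
    """
    x = x_min + (col + 0.5) * cell_size
    y = y_max - (row + 0.5) * cell_size
    return x, y

def cell_index(x, y, x_min, y_max, cell_size):
    """Row and column of the cell that contains the point (x, y)."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col

# Assumed top-left corner and cell size (metres) of a georeferenced raster.
X_MIN, Y_MAX, SIZE = 500000.0, 4200000.0, 30.0

print(cell_centre(0, 0, X_MIN, Y_MAX, SIZE))
print(cell_centre(2, 3, X_MIN, Y_MAX, SIZE))
print(cell_index(500105.0, 4199925.0, X_MIN, Y_MAX, SIZE))
```

Because the cell locations follow from the corner coordinates and the cell size, the raster itself does not need to store a coordinate pair for every cell.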
A wide range of the data used in GIS is raster encoded. All of these data types are built from
the same fundamental elements of the raster data model. The main types of raster data are
discussed below one by one.
Satellite Imagery:
Remotely sensed satellite data are familiar to most GIS users. The spatial resolution of a
satellite image corresponds to the ground pixel size. A spatial resolution of 30 metres, for example,
means that each pixel covers 900 square metres on the ground. The pixel value, also known as the
brightness value, represents the amount of light energy reflected or emitted from the earth's
surface. Light energy is measured in spectral bands of the electromagnetic spectrum, which is a
continuous range of wavelengths. Multispectral images are made up of several spectral bands, while
panchromatic images consist of a single spectral band.
time lapse. At the same time, GPS and the Inertial Measurement Unit (IMU) determine the position
and orientation of the laser source.
Source: https://sites.google.com/site/bethorninggis6920/labs/working-with-dems
Fig 8.3 DEMs at four resolutions: 30 m, 10 m, 5 m and 2 m. The 2-metre DEM contains more
topographical information than the other three.
Global DEMs:
DEMs with different resolutions are now available on a global scale. SRTM DEMs with a
coarser spatial resolution of approximately 3 arc-seconds (about 90 metres at the equator)
are available for areas outside the USA. These global-scale DEMs are referred to as SRTM
DTED Level 1 (digital terrain elevation data), as opposed to DTED Level 2 for the US and its
territories. Because the elevation values of SRTM DTED Level 1 are derived from those of
SRTM DTED Level 2 at coincident points, the two have the same
vertical accuracy, within 16 metres.
With a grid spacing of 5 minutes of latitude by 5 minutes of longitude, ETOPO5 (Earth
Topography-5 Minute) data cover both the land surface and the ocean floor of the Earth.
Global DEMs with a horizontal grid spacing of 30 arc-seconds (approximately 1 kilometre) are
available from both GTOPO30 and GLOBE. GTOPO30 and GLOBE were compiled from raster
data derived from satellite imagery and vector data from the contour lines of the Digital Chart of
the World. GLOBE's vertical accuracy is estimated to be within 30 metres when raster sources are
used and 160 metres when vector sources are used.
Cell-by-Cell Encoding:
The simplest raster data structure is provided by cell-by-cell encoding. A raster is
saved as a matrix, and its cell values are written into a file by row and column (Fig 8.4). This
technique operates at the cell level and is ideal when the cell values of a raster change continuously.
DEMs use the cell-by-cell data structure because neighbouring elevation values are
rarely identical. Cell-by-cell encoding is also often used to store satellite images.
Source: https://saylordotorg.github.io/text_essentials-of-geographic-information-systems/s08-01-raster-data-models.html
Fig 8.4 Each cell value is recorded by row and column in the cell-by-cell data structure. The cell
value of the yellow cells is 1.
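A minimal Python sketch of cell-by-cell encoding is given below. It writes every cell value, row by row, into a text string and reads it back. The text layout (values separated by spaces, one raster row per line) and the small grid are assumptions made for illustration and do not correspond to a specific file format.

```python
def encode_cell_by_cell(raster):
    """Write every cell value, one raster row per line of text."""
    return "\n".join(" ".join(str(v) for v in row) for row in raster)

def decode_cell_by_cell(text):
    """Rebuild the matrix from the row-by-row text."""
    return [[int(v) for v in line.split()] for line in text.splitlines()]

# Hypothetical raster: cells with value 1 form a small feature.
raster = [[0, 0, 1, 0],
          [0, 1, 1, 0],
          [1, 1, 1, 1]]

encoded = encode_cell_by_cell(raster)
print(encoded)
print(decode_cell_by_cell(encoded) == raster)
```

Every cell is stored explicitly, so the amount of stored data always equals the number of rows multiplied by the number of columns, regardless of how repetitive the values are.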
Run-Length Encoding:
When a raster contains many redundant cell values, cell-by-cell
encoding is inefficient. For instance, a bi-level scanned file of a soil map contains many 0s
representing non-inked areas, with 1s representing the inked soil lines. Run-length encoding
(RLE), which records cell values by row and by group, can store rasters with many repeated
cell values more efficiently. A group is a run of adjacent cells that have the same cell value. The
run-length encoding of the polygon shown in yellow is illustrated in Fig. 8.5. For each row, the
beginning cell and the end cell mark the length of the group (“run”)
that falls inside the polygon.
Source: https://saylordotorg.github.io/text_essentials-of-geographic-information-systems/s08-01-raster-data-models.html
Fig 8.5 The run-length encoding method records the yellow cells by row. Row 2 has two
adjacent yellow cells in columns 5 and 6. Row 2 is therefore encoded with one run, beginning in
column 5 and ending in column 6. The same method is used to record the other rows.
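The row-by-row recording of runs can be sketched in Python as follows. The raster below is a hypothetical grid, not the one in the figure, although its second row has been given cells of value 1 in columns 5 and 6 to match the caption. Rows and columns are numbered from 1, and each run is stored as (row, first column, last column).

```python
def rle_encode(raster, target=1):
    """Record each run of target cells as (row, start_col, end_col)."""
    runs = []
    for r, row in enumerate(raster, start=1):
        start = None
        for c, value in enumerate(row, start=1):
            if value == target and start is None:
                start = c                       # a new run begins
            elif value != target and start is not None:
                runs.append((r, start, c - 1))  # the run ended one cell ago
                start = None
        if start is not None:                   # run reaches the row's end
            runs.append((r, start, len(row)))
    return runs

def rle_decode(runs, n_rows, n_cols, target=1, background=0):
    """Rebuild the raster from the list of runs."""
    raster = [[background] * n_cols for _ in range(n_rows)]
    for r, start, end in runs:
        for c in range(start, end + 1):
            raster[r - 1][c - 1] = target
    return raster

grid = [[0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1, 0, 0],
        [0, 0, 0, 1, 1, 1, 1, 0],
        [0, 0, 1, 1, 1, 1, 1, 1]]

runs = rle_encode(grid)
print(runs)
print(rle_decode(runs, 4, 8) == grid)
```

Here 32 cell values are reduced to three runs, which shows why the method pays off when a raster has long sequences of identical values.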
Quad Tree:
Quad tree divides a raster into a hierarchy of quadrants using recursive decomposition rather
than working along one row at a time. Recursive decomposition is a subdivision process that
continues until every quadrant in the quad tree contains just one cell value. Figure 8.6 shows a
raster with a yellow polygon and the quad tree that stores the feature. Nodes and branches make up
the quad tree. A node represents a quadrant. A node may be a non-leaf node or a leaf node
depending on the cell values in the quadrant. A quadrant with different cell values is represented
by a non-leaf node; a non-leaf node is therefore a branch point, where the quadrant is
subdivided further. A quadrant with a single cell value is represented by a leaf node, an end point
that can be coded with the value of the quadrant. The
depth of the quad tree, i.e. the number of levels in the hierarchy, varies with the
complexity of the 2-D feature.
After the subdivision is complete, the 2-D feature is coded using the quad tree and a spatial
indexing method. In Figure 8.6, the level-1 NW quadrant (with a spatial index of
0) contains two yellow leaf nodes. The first, 02, refers to the level-2 SE quadrant, and the second,
032, refers to the level-3 SE quadrant of the level-2 NE quadrant. The string (02, 032), together
with the strings for the other three level-1 quadrants, completes the coding of the two-dimensional
feature. The regional quad tree is an efficient way to
store area data, particularly if there are few categories. It is also efficient for data
processing. Quad trees have other uses in GIS as well. For storing, indexing, and displaying global
data, researchers have proposed a hierarchical quad tree structure. Quad trees may also be used as a
spatial indexing technique, which makes it quick to locate raster and vector
spatial data. Oracle Spatial, for example, uses quad trees as a way of indexing spatial data.
Source: https://saylordotorg.github.io/text_essentials-of-geographic-information-systems/s08-01-raster-data-models.html
Fig 8.6 The regional quad tree method divides a raster into a hierarchy of quadrants. The
subdivision stops when a quadrant consists of cells of equal value (yellow or white). A leaf node is a
quadrant that cannot be subdivided further.
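Recursive decomposition can be sketched in Python as shown below. The sketch assumes a square raster whose side is a power of two and numbers the quadrants 0 = NW, 1 = SW, 2 = SE, 3 = NE; this numbering is an assumption chosen to agree with the spatial index of 0 given to the NW quadrant in the text. The 4 x 4 grid is hypothetical and is not the raster of Fig 8.6.

```python
def quadtree_codes(raster, target=1):
    """Return the spatial-index codes of leaf quadrants filled with target."""
    codes = []

    def visit(top, left, size, code):
        values = {raster[r][c]
                  for r in range(top, top + size)
                  for c in range(left, left + size)}
        if len(values) == 1:            # leaf node: one value in the quadrant
            if values == {target}:
                codes.append(code)
            return
        half = size // 2                # non-leaf node: subdivide further
        visit(top, left, half, code + "0")                # NW
        visit(top + half, left, half, code + "1")         # SW
        visit(top + half, left + half, half, code + "2")  # SE
        visit(top, left + half, half, code + "3")         # NE

    visit(0, 0, len(raster), "")
    return codes

grid4 = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

print(quadtree_codes(grid4))
```

The length of each code equals the level of the leaf quadrant, so a large uniform block is stored with a short code while a single isolated cell needs the full depth of the tree.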
Data Compression:
Raster data sets typically contain large amounts of data and require considerable memory space.
Approximate file sizes are 1.1 megabytes (MB) for a 30-metre DEM, 9.9 MB for a 10-metre
DEM, 5 to 15 MB for a 7.5-minute digital raster graphic (DRG), and 45 MB for a 3.75-minute
quarter DOQ in black and white. An uncompressed 7-band TM
scene requires nearly 200 MB. The memory requirement is even higher for high-resolution
satellite images.
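The size of an uncompressed raster follows directly from the number of rows, columns and bands and the number of bytes stored per cell. The sketch below uses hypothetical dimensions chosen only for illustration; they are not the actual dimensions of any of the products listed above.

```python
def uncompressed_size_mb(rows, cols, bands=1, bytes_per_cell=1):
    """Uncompressed raster size in megabytes (1 MB = 1,000,000 bytes)."""
    return rows * cols * bands * bytes_per_cell / 1_000_000

# Assumed single-band elevation grid with 2 bytes per cell.
dem_mb = uncompressed_size_mb(1000, 1000, bands=1, bytes_per_cell=2)

# Assumed 7-band image with 1 byte per cell in each band.
image_mb = uncompressed_size_mb(6000, 5000, bands=7, bytes_per_cell=1)

print(dem_mb, image_mb)
```

Halving the cell size doubles both the number of rows and the number of columns, so the uncompressed size grows by a factor of four.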
The reduction of data volume is referred to as data compression, and it is a subject that is
especially important for data distribution and internet mapping. We're all familiar with data
compression applications like WinZip for Windows and gzip for UNIX. These programmes can
operate on any kind of data file while preserving the original file and folder structure. This part,
on the other hand, is about image compression.
This format is closely associated with scanners: it stores and reads scanned images. TIFF may use
run-length and other image compression schemes. Unlike GIF, its colours are not limited to
256. It is commonly used in desktop publishing and serves as an interface to many
scanners and graphic arts packages. TIFF supports black and white images as well as pseudo
colour, and images can be stored in both compressed and uncompressed form.
GeoTIFF is a public-domain metadata standard that enables
georeferencing information to be embedded in a TIFF file. The additional information may include
map projections, coordinate systems, ellipsoids, datums, and anything else required to establish
the exact spatial reference of the file. A program that cannot read and decode this specialised
metadata will still be able to open a GeoTIFF file, because the format is fully compatible with TIFF 6.0.
An animated GIF is a GIF file that includes multiple images or "frames." These images are
played in sequence when the file is opened or displayed in a web browser. The effect is a short
film or animation clip. The GIF format includes a Graphic Control Extension (or "GCE block"),
which allows several frames in a single GIF file. This block also defines the delay between
frames, which can be used to adjust the frame rate or to insert
pauses at certain points in the animation. Another block, the Netscape Application Block (NAB),
specifies how many times the animation repeats
(a setting of "0" is used for infinite repetition).
13) BMP
Short for "Bitmap." It may be pronounced "bump," "B-M-P," or simply "bitmap." The BMP format is
widely used to store raster graphics. It was introduced on the Windows platform but is now
recognised by many programs on both Macs and PCs. The BMP
format stores the colour data of each pixel in the image without any compression. For example, a BMP
image of 10x10 pixels contains colour data for 100 pixels. This approach allows crisp,
high-quality graphics to be saved, but it also produces large files. The JPEG and GIF formats are
also bitmaps, but they use image compression algorithms that can reduce file size considerably. For
this reason, JPEG and GIF images are typically used on the Web, while BMP images are often used for
printable images.
8.4 SUMMARY
In this unit we have discussed raster data and its types, which include satellite imagery,
USGS digital elevation models, non-USGS digital elevation models and global DEMs. We
have also discussed raster data structures, which include cell-by-cell encoding, run-length
encoding and the quad tree. We have also learned how raster data are organised and the elements
of the raster data model, such as cell value, cell size and raster bands. The unit also explains
data compression and raster data formats, including Portable Network Graphics
(PNG), Joint Photographic Experts Group (JPEG2000), JPEG File Interchange Format (JFIF),
Graphics Interchange Format (GIF), Geo Tagged Image File Format (GeoTIFF), etc.
8.5 GLOSSARY
Floating-point raster: A raster that stores continuous data as floating-point cell values.
Georeferenced raster: A raster that has been assigned geospatial positioning information
based on a given coordinate system.
Landsat: An orbiting satellite that provides repeat images of the Earth’s surface.
Lossy compression: A method of data compression capable of achieving high
compression ratios but which cannot completely rebuild the original image.
Quad tree: A data structure that divides a raster into a hierarchy of quadrants.
Raster data model: A data model that represents spatial features using rows, columns and
cells.
Rasterization: The process of converting vector data to raster data.
Run-length encoding (RLE): A raster data structure that records cell values by row
and by group. A run-length encoded file is also called a run-length compressed (RLC) file.
Vectorization: The process of converting raster data to vector data.
Wavelet transform: A modern image compression technique that treats an image as a
wave and progressively decomposes the wave into simpler wavelets.
8.7 REFERENCES
Bhatta, (2008) Remote Sensing and GIS, Oxford University Press, Pp: 442, 121, 129, 135,
144.
Floyd F. Sabins, (1996/1997) Remote Sensing Principles and Interpretation, W.H.
Freeman and Company, New York, 3rd Edition, Pp: 29, 69, 105, 177, 236.
Kalicharan Sahu, (2008), Text Book of Remote Sensing and GIS, Atlantic Publications,
Pp: 1-2, 127-198.
Textbook of Remote Sensing and Geographical Information System, M. Anji Reddy,
Second Edition, Pp 1-23.
http://en.wikipedia.org/wiki/Raster_data
http://geospatial.referata.com/wiki/Raster_Data_Model
http://gis.stackexchange.com/questions/57142/what-is-the-difference-between-vector-and-raster-data-models
Chang, Kang-tsung, Introduction to Geographic Information Systems, 5th edition, 2009,
McGraw-Hill.
Lillesand, Thomas M., Ralph W. Kiefer, and Jonathan W. Chipman, 2004
9.1 OBJECTIVES
9.2 INTRODUCTION
9.3 OVERLAY ANALYSIS- UNION, INTERSECTION
9.4 SUMMARY
9.5 GLOSSARY
9.6 ANSWER TO CHECK YOUR PROGRESS
9.7 REFERENCES
9.8 TERMINAL QUESTIONS
9.1 OBJECTIVES
After reading this unit the learner will be able to:
1. Understand overlay analysis.
2. Know the details of, and the differences between, raster and vector overlay.
3. Gain knowledge about the applications of overlay analysis.
9.2 INTRODUCTION
Overlay operations are part of most spatial analysis processes and generally form the core of GIS
projects. These operations combine several maps and thus give new information that was not
present in the individual maps. In overlay operations new spatial elements are created on the
basis of multiple input maps.
Figure 9.1:
First, two or more layers of information from the same area are overlaid onto each other. Then,
the topology of the new layer is updated: if a point now lies within a polygon, it is assigned
this information as a new attribute. If two lines ("arcs") intersect, a new node is added at
their intersection. If two polygons intersect, a unique identification number is given to the
intersecting set, and so forth. Ultimately, the overlay results in an information gain. For
this integration to make sense, all input layers must have the same reference system and scale.
The map will only be legible if all the layers fit together exactly with regard to position and scale.
The process itself is independent of whether a raster or a vector model is used. With a raster
model, however, the operation is a cell-by-cell superimposition of layers rather than a geometric
intersection. The integration of
information from various sources through overlay is one of the most important functions of a
GIS.
Finally, Overlay analysis gives us: “what’s within what?”
Overlay creates an output by combining geometries and attributes from different layers (either
vector or raster). Overlay output: combines two different layers to form a new layer (different
geometry and attribute table).
Figure 9.3:
capability. The raster data model affords a strong numerical modelling (quantitative analysis)
capability. Most sophisticated spatial modelling is undertaken within the raster domain.
In vector-based systems, topological overlay is achieved by the creation of a new topological
theme from two or more existing themes. This requires the rebuilding of topological tables, e.g.,
arc, node, polygon, and therefore can be time consuming and CPU-intensive. The result of a
topological overlay in the vector domain is a new topological theme that contains attributes of
the original input data layers. In this way, selected queries of the original layer can then be
undertaken, e.g., soils and forest cover, to determine where specific situations occur, e.g.,
deciduous forest cover where drainage is poor.
To date, the primary analysis technique used in GIS applications, both vector and raster, is the
overlay of selected data layers.
1. Union
It preserves all features from the input and overlay layers
The area extent of the output combines the area extents of both layers
Input layers have to be polygons
3. Symmetrical difference
It preserves features common to either the input layer or overlay layer but not
both
The geometry of the overlay layer has to be the same as the input
4. Identity
It preserves only features that fall within the area extent of the input layer
The overlay layer has to be a polygon or the same geometry as the input
VECTOR OVERLAY
Overlay of vector data is somewhat complicated because it must update the topological tables of
spatial relationships between points, lines, and polygons. During the overlay process, the
attribute data associated with each feature type are merged, so the resulting table contains the
attribute data of both inputs. The overlay process used depends upon the modelling approach the user needs.
Generally, GIS software implements the overlay of different vector data layers by combining the
spatial and attribute data files of the layers to create a new data layer. Different GIS
software packages use varying approaches for the display and reporting of overlay results. Some
systems require that topological overlay occur on only two data layers at a time, creating a third
layer. One may therefore need to carry out a series of overlay operations to arrive at a result
that satisfies several criteria.
A union overlay combines the geographic features and attribute tables of both inputs into a single
new output. An intersect overlay defines the area where both inputs overlap and retains a set of
attribute fields for each. A symmetric difference overlay defines an output area that includes the
total area of both inputs except for the overlapping area. Using these operations, new spatial
elements are created by the overlaying of maps.
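The area extents produced by these three operations can be illustrated with a deliberately simplified Python sketch. It is not how a vector GIS computes overlays, since no geometry or topology is rebuilt: each polygon is approximated by the set of unit cells it covers, and set operations stand in for union, intersect and symmetric difference. The two rectangles are hypothetical.

```python
def rect_cells(x0, y0, x1, y1):
    """Unit cells covered by an axis-aligned rectangle (x0, y0) to (x1, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

layer_a = rect_cells(0, 0, 4, 4)   # input layer: 16 cells
layer_b = rect_cells(2, 2, 6, 6)   # overlay layer: 16 cells

union_area = layer_a | layer_b      # everything covered by either input
intersect_area = layer_a & layer_b  # only the overlapping area
sym_diff_area = layer_a ^ layer_b   # either input, but not the overlap

print(len(union_area), len(intersect_area), len(sym_diff_area))
```

The symmetric difference always equals the union minus the intersection, which matches the description of an output that includes the total area of both inputs except the overlapping part.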
There are three types of vector overlay: point-in-polygon, line-on-polygon, and
polygon-on-polygon.
1. Point in Polygon Overlay: Points are overlaid on a polygon map, as shown in Figure 9.10.
The topological relationship of a point to a polygon is ‘is contained in’. In the new
data layer, each point carries a new attribute identifying the polygon that contains it.
3. Polygon on Polygon Overlay: Two layers of area objects are overlaid, resulting in
new polygons and intersections, as shown in Figure 9.12. The number of new polygons is
usually larger than that of the original polygons. Polygon topology in the new data layer
is a list of original polygon IDs.
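The point-in-polygon case can be sketched with the ray-casting test, one common way of deciding whether a point is contained in a polygon; it is offered here as an illustration and not as the method used by any particular GIS package. The polygon and point identifiers are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray running in the +x direction."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):        # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def overlay_points(points, polygons):
    """Give each point the ID of the polygon containing it (or None)."""
    result = {}
    for point_id, (x, y) in points.items():
        result[point_id] = next(
            (poly_id for poly_id, ring in polygons.items()
             if point_in_polygon(x, y, ring)),
            None)
    return result

polygons = {"A": [(0, 0), (4, 0), (4, 4), (0, 4)],
            "B": [(4, 0), (8, 0), (8, 4), (4, 4)]}
points = {"p1": (1, 1), "p2": (6, 2), "p3": (9, 9)}

print(overlay_points(points, polygons))
```

The returned dictionary plays the role of the new attribute column: every point record now stores the identifier of the polygon in which it is contained.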
RASTER OVERLAY:
Overlay of raster data, even with more than two layers, is easier than overlay of
vector data, because it does not involve any topological operation, only pixel-by-pixel
operations. In raster data analysis, the overlay of datasets is accomplished through a process
known as ‘local operation on multiple rasters’ or ‘map algebra’, through a function that
combines the values of each raster’s matrix for mathematical calculations. This function may
weight some inputs more than others through use of an ‘index model’ that reflects the influence
of various factors upon a geographic phenomenon (Figure 9.13).
In raster overlay, the pixel or grid cell values in each map are combined using arithmetic and
Boolean operators to produce a new value in the composite raster map. The maps can be treated
as arithmetical variables on which complex algebraic functions can be performed. The method is
often described as map algebra. A raster GIS provides the ability to perform operations on map
layers mathematically. This is particularly important for modelling, in which various maps are
combined using various mathematical functions. Conditional operators are also supported
alongside these basic mathematical functions.
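A minimal sketch of map algebra on two small rasters is given below. The layer names and cell values are hypothetical, and plain Python lists stand in for the raster grids that a GIS would manage.

```python
# Two hypothetical 3x3 rasters on a common grid (values are illustrative only)
rainfall = [[10, 20, 30],
            [40, 50, 60],
            [70, 80, 90]]
runoff = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

def local_op(a, b, func):
    """Apply func cell by cell to two rasters of identical shape."""
    return [[func(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

total = local_op(rainfall, runoff, lambda x, y: x + y)       # addition
difference = local_op(rainfall, runoff, lambda x, y: x - y)  # subtraction
# Conditional operator: 1 where rainfall exceeds 40 and runoff is below 8, else 0
flagged = local_op(rainfall, runoff, lambda x, y: 1 if x > 40 and y < 8 else 0)
```

Each output cell depends only on the cells at the same position in the input layers, which is why no topological processing is needed.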
Figure 9.15: Raster Overlay Process (layers can be added, subtracted, divided or multiplied)
Most sophisticated spatial modeling is undertaken within the raster domain. Each point can be
addressed as part of a neighborhood of surrounding values. If all the neighboring points
having the same attribute value are grouped together, the group is termed a region. Raster map
overlay introduces the idea of map algebra. In raster data processing, some analyses use
individual cells only and some rely on neighboring or regional associations. Thus the raster data
processing methods can be classified into the following categories:
1. Local operations
2. Neighborhood operations
3. Regional operations
Local operations are based on point-by-point or cell-by-cell analysis. The most important of this
group is overlay analysis. In raster-based analysis either logical or arithmetic operators are
used. The logical overlay methods use the operators AND, OR, and XOR (exclusive OR). For binary
(0/1) layers, AND is equivalent to multiplying corresponding cells, whereas OR and XOR can be
derived from the sum of corresponding cells (OR where the sum is at least 1, XOR where it is
exactly 1). The most important consideration in raster overlay is the appropriate coding of the
features in the input layers. Raster overlay is affected by the resolution (cell size) and the
scale of measurement (nominal, ordinal, interval or ratio). It is advised that the resolution and
the scale of measurement of the input and analysis layers be compatible. The basic arithmetic
operators in raster overlay operations are ADDITION, SUBTRACTION, DIVISION, and
MULTIPLICATION. All these operators are explained here with self-explanatory diagrams.
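The logical operators can be illustrated on two binary rasters, coded 1 where a condition is met and 0 elsewhere. This is a sketch under that coding assumption; the layer names and values are hypothetical.

```python
# Hypothetical binary rasters: 1 = condition met, 0 = not met
forest = [[1, 1, 0],
          [0, 1, 0]]
steep = [[1, 0, 0],
         [1, 1, 0]]

def combine(a, b, func):
    """Apply func to corresponding cells of two rasters of identical shape."""
    return [[func(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

and_layer = combine(forest, steep, lambda x, y: x * y)            # AND as a product
or_layer = combine(forest, steep, lambda x, y: int(x + y >= 1))   # OR: sum at least 1
xor_layer = combine(forest, steep, lambda x, y: int(x + y == 1))  # XOR: sum exactly 1
```

The result depends entirely on how the input layers are coded, which is why appropriate coding of the features matters so much in raster overlay.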
RECLASSIFICATION:
Reclassification is a method of changing the attribute values without altering the geometry of
the map. In effect it is a database simplification process that aims at reducing the number of
categories in an attribute data layer. Accordingly, features adjacent to one another that have a
common value will be treated as, and appear as, one class. Reclassification is an attribute
generalization technique. Typically this function makes use of polygon patterning techniques
such as crosshatching and/or colour shading for graphic representation. It usually uses either
logical or arithmetic operators for raster data, or arithmetic operators for vector data. After
reclassification, the common boundaries between polygons with identical attribute values are
dissolved. Consequently the topology will be rebuilt.
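For raster data, reclassification can be sketched as a lookup table applied to every cell. The class codes and groupings below are hypothetical and serve only to show the number of categories being reduced.

```python
# Hypothetical land-cover raster with five detailed classes
land_cover = [[1, 2, 3],
              [4, 5, 1],
              [2, 2, 4]]

# Lookup table merging detailed classes into two broader ones
# (1, 2 -> 10 "vegetated"; 3, 4, 5 -> 20 "built or bare"); codes are illustrative
lookup = {1: 10, 2: 10, 3: 20, 4: 20, 5: 20}

# The grid geometry is unchanged; only the attribute values are replaced
reclassified = [[lookup[value] for value in row] for row in land_cover]

classes_before = {v for row in land_cover for v in row}
classes_after = {v for row in reclassified for v in row}
```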
WEIGHTED OVERLAY:
The objective behind area-on-area overlay on a vector data model was “to identify one or more
parts of the new geometry that met simple criteria. Areas that did not meet the criteria were
discarded”. This was processed as a single task. The function of weighted overlay is to determine
a new set of values for the complete coverage based on a combination of input values. There are
two tasks to perform when working with a vector data model:
1. Create a new set of geometries for the entire area, and
2. Compute a new set of attributes for those geometries.
The second task is a matter of describing a mathematical equation to process the input values.
The first task requires you to extend the basic polygon overlay operation to consider every
intersection between all polygons in every data layer. As you can imagine, this can be
computationally demanding, especially if the GIS you are using computes topology 'on the fly'
and does not store it in the data structure. As we shall see, this is one of the reasons why
weighted overlay is more frequently applied to a raster data model. However, there are
requirements for overlaying point, linear, and polygon data in selected combinations; point in
polygon, line in polygon, and polygon on polygon are the most common.
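On a raster data model, the weighted combination of input values reduces to a weighted sum per cell. The sketch below assumes that every layer has already been scored on a common suitability scale; the layer names, scores and weights are hypothetical.

```python
# Hypothetical suitability rasters, each already scored on a common 1-9 scale
slope_score = [[9, 7], [3, 1]]
landuse_score = [[5, 9], [9, 5]]
distance_score = [[1, 5], [9, 9]]

# Hypothetical weights expressing relative influence; they sum to 1.0
layers = [(slope_score, 0.5), (landuse_score, 0.3), (distance_score, 0.2)]

rows, cols = len(slope_score), len(slope_score[0])
weighted = [[0.0] * cols for _ in range(rows)]
for raster, weight in layers:
    for r in range(rows):
        for c in range(cols):
            # Multiply each cell's score by its layer weight and accumulate
            weighted[r][c] += weight * raster[r][c]

# Cell with the highest weighted suitability, as (row, column)
best_cell = max(((r, c) for r in range(rows) for c in range(cols)),
                key=lambda rc: weighted[rc[0]][rc[1]])
```

No new geometry has to be created here, because the cells of all layers already coincide.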
The arcs of the input layers are split at their intersection with arcs of the union layer. Thus
the number of polygons in the output layer will be larger than in the input layer. UNION is the
Boolean operation that uses OR; the output map therefore corresponds to the areal extent of the
input layer, the analysis layer, or both. UNION requires both the input and the analysis layer
to be polygon layers. This operator is generally used for querying and analysis of urban sprawl.
The INTERSECT operator performs the intersection of two input layers. The resultant layer keeps
those portions of the first input layer features which fall within the second input layer
polygons, that is, features that lie in the common area of both input layers. It uses the
Boolean operator AND. A point of caution is that the input layer may be a point, line or polygon
layer, but the analysis layer should always be a polygon layer.
layer are retained. The retained layer will have all attributes from the input layer only.
The input layer may be point, line and polygon but the analysis layer is always a polygon.
3. SPLIT divides the input coverage into two or more coverages. For this, a series of clip
operations is performed. Each resultant layer contains only those portions of the input
layer that are overlapped by the polygon satisfying the specified criteria. For example,
SPLIT can be used to divide a national forest cover layer by district so that each
district has its own vegetation coverage.
4. UPDATE uses a cut and paste operation to replace the input coverage and its map
features.
5. IDENTITY overlays polygons and keeps all input layer features and only
those features from the analysis layer that overlap the input layer. The resultant layer will
have the same spatial extent as the input layer. In the case of polygon overlays, the
number of polygons in the output layer will always be larger than in the input
layer.
It is expressed as the Boolean operation ((input map) AND (overlay map)) OR (input map). The
input map may contain points, lines or polygons. A word of caution: this operation can
only be ideally applied if the map boundary is precisely maintained. Besides these operators, a
number of others such as MERGE, APPEND, ELIMINATE, RESELECT and DISSOLVE are also used
in vector overlay operations.
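The Boolean logic behind these operators can be illustrated by treating each layer as the set of grid cells it covers. This is a deliberate simplification, since real vector overlay also splits arcs and rebuilds attributes; the cell sets below are hypothetical.

```python
# Hypothetical layers, each described by the set of grid cells it covers
input_layer = {(0, 0), (0, 1), (1, 0), (1, 1)}
analysis_layer = {(1, 1), (1, 2), (2, 1), (2, 2)}

union = input_layer | analysis_layer      # OR: extent of either layer
intersect = input_layer & analysis_layer  # AND: extent common to both layers
# IDENTITY: (input AND analysis) OR input, which keeps the whole input extent
identity = (input_layer & analysis_layer) | input_layer
sym_diff = input_layer ^ analysis_layer   # both layers except the overlapping area
```

In this simplified view the IDENTITY extent always equals the input extent; what changes in practice is that the overlapping part also carries attributes from the analysis layer.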
APPLICATION:
1. Delimitation of protected zones around features, such as defining buffer zones along
rivers and streams to restrict urban development.
2. Creating restriction criteria for the location of an industrial site based on buffers along
conservation areas, rivers and streams, and residential areas.
3. Definition of areas of influence, such as generating a buffer zone centred on a school to
estimate the number of potential students.
4. GIS spatial analysis using buffers to identify riparian land use.
5. GIS spatial analysis using a buffer to define a search radius centred on one specific feature.
A suitability model can be used to find the best location to construct a new school, hospital,
police station, industrial corridor, etc. Certain land uses are more conducive than others for
building a new school; for example, in one such model forest and agriculture were more
favorable than residential housing. It was desired to locate the school on flat slopes, near
recreation sites, and far from existing schools.
9.4 SUMMARY
We have to remember that analysis through GIS begins with overlay analysis, whether in the
examples given above or in other decision-making analyses undertaken using diverse spatial data
sets. Today analysts work with real-time data sets and voluminous big data, and process layered
data rapidly to produce ready output. GIS is considered a decision-making tool for solving
problems related to almost all environmental concerns. Spatial analysis is a vital part of GIS
and can be used for many applications such as site suitability, natural resource monitoring,
environmental disaster management and many more. Vector- and raster-based analysis functions,
together with arithmetic, logical and conditional operations, are used according to the
derivations required. As the technology expands and output tools become readily available to
individuals and experts, the challenges have increased and expanded in dimension.
9.5 GLOSSARY
1. Overlay: Overlay is a GIS operation that superimposes multiple data sets (representing
different themes) together for the purpose of identifying relationships between them.
An overlay creates a composite map by combining the geometry and attributes of the
input data sets.
2. Raster Overlay: Raster overlay involves two or more different sets of data that derive
from a common grid. The separate sets of data are usually given numerical values. These
values then are mathematically merged together to create a new set of values for a single
output layer.
3. Vector Overlay: A vector overlay involves combining point, line, or polygon geometry
and their associated attributes. All overlay operations create new geometry and a new
output geospatial data set. The clip function defines the area for which features will be
output based on a “clipping” polygon.
4. Weighted Overlay: Weighted overlay is one method of modeling suitability. ArcGIS
uses the following process for this analysis. Multiplying each layer's weight by each cell's
suitability value produces a weighted suitability value. Weighted suitability values are
totaled for each overlaying cell and then written to an output layer.
5. Reclassification: Reclassification operations merely repackage existing information on
a single map. Overlay operations, on the other hand, involve two or more maps and
result in the delineation of new boundaries.
6. Boolean: Binary (two-valued) system of variables and operations for logical operations
developed by George Boole in the mid-nineteenth century.
7. Decision Support System: An interactive, computer based system that supports decision
making.
8. Digitizing: The process of converting analog spatial information from sources like paper
maps to digital data.
9. Projection (Map): A method to transform the Earth’s curved surface onto a plane.
10. Topology: The geometric relationship between points, lines, and geometric forms that
remains consistent throughout spatial operations in a digital mapping environment.
9.7 REFERENCES
1. Carver, S. J. (1991). Integrating multi-criteria evaluation with geographical information
systems. International Journal of Geographical Information Systems 5, 321-339.
2. Chrisman, N. (2002). Exploring geographic information systems (2nd edn.). New York:
Wiley.
3. Eastman, J. R. (2005). Multi-criteria evaluation and GIS. In Longley, P. A., Goodchild,
M. F., Maguire, D. J. & Rhind, D. W. (eds.) Geographical information systems – principles
and technical issues (2nd edn.). Hoboken, NJ: Wiley.
4. Herbertson, A. J. (1905). The major natural regions: An essay in systematic geography.
The Geographical Journal 25, 300-310.
5. Hoyt, H. (1939). The structure and growth of residential neighborhoods in American
cities. Washington, DC: US Federal Housing Administration.
6. McHarg, I. (1969). Design with nature. Garden City, NY: Natural History Press.
7. Tomlin, D. (1990). Geographic information systems and cartographic modelling.
Englewood Cliffs, NJ: Prentice-Hall.
8. Tomlinson, R. (1967). An introduction to geographic information system of the Canada
land information inventory. Ottawa: Department of Forestry and Rural Development.
10.1 OBJECTIVES
10.2 INTRODUCTION
10.3 PROXIMITY ANALYSIS- BUFFERING
10.4 SUMMARY
10.5 GLOSSARY
10.6 ANSWER TO CHECK YOUR PROGRESS
10.7 REFERENCES
10.8 TERMINAL QUESTIONS
10.1 OBJECTIVES
After reading this unit, the learner will be able to understand:
Proximity analysis.
Buffers in GIS.
Application of buffering.
10.2 INTRODUCTION
In geographic information systems and spatial analysis, proximity analysis is the
determination of a zone around a geographic feature containing locations that are within a
specified distance of that feature, the buffer zone. The buffer is probably the most commonly
used tool among the proximity analysis methods. Let us discuss what buffers are in GIS.
Proximity usually creates two areas: one area that is within a specified distance of selected
real world features and another area that is beyond it. The area that is within the specified
distance is called the buffer zone.
A buffer zone is any area that serves the purpose of keeping real world features distant from
one another. Buffer zones are often set up to protect the environment, to protect residential
and commercial zones from industrial accidents or natural disasters, or to prevent violence.
Common types of buffer zones are greenbelts between residential and commercial areas,
border zones between countries, noise protection zones around airports, and pollution
protection zones along rivers.
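The division into an area within the specified distance and an area beyond it can be sketched for a point feature as a simple distance test. The example assumes projected coordinates in metres, so that straight-line distance is meaningful; the feature and house locations are hypothetical.

```python
import math

def in_buffer(feature_xy, location_xy, distance):
    """True if the location lies within `distance` map units of a point feature."""
    dx = location_xy[0] - feature_xy[0]
    dy = location_xy[1] - feature_xy[1]
    return math.hypot(dx, dy) <= distance

# Hypothetical projected coordinates in metres
airport = (0.0, 0.0)
houses = {"H1": (300.0, 400.0), "H2": (900.0, 1200.0), "H3": (0.0, 1000.0)}

buffer_distance = 1000.0  # metres
inside = sorted(h for h, xy in houses.items() if in_buffer(airport, xy, buffer_distance))
outside = sorted(h for h in houses if h not in inside)
```

A GIS buffer tool goes one step further and stores the zone itself as a polygon, but the membership test is the same idea.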
CREATING BUFFERS:
The Tools plugin provides geoprocessing tools that create buffers around features based on a
distance field. Follow the steps given below to create a buffer in QGIS.
1. Open the Road layer and the Places layer in QGIS (to simplify, clip both layers to
features within India).
2. Select Buffer from the Geoprocessing Tools submenu of the Vector menu. This opens the
buffer dialogue box, which appears as in the figure below.
cause the buffered zone around an object to be inconsistent. An example of this type
of buffer is mapping the fallout zone around a nuclear reactor, while the fallout zone
is being blown by the wind. In the event of a nuclear fallout, the wind could be
blowing from east to west. The wind blowing the radiation to the west would cause
the area to the west to have a much higher radiation hazard than the area to the east of
the reactor. Buffering this area would show a very narrow buffer zone on the east side
of the reactor and a very elongated buffer zone to the west.
VARIATIONS IN BUFFERING:
There are several variations in buffering. The buffer distance or buffer size can
vary according to numerical values provided in the vector layer attribute table for each
feature. The numerical values have to be defined in map units according to the Coordinate
Reference System (CRS) used with the data. For example, the width of a buffer zone along
the banks of a river can vary depending on the intensity of the adjacent land use. For
intensive cultivation the buffer distance may be bigger than for organic farming.
Buffers around polyline features, such as rivers or roads, do not have to be on both sides of
the lines. They can be on either the left side or the right side of the line feature. In these
cases the left or right side is determined by the direction from the starting point to the end
point of the line during digitizing.
Figure 10.7: Multiple buffering a point feature with distances of 10, 15, 25 and 30 km.
Figure 10.8: Buffer zones with dissolved (left) and with intact boundaries (right) showing
overlapping areas.
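Multiple buffering of a point feature, as in Figure 10.7, amounts to assigning each location to the innermost ring whose outer radius is not exceeded. The sketch below uses the ring distances from that figure; the function name and the treatment of boundary values (a location exactly on a ring boundary falls in the inner ring) are assumptions made for illustration.

```python
import bisect

ring_distances_km = [10, 15, 25, 30]  # outer radius of each ring, as in Figure 10.7

def ring_index(distance_km):
    """Return the 0-based ring containing the distance, or None beyond the last ring."""
    i = bisect.bisect_left(ring_distances_km, distance_km)
    return i if i < len(ring_distances_km) else None

# Hypothetical distances (km) of four villages from the buffered point
village_rings = {name: ring_index(d)
                 for name, d in {"V1": 4, "V2": 12, "V3": 27, "V4": 42}.items()}
```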
APPLICATION OF BUFFERING:
Buffer zones are areas created to enhance the protection of a specific conservation area, often
peripheral to it. Within buffer zones, resource use may be legally or customarily restricted,
often to a lesser degree than in the adjacent protected area so as to form a transition zone.
1. Buffering creates a buffer zone data set.
2. A buffer zone is often treated as a protection zone and is used for planning and
regulatory purposes.
3. A city may require a buffer zone of 550 m between schools and premises trading in
alcohol.
4. A 30 m buffer zone along the banks may be needed to protect a river.
5. Buffering operations include, for example, identifying protected zones around lakes
and streams, zones of noise pollution around highways, service zones around bus routes,
and groundwater pollution zones around waste sites.
EXAMPLE:
Geodesic Buffer Example:
The goal of this example is to compare 1,000 kilometer geodesic and Euclidean buffers of a
number of select world cities. Geodesic buffers were generated by buffering a point feature
class with a geographic coordinate system, and Euclidean buffers were generated by
buffering a point feature class with a projected coordinate system (in both the projected and
unprojected datasets the points represent the same cities).
When working with a dataset in one of the common projected coordinate systems for the
whole world, such as Mercator, projection distortion may be minimal near the equator, but
significant near the poles. This means that for a Mercator projected dataset, distance
measurements and buffer offsets should be quite accurate near the equator and less accurate
away from the equator.
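The size of this distortion can be estimated from the linear scale factor of the spherical Mercator projection, k = 1 / cos(latitude). The sketch below is a first-order approximation only: it evaluates k at the buffer centre and ignores its variation across the buffer, which is considerable for a 1,000 km radius at high latitudes.

```python
import math

def mercator_scale_factor(latitude_deg):
    """Linear scale factor of the spherical Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(latitude_deg))

projected_radius_km = 1000.0  # buffer distance measured in projected map units
for lat in (0, 30, 60):
    k = mercator_scale_factor(lat)
    # Approximate true ground radius covered by the projected buffer
    ground_radius_km = projected_radius_km / k
    print(f"lat {lat:2d}: scale factor {k:.3f}, ground radius about {ground_radius_km:.0f} km")
```

At the equator the projected and ground distances agree, while at 60 degrees latitude a Euclidean buffer drawn in Mercator units covers only about half the intended ground distance.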
10.4 SUMMARY
What have we learned?
Let’s wrap up what we covered in this unit:
Buffer zones describe areas around real world features.
Buffer zones are always vector polygons.
A feature can have multiple buffer zones.
10.5 GLOSSARY
1. Proximity: Proximity analysis is a class of spatial analysis tools and algorithms that
employ geographic distance as a central principle. Proximity analysis is a crucial tool
for business marketing and site selection. Marketers analyze demographics and
infrastructure to determine trade areas.
2. Buffer: A buffer is a reclassification based on distance: classification of
within/without a given proximity. Buffering involves measuring distance outward in
all directions from an object. Buffering can be done on all three types of vector data:
point, line, area.
10.7 REFERENCES
1. Jensen, John R.; Jensen, Ryan R. (2013). Introductory Geographic Information
Systems. Pearson. p. 149.
2. Buffer (Analysis), ArcGIS Desktop 10 online help. Accessed 4 March 2010.
3. Longley, P. A., Goodchild, M. F., Maguire, D. J., & Rhind, D. W.
(2011). Geographic Information Systems & Science. Danvers, Massachusetts: John
Wiley & Sons.
4. Bolstad, Paul (2008). GIS Fundamentals: A First Text on Geographic
Information Systems, Third Edition. White Bear Lake, Minnesota: Eider Press.
5. Lo, C.P., Yeung, Albert K.W. (2002). Concepts and Techniques of
Geographic Information Systems. Upper Saddle River, New Jersey: Prentice-Hall Inc.
p. 207.
11.1 OBJECTIVES
11.2 INTRODUCTION
11.3 NETWORKING ANALYSIS: OPTIMAL PATH &
NEIGHBORHOOD
11.4 SUMMARY
11.5 GLOSSARY
11.6 ANSWER TO CHECK YOUR PROGRESS
11.7 REFERENCES
11.8 TERMINAL QUESTIONS
UNIT 11 - NETWORKING ANALYSIS: OPTIMAL PATH & NEIGHBORHOOD Page 177 of 216
GIS-506/DGIS-506 Advance GIS Uttarakhand Open University
11.1 OBJECTIVES
After studying this unit you will be able to:
11.2 INTRODUCTION
A network is a system of connected lines representing some geographic phenomenon. It is
identified by the form through which resources are transported or communication is achieved.
The “goods” transported can be almost anything: people, cars and other vehicles along a road
network, commercial goods along a logistic network, phone calls along a telephone network, or
water pollution along a stream/river network.
Network Elements:
A network consists of different elements, each of which can be associated with attributes
defining the characteristics of the element (refer Fig. 11.1). They are:
a) Links/Lines – Links are the basic elements of a network, as they serve as the conduits for
movement. Two terms should be understood in this regard:
Resistance describes the amount of impedance to the free flow of resources; examples are
cost of transportation, condition of road, time, etc. It is user defined and depends on the
direction of flow, hence it can be categorized as “from-to resistance” or “to-from
resistance”. Negative link impedance signifies that the link cannot be traversed in that
direction. Resource demand is the associated attribute on the link; an example is the
number of households dependent on each water pipeline for potable use in an area.
b) Nodes – These are the end points of links. Links are always connected at nodes.
c) Turns – A turn is the direction of flow from one link to another connected through a node
(point of location). The resource flow can be regulated by turns, for example by allowing no
U-turn at a specific traffic intersection to reduce the inflow of traffic in a specific
direction.
d) Stops – These represent locations where resources can be picked up and dropped off on a
link. A classic example is a bus stop on a bus route, where passengers can be picked up and
dropped off. The demand for a resource is an attribute of the stop. Positive resource demand
indicates resource pick-up, whereas negative demand means resource drop-off.
e) Centers – These are point locations that have a supply of resources to distribute further
along the links of a network. Resource capacity is an important attribute.
f) Barriers – These represent locations through which there is no resource flow. They are
generally visualized as obstacles on a link.
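The elements above can be held in a simple data structure. The sketch below stores a from-to and a to-from impedance for each link, treats a negative impedance as "cannot be traversed in that direction", and drops links that carry a barrier. Node names, impedance values and the dictionary layout are hypothetical choices for illustration.

```python
# Each link stores an impedance for both directions of travel.
# A negative impedance marks a direction that cannot be traversed.
links = [
    {"from": "A", "to": "B", "from_to": 4.0, "to_from": 4.0},
    {"from": "B", "to": "C", "from_to": 2.5, "to_from": -1.0},  # one-way B -> C
    {"from": "A", "to": "C", "from_to": 9.0, "to_from": 9.0},
]
barriers = {("A", "C")}  # links blocked in both directions

def build_graph(links, barriers):
    """Return {node: {neighbour: impedance}} containing only traversable directions."""
    graph = {}
    for link in links:
        a, b = link["from"], link["to"]
        graph.setdefault(a, {})
        graph.setdefault(b, {})
        if (a, b) in barriers or (b, a) in barriers:
            continue  # no resource flows through a barrier
        if link["from_to"] >= 0:
            graph[a][b] = link["from_to"]
        if link["to_from"] >= 0:
            graph[b][a] = link["to_from"]
    return graph

graph = build_graph(links, barriers)
```

Stops and centers could be added in the same way, as node attributes holding demand and capacity.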
Network Analysis:
The real world is full of networks that facilitate the movement of resources, people and
communications for the utilities concerned. Network analysis is the study of the
representation, management and manipulation of such network features. Each utility service has
certain requirements, and some optimum level of service is desired.
Connectivity functions represent spatial linkages between features. Analysis of such networks
may entail shortest path computations (in terms of distance or travel time) between two points in
a network for routing purposes. Other forms are finding all points reachable within a given
distance or duration from a start point for allocation purposes, or determining the capacity of
the network for transportation between an indicated source location and sink location.
Network analysis is done to meet any of the following service requirements, depending on the
utilities concerned:
a) Path determination – This is the process of calculating the optimal path through a series
of points in a network to simulate the flow of resources through them. Depending on the
application, path determination can be categorized under two heads:
Source-Destination Path, which is the optimal path from a predefined source to a
predefined destination. The path of least resistance is determined from the source to
the destination by evaluating the link and turn resistances.
Optimal Cyclic Path, where the optimal path is determined by evaluating the resistance
for each pair of links in the network. It can be worked out for multiple stops in a
network by optimizing the order of visits, depending upon the distance between stops
and the impedance in the network.
b) Resource allocation – This is associated with links and resource centers. In order to meet
the demand of the links, the principle of least resistance is followed, and all possible
turns and links at those points are analyzed.
c) Utility location – This is a search for the location of a facility point in a network,
unlike allocation of resources from an already located point. It is determined by
evaluating a set of constraints defined by the facility points and also the flow demand of
each link.
d) Finding the closest facility – For a known event, the closest facility to a given location
can be estimated. A number of facility providers can be identified to give the customer a
choice.
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………
Q2. What are the different types of analysis that can be performed on a network?
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
The aim of optimal-path finding is to select the nodes and lines that give the best performance
in the network. Optimal-path finding techniques are used when a least-cost path
between two nodes in a network must be found. The two nodes are called the origin and the
destination. The aim is to find a sequence of connected lines to traverse from the origin to the
destination at the lowest possible cost.
In optimal-path finding, the cost function can be simple: for instance, it can be defined as the
total length of all lines of the path. The cost function can also be more elaborate and take into
account not only the length of the lines but also their capacity, maximum transmission (travel)
rate and other line characteristics, for instance to obtain a reasonable approximation of travel
time. There can even be cases in which the nodes visited add to the cost of the path as well.
These are called turning costs, which are defined in a separate turning-cost table for each node,
indicating the cost of turning at the node when entering from one line and continuing on another.
This is illustrated in Figure 11.1 of the examples.
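A cost function that combines travel time along the lines with turning costs at a node can be sketched as follows. The line lengths, speeds and the turning-cost table are hypothetical; an infinite turning cost is used to mark a prohibited turn.

```python
# Hypothetical line lengths (km) and speeds (km/h); travel time approximates cost
lines = {"a": {"length": 6.0, "speed": 60.0},
         "b": {"length": 3.0, "speed": 30.0},
         "c": {"length": 4.0, "speed": 40.0}}

# Hypothetical turning-cost table (minutes) for one node: (entering line, leaving line)
turning_cost = {("a", "b"): 0.5, ("a", "c"): float("inf"), ("b", "a"): 1.0}

def path_cost(path):
    """Total travel time in minutes of a path given as a sequence of line names."""
    cost = sum(lines[name]["length"] / lines[name]["speed"] * 60.0 for name in path)
    # Add the cost of each turn made between consecutive lines (0 if not listed)
    for entering, leaving in zip(path, path[1:]):
        cost += turning_cost.get((entering, leaving), 0.0)
    return cost
```

With these values, `path_cost(["a", "b"])` adds six minutes on each line to half a minute for the turn, while any path turning from a onto c has infinite cost and is therefore never chosen.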
In the following illustration it will be noticed that it is possible to travel on line b in
Figure 11.2, make a U-turn at node N, and return the way one came. The question is whether
doing this makes sense in optimal-path finding. After all, going back to where one came from
will only increase the total cost. In fact, there could be situations where it is optimal to do
so in order to reach a new node, depending on the utility of services and accessibility.
Figure 11.2: Network neighborhood of node N with associated turning costs at N. Turning at N onto c
is prohibited because of its direction, so no costs are mentioned for turning onto c. A turning cost of
infinity (∞) also means that the turn is prohibited.
An illustration of ordered and unordered path finding is provided in Figure 11.3. Here, a
path is found from node A to node D, via nodes B and C. Obviously, the length of the path found
under unordered requirements is at most as long as the one found under ordered requirements.
Some GISs provide support for these more complicated path-finding problems.
Figure 11.3: Ordered (a) and unordered (b) optimal-path finding. In both cases, a path had to be found
from A to D: in (a) by visiting B and then C; in (b) also by visiting both nodes, but in arbitrary order.
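The difference between the two problem types can be sketched by comparing the fixed visiting order with the best of all possible orders. The pairwise costs below are hypothetical and are assumed to be least-cost distances already computed on the network.

```python
from itertools import permutations

# Hypothetical least-cost distances between pairs of nodes (symmetric)
cost = {("A", "B"): 5, ("A", "C"): 2, ("B", "C"): 2, ("B", "D"): 6, ("C", "D"): 7}

def d(u, v):
    """Cost between two nodes, looked up in either direction."""
    return cost.get((u, v), cost.get((v, u)))

def route_cost(route):
    return sum(d(u, v) for u, v in zip(route, route[1:]))

ordered = ("A", "B", "C", "D")  # must visit B, then C
# Unordered: try every order of the intermediate nodes and keep the cheapest
unordered = min((("A",) + middle + ("D",) for middle in permutations(("B", "C"))),
                key=route_cost)
```

Because the ordered route is one of the candidates examined in the unordered search, the unordered result can never be longer. Enumerating all orders is only practical for a handful of stops.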
But when the network is very big, it becomes inefficient, since a lot of computations need to
be repeated. There are many optimization techniques for finding an optimal path; these are
described below:
Particle swarm optimization (PSO) is an optimization algorithm which has been applied to finding
the shortest path in a network. However, it might fall into a local optimal solution. In this
algorithm, the flow starts with a population of particles whose positions, which represent
candidate solutions to the problem, and velocities are randomly initialized in the search space.
The search for the optimal position is performed by updating the particle velocities, and hence
positions, in each iteration in a specific manner: in every iteration, the fitness of each
particle’s position is determined by a fitness measure, and the velocity of each particle is
updated by keeping track of two “best” positions.
Pbest: The first is the best position a particle has traversed so far; this value is called
“pbest”.
Nbest: Another best value is the best position that any neighbor of a particle has traversed
so far; this is a group best and is called “nbest”.
Gbest: When a particle takes the whole population as its neighborhood, the
neighborhood best becomes the global best and is accordingly called “gbest”.
In the PSO algorithm, the potential solutions, called particles, are obtained by “flowing”
through the problem space, following the current optimum particles. Generally speaking, the
PSO algorithm has a strong ability to find the optimal result, but it has the disadvantage of
easily getting trapped in a local optimum. The PSO algorithm’s search is oriented by tracing
Pb, each particle’s best position in its history, and Pg, the best position of all particles in
their history; therefore, it can rapidly arrive near the global optimum. However, because the
PSO algorithm has several parameters that must be adjusted empirically, the search becomes very
slow if these parameters are not set appropriately.
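The velocity and position updates driven by pbest and gbest can be sketched on a simple continuous problem. Note the assumptions: the fitness function below is a stand-in (squared distance from the origin) rather than a network path cost, since applying PSO to paths needs a problem-specific encoding, and the parameter values w, c1 and c2 are common illustrative choices, not values taken from this unit.

```python
import random

def fitness(position):
    """Hypothetical cost to minimise: squared distance from the origin."""
    return sum(x * x for x in position)

def pso(n_particles=20, n_iter=100, dim=2, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-10, 10) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]         # best position found by each particle
    gbest = min(pbest, key=fitness)[:]  # best position found by the whole swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards own best + pull towards swarm best
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                pos[i][k] += vel[i][k]
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pbest[i]) < fitness(gbest):
                    gbest = pbest[i][:]
    return gbest, fitness(gbest)

best_position, best_value = pso()
```

Here every particle uses the whole swarm as its neighborhood, so the neighborhood best coincides with gbest. Changing w, c1 or c2 noticeably alters how quickly the swarm settles, which reflects the parameter sensitivity noted above.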
c) Tabu Search:
Tabu search (TS) is an iterative search that starts from some initial feasible solution and
attempts to determine the best solution in the manner of a hill-climbing algorithm. The
algorithm keeps a history of local optima in order to reach a near-global optimum quickly and
efficiently. During the search, the best solution is always updated and stored aside until the
stopping criterion is satisfied. The two main components of the tabu search algorithm are the
tabu list restrictions and the aspiration criterion. TS uses short-term and/or long-term memory
while making moves between neighboring solutions. It is essential for a local search to be
balanced in terms of the quality of solutions and the computing time needed to find them. In that
sense, a local search does not necessarily evaluate all neighborhood solutions; generally, a
subset of solutions is evaluated. If the optimal score is unknown (which is usually the case),
the search must be told when to stop looking (for example, based on time spent, user input,
etc.).
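A minimal tabu search can be sketched on the problem of ordering intermediate stops along a route. The example assumes stops located along a single road, a swap neighborhood, a short tabu list of recent swaps, an aspiration criterion that admits a tabu move if it beats the best solution so far, and a fixed iteration count as the stopping criterion. All positions and parameter values are hypothetical.

```python
from itertools import combinations

ORIGIN, DESTINATION = 0, 10  # hypothetical positions along a road (km)

def route_cost(order):
    """Length of the route origin -> stops in the given order -> destination."""
    points = [ORIGIN] + list(order) + [DESTINATION]
    return sum(abs(b - a) for a, b in zip(points, points[1:]))

def tabu_search(start, n_iter=10, tenure=2):
    current = list(start)
    best, best_cost = current[:], route_cost(current)
    tabu = []  # short-term memory of recently used swaps
    for _ in range(n_iter):
        candidates = []
        for i, j in combinations(range(len(current)), 2):
            neighbour = current[:]
            neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
            cost = route_cost(neighbour)
            # Aspiration criterion: a tabu move is allowed if it beats the best so far
            if (i, j) not in tabu or cost < best_cost:
                candidates.append((cost, (i, j), neighbour))
        if not candidates:
            break
        # Move to the best admissible neighbour, even if it is worse than the current one
        cost, move, current = min(candidates, key=lambda item: item[0])
        tabu.append(move)
        if len(tabu) > tenure:
            tabu.pop(0)
        if cost < best_cost:
            best, best_cost = current[:], cost
    return best, best_cost

best_order, best_length = tabu_search([6, 2, 8, 4])
```

The best solution is stored separately from the current one, so the search can keep moving after reaching a local optimum without losing the best route found.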
d) Dijkstra's Algorithm:
Dijkstra’s algorithm is a graph search algorithm that solves the single-source optimal path
problem for a graph with nonnegative edge path costs, producing a shortest path tree.
This algorithm is often used in routing and as a subroutine in other graph algorithms. It can
also be used for finding the cost of the shortest path from a single vertex to a single
destination vertex by stopping the algorithm once the optimal path to the destination vertex has
been determined. Traffic information systems use Dijkstra’s algorithm to find routes between a
given source and destination. The computation is based on Dijkstra's algorithm, which is used to
calculate the shortest path tree inside each area of the network.
Dijkstra’s labeling method is a central procedure in shortest path algorithms. An out-tree is a
tree originating from the source node to other nodes. The output of the labeling method is an
out-tree from a source node s to a set of nodes L. Three pieces of information are required for
each node i in the labeling method while constructing the shortest path tree:
• the distance label, d(i),
• the parent-node/predecessor p(i),
• the set of permanently labeled nodes L
UNIT 11 - NETWORKING ANALYSIS: OPTIMAL PATH & NEIGHBORHOOD Page 184 of 216
GIS-506/DGIS-506 Advance GIS Uttarakhand Open University
Here d(i) stores an upper bound on the optimal shortest path distance from s to i, and p(i) records
the node that immediately precedes node i in the out-tree. By iteratively adding the temporarily
labeled node with the smallest distance label d(i) to the set of permanently labeled nodes L,
Dijkstra's algorithm guarantees optimality. The algorithm can be terminated when the destination
node is permanently labeled.
The major disadvantage of the algorithm is that it performs a blind (uninformed) search,
which consumes a lot of time and wastes resources. Another disadvantage is that it cannot
handle negative edge costs. Once a node is permanently labeled its distance is never revised, so
a negative edge examined later can leave the algorithm with a path that is not the true
shortest path.
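The labeling method described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the graph is assumed to be a dictionary of adjacency lists, a binary heap is used to pick the temporarily labeled node with the smallest d(i), and the node names and costs in the example network are hypothetical.

```python
import heapq

def dijkstra(graph, source, target=None):
    """Label-setting shortest paths from `source`.

    graph: dict mapping node -> list of (neighbour, nonnegative cost).
    Returns (d, p): distance labels d(i) and predecessor labels p(i).
    """
    d = {source: 0.0}      # d(i): upper bound on the distance from source to i
    p = {source: None}     # p(i): node that precedes i in the out-tree
    permanent = set()      # L: permanently labeled nodes
    heap = [(0.0, source)]
    while heap:
        dist, i = heapq.heappop(heap)
        if i in permanent:
            continue                   # stale heap entry, label already final
        permanent.add(i)               # smallest temporary label becomes permanent
        if i == target:
            break                      # destination permanently labeled: stop early
        for j, cost in graph.get(i, []):
            if cost < 0:
                raise ValueError("Dijkstra's algorithm needs nonnegative costs")
            new_dist = dist + cost
            if new_dist < d.get(j, float("inf")):
                d[j] = new_dist        # tighten the upper bound d(j)
                p[j] = i
                heapq.heappush(heap, (new_dist, j))
    return d, p

def path_to(p, node):
    """Follow predecessor labels back to the source."""
    path = []
    while node is not None:
        path.append(node)
        node = p[node]
    return path[::-1]

# Hypothetical road network: node -> [(neighbour, travel cost), ...]
roads = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("c", 5)],
    "a": [("c", 1)],
    "c": [],
}
d, p = dijkstra(roads, "s", target="c")
print(d["c"], path_to(p, "c"))
```

When the search stops early at the destination, only the labels of permanently labeled nodes are guaranteed to be final; the remaining entries of d are still upper bounds.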
Q.2. Discuss various methods to overcome the problem of optimal path finding in a network.
Neighborhood Analysis:
There are numerous ways to represent the structure of a network, but finding the proper account
to convey the desired network information is not always an easy task. However, as with any large
data set, summary statistics (e.g. graph invariants) are one way to succinctly describe certain
aspects of a network. Another approach is to break up the network into smaller, easier to
manage components and study the properties of the sub-networks. The local regions are defined
as the neighborhoods around the vertices (i.e. ego networks). Neighborhood analysis can reveal
certain aspects of the network that are concealed when only aggregate global network measures
are considered. This allows small patterns, anomalies, and features (as might be relevant to
crime and terrorism networks) to be discovered that would be missed in a more global analysis.
For example, identifying all the local leadership changes or regions of increased activity can
help identify terrorist cells.
suit a certain position. In addition, the neighborhoods also specify a community from which
sub-graph level statistics can be ascertained. Community characteristics, like density, can be
used to identify tightly coupled regions of the network.
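As an illustration of a neighborhood and of a sub-graph statistic such as density, the following Python sketch extracts the vertices within k hops of a vertex and computes the density of that sub-graph. It assumes an undirected network stored as a dictionary of neighbor sets; the small network and the function names are hypothetical.

```python
from collections import deque

def neighborhood(adj, v, k):
    """Vertices within k hops of v (including v), found by breadth-first search."""
    hops = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if hops[u] == k:
            continue                   # do not expand beyond k hops
        for w in adj[u]:
            if w not in hops:
                hops[w] = hops[u] + 1
                queue.append(w)
    return set(hops)

def density(adj, nodes):
    """Share of the possible undirected edges among `nodes` that actually exist."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # u < w counts each undirected edge once (adjacency is assumed symmetric)
    edges = sum(1 for u in nodes for w in adj[u] if w in nodes and u < w)
    return edges / (n * (n - 1) / 2)

# Hypothetical undirected network: vertex -> set of neighbours
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
ego = neighborhood(adj, "a", 1)
print(sorted(ego), density(adj, ego), density(adj, set(adj)))
```

In this example the 1-hop neighborhood of vertex a is fully connected, while the network as a whole is only half as dense, which is the kind of local contrast that a global measure hides.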
Neighborhood Matrix: There are various metrics that can be calculated for networks. Some are
specific to vertices (e.g. degree) and others describe an aspect of the entire network (e.g.
density). However, all metrics depend on the specification of the network, and their values
can change if the network composition changes. This makes the specification of the network a
very important task, and one that is often crucial to the success of network analysis. To minimize
the effects of network selection, the neighborhood representation allows smaller parts of the
network to be evaluated, so that small scale effects are captured, while the neighborhoods
remain sufficiently large to also capture the large scale effects.
Based on the neighborhood statistic employed, we can define a discrepancy measure Dt(B)
describing how unusual the sub-graph, given by the vertex and neighborhood size, B = (v, k)
appears at time t. These measures should be suitably standardized to allow direct comparison
between all neighborhood sizes and times. However, unless there are very large abrupt changes,
this may not detect the change with sufficient power.
11.4 SUMMARY
Decision makers who manage facilities need optimal (shortest) paths in order to allocate
resources on the basis of demand and capacity. Networks are the frameworks through which
resources flow.
In order to deliver the desired utility services, network systems are studied in detail
to represent, manipulate and manage the associations of linear features.
Optimal path determination and neighborhood analysis are methods of network analysis.
11.5 GLOSSARY
1. Optimal path finding- Optimal path finding is the selection of the sequence of nodes that
achieves the best performance (for example, the lowest total cost) in the network.
2. Network analysis- Network analysis is the study of the representation, management and
manipulation of network features.
11.7 REFERENCES
12.1 OBJECTIVES
12.2 INTRODUCTION
12.3 MAP MANIPULATION
12.4 SUMMARY
12.5 GLOSSARY
12.6 ANSWER TO CHECK YOUR PROGRESS
12.7 REFERENCES
12.8 TERMINAL QUESTIONS
12.1 OBJECTIVES
After going through this unit the learner will be able to:
1. Understand the meaning of maps and their types.
2. Use map manipulation tools.
3. Work with GIS formats.
12.2 INTRODUCTION
The word "map" is derived from the Latin word "mappa", which means napkin or cloth. A
map is a symbolic depiction of the characteristics of a selected place, usually drawn on a flat
surface; or simply, a map is a model of the world depicted on a flat surface.
Maps display information about the world in a simple and intuitive way. They inform
about the world by showing the sizes and shapes of countries, the distances between places, and
the locations of features. At present, however, GIS maps go far beyond the static maps of years
past.
Types of Map:
Every map shows a different kind of information. Function and symbolization both
play a significant role in map making. By function, maps can be general reference or thematic,
and by symbol, maps can be qualitative or quantitative. The main types of maps are
discussed below:
According to Function:
Maps can be classified according to their functions. For example, a soil map shows the different
soil types in a particular region; other examples are discussed below.
Based on function, maps are classified into physical and cultural maps.
Physical Map - Physical maps are prepared to show natural features such as relief, soil,
rocks, vegetation and climate. These maps are further subdivided into the following types:
a) Astronomical Maps: These maps are prepared to show heavenly bodies
such as the stars, the moon and the planets of our solar system. These maps have both large and
small scales.
b) Relief Maps: Relief maps are made to show the actual topographic features of the earth's
surface, such as mountains, plateaus and river systems.
c) Geological Maps: These maps are made to show various geological
features such as rocks, minerals and surficial deposits, as well as the location of
geologic structures such as faults and folds.
d) Climatic Maps: Climatic maps are drawn to show the geographic distribution of
monthly or annual average values of climatic variables, i.e. temperature,
precipitation, humidity and atmospheric pressure, over a particular region. In simple
words, climatic maps depict the different climate zones of an
area.
e) Weather Maps: Weather maps are made to show the average condition of the elements
of weather (temperature, pressure, direction and velocity of winds, etc.) over a short
period, i.e. on a day-to-day basis.
f) Soil Maps: A soil map is a geographical representation of the soil types or soil properties
in the area of interest, using different shades and colours.
Cultural Map - Cultural maps are drawn to represent man-made features such as canals,
dams, buildings, and rail and road networks.
a) Political Maps: Political maps represent the political subdivisions of the
world, of a continent, or of a major geographic region. For example, the political map
of India shows the 28 states and 8 union territories of India.
Source: https://www.mapsofindia.com/
Fig. 12.1 Political Map of India
b) Population Map: These maps are drawn to show the distribution and density of
population.
c) Agricultural Map: These maps are drawn to represent the production and
distribution of different types of crops in a particular area.
Map Manipulation:
Maps are very important because they present useful information
in a lucid way; in simple words, a map simplifies complex information.
However, a single map is not equally useful to every person, because different types of
map contain different types of information, and each user seeks particular information from a
particular map. When every person gathers information according to their own needs, the concept
of map manipulation comes into play.
Map manipulation is done in a GIS environment. GIS software packages provide various tools
for processing and managing maps in the database. Overlay and buffering, for example,
are very basic tools that are frequently used for data pre-processing and data
analysis. Map manipulation is easy to follow graphically, even though the terms describing the
various tools may differ between GIS packages. Many tools are used in map
manipulation, such as Dissolve, Clip, Append, Select, Eliminate, Update, Erase and Split. All
these tools are discussed one by one:
Dissolve - Dissolve is a tool that aggregates features and is thus also referred to as 'Merge' or
'Amalgamation'. In this process, a new map feature is created by merging adjacent polygons
that have common values of specified attributes. In GIS, Dissolve is one of the data management
tools used for generalizing features. For instance, in choropleth mapping, dissolve can
remove the boundaries between polygons with common values and draw larger areas with the same
values, as shown in Figure 12.2.
Source: https://pro.arcgis.com
Fig. 12.2
Thus, applying the dissolve operator to multiple adjacent polygons with the same value yields one
new polygon that combines the extent of the original, dissolved polygons.
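The dissolve idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how a GIS package implements the tool: each polygon is assumed to be stored as a set of unit grid cells, two polygons count as adjacent when any of their cells share an edge, and the layer, field name and values are hypothetical.

```python
def cells_touch(a, b):
    """True if any cell in set a shares an edge with a cell in set b."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return any((x + dx, y + dy) in b for (x, y) in a for dx, dy in steps)

def dissolve(features, field):
    """Merge adjacent polygons that have the same value of `field`."""
    out = []
    for feat in features:
        merged = {"value": feat[field], "cells": set(feat["cells"])}
        keep = []
        for poly in out:
            same_value = poly["value"] == merged["value"]
            if same_value and cells_touch(poly["cells"], merged["cells"]):
                merged["cells"] |= poly["cells"]   # boundary between them disappears
            else:
                keep.append(poly)
        keep.append(merged)
        out = keep
    return out

# Hypothetical land-use parcels laid out along one row of grid cells
parcels = [
    {"id": "A", "landuse": "forest", "cells": {(0, 0)}},
    {"id": "B", "landuse": "forest", "cells": {(1, 0)}},
    {"id": "C", "landuse": "urban",  "cells": {(2, 0)}},
    {"id": "D", "landuse": "forest", "cells": {(3, 0)}},
]
dissolved = dissolve(parcels, "landuse")
for poly in dissolved:
    print(poly["value"], sorted(poly["cells"]))
```

Parcels A and B merge into one forest polygon, while parcel D stays separate because it is not adjacent to them even though it has the same value.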
Source: https://pro.arcgis.com
Fig. 12.3
Clip - Clipping creates a new layer that covers only a specific area of interest, which is an
important function when working in GIS. This is very advantageous when the analyst only needs
to work in a specific area: unnecessary spatial information can be discarded without
affecting the original data. An example of the use of the clip tool is the analysis of traffic
patterns in a central business district (CBD). The analyst does not need road data outside the
CBD, so the road data are simply cut along the CBD boundary. A clip operation can be
performed on both raster and vector data. Figure 12.3 shows how the clip
tool cuts a certain area from a larger area. The first frame is the original layer that is the
base for further processing, followed by the circular polygon used to clip the original
frame. Finally, the last image shows the new layer, clipped from the original, which is our
study area.
Append - This creates a new layer by piecing two or more layers together (Figure 12.4).
The tool can append point, line and polygon feature classes, raster catalogues, and tables to an
existing dataset of the same type. For example, several rasters can be appended to an existing
raster dataset, but a line feature class cannot be appended to a point feature class.
Source: https://pro.arcgis.com
Fig. 12.4
Source: https://pro.arcgis.com
Fig. 12.5
For example, if we want to know how many houses were affected by a recent flood, we
simply select all houses (Layer 1) that fall within the flood boundary (Layer 2). A variety of
selection methods are available to select the point, line, and polygon features in one layer that
overlap the features in the same or another layer.
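The flood example can be sketched as a point-in-polygon selection. The code below uses the standard ray-casting test; it is a minimal sketch that assumes a simple polygon given as a list of vertices and ignores points lying exactly on the boundary. The flood boundary and house coordinates are hypothetical.

```python
def point_in_polygon(pt, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices, first vertex not repeated."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside    # each crossing to the right flips the state
    return inside

# Layer 2: hypothetical flood boundary; Layer 1: hypothetical house locations
flood = [(0, 0), (6, 0), (6, 4), (0, 4)]
houses = {"h1": (1, 1), "h2": (5, 3), "h3": (7, 2), "h4": (3, 5)}

affected = [name for name, pt in houses.items() if point_in_polygon(pt, flood)]
print(affected, len(affected))
```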
Eliminate - This tool creates a new layer by removing user-defined (selected) polygons, merging
each of them with the neighbouring polygon that has the largest area. Eliminate is usually used
to remove the small sliver polygons that result from overlay operations such as intersect or
union (Figure 12.6).
Source: https://pro.arcgis.com
Fig. 12.6
The Eliminate operator can also be referred to as 'omit', because it can be applied to remove
features when they become unnecessary at a certain scale. It evolved from the omission
operator introduced by Raisz in 1962.
Note: The input layer must include a selection; otherwise, Eliminate will fail.
Intersect - The Intersect tool calculates the geometric intersection of any number of
feature layers and feature classes. Features, or portions of features, that are common to all
inputs (that is, they intersect) are written to the output feature class (Figure 12.7). Input
features must be simple features, i.e. point, multipoint, line or polygon. They cannot be complex
features such as annotation features or network features. By default, the output geometry type
is that of the input with the lowest dimension. For example, if all the inputs are polygons, the
default output will be polygon; if one or more of the inputs are lines and none are points, the
default output will be line; if one or more of the inputs are points, the default
output will be point.
Source:https://pro.arcgis.com
Fig.12.7
Union - Union is an analytical process in which the features from two or more map layers
are combined into a single, composite layer. In simple words, the output
feature class contains the union of all the features from all the input feature classes. One
important thing to remember is that all input feature classes must be polygon feature classes.
Source: https://pro.arcgis.com
Fig.12.8
Union includes the data from all the input layers, meaning that both overlapping and
non-overlapping areas are included in the new layer.
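The difference between Intersect and Union can be illustrated with a small Python sketch. It is a simplification under stated assumptions: polygons are stored as sets of unit grid cells instead of coordinate rings, and the layer names and cell coordinates are hypothetical. Intersect keeps only the areas common to both layers; Union keeps those plus the non-overlapping remainder of each layer.

```python
def overlay(layer_a, layer_b, how="intersect"):
    """Overlay two polygon layers stored as {feature_id: set of (col, row) cells}."""
    if how not in ("intersect", "union"):
        raise ValueError("how must be 'intersect' or 'union'")
    out = {}
    # Areas common to a feature of A and a feature of B
    for id_a, cells_a in layer_a.items():
        for id_b, cells_b in layer_b.items():
            common = cells_a & cells_b
            if common:
                out[(id_a, id_b)] = common
    if how == "union":
        all_a = set().union(*layer_a.values())
        all_b = set().union(*layer_b.values())
        # Non-overlapping parts are kept too, with no partner feature (None)
        for id_a, cells_a in layer_a.items():
            only_a = cells_a - all_b
            if only_a:
                out[(id_a, None)] = only_a
        for id_b, cells_b in layer_b.items():
            only_b = cells_b - all_a
            if only_b:
                out[(None, id_b)] = only_b
    return out

# Hypothetical soil and land-use layers
soils = {"s1": {(0, 0), (1, 0)}, "s2": {(2, 0), (3, 0)}}
landuse = {"u1": {(1, 0), (2, 0), (2, 1)}}

common = overlay(soils, landuse, "intersect")
combined = overlay(soils, landuse, "union")
print(len(common), len(combined))
```

Each output polygon carries the identifiers of the input features it came from, which is how the attributes of both layers can be attached to the result.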
Update - This tool updates the attributes and geometry of the input features with those of the
update features. In simple words, it is a 'cut and paste' operation that replaces the input
layer with the update layer and its features, as shown in Fig. 12.9.
Source: https://pro.arcgis.com
Fig. 12.9
As the name suggests update is very useful for updating an existing layer rather than
redrawing that layer once again. So, we can say that update is a better option than re-
digitizing the entire map.
Erase - This GIS operation removes from the input layer those features that fall within the area
of the erase layer (Figure 12.10).
Source: https://pro.arcgis.com
Fig.12.10
In simple words, the output feature class is created by superimposing the polygons of the erase
features on the input features; only those parts of the input features that
lie outside the boundaries of the erase features are copied to the output feature class. The
erase features can be points, lines or polygons, as long as the input features are of the same
or a lower order (dimension). Polygon erase features can be used to erase polygons, lines or
points from the input features; line erase features can be used to erase lines or points; point
erase features can be used to erase points only.
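The rule on feature types and the erase operation itself can be sketched as follows. This is an illustration under simplifying assumptions: the dimension rule is encoded as a lookup table, and polygons are represented as sets of unit grid cells. The layer contents are hypothetical.

```python
# Order (dimension) of each simple feature type
DIMENSION = {"point": 0, "line": 1, "polygon": 2}

def can_erase(erase_type, input_type):
    """Input features must be of the same or a lower order than the erase features."""
    return DIMENSION[input_type] <= DIMENSION[erase_type]

def erase(input_layer, erase_layer):
    """Keep only the parts of each input feature outside every erase feature.

    Both layers are {feature_id: set of (col, row) grid cells}.
    """
    erase_cells = set().union(*erase_layer.values())
    out = {}
    for fid, cells in input_layer.items():
        kept = cells - erase_cells
        if kept:                       # features erased completely are dropped
            out[fid] = kept
    return out

# Hypothetical forest stands and a lake used as the erase layer
forest = {"f1": {(0, 0), (1, 0), (2, 0)}, "f2": {(5, 5)}}
lake = {"l1": {(1, 0), (2, 0), (5, 5)}}

print(can_erase("polygon", "line"), can_erase("point", "polygon"))
print(erase(forest, lake))
```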
Split - Split divides the input layer into two or more layers (Fig. 12.11). A split layer, which
shows the area subunits, is used as the template for dividing the input layer. For example, a
national forest can split a stand layer by district so that each district office can have its own
layer.
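The stand-by-district example can be sketched in Python. As a simplifying assumption, polygons are stored as sets of unit grid cells, and the stand and district names are hypothetical; the point is only that one output layer is produced per subunit of the split layer.

```python
def split(input_layer, split_layer):
    """Divide the input layer into one output layer per polygon of the split layer.

    Both layers are {feature_id: set of (col, row) grid cells}.
    Returns {zone_id: {feature_id: cells of that feature inside the zone}}.
    """
    result = {}
    for zone, zone_cells in split_layer.items():
        part = {}
        for fid, cells in input_layer.items():
            inside = cells & zone_cells
            if inside:
                part[fid] = inside
        result[zone] = part
    return result

# Hypothetical forest stands and two districts used as the split layer
stands = {"st1": {(0, 0), (1, 0)}, "st2": {(1, 1), (2, 1)}}
districts = {
    "west": {(0, 0), (0, 1)},
    "east": {(1, 0), (1, 1), (2, 0), (2, 1)},
}

layers = split(stands, districts)
for district, layer in layers.items():
    print(district, layer)
```

Stand st1 straddles the district boundary, so it appears in both output layers, each holding only the part that lies inside that district.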
Source: https://pro.arcgis.com
Fig. 12.11
In ArcGIS, clip and split are also editing tools. These editing tools work with features rather
than layers. For example, the editing tool of Split splits a line at a specified location or a
polygon along a line sketch. The tool does not work with layers. It is therefore important that
we understand the function of a tool before using it.
12.4 SUMMARY
In this unit we have discussed maps and their types. We have seen that
maps are classified by function into physical and cultural maps. Physical
maps are prepared to show natural features such as relief, soil, rocks, vegetation and climate,
while cultural maps are drawn to represent man-made features such as canals, dams,
buildings, and rail and road networks.
Further, we learned about map manipulation and the tools used for it in GIS
software, such as dissolve, clip, append, select, eliminate, update, erase and split. All
these tools are used to manipulate maps. A theoretical understanding of these tools is
a prerequisite for carrying out map manipulations in order to extract specific
information from a map.
12.5 GLOSSARY
Append: A GIS operation that creates a new layer by merging two or more layers
together.
Union: A polygon-on-polygon overlay method that preserves all features from the
input layers.
Update: A GIS operation that replaces the input layer with the update layer and its
features.
12.7 REFERENCES
Chang, Kang-tsung (2009), Introduction to Geographic Information Systems, 5th edition,
McGraw-Hill.
ArcGIS Desktop Help 9.1,
http://webhelp.esri.com/arcgisdesktop/9.1/index.cfm?TopicName=welcome
https://www.mapsofindia.com/
https://geology.com/maps/types-of-maps/
https://www.nationalgeographic.com/science/article/151022-data-points-how-make-maps-influence-people
Lillesand, Thomas M., Ralph W. Kiefer, and Jonathan W. Chipman, 2004
13.1 OBJECTIVES
13.2 INTRODUCTION
13.3 VECTOR DATA FORMATS
13.4 SUMMARY
13.5 GLOSSARY
13.6 ANSWER TO CHECK YOUR PROGRESS
13.7 REFERENCES
13.8 TERMINAL QUESTIONS
13.1 OBJECTIVES
After reading this unit learner will be able to:
1. Describe vector GIS data models;
2. Discuss advantages and disadvantages of vector data models;
3. Explain topology, topological and non-topological data structures.
13.2 INTRODUCTION
The vector model uses discrete points, lines, and areas that correspond to discrete entities and
can be defined by coordinate geometry. Vectors are graphical objects built from geometrical
primitives such as points, lines, and polygons to represent geographical entities in computer
graphics. Vectors have a precise direction, length, and shape.
TINs are used to represent elevation or other continuously changing values. Mass-point is a
technique to represent surfaces using several points in a very dense manner. Contour is an
imaginary line of constant elevation on the ground surface. The corresponding line on a map is
called a contour line, a line on a map that joins places of the same elevation (height) above sea
level. Contour interval is the difference in elevation between two contour lines. Isoline is a line
on a surface, connecting points of equal value such as temperature, rainfall, etc. TINs record
values at point locations, which are connected by lines to form an irregular mesh of triangles.
The faces of the triangles represent the terrain surface. However, it should be borne in mind
that raster like continuity cannot be obtained by any of the aforementioned models.
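Since each face of a TIN is a triangle with a known elevation at every corner, the elevation at any point inside a face can be estimated from the three corner values. The Python sketch below uses linear (barycentric) interpolation on the plane of the triangle, which is one common choice and is assumed here; the function name and the sample triangle are hypothetical.

```python
def tin_elevation(p, a, b, c):
    """Elevation at p = (x, y) on the plane through triangle corners a, b, c.

    Each corner is (x, y, z). Returns None if p lies outside the triangle.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle (corners are collinear)")
    # Barycentric weights: how strongly each corner influences the point
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3

# Hypothetical TIN face: corner elevations of 100 m, 200 m and 300 m
a, b, c = (0, 0, 100), (10, 0, 200), (0, 10, 300)
print(tin_elevation((2, 2), a, b, c))    # close to 160
print(tin_elevation((10, 10), a, b, c))  # None: outside this face
```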
Areas are represented by a closed set of lines and are used to define features such as fields,
buildings, or administrative areas. These closed sets of lines are referred to as polygons or
regions. As with line features, some of these polygons exist (physically) on the ground, while
others are imaginary (abstract). Only the boundary points of a polygon need to be input; the
area, perimeter, and other geometric attributes can be computed by the GIS software rather than
entered manually. Regions are similar to polygons, but a region may contain a hole within an
area, or may consist of multiple polygons that are not adjacent. For example, private land lots
scattered within a national forest should be subtracted from the forest to get the exact coverage
and area of the forest. Another example is a district that has many small islands, which requires
all those island areas to be combined into a single object.
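How software can derive area and perimeter from the input points, and how holes and separate parts are handled in a region, can be sketched with the shoelace formula. This is a minimal sketch assuming planar coordinates; the representation of a region as a list of (outer ring, list of hole rings) and the sample coordinates are hypothetical.

```python
import math

def ring_area(ring):
    """Area enclosed by a ring of (x, y) vertices (shoelace formula)."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2

def ring_perimeter(ring):
    """Total length of the edges of a ring."""
    n = len(ring)
    return sum(
        math.hypot(ring[(i + 1) % n][0] - ring[i][0], ring[(i + 1) % n][1] - ring[i][1])
        for i in range(n)
    )

def region_area(parts):
    """Area of a region given as [(outer_ring, [hole_ring, ...]), ...]."""
    return sum(
        ring_area(outer) - sum(ring_area(hole) for hole in holes)
        for outer, holes in parts
    )

# Hypothetical forest: a 10 x 10 block with a 2 x 2 private lot inside it,
# plus a separate 3 x 3 block that is not adjacent to the first one.
main_block = [(0, 0), (10, 0), (10, 10), (0, 10)]
private_lot = [(4, 4), (6, 4), (6, 6), (4, 6)]
far_block = [(20, 0), (23, 0), (23, 3), (20, 3)]
forest = [(main_block, [private_lot]), (far_block, [])]

print(region_area(forest), ring_perimeter(main_block))
```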
Vector data representation using points, lines, and areas is not always straightforward because
it depends on map scale, functions we wish to perform in our later analysis, and occasionally,
on the criteria established by government mapping agencies. (Map scale is the ratio of the map
distance to the corresponding distance on the ground.)
It can be difficult for a GIS user to decide when a feature should be represented by a line.
Should a road be represented by a single line along its centre, or are two lines required, one
for each side of the road? GIS generally requires a single line along the centre, and not two
lines. A stream may be represented using lines near its headwaters but as an area along its lower
reaches. In this case, the width of the river and the scale of the map should be considered in
taking a decision. Government mapping agencies have some standards to make this task easier;
for example, a river less than 40 ft wide should be represented as a line on
1:24,000 scale maps.
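The reasoning behind such a standard follows directly from the definition of map scale. As a worked example in Python (the function names are illustrative), a 40 ft wide river drawn at 1:24,000 would be only about half a millimetre wide on the map, which is too narrow to draw as an area.

```python
FEET_TO_MM = 304.8   # 1 ft = 304.8 mm

def map_distance(ground_distance, scale_denominator):
    """Map distance for a ground distance at a scale of 1:scale_denominator."""
    return ground_distance / scale_denominator

def ground_distance(map_dist, scale_denominator):
    """Ground distance represented by a map distance at 1:scale_denominator."""
    return map_dist * scale_denominator

# Width on the map of a 40 ft wide river at 1:24,000, in millimetres
river_width_mm = map_distance(40 * FEET_TO_MM, 24_000)
print(round(river_width_mm, 3))

# The same river at a larger scale of 1:1,000
print(round(map_distance(40 * FEET_TO_MM, 1_000), 3))
```

At 1:1,000 the same river is a little over 12 mm wide on the map, so representing it as an area becomes practical.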
The things that are represented as lines (or polylines) may be easy to guess: roads, pipelines,
water lines, bus routes, and so on, whose basic shape is similar to a line or a combination of
lines. However, on a city map with a scale of approximately 1:25,000 or 1:10,000, we may
represent buildings, parks, bus termini, and so on as points. If we need a more detailed map,
for instance at a scale of 1:1000, the infrastructure listed above may be better
represented as polygons rather than as points. On a district map, cities need to be
represented as areas, but on a map of a large country such as India it is not possible to
represent cities as areas.
The simplest vector data model, which stores and organizes the data without establishing
relationships among the geographic features, is generally called the spaghetti model. In this
model, lines in the database overlap but do not intersect, just like spaghetti on a plate. The
polygon features are defined by lines that have no concept of start node, end node or
intersection node. The polygons are hatched or coloured manually to represent something. There
is no data attached to them and, therefore, no data analysis is possible in the spaghetti model.
starting and ending points. The vector file consists of a few long lines, many short lines, or
even a mix of the two. The files are generally written in binary or ASCII (American Standard
Code for Information Interchange) code, a set of codes used to represent alphanumeric
characters in computer data processing. A computer programmer therefore needs to
follow each line from one place to another in the file to enter the data into the system. Such
unstructured vector data are called cartographic spaghetti. Vector data in the spaghetti data
model may not be directly usable by a GIS. However, many systems still use this basic data
structure because of its standard format (e.g., a mapping agency's standard linear format).
To express the spatial relationships between features more accurately, the concept of topology
has evolved. Topology describes the spatial relationships of adjacency, connectivity and
containment between spatial features. Topological data are useful for detecting and correcting
digitizing errors, e.g., two streams that do not connect perfectly at an intersection point.
Topology is also necessary for carrying out some types of spatial analysis, such as network and
proximity analysis. Two data structures are commonly used in vector GIS data storage, viz.
topological and non-topological structures. Let us now discuss the two types of data
structure.
arc file. Node refers to the end points of the line segment. The arc holds information not
only about that particular arc but also about its neighbours in geographic space. It
includes the arc number of the next connecting arc and the polygon numbers, i.e. A: the
left polygon (PL) and B: the right polygon (PR). The arcs form areas or polygons, and
the polygon identifier number is the key for constructing a polygon. Two important
topological vector data structures are Topologically Integrated Geographic Encoding and
Referencing (TIGER) and the Coverage Data Structure.
i) Topologically Integrated Geographic Encoding and Referencing (TIGER):
It is an early application of topology in preparing geospatial data, created by the US
Bureau of the Census as an improvement on the Geographic Base File/Dual
Independent Map Encoding (GBF/DIME) data structure. This data structure or
format was used in the 2000 census by the US Bureau of the Census. In the TIGER
database, points are called 0-cells, lines 1-cells, and areas 2-cells. Each 1-cell
represents a directed line that starts at one point and ends at another point.
The line carries data for both of its sides. Each 2-cell and 0-cell shares the
information of the 1-cells associated with it. The main advantage of this data
structure is that the user can easily identify an address on either the right side or
the left side of a street or road.
ii) Coverage Data Structure: The coverage data structure was adopted by many GIS
companies, such as ESRI, in their software packages in the 1980s to separate GIS from
CAD (Computer Aided Design). A coverage data structure is a topology based
vector data structure that can be a point, line or polygon coverage. A point is a
simple spatial entity which can be represented with topology. The point
coverage data structure contains feature identification numbers (ID) and pairs of
x, y coordinates, as for example A (2, 4). The starting point of an arc is called the
from node (F-Node) and the point where it ends the to node (T-Node). The arc-node list
represents the x, y coordinates of the nodes and the other points (vertices) that
generate each arc. For example, arc C consists of three line segments comprising
F-Node at (7, 2), the T-Node at (2, 6) and vertex at (5, 2). Figure below shows
the relationship between polygons and arcs (polygon/arc list), arcs and their left
and right polygons (left poly/right poly list), and the nodes and vertices (arc-
coordinate list). Polygon 'a' is created with arcs A, B, G, H and I. Polygon 'c',
surrounded by polygon 'a', is an isolated polygon and consists of only one arc,
i.e. 8. 'o' is the universal polygon, which covers the area outside the map. Arc A is a
directed line from node 1 to node 2 and has polygon 'o' as the polygon on the
left and polygon 'a' as the right polygon. The common boundary between two
polygons (o and a) is stored in the arc coordinate list only once, and is not
duplicated (Chang, 2010).
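The lists of a polygon coverage can be sketched as simple Python tables. Only arc A (node 1 to node 2, left polygon 'o', right polygon 'a') and the coordinates of arc C are taken from the description above; arcs B and D, their node numbers and their polygons are hypothetical entries added so that the queries have something to work on.

```python
from collections import namedtuple

# One row of the arc-node list combined with the left poly/right poly list
Arc = namedtuple("Arc", "fnode tnode left right")

arcs = {
    "A": Arc(1, 2, "o", "a"),   # from the text: node 1 -> node 2, 'o' on the left, 'a' on the right
    "B": Arc(2, 3, "o", "a"),   # hypothetical
    "D": Arc(3, 1, "b", "a"),   # hypothetical
}

# Arc-coordinate list: F-node, vertices, T-node (arc C as given in the text)
arc_coords = {"C": [(7, 2), (5, 2), (2, 6)]}

def arcs_of(arcs, polygon):
    """Polygon/arc list: arcs that form the boundary of a polygon."""
    return sorted(name for name, arc in arcs.items() if polygon in (arc.left, arc.right))

def neighbours(arcs, polygon):
    """Adjacency: polygons on the other side of the arcs bounding `polygon`."""
    found = set()
    for arc in arcs.values():
        if arc.left == polygon:
            found.add(arc.right)
        elif arc.right == polygon:
            found.add(arc.left)
    return found

def arcs_at_node(arcs, node):
    """Connectivity: arcs that start or end at a node."""
    return sorted(name for name, arc in arcs.items() if node in (arc.fnode, arc.tnode))

print(arcs_of(arcs, "a"), sorted(neighbours(arcs, "a")), arcs_at_node(arcs, 2))
```

Because each shared boundary is stored once as an arc with a left and a right polygon, adjacency and connectivity can be answered from the tables alone, without comparing coordinates.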
Referencing | Many
VPF | Vector Product Format | Military mapping systems
SHP | ArcView Shape | ArcView
a raster file. It takes no more effort to scan a map of a dense urban area than to scan a sparse
rural one. On the other hand, a vector file requires careful measuring and recording of each
point, so an urban map is much more time-consuming to draw than a rural map. Unlike raster,
the process of making vector maps is not fully automated, and thus the cost increases with map
complexity.
Raster data can be compressed more easily than vector data because it is often more repetitive
and predictable. Many raster formats, such as TIFF, have compression options that drastically
reduce image sizes, depending upon image complexity and variability. Raster data are most
often used for digital representations of aerial photographs, satellite images, scanned paper
maps, and other applications with very detailed images. Raster data are used when costs have to
be reduced, when the map does not require analysis of individual map features, or when
'backdrop' maps are required.
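Why repetitive raster data compress well can be seen with run-length encoding, one simple scheme in which each run of identical cell values is stored as a (value, count) pair. The sketch below is illustrative only and is not the method used by any particular file format; the sample row is hypothetical.

```python
def rle_encode(row):
    """Run-length encode one raster row into (value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([value, 1])    # start a new run
    return [tuple(run) for run in runs]

def rle_decode(runs):
    """Rebuild the original row from (value, run length) pairs."""
    return [value for value, count in runs for _ in range(count)]

# Hypothetical raster row: 0 = land, 1 = water
row = [0, 0, 0, 1, 1, 0, 0, 0]
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

The more uniform a row is, the fewer pairs are needed; a row in which every cell differs from its neighbour gains nothing from this scheme.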
In contrast, vector data are appropriate for highly precise applications, when file sizes are
important, when individual map features require analysis, and when descriptive information
(attribute) must be stored.
Additional non-spatial (attribute) data can also be stored besides the spatial data represented by
the coordinates of the vector geometry or the position of a raster cell. In the vector data, the
additional data are attributes of the object. For example, a forest polygon may also have an
identifier value and information about tree species. In raster data, the cell value can store
attribute information, which can also be used as an identifier that can relate to records in
another table, but it maintains a complex structure and has several limitations.
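The two ways of attaching attributes can be sketched in Python. The vector feature carries its attributes directly, while each raster cell stores only a value that acts as a key into a separate table. All names and values below (stand identifier, species, cover classes) are hypothetical.

```python
from collections import Counter

# Vector: the attributes belong to the object itself
forest_polygon = {
    "id": 17,
    "species": "pine",
    "boundary": [(0, 0), (4, 0), (4, 3), (0, 3)],
}

# Raster: each cell holds a value that identifies a record in another table
grid = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 2],
]
value_table = {
    1: {"cover": "forest", "species": "pine"},
    2: {"cover": "water", "species": None},
    3: {"cover": "forest", "species": "oak"},
}

def cell_attributes(grid, table, row, col):
    """Look up the attribute record for one raster cell."""
    return table[grid[row][col]]

def cells_by(grid, table, field):
    """Count cells per attribute value, e.g. per cover class."""
    return Counter(table[value][field] for line in grid for value in line)

print(forest_polygon["species"])
print(cell_attributes(grid, value_table, 1, 1))
print(cells_by(grid, value_table, "cover"))
```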
Raster and vector maps can also be combined visually. For example, a vector street map could
be overlaid on a raster aerial photograph. The vector map provides discrete information about
individual street segments; the raster image provides a backdrop of the surrounding
environment. Table 13.2 summarizes the advantages and disadvantages of raster and vector.
Table 13.2: Advantages and disadvantages of Raster and Vector
Raster model | Vector model
Advantages | Advantages
Simple data structure | Smaller file size
Easy and efficient overlaying | Individual identity for discrete objects like line, polygon, etc.
Compatible with remote sensing imagery | Efficient for topological relationship
High spatial variability is efficiently represented | Efficient projection transformation
Efficient to represent continuous data | Accurate map output
Easy to edit
Disadvantages | Disadvantages
Larger file size | Complex data structure
All the objects are series of pixels, no identity for discrete objects other than points/pixels | Difficult overlay operations
Difficult to build topological relationship | High spatial variability is inefficiently represented
Inefficient projection transformations | Not compatible with remote sensing imagery
Loss of information when using large cells | Not appropriate to represent continuous data
Difficult to edit
13.4 SUMMARY
You have learnt the following in this unit:
Real world features such as temples, parks, roads, railways, crop land, and forest land
are represented as point, line/polyline and polygon. Spatial information of features or
objects can be stored in a GIS using vector or raster models. Spatial database of real
world features need to be translated into simplified representations which can be stored
and updated in a system.
Two data models currently dominate commercial GIS software: the vector data model,
which is used to represent discrete features, and the raster data model, which is most
often used to represent continuously varying phenomena.
The main advantage of the vector model is easy access and complex analysis, while the raster
model is useful for overlaying and spatial analysis.
The raster data structure represents the information in the form of grid cells or pixels
which stands for picture element. Important raster data structures viz. cell-by-cell
encoding, run length encoding, and quadtree give an idea to store the raster data
information.
Under the vector model, data structures are either topological (e.g. TIGER and
coverage) or non-topological.
Database management system organizes the spatial data in a systematic pattern.
13.5 GLOSSARY
1. Data: Data are units of information, often numeric, that are collected through
observation. In a more technical sense, data are a set of values of qualitative or
quantitative variables about one or more persons or objects, while a datum is a single
value of a single variable.
2. Vector: Vector is a data structure used to store spatial data. Vector data comprise
lines or arcs, defined by beginning and end points, which meet at nodes.
A vector based GIS is defined by the vectorial representation of its geographic data.
3. Point: A point feature is a GIS object that stores its geographic representation, an X and
Y coordinate pair, as one of its properties (or fields) in its row in the database.
Some point features, such as airplane locations, also need a z-value, or height,
to be located correctly in 3D space.
4. Line: A line is one of the three feature types with which most vector data are represented
in GIS maps. The others are point and polygon. Lines are used to represent the shape
and location of geographic objects, such as street centerlines and streams, that are too
narrow to depict as areas. A line is formed by connecting two or more data points.
5. Polygon: A polygon feature is a GIS object that stores its geographic representation, a
series of x and y coordinate pairs that enclose an area, as one of its properties (or
fields) in its row in the database.
13.7 REFERENCES
1. Burrough, P. A. and McDonnell, R. A., (1998), Principles of Geographical Information
Systems, Oxford University Press, New York.
2. Chang, K.-t., (2010), Introduction to Geographic Information Systems, Tata McGraw-
Hill, New Delhi.
3. Lo, C. P. and Yeung, K. W., (2009), Concepts and Techniques of Geographic
Information Systems, PHI Learning Pvt. Ltd, New Delhi.
4. Longley, P. A., Goodchild, M. F., Maguire, D. J., and Rhind, D. W., (2005),
Geographic Information Systems and Science, John Wiley and Sons, West Sussex.
5. Rolf, A. D. B., (ed.). (2001), Principles of Geographical Information Systems: An
Introductory Text Book, ITC, The Netherlands.