Gis Desde R

ONLINE COURSE

Introduction to GIS
Manipulating and Mapping
Geospatial Data in R
Feedback, questions, comments or requests?

Email us at [email protected]
This lesson was written by Sean Angiolillo and was last updated on 29 Jan. 2019.
When you hear "geospatial data", what comes to your mind? For many people, it's
ordinary maps, either of physical or human geography. Maps are one important
output of geospatial data, but they can be used for so much more. In fact, geospatial
data is an overlooked and underappreciated aspect of today's "big data" revolution.
Geospatial Data Is Everywhere

Many of today's "big data" innovations and ideas are rooted in geospatial data
sources. Most people always carry at least one device, such as a smartphone, that
tracks their location — the result is geospatial data. Entire industries like the Internet
of Things (IoT), drones, and autonomous cars rely on this sort of reliable, real-time
geospatial data. Similarly, entire concepts like "smart cities" are built around the idea
of using geospatial data.
The growth of geospatial data is fairly recent, but its effect can already be seen in
many of the products we use on a daily basis. Google search results are more local,
Ubers arrive faster, and we can even track the exact location of our Grubhub or
Zomato order for free. All of these products and services use geospatial data to tailor
the product experience to each user.
Before we dive into working with geospatial data in R, let’s talk about why you should
even be interested in learning about geospatial data by highlighting a quick overview
of both business and public use cases.
1
Introduction to GIS: Manipulating and Mapping Geospatial Data in R
LEARN MORE:
The greater accessibility of geospatial data has helped spur a wide variety of
use cases. One very enthusiastic website has a list of over 1,000 use cases
of geospatial data, including fields like retail, health care, transportation, and
governance. ESRI also maintains a catalogue of interesting case studies
using geospatial data.
How Businesses Use Geospatial Data

If you're familiar with how businesses use data to make better-informed decisions,
it's not a far leap to imagine how geospatial data fits into this picture.
Perhaps the most obvious place to start when discussing how businesses can use
geospatial data is improved transportation and logistics planning. Companies can
use real-time geospatial data to optimize their supply chains and thereby reduce
costs. In addition, geospatial data allows businesses to not just react to weather and
climate patterns in real time, but also make predictions. For instance, analyzing
historical weather trends can help a retailer better anticipate customer demand and
adjust supply accordingly.
In another sense, geospatial data can significantly improve market segmentation

efforts. All businesses care about refining their sales and marketing to more
effectively reach customers. For example, many businesses rely on predictive
models that try to identify likely customers based on factors such as purchase history
or demographic information. The ability to add location to these models can
significantly increase returns by increasing customer retention, reducing churn, and
finding new customers.
Next, geospatial data can play a key role in a business’ risk analysis efforts. The
idea of modeling risk to mitigate potential exposure is important in many industries,
and geospatial data fits well into such models. For example, it can be used to more
accurately assess which properties may be at greater risk due to environmental
damage and extreme weather events, such as flooding.
In a similar vein to risk analysis is fraud detection and prevention. Fraud detection,
particularly in industries like credit cards, is one area where machine learning models
made an early contribution for their ability to sift through huge mounds of data and
2
flag irregular or suspicious transactions. Incorporating geospatial data into these

kinds of models can help further identify the signal in the noise.
One example of geospatial data is this road data for Pune, a city in India, color-coded to show which
roads are well-lit at night. (Source: Atlan.)
Lastly, geospatial data can play a role in almost any kind of optimization exercise.
Take the example of identifying new locations for a business. Whether determining
where to open the next branch of a chain restaurant or Amazon HQ2, it would be
foolish to ignore geospatial data. It allows companies to spatially visualize
simultaneous layers of analysis, such as target market size, number of competitors,
public amenities and infrastructure, and environmental risk factors.
LEARN MORE:
Many people have explained how businesses are incorporating geospatial

data into analytics solutions, but Fern Halper’s TDWI bulletin on use cases of
geospatial analytics is an especially useful place to begin.
How the Public Sector Uses Geospatial Data

Many of the business use cases have a similar corollary in the public or nonprofit
domain.
3
While a business may use geospatial data for logistics planning, a public agency can
use geospatial data for better urban and rural planning to improve land use,
environmental protection, pollution levels, and liveability. If a business can optimize
its fleet of delivery trucks, a public agency can optimize emergency response teams
and evacuation routes after a natural disaster. Even basic public functions like trash
collection or the rates charged for parking spaces can be optimized based on some
level of geospatial data.
Businesses may use geospatial data for market segmentation, but public agencies
can use the same principles to better target constituents. For example, public
agencies can work with geospatial data to better identify at-risk populations for
mosquito-borne diseases based on proximity to areas with poor water drainage.
Public health missions can then more accurately target their outreach to these
populations. Alternatively, much like a business targeting customers, political parties
have used geospatial data to target potential voters.
An example of geospatial data is this Damage Proxy Map (DPM) of areas in Hokkaido, Japan, that
were likely damaged by the M6.6 earthquake on September 5, 2018. (Source: NASA's Jet Propulsion
Laboratory.)
Like a business, a public agency may want to assess risk to the public and
infrastructure. A new use case, known as predictive policing, is under trial in certain
parts of the world. The idea is for police agencies to assess the risk of crime in a
given area through geospatial and temporal analysis of past crimes, and thereby
4
adjust how many police are on call and where they are stationed. While this raises a
number of important ethical concerns, it highlights the far-reaching implications of
geospatial data.
A business might use geospatial data in its fraud detection efforts, but a public
agency might use geospatial data to track illegal construction. The Delhi
Government in fact had a project to achieve just this by creating what is called the
Delhi State Spatial Data Infrastructure (DSSDI).
If businesses can use geospatial data to find strategic locations, public agencies can
do the same to determine the location of important public infrastructure like new
roads, hospitals, schools and voting booths. We even used geospatial data in our
effort to locate where to position 10,000 new LPG (liquefied petroleum gas)
distribution centers across India.
One last use case in the public sector is the idea of investing in geospatial data as
a public good. For example, one important form of geospatial data is weather data.
More accurate, localized, and accessible weather reports can have a large impact on
the agricultural sector. By publishing digitized and spatially-mapped land records,
governments can mitigate land acquisition disputes. Geospatial data on
environmental metrics like air quality or carbon emissions is another example of a
public good. In the case of India, major government initiatives like Digital India and
Smart Cities rely on these geospatial technologies.
LEARN MORE:
One useful resource on how GIS technology can improve city services is this
McKinsey brief. Or for analysis focused on the Indian context, this FICCI
report highlights 40 cases on geospatial technologies in different sectors.
Final Thoughts
If this brief overview of geospatial data use cases has piqued your interest, be sure
to keep reading. Starting in the next lesson, you'll quickly get your hands dirty
working directly with geospatial data in R.
5
The previous lesson looked at the many growing use cases for geospatial data. Now
we'll get started manipulating geospatial data in R using the sfpackage. You'll learn
how to import spatial data, combine attribute data into existing geospatial objects,
calculate area, and simplify spatial dataframes before plotting them (which is the
subject of the next lesson).
Using R as a GIS
Until recently, serious work with geospatial data required an often-proprietary
desktop GIS (Geographic Information System), such as ArcGIS. Now, however, GIS
capabilities in R have greatly advanced.
In many ways, the benefits of using R over a desktop GIS are similar to the benefits
of using R over Excel for data analysis.
● R is free and open source, which makes it easier and cheaper to get started,
compared to the expensive licenses needed for many desktop GIS. This has
helped spread geospatial data analysis beyond the domain of only GIS
specialists, thereby opening opportunity to a wider ecosystem of contributors.
● As a full-fledged programming language, the command line interface of R has

greater flexibility than the point-and-click interface of a desktop GIS. This
means there's no restriction on what's ultimately achievable.
● R is reproducible in a way that point-and-click interfaces are not. This is, of

course, key to the scientific process.
6
● The greater shareability of R scripts and packages encourages a faster

development cycle and a more collaborative workflow.
Many of these advantages apply to any open-source programming language, such

as Python, which shares an API with many desktop GIS. However, R, designed as
an environment for statistical computing, is particularly well-suited for spatial
statistics and offers unmatched access to a huge ecosystem of statistical libraries.
R’s data visualization libraries in particular are a key advantage when it comes to
mapping geospatial data.
LEARN MORE:
See "Why Geocomputation with R?" for a greater discussion of the merits of
R for GIS.
Getting Started with R as a GIS

Before working with a new domain area in R, it's useful to check its associated
CRAN Task View for an overview of relevant packages. The CRAN Task View for
Analysis of Spatial Data lists dozens of packages relating to geospatial data. Until
recently, the first package mentioned was s p
, but now it first shows the s
fpackage
and notes that maintenance of s pwill continue.
What package should you use for geospatial data? The answer could differ based on
your objectives, but the sfpackage is likely the best place to get started. The sf
(Simple Features) package provides a class system for geographic vector data. It is
the successor to the s ppackage and is quickly being adopted by many other
packages for geospatial data.
Let's start exploring some of the package's features with simple state-level Indian
population and economic data.
LEARN MORE:
Never used the sfpackage? Get started with the recently completed open
source book under development, Geocomputation with R by Robin Lovelace,
7
Jakub Nowosad and Jannes Muenchow. In particular, Chapter 2 gives a

great introduction to what simple features are and the structure of sfobjects.
Spatial data enthusiasts are also excited about the announcement that sf
package authors Edzer Pebesma and Roger Bivand are currently working on
an open source book of their own, Spatial Data Science. Drafts of the first
eight chapters of what is sure to be a key resource for the field are now
available.
Creating sf Objects
For tidyverse users, one of the most exciting aspects of the sfpackage is the ability
to work with geospatial data in a tidy workflow. Unlike its predecessor package sp ,
with the s
fpackage, geospatial and attribute data can be stored together in a
spatial dataframe, where the object’s geometry occupies a special list-column. In
addition to being faster, this lets you manipulate an s
fobject via m
agrittrpipes like
an ordinary dataframe, or at least one with a few special characteristics.
It's certainly possible to create your own sfobjects with functions from the package
like st_point() , st_linestring() , and st_polygon() . But in most cases we only
have to read in existing spatial data. That is generally done with the st_read()
function.
The shapefiles used in this demonstration can be found in this blog’s associated
GitHub repository.
8
NOTE:
All sffunctions begin with st_*to help users identify them.
In general, you can find administrative boundary data from GADM. It maintains
open-source, current administrative boundary data for most countries. It's possible to
download spatial data directly from the GADM website, but using the GADMTools
package helps ensure your workflow is reproducible. Specifying l
evel = 1returns
state-level boundaries.
LEARN MORE:
Robin Wilson’s website provides a great list of free GIS data sources,
covering both physical and human geography.
The code below will download shapefiles for Indian states. However, since Kashmir
isn't included in India’s borders, we'll use the shapefiles in the repository.
9
Inspecting Objects
Before working with this sfobject, let's briefly compare the sfand sppackages.
First, we'll convert the sfobject to a SpatialPolygonsDataFrame

, an S4 class
defined by the sppackage. We can do this with sf::as() .
Now we can briefly inspect the structure of a SpatialPolygonsDataFrame

.
We notice it has 5 "slots" (each prefaced by the @symbol). The first slot should look
familiar. The data slot holds a dataframe with 36 observations of 8 variables. We can
extract any of these slots using the @
symbol like we'd normally do with the $ symbol.
10
The following slots hold polygons , plotOrder , bboxand proj4string. We'll return
to these in the context of sfobjects. For now, just note that the data and aspects of
the object’s geometry are held in separate slots. This is not compatible with the
tidyverse-style workflow to which many of us have grown accustomed.
While we could continue working with this format, let’s convert it back to an sfobject
with the st_as_sf()function and inspect the difference.
The former SpatialPolygonsDataFrame , a class defined by the sppackage, now

has two simultaneous classes: sfand data.frame . Printing the first few
observations tells us a lot about the object.
11
● It has 36 features (depending on how many we print) and 8 fields (our

attributes).
● The geometry typeis a multipolygon because the geometries represent the

shapes of various areas. Other common geometry types include points, lines,
and their “multi-” counterparts.
● bboxgives the object’s bounding box dimensions.
● epsgand proj4stringdescribe the coordinate reference system (CRS).

Note that this is a geographic CRS (measured in longitude and latitude) as
opposed to a projected CRS.
LEARN MORE:
For more information on coordinate reference systems, see Section 2.4 of

Geocomputation with R.
12
We could directly access information about the object’s spatial features with
functions like st_geometry_type() , st_dimension() , st_bbox()and st_crs().
Moreover, familiar functions like g
limpse()or V iew()that we'd use to explore a
dataframe also work on s fobjects.
If you further inspect this object, you can see that it has a few attribute columns
giving an abbreviation and bounding box for each state. Most importantly, the last
column holds each state’s geometry in a list-column.
Manipulating sf Objects
Because spatial dataframes in the sfpackage are dataframes, we can manipulate
them using our normal data manipulation tools, such as dplyr
. Here we'll just select
and rename the columns we want.
13
NOTE:
This doesn't explicitly select the geometry column, but the geometry in sf
objects is sticky. It remains in the object unless explicitly dropped with
ind_sf%>% st_set_geometry(NULL) .
For those already familiar with the tidyverse, using normal dplyrverbs to manipulate
sfobjects is one of the great benefits of using the sfpackage. If we were working
with the slots of the earlier S
patialPolygonsDataFrame , this wouldn't be possible.
Moreover, note that these manipulations haven't affected the class of our object in
any way.
Preparing Attribute Data

Now that we have a spatial dataframe, we need to prepare the associated attribute
data for each state or union territory.
Since we're focusing on the process rather than the data itself, we'll use population,
economic, and region data from Wikipedia. If you are unfamiliar with data import
using the g
ooglesheetspackage, web scraping with r vest
, and wrangling with
14
dplyr
, please refer to the prepare_data.Rscript in this GitHub repository to see
how the attributes.rdsdata set was assembled.
LEARN MORE:
Unfamiliar with these packages? Check out the googlesheetspackage

vignette and the Data Transformation chapter of Hadley Wickham’s R for
Data Science.
Once we've prepared an attributes dataframe, we can join it to the spatial dataframe
as with any two dataframes, as well as mutate two new variables.
LEARN MORE:
See the Relational Data chapter of R for Data Science if left_join()is

unfamiliar.
If we inspect this object once more, we can see that it has all of the expected
attribute columns, and the last column holds each state’s geometry in a list. The
same spatial attributes regarding the object’s bounding box and CRS remain as well.
15
Calculating Area
Our attribute data already has a column for area, pulled from Wikipedia. However, if
this wasn't the case or we didn't trust the data, we could calculate the area of each
observation in our spatial dataframe using the s t_area()function.
It's simple enough to do this, but we need to be careful with the units. In this case,
we need to convert from square meters to square kilometers.
16
LEARN MORE:
See this guide on units of measurement for R vectors for more information
on unit conversion in R.
In the output below, note the difference between the simple numeric class of
area_km2and the class of "units" for the area calculation resulting from s
t_area()
.
Moreover, we can see that the two figures are close but not exactly the same.
17
Simplifying Geometry
Before plotting sfobjects (the subject of the next lesson), we should simplify the
polygons in the spatial dataframe.
For simple maps, there's no need to have the fine level of detail that comes with the
GADM data or many other sources of geospatial data. Simplification can vastly
reduce memory requirements while sacrificing very little in terms of visual output.
Fortunately, there's an easy process to reduce the number of vertices in a polygon
while retaining the same visible shape.
One option is sf::st_simplify(), but here we'll use the ms_simplify()function

from the rmapshaperpackage. Below we keep only 1% of the object’s vertices while
maintaining the same number of shapes.
NOTE:
Another useful function is sf::st_geometry() . When passing it an sf

object, it will return just the geometry. This allows us to create a quick plot of
only the geometry to check if everything looks right.
We also stripped the units class for the area we calculated because it
created a problem for ms_simplify() . We can always add it after
simplification.
18
The original map is above on the left, and the simplified version is on the right. The
simplified version looks no different despite having only 1% of the vertices. In fact, it
looks even better because the border lines are cleaner.
Moreover, simplification reduced the geometry size from 9.56 MB to just 150 KB.
19
Finally, let’s save the simplified spatial dataframe for the next lesson.
LEARN MORE:
For more information on simplification, see Section 5.2.1 of Geocomputation

with R.
Final Thoughts
After briefly introducing the context of using R as a GIS, this lesson showed how the
sfpackage creates a class structure for storing geospatial and attribute data
together in an object that fits into a tidyverse workflow. We can clearly see the
benefits of this structure when it comes to manipulating a spatial dataframe with our
familiar d
plyrverbs.
Now that you know how to manipulate geospatial data, the natural next step is
visualization or mapping. Here again, we'll see the benefit of a tidy workflow, now
that g
gplot2’s g
eom_sf()is available to us. Though, as you'll see in the next
lesson, g gplot2is just one of many excellent package options when it comes to
visualizing geospatial data in R.
20
In the previous lesson, we briefly explored the structure of spatial dataframes as

defined by the sfpackage. The next step is visualization, or more specifically in the
case of geospatial data, mapping. As with any kind of data, visualization is an
important step before diving into any kind of statistical analysis.
This lesson introduces how to use some of the most well-known R packages to
create static maps, such as tmapand ggplot2 . We’ll also explore a few other
packages like c artogram ,g
eogridand g eofacetfor some more unique spatial
visualizations. (We'll tackle creating animated and interactive maps in the next
lesson.)
Resources on Visualizing Geospatial Data

Before diving into different R packages for mapping, let's review a few excellent
resources that will help you get started.
Below are two excellent open source resources on the principles of data
visualization. Both include chapters on geospatial data visualization.
● Data Visualization: A practical introduction by Kieran Healy. (This even

includes a dedicated chapter on maps.)
● Fundamentals of Data Visualization by Claus O. Wilke
21
Here are two resources focused more narrowly on the mechanics of mapping
specifically in R, rather than larger principles of good design:
● "Making Maps with R" chapter of the previously-mentioned Geocomputation

with R
● Bhaskar V. Karambelkar’s tutorial at useR 2017 on "Geospatial Data

Visualization in R"
Choosing the Right Visualization

Recent advances in software have made many different types of geospatial data
visualizations — such as choropleths, dot density maps and cartograms — easily
available. However, the correct visualization often begins with the type of data you
have.
Before choosing a visualization, we should be sure about the nature of our data. Is
the data numeric? And if so, is it a raw count, such as population, or is it
standardized, such as population density? If the data isn't numeric, is it nominal (or
categorical), such as linguistic or religion data, or ordinal, such as satisfaction
rankings?
One point Healy makes clear in his book is that it's important to consider whether or
not a truly geospatial visualization is the best choice for your data. In our case, and
in many cases concerning choropleths, the data is only partly geospatial — it really
represents counts of some value in an arbitrary unit.
The spatial object we created in the previous lesson has attributes like population,
GDP, and sex ratio. It is certainly possible to visualize this data through barplots,
ignoring the data’s geospatial qualities. Alternatively, instead of ignoring the
geospatial elements, we could show some of this information through a proxy — for
example, mapping different colors to a variable like region.
In fact, focusing on geospatial elements can sometimes misrepresent the data

because of vastly unequal areas between different regions and the populations they
hold — as is the case for Indian states. If we were working with district-level (as
opposed to state-level) data, then a choropleth might be necessary because we can't
show hundreds of bars on a single barplot.
Nevertheless, because of its simplicity, we'll use state-level data to test out different
approaches for geospatial visualization. With this goal in mind, hopefully none of the
22
visualizations below are "bad", but whether or not they are the "best" visualization for
this data would depend on the specific objectives at hand.
Static Maps
Although a number of R packages have made it easy to make attractive interactive
and animated maps, they haven't removed the need for effective static maps. This
section looks at how to create static maps in base R, t
map
, and g
gplot2
.
NOTE:
While tmapand ggplot2are two of the most popular packages for creating
maps, they aren't the only options. The cartography package is another
interesting tool, particularly for certain kinds of maps, such as choropleths
contained in proportional symbols. See the package vignette and cheat
sheet to get started.
Base Plotting
As demonstrated in the previous lesson's plots of geometry, the sfpackage provides
ap lot()method for visualizing geographic data. Plotting the object itself will
produce a grid of faceted plots, one for each attribute. Choosing a variable produces
a single map.
This plot demonstrates how quick and easy plotting base maps can be, but there are
reasons why this default choropleth may not be an effective visualization. Hopefully,
by the end of this lesson, it will be clear why and what other types of visualization
may be more effective in this case.
23
Thematic Maps (tmap)

The tmappackage from Martijn Tennekes has been a standard-bearer for mapping
in R for some time. As the name "thematic" suggests, it's especially well-suited for
choropleths, but it can produce a wide range of geospatial visualizations.
It brings a ggplot2
-style syntax tailored to geospatial data. Like ggplot2, it
emphasizes sequentially adding layers to a plot. You can pass a spatial dataframe to
the t m_shape()function much like you'd pass a dataframe to the g gplot()function.
Moreover, because spatial dataframes in the sfpackage are also dataframes, you
can filter out any particular features (like "Andaman & Nicobar Islands" below) and
directly proceed with piping the object into a t m_shape()chain.
LEARN MORE:
Guides to tweaking other aspects of the map can be found in the

documentation, which includes a number of tutorials and vignettes.
24
After filtering out union territories, the choropleth below maps India’s GDP density, a
measure of economic activity by area. Measured here in units of nominal GDP per
square kilometer, GDP density has no clear midpoint, and so it requires a sequential
color scale as opposed to a diverging or categorical color scale.
25
LEARN MORE:
There is a great deal of theory and advice about using color in data
visualization, including maps. Wilke’s book in particular has excellent
chapters on color scales and color pitfalls.
Arranging tmap Objects
tmapalso has a helpful function, tmap_arrange()

, for lining up multiple tmap
objects next to each other.
For example, if we filter out the small union territories to get a fairer distribution, we
can separately create tmapobjects of population growth and density. Then we can
arrange them next to each other for comparison.
Like GDP density, values like population growth and population density are
standardized data, and so they are well-suited for a choropleth.
26
LEARN MORE:
Axis Maps’ cartography guide directly addresses the question of

standardizing data with respect to geospatial data.
Inset Maps
tmapis also particularly useful for creating inset maps, those that include a small
window providing the wider geographic context for the main map.
The first step is creating a base or primary map. We have done this for sex ratio in
Northeast India.
27
NOTE:
One trick from tmap_tricks()is reversing the color scale by placing a -in
front of the palette name. This makes sense because our concern should
increase as sex ratio decreases.
Next, we created the smaller inset map, which will provide the wider geographic
context. For the small map, we wanted to highlight the Northeast region on the larger
map of India.
In order to do this, we first grouped the features by region. Using the same dplyr
syntax, we can reduce the 36 features to 8 regions.
These 8 regional features have a geometry reflecting the "sum" of their individual
sub-components. It is quite interesting that this kind of geometric operation can be
done so easily by using s
t_unify()behind the scenes.
28
Once we have a base map and an inset map, we can combine them with the
following syntax, using some trial and error to get the placement right.
Faceted Maps
tmapalso supports the creation of faceted maps, or small multiples. They can be
useful for attributes with a fairly small number of levels. For instance, if we have
population data for a few years, we could show a progression over time. In this case,
region is a useful variable for faceting. Splitting up a map by region can sometimes
highlight contrasts better than looking at one image of the entire area.
29
The free.coordsargument controls whether to show only the faceted map area or
instead highlight the facet’s place in the original map.
There is no inherent order to regions, but it's useful to impose one. Below we've
ordered the facets in a roughly counter-clockwise order starting from "Northern". To
do this, it helps to first make r
egionan ordered factor.
It's also important to pay attention to the nature of the distribution before making any
plot. In India, per capita GDP is highly skewed because of outliers like Goa and
Delhi. If you map this data on a linear scale, most states may end up the same color.
This will conceal important differences in the bulk of the data.
If you have highly skewed data, it may be helpful to perform a logarithmic

transformation. This should add greater color differentiation in the map, though the
legend may require more careful interpretation because the color bins aren't of equal
width.
LEARN MORE:
For more information about statistical transformations in the context of data

visualization, see Section 8.2 of Wilke's book.
30
In the example above, a log10()transformation has hopefully achieved a greater

level of differentiation. You can, for instance, differentiate between the small bright
yellow dots of Goa and Delhi, the blue of Uttar Pradesh and Jharkhand, the
turquoise of Central India and Rajasthan, and the green of South India.
31
NOTE:
The free.coords()argument of tm_facets()is set to TRUE . If it was

instead set to FALSE
, the entire map of India would appear in each facet with
the given region highlighted.
Proportional Symbols Maps
So far, all of our maps have been choropleths. This was convenient because our
data was always standardized in some way — a density, percentage or ratio for
example. Choropleths, however, are poorly suited to raw count data. When dealing
with count data, such as population, a proportional symbols map can be more
effective.
Luckily, tmapis also well-suited to these types of visualizations. Here, a symbol

(typically a circle) is drawn in proportion to the depicted variable on top of the original
geography. We can use this kind of map to visualize both population and nominal
GDP data.
32
The proportional symbols map retains the original geography, only obscured by the
top layer of symbols. The symbols retain the correct spatial arrangement and are
easy to interpret in relationship to each other. Still, judging the area of circles is more
difficult than compared to a non-spatial representation, such as a barplot.
geom_sf in ggplot2
Hopefully these examples have demonstrated that tmapis a robust mapping tool. At
the same time, the addition of geom_sfhas made ggplot2another attractive option.
ggplot2requires tidy data. Since spatial dataframes defined in the sfpackage are
dataframes, it makes sense that we could expect to use ggplot2to visualize sf
33
objects. Recently, ggplot2added support for sfobjects with geom_sf() . The key
advantage of geom_sf()is that tidyverse users are already familiar with ggplot2
and its wider ecosystem of add-on packages.
However, as this is a recent addition, you might expect a few bugs. For example, in
the above faceted tmapobject, setting free.coords = FALSEallowed for the entire
object to be plotted in each facet. At this time, faceting an s
fgeom doesn't seem to
allow setting s
cales = “free”to allow a similar outcome.
Nevertheless, there are many benefits and cases where we can visualize sfobjects
with g
gplot2 . For instance, ggplot2users will be familiar with the process of
mapping data from dataframes to aesthetics, and layering additional dataframes on
top of a plot. That same workflow holds for plotting s fobjects.
In the plot below, we want to add only the state name "Kerala" to the map. We could
have done it with the annotate()function, but instead we created an sfobject (also
a dataframe) holding only the feature we wanted to annotate (Kerala).
In order to do this successfully, however, we first need to find the geographic center
of Kerala to know the point from which to draw the label. Geometric operations like
calculating centroids, buffers and distance require a projected CRS as opposed to a
geographic CRS, and so we've done so below using s t_transform() .
With a geographic CRS, st_centroid()does produce a result, but it produces a

warning that "st_centroiddoesn't give correct centroids for longitude/latitude data"
because it assumes attributes are constant over geometries. The distance between
longitudes, however, changes based on its given latitude. (Think of the distance
between longitudes at the equator vs. at the North Pole.)
The question then becomes choosing an appropriate projected CRS. Viewing

crs_data = rgdal::make_EPSG()shows thousands of options. We also searched
for "India" at EPSG.io. Ultimately we chose a CRS with EPSG code 24343, which
notes "# Kalianpur 1975 / UTM zone 43N" since UTM zone 43N covers Kerala. (You
might also find Projection Wizard useful.)
Using this CRS, we were able to use st_transform()to project both sfobjects
onto the same projected CRS. Once that was done, we could add the Kerala label
using g
eom_text_repel()like we'd normally do in g
gplot2
.
34
35
Dot Density Maps
Proportional symbols maps are not the only option for raw count data. A dot density
map can be an effective tool to spatially visualize count data, particularly when your
goal is to find clusters and regional patterns instead of exact data values.
Below we've created a dot density plot comparing rural and urban populations. To do
this, first, we depart from a tidy data format and gather()urban and rural population
data. Then we use the s t_sample()function to draw sample points based on the
respective urban and rural population data for each observation.
This type of visualization would be much more effective if we had data at smaller
levels of administration, such as districts. Instead, because we are sampling at the
state level, our dots will be placed in locations counter to the actual population
density. For example, Maharashtra’s urban population will be randomly spread
throughout the state instead of clustering in metros like Mumbai. Nevertheless, it's
still useful to see how such a map can be created.
36
37
While we can't see population clusters around metros, we can still see relatively
sparsely populated areas (Jammu & Kashmir and the Northeast), the extremely
dense rural belt of Uttar Pradesh and Bihar, and the relatively more urban South
India.
Moreover, as we'll see in the next lesson, this is one example of a visualization
where adding interactivity — specifically, the ability to separately plot urban and rural
data — can be a real benefit.
LEARN MORE:
For more details on generating dot density plots in R, see these excellent
blogs from Tarak and Paul Campbell.
Partially Spatial Static Representations

At the beginning of this lesson, we discussed how a choropleth can misrepresent
data if there are large differences between the area of an observational unit (e.g. a
38
state) and its population. When data is only partially spatial, a map may not always
the best visualization, depending on your objectives.
If you don't need a fully spatial representation of your data, there are other
visualization options that communicate some spatial aspects of the data, but diverge
in some aspect or another. Examples include cartograms, hexbin maps and
geofaceted plots.
Cartograms
In a cartogram, we maintain the overall geospatial nature of an object, but distort
the area of each observational unit so that each unit is scaled proportional to some
chosen variable.
Nominal GDP is a useful variable to demonstrate this relationship. For example, in

relation to its very small geographic area, Delhi’s contribution to India’s GDP is very
high. A traditional choropleth (shown on the left) fails to make this distinction. Using a
cartogram (shown on the right), we can distort a state’s geographic area to match its
contribution to India’s GDP.
39
While still maintaining the overall geographic structure of India, we can vividly see
which states shrink or expand when distorting geographic area by the size of
nominal GDP. (For the same reasons discussed above, we need to use a projected
CRS rather than a geographic one.)
Above we created a continuous cartogram with the cartogram_cont()function.

However, as shown below, we could have also just as easily chosen a
non-continuous area cartogram using c artogram_ncont() , which would introduce
separation between states, or a Dorling cartogram using c
artogram_dorling() ,
which would represent each state as a circle.
40
41
Given that cartograms are already distorting the actual geographic shapes, it's often
not necessary to keep one continuous unit. In such cases, a non-continuous
cartogram may be preferable. Alternatively, we can go even more abstract and
replace all geographic shapes with a simple circle, scaled to the parameter of
interest.
NOTE:
The Dorling cartogram is essentially the proportional symbols map without

the underlying map.
Hexbin Maps
Similar to a Dorling cartogram, a hexbin map also replaces exact spatial boundaries
with a rough spatial arrangement. However, instead of mapping the variable of
interest to size, it's mapped to color.
Hexbin grids for the United States and a few other countries are well established, but
geogridis a new package under development that tries to generate automatic
hexbin grids given any set of geospatial polygons. Although the package lets you
generate a number of possible grids and select the best option, we had trouble
generating a map that adequately placed certain states, the Northeast and
non-contiguous territories in particular.
Nevertheless, the hexbin map below gives a sense of why reducing geospatial
polygons to a hexagon can be more useful, in certain cases, than the original
geometry.
42
The downside of this visualization is that it gives equal area to all states. Tiny union
territories are represented by the same area as Uttar Pradesh. However, if we
understand that context in advance, this can be a useful visualization if we only want
a distribution of per capita GDP across states, regardless of population or area.
43
NOTE:
The package’s README documents the process for generating geogrids

using an spclass, but it's even simpler for an sfobject (as shown above).
Geofaceted Plots
Similar to a hexbin map, a geofaceted plot sacrifices exact spatial characteristics in
favor of a loose spatial arrangement. It strongly prioritizes accurate presentation of
the attribute data at obvious cost to the geospatial representation.
The geofacetpackage makes it easy to design a custom grid and use it to facet
data across the grid.
44
Although it doesn't look much like an Indian state map anymore, this visualization
does vividly communicate the vast differences in urban and rural populations across
states. It also highlights how the choice of visualization affects our interpretation.
NOTE:
The dot density map created above comes from the exact same data.
45
Final Thoughts
With the help of packages tmap , ggplot2
, cartogram
, geogridand geofacet, this
lesson has introduced some of the most common methods for creating various kinds
of static geospatial visualizations in R, such as choropleths, dot density maps and
cartograms.
Interested in going beyond static maps and exploring the world of animated or
interactive maps? Check out the next lesson in this course.
46
Now that we've completed an overview of static mapping in R in the previous lesson,
let’s explore how to create animated and interactive maps. We'll then conclude by
creating a Shiny app to show the potential of visualizing geospatial data through
interactive web applications in R.
Adding animation or interaction to a static map creates opportunities for stories and
experiences that are not otherwise possible in a static world. Despite this power, it’s
important to introduce these elements in the right circumstances. Animated and
interactive maps both demand a higher level of attention from the user; without this,
the visualization won't be as clear or meaningful.
Animated Maps
Animated maps are particularly well-suited for spatio-temporal data as they can
show change of a variable over time, but they certainly have other uses as well.
This lesson introduces two methods for making animated maps: animated tmapsand
the g
ganimatepackage. We’ll also need the packages and objects from the
previous lesson.
47
Animation with tmap

Perhaps the simplest way to create basic animated maps is through a slight
modification to a faceted tmapplot. Simply change the tm_facets()argument byto
along. (ImageMagick is required.) Then, providing the generated output to the
tmap_animation()function loops each faceted plot into a gif or mpeg animation.
You can specify further details like the delay of each frame and the output
dimensions.
We can see an example of how this works by turning the earlier faceted plot of per
capita GDP by region into an animated gif. Instead of showing all facets at once in a
grid, we've looped each image into a gif, displaying them one at a time.
48
Animation with gganimate

For more robust animations (not only geospatial), look to the recently rewritten
gganimatepackage from Thomas Lin Pedersen. The package allows not just for
looping of facets but also a huge array of possibilities. This leads to some really
creative plots, as found in the package’s wiki.
LEARN MORE:
Check out Pedersen's theory on establishing a grammar of animation in a

useR keynote here.
Let's mimic an example that Pederson used in his keynote. After binding together
transformed data sets into one object, we can achieve the animation below with just
one additional line to our plot: t
ransition_states(), which mimics
49
ggplot2::facet_wrap()by splitting the data into multiple panels, tweening

between defined states and pausing at each state for a specified period.
Instead of visualizing per capita GDP as the previous animation did, let’s visualize
nominal GDP. Each frame may not be the most effective visualization, but animating
them together is certainly attention-grabbing.
50
Interactive Maps
Interactive maps have perhaps even greater storytelling potential than animations as
they allow the user, in some respects, to create their own narrative. Certain features,
such as panning and zooming, allow a level of freedom just not possible with a static
or even animated map.
Nevertheless, it's important to be careful to only use them when necessary. An

interactive map requires a higher level of attention from the user than a static or an
animated map. The users must interact with the visualization! If they don't (or not in
the way intended), then what the map’s designer was trying to communicate is lost.
That's why it's important to think about what you're trying to achieve and why a static
(or even animated) map isn't sufficient.
If you need interaction, there are many levels of interaction to consider. In the case
of choropleths, often a simple tooltip with the exact value the color represents can
add value. But, of course, interaction can accomplish much more. For example, we
can change base maps to plot different kinds of geography. We can let users select
51
and filter their own data to be plotted. Users can also modify other aspects of a plot,
such as the color scheme or statistical transformations. Interactive maps also give
users opportunities for brushing and linking. Based on user input or selection, we
can design a reaction in another view. These are just a few of the possibilities, and
so it's important to think carefully about what you actually need.
This section introduces the ggiraphpackage, tmap

’s view mode, mapview
,
leaflet, and p lotly
.
Interactivity with ggiraph

Let’s start simple. In some cases, we may only want to add a minimal amount of
interactivity (such as a hover or tooltip effect). In that case, we could turn to the
plotlyor g giraphpackages.
The plotlypackage is a popular choice for adding interactivity to ggplot2plots.

The ggplotly()function handles geospatial data in the same way as non-spatial
data. You might, however, find this strategy to be a bit of a hassle, often introducing
slight distortions, when you're only after a simple tooltip on hover. However, its more
advanced features (discussed later) are quite useful.
For simple interactivity, you might try the ggiraphpackage from David Gohel. The
basic idea is to pass ggplot()an interactive geom in place of a traditional geom. In
the case of maps, this means using g eom_sf_interactive()instead of geom_sf().
After specifying additional arguments like t ooltip
,o nclickand d
ata_id
, simply
call g
giraphon the saved g gobject.
The code below adds a simple tooltip that includes state name and sex ratio when
hovering over the previous sex ratio map.
52
Click here to view the interactive map in a separate window.
53
Interactive tmaps
One of the strongest reasons to add interactivity to a static map is the ability to layer
geospatial data (such as points or polygons) on top of base maps that depict
physical or human geography. Perhaps the easiest way to achieve this advantage is
by plotting t
mapobjects in t
map
’s "view" mode.
By default, tmap_mode()is set to "plot", but changing this argument to "view" can
make any tmapobject an interactive plot. This option builds on top of Leaflet, which
we'll cover below.
Alternatively, it's possible to convert tmapobjects to Leaflet objects via the

tmap_leaflet()function. This interactivity also applies to t map_arrangeobjects,
which is nice for having side-by-side interactive maps.
However, interactive tmapobjects, particularly arranged tmapobjects, seem to be

difficult to modify after creation. For instance, the tooltip took the first column by
default, so we edited the first column to include the tooltip we wanted. Legends also
seemed difficult to move from the top right for some reason. Because of small issues
like this, you might be better off directly building in Leaflet for more polished projects.
54
Interactivity with mapview

Another package for interactive viewing of spatial data is mapview . It too is built on
top of Leaflet. Its primary role is to create quick interactive geospatial visualizations,
rather than presentation-quality visualizations — but its functionality may be growing.
If we wanted to quickly compare a number of choropleths for different variables, it
could be a good option.
55
NOTE:
Select one data layer at a time to view the correct choropleth.
Interactivity with Leaflet

Both mapviewand the interactive mode of tmaprely on Leaflet under the hood.
Eventually, you'll likely want to build directly in Leaflet.
Leaflet is a Javascript library for interactive maps. An R wrapper package of the

same name from RStudio has made it very easy to create Leaflet maps in R. The
leafletpackage has a rich array of features including fully interactive panning and
zooming, maps built from customizable tiles, and plotted layers and groups of
56
markers, polygons and popups. It is also very easy to embed Leaflet map widgets
into RMarkdown documents, web pages or Shiny apps.
One of the key advantages of Leaflet is the ability to draw on a huge range of map
tiles (which might include roads or natural features) that can lie underneath your data
set. Moreover, if you want to represent layers of spatial data, for instance as points
on top of polygons, then Leaflet is there for you.
The ability to interact with layers of spatial data comes in handy for enhancing our
earlier dot density map of rural and urban population. To address the overplotting of
dots, it can be useful to toggle between urban and rural populations, as shown
below.
In order to achieve this effect in Leaflet, we first need to use the st_cast()function
to convert our earlier multi-points into individual points because Leaflet doesn't
support multi-point objects at this time. We also need to transform the projected CRS
to a geographic CRS (longitude and latitude coordinates) for plotting in Leaflet. Then
we establish each respective layer as a "group". Remember, here one dot equals
one lakh (100,000) people.
57
58
Interactivity with plotly

Like Leaflet, plotlyis an R wrapper package for a JavaScript visualization API — in
this case, it's the plot.lyJavaScript library. plotlyis another robust and
well-documented option for adding interactivity to any kind of plot.
LEARN MORE:
In addition to the official documentation, there are a number of excellent

resources for plotting geospatial data with plotly .
● There is now a cookbook-style handbook, which includes a dedicated

section for maps.
● This blog post explores improving ggplotly()conversions for maps.
● This blog post covers visualizing geospatial data with sfand plotly
.
Using plotlysyntax, we can easily construct interactive choropleths similar to the

one we created with ggiraph
. Here is one for population density.
59
Beyond simple choropleths, plotlyreally shines at more complex levels of

interaction, implementing effects more commonly reserved for Shiny apps. Like
Leaflet, p
lotlycan also take advantage of customizable base map tiles via the
plot_mapbox()function.
NOTE:
Using the plot_mapbox()function will require setting up a public access

Mapbox token.
plotlycan also be used for brushing and linking views. Brushing refers to
subsetting data based on some kind of user input like a box selection. This user
selection is then linked to a part of the visualization, which reacts to the selection.
60
For example, we can design visualizations with multiple views that interact with each
other. The crosstalkpackage lets HTML widgets share data and "talk" to each
other. Taking advantage of the c
rosstalkpackage, p lotlyis able to "link views".
The c
rosstalk::bscols()function arranges HTML elements or widgets in
Bootstrap columns. This allows for persistent or generalized selection across
elements.
Below, after creating a SharedDataobject and specifying regionas a key

argument, we can make selections on a map that generate changes in a data table.
This is still a fairly simple effect, but it demonstrates a kind of linking framework only
possible with interactive maps.
Click anywhere on the map to see the table filter by region.
61
We can even aggregate brush selections into a "persistent selection". For instance,
we can examine the distribution of a variable like sex ratio in a histogram; then select
points at the low or high end of the distribution, and watch the map highlight which
states fall into the selected bin(s).
62
There may be simpler ways to more effectively communicate this data, but it helped
us get an idea of what is possible with plotly
.
Mapping Applications in Shiny

These approaches for creating interactive maps can be quite powerful, but they do
have their limitations. The interactive maps shown above allow some degree of user
interactivity, but the code is ultimately static and so the user interface is largely fixed.
While packages like p lotlycan be used for linking views, there are better ways to
achieve high levels of interaction between maps and plots.
As shown in this gallery, there are now more than 100 registered HTML widgets for
R. Communicating the full extent of their capabilities is often best done through a
web application. In general, a complete application affords more possibilities that
aren't possible with only the techniques above, such as downloading output after
some kind of interaction.
Building a complete web application is the final step for interactivity. Within R, Shiny
is the way to achieve this. Shiny is an R package that makes it easy to build
interactive web apps straight from R without any knowledge of web development
languages like HTML, CSS or JavaScript.
LEARN MORE:
The Shiny documentation is the best place to get started. It includes a

number of sample apps, articles, and webinars. Shiny also has a gallery of
sophisticated and simple apps to learn and seek inspiration from.
63
The Basics of Shiny Apps

Essentially, Shiny apps have two parts: a front end and a back end. When creating a
Shiny app, you can choose to build it as a single file (in which case, the front end
and back end are housed in two functions, u i()and s erver() ) or two files (in which
case, the front end and back end are found in separate files, u i.Rand s erver.R ).
ui.Rcontrols the app’s appearance, while server.Rcontains the logic that

transforms a list of user inputs, such as dropdown menus or radio buttons, into
various kinds of outputs, like plots or tables.
Beyond these minimum two files, larger projects often involve a few other important
components. One is a separate data folder that holds all of the data read into the
app. Another is a file, perhaps named g lobal.R , that reads in data files, sets global
variables, and contains functions to be used in s
erver.R .
Particularly as complexity increases, it's helpful to pare down the server.Rfile to

only the reactive logic of Shiny. Setting variables, reading in data, and functions
describing how to build objects can all be handled in a g lobal.Rfile. Removing
these elements allows you to better focus on the reactivity in s erver.R .
Lastly, you might add a styles.cssfile for custom styling. I chose to add
includeCSS(styles.css)inside the header tag within my u i.Rfile. This allowed
us to override any of the app’s default styling in a separate file without distracting
from the structure of u
i.R
.
LEARN MORE:
After you get a handle on these concepts, it can be helpful to view the code
for mapping applications in Shiny. Examples include those found in the
Leaflet documentation, Geocomputation with R, tmap, and the Shiny gallery
itself. The Shiny documentation also includes an example of an interactive
choropleth. Finally, this blog post is also aimed at beginners starting to learn
Shiny and Leaflet.
Example of a Geospatial Shiny App

Leaflet is the most robust option for mapping in Shiny, especially when the data is
truly geospatial. However, because our data set focuses more on attributes than
64
geometry, we chose to explore different geospatial representations without the

underlying base maps using a ggiraphobject.
In the Shiny app below, you can construct a choropleth of any variable in the data
set for any subset of India’s regions. You can also compare how this choropleth
changes across a number of geographic representations, such as cartograms and
hexbin maps. Further, you can cross-check the data presented in the choropleth with
its corresponding dotplot and table in the adjacent tabs.
You can try the app out for yourself here.
LEARN MORE:
Check out Sean Angiolillo's R User Meetup talk and blog about how he built
a much more complex Shiny app — an interactive data visualization of the
1991-2011 Indian Census data depicting “Households Classified by Source
and Location of Drinking Water and Availability of Electricity and Latrine”.
The gif below also shows some of the visualizations the app can generate.
65
As far as what Shiny is capable of, this app is still quite simple, but hopefully it
demonstrates some of Shiny’s potential for communicating interesting geospatial
data stories.
Final Thoughts
This lesson and the previous one together have only scratched the surface of R’s
mapping capabilities. R is well-known for its visualization libraries, and this reputation
holds for geospatial data as well. Whether you're creating static, animated or
interactive maps, there's an R package ready to help you create high-quality
visualizations. Just remember to keep in mind your objectives and goals for any
visualization before designing either a static, animated or interactive map.
66
So far in this GIS course, we've introduced the following ideas:
● use cases for geospatial data
● getting started with the sfpackage
● manipulating attributes of geospatial data in a tidy workflow
● visualizing geospatial data through a wide array of static, interactive and

animated maps
However, we've yet to really do anything useful with the actual geometry of our
geospatial data. In this lesson, we'll introduce spatial subsetting, an important
family of operations applicable to geospatial data.
What Is Spatial Subsetting?

We've seen how to manipulate sfspatial dataframes through their attributes, as we'd
do with normal dataframes. For instance, with dplyr
, we can filter a spatial
dataframe to keep only observations that match certain factor levels or have a
numeric variable above or below a certain threshold. Similarly, we can also filter
observations using the geometry column of our geospatial data.
As defined in Geocomputation with R, “Spatial subsetting is the process of selecting

features of a spatial object based on whether or not they in some way relate in space
to another object.”
67
LEARN MORE:
Chapter 4, "Spatial data operations", of Geocomputation with R is the place

to start learning about spatial subsetting in R.
To take an example using our previous data set of Indian states, we might wish to
filter for only states that share a border with Delhi NCR. Or, rather than filtering by
attributes like states or districts, we may only care about states or districts within a
certain distance from a particular point. Spatial subsetting operations allow us to
perform these kinds of manipulations.
Topological Relations
Many types of spatial subsetting operations are available at our fingertips. Different
types of spatial relations are more formally called topological relations. The two
examples given above describe different topological relations. The former is looking
for a common border, or perhaps areas that "touch", whereas the latter is looking for
areas "within" another area.
As implemented in the sfpackage, you'll find these operations in functions like

st_intersects() ,s t_disjoint() ,s
t_within(),s
t_contains() ,
st_touches() ,s
t_crosses()and more. The documentation for any of these
functions includes the complete list.
These functions require a pair of sfgeometry sets — a target object and a selecting
object. Before diving into the specific syntax, let’s first get a sense of how these
relations are defined.
The simplest is st_intersects()and its inverse st_disjoint() . Giving two sf

objects to st_intersects()will return all observations that intersect with each
other in any way. Conversely, s
t_disjoint()returns observations with no
intersection.
Another pair of operators is st_within()and st_contains() . Both of these

operations return only observations that lie entirely within one object or another. The
designation of "x" and "y" arguments determines whether s t_within()or
st_contains()is actually the operation you need. s t_within()returns
observations in "x" that fall entirely within "y". s
t_contains()returns observations
in "x" that entirely contain "y".
68
LEARN MORE:
These relationships are much easier to understand with diagrams. GITTA

has a useful introduction to topological relations, complete with Venn
diagrams of each. More diagrams can also be found on Wikipedia. Another
helpful resource is S Ogletree’s article, which uses toy data within R to look
at the differences between relations.
Preparing Data
Before diving in to the syntax of spatial subsetting, we need some sample data. We'll
use the tidycensusand tigrispackages to download median household income
data for the Philadelphia metro area at the census tract level.
NOTE:
In the US Census hierarchy, census tracts are below counties and above
block groups. See Kyle Walker’s tigrisslides for more information.
LEARN MORE:
The documentation is a good way to get started with Kyle Walker’s

tidycensusand t igrispackages. Note that you'll need to get an API key
from the Census Bureau.
69
The tigrispackage does have a core_based_statistical_areas()function for

downloading shapefiles of metro areas, but instead we'll start with a table of counties
covering our area of interest. Then we'll demonstrate how to get a more narrow
geographic area through spatial subsetting.
We now have 1,186 census tracts covering the Philadelphia metropolitan area. This
is a larger area than we want to cover so we'll spatially subset this data set to a
70
smaller area based on distance from a central point of interest, in this case
Philadelphia’s City Hall.
Before we can do this, however, it's important to pay attention to the coordinate
reference system (CRS) of our geospatial data. The commands below show that the
data has a geographic CRS with EPSG code 4269.
In order to use spatial subsetting operations, we need to reproject our data from a
geographic CRS to a projected CRS. In this case, we’ve chosen to use EPSG code
2272.
71
LEARN MORE:
For more information on coordinate reference systems, see Chapter 2 and

Chapter 6 of Geocomputation with R.
Now that we have projected census tracts, we'll define a circle and use it as the
second geometry feature set by which we'll subset the census tracts.
As shown in the map below, with these two simple feature geometry sets in the
same projected CRS, we are ready to spatially subset.
72
Spatial Subsetting Syntax

In R, there are often multiple ways to achieve the same result. For subsetting, we
have a base R method using the square bracket [and a tidyverse method using
filter() . Spatial subsetting is no exception — both options are available within the
sfpackage.
The syntax is remarkably simple with the square bracket method. It's very similar to
bracket subsetting of a dataframe. But inside the square bracket, where a logical
expression would filter rows, you just need to place the selecting simple feature
geometry (i.e. a spatial dataframe, s
fc_POLYGON , etc).
In the example below, we subset the original 1,186 census tracts by those that
intersect the circle we defined. The result is a new sfspatial dataframe with 617
observations.
73
By default, st_intersects()is the unspoken topological operator when using the

square bracket for spatial subsetting. Setting the "op" argument allows us to choose
any topological relation instead of the default s
t_intersects
. In the example below,
we’ve chosen s t_disjoint() .
NOTE:
The 617 observations returned from the intersection plus the 569
observations returned from st_disjoint()sum to the original 1,186 tracts.
A second method of spatial subsetting involves creating an intermediary object of the

class "sparse geometry binary predicate" (sgbp), which is essentially a list of
matching indices we can use to subset the target object. Under this method, rather
than setting an "op" argument, we use a different topological operator beginning with
st_*.
74
Moreover, we have the option of returning a sparse or a dense matrix, which

slightly affects the syntax as shown below. This method fits more easily into a tidy
workflow, as evidenced by the use of d plyr
.
Regardless of which method you choose, all three methods should return the same
number of observations in a spatial dataframe of the same CRS.
To give one example of this tidy workflow, note below how we can start with our
original spatial dataframe, perform a spatial subset (in this case st_within), and
directly pipe the result into g
gplot2
. As we'd expect, the result is a much smaller
and more circular shape, fitting just inside the boundaries of our circle.
75
Explore Spatial Subsetting via Shiny

While seeing the syntax and a few diagrams can be useful, it's even better to
practice with different operations and quickly see the results. This Shiny app will let
you quickly explore spatial subsetting through different topological relations.
76
It shows the same map of census tracts for the Philadelphia metropolitan area. After
choosing a topological relation, you can position a circle of any size over the map of
census tracts and click to perform a spatial subset.
After a click, you'll see several results:
● The syntax of the given spatial subset, in both square bracket and dense
matrix methods, appears in the right-hand panel.
● The histogram for the selection is plotted against the original distribution of
census tracts.
● The choropleth color scale and legend adjusts to the selection’s domain. This
can be used to reveal more detailed variation within a region. For example,
income levels vary widely between wealthier suburbs and core urban areas in
the Philadelphia metro region. Spatially subsetting a smaller, more
homogenous geographic area can show new patterns.
Final Thoughts
With this lesson and the Shiny apps as tools, you've hopefully learned:
● the concept of spatial subsetting and when it may be useful
● differences in topological relations
● multiple ways to spatially subset your own data
Hopefully, now you are well on your way to becoming as comfortable spatially
subsetting your data as if it were simply attribute data.
Keep reading for one final lesson on how to explore satellite images, one of the best
forms of geospatial data today!
77
This lesson was written by Himanshu Sikaria and was last updated on 29 Jan. 2019.
In the previous lessons, we talked about how to handle basic geospatial data — any
data with a geographic component. Our previous examples used data sets with
several economic indicators for each Indian state.
However, there's a more complex form of geospatial data — raster images, which is
data captured by satellites orbiting the Earth. Raster images can be far more difficult
to find and process, but their high level of detail and frequent updates make them
incredibly valuable for analysis.
This lesson focuses on the basics of raster images — what they are, where to get
them, how to extract and process them, and what basic operations and analysis you
can do on them. We'll illustrate all of this by examining satellite data for a rural region
of Karnataka, a state in south India.
The Basics of Rasters

Imagine if you had the power to click images of any location on Earth from space.
Today, thanks to satellites, we do.
A raster file is an image of the Earth, which is geotagged. (That means that we can
find the exact location of any of raster image on the world map.) Like any other
image, raster images are made up of cells (pixels), and each cell has a value
associated with it.
78
LEARN MORE:
Looking for more information about raster images? This blog by ArcGIS
explains raster images perfectly, and these flashcards explain the jargon of
raster imagery.
Raster Attributes
Every raster scene has various attributes, or parameters. These can be accessed by
@— for example, rastername@extent .
Here are some attributes you should know:
1. class: There are three options — RasterLayer, RasterStack or RasterBrick. A

RasterLayer object represents single-layer (variable) raster data. A
RasterStack is a collection of RasterLayer objects with the same spatial
extent and resolution. A RasterBrick is truly a multilayered object, and
processing a RasterBrick can be more efficient than processing a
RasterStack.
2. resolution: The size of each cell (or pixel) that makes up the entire image.
This value is in degrees for the example below (1 degree ~ 110 kms).
3. extent: The latitude and longitude of the image's top right point and bottom
left point.
4. coord. ref.: This is the current raster file's projection. The Earth is a sphere,
and it needs to be projected to be converted to 2D.
5. values: The minimum and maximum values among all the cells in the raster.
LEARN MORE:
Learn more about geographic projections with this video from Vox.
79
The "Hello World" of Rasters
80
Exploring Karnataka and the Impact of

Droughts
Let’s get started with raster data in R by exploring Landsat 8 data for Karnataka, a
state in south India.
What Is Landsat 8?

Landsat is without a doubt one of the best sources of free satellite data today.
Managed by NASA and the United States Geological Survey, the Landsat satellites
have been capturing multi-spectral imagery for over 40 years.
The latest satellite, Landsat 8, orbits the Earth every 16 days and captures more
than 700 satellite images per day across 9 spectral bands and 2 thermal bands. Its
imagery has been used for everything from finding drought-prone areas and
monitoring coastal erosion to analyzing an area’s fire probability and setting the best
routes for electricity lines.
Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS)
images consist of nine spectral bands. Bands 1 to 7 and 9 have a spatial resolution
of 30 meters, Band 8 (panchromatic) is 15 meters, and Bands 10 and 11 are 100
meters. The ultra blue Band 1 is useful for coastal and aerosol studies. Band 9 is
useful for cirrus cloud detection. Thermal Bands 10 and 11 are useful for providing
more accurate surface temperatures.
Table from USGS
81
USGS gives free, public access to both its raw and processed satellite images. Raw
images are available on AWS S3 and Google Cloud Storage, where they can be
downloaded immediately. Processed images are available with the EROS Science
Processing Architecture (ESPA). Images are also available through a variety of data
products, such as SR (Surface Reflectance), TOA (Top of Atmosphere) and BR
(Brightness Temperature).
Accessing the processed Landsat 8 data can be tricky. There are two different APIs
— one by Development Seed for searching (called sat-api) and one by USGS for
downloading (called espa-api). Download requests have to include the product ID,
projection, and format of the data, then they must be approved by USGS, which can
take anywhere from a couple minutes to a couple days. To make matters worse, the
APIs input and output data with different structures.
LEARN MORE:
New to Landsat 8 data? Here's lots more information to get you started:
● Read about the Landsat Collection (Pre-Collection and Collection 1)

here.
● Watch this video to understand the difference between the data on

ESPA and AWS S3/Google Cloud Storage, and why using ESPA is
preferred over AWS’s Digital Numbers (DN).
● Watch a video on how Landsat data is captured here.
● Read about over 120 applications of Landsat 8 data here.
Downloading Raster Images from the Landsat

Satellite
Our open-source package rLandsat can be used to easily download the latest
Landsat raster images — no Python or API knowledge needed! (You can read more
about r
Landsathere.)
Landsat divides the entire earth into grids, each with a unique row and path. This tool
from USGS can be used to convert a latitude and longitude to a path and row.
82
A part of Karnataka that had a major drought lies in path 145 and row 49. Let’s
download the latest imagery for this grid using rLandsatfunctions.
NOTE:
To run any of the functions starting with espa_

, you need valid login
credentials from ESPA-LSRD, and you need to input them in your
environment with e spa_creds(username, password)(as above) for the
functions to work properly.
Reading the Files in R

Once the download is complete and the TAR files extracted, there will be a GeoTIFF
file for each Landsat band. Each band represents the light reflected at different
frequencies. Bands 2, 3 and 4 represent the light visible to the human eye.
83
The next step is to load the rasters to R using the rasterlibrary. The file size of
rasters are generally huge (one Landsat tile has about 60 million pixels), so the
rasterlibrary doesn't load the entire data to memory; only when required, the
functions call the values and processes them in chunks. As a result, the raster
library saves every intermediate variable in the temp folder.
LEARN MORE:
Read detailed information about the rasterlibrary in its CRAN

documentation.
First, let’s try to load the data for different bands in R using the raster library, and
then we'll plot any one of them. To read a single raster image, we can use the
raster()function. To read a stack (multiple) of rasters at once, we can use the
stack()function. Printing the raster/stack file will give brief information about the
raster.
84
Plotting a Raster Image

Plotting a single band's raster is simple — just use the plotfunction. We can also
modify the color range of the plot using the colparameter.
Plotting a Stack of Raster Images

A RasterLayer is a single band raster file, which means that each pixel in the raster
has a single value attached to it. We can create a RasterStack by combining
different RasterLayers. This would assign multiple values to a single pixel.
85
Plotting a RasterStack with the visible bands gives an image of exactly how the
human eye would see this piece of Earth from space.
86
Cropping Raster Images

As you can see, this Landsat scene covers a lot of area, and it's difficult to actually
get insights from it. Let's try to crop the scene to only the east region for more clarity.
The upper-left and bottom-right coordinates are specified in the extent()

function.
The RasterStack is passed to the crop()function with the specified extent to be
cropped.
87
88
In these images, we can clearly see the change in greenery over the years — 2014
is the most green, and 2017 is the least. In fact, in 2017, Karnataka faced the worst
drought in 42 years.
In the next section, let’s see how we can quantify this change and check which
regions were worst affected by the drought.
Vegetation Quality In and Around the Region

A combination of other bands can be super helpful too. If we combine infrared, red
and green (Bands 5, 4 and 3), we can create a plot where the vegetation is red. The
more vibrant the color, the healthier the crop.
89
90
Basic Operations
Doing raster operations is easy — most of the time, it can be treated as a numeric
vector. By doing basic algebraic operations on different bands, we can create indices
that better explain the characteristics of the region.
Let’s try to create one of the most extensively used indices from Landsat — NDVI
(Normalized Difference Vegetation Index). NDVI is defined as (Band 5 - Band
4)/(Band 5 + Band 4) .
Negative values of NDVI (values approaching -1) correspond to water. Values close
to zero (-0.1 to 0.1) generally correspond to barren areas of rock, sand or snow.
Low, positive values (approximately 0.2 to 0.4) represent shrub and grassland, while
high positive values (values approaching 1) indicate temperate and tropical
rainforests.
LEARN MORE:
This document from Landsat gives more information about NVDI and all the
other spectral indices you can create using Landsat data.
91
92
Which Regions Are Most Affected?

With this data, we can quantify the exact change in NDVI and look for the most
affected regions in this rural part of Karnataka.
93
2017 was definitely a bad year with a massive decrease in NDVI, though 2018
seems to be better with a positive NDVI change.
94
Final Thoughts
The power of spatial data is immense, and this is just the beginning of the sort of
work that you can do with satellite imagery. In this lesson, with a few basic
operations and visualizations on freely available data, we analyzed how vegetation in
Karnataka changed over time. With the same data and techniques, we can do more
complex analysis and apply machine learning techniques to further classify land into
different types.
For example, at Atlan, we worked with Landsat 8 data in R to classify every piece of
land in India into one of four categories: water, barren, green, or built-up regions.
The results were really interesting — we could detect where and when new buildings
and houses were being built, green regions turned into barren ones, and rivers dried
up.
Land classification for Karnataka. (Black is built-up, yellow is barren, blue is water,
and green is green land.)
Development in Karnataka. (Black is built-up, yellow is barren, and red is agricultural

land.)
95

Gis Desde R

Uploaded by

Copyright:

Available Formats

Gis Desde R

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Gis Desde R

Uploaded by

Copyright:

Available Formats

Feedback, questions, comments or requests?

Geospatial Data Is Everywhere

How Businesses Use Geospatial Data

In another sense, geospatial data can significantly improve ​market segmentation

flag irregular or suspicious transactions. Incorporating geospatial data into these

Many people have explained how businesses are incorporating geospatial

How the Public Sector Uses Geospatial Data

● As a full-fledged programming language, the command line interface of R has

● R is ​reproducible​ in a way that point-and-click interfaces are not. This is, of

● The greater ​shareability​ of R scripts and packages encourages a faster

Many of these advantages apply to any open-source programming language, such

Getting Started with R as a GIS

Jakub Nowosad and Jannes Muenchow. In particular, ​Chapter 2​ gives a

All ​sf​functions begin with ​st_*​to help users identify them.

First, we'll convert the ​sf​object to a ​SpatialPolygonsDataFrame​

Now we can briefly inspect the structure of a ​SpatialPolygonsDataFrame​

The former ​SpatialPolygonsDataFrame​ , a class defined by the ​sp​package, now

● It has 36 features (depending on how many we print) and 8 fields (our

● The ​geometry type​is a multipolygon because the geometries represent the

● bbox​gives the object’s bounding box dimensions.

● epsg​and ​proj4string​describe the coordinate reference system (CRS).

For more information on coordinate reference systems, see ​Section 2.4​ of

Preparing Attribute Data

Unfamiliar with these packages? Check out the ​googlesheets​package

See the ​Relational Data chapter​ of ​R for Data Science​ if ​left_join()​is

One option is ​sf::st_simplify()​, but here we'll use the ​ms_simplify()​function

Another useful function is ​sf::st_geometry()​ . When passing it an ​sf

For more information on simplification, see ​Section 5.2.1​ of ​Geocomputation

In the ​previous lesson​, we briefly explored the structure of spatial dataframes as

Resources on Visualizing Geospatial Data

● Data Visualization: A practical introduction​ by Kieran Healy. (This even

● Fundamentals of Data Visualization​ by Claus O. Wilke

● "Making Maps with R"​ chapter of the previously-mentioned ​Geocomputation

● Bhaskar V. Karambelkar’s ​tutorial​ at useR 2017 on "Geospatial Data

Choosing the Right Visualization

In fact, focusing on geospatial elements can sometimes misrepresent the data

Thematic Maps (tmap)

Guides to tweaking other aspects of the map can be found in the

Arranging tmap Objects

tmap​also has a helpful function, ​tmap_arrange()​

Axis Maps’ ​cartography guide​ directly addresses the question of

If you have highly skewed data, it may be helpful to perform a logarithmic

For more information about statistical transformations in the context of data

In the example above, a ​log10()​transformation has hopefully achieved a greater

The ​free.coords()​argument of ​tm_facets()​is set to ​TRUE​ . If it was

Proportional Symbols Maps

Luckily, ​tmap​is also well-suited to these types of visualizations. Here, a symbol

With a geographic CRS, ​st_centroid()​does produce a result, but it produces a

The question then becomes choosing an appropriate projected CRS. Viewing

Dot Density Maps

Partially Spatial Static Representations

Nominal GDP is a useful variable to demonstrate this relationship. For example, in

Above we created a continuous cartogram with the ​cartogram_cont()​function.

The Dorling cartogram is essentially the proportional symbols map without

The package’s ​README​ documents the process for generating geogrids

Animation with tmap

Animation with gganimate

Check out Pedersen's theory on establishing a grammar of animation in a

ggplot2::facet_wrap()​by splitting the data into multiple panels, tweening

In another sense, geospatial data can significantly improve market segmentation

● R is reproducible in a way that point-and-click interfaces are not. This is, of

● The greater shareability of R scripts and packages encourages a faster

Jakub Nowosad and Jannes Muenchow. In particular, Chapter 2 gives a

All sffunctions begin with st_*to help users identify them.

First, we'll convert the sfobject to a SpatialPolygonsDataFrame

Now we can briefly inspect the structure of a SpatialPolygonsDataFrame

The former SpatialPolygonsDataFrame , a class defined by the sppackage, now

● The geometry typeis a multipolygon because the geometries represent the

● bboxgives the object’s bounding box dimensions.

● epsgand proj4stringdescribe the coordinate reference system (CRS).

For more information on coordinate reference systems, see Section 2.4 of

Unfamiliar with these packages? Check out the googlesheetspackage

See the Relational Data chapter of R for Data Science if left_join()is

One option is sf::st_simplify(), but here we'll use the ms_simplify()function

Another useful function is sf::st_geometry() . When passing it an sf

For more information on simplification, see Section 5.2.1 of Geocomputation

In the previous lesson, we briefly explored the structure of spatial dataframes as

● Data Visualization: A practical introduction by Kieran Healy. (This even

● Fundamentals of Data Visualization by Claus O. Wilke

● "Making Maps with R" chapter of the previously-mentioned Geocomputation

● Bhaskar V. Karambelkar’s tutorial at useR 2017 on "Geospatial Data

tmapalso has a helpful function, tmap_arrange()

Axis Maps’ cartography guide directly addresses the question of

In the example above, a log10()transformation has hopefully achieved a greater

The free.coords()argument of tm_facets()is set to TRUE . If it was

Luckily, tmapis also well-suited to these types of visualizations. Here, a symbol

With a geographic CRS, st_centroid()does produce a result, but it produces a

Above we created a continuous cartogram with the cartogram_cont()function.

The package’s README documents the process for generating geogrids

ggplot2::facet_wrap()by splitting the data into multiple panels, tweening

This section introduces the ggiraphpackage, tmap

The plotlypackage is a popular choice for adding interactivity to ggplot2plots.

Alternatively, it's possible to convert tmapobjects to Leaflet objects via the

However, interactive tmapobjects, particularly arranged tmapobjects, seem to be

Leaflet is a Javascript library for interactive maps. An R wrapper package of the

In addition to the official documentation, there are a number of excellent

● There is now a cookbook-style handbook, which includes a dedicated

● This blog post explores improving ggplotly()conversions for maps.

Using plotlysyntax, we can easily construct interactive choropleths similar to the

Beyond simple choropleths, plotlyreally shines at more complex levels of

Using the plot_mapbox()function will require setting up a public access

Below, after creating a SharedDataobject and specifying regionas a key

The Shiny documentation is the best place to get started. It includes a

ui.Rcontrols the app’s appearance, while server.Rcontains the logic that

Particularly as complexity increases, it's helpful to pare down the server.Rfile to

● getting started with the sfpackage

As defined in Geocomputation with R, “Spatial subsetting is the process of selecting

Chapter 4, "Spatial data operations", of Geocomputation with R is the place

As implemented in the sfpackage, you'll find these operations in functions like

The simplest is st_intersects()and its inverse st_disjoint() . Giving two sf

Another pair of operators is st_within()and st_contains() . Both of these

These relationships are much easier to understand with diagrams. GITTA

The documentation is a good way to get started with Kyle Walker’s

The tigrispackage does have a core_based_statistical_areas()function for

For more information on coordinate reference systems, see Chapter 2 and

By default, st_intersects()is the unspoken topological operator when using the

Moreover, we have the option of returning a sparse or a dense matrix, which

1. class: There are three options — RasterLayer, RasterStack or RasterBrick. A

Table from USGS

● Watch this video to understand the difference between the data on

● Watch a video on how Landsat data is captured here.

● Read about over 120 applications of Landsat 8 data here.