Gis Desde R
ONLINE COURSE
Introduction to GIS
Manipulating and Mapping
Geospatial Data in R
When you hear "geospatial data", what comes to your mind? For many people, it's
ordinary maps, either of physical or human geography. Maps are one important
output of geospatial data, but that data can be used for so much more. In fact, geospatial
data is an overlooked and underappreciated aspect of today's "big data" revolution.
The growth of geospatial data is fairly recent, but its effect can already be seen in
many of the products we use on a daily basis. Google search results are more local,
Ubers arrive faster, and we can even track the exact location of our Grubhub or
Zomato order for free. All of these products and services use geospatial data to tailor
the product experience to each user.
Before we dive into working with geospatial data in R, let’s look at why geospatial
data is worth learning about, with a quick overview of both business and public
use cases.
LEARN MORE:
The greater accessibility of geospatial data has helped spur a wide variety of
use cases. One very enthusiastic website has a list of over 1,000 use cases
of geospatial data, including fields like retail, health care, transportation, and
governance. ESRI also maintains a catalogue of interesting case studies
using geospatial data.
Perhaps the most obvious place to start when discussing how businesses can use
geospatial data is improved transportation and logistics planning. Companies can
use real-time geospatial data to optimize their supply chains and thereby reduce
costs. In addition, geospatial data allows businesses to not just react to weather and
climate patterns in real time, but also make predictions. For instance, analyzing
historical weather trends can help a retailer better anticipate customer demand and
adjust supply accordingly.
Next, geospatial data can play a key role in a business’ risk analysis efforts. The
idea of modeling risk to mitigate potential exposure is important in many industries,
and geospatial data fits well into such models. For example, it can be used to more
accurately assess which properties may be at greater risk due to environmental
damage and extreme weather events, such as flooding.
In a similar vein to risk analysis is fraud detection and prevention. Fraud detection,
particularly in industries like credit cards, is one area where machine learning models
made an early contribution for their ability to sift through huge mounds of data and
One example of geospatial data is this road data for Pune, a city in India, color-coded to show which
roads are well-lit at night. (Source: Atlan.)
Lastly, geospatial data can play a role in almost any kind of optimization exercise.
Take the example of identifying new locations for a business. Whether determining
where to open the next branch of a chain restaurant or Amazon HQ2, it would be
foolish to ignore geospatial data. It allows companies to spatially visualize
simultaneous layers of analysis, such as target market size, number of competitors,
public amenities and infrastructure, and environmental risk factors.
While a business may use geospatial data for logistics planning, a public agency can
use geospatial data for better urban and rural planning to improve land use,
environmental protection, pollution levels, and liveability. If a business can optimize
its fleet of delivery trucks, a public agency can optimize emergency response teams
and evacuation routes after a natural disaster. Even basic public functions like trash
collection or the rates charged for parking spaces can be optimized based on some
level of geospatial data.
Businesses may use geospatial data for market segmentation, but public agencies
can use the same principles to better target constituents. For example, public
agencies can work with geospatial data to better identify at-risk populations for
mosquito-borne diseases based on proximity to areas with poor water drainage.
Public health missions can then more accurately target their outreach to these
populations. Alternatively, much like a business targeting customers, political parties
have used geospatial data to target potential voters.
An example of geospatial data is this Damage Proxy Map (DPM) of areas in Hokkaido, Japan, that
were likely damaged by the M6.6 earthquake on September 5, 2018. (Source: NASA's Jet Propulsion
Laboratory.)
Like a business, a public agency may want to assess risk to the public and
infrastructure. A new use case, known as predictive policing, is under trial in certain
parts of the world. The idea is for police agencies to assess the risk of crime in a
given area through geospatial and temporal analysis of past crimes, and thereby
adjust how many police are on call and where they are stationed. While this raises a
number of important ethical concerns, it highlights the far-reaching implications of
geospatial data.
A business might use geospatial data in its fraud detection efforts, but a public
agency might use geospatial data to track illegal construction. The Delhi
Government in fact had a project to achieve just this by creating what is called the
Delhi State Spatial Data Infrastructure (DSSDI).
If businesses can use geospatial data to find strategic locations, public agencies can
do the same to determine the location of important public infrastructure like new
roads, hospitals, schools and voting booths. We even used geospatial data in our
effort to locate where to position 10,000 new LPG (liquefied petroleum gas)
distribution centers across India.
One last use case in the public sector is the idea of investing in geospatial data as
a public good. For example, one important form of geospatial data is weather data.
More accurate, localized, and accessible weather reports can have a large impact on
the agricultural sector. By publishing digitized and spatially-mapped land records,
governments can mitigate land acquisition disputes. Geospatial data on
environmental metrics like air quality or carbon emissions is another example of a
public good. In the case of India, major government initiatives like Digital India and
Smart Cities rely on these geospatial technologies.
LEARN MORE:
One useful resource on how GIS technology can improve city services is this
McKinsey brief. Or for analysis focused on the Indian context, this FICCI
report highlights 40 cases on geospatial technologies in different sectors.
Final Thoughts
If this brief overview of geospatial data use cases has piqued your interest, be sure
to keep reading. Starting in the next lesson, you'll quickly get your hands dirty
working directly with geospatial data in R.
This lesson was written by Sean Angiolillo and was last updated on 29 Jan. 2019.
The previous lesson looked at the many growing use cases for geospatial data. Now
we'll get started manipulating geospatial data in R using the sf package. You'll learn
how to import spatial data, combine attribute data into existing geospatial objects,
calculate area, and simplify spatial dataframes before plotting them (which is the
subject of the next lesson).
Using R as a GIS
Until recently, serious work with geospatial data required an often-proprietary
desktop GIS (Geographic Information System), such as ArcGIS. Now, however, GIS
capabilities in R have greatly advanced.
In many ways, the benefits of using R over a desktop GIS are similar to the benefits
of using R over Excel for data analysis.
● R is free and open source, which makes it easier and cheaper to get started,
compared to the expensive licenses needed for many desktop GIS. This has
helped spread geospatial data analysis beyond the domain of only GIS
specialists, thereby opening opportunity to a wider ecosystem of contributors.
LEARN MORE:
See "Why Geocomputation with R?" for a greater discussion of the merits of
R for GIS.
What package should you use for geospatial data? The answer could differ based on
your objectives, but the sf package is likely the best place to get started. The sf
(Simple Features) package provides a class system for geographic vector data. It is
the successor to the sp package and is quickly being adopted by many other
packages for geospatial data.
Let's start exploring some of the package's features with simple state-level Indian
population and economic data.
LEARN MORE:
Never used the sf package? Get started with the recently completed open
source book under development, Geocomputation with R by Robin Lovelace,
Spatial data enthusiasts are also excited about the announcement that sf
package authors Edzer Pebesma and Roger Bivand are currently working on
an open source book of their own, Spatial Data Science. Drafts of the first
eight chapters of what is sure to be a key resource for the field are now
available.
Creating sf Objects
For tidyverse users, one of the most exciting aspects of the sf package is the ability
to work with geospatial data in a tidy workflow. Unlike its predecessor package sp,
with the sf package, geospatial and attribute data can be stored together in a
spatial dataframe, where the object’s geometry occupies a special list-column. In
addition to being faster, this lets you manipulate an sf object via magrittr pipes like
an ordinary dataframe, or at least one with a few special characteristics.
It's certainly possible to create your own sf objects with functions from the package
like st_point() , st_linestring() , and st_polygon() . But in most cases we only
have to read in existing spatial data. That is generally done with the st_read()
function.
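As a minimal sketch of the constructor functions mentioned above, the following builds a small sf object by hand. The coordinates and the name column are made up for illustration and are not part of the lesson's data; the file path in the final comment is hypothetical.

```r
library(sf)

# A single point (longitude, latitude); coordinates are illustrative only
pt <- st_point(c(73.85, 18.52))

# A line, and a polygon whose ring closes on its first vertex
ln <- st_linestring(rbind(c(73.8, 18.5), c(73.9, 18.6)))
pg <- st_polygon(list(rbind(c(73, 18), c(74, 18), c(74, 19), c(73, 19), c(73, 18))))

# Combine the geometries into a geometry column with a CRS (EPSG:4326 = WGS 84)
geom <- st_sfc(pt, ln, pg, crs = 4326)

# Attach attribute data to get a spatial dataframe
toy_sf <- st_sf(name = c("a point", "a line", "a polygon"), geometry = geom)
print(toy_sf)

# Reading existing data is usually a single call (hypothetical path):
# ind_sf <- st_read("path/to/india_states.shp")
```

In practice the hand-built route is mostly useful for small test objects like this one; real boundaries almost always come from a file.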
The shapefiles used in this demonstration can be found in this blog’s associated
GitHub repository.
NOTE:
In general, you can find administrative boundary data from GADM. It maintains
open-source, current administrative boundary data for most countries. It's possible to
download spatial data directly from the GADM website, but using the GADMTools
package helps ensure your workflow is reproducible. Specifying level = 1 returns
state-level boundaries.
LEARN MORE:
Robin Wilson’s website provides a great list of free GIS data sources,
covering both physical and human geography.
The code below will download shapefiles for Indian states. However, since Kashmir
isn't included in India’s borders, we'll use the shapefiles in the repository.
Inspecting Objects
Before working with this sf object, let's briefly compare the sf and sp packages.
If we convert the object to the SpatialPolygonsDataFrame class used by sp and inspect it,
we notice it has 5 "slots" (each prefaced by the @ symbol). The first slot should look
familiar. The data slot holds a dataframe with 36 observations of 8 variables. We can
extract any of these slots using the @ symbol like we'd normally do with the $ symbol.
The following slots hold polygons, plotOrder, bbox and proj4string. We'll return
to these in the context of sf objects. For now, just note that the data and aspects of
the object’s geometry are held in separate slots. This is not compatible with the
tidyverse-style workflow to which many of us have grown accustomed.
While we could continue working with this format, let’s convert it back to an sf object
with the st_as_sf() function and inspect the difference.
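The round trip between the two class systems can be sketched as follows. This is an illustration on a made-up two-polygon object rather than the lesson's India data, and it assumes the sp package is installed alongside sf.

```r
library(sf)

# Hypothetical toy data: two square "states" in longitude/latitude
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  state = c("A", "B"),
  geometry = st_sfc(square(75, 20), square(76, 20), crs = 4326)
)

# Convert to the older sp representation (requires the sp package)
toy_sp <- as(toy_sf, "Spatial")
class(toy_sp)
slotNames(toy_sp)  # slots are accessed with @
toy_sp@data        # the attribute data lives in its own slot

# Convert back: attributes and geometry share one dataframe again
toy_back <- st_as_sf(toy_sp)
class(toy_back)
```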
We could directly access information about the object’s spatial features with
functions like st_geometry_type(), st_dimension(), st_bbox() and st_crs().
Moreover, familiar functions like glimpse() or View() that we'd use to explore a
dataframe also work on sf objects.
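A short sketch of these inspection functions, run on a made-up two-polygon object (the names and coordinates are assumptions for illustration only):

```r
library(sf)
library(dplyr)

# Hypothetical toy data: two square "states" in longitude/latitude
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  state = c("A", "B"),
  geometry = st_sfc(square(75, 20), square(76, 20), crs = 4326)
)

st_geometry_type(toy_sf)  # geometry type of each feature
st_dimension(toy_sf)      # 0 for points, 1 for lines, 2 for polygons
st_bbox(toy_sf)           # bounding box of the whole object
st_crs(toy_sf)            # coordinate reference system
glimpse(toy_sf)           # ordinary dataframe tools work as well
```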
If you further inspect this object, you can see that it has a few attribute columns
giving an abbreviation and bounding box for each state. Most importantly, the last
column holds each state’s geometry in a list-column.
Manipulating sf Objects
Because spatial dataframes in the sf package are dataframes, we can manipulate
them using our normal data manipulation tools, such as dplyr. Here we'll just select
and rename the columns we want.
NOTE:
This doesn't explicitly select the geometry column, but the geometry in sf
objects is sticky. It remains in the object unless explicitly dropped with
ind_sf %>% st_set_geometry(NULL).
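The sticky geometry behaviour can be seen in a small sketch. The toy object and its GADM-style column names (NAME_1, TYPE_1) are assumptions for illustration, not the lesson's actual object.

```r
library(sf)
library(dplyr)

# Hypothetical toy data with GADM-style column names
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  NAME_1 = c("A", "B"),
  TYPE_1 = c("State", "State"),
  geometry = st_sfc(square(75, 20), square(76, 20), crs = 4326)
)

# Select and rename one column; geometry is not mentioned but comes along
selected <- toy_sf %>%
  select(state = NAME_1)
names(selected)
class(selected)

# Dropping the geometry has to be done explicitly
no_geom <- selected %>% st_set_geometry(NULL)
class(no_geom)
```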
For those already familiar with the tidyverse, using normal dplyr verbs to manipulate
sf objects is one of the great benefits of using the sf package. If we were working
with the slots of the earlier SpatialPolygonsDataFrame, this wouldn't be possible.
Moreover, note that these manipulations haven't affected the class of our object in
any way.
Since we're focusing on the process rather than the data itself, we'll use population,
economic, and region data from Wikipedia. If you are unfamiliar with data import
using the googlesheets package, web scraping with rvest, and wrangling with
dplyr, please refer to the prepare_data.R script in this GitHub repository to see
how the attributes.rds data set was assembled.
Once we've prepared an attributes dataframe, we can join it to the spatial dataframe
as with any two dataframes, as well as mutate two new variables.
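A minimal sketch of such a join, assuming a made-up attributes table keyed on a shared state column (the column names and figures are invented for illustration):

```r
library(sf)
library(dplyr)

# Hypothetical spatial dataframe
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  state = c("A", "B"),
  geometry = st_sfc(square(75, 20), square(76, 20), crs = 4326)
)

# Hypothetical attribute data, e.g. what readRDS() might return
attributes_df <- data.frame(
  state = c("A", "B"),
  pop = c(1000, 4000),
  area_km2 = c(10, 20)
)

# Join on the shared key, then derive two new variables
joined <- toy_sf %>%
  left_join(attributes_df, by = "state") %>%
  mutate(
    pop_density = pop / area_km2,
    pop_share = pop / sum(pop)
  )
joined
```

Keeping the sf object on the left side of the join is what preserves the sf class and geometry column in the result.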
If we inspect this object once more, we can see that it has all of the expected
attribute columns, and the last column holds each state’s geometry in a list. The
same spatial attributes regarding the object’s bounding box and CRS remain as well.
Calculating Area
Our attribute data already has a column for area, pulled from Wikipedia. However, if
this wasn't the case or we didn't trust the data, we could calculate the area of each
observation in our spatial dataframe using the st_area() function.
It's simple enough to do this, but we need to be careful with the units. In this case,
we need to convert from square meters to square kilometers.
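The unit handling can be sketched like this, using a made-up pair of one-degree squares rather than the real state boundaries:

```r
library(sf)

# Hypothetical toy data: two 1-degree squares in longitude/latitude
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  state = c("A", "B"),
  geometry = st_sfc(square(75, 20), square(76, 20), crs = 4326)
)

# st_area() returns a "units" vector, in square meters for this data
area_m2 <- st_area(toy_sf)
class(area_m2)

# Convert to square kilometers while keeping the units class
area_km2 <- units::set_units(area_m2, km^2)

# Strip the units class when a plain numeric column is needed
toy_sf$calc_area_km2 <- as.numeric(area_km2)
toy_sf
```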
LEARN MORE:
See this guide on units of measurement for R vectors for more information
on unit conversion in R.
In the output below, note the difference between the simple numeric class of
area_km2 and the class of "units" for the area calculation resulting from st_area().
Moreover, we can see that the two figures are close but not exactly the same.
Simplifying Geometry
Before plotting sf objects (the subject of the next lesson), we should simplify the
polygons in the spatial dataframe.
For simple maps, there's no need to have the fine level of detail that comes with the
GADM data or many other sources of geospatial data. Simplification can vastly
reduce memory requirements while sacrificing very little in terms of visual output.
Fortunately, there's an easy process to reduce the number of vertices in a polygon
while retaining the same visible shape.
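A rough sketch of the idea, assuming the rmapshaper package is installed. The input here is a made-up circular polygon with many vertices rather than the GADM boundaries, and the projected CRS (EPSG:32643) is chosen only for illustration.

```r
library(sf)
library(rmapshaper)

# Hypothetical detailed polygon: a buffered point with about 1,000 vertices
pt <- st_sfc(st_point(c(500000, 2000000)), crs = 32643)
detailed <- st_sf(id = 1, geometry = st_buffer(pt, dist = 10000, nQuadSegs = 250))

# Keep roughly 1% of the vertices; keep_shapes stops small polygons vanishing
simplified <- ms_simplify(detailed, keep = 0.01, keep_shapes = TRUE)

# Compare vertex counts and in-memory size before and after
n_before <- nrow(st_coordinates(detailed))
n_after <- nrow(st_coordinates(simplified))
c(before = n_before, after = n_after)
object.size(detailed)
object.size(simplified)
```

The keep value is a trade-off to tune by eye: too low and small or narrow features start to lose their shape.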
NOTE:
We also stripped the units class for the area we calculated because it
created a problem for ms_simplify(). We can always add it after
simplification.
The original map is above on the left, and the simplified version is on the right. The
simplified version looks no different despite having only 1% of the vertices. In fact, it
looks even better because the border lines are cleaner.
Moreover, simplification reduced the geometry size from 9.56 MB to just 150 KB.
Finally, let’s save the simplified spatial dataframe for the next lesson.
Final Thoughts
After briefly introducing the context of using R as a GIS, this lesson showed how the
sf package creates a class structure for storing geospatial and attribute data
together in an object that fits into a tidyverse workflow. We can clearly see the
benefits of this structure when it comes to manipulating a spatial dataframe with our
familiar dplyr verbs.
Now that you know how to manipulate geospatial data, the natural next step is
visualization or mapping. Here again, we'll see the benefit of a tidy workflow, now
that ggplot2’s geom_sf() is available to us. Though, as you'll see in the next
lesson, ggplot2 is just one of many excellent package options when it comes to
visualizing geospatial data in R.
This lesson was written by Sean Angiolillo and was last updated on 29 Jan. 2019.
This lesson introduces how to use some of the most well-known R packages to
create static maps, such as tmap and ggplot2. We’ll also explore a few other
packages like cartogram, geogrid and geofacet for some more unique spatial
visualizations. (We'll tackle creating animated and interactive maps in the next
lesson.)
Below are two excellent open source resources on the principles of data
visualization. Both include chapters on geospatial data visualization.
Here are two resources focused more narrowly on the mechanics of mapping
specifically in R, rather than larger principles of good design:
Before choosing a visualization, we should be sure about the nature of our data. Is
the data numeric? And if so, is it a raw count, such as population, or is it
standardized, such as population density? If the data isn't numeric, is it nominal (or
categorical), such as linguistic or religion data, or ordinal, such as satisfaction
rankings?
One point Healy makes clear in his book is that it's important to consider whether or
not a truly geospatial visualization is the best choice for your data. In our case, and
in many cases concerning choropleths, the data is only partly geospatial — it really
represents counts of some value in an arbitrary unit.
The spatial object we created in the previous lesson has attributes like population,
GDP, and sex ratio. It is certainly possible to visualize this data through barplots,
ignoring the data’s geospatial qualities. Alternatively, instead of ignoring the
geospatial elements, we could show some of this information through a proxy — for
example, mapping different colors to a variable like region.
Nevertheless, because of its simplicity, we'll use state-level data to test out different
approaches for geospatial visualization. With this goal in mind, hopefully none of the
visualizations below are "bad", but whether or not they are the "best" visualization for
this data would depend on the specific objectives at hand.
Static Maps
Although a number of R packages have made it easy to make attractive interactive
and animated maps, they haven't removed the need for effective static maps. This
section looks at how to create static maps in base R, tmap, and ggplot2.
NOTE:
While tmap and ggplot2 are two of the most popular packages for creating
maps, they aren't the only options. The cartography package is another
interesting tool, particularly for certain kinds of maps, such as choropleths
contained in proportional symbols. See the package vignette and cheat
sheet to get started.
Base Plotting
As demonstrated in the previous lesson's plots of geometry, the sf package provides
a plot() method for visualizing geographic data. Plotting the object itself will
produce a grid of faceted plots, one for each attribute. Choosing a variable produces
a single map.
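The difference between the two calls can be sketched on a made-up object with two numeric attributes (names and values are assumptions for illustration):

```r
library(sf)

# Hypothetical toy data with two attributes
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  pop = c(1000, 4000, 2500),
  gdp = c(10, 30, 45),
  geometry = st_sfc(square(75, 20), square(76, 20), square(75, 21), crs = 4326)
)

# Plotting the whole object draws one panel per attribute
plot(toy_sf)

# Subsetting to one column draws a single choropleth
plot(toy_sf["pop"])

# Plotting only the geometry draws the outlines without any fill variable
plot(st_geometry(toy_sf))
```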
This plot demonstrates how quick and easy plotting base maps can be, but there are
reasons why this default choropleth may not be an effective visualization. Hopefully,
by the end of this lesson, it will be clear why and what other types of visualization
may be more effective in this case.
The tmap package brings a ggplot2-style syntax tailored to geospatial data. Like ggplot2, it
emphasizes sequentially adding layers to a plot. You can pass a spatial dataframe to
the tm_shape() function much like you'd pass a dataframe to the ggplot() function.
Moreover, because spatial dataframes in the sf package are also dataframes, you
can filter out any particular features (like "Andaman & Nicobar Islands" below) and
directly proceed with piping the object into a tm_shape() chain.
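A minimal sketch of that filter-then-map pattern, using made-up data. The state names and the gdp_density column are assumptions, and only the basic tm_shape() and tm_polygons() calls are used because styling arguments differ between tmap versions.

```r
library(sf)
library(dplyr)
library(tmap)

# Hypothetical toy data: two mainland squares and one distant "island"
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  state = c("A", "B", "Island"),
  gdp_density = c(1.2, 3.4, 0.5),
  geometry = st_sfc(square(75, 20), square(76, 20), square(92, 10), crs = 4326)
)

# Filter like a dataframe, then pipe straight into the tmap layers
gdp_map <- toy_sf %>%
  filter(state != "Island") %>%
  tm_shape() +
  tm_polygons("gdp_density")

# Printing the object draws the map:
# print(gdp_map)
```

The pipe binds more tightly than +, so the filtered data reaches tm_shape() before the polygon layer is added.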
After filtering out union territories, the choropleth below maps India’s GDP density, a
measure of economic activity by area. Measured here in units of nominal GDP per
square kilometer, GDP density has no clear midpoint, and so it requires a sequential
color scale as opposed to a diverging or categorical color scale.
LEARN MORE:
There is a great deal of theory and advice about using color in data
visualization, including maps. Wilke’s book in particular has excellent
chapters on color scales and color pitfalls.
For example, if we filter out the small union territories to get a fairer distribution, we
can separately create tmap objects of population growth and density. Then we can
arrange them next to each other for comparison.
Like GDP density, values like population growth and population density are
standardized data, and so they are well-suited for a choropleth.
Inset Maps
tmap is also particularly useful for creating inset maps, those that include a small
window providing the wider geographic context for the main map.
The first step is creating a base or primary map. We have done this for sex ratio in
Northeast India.
NOTE:
One trick from tmap_tricks() is reversing the color scale by placing a - in
front of the palette name. This makes sense because our concern should
increase as sex ratio decreases.
Next, we created the smaller inset map, which will provide the wider geographic
context. For the small map, we wanted to highlight the Northeast region on the larger
map of India.
In order to do this, we first grouped the features by region. Using the same dplyr
syntax, we can reduce the 36 features to 8 regions.
These 8 regional features have a geometry reflecting the "sum" of their individual
sub-components. It is quite interesting that this kind of geometric operation can be
done so easily, with st_union() doing the work behind the scenes.
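The grouping step can be sketched as follows on made-up data, where three toy "states" collapse into two regions (the names and values are assumptions for illustration):

```r
library(sf)
library(dplyr)

# Hypothetical toy data: A and B are adjacent and share a region
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  state = c("A", "B", "C"),
  region = c("North", "North", "South"),
  pop = c(1000, 4000, 2500),
  geometry = st_sfc(square(75, 20), square(76, 20), square(75, 15), crs = 4326)
)

# Summarising by group also merges the member geometries of each group
regions <- toy_sf %>%
  group_by(region) %>%
  summarise(pop = sum(pop))
regions
```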
Once we have a base map and an inset map, we can combine them with the
following syntax, using some trial and error to get the placement right.
Faceted Maps
tmap also supports the creation of faceted maps, or small multiples. They can be
useful for attributes with a fairly small number of levels. For instance, if we have
population data for a few years, we could show a progression over time. In this case,
region is a useful variable for faceting. Splitting up a map by region can sometimes
highlight contrasts better than looking at one image of the entire area.
The free.coords argument controls whether to show only the faceted map area or
instead highlight the facet’s place in the original map.
There is no inherent order to regions, but it's useful to impose one. Below we've
ordered the facets in a roughly counter-clockwise order starting from "Northern". To
do this, it helps to first make region an ordered factor.
It's also important to pay attention to the nature of the distribution before making any
plot. In India, per capita GDP is highly skewed because of outliers like Goa and
Delhi. If you map this data on a linear scale, most states may end up the same color.
This will conceal important differences in the bulk of the data.
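The effect of skew on class breaks can be illustrated with base R alone. The values below are invented, with two large outliers standing in for places like Goa and Delhi.

```r
# Hypothetical per capita GDP values with two large outliers
gdp_pc <- c(600, 700, 800, 900, 1000, 1100, 1200, 1500, 5000, 5500)

# Equal-width classes: the outliers stretch the range, so most values
# land in the first class and would share a single color
equal_bins <- cut(gdp_pc, breaks = 5)
table(equal_bins)

# Quantile classes: each class holds the same number of observations
q_breaks <- quantile(gdp_pc, probs = seq(0, 1, by = 0.2))
quantile_bins <- cut(gdp_pc, breaks = q_breaks, include.lowest = TRUE)
table(quantile_bins)

# A log transform is another way to spread out the bulk of the data
round(log10(gdp_pc), 2)
```

Mapping packages generally offer comparable classification options, so the same comparison can be made directly on a map.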
NOTE:
So far, all of our maps have been choropleths. This was convenient because our
data was always standardized in some way — a density, percentage or ratio for
example. Choropleths, however, are poorly suited to raw count data. When dealing
with count data, such as population, a proportional symbols map can be more
effective.
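A proportional symbols map can be sketched in tmap roughly as follows. The data is made up, and only basic tm_borders() and tm_bubbles() calls are used since styling arguments vary across tmap versions.

```r
library(sf)
library(tmap)

# Hypothetical toy data with a raw count variable
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  state = c("A", "B", "C"),
  pop = c(1000, 4000, 2500),
  geometry = st_sfc(square(75, 20), square(76, 20), square(75, 21), crs = 4326)
)

# Draw the outlines, then one symbol per feature sized by the count
pop_map <- tm_shape(toy_sf) +
  tm_borders() +
  tm_bubbles(size = "pop")

# Printing the object draws the map:
# print(pop_map)
```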
The proportional symbols map retains the original geography, only obscured by the
top layer of symbols. The symbols retain the correct spatial arrangement and are
easy to interpret in relationship to each other. Still, judging the area of circles is more
difficult than reading a non-spatial representation, such as a barplot.
geom_sf in ggplot2
Hopefully these examples have demonstrated that tmap is a robust mapping tool. At
the same time, the addition of geom_sf has made ggplot2 another attractive option.
ggplot2 requires tidy data. Since spatial dataframes defined in the sf package are
dataframes, it makes sense that we could expect to use ggplot2 to visualize sf
objects. Recently, ggplot2 added support for sf objects with geom_sf(). The key
advantage of geom_sf() is that tidyverse users are already familiar with ggplot2
and its wider ecosystem of add-on packages.
However, as this is a recent addition, you might expect a few bugs. For example, in
the above faceted tmap object, setting free.coords = FALSE allowed for the entire
object to be plotted in each facet. At this time, faceting an sf geom doesn't seem to
allow setting scales = "free" to allow a similar outcome.
Nevertheless, there are many benefits and cases where we can visualize sf objects
with ggplot2. For instance, ggplot2 users will be familiar with the process of
mapping data from dataframes to aesthetics, and layering additional dataframes on
top of a plot. That same workflow holds for plotting sf objects.
In the plot below, we want to add only the state name "Kerala" to the map. We could
have done it with the annotate() function, but instead we created an sf object (also
a dataframe) holding only the feature we wanted to annotate (Kerala).
In order to do this successfully, however, we first need to find the geographic center
of Kerala to know the point from which to draw the label. Geometric operations like
calculating centroids, buffers and distance require a projected CRS as opposed to a
geographic CRS, so below we first reproject the data using st_transform().
Using this CRS, we were able to use st_transform() to project both sf objects
onto the same projected CRS. Once that was done, we could add the Kerala label
using geom_text_repel() like we'd normally do in ggplot2.
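The reproject, centroid, and label sequence can be sketched as follows. This uses made-up squares instead of the India data, EPSG:32643 as an arbitrary projected CRS, and plain geom_text() in place of geom_text_repel() to avoid an extra dependency; all of these are assumptions for illustration.

```r
library(sf)
library(ggplot2)

# Hypothetical toy data in a geographic CRS (longitude/latitude)
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  state = c("A", "B"),
  geometry = st_sfc(square(75, 20), square(76, 20), crs = 4326)
)

# Reproject before any geometric calculation
toy_proj <- st_transform(toy_sf, 32643)

# Centroid of the single feature we want to label
label_sf <- toy_proj[toy_proj$state == "B", ]
centroid_xy <- st_coordinates(st_centroid(st_geometry(label_sf)))
label_df <- data.frame(
  state = label_sf$state,
  x = centroid_xy[, "X"],
  y = centroid_xy[, "Y"]
)

# Layer the label on top of the polygons
p <- ggplot() +
  geom_sf(data = toy_proj, fill = "grey90") +
  geom_text(data = label_df, aes(x = x, y = y, label = state))
```

Because the label coordinates and the polygons share the same projected CRS, the text lands at the centroid without further conversion.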
Proportional symbols maps are not the only option for raw count data. A dot density
map can be an effective tool to spatially visualize count data, particularly when your
goal is to find clusters and regional patterns instead of exact data values.
Below we've created a dot density plot comparing rural and urban populations. To do
this, first, we depart from a tidy data format and gather() urban and rural population
data. Then we use the st_sample() function to draw sample points based on the
respective urban and rural population data for each observation.
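A simplified sketch of the sampling step follows. It loops over the two population columns instead of reshaping with gather(), and the toy polygons, population figures, and the one-dot-per-100-people ratio are all assumptions for illustration.

```r
library(sf)

set.seed(1)  # sampling is random, so fix the seed for repeatability

# Hypothetical toy data in a projected CRS (coordinates in meters)
square <- function(x, y, size = 10000) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_sf <- st_sf(
  state = c("A", "B"),
  urban = c(2000, 500),
  rural = c(1000, 4000),
  geometry = st_sfc(square(500000, 2000000), square(510000, 2000000), crs = 32643)
)

people_per_dot <- 100

# Draw random points inside each polygon, one batch per population type
dots <- do.call(rbind, lapply(c("urban", "rural"), function(type) {
  pts <- st_sample(toy_sf, size = round(toy_sf[[type]] / people_per_dot))
  st_sf(type = rep(type, length(pts)), geometry = pts)
}))

table(dots$type)
```

The resulting points can then be drawn with any of the plotting approaches above, colored by type.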
This type of visualization would be much more effective if we had data at smaller
levels of administration, such as districts. Instead, because we are sampling at the
state level, our dots will be placed in locations counter to the actual population
density. For example, Maharashtra’s urban population will be randomly spread
throughout the state instead of clustering in metros like Mumbai. Nevertheless, it's
still useful to see how such a map can be created.
While we can't see population clusters around metros, we can still see relatively
sparsely populated areas (Jammu & Kashmir and the Northeast), the extremely
dense rural belt of Uttar Pradesh and Bihar, and the relatively more urban South
India.
Moreover, as we'll see in the next lesson, this is one example of a visualization
where adding interactivity — specifically, the ability to separately plot urban and rural
data — can be a real benefit.
LEARN MORE:
For more details on generating dot density plots in R, see these excellent
blogs from Tarak and Paul Campbell.
state) and its population. When data is only partially spatial, a map may not always
be the best visualization, depending on your objectives.
If you don't need a fully spatial representation of your data, there are other
visualization options that communicate some spatial aspects of the data, but diverge
in some aspect or another. Examples include cartograms, hexbin maps and
geofaceted plots.
Cartograms
In a cartogram, we maintain the overall geospatial nature of an object, but distort
the area of each observational unit so that each unit is scaled proportional to some
chosen variable.
While still maintaining the overall geographic structure of India, we can vividly see
which states shrink or expand when distorting geographic area by the size of
nominal GDP. (For the same reasons discussed above, we need to use a projected
CRS rather than a geographic one.)
Given that cartograms are already distorting the actual geographic shapes, it's often
not necessary to keep one contiguous unit. In such cases, a non-contiguous
cartogram may be preferable. Alternatively, we can go even more abstract and
replace all geographic shapes with a simple circle, scaled to the parameter of
interest (a Dorling cartogram).
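The three cartogram variants described above can be sketched with the cartogram package. The toy polygons, the gdp weights, and the projected CRS (EPSG:32643) are assumptions for illustration.

```r
library(sf)
library(cartogram)

# Hypothetical toy data in a projected CRS (coordinates in meters)
square <- function(x, y, size = 10000) {
  st_polygon(list(rbind(c(x, y), c(x + size, y), c(x + size, y + size),
                        c(x, y + size), c(x, y))))
}
toy_proj <- st_sf(
  state = c("A", "B", "C"),
  gdp = c(1, 4, 2),
  geometry = st_sfc(square(500000, 2000000), square(510000, 2000000),
                    square(520000, 2000000), crs = 32643)
)

# Contiguous cartogram: neighbors stay attached while areas are distorted
carto_cont <- cartogram_cont(toy_proj, weight = "gdp", itermax = 5)

# Non-contiguous cartogram: each polygon is rescaled in place
carto_ncont <- cartogram_ncont(toy_proj, weight = "gdp")

# Dorling cartogram: each polygon is replaced by a scaled circle
carto_dorling <- cartogram_dorling(toy_proj, weight = "gdp")

# Each result is an sf object and can be plotted like any other
plot(carto_cont["gdp"])
```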
Hexbin Maps
Similar to a Dorling cartogram, a hexbin map also replaces exact spatial boundaries
with a rough spatial arrangement. However, instead of mapping the variable of
interest to size, it's mapped to color.
Hexbin grids for the United States and a few other countries are well established, but
geogrid is a new package under development that tries to generate automatic
hexbin grids given any set of geospatial polygons. Although the package lets you
generate a number of possible grids and select the best option, we had trouble
generating a map that adequately placed certain states, the Northeast and
non-contiguous territories in particular.
Nevertheless, the hexbin map below gives a sense of why reducing geospatial
polygons to a hexagon can be more useful, in certain cases, than the original
geometry.
The downside of this visualization is that it gives equal area to all states. Tiny union
territories are represented by the same area as Uttar Pradesh. However, if we
understand that context in advance, this can be a useful visualization if we only want
a distribution of per capita GDP across states, regardless of population or area.
Geofaceted Plots
Similar to a hexbin map, a geofaceted plot sacrifices exact spatial characteristics in
favor of a loose spatial arrangement. It strongly prioritizes accurate presentation of
the attribute data at obvious cost to the geospatial representation.
The geofacet package makes it easy to design a custom grid and use it to facet
data across the grid.
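A custom grid for geofacet can be sketched as a small data frame that places each region in a row and column. The grid and the population figures below are hypothetical and only illustrate the idea; they are not the lesson's Indian state grid.

```r
library(ggplot2)
library(geofacet)

# Hypothetical grid: each region is assigned a (row, col) cell.
my_grid <- data.frame(
  name = c("North", "West", "East", "South"),
  code = c("N", "W", "E", "S"),
  row  = c(1, 2, 2, 3),
  col  = c(2, 1, 3, 2),
  stringsAsFactors = FALSE
)

# Made-up attribute data: urban and rural population per region.
pop <- data.frame(
  name  = rep(my_grid$name, each = 2),
  area  = rep(c("urban", "rural"), times = 4),
  value = c(3, 7, 5, 5, 2, 8, 6, 4)
)

# facet_geo() lays out one small bar chart per region, following the grid.
p <- ggplot(pop, aes(x = area, y = value, fill = area)) +
  geom_col() +
  facet_geo(~ name, grid = my_grid)
p
```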
Although it doesn't look much like an Indian state map anymore, this visualization
does vividly communicate the vast differences in urban and rural populations across
states. It also highlights how the choice of visualization affects our interpretation.
NOTE:
The dot density map created above comes from the exact same data.
Final Thoughts
With the help of the packages tmap, ggplot2, cartogram, geogrid and geofacet, this
lesson has introduced some of the most common methods for creating various kinds
of static geospatial visualizations in R, such as choropleths, dot density maps and
cartograms.
Interested in going beyond static maps and exploring the world of animated or
interactive maps? Check out the next lesson in this course.
This lesson was written by Sean Angiolillo and was last updated on 29 Jan. 2019.
Now that we've completed an overview of static mapping in R in the previous lesson,
let’s explore how to create animated and interactive maps. We'll then conclude by
creating a Shiny app to show the potential of visualizing geospatial data through
interactive web applications in R.
Adding animation or interaction to a static map creates opportunities for stories and
experiences that are not otherwise possible in a static world. Despite this power, it’s
important to introduce these elements in the right circumstances. Animated and
interactive maps both demand a higher level of attention from the user; without this,
the visualization won't be as clear or meaningful.
Animated Maps
Animated maps are particularly well-suited for spatio-temporal data as they can
show change of a variable over time, but they certainly have other uses as well.
This lesson introduces two methods for making animated maps: animated tmaps and
the gganimate package. We'll also need the packages and objects from the
previous lesson.
We can see an example of how this works by turning the earlier faceted plot of per
capita GDP by region into an animated gif. Instead of showing all facets at once in a
grid, we've looped each image into a gif, displaying them one at a time.
Let's mimic an example that Pedersen used in his keynote. After binding together
transformed data sets into one object, we can achieve the animation below with just
one additional line to our plot: transition_states(), which mimics
Instead of visualizing per capita GDP as the previous animation did, let’s visualize
nominal GDP. Each frame may not be the most effective visualization, but animating
them together is certainly attention-grabbing.
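The gganimate approach can be sketched with a toy data set. The points and the two layout names below are hypothetical; the only parts taken from the lesson are the pattern of binding several versions of the data into one object and adding transition_states() to the plot.

```r
library(ggplot2)
library(gganimate)

# Hypothetical data: the same four units in two different layouts.
geographic <- data.frame(id = 1:4, x = c(1, 2, 3, 4), y = c(1, 3, 2, 4),
                         state = "geographic")
gridded    <- data.frame(id = 1:4, x = c(1, 2, 1, 2), y = c(1, 1, 2, 2),
                         state = "grid")
both <- rbind(geographic, gridded)

# The one extra line, transition_states(), tells gganimate which column
# defines the frames to move between.
anim <- ggplot(both, aes(x, y, group = id)) +
  geom_point(size = 4) +
  transition_states(state, transition_length = 2, state_length = 1)

# animate(anim) renders the frames; anim_save("map.gif") would then
# write the last rendered animation to disk.
```

Animating sf polygons this way generally needs extra support for morphing shapes, so this sketch sticks to points.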
Interactive Maps
Interactive maps have perhaps even greater storytelling potential than animations as
they allow the user, in some respects, to create their own narrative. Certain features,
such as panning and zooming, allow a level of freedom just not possible with a static
or even animated map.
If you need interaction, there are many levels of interaction to consider. In the case
of choropleths, often a simple tooltip with the exact value the color represents can
add value. But, of course, interaction can accomplish much more. For example, we
can change base maps to plot different kinds of geography. We can let users select
and filter their own data to be plotted. Users can also modify other aspects of a plot,
such as the color scheme or statistical transformations. Interactive maps also give
users opportunities for brushing and linking. Based on user input or selection, we
can design a reaction in another view. These are just a few of the possibilities, and
so it's important to think carefully about what you actually need.
For simple interactivity, you might try the ggiraph package from David Gohel. The
basic idea is to pass ggplot() an interactive geom in place of a traditional geom. In
the case of maps, this means using geom_sf_interactive() instead of geom_sf().
After specifying additional arguments like tooltip, onclick and data_id, simply
call ggiraph() on the saved ggplot object (in current versions of the package, this
function is named girafe()).
The code below adds a simple tooltip that includes state name and sex ratio when
hovering over the previous sex ratio map.
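As a rough sketch of this pattern (not the lesson's original code), the example below uses the North Carolina sample data bundled with sf in place of the sex ratio map, and assumes a recent ggiraph release in which the widget is created with girafe().

```r
library(sf)
library(ggplot2)
library(ggiraph)

# Assumption: sample data shipped with sf stands in for the Indian states.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# The interactive geom takes the usual aesthetics plus tooltip and data_id.
p <- ggplot(nc) +
  geom_sf_interactive(
    aes(fill = BIR74, tooltip = paste0(NAME, ": ", BIR74), data_id = NAME)
  )

# girafe() turns the saved ggplot object into an interactive htmlwidget.
widget <- girafe(ggobj = p)
widget
```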
Interactive tmaps
One of the strongest reasons to add interactivity to a static map is the ability to layer
geospatial data (such as points or polygons) on top of base maps that depict
physical or human geography. Perhaps the easiest way to achieve this advantage is
by plotting tmap objects in tmap's "view" mode.
By default, tmap_mode() is set to "plot", but changing this argument to "view" can
make any tmap object an interactive plot. This option builds on top of Leaflet, which
we'll cover below.
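Switching modes can be sketched in a few lines. The data here is the sample shapefile bundled with sf, used as an assumption in place of the lesson's objects.

```r
library(sf)
library(tmap)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

tmap_mode("view")   # switch from the default "plot" mode to interactive
m <- tm_shape(nc) + tm_polygons("BIR74")
m                   # printing in "view" mode produces a Leaflet widget

tmap_mode("plot")   # switch back for static output
```

The map definition itself does not change between modes; only tmap_mode() does.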
markers, polygons and popups. It is also very easy to embed Leaflet map widgets
into RMarkdown documents, web pages or Shiny apps.
One of the key advantages of Leaflet is the ability to draw on a huge range of map
tiles (which might include roads or natural features) that can lie underneath your data
set. Moreover, if you want to represent layers of spatial data, for instance as points
on top of polygons, then Leaflet is there for you.
The ability to interact with layers of spatial data comes in handy for enhancing our
earlier dot density map of rural and urban population. To address the overplotting of
dots, it can be useful to toggle between urban and rural populations, as shown
below.
In order to achieve this effect in Leaflet, we first need to use the st_cast() function
to convert our earlier multi-points into individual points because Leaflet doesn't
support multi-point objects at this time. We also need to transform the projected CRS
to a geographic CRS (longitude and latitude coordinates) for plotting in Leaflet. Then
we establish each respective layer as a "group". Remember, here one dot equals
one lakh (100,000) people.
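The steps above can be sketched as follows. The dots are randomly generated and the projected CRS (UTM zone 43N) is an assumption chosen for illustration; only the sequence of st_cast(), st_transform() and grouped layers follows the text.

```r
library(sf)
library(leaflet)

# Hypothetical dots in a projected CRS, stored as one MULTIPOINT per layer.
set.seed(42)
make_dots <- function(n) {
  coords <- cbind(runif(n, 700000, 800000), runif(n, 1400000, 1500000))
  st_sfc(st_multipoint(coords), crs = 32643)
}
urban_mp <- make_dots(20)
rural_mp <- make_dots(60)

# Split each MULTIPOINT into individual POINTs, then move to longitude/latitude.
to_leaflet_points <- function(mp) {
  pts <- st_cast(mp, "POINT")
  st_sf(geometry = st_transform(pts, 4326))
}
urban <- to_leaflet_points(urban_mp)
rural <- to_leaflet_points(rural_mp)

# Each layer gets a group name, and the layers control toggles the groups.
map <- leaflet() %>%
  addTiles() %>%
  addCircleMarkers(data = urban, radius = 2, color = "red", group = "Urban") %>%
  addCircleMarkers(data = rural, radius = 2, color = "blue", group = "Rural") %>%
  addLayersControl(overlayGroups = c("Urban", "Rural"))
map
```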
LEARN MORE:
● This blog post covers visualizing geospatial data with sf and plotly.
NOTE:
plotly can also be used for brushing and linking views. Brushing refers to
subsetting data based on some kind of user input like a box selection. This user
selection is then linked to a part of the visualization, which reacts to the selection.
For example, we can design visualizations with multiple views that interact with each
other. The crosstalk package lets HTML widgets share data and "talk" to each
other. Taking advantage of the crosstalk package, plotly is able to "link views".
The crosstalk::bscols() function arranges HTML elements or widgets in
Bootstrap columns. This allows for persistent or generalized selection across
elements.
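A minimal sketch of linked views with crosstalk and plotly might look like this. The data frame is invented for illustration, and the choice of a scatter plot next to a bar chart is an assumption rather than the lesson's actual layout.

```r
library(plotly)
library(crosstalk)

# Hypothetical state-level data; the values are made up.
states <- data.frame(
  state     = c("A", "B", "C", "D", "E"),
  sex_ratio = c(880, 920, 950, 990, 1020),
  literacy  = c(70, 75, 80, 88, 93)
)

# Wrapping the data in SharedData lets both widgets react to one selection.
shared <- SharedData$new(states, key = ~state)

scatter <- plot_ly(shared, x = ~literacy, y = ~sex_ratio, text = ~state,
                   type = "scatter", mode = "markers")
# Box or lasso selections are kept and accumulated (persistent selection).
scatter <- highlight(scatter, on = "plotly_selected", persistent = TRUE)

bars <- plot_ly(shared, x = ~state, y = ~sex_ratio, type = "bar")

# Arrange the two linked views side by side in Bootstrap columns.
linked <- bscols(scatter, bars)
linked
```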
We can even aggregate brush selections into a "persistent selection". For instance,
we can examine the distribution of a variable like sex ratio in a histogram; then select
points at the low or high end of the distribution, and watch the map highlight which
states fall into the selected bin(s).
There may be simpler ways to more effectively communicate this data, but it helped
us get an idea of what is possible with plotly.
As shown in this gallery, there are now more than 100 registered HTML widgets for
R. Communicating the full extent of their capabilities is often best done through a
web application. In general, a complete application affords more possibilities that
aren't possible with only the techniques above, such as downloading output after
some kind of interaction.
Building a complete web application is the final step for interactivity. Within R, Shiny
is the way to achieve this. Shiny is an R package that makes it easy to build
interactive web apps straight from R without any knowledge of web development
languages like HTML, CSS or JavaScript.
Beyond these minimum two files, larger projects often involve a few other important
components. One is a separate data folder that holds all of the data read into the
app. Another is a file, perhaps named global.R, that reads in data files, sets global
variables, and contains functions to be used in server.R.
Lastly, you might add a styles.css file for custom styling. I chose to add
includeCSS("styles.css") inside the header tag within my ui.R file. This allowed
me to override any of the app's default styling in a separate file without distracting
from the structure of ui.R.
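To make the division of labour concrete, here is a single-file sketch that marks which part would live in which file. It is an assumed toy app with three hand-entered city coordinates, not the app described in this lesson.

```r
library(shiny)
library(leaflet)

# global.R in a larger app: load data and define helpers.
cities <- data.frame(
  name = c("Delhi", "Mumbai", "Bengaluru"),
  lng  = c(77.21, 72.88, 77.59),
  lat  = c(28.61, 19.08, 12.97)
)

# ui.R: layout and inputs. A stylesheet could be attached here with
# tags$head(includeCSS("styles.css")) if such a file exists.
ui <- fluidPage(
  selectInput("city", "City", choices = cities$name),
  leafletOutput("map")
)

# server.R: reactive logic that redraws the map when the input changes.
server <- function(input, output, session) {
  output$map <- renderLeaflet({
    chosen <- cities[cities$name == input$city, ]
    leaflet(chosen) %>%
      addTiles() %>%
      addMarkers(lng = ~lng, lat = ~lat, popup = ~name)
  })
}

app <- shinyApp(ui, server)
# runApp(app) would start the app locally.
```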
LEARN MORE:
After you get a handle on these concepts, it can be helpful to view the code
for mapping applications in Shiny. Examples include those found in the
Leaflet documentation, Geocomputation with R, tmap, and the Shiny gallery
itself. The Shiny documentation also includes an example of an interactive
choropleth. Finally, this blog post is also aimed at beginners starting to learn
Shiny and Leaflet.
In the Shiny app below, you can construct a choropleth of any variable in the data
set for any subset of India’s regions. You can also compare how this choropleth
changes across a number of geographic representations, such as cartograms and
hexbin maps. Further, you can cross-check the data presented in the choropleth with
its corresponding dotplot and table in the adjacent tabs.
LEARN MORE:
Check out Sean Angiolillo's R User Meetup talk and blog about how he built
a much more complex Shiny app — an interactive data visualization of the
1991-2011 Indian Census data depicting “Households Classified by Source
and Location of Drinking Water and Availability of Electricity and Latrine”.
The gif below also shows some of the visualizations the app can generate.
As far as what Shiny is capable of, this app is still quite simple, but hopefully it
demonstrates some of Shiny’s potential for communicating interesting geospatial
data stories.
Final Thoughts
This lesson and the previous one together have only scratched the surface of R’s
mapping capabilities. R is well-known for its visualization libraries, and this reputation
holds for geospatial data as well. Whether you're creating static, animated or
interactive maps, there's an R package ready to help you create high-quality
visualizations. Just remember to keep your objectives in mind before designing a
static, animated or interactive map.
This lesson was written by Sean Angiolillo and was last updated on 29 Jan. 2019.
However, we've yet to really do anything useful with the actual geometry of our
geospatial data. In this lesson, we'll introduce spatial subsetting, an important
family of operations applicable to geospatial data.
To take an example using our previous data set of Indian states, we might wish to
filter for only states that share a border with Delhi NCR. Or, rather than filtering by
attributes like states or districts, we may only care about states or districts within a
certain distance from a particular point. Spatial subsetting operations allow us to
perform these kinds of manipulations.
Topological Relations
Many types of spatial subsetting operations are available at our fingertips. Different
types of spatial relations are more formally called topological relations. The two
examples given above describe different topological relations. The former is looking
for a common border, or perhaps areas that "touch", whereas the latter is looking for
areas "within" another area.
These functions require a pair of sf geometry sets: a target object and a selecting
object. Before diving into the specific syntax, let’s first get a sense of how these
relations are defined.
Preparing Data
Before diving in to the syntax of spatial subsetting, we need some sample data. We'll
use the tidycensus and tigris packages to download median household income
data for the Philadelphia metro area at the census tract level.
NOTE:
In the US Census hierarchy, census tracts are below counties and above
block groups. See Kyle Walker's tigris slides for more information.
We now have 1,186 census tracts covering the Philadelphia metropolitan area. This
is a larger area than we want to cover so we'll spatially subset this data set to a
smaller area based on distance from a central point of interest, in this case
Philadelphia’s City Hall.
Before we can do this, however, it's important to pay attention to the coordinate
reference system (CRS) of our geospatial data. The commands below show that the
data has a geographic CRS with EPSG code 4269.
In order to use spatial subsetting operations, we need to reproject our data from a
geographic CRS to a projected CRS. In this case, we’ve chosen to use EPSG code
2272.
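Checking and changing a CRS with sf can be sketched on a single point. The coordinates below are an approximate, assumed location for Philadelphia's City Hall; the two EPSG codes are the ones named in the text.

```r
library(sf)

# Approximate longitude/latitude of City Hall (assumed value), in NAD83.
city_hall <- st_sfc(st_point(c(-75.1636, 39.9524)), crs = 4269)

st_crs(city_hall)$epsg      # EPSG code of the current CRS
st_is_longlat(city_hall)    # TRUE for a geographic CRS

# Reproject to NAD83 / Pennsylvania South, whose units are US survey feet.
city_hall_proj <- st_transform(city_hall, 2272)
st_is_longlat(city_hall_proj)
```

The same st_transform() call works unchanged on a whole sf data frame of census tracts.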
Now that we have projected census tracts, we'll define a circle and use it as the
second geometry feature set by which we'll subset the census tracts.
As shown in the map below, with these two simple feature geometry sets in the
same projected CRS, we are ready to spatially subset.
The syntax is remarkably simple with the square bracket method. It's very similar to
bracket subsetting of a dataframe. But inside the square bracket, where a logical
expression would filter rows, you just need to place the selecting simple feature
geometry (i.e. a spatial dataframe, sfc_POLYGON, etc.).
In the example below, we subset the original 1,186 census tracts by those that
intersect the circle we defined. The result is a new sf spatial dataframe with 617
observations.
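Because the census download is not reproduced here, the sketch below uses a hypothetical 10 x 10 grid of squares as a stand-in for the tracts and a circle with an assumed radius of 25,000 feet. It shows the square bracket syntax and how the op argument selects a different topological relation.

```r
library(sf)

# Hypothetical stand-in for the census tracts: a grid of squares in EPSG:2272.
box <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 100000, ymax = 100000),
                         crs = st_crs(2272)))
cells <- st_make_grid(box, n = c(10, 10))
tracts <- st_sf(id = seq_along(cells), geometry = cells)

# Selecting geometry: a circle of assumed radius around the grid centre.
centre <- st_sfc(st_point(c(50000, 50000)), crs = 2272)
circle <- st_buffer(centre, dist = 25000)

# Square bracket subsetting; the default relation is st_intersects.
inside <- tracts[circle, ]

# The op argument swaps in another relation, here the complement.
outside <- tracts[circle, , op = st_disjoint]

# Every tract either intersects the circle or is disjoint from it.
nrow(inside) + nrow(outside) == nrow(tracts)
```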
NOTE:
The 617 observations returned from the intersection plus the 569
observations returned from st_disjoint() sum to the original 1,186 tracts.
Regardless of which method you choose, all three methods should return the same
number of observations in a spatial dataframe of the same CRS.
To give one example of this tidy workflow, note below how we can start with our
original spatial dataframe, perform a spatial subset (in this case st_within), and
directly pipe the result into ggplot2. As we'd expect, the result is a much smaller
and more circular shape, fitting just inside the boundaries of our circle.
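A sketch of this workflow, again on a hypothetical grid rather than the real tracts, is shown below. It uses the dense matrix form of st_within() and R's native pipe (R 4.1 or later), which is an assumption about the reader's setup.

```r
library(sf)
library(ggplot2)

# Hypothetical grid and circle standing in for tracts and selecting geometry.
box <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 100000, ymax = 100000),
                         crs = st_crs(2272)))
cells <- st_make_grid(box, n = c(10, 10))
tracts <- st_sf(id = seq_along(cells), geometry = cells)
circle <- st_buffer(st_sfc(st_point(c(50000, 50000)), crs = 2272), dist = 25000)

# Dense matrix method: one logical per tract, TRUE when the tract lies
# entirely within the circle.
is_within <- st_within(tracts, circle, sparse = FALSE)[, 1]

# Pipe the subset straight into ggplot2 and overlay the circle outline.
p <- tracts[is_within, ] |>
  ggplot() +
  geom_sf() +
  geom_sf(data = circle, fill = NA, colour = "red")
p
```

Tracts that merely touch or cross the circle's edge are dropped, which is why the result sits just inside the boundary.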
It shows the same map of census tracts for the Philadelphia metropolitan area. After
choosing a topological relation, you can position a circle of any size over the map of
census tracts and click to perform a spatial subset.
● The syntax of the given spatial subset, in both square bracket and dense
matrix methods, appears in the right-hand panel.
● The histogram for the selection is plotted against the original distribution of
census tracts.
● The choropleth color scale and legend adjust to the selection’s domain. This
can be used to reveal more detailed variation within a region. For example,
income levels vary widely between wealthier suburbs and core urban areas in
the Philadelphia metro region. Spatially subsetting a smaller, more
homogeneous geographic area can show new patterns.
Final Thoughts
With this lesson and the Shiny apps as tools, you've hopefully learned:
Hopefully, now you are well on your way to becoming as comfortable spatially
subsetting your data as if it were simply attribute data.
Keep reading for one final lesson on how to explore satellite images, one of the best
forms of geospatial data today!
This lesson was written by Himanshu Sikaria and was last updated on 29 Jan. 2019.
In the previous lessons, we talked about how to handle basic geospatial data — any
data with a geographic component. Our previous examples used data sets with
several economic indicators for each Indian state.
However, there's a more complex form of geospatial data: raster images, which are
data captured by satellites orbiting the Earth. Raster images can be far more difficult
to find and process, but their high level of detail and frequent updates make them
incredibly valuable for analysis.
This lesson focuses on the basics of raster images — what they are, where to get
them, how to extract and process them, and what basic operations and analysis you
can do on them. We'll illustrate all of this by examining satellite data for a rural region
of Karnataka, a state in south India.
A raster file is a geotagged image of the Earth. (That means that we can
find the exact location of any raster image on the world map.) Like any other
image, raster images are made up of cells (pixels), and each cell has a value
associated with it.
LEARN MORE:
Looking for more information about raster images? This blog by ArcGIS
explains raster images perfectly, and these flashcards explain the jargon of
raster imagery.
Raster Attributes
Every raster scene has various attributes, or parameters. These can be accessed with
@, for example rastername@extent.
2. resolution: The size of each cell (or pixel) that makes up the entire image.
This value is in degrees for the example below (1 degree is roughly 110 km).
3. extent: The latitude and longitude of the image's top right point and bottom
left point.
4. coord. ref.: This is the current raster file's projection. The Earth is roughly a
sphere, and its surface needs to be projected to be drawn in 2D.
5. values: The minimum and maximum values among all the cells in the raster.
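These attributes can be inspected on a small raster built in memory. The extent and cell values below are hypothetical; the accessor functions are from the raster package.

```r
library(raster)

# Hypothetical 0.1-degree raster over a small longitude/latitude window.
r <- raster(nrows = 10, ncols = 20,
            xmn = 76, xmx = 78, ymn = 14, ymx = 15,
            crs = "+proj=longlat +datum=WGS84")
values(r) <- seq_len(ncell(r))

dim(r)        # rows, columns, layers
res(r)        # cell size in x and y (degrees here)
extent(r)     # xmin, xmax, ymin, ymax; also available as r@extent
crs(r)        # coordinate reference system
minValue(r)   # smallest cell value
maxValue(r)   # largest cell value
```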
LEARN MORE:
Learn more about geographic projections with this video from Vox.
The latest satellite, Landsat 8, images the entire Earth every 16 days and captures more
than 700 satellite images per day across 9 spectral bands and 2 thermal bands. Its
imagery has been used for everything from finding drought-prone areas and
monitoring coastal erosion to analyzing an area’s fire probability and setting the best
routes for electricity lines.
Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS)
images consist of nine spectral bands and two thermal bands. Bands 1 to 7 and 9 have a spatial resolution
of 30 meters, Band 8 (panchromatic) is 15 meters, and Bands 10 and 11 are 100
meters. The ultra blue Band 1 is useful for coastal and aerosol studies. Band 9 is
useful for cirrus cloud detection. Thermal Bands 10 and 11 are useful for providing
more accurate surface temperatures.
USGS gives free, public access to both its raw and processed satellite images. Raw
images are available on AWS S3 and Google Cloud Storage, where they can be
downloaded immediately. Processed images are available with the EROS Science
Processing Architecture (ESPA). Images are also available through a variety of data
products, such as SR (Surface Reflectance), TOA (Top of Atmosphere) and BT
(Brightness Temperature).
Accessing the processed Landsat 8 data can be tricky. There are two different APIs
— one by Development Seed for searching (called sat-api) and one by USGS for
downloading (called espa-api). Download requests have to include the product ID,
projection, and format of the data, then they must be approved by USGS, which can
take anywhere from a couple minutes to a couple days. To make matters worse, the
APIs input and output data with different structures.
LEARN MORE:
New to Landsat 8 data? Here's lots more information to get you started:
Landsat divides the entire Earth into a grid, and each cell has a unique path and row. This tool
from USGS can be used to convert a latitude and longitude to a path and row.
A part of Karnataka that had a major drought lies in path 145 and row 49. Let’s
download the latest imagery for this grid using rLandsat functions.
The next step is to load the rasters into R using the raster library. Raster files
are generally huge (one Landsat tile has about 60 million pixels), so the
raster library doesn't load the entire data set into memory; only when required do its
functions read the values and process them in chunks. As a result, the raster
library may save intermediate results to files in the temp folder.
First, let’s try to load the data for different bands in R using the raster library, and
then we'll plot any one of them. To read a single raster image, we can use the
raster() function. To read a stack (multiple) of rasters at once, we can use the
stack() function. Printing the raster/stack file will give brief information about the
raster.
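Since the downloaded Landsat files are not available here, the sketch below first writes two small, randomly filled rasters to temporary files and then reads them back. The file names, extent and CRS are all assumptions made for illustration.

```r
library(raster)

# Write a hypothetical single-band raster to a temporary file in the
# raster package's native format and return its path.
make_band <- function(seed) {
  set.seed(seed)
  r <- raster(nrows = 50, ncols = 50, xmn = 0, xmx = 1500, ymn = 0, ymx = 1500,
              crs = "+proj=utm +zone=43 +datum=WGS84")
  values(r) <- runif(ncell(r))
  path <- tempfile(fileext = ".grd")
  writeRaster(r, filename = path)
  path
}
band4_path <- make_band(4)
band5_path <- make_band(5)

band4 <- raster(band4_path)                 # one layer
bands <- stack(c(band4_path, band5_path))   # several layers at once

band4   # printing shows dimensions, resolution, extent, CRS and value range
bands
plot(band4)
```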
Plotting a RasterStack with the visible bands gives an image of exactly how the
human eye would see this piece of Earth from space.
In these images, we can clearly see the change in greenery over the years — 2014
is the most green, and 2017 is the least. In fact, in 2017, Karnataka faced the worst
drought in 42 years.
In the next section, let’s see how we can quantify this change and check which
regions were worst affected by the drought.
Basic Operations
Doing raster operations is easy — most of the time, it can be treated as a numeric
vector. By doing basic algebraic operations on different bands, we can create indices
that better explain the characteristics of the region.
Let’s try to create one of the most extensively used indices from Landsat — NDVI
(Normalized Difference Vegetation Index). NDVI is defined as
(Band 5 - Band 4) / (Band 5 + Band 4), where Band 5 is near-infrared and Band 4 is red.
Negative values of NDVI (values approaching -1) correspond to water. Values close
to zero (-0.1 to 0.1) generally correspond to barren areas of rock, sand or snow.
Low, positive values (approximately 0.2 to 0.4) represent shrub and grassland, while
high positive values (values approaching 1) indicate temperate and tropical
rainforests.
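The NDVI formula translates directly into raster algebra. The reflectance values below are invented stand-ins for Landsat 8 Band 4 and Band 5, and the second "year" is simply the first shifted by a constant, so this is a sketch of the mechanics rather than a real analysis.

```r
library(raster)

# Hypothetical 3 x 3 rasters with 30 m cells standing in for Band 4 (red)
# and Band 5 (near-infrared).
template <- raster(nrows = 3, ncols = 3, xmn = 0, xmx = 90, ymn = 0, ymx = 90,
                   crs = "+proj=utm +zone=43 +datum=WGS84")
red <- setValues(template, c(0.10, 0.08, 0.30, 0.05, 0.20, 0.25, 0.04, 0.15, 0.30))
nir <- setValues(template, c(0.50, 0.40, 0.32, 0.45, 0.22, 0.20, 0.02, 0.35, 0.28))

# Raster algebra works cell by cell, like arithmetic on numeric vectors.
ndvi <- (nir - red) / (nir + red)

# A made-up second year shows how change between two dates can be measured.
ndvi_next <- ndvi - 0.1
ndvi_change <- ndvi_next - ndvi

plot(ndvi)
```

Negative cells in ndvi_change mark places where vegetation declined between the two dates.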
LEARN MORE:
This document from Landsat gives more information about NDVI and all the
other spectral indices you can create using Landsat data.
2017 was definitely a bad year with a massive decrease in NDVI, though 2018
seems to be better with a positive NDVI change.
Final Thoughts
The power of spatial data is immense, and this is just the beginning of the sort of
work that you can do with satellite imagery. In this lesson, with a few basic
operations and visualizations on freely available data, we analyzed how vegetation in
Karnataka changed over time. With the same data and techniques, we can do more
complex analysis and apply machine learning techniques to further classify land into
different types.
For example, at Atlan, we worked with Landsat 8 data in R to classify every piece of
land in India into one of four categories: water, barren, green, or built-up regions.
The results were really interesting — we could detect where and when new buildings
and houses were being built, green regions turned into barren ones, and rivers dried
up.
Land classification for Karnataka. (Black is built-up, yellow is barren, blue is water,
and green is green land.)