Gap Analysis
Gap Analysis
Gap Analysis
This is an code adaptation to develop an spatial gap index analysis sensu A methodological framework to
quantify the spatial quality of biological databases
The follow R code is an adaptation to get a final gap index map. For save the results in .PDF an .PNG
please uncomment the lines starting with #pdf(, #png(, and #dev.off(
The first step is load information. For this is required to have both a database with biological records and
the GIS layers. The GIS information can be retived from climatic folder and bias factors.
The datase must be ´data.frame´ class table with the follow columns: * Unique identifier * species name *
latitude * longitude
We validate the original records in taxonomical and geographical way with available codes. For this reason
we also incluide an acceptedNameUsage field. In order to replicate this analysis we recommend tho change
the column names to showed below instead of changing script if you are not R user.
Consider change the folder paths, check the encoding format, tables separator values and install libraries
dependences due the compatibility sytem. In our case we use the follow parameters:
1
The subproduct derived from the script could take several minutes to be created. For this reason the most
belated objects won’t be re-reprocessed if already exist in the outpath folder (outPlotDir). If you wan to
create it again just remove them from containign folder or set a new outpath.
1. Data loading
The initial step is prepare data, paths and functions and load libraries.
# ---------------------------
# 0. Data loading
# ---------------------------
library(raster)
library(maptools)
library(rgdal)
library(SPECIES)
library(classInt)
source('root/GAPfunctions.R')
outPlotDir <- 'root/output'
ruta_factores <- 'root/GIS/FACTORES'
ruta_ambientales <- 'root/GIS/AMBIENTALES'
rm(celdas)
As result of this step we shold have a path’s to bias factor layers, climate layers and output plots and the
follow objects: 1. Original table (DATOS) 2. Point polygon layer (DATOS2) 3. Polygon layer (mask) of
the region of interest (colombia) 4. An ‘ID’ raster of area of interest (grilla) 5. ‘Masked’ ID raster with the
polygon of area of interest (en_area) 6. A column indicating the pixel in which each pairs of coordinates over
(‘celdas’ column in DATOS table)
2. Records density
As the article show, there’s tree components to calculet the gap index. The first is the record density and is
estimated as the number of records over each pixel:
# ---------------------------
# 1. Records density
# ---------------------------
2
DATOS2$CONT <- 1
if (file.exists('INDICE_DENSIDAD.tif')){
densRegistros <- raster('INDICE_DENSIDAD.tif')
} else {
densRegistros <- rasterize(DATOS2, en_area, field = DATOS2$CONT, fun = sum, na.rm = TRUE)
densRegistros[is.na(densRegistros[])] <- 0
densRegistros <- mask(densRegistros, colombia, filename = "INDICE_DENSIDAD.tif", overwrite = TRUE)
}
if (file.exists('localidades.tif')){
localidades <- raster('localidades.tif')
} else {
localidades <- rasterize(DATOS2[, 'CONT'], en_area, fun = function(x, ...) {
length(unique(na.omit(x)))
}, filename = "localidades.tif")[[2]]
}
3
plot(colombia, main = "UBICACION DE LOS REGISTROS COLOMBIA")
plot(DATOS2, add = TRUE, pch = 20, col = rgb(139, 0, 0, 100, maxColorValue = 255, alpha = 0.2))
#dev.off()
# Let's generate a png file with the density plot. Could take several minutes becouse of table size
#pdf(file = paste0(outPlotDir, "/1.INDICE_densidad.pdf"))
plot(densRegistros, col = rev(topo.colors(10)), main="DENSIDAD DE PUNTOS/ 10Km2")
plot(colombia, add = TRUE)
4
DENSIDAD DE PUNTOS/ 10Km2
10
1200000
1000000
800000
5
600000
400000
200000
0
5
1e+06
10
s
den
rdinate
5e+05
5
sity
Y coo
0e+00
−78
−76 0
−74
X co −72
ordin
ates −70
−68
In our data we oberve some points with hig-density that make a particular configuration of color ramp. For
this we are gona to set the max value as the 95th percentile and observe in better scale the result. Note the
diference in legend magnitude.
6
DENSIDAD DE PUNTOS/10 Km2
10 95th
700
600
500
5
400
300
200
100
0
3. Bias in sampling
The second element for the gap index is a bias layer. Section 2.
The follow code will show the bias as distance function taking some physical layers as reference. The result is
just informative due will not take in account for GSI index estimation.
The third plot panel bias could be interpreted as a z-test score. In this sense values above 1.64 or bellow
-1.64 indicate statistical diference in that category.
# ---------------------------
# 2. Bias in sampling
# ---------------------------
DATOS_s <- unique(DATOS[, c('lat', 'lon')])
N <- nrow(DATOS_s)
# ---------------------------
# 2.1 Bias by phisical factors
# ---------------------------
AP <- readOGR(ruta_factores, "protectedAreas")
7
urbano <- readOGR(ruta_factores, "urban")
20
20
2
15
15
15
4.0
0.05
200000
3.5
10
10
10
1
150000
3.0 0.00
5
2.5
100000
0
−0.05
2.0
0
50000
1.5
−1
−0.10
−5
−5
−5
0 1.0
−2
−10
−10
−10
−3
Interval1
Interval2
Interval3
Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
8
sesgo_rios <- BIAS(biasLayer = rios, rasterMask = grilla, layerName = "RIOS", outDir = outPlotDir)
3
20
20
20
2
15
15
15
4.0
5e+05 0.10
1
3.5
10
10
10
0.05
4e+05
3.0 0.00
3e+05
0
5
5
2.5 −0.05
2e+05
−0.10
2.0
−1
0
0
1e+05 −0.15
1.5
−0.20
−5
−5
−5
0e+00 1.0
−2
−10
−10
−10
−3
Interval1
Interval2
Interval3
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 Interval4 −78 −76 −74 −72 −70 −68
9
Distance to Levels Bias index value in Bias
20 VIAS VIAS VIAS VIAS
20
20
2
15
15
15
4.0
300000
1
0.05
3.5
10
10
10
250000
3.0
0.00
200000
0
5
5
2.5
150000
−0.05
2.0
100000
0
0
−1
50000 1.5 −0.10
−5
−5
−5
0 1.0
−2
−10
−10
−10
−3
Interval1
Interval2
Interval3
Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
sesgo_urbano <- BIAS(biasLayer = urbano, rasterMask = grilla, layerName = "CASCOS_URBANOS", outDir = out
10
Distance to Levels Bias index value in Bias
CASCOS_URBANOS CASCOS_URBANOS CASCOS_URBANOS CASCOS_URBANOS
3
20
20
20
2
15
15
15
4.0
150000 0.02
3.5
10
10
10
1
3.0 0.00
100000
5
5
2.5 −0.02
0
2.0 −0.04
50000
0
0
1.5 −0.06
−1
−5
−5
−5
0 1.0
−2
−10
−10
−10
Interval1
Interval2
Interval3
Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
#dev.off()
The follow code will show the bias usi climatic layers, slope and elevation as reference. The main result is the
diference between the observed values for each layer given the coordinates vs. a random values. A kolmogorov
test is used.
# ---------------------------
# 2.2 Bias by environment
# ---------------------------
11
VARIABLES <- read.csv(paste0(outPlotDir, '/variablesDF.csv'))
} else {
VARIABLES <- ambientales[1:ncell(ambientales)]
VARIABLES <- na.omit(VARIABLES)
write.csv(VARIABLES, paste0(outPlotDir, '/variablesDF.csv'), row.names = FALSE)
}
for (i in 1:numPredictors){
12
DENSI_VAR <- density(VARIABLES[, i])
DENSI_MUES <- density(MUESTREO[, i])
minx <- min(min(DENSI_VAR$x), min(DENSI_MUES$x))
maxx <- max(max(DENSI_VAR$x), max(DENSI_MUES$x))
miny <- min(min(DENSI_VAR$y), min(DENSI_MUES$y))
maxy <- max(max(DENSI_VAR$y), max(DENSI_MUES$y))
0.30
Density
Density
Sampled Sampled Sampled
0.04
0.15
0.4
0.00
0.00
0.0
0 50 100 150 200 250 300 60 80 100 120 65 70 75 80 85 90 95
Density
0.00
0 200 400 600 800 1000 1200 1400 50 100 150 200 250 300 350 −100 −50 0 50 100 150 200 250
Density
Density
0.04
0.10
0.00
0.00
0.00
60 80 100 120 140 160 0 50 100 150 200 250 300 0 50 100 150 200 250 300
0.0010
Density
Density
0.0000
0.00
0.00
0 50 100 150 200 250 300 0 50 100 150 200 250 300 0 2000 4000 6000 8000 10000 12000
#dev.off()
13
PP−MES−MAS−HUMEDO PP−MES−MAS−SECO PP−ESTACIONAL
0.015
0.010
Density
Density
Density
Sampled Sampled Sampled
0.10
0.000
0.000
0.00
200 400 600 800 1000 1200 0 200 400 600 0 20 40 60 80 100 120
0.004
Density
Density
Density
0.003
0.003
0.000
0.000
0 500 1000 1500 2000 2500 3000 3500 0 500 1000 1500 2000 0 500 1000 1500 2000 2500
4e−06
0.003
Density
Density
Density
Sampled Sampled Sampled
0.000 0.003
0e+00
0.000
0 500 1000 1500 2000 2500 3000 0 1000 2000 3000 4000 5000 0e+00 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06
The climate bias layer will be estiated using BIAS() function for each variable. Finally all bias layers are
sumed.
#dev.off()
14
writeRaster(d, "INDICE_AMBIENTAL.tif", overwrite = TRUE)
15
2
4.0
10
10
0.15
1
3.5
0.10
3.0
0 0.05
5
5
2.5
0.00
2.0 −0.05
−1
0
0
1.5 −0.10
−0.15
1.0
−2
−5
−5
−3
Interval1
Interval2
Interval3
Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
The final component is the species completness. For this case we are gonna to use Jackknife and bootstrap
estimates for two databse columns: original species and accepted species name.
# ---------------------------
# 3. Data base completness
# ---------------------------
treshold <- 0
15
estimateSHQ <- richEst(spListByCellHQ$acceptedNameUsage, spListByCellHQ$celdas)
rm(spListByCell, spListByCellHQ)
compRichBoot <- compRichJack <- richJackHQ <- richBootHQ <- richJack <- richBoot <- en_area * 0
richBoot[as.numeric(rownames(estimateS))] <- estimateS$Boot
richJack[as.numeric(rownames(estimateS))] <- estimateS$JNhat
richBootHQ[as.numeric(rownames(estimateSHQ))] <- estimateSHQ$Boot
richJackHQ[as.numeric(rownames(estimateSHQ))] <- estimateSHQ$JNhat
10
3000 4000
2500
3000
2000
5
1500 2000
1000
1000
500
0 0
0
−85 −80 −75 −70 −65 −60 −85 −80 −75 −70 −65 −60
10
2500 3500
3000
2000 2500
5
1500 2000
1000 1500
1000
500 500
0 0
0
−85 −80 −75 −70 −65 −60 −85 −80 −75 −70 −65 −60
16
plot(compRichBoot, main = paste('Bootstrap:', nrow(estimateS)))
plot(colombia, add = TRUE)
plot(compRichJack, main = paste('Jackknife:', nrow(estimateS)))
plot(colombia, add = TRUE)
10
1.0 1.0
0.8 0.8
0.6 0.6
5
5
0.4 0.4
0.2 0.2
0.0 0.0
0
−85 −80 −75 −70 −65 −60 −85 −80 −75 −70 −65 −60
2.0
6
1.5
Density
Density
4
1.0
2
0.5
0.0
0
0.0 0.2 0.4 0.6 0.8 1.0 1.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2
richBootVals richJackVals
#dev.off()
Finally sum the three standardized components using the article formula
# ---------------------------
# 4. GSI / Gap selection index
# ---------------------------
17
# Layer standardization
writeRaster(DENSIDAD,"INDICE_DENSIDAD_EST.tif", overwrite=TRUE)
writeRaster(AMBIENTAL,"INDICE_AMBIENTAL_EST.tif", overwrite=TRUE)
writeRaster(COMPLEMENTARIEDAD_BOOT,"INDICE_COMPLEMENTARIEDAD_BOOT_EST.tif", overwrite=TRUE)
writeRaster(COMPLEMENTARIEDAD_JACK,"INDICE_COMPLEMENTARIEDAD_JACK_EST.tif", overwrite=TRUE)
# Plot
## Records ubication
#png(file = paste0(outPlotDir, "/4 INDICE_final_ubicacion_registros.png")) ##comienza la grafica tipo pd
par(mfrow = c(1, 1))
plot(colombia, main = "UBICACION DE \n LOS REGISTROS")
plot(DATOS2, add = T, pch = 20, cex = 0.1, col = rgb(0.1, 0.2, 1, 0.01))
UBICACION DE
LOS REGISTROS
#dev.off()
## Records density
par(mfrow=c(1, 2))
plot(densRegistros, main = 'Records density', zlim = c(0, quantile(densRegistros[], .95, na.rm = TRUE)))
18
plot(colombia, add = TRUE, border = 'darkgrey')
plot(DENSIDAD, main = 'Records density', zlim = c(0, quantile(DENSIDAD[], .95, na.rm = TRUE)))
plot(colombia, add = TRUE, border = 'darkgrey')
10
700
5e−04
600
500 4e−04
5
5
400
3e−04
300
2e−04
200
1e−04
100
0
0 0e+00
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
## Environmental bias
par(mfrow=c(1, 2))
plot(mask(d, colombia), main = 'd Index')
plot(colombia, add = TRUE)
plot(mask(AMBIENTAL, colombia), main = 'd Index \n Normalized')
plot(colombia, add = TRUE)
19
d Index
d Index
10 Normalized
10
1.0
70
60
0.8
50
5
5
0.6
40
30 0.4
20
0.2
10
0
0
0 0.0
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
par(mfrow=c(2, 2))
dValsNoNA <- d[!is.na(d[])]
dValsNoNANoZeros <- d[!is.na(d[]) & d[] != 0]
hist(dValsNoNA, main = 'd Index no NA no 0')
hist(dValsNoNANoZeros, main = 'd Index \n no NA no 0')
## Completness
par(mfrow = c(1, 2))
20
d Index
d Index no NA no 0
no NA no 0
15000
5000
4000
10000
Frequency
Frequency
3000
2000
5000
1000
0
0
0 20 40 60 30 40 50 60 70
Ambiental
Ambiental no NA
dValsNoNA dValsNoNANoZeros
no NA no 0
4000
15000
3000
10000
Frequency
Frequency
2000
5000
1000
0
0.0 0.2 0.4 0.6 0.8 1.0 0.4 0.5 0.6 0.7 0.8 0.9 1.0
21
Bootstrap Jackknife
10
10
1.0 1.0
0.8 0.8
5
5
0.6 0.6
0.4 0.4
0.2 0.2
0
0
0.0 0.0
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
par(mfrow=c(2, 2))
22
Bootstrap
Bootstrap no NA
no NA no 0
1200
7000
1000
5000
800
Frequency
Frequency
600
3000
400
200
1000
0
0
0.0 0.2 0.4 0.6 0.8 1.0 0.75 0.80 0.85 0.90 0.95 1.00
Jackknife no NA no 0
bootValsNoNA Jackknife no NA no 0
bootValsNoNANoZeros
7000
600
500
5000
400
Frequency
Frequency
300
3000
200
1000
100
0
0.0 0.2 0.4 0.6 0.8 1.0 0.5 0.6 0.7 0.8 0.9 1.0
23
GAP SELCTION INDEX (BOOT) GAP SELCTION INDEX (JACK)
10
10
1.0 1.0
0.8 0.8
5
5
0.6 0.6
0.4 0.4
0.2 0.2
0
0
0.0 0.0
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
24
Jackk Boot
10
8
8
6
6
Density
Density
4
4
2
2
0
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
25
GSI values
10
Jackknife
Bootstrap
8
6
Density
4
2
0
#dev.off()
26
GAP SELCTION INDEX (BOOT) GAP SELCTION INDEX (JACK)
10
10
1.0 1.0
0.8 0.8
5
5
0.6 0.6
0.4 0.4
0.2 0.2
0
0
0.0 0.0
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
27