Gap Analysis

Gap analysis
Ivan Gonzalez - [email protected]

Wednesday, December 02, 2015
This is an code adaptation to develop an spatial gap index analysis sensu A methodological framework to
quantify the spatial quality of biological databases
The follow R code is an adaptation to get a final gap index map. For save the results in .PDF an .PNG
please uncomment the lines starting with #pdf(, #png(, and #dev.off(
The first step is load information. For this is required to have both a database with biological records and
the GIS layers. The GIS information can be retived from climatic folder and bias factors.
The datase must be ´data.frame´ class table with the follow columns: * Unique identifier * species name *
latitude * longitude
We validate the original records in taxonomical and geographical way with available codes. For this reason
we also incluide an acceptedNameUsage field. In order to replicate this analysis we recommend tho change
the column names to showed below instead of changing script if you are not R user.
DATOS <- read.csv('root/DATOS.csv', as.is = TRUE)

head(DATOS)
## ID source lat lon species acceptedNameUsage

## 1 415 GBIF 12.55 -81.73 Passiflora suberosa Passiflora suberosa
## 2 416 GBIF 12.58 -81.70 Scleria melaleuca Scleria gaertneri
## 3 417 GBIF 12.58 -81.70 Eleocharis caribaea Eleocharis geniculata
## 4 419 GBIF 12.58 -81.70 Cyperus rotundus Cyperus rotundus
## 5 421 GBIF 12.58 -81.70 Cyperus luzulae Cyperus luzulae
## 6 423 GBIF 12.53 -81.72 Acalypha alopecuroidea <NA>
Consider change the folder paths, check the encoding format, tables separator values and install libraries
dependences due the compatibility sytem. In our case we use the follow parameters:
## R version 3.0.1 (2013-05-16)

## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=Spanish_Colombia.1252 LC_CTYPE=Spanish_Colombia.1252
## [3] LC_MONETARY=Spanish_Colombia.1252 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Colombia.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] classInt_0.1-21 SPECIES_1.0 rgdal_0.9-2 maptools_0.8-34
## [5] raster_2.3-40 sp_1.0-17
##
## loaded via a namespace (and not attached):
## [1] class_7.3-7 digest_0.6.3 e1071_1.6-1 evaluate_0.5.5
## [5] foreign_0.8-54 formatR_1.0 grid_3.0.1 htmltools_0.2.6
## [9] knitr_1.8 lattice_0.20-15 rmarkdown_0.3.11 stringr_0.6.2
## [13] tools_3.0.1 yaml_2.1.13
1
The subproduct derived from the script could take several minutes to be created. For this reason the most
belated objects won’t be re-reprocessed if already exist in the outpath folder (outPlotDir). If you wan to
create it again just remove them from containign folder or set a new outpath.
1. Data loading
The initial step is prepare data, paths and functions and load libraries.
# ---------------------------
# 0. Data loading
# ---------------------------
library(raster)
library(maptools)
library(rgdal)
library(SPECIES)
library(classInt)
source('root/GAPfunctions.R')
outPlotDir <- 'root/output'
ruta_factores <- 'root/GIS/FACTORES'
ruta_ambientales <- 'root/GIS/AMBIENTALES'
colombia <- readShapePoly(paste0(ruta_factores, "/COLOMBIA.shp"))

grilla <- raster(paste0(ruta_factores, "/grilla 10k.img"))
grilla[] <- 1:ncell(grilla)
en_area <- mask(grilla, colombia)
## Assign the pixel ID to each coordinate

DATOS2 <- DATOS[, c('ID', 'lat', 'lon')]
coordinates(DATOS2)=~lon+lat
celdas <- raster:::extract(en_area, DATOS2)
DATOS$celdas <- celdas
rm(celdas)
As result of this step we shold have a path’s to bias factor layers, climate layers and output plots and the
follow objects: 1. Original table (DATOS) 2. Point polygon layer (DATOS2) 3. Polygon layer (mask) of
the region of interest (colombia) 4. An ‘ID’ raster of area of interest (grilla) 5. ‘Masked’ ID raster with the
polygon of area of interest (en_area) 6. A column indicating the pixel in which each pairs of coordinates over
(‘celdas’ column in DATOS table)
2. Records density
As the article show, there’s tree components to calculet the gap index. The first is the record density and is
estimated as the number of records over each pixel:
# ---------------------------
# 1. Records density
# ---------------------------
2
DATOS2$CONT <- 1
if (file.exists('INDICE_DENSIDAD.tif')){
densRegistros <- raster('INDICE_DENSIDAD.tif')
} else {
densRegistros <- rasterize(DATOS2, en_area, field = DATOS2$CONT, fun = sum, na.rm = TRUE)
densRegistros[is.na(densRegistros[])] <- 0
densRegistros <- mask(densRegistros, colombia, filename = "INDICE_DENSIDAD.tif", overwrite = TRUE)
}
if (file.exists('localidades.tif')){
localidades <- raster('localidades.tif')
} else {
localidades <- rasterize(DATOS2[, 'CONT'], en_area, fun = function(x, ...) {
length(unique(na.omit(x)))
}, filename = "localidades.tif")[[2]]
}
# Let's generate a png file with the density plot

data(wrld_simpl)
par(mfrow = c(1,1))
#png(file = paste0(outPlotDir, "/1-INDICE_densidad.png"))

plot(wrld_simpl, main="UBICACION DE LOS REGISTROS \n BASE DE DATOS INICIAL", col = 'lightyellow')
points(x=coordinates(DATOS2)[, 1], y=coordinates(DATOS2)[, 2], col = rgb(139, 0, 0, 100, maxColorValue =
UBICACION DE LOS REGISTROS

BASE DE DATOS INICIAL
3
plot(colombia, main = "UBICACION DE LOS REGISTROS COLOMBIA")
plot(DATOS2, add = TRUE, pch = 20, col = rgb(139, 0, 0, 100, maxColorValue = 255, alpha = 0.2))
UBICACION DE LOS REGISTROS COLOMBIA
#dev.off()
# Let's generate a png file with the density plot. Could take several minutes becouse of table size
#pdf(file = paste0(outPlotDir, "/1.INDICE_densidad.pdf"))
plot(densRegistros, col = rev(topo.colors(10)), main="DENSIDAD DE PUNTOS/ 10Km2")
plot(colombia, add = TRUE)
4
DENSIDAD DE PUNTOS/ 10Km2
10
1200000
1000000
800000
5
600000
400000
200000
0
−85 −80 −75 −70 −65 −60
persp(densRegistros, xlab = "X coordinates", ylab = "Y coordinates", zlab = "density",

phi = 35, theta = 20, col = "lightblue", expand = .5, ticktype = "detailed")
5
1e+06
10
s
den
rdinate
5e+05
5
sity
Y coo
0e+00
−78
−76 0
−74
X co −72
ordin
ates −70
−68
In our data we oberve some points with hig-density that make a particular configuration of color ramp. For
this we are gona to set the max value as the 95th percentile and observe in better scale the result. Note the
diference in legend magnitude.
q95 <- quantile(densRegistros, c(.95), na.rm = TRUE)

maxVal <- cellStats(densRegistros, 'max')
densRegistros2 <- reclassify(x = densRegistros, matrix(c(q95, maxVal, q95, 0, 0 , NA), nrow = 2, ncol =
densRegistros2[densRegistros2[] == 0] <- NA
plot(densRegistros2, col = rev(topo.colors(10)), main = "DENSIDAD DE PUNTOS/10 Km2\n95th")
6
DENSIDAD DE PUNTOS/10 Km2
10 95th
700
600
500
5
400
300
200
100
0
−85 −80 −75 −70 −65 −60
3. Bias in sampling
The second element for the gap index is a bias layer. Section 2.
The follow code will show the bias as distance function taking some physical layers as reference. The result is
just informative due will not take in account for GSI index estimation.
The third plot panel bias could be interpreted as a z-test score. In this sense values above 1.64 or bellow
-1.64 indicate statistical diference in that category.
# ---------------------------
# 2. Bias in sampling
# ---------------------------
DATOS_s <- unique(DATOS[, c('lat', 'lon')])
N <- nrow(DATOS_s)
# ---------------------------
# 2.1 Bias by phisical factors
# ---------------------------
AP <- readOGR(ruta_factores, "protectedAreas")
## OGR data source with driver: ESRI Shapefile

## Source: "root/GIS/FACTORES", layer: "protectedAreas"
## with 407 features
## It has 7 fields
7
urbano <- readOGR(ruta_factores, "urban")

## Source: "root/GIS/FACTORES", layer: "urban"
## It has 6 fields
rios <- readOGR(ruta_factores, "riversMain")

## Source: "root/GIS/FACTORES", layer: "riversMain"
## It has 8 fields
vias <- readOGR(ruta_factores, "roads")

## Source: "root/GIS/FACTORES", layer: "roads"
## It has 17 fields
#pdf(file = paste0(outPlotDir, "/2.1 INDICE_factores_sesgos.pdf "))

sesgo_ap <- BIAS(biasLayer = AP, rasterMask = grilla, layerName = "AREA_PROTEGIDA", outDir = outPlotDir
Distance to Levels Bias index value in Bias

AREA_PROTEGIDA AREA_PROTEGIDA AREA_PROTEGIDA AREA_PROTEGIDA
3
20
20
20
2
15
15
15
4.0
0.05
200000
3.5
10
10
10
1
150000
3.0 0.00
5
2.5
100000
0
−0.05
2.0
0
50000
1.5
−1
−0.10
−5
−5
−5
0 1.0
−2
−10
−10
−10
−3
Interval1
Interval2
Interval3
Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
8
sesgo_rios <- BIAS(biasLayer = rios, rasterMask = grilla, layerName = "RIOS", outDir = outPlotDir)

RIOS RIOS RIOS RIOS
3
20
20
20
2
15
15
15
4.0
5e+05 0.10
1
3.5
10
10
10
0.05
4e+05
3.0 0.00
3e+05
0
5
5
2.5 −0.05
2e+05
−0.10
2.0
−1
0
0
1e+05 −0.15
1.5
−0.20
−5
−5
−5
0e+00 1.0
−2
−10
−10
−10
−3
Interval1
Interval2
Interval3
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 Interval4 −78 −76 −74 −72 −70 −68
sesgo_vias <- BIAS(biasLayer = vias, rasterMask = grilla,layerName = 'VIAS', outDir = outPlotDir)
9
20 VIAS VIAS VIAS VIAS
20
20
2
15
15
15
4.0
300000
1
0.05
3.5
10
10
10
250000
3.0
0.00
200000
0
5
5
2.5
150000
−0.05
2.0
100000
0
0
−1
50000 1.5 −0.10
−5
−5
−5
0 1.0
−2
−10
−10
−10
−3
Interval1
Interval2
Interval3
Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
sesgo_urbano <- BIAS(biasLayer = urbano, rasterMask = grilla, layerName = "CASCOS_URBANOS", outDir = out
10
CASCOS_URBANOS CASCOS_URBANOS CASCOS_URBANOS CASCOS_URBANOS
3
20
20
20
2
15
15
15
4.0
150000 0.02
3.5
10
10
10
1
3.0 0.00
100000
5
5
2.5 −0.02
0
2.0 −0.04
50000
0
0
1.5 −0.06
−1
−5
−5
−5
0 1.0
−2
−10
−10
−10
Interval1
Interval2
Interval3
Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
#dev.off()
# Remove extra files and keep sesgo_ap, sesgo_rios, sesgo_vias, sesgo_urbano

rm(AP, urbano, rios, vias, DATOS_s, sesgo_ap, sesgo_rios, sesgo_vias, sesgo_urbano, grilla)
The follow code will show the bias usi climatic layers, slope and elevation as reference. The main result is the
diference between the observed values for each layer given the coordinates vs. a random values. A kolmogorov
test is used.
# ---------------------------
# 2.2 Bias by environment
# ---------------------------
# 2.2.1 Compare distributions

## Compare environment all distribution values and sampled values by kolmogorov and Kullback-Leibler div
ambientales <- stack(paste0(ruta_ambientales, '/', c(paste0("bio_", 1:19), "alt", "slope")))

envVarNames <- c("TEMPERATURA-MEDIA-ANUAL", "MEDIA-RANGO-DIURNO", "ISOTERMALIDAD", "ESTACIONALIDAD-TEMPE
"MAX-T-MES-MAS-CALIDO", "MIN-T-MES-FRIO", "RANGO-ANUAL-T", "T-MEDIA-DEL-CUARTO-HUMEDO",
"T-MEDIA-DEL-CUARTO-SECO", "T-MEDIA-DEL-CUARTO-CALIDO", "T-M-DEL-CUARTO-FRIO", "PRECIPITACI
"PP-MES-MAS-HUMEDO", "PP-MES-MAS-SECO", "PP-ESTACIONAL", "PP-CUARTO-HUMEDO", "PP-CUARTO-SEC
"PP-CUARTO-CALIDO", "PP-CUARTO-FRIO", "ALTURA", "PENDIENTE")
# Generate data.frame with all values

if (file.exists(paste0(outPlotDir, '/variablesDF.csv'))){
11
VARIABLES <- read.csv(paste0(outPlotDir, '/variablesDF.csv'))
} else {
VARIABLES <- ambientales[1:ncell(ambientales)]
VARIABLES <- na.omit(VARIABLES)
write.csv(VARIABLES, paste0(outPlotDir, '/variablesDF.csv'), row.names = FALSE)
}
# Generate data.frame with observed values

if (file.exists(paste0(outPlotDir, '/samplingDF.csv'))){
MUESTREO <- read.csv(paste0(outPlotDir, '/samplingDF.csv'))
} else {
MUESTREO <- extract(ambientales, DATOS2)
MUESTREO <- na.omit(MUESTREO)
write.csv(MUESTREO, paste0(outPlotDir, '/samplingDF.csv'), row.names = FALSE)
}
#pdf(file = paste0(outPlotDir, "/2.2 INDICE_COMPARACION_VARIABLES.pdf "))

par(mfrow = c(4, 3))
numPredictors <- dim(ambientales)[3]

RESULT <- NULL
for (i in 1:numPredictors){
# Generate a random vector of complete variable with sampled vector size

sample.Var.i <- sample(VARIABLES[, i], nrow(MUESTREO), replace = T)
# Compare both random and sampled vector

COMPARACION <- ks.test(sample.Var.i, MUESTREO[, i], )
VAR <- envVarNames[i]

media_var <- mean(VARIABLES[, i])
media_DATOS <- mean(MUESTREO[, i])
des_var <- sd(VARIABLES[, i])
des_DATOS <- sd(MUESTREO[, i])
coefvar_var <- des_var/media_var
coefvar_DATOS <- des_DATOS/media_DATOS
D <- COMPARACION$statistic
Pval <- COMPARACION$p.value
if (Pval <= 0.05){
DESICION <- "Equal distributions"
} else {
DESICION <- "Diferent distributions"
}
Nceldas <- ncell(VARIABLES[, i])
PI <- as.data.frame(table(MUESTREO[, i])/nrow(MUESTREO))
QTOTAL <- as.data.frame(table(VARIABLES[, i])/Nceldas)
QI <- QTOTAL[match(PI[, 1], QTOTAL[, 1]), ]
Q <- cbind(QI[, 2], PI[, 2])

Q <- as.matrix(Q)
#KLdiv(Q,overlap=F, method="discrete")
#DIVERGENCIA=KLdiv(Q,overlap=F,method="discrete",na.rm=T)[1,2]
12
DENSI_VAR <- density(VARIABLES[, i])
DENSI_MUES <- density(MUESTREO[, i])
minx <- min(min(DENSI_VAR$x), min(DENSI_MUES$x))
maxx <- max(max(DENSI_VAR$x), max(DENSI_MUES$x))
miny <- min(min(DENSI_VAR$y), min(DENSI_MUES$y))
maxy <- max(max(DENSI_VAR$y), max(DENSI_MUES$y))
plot(density(VARIABLES[,i]), col='black', xlab = '', main=VAR, xlim=c(minx, maxx), ylim=c(miny,maxy),

sub = bquote(p(H[a]): ~ .(Pval)))
lines(density(MUESTREO[,i]), col= "red", main=VAR)
legend('topright', lty = c(1, 1), col = c('black', 'red'), legend = c('All country', 'Sampled'))
RESUMEN <- cbind(VAR, media_var, media_DATOS, des_var, des_DATOS, coefvar_var, coefvar_DATOS,
D, Pval, DESICION)
RESULT <- rbind(RESULT, RESUMEN)
}
TEMPERATURA−MEDIA−ANUAL MEDIA−RANGO−DIURNO ISOTERMALIDAD

0.08
0.30
All country All country All country

Density
Density
Density
Sampled Sampled Sampled
0.04
0.15
0.4
0.00
0.00
0.0
0 50 100 150 200 250 300 60 80 100 120 65 70 75 80 85 90 95
p(Ha) : 0 p(Ha) : 0 p(Ha) : 0
ESTACIONALIDAD−TEMPERATURA MAX−T−MES−MAS−CALIDO MIN−T−MES−FRIO
0.00 0.03 0.06

0.015
Density
Density
Sampled Sampled Density Sampled

0.04
0.000
0.00
0 200 400 600 800 1000 1200 1400 50 100 150 200 250 300 350 −100 −50 0 50 100 150 200 250
p(Ha) : 0 p(Ha) : 0 p(Ha) : 0
RANGO−ANUAL−T T−MEDIA−DEL−CUARTO−HUMEDO T−MEDIA−DEL−CUARTO−SECO

0.08
0.20

Density
Density
Density

0.04
0.04
0.10
0.00
0.00
0.00
60 80 100 120 140 160 0 50 100 150 200 250 300 0 50 100 150 200 250 300
p(Ha) : 0 p(Ha) : 0 p(Ha) : 0
T−MEDIA−DEL−CUARTO−CALIDO T−M−DEL−CUARTO−FRIO PRECIPITACION−ANUAL

0.08
0.08
0.0010

Density
Density
Density

0.04
0.04
0.0000
0.00
0.00
0 50 100 150 200 250 300 0 50 100 150 200 250 300 0 2000 4000 6000 8000 10000 12000
p(Ha) : 0 p(Ha) : 0 p(Ha) : 0
#dev.off()
# Write table with summary for each variable

write.csv(RESULT, "goodness_fit.csv", row.names = FALSE)
# Remove extra objects

rm(MUESTREO, VARIABLES, RESULT, RESUMEN, VAR, media_var, media_DATOS, des_var, des_DATOS, coefvar_var, c
D, Pval, DESICION, DENSI_VAR, DENSI_MUES, sample.Var.i, Q, QI, QTOTAL)
13
PP−MES−MAS−HUMEDO PP−MES−MAS−SECO PP−ESTACIONAL
0.015
0.010
Density
Density
Density
0.10
0.000
0.000
0.00
200 400 600 800 1000 1200 0 200 400 600 0 20 40 60 80 100 120
p(Ha) : 0 p(Ha) : 0 p(Ha) : 0
PP−CUARTO−HUMEDO PP−CUARTO−SECO PP−CUARTO−CALIDO
0.004
Density
Density
Density
0.003
0.003

0.000
0.000
0.000
0 500 1000 1500 2000 2500 3000 3500 0 500 1000 1500 2000 0 500 1000 1500 2000 2500
p(Ha) : 0 p(Ha) : 0 p(Ha) : 0
PP−CUARTO−FRIO ALTURA PENDIENTE
4e−06
0.003
Density
Density
Density
0.000 0.003
0e+00
0.000
0 500 1000 1500 2000 2500 3000 0 1000 2000 3000 4000 5000 0e+00 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06
p(Ha) : 0 p(Ha) : 0 p(Ha) : 0
The climate bias layer will be estiated using BIAS() function for each variable. Finally all bias layers are
sumed.
# 2.2.2. Environmental bias layer
#pdf(file = paste0(outPlotDir, "/2.3 INDICE_sesgos_ambientales.pdf ")) ##comienza la grafica tipo pdf

par(mfrow = c(1,2))
d <- ambientales[[1]] * 0
resultados <- NULL
#names(ambientales) <- envVarNames
sesgosDF <- NULL
for (k in 1:numPredictors){
ses_amb <- BIAS(biasLayer = ambientales[[k]], layerName = envVarNames[k], outDir = outPlotDir, doplot
d <- sum(d, ses_amb$biasLayer, na.rm = TRUE)
sesgosDF <- cbind(sesgosDF, ses_amb$biasLayer[])
resultados <- cbind(resultados, ses_amb$biasValues)
}
colnames(sesgosDF) <- names(resultados) <- envVarNames
#dev.off()
pcaLayer <- raster(paste0(ruta_ambientales, '/pcaLayer.tif'))

ses_pca <- BIAS(biasLayer = pcaLayer, layerName = 'pca', outDir = outPlotDir, doplot = TRUE)
d2 <- ses_pca$biasLayer
write.table(resultados, "sesgos_ambientales.txt", sep="\t", col.names = TRUE)

write.table(sesgosDF, "sesgos_tabla_Capas.txt", sep="\t", col.names = TRUE)
14
writeRaster(d, "INDICE_AMBIENTAL.tif", overwrite = TRUE)
Levels Bias index value in Bias

pca pca pca
15
15
2
4.0
10
10
0.15
1
3.5
0.10
3.0
0 0.05
5
5
2.5
0.00
2.0 −0.05
−1
0
0
1.5 −0.10
−0.15
1.0
−2
−5
−5
−3
Interval1
Interval2
Interval3
Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
4. Data base completness
The final component is the species completness. For this case we are gonna to use Jackknife and bootstrap
estimates for two databse columns: original species and accepted species name.
# ---------------------------
# 3. Data base completness
# ---------------------------
spListByCell <- DATOS[!is.na(DATOS$celdas), c('species', 'celdas')]

spListByCellHQ <- DATOS[!is.na(DATOS$celdas) & !is.na(DATOS$acceptedNameUsage), c('acceptedNameUsage', '
freqTable <- table(spListByCell$celdas)

freqTableHQ <- table(spListByCellHQ$celdas)
treshold <- 0
spListByCell <- spListByCell[spListByCell$celdas %in% names(which(freqTable >= treshold)), ]

spListByCellHQ <- spListByCellHQ[spListByCellHQ$celdas %in% names(which(freqTableHQ >= treshold)), ]
estimateS <- richEst(sppList = spListByCell$species, indexID = spListByCell$celdas)
15
estimateSHQ <- richEst(spListByCellHQ$acceptedNameUsage, spListByCellHQ$celdas)
rm(spListByCell, spListByCellHQ)
compRichBoot <- compRichJack <- richJackHQ <- richBootHQ <- richJack <- richBoot <- en_area * 0
richBoot[as.numeric(rownames(estimateS))] <- estimateS$Boot
richJack[as.numeric(rownames(estimateS))] <- estimateS$JNhat
richBootHQ[as.numeric(rownames(estimateSHQ))] <- estimateSHQ$Boot
richJackHQ[as.numeric(rownames(estimateSHQ))] <- estimateSHQ$JNhat
compRichBoot[as.numeric(rownames(estimateS))] <- estimateS$Sobs/estimateS$Boot

compRichJack[as.numeric(rownames(estimateS))] <- estimateS$Sobs/estimateS$JNhat
#compRichBoot[as.numeric(rownames(estimateS)[estimateS$Sobs == 1])] <- NA

#compRichJack[as.numeric(rownames(estimateS)[estimateS$Sobs == 1])] <- NA
compRichJack[compRichJack[] >= 1] <- 1
compRichBoot[compRichBoot[] >= 1] <- 1
#pdf(file = paste0(outPlotDir, "/3 INDICE_COMPLEMENTARIEDAD.pdf "))

par(mfrow=c(2, 2))
plot(richBoot, main = paste('Bootstrap:', nrow(estimateS)))

plot(richJack, main = paste('Jackknife:', nrow(estimateS)))
plot(richBootHQ, main = paste('BootstrapHQ:', nrow(estimateSHQ)))

plot(richJackHQ, main = paste('JackknifeHQ:', nrow(estimateSHQ)))
Bootstrap: 6489 Jackknife: 6489

10
10
3000 4000
2500
3000
2000
5
1500 2000
1000
1000
500
0 0
0
−85 −80 −75 −70 −65 −60 −85 −80 −75 −70 −65 −60
BootstrapHQ: 6228 JackknifeHQ: 6228

10
10
2500 3500
3000
2000 2500
5
1500 2000
1000 1500
1000
500 500
0 0
0
−85 −80 −75 −70 −65 −60 −85 −80 −75 −70 −65 −60
16
plot(compRichBoot, main = paste('Bootstrap:', nrow(estimateS)))
plot(compRichJack, main = paste('Jackknife:', nrow(estimateS)))
richBootVals <- compRichBoot[!is.na(compRichBoot[]) & compRichBoot[] != 0]

richJackVals <- compRichJack[!is.na(compRichJack[]) & compRichJack[] != 0]
hist(richBootVals, main = 'Density Bootstrap', freq = FALSE, xlim = c(0, 1.2))
lines(density(richBootVals), main = 'Density Bootstrap')
hist(richJackVals, main = 'Density Jackknife', freq = FALSE, xlim = c(0, 1.2))

lines(density(richJackVals), main = 'Density Jackknife')
Bootstrap: 6489 Jackknife: 6489

10
10
1.0 1.0
0.8 0.8
0.6 0.6
5
5
0.4 0.4
0.2 0.2
0.0 0.0
0
−85 −80 −75 −70 −65 −60 −85 −80 −75 −70 −65 −60
Density Bootstrap Density Jackknife

2.5
8
2.0
6
1.5
Density
Density
4
1.0
2
0.5
0.0
0
0.0 0.2 0.4 0.6 0.8 1.0 1.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2
richBootVals richJackVals
#dev.off()
writeRaster(compRichBoot, "INDICE_COMPLEMENTARIEDAD_JACK.tif", overwrite=TRUE)

writeRaster(compRichJack, "INDICE_COMPLEMENTARIEDAD_BOOTS.tif", overwrite=TRUE)
5. GSI/Gap selection index
Finally sum the three standardized components using the article formula
# ---------------------------
# 4. GSI / Gap selection index
# ---------------------------
17
# Layer standardization
DENSIDAD <- normalize01(densRegistros)

AMBIENTAL <- normalize01(d)
COMPLEMENTARIEDAD_BOOT <- normalize01(compRichBoot)
COMPLEMENTARIEDAD_JACK <- normalize01(compRichJack)
writeRaster(DENSIDAD,"INDICE_DENSIDAD_EST.tif", overwrite=TRUE)
writeRaster(AMBIENTAL,"INDICE_AMBIENTAL_EST.tif", overwrite=TRUE)
writeRaster(COMPLEMENTARIEDAD_BOOT,"INDICE_COMPLEMENTARIEDAD_BOOT_EST.tif", overwrite=TRUE)
writeRaster(COMPLEMENTARIEDAD_JACK,"INDICE_COMPLEMENTARIEDAD_JACK_EST.tif", overwrite=TRUE)
# Plot
## Records ubication
#png(file = paste0(outPlotDir, "/4 INDICE_final_ubicacion_registros.png")) ##comienza la grafica tipo pd
plot(colombia, main = "UBICACION DE \n LOS REGISTROS")
plot(DATOS2, add = T, pch = 20, cex = 0.1, col = rgb(0.1, 0.2, 1, 0.01))
UBICACION DE
LOS REGISTROS
#dev.off()
#pdf(file = paste0(outPlotDir, "/4 INDICE_final.pdf ")) ##comienza la grafica tipo pdf
## Records density
par(mfrow=c(1, 2))
plot(densRegistros, main = 'Records density', zlim = c(0, quantile(densRegistros[], .95, na.rm = TRUE)))
18
plot(colombia, add = TRUE, border = 'darkgrey')
plot(DENSIDAD, main = 'Records density', zlim = c(0, quantile(DENSIDAD[], .95, na.rm = TRUE)))
plot(colombia, add = TRUE, border = 'darkgrey')
Records density Records density

10
10
700
5e−04
600
500 4e−04
5
5
400
3e−04
300
2e−04
200
1e−04
100
0
0 0e+00
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
## Environmental bias
par(mfrow=c(1, 2))
plot(mask(d, colombia), main = 'd Index')
plot(mask(AMBIENTAL, colombia), main = 'd Index \n Normalized')
19
d Index
d Index
10 Normalized
10
1.0
70
60
0.8
50
5
5
0.6
40
30 0.4
20
0.2
10
0
0
0 0.0
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
par(mfrow=c(2, 2))
dValsNoNA <- d[!is.na(d[])]
dValsNoNANoZeros <- d[!is.na(d[]) & d[] != 0]
hist(dValsNoNA, main = 'd Index no NA no 0')
hist(dValsNoNANoZeros, main = 'd Index \n no NA no 0')
AMBValsNoNA <- AMBIENTAL[!is.na(AMBIENTAL[])]

AMBValsNoNANoZeros <- AMBIENTAL[!is.na(AMBIENTAL[]) & AMBIENTAL[] != 0]
hist(AMBValsNoNA, main = 'Ambiental no NA')
hist(AMBValsNoNANoZeros, main = 'Ambiental\nno NA no 0')
## Completness
20
d Index
d Index no NA no 0
no NA no 0
15000
5000
4000
10000
Frequency
Frequency
3000
2000
5000
1000
0
0
0 20 40 60 30 40 50 60 70
Ambiental
Ambiental no NA
dValsNoNA dValsNoNANoZeros
no NA no 0
4000
15000
3000
10000
Frequency
Frequency
2000
5000
1000
0
0.0 0.2 0.4 0.6 0.8 1.0 0.4 0.5 0.6 0.7 0.8 0.9 1.0
plot(compRichBoot, main = 'Bootstrap')

plot(compRichJack, main = 'Jackknife')
21
Bootstrap Jackknife
10
10
1.0 1.0
0.8 0.8
5
5
0.6 0.6
0.4 0.4
0.2 0.2
0
0
0.0 0.0
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
par(mfrow=c(2, 2))
bootValsNoNA <- compRichBoot[!is.na(compRichBoot[])]

bootValsNoNANoZeros <- compRichBoot[!is.na(compRichBoot[]) & compRichBoot[] != 0]
hist(bootValsNoNA, main = 'Bootstrap no NA')
hist(bootValsNoNANoZeros, main = 'Bootstrap\nno NA no 0')
JacValsNoNA <- compRichJack[!is.na(compRichJack[])]

JacValsNoNANoZeros <- compRichJack[!is.na(compRichJack[]) & compRichJack[] != 0]
hist(JacValsNoNA, main = 'Jackknife no NA no 0')
hist(JacValsNoNANoZeros, main = 'Jackknife no NA no 0')
# GSI / GAP INDEX
GSI_BOOT <- (3 - DENSIDAD - AMBIENTAL - COMPLEMENTARIEDAD_BOOT)/3

GSI_JACK <- (3 - DENSIDAD - AMBIENTAL - COMPLEMENTARIEDAD_JACK)/3
22
Bootstrap
Bootstrap no NA
no NA no 0
1200
7000
1000
5000
800
Frequency
Frequency
600
3000
400
200
1000
0
0
0.0 0.2 0.4 0.6 0.8 1.0 0.75 0.80 0.85 0.90 0.95 1.00
Jackknife no NA no 0
bootValsNoNA Jackknife no NA no 0
bootValsNoNANoZeros
7000
600
500
5000
400
Frequency
Frequency
300
3000
200
1000
100
0
0.0 0.2 0.4 0.6 0.8 1.0 0.5 0.6 0.7 0.8 0.9 1.0
plot(GSI_BOOT, main = 'GAP SELCTION INDEX (BOOT)', zlim = c(0, 1))

plot(GSI_JACK, main = 'GAP SELCTION INDEX (JACK)', zlim = c(0, 1))

23
GAP SELCTION INDEX (BOOT) GAP SELCTION INDEX (JACK)
10
10
1.0 1.0
0.8 0.8
5
5
0.6 0.6
0.4 0.4
0.2 0.2
0
0
0.0 0.0
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
densJack <- density(GSI_JACK, main = 'Jackk', xlim = c(0, 1))

densBoot <- density(GSI_BOOT, main = 'Boot', xlim = c(0, 1))
24
Jackk Boot
10
8
8
6
6
Density
Density
4
4
2
2
0
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
N = 11705 Bandwidth = 0.01688 N = 13373 Bandwidth = 0.01937

plot(densJack, main = 'GSI values', col = 'blue', ylim = c(0, 10))
lines(densBoot, col = 'red')
legend('topleft', legend = c('Jackknife', 'Bootstrap'),
lty = c(1, 1), lwd = c(1, 1), col = c('blue', 2))
25
GSI values
10
Jackknife
Bootstrap
8
6
Density
4
2
0
0.0 0.2 0.4 0.6 0.8 1.0
N = 11705 Bandwidth = 0.01688
#dev.off()
writeRaster(GSI_BOOT,"INDICE_GSI_BOOT.tif", overwrite = TRUE)

writeRaster(GSI_JACK,"INDICE_GSI_JACK.tif", overwrite = TRUE)

plot(GSI_BOOT, main = 'GAP SELCTION INDEX (BOOT)', zlim = c(0, 1))
plot(GSI_JACK, main = 'GAP SELCTION INDEX (JACK)', zlim = c(0, 1))

26
GAP SELCTION INDEX (BOOT) GAP SELCTION INDEX (JACK)
10
10
1.0 1.0
0.8 0.8
5
5
0.6 0.6
0.4 0.4
0.2 0.2
0
0
0.0 0.0
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68
27

Gap Analysis

Uploaded by

Copyright:

Available Formats

Gap Analysis

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Gap Analysis

Uploaded by

Copyright:

Available Formats

Gap analysis

Ivan Gonzalez - [email protected]

DATOS <- read.csv('root/DATOS.csv', as.is = TRUE)

## ID source lat lon species acceptedNameUsage

## R version 3.0.1 (2013-05-16)

colombia <- readShapePoly(paste0(ruta_factores, "/COLOMBIA.shp"))

en_area <- mask(grilla, colombia)

## Assign the pixel ID to each coordinate

# Let's generate a png file with the density plot

#png(file = paste0(outPlotDir, "/1-INDICE_densidad.png"))

UBICACION DE LOS REGISTROS

UBICACION DE LOS REGISTROS COLOMBIA

−85 −80 −75 −70 −65 −60

persp(densRegistros, xlab = "X coordinates", ylab = "Y coordinates", zlab = "density",

q95 <- quantile(densRegistros, c(.95), na.rm = TRUE)

plot(colombia, add = TRUE)

−85 −80 −75 −70 −65 −60

## OGR data source with driver: ESRI Shapefile

## OGR data source with driver: ESRI Shapefile

rios <- readOGR(ruta_factores, "riversMain")

## OGR data source with driver: ESRI Shapefile

vias <- readOGR(ruta_factores, "roads")

## OGR data source with driver: ESRI Shapefile

#pdf(file = paste0(outPlotDir, "/2.1 INDICE_factores_sesgos.pdf "))

Distance to Levels Bias index value in Bias

Distance to Levels Bias index value in Bias

sesgo_vias <- BIAS(biasLayer = vias, rasterMask = grilla,layerName = 'VIAS', outDir = outPlotDir)

# Remove extra files and keep sesgo_ap, sesgo_rios, sesgo_vias, sesgo_urbano

# 2.2.1 Compare distributions

ambientales <- stack(paste0(ruta_ambientales, '/', c(paste0("bio_", 1:19), "alt", "slope")))

# Generate data.frame with all values

# Generate data.frame with observed values

#pdf(file = paste0(outPlotDir, "/2.2 INDICE_COMPARACION_VARIABLES.pdf "))

numPredictors <- dim(ambientales)[3]

# Generate a random vector of complete variable with sampled vector size

# Compare both random and sampled vector

VAR <- envVarNames[i]

Q <- cbind(QI[, 2], PI[, 2])

plot(density(VARIABLES[,i]), col='black', xlab = '', main=VAR, xlim=c(minx, maxx), ylim=c(miny,maxy),

TEMPERATURA−MEDIA−ANUAL MEDIA−RANGO−DIURNO ISOTERMALIDAD

All country All country All country

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

ESTACIONALIDAD−TEMPERATURA MAX−T−MES−MAS−CALIDO MIN−T−MES−FRIO

0.00 0.03 0.06

Sampled Sampled Density Sampled

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

RANGO−ANUAL−T T−MEDIA−DEL−CUARTO−HUMEDO T−MEDIA−DEL−CUARTO−SECO

All country All country All country

Sampled Sampled Sampled

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

T−MEDIA−DEL−CUARTO−CALIDO T−M−DEL−CUARTO−FRIO PRECIPITACION−ANUAL

All country All country All country

Sampled Sampled Sampled

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

# Write table with summary for each variable

# Remove extra objects

All country All country All country

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

PP−CUARTO−HUMEDO PP−CUARTO−SECO PP−CUARTO−CALIDO

All country All country All country