Gap Analysis

Download as pdf or txt
Download as pdf or txt
You are on page 1of 27

Gap analysis

Ivan Gonzalez - [email protected]


Wednesday, December 02, 2015

This is an code adaptation to develop an spatial gap index analysis sensu A methodological framework to
quantify the spatial quality of biological databases
The follow R code is an adaptation to get a final gap index map. For save the results in .PDF an .PNG
please uncomment the lines starting with #pdf(, #png(, and #dev.off(
The first step is load information. For this is required to have both a database with biological records and
the GIS layers. The GIS information can be retived from climatic folder and bias factors.
The datase must be ´data.frame´ class table with the follow columns: * Unique identifier * species name *
latitude * longitude
We validate the original records in taxonomical and geographical way with available codes. For this reason
we also incluide an acceptedNameUsage field. In order to replicate this analysis we recommend tho change
the column names to showed below instead of changing script if you are not R user.

DATOS <- read.csv('root/DATOS.csv', as.is = TRUE)


head(DATOS)

## ID source lat lon species acceptedNameUsage


## 1 415 GBIF 12.55 -81.73 Passiflora suberosa Passiflora suberosa
## 2 416 GBIF 12.58 -81.70 Scleria melaleuca Scleria gaertneri
## 3 417 GBIF 12.58 -81.70 Eleocharis caribaea Eleocharis geniculata
## 4 419 GBIF 12.58 -81.70 Cyperus rotundus Cyperus rotundus
## 5 421 GBIF 12.58 -81.70 Cyperus luzulae Cyperus luzulae
## 6 423 GBIF 12.53 -81.72 Acalypha alopecuroidea <NA>

Consider change the folder paths, check the encoding format, tables separator values and install libraries
dependences due the compatibility sytem. In our case we use the follow parameters:

## R version 3.0.1 (2013-05-16)


## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=Spanish_Colombia.1252 LC_CTYPE=Spanish_Colombia.1252
## [3] LC_MONETARY=Spanish_Colombia.1252 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Colombia.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] classInt_0.1-21 SPECIES_1.0 rgdal_0.9-2 maptools_0.8-34
## [5] raster_2.3-40 sp_1.0-17
##
## loaded via a namespace (and not attached):
## [1] class_7.3-7 digest_0.6.3 e1071_1.6-1 evaluate_0.5.5
## [5] foreign_0.8-54 formatR_1.0 grid_3.0.1 htmltools_0.2.6
## [9] knitr_1.8 lattice_0.20-15 rmarkdown_0.3.11 stringr_0.6.2
## [13] tools_3.0.1 yaml_2.1.13

1
The subproduct derived from the script could take several minutes to be created. For this reason the most
belated objects won’t be re-reprocessed if already exist in the outpath folder (outPlotDir). If you wan to
create it again just remove them from containign folder or set a new outpath.

1. Data loading

The initial step is prepare data, paths and functions and load libraries.

# ---------------------------
# 0. Data loading
# ---------------------------
library(raster)
library(maptools)
library(rgdal)
library(SPECIES)
library(classInt)

source('root/GAPfunctions.R')
outPlotDir <- 'root/output'
ruta_factores <- 'root/GIS/FACTORES'
ruta_ambientales <- 'root/GIS/AMBIENTALES'

colombia <- readShapePoly(paste0(ruta_factores, "/COLOMBIA.shp"))


grilla <- raster(paste0(ruta_factores, "/grilla 10k.img"))
grilla[] <- 1:ncell(grilla)

en_area <- mask(grilla, colombia)

## Assign the pixel ID to each coordinate


DATOS2 <- DATOS[, c('ID', 'lat', 'lon')]
coordinates(DATOS2)=~lon+lat
celdas <- raster:::extract(en_area, DATOS2)
DATOS$celdas <- celdas

rm(celdas)

As result of this step we shold have a path’s to bias factor layers, climate layers and output plots and the
follow objects: 1. Original table (DATOS) 2. Point polygon layer (DATOS2) 3. Polygon layer (mask) of
the region of interest (colombia) 4. An ‘ID’ raster of area of interest (grilla) 5. ‘Masked’ ID raster with the
polygon of area of interest (en_area) 6. A column indicating the pixel in which each pairs of coordinates over
(‘celdas’ column in DATOS table)

2. Records density

As the article show, there’s tree components to calculet the gap index. The first is the record density and is
estimated as the number of records over each pixel:

# ---------------------------
# 1. Records density
# ---------------------------

2
DATOS2$CONT <- 1
if (file.exists('INDICE_DENSIDAD.tif')){
densRegistros <- raster('INDICE_DENSIDAD.tif')
} else {
densRegistros <- rasterize(DATOS2, en_area, field = DATOS2$CONT, fun = sum, na.rm = TRUE)
densRegistros[is.na(densRegistros[])] <- 0
densRegistros <- mask(densRegistros, colombia, filename = "INDICE_DENSIDAD.tif", overwrite = TRUE)
}

if (file.exists('localidades.tif')){
localidades <- raster('localidades.tif')
} else {
localidades <- rasterize(DATOS2[, 'CONT'], en_area, fun = function(x, ...) {
length(unique(na.omit(x)))
}, filename = "localidades.tif")[[2]]
}

# Let's generate a png file with the density plot


data(wrld_simpl)
par(mfrow = c(1,1))

#png(file = paste0(outPlotDir, "/1-INDICE_densidad.png"))


plot(wrld_simpl, main="UBICACION DE LOS REGISTROS \n BASE DE DATOS INICIAL", col = 'lightyellow')
points(x=coordinates(DATOS2)[, 1], y=coordinates(DATOS2)[, 2], col = rgb(139, 0, 0, 100, maxColorValue =

UBICACION DE LOS REGISTROS


BASE DE DATOS INICIAL

3
plot(colombia, main = "UBICACION DE LOS REGISTROS COLOMBIA")
plot(DATOS2, add = TRUE, pch = 20, col = rgb(139, 0, 0, 100, maxColorValue = 255, alpha = 0.2))

UBICACION DE LOS REGISTROS COLOMBIA

#dev.off()

# Let's generate a png file with the density plot. Could take several minutes becouse of table size
#pdf(file = paste0(outPlotDir, "/1.INDICE_densidad.pdf"))
plot(densRegistros, col = rev(topo.colors(10)), main="DENSIDAD DE PUNTOS/ 10Km2")
plot(colombia, add = TRUE)

4
DENSIDAD DE PUNTOS/ 10Km2
10

1200000

1000000

800000
5

600000

400000

200000
0

−85 −80 −75 −70 −65 −60

persp(densRegistros, xlab = "X coordinates", ylab = "Y coordinates", zlab = "density",


phi = 35, theta = 20, col = "lightblue", expand = .5, ticktype = "detailed")

5
1e+06
10

s
den

rdinate
5e+05
5
sity

Y coo
0e+00
−78
−76 0
−74
X co −72
ordin
ates −70
−68

In our data we oberve some points with hig-density that make a particular configuration of color ramp. For
this we are gona to set the max value as the 95th percentile and observe in better scale the result. Note the
diference in legend magnitude.

q95 <- quantile(densRegistros, c(.95), na.rm = TRUE)


maxVal <- cellStats(densRegistros, 'max')
densRegistros2 <- reclassify(x = densRegistros, matrix(c(q95, maxVal, q95, 0, 0 , NA), nrow = 2, ncol =
densRegistros2[densRegistros2[] == 0] <- NA
plot(densRegistros2, col = rev(topo.colors(10)), main = "DENSIDAD DE PUNTOS/10 Km2\n95th")

plot(colombia, add = TRUE)

6
DENSIDAD DE PUNTOS/10 Km2
10 95th

700

600

500
5

400

300

200

100
0

−85 −80 −75 −70 −65 −60

3. Bias in sampling

The second element for the gap index is a bias layer. Section 2.
The follow code will show the bias as distance function taking some physical layers as reference. The result is
just informative due will not take in account for GSI index estimation.
The third plot panel bias could be interpreted as a z-test score. In this sense values above 1.64 or bellow
-1.64 indicate statistical diference in that category.

# ---------------------------
# 2. Bias in sampling
# ---------------------------
DATOS_s <- unique(DATOS[, c('lat', 'lon')])
N <- nrow(DATOS_s)

# ---------------------------
# 2.1 Bias by phisical factors
# ---------------------------
AP <- readOGR(ruta_factores, "protectedAreas")

## OGR data source with driver: ESRI Shapefile


## Source: "root/GIS/FACTORES", layer: "protectedAreas"
## with 407 features
## It has 7 fields

7
urbano <- readOGR(ruta_factores, "urban")

## OGR data source with driver: ESRI Shapefile


## Source: "root/GIS/FACTORES", layer: "urban"
## with 1084 features
## It has 6 fields

rios <- readOGR(ruta_factores, "riversMain")

## OGR data source with driver: ESRI Shapefile


## Source: "root/GIS/FACTORES", layer: "riversMain"
## with 261 features
## It has 8 fields

vias <- readOGR(ruta_factores, "roads")

## OGR data source with driver: ESRI Shapefile


## Source: "root/GIS/FACTORES", layer: "roads"
## with 11406 features
## It has 17 fields

#pdf(file = paste0(outPlotDir, "/2.1 INDICE_factores_sesgos.pdf "))


sesgo_ap <- BIAS(biasLayer = AP, rasterMask = grilla, layerName = "AREA_PROTEGIDA", outDir = outPlotDir

Distance to Levels Bias index value in Bias


AREA_PROTEGIDA AREA_PROTEGIDA AREA_PROTEGIDA AREA_PROTEGIDA
3
20

20

20
2
15

15

15

4.0
0.05
200000

3.5
10

10

10
1

150000
3.0 0.00
5

2.5
100000
0

−0.05
2.0
0

50000

1.5
−1

−0.10
−5

−5

−5

0 1.0
−2
−10

−10

−10
−3

Interval1

Interval2

Interval3

Interval4

−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68

8
sesgo_rios <- BIAS(biasLayer = rios, rasterMask = grilla, layerName = "RIOS", outDir = outPlotDir)

Distance to Levels Bias index value in Bias


RIOS RIOS RIOS RIOS

3
20

20

20
2
15

15

15
4.0
5e+05 0.10

1
3.5
10

10

10
0.05
4e+05

3.0 0.00

3e+05

0
5

5
2.5 −0.05

2e+05
−0.10
2.0

−1
0

0
1e+05 −0.15
1.5

−0.20
−5

−5

−5
0e+00 1.0
−2
−10

−10

−10
−3

Interval1

Interval2

Interval3

−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 Interval4 −78 −76 −74 −72 −70 −68

sesgo_vias <- BIAS(biasLayer = vias, rasterMask = grilla,layerName = 'VIAS', outDir = outPlotDir)

9
Distance to Levels Bias index value in Bias
20 VIAS VIAS VIAS VIAS

20

20
2
15

15

15
4.0

300000

1
0.05
3.5
10

10

10
250000

3.0
0.00
200000

0
5

5
2.5
150000
−0.05
2.0
100000
0

0
−1
50000 1.5 −0.10
−5

−5

−5
0 1.0

−2
−10

−10

−10
−3

Interval1

Interval2

Interval3

Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68

sesgo_urbano <- BIAS(biasLayer = urbano, rasterMask = grilla, layerName = "CASCOS_URBANOS", outDir = out

10
Distance to Levels Bias index value in Bias
CASCOS_URBANOS CASCOS_URBANOS CASCOS_URBANOS CASCOS_URBANOS

3
20

20

20
2
15

15

15
4.0

150000 0.02
3.5
10

10

10
1
3.0 0.00

100000
5

5
2.5 −0.02

0
2.0 −0.04
50000
0

0
1.5 −0.06

−1
−5

−5

−5
0 1.0

−2
−10

−10

−10
Interval1

Interval2

Interval3

Interval4
−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68

#dev.off()

# Remove extra files and keep sesgo_ap, sesgo_rios, sesgo_vias, sesgo_urbano


rm(AP, urbano, rios, vias, DATOS_s, sesgo_ap, sesgo_rios, sesgo_vias, sesgo_urbano, grilla)

The follow code will show the bias usi climatic layers, slope and elevation as reference. The main result is the
diference between the observed values for each layer given the coordinates vs. a random values. A kolmogorov
test is used.

# ---------------------------
# 2.2 Bias by environment
# ---------------------------

# 2.2.1 Compare distributions


## Compare environment all distribution values and sampled values by kolmogorov and Kullback-Leibler div

ambientales <- stack(paste0(ruta_ambientales, '/', c(paste0("bio_", 1:19), "alt", "slope")))


envVarNames <- c("TEMPERATURA-MEDIA-ANUAL", "MEDIA-RANGO-DIURNO", "ISOTERMALIDAD", "ESTACIONALIDAD-TEMPE
"MAX-T-MES-MAS-CALIDO", "MIN-T-MES-FRIO", "RANGO-ANUAL-T", "T-MEDIA-DEL-CUARTO-HUMEDO",
"T-MEDIA-DEL-CUARTO-SECO", "T-MEDIA-DEL-CUARTO-CALIDO", "T-M-DEL-CUARTO-FRIO", "PRECIPITACI
"PP-MES-MAS-HUMEDO", "PP-MES-MAS-SECO", "PP-ESTACIONAL", "PP-CUARTO-HUMEDO", "PP-CUARTO-SEC
"PP-CUARTO-CALIDO", "PP-CUARTO-FRIO", "ALTURA", "PENDIENTE")

# Generate data.frame with all values


if (file.exists(paste0(outPlotDir, '/variablesDF.csv'))){

11
VARIABLES <- read.csv(paste0(outPlotDir, '/variablesDF.csv'))
} else {
VARIABLES <- ambientales[1:ncell(ambientales)]
VARIABLES <- na.omit(VARIABLES)
write.csv(VARIABLES, paste0(outPlotDir, '/variablesDF.csv'), row.names = FALSE)
}

# Generate data.frame with observed values


if (file.exists(paste0(outPlotDir, '/samplingDF.csv'))){
MUESTREO <- read.csv(paste0(outPlotDir, '/samplingDF.csv'))
} else {
MUESTREO <- extract(ambientales, DATOS2)
MUESTREO <- na.omit(MUESTREO)
write.csv(MUESTREO, paste0(outPlotDir, '/samplingDF.csv'), row.names = FALSE)
}

#pdf(file = paste0(outPlotDir, "/2.2 INDICE_COMPARACION_VARIABLES.pdf "))


par(mfrow = c(4, 3))

numPredictors <- dim(ambientales)[3]


RESULT <- NULL

for (i in 1:numPredictors){

# Generate a random vector of complete variable with sampled vector size


sample.Var.i <- sample(VARIABLES[, i], nrow(MUESTREO), replace = T)

# Compare both random and sampled vector


COMPARACION <- ks.test(sample.Var.i, MUESTREO[, i], )

VAR <- envVarNames[i]


media_var <- mean(VARIABLES[, i])
media_DATOS <- mean(MUESTREO[, i])
des_var <- sd(VARIABLES[, i])
des_DATOS <- sd(MUESTREO[, i])
coefvar_var <- des_var/media_var
coefvar_DATOS <- des_DATOS/media_DATOS
D <- COMPARACION$statistic
Pval <- COMPARACION$p.value
if (Pval <= 0.05){
DESICION <- "Equal distributions"
} else {
DESICION <- "Diferent distributions"
}
Nceldas <- ncell(VARIABLES[, i])
PI <- as.data.frame(table(MUESTREO[, i])/nrow(MUESTREO))
QTOTAL <- as.data.frame(table(VARIABLES[, i])/Nceldas)
QI <- QTOTAL[match(PI[, 1], QTOTAL[, 1]), ]

Q <- cbind(QI[, 2], PI[, 2])


Q <- as.matrix(Q)
#KLdiv(Q,overlap=F, method="discrete")
#DIVERGENCIA=KLdiv(Q,overlap=F,method="discrete",na.rm=T)[1,2]

12
DENSI_VAR <- density(VARIABLES[, i])
DENSI_MUES <- density(MUESTREO[, i])
minx <- min(min(DENSI_VAR$x), min(DENSI_MUES$x))
maxx <- max(max(DENSI_VAR$x), max(DENSI_MUES$x))
miny <- min(min(DENSI_VAR$y), min(DENSI_MUES$y))
maxy <- max(max(DENSI_VAR$y), max(DENSI_MUES$y))

plot(density(VARIABLES[,i]), col='black', xlab = '', main=VAR, xlim=c(minx, maxx), ylim=c(miny,maxy),


sub = bquote(p(H[a]): ~ .(Pval)))
lines(density(MUESTREO[,i]), col= "red", main=VAR)
legend('topright', lty = c(1, 1), col = c('black', 'red'), legend = c('All country', 'Sampled'))
RESUMEN <- cbind(VAR, media_var, media_DATOS, des_var, des_DATOS, coefvar_var, coefvar_DATOS,
D, Pval, DESICION)
RESULT <- rbind(RESULT, RESUMEN)
}

TEMPERATURA−MEDIA−ANUAL MEDIA−RANGO−DIURNO ISOTERMALIDAD


0.08

0.30

All country All country All country


Density

Density

Density
Sampled Sampled Sampled
0.04

0.15

0.4
0.00

0.00

0.0
0 50 100 150 200 250 300 60 80 100 120 65 70 75 80 85 90 95

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

ESTACIONALIDAD−TEMPERATURA MAX−T−MES−MAS−CALIDO MIN−T−MES−FRIO

0.00 0.03 0.06


All country All country All country
0.015
Density

Density

Sampled Sampled Density Sampled


0.04
0.000

0.00

0 200 400 600 800 1000 1200 1400 50 100 150 200 250 300 350 −100 −50 0 50 100 150 200 250

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

RANGO−ANUAL−T T−MEDIA−DEL−CUARTO−HUMEDO T−MEDIA−DEL−CUARTO−SECO


0.08
0.20

All country All country All country


Density

Density

Density

Sampled Sampled Sampled


0.04

0.04
0.10
0.00

0.00

0.00

60 80 100 120 140 160 0 50 100 150 200 250 300 0 50 100 150 200 250 300

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

T−MEDIA−DEL−CUARTO−CALIDO T−M−DEL−CUARTO−FRIO PRECIPITACION−ANUAL


0.08
0.08

0.0010

All country All country All country


Density

Density

Density

Sampled Sampled Sampled


0.04
0.04

0.0000
0.00

0.00

0 50 100 150 200 250 300 0 50 100 150 200 250 300 0 2000 4000 6000 8000 10000 12000

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

#dev.off()

# Write table with summary for each variable


write.csv(RESULT, "goodness_fit.csv", row.names = FALSE)

# Remove extra objects


rm(MUESTREO, VARIABLES, RESULT, RESUMEN, VAR, media_var, media_DATOS, des_var, des_DATOS, coefvar_var, c
D, Pval, DESICION, DENSI_VAR, DENSI_MUES, sample.Var.i, Q, QI, QTOTAL)

13
PP−MES−MAS−HUMEDO PP−MES−MAS−SECO PP−ESTACIONAL

All country All country All country

0.015
0.010
Density

Density

Density
Sampled Sampled Sampled

0.10
0.000

0.000

0.00
200 400 600 800 1000 1200 0 200 400 600 0 20 40 60 80 100 120

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

PP−CUARTO−HUMEDO PP−CUARTO−SECO PP−CUARTO−CALIDO

All country All country All country

0.004
Density

Density

Density

0.003
0.003

Sampled Sampled Sampled


0.000

0.000

0.000
0 500 1000 1500 2000 2500 3000 3500 0 500 1000 1500 2000 0 500 1000 1500 2000 2500

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

PP−CUARTO−FRIO ALTURA PENDIENTE

All country All country All country

4e−06
0.003
Density

Density

Density
Sampled Sampled Sampled
0.000 0.003

0e+00
0.000

0 500 1000 1500 2000 2500 3000 0 1000 2000 3000 4000 5000 0e+00 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06

p(Ha) : 0 p(Ha) : 0 p(Ha) : 0

The climate bias layer will be estiated using BIAS() function for each variable. Finally all bias layers are
sumed.

# 2.2.2. Environmental bias layer

#pdf(file = paste0(outPlotDir, "/2.3 INDICE_sesgos_ambientales.pdf ")) ##comienza la grafica tipo pdf


par(mfrow = c(1,2))
d <- ambientales[[1]] * 0
resultados <- NULL
#names(ambientales) <- envVarNames
sesgosDF <- NULL
for (k in 1:numPredictors){
ses_amb <- BIAS(biasLayer = ambientales[[k]], layerName = envVarNames[k], outDir = outPlotDir, doplot
d <- sum(d, ses_amb$biasLayer, na.rm = TRUE)
sesgosDF <- cbind(sesgosDF, ses_amb$biasLayer[])
resultados <- cbind(resultados, ses_amb$biasValues)
}
colnames(sesgosDF) <- names(resultados) <- envVarNames

#dev.off()

pcaLayer <- raster(paste0(ruta_ambientales, '/pcaLayer.tif'))


ses_pca <- BIAS(biasLayer = pcaLayer, layerName = 'pca', outDir = outPlotDir, doplot = TRUE)
d2 <- ses_pca$biasLayer

write.table(resultados, "sesgos_ambientales.txt", sep="\t", col.names = TRUE)


write.table(sesgosDF, "sesgos_tabla_Capas.txt", sep="\t", col.names = TRUE)

14
writeRaster(d, "INDICE_AMBIENTAL.tif", overwrite = TRUE)

Levels Bias index value in Bias


pca pca pca
15

15
2
4.0
10

10
0.15

1
3.5
0.10

3.0
0 0.05
5

5
2.5
0.00

2.0 −0.05
−1
0

0
1.5 −0.10

−0.15
1.0
−2
−5

−5
−3

Interval1

Interval2

Interval3

Interval4

−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68

4. Data base completness

The final component is the species completness. For this case we are gonna to use Jackknife and bootstrap
estimates for two databse columns: original species and accepted species name.

# ---------------------------
# 3. Data base completness
# ---------------------------

spListByCell <- DATOS[!is.na(DATOS$celdas), c('species', 'celdas')]


spListByCellHQ <- DATOS[!is.na(DATOS$celdas) & !is.na(DATOS$acceptedNameUsage), c('acceptedNameUsage', '

freqTable <- table(spListByCell$celdas)


freqTableHQ <- table(spListByCellHQ$celdas)

treshold <- 0

spListByCell <- spListByCell[spListByCell$celdas %in% names(which(freqTable >= treshold)), ]


spListByCellHQ <- spListByCellHQ[spListByCellHQ$celdas %in% names(which(freqTableHQ >= treshold)), ]

estimateS <- richEst(sppList = spListByCell$species, indexID = spListByCell$celdas)

15
estimateSHQ <- richEst(spListByCellHQ$acceptedNameUsage, spListByCellHQ$celdas)

rm(spListByCell, spListByCellHQ)

compRichBoot <- compRichJack <- richJackHQ <- richBootHQ <- richJack <- richBoot <- en_area * 0
richBoot[as.numeric(rownames(estimateS))] <- estimateS$Boot
richJack[as.numeric(rownames(estimateS))] <- estimateS$JNhat
richBootHQ[as.numeric(rownames(estimateSHQ))] <- estimateSHQ$Boot
richJackHQ[as.numeric(rownames(estimateSHQ))] <- estimateSHQ$JNhat

compRichBoot[as.numeric(rownames(estimateS))] <- estimateS$Sobs/estimateS$Boot


compRichJack[as.numeric(rownames(estimateS))] <- estimateS$Sobs/estimateS$JNhat

#compRichBoot[as.numeric(rownames(estimateS)[estimateS$Sobs == 1])] <- NA


#compRichJack[as.numeric(rownames(estimateS)[estimateS$Sobs == 1])] <- NA
compRichJack[compRichJack[] >= 1] <- 1
compRichBoot[compRichBoot[] >= 1] <- 1

#pdf(file = paste0(outPlotDir, "/3 INDICE_COMPLEMENTARIEDAD.pdf "))


par(mfrow=c(2, 2))

plot(richBoot, main = paste('Bootstrap:', nrow(estimateS)))


plot(richJack, main = paste('Jackknife:', nrow(estimateS)))

plot(richBootHQ, main = paste('BootstrapHQ:', nrow(estimateSHQ)))


plot(richJackHQ, main = paste('JackknifeHQ:', nrow(estimateSHQ)))

Bootstrap: 6489 Jackknife: 6489


10

10

3000 4000
2500
3000
2000
5

1500 2000
1000
1000
500
0 0
0

−85 −80 −75 −70 −65 −60 −85 −80 −75 −70 −65 −60

BootstrapHQ: 6228 JackknifeHQ: 6228


10

10

2500 3500
3000
2000 2500
5

1500 2000
1000 1500
1000
500 500
0 0
0

−85 −80 −75 −70 −65 −60 −85 −80 −75 −70 −65 −60

16
plot(compRichBoot, main = paste('Bootstrap:', nrow(estimateS)))
plot(colombia, add = TRUE)
plot(compRichJack, main = paste('Jackknife:', nrow(estimateS)))
plot(colombia, add = TRUE)

richBootVals <- compRichBoot[!is.na(compRichBoot[]) & compRichBoot[] != 0]


richJackVals <- compRichJack[!is.na(compRichJack[]) & compRichJack[] != 0]
hist(richBootVals, main = 'Density Bootstrap', freq = FALSE, xlim = c(0, 1.2))
lines(density(richBootVals), main = 'Density Bootstrap')

hist(richJackVals, main = 'Density Jackknife', freq = FALSE, xlim = c(0, 1.2))


lines(density(richJackVals), main = 'Density Jackknife')

Bootstrap: 6489 Jackknife: 6489


10

10
1.0 1.0
0.8 0.8
0.6 0.6
5

5
0.4 0.4
0.2 0.2
0.0 0.0
0

−85 −80 −75 −70 −65 −60 −85 −80 −75 −70 −65 −60

Density Bootstrap Density Jackknife


2.5
8

2.0
6

1.5
Density

Density
4

1.0
2

0.5
0.0
0

0.0 0.2 0.4 0.6 0.8 1.0 1.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2

richBootVals richJackVals

#dev.off()

writeRaster(compRichBoot, "INDICE_COMPLEMENTARIEDAD_JACK.tif", overwrite=TRUE)


writeRaster(compRichJack, "INDICE_COMPLEMENTARIEDAD_BOOTS.tif", overwrite=TRUE)

5. GSI/Gap selection index

Finally sum the three standardized components using the article formula

# ---------------------------
# 4. GSI / Gap selection index
# ---------------------------

17
# Layer standardization

DENSIDAD <- normalize01(densRegistros)


AMBIENTAL <- normalize01(d)
COMPLEMENTARIEDAD_BOOT <- normalize01(compRichBoot)
COMPLEMENTARIEDAD_JACK <- normalize01(compRichJack)

writeRaster(DENSIDAD,"INDICE_DENSIDAD_EST.tif", overwrite=TRUE)
writeRaster(AMBIENTAL,"INDICE_AMBIENTAL_EST.tif", overwrite=TRUE)
writeRaster(COMPLEMENTARIEDAD_BOOT,"INDICE_COMPLEMENTARIEDAD_BOOT_EST.tif", overwrite=TRUE)
writeRaster(COMPLEMENTARIEDAD_JACK,"INDICE_COMPLEMENTARIEDAD_JACK_EST.tif", overwrite=TRUE)

# Plot
## Records ubication
#png(file = paste0(outPlotDir, "/4 INDICE_final_ubicacion_registros.png")) ##comienza la grafica tipo pd
par(mfrow = c(1, 1))
plot(colombia, main = "UBICACION DE \n LOS REGISTROS")
plot(DATOS2, add = T, pch = 20, cex = 0.1, col = rgb(0.1, 0.2, 1, 0.01))

UBICACION DE
LOS REGISTROS

#dev.off()

#pdf(file = paste0(outPlotDir, "/4 INDICE_final.pdf ")) ##comienza la grafica tipo pdf

## Records density
par(mfrow=c(1, 2))
plot(densRegistros, main = 'Records density', zlim = c(0, quantile(densRegistros[], .95, na.rm = TRUE)))

18
plot(colombia, add = TRUE, border = 'darkgrey')
plot(DENSIDAD, main = 'Records density', zlim = c(0, quantile(DENSIDAD[], .95, na.rm = TRUE)))
plot(colombia, add = TRUE, border = 'darkgrey')

Records density Records density


10

10
700
5e−04
600

500 4e−04
5

5
400
3e−04
300
2e−04
200
1e−04
100
0

0 0e+00

−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68

## Environmental bias
par(mfrow=c(1, 2))
plot(mask(d, colombia), main = 'd Index')
plot(colombia, add = TRUE)
plot(mask(AMBIENTAL, colombia), main = 'd Index \n Normalized')
plot(colombia, add = TRUE)

19
d Index
d Index
10 Normalized

10
1.0
70

60
0.8
50
5

5
0.6
40

30 0.4

20
0.2
10
0

0
0 0.0

−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68

par(mfrow=c(2, 2))
dValsNoNA <- d[!is.na(d[])]
dValsNoNANoZeros <- d[!is.na(d[]) & d[] != 0]
hist(dValsNoNA, main = 'd Index no NA no 0')
hist(dValsNoNANoZeros, main = 'd Index \n no NA no 0')

AMBValsNoNA <- AMBIENTAL[!is.na(AMBIENTAL[])]


AMBValsNoNANoZeros <- AMBIENTAL[!is.na(AMBIENTAL[]) & AMBIENTAL[] != 0]
hist(AMBValsNoNA, main = 'Ambiental no NA')
hist(AMBValsNoNANoZeros, main = 'Ambiental\nno NA no 0')

## Completness
par(mfrow = c(1, 2))

20
d Index
d Index no NA no 0
no NA no 0

15000

5000
4000
10000
Frequency

Frequency

3000
2000
5000

1000
0

0
0 20 40 60 30 40 50 60 70
Ambiental
Ambiental no NA
dValsNoNA dValsNoNANoZeros
no NA no 0

4000
15000

3000
10000
Frequency

Frequency

2000
5000

1000
0

0.0 0.2 0.4 0.6 0.8 1.0 0.4 0.5 0.6 0.7 0.8 0.9 1.0

plot(compRichBoot, main = 'Bootstrap')


plot(colombia, add = TRUE)
plot(compRichJack, main = 'Jackknife')
plot(colombia, add = TRUE)

21
Bootstrap Jackknife
10

10
1.0 1.0

0.8 0.8
5

5
0.6 0.6

0.4 0.4

0.2 0.2
0

0
0.0 0.0

−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68

par(mfrow=c(2, 2))

bootValsNoNA <- compRichBoot[!is.na(compRichBoot[])]


bootValsNoNANoZeros <- compRichBoot[!is.na(compRichBoot[]) & compRichBoot[] != 0]
hist(bootValsNoNA, main = 'Bootstrap no NA')
hist(bootValsNoNANoZeros, main = 'Bootstrap\nno NA no 0')

JacValsNoNA <- compRichJack[!is.na(compRichJack[])]


JacValsNoNANoZeros <- compRichJack[!is.na(compRichJack[]) & compRichJack[] != 0]
hist(JacValsNoNA, main = 'Jackknife no NA no 0')
hist(JacValsNoNANoZeros, main = 'Jackknife no NA no 0')

# GSI / GAP INDEX

GSI_BOOT <- (3 - DENSIDAD - AMBIENTAL - COMPLEMENTARIEDAD_BOOT)/3


GSI_JACK <- (3 - DENSIDAD - AMBIENTAL - COMPLEMENTARIEDAD_JACK)/3

par(mfrow = c(1, 2))

22
Bootstrap
Bootstrap no NA
no NA no 0

1200
7000

1000
5000

800
Frequency

Frequency

600
3000

400
200
1000
0

0
0.0 0.2 0.4 0.6 0.8 1.0 0.75 0.80 0.85 0.90 0.95 1.00

Jackknife no NA no 0
bootValsNoNA Jackknife no NA no 0
bootValsNoNANoZeros
7000

600
500
5000

400
Frequency

Frequency

300
3000

200
1000

100
0

0.0 0.2 0.4 0.6 0.8 1.0 0.5 0.6 0.7 0.8 0.9 1.0

plot(GSI_BOOT, main = 'GAP SELCTION INDEX (BOOT)', zlim = c(0, 1))


plot(colombia, add = TRUE)

plot(GSI_JACK, main = 'GAP SELCTION INDEX (JACK)', zlim = c(0, 1))


plot(colombia, add = TRUE)

23
GAP SELCTION INDEX (BOOT) GAP SELCTION INDEX (JACK)
10

10
1.0 1.0

0.8 0.8
5

5
0.6 0.6

0.4 0.4

0.2 0.2
0

0
0.0 0.0

−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68

densJack <- density(GSI_JACK, main = 'Jackk', xlim = c(0, 1))


densBoot <- density(GSI_BOOT, main = 'Boot', xlim = c(0, 1))

24
Jackk Boot
10

8
8

6
6
Density

Density

4
4

2
2
0

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

N = 11705 Bandwidth = 0.01688 N = 13373 Bandwidth = 0.01937

par(mfrow = c(1, 1))


plot(densJack, main = 'GSI values', col = 'blue', ylim = c(0, 10))
lines(densBoot, col = 'red')
legend('topleft', legend = c('Jackknife', 'Bootstrap'),
lty = c(1, 1), lwd = c(1, 1), col = c('blue', 2))

25
GSI values

10
Jackknife
Bootstrap
8
6
Density

4
2
0

0.0 0.2 0.4 0.6 0.8 1.0

N = 11705 Bandwidth = 0.01688

#dev.off()

writeRaster(GSI_BOOT,"INDICE_GSI_BOOT.tif", overwrite = TRUE)


writeRaster(GSI_JACK,"INDICE_GSI_JACK.tif", overwrite = TRUE)

par(mfrow = c(1, 2))


plot(GSI_BOOT, main = 'GAP SELCTION INDEX (BOOT)', zlim = c(0, 1))
plot(colombia, add = TRUE)

plot(GSI_JACK, main = 'GAP SELCTION INDEX (JACK)', zlim = c(0, 1))


plot(colombia, add = TRUE)

26
GAP SELCTION INDEX (BOOT) GAP SELCTION INDEX (JACK)
10

10
1.0 1.0

0.8 0.8
5

5
0.6 0.6

0.4 0.4

0.2 0.2
0

0
0.0 0.0

−78 −76 −74 −72 −70 −68 −78 −76 −74 −72 −70 −68

27

You might also like