Modelling Spatial Data in R

Tim Newbold

2025-11-27

Overview

In this session, we will explore some of the basic options available for modelling spatial data in R. Given the vast number of methods available for spatial modelling, we will only be able to scratch the surface in this session. However, the tutorial will hopefully give you an introduction into some of the basic approaches we might use to deal with spatial dependence in ecological data.

Dataset and Non-spatial Models

For this tutorial, we will return to the dataset on beetle biodiversity that we used in the tutorial on handling spatial data, but first we will load the R packages that we will be working with this session.

library(dplyr)
library(ggplot2)
library(spdep)
library(spatialreg)
library(INLA)

Now let’s load the beetle data.

url("https://www.dropbox.com/scl/fi/4olcttgagt0sd92hg64vk/Ewers2007BeetleData.rds?rlkey=14c8qby2023lt9924bi3th0ee&dl=1") %>%
  readRDS() -> beetles

As a reminder, this dataset contains samples of beetle species richness at sites in New Zealand. Sites were sampled either in natural forest (primary vegetation) or in livestock pastures. In the session on spatial data, we added estimates of the proportion of natural habitat in the 5 kilometres surrounding each site (the dataset loaded includes these estimates).

We will start with a simple model of species richness as a function of the interaction between land use and the proportion of natural habitat in the surrounding landscape.

m1 <- glm(data = beetles,
          formula = Species_richness~LandUse+NaturalHabitat5k+
            LandUse:NaturalHabitat5k,
          family="poisson")

If we inspect the model summary, it appears that beetle species richness is significantly lower in pasture compared to primary vegetation, that there is a weak negative relationship between surrounding natural habitat and species richness in primary vegetation, but that this relationship with surrounding natural habitat is significantly more positive in pastures than in primary vegetation.

summary(m1)

## 
## Call:
## glm(formula = Species_richness ~ LandUse + NaturalHabitat5k + 
##     LandUse:NaturalHabitat5k, family = "poisson", data = beetles)
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                      4.04897    0.03022 134.002  < 2e-16 ***
## LandUsePasture                  -1.10054    0.06768 -16.261  < 2e-16 ***
## NaturalHabitat5k                -0.32636    0.10898  -2.995  0.00275 ** 
## LandUsePasture:NaturalHabitat5k  2.16088    0.31795   6.796 1.07e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 2034.4  on 232  degrees of freedom
## Residual deviance: 1014.5  on 229  degrees of freedom
## AIC: 2289.9
## 
## Number of Fisher Scoring iterations: 4

Plotting the predictions from this model seems to support this conclusion.

preds <- predict(object = m1,se.fit = TRUE)

beetles$PredLM <- exp(preds$fit)
beetles$PredLMLower <- exp(preds$fit - 1.96 * preds$se.fit)
beetles$PredLMUpper <- exp(preds$fit + 1.96 * preds$se.fit)

p <- ggplot(data = beetles) + 
  geom_point(mapping = aes(x = NaturalHabitat5k,y = Species_richness,
                           colour = LandUse)) + 
  geom_line(mapping = aes(x = NaturalHabitat5k,y = PredLM,
                          colour = LandUse)) + 
  geom_ribbon(mapping = aes(x = NaturalHabitat5k,
                            ymin = PredLMLower,ymax = PredLMUpper,
                            fill = LandUse),
              alpha = 0.2) + 
  theme_classic()

p

However, we need to consider that the data we are analyzing are drawn from a spatially structured survey of sites. In spatial ecological data it is common to detect positive spatial autocorrelation in our response variable, and thus in the residuals of a simple model of the data. When data are positively spatially autocorrelated, it means that values sampled nearby tend to be more similar to each other than values sampled further apart. Positive spatial autocorrelation in ecological data occurs for a number of reasons. One key reason is that nearby ecosystems may be connected by dispersal of organisms between them. A second main reason is that the environment itself is spatially structured, such that environmental conditions tend to be more similar for nearby locations and less similar for locations further apart, all else being equal. Residual spatial autocorrelation can also occur if we have omitted an important spatially structured variable from our analysis.

Visualizing Spatial Autocorrelation

Before visualizing or testing for spatial autocorrelation, we first need to create a set of spatial weights, based on the spatial proximity of the locations we are considering in our analysis. To do this, we first create a list of spatial neighbours using the tri2nb function in the spdep package. Next, we convert these to a list of spatial weights using the nb2listw function in the same package.

# Identify spatial neighbourhoods
nb1 <- tri2nb(coords = beetles[,c('Longitude','Latitude')])
# Convert spatial neighbourhoods into spatial weights
sw1 <- nb2listw(neighbours = nb1,style = "W")

To visualize potential spatial autocorrelation in our data and model residuals, we can use a Moran plot.

We will start by showing spatial autocorrelation in our response variable (species richness). The moran.plot function generates a data frame with, among other things, the raw values of the specified variable (here species richness), labelled x in the data frame produced by the function, and a spatially weighted average of the variable at neighbouring locations, which is labelled wx in the data frame.

We will plot points identified as influential, represented in the is_inf column, as large diamonds, and label them with the corresponding row number from the original dataset.

We will add a trend line to show any correlation between raw values and neighbouring values. The positive relationship that we observe here indicates the presence of positive spatial autocorrelation. In other words, smaller values of species richness tend to be associated with smaller values in neighbouring locations, while larger values tend to be associated with larger values in neighbouring locations.

mp <- moran.plot(x = beetles$Species_richness,listw = sw1,plot = FALSE)

p <- ggplot(data = mp,mapping = aes(x = x,y = wx)) + 
  geom_point(shape = 16) + 
  geom_smooth(formula = y~x, method="lm",colour="#5DA899",fill="#5DA899") + 
  geom_hline(yintercept=mean(mp$wx), lty=2) + 
  geom_vline(xintercept=mean(mp$x), lty=2) + 
  geom_point(data=mp[mp$is_inf,], aes(x=x, y=wx), shape=18,size=4) + 
  geom_text(data=mp[mp$is_inf,], aes(x=x, y=wx, label=labels, vjust=1.5)) + 
  scale_x_continuous(name = "Species Richness") + 
  scale_y_continuous(name = "Weighted Avg. Neighbour Species Richness") + 
  theme_classic()

p

We can produce the same plot but for the residuals of our simple non-spatial model. This highlights again the significant residual spatial autocorrelation, shown by the fact that positive model residuals tend to be associated with positive residuals in the surrounding neighbourhood, while negative residuals tend to be associated with negative residuals in the surrounding neighbourhood.

mp <- moran.plot(x = residuals(m1),listw = sw1,plot = FALSE)

p <- ggplot(data = mp,mapping = aes(x = x,y = wx)) + 
  geom_point(shape = 16) + 
  geom_smooth(formula = y~x, method="lm",colour="#5DA899",fill="#5DA899") + 
  geom_hline(yintercept=mean(mp$wx), lty=2) + 
  geom_vline(xintercept=mean(mp$x), lty=2) + 
  geom_point(data=mp[mp$is_inf,], aes(x=x, y=wx), shape=18,size=4) + 
  geom_text(data=mp[mp$is_inf,], aes(x=x, y=wx, label=labels, vjust=1.5)) + 
  scale_x_continuous(name = "Model Residuals") + 
  scale_y_continuous(name = "Weighted Avg. Neighbour Model Residual") + 
  theme_classic()

p

Testing for Spatial Autocorrelation

The Moran plot suggests the existence of positive spatial autocorrelation, both in the original estimates of species richness as well as in the residuals of our simple, non-spatial model. However, we should use a statistical test to check whether this apparent spatial autocorrelation is statistically significant. The test to use in this case is called a Moran test. You can run this test using the moran.test function in the spdep package.

# Run Moran test for spatial autocorrelation
mt1 <- moran.test(x = residuals(m1),listw = sw1)

mt1

## 
##  Moran I test under randomisation
## 
## data:  residuals(m1)  
## weights: sw1    
## 
## Moran I statistic standard deviate = 7.5708, p-value = 1.854e-14
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.289984067      -0.004310345       0.001511044

Inspecting the results of the Moran test reveals that we have strong and significant positive spatial autocorrelation in the residuals of our simple generalized linear model.

Accounting for Spatial Dependency - Spatial Autoregression

Running Spatial Autoregressive Models

The existence of positive spatial autocorrelation in the residuals of our model indicates that we have spatial dependency in our data: the values that we are recording at one location depend on the values recorded in nearby locations. This violates the assumption of independence of data points made by standard parametric statistical tests. Even if we are not using classical statistical tests, the existence of spatial structure in the data may bias the results of our analyses.

Therefore, to make robust inferences with our data, we need some way of accounting for the spatial dependency in the data. One simple way of doing this is to use spatial autoregression, with a number of functions to do this provided in the spatialreg package.

We have a two key decisions to make in running a spatial autoregressive model:

Whether to account for spatial dependency in just the model error (spatial error model), for just the response variable (spatial lag model), or for both the response variable and the model error (simultaneous autoregression model).
Whether to inclucde a spatial term just for the response variable or error (simple spatial autoregression), or for both the response/error and also the explanatory variables (spatial Durbin model).

Let’s try all combinations of these options, and compare model fit.

These spatial autoregressive models are an adaptation of linear models, and so can’t handle data drawn from a Poisson distribution. Therefore, we will have to log-transform species richness prior to analysis. As discussed in the session on generalized linear modelling, this is not best practice for count, but is unavoidable at the moment.

For comparison, we will also re-run the simple non-spatial model as a linear model with log-transformed richness.

beetles$LogRichness <- log(beetles$Species_richness)

# Run simple non-spatial linear model with log-transformed species richness
m1.nonspat <- lm(data = beetles,formula = LogRichness~LandUse+NaturalHabitat5k+
    LandUse:NaturalHabitat5k)

# Spatial dependence in model error only (spatial error models)
# Only the error term (simple spatial error model)
m1.spat.sar.err <- spatialreg::errorsarlm(
  data = beetles,formula = LogRichness~LandUse+NaturalHabitat5k+
    LandUse:NaturalHabitat5k,
  listw = sw1)
# Error term and explanatory variables (spatial Durbin error model)
m1.spat.sar.err.d <- spatialreg::errorsarlm(
  data = beetles,formula = LogRichness~LandUse+NaturalHabitat5k+
    LandUse:NaturalHabitat5k,
  listw = sw1,Durbin = TRUE)
# Spatial dependence in the response variable only (spatial lag models)
# Only the response variable (simple spatial lag model)
m1.spat.sar.lag <- spatialreg::lagsarlm(
  data = beetles,formula = LogRichness~LandUse+NaturalHabitat5k+
    LandUse:NaturalHabitat5k,listw = sw1)
# Response variable and explanatory variables (spatial Durbin lag model)
m1.spat.sar.lag.d <- spatialreg::lagsarlm(
  data = beetles,formula = LogRichness~LandUse+NaturalHabitat5k+
    LandUse:NaturalHabitat5k,listw = sw1,Durbin = TRUE)
# Spatial dependence in the model error and response variable
# (simultaneous autoregression)
# Only error and response variable (simple simultaneous autoregression)
m1.spat.sar.sac <- spatialreg::sacsarlm(
  data = beetles,formula = LogRichness~LandUse+NaturalHabitat5k+
    LandUse:NaturalHabitat5k,listw = sw1)
# Error and response variable, and also explanatory variables
# (spatial Durbin simultaneous autoregression)
m1.spat.sar.sac.d <- spatialreg::sacsarlm(
  data = beetles,formula = LogRichness~LandUse+NaturalHabitat5k+
    LandUse:NaturalHabitat5k,listw = sw1,Durbin = TRUE)

If you inspect the output of these models, you will see a number of parameters that describe the strength of spatial dependency in the data. The lambda parameter describes the strength of spatial dependency in the model error (spatial error and simultaneous autoregression models). The rho parameter describes the strength of spatial dependency in the response variable (spatial lag and simultaneous autoregression models). Finally for the spatial Durbin models, we get model parameters describing spatial lagged effects for each of the explanatory variables, as well as interaction terms if specified (i.e., how do the values of the explanatory variables in neighbouring locations affect the response variable in the focal location).

Comparing Spatial Autoregressive Models

We can compare the AIC values of these models to see which appears to fit the data most parsimoniously. For comparison, we will also include the simple non-spatial model in this comparison.

AIC(m1.nonspat,
    m1.spat.sar.err,m1.spat.sar.err.d,
    m1.spat.sar.lag,m1.spat.sar.lag.d,
    m1.spat.sar.sac,m1.spat.sar.sac.d)

##                   df      AIC
## m1.nonspat         5 157.6132
## m1.spat.sar.err    6 115.2757
## m1.spat.sar.err.d  9 117.4660
## m1.spat.sar.lag    6 133.1647
## m1.spat.sar.lag.d  9 116.9247
## m1.spat.sar.sac    7 117.2337
## m1.spat.sar.sac.d 10 118.4942

This AIC comparison suggests that all of the spatial models outperform the simple non-spatial model, but that the simplest spatial model, which accounts for spatial autoregression only in the model error, is the best fitting one.

Let’s check if this model has been successful in removing residual spatial autocorrelation.

mt.sar.err <- moran.test(x = residuals(m1.spat.sar.err),listw = sw1)
mt.sar.err

## 
##  Moran I test under randomisation
## 
## data:  residuals(m1.spat.sar.err)  
## weights: sw1    
## 
## Moran I statistic standard deviate = 0.32121, p-value = 0.374
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.008203129      -0.004310345       0.001517687

This test shows that there is no longer significant spatial autocorrelation in the residuals of this spatial model, suggesting that it has successfully dealt with the issue of spatial dependency.

If we inspect the summary of this model, we can see that the conclusions of our non-spatial generalized linear model are still supported: there is a significant reduction in beetle species richness in pastures compared to primary vegetation, and a significantly more positive effect of surrounding natural habitat on beetle species richness in pastures compared to primary vegetation.

In this model summary, the lambda statistic describes the strength of spatial autoregression in the model error.

summary(m1.spat.sar.err)

## 
## Call:
## spatialreg::errorsarlm(formula = LogRichness ~ LandUse + NaturalHabitat5k + 
##     LandUse:NaturalHabitat5k, data = beetles, listw = sw1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -1.032809 -0.185751  0.011248  0.192718  0.735194 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)                      3.9216374  0.1138541 34.4444 < 2.2e-16
## LandUsePasture                  -1.1030177  0.1514292 -7.2840  3.24e-13
## NaturalHabitat5k                 0.0058271  0.4220423  0.0138  0.988984
## LandUsePasture:NaturalHabitat5k  2.2379340  0.7444244  3.0063  0.002645
## 
## Lambda: 0.58142, LR test value: 44.338, p-value: 2.7636e-11
## Asymptotic standard error: 0.070533
##     z-value: 8.2432, p-value: 2.2204e-16
## Wald statistic: 67.95, p-value: 2.2204e-16
## 
## Log likelihood: -51.63784 for error model
## ML residual variance (sigma squared): 0.085028, (sigma: 0.29159)
## Number of observations: 233 
## Number of parameters estimated: 6 
## AIC: 115.28, (AIC for lm: 157.61)

Predictions and Plotting with Spatial Autoregressive Models

If we run the built-in prediction function on a spatial autoregression model, we get something rather different than when we make predictions for simple linear regression models or generalized linear models.

preds.sar.err <- data.frame(predict(m1.spat.sar.err))

head(preds.sar.err)

##        fit    trend      signal
## 1 3.764433 3.923511 -0.15907706
## 2 3.846128 3.923511 -0.07738252
## 3 3.835814 3.923511 -0.08769684
## 4 3.805651 3.923511 -0.11785982
## 5 3.835031 3.923511 -0.08847904
## 6 3.838180 3.923511 -0.08533012

Here, the ‘trend’ column represents the fitted effect of the explanatory variables (land use and surrounding natural habitat), ignoring the spatial dependency, the ‘signal’ column represents the effect of spatial dependency, and the ‘fit’ column contains the overall predictions from the model, combining the effects of the explanatory variables with the effect of spatial dependency (i.e., ‘trend’ + ‘signal’).

For plotting purposes, we will rename these columns to make sure we can identify them later, and add them to the original dataset.

names(preds.sar.err) <- c("FitSARErr","TrendSARErr","SignalSARErr")
# Try binding the predictions with the original dataset
# Will fail if this operation has already been run once
try(beetles %>% bind_cols(preds.sar.err,.name_repair = "check_unique") -> beetles)

If we plot the overall predictions on a graph, the effect of the spatial dependency means it is hard to discern the response to the explanatory variables.

p <- ggplot(data = beetles,mapping = aes(x = NaturalHabitat5k)) + 
  geom_point(mapping = aes(y = Species_richness,colour=LandUse)) + 
  geom_line(mapping = aes(y = exp(FitSARErr),colour=LandUse),linetype = "dashed") + 
  theme_classic()

p

We can also plot the ‘trend’ column to represent the effect of the explanatory variables, but ignoring spatial dependency. This can yield values that are on a different scale to the original species richness measurements, although not in this case because our model only has an autogressive term for the model error (if you re-run the code, but making predictions with one of the more complex models, you will see what I mean).

p <- ggplot(data = beetles,mapping = aes(x = NaturalHabitat5k)) + 
  geom_point(mapping = aes(y = Species_richness,colour=LandUse)) + 
  geom_line(mapping = aes(y = exp(TrendSARErr),colour=LandUse)) + 
  geom_line(mapping = aes(y = exp(FitSARErr),colour=LandUse),linetype = "dashed") + 
  theme_classic()

p

Another problem that we have with the predictions from a spatial autoregression is that we don’t automatically get an estimate of the uncertainty in our predictions. We can use a work-around, manually calculating the confidence in our model estimates from the model coefficients and the variance-covariance matrix (which gives the uncertainty associated with each coefficient as well as covariance between each pair of coefficients). Although this is the standard approach for obtaining confidence in the estimates made by classical statistical models, it is a bit of a fudge for spatial models (ignoring the spatial component of the model), and so we should treat these uncertainty estimates with some caution!

# Obtain the model design matrix (this represents the values of the
# explanatory variables for each row of data)
mm <- model.matrix(formula(m1.spat.sar.err),data = beetles)

# Extract the model coefficients
coefs <- coef(m1.spat.sar.err)

# Remove the 'lambda' coefficient, which gets in the way when we try
# to make predictions
coefs <- coefs[-which(names(coefs)=='lambda')]

# Obtain the variance-covariance matrix from the model
vc <- vcov(m1.spat.sar.err)

# Remove rows and columns associated with the 'lambda' parameter, which again
# cause problems when making predictions
vc <- vc[-which(rownames(vc)=='lambda'),]
vc <- vc[,-which(colnames(vc)=='lambda')]

# Multiply the model design matrix by the model-estimated coefficients to
# obtain the predicted values (these are the same as the 'trend' estimates
# we obtained earlier with the predict function)
pred <- mm %*% coefs

# Obtain standard error around the predictions from the variance-covariance matrix
se_pred <- sqrt(diag(mm %*% vc %*% t(mm)))

# Add the predictions and confidence intervals to the original data frame
beetles$PredSARErr <- exp(pred)
beetles$PredSARErrLwr <- exp(pred - 1.96 * se_pred)
beetles$PredSARErrUpr <- exp(pred + 1.96 * se_pred)


p <- ggplot(data = beetles,mapping = aes(x = NaturalHabitat5k)) + 
  geom_point(mapping = aes(y = Species_richness,colour=LandUse)) + 
  geom_line(mapping = aes(y = PredSARErr,colour=LandUse)) + 
  geom_ribbon(mapping = aes(ymin = PredSARErrLwr,ymax = PredSARErrUpr,
                            fill=LandUse),alpha=0.2) + 
  theme_classic()

p

Accounting for Spatial Dependency - INLA Models

In the final section of the tutorial, we will work with a powerful, flexible and very fast-running set of Bayesian methods for developing models with spatial dependency, using the INLA package, which uses Integrated Nested Laplace Approximation to find parameter values.

We will only here scratch the surface of what these models can do. If you want to find out more, this website has a lot of useful materials.

We will here run a conditional autoregressive Besag model to capture the spatial dependency (see this tutorial for further details).

We first need to define the neighbourhood of sampled locations again, and now convert this to an INLA graph object. We also need to create a variable that identifies each row in the data, for linking to the spatial neighbourhoods object.

nb <- tri2nb(coords = beetles[,c('Longitude','Latitude')])
nb.mat <- nb2mat(neighbours = nb,style = "B",zero.policy = TRUE)
nb.graph <- inla.read.graph(nb.mat)

beetles$ID <- 1:nrow(beetles)

We can now create our model formula, which contains the standard formulation for relating species richness to land use and natural habitat, as well as the spatial dependency component, which specifies a Besag model.

mod.formula <- Species_richness~LandUse+NaturalHabitat5k+LandUse:NaturalHabitat5k+
  f(ID, model="besag",graph = nb.graph)

Now, we can run our INLA model. We need to specify that we want to return the model correlation matrix, which allows us to extract the variances and covariances of the model parameters, for plotting uncertainty.

m1.spat.inla <- inla(formula = mod.formula,family = "poisson",
                     data = beetles,
                     control.fixed = list(correlation.matrix=TRUE))

Inspecting the summary of this model, you can see the coefficient estimates (with uncertainty). These values should now hopefully be at least somewhat familiar.

summary(m1.spat.inla)

## Time used:
##     Pre = 1.16, Running = 1.99, Post = 0.415, Total = 3.56 
## Fixed effects:
##                                   mean    sd 0.025quant 0.5quant 0.975quant
## (Intercept)                      3.899 0.128      3.647    3.900      4.151
## LandUsePasture                  -1.172 0.151     -1.468   -1.172     -0.875
## NaturalHabitat5k                 0.228 0.537     -0.826    0.227      1.283
## LandUsePasture:NaturalHabitat5k  2.385 0.750      0.911    2.386      3.857
##                                   mode kld
## (Intercept)                      3.900   0
## LandUsePasture                  -1.172   0
## NaturalHabitat5k                 0.227   0
## LandUsePasture:NaturalHabitat5k  2.386   0
## 
## Linear combinations (derived):
##                                 ID   mean    sd 0.025quant 0.5quant 0.975quant
## (Intercept)                      1  3.899 0.128      3.648    3.900      4.151
## LandUsePasture                   2 -1.172 0.151     -1.468   -1.172     -0.876
## NaturalHabitat5k                 3  0.228 0.537     -0.826    0.227      1.283
## LandUsePasture:NaturalHabitat5k  4  2.385 0.750      0.912    2.386      3.856
##                                   mode kld
## (Intercept)                      3.900   0
## LandUsePasture                  -1.172   0
## NaturalHabitat5k                 0.227   0
## LandUsePasture:NaturalHabitat5k  2.386   0
## 
## Random effects:
##   Name     Model
##     ID Besags ICAR model
## 
## Model hyperparameters:
##                  mean    sd 0.025quant 0.5quant 0.975quant mode
## Precision for ID 3.68 0.525       2.77     3.65       4.82 3.57
## 
## Marginal log-Likelihood:  -1102.39 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

We can plot the model-estimated effects of land use and natural habitat from this model in a very similar way as for the simple spatial autoregressive models we used earlier.

coefs <- m1.spat.inla$summary.fixed$mean
names(coefs) <- m1.spat.inla$names.fixed

vc <- m1.spat.inla$misc$lincomb.derived.covariance.matrix

pred <- mm %*% coefs

se_pred <- sqrt(diag(mm %*% vc %*% t(mm)))

beetles$PredINLA <- exp(pred)
beetles$PredINLALower <- exp(pred - 1.96 * se_pred)
beetles$PredINLAUpper <- exp(pred + 1.96 * se_pred)

beetles$FittedINLA <- m1.spat.inla$summary.fitted.values$mean

p <- ggplot(data = beetles) + 
  geom_point(mapping = aes(x = NaturalHabitat5k,y = Species_richness,
                           colour = LandUse)) +
  geom_line(mapping = aes(x = NaturalHabitat5k,y = FittedINLA,
                          colour = LandUse),linetype = 2,alpha = 0.7) + 
  geom_line(mapping = aes(x = NaturalHabitat5k,y = PredINLA,
                          colour = LandUse)) +
  geom_ribbon(mapping = aes(x = NaturalHabitat5k,
                            ymin = PredINLALower,ymax = PredINLAUpper,
                            fill = LandUse),
              alpha = 0.2) +
  theme_classic()

p

Finally, let’s run a Moran test to assess the spatial autocorrelation in the residuals of this INLA model. This test shows that there is no significant residual spatial autocorrelation with this model.

inla.resids <- beetles$LogRichness - log(beetles$FittedINLA)

nb <- tri2nb(beetles[,c('Longitude','Latitude')])
w <- nb2listw(neighbours = nb,style = "W")
mt <- moran.test(x = inla.resids,listw = w)

mt

## 
##  Moran I test under randomisation
## 
## data:  inla.resids  
## weights: w    
## 
## Moran I statistic standard deviate = -0.24148, p-value = 0.5954
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      -0.013637339      -0.004310345       0.001491834

Exercises

Download a dataset that describes recorded mammal species richness across Egypt:

url("https://www.dropbox.com/scl/fi/hy5zlgtav7mguncqs9i0h/EgyptMammalSpeciesRichness.rds?rlkey=vavnb7d0appd16q1d14tkau46&dl=1") %>% readRDS -> EgyptMammalSpeciesRichness

The columns in this dataset describe recorded mammal species richness (SpRich), annual mean temperature (AnnualTemp), total annual precipitation (AnnualPrecip), as well as the Longitude and Latitude spatial coordinates.

Try making some simple generalized linear models of species richness as a function of the climate variables. Then, test for spatial autocorrelation, and try out some of the spatial modelling techniques we have learned in the preceding tutorial.