Spatial statistics#
Spatial statistics is a branch of statistics that deals with the analysis and interpretation of data that have spatial or geographical components, taking into account how neighbouring locations influence each other. It involves techniques for exploring, modelling, and understanding the patterns and relationships within spatial data. The state-of-the-art approach to spatial data analysis is hierarchical Bayesian modelling [Banerjee et al., 2003].
Because Gaussian processes (GPs) can capture the correlation between observations at different locations as a function of the distance between those locations, they are a foundational tool in spatial modelling. In generalised linear mixed models (GLMMs), a term represented by a latent GP, or by its close relative, the multivariate Normal distribution, plays the role of the spatial random effect.
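As an illustration of distance-dependent correlation, here is a minimal sketch of a squared-exponential (RBF) covariance function evaluated at three locations. The kernel choice, the function name `rbf_kernel`, the hyperparameter values and the coordinates are assumptions made for this example, not something prescribed by the text.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two sets of locations.

    x1 has shape (n1, d) and x2 has shape (n2, d); the result is (n1, n2).
    """
    # Pairwise squared Euclidean distances between all location pairs
    sq_dist = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

# Three hypothetical locations: two close together, one far away
locations = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]])
K = rbf_kernel(locations, locations)

# K[0, 1] (nearby pair) is close to 1; K[0, 2] (distant pair) is close to 0
print(np.round(K, 3))
```

The covariance matrix `K` is what a latent GP places on the spatial random effect: entries decay smoothly as the distance between locations grows.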
Three types of spatial data#
The choice of spatial statistical method is determined by the type of information available about the data. There are three types of spatial data, each with corresponding models: point-level (or geostatistical) data, areal (or lattice) data, and spatial point patterns.
Geostatistical data are a collection of random observations at fixed locations. Spatial proximity is defined via a function of the distance between pairs of locations. The goal of geostatistical modelling is to identify the effect of covariates that determine disease risk and to predict the outcome at unsampled locations within the study area; this prediction step is referred to as kriging. The spatial random effect in this context is modelled via kernel-based GPs. See this chapter for more details.
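Prediction at unsampled locations can be sketched with the standard conditional formulas for a zero-mean GP (simple kriging). This is a minimal illustration under assumed settings: the RBF kernel, fixed hyperparameters, the helper names `rbf_kernel` and `krige`, and the toy data are all hypothetical, and a full Bayesian analysis would also infer the hyperparameters and covariate effects.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two sets of locations."""
    sq_dist = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def krige(x_obs, y_obs, x_new, noise_var=1e-2, variance=1.0, lengthscale=1.0):
    """Posterior mean and covariance of a zero-mean GP at new locations."""
    # Covariance among observed locations, plus observation noise
    K = rbf_kernel(x_obs, x_obs, variance, lengthscale)
    K = K + noise_var * np.eye(len(x_obs))
    # Cross-covariance between new and observed locations
    K_star = rbf_kernel(x_new, x_obs, variance, lengthscale)
    # Prior covariance among new locations
    K_ss = rbf_kernel(x_new, x_new, variance, lengthscale)

    mean = K_star @ np.linalg.solve(K, y_obs)
    cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)
    return mean, cov

# Toy data: three sampled sites and two prediction sites
x_obs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y_obs = np.array([1.0, 2.0, 0.5])
x_new = np.array([[0.0, 0.0], [100.0, 100.0]])

mean, cov = krige(x_obs, y_obs, x_new, noise_var=1e-6)
# At a sampled site the prediction stays close to the observation;
# far from all data it reverts to the prior mean (0) and prior variance.
print(mean, np.diag(cov))
```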
Areal data are individual-level or aggregated data, typically consisting of counts or rates, with geographical information available over a set of regions with common borders. These areas may correspond to administrative units such as states, districts or counties, or to the cells of a regular grid (a lattice). Spatial correlation between areas is defined through the neighbourhood structure. Analysis of areal data aims to identify trends and spatial patterns and to assess large-scale associations between the disease risk and its predictors. The spatial random effect is captured in areal data models by multivariate Normal distributions. See this chapter for more details.
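One common way to turn a neighbourhood structure into a multivariate Normal random effect is a proper conditional autoregressive (CAR) prior, whose precision matrix is built from the adjacency matrix. The sketch below uses that construction as an example; the CAR form, the 2x2 lattice, and the values of `alpha` and `tau` are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 2x2 lattice with regions numbered
#   0 1
#   2 3
# Regions that share an edge are treated as neighbours.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
n_regions = 4

# Symmetric binary adjacency matrix W
W = np.zeros((n_regions, n_regions))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

# Diagonal matrix holding the number of neighbours of each region
D = np.diag(W.sum(axis=1))

alpha = 0.9  # strength of spatial dependence (assumed), must satisfy |alpha| < 1
tau = 1.0    # overall precision (assumed)

# Proper CAR precision matrix; the random effect is phi ~ N(0, Q^{-1})
Q = tau * (D - alpha * W)
cov = np.linalg.inv(Q)

# Draw one realisation using the Cholesky factor of the precision:
# if Q = L L^T and z ~ N(0, I), then solving L^T phi = z gives phi ~ N(0, Q^{-1})
rng = np.random.default_rng(0)
L = np.linalg.cholesky(Q)
phi = np.linalg.solve(L.T, rng.standard_normal(n_regions))

# Neighbouring regions (0, 1) covary more strongly than non-neighbours (0, 3)
print(cov[0, 1], cov[0, 3])
```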
Point pattern data consist of random locations of events. Dependence between case locations is modelled via a Gaussian process. Models of this type are particularly appealing for datasets with precisely known event locations because they can capture disease clusters and identify factors associated with them. Events of a point pattern, tagged with an additional discrete coordinate, constitute marked point pattern data. One model for analysing point pattern data is the log-Gaussian Cox process (LGCP). See a chapter on LGCPs here.
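To make the LGCP idea concrete, the following sketch simulates one on a grid over the unit square: a GP draw gives the log-intensity, and the number of events in each cell is Poisson with mean equal to the intensity times the cell area. The grid discretisation, the RBF kernel, the baseline log-intensity of 5.0 and the other parameter values are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two sets of locations."""
    sq_dist = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(42)

# Centres of a 10 x 10 grid of cells covering the unit square
n_side = 10
centres_1d = (np.arange(n_side) + 0.5) / n_side
grid = np.array([[x, y] for x in centres_1d for y in centres_1d])
cell_area = 1.0 / n_side**2

# Latent GP evaluated at the cell centres (jitter added for numerical stability)
K = rbf_kernel(grid, grid, variance=1.0, lengthscale=0.2)
K = K + 1e-6 * np.eye(len(grid))
f = np.linalg.cholesky(K) @ rng.standard_normal(len(grid))

# Log-Gaussian intensity: baseline log-intensity plus the spatial GP term
baseline = 5.0  # assumed value
intensity = np.exp(baseline + f)

# Conditional on the intensity, cell counts are independent Poisson draws
counts = rng.poisson(intensity * cell_area)

print(counts.reshape(n_side, n_side))
```

Cells where the latent GP is high receive more events, which is how the model represents clusters of cases.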