Areal data modelling#

Areal data#

In spatial statistics, areal data refers to data that are aggregated or summarized within predefined geographic areas or regions, such as census tracts, zip code areas, counties, or administrative districts. Unlike point-referenced data, which represent individual locations with specific coordinates, areal data are associated with spatial units that cover an area on the map.

Areal data are commonly used in spatial statistics since this is often the level at which data are available, and, at the same time, modelling at areal level is useful since this is the level at which public policy decisions are typically made.

#!pip install geopandas

import geopandas as gpd

import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
from matplotlib.cm import ScalarMappable
from shapely.geometry import Polygon

import numpy as np

import networkx as nx

import jax
import jax.numpy as jnp

import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Geopandas#

Before we start looking at statististical models of areal data, let us get familiar with the geopandas package.

geopandas is a Python library that provides convenient, unified access to the functionalities of the pandas library and additional spatial data manipulation tools. It is built upon the capabilities of pandas, allowing for easy manipulation, analysis, and visualization of geospatial data.

Here’s a brief tutorial to get you started with GeoPandas:

Installation#

First, you need to install GeoPandas along with its dependencies. You can install GeoPandas using pip, a Python package manager, by executing the following command in your terminal or command prompt:

!pip install geopandas

Importing#

Once GeoPandas is installed, you can import it in your Python script or Jupyter Notebook:

import geopandas as gpd

Reading Spatial Data#

GeoPandas can read various spatial data formats such as shapefiles, GeoJSON, and GeoPackage. You can read spatial data using the read_file() function:

# Reading a shapefile

# you can try the command below or download the data manually
# !wget -O south-africa.geojson https://github.com/codeforgermany/click_that_hood/blob/main/public/data/south-africa.geojson

gdf_sa = gpd.read_file('data/south-africa.geojson')

Exploring Data#

GeoPandas extends the capabilities of pandas DataFrames to work with spatial data. You can use familiar pandas methods to explore and manipulate GeoDataFrames:

gdf_sa.head()
name cartodb_id created_at updated_at geometry
0 Alfred Nzo District 1 2015-05-21 14:28:26+00:00 2015-05-21 14:28:26+00:00 MULTIPOLYGON (((28.20686 -30.29096, 28.20691 -...
1 Amajuba District 2 2015-05-21 14:28:26+00:00 2015-05-21 14:28:26+00:00 MULTIPOLYGON (((29.61686 -28.04684, 29.62217 -...
2 Amathole District 3 2015-05-21 14:28:26+00:00 2015-05-21 14:28:26+00:00 MULTIPOLYGON (((25.88731 -32.4868, 25.89398 -3...
3 Bojanala Platinum District 4 2015-05-21 14:28:26+00:00 2015-05-21 14:28:26+00:00 MULTIPOLYGON (((26.38498 -25.05934, 26.41811 -...
4 Buffalo City Metropolitan 5 2015-05-21 14:28:26+00:00 2015-05-21 14:28:26+00:00 MULTIPOLYGON (((27.15761 -32.83846, 27.15778 -...
# Displaying the first few rows of the GeoDataFrame
print(gdf_sa.head())

# Summary statistics
print(gdf_sa.describe())

# Checking column names
print(gdf_sa.columns)

# how many rows?
print(gdf_sa.shape)
                         name  cartodb_id                created_at  \
0         Alfred Nzo District           1 2015-05-21 14:28:26+00:00   
1            Amajuba District           2 2015-05-21 14:28:26+00:00   
2           Amathole District           3 2015-05-21 14:28:26+00:00   
3  Bojanala Platinum District           4 2015-05-21 14:28:26+00:00   
4   Buffalo City Metropolitan           5 2015-05-21 14:28:26+00:00   

                 updated_at                                           geometry  
0 2015-05-21 14:28:26+00:00  MULTIPOLYGON (((28.20686 -30.29096, 28.20691 -...  
1 2015-05-21 14:28:26+00:00  MULTIPOLYGON (((29.61686 -28.04684, 29.62217 -...  
2 2015-05-21 14:28:26+00:00  MULTIPOLYGON (((25.88731 -32.4868, 25.89398 -3...  
3 2015-05-21 14:28:26+00:00  MULTIPOLYGON (((26.38498 -25.05934, 26.41811 -...  
4 2015-05-21 14:28:26+00:00  MULTIPOLYGON (((27.15761 -32.83846, 27.15778 -...  
       cartodb_id
count   52.000000
mean    26.500000
std     15.154757
min      1.000000
25%     13.750000
50%     26.500000
75%     39.250000
max     52.000000
Index(['name', 'cartodb_id', 'created_at', 'updated_at', 'geometry'], dtype='object')
(52, 5)

Plot the shapefile#

We could plot just the boundaries, or values attributed to each area.

fig, ax = plt.subplots(1, 2, figsize=(15,10))

# Extract the boundary geometries
boundary_gdf = gdf_sa.boundary

# Plot the boundary geometries
boundary_gdf.plot(ax=ax[0], color='black', lw=0.5)
ax[0].set_title('South Africa: borders of administrative units')

gdf_sa.plot(column='name', ax=ax[1])
ax[1].set_title('South Africa: values')

plt.show()
_images/e7dd414cfcbf8dc55e3e4d2fb2d68a9b3ff0b57704b767d4bbfb8ede255f4031.png

Saving Data#

You can save GeoDataFrames to various spatial data formats using the to_file() method:

gdf.to_file('path/to/output.shp')

Create data manually#

If required, we can also create a geopandas data frame manually:

# create a GeoDataFrame with administrative boundaries (made-up data for illustration)
data = {'geometry': [Polygon([(0, 0), (0, 2), (2, 2), (2, 0)]),  # Polygon 1
                     Polygon([(2, 0), (2, 2), (4, 2), (4, 0)]),  # Polygon 2
                     Polygon([(0, 2), (1, 4), (3, 4), (3, 2)]),  # Polygon 3
                     Polygon([(3, 2), (3, 6), (6, 4), (5, 2)]),  # Polygon 4
                     Polygon([(4, 0), (4, 2), (6, 2), (6, 0)]),  # Polygon 5
                     Polygon([(4, 2), (4, 4), (6, 4), (6, 2)])],  # Polygon 6
        'Name': ['Admin District 1', 'Admin District 2', 'Admin District 3',
                 'Admin District 4', 'Admin District 5', 'Admin District 6'],  # Names of admin districts
        'Population': [5000, 7000, 4000, 8000, 6000, 3000]}  # Population counts for each admin district


gdf_toy = gpd.GeoDataFrame(data)
# plot the administrative boundaries with population size represented by color
fig, ax = plt.subplots(figsize=(8, 5))

# define colormap
cmap = plt.cm.viridis  # change the colormap here
norm = Normalize(vmin=gdf_toy['Population'].min(), vmax=gdf_toy['Population'].max())
sm = ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])  # necessary to use sm with plt.colorbar

# plot each administrative boundary with color based on population size
for idx, row in gdf_toy.iterrows():
    color = sm.to_rgba(row['Population'])
    gdf_toy.iloc[[idx]].plot(ax=ax, color=color, edgecolor='black')

# add colorbar with a shorter length
cbar = plt.colorbar(sm, ax=ax, shrink=0.5)  # adjust the shrink parameter to change the length of the color bar
cbar.set_label('Population Size')

plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Example of areal data')
plt.grid(True)
plt.tight_layout()
plt.show()
_images/178162bed0281824b27206e921acc46402d6609e6686b81c67bf5700e0fb9245.png

Adjancency structure#

Most models of areal data rely on the concept of adjacency matrix, i.e. a matrix \(A = (a_{ij})\) with entries

\[\begin{split} a_{ij} = \begin{cases} 1, & \text{if areas } B_i \text{ and } B_j \text{ are neighbours},\\ 0, & \text{otherwise}. \end{cases} \end{split}\]

The adjacency matrix must be symmetric:

\[\begin{split} A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{12} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{pmatrix} \end{split}\]

Let us compute adjacency matrix for the toy data, as well as the South Africa example.

def compute_adjacency_matrix(gdf):
    num_geometries = len(gdf)
    adjacency_matrix = np.zeros((num_geometries, num_geometries), dtype=int)
    
    for idx1, geometry1 in enumerate(gdf.geometry):
        for idx2, geometry2 in enumerate(gdf.geometry):
            if idx1 != idx2:
                if geometry1.touches(geometry2):  # Check if geometries share a common boundary
                    adjacency_matrix[idx1, idx2] = 1
    
    return adjacency_matrix

def visualize_adjacency_matrix(matrix):
    plt.figure(figsize=(6, 4))
    plt.imshow(matrix, cmap='binary', origin='lower')
    plt.title('Adjacency Matrix')
    plt.xlabel('Geometry Index')
    plt.ylabel('Geometry Index')
    plt.grid(False)
    plt.show()
adjacency_matrix_toy = compute_adjacency_matrix(gdf_toy)
adjacency_matrix_sa = compute_adjacency_matrix(gdf_sa)

# Visualize the adjacency matrices
visualize_adjacency_matrix(adjacency_matrix_toy)
visualize_adjacency_matrix(adjacency_matrix_sa)
_images/e90065af1eb8cf8baa6c189601a417256f1c3dd9b1125122862d115e09af6260.png _images/05f5385eb68fbdfd872eb9340e0f2076427215d7cb1796b6d0b2a5804e1a6d75.png

To understand adjacency, we can think about areal data as graphs with binary edges. Here each area represents a vertex in a graph, and an edge exists between vertices \(i\) and \(j\) if areas \(B_i\) and \(B_j\) share a border.

def adj_graph(gdf, size=(5,5), ttl='Adjacency Graph: toy data', plot=True):
    
    # create a empty graph
    nx_graph = nx.Graph()

    # add nodes with their coordinates and area names
    for idx, geometry in enumerate(gdf.geometry):
        centroid = geometry.centroid
        nx_graph.add_node(idx, pos=(centroid.x, centroid.y))

    adjacency_matrix = compute_adjacency_matrix(gdf)    

    # add edges based on adjacency
    for i, j in zip(*np.where(adjacency_matrix == 1)):
        nx_graph.add_edge(i, j)

    # visualize the graph with nodes aligned with actual locations and labeled with area names
        pos = nx.get_node_attributes(nx_graph, 'pos')

    if plot:
        fig, ax = plt.subplots(1, 1, figsize=size)
        nx.draw(nx_graph, pos, with_labels=True, node_size=300, node_color="skyblue", font_size=8, font_color="black", ax=ax)
        plt.title(ttl)
        plt.show()

    return nx_graph, pos
_,_ = adj_graph(gdf_toy)
_,_ =adj_graph(gdf_sa, size=(8,8), ttl='Adjacency Graph: SA data')
_images/748172c8bd76c778052b3c3c17b20735a6909524dfe6ff9b30093ede1899ade6.png _images/432199a3b1bcb1cab73184c8d57b76eece837c1b683b863954bd17e79c864bc1.png

Models of areal data#

In models for areal data, the geographic units are denoted by \(B_i\), and the data are typically sums or averages of variables over these blocks. To introduce spatial association, we define a neighborhood structure based on the arrangement of the blocks in the map. Once the neighborhood structure is defined, one can construct models models resembling autoregressive time series models, such as the intrinsic conditionally autoregressive model (ICAR, also called Besag) and conditionally autoregressive model (CAR).

All of the classical have the same functional form

\[ f \sim \mathcal{N}(0, Q^-) \]

and differ in how they define the precision matrix \(Q\).

The Besag, or ICAR, model#

The Besag, or ICAR (Intrinsic Conditional Autoregressive) model assumes that each location’s value is conditionally dependent on the values of its neighbors. This conditional dependency is often modeled as a weighted sum of the neighboring values, where the weights are determined by the neighborhood structure. Conditional mean of the random effect \(f_i\) is an average of its neighbours \(\{f_j \}_{j\sim i}\) and its precision is proportional to the number of neighbours. It describes the vector of spatially varying random effects \(f = (f_1, ..., f_n)^T\) using the prior \(f \sim \mathcal{N}(0, Q^-).\)

Here \(Q^-\) denotes the generalised inverse of matrix \(Q\), which, in turn, is computed as

\[\begin{split} Q = \tau R, \\ R = D - A. \end{split}\]

\(D\) is a diagnoal matrix. It’s \(i\)-th element equals the total number of neighbours of area \(B_i\). Hence, \(D\) can be computed from \(A\):

\[\begin{split} D = \begin{pmatrix} \sum_{j=1}^n a_{1j} & 0 & \cdots & 0 \\ 0 & \sum_{j=1}^n a_{2j} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sum_{j=1}^n a_{nj} \end{pmatrix}. \end{split}\]

Note that the structure matrix here can be viewed as graph Laplacian.

\(R\) is a rank-defficient matrix. It is recommended, for example, to place an adidtional constraint, such sum-to-zero constraint on each non-singleton connected component:

\[ \sum_i f_i = 0. \]

Conditional Autoregressive Models (CAR)#

Similar to ICAR, the CAR model assumes that the value of a variable in one area depends on the values of neighboring areas, with weights specified by a spatial adjacency matrix. However, it introcues an additional parameter \(\alpha\), allowing to estimate the amount of spatial correlation.

The spatial random effect now is modelled as

\[f \sim \mathcal{N}(0, Q^{-1}) \]

with

\[\begin{split} Q = \tau R, \\ R = D - \alpha A, \\ 0 \le \alpha < 1. \end{split}\]

If \(\alpha=0\), the model consist only of i.i.d. random effects, and if \(\alpha\) is close to 1, the model approaches ICAR.

Here is an artificial example of the CAR model:

def expit(x):
    return 1/(1 + jnp.exp(-x))

def car_model(y, A):

    n = A.shape[0]
    d = jnp.sum(A, axis=1)
    D = jnp.diag(d)

    b0 = numpyro.sample('b0', dist.Normal(0,1).expand([n]))
    tau = numpyro.sample('tau', dist.Gamma(3, 2)) 
    alpha = numpyro.sample('alpha', dist.Uniform(low=0., high=0.99))

    Q_std = D - alpha*A
    car_std = numpyro.sample('car_std', dist.MultivariateNormal(loc=jnp.zeros(n), precision_matrix=Q_std))
    sigma = numpyro.deterministic('sigma', 1./jnp.sqrt(tau))
    car = numpyro.deterministic('car', sigma * car_std)

    lin_pred = b0 + car
    p = numpyro.deterministic('p', expit(lin_pred))

    # likelihood
    numpyro.sample("obs", dist.Bernoulli(p), obs=y)


# spatial data 'y' and adjacency structure 'adj'
y = jnp.array([0, 0, 1, 0, 1, 0, 0, 0])

A = np.array([[0, 1, 0, 0, 0, 0, 0, 0],
              [1, 0, 1, 1, 0, 0, 0, 1],
              [0, 1, 0, 1, 0, 0, 0, 1],
              [0, 1, 1, 0, 1, 0, 0, 1],
              [0, 0, 0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1, 0, 1],
              [0, 1, 1, 1, 0, 0, 1, 0]])

# Run MCMC to infer the parameters
nuts_kernel = NUTS(car_model)
mcmc = MCMC(nuts_kernel, num_samples=1000, num_warmup=1000, progress_bar=False)
mcmc.run(rng_key=jax.random.PRNGKey(0), y=y, A=A)

# Extract posterior samples
samples = mcmc.get_samples()

mcmc.print_summary()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], line 41
     39 nuts_kernel = NUTS(car_model)
     40 mcmc = MCMC(nuts_kernel, num_samples=1000, num_warmup=1000, progress_bar=False)
---> 41 mcmc.run(rng_key=jax.random.PRNGKey(0), y=y, A=A)
     43 # Extract posterior samples
     44 samples = mcmc.get_samples()

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/mcmc.py:682, in MCMC.run(self, rng_key, extra_fields, init_params, *args, **kwargs)
    680 map_args = (rng_key, init_state, init_params)
    681 if self.num_chains == 1:
--> 682     states_flat, last_state = partial_map_fn(map_args)
    683     states = jax.tree.map(lambda x: x[jnp.newaxis, ...], states_flat)
    684 else:

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/mcmc.py:443, in MCMC._single_chain_mcmc(self, init, args, kwargs, collect_fields, remove_sites)
    441 # Check if _sample_fn is None, then we need to initialize the sampler.
    442 if init_state is None or (getattr(self.sampler, "_sample_fn", None) is None):
--> 443     new_init_state = self.sampler.init(
    444         rng_key,
    445         self.num_warmup,
    446         init_params,
    447         model_args=args,
    448         model_kwargs=kwargs,
    449     )
    450     init_state = new_init_state if init_state is None else init_state
    451 sample_fn, postprocess_fn = self._get_cached_fns()

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/hmc.py:749, in HMC.init(self, rng_key, num_warmup, init_params, model_args, model_kwargs)
    744 # vectorized
    745 else:
    746     rng_key, rng_key_init_model = jnp.swapaxes(
    747         vmap(random.split)(rng_key), 0, 1
    748     )
--> 749 init_params = self._init_state(
    750     rng_key_init_model, model_args, model_kwargs, init_params
    751 )
    752 if self._potential_fn and init_params is None:
    753     raise ValueError(
    754         "Valid value of `init_params` must be provided with" " `potential_fn`."
    755     )

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/hmc.py:693, in HMC._init_state(self, rng_key, model_args, model_kwargs, init_params)
    686 def _init_state(self, rng_key, model_args, model_kwargs, init_params):
    687     if self._model is not None:
    688         (
    689             new_init_params,
    690             potential_fn,
    691             postprocess_fn,
    692             model_trace,
--> 693         ) = initialize_model(
    694             rng_key,
    695             self._model,
    696             dynamic_args=True,
    697             init_strategy=self._init_strategy,
    698             model_args=model_args,
    699             model_kwargs=model_kwargs,
    700             forward_mode_differentiation=self._forward_mode_differentiation,
    701         )
    702         if init_params is None:
    703             init_params = new_init_params

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/util.py:712, in initialize_model(rng_key, model, init_strategy, dynamic_args, model_args, model_kwargs, forward_mode_differentiation, validate_grad)
    710     init_strategy = _init_to_unconstrained_value(values=unconstrained_values)
    711 prototype_params = transform_fn(inv_transforms, constrained_values, invert=True)
--> 712 (init_params, pe, grad), is_valid = find_valid_initial_params(
    713     rng_key,
    714     substitute(
    715         model,
    716         data={
    717             k: site["value"]
    718             for k, site in model_trace.items()
    719             if site["type"] in ["plate"]
    720         },
    721     ),
    722     init_strategy=init_strategy,
    723     enum=has_enumerate_support,
    724     model_args=model_args,
    725     model_kwargs=model_kwargs,
    726     prototype_params=prototype_params,
    727     forward_mode_differentiation=forward_mode_differentiation,
    728     validate_grad=validate_grad,
    729 )
    731 if not_jax_tracer(is_valid):
    732     if device_get(~jnp.all(is_valid)):

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/util.py:446, in find_valid_initial_params(rng_key, model, init_strategy, enum, model_args, model_kwargs, prototype_params, forward_mode_differentiation, validate_grad)
    444 # Handle possible vectorization
    445 if is_prng_key(rng_key):
--> 446     (init_params, pe, z_grad), is_valid = _find_valid_params(
    447         rng_key, exit_early=True
    448     )
    449 else:
    450     (init_params, pe, z_grad), is_valid = lax.map(_find_valid_params, rng_key)

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/util.py:432, in find_valid_initial_params.<locals>._find_valid_params(rng_key, exit_early)
    428 init_state = (0, rng_key, (prototype_params, 0.0, prototype_grads), False)
    429 if exit_early and not_jax_tracer(rng_key):
    430     # Early return if valid params found. This is only helpful for single chain,
    431     # where we can avoid compiling body_fn in while_loop.
--> 432     _, _, (init_params, pe, z_grad), is_valid = init_state = body_fn(init_state)
    433     if not_jax_tracer(is_valid):
    434         if device_get(is_valid):

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/util.py:416, in find_valid_initial_params.<locals>.body_fn(state)
    414     z_grad = jacfwd(potential_fn)(params)
    415 else:
--> 416     pe, z_grad = value_and_grad(potential_fn)(params)
    417 z_grad_flat = ravel_pytree(z_grad)[0]
    418 is_valid = jnp.isfinite(pe) & jnp.all(jnp.isfinite(z_grad_flat))

    [... skipping hidden 8 frame]

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/util.py:298, in potential_energy(model, model_args, model_kwargs, params, enum)
    294 substituted_model = substitute(
    295     model, substitute_fn=partial(_unconstrain_reparam, params)
    296 )
    297 # no param is needed for log_density computation because we already substitute
--> 298 log_joint, model_trace = log_density_(
    299     substituted_model, model_args, model_kwargs, {}
    300 )
    301 return -log_joint

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/infer/util.py:91, in log_density(model, model_args, model_kwargs, params)
     85     except ValueError:
     86         raise ValueError(
     87             "Model and guide shapes disagree at site: '{}': {} vs {}".format(
     88                 site["name"], model_shape, guide_shape
     89             )
     90         )
---> 91     log_prob = site["fn"].log_prob(value)
     93 if (scale is not None) and (not is_identically_one(scale)):
     94     log_prob = scale * log_prob

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/distributions/util.py:706, in validate_sample.<locals>.wrapper(self, *args, **kwargs)
    705 def wrapper(self, *args, **kwargs):
--> 706     log_prob = log_prob_fn(self, *args, **kwargs)
    707     if self._validate_args:
    708         value = kwargs["value"] if "value" in kwargs else args[0]

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/numpyro/distributions/discrete.py:92, in BernoulliProbs.log_prob(self, value)
     89 @validate_sample
     90 def log_prob(self, value):
     91     ps_clamped = clamp_probs(self.probs)
---> 92     return xlogy(value, ps_clamped) + xlog1py(1 - value, -ps_clamped)

    [... skipping hidden 5 frame]

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/jax/_src/scipy/special.py:504, in _xlogy_jvp(primals, tangents)
    502 (x_dot, y_dot) = tangents
    503 result = xlogy(x, y)
--> 504 return result, (x_dot * lax.log(y) + y_dot * x / y).astype(result.dtype)

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/jax/_src/numpy/array_methods.py:573, in _defer_to_unrecognized_arg.<locals>.deferring_binary_op(self, other)
    571 args = (other, self) if swap else (self, other)
    572 if isinstance(other, _accepted_binop_types):
--> 573   return binary_op(*args)
    574 # Note: don't use isinstance here, because we don't want to raise for
    575 # subclasses, e.g. NamedTuple objects that may override operators.
    576 if type(other) in _rejected_binop_types:

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/jax/_src/numpy/ufunc_api.py:177, in ufunc.__call__(self, out, where, *args)
    175   raise NotImplementedError(f"where argument of {self}")
    176 call = self.__static_props['call'] or self._call_vectorized
--> 177 return call(*args)

    [... skipping hidden 11 frame]

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/jax/_src/numpy/ufuncs.py:1142, in _multiply(x, y)
   1115 @partial(jit, inline=True)
   1116 def _multiply(x: ArrayLike, y: ArrayLike, /) -> Array:
   1117   """Multiply two arrays element-wise.
   1118 
   1119   JAX implementation of :obj:`numpy.multiply`. This is a universal function,
   (...)
   1140     Array([ 0, 10, 20, 30], dtype=int32)
   1141   """
-> 1142   x, y = promote_args("multiply", x, y)
   1143   return lax.mul(x, y) if x.dtype != bool else lax.bitwise_and(x, y)

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/jax/_src/numpy/util.py:355, in promote_args(fun_name, *args)
    353 """Convenience function to apply Numpy argument shape and dtype promotion."""
    354 check_arraylike(fun_name, *args)
--> 355 _check_no_float0s(fun_name, *args)
    356 check_for_prngkeys(fun_name, *args)
    357 return promote_shapes(fun_name, *promote_dtypes(*args))

File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/jax/_src/numpy/util.py:326, in check_no_float0s(fun_name, *args)
    324 """Check if none of the args have dtype float0."""
    325 if any(dtypes.dtype(arg) == dtypes.float0 for arg in args):
--> 326   raise TypeError(
    327       f"Called {fun_name} with a float0 array. "
    328       "float0s do not support any operations by design because they "
    329       "are not compatible with non-trivial vector spaces. No implicit dtype "
    330       "conversion is done. You can use np.zeros_like(arr, dtype=np.float) "
    331       "to cast a float0 array to a regular zeros array. \n"
    332       "If you didn't expect to get a float0 you might have accidentally "
    333       "taken a gradient with respect to an integer argument.")

TypeError: Called multiply with a float0 array. float0s do not support any operations by design because they are not compatible with non-trivial vector spaces. No implicit dtype conversion is done. You can use np.zeros_like(arr, dtype=np.float) to cast a float0 array to a regular zeros array. 
If you didn't expect to get a float0 you might have accidentally taken a gradient with respect to an integer argument.

Task 28

  • Simulate data from the CAR model using South African boundaries and neighbourhood structure (use adjacency_matrix_sa as A, and y=None to create simulations) with Bernoulli likelihood

  • Fit the CAR model to the data which you have simluated (make adjustments if necessary).

  • Plot posterior distribution of \(\alpha\), and add the true velue of \(\alpha\) to this plot. What do you observe?

  • Make sactter plots of \(f_\text{true}\) vs \(f_\text{fitted}\), and add a diagonal line to this plot. What do you observe?

The Besag-Yorg-Mollié model#

The Besag, York, and Mollié (BYM) decomposes the spatial random effect \(f\) into the sum of an i.i.d. component \(v\) and spatiallly structured component \(w\). Each of these two components has its own precision parameter \(\tau_v\) and \(\tau_w\), respectively:

\[ f \sim \mathcal{N}(0, \tau_v^{-1}I + \tau_w^{-1}R^-). \]

This model provides more flexibility than the CAR formulation.