Computationally Efficient Methods in Spatial Statistics
Applications in Environmental Modeling
David Bolin
Centre for Mathematical Sciences
Mathematical Statistics
Lund Institute of Technology,
Lund University,
2009
ISSN 1404-028X

Abstract:

In this thesis, computationally efficient statistical models for large spatial
environmental data sets are constructed.


In the first part of the thesis, a method for estimating spatially dependent
temporal trends is developed. A space-varying regression model, where the
regression coefficients for the spatial locations are dependent, is used.
The spatial dependence structure is specified by a Gaussian Markov Random
Field model, and the model parameters are estimated using the Expectation-
Maximization algorithm, which allows for feasible computation times for
relatively large data sets. The model is used to analyze temporal trends
in vegetation data from the African Sahel, and the results indicate a
substantial gain in accuracy compared with methods based on independent
ordinary least squares regressions for the individual pixels in the data set.
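
The comparison between independent per-pixel regressions and spatially
dependent regression coefficients can be illustrated with a small sketch.
It is not the model of the thesis. It assumes a one-dimensional transect of
pixels instead of an image, a first-order random-walk GMRF prior on the
slopes, and fixed values for the noise level `sigma` and prior precision
`tau`, which the thesis would instead estimate with the EM algorithm. All
function names and parameter values are hypothetical.

```python
import math
import random


def ols_slope(times, values):
    """Return the least-squares slope of values on times and sum((t - mean)^2)."""
    t_bar = sum(times) / len(times)
    y_bar = sum(values) / len(values)
    s_tt = sum((t - t_bar) ** 2 for t in times)
    s_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return s_ty / s_tt, s_tt


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: solve a tridiagonal system in O(n) operations."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def smooth_slopes(ols, data_precision, tau):
    """Posterior mean of the slopes under a first-order random-walk GMRF prior.

    Solves (tau * R + data_precision * I) b = data_precision * ols, where R is
    the tridiagonal random-walk structure matrix. Requires len(ols) >= 2.
    """
    n = len(ols)
    diag = [data_precision + tau * (1.0 if i in (0, n - 1) else 2.0)
            for i in range(n)]
    off = [-tau] * (n - 1)
    rhs = [data_precision * b for b in ols]
    return solve_tridiagonal(off, diag, off, rhs)


def rmse(estimate, truth):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


def demo(n_pixels=200, n_times=20, sigma=1.0, tau=1.0e5, seed=0):
    """Simulate noisy pixel time series and compare the two slope estimates."""
    rng = random.Random(seed)
    times = list(range(n_times))
    # Assumed ground truth: slopes that vary smoothly along the transect.
    truth = [0.05 * math.sin(2.0 * math.pi * i / n_pixels) for i in range(n_pixels)]
    ols = []
    for slope in truth:
        series = [slope * t + rng.gauss(0.0, sigma) for t in times]
        b_hat, s_tt = ols_slope(times, series)
        ols.append(b_hat)
    data_precision = s_tt / sigma ** 2  # 1 / Var(OLS slope), same for all pixels
    smoothed = smooth_slopes(ols, data_precision, tau)
    return rmse(ols, truth), rmse(smoothed, truth)


if __name__ == "__main__":
    rmse_ols, rmse_gmrf = demo()
    print(f"RMSE, independent OLS: {rmse_ols:.4f}")
    print(f"RMSE, GMRF-smoothed:   {rmse_gmrf:.4f}")
```

The Markov structure makes the precision matrix tridiagonal here, so the
posterior mean costs O(n) operations. On a two-dimensional pixel grid the
precision matrix would be sparse rather than tridiagonal, and a sparse
Cholesky factorization would typically replace the Thomas algorithm.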


In the second part of the thesis, explicit computationally efficient wavelet
Markov approximations of Gaussian Matérn fields are derived using
Hilbert space approximations. Using a simulation-based study, the wavelet
approximations are compared with two of the most popular methods for efficient
covariance approximations. The study indicates that, for a given computational
cost, the wavelet Markov methods are substantially more accurate than the
other methods.


Finally, a new class of stochastic field models is constructed using nested
Stochastic Partial Differential Equations (SPDEs). The model class is
computationally efficient, applicable to data on general smooth manifolds,
and includes both the Gaussian Matérn fields and a wide family of
fields with oscillating covariance functions. Nonstationary covariance models
are obtained by spatially varying the parameters in the SPDEs, and the model
parameters are estimated using direct numerical optimization, which is more
efficient than standard Markov Chain Monte Carlo procedures. As examples
of areas of application, the model class is used to approximate popular models
in random ocean wave theory, and applied to a large data set of global Total
Column Ozone (TCO) data.


The TCO data set contains approximately 180 000 measurements, showing that
the models allow for efficient inference, even for large environmental data
sets.
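
For orientation, the standard relation between Gaussian Matérn fields and
SPDEs can be written out as below. The abstract itself states no formulas,
so the symbols (variance \sigma^2, smoothness \nu, scale \kappa, noise
scaling \phi, dimension d, and the operators \mathcal{L}_1 and
\mathcal{L}_2) are notation assumed for this sketch, and the last line is
only a generic way of writing a nested SPDE, not necessarily the exact form
used in the thesis.

```latex
% Matérn covariance at distance h = \|s - s'\|, with K_\nu the modified
% Bessel function of the second kind, so that C(h) \to \sigma^2 as h \to 0
C(h) = \frac{\sigma^2}{2^{\nu-1}\,\Gamma(\nu)}\,(\kappa h)^{\nu} K_{\nu}(\kappa h),
\qquad h > 0

% SPDE driven by Gaussian white noise W whose stationary solutions on
% \mathbb{R}^d have Matérn covariance; \phi controls the variance
(\kappa^2 - \Delta)^{\alpha/2}\, X(s) = \phi\, W(s),
\qquad \alpha = \nu + d/2

% Assumed generic nested form: a second differential operator acts on the noise
\mathcal{L}_1 X(s) = \mathcal{L}_2 W(s)
```

Letting \kappa or the coefficients of the operators vary with the location
s is one way to read the statement that nonstationary models are obtained
by spatially varying the SPDE parameters.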




