Computationally Efficient Methods in Spatial Statistics

Applications in Environmental Modeling

David Bolin


Centre for Mathematical Sciences
Mathematical Statistics
Lund Institute of Technology,
Lund University,
2009

ISSN 1404-028X
Abstract:
In this thesis, computationally efficient statistical models for large spatial environmental data sets are constructed.
In the first part of the thesis, a method for estimating spatially dependent temporal trends is developed. A space-varying regression model is used in which the regression coefficients at different spatial locations are dependent. The spatial dependence structure is specified by a Gaussian Markov random field (GMRF) model, and the model parameters are estimated using the Expectation-Maximization (EM) algorithm, which keeps computation times feasible for relatively large data sets. The model is used to analyze temporal trends in vegetation data from the African Sahel, and the results indicate a substantial gain in accuracy compared with methods based on independent ordinary least squares regressions for the individual pixels in the data set.
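A minimal sketch of the idea follows. It rests on simplifying assumptions that do not come from the thesis. It uses a 1-D chain of pixels instead of a 2-D image and a slope-only model with centred time, so the intercept drops out. It also puts a first-order GMRF prior beta ~ N(0, (kappa R)^-1) on the trends, with R = D - W + delta I. Under these choices both EM updates have closed forms. The names `chain_structure` and `em_trend` are hypothetical. Dense linear algebra is used for brevity, whereas a practical implementation would exploit the sparsity of R.

```python
import numpy as np


def chain_structure(n, delta=1e-3):
    """Hypothetical first-order GMRF structure matrix R = D - W + delta*I
    on a 1-D chain of n pixels; delta > 0 makes the prior proper."""
    W = np.zeros((n, n))
    i = np.arange(n - 1)
    W[i, i + 1] = 1.0
    W[i + 1, i] = 1.0
    return np.diag(W.sum(axis=1)) - W + delta * np.eye(n)


def em_trend(Y, t, R, n_iter=200):
    """EM for y_s(t) = beta_s * t + eps, eps ~ N(0, sigma2), and
    beta ~ N(0, (kappa * R)^-1). Y has shape (n_pixels, n_times) and t is
    centred, so no intercept is needed. Returns (mu, kappa, sigma2)."""
    n, T = Y.shape
    S = t @ t                 # X'X, identical for every pixel
    b = Y @ t                 # X'y for every pixel
    kappa, sigma2 = 1.0, 1.0
    I = np.eye(n)
    for _ in range(n_iter):
        # E-step: beta | y is Gaussian with precision kappa*R + (S/sigma2)*I
        Sigma = np.linalg.inv(kappa * R + (S / sigma2) * I)
        mu = Sigma @ (b / sigma2)
        # M-step: maximize the expected complete-data log-likelihood
        rss = np.sum((Y - np.outer(mu, t)) ** 2)
        sigma2 = (rss + S * np.trace(Sigma)) / (n * T)
        kappa = n / (mu @ R @ mu + np.trace(R @ Sigma))
    return mu, kappa, sigma2


# Synthetic example: smooth true trends observed with heavy noise.
rng = np.random.default_rng(0)
n, T = 100, 20
t = np.arange(T) - (T - 1) / 2
beta_true = 0.2 * np.sin(2 * np.pi * np.arange(n) / n)
Y = np.outer(beta_true, t) + rng.normal(scale=3.0, size=(n, T))

beta_ols = (Y @ t) / (t @ t)  # independent per-pixel least squares
beta_em, kappa, sigma2 = em_trend(Y, t, chain_structure(n))
mse_ols = np.mean((beta_ols - beta_true) ** 2)
mse_em = np.mean((beta_em - beta_true) ** 2)
print(f"MSE OLS: {mse_ols:.5f}  MSE GMRF/EM: {mse_em:.5f}  sigma2: {sigma2:.2f}")
```

On this synthetic example, borrowing strength between neighbouring pixels gives a clearly lower mean squared error than the per-pixel least-squares slopes. The numbers are illustrative only and are not results from the thesis.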
In the second part of the thesis, explicit, computationally efficient wavelet Markov approximations of Gaussian Matérn fields are derived using Hilbert space approximations. In a simulation study, the wavelet approximations are compared with two of the most popular methods for efficient covariance approximation. The study indicates that, for a given computational cost, the wavelet Markov methods give substantially higher accuracy than the other methods.
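The wavelet construction itself is not reproduced here. The sketch below is a minimal illustration of why Markov representations are computationally attractive. It uses the simplest Matérn case, smoothness nu = 1/2, which gives the exponential covariance sigma2 * exp(-kappa |h|). Sampled on a regular 1-D grid with spacing delta, this field is an AR(1) process with phi = exp(-kappa delta). Its precision matrix is therefore tridiagonal, with O(n) nonzeros compared with the O(n^2) entries of the covariance matrix. The function names are hypothetical.

```python
import numpy as np


def exponential_cov(n, kappa, sigma2, delta):
    """Dense covariance sigma2 * exp(-kappa * |s_i - s_j|) on a regular grid."""
    s = delta * np.arange(n)
    return sigma2 * np.exp(-kappa * np.abs(s[:, None] - s[None, :]))


def ar1_precision(n, kappa, sigma2, delta):
    """Tridiagonal precision of the same field, written down explicitly
    from its AR(1) representation with phi = exp(-kappa * delta)."""
    phi = np.exp(-kappa * delta)
    diag = np.full(n, 1.0 + phi**2)
    diag[0] = diag[-1] = 1.0
    Q = np.diag(diag) - phi * (np.eye(n, k=1) + np.eye(n, k=-1))
    return Q / (sigma2 * (1.0 - phi**2))


n, kappa, sigma2, delta = 50, 1.0, 2.0, 0.1
C = exponential_cov(n, kappa, sigma2, delta)
Q = ar1_precision(n, kappa, sigma2, delta)
print(np.allclose(np.linalg.inv(C), Q))   # both describe the same field
print(np.count_nonzero(Q), "nonzeros in Q vs", C.size, "in C")
```

For smoother Matérn fields, and in more than one dimension, the precision is no longer exactly sparse in this way. That is the setting in which approximations such as the wavelet Markov methods are needed.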
Finally, a new class of stochastic field models is constructed using nested stochastic partial differential equations (SPDEs). The model class is computationally efficient, applicable to data on general smooth manifolds, and includes both the Gaussian Matérn fields and a wide family of fields with oscillating covariance functions. Non-stationary covariance models are obtained by spatially varying the parameters in the SPDEs, and the model parameters are estimated using direct numerical optimization, which is more efficient than standard Markov chain Monte Carlo (MCMC) procedures. As example applications, the model class is used to approximate popular models in random ocean wave theory and is applied to a large global data set of Total Column Ozone (TCO) measurements. The TCO data set contains approximately 180 000 measurements, and its analysis shows that the models allow for efficient inference even for large environmental data sets.
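For orientation, the non-nested building block of such models is the SPDE (kappa^2 - Δ)^(alpha/2) X = W, driven by Gaussian white noise W. On R^d its stationary solution is a Matérn field with smoothness nu = alpha - d/2. The sketch below takes d = 1 and alpha = 2, so nu = 3/2. It discretizes the SPDE with finite differences on a regular grid, a simplification chosen for brevity that is not the construction used for general manifolds. It then compares the resulting sparse precision with the exact Matérn covariance (1 + kappa r) exp(-kappa r) / (4 kappa^3). The nested operators of the thesis are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def spde_precision(n, h, kappa):
    """Precision of a finite-difference solution of (kappa^2 - d^2/ds^2) x = W
    on n grid points with spacing h (zero values assumed outside the grid).
    With K = kappa^2 I - D2 and white-noise cell variance 1/h, Q = h K^T K."""
    D2 = (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / h**2
    K = kappa**2 * np.eye(n) - D2
    return h * K.T @ K


def matern32_cov(r, kappa):
    """Exact Matern covariance (nu = 3/2, d = 1) of the continuous SPDE solution."""
    return (1.0 + kappa * r) * np.exp(-kappa * r) / (4.0 * kappa**3)


n, h, kappa = 801, 0.05, 1.0
Q = spde_precision(n, h, kappa)
C = np.linalg.inv(Q)
c = n // 2                        # centre point, far from the boundary
lags = np.array([0, 10, 20, 40])  # grid offsets, i.e. distances 0, 0.5, 1, 2
approx = C[c, c + lags]
exact = matern32_cov(lags * h, kappa)
print(np.count_nonzero(Q[c]), "nonzeros in an interior row of Q")
print(np.round(approx, 4), np.round(exact, 4))
```

Each interior row of Q has only five nonzeros, yet the implied covariance matches the dense Matérn covariance closely away from the boundary. This sparsity is what makes likelihood evaluation, and hence direct numerical optimization of the parameters, cheap for large data sets.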