Spatial Statistics and Ancestral Recombination Graphs
with Applications in Gene Mapping and Geostatistics
Linda Hartman
Centre for Mathematical Sciences
Mathematical Statistics
Lund University
2007
ISBN 9789162872663
LUTFMS-1031-2007

Abstract:

This thesis explores models and algorithms in geostatistics and gene mapping.
The first part deals with the use of computationally efficient lattice models
for inference on data with a continuous spatial index. The fundamental idea
is to approximate a Gaussian field with a Gaussian Markov random field (GMRF)
on a lattice, and then to interpolate this field bilinearly at non-lattice
locations. The resulting model is used for spatial interpolation, both in
a Bayesian approach using Markov chain Monte Carlo (MCMC), and in kriging.
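
As an illustration of the interpolation step only, the following is a minimal
sketch in Python, assuming a unit-spaced rectangular lattice with nodes at
integer coordinates. The function names (bilinear_weights, interpolate) and the
toy field are hypothetical and are not taken from the thesis; the GMRF itself
is not modelled here.

```python
import numpy as np

def bilinear_weights(x, y, n_rows, n_cols):
    """Return the four enclosing lattice nodes and their bilinear weights.

    Assumption: nodes sit at integer coordinates (row = y, col = x) with
    0 <= x <= n_cols - 1 and 0 <= y <= n_rows - 1.
    """
    # Lower-left node of the enclosing cell, clipped so the cell stays on the lattice.
    j = min(int(np.floor(x)), n_cols - 2)
    i = min(int(np.floor(y)), n_rows - 2)
    dx, dy = x - j, y - i
    nodes = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
    weights = np.array([(1 - dx) * (1 - dy), dx * (1 - dy),
                        (1 - dx) * dy, dx * dy])
    return nodes, weights

def interpolate(field, x, y):
    """Value of the lattice field at the non-lattice location (x, y)."""
    nodes, w = bilinear_weights(x, y, *field.shape)
    return float(sum(wk * field[i, j] for (i, j), wk in zip(nodes, w)))

# Toy 2 x 2 lattice field (illustrative values only).
field = np.array([[0.0, 1.0],
                  [2.0, 3.0]])
centre_value = interpolate(field, 0.5, 0.5)
```

Because the weights are non-negative and sum to one, the interpolated value is
a convex combination of the four surrounding lattice values, so the
interpolated field is a linear function of the lattice field.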


The second part of the thesis concerns genetic association analysis, particularly
multilocus gene mapping using case-control samples.

The algorithms utilize the fact that a population-based sample of haplotypes
(collections of alleles at closely linked loci on the same chromosome) mirrors
the population history of shared ancestry, mutation, recombination, etc. Around
the disease locus, chromosomes carrying the disease mutation will be more
similar than chromosomes that do not carry the disease mutation (on account
of increased levels of shared ancestry).


Two models and corresponding algorithms for gene mapping are presented. The
first explicitly models the genealogy taking the oversampling of cases into
account. Under certain model approximations, a permutation-based test for
genetic association is developed that is computationally feasible, even when
haplotype phase is unknown. It accommodates arbitrary phenotypes and genetic
models, allows for neutral mutations, and adapts to marker allele frequencies.
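
The general logic of a permutation test for association can be sketched as
follows. This is a generic illustration, not the genealogy-based test developed
in the thesis: the test statistic (absolute difference in mean score between
cases and controls), the per-individual scores, and the function name
permutation_p_value are all assumptions made for the example.

```python
import numpy as np

def permutation_p_value(scores, is_case, n_perm=1000, seed=0):
    """Permutation p-value for a case/control difference in mean score.

    Assumption: `scores` holds one numeric summary per individual and
    `is_case` is a boolean case indicator of the same length.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)

    def stat(labels):
        # Placeholder statistic: absolute difference in group means.
        return abs(scores[labels].mean() - scores[~labels].mean())

    observed = stat(is_case)
    # Shuffle the case/control labels to simulate the null of no association.
    exceed = sum(stat(rng.permutation(is_case)) >= observed
                 for _ in range(n_perm))
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Toy data: cases have systematically higher scores than controls.
toy_scores = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_is_case = [True] * 5 + [False] * 5
p_value = permutation_p_value(toy_scores, toy_is_case)
```

Permuting the phenotype labels while keeping the genetic data fixed preserves
the dependence structure among markers, which is one reason permutation tests
are attractive in multilocus settings.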


The second model utilizes concepts and algorithms from both spatial statistics
and statistical genetics. A spatial smoothing model is used for haplotypes,
such that structurally similar haplotypes have risk parameters with high
correlation. The disease locus is then sought as the position where a local
similarity measure produces risk parameters that can discriminate between
cases and controls. Different covariance structures and similarity metrics
are suggested and compared.
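
One way such a similarity-based covariance could be built is sketched below.
It is an assumption-laden example rather than any of the structures compared in
the thesis: similarity is measured by identity-by-state mismatches in a window
around a candidate locus, and the covariance is taken to be exponential in that
mismatch count. The names ibs_distance, similarity_covariance, half_window and
scale are hypothetical.

```python
import numpy as np

def ibs_distance(haplotypes, locus, half_window):
    """Pairwise count of allele mismatches in a window around `locus`.

    Assumption: `haplotypes` is an (n_haplotypes, n_markers) array of
    allele codes, and `locus` is a marker index.
    """
    h = np.asarray(haplotypes)
    lo = max(locus - half_window, 0)
    hi = min(locus + half_window + 1, h.shape[1])
    window = h[:, lo:hi]
    # Compare every pair of haplotypes marker by marker.
    return (window[:, None, :] != window[None, :, :]).sum(axis=2)

def similarity_covariance(haplotypes, locus, half_window=2, scale=1.0):
    """Correlation between risk parameters decays with local mismatch count."""
    d = ibs_distance(haplotypes, locus, half_window)
    return np.exp(-d / scale)

# Toy haplotypes over four biallelic markers (illustrative values only).
toy_haplotypes = [[0, 0, 1, 1],
                  [0, 0, 1, 0],
                  [1, 1, 0, 0]]
cov = similarity_covariance(toy_haplotypes, locus=1, half_window=1)
```

In this toy example the first two haplotypes are identical within the window
and therefore receive correlation one, whereas the third differs at every
marker in the window and is only weakly correlated with the others. Scanning
`locus` along the markers gives one covariance matrix per candidate position.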



Key words:

Gaussian Markov random fields, kriging, bilinear interpolation, genetic
association analysis, coalescent, ancestral recombination graph, haplotype,
generalized linear mixed models, identity-by-state, identity-by-descent








