Spatial Statistics and Ancestral Recombination Graphs

with Applications in Gene Mapping and Geostatistics

Linda Hartman

Centre for Mathematical Sciences
Mathematical Statistics
Lund University
2007

ISBN 978-91-628-7266-3
LUTFMS-1031-2007


Abstract:
This thesis explores models and algorithms in geostatistics and gene mapping. The first part deals with the use of computationally efficient lattice models for inference on data with a continuous spatial index. The fundamental idea is to approximate a Gaussian field by a Gaussian Markov random field (GMRF) on a lattice and then to interpolate this field bilinearly at non-lattice locations. The resulting model is used for spatial interpolation, both in a Bayesian approach using Markov chain Monte Carlo (MCMC) and in kriging.
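As a rough illustration of the bilinear step, the sketch below builds a matrix A whose rows hold the four bilinear weights of each observation location, so that the interpolated values are A times the vectorized lattice field. It assumes a regular square lattice with node (i, j) at (i*h, j*h) and locations inside the lattice domain. The function name `bilinear_weights` is hypothetical, and the GMRF prior on the lattice values is not shown.

```python
import numpy as np

def bilinear_weights(points, nx, ny, h=1.0):
    """Return a dense (n_points, nx*ny) matrix A such that A @ field.ravel()
    is the bilinear interpolation of the lattice values at `points`.
    Assumes node (i, j) sits at (i*h, j*h) and field[i, j] is flattened
    in C order, i.e. column index i*ny + j."""
    points = np.asarray(points, dtype=float)
    A = np.zeros((len(points), nx * ny))
    for k, (x, y) in enumerate(points):
        if not (0 <= x <= (nx - 1) * h and 0 <= y <= (ny - 1) * h):
            raise ValueError(f"point {(x, y)} lies outside the lattice")
        # lower-left node of the cell containing (x, y); clamp on the upper edge
        i = min(int(np.floor(x / h)), nx - 2)
        j = min(int(np.floor(y / h)), ny - 2)
        # local coordinates of the point within its cell, both in [0, 1]
        u = x / h - i
        v = y / h - j
        corners = [(0, 0, (1 - u) * (1 - v)), (1, 0, u * (1 - v)),
                   (0, 1, (1 - u) * v), (1, 1, u * v)]
        for di, dj, w in corners:
            A[k, (i + di) * ny + (j + dj)] += w
    return A

# Usage: a field that is linear in the indices, f(i, j) = 4*i + j
field = np.arange(12, dtype=float).reshape(3, 4)
A = bilinear_weights([(0.5, 1.5)], nx=3, ny=4)
print(A @ field.ravel())  # [3.5]
```

Because bilinear interpolation reproduces linear functions exactly, the value at (0.5, 1.5) is 4*0.5 + 1.5 = 3.5. A dense matrix keeps the example short; for large lattices a sparse representation is the natural choice, since each row has at most four non-zero entries.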
The second part of the thesis concerns genetic association analysis, particularly multi-locus gene mapping using case-control samples.
The algorithms exploit the fact that a population-based sample of haplotypes (collections of alleles at closely linked loci on the same chromosome) mirrors the population history of shared ancestry, mutation, recombination, etc. Around the disease locus, chromosomes carrying the disease mutation will be more similar to one another than chromosomes that do not carry it, owing to their higher levels of shared ancestry.
Two models and corresponding algorithms for gene mapping are presented. The first explicitly models the genealogy, taking the over-sampling of cases into account. Under certain model approximations, a permutation-based test for genetic association is developed that is computationally feasible even when haplotype phase is unknown. It handles arbitrary phenotypes and genetic models, allows for neutral mutations, and adapts to marker allele frequencies.
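The generic mechanics of a permutation test for case-control association can be sketched as follows. This is not the genealogy-based statistic of the thesis: the statistic used here (the largest allele-frequency difference between cases and controls over all markers) is a hypothetical placeholder, and phased 0/1 haplotypes with one case/control label per haplotype are assumed, whereas the thesis also treats unknown phase. The names `permutation_pvalue` and `allele_freq_diff` are illustrative.

```python
import numpy as np

def permutation_pvalue(stat, haplotypes, labels, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for association between case/control
    labels (1 = case, 0 = control) and haplotype data. `stat(haplotypes, labels)`
    is any association statistic where larger values mean stronger association."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = stat(haplotypes, labels)
    exceed = 0
    for _ in range(n_perm):
        # permuting labels breaks any haplotype-phenotype link while keeping
        # the haplotype data and the number of cases fixed
        if stat(haplotypes, rng.permutation(labels)) >= observed:
            exceed += 1
    # add-one correction: the observed labelling counts as one permutation
    return (exceed + 1) / (n_perm + 1)

def allele_freq_diff(haplotypes, labels):
    """Hypothetical placeholder statistic: absolute difference in the frequency
    of allele 1 between cases and controls, maximised over markers."""
    H = np.asarray(haplotypes, dtype=float)
    cases = H[labels == 1].mean(axis=0)
    controls = H[labels == 0].mean(axis=0)
    return np.max(np.abs(cases - controls))
```

Taking the maximum over markers and permuting the whole label vector yields a p-value that is adjusted for testing many markers at once, since each permutation is compared on the same max-statistic scale.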
The second model uses concepts and algorithms from both spatial statistics and statistical genetics. A spatial smoothing model is applied to haplotypes, such that structurally similar haplotypes have highly correlated risk parameters. The disease locus is then sought as the position where a local similarity measure yields risk parameters that discriminate between cases and controls. Different covariance structures and similarity metrics are proposed and compared.
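One simple way to turn a local similarity measure into a valid covariance structure is a distance-weighted identity-by-state kernel. The sketch below is an assumption for illustration, not necessarily one of the metrics compared in the thesis: each marker contributes an indicator of allele sharing, weighted by exp(-|position - focus| / scale). Each indicator matrix is a Gram matrix, so the weighted sum is positive semidefinite and can serve as the covariance of haplotype risk parameters centred at a candidate locus. The names `local_ibs_kernel`, `focus` and `scale` are hypothetical.

```python
import numpy as np

def local_ibs_kernel(haplotypes, positions, focus, scale=1.0):
    """Similarity matrix between haplotypes centred on position `focus`:
    K[a, b] = sum_m w_m * 1[h_a[m] == h_b[m]], where the weights w_m are
    proportional to exp(-|positions[m] - focus| / scale) and sum to one.
    Every term is positive semidefinite, hence so is K; its diagonal is 1."""
    H = np.asarray(haplotypes)
    w = np.exp(-np.abs(np.asarray(positions, dtype=float) - focus) / scale)
    w /= w.sum()
    n = H.shape[0]
    K = np.zeros((n, n))
    for m, wm in enumerate(w):
        col = H[:, m]
        # indicator of identical alleles at marker m for every pair of haplotypes
        K += wm * (col[:, None] == col[None, :])
    return K

# Usage: four haplotypes at four markers; the first and last are identical
H = np.array([[0, 1, 1, 0], [0, 1, 0, 0], [1, 0, 0, 1], [0, 1, 1, 0]])
K = local_ibs_kernel(H, positions=[0.0, 1.0, 2.0, 3.0], focus=1.5)
```

Scanning `focus` along the region and refitting the risk model at each position mirrors the search for the disease locus described above; the risk parameters at a given focus could, for instance, be given a N(0, σ²K) prior.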
Key words:
Gaussian Markov random fields, kriging, bilinear interpolation, genetic association analysis, coalescent, ancestral recombination graph, haplotype, generalized linear mixed models, identity-by-state, identity-by-descent