Pointwise and Genomewide Significance Calculations in Gene Mapping through Nonparametric Linkage Analysis

Theory, Algorithms and Applications

Lars Ängquist

Centre for Mathematical Sciences
Mathematical Statistics
Lund University
2007

ISBN 978-91-628-7068-3
LUNFMS-1019-2006


Abstract:
In linkage analysis or, in a wider sense, gene mapping, one searches for disease loci along a genome. This is done by observing so-called marker genotypes (alleles) and phenotypes (affected/unaffected status) of a pedigree set, i.e. a set of multigenerational families, in order to locate the loci corresponding to the underlying disease genes or, at least, to narrow down the genome regions of interest. In this context the key concept is the genetic inheritance of alleles in relation to the phenotype outcomes. A significant deviation from what is expected under random inheritance is taken as statistical evidence for genetic components, which are suggested to be located at the loci giving significant results.
In the thesis introduction we begin by outlining the necessary genetic foundations of statistical genetics as well as some basic concepts, for instance the process of allelic inheritance, the genetic disease model, the pedigree set, the inheritance vector and various types of genetic information. Next, we give an introduction to one-locus nonparametric linkage analysis, focusing on significance calculations for nonparametric linkage (NPL) scores, and comment on the generalizations to two-locus procedures and on the related but contrasting approach of parametric linkage analysis. In the third section we very briefly discuss some competing and complementary subfields of statistical genetics, and finally we put the papers included in this thesis into context by summarizing their content.
Performing gene-mapping studies across the whole genome, or substantial parts of it, gives rise to interpretational problems due to multiple testing. The theme of the thesis is how to calculate significance levels and powers in several such settings.
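The multiple-testing problem can be illustrated with a small simulation. The following is a minimal sketch, not the method of any of the included papers: it assumes, purely for illustration, that the scores along a chromosome behave like a stationary Gaussian AR(1) sequence, and it compares the pointwise tail probability at a threshold with a Monte Carlo estimate of the probability that the maximum over all loci exceeds the same threshold. The function names and the parameters `n_loci`, `rho` and `n_sim` are hypothetical.

```python
import math
import random


def pointwise_p(t):
    """Upper-tail probability P(Z >= t) of a standard normal variable."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def genomewide_p(t, n_loci=200, rho=0.9, n_sim=2000, seed=1):
    """Monte Carlo estimate of P(max_i Z_i >= t).

    Assumption (illustrative only): Z_1, ..., Z_n_loci is a stationary
    Gaussian AR(1) sequence with standard normal margins and lag-one
    correlation rho, standing in for scores at neighbouring loci.
    """
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - rho * rho)  # keeps the marginal variance at 1
    hits = 0
    for _ in range(n_sim):
        z = rng.gauss(0.0, 1.0)
        peak = z
        for _ in range(n_loci - 1):
            z = rho * z + innov_sd * rng.gauss(0.0, 1.0)
            if z > peak:
                peak = z
        if peak >= t:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    t = 3.0
    print("pointwise p-value :", pointwise_p(t))
    print("genomewide p-value:", genomewide_p(t))
```

Under these assumptions the genomewide probability is far larger than the pointwise one, while staying below the Bonferroni bound `n_loci * pointwise_p(t)` because neighbouring scores are correlated.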
In the first two papers one-locus NPL analysis, i.e. where one searches for one disease gene at a time, is considered. In Paper A existing analytical approximations of significance levels are improved and extended. The suggested formula is based on extreme-value theory for stochastic processes and a general link function between a continuous version of an arbitrary distribution function and the standard normal distribution function. In Paper B a new variant of weighted simulation for stochastic processes is developed in order to calculate significance levels. The method can handle complete as well as incomplete marker data and is very fast compared with traditional Monte Carlo-based simulation algorithms.
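The idea behind weighted simulation can be shown in one dimension. The sketch below is a toy example and not the process-level algorithm of Paper B: it estimates the small tail probability P(Z >= t) of a standard normal variable by importance sampling with exponential tilting, i.e. by drawing from N(t, 1) instead of N(0, 1) and reweighting each draw by the likelihood ratio. The function name and its parameters are hypothetical.

```python
import math
import random


def tail_prob_tilted(t, n_sim=20000, seed=1):
    """Importance-sampling estimate of P(Z >= t) for Z ~ N(0, 1).

    Draws come from the exponentially tilted density N(t, 1), under which
    the event {x >= t} is no longer rare. Each draw in the event is
    weighted by the likelihood ratio
        phi(x) / phi(x - t) = exp(-t * x + t**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        x = rng.gauss(t, 1.0)
        if x >= t:
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n_sim


if __name__ == "__main__":
    t = 4.0
    exact = 0.5 * math.erfc(t / math.sqrt(2.0))
    print("exact   :", exact)
    print("estimate:", tail_prob_tilted(t))
```

With plain Monte Carlo, 20000 draws would on average contain fewer than one exceedance of t = 4, whereas about half of the tilted draws fall in the event of interest.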
The last two papers are directed towards two-locus NPL analysis, i.e. where one is interested in diseases with genetic components based on two distinct (nonsyntenic) disease genes. In Paper C significance levels and powers using unconditional two-locus analysis, i.e. where one simultaneously searches for two disease genes, are derived and discussed for homogeneous pedigree sets based on units of affected sib-pairs. Finally, in Paper D, a general approach for calculation of significance levels and powers in conditional two-locus analysis is developed. The conditional approach might be seen as a hybrid of one-locus and two-locus NPL analysis. Of central importance to this paper is the concept of the noncentrality parameter, which is essentially the expected value of the test statistic of interest, i.e. the NPL score, under a corresponding instance of the alternative hypothesis.
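The role of a noncentrality parameter in power calculations can be sketched as follows. This is a simplified pointwise illustration and not the conditional two-locus procedure of Paper D: it assumes that the score at the disease locus is approximately normal with unit variance and mean equal to the noncentrality parameter, here called `eta` (a hypothetical name), so that the power at threshold `t` is the probability that such a variable exceeds `t`.

```python
import math


def pointwise_power(t, eta):
    """P(Z >= t) for Z ~ N(eta, 1).

    Assumption (illustrative only): the test statistic at the disease
    locus is normal with unit variance and mean eta, the noncentrality
    parameter. With eta = 0 this reduces to the pointwise significance
    level of the threshold t.
    """
    return 0.5 * math.erfc((t - eta) / math.sqrt(2.0))


if __name__ == "__main__":
    for eta in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(eta, pointwise_power(3.0, eta))
```

Under this approximation the power equals one half when the noncentrality parameter equals the threshold, and it increases monotonically in `eta`.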
Key words:
Allele sharing, nonparametric linkage analysis, two-locus linkage analysis, conditional linkage analysis, NPL score, significance calculations, analytical approximation, process maximum, crossover rate, normal approximation, Monte Carlo simulation, importance sampling, exponential tilting, cost-adjusted relative efficiency, classes of score functions, genetic disease models, composite hypotheses, gene-gene interaction, noncentrality parameter, optimal score functions, conditioning loci, ROC curves.