Pointwise and Genome-wide Significance Calculations in Gene Mapping through
Nonparametric Linkage Analysis:
Theory, Algorithms and Applications
Lars Ängquist
Centre for Mathematical Sciences
Mathematical Statistics
Lund University
2007
ISBN 9789162870683
LUNFMS-1019-2006

Abstract:


In linkage analysis or, in a wider sense, gene mapping, one searches for disease
loci along a genome. This is done by observing so-called marker genotypes
(alleles) and phenotypes (affected/unaffected status) in a pedigree set, i.e.
a set of multigenerational families, in order to locate the loci corresponding
to the underlying disease genes or, at least, to narrow down the interesting
genome regions. In this context the key concept is the genetic inheritance
of alleles with respect to the phenotype outcomes. A significant deviation
from what is expected under random inheritance is taken as statistical evidence
of genetic components, suggested to be located at the loci giving
significant results.
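As a concrete illustration of "deviation from random inheritance", consider the simplest pedigree unit, an affected sib-pair whose number X of alleles shared identical by descent (IBD) is known. Under random inheritance X takes the values 0, 1 and 2 with probabilities 1/4, 1/2 and 1/4, so the centred count X - 1 has mean 0 and variance 1/2. The sketch below standardizes the summed excess sharing over n pairs. The helper name `asp_sharing_z` and the assumption of fully informative IBD counts are hypothetical; this is a simple allele-sharing statistic, not necessarily the exact NPL score or score function used in the thesis.

```python
import math
from collections.abc import Sequence


def asp_sharing_z(ibd_counts: Sequence[int]) -> float:
    """Standardized allele-sharing statistic for affected sib-pairs (hypothetical helper).

    Each entry is the number of alleles (0, 1 or 2) a sib-pair shares IBD at the
    locus. Under random inheritance E[X] = 1 and Var[X] = 1/2, so the sum of
    centred counts divided by sqrt(n/2) is approximately N(0, 1) for large n.
    """
    n = len(ibd_counts)
    if n == 0:
        raise ValueError("need at least one sib-pair")
    if any(x not in (0, 1, 2) for x in ibd_counts):
        raise ValueError("IBD counts must be 0, 1 or 2")
    centred_sum = sum(x - 1 for x in ibd_counts)
    return centred_sum / math.sqrt(n / 2)


# Eight pairs with excess sharing: counts sum to 12, so the centred sum is 4.
print(asp_sharing_z([2, 2, 1, 2, 1, 1, 2, 1]))  # → 2.0
```

Large positive values indicate more IBD sharing among affected relatives than random inheritance predicts, which is the kind of deviation taken as evidence of linkage.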


In the thesis introduction we begin by outlining the genetic foundations
of statistical genetics that are needed, as well as some basic concepts, for
instance the process of allelic inheritance, the genetic disease model, the
pedigree set, the inheritance vector and various types of genetic information.
Next, we give an introduction to one-locus nonparametric linkage analysis,
focusing on significance calculations for nonparametric linkage (NPL) scores,
and comment on the generalizations to two-locus procedures and on the related
but contrasting approach of parametric linkage analysis. In the third section
we very briefly discuss some competing and complementary subfields of
statistical genetics, and finally we put the papers included in this thesis
into context by summarizing their content.


Performing gene mapping studies over the whole genome, or substantial parts
of it, gives rise to interpretational problems due to multiple testing.
The theme of the thesis is how to calculate significance levels and powers
in several contexts of this kind.
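A classical way to quantify this multiple-testing problem is the continuous-map approximation for the maximum of a stationary Ornstein-Uhlenbeck process. Assume the standardized score along each chromosome has covariance exp(-β|t|) at distance t Morgans; for the affected sib-pair statistic β = 4 under Haldane's map function, since the IBD-count correlation is (1 - 2θ)^2 = exp(-4t). With C chromosomes and total length G Morgans, the genome-wide significance of threshold b is then approximately 1 - exp{-C(1 - Φ(b)) - βGbφ(b)}. The sketch below is this standard baseline approximation under those assumptions (a densely spaced, fully informative map), not the improved formula of Paper A; the chromosome count and genome length are illustrative values.

```python
import math


def normal_sf(b: float) -> float:
    """Upper tail 1 - Phi(b) of the standard normal distribution."""
    return 0.5 * math.erfc(b / math.sqrt(2))


def normal_pdf(b: float) -> float:
    """Standard normal density phi(b)."""
    return math.exp(-0.5 * b * b) / math.sqrt(2 * math.pi)


def genomewide_pvalue(b: float, n_chrom: int, length_morgans: float, beta: float) -> float:
    """Approximate P(max of the score process over the genome > b).

    Assumes the score on each chromosome is a stationary OU process with
    covariance exp(-beta*|t|), t in Morgans, observed on a continuous map.
    Each chromosome contributes a start-point exceedance 1 - Phi(b), and clumps
    of exceedances occur at approximate rate beta*b*phi(b) per Morgan
    (an asymptotic, large-b result).
    """
    expected_clumps = n_chrom * normal_sf(b) + beta * length_morgans * b * normal_pdf(b)
    return 1.0 - math.exp(-expected_clumps)


b = 4.07  # threshold on the Z scale (LOD about 3.6)
print("pointwise:  ", normal_sf(b))
print("genome-wide:", genomewide_pvalue(b, n_chrom=23, length_morgans=33.0, beta=4.0))
```

With these illustrative values a pointwise level of roughly 2e-5 corresponds to a genome-wide level of roughly 0.05, which shows why pointwise p-values cannot be interpreted directly in a genome scan.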


The first two papers consider one-locus NPL analysis, i.e. searching for
one disease gene at a time. In Paper A, existing analytical
approximations of significance levels are improved and extended. The suggested
formula is based on extreme-value theory for stochastic processes and a general
link function between a continuous version of an arbitrary distribution function
and the standard normal distribution function. In Paper B, a new variant of
weighted simulation for stochastic processes is developed in order to calculate
significance levels. The method handles complete as well as incomplete
marker data and is very fast compared with traditional Monte Carlo-based
algorithms for performing such simulations.
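The general idea of weighted simulation can be illustrated with plain importance sampling by exponential tilting. The sketch below estimates a small null tail probability P(S >= k) for the sum S of n centred affected-sib-pair IBD counts (each -1, 0 or +1 with probabilities 1/4, 1/2, 1/4). Replicates are drawn from a tilted distribution q(s) proportional to p(s)exp(θs), under which large S is common, and each replicate is reweighted by its likelihood ratio exp(-θS + nΛ(θ)). Here Λ(θ) = log((1 + cosh θ)/2) has derivative tanh(θ/2), so the tilt that centres S at k is θ = 2 artanh(k/n). This is a generic single-locus, complete-data example assumed for illustration, not the process-level algorithm of Paper B; the function names are hypothetical.

```python
import math
import random


def tilted_tail_prob(n: int, k: int, n_sim: int = 20_000, seed: int = 1) -> float:
    """Importance-sampling estimate of P(S >= k), S = sum of n centred IBD counts.

    Null: each centred count s takes -1, 0, +1 with probabilities 1/4, 1/2, 1/4.
    Proposal: exponentially tilted null, q(s) proportional to p(s)*exp(theta*s),
    with theta chosen so that the tilted mean of S equals k.
    """
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    theta = 2.0 * math.atanh(k / n)                      # solves tanh(theta/2) = k/n
    log_mgf = math.log((1.0 + math.cosh(theta)) / 2.0)   # Lambda(theta) per pair
    values = (-1, 0, 1)
    null_p = (0.25, 0.5, 0.25)
    tilted_w = [p * math.exp(theta * s) for p, s in zip(null_p, values)]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        s_sum = sum(rng.choices(values, weights=tilted_w, k=n))
        if s_sum >= k:
            # Likelihood ratio p/q for the whole replicate of n pairs.
            total += math.exp(-theta * s_sum + n * log_mgf)
    return total / n_sim


def exact_tail_prob(n: int, k: int) -> float:
    """Exact P(S >= k): the sum of n IBD counts is Binomial(2n, 1/2)."""
    return sum(math.comb(2 * n, j) for j in range(n + k, 2 * n + 1)) / 4 ** n


print(tilted_tail_prob(50, 20), exact_tail_prob(50, 20))
```

A naive Monte Carlo run would need on the order of millions of replicates to see even a handful of events with probability near 4e-5, whereas the tilted estimator reaches a relative error of a few percent with 20,000 replicates.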


The last two papers address two-locus NPL analysis, i.e. the study of
diseases whose genetic component is based on two distinct
(non-syntenic) disease genes. In Paper C, significance levels and powers of
unconditional two-locus analysis, i.e. a simultaneous search
for two disease genes, are derived and discussed for homogeneous pedigree
sets consisting of affected sib-pairs. Finally, in Paper D, a general
approach for calculating significance levels and powers in conditional
two-locus analysis is developed. The conditional approach may be seen as
a hybrid of one-locus and two-locus NPL analysis. Central to
this paper is the concept of the non-centrality parameter, which is essentially
the expected value of the test statistic of interest, i.e. the NPL score,
under a given instance of the alternative hypothesis.
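For the simple affected-sib-pair statistic Z_n = Σ(X_i - 1)/√(n/2), the non-centrality parameter can be written down directly. If, under a given alternative, a pair shares 0, 1 or 2 alleles IBD with probabilities z0, z1 and z2, then E[X - 1] = z1 + 2z2 - 1 = z2 - z0, so δ = E[Z_n] = √(2n)(z2 - z0). The sketch below computes δ together with the common rough power approximation 1 - Φ(b - δ), which assumes Z_n still has unit variance under the alternative. The function names and IBD probabilities are hypothetical; the conditional two-locus setting of Paper D is more general than this one-locus example.

```python
import math


def asp_noncentrality(n_pairs: int, z0: float, z1: float, z2: float) -> float:
    """E[Z_n] for Z_n = sum(X_i - 1) / sqrt(n/2) under IBD probabilities (z0, z1, z2)."""
    if not math.isclose(z0 + z1 + z2, 1.0):
        raise ValueError("IBD probabilities must sum to 1")
    mean_centred = z2 - z0  # E[X - 1] = z1 + 2*z2 - 1 = z2 - z0
    return math.sqrt(2 * n_pairs) * mean_centred


def approx_power(threshold: float, delta: float) -> float:
    """Rough power P(Z_n > threshold), treating Z_n as N(delta, 1) under the alternative."""
    return 0.5 * math.erfc((threshold - delta) / math.sqrt(2))


delta = asp_noncentrality(100, z0=0.15, z1=0.50, z2=0.35)
print(round(delta, 3))  # → 2.828
print(approx_power(3.0, delta))
```

Because δ grows like √n, the same expressions can be inverted to ask how many sib-pairs are needed to reach a target power at a given threshold.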


Key words:

Allele sharing, nonparametric linkage analysis, two-locus linkage analysis,
conditional linkage analysis, NPL score, significance calculations, analytical
approximation, process maximum, crossover rate, normal approximation, Monte
Carlo simulation, importance sampling, exponential tilting, cost-adjusted
relative efficiency, classes of score functions, genetic disease models,
composite hypotheses, gene-gene interaction, non-centrality parameter, optimal
score functions, conditioning loci, ROC curves.








