On Computation of P-values in Parametric Linkage Analysis

Azra Kurbasic and Ola Hössjer

Centre for Mathematical Sciences
Mathematical Statistics
Lund Institute of Technology,
Lund University

ISSN 1403-9338
Parametric linkage analysis is commonly used to find chromosomal regions linked to a disease (phenotype) that is described by a specific genetic model. This is done by investigating the relations between the disease and genetic markers, that is, loci of known position with a clear Mendelian mode of inheritance. Assume we have found an interesting region on a chromosome that we suspect is linked to the disease. We then want to test the null hypothesis of no linkage against the alternative of linkage. As a test statistic we use the maximal lod score $Z_{\mbox{\scriptsize max}}$. It is well known that, when only one point (one marker) on the chromosome is studied, the maximal lod score asymptotically follows a $(2 \ln 10)^{-1}\times (\frac{1}{2}\chi^{2}(0)+\frac{1}{2}\chi^{2}(1))$ distribution under the null hypothesis of no linkage, where $\chi^{2}(0)$ denotes a point mass at zero. In this paper we show, both by simulations and by theoretical arguments, that the null distribution of $Z_{\mbox{\scriptsize max}}$ has no simple form when more than one marker is used (multipoint analysis). In fact, the distribution of $Z_{\mbox{\scriptsize max}}$ depends on the number of families, their structure, the genetic model, the marker density, and the marker informativeness. This means that a fixed critical value of $Z_{\mbox{\scriptsize max}}$ leads to tests with different significance levels. Because of these problems, from a statistical point of view a p-value is a more desirable measure of significance than the maximal lod score.
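To make the single-marker result concrete, the following minimal sketch computes the asymptotic pointwise p-value implied by the mixture distribution above. It is an illustration only: the function name `pointwise_p_value` is our own, and the formula applies only to the one-marker case, not to the multipoint case, where the abstract states that no simple form exists.

```python
import math


def pointwise_p_value(z_max: float) -> float:
    """Asymptotic single-marker p-value P(Z_max >= z_max) under no linkage.

    Assumes the (2 ln 10)^{-1} * (1/2 chi^2(0) + 1/2 chi^2(1)) null
    distribution; the name and interface are illustrative assumptions.
    """
    if z_max <= 0.0:
        # chi^2(0) is a point mass at zero and chi^2(1) is non-negative,
        # so any non-positive threshold is reached with probability 1.
        return 1.0
    # Rescale the lod score to the chi-square scale.
    x = 2.0 * math.log(10.0) * z_max
    # For one degree of freedom, P(chi^2(1) >= x) = erfc(sqrt(x / 2)).
    # The factor 1/2 is the weight of the chi^2(1) mixture component.
    return 0.5 * math.erfc(math.sqrt(x / 2.0))
```

Under this approximation, the classical threshold $Z_{\mbox{\scriptsize max}} = 3$ corresponds to a pointwise p-value of about $10^{-4}$.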
Key words:
Linkage analysis, lod score distribution, pointwise/genomewide p-value.