Modelling Allelic and DNA Copy Number Variations using Continuous-index Hidden
Markov Models
Susann Stjernqvist
Centre for Mathematical Sciences
Mathematical Statistics
Lund University
2010
ISBN 9789174730258
LUNFMS10212010

Abstract:

In human cells there are usually two copies of each chromosome, but in cancer
cells abnormalities may occur. These consist of chromosome segments with an
altered number of copies. There can be deletions as well as amplifications,
and the lengths of the segments also vary. Localising the deviant regions is
of great importance for increasing knowledge of the disease. In this thesis
the copy numbers are modelled using hidden Markov models (HMMs). A hidden
Markov process can be described as a Markov process observed in noise; thus
it consists of two different processes: an unobservable Markov process and
the observed process.
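As a minimal illustration of a Markov process observed in noise, the sketch below simulates a hidden jump process along the genome and records it, with Gaussian noise added, at unevenly spaced probe positions. The three states (standing for one, two and three copies), the leaving rates, the jump probabilities, the log2-ratio means and the noise level are all hypothetical choices for illustration, not values from the thesis.

```python
import numpy as np

def simulate_jump_process(positions, rates, jump_probs, init_state, rng):
    """Simulate a continuous-index Markov jump process and record its state
    at each of the sorted observation positions.

    rates[i]      -- rate of leaving state i per unit genomic distance
    jump_probs[i] -- distribution of the next state when leaving state i
    """
    states = np.empty(len(positions), dtype=int)
    s = init_state
    # Exponential holding time until the first jump after the first probe.
    next_jump = positions[0] + rng.exponential(1.0 / rates[s])
    for k, pos in enumerate(positions):
        # Apply every jump that happens before this probe.
        while next_jump <= pos:
            s = rng.choice(len(rates), p=jump_probs[s])
            next_jump += rng.exponential(1.0 / rates[s])
        states[k] = s
    return states

rng = np.random.default_rng(1)
# Hypothetical unevenly spaced probe positions (arbitrary genomic units).
positions = np.sort(rng.uniform(0.0, 1000.0, 200))
# Hypothetical model: state 1 is "two copies"; aberrant states return to it.
rates = np.array([0.005, 0.002, 0.005])
jump_probs = np.array([[0.0, 1.0, 0.0],
                       [0.5, 0.0, 0.5],
                       [0.0, 1.0, 0.0]])
states = simulate_jump_process(positions, rates, jump_probs, 1, rng)
# Observed process: hidden state's log2 ratio plus Gaussian noise.
means = np.log2(np.array([1.0, 2.0, 3.0]) / 2.0)
y = means[states] + 0.2 * rng.standard_normal(len(positions))
```

Only `y` and `positions` would be available in practice; `states` is the unobservable trajectory that the HMM machinery tries to recover.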


In paper A we present a method suitable for CGH data from tiling BAC arrays,
i.e. the probes are rather long and may overlap. In addition, they are of
unequal lengths and unevenly spread over the genome, which makes it suitable
to apply a continuous-index process. We assume the Markov process to have a
discrete state space, and the parameters are estimated with a Monte Carlo EM
(MCEM) algorithm. The model in paper B is a modification of the model in
paper A in which the Markov process takes values in a continuous state space.
This makes the method more realistic, since it can handle larger differences
in the data, including systematic errors. In addition, we assume some of the
transition rates to be common in order to obtain a parsimonious model. We take
a Bayesian approach and use reversible jump MCMC to simulate the Markov process.
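With a continuous-index process, the transition probabilities between two neighbouring probes depend on the distance d between them through P(d) = exp(Qd), where Q is the generator (rate) matrix. The sketch below assumes a hypothetical parsimonious generator in which the normal state is left at total rate `alpha`, split equally among the aberrant states, and every aberrant state returns to the normal state at one common rate `beta`. This parametrisation is for illustration only and is not necessarily the one used in the papers; the helper names `expm` and `generator` are likewise hypothetical. The matrix exponential is computed in plain NumPy by scaling and squaring.

```python
import numpy as np

def expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, ord=np.inf)
    # Choose s so that ||A / 2**s|| <= 1/2, where the Taylor series converges fast.
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / (2 ** s)
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    # Undo the scaling: exp(A) = exp(A / 2**s) ** (2**s).
    for _ in range(s):
        E = E @ E
    return E

def generator(n_states, normal, alpha, beta):
    """Hypothetical parsimonious generator with shared rates alpha and beta."""
    Q = np.zeros((n_states, n_states))
    aberrant = [i for i in range(n_states) if i != normal]
    for i in aberrant:
        Q[normal, i] = alpha / len(aberrant)  # common rate into each aberrant state
        Q[i, normal] = beta                   # common return rate to normal
    np.fill_diagonal(Q, -Q.sum(axis=1))       # rows of a generator sum to zero
    return Q

Q = generator(3, normal=1, alpha=0.01, beta=0.05)
gaps = np.array([0.5, 3.0, 40.0])  # hypothetical distances between probes
mats = [expm(Q * d) for d in gaps]  # one transition matrix per gap
```

Because each gap gets its own matrix, short gaps give transition matrices close to the identity, while long gaps drift towards the stationary distribution of the chain.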


In paper C we present a model designed for SNP data, which consist of allelic
intensities for the two alleles at each SNP. We assume a discrete number
of states but keep the parsimonious approach from paper B, so that some
of the transition rates are common. The SNPs are point measurements but are
unevenly spread over the genome, which motivates a continuous-index process.
In paper D we present an MCMC sampler suitable for hidden Markov models
when taking a Bayesian approach. We alternate between updating the
parameters and the trajectory, and for the latter update we present a sequential
Monte Carlo method based on forward filtering-backward simulation. The method
is applied to oligonucleotide copy number data with the same model as in
paper B.
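To illustrate the idea behind the trajectory update, the code below is a minimal sketch of exact forward filtering-backward simulation for a discrete-state HMM with Gaussian emissions. It is not the sequential Monte Carlo sampler of paper D, which works with the continuous state space of paper B. The per-gap matrices `trans[k]` allow unevenly spaced observations. The function name `ffbs`, the state means, the noise level and the transition matrix in the usage example are hypothetical.

```python
import numpy as np

def ffbs(y, trans, init, means, sigma, rng):
    """Draw one hidden trajectory from its posterior given observations y.

    trans[k] is the transition matrix from observation k to k + 1.
    """
    n, K = len(y), len(init)
    # Gaussian log-likelihoods, shifted per observation to avoid underflow;
    # the shift cancels when each filtering step is normalised.
    loglik = -0.5 * ((y[:, None] - means[None, :]) / sigma) ** 2
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    # Forward filtering: filt[k] = P(state_k | y_0, ..., y_k).
    filt = np.empty((n, K))
    f = init * lik[0]
    filt[0] = f / f.sum()
    for k in range(1, n):
        f = (filt[k - 1] @ trans[k - 1]) * lik[k]
        filt[k] = f / f.sum()
    # Backward simulation: draw the last state from the final filter, then
    # each earlier state given the next one, with weights filt[k] * trans[k][:, next].
    path = np.empty(n, dtype=int)
    path[-1] = rng.choice(K, p=filt[-1])
    for k in range(n - 2, -1, -1):
        w = filt[k] * trans[k][:, path[k + 1]]
        path[k] = rng.choice(K, p=w / w.sum())
    return path

rng = np.random.default_rng(0)
means = np.log2(np.array([1.0, 2.0, 3.0]) / 2.0)  # hypothetical log2 ratios
true = np.repeat([1, 2, 1, 0, 1], 20)             # hypothetical true states
y = means[true] + 0.1 * rng.standard_normal(true.size)
P = np.full((3, 3), 0.025) + 0.925 * np.eye(3)    # rows sum to one
trans = np.repeat(P[None, :, :], true.size - 1, axis=0)
path = ffbs(y, trans, np.full(3, 1.0 / 3.0), means, 0.1, rng)
```

In a Gibbs-type sampler, a draw such as `path` would be followed by an update of the parameters given the trajectory, and the two steps alternated.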




Key words:

Hidden Markov models, DNA copy number, allelic copy number, Markov chain
Monte Carlo








