Modelling of DNA Copy Number Variations using Continuous-index Hidden Markov Models

Susann Stjernqvist

Centre for Mathematical Sciences
Mathematical Statistics
Lund University

ISSN 1404-028X
The number of copies of DNA has been shown to differ between cancer tumour cells and healthy cells. These aberrations can be deletions as well as amplifications. Sometimes entire chromosomes are affected, but in other cases only one or a few short segments are. This thesis shows how to model the copy numbers and thereby find the deviant regions.
One method for measuring DNA copy number variations is array Comparative Genomic Hybridisation (aCGH), which is a microarray technique. The method yields the ratio between the number of copies of the DNA of a test sample and that of a given reference sample. Each spot on the array corresponds to a short sequence of base pairs in the genome.
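As a small illustration of the measured quantity, the sketch below computes the idealised, noise-free ratio for a single spot. Working on the log2 scale and using a diploid reference with two copies are assumptions made here for illustration (both are common in aCGH practice), and the function name `log2_ratio` is hypothetical rather than taken from the thesis.

```python
import math


def log2_ratio(test_copies, reference_copies=2):
    """Idealised log2 ratio of test to reference copy number for one spot.

    Assumes a noise-free measurement and a diploid reference (2 copies)
    unless another reference copy number is given.
    """
    return math.log2(test_copies / reference_copies)


# One copy lost, normal, one copy gained, two copies gained.
for copies in (1, 2, 3, 4):
    print(copies, round(log2_ratio(copies), 3))
```

Under these assumptions a normal region sits at zero, deletions give negative values and amplifications give positive values. Real measurements scatter around these levels, which is why a noise model is needed.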
There are several different methods for modelling aCGH data, and the methods in this thesis belong to the group that uses hidden Markov models. A hidden Markov model can be described as a Markov chain observed in noise. Since the clones are of different lengths, are unevenly spread over the genome and may overlap, we introduce a continuous-index method.
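To illustrate why a continuous index helps with unevenly spaced clones, the sketch below computes the transition probabilities of a two-state continuous-time Markov chain over an arbitrary genomic distance. It is a minimal sketch, not the model of the thesis: the restriction to two states, the parameter names `rate_01` and `rate_10`, and the function name `transition_matrix` are all assumptions chosen so that the closed-form solution can be used.

```python
import math


def transition_matrix(distance, rate_01, rate_10):
    """Transition probabilities of a two-state continuous-time Markov chain.

    distance : gap between two positions on the genome (any fixed unit)
    rate_01  : hypothetical jump rate from state 0 to state 1 per unit distance
    rate_10  : hypothetical jump rate from state 1 to state 0 per unit distance

    Returns a 2x2 list P where P[i][j] is the probability of being in
    state j after moving `distance` along the genome, starting in state i.
    """
    total = rate_01 + rate_10
    decay = math.exp(-total * distance)
    pi0 = rate_10 / total  # stationary probability of state 0
    pi1 = rate_01 / total  # stationary probability of state 1
    return [
        [pi0 + pi1 * decay, pi1 - pi1 * decay],
        [pi0 - pi0 * decay, pi1 + pi0 * decay],
    ]


# Nearby positions almost surely share a state; distant ones are
# close to independent draws from the stationary distribution.
for d in (0.0, 0.1, 1.0, 10.0):
    print(d, transition_matrix(d, rate_01=0.5, rate_10=1.0))
```

In a discrete-index model every pair of neighbouring clones would share one transition matrix regardless of the gap between them. Here the matrix depends on the actual distance, so closely spaced clones are strongly coupled and widely separated ones are nearly independent.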
In the first paper, a continuous-index hidden Markov model with a fixed number of states is introduced. The parameters are estimated using Monte Carlo EM, and Markov chains are simulated by MCMC. In the second paper, we further develop the model to make it more realistic and less complex by introducing a latent continuous Markov jump process, which has a continuous state space. A Bayesian approach is adopted, and we continue to use MCMC for the simulations.