• Title: Manifold Dimension Estimation for Omics Data Analysis: Current Methods and a Novel Approach
  • Description: In molecular biology, many data sets with thousands, tens of thousands, or even more variables are produced daily, for example in genomics. Traditional statistical approaches such as hypothesis testing cannot exploit the full potential of such data sets when there are functional relations between the variables, and if those relations are non-linear, linear methods such as PCA do not work either. A more general approach is to look for manifolds on which the data are supported, and the first step in most manifold learning methods is to determine the dimension of the manifold. In this work we review five current methods of manifold dimension estimation: PCA, Takens' estimator, the Hill estimator, vector quantization, and k-NN. We also introduce a novel dimension estimator, the expected absolute projection (EAP) estimator, and compare its performance to that of the five other methods. The results do not show any significant advantage of the EAP estimator; however, we suggest improvements to the EAP estimator that might make it competitive.
  • Start Date: September 14, 2010
  • Finished Date: March 4, 2011
  • Supervisor: Magnus Fontes
  • Supervisor: Charlotte Soneson
  • Student: Kerstin Johnsson (N-05)
  • Report (778.4 KB)
  • Popular Science Report (55.6 KB)
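The description lists the Hill estimator among the reviewed methods. As an illustration only, the sketch below shows one common way of turning Hill's estimator into an intrinsic-dimension estimate: the k-nearest-neighbour maximum-likelihood form of Levina and Bickel, where each point's local estimate is the inverse mean log-ratio of its k-th neighbour distance to its closer neighbour distances, and the local estimates are averaged. This is an assumption about the formulation, not the thesis's own code; the choice k = 10, the helper names `knn_distances` and `hill_dimension`, and the toy data generators `sample_plane_in_5d` and `sample_circle_in_3d` are hypothetical, and the EAP estimator is not reproduced here.

```python
import math
import random


def knn_distances(points, i, k):
    """Return the k smallest Euclidean distances from points[i] to the other points, sorted ascending."""
    xi = points[i]
    dists = sorted(math.dist(xi, xj) for j, xj in enumerate(points) if j != i)
    return dists[:k]


def hill_dimension(points, k=10):
    """Hill-type (Levina-Bickel style) maximum-likelihood intrinsic dimension estimate.

    Hypothetical sketch: with T_j(x) the distance from x to its j-th nearest neighbour,
        m_k(x) = (k - 1) / sum_{j=1}^{k-1} log(T_k(x) / T_j(x)),
    and the local estimates m_k(x) are averaged over all points.
    Duplicate points (zero distances) are not handled.
    """
    if k < 2 or k >= len(points):
        raise ValueError("need 2 <= k < number of points")
    local = []
    for i in range(len(points)):
        t = knn_distances(points, i, k)  # t[0] = T_1, ..., t[-1] = T_k
        s = sum(math.log(t[-1] / t[j]) for j in range(k - 1))
        local.append((k - 1) / s)
    return sum(local) / len(local)


def sample_plane_in_5d(n, rng):
    """Hypothetical toy data: a 2-D linear manifold (parallelogram) embedded in R^5."""
    pts = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        pts.append((u, v, u + v, u - v, 0.5 * u))
    return pts


def sample_circle_in_3d(n, rng):
    """Hypothetical toy data: a 1-D closed curve (unit circle) embedded in R^3."""
    pts = []
    for _ in range(n):
        t = 2.0 * math.pi * rng.random()
        pts.append((math.cos(t), math.sin(t), 0.0))
    return pts


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same estimates
    print("plane in R^5:", round(hill_dimension(sample_plane_in_5d(400, rng)), 2))
    print("circle in R^3:", round(hill_dimension(sample_circle_in_3d(400, rng)), 2))
```

On these toy sets the estimate should come out near 2 for the embedded plane and near 1 for the circle, regardless of the ambient dimension. The brute-force neighbour search is O(n²) and is meant only for small examples; omics-scale data would call for a spatial index or vectorised distances, and the result depends on the choice of k.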

Last update: 2013-04-11

Centre for Mathematical Sciences, Box 118, SE-22100, Lund. Telephone: +46 46-222 00 00 (switchboard)