Low-level Analysis of Microarray Data

Henrik Bengtsson

Centre for Mathematical Sciences
Mathematical Statistics
Lund Institute of Technology

ISBN 91-628-6215-4

This thesis consists of an extensive introduction followed by seven papers (A-G) on low-level analysis of microarray data. The focus is on calibration and normalization of observed data. The introduction gives a brief background on microarray technology and its applications, so that readers unfamiliar with the field can follow the thesis. Formal definitions of calibration and normalization are given.
Paper A illustrates a typical statistical analysis of microarray data with background correction, normalization, and identification of differentially expressed genes (among thousands of candidates). A brief analysis of how the final results vary with the number of replicates and with the image analysis software is also given.
Paper B introduces a novel way of displaying microarray data called the print-order plot, which displays data in the order in which the corresponding spots were printed onto the array. Using such plots, so-called (microtiter-)plate effects are identified. Then, based on a simple variability measure for replicated spots across arrays, different normalization sequences are tested, and evidence for the existence of plate effects is presented.
Paper C presents an object-oriented extension to the R language with transparent reference variables. It provides the foundation needed to implement the microarray analysis package described in Paper F.
Paper D is on affine transformations of two-channel microarray data and their effects on the log-ratio log-intensity transform. Affine transformations, that is, the existence of channel biases, can explain commonly observed intensity-dependent effects in the log-ratios. In the light of the affine transformation, several normalization methods are revisited. At the end of the paper, a new robust affine normalization is suggested that relies on iterative reweighted principal component analysis.
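The link between channel biases and intensity-dependent log-ratios can be illustrated with a small numerical sketch. It is not taken from Paper D: the function names, offsets, and signal values are hypothetical, and it assumes the usual definitions of the log-ratio, M = log2(R/G), and the log-intensity, A = (1/2) log2(R*G). For spots whose true signal is identical in both channels, M should be zero at every intensity; adding different offsets to the two channels makes M drift with A.

```python
import math

def log_ratio_log_intensity(r, g):
    """Return (M, A) for a red/green signal pair (assumed definitions)."""
    m = math.log2(r / g)          # log-ratio
    a = 0.5 * math.log2(r * g)    # log-intensity
    return m, a

def affine(x, offset, scale=1.0):
    """Hypothetical affine channel transform: observed = offset + scale * true."""
    return offset + scale * x

# Hypothetical true signals, equal in both channels (no differential expression).
true_signals = [50.0, 500.0, 5000.0, 50000.0]

# Without channel biases the log-ratio is zero at every intensity.
unbiased = [log_ratio_log_intensity(affine(x, 0.0), affine(x, 0.0))
            for x in true_signals]

# With different (illustrative) offsets in the two channels, the log-ratio
# is far from zero at low intensities and shrinks toward zero at high ones.
biased = [log_ratio_log_intensity(affine(x, 200.0), affine(x, 100.0))
          for x in true_signals]

for m, a in biased:
    print(f"A = {a:6.2f}  M = {m:6.3f}")
```

In this sketch the deviation of M from zero is largest for weak spots, which is the kind of intensity-dependent pattern the paragraph above attributes to channel biases.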
Paper E suggests a multiscan calibration method in which each array is scanned at several sensitivity levels in order to uniquely identify the affine transformation of signals that the scanner and the image-analysis methods introduce. Observed data strongly support this method. In addition, multiscan-calibrated data have an extended dynamic range and higher signal-to-noise levels. This is real-world evidence for the existence of affine transformations of microarray data.
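The idea of identifying an affine transformation from repeated scans can be sketched as follows. This is a simplified illustration, not the method of Paper E: it assumes, purely for the example, that two scans of the same array share one additive offset and differ only in gain, it uses ordinary least squares on noise-free synthetic data instead of a robust estimator, and all names and numbers are hypothetical. Under these assumptions the two scans are linearly related, and the offset is the point where the fitted line crosses the diagonal.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def estimate_shared_offset(scan_low, scan_high):
    """Estimate an offset assumed to be common to both scans.

    If y_low = a + b_low * x and y_high = a + b_high * x, then
    y_high = intercept + slope * y_low with intercept = a * (1 - slope),
    so a = intercept / (1 - slope). Requires different gains (slope != 1).
    """
    intercept, slope = fit_line(scan_low, scan_high)
    return intercept / (1.0 - slope)

# Hypothetical true signals, offset, and gains for two sensitivity levels.
true_signals = [10.0, 100.0, 1000.0, 10000.0]
offset = 20.0
scan_low = [offset + 0.5 * x for x in true_signals]
scan_high = [offset + 2.0 * x for x in true_signals]

estimated = estimate_shared_offset(scan_low, scan_high)

# Removing the estimated offset leaves signals proportional to the truth.
calibrated_low = [y - estimated for y in scan_low]
print(estimated)
```

A single scan cannot separate the offset from the signal, which is why the sketch needs at least two scans with different gains.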
Paper F describes the aroma package (An R Object-oriented Microarray Analysis environment), which is implemented in R and provides easy access to our and others' low-level analysis methods.
Paper G provides a calibration method for spotted microarrays with dilution series or spike-ins. The method is based on a heteroscedastic affine stochastic model. The parameter estimates are robust against model misspecification.