Affine calibration for microarrays with dilution series or spike-ins
Henrik Bengtsson and Ola Hössjer
Centre for Mathematical Sciences
Mathematical Statistics
Lund Institute of Technology,
Lund University,
2004
ISSN 1403-9338

Abstract:


Background:

In this theoretical study we follow up our previous theoretical and applied
work on affine calibration and normalization methods by suggesting a stochastic
affine model for two- or multichannel microarrays containing dilution series.
With replicated probes (spots) at various, not necessarily known, concentration
levels, it is possible to identify the channel biases uniquely. For this to
hold, some genes must be differentially expressed, but neither which genes
these are nor their relative gene-expression levels need to be known. The
method presented may also be applied to so-called spike-in data. Estimates of
the relative gene-expression levels for all dilution series follow directly
from the maximum likelihood estimator. Given the estimated channel biases and
scale factors, back-transformation gives calibrated probe signals that are
proportional to the amount of fluorophore in each channel. Under the
assumption that most genes are non-differentially expressed, the proposed
calibration method makes the signals in each channel proportional to the
corresponding gene-expression levels.
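Why some differential expression is needed can be seen from a small sketch. Assume (our reading; the abstract states no formula) that channel c records y_c = a_c + b_c x_c + noise, and that gene g has relative expression r_g, so that x_G = r_g x_R along its dilution series. Eliminating x_R gives y_G - a_G = (b_G r_g / b_R)(y_R - a_R): the points of every gene lie on a line through the bias point (a_R, a_G), with a gene-specific slope. Lines with different slopes, that is, differentially expressed genes, meet only at that point. The Python code below is a simple two-stage least-squares stand-in, not the maximum likelihood estimator of the study; it ignores the noise in y_R. The names estimate_biases and back_transform are hypothetical.

```python
import numpy as np


def estimate_biases(yR, yG, gene):
    """Estimate the channel biases (aR, aG) as the common intersection point
    of the per-gene lines yG = alpha_g + s_g * yR.

    Under the assumed affine model alpha_g = aG - aR * s_g, so a straight-line
    fit of the intercepts alpha_g against the slopes s_g has slope -aR and
    intercept aG. At least two genes with distinct slopes (differential
    expression) are required; otherwise the intersection is not unique.
    """
    slopes, intercepts = [], []
    for g in np.unique(gene):
        idx = gene == g
        # Ordinary least squares; treats yR as error-free.
        s_g, alpha_g = np.polyfit(yR[idx], yG[idx], deg=1)
        slopes.append(s_g)
        intercepts.append(alpha_g)
    minus_aR, aG = np.polyfit(slopes, intercepts, deg=1)
    return -minus_aR, aG


def back_transform(y, bias, scale=1.0):
    """Affine back-transformation (y - bias) / scale of one channel."""
    return (np.asarray(y, dtype=float) - bias) / scale


# Noise-free toy data: 5 genes x 6 log-spaced dilution levels.
levels = np.geomspace(100.0, 10_000.0, 6)
ratios = np.array([0.5, 1.0, 1.0, 2.0, 3.0])  # r_g; three genes differ from 1
gene = np.repeat(np.arange(len(ratios)), len(levels))
conc = np.tile(levels, len(ratios))
aR, aG, bR, bG = 200.0, 50.0, 1.0, 0.8
yR = aR + bR * conc
yG = aG + bG * ratios[gene] * conc

aR_hat, aG_hat = estimate_biases(yR, yG, gene)
xR_hat = back_transform(yR, aR_hat, bR)  # proportional to conc
```

With exact data the sketch recovers the biases up to rounding. With noisy y_R the per-gene least-squares slopes are attenuated toward zero, so this estimator is only illustrative at low noise.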


Results:

The model suggested is a heteroscedastic affine model where the standard
deviations of the error terms are equal for all replicates that belong to
the same gene, but may vary freely between genes. For a specific probe, the
standard deviation in one channel is assumed to be proportional to those in
the other channels. These constraints were chosen as a compromise between
model flexibility and feasible parameter estimation. We investigate the
properties of the parameter estimates on simulated data by comparing bootstrap
confidence intervals with asymptotic confidence intervals based on a normal
approximation, for different setups of the number of genes and replicates. We
simulate two-channel gene-expression data with technical dilution series from
the suggested model and, in order to investigate the robustness of the
estimator against model misspecification, from a more general model in which
the standard deviations are proportional not only to the gene-expression
levels but also to the probe concentrations. We find that the standard
deviation of the bias-parameter estimates is inversely proportional to the
square root of the product of the number of genes and the number of
replicates, provided the latter is not too small. For a modest number of
replicates, the standard deviation based on the Fisher information is a good
approximation to the bootstrap standard deviation. When too few replicates
are available, normality cannot be assumed and the standard errors are
underestimated. For similar reasons, among setups with the same total number
of probes, we found that it is better to use more replicates and fewer genes,
if possible. Comparing dilution series with uniformly and logarithmically
uniformly distributed concentrations, better results are obtained with the
latter.
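A simulator for the heteroscedastic two-channel model makes the error constraints concrete. In this sketch (hypothetical names, and an assumed parametrization since the abstract gives no equations), replicate k of gene g at dilution level d yields y_R = a_R + b_R d + e_R and y_G = a_G + b_G r_g d + e_G, with e_R ~ N(0, σ_g²) and e_G ~ N(0, (κσ_g)²). This gives one standard deviation per gene, shared by all its replicates and free across genes, and proportional across channels through a fixed κ. The two concentration designs are read here as dilution levels evenly spaced on a linear scale versus on a log scale; the designs used in the study may differ in detail. The misspecified model, with standard deviations growing with concentration, is not included.

```python
import numpy as np


def dilution_levels(n_levels, lo, hi, spacing="log"):
    """Concentration levels of one dilution series, evenly spaced on a
    log scale ("log") or on a linear scale ("linear")."""
    if spacing == "log":
        return np.geomspace(lo, hi, n_levels)
    if spacing == "linear":
        return np.linspace(lo, hi, n_levels)
    raise ValueError(f"unknown spacing: {spacing!r}")


def simulate_two_channel(ratios, levels, n_reps, a, b, gene_sd, kappa, rng):
    """Simulate (gene, yR, yG) under the assumed heteroscedastic affine model.

    ratios  : relative expression r_g (G over R), one per gene
    levels  : dilution concentrations shared by all genes
    n_reps  : replicated spots per (gene, level)
    a, b    : (bias, scale) pairs for channels (R, G)
    gene_sd : channel-R error SD sigma_g, one per gene
    kappa   : channel-G error SD is kappa * sigma_g
    """
    ratios = np.asarray(ratios, dtype=float)
    gene_sd = np.asarray(gene_sd, dtype=float)
    n_genes, n_levels = len(ratios), len(levels)
    # Probe layout: gene-major, then dilution level, then replicate.
    gene = np.repeat(np.arange(n_genes), n_levels * n_reps)
    conc = np.tile(np.repeat(levels, n_reps), n_genes)
    sd_R = gene_sd[gene]   # same SD for every replicate of a gene
    sd_G = kappa * sd_R    # proportional across channels
    yR = a[0] + b[0] * conc + rng.normal(0.0, sd_R)
    yG = a[1] + b[1] * ratios[gene] * conc + rng.normal(0.0, sd_G)
    return gene, yR, yG


rng = np.random.default_rng(1)
levels = dilution_levels(6, 100.0, 10_000.0, spacing="log")
gene, yR, yG = simulate_two_channel(
    ratios=[0.5, 1.0, 1.0, 2.0], levels=levels, n_reps=4,
    a=(200.0, 50.0), b=(1.0, 0.8), gene_sd=[20.0, 30.0, 25.0, 40.0],
    kappa=1.5, rng=rng,
)
```

Passing spacing="linear" to dilution_levels gives the evenly spaced design for comparison, and varying the number of genes against n_reps at a fixed total number of probes mirrors the setups compared above.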


Availability:

All methods have been implemented in the cross-platform R package called
aroma, which is available for free from http://www.braju.com/R/.




Key words:

spotted microarrays; systematic effects; bias; calibration; normalization;
dilution series; technical replicates; profile likelihood; Fisher information;
stochastic affine model; heteroscedastic noise.
