Affine calibration for microarrays with dilution series or spike-ins

Henrik Bengtsson and Ola Hössjer

Centre for Mathematical Sciences
Mathematical Statistics
Lund Institute of Technology,
Lund University,

ISSN 1403-9338
In this theoretical study we follow up our previous theoretical and applied work on affine calibration and normalization methods by suggesting a stochastic affine model for two- or multi-channel microarrays containing dilution series. With replicated probes (spots) at various, not necessarily known, concentration levels, it is possible to identify the channel biases uniquely. For this to hold, some genes must be differentially expressed, but neither which genes these are nor their relative gene-expression levels need to be known. The method presented may also be applied to so-called spike-in data. Estimates of the relative gene-expression levels for all dilution series follow directly from the maximum likelihood estimator. Given the estimated channel biases and scale factors, back-transformation gives calibrated probe signals that are proportional to the amount of fluorophore in each channel. With the assumption that most genes are non-differentially expressed, the proposed calibration method makes the signals in each channel proportional to the corresponding gene-expression levels.
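As a minimal, noise-free illustration of the back-transformation step, the sketch below applies an affine distortion to two channels and then inverts it. The function names and the bias and scale values are hypothetical and are not taken from the aroma package; in practice the parameters would be estimated from the data. For a non-differentially expressed gene, the raw channel ratio drifts along the dilution series, whereas the calibrated ratio stays constant.

```python
def observe(x, bias, scale):
    """Noise-free affine measurement model: y = bias + scale * x."""
    return bias + scale * x


def back_transform(y, bias, scale):
    """Invert the affine model: x = (y - bias) / scale."""
    return (y - bias) / scale


# Hypothetical (bias, scale) per channel; assumed known here for illustration.
params = {"red": (20.0, 1.2), "green": (10.0, 0.8)}

# One non-differentially expressed gene observed along a dilution series:
# the same amount of fluorophore is present in both channels at each level.
amounts = [50.0, 100.0, 200.0, 400.0, 800.0]

raw_ratios = []
calibrated_ratios = []
for x in amounts:
    y_red = observe(x, *params["red"])
    y_green = observe(x, *params["green"])
    raw_ratios.append(y_red / y_green)

    x_red = back_transform(y_red, *params["red"])
    x_green = back_transform(y_green, *params["green"])
    calibrated_ratios.append(x_red / x_green)

for x, raw, cal in zip(amounts, raw_ratios, calibrated_ratios):
    print(f"amount={x:6.1f}  raw ratio={raw:.4f}  calibrated ratio={cal:.4f}")
```

The intensity dependence of the raw ratio is caused by the channel biases alone, which is why identifying them is the central part of the calibration.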
The model suggested is a heteroscedastic affine model where the standard deviations of the error terms are equal for all replicates that belong to the same gene, but may vary freely between genes. For a specific probe, the standard deviation in one channel is assumed to be proportional to that in the other channels. These constraints were chosen as a compromise between model flexibility and feasible parameter estimates. We investigate the properties of the parameter estimates from simulated data by comparing bootstrap confidence intervals with asymptotic confidence intervals based on normal approximation, for different setups of the number of genes and replicates. We simulate two-channel gene-expression data with technical dilution series from the suggested model and, in order to investigate the robustness of the estimator against model misspecification, from a more general model for which the standard deviations are proportional not only to the gene-expression levels, but also to the probe concentrations. We find that the standard deviation of the bias-parameter estimates is inversely proportional to the square root of the product of the number of genes and the number of replicates, provided the latter is not too small. For a modest number of replicates, the standard deviation based on the Fisher information is a good approximation to the bootstrap standard deviation. When too few replicates are available, normality cannot be assumed and the standard errors are underestimated. For similar reasons, among setups with different numbers of genes and replicates but the same total number of probes, we found that it is better to use more replicates and fewer genes, if possible. Comparing dilution series with uniformly and log-uniformly distributed concentrations, better results are obtained using the latter.
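The simulation setup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the simulation code used in the study: additive Gaussian errors that are independent between channels, a single gene-specific standard deviation sigma, a proportionality constant kappa between the two channels, and all function names and numerical values are hypothetical.

```python
import math
import random


def draw_concentrations(n, low, high, log_uniform, rng):
    """Draw n probe concentrations in [low, high], uniformly or log-uniformly."""
    if log_uniform:
        log_low, log_high = math.log(low), math.log(high)
        return [math.exp(rng.uniform(log_low, log_high)) for _ in range(n)]
    return [rng.uniform(low, high) for _ in range(n)]


def simulate_gene(mu, concentrations, bias, scale, sigma, kappa, rng):
    """Simulate two-channel signals for one gene's dilution series.

    mu             : (mu1, mu2) gene-expression levels in the two channels
    concentrations : probe concentration of each replicate
    bias, scale    : per-channel affine parameters (bias1, bias2), (scale1, scale2)
    sigma          : error standard deviation in channel 1, shared by all
                     replicates of this gene
    kappa          : assumed ratio of channel-2 to channel-1 standard deviation
    """
    signals = []
    for d in concentrations:
        y1 = bias[0] + scale[0] * d * mu[0] + rng.gauss(0.0, sigma)
        y2 = bias[1] + scale[1] * d * mu[1] + rng.gauss(0.0, kappa * sigma)
        signals.append((y1, y2))
    return signals


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Eight replicates with log-uniformly distributed concentrations.
conc = draw_concentrations(8, 0.01, 1.0, True, rng)

# One differentially expressed gene (twofold higher in channel 2).
data = simulate_gene(
    mu=(1000.0, 2000.0),
    concentrations=conc,
    bias=(20.0, 10.0),
    scale=(1.2, 0.8),
    sigma=15.0,
    kappa=1.5,
    rng=rng,
)

for d, (y1, y2) in zip(conc, data):
    print(f"concentration={d:.4f}  y1={y1:9.2f}  y2={y2:9.2f}")
```

Passing `False` as the `log_uniform` argument gives the uniformly distributed design, so the two concentration layouts can be compared on otherwise identical settings.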
All methods have been implemented in the cross-platform R package called aroma, which is available for free from
Key words:
spotted microarrays; systematic effects; bias; calibration; normalization; dilution series; technical replicates; profile likelihood; Fisher information; stochastic affine model; heteroscedastic noise.