Microarray Analysis
Scanning
Images of Cy5 and Cy3 fluorescence intensities were generated by scanning arrays using the GenePix 5000 scanner and were analyzed with GenePixPro 5.1 software. Aberrant spots (e.g. those contaminated with speckles, debris or smears during hybridization) were flagged and removed from subsequent analysis.
Error Model
X score and average X score.
In the Rosetta error model, the measurement for each spot is converted into an X score:
$$X = \frac{a_2 - a_1}{\left[ s_1^2 + s_2^2 + f^2 (a_1^2 + a_2^2) \right]^{1/2}}$$
where a_1 and a_2 are the intensities measured in the two channels for each
spot, s_1 and s_2 are the uncertainties due to
background subtraction, and f is a fractional multiplicative error.
X values are approximately normally distributed with a mean of 0. For the same
ratio, spots with lower intensities have a lower X score than spots with higher
intensities; the X score is thus normalized for the greater noise of the
original intensities at the lower end.
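As a rough illustration of the formula above, the following Python sketch computes X for a single spot. The function name and all numeric inputs are invented for the example and are not taken from the analysis itself; the sketch only shows that, for the same two-fold ratio, a low-intensity spot receives a lower X score than a high-intensity one.

```python
import math


def x_score(a1, a2, s1, s2, f):
    """X = (a2 - a1) / [s1^2 + s2^2 + f^2 * (a1^2 + a2^2)]^(1/2)."""
    denom = math.sqrt(s1**2 + s2**2 + f**2 * (a1**2 + a2**2))
    return (a2 - a1) / denom


# Illustrative values only: the same two-fold ratio at two intensity levels.
x_low = x_score(a1=100.0, a2=200.0, s1=50.0, s2=50.0, f=0.1)
x_high = x_score(a1=10000.0, a2=20000.0, s1=50.0, s2=50.0, f=0.1)

# The background terms s1 and s2 dominate the denominator at low intensity,
# so the low-intensity spot is scored more conservatively.
print(x_low < x_high)
```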
Since the X values are approximately normally distributed, the average X values
of three spots are also normally distributed. The confidence value (p-value)
is then calculated from this new distribution.
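One possible way to turn an average X into a p-value is sketched below in Python. The text does not state the variance of X or whether the test is one- or two-sided, so the sketch assumes that the X scores are independent with unit standard deviation under the null hypothesis and that only enrichment (positive X) is of interest; the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean


def average_x_p_value(x_scores):
    """Return (average X, one-sided p-value) for a group of spots.

    Assumption (not stated in the text): under the null hypothesis the
    X scores are independent draws from N(0, 1), so the mean of n scores
    follows N(0, 1/n).
    """
    n = len(x_scores)
    avg = mean(x_scores)
    null = NormalDist(mu=0.0, sigma=1.0 / math.sqrt(n))
    p_value = 1.0 - null.cdf(avg)  # upper tail only: enrichment
    return avg, p_value


# Three spots with made-up X scores.
avg, p = average_x_p_value([2.0, 2.0, 2.0])
```

Under these assumptions, averaging three moderately high X scores yields a much smaller p-value than any single score would, because the standard deviation of the mean shrinks by a factor of sqrt(3).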
f value calculation for Rosetta error model.
Previously, we calculated the f value for the Rosetta error model based
on the kurtosis of the X value distribution. We have now developed a more reliable
method based on an inherent assumption of the error model: that
the X values, unlike the original a_2/a_1 ratios,
have the same standard deviation at low and high intensities
if correct values for s_1, s_2
and f are chosen. Since s_1 and s_2
are known, f is fitted such that the standard deviations of X are approximately
equal at the lower and higher ends.
Rosetta model:
$$X = \frac{a_2 - a_1}{\left[ s_1^2 + s_2^2 + f^2 (a_1^2 + a_2^2) \right]^{1/2}}$$
A simple way to fit f is therefore the following algorithm:
(a) Select data that follow the noise distribution rather than the signal (e.g.
X < 0)
(b) Calculate the X values for all spots given a start f value (e.g. 1)
(c) Compare the standard deviation of X for low intensities SDlow (e.g. top
10% of the list) to the standard deviation of X for higher intensities SDhigh
(e.g. 20-30% from top)
(d) Set f = f * SDhigh/SDlow and go back to (b) to recalculate the X values.
Comment: If SDlow > SDhigh, f is too large; if SDlow < SDhigh, f is too
small.
(e) Repeat (b)-(d) until the absolute difference between SDlow and SDhigh is
within the desired precision of 0 (e.g. 0.001).
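The fitting loop above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code used for the analysis: it assumes the "list" is the set of spots ranked by ascending total intensity (a1 + a2), that the two bins are the 0-10% and 20-30% slices of that ranked list, and that the population standard deviation is an adequate spread measure. The function names, slice parameters and iteration cap are hypothetical.

```python
import math
import statistics


def x_score(a1, a2, s1, s2, f):
    return (a2 - a1) / math.sqrt(s1**2 + s2**2 + f**2 * (a1**2 + a2**2))


def fit_f(spots, s1, s2, f_start=1.0, tol=1e-3, max_iter=100,
          low_slice=(0.0, 0.1), high_slice=(0.2, 0.3)):
    """Fit f so that SD(X) agrees between a low- and a higher-intensity bin.

    spots: list of (a1, a2) intensity pairs.
    Returns (f, converged).
    """
    # (a) Keep the noise side only. X < 0 is equivalent to a2 < a1 for
    #     every f, so the selection can be done once, before the loop.
    #     Ranking by ascending a1 + a2 is an assumption of this sketch.
    noise = sorted((p for p in spots if p[1] < p[0]),
                   key=lambda p: p[0] + p[1])
    n = len(noise)
    low = noise[int(low_slice[0] * n):int(low_slice[1] * n)]
    high = noise[int(high_slice[0] * n):int(high_slice[1] * n)]

    f = f_start
    for _ in range(max_iter):
        # (b) X values for the current f, (c) their spread in each bin
        sd_low = statistics.pstdev(
            [x_score(a1, a2, s1, s2, f) for a1, a2 in low])
        sd_high = statistics.pstdev(
            [x_score(a1, a2, s1, s2, f) for a1, a2 in high])
        # (e) stop once the two spreads agree to the desired precision
        if abs(sd_low - sd_high) < tol:
            return f, True
        # (d) rescale f: shrinks when SDlow > SDhigh, grows otherwise
        f *= sd_high / sd_low
    return f, False
```

The iteration cap is a safeguard added for the sketch; the procedure described above simply loops until the two standard deviations agree.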
This set of averaged values gave us a new distribution that was subsequently used to calculate P values of average X (probe set P values). If the probe set P value was less than 0.001, the three probes were marked as potentially bound.
As most probes were spaced within the resolution limit of chromatin immunoprecipitation,
we next required that multiple probes in the probe set provide evidence of a
binding event. Candidate bound probe sets were required to pass one of two additional
filters: either two of the three probes in a probe set each have a single probe
P value < 0.005, or the center probe in the probe set has a single probe P
value < 0.001 and one of the flanking probes has a single probe P value <
0.1. These two filters cover situations where a binding event occurs midway
between two probes and each weakly detects the event, or where a binding event
occurs very close to one probe and is very weakly detected by a neighboring
probe.
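The filtering rules described above can be written as a short Python sketch. The function name and argument layout are hypothetical, "two of the three probes" is read here as "at least two", and "one of the flanking probes" as "at least one"; the thresholds are the ones given in the text.

```python
def probe_set_is_bound(set_p, probe_p):
    """Apply the probe set threshold and the two additional filters.

    set_p:   P value of the average X for the three-probe set.
    probe_p: (left, center, right) single probe P values.
    """
    # The probe set P value must pass first.
    if not set_p < 0.001:
        return False

    left, center, right = probe_p

    # Filter 1: at least two probes with single probe P value < 0.005
    # (for example, a binding event midway between two probes).
    two_probes = sum(p < 0.005 for p in probe_p) >= 2

    # Filter 2: strong center probe plus weak support from a flank
    # (a binding event very close to one probe).
    center_and_flank = center < 0.001 and (left < 0.1 or right < 0.1)

    return two_probes or center_and_flank


# Made-up P values for illustration.
midway = probe_set_is_bound(0.0005, (0.004, 0.2, 0.003))
near_center = probe_set_is_bound(0.0005, (0.05, 0.0005, 0.5))
unsupported = probe_set_is_bound(0.0005, (0.2, 0.0005, 0.5))
```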