Microarray Analysis

Scanning

Images of Cy5 and Cy3 fluorescence intensities were generated by scanning the arrays with the GenePix 5000 scanner and were analyzed with GenePix Pro 5.1 software. Aberrant spots (e.g. those contaminated with speckles, debris, or smears during hybridization) were flagged and removed from subsequent analysis.

Error Model

X score and average X score.

In the Rosetta error model, the measurement for each spot is converted into an X score:

$$X = \frac{a_2 - a_1}{\sqrt{s_1^2 + s_2^2 + f^2\,(a_1^2 + a_2^2)}}$$

where $a_1$ and $a_2$ are the intensities measured in the two channels for each spot, $s_1$ and $s_2$ are the uncertainties due to background subtraction, and $f$ is a fractional multiplicative error.
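The X score can be computed directly from these definitions. The following is a minimal Python sketch; the function name `x_score` and the example intensities and error parameters are hypothetical, chosen only to show that the same channel ratio yields a smaller X at low intensity.

```python
import math


def x_score(a1: float, a2: float, s1: float, s2: float, f: float) -> float:
    """X score for one spot under the Rosetta-style error model.

    a1, a2: intensities measured in the two channels
    s1, s2: background-subtraction uncertainties for each channel
    f:      fractional multiplicative error
    """
    variance = s1**2 + s2**2 + f**2 * (a1**2 + a2**2)
    return (a2 - a1) / math.sqrt(variance)


# Same 2-fold ratio at two intensity levels (hypothetical values):
# the low-intensity spot gets a smaller X score because the
# background terms s1 and s2 dominate the denominator there.
low = x_score(a1=100.0, a2=200.0, s1=50.0, s2=50.0, f=0.2)
high = x_score(a1=10000.0, a2=20000.0, s1=50.0, s2=50.0, f=0.2)
print(round(low, 3), round(high, 3))
```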

X values are approximately normally distributed with a mean of 0. For the same channel ratio, spots at lower intensities receive smaller X scores than spots at higher intensities, because the background uncertainties $s_1$ and $s_2$ dominate the denominator at low intensity. X is therefore normalized for the greater noise of the original intensities at the low end.

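Because the X values are approximately normally distributed, the average X value of three spots is also approximately normally distributed; if the three X values are independent with standard deviation $\sigma$, their mean has standard deviation $\sigma/\sqrt{3}$. The confidence value (P value) is then calculated from this new distribution.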
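The averaging step and the resulting P value can be sketched as follows. This is an illustration under assumptions the text does not state: the individual X values are treated as independent and N(0, σ²) under the null, σ defaults to 1 (a hypothetical choice; in practice it would be estimated from the data), and the test is one-sided in the direction of positive X. The name `probe_set_p_value` is hypothetical.

```python
import math
from statistics import NormalDist


def probe_set_p_value(x_values, sigma=1.0):
    """One-sided P value for the average X of a probe set.

    Assumes (hypothetically) that each X is independent and N(0, sigma**2)
    under the null, so the mean of n values is N(0, sigma**2 / n).
    """
    n = len(x_values)
    mean_x = sum(x_values) / n
    null = NormalDist(mu=0.0, sigma=sigma / math.sqrt(n))
    # Upper-tail probability P(mean >= mean_x), computed as cdf(-mean_x)
    # by symmetry to keep precision for very small P values
    return null.cdf(-mean_x)


# Three probes with clearly positive X scores (hypothetical values)
p = probe_set_p_value([2.1, 3.0, 1.8])
print(f"{p:.2e}")
```

Averaging shrinks the null standard deviation by $\sqrt{3}$, so three moderately positive probes can reach a P value that none of them would reach alone.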

f value calculation for Rosetta error model.

Previously, we calculated the f value for the Rosetta error model from the kurtosis of the X value distribution. We have now developed a more reliable method based on an inherent assumption of the error model: if correct values of $s_1$, $s_2$, and $f$ are chosen, the X values, unlike the original $a_2/a_1$ ratios, have the same standard deviation at low and high intensities. Because $s_1$ and $s_2$ are known, $f$ is fitted so that the standard deviations of X are approximately equal at the low and high ends.

Rosetta model:
$$X = \frac{a_2 - a_1}{\sqrt{s_1^2 + s_2^2 + f^2\,(a_1^2 + a_2^2)}}$$

A simple way to fit f is therefore the following algorithm:
(a) Select data that follow the noise distribution rather than the signal (e.g. X < 0)
(b) Calculate the X values for all spots, starting from an initial f value (e.g. 1)
(c) Compare the standard deviation of X at low intensities, SDlow (e.g. the first 10% of a list sorted by increasing intensity), with the standard deviation of X at higher intensities, SDhigh (e.g. the spots 20-30% from the top of that list)
(d) Set f = f × SDhigh / SDlow and return to (b) to recalculate the X values
Comment: If SDlow > SDhigh, f is too large; if SDlow < SDhigh, f is too small
(e) Repeat (b)-(d) until the absolute difference between SDlow and SDhigh is below the desired precision (e.g. 0.001)
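The loop can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: spots are ranked by the mean of $a_1$ and $a_2$ (the text does not say which intensity is used), the sample standard deviation is used, and `fit_f`, `x_scores`, and the synthetic data are hypothetical. Because the denominator of X is always positive, X < 0 exactly when $a_2 < a_1$, so the noise subset chosen in step (a) does not change as f is updated.

```python
import math
import random
import statistics


def x_scores(spots, s1, s2, f):
    """X score for each (a1, a2) pair under the Rosetta-style model."""
    return [
        (a2 - a1) / math.sqrt(s1**2 + s2**2 + f**2 * (a1**2 + a2**2))
        for a1, a2 in spots
    ]


def fit_f(spots, s1, s2, f=1.0, low=(0.0, 0.10), high=(0.20, 0.30),
          tol=1e-3, max_iter=100):
    """Rescale f until SD(X) agrees in a low and a high intensity window.

    low, high: fractional positions in the noise list sorted by increasing
    mean intensity (hypothetical defaults taken from the examples in the text).
    """
    # (a) keep noise-like spots; X < 0 iff a2 < a1, independent of f
    noise = sorted((p for p in spots if p[1] < p[0]),
                   key=lambda p: (p[0] + p[1]) / 2)
    n = len(noise)
    low_spots = noise[int(low[0] * n):int(low[1] * n)]
    high_spots = noise[int(high[0] * n):int(high[1] * n)]
    for _ in range(max_iter):
        # (b)-(c) X values and their spread in each window for the current f
        sd_low = statistics.stdev(x_scores(low_spots, s1, s2, f))
        sd_high = statistics.stdev(x_scores(high_spots, s1, s2, f))
        # Stop once the two standard deviations agree within tol
        if abs(sd_low - sd_high) < tol:
            return f
        # (d) SDlow > SDhigh means f is too large, and vice versa
        f *= sd_high / sd_low
    raise RuntimeError("f did not converge")


# Hypothetical synthetic data generated with a known f of 0.15
rng = random.Random(0)
true_f, s = 0.15, 50.0
spots = []
for _ in range(20000):
    mu = 10 ** rng.uniform(1, 5)  # true intensity, log-uniform
    a1 = mu * (1 + true_f * rng.gauss(0, 1)) + s * rng.gauss(0, 1)
    a2 = mu * (1 + true_f * rng.gauss(0, 1)) + s * rng.gauss(0, 1)
    spots.append((a1, a2))

f_hat = fit_f(spots, s, s, low=(0.0, 0.1), high=(0.9, 1.0))
print(round(f_hat, 3))
```

The example call moves the high window to the top 10% so that, in this synthetic data, the low window is dominated by the background terms and the high window by the multiplicative term; with real data the window positions are a tuning choice.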

The average X values of the three-probe sets formed a new distribution, which was used to calculate P values of the average X (probe set P values). If the probe set P value was less than 0.001, the three probes were marked as potentially bound.

Because most probes were spaced within the resolution limit of chromatin immunoprecipitation, we next required that multiple probes in the probe set provide evidence of a binding event. Candidate bound probe sets were required to pass one of two additional filters: (1) two of the three probes in the probe set each have a single probe P value < 0.005, or (2) the center probe has a single probe P value < 0.001 and one of the flanking probes has a single probe P value < 0.1. These two filters cover situations in which a binding event occurs midway between two probes and each weakly detects it, or in which a binding event occurs very close to one probe and is very weakly detected by a neighboring probe.
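The combined decision rule can be sketched as a single predicate. The function name `is_bound` is hypothetical, the three single probe P values are assumed to be given in genomic order (left flank, center, right flank), and "two of the three probes" is read as "at least two"; the thresholds are the ones stated above.

```python
def is_bound(p_set, p_left, p_center, p_right):
    """Apply the probe set and single probe P value filters to one probe set.

    p_set: probe set P value from the averaged X score.
    p_left, p_center, p_right: single probe P values in genomic order.
    """
    # Candidate only if the probe set P value passes the first threshold
    if p_set >= 0.001:
        return False
    singles = (p_left, p_center, p_right)
    # Filter 1: at least two of the three probes have P < 0.005
    two_strong = sum(p < 0.005 for p in singles) >= 2
    # Filter 2: center probe P < 0.001 and a flanking probe P < 0.1
    center_led = p_center < 0.001 and min(p_left, p_right) < 0.1
    return two_strong or center_led


print(is_bound(0.0005, 0.05, 0.0008, 0.5))
```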