Data Normalization and Analysis
We used GenePix software (Axon) to obtain background-subtracted intensity values for
each fluorophore for every feature on the array. To obtain set-normalized intensities, we
first calculated, for each slide, the median intensities in each channel for the set of 1,500
control probes described here and included on each array. For multiple slide sets (whole
genome and promoter array), we then calculated the average of these median intensities for
all slides. Intensities were then normalized such that the median intensity of each channel
for an individual slide equaled the average of the median intensities of that channel across
all slides.
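
The set normalization described above can be sketched as follows. This is a minimal illustration, not the original analysis code: it assumes the normalization is a multiplicative rescaling of each slide, and the function name `set_normalize` and the argument `control_idx` (positions of the control probes) are hypothetical.

```python
from statistics import mean, median


def set_normalize(slides, control_idx):
    """Rescale one channel across a set of slides.

    slides: list of intensity lists, one list per slide (same probe order).
    control_idx: positions of the control probes (hypothetical argument).
    Assumption: normalization is a per-slide multiplicative scaling.
    """
    # Median control-probe intensity on each slide.
    control_medians = [median([s[i] for i in control_idx]) for s in slides]
    # Target value: the average of those medians across all slides.
    target = mean(control_medians)
    # Scale each slide so that its control median equals the target.
    return [
        [v * target / m for v in s]
        for s, m in zip(slides, control_medians)
    ]
```

The same function would be called once per channel, so that each fluorophore is normalized independently.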
Among the Agilent controls is a set of negative control spots that contain 60-mer
sequences that do not cross-hybridize to human genomic DNA. We calculated the median
intensity of these negative control spots in each channel and then subtracted this number
from the set-normalized intensities of all other features.
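
As an illustration of the negative control subtraction, the sketch below takes the intensities of one channel on one slide. The function name and the `negative_idx` argument are hypothetical, and leaving the negative control spots themselves unchanged is an assumption based on the phrase "all other features".

```python
from statistics import median


def subtract_negative_controls(intensities, negative_idx):
    """Subtract the median negative-control intensity from all other features.

    intensities: set-normalized intensities for one channel on one slide.
    negative_idx: positions of the negative control spots (hypothetical).
    """
    # Median signal of the spots that should not hybridize.
    floor = median([intensities[i] for i in negative_idx])
    negatives = set(negative_idx)
    # Assumption: negative control spots are left as they are.
    return [
        v if i in negatives else v - floor
        for i, v in enumerate(intensities)
    ]
```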
To correct for different amounts of genomic and immunoprecipitated DNA hybridized to
the chip, the set-normalized, negative control-subtracted median intensity value of the IP-enriched
DNA channel was then divided by the median of the genomic DNA channel.
This yielded a normalization factor that was applied to each intensity in the genomic DNA
channel.
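
A minimal sketch of this channel balancing step is shown below. It assumes that "applied" means each genomic DNA intensity is multiplied by the factor, which makes the two channel medians equal; the function name `balance_channels` is hypothetical.

```python
from statistics import median


def balance_channels(ip, genomic):
    """Rescale the genomic DNA channel to the median of the IP channel.

    ip, genomic: set-normalized, negative control-subtracted intensities,
    in the same probe order.
    Assumption: the factor is applied by multiplication.
    """
    # Ratio of channel medians, IP-enriched over genomic.
    factor = median(ip) / median(genomic)
    # Multiply every genomic intensity by the factor.
    return factor, [g * factor for g in genomic]
```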
Next, we calculated the log of the ratio of intensity in the IP-enriched channel to intensity
in the genomic DNA channel for each probe and used a whole chip error model (Hughes et
al., 2000) to calculate confidence values for each spot on each array (single probe p-value).
This error model functions by converting the intensity information in both channels to an
X score which is dependent on both the absolute value of intensities and background noise
in each channel. When available, replicate data were combined, using the X scores and
ratios of individual replicates to weight each replicate's contribution to a combined X score
and ratio. The X scores for the combined replicate are assumed to be normally distributed,
which allows for calculation of a p-value for the enrichment ratio seen at each feature. P-values
were also calculated based on a second model assuming that, for any range of signal
intensities, IP:control ratios below 1 represent noise (as the immunoprecipitation should
only result in enrichment of specific signals) and the distribution of noise among ratios
above 1 is the reflection of the distribution of noise among ratios below 1.
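
The second, reflection-based model can be illustrated with the sketch below. It is a simplified example under stated assumptions rather than the procedure actually used: it treats all probes as a single intensity range instead of binning by signal intensity, it reflects the noise in log10 space (so a ratio r mirrors to 1/r), and it uses an empirical p-value with a pseudocount of one. The function name is hypothetical, and all intensities are assumed to be positive.

```python
import math
from bisect import bisect_left


def reflected_null_pvalues(ip, genomic):
    """Empirical p-values from a noise distribution mirrored about ratio 1.

    ip, genomic: positive, normalized intensities in the same probe order.
    Assumptions: one intensity range, reflection in log10 space, and a
    pseudocount of one in the empirical p-value.
    """
    log_ratios = [math.log10(a / b) for a, b in zip(ip, genomic)]
    # Ratios below 1 (log ratio below 0) are taken to be pure noise.
    below = [r for r in log_ratios if r < 0]
    # Mirror them about 0 to model the noise among ratios above 1.
    null = sorted(below + [-r for r in below])
    n = len(null)
    pvals = []
    for r in log_ratios:
        # Number of null values at least as large as the observed log ratio.
        n_ge = n - bisect_left(null, r)
        pvals.append((n_ge + 1) / (n + 1))
    return log_ratios, pvals
```

Probes with strong enrichment receive small p-values because few mirrored noise values reach their log ratio, while probes with ratios near or below 1 receive p-values close to 1.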