Activated Signal Transduction Kinases Frequently Occupy Target Genes
Identification of Bound Regions
To automatically determine bound regions in the datasets, we developed an algorithm that incorporates information from neighboring probes. For each 60-mer, we calculated the average X score of the 60-mer and its two immediate neighbors. If a feature was flagged as abnormal during scanning, we assumed it gave a neutral contribution to the average X score. Similarly, if an adjacent feature was beyond a reasonable distance from the probe (1000 bp), we assumed it gave a neutral contribution to the average X score. The distance threshold of 1000 bp was based on the maximum size of the labeled DNA fragments used in the hybridization. Since the maximum fragment size was approximately 550 bp, we reasoned that probes separated by 1000 bp or more would not be able to contribute reliable information about a binding event halfway between them. This set of averaged X values from each probe and its neighbors gave us a new distribution that was subsequently used to calculate combined p-values for each probe.
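The neighbor-averaging step described above can be sketched in Python as follows. This is an illustration only, not the code used for the analysis. The `Probe` class, the function name, and in particular the choice of 0.0 as the "neutral" X value are assumptions; the text does not state what numerical value a neutral contribution takes. Probes are assumed to be sorted by position along a single chromosome, and a missing neighbor at either end of the array is also treated as neutral.

```python
from dataclasses import dataclass
from typing import List

MAX_NEIGHBOR_DISTANCE_BP = 1000  # distance threshold stated in the text
NEUTRAL_X = 0.0                  # ASSUMPTION: a neutral contribution is X = 0


@dataclass
class Probe:
    position: int          # genomic coordinate of the 60-mer (bp)
    x_score: float         # single-probe X score
    flagged: bool = False  # True if the feature was flagged as abnormal


def averaged_x_scores(probes: List[Probe]) -> List[float]:
    """Average each probe's X score with its two immediate neighbors.

    Flagged features, neighbors 1000 bp or more away, and missing
    neighbors contribute NEUTRAL_X instead of their own X score.
    """
    averaged = []
    for i, center in enumerate(probes):
        values = [NEUTRAL_X if center.flagged else center.x_score]
        for j in (i - 1, i + 1):
            if 0 <= j < len(probes):
                neighbor = probes[j]
                distance = abs(neighbor.position - center.position)
                too_far = distance >= MAX_NEIGHBOR_DISTANCE_BP
                if neighbor.flagged or too_far:
                    values.append(NEUTRAL_X)
                else:
                    values.append(neighbor.x_score)
            else:
                values.append(NEUTRAL_X)  # no neighbor on this side
        averaged.append(sum(values) / 3.0)
    return averaged
```

The averaged values would then be converted to combined p-values using the distribution of averaged X scores; that step is not shown because the text does not specify how the distribution was modeled.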
The combined p-value detects bound regions with higher confidence because bound regions typically span two or more probes, owing to the fragment size distribution of the immunoprecipitated DNA. However, a single probe with a highly significant p-value can by itself produce a highly significant combined p-value. Therefore, we used a combination of single and combined p-values to qualify binding events.
Using Gcn4p binding as a test model, we developed the following filter for binding events: the center probe in the probe set has a single-probe p-value < 0.001, one of the flanking probes has a single-probe p-value < 0.01, and the three probes combined have a p-value < 0.001. When analyzing the regions bound by kinases, we modified this filter to better detect binding peaks that have a lower maximum enrichment but span larger regions (i.e., the entire open reading frame). For Hog1p, Fus3p, Kss1p and Ste5p, we therefore used the following filter: the center probe in the probe set has a single-probe p-value < 0.005 (to allow for lower maximal enrichment), the three probes combined have a p-value < 0.0001, and one of the flanking probes has a combined p-value < 0.001 (a more stringent and extended requirement for the combined p-value).
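The two filters can be written as simple predicates over a probe triplet. The sketch below is illustrative: the `ProbeP` class and function names are hypothetical, "one of the flanking probes" is read as "at least one", and the combined p-value of a three-probe set is assumed to be stored on its center probe. Only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class ProbeP:
    single_p: float    # p-value from the probe's own X score
    combined_p: float  # p-value from the probe averaged with its neighbors


def passes_gcn4_filter(left: ProbeP, center: ProbeP, right: ProbeP) -> bool:
    """Filter developed with Gcn4p binding as the test model."""
    return (
        center.single_p < 0.001
        and min(left.single_p, right.single_p) < 0.01  # at least one flank
        and center.combined_p < 0.001
    )


def passes_kinase_filter(left: ProbeP, center: ProbeP, right: ProbeP) -> bool:
    """Filter used for Hog1p, Fus3p, Kss1p and Ste5p."""
    return (
        center.single_p < 0.005
        and center.combined_p < 0.0001
        and min(left.combined_p, right.combined_p) < 0.001  # at least one flank
    )
```

For example, a triplet whose center probe has a single-probe p-value of 0.003 fails the first filter but can still pass the second if the combined p-values are strong enough, which reflects the intent of detecting broader, lower peaks.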
Bound probe sets that overlapped were collapsed into bound regions. For each bound region, the closest ORF was then assigned and the maximum ChIP enrichment identified (see Tables S2-S6 for the results).
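Collapsing overlapping probe sets amounts to a standard interval merge. The following sketch shows one way to do it; it is not the original implementation. The `ProbeSet` type, the function names, and the ORF names in the usage note are hypothetical, all intervals are assumed to lie on one chromosome, and "closest ORF" is assumed to mean the smallest gap between the region and the ORF interval (zero if they overlap), since the text does not define the distance measure.

```python
from typing import List, NamedTuple, Tuple


class ProbeSet(NamedTuple):
    start: int         # leftmost coordinate covered (bp)
    end: int           # rightmost coordinate covered (bp)
    enrichment: float  # maximum ChIP enrichment within the interval


def collapse_probe_sets(probe_sets: List[ProbeSet]) -> List[ProbeSet]:
    """Merge overlapping bound probe sets into bound regions,
    keeping the maximum enrichment seen in each region."""
    regions: List[ProbeSet] = []
    for ps in sorted(probe_sets, key=lambda p: p.start):
        if regions and ps.start <= regions[-1].end:
            last = regions[-1]
            regions[-1] = ProbeSet(
                last.start,
                max(last.end, ps.end),
                max(last.enrichment, ps.enrichment),
            )
        else:
            regions.append(ps)
    return regions


def closest_orf(region: ProbeSet, orfs: List[Tuple[str, int, int]]) -> str:
    """Return the name of the ORF (name, start, end) nearest to the region.
    ASSUMPTION: distance is the gap between intervals, 0 if overlapping."""
    def gap(orf: Tuple[str, int, int]) -> int:
        _, orf_start, orf_end = orf
        return max(0, orf_start - region.end, region.start - orf_end)
    return min(orfs, key=gap)[0]
```

With probe sets covering 100-700 bp and 400-1000 bp, the two are merged into a single region from 100 to 1000 bp carrying the larger of the two enrichment values, and `closest_orf` then picks whichever annotated ORF lies nearest to that region.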