Error Estimates

We previously estimated a false positive rate of 4% and a false negative rate of ~24% for genome-wide binding data that meet a P ≤ 0.001 threshold (Harbison et al.). To estimate false positive and false negative rates for the current technology, we compared maximum IP/WCE ratios for the Gcn4 data to lists of high-likelihood positive and high-likelihood negative binding targets. A positive list of 84 genes was selected on the basis of three criteria: previous high-confidence binding data (P ≤ 0.001; Harbison et al.), the presence of a perfect or near-perfect Gcn4 consensus binding site (TGASTCA) in the region from -400 bp to +50 bp, and a greater than 2-fold Gcn4-dependent change in steady-state mRNA levels upon shift to amino acid starvation medium (Natarajan et al.). A negative list of 945 genes was selected on the basis of weak binding (P ≥ 0.1), absence of a motif near the presumed start site, and a less than 60% change in steady-state mRNA levels in response to the shift to amino acid starvation. Each gene was scored by the maximum median-normalized IP/WCE ratio found in the region from -250 bp to +50 bp from the UAS. Parameters were optimized by maximizing the absolute difference in identified genes between the positive and negative lists, using the Statistics-ROC package for Perl. Based on these results, we estimate a false positive rate of <1% and a false negative rate of ~25%.
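The scoring and threshold selection described above can be illustrated with a minimal Python sketch. It does not use or reproduce the Statistics-ROC package for Perl. It assumes that each gene has a list of (position, ratio) probe measurements, and that "maximizing the absolute difference" means maximizing the difference between the fractions of positive-list and negative-list genes called bound. The function names gene_score and best_threshold and the example values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def gene_score(probes: Sequence[Tuple[int, float]],
               start: int = -250, end: int = 50) -> Optional[float]:
    """Maximum IP/WCE ratio among probes whose position lies in [start, end].

    `probes` holds (position, median-normalized ratio) pairs; positions are
    relative to the reference point used for the gene (an assumption here).
    Returns None when no probe falls inside the window.
    """
    in_window = [ratio for pos, ratio in probes if start <= pos <= end]
    return max(in_window, default=None)


def best_threshold(pos_scores: Sequence[float],
                   neg_scores: Sequence[float]) -> Tuple[float, float, float]:
    """Pick the score cutoff that best separates the two gene lists.

    A gene is called bound when its score is >= the cutoff. Every observed
    score is tried as a cutoff, and the one maximizing
    |fraction of positives called - fraction of negatives called| is kept.
    Returns (cutoff, false positive rate, false negative rate).
    """
    best: Optional[Tuple[float, float, float, float]] = None
    for cutoff in sorted(set(pos_scores) | set(neg_scores)):
        tpr = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= cutoff for s in neg_scores) / len(neg_scores)
        diff = abs(tpr - fpr)
        if best is None or diff > best[0]:
            best = (diff, cutoff, fpr, 1.0 - tpr)
    assert best is not None, "at least one score is required"
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Toy scores, invented for illustration only (not data from this study).
    positives: List[float] = [3.0, 2.5, 1.2, 4.0]
    negatives: List[float] = [1.0, 1.1, 0.9, 1.3, 2.6]
    cutoff, fp_rate, fn_rate = best_threshold(positives, negatives)
    print(f"cutoff={cutoff} FP rate={fp_rate:.2f} FN rate={fn_rate:.2f}")
```

In this sketch the false positive rate is the fraction of negative-list genes scoring at or above the cutoff, and the false negative rate is the fraction of positive-list genes scoring below it. Other conventions (for example, a false discovery rate among called genes) would give different numbers.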

Whitehead Institute
9 Cambridge Center
Cambridge, MA 02142
[T] 617.258.5218
[F] 617.258.0376