Error Estimates

 

We previously estimated a false positive rate of 6-10% for genome-wide binding data that meets a P ≤ 0.001 threshold. The present study focuses on DNA regions that are both bound (P ≤ 0.001) and contain a conserved match to the regulator's binding site specificity. Of the 47 sites that Lee et al. used to determine the error rate and that also met our criteria for binding sites, 45 were confirmed by independent gene-specific ChIP experiments. The remaining 2 of 47 were not confirmed, so the frequency of false positives in this dataset is likely to be approximately 4%.
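The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration, not code from the original analysis: the function name is hypothetical, the counts are the ones quoted above, and the calculation assumes that every site not confirmed by gene-specific ChIP is a false positive.

```python
def false_positive_rate(tested: int, confirmed: int) -> float:
    """Fraction of tested sites that were not independently confirmed.

    Hypothetical helper for illustration; assumes each unconfirmed
    site counts as one false positive.
    """
    if tested <= 0:
        raise ValueError("tested must be positive")
    if not 0 <= confirmed <= tested:
        raise ValueError("confirmed must be between 0 and tested")
    return (tested - confirmed) / tested


# Counts quoted in the text: 47 sites tested, 45 confirmed.
fp_rate = false_positive_rate(tested=47, confirmed=45)
print(f"False positive rate: {fp_rate:.1%}")
```

With so few unconfirmed sites (2), the estimate is sensitive to a change of even one site, which is consistent with the text describing it only as approximate.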

The false negative rate is more difficult to estimate, but it is likely to be approximately 24% in the present genome location dataset. To derive this estimate, we examined binding interactions reported in the literature for cell cycle regulators and counted those that were not recovered here, that is, interactions that were not both identified in the genome-wide location data at P ≤ 0.001 and associated with conserved binding sites. Of 50 literature-reported interactions, 12 were missed (12/50). We selected the cell cycle literature for this analysis because this group of regulators and their targets has been studied extensively.
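The false negative calculation can be sketched in the same way. The point estimate uses only the counts quoted above (12 of 50). The Wilson score interval is an assumption added here purely to illustrate how much uncertainty a sample of 50 carries; it is not part of the original analysis, and the function names are hypothetical.

```python
import math


def false_negative_rate(known: int, missed: int) -> float:
    """Fraction of literature-reported interactions that were missed."""
    if known <= 0:
        raise ValueError("known must be positive")
    if not 0 <= missed <= known:
        raise ValueError("missed must be between 0 and known")
    return missed / known


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Illustrative addition, not from the original analysis.
    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts quoted in the text: 50 known interactions, 12 missed.
fn_rate = false_negative_rate(known=50, missed=12)
low, high = wilson_interval(successes=12, n=50)
print(f"False negative rate: {fn_rate:.0%}")
print(f"Approximate 95% Wilson interval: {low:.1%} to {high:.1%}")
```

The interval is wide for a sample of this size, which fits the text's caution that the false negative rate is harder to pin down than the false positive rate.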