17.17 The False Discovery Rate and the Simes Method

17.17.1 The False Discovery Rate

When testing multiple hypotheses, there is always the possibility that one or more tests have appeared significant just by chance. Various techniques have been proposed to adjust the p-values or to otherwise correct for multiple testing issues. Among these are the Bonferroni adjustment, Simes’ method, and the False Discovery Rate. The adjustments discussed in the sections above, which account for there being more than one way to perform a multi-way split (that is, the adjustment from P to aP), also fall into the category of multiple-hypothesis adjustments. The technique discussed below, however, is used in Optimus RP specifically to correct for the multiplicity of potential splits over many different predictors.

Suppose m hypotheses are tested, and R of them are rejected (positive results). Of the rejected hypotheses, suppose that V are really null; that is, V is the number of type I errors, or false positive results. The False Discovery Rate is defined as

\[
\mathrm{FDR} = E\left( \frac{V}{R} \,\middle|\, R > 0 \right) \Pr(R > 0),
\]

that is, the expected proportion of false positive findings among all rejected hypotheses times the probability of making at least one rejection.
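
As a rough illustration of this definition, the sketch below averages V/R over a handful of repeated experiments, counting an experiment with no rejections as contributing 0. Because V = 0 whenever R = 0, this average equals the conditional mean of V/R over the experiments with R > 0, multiplied by the fraction of experiments with R > 0. The function name `empirical_fdr` and the (V, R) pairs are made up for the example and are not part of Optimus RP.

```python
def empirical_fdr(outcomes):
    """Average V/R over repeated experiments, counting 0 when R == 0.

    `outcomes` is a list of (V, R) pairs, one per hypothetical experiment:
    V false rejections out of R total rejections.
    """
    if not outcomes:
        raise ValueError("need at least one experiment")
    total = 0.0
    for v, r in outcomes:
        if r > 0:
            total += v / r  # proportion of false positives among rejections
        # experiments with R == 0 contribute nothing to the sum
    return total / len(outcomes)


# Four made-up experiments, each given as (V, R).
outcomes = [(0, 0), (1, 4), (0, 2), (1, 2)]
print(empirical_fdr(outcomes))
```

Here the mean of V/R over the three experiments with R > 0 is 0.25, and three of the four experiments have R > 0, so the product 0.25 × 0.75 agrees with the value the function returns.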

Suppose we reject the null hypothesis on the basis of the p-values $p_1, \ldots, p_m$ from these m tests, specifically whenever a p-value is less than or equal to a threshold parameter γ. If we can treat the p-values as independent, then we can estimate Pr(P ≤ γ) as

\[
\widehat{\Pr}(P \le \gamma) = \frac{\max(R(\gamma), 1)}{m},
\]

where R(γ) is the number of $p_i$ less than or equal to γ, and use this to estimate the False Discovery Rate FDR as

\[
\widehat{\mathrm{FDR}}(\gamma) = \frac{\gamma}{\widehat{\Pr}(P \le \gamma)}.
\]

When Optimus RP computes this for γ equal to any particular p-value, at least one p-value is less than or equal to γ, so R(γ) ≥ 1 and these expressions simplify to

\[
\widehat{\Pr}(P \le \gamma) = \frac{R(\gamma)}{m},
\]

and

\[
\widehat{\mathrm{FDR}}(\gamma) = \frac{m\gamma}{R(\gamma)} = \frac{m\gamma}{j},
\]

where j is the number of p-values less than or equal to γ.

See B. (We use $\pi_0 = 1$ here; that is, the proportion of true null hypotheses is conservatively taken to be 1.)
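
The estimate above can be sketched in a few lines of Python. This is only an illustration of the formulas in this section, not Optimus RP's actual code; the function names `estimated_fdr` and `fdr_at_p_values` and the sample p-values are hypothetical. The sketch does not cap the estimate at 1, since this section does not say whether Optimus RP does so.

```python
def estimated_fdr(p_values, gamma):
    """Estimate FDR(gamma) as gamma / (max(R(gamma), 1) / m)."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    r = sum(1 for p in p_values if p <= gamma)  # R(gamma)
    prob_hat = max(r, 1) / m  # estimate of Pr(P <= gamma)
    return gamma / prob_hat


def fdr_at_p_values(p_values):
    """Evaluate the estimate at each observed p-value, i.e. m * p / j."""
    return [estimated_fdr(p_values, p) for p in p_values]


p_values = [0.02, 0.001, 0.5, 0.01]  # made-up values for illustration
for p, fdr in zip(p_values, fdr_at_p_values(p_values)):
    print(f"p = {p:<6} estimated FDR = {fdr:.4f}")
```

When γ is one of the observed p-values, R(γ) is at least 1, so the `max` has no effect and the result is mγ/j, matching the simplified expression.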

17.17.2 Simes’ Method

When analyzing data with many potential splitters, it is desirable to sift out those of greater significance. When p-values (raw or adjusted) are plotted sorted by splitter number (7.4), Optimus RP offers consolidation of the p-values by Simes’ method over a moving window.

With Simes’ method, a process resembling the False Discovery Rate estimate is applied to the p-values of splitters that are close to each other (in spreadsheet order), namely, those within the moving window of splitters.

Suppose that the window size is k. With this method, these k p-values are sorted according to their size. Call these $p_1, \ldots, p_j, \ldots, p_k$, with $p_1$ being the smallest. The values of

\[
\frac{k\,p_j}{j}
\]

are then evaluated for all j, and the smallest (best) value of $k\,p_j / j$ is assigned as the “Simes’ value” for the window. (Optimus RP will plot this at the middle of the window.)
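
The procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not Optimus RP's implementation: the names `simes_value` and `moving_simes` are hypothetical, and the sketch evaluates only full windows of k consecutive p-values, since this section does not say how windows at the two ends of the splitter list are handled.

```python
def simes_value(window_p_values):
    """Smallest k * p_j / j over the sorted p-values of one window."""
    k = len(window_p_values)
    if k == 0:
        raise ValueError("window must contain at least one p-value")
    ordered = sorted(window_p_values)  # p_1 is the smallest
    return min(k * p / j for j, p in enumerate(ordered, start=1))


def moving_simes(p_values, k):
    """Simes' value for each full window of k consecutive p-values.

    Entry i of the result belongs to the window starting at index i
    (splitters in spreadsheet order).
    """
    if k < 1 or k > len(p_values):
        raise ValueError("window size must be between 1 and len(p_values)")
    return [simes_value(p_values[i:i + k])
            for i in range(len(p_values) - k + 1)]


# Made-up p-values in spreadsheet order, window size 3.
print(moving_simes([0.5, 0.01, 0.9, 0.02], 3))
```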

As can be seen, if one splitter has a very small p-value just by chance, and the others in the window have large p-values, the one small p-value will be “corrected out” by its multiplication by k. (This is something like a Bonferroni correction.) However, if several p-values are fairly small, the largest of these small values will be not only multiplied by k but also divided by the number of these small values. This smaller and better value shows that there is more likely to be something significant happening within this window.
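
As a worked illustration with made-up p-values (not taken from any Optimus RP data set), take a window of size k = 5; values are rounded to three decimals. With a single small p-value, the Simes’ value is just that p-value multiplied by k:

\[
(0.01,\ 0.6,\ 0.7,\ 0.8,\ 0.9): \quad \frac{k\,p_j}{j} = 0.05,\ 1.5,\ 1.167,\ 1.0,\ 0.9 \;\Rightarrow\; \text{Simes' value} = 0.05 = 5 \times 0.01.
\]

With three fairly small p-values, the largest of them is multiplied by k and divided by 3, giving a smaller result:

\[
(0.01,\ 0.012,\ 0.015,\ 0.7,\ 0.9): \quad \frac{k\,p_j}{j} = 0.05,\ 0.03,\ 0.025,\ 0.875,\ 0.9 \;\Rightarrow\; \text{Simes' value} = 0.025 = \frac{5 \times 0.015}{3}.
\]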