Benjamini-Hochberg Procedure with the Simulator

Jacob Bien


Suppose we wish to test n hypotheses, H1,,Hn, and we have a p-value p^i for each hypothesis Hi. That is,


Benjamini and Hochberg (1995) design a procedure (for when these p-values are independent) that controls what they call the false discovery rate (FDR), which is the expected proportion of the rejected tests that should not have been rejected:

E{Hi:iS}[iS1{Hi rejected}max[1,i=1n1{Hi rejected}]].

Given a desired FDR q, the BH procedure finds a data-adaptive threshold level p^(q) and rejects all Hi for which p^ip^(q). The threshold level is given by comparing the sorted p-values p^(1)p^(n) to a line of slope q/n and identifying the largest p-value that is below this line. That is, p^(q)=p^(i^) where


In this simulation, we verify that the BH procedure works, and we investigate the effect that correlation has on the FDR control.

Main simulation

In the Models section below, we show the code for make_correlated_pvalues, a function that generates a model object given parameters n (the number of hypotheses), π0 (the fraction of hypotheses that are null), and ρ (the correlation between any pair of test statistics). In the simulation below, we fix n=100 and vary π0 and ρ.

name_of_simulation <- "fdr"
sim <- new_simulation(name = name_of_simulation,
                      label = "False Discovery Rate") %>%
  generate_model(make_correlated_pvalues, seed = 123,
                 n = 100,
                 pi0 = list(0.8, 1),
                 rho = list(-0.01, 0, 0.1, 0.9),
                 vary_along = c("pi0", "rho")) %>%
  simulate_from_model(nsim = 50, index = 1:4) %>%
  run_method(bh_methods, parallel = list(socket_names = 4)) %>%
  evaluate(list(fdp, nd))

The variable bh_methods is defined in the Methods section below and corresponds to the BH procedure for four different values of q.

We begin by looking at the results when ρ=0 (i.e., independent tests).

sim %>% 
  subset_simulation(rho == 0) %>% 
  tabulate_eval(metric_name = "fdp", output_type = "html", 
                format_args = list(digits = 0, nsmall = 2))
A comparison of Mean false discovery proportion (averaged over 200 replicates).
BH (q = 0.05) BH (q = 0.1) BH (q = 0.2)
pi0 = 0.8, rho = 0 0.03 (0.01) 0.08 (0.01) 0.16 (0.01)
pi0 = 1, rho = 0 0.06 (0.02) 0.09 (0.02) 0.17 (0.03)

It appears that the BH procedure does control the FDR at each stated q value. However, we also see that when π0 is less than 1, it tends to be more conservative. Indeed, Benjamini and Hochberg show that the FDR of BH does not exceed π0q.

Adding to a simulation

We might like to increase the number of simulations.

Suppose we return to this simulation several days later and wish to double the number of random draws used. In the above code, we had 200 draws, which were simulated in 4 chunks of 50 draws each. The simulator allows us to add to a simulation without requiring us to re-run parts of the simulation that have already been run.

If we had closed the R session without saving the image[^]: for the sake of reproducibility, I like to always start with a fresh workspace, so I can be sure that my functions aren’t calling a global variable that I have forgotten about), we can open a new one in the directory that has the files directory in it. We start by loading sim, which is the Simulation object (containing all the necessary pointers to saved files). Loading sim is fast because it only loads the file locations, not the files themselves.

sim <- load_simulation("fdr")

We do so by simply adding 4 more chunks, with index=5:8. Each distinct value of index corresponds to a separate random number generator stream. This is convenient because it means that we do not have to rely on the state of the RNG after completing one chunk to start the next one.

sim <- sim %>%
  simulate_from_model(nsim = 50, index = 5:8) %>%
  run_method(bh_methods, parallel = list(socket_names = 4)) %>%
  evaluate(list(fdp, nd))

We can look again at the table.

sim %>% 
  subset_simulation(rho == 0) %>% 
  tabulate_eval(metric_name = "fdp", output_type = "html", 
                format_args = list(digits = 0, nsmall = 2))
A comparison of Mean false discovery proportion (averaged over 400 replicates).
BH (q = 0.05) BH (q = 0.1) BH (q = 0.2)
pi0 = 0.8, rho = 0 0.03 (0.00) 0.08 (0.01) 0.16 (0.01)
pi0 = 1, rho = 0 0.06 (0.01) 0.10 (0.01) 0.19 (0.02)

Some plots

The FDR is the average of the false discovery proportion (FDP). We can look at these raw values (400 for each method-model pair).

sim %>% 
  subset_simulation(rho == 0) %>% 
plot_eval(metric_name = "fdp")

When π0=1, we see that the FDP is either 0 or 1. This is because if we make any number of discoveries, then they will all be false (but if we do not make any discoveries, we have FDP=0).

We now investigate the effect of ρ, the correlation between the test statistics. We now fix π0=0.8 and look at how the plots vary with ρ.

sim %>% 
  subset_simulation(pi0 == 0.8) %>% 
plot_eval(metric_name = "fdp", varying = "rho")

Since ρ is numeric, it might be more informative to look at the FDR as a function of ρ.

sim %>% 
  subset_simulation(pi0 == 0.8) %>% 
plot_eval_by(metric_name = "fdp", varying = "rho")

We see that the procedure becomes more conservative as the dependence increases, but still does control FDR (which was shown for positive dependence in Benjamini and Yekutieli, 2001). Looking at π0=1, we see for negative ρ some indication of anti-conservativeness.

sim %>% 
  subset_simulation(pi0 == 1) %>% 
plot_eval_by(metric_name = "fdp", varying = "rho")

Creating a simulation based on a preexisting one

To investigate this particular question in greater depth, we might create a new simulation object based on the previous one. We’d like to increase the number of random draws for this particular setting, but don’t care about doing so for the others.

sim2 <- subset_simulation(sim, pi0 == 1 & rho == -0.01) %>%
  rename("fdr-negdep") %>%
  relabel("BH Procedure under negative dependence") %>%
  simulate_from_model(nsim = 50, index = 9:20) %>%
  run_method(bh_methods) %>%
  evaluate(list(fdp, nd))

Remaking the table (on the basis of 1000 random draws), the anti-conservativeness appears to persist in the q=0.05 case.

tabulate_eval(sim2, metric_name = "fdp", output_type = "html", 
                format_args = list(digits = 0, nsmall = 2))
A comparison of Mean false discovery proportion (averaged over 1000 replicates).
BH (q = 0.05) BH (q = 0.1) BH (q = 0.2)
pi0 = 1, rho = -0.01 0.07 (0.01) 0.11 (0.01) 0.20 (0.01)

Observe that at this point, sim and sim2 are two separate simulation objects that refer to some common simulation results. For example, their Model and Draws objects are the same.

m <- model(sim, pi0 == 1 & rho == -0.01)
m2 <- model(sim2)
all.equal(m, m2)
## [1] TRUE
d <- draws(sim, pi0 == 1 & rho == -0.01)
d2 <- draws(sim2, index = 1:8)
all.equal(d, d2)
## [1] TRUE

When model and draws (and likewise output and evals) are called on a simulation object, they load the appropriate files referenced by the Simulation object. The models m and m2 are identical (and likewise for d and d2) because both Simulation objects refer to the same saved files. Thus, having multiple simulation objects does not lead to copies of the (potentially large) results files being made. Instead, only the references themselves are duplicated.



make_correlated_pvalues <- function(n, pi0, rho) {
  # Gaussian copula model...
  # n pvalues, the first n*pi0 of which are null, coming from a multivariate
  # normal with all correlations rho.
  sigma <- matrix(rho, n, n)
  diag(sigma) <- 1
  n0 <- round(n * pi0)
  delta <- 2 # size of signal
  mu <- rep(c(0, delta), c(n0, n - n0)) # n0 are null
  new_model(name = "correlated-pvalues",
            label = sprintf("pi0 = %s, rho = %s", pi0, rho),
            params = list(n = n, rho = rho, sigma = sigma,
                          pi0 = pi0, mu = mu, delta = delta,
                          nonnull = which(mu != 0)),
            simulate = function(n, mu, sigma, nsim) {
              # this function must return a list of length nsim
              x <- rmvnorm(nsim, mean = mu, sigma = sigma)
              pvals <- 1 - pnorm(x)
              return(split(pvals, row(pvals))) # make each row its own list element


make_bh <- function(q) {
  # q is the desired level of control for the FDR
  new_method(name = paste0("bh", q),
             label = sprintf("BH (q = %s)", q),
             settings = list(q = q),
             method = function(model, draw, q) {
               p <- sort(draw)
               cutline <- seq(model$n) * q / model$n
               threshold <- max(p[p <= cutline], 0)
               list(rejected = which(draw <= threshold))

qvalues <- c(0.05, 0.1, 0.2)
bh_methods <- sapply(qvalues, make_bh)


fdp <- new_metric(name = "fdp",
                  label = "false discovery proportion",
                  metric = function(model, out) {
                    fp <- setdiff(out$rejected, model$nonnull)
                    nd <- max(length(out$rejected), 1)
                    return(length(fp) / nd)

nd <- new_metric(name = "nd",
                 label = "number of discoveries",
                 metric = function(model, out) length(out$rejected))


To cite the simulator, please use


Bien J (2016). “The simulator: An Engine to Streamline Simulations.” Submitted.

A BibTeX entry for LaTeX users is

@Article{, title = {The {simulator}: An Engine to Streamline Simulations}, author = {Jacob Bien}, year = {2016}, journal = {Submitted}, }