Parallel processing • flipr

The flipr package uses functions contained in the furrr package for parallel processing. The setting of parallelization has to be done on the user side. We illustrate here how to achieve asynchronous evaluation. We use the future package to set the plan, the parallel package to define a default cluster, and the progressr package to report progress updates.

More precisely, setting a default cluster with parallel is useful to allow parallel computation of the point estimate if an estimation is needed. On the other side, setting the plan with future allows parallel computation when evaluating the plausibility function. A comparison of computation time with sequential and parallel computation for those two cases is done in the following.

Computation without parallel processing

To show the benefit of parallel processing, we compare here the processing times necessary to compute a point estimation and to evaluate the grid for a plausibility function. First, here is the computation without parallelization.

set.seed(1234)
x <- rnorm(10, 1, 1)
y <- rnorm(10, 4, 1)

null_spec <- function(y, parameters) {
  purrr::map(y, ~ .x - parameters[1])
}
stat_functions <- list(stat_t)
stat_assignments <- list(delta = 1)

pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$set_nperms(2000)

tic()
pf$set_point_estimate()
time_without_parallel <- toc()$callback_msg

time_without_parallel
#> [1] "135.323 sec elapsed"

pf$set_parameter_bounds(
  point_estimate = pf$point_estimate, 
  conf_level = pf$max_conf_level
)
pf$set_grid(
  parameters = pf$parameters, 
  npoints = 100L
)

tictoc::tic()
pf$evaluate_grid(grid = pf$grid)
time_without_future <- tictoc::toc()$callback_msg

time_without_future
#> [1] "42.511 sec elapsed"

Computation with parallel processing

By setting the desired number of cores, we define the number of background R sessions that will be used to evaluate expressions in parallel. This number is used to set the multisession plan with the function future::plan() and to define a default cluster with parallel::setDefaultCluster(). Then, to enable the visualization of evaluation progress, we can put the code in the progressr::with_progress() function, or more simply set it for all the following code with the progressr::handlers() function. After these settings, flipr functions can be used, as shown in this example.

ncores <- 3
future::plan(future::multisession, workers = ncores)
cl <- parallel::makeCluster(ncores)
parallel::setDefaultCluster(cl)
progressr::handlers(global = TRUE)

set.seed(1234)
x <- rnorm(10, 1, 1)
y <- rnorm(10, 4, 1)

null_spec <- function(y, parameters) {
  purrr::map(y, ~ .x - parameters[1])
}
stat_functions <- list(stat_t)
stat_assignments <- list(delta = 1)

pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$set_nperms(2000)

tic()
pf$set_point_estimate()
time_with_parallel <- toc()$callback_msg

time_with_parallel
#> [1] "47.882 sec elapsed"

pf$set_parameter_bounds(
  point_estimate = pf$point_estimate, 
  conf_level = pf$max_conf_level
)
pf$set_grid(
  parameters = pf$parameters, 
  npoints = 100L
)

tictoc::tic()
pf$evaluate_grid(grid = pf$grid)
time_with_future <- tictoc::toc()$callback_msg

parallel::stopCluster(cl)

It is good practice to shut down the workers with the parallel::stopCluster() function at the end of the code.

time_with_future
#> [1] "14.463 sec elapsed"

This experiment proves that we can save a lot of computation time when using parallel processing. Indeed, by setting 3 cores for each of the parallel processing tools, we reduced approximately by 3 times the computation time for both computing a point estimation and evaluating the plausibility function.

Finally, to return to a sequential plan with no progress updates, the following code can be used. It also allows to shut down the workers used in future.

future::plan(future::sequential)
parallel::setDefaultCluster(NULL)
progressr::handlers(global = FALSE)