Introduction

This document provides some simple comparisons of the performance of different approaches to parallel computing in R. The methods being compared are:

- foreach with the doParallel backend
- mclapply from the parallel package
- parLapply from the parallel package
- bplapply from the BiocParallel package, with multicore, SNOW SOCK, and SNOW FORK backends

Comparison of running many linear regressions

In this comparison, the task is to run many linear regressions using 4 cores. I run everything on my MacBook Pro laptop with a 2.5 GHz Intel Core i7 CPU and 16 GB of RAM.
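Before fixing the number of workers, it is worth checking how many cores are actually available; detectCores from the parallel package reports this (the 4 workers used in the benchmark below are my choice, not a detected value):

library(parallel)
detectCores()                  ## logical cores available on this machine
detectCores(logical = FALSE)   ## physical cores only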

I first use the following code to simulate the data.

Nlm = 10000                              ## number of regressions
nobs = 100                               ## observations per regression
X = matrix( rnorm(nobs*3), ncol=3 )      ## 3 covariates, shared by all regressions
Y = matrix( rnorm(Nlm*nobs), nrow=Nlm )  ## one response vector per row

So there will be 10,000 linear regressions; each one has 100 observations and 3 covariates, with the covariate matrix X shared across all of them.
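For reference, here is what a single fit looks like. With a matrix of covariates, lm names the slopes X1, X2, X3, and all of the functions below keep only the first slope via $coef[2]:

fit <- lm(Y[1,] ~ X)
coef(fit)      ## (Intercept), X1, X2, X3
coef(fit)[2]   ## the X1 slope, which is what each function extracts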

I then created the following functions, one for each type of parallelism.

## functions to run many lm

## serial loop (baseline)
runManyLM.loop <- function(X, Y) {
    N = nrow(Y)
    beta = rep(0, N)
    for(i in 1:N) {
        beta[i] = lm(Y[i,]~X)$coef[2]
    }
    beta
}

## use foreach
runManyLM.foreach <- function(X, Y, numCores) {
    registerDoParallel(numCores)
    N = nrow(Y)
    beta = foreach(i = 1:N, .combine=c) %dopar% {
        lm(Y[i,]~X)$coef[2]
    }
    beta
}

## use mclapply
runManyLM.mclapply <- function(X, Y, numCores) {
    N = nrow(Y)
    foo <- function(i, Y, X) {
        lm(Y[i,]~X)$coef[2]
    }
    ## Y and X are passed through ... to foo
    beta = mclapply(1:N, foo, Y, X, mc.cores = numCores)
    unlist(beta)
}

## use parLapply      
runManyLM.parLapply <- function(X, Y, numCores) {
    N = nrow(Y)
    foo <- function(i, Y, X) {
        lm(Y[i,]~X)$coef[2]
    }
    cl <- makeCluster(numCores, type="FORK")
    beta <- parLapply(cl, 1:N, foo, Y, X)  ## Y and X go through ... to foo
    stopCluster(cl)
    unlist(beta)
}

## use bplapply            
runManyLM.bplapply <- function(X, Y, numCores, BPPARAM) {
    ## the worker count comes from BPPARAM; numCores is kept only
    ## to match the signature of the other functions
    N = nrow(Y)
    foo <- function(i, Y, X) {
        lm(Y[i,]~X)$coef[2]
    }
    beta = bplapply(1:N, foo, Y, X, BPPARAM = BPPARAM)
    unlist(beta)
}
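Before benchmarking, a quick sanity check that the parallel versions agree with the serial loop (a minimal sketch using mclapply with 2 cores; any of the other versions can be substituted):

beta.serial <- runManyLM.loop(X, Y)
beta.mc <- runManyLM.mclapply(X, Y, numCores = 2)
## unname() because the extracted coefficients carry names; should be TRUE
all.equal(beta.serial, unname(beta.mc))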

I then benchmarked the performance of these functions with the microbenchmark package.

library(parallel)
library(foreach)
library(doParallel)
library(microbenchmark)
library(BiocParallel)
numCores = 4
mParam = MulticoreParam(workers = numCores)
snowSOCK <- SnowParam(workers = numCores, type = "SOCK")
snowFORK <- SnowParam(workers = numCores, type = "FORK")

result <- microbenchmark(runManyLM.foreach(X, Y, numCores),
                         runManyLM.mclapply(X, Y, numCores),
                         runManyLM.parLapply(X, Y, numCores),
                         runManyLM.bplapply(X, Y, numCores, mParam),
                         runManyLM.bplapply(X, Y, numCores, snowSOCK),
                         runManyLM.bplapply(X, Y, numCores, snowFORK),
                         times = 50)

The results (times in seconds) are shown below:

##                                           expr      min     mean      max
## 1            runManyLM.foreach(X, Y, numCores) 3.027713 3.297434 3.923210
## 2           runManyLM.mclapply(X, Y, numCores) 1.636211 1.744566 2.213691
## 3          runManyLM.parLapply(X, Y, numCores) 1.914132 2.059862 2.566976
## 4   runManyLM.bplapply(X, Y, numCores, mParam) 2.285717 2.525748 3.212917
## 5 runManyLM.bplapply(X, Y, numCores, snowSOCK) 6.487970 6.927471 9.762698
## 6 runManyLM.bplapply(X, Y, numCores, snowFORK) 2.311750 2.596356 3.429641

Based on the results, mclapply provides the best performance, with parLapply slightly slower and foreach running at roughly half the speed of mclapply. Of the three bplapply configurations, the SNOW SOCK backend is by far the slowest; a likely explanation is that SOCK workers are fresh R processes, so X, Y, and the worker function must be serialized and sent to each of them, while the multicore and FORK backends use forked workers that share the parent's memory and perform similarly to each other.
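A rough way to see this data-transfer cost (a sketch only, separate from the benchmark above; makeCluster and clusterExport are from the parallel package) is to time how long it takes just to ship Y to the workers of a SOCK-style cluster:

cl <- makeCluster(numCores, type = "PSOCK")  ## SOCK-style cluster of fresh R processes
system.time(clusterExport(cl, "Y"))          ## cost of serializing Y to every worker
stopCluster(cl)

A FORK cluster pays no such cost, because forked children inherit the parent's memory.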