Asynchronous Processing Using Job Queue


This post discusses a way to work around one of R's limitations, the lack of multi-threading, by queuing jobs with the jobqueue package, a generic asynchronous job queue implementation for R. The package description is quoted below.

The jobqueue package is meant to provide an easy-to-use interface that allows to queue computations for background evaluation while the calling R session remains responsive. It is based on a 1-node socket cluster from the parallel package. The package provides a way to do basic threading in R. The main focus of the package is on an intuitive and easy-to-use interface for the job queue programming construct. … Typical applications include: background computation of lengthy tasks (such as data sourcing, model fitting, bootstrapping), simple/interactive parallelization (if you have 5 different jobs, move them to up to 5 different job queues), and concurrent task scheduling in more complicated R programs. …

Beyond the typical applications listed above, the package can be quite useful in a Shiny application, especially when a long-running process has to be served.

The package is not on CRAN; it can be installed from R-Forge as follows.

# http://r-forge.r-project.org/R/?group_id=2066
if (!require(jobqueue)) {
  # check the OS type directly; matching "win" against the system name
  # would also match "Darwin" on macOS
  pkg_src <- if (.Platform$OS.type == "windows") {
    "http://download.r-forge.r-project.org/bin/windows/contrib/3.2/jobqueue_1.0-4.zip"
  } else {
    "http://download.r-forge.r-project.org/src/contrib/jobqueue_1.0-4.tar.gz"
  }

  install.packages(pkg_src, repos = NULL)
}

library(jobqueue)

As the description notes, the package is built on the parallel package, so it is not hard to understand how it works if you know how to do parallel processing with that package. If not, have a look at this post.
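For readers new to the parallel package, here is a minimal sketch of the synchronous workflow on a 1-node socket cluster, the building block the package description mentions. The function `slow_square` is a made-up stand-in for a real task, and this is not how jobqueue is implemented internally; it only illustrates the create, export, call, stop cycle.

```r
library(parallel)

# Start a 1-node socket cluster, i.e. one background R process
cl <- makeCluster(1)

# Hypothetical task used only for illustration
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Copy the local function to the worker process
clusterExport(cl, "slow_square", envir = environment())

# Run it on the worker; unlike a job queue, this call blocks until done
res <- clusterCall(cl, function(x) slow_square(x), 4)[[1]]

# Shut the worker down
stopCluster(cl)
res
```

The difference with a job queue is the blocking call: `clusterCall()` only returns once the worker has finished, whereas a queue hands control back immediately.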

Here is a quick example of a job queue. The following function suspends execution for 1 second at each iteration, so in base R the call blocks the session until it completes.

fun <- function(max_val) {
  unlist(lapply(1:max_val, function(x) {
    Sys.sleep(1)
    x
  }))
}
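To make the blocking behaviour concrete, the sketch below times a shortened variant of the function in plain base R. The name `slow_seq` and the `pause` argument are additions for illustration, so the check finishes quickly; they are not part of the original example.

```r
# Same shape as the function above, with a configurable pause
# (`pause` is an illustrative addition, not part of the original)
slow_seq <- function(max_val, pause = 1) {
  unlist(lapply(seq_len(max_val), function(x) {
    Sys.sleep(pause)
    x
  }))
}

# system.time() reports how long the call kept the session busy
timing <- system.time(res <- slow_seq(3, pause = 0.2))
elapsed <- timing[["elapsed"]]

res
elapsed  # roughly 3 * 0.2 seconds, all of it spent waiting
```

Nothing else can run in the session during that time, which is the limitation a job queue is meant to address.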

Using the package, however, the function can be executed asynchronously as shown below.

# create queue
# similar to makeCluster()
queue <- Q.make()
# send local R object to the queue
# similar to clusterExport()
Q.sync(queue, fun)
# execute function
# similar to clusterApply() or parLapply()
Q.push(queue, fun(10))
### other work can be done while the job is running
# get result - NULL if the job is not complete yet
Q.pop(queue)

## NULL

while (TRUE) {
  out <- Q.pop(queue)
  message(paste("INFO execution not completed?", is.null(out)))
  if (!is.null(out)) {
    break
  }
}

## INFO execution not completed? TRUE
## INFO execution not completed? TRUE
## INFO execution not completed? TRUE
## INFO execution not completed? TRUE
## INFO execution not completed? TRUE
## INFO execution not completed? TRUE
## INFO execution not completed? TRUE

## INFO execution not completed? FALSE

# close queue
# similar to stopCluster()
Q.close(queue)
out

##  [1]  1  2  3  4  5  6  7  8  9 10
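The polling loop above can be wrapped in a small helper that pauses between polls and gives up after a timeout. The sketch below is generic: `poll_until` and `make_fake_poll` are hypothetical names, and the fake poller only stands in for a queue so that the block runs without jobqueue installed.

```r
# Call `poll()` repeatedly until it returns a non-NULL value,
# sleeping `interval` seconds between attempts; stop with an error
# if `timeout` seconds pass first.
poll_until <- function(poll, interval = 0.5, timeout = 60) {
  deadline <- Sys.time() + timeout
  repeat {
    out <- poll()
    if (!is.null(out)) return(out)
    if (Sys.time() >= deadline) stop("timed out waiting for a result")
    Sys.sleep(interval)
  }
}

# Stand-in for a queue: returns NULL for the first `ready_after` polls,
# then returns `value`
make_fake_poll <- function(ready_after, value) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls > ready_after) value else NULL
  }
}

result <- poll_until(make_fake_poll(2, 1:10), interval = 0.1, timeout = 5)
result
```

With the queue from the example, the call would presumably look like `poll_until(function() Q.pop(queue))`, assuming `Q.pop()` keeps returning NULL until the job has finished. One caveat of this design is that a job whose real result is NULL cannot be told apart from an unfinished one.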

Another application of a job queue is fitting a bootstrap-based algorithm. In this example, two random forests of 500 trees each are grown in separate queues and combined at the end. Note that, in practice, it would be better to save the outputs and combine them later.

q1 <- Q.make()
q2 <- Q.make()
# load library in each queue
Q.push(q1, library(randomForest), mute = TRUE)
Q.push(q2, library(randomForest), mute = TRUE)
# grow one forest per queue
Q.push(q1, rf <- randomForest(Species ~ ., data=iris, importance=TRUE, proximity=TRUE))
Q.push(q2, rf <- randomForest(Species ~ ., data=iris, importance=TRUE, proximity=TRUE))
# wait until each forest is ready - Q.pop() returns NULL while a job is running
while (is.null(r1 <- Q.pop(q1))) Sys.sleep(0.5)
while (is.null(r2 <- Q.pop(q2))) Sys.sleep(0.5)
Q.close(q1)
Q.close(q2)

# combine the two forests in the calling session
library(randomForest)
do.call("combine", list(r1, r2))

## 
## Call:
##  randomForest(formula = Species ~ ., data = iris, importance = TRUE,      proximity = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 1000
## No. of variables tried at each split: 2
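The advice to save outputs and combine them later could look something like the sketch below, which uses only base R. The integer vectors are placeholders for fitted models, and the directory and file names are made up for illustration.

```r
# Directory for partial results (hypothetical location under tempdir())
out_dir <- file.path(tempdir(), "partial_results")
dir.create(out_dir, showWarnings = FALSE)

# Placeholders for the outputs of two jobs; in the example above these
# would be the fitted randomForest objects
partials <- list(job1 = 1:3, job2 = 4:6)

# Save each partial result to its own file as soon as it is available
for (nm in names(partials)) {
  saveRDS(partials[[nm]], file.path(out_dir, paste0(nm, ".rds")))
}

# Later, possibly in another session: read everything back and combine
files <- sort(list.files(out_dir, pattern = "\\.rds$", full.names = TRUE))
combined <- do.call(c, lapply(files, readRDS))
combined
```

For random forests, `c` would be swapped for `combine` from the randomForest package, as in the example above. Saving each piece first means a failed or interrupted job does not cost the results that were already finished.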

I hope this article is useful.
