An API is an effective way of distributing analysis outputs to external clients. When it comes to API development with R, however, there are not many choices. Unless a client or bridge layer to R is considered, development would most likely be done with plumber, Rserve, rApache or OpenCPU.

This is the first post of a two-part series on API development with R. In this post, serving an R function with plumber, Rserve and rApache is discussed. OpenCPU is not discussed, partly because it could be overkill for an API and partly because its performance may be similar to rApache with the Prefork Multi-Processing Module enabled. Deploying the APIs in a Docker container, making example HTTP requests and comparing their performance will be discussed in Part II.

Plumber

The plumber package is the easiest way of exposing an R function via an API, and it is built on top of the httpuv package.

Here a simple function named test is defined in plumber-src.R. test() returns a number after waiting the number of seconds specified by wait. Note that HTTP methods and resource paths are specified with special comments, in the same way a function is documented. The response is converted into a JSON string by default, and here it is set to be unboxed with the @serializer annotation.

#' Test function
#' @serializer unboxedJSON
#' @get /test
#' @post /test
test <- function(n, wait = 0.5, ...) {
    Sys.sleep(wait)
    list(value = n)
}
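To make "unboxed" concrete, here is a minimal sketch using the jsonlite package directly. It assumes that the unboxedJSON serializer behaves like jsonlite::toJSON() with auto_unbox = TRUE; the variable names are only for illustration.

```r
library(jsonlite)

# the same shape of value that test() returns
result <- list(value = 10)

# default: length-one vectors are kept as JSON arrays ("boxed")
boxed <- toJSON(result)

# auto_unbox = TRUE: length-one vectors become JSON scalars
unboxed <- toJSON(result, auto_unbox = TRUE)

print(boxed)    # {"value":[10]}
print(unboxed)  # {"value":10}
```

Clients usually expect the unboxed form for scalar results, which is why the serializer is set explicitly on the function.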

The function can be served as shown below. Port 9000 is set for the plumber API.

library(plumber)
r <- plumb("path-to-plumber-src.R")
r$run(port=9000, host="0.0.0.0")

Rserve

According to its project site,

Rserve is a TCP/IP server which allows other programs to use facilities of R from various languages without the need to initialize R or link against R library.

There are a number of Rserve client libraries and an HTTP API can be developed with one of them. For example, it is possible to set up a client layer that invokes an R function using the pyRserve library while a Python web framework serves HTTP requests.

Since Rserve 1.7-0, however, a client layer is not mandatory because it includes a built-in R HTTP server. Using the built-in server has a couple of benefits. First, development can be simpler without a client or bridge layer. Second, the performance of the API can be improved. For example, pyRserve waits for 200ms upon connecting to Rserve, and this kind of overhead can be reduced significantly if HTTP requests are handled directly.

The FastRWeb package relies on Rserve’s built-in HTTP server. Basically it serves HTTP requests by sourcing an R script and executing a function named run - all source scripts must have run(), as can be checked in the source.

I find the FastRWeb package inconvenient for API development for several reasons. First, as mentioned earlier, it sources an R script and executes run(). After looking into the source code, however, it doesn’t need to be that way; a more flexible approach is to execute a function that’s already loaded. Second, application/json is a popular content type but it is not understood by the built-in server. Finally, FastRWeb mainly aims to serve R graphics objects and HTML pages, whereas a JSON string can be more effective for API responses. For these reasons, some modifications are made as discussed below.

test() is the same as in the plumber API.

#### HTTP RESOURCES
test <- function(n, wait = 0.5, ...) {
    Sys.sleep(wait)
    list(value = n)
}

In order to use the built-in server, a function named .http.request must be defined. Here another function named process_request is created and it is used instead of the .http.request() defined in the FastRWeb package. process_request() is basically divided into 2 parts: building the request object and building the output object.

  • building request object - the request object is built by parsing the headers to identify the request method. Then the request parameters are parsed according to the request method and content type.
  • building output object - the output object is a list of payload, content-type, headers and status-code. A function is looked up by the request URL and it is checked whether all function arguments are found in the request parameters. If all arguments are found, the payload is obtained by executing the matching function. Otherwise the 400 (Bad Request) error is returned.

#### PROCESS REQUEST
process_request <- function(url, query, body, headers) {
    #### building request object
    ## not strictly necessary as in FastRWeb,
    ## just to make clear of request related variables
    request <- list(uri = url, method = 'POST',
                        query = query, body = body)

    ## parse headers
    request$headers <- parse_headers(headers)
    if ("request-method" %in% names(request$headers))
        request$method <- c(request$headers["request-method"])

    ## parse parameters (function arguments)
    ## POST accept only 2 content types
    ## - application/x-www-form-urlencoded by built-in server
    ## - application/json
    ## used below as do.call(function_name, request$pars)
    request$pars <- list()
    if (request$method == 'POST') {
        if (!is.null(body)) {
            if (is.raw(body))
                body <- rawToChar(body)
            if (any(grepl('application/json', request$headers)))
                body <- jsonlite::fromJSON(body)
            request$pars <- as.list(body)
        }
    } else {
        if (!is.null(query)) {
            request$pars <- as.list(query)
        }
    }

    #### building output object
    ## list(payload, content-type, headers, status_code)
    ## https://github.com/s-u/Rserve/blob/master/src/http.c#L358
    payload <- NULL
    content_type <- 'application/json; charset=utf-8'
    headers <- character(0)
    status_code <- 200

    ## generate payload (function output)
    ## function name must match to resource path for now
    matched_fun <- gsub('^/', '', request$uri)

    ## no resource path means no matching function
    if (matched_fun == '') {
        payload <- list(api_version = '1.0')
        if (grepl('application/json', content_type))
            payload <- jsonlite::toJSON(payload, auto_unbox = TRUE)
        return (list(payload, content_type, headers)) # default status 200
    }

    ## check if all defined arguments are supplied
    defined_args <- formalArgs(matched_fun)[formalArgs(matched_fun) != '...']
    args_exist <- defined_args %in% names(request$pars)
    if (!all(args_exist)) {
        missing_args <- defined_args[!args_exist]
        payload <- list(message = paste('Missing parameter -',
                                        paste(missing_args, collapse = ', ')))
        status_code <- 400
    }

    if (is.null(payload)) {
        payload <- tryCatch({
            do.call(matched_fun, request$pars)
        }, error = function(err) {
            list(message = 'Internal Server Error')
        })

        if ('message' %in% names(payload))
            status_code <- 500
    }

    if (grepl('application/json', content_type))
        payload <- jsonlite::toJSON(payload, auto_unbox = TRUE)

    return (list(payload, content_type, headers, status_code))
}

# parse headers in process_request()
# https://github.com/s-u/FastRWeb/blob/master/R/run.R#L65
parse_headers <- function(headers) {
    ## process headers to pull out request method (if supplied) and cookies
    if (is.raw(headers)) headers <- rawToChar(headers)
    if (is.character(headers)) {
        ## parse the headers into key/value pairs, collapsing multi-line values
        h.lines <- unlist(strsplit(gsub("[\r\n]+[ \t]+"," ", headers), "[\r\n]+"))
        h.keys <- tolower(gsub(":.*", "", h.lines))
        h.vals <- gsub("^[^:]*:[[:space:]]*", "", h.lines)
        names(h.vals) <- h.keys
        h.vals <- h.vals[grep("^[^:]+:", h.lines)]
        return (h.vals)
    } else {
        return (NULL)
    }
}
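The parameter parsing and argument check can be tried in isolation, without running Rserve. The sketch below assumes a JSON POST body that supplies only n; the body content and variable names are made up for illustration, and test() is repeated so that the snippet is self-contained.

```r
# same function as the HTTP resource above
test <- function(n, wait = 0.5, ...) {
    Sys.sleep(wait)
    list(value = n)
}

# a raw JSON body, as the built-in server might hand it over (assumed example)
body <- charToRaw('{"n": 10}')
pars <- as.list(jsonlite::fromJSON(rawToChar(body)))

# every formal argument except '...' must be present in the request
defined_args <- setdiff(methods::formalArgs(test), '...')
missing_args <- setdiff(defined_args, names(pars))

if (length(missing_args) > 0) {
    msg <- paste('Missing parameter -', paste(missing_args, collapse = ', '))
    print(msg)
}
```

Note that the check treats arguments with default values, such as wait, as required. This keeps the rule simple, at the cost of clients having to send every argument.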

process_request() replaces .http.request() in the source script of Rserve - it’ll be explained further in Part II.

## Rserve requires .http.request function for handling HTTP request
.http.request <- process_request
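To illustrate the handler contract on its own, here is a minimal sketch of a function with the same signature and the same kind of return value as process_request(). The name ping_handler and its payload are hypothetical and not part of Rserve or FastRWeb; the four-element return value follows the description of the output object given earlier.

```r
# hypothetical handler: same arguments as process_request()
ping_handler <- function(url, query, body, headers) {
    # echo the resource name, e.g. '/ping' becomes 'ping'
    payload <- jsonlite::toJSON(list(resource = sub('^/', '', url)),
                                auto_unbox = TRUE)
    # list(payload, content-type, headers, status code)
    list(payload, 'application/json; charset=utf-8', character(0), 200)
}

# the handler can be called directly to check the output object
out <- ping_handler('/ping', NULL, NULL, NULL)
str(out)
```

Calling the handler like an ordinary function is a convenient way to test it before it is wired to the server.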

rApache

rApache is a project supporting web application development using the R statistical language and environment and the Apache web server.

rApache provides multiple ways to specify an R function that handles incoming HTTP requests - see the manual for details. Among the multiple RHandlers, I find that using a Rook application can be quite effective.

Here is the test function as a Rook application. Like process_request(), it parses function arguments according to the request method and content type. Then a value is returned after waiting the specified number of seconds. The response of a Rook application is a list of status, headers and body.

# Request and Response are provided by the Rook package
test <- function(env) {
    req <- Request$new(env)
    res <- Response$new()

    request_method <- env[['REQUEST_METHOD']]
    rook_input <- env[['rook.input']]$read()
    content_type <- env[['CONTENT_TYPE']]

    req_args <- if (request_method == 'GET') {
        req$GET()
    } else {
        # only accept application/json
        # (a missing content type is rejected as well)
        if (is.null(content_type) ||
            !grepl('application/json', content_type, ignore.case = TRUE)) {
            NULL
        } else if (length(rook_input) == 0) {
            NULL
        } else {
            if (is.raw(rook_input))
                rook_input <- rawToChar(rook_input)
            rjson::fromJSON(rook_input)
        }
    }

    if (!is.null(req_args)) {
        # fall back to default values for missing arguments
        wait <- if ('wait' %in% names(req_args)) req_args$wait else 1
        n <- if ('n' %in% names(req_args)) req_args$n else 10
        Sys.sleep(wait)
        list(
            status = 200,
            headers = list('Content-Type' = 'application/json'),
            body = rjson::toJSON(list(value=n))
        )
    } else {
        list(
            status = 400,
            headers = list('Content-Type' = 'application/json'),
            body = rjson::toJSON(list(message='No parameters specified'))
        )
    }
}
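The default handling in the Rook application can also be written more compactly. The sketch below is an alternative, not what the application above does: it merges the parsed arguments over a list of defaults with modifyList(). The JSON string stands in for a POST body and is made up for illustration.

```r
# defaults matching those used in the Rook application
defaults <- list(n = 10, wait = 1)

# stand-in for a parsed JSON POST body that supplies only n
req_args <- rjson::fromJSON('{"n": 5}')

# supplied values override defaults, missing ones fall back
args <- utils::modifyList(defaults, req_args)

# response in the form a Rook application returns
response <- list(
    status = 200,
    headers = list('Content-Type' = 'application/json'),
    body = rjson::toJSON(list(value = args$n))
)
str(response)
```

With this approach, adding a new argument only requires a new entry in defaults.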

This is all for Part I. Part II will discuss how to deploy the APIs in a Docker container, how to make example requests and how they perform. I hope you found this article interesting.