Part I discussed how to serve an R function with plumber, Rserve and rApache. In this post, the APIs are deployed in a Docker container, example requests are shown, and their performance is compared. The rocker/r-ver:3.4 image is used as the base and each of the APIs is added to it. For simplicity, all APIs are managed by Supervisor. Locust is used for performance testing. The source of this post can be found in this GitHub repository.

Deployment

As the Dockerfile below shows, plumber, Rserve and rApache are installed in that order. Plumber is an R package, so it can be installed with install.packages(). The latest versions of Rserve and rApache are built from source after installing their dependencies. Note that, for rApache, the Rook package is installed as well because the test function is served as a Rook application.

Then the source files are copied to /home/docker/<api-name>. rApache requires extra configuration: the Rook app (rapache-app.R) and the site configuration file (rapache-site.conf) are symlinked to the necessary paths, and the site is enabled.

Finally, Supervisor is started with the config file that monitors and manages the APIs.

FROM rocker/r-ver:3.4
MAINTAINER Jaehyeon Kim <dottami@gmail.com>

RUN apt-get update && apt-get install -y wget supervisor

## Plumber
RUN R -e 'install.packages(c("plumber", "jsonlite"))'

## Rserve
RUN apt-get install -y libssl-dev
RUN wget http://www.rforge.net/Rserve/snapshot/Rserve_1.8-5.tar.gz \
    && R CMD INSTALL Rserve_1.8-5.tar.gz

## rApache
RUN apt-get install -y \
    libpcre3-dev liblzma-dev libbz2-dev libzip-dev libicu-dev
RUN apt-get install -y apache2 apache2-dev
RUN wget https://github.com/jeffreyhorner/rapache/archive/v1.2.8.tar.gz \
    && tar xvf v1.2.8.tar.gz \
    && cd rapache-1.2.8 && ./configure && make && make install

RUN R -e 'install.packages(c("Rook", "rjson"))'

RUN echo '/usr/local/lib/R/lib/' >> /etc/ld.so.conf.d/libR.conf \
    && ldconfig

## copy sources to /home/docker
RUN useradd docker && mkdir /home/docker \
    && chown docker:docker /home/docker

RUN mkdir /home/docker/plumber /home/docker/rserve /home/docker/rapache
COPY ./src/plumber /home/docker/plumber/
COPY ./src/rserve /home/docker/rserve/
COPY ./src/rapache /home/docker/rapache/
COPY ./src/api-supervisor.conf /home/docker/api-supervisor.conf
RUN chmod -R 755 /home/docker

RUN ln -s /home/docker/rapache/rapache-site.conf \
    /etc/apache2/sites-available/rapache-site.conf \
    && ln -s /home/docker/rapache/rapache-app.R /var/www/rapache-app.R

## config rApache
RUN echo 'ServerName localhost' >> /etc/apache2/apache2.conf \
    && /bin/bash -c "source /etc/apache2/envvars" && mkdir -p /var/run/apache2 \
    && a2ensite rapache-site

CMD ["/usr/bin/supervisord", "-c", "/home/docker/api-supervisor.conf"]

Plumber

As can be seen in api-supervisor.conf, the plumber API is started at port 9000 as follows. (plumber-src.R and plumber-serve.R are discussed in Part I.)

/usr/local/bin/Rscript /home/docker/plumber/plumber-serve.R

Rserve

To use the built-in HTTP server of Rserve, http.port must be specified in rserve.conf. It is also necessary to set daemon disable so that Rserve stays in the foreground and can be managed by Supervisor.

http.port 8000
remote disable
auth disable
daemon disable
control disable

Then it is possible to start the Rserve API at port 8000 as shown below. (rserve-src.R is discussed in Part I.)

/usr/local/bin/R CMD Rserve --slave --RS-conf /home/docker/rserve/rserve.conf \
  --RS-source /home/docker/rserve/rserve-src.R

rApache

The site config file of the rApache API is shown below.

LoadModule R_module /usr/lib/apache2/modules/mod_R.so
<Location /test>
    SetHandler r-handler
    RFileEval /var/www/rapache-app.R:Rook::Server$call(test)
</Location>

The rApache API can be started at port 80 as follows. (rapache-app.R is discussed in Part I.)

apache2ctl -DFOREGROUND

The Docker image can be built and the container run as follows. Note that the container’s port 80 is mapped to the host’s port 7000 to prevent a possible conflict.

## build
docker build -t=api ./api/.

## run
# rApache - 7000, Rserve - 8000, plumber - 9000
# all APIs managed by supervisor
docker run -d -p 7000:80 -p 8000:8000 -p 9000:9000 --name api api:latest
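Because docker run -d returns before the three servers have finished starting, it can be handy to wait until the mapped host ports accept connections before sending requests. The following is a minimal sketch in Python using only the standard library; the helper names port_open and wait_for_ports are hypothetical and not part of the original repository. An open TCP port only shows that something is listening, not that the R code behind it has loaded successfully.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, ...
        return False


def wait_for_ports(ports, host="localhost", deadline=30.0, interval=0.5):
    """Poll until every port accepts connections.

    Returns the sorted list of ports that are still closed when the
    deadline (in seconds) expires; an empty list means all are up.
    """
    pending = set(ports)
    end = time.monotonic() + deadline
    while True:
        pending = {p for p in pending if not port_open(host, p)}
        if not pending or time.monotonic() >= end:
            return sorted(pending)
        time.sleep(interval)
```

With the port mapping used above, wait_for_ports([7000, 8000, 9000]) would return an empty list once rApache, Rserve and plumber are all listening.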

Example Request

Example requests to the APIs and their responses are shown below. When a request includes both the n and wait parameters, all APIs return a 200 response as expected. When n is missing, only the Rserve API properly returns a 400 response; the other two need some modification to do so.

library(httr)
plumber200 <- POST(url = 'http://localhost:9000/test', encode = 'json',
                   body = list(n = 10, wait = 0.5))
unlist(c(api = 'plumber', status = status_code(plumber200),
         content = content(plumber200)))
##           api        status content.value
##     "plumber"         "200"          "10"

rapache200 <- POST(url = 'http://localhost:7000/test', encode = 'json',
                   body = list(n = 10, wait = 0.5))
unlist(c(api = 'rapache', status = status_code(rapache200),
         content = content(rapache200)))
##           api        status content.value
##     "rapache"         "200"          "10"

rserve200 <- POST(url = 'http://localhost:8000/test', encode = 'json',
                  body = list(n = 10, wait = 0.5))
unlist(c(api = 'rserve', status = status_code(rserve200),
         content = content(rserve200)))
##           api        status content.value
##      "rserve"         "200"          "10"

rserve400 <- POST(url = 'http://localhost:8000/test', encode = 'json',
                  body = list(wait = 0.5))
unlist(c(api = 'rserve', status = status_code(rserve400),
         content = content(rserve400)))
##                     api                  status         content.message
##                "rserve"                   "400" "Missing parameter - n"
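The same requests can also be sent from outside R, which is useful for checking all three APIs with one payload. Below is a rough Python equivalent built on the standard library only. The helpers post_json and check_all are hypothetical names introduced for illustration, and the URLs assume the host port mapping from the docker run command above.

```python
import json
import urllib.error
import urllib.request


def post_json(url, payload, timeout=10):
    """POST payload as JSON and return (status_code, decoded_body)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status, raw = response.status, response.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; keep the status code and body anyway.
        status, raw = err.code, err.read()
    text = raw.decode("utf-8")
    try:
        body = json.loads(text)
    except ValueError:
        body = text  # not JSON, e.g. an HTML error page
    return status, body


# Assumed host ports: rApache 7000, Rserve 8000, plumber 9000.
API_URLS = {
    "plumber": "http://localhost:9000/test",
    "rapache": "http://localhost:7000/test",
    "rserve": "http://localhost:8000/test",
}


def check_all(payload, urls=API_URLS):
    """Send one payload to every API; return {api_name: (status, body)}."""
    return {name: post_json(url, payload) for name, url in urls.items()}
```

Calling check_all({"n": 10, "wait": 0.5}) exercises the successful path, while check_all({"wait": 0.5}) makes it easy to see which APIs answer a missing n with a 400 status.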

Performance Test

One way to examine the performance of an API is to look at how effectively it serves multiple concurrent requests. For this, Locust, a Python-based load testing tool, is used to simulate 1, 3 and 6 concurrent requests in turn.

The locustfile used for the test is shown below.

# Note: written against the pre-1.0 Locust API (HttpLocust, TaskSet, min_wait/max_wait).
import json
from locust import HttpLocust, TaskSet, task

class TestTaskSet(TaskSet):

    @task
    def test(self):
        # same payload as the example requests
        payload = {'n':10, 'wait': 0.5}
        headers = {'content-type': 'application/json'}
        self.client.post('/test', data=json.dumps(payload), headers=headers)

class MyLocust(HttpLocust):
    # no waiting between tasks: each user sends requests back to back
    min_wait = 0
    max_wait = 0
    task_set = TestTaskSet

With this file, the test can be run as follows (e.g. for 3 concurrent requests).

locust -f ./locustfile.py --host http://localhost:8000 --no-web -c 3 -r 3
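To make clear what such a run measures, here is a small stand-in for the Locust test written with Python threads and the standard library. It is not how Locust works internally and is only a sketch: each simulated user sends requests back to back with no waiting in between, mirroring min_wait = max_wait = 0. The names timed_post, run_load and mean_response_ms are hypothetical.

```python
import json
import threading
import time
import urllib.error
import urllib.request


def timed_post(url, payload, timeout=30):
    """POST payload as JSON; return (status_code, elapsed_seconds)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # record 4xx/5xx instead of aborting the user
    return status, time.perf_counter() - start


def run_load(url, payload, users, requests_per_user):
    """Run `users` threads, each sending requests_per_user requests in a row."""
    results = []
    lock = threading.Lock()

    def user_loop():
        for _ in range(requests_per_user):
            outcome = timed_post(url, payload)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=user_loop) for _ in range(users)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def mean_response_ms(results):
    """Average response time in milliseconds over all recorded requests."""
    return 1000 * sum(elapsed for _, elapsed in results) / len(results)
```

For example, run_load("http://localhost:8000/test", {"n": 10, "wait": 0.5}, users=3, requests_per_user=10) followed by mean_response_ms on the result gives a rough counterpart of the average response time that Locust reports for 3 concurrent users.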

When requests are made one at a time, the average response time of each API is around 500ms. With multiple concurrent requests, however, the average response time of the plumber API increases significantly. This is because R is single-threaded and requests are queued by httpuv. The average response time of the Rserve API, on the other hand, stays the same because Rserve handles concurrent requests in forked processes. The performance of the rApache API falls in between. In practice, the performance of the rApache API can be boosted by enabling the Prefork Multi-Processing Module, although this consumes more memory.

As expected, the Rserve API handles considerably more requests per second than the other two.
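The effect of queueing versus forking can be approximated with a back-of-the-envelope model. The sketch below assumes an idealized closed loop: every user sends its next request immediately, each request takes exactly the wait time to process, and all other overhead is ignored. A server with one worker stands in for the single-threaded plumber API, and a server with as many workers as users stands in for Rserve forking a process per connection. The function name closed_loop_estimate is hypothetical, and the numbers it yields are model estimates, not the measured results.

```python
def closed_loop_estimate(users, service_time, workers):
    """Estimate (mean response time in s, throughput in req/s).

    users        -- concurrent clients, each sending requests back to back
    service_time -- seconds needed to process one request
    workers      -- number of requests the server can process at once
    """
    busy = min(users, workers)          # requests actually being processed
    throughput = busy / service_time    # completed requests per second
    response_time = users / throughput  # Little's law: N = X * R
    return response_time, throughput


for users in (1, 3, 6):
    queued = closed_loop_estimate(users, 0.5, workers=1)      # one R process
    forked = closed_loop_estimate(users, 0.5, workers=users)  # one per request
    print(users, "users | queued:", queued, "| forked:", forked)
```

Under these assumptions the single-worker response time grows linearly with the number of users (1.5 seconds for 3 users with a 0.5 second wait), while the forked case stays at 0.5 seconds and its throughput scales with the number of users.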

Note that the test function in this post is a bit unrealistic as it just waits before returning a value. In practice, R functions will consume more CPU and the average response time will tend to increase when multiple requests are made concurrently. Even in this case, the benefit of forking will persist.

This series investigates exposing R functions via an API. I hope you enjoyed reading it.