RStudio Environment on DigitalOcean with Docker


I’ll be running a training course in a few weeks which will use RStudio as the main computational tool. Since it’s a short course I don’t want to spend a lot of time sorting out technical issues, and with multiple operating systems (and versions) these issues can be numerous and pervasive. Setting up an RStudio server which everyone can access (and which requires no individual configuration!) makes a lot of sense.

These are some notes about how I got this all set up using a Docker container on DigitalOcean. This idea was inspired by this article. I provide some additional details about the process.

Local Setup

I began by trying things out on my local machine. The first step was to install Docker. On my Linux machine this was a simple procedure. I added my user to the docker group and I was ready to roll.

Validate Docker

This was my first serious foray into the world of Docker, so I spent some time getting familiar with the tools. First it makes sense to validate that Docker is correctly configured and operational. Start by checking the version.

$ docker -v
Docker version 17.06.0-ce, build 02c1d87

Check the current status of the Docker service. This should indicate that Docker is loaded, running and active.

$ systemctl status docker

To see further system information about Docker:

$ docker info

Finally run a quick test to ensure that Docker is able to download and launch images.

$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://cloud.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/
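
The checks above can be rolled into a single shell function, which is handy when repeating them on a fresh machine. This is only a sketch: the function name docker_preflight is made up for illustration, and it assumes a systemd-based host and a user who is allowed to talk to the Docker daemon.

```shell
# Hypothetical helper: run the Docker sanity checks in one go.
# Assumes a systemd-based host.
docker_preflight() {
    # Is the client installed?
    docker -v || { echo "docker client not found" >&2; return 1; }
    # Is the service running? "is-active" exits 0 only when it is.
    systemctl is-active docker >/dev/null || { echo "docker service is not active" >&2; return 1; }
    # Can the client reach the daemon?
    docker info >/dev/null || { echo "cannot reach the docker daemon" >&2; return 1; }
    # Can an image be pulled and run? --rm removes the test container afterwards.
    docker run --rm hello-world >/dev/null || { echo "could not run hello-world" >&2; return 1; }
    echo "Docker looks healthy"
}
```

Call docker_preflight after sourcing the function. It stops at the first failing check and says which one it was.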

RStudio Container

A selection of RStudio Docker images is hosted by the Rocker project. We’ll use the verse image, which contains base R, RStudio, tidyverse, devtools and some packages related to publishing.

$ docker pull rocker/verse

That will download a load of content. Depending on the speed of your connection it might take a couple of minutes. Once the downloads are complete we can spin it up.

$ docker run -d -p 80:8787 rocker/verse

Now point your browser at http://localhost:80/. You should see a login dialog. Log in with username rstudio and password rstudio.
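
The default password is fine for a quick local test, but it is easy to change. As far as I know the Rocker RStudio images read a PASSWORD environment variable at start-up, so a less guessable password can be supplied when the container is launched. A minimal sketch, assuming the image version you pulled honours that variable (the helper name run_rstudio is my own invention):

```shell
# Hypothetical helper: start rocker/verse with a chosen password for the
# default rstudio user. Usage: run_rstudio <password> [host-port]
run_rstudio() {
    local password="$1"
    local port="${2:-80}"
    [ -n "$password" ] || { echo "usage: run_rstudio <password> [host-port]" >&2; return 1; }
    # -d: run detached; -p: map the host port to RStudio's port 8787;
    # -e: pass the password into the container's environment.
    docker run -d -p "${port}:8787" -e PASSWORD="$password" rocker/verse
}
```

For example, run_rstudio 'something-long' 8080 would serve RStudio on port 8080 of the host, with the username still rstudio.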

Once you’ve satisfied yourself that the RStudio server is working properly, we’ll shut it down. Check on the running Docker containers.

$ docker ps

The ID in the output from the previous command is used to stop the container.

$ docker stop 487487fc346d

Creating a New Container Image

We’re now going to create a custom Docker image based on the rocker/verse image we used above. We do this by creating a Dockerfile. You can take a look at the one that I am using in this GitHub repository. It adds a few minor features to the rocker/verse image:

  • a small shell script for generating new user profiles;
  • the whois package for the apg command (although I am currently using openssl for password generation); and
  • a few extra R packages.

Check out the best practices for creating a Dockerfile.
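
To give a feel for what such a Dockerfile involves, here is a rough sketch wrapped in a shell function that writes the file out. It is not the Dockerfile from the repository: the R package named in it is just a placeholder, the function name write_dockerfile is hypothetical, and it assumes a generate-users.sh script sits next to the Dockerfile when the image is built.

```shell
# Hypothetical helper: write a minimal Dockerfile into the given directory.
# This is NOT the Dockerfile from the repository, only an illustration.
write_dockerfile() {
    local dir="${1:-.}"
    mkdir -p "$dir" || return 1
    cat > "$dir/Dockerfile" <<'EOF'
# Start from the Rocker image used earlier.
FROM rocker/verse

# Extra system package. Update and install share one layer so that a
# stale package index is never cached on its own.
RUN apt-get update \
    && apt-get install -y --no-install-recommends whois \
    && rm -rf /var/lib/apt/lists/*

# Placeholder R package; substitute whatever the course needs.
RUN R -e 'install.packages("data.table", repos = "https://cloud.r-project.org")'

# User-creation script; must be present in the build context.
COPY generate-users.sh /usr/sbin/generate-users.sh
RUN chmod +x /usr/sbin/generate-users.sh
EOF
}
```

Running write_dockerfile ./rstudio-image creates the directory if necessary and drops the Dockerfile into it.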

Building

We need to build the image before we can launch it. Navigate to the folder which contains the Dockerfile and then do the following:

$ docker build -t rstudio:latest .

That will step through the instructions in the Dockerfile, building up the new image as a series of layers. We can get an idea of which components contributed the most to the resulting image.

$ docker history rstudio:latest
IMAGE               CREATED              CREATED BY                                      SIZE                COMMENT
1206300d01f8        About a minute ago   /bin/sh -c R -e 'install.packages("RSeleni...   11.6MB              
4f0daf5ee744        4 hours ago          /bin/sh -c R -e 'install.packages(c("binma...   3.4MB               
60e254d31a5a        4 hours ago          /bin/sh -c apt-get install whois                2.31MB              
5107e33b5c77        4 hours ago          /bin/sh -c apt-get update                       15.5MB              
a720b73666a2        4 hours ago          /bin/sh -c #(nop)  MAINTAINER Andrew Colli...   0B                  
8232739f906d        7 hours ago          /bin/sh -c apt-get update   && apt-get ins...   763MB               
<missing>           7 hours ago          /bin/sh -c apt-get update -qq && apt-get -...   720MB               
<missing>           10 hours ago         /bin/sh -c #(nop)  CMD ["/init"]                0B                  
<missing>           10 hours ago         /bin/sh -c #(nop)  VOLUME [/home/rstudio/k...   0B                  
<missing>           10 hours ago         /bin/sh -c #(nop)  EXPOSE 8787/tcp              0B                  
<missing>           10 hours ago         /bin/sh -c #(nop) COPY file:b221a73265993c...   1.17kB              
<missing>           10 hours ago         /bin/sh -c #(nop) COPY file:3012c80f63f800...   2.36kB              
<missing>           10 hours ago         /bin/sh -c apt-get update   && apt-get ins...   486MB               
<missing>           10 hours ago         /bin/sh -c #(nop)  ENV PANDOC_TEMPLATES_VE...   0B                  
<missing>           10 hours ago         /bin/sh -c #(nop)  ARG PANDOC_TEMPLATES_VE...   0B                  
<missing>           10 hours ago         /bin/sh -c #(nop)  ARG RSTUDIO_VERSION          0B                  
<missing>           10 hours ago         /bin/sh -c #(nop)  CMD ["R"]                    0B                  
<missing>           10 hours ago         /bin/sh -c sed -i "s/deb.debian.org/cloudf...   477MB               
<missing>           10 hours ago         /bin/sh -c #(nop)  ENV R_VERSION=3.4.1 LC_...   0B                  
<missing>           10 hours ago         /bin/sh -c #(nop)  ARG BUILD_DATE               0B                  
<missing>           10 hours ago         /bin/sh -c #(nop)  ARG R_VERSION                0B                  
<missing>           10 hours ago         /bin/sh -c #(nop)  LABEL org.label-schema....   0B                  
<missing>           2 weeks ago          /bin/sh -c #(nop)  CMD ["bash"]                 0B                  
<missing>           2 weeks ago          /bin/sh -c #(nop) ADD file:93a0dbb6973bc13...   100MB               
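
Most of those rows are metadata instructions which add nothing to the image size. To see only the layers that matter, the output can be trimmed down. This sketch assumes your Docker version supports the --format option for docker history, and the helper name layer_sizes is hypothetical.

```shell
# Hypothetical helper: list only the layers that add size to an image.
# Usage: layer_sizes [image]
layer_sizes() {
    local image="${1:-rstudio:latest}"
    # --format selects the columns to print. Layers reported as 0B come from
    # metadata-only instructions (CMD, ENV, ARG, ...) and are dropped by grep.
    docker history --format '{{.Size}} {{.CreatedBy}}' "$image" | grep -v '^0B '
}
```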

We can now test the new container.

$ docker run -d -p 80:8787 --name rstudio rstudio:latest

Once you are satisfied that it works, stop the container.
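
Because the container was started with --name rstudio, it can be stopped by name rather than by ID. A stopped container still holds on to its name, so it has to be removed before that name can be reused. A small sketch (the helper name remove_container is my own):

```shell
# Hypothetical helper: stop and remove a named container so that the
# name can be reused. Usage: remove_container [name]
remove_container() {
    local name="${1:-rstudio}"
    # "docker stop" asks the container to shut down; "docker rm" then
    # deletes the stopped container (the image itself is untouched).
    docker stop "$name" && docker rm "$name"
}
```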

Deploy on DigitalOcean

We’re now in a position to deploy the image on DigitalOcean. If you don’t already have an account, go ahead and create one now.

Create a Droplet

Once you’ve logged in to your DigitalOcean account, create a new Droplet and choose the Docker one-click app (I chose Docker 17.06.0-ce on 16.04). Make sure that you provide your SSH public key.

Connect as root

Once the Droplet is live (give it a moment or two, even after it claims to be “Good to go!”), use the IP address from the DigitalOcean dashboard to make an SSH connection. You’ll connect initially as the root user.

$ ssh -l root 104.236.93.95

Swap Space

Docker containers use the kernel, memory and swap from the host. So if you’ve created a relatively small Droplet then you might want to add swap space.
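
Adding swap on an Ubuntu Droplet usually comes down to creating a swap file and enabling it. The function below is a sketch of that routine, to be run as root. The 2G default size and the /swapfile path are assumptions on my part, and the name add_swap is hypothetical.

```shell
# Hypothetical helper: create and enable a swap file. Must be run as root.
# Usage: add_swap [size] [path], e.g. add_swap 2G /swapfile
add_swap() {
    local size="${1:-2G}" file="${2:-/swapfile}"
    fallocate -l "$size" "$file" || return 1   # reserve the space
    chmod 600 "$file" || return 1              # only root may read or write it
    mkswap "$file" || return 1                 # write the swap signature
    swapon "$file" || return 1                 # enable it immediately
    # Print the /etc/fstab entry needed to keep the swap file after a reboot.
    echo "$file none swap sw 0 0"
}
```

The function deliberately does not edit /etc/fstab itself. Append the line it prints if you want the swap to survive a reboot.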

Create a docker User

Create a docker user and add it to the docker group.

# useradd -g users -G docker -m -s /bin/bash docker

Add your SSH public key to .ssh/authorized_keys for the docker user. Terminate your root connection and reconnect as the docker user.

$ ssh docker@104.236.73.164
$ groups
users docker
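
The key-copying step, done as root before reconnecting, can be sketched as follows. It assumes that the key you supplied when creating the Droplet ended up in /root/.ssh/authorized_keys. The helper name copy_root_keys is hypothetical.

```shell
# Hypothetical helper, run as root on the Droplet: give a user the same
# authorized SSH keys as root.
# Usage: copy_root_keys [user] [source-file] [home-dir]
copy_root_keys() {
    local user="${1:-docker}"
    local src="${2:-/root/.ssh/authorized_keys}"
    local home="${3:-/home/$user}"
    mkdir -p "$home/.ssh" || return 1
    cp "$src" "$home/.ssh/authorized_keys" || return 1
    # With its default settings sshd rejects key files that have loose
    # permissions or the wrong owner.
    chmod 700 "$home/.ssh" || return 1
    chmod 600 "$home/.ssh/authorized_keys" || return 1
    chown -R "$user" "$home/.ssh"
}
```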

Build the Container

Clone the GitHub repository. Navigate to the folder which contains the RStudio Dockerfile. Now build the image on the Droplet.

$ docker build -t rstudio:latest .

And then launch a container.

$ docker run -d -p 80:8787 --name rstudio rstudio:latest

Connect to the Droplet using the IP address from the DigitalOcean dashboard.

Sign in using the same credentials as before. Sweet: you’re connected to an instance of RStudio running somewhere out in the cloud.

Accessing Usernames and Passwords

Obviously the default credentials we’ve been using are a security hole. We’ll need to fix that. We’ll also need to create a brace of new accounts which we can give to the course delegates. These new accounts need to be created in the container, not on the host!

To accomplish all of this we’ll need to connect to the running Docker container. Again use docker ps to find the ID of the running container. Then open a bash shell in the container using docker exec, passing the container ID as the first argument after the options (-t allocates a terminal and -i keeps the session interactive).

$ docker exec -t -i df3a7a5af57e /bin/bash
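
Since the container was launched with --name rstudio, the name works anywhere the ID does, which saves a trip to docker ps. The sketch below wraps docker exec so that it either opens a shell or runs a one-off command. The helper name in_container is hypothetical.

```shell
# Hypothetical helper: run a command inside a container, or open an
# interactive bash shell when no command is given.
# Usage: in_container <name-or-id> [command...]
in_container() {
    [ "$#" -ge 1 ] || { echo "usage: in_container <name-or-id> [command...]" >&2; return 1; }
    local name="$1"
    shift
    if [ "$#" -eq 0 ]; then
        # -i keeps stdin open, -t allocates a terminal.
        docker exec -i -t "$name" /bin/bash
    else
        docker exec "$name" "$@"
    fi
}
```

So in_container rstudio opens a shell, while in_container rstudio userdel rstudio runs a single command and returns.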

Delete the rstudio user.

root@df3a7a5af57e:/# userdel rstudio

Now create some new users using the generate-users.sh script packaged with the image. For example, to generate five new users:

root@df3a7a5af57e:/# /usr/sbin/generate-users.sh 5
U001,/kK160rx
U002,hhNk7FJl
U003,RaH4EJYP
U004,YBpMcl6n
U005,9Rcl8gye

This will create the user profiles and home folders. The usernames and passwords are dumped to the terminal in CSV format. Record these and then assign a pair to each of the course delegates.
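
For the curious, a script along these lines need not be complicated. The function below is my guess at the general approach, not the generate-users.sh from the repository. It assumes openssl, useradd and chpasswd are available in the container and that it is run as root.

```shell
# Sketch of a user-generation routine. This is a guess at the approach,
# not the generate-users.sh shipped in the repository.
# Usage (as root, inside the container): generate_users <count>
generate_users() {
    local count="${1:-1}" i=1 user pass
    while [ "$i" -le "$count" ]; do
        user=$(printf 'U%03d' "$i")
        # 6 random bytes encode to exactly 8 base64 characters.
        pass=$(openssl rand -base64 6)
        # Create the account with a home folder, then set its password.
        useradd -m -s /bin/bash "$user" || return 1
        echo "${user}:${pass}" | chpasswd || return 1
        # Emit one CSV record per user.
        printf '%s,%s\n' "$user" "$pass"
        i=$((i + 1))
    done
}
```

Base64 output never contains a comma or a colon, so the passwords are safe both in the CSV records and in the user:password lines fed to chpasswd.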

Persisting User Data

You’ll probably want to use a mechanism for persisting user data. There are a couple of options for doing this. A simple technique which I have found helpful is documented here.
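
I won’t reproduce the linked technique here. As one possible alternative, a named Docker volume mounted over /home keeps the delegates’ files even if the container is removed. This is only a sketch: the volume name rstudio-home is an arbitrary choice and the helper name run_with_home_volume is hypothetical.

```shell
# Hypothetical helper: launch the container with a named volume on /home.
# The volume name "rstudio-home" is an arbitrary choice for this example.
# Usage: run_with_home_volume [volume-name]
run_with_home_volume() {
    local volume="${1:-rstudio-home}"
    # Docker creates the named volume on first use, and the volume
    # outlives the container.
    docker run -d -p 80:8787 -v "${volume}:/home" --name rstudio rstudio:latest
}
```

Bear in mind that the accounts themselves live in the container’s own filesystem rather than under /home, so after removing and recreating the container the users need to be created again.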

Finish and Klaar

Feel free to fork the repository and customise the Dockerfile to suit your requirements. Let me know how this works out for you. I’m rather excited to run a course which will not be plagued by technical issues!

Categorically Variable