Using Dropbox for syncing Shiny app data on Amazon EC2
I’ve been working more or less full-time for over a year now on our app for causal mapping, now called simply “Causal Map”. You can find out more at causalmap.app. This post is about a solution I’ve found for storing and syncing data in our app.
Our app is built with Shiny, which provides web apps for R. Our use case:
- We’re going to have a limited number of users to start with, maybe a max of 10 concurrent users
- Users work for extended periods of time, usually on one single project, which consists of five tables and a few other bits of data
- There is no need for more than one user to work concurrently on the same project
- Each project is more or less self-contained; there is almost no need to combine or compare data across projects
- Users of course want to log on to the app and load up projects which they have previously saved
- In time, we may need to scale so that the app is containerised and replicated across several servers
- As we also have a couple of local installations on our own computers, we would like to be able to sometimes work offline on the same series of projects and sync the changes later, with each user only ever dealing with separate sets of data unique to them.
For more than half a year, I’ve been struggling to solve these problems using a database. But databases bring a bunch of other problems, such as:
- Slow to load individual projects
- Slow enough to save projects that continuous autosave is difficult to implement, meaning that we had to implement
  - asynchronous background operation, which is famously not so easy in R
  - algorithms to identify which rows to add, delete and update
- The usual problems with Unicode
- Difficult to implement offline working
The challenge of providing persistent data storage in Shiny apps is well-known. If like us you are fed up with making databases work, and work fast enough, you can try loading from and saving to cloud storage services using tools like rclone or rdrop2. But I’ve found those to be slow too with some of the same problems as with databases.
So what I did was install headless Dropbox by running these commands in the terminal as user rstudio. This is on Ubuntu Linux on an Amazon EC2 instance.
Fetch what you need from Dropbox.
cd ~ && wget -O - "https://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf -
Then I had to run
unset DISPLAY
because otherwise the following command to install Dropbox fails when it doesn’t find a display.
… and follow the instructions about authentication.
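The step between unsetting DISPLAY and authenticating is starting the Dropbox daemon. In Dropbox's standard headless install instructions that is ~/.dropbox-dist/dropboxd, so here is a minimal sketch of the whole sequence, assuming that is the command meant here. The function name is made up for illustration, and wrapping the steps in a function just means nothing is downloaded until you call it.

```shell
# Sketch of the headless install; install_headless_dropbox is a hypothetical name.
install_headless_dropbox() {
    # Download and unpack the 64-bit Linux build into ~/.dropbox-dist
    cd ~ && wget -O - "https://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf - || return 1
    # There is no display on a headless server, so make sure DISPLAY is not set
    unset DISPLAY
    # Assumption: this is the daemon start command from Dropbox's own instructions.
    # It stays in the foreground and prints a link for authenticating this machine.
    ~/.dropbox-dist/dropboxd
}
```

Run install_headless_dropbox as user rstudio, open the printed link in a browser on any machine, and log in to connect the server to your account.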
But now all of your existing Dropbox folders will start syncing, which you don’t want. To stop that you need the dropbox commands: what I did was install the whole of nautilus-dropbox, even though I don’t need most of that package.
sudo apt-get install nautilus-dropbox
You only need to sync one data folder, say app-data-folder. This is the folder where your app data is going to go. You’re going to tell Dropbox not to sync the whole of your Dropbox folder and then to remove your special folder from the exclusion list, i.e., sync it. But you have to switch to /home/rstudio/Dropbox to run the following commands, which otherwise won’t work:
dropbox exclude add *
dropbox exclude remove app-data-folder
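One likely reason the working directory matters is that the * is expanded by the shell, not by dropbox, into the names of whatever is in the current directory. Here is a small sketch of that expansion using a throwaway directory; the folder names Photos and Work are made up for illustration.

```shell
# Throwaway directory standing in for /home/rstudio/Dropbox (hypothetical folder names)
demo=$(mktemp -d)
mkdir "$demo/app-data-folder" "$demo/Photos" "$demo/Work"

# The shell replaces * with the entries of the current directory before the command runs
expanded=$(cd "$demo" && echo *)

# This is the command line dropbox would actually receive if run from that directory
echo "dropbox exclude add $expanded"

rm -rf "$demo"
```

Run from anywhere else, the same * would expand to a different set of names, which are not the top-level folders of your Dropbox.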
So now just that one folder should appear in /home/rstudio/Dropbox and be syncing across your Dropbox account.
However, your app can’t see the data there, so as the last step you create a symlink inside your app folder that points to it.
sudo ln -s /home/rstudio/Dropbox/app-data-folder /home/rstudio/ShinyApps/textApp/app/app-data-folder
That’s one line.
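If you want to check that a link like this behaves as expected before touching the real folders, here is a small sketch using throwaway directories in place of the Dropbox and app folders. It runs without sudo, which assumes the same user owns both locations.

```shell
# Throwaway directories standing in for the Dropbox folder and the app folder
base=$(mktemp -d)
target="$base/Dropbox/app-data-folder"
app_dir="$base/ShinyApps/textApp/app"
mkdir -p "$target" "$app_dir"

# -s: symbolic link, -f: replace an existing link, -n: treat an existing link as a file
ln -sfn "$target" "$app_dir/app-data-folder"

# A file written through the link ends up in the Dropbox-side folder
echo "test" > "$app_dir/app-data-folder/project.txt"
resolved=$(readlink "$app_dir/app-data-folder")
found=$(ls "$target")
echo "link points to: $resolved"
echo "files on the Dropbox side: $found"

rm -rf "$base"
```

The link lives in the app folder and the data lives in the Dropbox folder, so anything the app saves is picked up by the sync.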
And there you go. Fill your pockets with Dropboxy goodness. Instant, byte-level sync across multiple instances, as long as you think carefully about where conflicts can arise. In the simplest case, each user has their own folder and only has access to the data in that folder. Or, you can allow different users access to the same data but not concurrently, perhaps by setting a lockfile. Or, you can try and allow realtime access to multiple users to the same data but you’re going to have to be clever not to get sync conflicts. If this doesn’t fit your use case, you’ll have to try a different solution from this one.
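As a rough idea of what the lockfile option could look like, here is a minimal sketch in shell; a Shiny app would do the equivalent in R. The helper names and the .lock file name are made up for illustration. Bear in mind that a lockfile inside the synced folder only reaches other instances once Dropbox has synced it, so two servers could still both grab the lock within that delay.

```shell
# Hypothetical helpers; the lock path is whatever file your app agrees on per project
acquire_lock() {
    # With noclobber, > fails if the file already exists, so check-and-create is one step
    ( set -o noclobber; echo "$$" > "$1" ) 2>/dev/null
}
release_lock() {
    rm -f "$1"
}

project=$(mktemp -d)   # stands in for one project folder
lock="$project/.lock"

if acquire_lock "$lock"; then first=ok; else first=busy; fi
if acquire_lock "$lock"; then second=ok; else second=busy; fi   # lock is already held
release_lock "$lock"
if acquire_lock "$lock"; then third=ok; else third=busy; fi     # free again after release
release_lock "$lock"

echo "$first $second $third"   # prints: ok busy ok
rm -rf "$project"
```

The app would take the lock when a user opens a project and release it when they close it or their session ends.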
I also did
sudo dropbox autostart y
… let’s see if that works.