
Using Dropbox for syncing Shiny app data on Amazon EC2

I’ve been working more or less full-time on our app for causal mapping, now called simply “Causal Map”, for over a year now. You can find out more at causalmap.app. This post is about a solution I’ve found for storing and syncing data in our app.

Our app is built with Shiny, which provides web apps for R. Our use case:

  • We’re going to have a limited number of users to start with, maybe a max of 10 concurrent users
  • Users work for extended periods of time, usually on one single project, which consists of five tables and a few other bits of data
  • There is no need for more than one user to work concurrently on the same project
  • Each project is more or less self-contained; there is almost no need to combine or compare data across projects
  • Users of course want to log on to the app and load up projects which they have previously saved
  • In time, we may need to scale so that the app is containerised and replicated across several servers
  • As we also have a couple of local installations on our own computers, we would like to be able to sometimes work offline on the same series of projects and sync the changes later, with each user only ever dealing with separate sets of data unique to them.

For more than half a year, I’ve been struggling to solve these problems using a database. But databases bring a bunch of other problems, such as:

  • Slow to load individual projects
  • Slow enough to save projects that continuous autosave is difficult to implement, meaning that we had to implement both asynchronous background operation, which is famously not so easy in R, and algorithms to identify which rows to add, delete and update
  • The usual problems with Unicode
  • Difficult to implement offline working

The challenge of providing persistent data storage in Shiny apps is well-known. If, like us, you are fed up with making databases work, and work fast enough, you can try loading from and saving to cloud storage services using tools like rclone or rdrop2. But I’ve found those to be slow too, with some of the same problems as databases.

So what I did was install headless Dropbox by running these commands in the terminal, as user rstudio. This is on Ubuntu Linux on an Amazon EC2 instance.

Fetch what you need from Dropbox.

cd ~ && wget -O - "https://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf -

Then I had to unset DISPLAY because otherwise the command after it, which installs Dropbox, fails when it doesn’t find a display.

unset DISPLAY

~/.dropbox-dist/dropboxd

… and follow the instructions about authentication.

But now all of your existing Dropbox folders will start syncing, which you don’t want. You need the dropbox command to stop that, so I also installed the whole of the nautilus-dropbox package, even though I don’t need most of it.

sudo apt-get install nautilus-dropbox

You only need to sync one data folder, say app-data-folder. This is the folder where your app data is going to go. First you tell Dropbox to exclude everything in your Dropbox folder, and then you remove your special folder from the exclusion list so that it alone syncs. You have to switch to /home/rstudio/Dropbox before running the following commands, which otherwise won’t work, because the shell expands the * against the current directory:

cd /home/rstudio/Dropbox

dropbox exclude add *

dropbox exclude remove app-data-folder

So now just that one folder should appear in /home/rstudio/Dropbox and be syncing across your Dropbox account.
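A quick way to check this is to look for anything in the Dropbox folder other than the one data folder. This is only a minimal sketch, not something from my actual setup: the function name unexpected_entries is made up for illustration, and it only inspects the local folder rather than asking Dropbox itself.

```shell
# Sketch: print top-level entries in a Dropbox folder other than the one
# folder that is supposed to sync. The function name is hypothetical.
# Hidden entries (names starting with a dot) are not matched by the glob,
# so they are ignored here.
unexpected_entries() {
  local dropbox_dir="$1"   # e.g. /home/rstudio/Dropbox
  local keep="$2"          # e.g. app-data-folder
  local entry name
  for entry in "$dropbox_dir"/*; do
    [ -e "$entry" ] || continue        # empty folder: the glob matched nothing
    name=$(basename "$entry")
    [ "$name" = "$keep" ] || printf '%s\n' "$name"
  done
}

# Usage: prints nothing when only app-data-folder is present.
# unexpected_entries /home/rstudio/Dropbox app-data-folder
```

If it prints anything, dropbox exclude list shows what Dropbox currently excludes, and dropbox status shows whether it is still busy syncing.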

However, your app can’t see the data there, so the last step is to symlink the folder to a place within your app folder.

sudo ln -s /home/rstudio/Dropbox/app-data-folder /home/rstudio/ShinyApps/textApp/app/app-data-folder

That’s one line.
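If you script your server setup, the symlink step can be made safe to run more than once. The following is a rough sketch under my own assumptions, not what I actually ran: link_data_folder is a hypothetical helper, and it refuses to touch anything that already exists at the link path.

```shell
# Sketch: create the symlink only if it is not already in place.
# The function name is hypothetical.
link_data_folder() {
  local target="$1"   # the synced folder inside Dropbox
  local link="$2"     # the path the app reads from
  if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
    echo "already linked"
  elif [ -e "$link" ] || [ -L "$link" ]; then
    # Something else is there (a real folder, or a link to elsewhere).
    echo "refusing to overwrite $link" >&2
    return 1
  else
    ln -s "$target" "$link" && echo "linked"
  fi
}

# Usage (run as a user who can write to the app folder):
# link_data_folder /home/rstudio/Dropbox/app-data-folder \
#   /home/rstudio/ShinyApps/textApp/app/app-data-folder
```

Refusing to overwrite is deliberate: if a real folder with app data already sits at the link path, you probably want to move it into Dropbox by hand first.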

And there you go. Fill your pockets with Dropboxy goodness. Instant, byte-level sync across multiple instances, as long as you think carefully about where conflicts can arise. In the simplest case, each user has their own folder and only has access to the data in that folder. Or, you can allow different users access to the same data but not concurrently, perhaps by setting a lockfile. Or, you can try and allow realtime access to multiple users to the same data but you’re going to have to be clever not to get sync conflicts. If this doesn’t fit your use case, you’ll have to try a different solution from this one.
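To make the lockfile idea a bit more concrete, here is a minimal sketch in shell. It is an illustration under my own assumptions, not code from our app: the function names, the lock folder name and the owner file are all made up. It relies on mkdir being atomic on the local filesystem, so only one caller on a given machine can take the lock.

```shell
# Sketch: one lock folder per project. The names acquire_lock, release_lock,
# "lock" and "owner" are hypothetical.
acquire_lock() {
  local project_dir="$1"
  local owner="$2"
  # mkdir fails if the folder already exists, so this is test-and-set in one step.
  if mkdir "$project_dir/lock" 2>/dev/null; then
    printf '%s\n' "$owner" > "$project_dir/lock/owner"
    return 0
  fi
  return 1
}

release_lock() {
  local project_dir="$1"
  local owner="$2"
  # Only the user who took the lock may release it.
  if [ "$(cat "$project_dir/lock/owner" 2>/dev/null)" = "$owner" ]; then
    rm -rf "$project_dir/lock"
  else
    return 1
  fi
}

# Usage:
# acquire_lock /path/to/project alice && echo "alice may edit"
# release_lock /path/to/project alice
```

Bear in mind that the lock only reaches other instances once Dropbox has synced it, so two servers could both take it within the same few seconds. It narrows the window for conflicts rather than closing it.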

I also did

cd /home/rstudio/Dropbox

sudo dropbox autostart y

… let’s see if that works.
