Chapter 15 Common Application Caveats

15.1 Reactivity anti-patterns

15.1.1 Reactivity is awesome… until it is not

Let’s face it, reactivity is awesome… until it is not. Reactivity is a common source of confusion for beginners, and a common source of bugs and bottlenecks, even for seasoned {shiny} developers. Most of the time, issues come from the fact that there is too much reactivity, i.e. we build apps where too much things happen, and some things are updated way more often than they should, and computations are performed when they should not, and in the end we have a hard time understanding what is really happening inside our application.

Of course, it is a nice feature to make everything react instantly to changes, but when building larger apps it is easy to create monsters, i.e complicated, messy reactive graphs where everything is updated too much and too often. Or worse, we generate endless reactive loops, aka “the reactive inferno” where A invalidates B which invalidates C which invalidates A which invalidates B which invalidates C…

Let’s take a small example of a reactive inferno:

library(shiny)
library(lubridate)
ui <- function(){
  tagList(
    # Adding a first input which allow to select a specific date
    dateInput(
      "date", 
      "choose a date"
    ), 
    # Adding a second input allowing to specify a year
    selectInput(
      "year", 
      "Choose a year", 
      choices = 2010:2030
    )
  )
}

server <- function(
  input, 
  output, 
  session
){
  # We want the year to be update whenever the dateInput is updated
  observeEvent( input$date , {
    updateSelectInput(
      session, 
      "year", 
      selected = year(input$date)
    )
  })
  
  # We want the date to be update whenever the selectInput is updated
  observeEvent( input$year , {
    updateDateInput(
      session, 
      "date", 
      value = lubridate::as_date(
        sprintf("%s-01-01", input$year)
      )
    )
  })
  
}

shinyApp(ui, server)

Here, we want to handle something pretty common:

  • The user can pick a date and the year input is updated
  • And the other way round: when the year input changes, the date is updated too

But if you try to run this in your console, it will end as a reactive inferno: date update year that updates date that updates year…

And the more you work on your app, the more complex it gets, and the more you will be likely to end up in the reactive inferno. In this section, we will deal with reactivity and how to have more control on it, and about a way to share data across modules without relying on passing along reactive objects.

If you want to implement this pattern, here is a small workaround using observeEvent(ignoreInit = TRUE), and req():

library(shiny)
library(lubridate)
ui <- function(){
  tagList(
    # Adding a first input which allow to select a specific date
    dateInput(
      "date", 
      "choose a date"
    ), 
    # Adding a second input allowing to specify a year
    selectInput(
      "year", 
      "Choose a year", 
      choices = 2010:2030
    )
  )
}

server <- function(
  input, 
  output, 
  session
){
  # Setting a reactiveValues list that will keep track of the latest changes
  r <- reactiveValues(
    date = NULL, 
    year = NULL
  )
  
  # When the date is changed, we set its value inside r$date
  observeEvent( input$date , {
    r$date <- input$date
  }, ignoreInit = TRUE)
  
  # Whenever r$date is changed, we launch an observer
  observeEvent( r$date , {
    # We use req() to control the update of the selectInput(), which 
    # is updated only if the year of the input is different from the year
    # of the date input
    req(year(r$date) != input$year)
    updateSelectInput(
      session, 
      "year", 
      selected = year(r$date)
    )
  }, ignoreInit = TRUE)
  
  # When the year is changed, we set its value inside r$year
  observeEvent( input$year , {
    r$year <- input$year
  }, ignoreInit = TRUE)
  
  # Whenever r$year is changed, we launch an observer
  observeEvent( r$year , {
    # We use req() to control the update of the dateInput(), which 
    # is updated only if the year of the input is different from the year
    # of the date input
    req(r$year != year(input$date))
    new_date <- input$date
    year(new_date) <- as.numeric(r$year)
    updateDateInput(
      session, 
      "date", 
      value = new_date
    )
  }, ignoreInit = TRUE)
  
}

shinyApp(ui, server)

15.1.2 observe vs observeEvent

One of the most common feature of reactive inferno is the use of observe() in cases where you should use observeEvent. Spoiler: you should try to use observeEvent() as much as possible, and avoid observe()as much as possible.

At first, observe() seems easier to implement, and feels like a shortcut as you do not have to think about what to react to: everything gets updated without you thinking about it. But the truth is that this stairway does not lead to heaven.

Let’s stop and think about observe() for a minute. This function updates every time a reactive object it contains is invalidated. Yes, this works well if you have a small amount of reactive objects in the observer, but that gets tricky whenever you start adding things a long list of things inside your observe(), as you might be launching a computation 10 times if your reactive scope contains 10 reactive objects that are somehow invalidated in chain. And believe us, we have seen pieces of code where the observe() contains hundreds of lines of code, with reactives objects all over the place, with one observe() context being invalidated dozens of times when one input changes in the application.

For example, let’s start with that:

## DO NOT DO GLOBAL VARIABLES, IT'S JUST TO SIMPLIFY THE EXAMPLE
# We initiate a counter that will help to track how many times 
# some pieces of the code are called
i <- 0
library(shiny)
library(cli)
ui <- function(){
  tagList(
    # We are adding a simple text input that will be printed to the console
    textInput("txt", "Text")
  )
}

server <- function(input, output, session){
  observe({
    # Every time this reactive context is invalidated, we add 1 to the i value
    i <<- i + 1
    # We print the i value to the console, and the value of input$txt
    cat_rule(as.character(i))
    print(input$txt)
  })
}

shinyApp(ui, server)

Oh, and then, let’s add a small selectInput():

i <- 0
library(shiny)
library(cli)
ui <- function(){
  tagList(
    # We are adding a simple text input that will be printed to the console
    textInput("txt", "Text"),
    # We add a selectInput() to allow text transformation
    selectInput("casefolding", "Casefolding", c("lower", "upper"))
  )
}

server <- function(input, output, session){
  observe({
    # Every time this reactive context is invalidated, we add 1 to the i value
    i <<- i + 1
    # We print the i value to the console
    cat_rule(as.character(i))
    # If the user select lower, then the text is 
    # passed through tolower, otherwise it's passed 
    # through toupper
    if (input$casefolding == "lower") {
      print(tolower(input$txt))
    } else  {
      print(toupper(input$txt))
    }
  })
}

shinyApp(ui, server)

And, as time goes by, we add another control flow to our observe():

i <- 0
library(shiny)
library(cli)
library(stringi)
ui <- function(){
  tagList(
    # We are adding a simple text input that will be printed to the console
    textInput("txt", "Text"),
    # We add a selectInput() to allow text transformation
    selectInput("casefolding", "Casefolding", c("lower", "upper")),
    # A new checkbox to reverse (or not) the input text
    checkboxInput("rev", "reverse")
  )
}

server <- function(input, output, session){
  observe({
    # Every time this reactive context is invalidated, we add 1 to the i value
    i <<- i + 1
    # We print the i value to the console
    cat_rule(as.character(i))
    # Use input_txt as a container for our input
    input_txt <- input$txt
    if (input$rev){
      # If the input$rev is select, we reverse the text
      input_txt <- stri_reverse(input_txt)
    }
    # If the user select lower, then the text is 
    # passed through tolower, otherwise it's passed 
    # through toupper
    if (input$casefolding == "lower") {
      print(tolower(input_txt))
    } else  {
      print(toupper(input_txt))
    }
  })
}

shinyApp(ui, server)

And it would be nice to keep the selected values into a reactive list, so that we can reuse it elsewhere. And maybe you would like to add a checkbox so that the logs are printed to the console only if checked.

i <- 0
library(shiny)
library(cli)
library(stringi)
ui <- function(){
  tagList(
    # We are adding a simple text input that will be printed to the console
    textInput("txt", "Text"),
    # We add a selectInput() to allow text transformation
    selectInput("casefolding", "Casefolding", c("lower", "upper")),
    # A new checkbox to reverse (or not) the input text
    checkboxInput("rev", "reverse")
  )
}

server <- function(input, output, session){
  # We are using a reactiveValues to keep this input value
  r <- reactiveValues()
  observe({
    # Every time this reactive context is invalidated, we add 1 to the i value
    i <<- i + 1
    # We print the i value to the console
    cat_rule(as.character(i))
    if (input$rev){
      # If the input$rev is select, we reverse the text
      r$input_txt <- stri_reverse(r$input_txt)
    } else {
      # Otherwise, we leave it as it is
      r$input_txt <- input$txt
    }
    # If the user select lower, then the text is 
    # passed through tolower, otherwise it's passed 
    # through toupper
    if (input$casefolding == "lower") {
      print(tolower(r$input_txt))
    } else  {
      print(toupper(r$input_txt))
    }
  })
}

shinyApp(ui, server)

Ok, now can you tell how many potential invalidation points we have got here? Three: whenever input$txt, input$rev or input$casefolding change. Of course, three is not that much, but you get the idea.

Let’s pause a minute and think about why we use observe() here. To update the values inside r$input_txt, yes. But do we need to use observe() for, say, updating r$input_txt under dozens of conditions, each time the user types a letter? Possibly not.

We generally want our observer to update its content under a small, controlled number of inputs, i.e. with a controlled number of invalidation points. And, what we often forget is that users do not type/select correctly on the first try. No, they usually try and miss, restart, change things, amplifying the reactivity “over-happening”.

Moreover, long observe() statements are hard to debug, and they make collaboration harder when the trigger to the observe logic can potentially live anywhere between line one and line 257 of your observe(). That’s why (well, in 99% of cases), it is safer to go with observeEvent, as it allows to see at a glance what are the condition under which the content is invalidated and re-evaluated. Then, if a reactive context is invalidated, you know why. For example, here is where the reactive invalidation can happen (lines with a *)61:

observe({
i <<- i + 1
cat_rule(as.character(i))
* if (input$rev){
*   r$input_txt <- stri_reverse(r$input_txt)
} else {
*   r$input_txt <- input$txt
}
* if (input$casefolding == "lower") {
*   print(tolower(r$input_txt))
} else  {
*   print(toupper(r$input_txt))
}
})

Whereas in this refactored code using observeEvent(), it is easier to identify where the invalidation can happen:

observeEvent( c(
*  input$rev, 
*  input$txt
),{
i <<- i + 1
cat_rule(as.character(i))
if (input$rev){
r$input_txt <- stri_reverse(r$input_txt)
} else {
r$input_txt <- input$txt
}
if (input$casefolding == "lower") {
print(tolower(r$input_txt))
} else  {
print(toupper(r$input_txt))
}
})

15.1.3 Building triggers and watchers

To prevent this, one way to go is to create “flags” objects, which can be thought of as internal buttons to control what you want to invalidate: you create the button, set some places where you want these buttons to invalidate the context, and finally press these buttons.

These objects are launched with an init function, then these flags are triggered with trigger(), and wherever we want these flags to invalidate a reactive context, we watch() these flags.

The idea here is to get a full control over the reactive flow: we only invalidate contexts when we want, making the general flow of the app more predictable. These flags are available using the {gargoyle} (Fay 2020e) package, that can be installed from GitHub with:

remotes::install_github("ColinFay/gargoyle")
  • gargoyle::init("this") initiate a "this" flag: most of the time you will be generating them at the app_server() level

  • gargoyle::watch("this") sets the flag inside a reactive context, so that it will be invalidated every time you trigger("this") this flag.

  • gargoyle::trigger("this") triggers the flags

And, bonus, as these functions use the session object, they are available across all modules. That also means that you can easily trigger an event inside a module from another one.

This pattern is for example implemented into {hexmake} (Fay 2020g) (though not with {gargoyle}), where the rendering of the image on the right is fully controlled by the "render" flag. The idea here is to allow a complete control over when the image is recomputed: only when trigger("render") is called does the app regenerate the image, helping us lower the reactivity of the application. That might seems like a lot of extra work, but that is definitely worth considering on the long run, as it will help optimizing the rendering (fewer computation) and lowering the number of errors that can result from too much reactivity inside an application.

Here is a small example of this implementation, using an environment to store the value. When using this pattern, we do not rely on any reactive value invalidating the reactive context: the second result is only displayed when the "render2" flag is triggered, giving us a full control on how the reactivity is propagated.

library(shiny)
library(gargoyle)
ui <- function(){
  fluidPage(
    tagList(
      # Creating an action button to launch the computation
      actionButton("compute", "Compute"), 
      # Output for all runif()
      verbatimTextOutput("result"), 
      # This output will change only if runif() > 0.5
      verbatimTextOutput("result2"),
      # This button will reset x$results to 0, we use it 
      # to show that it won't launch a series of reactivity 
      # invalidation
      actionButton("reset", "Reset x")
    )
  )
}

server <- function(
  input, 
  output, 
  session
){
  
  # Mimic an R6 class, i.e a non-reactive object
  x <- environment()
  
  # Creating two watchers
  init("render_result", "render_result2")
  
  observeEvent( input$compute , {
    # When the user presses compute, we launch runif()
    x$results <- runif(1)
    # Every time a new value is stored, we render result
    trigger("render_result")
    # Only render the second result if x$results is over 0.5
    if (x$results > 0.5){
      trigger("render_result2")
    }
  })
  
  output$result <- renderPrint({
    # Will be rendered every time
    watch("render_result")
    # require x$results before rendering the output
    req(x$results)
    x$results
  })
  
  
  output$result2 <- renderPrint({
    # This will only be rendered if trigger("render_result2")
    # is called
    watch("render_result2")
    req(x$results)
    x$results
  })
  
  observeEvent( input$reset , {
    # This resets x$results. This code block is here
    # to show that reactivity is not triggered in this app unless
    # a trigger() is called
    x$results <-  0
    print(x$results)
  })
  
}

shinyApp(ui, server)

15.1.4 Using R6 as a data storage

One pattern we have also been playing with is storing the app business logic inside one or more R6 objects. Why would we want to do that?

A. Sharing data across module

Sharing an R6 object makes it simpler to create data that are shared across modules, but without the complexity generated by reactive objects, and the instability of using global variables.

Basically, the idea is to hold the whole logic of your data reading/cleaning/processing/outputting inside an R6 class. An object of this class is then initiated at the top level of your application, and you can pass this object to the sub-modules. Of course, this makes even more sense if you are combining it with the trigger/watch pattern from before!

library(shiny)
data_cleaning_ui <- function(id){
  ns <- NS(id)
  tagList(
    # Defining the UI for your first module
    # [...]
  )
}

data_cleaning_server <- function(input, output, session, r6){
  
  observeEvent( input$launch_cleaning , {
    # Once the launch_cleaning input is triggered, we 
    # use the internal method from our r6 object 
    r6$clean(arg1 = input$a, arg2 = input$b)
    # Triggering the plot
    trigger("plot")
  })
  
}

plotting_ui <- function(id){
  ns <- NS(id)
  tagList(
    # Defining the UI for your second module
    # [...]
  )
}

plotting_server <- function(input, output, session, r6){
  
  # Rendering, inside this second module, the plot based on the 
  # cleaning done in the other module
  output$plot <- renderPlot({
    # We use the trigger/watch pattern from before
    watch("plot")
    # Calling the plot() method from our R6 object
    r6$plot()
  })
}


ui <- function(){
  tagList(
    # Putting our two module UIs here
    data_cleaning_ui("data_cleaning_ui"),
    plotting_ui("plotting_ui")
  )
}

server <- function(
  input, 
  output, 
  session
){
  # We start by creating a new instance of th
  r6 <- MyDataProcessing$new()
  # Passing this object to the two server functions
  callModule(data_cleaning_server, "data_cleaning_ui", r6)
  callModule(plotting_server, "plotting_ui", r6)
}

shinyApp(ui, server)

B. Get sure it is tested

During the process of building a robust {shiny} app, we strongly suggest that you test as many things as you can. This is where using an R6 for your business logic of your app makes sense: this allows you to build the whole testing of your application data logic outside of any reactive context: you simply build unit tests just as any other function.

For example, let’s say we have the following R6 generator:

MyData <- R6::R6Class(
  "MyData", 
  # Defining our public methods, that will be the dataset container, 
  # and a summary function
  public = list(
    data = NULL,
    initialize = function(data){
      self$data <- data
    }, 
    summarize = function(){
      summary(self$data)
    }
  )
)

We can then build a test for this class using {testthat}:

library(testthat)

Attaching package: 'testthat'
The following object is masked from 'package:purrr':

    is_null
The following object is masked from 'package:dplyr':

    matches
The following objects are masked from 'package:magrittr':

    equals, is_less_than, not
test_that("R6 Class works", {
  # We define a new instance of this class, that will contain 
  # the mtcars data.frame
  my_data <- MyData$new(mtcars)
  # We will expect my_data to have two classes: "MyData" and "R6"
  expect_is(my_data, "MyData")
  expect_is(my_data, "R6")
  # And the summarize method to return a table
  expect_is(my_data$summarize(), "table")
  # We would expect the data contained in the object to match the one 
  # taken as input to new()
  expect_equal(my_data$data, mtcars)
  # And the summarize method to be equal to the summary() on the 
  # input object
  expect_equal(my_data$summarize(), summary(mtcars))
})

Using R6 allows to rely on these battle-tested tools when it comes to testing functions. Something which is made more complex when using other patterns like reactiveValues().

15.1.5 Logging reactivity with {whereami}

Getting a good sense of how reactivity is actually working in your app is not an easy task: the reactivity logic is a graph, and it happens very quickly when you run the app so it’s very hard to follow everything.

whereami::whereami()’s (Sidi and Müller 2019) goal is simple: informing you about where it is called, i.e from what file and at which line, and how many times. For example, if you add it inside a RMarkdown inside a book written with book (like the one you are reading now), you will get:

whereami::whereami()
── Running From: ./engineering-production-grade-shiny-apps.R

Combining cat_where() will implement a reactive logging to your console while developing: that way, you can instantaneously know what reactive contexts are invalidated while using the application. Of course, you still have to implement it by hand, but that is definitely worth the effort: seeing in real time, in your console, which line is run allows you to detect unexpected behavior. For example, you will be able to see that the observeEvent() from mod_main.R#79 has been called 17 times when launching the app, which might be an unexpected behavior.

The following screenshot shows what a {whereami} log might look like, here for the {hexmake} application.

{whereami} output for {hexmake}

FIGURE 15.1: {whereami} output for {hexmake}

And bonus, once the app is closed, you can get a list of all the “counters” with whereami::counter_get(), and how many time they each have been called, and plot(whereami::counter_get()) will draw a raw plot of the various counters.

plot of {whereami} counters

FIGURE 15.2: plot of {whereami} counters

15.2 R does too much

15.2.1 Rendering UI from server side

There are many reasons we would want to change things on the UI based on what happens in the server: changing the choices of a selectInput() based on the columns of a table which is uploaded by the user, showing and hiding pieces of the app according to an environment variable, allow the user to create an indeterminate amount of inputs, etc.

Chances are that to do that, you have been using the uiOutput() & renderUI() functions from {shiny} (Chang et al. 2020). Even if convenient, and the functions of choice in some specific context, this couple makes R do a little bit too much: you are making R regenerate the whole UI component instead of changing only what you need., which can be a suboptimal, be it from the user point of view, or from a developer perspective.

One of the reason why this pattern might not be optimal is in the case where your visitors do not have an high speed internet, or when visiting using a smartphone, contexts where every byte count. Rendering large elements from the server side in your {shiny} app means that these elements will have to transit through the socket, i.e they need to be sent by the server, and downloaded by the browser. In this case, the smaller the message size the better!

From the developer perspective, you will create a code that is harder to reason about, as we are used to have the UI parts in the UI functions (but that is not related to performance).

Here are three strategies to code without uiOutput() & renderUI().

A. Implement UI events in JavaScript

Mixing languages is better than writing everything in one, if and only if using only that one is likely to overcomplicate the program. \(

\)

We will see in the last chapter of this book how you can integrate JS inside your {shiny} app, and how even basic functions can be useful for making your app server lighter. For example, compare:

library(shiny)
ui <- function(){
  tagList(
    # Adding a button with an onclick event, that will show or hide the plot
    actionButton(
      "change", 
      "show/hide graph", 
      # The toggle() function hide or show the queried element
      onclick = "$('#plot').toggle()"
    ), 
    plotOutput("plot")
  )
}

server <- function(
  input, 
  output, 
  session
){
  output$plot <- renderPlot({
    # This renderPlot will only be called once
    cli::cat_rule("Rendering plot")
    plot(iris)
  })
}

shinyApp(ui, server) 

to

library(shiny)
ui <- function(){
  tagList(
    # We use a pattern without JavaScript
    actionButton("change", "show/hide graph"), 
    plotOutput("plot")
  )
}

server <- function(
  input, 
  output, 
  session
){

  output$plot <- renderPlot({
    # Here, every time the button is clicked, this reactive 
    # context will be invalidated, and the code re-evaluated
    cli::cat_rule("Rendering plot")
    # Simulate a show and hide pattern
    req(input$change %% 2 == 0)
    plot(iris)
  })
  
}

shinyApp(ui, server)

The result is the same, but the first version is shorter and easier to reason about: we have one button, and the behavior of the button is contained into itself. And, the second solution redraws the plot every time the reactiveValues is updated, making R compute way more than it should, whereas with the JavaScript only solution, the plot is not recomputed every time you need to show it: the plot is drawn by R only once.

At a local level, the improvements described in this section will not make your application way faster: for example, rendering UI elements (let’s says rendering a simple title) will not be computationally heavy. But at a global level, lighter UI computation from the server side helps the general rendering of the app: let’s say you have an output that takes 3 seconds to run, then if the whole UI + output is to be rendered on the server side, the whole UI stays blank until everything is computed.

Compare:

library(shiny)
ui <- function(){
  tagList(
    # We make the whole UI be generated by R
    uiOutput("caption")
  )
}

server <- function(
  input, 
  output, 
  session
){
  output$caption <- renderUI({
    # Simulate something that takes 3 seconds to run
    Sys.sleep(3)
    # Returning the UI
    tagList(
      h3("test"), 
      shinipsum::random_text(10)
    )
    
  })
}

shinyApp(ui, server)

to

library(shiny)
ui <- function(){
  tagList(
    # Only the text input will be rendered by R
    h3("test"), 
    textOutput("caption")
  )
}

server <- function(
  input, 
  output, 
  session
){
  output$caption <- renderText({
    # Here, we only render the text, not the whole UI
    Sys.sleep(3)
    shinipsum::random_text(10)
  })
}

shinyApp(ui, server)

In the first example, the UI will wait for the server to have rendered, while in the second we will first see the title, then the rendered text after a few seconds. That approach makes the user experience better: they know that something is happening, while completely blank page is confusing.

Also, R being single threaded, manipulating DOM elements from the server side make R busy doing these DOM manipulation while it could be computing something else. And let’s imagine it takes a quarter of a second to render the DOM element, that is a full second for rendering four of them, while R should be busy doing something else!

B. update* inputs

Almost every {shiny} inputs, even the custom ones from packages, come with an update_ function that allows to change the input values from the server side, instead of recreating the UI entirely. For example, here is a way to update the content of a selectInput from the server side:

library(shiny)
ui <- function(){
  tagList(
    # We start the selectInput empty
    selectInput("species", "Species", choices = NULL), 
    # The selectInput will be populate when the update button is pressed
    actionButton("update", "Update")
  )
}

server <- function(
  input, 
  output, 
  session
){
  observeEvent( input$update , {
    # Update the selectInput with the species from iris
    spc <- unique(iris$Species)
    updateSelectInput(
      session, 
      "species", 
      choices = spc, 
      selected = spc[1]
    )
  })
  
}

shinyApp(ui, server)

This switch to updateSelectInput makes the code easier to reason about as the selectInput is where it should be: inside the UI, instead of another pattern where we would use renderUI() and uiOutput(). Plus, with the update method, we are only changing what is needed, not re-generating the whole input.

C. insertUI and removeUI

Another way to dynamically change what is in the UI is with insertUI() and removeUI(). It is more global than the solution we have seen before with setting the reactiveValue to NULL or to a value, as it allows to target a larger UI element: we can insert or remove the whole input, instead of having the DOM element inserted but empty. This method allows to have a lighter DOM: <div> which are not rendered are not generated empty, they are simply not there.

Two things to note concerning this method, though:

  • Removing an element from the app will not delete the input from the input list. In other word, if you have selectInput("x", "x"), and that you remove this input using removeUI(), you will still have input$x in the server.

For example, in the following example, the input$val value will not be removed once you have called removeUI(selector = "#val").

library(shiny)
ui <- function(){
  tagList(
    # Creating a text input that will be removed from the UI whenever
    # the remove button is pressed
    textInput("value", "Value", "place"),
    actionButton("remove", "Remove UI")
  )
}

server <- function(
  input, 
  output, 
  session
){
  
  observeEvent( input$remove , {
    # When the button is pressed, the textInput will be removed from the UI
    removeUI(selector = "#value")
  })
  
  observe({
    # We observe input$value every second. You'll realise that even after the UI
    # is removed, input$value is still available.
    invalidateLater(1000)
    print(input$value)
  })
  
}

shinyApp(ui, server)
  • Both these functions take a jQuery selector to select the element in the UI. We will introduce these selectors in chapter @ref(#optimjs).

15.2.2 Too Much Data in Memory

If you are building a {shiny} application, there is a great chance you are building it to analyze data. If you are dealing with large datasets, you should consider deporting the data handling and computation to an external database system: for example to an SQL database. Why? Because these system has been created to handle and manipulate data on disk: in other words it will allow you to perform operations on your data without having to clutter R memory with a large dataset.

For example, if you have a selectInput() that is used to perform a filter on a dataset, you can do that filter straight inside SQL, instead of bringing all the data to R and then do the filter. That is even more necessary if you are building the app for a large number of users: for example if one {shiny} session takes up to 300MB, multiply that by the number of users that will need one session, and you will have a rough estimate of how much RAM you will need. On the contrary, if you lighten the data manipulation so that it is done by the back-end, you will have, let’s say, one database with 300 mo of data, then the database size will remain (more or less constant), and the only RAM used by {shiny} will be the data manipulation, not the data storage. That’s even more true now that almost any operation you can do today in {dplyr} (Wickham, François, et al. 2020) would be doable with an SQL backend, and that is the purpose of the {dbplyr} (Wickham and Ruiz 2020) package: translates {dplyr} code into SQL.

If using a database as a back-end seems a little bit far-fetched right now, that is how it is done in most programming languages: if you are building a web app with NodeJS or Python for example, and need to interact with data, nothing will be stored in RAM: you will be relying on an external database to store your data. Then your application will be used to make queries to this database backend.

15.3 Reading data

{shiny} applications are a tool of choice when it comes to analyzing data. But that also means that these data have to be imported/read at some point in time, and reading data can be time consuming. How can we optimize that? In this section, we will take a look at three strategies: including datasets inside your application, using R packages for fast data reading, and when and why you should move to an external database system.

15.3.1 Including Data in your Application

If you are building your application using the {golem} (Fay et al. 2020) framework, you are building your application as a package. R packages provide a way to include internal datasets, that can then be used as objects inside your app. This is the solution you should go for if your data are never to rarely updated: the datasets are created during package development, then included inside the build of your package. The plus side of this approach is that it makes the data fast to read, as they are serialized as R native objects.

To include data inside your application, you can use the usethis::use_data_raw( name = "my_dataset", open = FALSE ) command which is inside the 02_dev.R script inside the dev/ folder of your source application (if you are building the app with {golem}). This will create a folder called data-raw at the root of your application folder, with a script to prepare your dataset. Here, you can read the data, modify it if necessary, and then save it with usethis::use_data(my_dataset). Once this is done, you will have access to the my_dataset object inside your application.

This is for example what is done in the {tidytuesday201942} (Fay 2020l) application, in data-raw/big_epa_cars.R: the csv is read there, and then used as an internal dataset inside the application.

15.3.2 Reading External Datasets

Other applications use data that are not available at build time: they are created to analyze data that are uploaded by users, or maybe they are fetch on an external service while using the app (for example by calling an API). When you are building an application for the “user data” use case, the first thing you will need is to provide users a way to upload their dataset: shiny::fileInput().

One crucial thing to keep in mind when it comes to using user-uploaded files is that you have to be (very) strict with the way you handle files:

  • Always specify what type of file you want: shiny::fileInput() has an accept parameter that allows you to set one or more MIME types or extension. When using this argument (for example with text/csv, .csv, or .xslx), the user will only be able to select a subset of files from their computer: the ones that matches the type.
  • Always perform checks once the file is uploaded, even more if it is tabular data: column type, naming, empty rows… The more you check the file for potential errors, the less your application is likely to fail to analyze this uploaded dataset.
  • If the data reading takes a while, do not forget to add a visual progression cue: be it a shiny::withProgress() or tools from the {waiter} package.

Whenever you offer a user the possibility to upload anything, you can be sure that at some point, they will upload a file that will make the app crash. By setting a specific MIME type and by doing a series of check once the file is uploaded, you will make your application more stable. Finally, having a visual cue that “something is happening” is very important for the user experience, as “something is happening” is better than not knowing what is happening, and it may also prevent the user from clicking again and again on the upload button, or worst, they will stop using the app.

Now we have our fileInput() set, how do we read these data as fast as possible? There are several options depending on the type of data you are reading. Here are some packages that can make the file reading faster:

  • For tabular, flat dataset (typically csv, tsv, or text), {vroom} (Hester and Wickham 2020b) can read data at a 1.40 GB/sec/sec speed. {data.table} (Dowle and Srinivasan 2020), and its fread() function, is also fast at reading delimited files.
  • For JSON files, {jsonlite} (Ooms 2014), or more recently, {RcppSimdJSON} (Eddelbuettel and Knapp 2020), a binding to the simdjson C++ library.
  • If you need to read Excel files inside your app, {readxl} (Wickham and Bryan 2019) offers a binding to the RapidXML C++ library, reading Excel files fast.
  • Most files exported from statistical software (SAS, SPSS…) can be read using either the {foreign} (R Core Team 2020) or {haven} (Wickham and Miller 2020) packages.

15.3.3 Using External DataBases

Another type of data analyzed in a shiny application is one contained inside an external database. Database are heavily used in the data science world, and in the software engineering as a whole. Databases come with API and drivers that help retrieving and transferring data: be it SQL, NoSQL, or even graph.

Using a database is one of the solutions for making your app lighter, and more efficient in the long run, notably if you need to scale your app to thousands of visitors. Indeed, if you plan on having your app scale to numerous people, that will mean that a lot of R processes will be triggered. And if your data is contained in your app, this will mean that each R process will take a significant amount of RAM if the dataset is large. For example, if your dataset alone takes ~300 mb of RAM, that means that if you want to launch the app 10 times, you will need ~3gb of RAM. On the other hand, if you decide to switch these data to an external database, it will lower the global RAM need: the DB will takes these 300mb of data, and each shiny application will make request to the database. For instance, if the database needs 300mb, and one shiny app 50mb, then 10 app will be 300mb + 50 * 1b mo. In practice, other things are to be considered: making database requests can be computationally expensive, and might need some network adjustments, but you get the idea.

How do one chose between database backend? Well, first of all you need to see what is available in the environment the application will be deployed: maybe the company you are building the application for already has database servers deployed. If ever you are free to choose any database as a backend, your choice should be driven by what kind of operations you want to make on these databases. For example, SQL databases are designed to store tabular data, and they tend to be very fast when it comes to reading data: so if you have one or more large data.frames you want to use inside your application, and with no specific update of these data, an SQL backend can be the perfect choice. On the other hand, a NoSQL database like MongoDB will be faster when it comes to doing write operation, and can store any kind of objects: for example, {hexmake} can use a MongoDB backend to store RDS files. But that comes with a price: read are a little bit slower, and you might have to work a little bit more on handling the JSON results that come out of MongoDB. Another example of an app that uses on an external database is {databasedemo}, available at engineering-shiny.org/databasedemo/. Feel free to follow this link for more information about this application!

Covering all the available type of databases and the packages associated with each is a very, very large topic: there are dozens of database systems, and as many (if not more) packages to interact with them. For a more extensive coverage of using databases in R, please follow these resources:

  • Databases using R, the official RStudio documentation around databases and R

  • colinfay/r-db, a docker image, with an companion guide, that bundles the toolchain for a lot of database systems for R

  • CRAN Task View: Databases with R: the official task view from CRAN with a series of packages for database manipulation

15.3.4 Data Source Checklist

How to choose between these three methodologies:

Choice Update Size
Package data Never to very rare Low to medium
Reading files Uploaded by Users Preferably low
External DataBase Never to Streaming Low to Big

References

Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2020. Shiny: Web Application Framework for R. http://shiny.rstudio.com.

Dowle, Matt, and Arun Srinivasan. 2020. Data.table: Extension of ‘Data.frame‘.

Eddelbuettel, Dirk, and Brendan Knapp. 2020. RcppSimdJson: Rcpp Bindings for the Simdjson Header-Only Library for Json Parsing.

Fay, Colin. 2020e. Gargoyle: An Event-Based Framework for Shiny.

Fay, Colin. 2020g. Hexmake: Hex Stickers Maker. https://github.com/colinfay/hexmake.

Fay, Colin. 2020l. Tidytuesday201942: A Golem App for Tidy Tuesday. http://www.github.com/ColinFay/tidytuesday201942.

Fay, Colin, Vincent Guyader, Sébastien Rochette, and Cervan Girard. 2020. Golem: A Framework for Robust Shiny Applications. https://github.com/ThinkR-open/golem.

Hester, Jim, and Hadley Wickham. 2020b. Vroom: Read and Write Rectangular Text Data Quickly.

Ooms, Jeroen. 2014. “The Jsonlite Package: A Practical and Consistent Mapping Between Json Data and R Objects.” arXiv:1403.2805 [stat.CO]. https://arxiv.org/abs/1403.2805.

R Core Team. 2020. Foreign: Read Data Stored by Minitab, S, Sas, Spss, Stata, Systat, Weka, dBase, ... https://svn.r-project.org/R-packages/trunk/foreign.

Sidi, Jonathan, and Kirill Müller. 2019. Whereami: Reliably Return the Source and Call Location of a Command. https://github.com/yonicd/whereami.

Wickham, Hadley, and Jennifer Bryan. 2019. Readxl: Read Excel Files.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation.

Wickham, Hadley, and Evan Miller. 2020. Haven: Import and Export Spss, Stata and Sas Files.

Wickham, Hadley, and Edgar Ruiz. 2020. Dbplyr: A Dplyr Back End for Databases.


  1. Of course it’s an over-simplification: the reactive context will not be invalidated in all of these contexts, the idea is to illustrate how observe() can lead to invalidation point that are spread all across the code bloc.↩︎


ThinkR Website