Chapter 16 Common Application Caveats

16.1 Reactivity anti-patterns

16.1.1 Reactivity is awesome… until it is not

Let’s face it, reactivity is awesome… until it is not. Reactivity is a common source of confusion for beginners, and a common source of bugs and bottlenecks, even for seasoned Shiny developers. Most of the time, issues come from the fact that there is too much reactivity, i.e. we build apps where too much things happen, and some things are updated way more often than they should, and computations are performed when they should not.

Of course, it is a nice feature to make everything react instantly to changes, but when building larger apps it is easy to create monsters, i.e complicated, messy reactive graphs where everything is updated too much and too often. Or worse, we generate endless reactive loops, aka “the reactive inferno” where A invalidates B which invalidates C which invalidates A which invalidates B which invalidates C…

Let’s take a small example of a reactive inferno:

library(shiny)
ui <- function(){
  tagList(
    dateInput(
      "date", 
      "choose a date"
    ), 
    selectInput(
      "year", 
      "Choose a year", 
      choices = 2010:2030
    )
  )
}

server <- function(
  input, 
  output, 
  session
){
  observeEvent( input$date , {
    updateSelectInput(
      session, 
      "year", 
      selected = lubridate::year(input$date)
    )
  })
  
  observeEvent( input$year , {
    updateDateInput(
      session, 
      "date", 
      value = lubridate::as_date(
        sprintf("%s-01-01", input$year)
      )
    )
  })
  
}

shinyApp(ui, server)

Here, we want to handle something pretty common:

  • The user can pick a date and the year input is updated
  • And the other way round: when the year input changes, the date is updated too

But if you try to run this in your console, it will end as a reactive inferno: date update year that updates date that updates year…

And the more you work on your app, the more complex it gets, and the more you will be likely to end up in the reactive inferno. In this section, we will be speaking a little bit about reactivity and how to have more control on it, and about a way to share data across modules without relying on passing along reactive objects.

16.1.2 observe vs observeEvent

One of the most common feature of reactive inferno is the use of observe() in cases where you should use observeEvent. Spoiler: you should try to use observeEvent() as much as possible, and avoid observe()as much as possible.

At first, observe() seems easier to implement, and feels like a shortcut as you do not have to think about what to react to: everything gets updated without you thinking about it. But the truth is that this stairway does not lead to heaven.

Let’s stop and think about observe() for a minute. This function updates every time a reactive object it contains is invalidated. Yes, this works well if you have a small amount of reactive objects in the observer, but that gets tricky whenever you start adding things a long list of things inside your observe(), as you might be launching a computation 10 times if your reactive scope contains 10 reactive objects that are somehow invalidated in chain. And believe us, we have seen pieces of code where the observe() contains hundreds of lines of code, with reactives objects all over the place, with one observe() context being invalidated dozens of times when one input changes in the application.

For example, let’s start with that:

## DO NOT DO GLOBAL VARIABLES, IT'S JUST TO SIMPLIFY THE EXAMPLE
i <- 0
library(shiny)
library(cli)
ui <- function(request){
  tagList(
    textInput("txt", "txt")
  )
}

server <- function(input, output, session){
  observe({
    i <<- i + 1
    cat_rule(as.character(i))
    print(input$txt)
  })
}

shinyApp(ui, server)

Oh, and then, let’s add a small selectInput():

i <- 0
library(shiny)
library(cli)
ui <- function(request){
  tagList(
    textInput("txt", "txt"), 
    selectInput("tolower", "casse", c("lower", "upper"))
  )
}

server <- function(input, output, session){
  observe({
    i <<- i + 1
    cat_rule(as.character(i))
    if (input$tolower == "lower") {
      print(tolower(input$txt))
    } else  {
      print(tolower(input$txt))
    }
  })
}

shinyApp(ui, server)

And, as time goes by, we add another control flow to our observe():

i <- 0
library(shiny)
library(cli)
library(stringi)
ui <- function(request){
  tagList(
    textInput("txt", "txt"), 
    selectInput("tolower", "casse", c("lower", "upper")), 
    checkboxInput("rev", "reverse")
  )
}

server <- function(input, output, session){
  observe({
    i <<- i + 1
    cat_rule(as.character(i))
    if (input$rev){
      x <- stri_reverse(input$txt)
    } else {
      x <- input$txt
    }
    if (input$tolower == "lower"){
      print(tolower(x))
    } else {
      print(tolower(x))
    }
  })
}

shinyApp(ui, server)

And it would be nice to keep the selected values into a reactive list, so that we can reuse it elsewhere. And maybe you would like to add a checkbox so that the logs are printed to the console only if checked.

i <- 0
library(shiny)
library(cli)
library(stringi)
ui <- function(request){
  tagList(
    textInput("txt", "txt"), 
    selectInput("tolower", "casse", c("lower", "upper")), 
    checkboxInput("rev", "reverse")
  )
}

server <- function(input, output, session){
  r <- reactiveValues()
  observe({
    i <<- i + 1
    cat_rule(as.character(i))
    if (input$rev) {
      r$x <- stri_reverse(input$txt) 
    } else {
      r$x <- input$txt
    }
    if (input$tolower == "lower"){
      r$x <- tolower(r$x)
    } else {
      r$x <- toupper(r$x)
    }
  })
}

shinyApp(ui, server)

Ok, now can you tell how many potential invalidation points we have got here? Three: whenever input$txt, input$rev or input$tolower change. Of course, three is not that much, but you get the idea.

Let’s pause a minute and think about why we use observe() here. To update the values inside r$x, yes. But do we need to use observe() for, say, updating r$x under dozens of conditions, each time the user types a letter? Possibly not.

We generally want our observer to update its content under a small, controlled number of inputs, i.e. with a controlled number of invalidation points. And, what we often forget is that users do not type/select correctly on the first try. No, they usually try and miss, restart, change things, amplifying the reactivity “over-happening”.

Moreover, long observe() statements are hard to debug, and they make collaboration harder when the trigger to the observe logic can potentially live anywhere between line one and line 257 of your observe(). That’s why (well, in 99% of cases), it is safer to go with observeEvent, as it allows to see at a glance what are the condition under which the content is invalidated and re-evaluated. Then, if a reactive context is invalidated, you know why. For example, here is where the reactive invalidation can happen (lines with a *):

observe({
  i <<- i + 1
  cat_rule(as.character(i))
*  if (input$rev) {
*    r$x <- stri_reverse(input$txt) 
  } else {
*    r$x <- input$txt
  }
*  if (input$tolower == "lower"){
    r$x <- tolower(r$x)
  } else {
    r$x <- toupper(r$x)
  }
})

Whereas in this code, it is easier to identify where the invalidation can happen:

observeEvent( c(
*  input$rev, 
*  input$txt
  ),{
  i <<- i + 1
  cat_rule(as.character(i))
  if (input$rev) {
    r$x <- stri_reverse(input$txt) 
  } else {
    r$x <- input$txt
  }
  if (input$tolower == "lower"){
    r$x <- tolower(r$x)
  } else {
    r$x <- toupper(r$x)
  }
})

16.1.3 Building triggers and watchers

To prevent this, one way to go is to create “flags” objects, which can be thought of as internal buttons to control what you want to invalidate: you create the button, set some places where you want these buttons to invalidate the context, and finally press these buttons.

These objects are launched with an init function, then these flags are triggered with trigger(), and wherever we want these flags to invalidate a reactive context, we watch() these flags.

The idea here is to get a full control over the reactive flow: we only invalidate contexts when we want, making the general flow of the app more predictable. These flags are available using the {gargoyle} (Fay 2020e) package, that can be installed from GitHub with:

remotes::install_github("ColinFay/gargoyle")
  • gargoyle::init("this") initiate a "this" flag: most of the time you will be generating them at the app_server() level

  • gargoyle::watch("this") sets the flag inside a reactive context, so that it will be invalidated every time you trigger("this") this flag.

  • gargoyle::trigger("this") triggers the flags

And, bonus, as these functions use the session object, they are available across all modules. That also means that you can easily trigger an event inside a module from another one.

This pattern is for example implemented into {hexmake} (Fay 2020g) (though not with {gargoyle}), where the rendering of the image on the right is fully controlled by the "render" flag. The idea here is to allow a complete control over when the image is recomputed: only when trigger("render") is called does the app regenerate the image, helping us lower the reactivity of the application. That might seems like a lot of extra work, but that is definitely worth considering on the long run, as it will help optimizing the rendering (fewer computation) and lowering the number of errors that can result from too much reactivity inside an application.

16.1.4 Using R6 as a data storage

One pattern we have also been playing with is storing the app business logic inside one or more R6 objects. Why would we want to do that?

16.1.4.1 Sharing data across module

Sharing an R6 object makes it simpler to create data that are shared across modules, but without the complexity generated by reactive objects, and the instability of using global variables.

So basically, the idea is to hold the whole logic of your data reading / cleaning / processing / outputting inside an R6 class. An object of this class is then initiated at the top level of your application, and you can pass this object to the sub-modules. Of course, this makes even more sense if you are combining it with the trigger/watch pattern from before!

library(shiny)
nameui <- function(id){
  ns <- NS(id)
  tagList(
    # [...]
  )
}

name <- function(input, output, session, obj){
  ns <- session$ns
  output$that <- renderThis({
    obj$compute()
    trigger("plot")
  })
}

name2ui <- function(id){
  ns <- NS(id)
  tagList(
    # [...]
  )
}

name2 <- function(input, output, session){
  ns <- session$ns
  output$plot <- renderThis({
    watch("plot")
    obj$plot()
  })
}


ui <- function(request){
  tagList(
    nameui("nameui"),
    name2ui("name2ui")
  )
}

server <- function(
  input, 
  output, 
  session
){
  obj <- MyDataProcess$new()
  callModule(name, "nameui", obj)
  callModule(name2, "name2ui", obj)
}

shinyApp(ui, server)

16.1.4.2 Get sure it is tested

During the process of building a robust Shiny app, we strongly suggest that you test as many things as you can. This is where using an R6 for your business logic of your app makes sense: this allows you to build the whole testing of your application data logic outside of any reactive context: you simply build unit tests just as any other function.

16.2 R does too much

16.2.1 Rendering UI from server side

There are many reasons we would want to change things on the UI based on what happens in the server: changing the choices of a selectInput() based on the columns of a table which is uploaded by the user, showing and hiding pieces of the app according to an environment variable, allow the user to create an indeterminate amount of inputs, etc.

Chances are that to do that, you have been using the uiOutput() & renderUI() functions from {shiny} (Chang et al. 2020). Even if convenient, and the functions of choice in some specific context, this couple makes R do a little bit too much: you are making R regenerate the whole UI component instead of changing only what you need. Plus, you will create a code that is harder to reason about, as we are used to have the UI parts in the UI functions (but that is not related to performance).

Here are three strategies to code without uiOutput() & renderUI().

16.2.1.1 Implement UI events in JavaScript

Mixing languages is better than writing everything in one, if and only if using only that one is likely to overcomplicate the program.

We will see in the last chapter of this book how you can integrate JS inside your Shiny app, and how even basic functions can be useful for making your app server lighter. For example, compare:

library(shiny)
ui <- function(){
  tagList(
    actionButton(
      "change", 
      "show/hide graph", 
      onclick = "$('#plot').toggle()"
    ), 
    plotOutput("plot")
  )
}

server <- function(
  input, 
  output, 
  session
){
  output$plot <- renderPlot({
    cli::cat_rule("Rendering plot")
    plot(iris)
  })
}

shinyApp(ui, server) 

to

library(shiny)
ui <- function(){
  tagList(
    actionButton("change", "show/hide graph"), 
    plotOutput("plot")
  )
}

server <- function(
  input, 
  output, 
  session
){
  
  r <- reactiveValues(plot = iris)
  
  observeEvent( input$change , {
    if (input$change %% 2 == 0){
      r$plot <- iris
    } else {
      r$plot <- NULL
    }
  })
  
  output$plot <- renderPlot({
    cli::cat_rule("Rendering plot")
    req(r$plot)
    plot(r$plot)
  })
  
}

shinyApp(ui, server)

The result is the same, but the first version is shorter and easier to reason about: we have one button, and the behavior of the button is contained into itself. And, on top of being harder to maintain, the second solution redraws the plot every time the reactiveValues is updated, making R compute way more than it should, whereas with the JavaScript only solution, the plot is not recomputed every time you need to show it.

At a local level, the improvements described in this section will not make your application way faster: for example, rendering UI elements (let’s says rendering a simple title) will not be computationally heavy. But at a global level, lighter UI computation from the server side helps the general rendering of the app: let’s say you have an output that takes 3 seconds to run, then if the whole UI + output is to be rendered on the server side, the whole UI stays blank.

Compare:

library(shiny)
ui <- function(){
  tagList(
    uiOutput("caption")
  )
}

server <- function(
  input, 
  output, 
  session
){
  output$caption <- renderUI({
    Sys.sleep(3)
    tagList(
      h3("test"), 
      p("caption")
    )
    
  })
}

shinyApp(ui, server)

to

library(shiny)
ui <- function(){
  tagList(
    h3("test"), 
    textOutput("caption")
  )
}

server <- function(
  input, 
  output, 
  session
){
  output$caption <- renderText({
    Sys.sleep(3)
    "Lorem ipsum dolor"
  })
}

shinyApp(ui, server)

In the first example, the UI will wait for the server to have rendered, while in the second we will first see the title, then the rendered text after a few seconds. That approach makes the user experience better: they know that something is happening, while completely blank page is confusing.

Also, R being single threaded, manipulating DOM elements from the server side make R busy doing these DOM manipulation while it could be computing something else. And let’s imagine it takes a quarter of a second to render the DOM element, that is a full second for rendering four of them, while R should be busy doing something else!

16.2.1.2 update* inputs

Almost every Shiny inputs, even the custom ones from packages, come with an update_ function that allows to change the input values from the server side, instead of recreating the UI entirely. For example, here is a way to update the content of a selectInput from the server side:

library(shiny)
ui <- function(){
  tagList(
    selectInput("species", "Species", choices = NULL), 
    actionButton("update", "Update")
  )
}

server <- function(
  input, 
  output, 
  session
){
  observeEvent( input$update , {
    spc <- unique(iris$Species)
    updateSelectInput(
      session, 
      "species", 
      choices = spc, 
      selected = spc[1]
    )
  })
  
}

shinyApp(ui, server)

This switch to updateSelectInput makes the code easier to reason about as the selectInput is where it should be: inside the UI, instead of another pattern where we would use renderUI() and uiOutput(). Plus, with the update method, we are only changing what is needed, not re-generating the whole input.

16.2.1.3 insertUI and removeUI

Another way to dynamically change what is in the UI is with insertUI() and removeUI(). It is more global than the solution we have seen before with setting the reactiveValue to NULL or to a value, as it allows to target a larger UI element: we can insert or remove the whole input, instead of having the DOM element inserted but empty. This method allows to have a lighter DOM: <div> which are not rendered are not generated empty, they are simply not there.

Two things to note concerning this method, though:

  • Removing an element from the app will not delete the input from the input list. In other word, if you have selectInput("x", "x"), and that you remove this input using removeUI(), you will still have input$x in the server.

For example, in the following example, the input$val value will not be removed once you have called removeUI(selector = "#val").

library(shiny)
ui <- function(){
  tagList(
    textInput("val", "Value", "place"),
    actionButton("rm", "Remove UI")
  )
}

server <- function(
  input, 
  output, 
  session
){
  
  observeEvent( input$rm , {
    removeUI(selector = "#val")
  })
  
  observe({
    invalidateLater(1000)
    print(input$val)
  })

}

shinyApp(ui, server)
  • Both these functions take a jQuery selector to select the element in the UI. We will introduce these selectors in the last chapter of this book.

16.2.2 Too Much Data in Memory

If you are building a Shiny application, there is a great chance you are building it to analyze data. If you are dealing with large datasets, you should consider deporting the data handling and computation to an external database system: for example to an SQL database. Why? Because these system has been created to handle and manipulate data on disk: in other words it will allow you to perform operations on your data without having to clutter R memory with a large dataset.

For example, if you have a selectInput() that is used to perform a filter on a dataset, you can do that filter straight inside SQL, instead of bringing all the data to R and then do the filter. That is even more necessary if you are building the app for a large number of users: for example if one Shiny session takes up to 300MB, multiply that by the number of users that will need one session, and you will have a rough estimate of how much RAM you will need. On the contrary, if you lighten the data manipulation so that it is done by the back-end, you will have, let’s say, one database with 300 mo of data, then the database size will remain (more or less constant), and the only RAM used by Shiny will be the data manipulation, not the data storage. That’s even more true now that almost any operation you can do today in {dplyr} (Wickham, François, et al. 2020) would be doable with an SQL backend, and that is the purpose of the {dbplyr} (Wickham and Ruiz 2020) package: translates {dplyr} code into SQL.

If using a database as a back-end seems a little bit far-fetched right now, that is how it is done in most programming languages: if you are building a web app with NodeJS or Python for example, and need to interact with data, nothing will be stored in RAM: you will be relying on an external database to store your data. Then your application will be used to make queries to this database backend.

16.3 Reading data

Shiny applications are a tool of choice when it comes to analyzing data. But that also means that these data have to be imported/read at some point in time, and reading data can be time consuming. How can we optimize that? In this section, we will take a look at three strategies: including datasets inside your application, using R packages for fast data reading, and when and why you should move to an external database system.

16.3.1 Including Data in your Application

If you are building your application using the {golem} (Guyader et al. 2020) framework, you are building your application as a package. R packages provide a way to include internal datasets, that can then be used as objects inside your app. This is the solution you should go for if your data are never to rarely updated: the datasets are created during package development, then included inside the build of your package. The plus side of this approach is that it makes the data fast to read, as they are serialized as R native objects.

To include data inside your application, you can use the usethis::use_data_raw( name = "my_dataset", open = FALSE ) command which is inside the 02_dev.R script inside the dev/ folder of your source application (if you are building the app with {golem}). This will create a folder called data-raw at the root of your application folder, with a script to prepare your dataset. Here, you can read the data, modify it if necessary, and then save it with usethis::use_data(my_dataset). Once this is done, you will have access to the my_dataset object inside your application.

This is for example what is done in the {tidytuesday201942} (Fay 2020l) application, at data-raw/big_epa_cars.R: the csv is read there, and then used as an internal dataset inside the application.

16.3.2 Reading External Datasets

Other applications use data that are not available at build time: they are created to analyze data that are uploaded by users, or maybe they are fetch on an external service while using the app (for example by calling an API). When you are building an application for the “user data” use case, the first thing you will need is to provide users a way to upload their dataset: shiny::fileInput().

One crucial thing to keep in mind when it comes to using user-uploaded files is that you have to be (very) strict with the way you handle files:

  • Always specify what type of file you want: shiny::fileInput() has an accept parameter that allows you to set one or more MIME types or extension. When using this argument (for example with text/csv, .csv, or .xslx), the user will only be able to select a subset of files from their computer: the ones that matches the type.
  • Always perform checks once the file is uploaded, even more if it is tabular data: column type, naming, empty rows… The more you check the file for potential errors, the less your application is likely to fail to analyze this uploaded dataset.
  • If the data reading takes a while, do not forget to add a visual progression cue: be it a shiny::withProgress() or tools from the {waiter} package.

Why do we do that? Because whenever you offer a user the possibility to upload anything, you can be sure that at some point, they will upload a file that will make the app crash. By setting a specific MIME type and by doing a series of check once the file is uploaded, you will make your application more stable. Finally, having a visual cue that “something is happening” is very important for the user experience, as “something is happening” is better than not knowing what is happening, and it may also prevent the user from clicking again and again on the upload button.

Now we have our fileInput() set, how do we read these data as fast as possible? There are several options depending on the type of data you are reading. Here are some packages that can make the file reading faster:

  • For tabular, flat dataset (typically csv, tsv, or text), {vroom} (Hester and Wickham 2020b) can read data at a 1.40 GB/sec/sec speed. {data.table} (Dowle and Srinivasan 2019), and its fread() function, is also fast at reading delimited files.
  • For JSON files, {jsonlite} (Ooms 2014)
  • If you need to read Excel files inside your app, {readxl} (Wickham and Bryan 2019) offers a binding to the RapidXML C++ library, reading Excel files fast.

16.3.3 Using External DataBases

Another type of data analyzed in a shiny application is one contained inside an external database. Database are wildly used in the data science world, and in the software engineering as a whole. Being a widely used source of data, databases come with API and drivers that help retrieving and transferring data: be it SQL, NoSQL, or even graph.

Using a database is one of the solutions for making your app lighter, and more efficient in the long run, notably if you need to scale your app to thousands of visitors. Indeed, if you plan on having your app scale to numerous people, that will mean that a lot of R processes will be triggered. And if your data is contained in your app, this will mean that each R process will take a significant amount of RAM if the dataset is large. For example, if your dataset alone takes ~300 mb of RAM, that means that if you want to launch the app 10 times, you will need ~3gb of RAM. On the other hand, if you decide to switch these data to an external database, it will lower the global RAM need: the DB will takes these 300mb of data, and each shiny application will make request to the database. So, schematically, if the database needs 300mb, and one shiny app 50mb, then 10 app will be 300mb + 50 * 1b mo. Of course, it is not as simplistic as that, and other things are to be considered: making database requests can be computationally expensive, and might need some network adjustments, but you get the idea.

How do one chose between database backend? Well, first of all you need to see what is available in the environment the application will be deployed: maybe the company you are building the application for already has database servers deployed. If ever you are free to choose any database as a backend, your choice should be driven by what kind of operations you want to make on these databases. For example, SQL databases are designed to store tabular data, and they tend to be very fast when it comes to reading data: so if you have one or more large data.frames you want to use inside your application, and with no specific update of these data, an SQL backend can be the perfect choice. On the other hand, a NoSQL database like MongoDB will be faster when it comes to doing write operation, and can store any kind of objects: for example, {hexmake} can use a MongoDB backend to store RDS files. But that comes with a price: read are a little bit slower, and you might have to work a little bit more on handling the JSON results that come out of MongoDB.

Covering all the available type of databases and the packages associated with each is a very, very large topic: there are dozens of database systems, and as many (if not more) packages to interact with them. For a more extensive coverage of using databases in R, please follow these resources:

  • Databases using R, the official RStudio documentation around databases and R

  • colinfay/r-db, a docker image, with an companion guide, that bundles the toolchain for a lot of database systems for R

  • CRAN Task View: Databases with R: the official task view from CRAN with a series of packages for database manipulation

16.3.4 Reminder

How to choose between these three methodologies:

Choice Update Size
Package data Never to very rare Low to medium
Reading files Uploaded by Users Preferably low
External DataBase Never to Streaming Low to Big

References

Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2020. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.

Dowle, Matt, and Arun Srinivasan. 2019. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.

Fay, Colin. 2020e. Gargoyle: An Event-Based Framework for ’Shiny’.

Fay, Colin. 2020g. Hexmake: Hex Stickers Maker. https://github.com/colinfay/hexmake.

Fay, Colin. 2020l. Tidytuesday201942: A Golem App for Tidy Tuesday. http://www.github.com/ColinFay/tidytuesday201942.

Guyader, Vincent, Colin Fay, Sébastien Rochette, and Cervan Girard. 2020. Golem: A Framework for Robust Shiny Applications. https://github.com/ThinkR-open/golem.

Hester, Jim, and Hadley Wickham. 2020b. Vroom: Read and Write Rectangular Text Data Quickly. https://CRAN.R-project.org/package=vroom.

Ooms, Jeroen. 2014. “The Jsonlite Package: A Practical and Consistent Mapping Between Json Data and R Objects.” arXiv:1403.2805 [stat.CO]. https://arxiv.org/abs/1403.2805.

Wickham, Hadley, and Jennifer Bryan. 2019. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, and Edgar Ruiz. 2020. Dbplyr: A ’Dplyr’ Back End for Databases. https://CRAN.R-project.org/package=dbplyr.


ThinkR Website