Chapter 17 Optimizing Shiny Code

17.1 Optimizing R code

At its core, Shiny runs R code on the server side. So to have an efficient application, the R code that computes your values and returns your results also has to be optimized.

Optimizing R code is such a broad topic that a full book could be written about it, and in fact many books about R already cover it. Instead of rewriting these books, we will point to some crucial resources you can refer to if you want to get started optimizing your R code.

  • Efficient R Programming (Gillespie and Lovelace 2017) offers a series of methods you can quickly put into practice to write more efficient R code.

  • Advanced R (Wickham 2019b) has a chapter about optimizing R code (chapter 24).

In the rest of this chapter, we will focus on how to optimize Shiny specifically.

17.2 Caching elements

17.2.1 What is caching?

Caching is the process of storing resource-intensive results so that, when they are needed again, your program can reuse them without having to redo the computation.

How does it work? Let’s make a brief parallel with the human brain. Imagine that you know you will need to use a phone number many times during the day, and that, for the purpose of this thought experiment, you are completely unable to remember it51. What are you going to do? There are two solutions: either you look in the phone book or in your phone contact list every time you need it, which takes a couple of seconds each time, or you put a sticky note with the number on your computer screen, so that you have direct access to it whenever you need it. It takes a couple of seconds the first time you look for the number, but it is almost instantaneous the next times you need it.

This is what caching does: it keeps the results of computations so that, when they are needed again in the very same context, they are quickly accessible. The downside is that you only have limited space on your screen: when your screen is covered with sticky notes, you cannot store any more notes52.

In the context of an interactive application in a framework like Shiny, it makes a lot of sense to cache data structures: users tend to repeat what they do, or go back and forth between parameters. For example, if you have a graph that takes 2 seconds to render (which is quite common in Shiny, notably when relying on {ggplot2} (Wickham, Chang, et al. 2020)), you do not want these 2 seconds to be spent over and over again when users switch from one parameter to another and back to the first, as the graph will be the same for the same parameter. The same goes for queries to a database: if a query is made with the same parameters and you know it will return the same result, there is no need to ask the database again and again: ask the cache to retrieve the data instead.

17.2.2 Native Caching in R

At least two packages in R implement caching of functions (also called memoization): {R.cache} (Bengtsson 2019) and {memoise} (Wickham, Hester, et al. 2017). They both work more or less the same way: you call a memoization function on another function, and a cache is created for this function's output, based on the values of its arguments. Then, every time you call this function again with the same parameters, the cached value is returned instead of the function being computed another time. For example, if computing your data with the parameter n = 50 takes 5 seconds, the next time you call this function with n = 50, instead of recomputing, R will fetch the value stored in the cache.

Here is a simple example with {memoise}:

library(memoise)
library(tictoc)
# fct() sleeps, then returns the current time: if the
# cached value is returned, the timestamp does not change
fct <- function(sleep = 1){
  Sys.sleep(sleep)
  return(Sys.time())
}
mfct <- memoise(fct)
tic()
mfct(2)
[1] "2020-07-07 08:00:17 UTC"
toc()
2.004 sec elapsed
tic()
mfct(2)
[1] "2020-07-07 08:00:17 UTC"
toc()
0.026 sec elapsed
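
Note that {memoise} also lets you empty the cache of a memoised function with forget(), which forces the next call to be recomputed:

# Drop the values cached for mfct(): the next call with
# sleep = 2 will take about 2 seconds again
forget(mfct)
tic()
mfct(2)
toc()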

Let’s try another example that might look more like what we can find in a Shiny app: connecting to a database, using the {DBI} (R Special Interest Group on Databases (R-SIG-DB), Wickham, and Müller 2019) and {RSQLite} (Müller et al. 2020) packages:

con <- DBI::dbConnect(
  RSQLite::SQLite(), 
  dbname = ":memory:"
)

# Writing a large dataset to the db
DBI::dbWriteTable(
  con, 
  "diams", 
  dplyr::bind_rows(
    purrr::rerun(10, ggplot2::diamonds)
  )
)

# Do a query to the SQL db
fct_sql <- function(SQL, con){
  DBI::dbGetQuery(
    con, SQL
  ) 
}
mfct <- memoise(fct_sql)
tic()
res_a <- mfct("SELECT * FROM diams WHERE cut = 'Ideal'", con)
toc()
0.459 sec elapsed
tic()
res_b <- mfct("SELECT * FROM diams WHERE cut = 'Ideal'", con)
toc()
0.026 sec elapsed
all.equal(res_a, res_b)
[1] TRUE
tic()
res_c <- mfct("SELECT * FROM diams WHERE cut = 'Good'", con)
toc()
0.129 sec elapsed
setequal(res_a, res_c)
[1] FALSE

Note that you can change where {memoise} stores its cache. Here, we will save it in a throwaway directory with a random name (but do not do this in production):

tpd <- fs::path(paste(sample(letters, 10), collapse = ""))
tpd <- fs::dir_create(tpd)
dfs <- cache_filesystem(tpd)
mfct <- memoise(fct_sql, cache = dfs)
res_a <- mfct("SELECT * FROM diams WHERE cut = 'Ideal'", con)
res_b <- mfct("SELECT * FROM diams WHERE cut = 'Good'", con)
fs::dir_tree(tpd)
btljraiodv
├── 4dfb1b2d292b3622
└── 90d60b159380b430

As you can see, we now have two cache objects inside the directory we specified with cache_filesystem().

17.2.3 Caching Shiny

At the time of writing this page (April 2020), {shiny} (Chang et al. 2020) has one caching function: renderCachedPlot(). This function behaves more or less like renderPlot(), except that it is tailored for caching. The extra arguments you will find are cacheKeyExpr and sizePolicy. The former is the list of inputs and values used as the cache key for the plot: every time these inputs and values are the same, they produce the same graph, so {shiny} fetches it from the cache instead of computing it another time. sizePolicy is a function that returns a width and a height, which are used to round the plot dimensions in pixels, so that a plot is not cached for every possible pixel combination.

The good news is that converting existing renderPlot() functions to renderCachedPlot() is pretty straightforward in most cases: take your current renderPlot() and add the cache keys53.

Here is an example:

library(shiny)
ui <- function(request){
  tagList(
    selectInput("tbl", "Table", c("iris", "mtcars", "airquality")),
    plotOutput("plot")
  )
}

server <- function(
  input, 
  output, 
  session
){
  
  output$plot <- renderCachedPlot({
    plot(
      get(input$tbl)
    )
  }, cacheKeyExpr = {
    input$tbl
  })
  
}

shinyApp(ui, server)

If you try this app, the first rendering of each plot will take a little bit of time, but every subsequent rendering is almost instantaneous.
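
In most cases you will not need to change sizePolicy, but you can pass your own function if the default rounding does not suit your plot. As a sketch, the renderCachedPlot() call from the app above could reuse sizeGrowthRatio() from {shiny}, the helper behind the default policy (the width, height and growth rate below are arbitrary values for the example):

  output$plot <- renderCachedPlot({
    plot(
      get(input$tbl)
    )
  }, cacheKeyExpr = {
    input$tbl
  }, sizePolicy = sizeGrowthRatio(
    width = 400, 
    height = 400, 
    growthRate = 1.2
  ))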

And if we apply what we have just seen with {memoise}:

library(magrittr) # for the %>% pipe used below
con <- DBI::dbConnect(
  RSQLite::SQLite(), 
  dbname = ":memory:"
)
DBI::dbWriteTable(
  con, 
  "diams", 
  dplyr::bind_rows(
    purrr::rerun(100, ggplot2::diamonds)
  )
)

fct_sql <- function(cut, con){
  # NEVER EVER SPRINTF AN SQL CODE LIKE THAT
  # IT'S SENSITIVE TO SQL INJECTIONS, WE'RE
  # DOING IT FOR THE EXAMPLE
  DBI::dbGetQuery(
    con, sprintf(
      "SELECT * FROM diams WHERE cut = '%s'", 
      cut
    )
  )  %>% head()
}
db <- cache_filesystem("cache/")
fct_sql <- memoise(fct_sql, cache = db)
ui <- function(request){
  tagList(
    selectInput("cut", "cut", unique(ggplot2::diamonds$cut)),
    tableOutput("tbl")
  )
}

server <- function(
  input, 
  output, 
  session
){
  
  output$tbl <- renderTable({
    fct_sql(input$cut, con)
  })
  
}

shinyApp(ui, server)

You will see that the first time you run this piece of code, it takes a couple of seconds to render the table for a new input$cut value. But if you select the same value a second time, the output shows instantaneously.

Caching is a nice way to make your app faster, all the more so if you expect your outputs to be stable over time: if the plot created by a given series of inputs stays the same throughout the app life cycle, it is worth implementing on-disk caching. At the time of writing these lines (April 2020), you can also use remote caching, in the form of Amazon S3 or Google Cloud Storage. To do that, you will need the development version of {memoise} (version 1.1.0).
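
For example, here is a sketch of what remote caching could look like with the cache_s3() helper from that development version of {memoise}. The bucket name is hypothetical, and a working {aws.s3} setup with valid AWS credentials is assumed:

library(memoise)
# Hypothetical bucket name; assumes the development version of
# {memoise} and AWS credentials available to {aws.s3}
s3_cache <- cache_s3("my-shiny-app-cache")
fct_sql <- memoise(fct_sql, cache = s3_cache)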

If your application needs “fresh” data every time it is used, for example because the data in the SQL database are updated every hour, caching will not be of much help; on the contrary, the same inputs to the function should render different outputs depending on when they are called.
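
If the data only get stale every so often, a middle ground is to make time part of the cache key, so that cached results are reused only for a limited period. Here is a minimal sketch reusing the con and diams table from before (the fct_sql_fresh() name is just for the example): the current hour is passed as an extra argument, so when the hour changes, the key changes and the query is recomputed.

library(memoise)
# The extra `epoch` argument is part of the cache key: results
# are reused only while the hour stays the same
fct_sql_fresh <- memoise(
  function(SQL, con, epoch){
    DBI::dbGetQuery(con, SQL)
  }
)

res <- fct_sql_fresh(
  "SELECT * FROM diams WHERE cut = 'Ideal'", 
  con, 
  epoch = format(Sys.time(), "%Y-%m-%d %H")
)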

One other thing to remember is that, just like the computer screen from our phone number example, you do not have unlimited space when it comes to storing cache: a large amount of cache will take up space on your disk.

For example, here is the size of the cache we stored before:

fs::dir_info(tpd)[, "size", drop  = FALSE]
# A tibble: 2 x 1
         size
  <fs::bytes>
1       1.86M
2     462.31K

Managing cache at a system level is out of scope for this book, but note that the most commonly accepted rule for deleting cache entries is called LRU, for Least Recently Used. The underlying principle of this approach is that users tend to need what they have needed recently: the more recently a piece of data has been used, the more likely it is to be needed again soon.

The last access time of each cache file can be retrieved with:

fs::dir_info(tpd)[, "access_time", drop  = FALSE]
# A tibble: 2 x 1
  access_time        
  <dttm>             
1 2020-07-07 08:00:28
2 2020-07-07 08:00:22

Hence, when using an on-disk cache, it might be worth periodically removing the least recently used cache files, so that you can regain some space on the server running the application.
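
Here is a minimal sketch of such a clean-up (not production code): it keeps a cache directory under a given size budget by deleting the least recently accessed files first, using the size and access_time columns returned by fs::dir_info().

prune_cache <- function(dir, max_size = 50 * 1024^2){
  info <- fs::dir_info(dir)
  # Most recently accessed files first
  info <- info[order(info$access_time, decreasing = TRUE), ]
  # Files that push the cumulative size over the budget are
  # the least recently used ones: delete them
  to_remove <- info$path[cumsum(as.numeric(info$size)) > max_size]
  fs::file_delete(to_remove)
}

# Keep the cache directory created above under 1 MB
prune_cache(tpd, max_size = 1024^2)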

17.3 Asynchronous in Shiny

One of the drawbacks of Shiny is that, as it runs on top of R, it is single-threaded: each computation is run in sequence, one after the other. Well, at least natively, as methods have emerged to run pieces of code in parallel.

17.3.1 How to

To launch code blocks in parallel, we will use a combination of two packages, {future} (Bengtsson 2020) and {promises} (Cheng 2020), and a reactiveValues(). {future} is an R package whose main purpose is to allow users to send code to be run elsewhere, i.e. in another session, thread, or even on another machine. {promises}, on the other hand, is a package providing structures for handling asynchronous programming in R.54

17.3.1.1 Asynchronicity for Cross-session Availability

The first type of asynchronous programming in Shiny is the one that allows non-blocking programming in a cross-session context. In other words, it is a programming method that is useful when one Shiny session is accessed by multiple users. Natively, in Shiny, if user1 launches a 15-second computation, user2 has to wait for this computation to finish before launching their own 15-second computation, and user3 has to wait the 15 seconds of user1 plus the 15 seconds of user2, and so on.
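
To picture this, here is a minimal sketch of that blocking behavior, with the long computation simulated by Sys.sleep(): while one session is inside the Sys.sleep(), every other session served by the same R process has to wait.

library(shiny)
ui <- function(request){
  tagList(
    verbatimTextOutput("pr")
  )
}

server <- function(
  input, 
  output, 
  session
){
  output$pr <- renderPrint({
    # Simulates a long computation: this blocks the whole R
    # process, so every connected user waits
    Sys.sleep(15)
    rnorm(5)
  })
}

shinyApp(ui, server)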

With {future} and {promises}, each long computation is sent to be run somewhere else, so when user1 launches their 15-second computation, they are not blocking the R process for user2 and user3.

How does it work55? {promises} comes with two operators that will be useful in our case, %...>% and %...!%: the first means “what happens when the future() is resolved?” (i.e. when the computation inside future() is completed), and the second means “what happens if the future() fails?” (i.e. what to do when the future() returns an error).

Here is an example of using this skeleton:

library(future)
library(promises)
# We're opening several R sessions ({future}-specific)
plan(multisession) 
future({
  Sys.sleep(3)
  return(rnorm(5))
}) %...>% (
  function(result){
    print(result)
  }
) %...!% (
  function(error){
    stop(error)
  }
)

If you run this in your console, you will see that you have access to the R console directly after launching the code. A couple of seconds later (a little more than 3), the result of rnorm(5) is printed to the console.

Note that you can also write one-liners with . as a parameter, instead of building the full anonymous function (we will use this notation in the rest of the chapter):

library(future)
library(promises)
plan(multisession)
future({
  Sys.sleep(3)
  return(rnorm(5))
}) %...>% 
  print(.) %...!% 
  stop(.)

Let’s port this to Shiny:

library(shiny)
library(future)
library(promises)
plan(multisession)
ui <- function(request){
  tagList(
    verbatimTextOutput("pr")
  )
}

server <- function(
  input, 
  output, 
  session
){
  output$pr <- renderPrint({
    future({
      Sys.sleep(3)
      return(rnorm(5))
    }) %...>% 
      print(.) %...!% 
      stop(.)
  })
}

shinyApp(ui, server)

If you have run this, it does not seem like a revolution, but trust us: the Sys.sleep() is not blocking, so other users can launch the same computation at the same moment.

17.3.1.2 Inner-session Asynchronicity

In the previous section we implemented cross-session asynchronicity, meaning that the code is non-blocking when two or more users access the same app, but it is still blocking at an inner-session level.

Let’s have a look at this code:

library(shiny)
ui <- function(request){
  tagList(
    verbatimTextOutput("pr"), 
    plotOutput("plot")
  )
}

server <- function(
  input, 
  output, 
  session
){
  output$pr <- renderPrint({
    future({
      Sys.sleep(3)
      return(rnorm(5))
    }) %...>% 
      print(.) %...!% 
      stop(.)
  })
  
  output$plot <- renderPlot({
    plot(iris)
  })
}

shinyApp(ui, server)

Here, you would expect the plot to be available before the rnorm(), but it is not: {promises} is still blocking at an inner-session level, so elements are still rendered sequentially. To bypass that, we will use a reactiveValues() structure.

library(shiny)
library(promises)
library(future)
plan(multisession)

ui <- function(request){
  tagList(
    verbatimTextOutput("pr"), 
    plotOutput("plot")
  )
}

server <- function(
  input, 
  output, 
  session
) {
  
  rv <- reactiveValues(
    res = NULL
  )
  
  future({
    Sys.sleep(5)
    rnorm(5)
  }) %...>%
    (function(e){
      rv$res <- e
    }) %...!%
    (function(e){
      rv$res <- NULL
      warning(e)
    })
  
  output$pr <- renderPrint({
    req(rv$res)
  })
  
  output$plot <- renderPlot({
    plot(iris)
  })
}

shinyApp(ui, server)

Let’s detail this code step by step:

  • rv <- reactiveValues(res = NULL) creates a reactiveValues object containing NULL, which will serve the content of renderPrint() when the future() is resolved. It is initiated as NULL so that renderPrint() is silent at launch.

  • future({ ... }) %...>% ... %...!% ... is the {promises} structure we have seen before: when the future() is resolved, its result is assigned to rv$res.

  • %...!% (function(e){ rv$res <- NULL ; warning(e) }) is what happens when the future({}) fails: we set rv$res back to NULL so that renderPrint() does not fail, and a warning is raised with the error.

17.3.1.3 Potential Pitfalls of Asynchronous Shiny

There is one thing to be aware of if you plan on using this async methodology: you are not in a sequential context anymore. What that implies is that the first future({}) you send is not necessarily the first one you get back. For example, if you send SQL requests to be run asynchronously and each call takes between 1 and 10 seconds to return, there is a chance that the first request to come back will be the last one you sent.

To handle that, we can adopt two different strategies, depending on what we need:

  • We need only the last expression sent. In other words, if we send three expressions to be evaluated somewhere, we only need to get back the last one.
    To handle that, the best way is to have an id that is also sent to the future, and, when the future comes back, to check that this id is the one we are expecting. If it is, we update the reactiveValues(); if it is not, we ignore it. Here is an example:

library(shiny)
library(promises)
library(future)
plan(multisession)


ui <- function(request){
  tagList(
    actionButton("go", "go"),
    verbatimTextOutput("pr"), 
    plotOutput("plot")
  )
}

server <- function(
  input, 
  output, 
  session
) {
  
  rv <- reactiveValues(
    res = NULL, 
    last_id = 0
  )
  
  observeEvent( input$go , {
    rv$last_id <- rv$last_id + 1
    last_id <- rv$last_id
    
    future({
      if (last_id %% 2 == 0){
        Sys.sleep(3)
      }
      list(
        id = last_id,
        res = rnorm(5)
      )
    }) %...>%
      (function(e){
        cli::cat_rule(
          sprintf("Back from %s", e$id)
        )
        if (e$id == rv$last_id){
          rv$res <- e
        }
      }) %...!%
      (function(e){
        rv$res <- NULL
        warning(e)
      })
    cli::cat_rule(
      sprintf("%s sent", rv$last_id)
    )
  })
  
  output$pr <- renderPrint({
    req(rv$res)
  })
  
  output$plot <- renderPlot({
    plot(iris)
  })
}

shinyApp(ui, server)

  • We need to treat the outputs in the order they were sent. In that case, instead of keeping only the very last output, you will need to build a structure that receives each output, checks whether it is the “next in line”, stores it if it is not, or returns it if it is and then looks for the next output in the queue. This type of implementation is a little more complex, so we will not detail it all in this chapter, but here is a small example using {liteq} (Csárdi 2019a):

library(promises)
library(future)
plan(multisession)

library(liteq)

db <- tempfile()
q <- ensure_queue("jobs", db = db)
for (i in 1:5){
  local({
    # Fix the value of i for this iteration: the promise callback
    # runs after the loop has finished, so without this every
    # message would carry the same id
    id <- i
    future({
      Sys.sleep(
        sample(1:5, 1)
      )
      return(rnorm(5))
    }) %...>% 
      (function(res){
        publish(
          q, 
          title = as.character(id), 
          message = paste(
            res, 
            collapse = ","
          )
        )
      }) %...!% 
      stop(.)
  })
}
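
The consuming side is not shown above; as a rough sketch (assuming the futures have had time to resolve, for example at an idle console), you could poll the queue with try_consume() and ack() from {liteq}, and reorder the results using the id stored in each message title:

# Collect every message currently available in the queue
results <- list()
repeat {
  msg <- try_consume(q)
  if (is.null(msg)) break
  results[[msg$title]] <- as.numeric(
    strsplit(msg$message, ",")[[1]]
  )
  ack(msg)
}
# Process the results in the order the jobs were sent
results <- results[order(as.numeric(names(results)))]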

References

Bengtsson, Henrik. 2019. R.cache: Fast and Light-Weight Caching (Memoization) of Objects and Results to Speed up Computations. https://CRAN.R-project.org/package=R.cache.

Bengtsson, Henrik. 2020. Future: Unified Parallel and Distributed Processing in R for Everyone. https://CRAN.R-project.org/package=future.

Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2020. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.

Cheng, Joe. 2020. Promises: Abstractions for Promise-Based Asynchronous Programming. https://CRAN.R-project.org/package=promises.

Csárdi, Gábor. 2019a. Liteq: Lightweight Portable Message Queue Using ’Sqlite’. https://CRAN.R-project.org/package=liteq.

Gillespie, Colin, and Robin Lovelace. 2017. Efficient R Programming. O’Reilly Media, Inc, USA.

Müller, Kirill, Hadley Wickham, David A. James, and Seth Falcon. 2020. RSQLite: ’SQLite’ Interface for R. https://CRAN.R-project.org/package=RSQLite.

R Special Interest Group on Databases (R-SIG-DB), Hadley Wickham, and Kirill Müller. 2019. DBI: R Database Interface. https://CRAN.R-project.org/package=DBI.

Wickham, Hadley. 2019b. Advanced R, Second Edition. Chapman; Hall/CRC.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2020. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Wickham, Hadley, Jim Hester, Kirill Müller, and Daniel Cook. 2017. Memoise: Memoisation of Functions. https://CRAN.R-project.org/package=memoise.


  51. Anyway, now that we all have smartphones, who still remembers phone numbers?↩︎

  52. In that case, you can either hide pre-existing sticky notes or buy a bigger screen. But we are not here to talk about cache management theory. If you are interested in reading more about caching theory, we suggest the excellent Algorithms to Live By, by Brian Christian and Tom Griffiths (Christian and Griffiths 2016).↩︎

  53. In some cases you will have to configure the size policy, but in most cases the default values work just fine.↩︎

  54. If you are familiar with promises in JavaScript, {promises} is an implementation of this structure in R.↩︎

  55. We’re providing a short introduction with the key concepts, but for a more thorough introduction, please refer to the online documentation.↩︎

