Chapter 15 The Need for Optimization

Only once we have a solid characterization of the surface area we want to improve can we begin to identify the best way to improve it.

15.1 Build first, then optimize

15.1.1 Identifying bottlenecks

Refactoring existing code for speed sounds like an appealing activity for a lot of us: it is always satisfying to watch our functions get faster, or to find a more elegant way to solve a problem that also makes the code a little bit faster. Or, as Maude Lemaire simply puts it in “Refactoring at Scale” (Lemaire 2020), “Refactoring can be a little bit like eating brownies: the first few bites are delicious, making it easy to get carried away and accidentally eat an entire dozen. When you’ve taken your last bite, a bit of regret and perhaps a twinge of nausea kick in.”

But beware! As Donald Knuth puts it, “Premature optimization is the root of all evil”. What does that mean? That focusing on optimizing small portions of your app before making it work fully is the best way to lose time along the way, even more so in the context of a production application, where there are deadlines and a limited amount of time to build the application.

Why? Here is the general idea: in the schema below, you can make the circles travel to the bottleneck as fast as you want, but they will still be slowed down by the narrow bottleneck, so you will just be losing time making the circles move faster without gaining anything on the global performance. So focus on making the bottleneck larger before focusing on making the circles travel faster. When? Once the application is ready: in our example, we can only detect the bottleneck once the bottle is actually built, not while we are building the circles.

FIGURE 15.1: Schema of a bottleneck, Adapted from WikiMedia

This bottleneck is the very thing you should be optimizing: having faster code anywhere other than this bottleneck will not make your app faster. You will just make your app reach the bottleneck faster, but there will still be this part of your app that slows everything down. But this is something you might only realize when the app is fully built: pieces might be fast on their own, but slow when put together. It is also possible that the test dataset you have been using from the start works just fine, but when you try your app with a bigger, more realistic dataset, the application is actually way slower than it should be. And maybe you have been using an example dataset so that you do not have to query the database every time you implement a new feature, but the SQL query to the database is actually very slow. This is something you will discover only when the application is fully functional, not when building the parts, and realizing it when you only have 5% of the allocated time for this project left on your calendar is not a good surprise.

Or to sum up:

Get your design right with an un-optimized, slow, memory-intensive implementation before you try to tune. Then, tune systematically, looking for the places where you can buy big performance wins with the smallest possible increases in local complexity.

15.1.2 Do you need faster functions?

Optimizing an app is a matter of trade-offs: of course, in a perfect world, every piece of the app would be tailored to be fast, easy to maintain, and elegant. But in the real world, you have deadlines, limited time and resources, and we are all only human. That means that at the end of the day, your app will not be completely perfect: software can always be made better. No piece of code has ever reached complete perfection.

Given that, do you want to spend 5 days out of the 30 you have planned optimizing a function so that it runs in a quarter of a second instead of half a second, then realize the critical bottleneck of your app is actually the SQL query and not the data manipulation? Of course a function running two times faster is a good thing, but think about it in context: for example, how many times is this function called? We can safely bet that if your function is only called once, making it twice as fast is probably not where you want to focus your effort (well, unless you have unlimited time to work on your project, and in that case lucky you, you can spend a massive amount of time building the perfect software). On the other hand, a function which is called thousands of times in your application might benefit from being optimized.

And all of this is basic maths. Let’s assume the following:

  • A current scenario takes 300 seconds to be accomplished on your application
  • One function A() takes 30 seconds, and it’s called once
  • One function B() takes 1 second, and it’s called 50 times

If you divide the execution time of A() by two, you would be performing a local optimization of 15 seconds, and a global optimization of 15 seconds. On the other hand, if you divide the execution time of B() by two, you would be performing a local optimization of 0.5 seconds, but a global optimization of 25 seconds.

Again, this kind of optimization is hard to detect until the app is functional. Taken locally, an optimization of 15 seconds looks way greater than an optimization of 0.5 seconds, yet here the 0.5-second gain brings the larger global improvement (25 seconds versus 15). And you will only realize that once the application is up and running!
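
The arithmetic above can be checked in a few lines of R. This is only a minimal sketch: the data frame and its column names are made up for the illustration, and only the timings and call counts come from the scenario described above.

```r
# Timings and call counts from the scenario above
# (the data frame layout and column names are illustrative)
scenario <- data.frame(
  fun = c("A", "B"),
  seconds_per_call = c(30, 1),
  calls = c(1, 50)
)

# Local gain: halving the execution time of one call
scenario$local_gain <- scenario$seconds_per_call / 2

# Global gain: the local gain multiplied by the number of calls
scenario$global_gain <- scenario$local_gain * scenario$calls

scenario
```

Ranking by global_gain rather than local_gain puts B() first, even though each call to B() only saves half a second.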

15.1.3 Don’t sacrifice readability

As said in the last section, every piece of code can be rewritten to be faster, either from R to R or using a lower-level language, for example C or C++. You can also rebuild data manipulation code by switching from one package to another, use complex data structures to optimize memory usage, etc. But that comes with a price: not keeping things simple for the sake of local optimization makes maintenance harder, even more so if you are using a lesser-known language or package. Refactoring a piece of code is better done when you keep in mind that “the primary goal should be to produce human-friendly code, even at the cost of your original design. If the laser focus is on the solution rather than the process, there’s a greater chance your application will end up more contrived and complicated than it was in the first place” (Lemaire 2020).

For example, switching some portions of your code to C++ implies that you might be the only person able to maintain that specific portion of code, or that your colleague taking over the project will have to spend hours learning the tools you have been building, or the language you have chosen to write your functions with.

Again, optimization is always a matter of trade-offs: is the half-second local optimization worth the extra hours you will have to spend correcting bugs when the app crashes and you are the only one able to fix it? Also, are the extra hours or days spent rewriting a working code base worth the speed gain of 0.5 seconds on one function?

For example, let’s compare these two implementations of the same function, one in R, and one in C++ via {Rcpp} (Eddelbuettel et al. 2020). Of course, the C++ function is faster than the R one: this is the very reason for using C++ with R.

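#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double mean_cpp(NumericVector x) {
  int j;
  int size = x.size();
  double res = 0;
  // Sum all the elements, then divide by the length
  for (j = 0; j < size; j++){
    res = res + x[j];
  }
  return res / size;
}

benched <- bench::mark(
  cpp = mean_cpp(1:100000),
  native = mean(1:100000), 
  iterations = 1000
)
benched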
# A tibble: 2 x 6
  expression    min median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
1 cpp         178µs  218µs     3332.     784KB     13.4
2 native      507µs  577µs     1640.        0B      0  

(Note: we will come back to bench::mark later)

Still, how much of a time gain is worth risking that nobody in your team can take over the maintenance if needed? In other words, given that (in our example) we are gaining around 3.5855 × 10^{-4} seconds (about 0.36 milliseconds) on the execution time of our function, is it worth switching to C++? Using external languages or complex data structures implies that from the start, you will need to think about who will maintain your code base over the years, and how. Chances are that if you plan on using a Shiny application over a span of several years, various R developers will be working on the project, and including C++ code inside your application means that these future developers will either be required to know C++, or they will not be able to maintain this piece of code.

So, to sum up, there are three ways to optimize your application and R code, and the bad news is that you cannot optimize for all of them:

  • Optimizing for speed
  • Optimizing for memory
  • Optimizing for readability/maintainability

Leading a successful project means that you should, as much as possible, find the perfect balance between these three.

15.2 Tools for profiling

15.2.1 Profiling R code

Identifying bottlenecks

The best way to profile R code is by using the {profvis} (Chang, Luraschi, and Mastny 2019) package (see note 1), a package designed to evaluate how much time each part of a function call takes. With {profvis}, you can spot the bottleneck of your function. Without an automated tool to do the profiling, developers would have to profile by guessing, which will, most of the time, come with bad results:

One of the lessons that the original Unix programmers learned early is that intuition is a poor guide to where the bottlenecks are, even for one who knows the code in question intimately.

Instead of guessing, it is a safe bet to go for a tool like {profvis}, which gives you a detailed view of what takes a long time to run in your R code.

Using this package is quite straightforward: put the code you want to profile inside the profvis() function (see note 2), wait for the code to run, and… that is it, you now have an analysis of your code running time.

Here is an example with three levels of nested functions, top(), middle() and bottom_a()/bottom_b(), where top() calls middle(), which calls bottom_a() and bottom_b():

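top <- function(){
  # We use profvis::pause() because Sys.sleep() doesn't 
  # show in the flame graph
  profvis::pause(0.1)
  lapply(1:10, function(x){
    x * 10
  })
  middle()
}

middle <- function(){
  1e4 * 9
  bottom_a()
  bottom_b()
}

# Each one pauses for a given amount of time (durations are illustrative)
bottom_a <- function(){ profvis::pause(0.5) }
bottom_b <- function(){ profvis::pause(2) }

profvis::profvis({ top() })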

What you see now is what is called a flame graph: a detailed timing of how your function has run, with a clear decomposition of the call stack. In the top window, you see the expression evaluated, and in the bottom one a detail of the call stack, which looks a little bit like a Gantt diagram. This result reads as follows: the wider the function call, the more time it has taken R to compute this piece of code. At the very bottom is the “top” function (i.e. the function which is directly called in the console), and the more you go up, the deeper you go into the nested function calls.

Here is how to read this graph:

  • On the x axis, the time spent computing the function. Our top() function being the only one executed, it takes the whole recorded time.

  • Then, the second level shows what is called directly inside top(): first, the function pauses, then it makes a series of calls to FUN (which is the internal anonymous function from lapply()), then it calls the middle() function, which spans from around 100 ms to the end of the call. Then comes a detail of middle(), which calls bottom_a() and bottom_b(), each of which pause() for a given amount of time.

FIGURE 15.2: {profvis} flame graph

If you click on the “Data” tab, you will also find another view of the flame graph, where you can read the hierarchy of calls and the time and memory spent on each function call:

FIGURE 15.3: {profvis} data tab

If you are working on profiling the memory usage, you can also use the {profmem} (Bengtsson 2018) package which, instead of focusing on execution time, will record the memory usage of calls.

library(profmem)
p <- profmem({
  x <- raw(1000)
  A <- matrix(rnorm(100), ncol = 10)
})
p
Rprofmem memory profiling of:
    x <- raw(1000)
    A <- matrix(rnorm(100), ncol = 10)

Memory allocations:
       what bytes               calls
1     alloc   264          <internal>
2     alloc   496          <internal>
3     alloc   496          <internal>
4     alloc  1072          <internal>
5     alloc  1048               raw()
6     alloc   280            matrix()
7     alloc   560            matrix()
8     alloc   560            matrix()
9     alloc  1072            matrix()
10    alloc   848 matrix() -> rnorm()
11    alloc  2552 matrix() -> rnorm()
12    alloc   848            matrix()
13    alloc   528          <internal>
14    alloc  1648          <internal>
15    alloc  1648          <internal>
16    alloc  1072          <internal>
17    alloc   256          <internal>
18    alloc   456          <internal>
19    alloc   216          <internal>
20    alloc   256          <internal>
total       16176                    

You can also get the total allocated memory with:

total(p)
[1] 16176

And extract specific values based on the memory allocation:

p2 <- subset(p, bytes > 1000)
p2
Rprofmem memory profiling of:
    x <- raw(1000)
    A <- matrix(rnorm(100), ncol = 10)

Memory allocations:
       what bytes               calls
4     alloc  1072          <internal>
5     alloc  1048               raw()
9     alloc  1072            matrix()
11    alloc  2552 matrix() -> rnorm()
14    alloc  1648          <internal>
15    alloc  1648          <internal>
16    alloc  1072          <internal>
total       10112                    

(Example extracted from {profmem} help page).

Here it is, now you have a tool to identify bottlenecks!

Benchmarking R Code

Identifying bottlenecks is a start, but what to do now? In the next chapter about optimization, we will dive deeper into common strategies for optimizing R and Shiny code. But before that, remember this rule: never start optimizing if you cannot benchmark this optimization. Why? Because developers are not perfect at identifying bottlenecks and estimating whether something is faster or not, and some optimization methods might lead to slower code. Of course, most of the time they will not, but in some cases adopting optimization methods leads to writing slower code, because we have missed a bottleneck in our new code. And without clear documentation of what we are doing, we will miss it, relying only on our intuition as a rough guess of the speed gain.

In other words, if you want to be sure that you are actually optimizing, be sure that you have a basis to compare with.

How to do that? One thing that can be done is to keep an RMarkdown file with your starting point: use this notebook to keep track of what you are doing, by noting where you are starting from (i.e. what the original function you want to optimize is), and compare it with the new one. By using an Rmd, you can document the strategies you have been using to optimize the code, e.g. “switched from for loop to vectorized function”, “changed from x to y”, etc. This will also be helpful for the future: either for you in other projects (you can get back to this document), or for other developers, as it will explain why specific decisions have been made.

To do the timing computation, you can use the {bench} (Hester 2020a) package, which compares the execution time (and other metrics) of two or more expressions. Its mark() function takes a series of named elements, each containing an R expression that will be timed. Note that by default, the mark() function compares the output of each expression, and stops with an error if they are not equal.

Once the timing is done, you will get a data.frame with various metrics about the benchmark.

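# Filling a pre-allocated vector with a for loop
x <- function(size){
  res <- numeric(size)
  for (i in 1:size){
    res[i] <- i * 10
  }
  return(res)
}
# Same computation, vectorized
y <- function(size){
  (1:size) * 10
}
res <- bench::mark(
  `for` = x(1000), 
  vectorized = y(1000), 
  iterations = 1000
)
res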
# A tibble: 2 x 6
  expression    min median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
1 for        50.2µs 53.9µs    16915.    30.3KB        0
2 vectorized  2.6µs  4.6µs   131805.    11.8KB        0

Here, we have empirical evidence that one piece of code is faster than the other: by benchmarking the speed of our code, we are able to determine which function is the fastest.

If you want a graphical analysis, {bench} comes with an autoplot method for {ggplot2} (Wickham, Chang, et al. 2020):

FIGURE 15.4: {bench} autoplot

And, bonus point, {bench} takes the time to check that the two outputs are the same, so that you are sure you are comparing the very same thing, which is another crucial aspect of benchmarking: be sure you are not comparing apples with oranges!
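
The same two-step logic (first check that the outputs match, then time) can be sketched in base R when {bench} is not available. This is a rough illustration only: system.time() is far less precise than bench::mark(), and the number of repetitions (200) is an arbitrary choice made for this sketch.

```r
# Two implementations of the same computation
x <- function(size) {
  res <- numeric(size)
  for (i in seq_len(size)) {
    res[i] <- i * 10
  }
  res
}
y <- function(size) {
  seq_len(size) * 10
}

# 1. Check that both return the same thing before timing anything
same_output <- isTRUE(all.equal(x(1000), y(1000)))
stopifnot(same_output)

# 2. Time many repetitions of each: a single call is too fast
#    for system.time() to measure reliably
time_for <- system.time(for (k in 1:200) x(1000))[["elapsed"]]
time_vec <- system.time(for (k in 1:200) y(1000))[["elapsed"]]

c(`for` = time_for, vectorized = time_vec)
```

The elapsed times will vary from one machine and one run to the next, which is one more reason to record the baseline in a notebook rather than rely on memory.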

15.2.2 Profiling Shiny

Shiny back-end

You can profile Shiny applications using the {profvis} package, just like any other piece of R code. The only thing to note is that if you want to use this function on an app built with {golem} (Guyader et al. 2020), you will have to wrap the run_app() function in a print() function. Long story short, what makes the app run is not the function itself but the printing of the object it returns, so the object returned by run_app() itself cannot be profiled. See the discussion on this issue on {golem} to learn more about this.

Shiny front-end

Google Lighthouse

One other thing that can be optimized when it comes to the user interface is the webpage rendering performance. To do that, we can use standard web development tools: as said several times, a Shiny application IS a web application, so tools that are language agnostic will work with Shiny. There are thousands of tools available to do exactly that, and going through all of them would probably not make a lot of sense.

So, let’s focus on getting started with a basic but powerful tool that comes for free inside your browser: Google Lighthouse, one of the best-known tools for profiling web pages, which is bundled into recent versions of Google Chrome. The nice thing is that this tool not only covers what you see (i.e. what you are actually rendering on your personal computer), but can also audit your app with various configurations, notably on mobile, with low bandwidth and/or mimicking a 3G connection. Being able to audit our application as seen on a mobile device is a real strength: we develop applications on our computers, and might not regularly check how they perform on a mobile. Yet a large portion of web navigation is performed on a mobile or tablet. Already in 2016, Google wrote that “More than half of all web traffic now comes from smartphones and tablets”. Knowing the exact number of visitors that browse through mobile is hard: the web is vast, and not all websites record the traffic they receive. Yet many, if not all, studies of how the web is browsed report the same result: more traffic comes from mobile than from computers (see note 3).

And the advantage of running it in your browser is that it can perform the analysis on locally deployed applications: in other words, you can launch your Shiny application in your R console, open the app in Google Chrome, and run the audit. A lot of online services need a URL to do the audit!

Each result from the audit comes with a series of pieces of advice and changes you can make to your application to make it better, with links to learn more about the specific issue.

And of course, last but not least, you also get the results of the metrics you have “passed”, and it is always a good mood booster to see our app passing some audited points!

Here is a quick introduction to this tool:

  • Open Chrome in incognito mode, so that the page performance is not influenced by any of the installed extensions in your Google Chrome
  • Open your developer console, either by going to View > Developer > Developer tools, by doing right click > Inspect, or with the keyboard shortcut Ctrl + Shift + I (Windows/Linux) or Cmd + Option + I (macOS)
  • Go to the “Audit” tab
  • Configure your report (or leave the default)
  • Click on “Generate Report”

Note that you can also install a command line tool with npm install -g lighthouse (see note 4), then run lighthouse http://urlto.audit: it will produce either a JSON (if asked) or an HTML report (the default).

FIGURE 15.5: Launching LightHouse audit from the Google Chrome

Google Lighthouse then computes a series of analyses of your webpage.

FIGURE 15.6: LightHouse audit results

Once the audit is finished, you have some basic but useful indications about your application:

  • Performance. This metric mostly analyzes the rendering time of the page: for example, how much time it takes to load the app in full (that is to say, from the first byte received to the app being fully ready to be used), the time between the very first call to the server and the very first response, etc. With {shiny} (Chang et al. 2020), you will probably get low performance here, notably due to the fact that {shiny} is serving external dependencies that you might not be able to control. For example, the report from {hexmake} (Fay 2020g) suggests that we “Eliminate render-blocking resources”, and most of them are not controlled by the Shiny developer: they come bundled with shiny::fluidPage itself.

  • Accessibility. Google Lighthouse performs a series of tests about accessibility (see our chapter about accessibility for more information).

  • Best practices bundles a list of “misc” best practices around web applications.

  • SEO: search engine optimization, or how will your app perform when it comes to search engine indexation.

  • Progressive Web App (PWA): a PWA is an app that can run on any device, “reaching anyone, anywhere, on any device with a single codebase”. Google audits your application to see if it fits with this idea.

Profiling web pages is a wide topic, and a lot of things can be done to enhance the global page performance. That being said, if you have a limited time to invest in optimizing the front-end performance of the application, Google Lighthouse is a perfect tool, and can be your go-to audit tool for your application.

And if you want to do it from R, the npm lighthouse module can output the audit in JSON, which can then be brought back to R!

lighthouse --output json --output-path data-raw/output.json http://localhost:2811

Then, being a JSON file, you can read it from R:

lighthouse_report <- jsonlite::read_json("data-raw/output.json")
lighthouse_report$audits$`first-meaningful-paint`$displayValue
[1] "5.4 s"

The results are contained in the audits section of this object, and each of these sub-elements contains a description field, detailing what the metric means.

Here are, for example, some of the results focused on performance, with their respective descriptions:

“First Meaningful Paint”

lighthouse_report$audits$`first-meaningful-paint`$description
[1] "First Meaningful Paint measures when the primary content of a page is visible. [Learn more]("
lighthouse_report$audits$`first-meaningful-paint` %>%
  .[c("title", "score", "displayValue")] %>%
  as.data.frame()
                   title score displayValue
1 First Meaningful Paint  0.23        5.4 s

“Speed Index”

lighthouse_report$audits$`speed-index`$description
[1] "Speed Index shows how quickly the contents of a page are visibly populated. [Learn more]("
lighthouse_report$audits$`speed-index` %>%
  .[c("title", "score", "displayValue")] %>%
  as.data.frame()
        title score displayValue
1 Speed Index  0.56        5.4 s

“Estimated Input Latency”

lighthouse_report$audits$`estimated-input-latency`$description
[1] "Estimated Input Latency is an estimate of how long your app takes to respond to user input, in milliseconds, during the busiest 5s window of page load. If your latency is higher than 50 ms, users may perceive your app as laggy. [Learn more]("
lighthouse_report$audits$`estimated-input-latency` %>%
  .[c("title", "score", "displayValue")] %>%
  as.data.frame()
                    title score displayValue
1 Estimated Input Latency     1        10 ms

“Total Blocking Time”

lighthouse_report$audits$`total-blocking-time`$description
[1] "Sum of all time periods between FCP and Time to Interactive, when task length exceeded 50ms, expressed in milliseconds."
lighthouse_report$audits$`total-blocking-time` %>%
  .[c("title", "score", "displayValue")] %>%
  as.data.frame()
                title score displayValue
1 Total Blocking Time     1        30 ms

“Time to first Byte”

lighthouse_report$audits$`time-to-first-byte`$description
[1] "Time To First Byte identifies the time at which your server sends a response. [Learn more]("
lighthouse_report$audits$`time-to-first-byte` %>%
  .[c("title", "score", "displayValue")] %>%
  as.data.frame()
                                 title score              displayValue
1 Server response times are low (TTFB)     1 Root document took 200 ms
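
To look at several audits at once, the per-audit extraction can be wrapped in a loop. The following is a minimal base-R sketch: mock_report is a hand-built, hypothetical stand-in that only mimics the audits structure described above (the real object would come from jsonlite::read_json()), and its values are copied from the results already shown.

```r
# Hand-built stand-in for the parsed report (assumption: the real object
# comes from jsonlite::read_json()); values copied from the results above
mock_report <- list(
  audits = list(
    `first-meaningful-paint` = list(
      title = "First Meaningful Paint", score = 0.23, displayValue = "5.4 s"
    ),
    `speed-index` = list(
      title = "Speed Index", score = 0.56, displayValue = "5.4 s"
    ),
    `total-blocking-time` = list(
      title = "Total Blocking Time", score = 1, displayValue = "30 ms"
    )
  )
)

wanted <- c("title", "score", "displayValue")

# One single-row data.frame per audit, keeping only the fields of interest
rows <- lapply(mock_report$audits, function(audit) {
  as.data.frame(audit[wanted], stringsAsFactors = FALSE)
})
summary_table <- do.call(rbind, rows)

# Audits sorted from the lowest score to the highest
summary_table[order(summary_table$score), ]
```

Sorting by score puts the audits that most need attention at the top. A real report contains many more audits, some of them without a displayValue, so the subsetting step would need to handle missing fields.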

Google Lighthouse also comes with a continuous integration tool, so that you can use it as a regression testing tool for your application. To know more, feel free to read the documentation!

Side note on minification

Chances are that right now you are not using minification in your Shiny application. Minification is the process of removing unnecessary characters from files, without changing the way the code works, in order to make the file size lighter. The general idea is that line breaks, spaces and a specific set of characters are used inside scripts for human readability, and are not useful when it comes to the way a computer reads a piece of code. So why not remove them when the files are served as part of a larger piece of software? This is what minification does.

Here is an example of how minification works, taken from Empirical Study on Effects of Script Minification and HTTP Compression for Traffic Reduction (Sakamoto et al. 2015):

var sum = 0;
for ( var i = 0; i <=10; i ++ ) {
  sum += i ;
}
alert( sum ) ;

is minified into:

var sum=0;for(var i=0;i<=10;i++){sum+=i};alert(sum);

Both these code blocks behave the same way, but the second one will be lighter when saved to a file: this is the very core principle of minification. It is something pretty common to do when building web applications: on the web, every byte counts, so the lighter your external resources the better. Minification is important because the heavier your resources, the longer your application will take to launch, and:

  • Page launch time is crucial when it comes to ranking the pages on the web

  • The heavier the resources, the longer it will be to launch the application on a mobile, notably if users are visiting your application from a 3G/4G network

And do not forget,

Extremely high-speed network infrastructures are becoming more and more popular in developed countries. However, we still face crowded and low-speed Wi-Fi environments on airport, cafe, international conference, etc. Especially, a network environment of mobile devices requires efficient usage of network bandwidth.

Taken from (Sakamoto et al. 2015)
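
To make the size difference between the two JavaScript snippets shown earlier concrete, here is a small base-R sketch that counts the bytes of each version. The snippets are stored as plain strings for the illustration, and the variable names are made up for this example.

```r
# The two snippets from the example above, stored as strings
original <- "var sum = 0;
for ( var i = 0; i <=10; i ++ ) {
  sum += i ;
}
alert( sum ) ;"
minified <- "var sum=0;for(var i=0;i<=10;i++){sum+=i};alert(sum);"

# Size in bytes of each version
size_original <- nchar(original, type = "bytes")
size_minified <- nchar(minified, type = "bytes")

# Share of bytes saved by minification
saved <- 1 - size_minified / size_original

c(
  original = size_original,
  minified = size_minified,
  saved = round(saved, 2)
)
```

On a snippet this small the absolute saving is a handful of bytes, but the same ratio applied to a large JavaScript or CSS dependency is what makes minification worthwhile.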

To minify JavaScript, HTML and CSS files from R, you can use the {minifyr} (Fay 2020h) package, that wraps the node-minify NodeJS library. For example, compare the size of this file from {shiny}:

  system.file("www/shared/shiny.css", package = "shiny")

To its minified version:

minified <- minifyr::minifyr_css_cleancss(
  system.file("www/shared/shiny.css", package = "shiny"), 

That might not seem like much (a couple of KB) at a small scale, but as it can be done automatically, why not leverage these small performance gains when building larger applications? Of course, minification will not suddenly make your application blazing fast, but it is something you should consider when deploying an application to production, notably if you use a lot of packages with interactive widgets: they might contain CSS and JavaScript files that are not minified.

Note that {shiny} files are minified by default, so you will not have to re-minify them. But most packages that extend {shiny} are not, so minifying the CSS and JavaScript from these packages might help you win some points on the Google LightHouse report!

To do this automatically, you can add the {minifyr} commands to your deployment, be it on your CI/CD platform, or as a Dockerfile step. {minifyr} comes with a series of functions to do that:

  • minify_folder_css(), minify_folder_js(), minify_folder_html() and minify_folder_json() do a bulk minification of the files found in a folder that match the extension
  • minify_package_js(), minify_package_css(), minify_package_html() and minify_package_json() will minify the CSS and JavaScript files contained inside a package installed on the machine

Here is what it can look like inside a Dockerfile (see note 5):

FROM rocker/shiny-verse:3.6.3

RUN apt-get -y install curl
RUN curl -sL | bash -
RUN apt-get install -y nodejs

RUN Rscript -e 'remotes::install_github("colinfay/minifyr")'
RUN Rscript -e 'remotes::install_cran("cicerone")'
RUN Rscript -e 'library(minifyr);minifyr_npm_install(TRUE);minify_package_js("cicerone", minifyr_js_uglify)'


Bengtsson, Henrik. 2018. Profmem: Simple Memory Profiling for R.

Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2020. Shiny: Web Application Framework for R.

Chang, Winston, Javier Luraschi, and Timothy Mastny. 2019. Profvis: Interactive Visualizations for Profiling R Code.

Eddelbuettel, Dirk, Romain Francois, JJ Allaire, Kevin Ushey, Qiang Kou, Nathan Russell, Douglas Bates, and John Chambers. 2020. Rcpp: Seamless R and C++ Integration.

Fay, Colin. 2020g. Hexmake: Hex Stickers Maker.

Fay, Colin. 2020h. Minifyr: Minify Css, Html and Javascript Files.

Guyader, Vincent, Colin Fay, Sébastien Rochette, and Cervan Girard. 2020. Golem: A Framework for Robust Shiny Applications.

Hester, Jim. 2020a. Bench: High Precision Timing of R Expressions.

Lemaire, Maude. 2020. Refactoring at Scale. Henry Holt.

Sakamoto, Yasutaka, Shinsuke Matsumoto, Seiki Tokunaga, Sachio Saiki, and Masahide Nakamura. 2015. “Empirical Study on Effects of Script Minification and HTTP Compression for Traffic Reduction.” In 2015 Third International Conference on Digital Information, Networking, and Wireless Communications (DINWC). IEEE.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2020. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics.

  1. {utils} also comes with a function called Rprof(), but we will not be examining this one here, as {profvis} provides a more user-friendly and enhanced interface to this profiling function.

  2. Do not forget to add {} inside profvis({}) if you want to write several lines of code.

  3. broadbandsearch, for example, reports a 53.3% share for mobile browsing.

  4. As it is a NodeJS application, you will need to have NodeJS installed on your machine.

  5. Note that you will need to install NodeJS inside the container.