16.2 Caching Elements
16.2.1 What is Caching
Store resource-intensive results
So they can be reused without recomputation
Downside: you can’t cache ’em all
Only cache if the thing:
- would be identical each time it is made (for a given input)
Good things to cache:
- Plots
- Database queries
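A rough way to test the "identical each time" criterion is to call a function twice with the same input and compare the results. The sketch below uses base R only; `mean_of_squares`, `noisy_mean` and `is_repeatable` are hypothetical names made up for illustration. Passing the check does not prove a function is safe to cache, but failing it is a clear warning sign.

```r
# Deterministic: the same input always gives the same output,
# so the result is a reasonable candidate for caching
mean_of_squares <- function(x) mean(x^2)

# Not deterministic: the result depends on the random number generator,
# so a cached value would silently freeze one random draw
noisy_mean <- function(x) mean(x) + rnorm(1)

# Hypothetical helper: call f twice with the same arguments and compare
is_repeatable <- function(f, ...) identical(f(...), f(...))

repeatable_det <- is_repeatable(mean_of_squares, 1:10)
repeatable_rnd <- is_repeatable(noisy_mean, 1:10)
```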
16.2.2 Native caching in R
{R.cache} and {memoise} work in a similar way:
- Wrap a function with a higher-order function (new_func <- memoise::memoise(initial_func))
- Call the new function with some arguments (new_func(x = 1, y = 2))
  - it does the expensive computation
  - then stores the result
- Call the new function again
  - if you’ve used the args before, it will return immediately with the stored result
The arguments are converted to a look-up key
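To make the look-up-key idea concrete, here is a minimal hand-rolled memoiser in base R. It is a sketch for illustration only, not how {memoise} or {R.cache} are implemented: `simple_memoise` is a hypothetical helper, and building the key with `deparse()` is an assumption chosen for simplicity.

```r
# Hypothetical helper: wrap f so that repeated calls reuse stored results
simple_memoise <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    # Convert the arguments into a character look-up key
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      # Cache miss: do the expensive computation, then store the result
      assign(key, f(...), envir = cache)
    }
    # Return the stored result for this key
    get(key, envir = cache, inherits = FALSE)
  }
}

# Count how often the underlying function really runs
n_calls <- 0
slow_square <- function(x) {
  n_calls <<- n_calls + 1
  x^2
}

fast_square <- simple_memoise(slow_square)
first  <- fast_square(4)  # new key: computed and stored
second <- fast_square(4)  # same key: served from the cache
third  <- fast_square(5)  # different key: computed again
```

After these three calls, `slow_square` has only run twice.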
library(memoise)
# tic/toc are used for timing
library(tictoc)
sleep_and_return_time <- function(seconds = 1) {
  Sys.sleep(seconds)
  return(Sys.time())
}
msleep_and_return_time <- memoise(sleep_and_return_time)
# first call is sloooooooow
tic()
msleep_and_return_time(10)
toc()
# second call is fast
tic()
msleep_and_return_time(10)
toc()
But... why is sleep_and_return_time a bad example for a cacheable function?
The book has a more reproducible example using database queries (there’s a db-in-shiny example below, so we don’t cover this example)
{memoise} keeps the cache in memory by default; it can also store the cache on disk, and you can set where on the file-system it is stored
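A minimal sketch of choosing the cache location, assuming {memoise} is installed. `slow_double` and the `memoise-demo` directory are hypothetical, and the cache lives under `tempdir()` only so the example cleans up after itself; a real app would pick a persistent path.

```r
library(memoise)

# Stand-in for an expensive computation
slow_double <- function(x) {
  Sys.sleep(0.2)
  x * 2
}

# Hypothetical cache location: a throwaway directory under tempdir()
cache_path <- file.path(tempdir(), "memoise-demo")
disk_cache <- cache_filesystem(cache_path)

m_double <- memoise(slow_double, cache = disk_cache)

res1 <- m_double(21)  # computed, then written to the cache directory
res2 <- m_double(21)  # same arguments: read back from the cache

# The cached result is stored as a file in the chosen directory
n_files <- length(list.files(cache_path))

# Clear the cached results for this function
forget(m_double)
```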
16.2.3 Caching in {shiny}
An app:
library(shiny)
ui <- function(){
  tagList(
    # The user can select one of the cut from ggplot2::diamonds,
    # {shiny} will then query the SQL database to retrieve the
    # first rows of the result
    selectInput("cut", "cut", unique(ggplot2::diamonds$cut)),
    tableOutput("tbl")
  )
}
srv_builder <- function(con) {
  server <- function(
    input,
    output,
    session
  ){
    # Rendering the table of the SQL call
    output$tbl <- renderTable({
      # Using a memoised function allows to prevent from
      # calling the SQL database every time the user inputs
      # a change
      memoised_fct_sql(input$cut, con)
    })
  }
  server
}
# We create an in-memory database using SQLite
con <- DBI::dbConnect(
  RSQLite::SQLite(),
  dbname = ":memory:"
)
# Writing a large dataset to the db
DBI::dbWriteTable(
  con,
  "diams",
  # This table will have 539400 rows
  dplyr::bind_rows(
    purrr::rerun(10, ggplot2::diamonds)
  )
)
shinyApp(ui, server = srv_builder(con))
What haven’t we defined here?
library(memoise)
fct_sql <- function(cut, con){
  # NEVER EVER SPRINTF AN SQL CODE LIKE THAT
  # IT'S SENSITIVE TO SQL INJECTIONS, WE'RE
  # DOING IT FOR THE EXAMPLE
  cli::cat_rule("Calling the SQL db")
  results <- DBI::dbGetQuery(
    con,
    sprintf(
      "SELECT * FROM diams WHERE cut = '%s'",
      cut
    )
  )
  head(results)
}
# Using a local cache
cache_dir <- cache_filesystem("cache")
memoised_fct_sql <- memoise(fct_sql, cache = cache_dir)
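The comment in the code above warns against building SQL with sprintf(). A safer variant passes the value through the `params` argument of `DBI::dbGetQuery()`, so it is bound as data rather than pasted into the statement. The sketch below assumes {DBI} and {RSQLite} are installed; `fct_sql_safe` and the three-row `diams` table are hypothetical stand-ins, not the book's code.

```r
library(DBI)

# A tiny stand-in for the 'diams' table, in an in-memory SQLite db
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(
  con,
  "diams",
  data.frame(
    cut = c("Ideal", "Good", "Ideal"),
    price = c(326, 327, 334)
  )
)

# Hypothetical injection-safe version of fct_sql:
# the '?' placeholder is filled in by the database driver
fct_sql_safe <- function(cut, con) {
  results <- dbGetQuery(
    con,
    "SELECT * FROM diams WHERE cut = ?",
    params = list(cut)
  )
  head(results)
}

ideal <- fct_sql_safe("Ideal", con)

# A malicious value is treated as a plain string and matches no rows
injected <- fct_sql_safe("x' OR '1'='1", con)

dbDisconnect(con)
```

This function could be wrapped with memoise() in the same way as fct_sql.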
App workflow:
- Start the app
- set input$cut to a new value
- time how long it takes to render the new table
- set input$cut to an unused value
- note that it takes the same length of time
- set input$cut to the first selected value
- note that it is way faster than the original call
{shiny} already has caching functionality:
- shiny::bindCache()
- [older] shiny::renderCachedPlot()
Note from the RStudio shiny blog:
“As of Shiny 1.6.0, we recommend using renderPlot() with bindCache() instead”
Example shiny app
- taken from the book
- but rewritten to use renderPlot() %>% bindCache() rather than renderCachedPlot()
library(shiny)
# magrittr provides the %>% pipe used below
library(magrittr)
ui <- function() {
  tagList(
    # We select a data.frame to plot
    selectInput(
      "tbl",
      "Table",
      c("iris", "mtcars", "airquality")
    ),
    # This plotOutput will be cached
    plotOutput("plot")
  )
}
server <- function(
  input,
  output,
  session
) {
  # The cache mechanism is made available by 'bindCache'
  output$plot <- renderPlot({
    # Plotting the selected data.frame
    plot(get(input$tbl))
  }) %>%
    bindCache(
      # List here all the reactive expression that will
      # be used as cache key when running the app,
      # you will see that the first time you plot one
      # graph, it takes a couple of seconds,
      # but the second time, it's almost
      # instantaneous
      input$tbl
    )
}
shinyApp(ui, server)
Notes:
- you can also cache on remote storage (e.g., S3)
- you only have limited space - old cached values will be dropped
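The "limited space" note can be illustrated with {cachem}, the package that provides cache objects for bindCache(). This is a sketch assuming {cachem} is installed; the two-entry limit and the stored strings are made up so that eviction is easy to see. Keys must be lowercase letters and digits only.

```r
library(cachem)

# Hypothetical tiny cache: at most 2 entries,
# with least-recently-used eviction
small_cache <- cache_mem(max_n = 2, evict = "lru")

small_cache$set("iris", "plot for iris")
small_cache$set("mtcars", "plot for mtcars")
small_cache$set("airquality", "plot for airquality")

# Make sure the size limit has been enforced
small_cache$prune()
n_kept <- small_cache$size()

# Asking for a key that is not cached returns a 'key missing' sentinel
missing_value <- small_cache$get("notcached")
not_found <- is.key_missing(missing_value)

# A cache object like this can be handed to shiny, e.g.
# shiny::shinyOptions(cache = small_cache)
```

Only two of the three entries survive, so a value that was dropped is simply recomputed the next time it is requested.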