Packages:
suppressPackageStartupMessages({
  library(tidyverse)
  library(skimr)
})
Data:
# data from tidytuesday
# https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-03-31/readme.md
brewing_materials <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewing_materials.csv')
beer_taxed <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_taxed.csv')
brewer_size <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewer_size.csv')
beer_states <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_states.csv')

barrels_to_gallons <- function(total_barrels) {
  # A barrel of beer is 31 gallons
  gallons <- total_barrels * 31
  return(gallons)
}

barrels_to_gallons(3.65)
## [1] 113.15
formals(barrels_to_gallons)
## $total_barrels
body(barrels_to_gallons)
## {
##     gallons <- total_barrels * 31
##     return(gallons)
## }
The output of body() in the previous slide did not contain the comment. Use attr() to print the function's other attributes. Here, srcref prints the source code, including comments and formatting:
attr(barrels_to_gallons, "srcref")
## function(total_barrels) {
##   # A barrel of beer is 31 gallons
##   gallons <- total_barrels * 31
##   return(gallons)
## }
funs <- list(
  gallons_est = function(barrels) barrels * 31,
  gallons_real = function(barrels) barrels * 31.657
)

funs$gallons_real(10)
## [1] 316.57
If the arguments are already contained in a data structure, you can invoke the function on them with do.call():
args <- brewer_size %>%
  select(total_barrels) %>%
  top_n(3) %>%
  as.list()

do.call(barrels_to_gallons, args)
## [1] 6106047525 6051553498 6080563272
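A self-contained sketch of the same idea that does not depend on the downloaded data. The argument values here are made up for illustration:

```r
# Arguments collected in a list, as they might come out of a data pipeline
args <- list(1:10, na.rm = TRUE)

# do.call() spreads the list elements out as the arguments of mean(),
# so this is equivalent to mean(1:10, na.rm = TRUE)
result <- do.call(mean, args)
result
```

Named list elements are matched to named arguments, unnamed ones by position.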
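To compose multiple function calls you can nest them, save intermediate results, or use the pipe:
# helper functions, assumed here to be defined as in Advanced R
square <- function(x) x^2
deviation <- function(x) x - mean(x)

x <- runif(100)

# 1. nest the function calls
sqrt(mean(square(deviation(x))))

# 2. save intermediate results as variables
out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out

# 3. use the pipe
x %>%
  deviation() %>%
  square() %>%
  mean() %>%
  sqrt()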
"The focus is on what’s being done (the verbs), rather than on what’s being modified (the nouns)."
x <- 10
g <- function() {
  x <- 20
  x
}
g()
## [1] 20
Makes sense. What about?
x <- "IPA's taste and smell like dirty socks"
f <- function() x
g <- function() {
  x <- "I like the taste of dirty socks and therefore IPA's"
  f()
}
g() # what does this return? PS: it's the correct answer
R is lexically scoped, and therefore returns the correct answer: "IPA's taste and smell like dirty socks".
What is a scope? Scope refers to the places in a program where a variable is visible and can be referenced.
Under dynamic scoping:
a variable is bound to the most recent value assigned to that variable, i.e. the most recent assignment during the program’s execution.
in other words, the program would return the most recent assignment during the program's execution, i.e. "I like the taste of dirty socks and therefore IPA's"
Under lexical scoping:
the scope of a variable is determined by the lexical (i.e. textual) structure of a program.
the use of x on line 2 is "within the scope" created by the definition on line 1, so the program returns "IPA's taste and smell like dirty socks".
Most programming languages are lexically scoped
R uses lexical scoping: it looks up the values of names based on how a function is defined, not how it is called.
R's lexical scoping follows 4 rules:
Understanding these will help to use more advanced functional programming tools
Names defined inside a function mask names defined outside a function.
x <- 10
y <- 20
fun <- function() {
  x <- 1
  y <- 2
  c(x, y)
}
fun()
## [1] 1 2
🏠: variables defined inside a function are only valid inside its body, so they are unaffected by variables with the same name outside the function. This helps you prevent coding mistakes.
a <- 419
fun <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  a
}
fun()
## [1] 420
fun() # every run is a fresh start!
## [1] 420
We get the same value because each run of the function is completely independent of the previous one: a function cannot tell what happened the last time it ran. We'll see how to modify this behavior in later chapters.
R looks for values when the function is run, not when the function is created.
This behavior is, as Hadley calls it, "annoying": if you make a spelling mistake in your code, you won't get an error message when you create the function, only (at best) when you run it.
You can use codetools::findGlobals() to list a function's unbound symbols, i.e. its external dependencies. To go further, you can set the function's environment to emptyenv(), an environment that contains nothing, so that no name outside the function can be found.
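A minimal sketch of dynamic lookup and of the two tools mentioned above. The function g and the variable beer are made-up names for illustration; codetools ships with standard R installations:

```r
# `beer` is not defined inside the function, so R looks it up when g() runs
g <- function() paste("I would like a", beer)

beer <- "stout"
first <- g()

beer <- "pilsner"   # changing the global changes what g() returns
second <- g()

# List the names g() needs from outside its own body
globals <- codetools::findGlobals(g)

# With an empty environment nothing external can be found, not even paste()
environment(g) <- emptyenv()
failed <- inherits(try(g(), silent = TRUE), "try-error")
```

After the last step even base functions are out of reach, which is why this is a diagnostic trick rather than something to leave in place.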
Arguments to functions are evaluated lazily so they are evaluated only as needed:
f <- function(a, b) {
  a^2
}
f(2)
## [1] 4
This function never actually uses the argument b, so calling f(2) will not produce an error because the 2 gets positionally matched to a.
Another example:
f <- function(a, b) {
  print(a)
  print(b)
}
f(45)
## [1] 45
## Error in print(b) : argument "b" is missing, with no default
45 was printed before the error was triggered. Why? Because b did not have to be evaluated until after print(a).
Once the function tried to evaluate print(b), it had to throw an error.
Lazy evaluation is powered by a data structure called a promise
A promise has 3 components:
An expression which gives rise to the delayed computation
An environment where the expression should be evaluated, i.e. the environment where the function is called.
A value, which is computed and cached the first time a promise is accessed when the expression is evaluated in the specified environment. This ensures that the promise is evaluated at most once
🏠: Lazy evaluation via promises allows you to include intensive computations in function arguments which will only be evaluated when needed.
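The "at most once" behavior can be checked with a small sketch. All names here (n_evals, slow_value, use_twice, use_never) are hypothetical, and the counter stands in for an expensive computation:

```r
n_evals <- 0

# Stand-in for an expensive computation; it counts how often it actually runs
slow_value <- function() {
  n_evals <<- n_evals + 1
  42
}

use_twice <- function(x) c(x, x)          # accesses the promise twice
use_never <- function(x) "x not needed"   # never touches the promise

never <- use_never(slow_value())
evals_after_never <- n_evals   # the promise was never forced

twice <- use_twice(slow_value())
evals_after_twice <- n_evals   # forced once, then the cached value is reused
```

The counter stays at 0 after use_never() and only reaches 1 after use_twice(), even though x is referenced twice there.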
gallons <- function(x, y) {
  result <- x * y
  print(paste(x, "barrels", "equal", result, "gallons of beer"))
}
gallons(8, 31)
## [1] "8 barrels equal 248 gallons of beer"
The same function, with a default argument:
gallons <- function(x, y = 31) {
  result <- x * y
  print(paste(x, "barrels", "equal", result, "gallons of beer"))
}
gallons(8)
## [1] "8 barrels equal 248 gallons of beer"
Here, the y argument is optional and takes the default value unless you specify otherwise.
Because of lazy evaluation, default values can be defined in terms of:
other arguments
or variables defined later in the function
Even though many base functions define default arguments this way, Hadley does not recommend it because:
such defaults are hard to read
you need to know the exact order of evaluation to predict what will be returned
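A short sketch of both kinds of default, modeled on the example in Advanced R. The function name defaults_demo is made up:

```r
defaults_demo <- function(x = 1, y = x * 2, z = a + b) {
  # a and b are created after the arguments are declared, but before z is used
  a <- 10
  b <- 100

  # y is computed from another argument; z from variables defined just above
  c(x, y, z)
}

res <- defaults_demo()
res
```

This works only because z is not evaluated until c(x, y, z) runs. Moving that line above the assignments to a and b would make the call fail, which is the readability problem described above.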
You can use missing() to check whether an argument's value comes from the user or the default:
fun <- function(x = 10) {
  list(missing(x), x)
}
str(fun())
## List of 2
##  $ : logi TRUE
##  $ : num 10
Returns TRUE because the argument's value comes from the default.
str(fun(10))
## List of 2
##  $ : logi FALSE
##  $ : num 10
Returns FALSE because the argument's value comes from the user.
green.plot <- function(x, y, ...) {
  plot(x, y, col = "green", ...)
}
green.plot(1:5, 1:5, xlab = "Are Very Useful", ylab = "dot-dot-dot")
We passed xlab and ylab thanks to the ellipsis, even though we didn't define them in the function.
Functions can have a special argument, ..., which lets them take any number of additional arguments.
When is it used?
to extend another function when you don't want to copy the entire argument list of the original function
when the number of arguments passed to the function cannot be known in advance.
❗: Any arguments after the ... must be named explicitly and cannot be partially matched.
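A minimal sketch of the second use case and of the naming rule. The function names count_args and beer_list are hypothetical:

```r
# Unknown number of arguments: capture them with list(...)
count_args <- function(...) length(list(...))
n <- count_args("IPA", "stout", "sour")

# `sep` comes after ..., so it must be spelled out in full
beer_list <- function(..., sep = ", ") paste(..., sep = sep)

full <- beer_list("IPA", "stout", sep = " / ")

# `se` is NOT partially matched to `sep`; it is swallowed by ... and
# pasted as one more string, with the default separator still in use
partial <- beer_list("IPA", "stout", se = " / ")
```

Note that the misspelled argument produces no error or warning, so typos in arguments after ... can go unnoticed.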
Success or failure are the two ways by which a function "exits"
There are many types of return values:
We're being explicit here by using return()
region_2 <- function(state) {
  northeast <- c("NY", "MA", "AL", "VT", "CT")
  if (state %in% northeast) {
    return("all good") # explicit because we call return()
  } else {
    return("not in the northeast")
  }
}
region_2("VT")
## [1] "all good"
The last evaluated expression is the return value:
region <- function(state) {
  northeast <- c("NY", "MA", "AL", "VT", "CT")
  if (state %in% northeast) {
    "all good" # implicit because we don't call return()
  } else {
    "not in the northeast"
  }
}
region("CA")
## [1] "not in the northeast"
Calling a function at the console prints its return value automatically:
fun <- function() 1
fun()
## [1] 1
But you can prevent that by wrapping the last value in invisible():
fun <- function() invisible(1)
fun()
You can always call print() or wrap the whole function call in parentheses to verify that the value still exists.
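One way to check this programmatically, as a small sketch, is base R's withVisible(), which returns the value together with its visibility flag:

```r
fun <- function() invisible(1)

# withVisible() reports both the returned value and whether it would print
res <- withVisible(fun())
res$value
res$visible

# The value can still be assigned and used like any other return value
y <- fun() + 1
```

The most common function that returns invisibly is the assignment operator itself, which is why a <- b <- 1 works without printing anything.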
Function to compute the confidence interval for the mean:
d <- rpois(25, 8)
d

GetCI <- function(x, level = 0.95) {
  if (level <= 0 || level >= 1) {
    stop("The 'level' argument must be greater than 0 and less than 1")
  }
  if (level < 0.5) {
    warning("Confidence levels are often close to 1, e.g. 0.95")
  }
  m <- mean(x)
  n <- length(x)
  SE <- sd(x) / sqrt(n)
  upper <- 1 - (1 - level) / 2
  ci <- m + c(-1, 1) * qt(upper, n - 1) * SE
  return(list(mean = m, se = SE, ci = ci))
}

GetCI(d, 99)
Sorry.
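A call like GetCI(d, 99) is the "failure" kind of exit. As a hedged sketch, here is how both kinds of exit can be observed with tryCatch(); check_level is a hypothetical helper that reuses only the validation step from GetCI:

```r
# Hypothetical helper: the same range check as in GetCI, nothing else
check_level <- function(level) {
  if (level <= 0 || level >= 1) {
    stop("The 'level' argument must be greater than 0 and less than 1")
  }
  level
}

# Success: the function returns a value
ok <- tryCatch(check_level(0.95), error = function(e) conditionMessage(e))

# Failure: stop() signals an error, and the handler returns its message instead
bad <- tryCatch(check_level(99), error = function(e) conditionMessage(e))
```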
Not all function calls are the same. There are 4 types:
prefix: the function name comes before its arguments (the most common)
infix: the function name comes in between its arguments (common in math operators and user-defined functions)
replacement: functions that replace values by assignment, like names(df) <- c("a", "b", "c").
special: functions like [[, if, and for.
🏠: there are 4 forms but everything can be written in prefix form.
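A brief sketch of what "everything can be written in prefix form" looks like in practice. The variable names are made up for illustration:

```r
x <- 3
y <- 4

# infix form and its prefix equivalent
infix  <- x + y
prefix <- `+`(x, y)

# special form and its prefix equivalent
beers <- c("IPA", "stout", "sour")
special        <- beers[[2]]
special_prefix <- `[[`(beers, 2)

# replacement form and its prefix equivalent
v <- 1:3
names(v) <- c("a", "b", "c")
w <- `names<-`(1:3, c("a", "b", "c"))
```

The backticks are needed because names such as + and [[ are not syntactically valid identifiers on their own.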
R comes with a number of built-in infix operators: :, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, and <<-.
But you can create your own!
`%+%` <- function(a, b) paste0(a, b)
"Sour beers " %+% "are elite"
## [1] "Sour beers are elite"
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
Replacement functions are used by placing the function call on the left side of <-:
x <- 1:10
second(x) <- 5L
x
## [1] 1 5 3 4 5 6 7 8 9 10
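Replacement functions can also take extra arguments, which go between x and value. A small sketch following the pattern shown in Advanced R (the name modify<- is used there too):

```r
# Additional arguments are placed between `x` and `value`
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

x <- 1:10
modify(x, 1) <- 10L   # equivalent to x <- `modify<-`(x, 1, 10L)
x
```

The first argument must be called x and the last one value for the replacement syntax to work.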
More on this in the book!
Good introductory overview on functions in R: https://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/functions.pdf
On lexical scoping in R