
Advanced R by Hadley Wickham

Chapter 6: Functions

Asmae Toumi

@asmae_toumi

2020-07-26

1 / 35

Prerequisites

Packages:

suppressPackageStartupMessages({
  library(tidyverse)
  library(skimr)
})

Data:

# data from tidytuesday
# https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-03-31/readme.md
brewing_materials <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewing_materials.csv')
beer_taxed <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_taxed.csv')
brewer_size <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewer_size.csv')
beer_states <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_states.csv')
2 / 35

Function fundamentals

  • Functions are objects, just as vectors are objects.
  • Functions can be broken down into three components: arguments (also called formals), body (the code inside the function), and environment (the data structure that determines how the function finds the values associated with names).
  • The arguments and body are always specified explicitly, but the environment is implicit: it is based on where you defined the function.
3 / 35

Function fundamentals (2)

barrels_to_gallons <- function(total_barrels) {
  # A barrel of beer is 31 gallons
  gallons <- total_barrels * 31
  return(gallons)
}
barrels_to_gallons(3.65)
## [1] 113.15
formals(barrels_to_gallons)
## $total_barrels
body(barrels_to_gallons)
## {
##     gallons <- total_barrels * 31
##     return(gallons)
## }
4 / 35

Function fundamentals (3)

  • Like other R objects, functions can have attributes.
  • Notice that the output of body() on the previous slide does not include the comment.
  • You can use attr() to inspect a function's other attributes. Here, srcref (the source reference) holds the source code exactly as it was written, including comments and formatting. It is only kept when options(keep.source = TRUE), which is the default in interactive sessions:
attr(barrels_to_gallons, "srcref")
## function(total_barrels) {
##   # A barrel of beer is 31 gallons
##   gallons <- total_barrels * 31
##   return(gallons)
## }
5 / 35

Function fundamentals (4)

  • You don't have to bind a function to a name. Functions without a name are called anonymous functions; they're handy when it isn't worth the effort to come up with a name.
  • You can put functions in a list:
funs <- list(
  gallons_est = function(barrels) barrels * 31,
  gallons_real = function(barrels) barrels * 31.657
)
funs$gallons_real(10)
## [1] 316.57
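A minimal sketch of both ideas. The barrel counts are made up for illustration; sapply() is base R and simply calls the anonymous function once per element.

```r
# Hypothetical barrel counts, for illustration only
barrels <- c(1, 2.5, 10)

# An anonymous function: written inline and never bound to a name
sapply(barrels, function(b) b * 31)   # 31, 77.5 and 310

# The same two conversions as on this slide, stored in a list
funs <- list(
  gallons_est  = function(barrels) barrels * 31,
  gallons_real = function(barrels) barrels * 31.657
)

# Another anonymous function calls every element of the list with the same input
sapply(funs, function(f) f(10))       # gallons_est = 310, gallons_real = 316.57
```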
6 / 35

Function fundamentals (5)

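If the arguments are already stored in a data structure such as a list, you can call the function with do.call():

args <- brewer_size %>%
  select(total_barrels) %>%
  top_n(3, wt = total_barrels) %>% # keep the rows with the 3 largest values
  as.list() # list(total_barrels = <those 3 values>)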
do.call(barrels_to_gallons, args)
## [1] 6106047525 6051553498 6080563272
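Because the example above depends on the downloaded brewer_size data, here is a self-contained sketch with made-up values. It redefines a comment-free barrels_to_gallons() so it can run on its own.

```r
# Same conversion as on the earlier slide
barrels_to_gallons <- function(total_barrels) total_barrels * 31

# Arguments already stored in a named list (hypothetical values)
args <- list(total_barrels = c(1, 2, 3))

# do.call() unpacks the list: same as barrels_to_gallons(total_barrels = c(1, 2, 3))
do.call(barrels_to_gallons, args)        # 31 62 93

# Each list element becomes a separate argument, matched by position or by name
do.call(round, list(3.14159, digits = 2)) # same as round(3.14159, digits = 2)
```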
7 / 35

Function composition

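Suppose we have two small helper functions (as defined in the book):

square <- function(x) x^2
deviation <- function(x) x - mean(x)

To compose multiple function calls you can:

  • Nest functions (hard to read, because you have to read from the inside out):
x <- runif(100)
sqrt(mean(square(deviation(x))))
  • Save the intermediate results as variables (annoying):
out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out
  • Pipe (the best):
x %>%
  deviation() %>%
  square() %>%
  mean() %>%
  sqrt()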

"The focus is on what’s being done (the verbs), rather than on what’s being modified (the nouns)."

8 / 35

Is R lexically or dynamically scoped?

9 / 35

Lexical vs dynamic scoping

x <- 10
g <- function() {
  x <- 20
  x
}
g()
## [1] 20

Makes sense. What about this one?

x <- "IPA's taste and smell like dirty socks"
f <- function() x
g <- function() {
  x <- "I like the taste of dirty socks and therefore IPA's"
  f()
}
g() # what does this return? PS: it's the correct answer
10 / 35

R is lexically scoped, and therefore returns the correct answer: "IPA's taste and smell like dirty socks".

What is a scope? Scope refers to the places in a program where a variable is visible and can be referenced.

  • Under dynamic scoping:

    • a variable is looked up in the context of the code that is currently running, so it is bound to the most recent value assigned to it during the program's execution.

    • in other words, f() would see the x assigned inside g() just before f() was called, and the program would return "I like the taste of dirty socks and therefore IPA's".

  • Under lexical scoping:

    • the scope of a variable is determined by the lexical (i.e. textual) structure of a program.

    • the x used inside f() (line 2) falls "within the scope" created by the global definition on line 1, because that is where f() was written, so the program returns "IPA's taste and smell like dirty socks".

    • Most modern programming languages are lexically scoped.
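To make the contrast concrete, here is a sketch that puts both lookups side by side. R itself only does lexical lookup; f_dynamic() (a hypothetical name) merely imitates dynamic scoping by explicitly asking for x in its caller's environment with get() and parent.frame().

```r
x <- "global x"

# Lexical: x is looked up where f_lexical() was defined (the global environment)
f_lexical <- function() x

# Imitated dynamic lookup: x is looked up in the environment of whoever called us
f_dynamic <- function() get("x", envir = parent.frame())

g <- function() {
  x <- "g's local x"
  c(lexical = f_lexical(), dynamic = f_dynamic())
}
g()   # lexical = "global x", dynamic = "g's local x"
```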

11 / 35

Lexical scoping

  • R uses lexical scoping: it looks up the values of names based on how a function is defined, not how it is called.

  • R's lexical scoping follows 4 rules:

    • Name masking
    • Functions versus variables
    • A fresh start
    • Dynamic lookup
  • Understanding these will help you use more advanced functional programming tools
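The slides don't show the second rule, Functions versus variables, so here is a minimal sketch (to_gallons() and convert_two() are hypothetical names). When a name is used in call position, R skips any non-function objects while searching for it:

```r
to_gallons <- function(x) x * 31

convert_two <- function() {
  to_gallons <- 2            # a plain number with the same name as the function
  to_gallons(to_gallons)     # call position: R skips the number and finds the function
}
convert_two()                # 2 * 31 = 62
```

This works, but giving a function and a variable the same name is confusing, so it's best avoided.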

12 / 35

Lexical scoping (1): Name Masking

Names defined inside a function mask names defined outside a function.

x <- 10
y <- 20
fun <- function() {
  x <- 1
  y <- 2
  c(x, y)
}
fun()
## [1] 1 2
  • If a name isn't defined inside a function, R looks one level up.
  • The same rule applies if a function is defined inside another function.
  • R keeps looking one level up at a time, all the way to the global environment and finally the attached packages.

🏠: variables created inside the body of a function exist only within that function, which helps prevent coding mistakes: they can't accidentally overwrite variables with the same name outside the function.
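A sketch of the "one level up" rule with a function defined inside another function (outer_fun() and inner_fun() are hypothetical names):

```r
x <- 1
outer_fun <- function() {
  y <- 2
  inner_fun <- function() {
    z <- 3
    # z is found in inner_fun(), y one level up in outer_fun(), x in the global env
    c(x, y, z)
  }
  inner_fun()
}
outer_fun()   # 1 2 3
```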

13 / 35

Lexical scoping (2): A fresh start

a <- 419
fun <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  a
}
fun()
## [1] 420
fun() # every run is a fresh start!
## [1] 420

We get the same value both times because every call creates a new, independent environment for the function's local variables, so a function cannot tell what happened in previous calls. We'll see how to change this behavior in later chapters.
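One way to see the fresh start directly, as a sketch (f() and counter() are hypothetical names): environment() called inside a function returns the environment of that particular call, and each call gets a new one.

```r
# Each call returns its own, brand-new execution environment
f <- function() environment()
identical(f(), f())   # FALSE: two calls, two different environments

# A local variable therefore never survives from one call to the next
counter <- function() {
  if (!exists("n", inherits = FALSE)) n <- 0   # look only in this call's environment
  n <- n + 1
  n
}
counter()   # 1
counter()   # still 1
```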

14 / 35

Lexical scoping (3): Dynamic lookup

  • R looks for values when the function is run, not when the function is created.

  • Hadley calls this behavior "annoying": if you make a spelling mistake in your code, you won't get an error message when you create the function, and depending on what is defined in the global environment, you might not get one when you run it either.

  • codetools::findGlobals() lists a function's unbound symbols (its external dependencies). To check that a function doesn't rely on them, you can manually set its environment to emptyenv(), an environment that contains nothing.
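A sketch of both points (h() is a hypothetical function; codetools ships with R as a recommended package, so it is usually already installed):

```r
# Dynamic lookup: x is found when h() runs, not when h() is defined
h <- function() x + 1
x <- 15
h()   # 16
x <- 20
h()   # 21

# The names h() expects to find outside itself
codetools::findGlobals(h)   # "+" and "x"

# Cut h() off from everything: now even `+` can't be found
environment(h) <- emptyenv()
try(h())   # error: could not find function "+"
```

Emptying the environment is a diagnostic check rather than something to keep in real code, since the function can no longer find even base operators.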

15 / 35

Lazy evaluation (1)

Arguments to functions are evaluated lazily: they are evaluated only if and when they are needed.

f <- function(a, b) {
  a^2
}
f(2)
## [1] 4

Calling f(2) doesn't produce an error even though b is never supplied: 2 is matched to a by position, and because the body never uses b, R never has to evaluate it.
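Laziness goes further than missing arguments: an argument that is supplied but never used is never evaluated, even if evaluating it would fail. A minimal sketch with a hypothetical ignore_arg():

```r
# The body never touches x, so the stop() inside the argument never runs
ignore_arg <- function(x) 10
ignore_arg(stop("This error is never raised"))   # 10
```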

16 / 35

Lazy evaluation (2)

Another example:

f <- function(a, b) {
  print(a)
  print(b)
}
f(45)
## [1] 45
## Error in print(b) : argument "b" is missing, with no default

45 is printed before the error is triggered. Why? Because b doesn't have to be evaluated until print(b) runs.

Only at that point does the function need a value for b, and since b was neither supplied nor given a default, it throws an error.

17 / 35

Lazy evaluation (3): Promises

Lazy evaluation is powered by a data structure called a promise

A promise has 3 components:

  • An expression which gives rise to the delayed computation

  • An environment where the expression should be evaluated, i.e. the environment where the function is called.

  • A value, which is computed and cached the first time the promise is accessed, by evaluating the expression in the specified environment. This ensures that the promise is evaluated at most once.

🏠: Lazy evaluation via promises allows you to include intensive computations in function arguments which will only be evaluated when needed.
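A sketch of the "evaluated at most once" property (double_it() and use_twice() are hypothetical names). The message() call is a side effect that reveals when the promise is actually evaluated:

```r
double_it <- function(x) {
  message("Calculating...")   # printed each time this body runs
  x * 2
}

# x is accessed twice, but its promise is evaluated only once and then cached
use_twice <- function(x) c(x, x)

use_twice(double_it(20))   # "Calculating..." appears once; returns 40 40
```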

18 / 35

Lazy evaluation (4): Default arguments

gallons <- function(x, y) {
  result <- x * y
  print(paste(x, "barrels", "equal", result, "gallons of beer"))
}
gallons(8, 31)
## [1] "8 barrels equal 248 gallons of beer"

The same function, with a default argument:

gallons <- function(x, y = 31) {
  result <- x * y
  print(paste(x, "barrels", "equal", result, "gallons of beer"))
}
gallons(8)
## [1] "8 barrels equal 248 gallons of beer"

Here, the y argument is optional: it takes the default value 31 unless you supply a different one.

19 / 35

Lazy evaluation (5): Default arguments

Because of lazy evaluation, default values can be defined in terms of:

  • other arguments

  • or variables defined later in the function

Even though many base R functions use this technique, Hadley does not recommend it because:

  • it makes the code hard to read

  • you need to know the exact order of evaluation to predict what will be returned
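A sketch of what such defaults look like (defaults_demo() is a hypothetical name). It also shows why they're hard to read: to predict z you have to know that a and b are created before z is first used.

```r
# y defaults to a function of another argument; z defaults to variables
# that only exist once the body has started running
defaults_demo <- function(x = 1, y = x * 2, z = a + b) {
  a <- 10
  b <- 100
  c(x, y, z)   # z's promise is evaluated here, after a and b exist
}
defaults_demo()        # 1 2 110
defaults_demo(x = 5)   # 5 10 110: y's default is recomputed from the new x
```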

20 / 35

Lazy evaluation (6): Missing arguments

You can use missing() to check whether an argument's value comes from the user or the default

fun <- function(x = 10) {
  list(missing(x), x)
}
str(fun())
## List of 2
## $ : logi TRUE
## $ : num 10

Returns TRUE because the argument's value comes from the default.

str(fun(10))
## List of 2
## $ : logi FALSE
## $ : num 10

Returns FALSE because the argument's value comes from the user.

21 / 35

dot-dot-dot (1)

green.plot <- function(x, y, ...) {
  plot(x, y, col = "green", ...)
}
green.plot(1:5, 1:5, xlab = "Are Very Useful", ylab = "dot-dot-dot")

We passed xlab and ylab through ... (dot-dot-dot) to plot(), even though green.plot() doesn't define them as arguments.

22 / 35

dot-dot-dot (2)

  • Functions can have a special argument, ... (pronounced dot-dot-dot), which lets them take any number of additional arguments.

  • When is it used?

    • to pass arguments on to another function, so you don't have to copy the entire argument list of the original function (as in green.plot() above)

    • when the number of arguments passed to the function cannot be known in advance.

❗: Any arguments after the ... must be named explicitly and cannot be partially matched.
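A minimal sketch of the second use case and of the ❗ rule above (total_gallons() and its gallons_per_barrel argument are hypothetical). Because gallons_per_barrel comes after ..., an abbreviated name is not partially matched and ends up in ... instead:

```r
# Accepts any number of barrel counts through ...
total_gallons <- function(..., gallons_per_barrel = 31) {
  barrels <- c(...)   # collect every argument captured by ... into one vector
  sum(barrels) * gallons_per_barrel
}

total_gallons(1, 2, 3.5)                       # 6.5 * 31 = 201.5
total_gallons(1, 2, gallons_per_barrel = 30)   # full name: 3 * 30 = 90
total_gallons(1, 2, gallons = 30)              # not partially matched: 30 lands in ...
                                               # so the result is 33 * 31 = 1023
```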

23 / 35

Exiting a function (1)

  • Success or failure are the two ways by which a function "exits"

    • Success is when it returns a value
    • Failure is when it throws an error
  • Return values can be:

    • Implicit vs explicit
    • Visible vs invisible
24 / 35

Return values (1): explicit

We're being explicit here by using return()

region_2 <- function(state) {
  northeast <- c("NY", "MA", "ME", "VT", "CT")
  if (state %in% northeast) {
    return("all good") # explicit because we call return()
  } else {
    return("not in the northeast")
  }
}
region_2("VT")
## [1] "all good"
25 / 35

Return values (2): implicit

The last evaluated expression is the return value:

region <- function(state) {
  northeast <- c("NY", "MA", "ME", "VT", "CT")
  if (state %in% northeast) {
    "all good" # implicit because we don't call return()
  } else {
    "not in the northeast"
  }
}
region("CA")
## [1] "not in the northeast"
26 / 35

Visible vs invisible values

By default, the value returned by a function call is printed automatically:

fun <- function() 1
fun()
## [1] 1

But you can prevent that by wrapping the last value in invisible():

fun <- function() invisible(1)
fun()

The value is still returned, just not printed. To see it, call print() or wrap the whole function call in parentheses.
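A sketch of the ways to reveal an invisible value; withVisible() is a base R function that returns the value together with its visibility flag:

```r
fun <- function() invisible(1)

fun()               # prints nothing
print(fun())        # [1] 1
(fun())             # [1] 1: parentheses make the value visible
withVisible(fun())  # list(value = 1, visible = FALSE)
```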

27 / 35

Errors

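This function computes a t-based confidence interval for the mean. It stops with an error when level is outside (0, 1) and warns when level is unusually low:

d <- rpois(25, 8) # 25 simulated Poisson counts with mean 8
d
GetCI <- function(x, level = 0.95) {
  # Failure: a level outside (0, 1) makes no sense, so throw an error
  if (level <= 0 || level >= 1) {
    stop("The 'level' argument must be greater than 0 and less than 1")
  }
  # Legal but suspicious: signal a warning and keep going
  if (level < 0.5) {
    warning("Confidence levels are often close to 1, e.g. 0.95")
  }
  m <- mean(x)
  n <- length(x)
  SE <- sd(x) / sqrt(n)
  upper <- 1 - (1 - level) / 2 # upper quantile for a two-sided interval
  ci <- m + c(-1, 1) * qt(upper, n - 1) * SE
  return(list(mean = m, se = SE, ci = ci))
}
GetCI(d, 99) # fails: level = 99 is not in (0, 1); level = 0.99 was probably intended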
28 / 35

Exit handlers

Sorry.
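For reference, a minimal sketch of the idea using base R's on.exit(), which registers code to run whenever the function exits, whether it succeeds or fails. format_with_3_digits() is a hypothetical name; saving and restoring options() is just one common use of the pattern.

```r
format_with_3_digits <- function(fail = FALSE) {
  old <- options(digits = 3)          # options() returns the previous setting
  on.exit(options(old), add = TRUE)   # registered now, runs when the function exits
  if (fail) stop("Something went wrong while brewing")
  format(pi)                          # formatted using the temporary setting
}

getOption("digits")                    # 7, R's default (unless you changed it)
format_with_3_digits()                 # "3.14"
try(format_with_3_digits(fail = TRUE)) # exits with an error...
getOption("digits")                    # ...but the option was still restored
```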

29 / 35

Function forms

Not all function calls look the same. There are 4 forms:

  • Prefix: the function name comes before its arguments (the most common form)

  • Infix: the function name comes in between its arguments (used for mathematical operators and for user-defined functions whose names begin and end with %)

  • Replacement: functions that replace values by assignment, like names(df) <- c("a", "b", "c").

  • Special: functions like [[, if, and for.

🏠: there are 4 forms, but every call can be written in prefix form.
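A sketch of the takeaway: infix and replacement calls rewritten in prefix form. The backticks are needed because these names aren't syntactic.

```r
x <- c(a = 1, b = 2)

`+`(1, 2)                   # prefix form of 1 + 2
`names<-`(x, c("c", "d"))   # prefix form of names(x) <- c("c", "d"), except it
                            # returns a modified copy instead of reassigning x
`[[`(list(10, 20), 2)       # prefix form of list(10, 20)[[2]]
```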

30 / 35

Infix functions

R comes with a number of built-in infix operators: :, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, and <<-.

But you can create your own!

`%+%` <- function(a, b) paste0(a, b)
"Sour beers " %+% "are elite"
## [1] "Sour beers are elite"
31 / 35

Replacement functions

  • Act like they modify their arguments in place
  • Have the special name xxx<- and must have arguments named x and value
  • Must return the modified object. For example, the following function modifies the second element of a vector:
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

Replacement functions are used by placing the function call on the left side of <-:

x <- 1:10
second(x) <- 5L
x
## [1] 1 5 3 4 5 6 7 8 9 10
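Replacement functions can also take extra arguments between x and value. A sketch with a hypothetical `modify<-`:

```r
# Extra argument `position` sits between x and value
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

x <- 1:10
modify(x, 1) <- 10L
x   # 10 2 3 4 5 6 7 8 9 10
```

R turns modify(x, 1) <- 10L into roughly x <- `modify<-`(x, 1, value = 10L), which is why replacement functions only act like they modify in place.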
32 / 35

Special forms

How to rewrite a function in prefix form

More on this in the book!
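A sketch of special forms rewritten as ordinary prefix calls (each comment shows the usual syntax):

```r
`if`(TRUE, "yes", "no")   # if (TRUE) "yes" else "no"
`for`(i, 1:3, print(i))   # for (i in 1:3) print(i)
`[`(letters, 1:3)         # letters[1:3]
`{`(y <- 1, y + 1)        # { y <- 1; y + 1 } returns its last value, 2
`(`(1 + 2)                # (1 + 2)
```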

33 / 35

Useful references

34 / 35

Thank you!

35 / 35
