Advanced R by Hadley Wickham
Chapter 6: Functions
Asmae Toumi
@asmae_toumi
2020-07-26

Prerequisites

Packages:

suppressPackageStartupMessages({
  library(tidyverse)
  library(skimr)
})

Data:

# data from tidytuesday
# https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-03-31/readme.md
brewing_materials <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewing_materials.csv')
beer_taxed <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_taxed.csv')
brewer_size <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewer_size.csv')
beer_states <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_states.csv')


Function fundamentals

Functions are objects, just as vectors are objects.
Functions can be broken down into three components: arguments (the formals), body (the code inside the function), and environment (which determines how the function finds the values associated with its names).
The arguments and body are always stated explicitly, but the environment is implied by where the function was defined.

Function fundamentals (2)

barrels_to_gallons <- function(total_barrels) {
  # A barrel of beer is 31 gallons
  gallons <- total_barrels * 31 
  return(gallons)
}
barrels_to_gallons(3.65)

## [1] 113.15

formals(barrels_to_gallons)

## $total_barrels

body(barrels_to_gallons)

## {
##     gallons <- total_barrels * 31
##     return(gallons)
## }
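
The third component, the environment, is not shown by formals() or body(). Here is a minimal sketch of inspecting it with environment(), reusing the function from this slide. What gets printed depends on where the function was defined, so no output is shown for it:

```r
barrels_to_gallons <- function(total_barrels) {
  # A barrel of beer is 31 gallons
  total_barrels * 31
}

# The environment records where the function was created;
# at the top level of a script or the console this is the global environment.
fn_env <- environment(barrels_to_gallons)
is.environment(fn_env)

# The argument names are stored in the formals
names(formals(barrels_to_gallons))  # "total_barrels"
```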


Function fundamentals (3)

Recall that functions are objects, so they can have attributes.
Notice that when we called body() on the previous slide, the output did not include the comment.
You can use attr() to inspect a function's attributes. Here, srcref (short for source reference) points to the original source code, including comments and formatting:

attr(barrels_to_gallons, "srcref")

## function(total_barrels) {
##   # A barrel of beer is 31 gallons
##   gallons <- total_barrels * 31 
##   return(gallons)
## }


Function fundamentals (4)

You don't have to name a function, especially if it takes too much effort to come up with a name. Unnamed functions are called anonymous functions.
You can put functions in a list:

funs <- list(
  gallons_est = function(barrels) barrels * 31,
  gallons_real = function(barrels) barrels * 31.657
)
funs$gallons_real(10)

## [1] 316.57
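
The list above stores functions under names. As a minimal sketch of a truly anonymous function, the one below is passed straight to sapply() without ever being bound to a name (the vector `barrels` is made up for illustration):

```r
barrels <- c(1, 2, 3)

# The function passed to sapply() is created inline and never named
gallons <- sapply(barrels, function(b) b * 31)
gallons  # 31 62 93
```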


Function fundamentals (5)

If the arguments are already stored in a list (or another list-like data structure), you can call the function on them with do.call():

args <- brewer_size %>% 
  select(total_barrels) %>% 
  top_n(3) %>%
  as.list()
do.call(barrels_to_gallons, args)

## [1] 6106047525 6051553498 6080563272
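
A smaller sketch of the same idea that does not depend on the downloaded data: do.call() takes a function and a list, and the list elements (including named ones) become the arguments of the call. The list name `arg_list` is only illustrative.

```r
# Arguments collected in a list, including a named one
arg_list <- list(c(1, 2, NA, 7), na.rm = TRUE)

# Equivalent to mean(c(1, 2, NA, 7), na.rm = TRUE)
result <- do.call(mean, arg_list)
result  # 3.333333
```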


Function composition

To compose multiple function calls you can:

Nest functions (hard to read):

square <- function(x) x^2
deviation <- function(x) x - mean(x)
x <- runif(100)
sqrt(mean(square(deviation(x))))

Save the intermediate results as variables (annoying):

out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out

Pipe (the best):

x %>%
  deviation() %>%
  square() %>%
  mean() %>%
  sqrt()

"The focus is on what’s being done (the verbs), rather than on what’s being modified (the nouns)."


Is R lexically or dynamically scoped?

Lexical vs dynamic scoping

x <- 10
g <- function() {
  x <- 20
  x
}
g()

## [1] 20

Makes sense. What about this?

x <- "IPA's taste and smell like dirty socks"
f <- function() x
g <- function() {
  x <- "I like the taste of dirty socks and therefore IPA's"
  f()
}
g() # what does this return? PS: it's the correct answer


R is lexically scoped, and therefore returns the correct answer: "IPA's taste and smell like dirty socks".

What is a scope? Scope refers to the places in a program where a variable is visible and can be referenced.

Under dynamic scoping:
- a variable is bound to the most recent value assigned to it during the program's execution.
- in other words, f() would see the x assigned inside g() just before the call, and the program would return "I like the taste of dirty socks and therefore IPA's".

Under lexical scoping:
- the scope of a variable is determined by the lexical (i.e. textual) structure of a program.
- the use of x in f (line 2) is "within the scope" created by the definition on line 1, so the program returns "IPA's taste and smell like dirty socks".
- Most programming languages are lexically scoped.


Lexical scoping

R uses lexical scoping: it looks up the values of names based on how a function is defined, not how it is called.
R's lexical scoping follows 4 rules:
- Name masking
- Functions versus variables
- A fresh start
- Dynamic lookup
Understanding these will help you use more advanced functional programming tools.
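
Of the four rules, "functions versus variables" is the only one without a slide of its own. As a minimal sketch (the names `add_100` and `h` are made up for illustration): when a name is used in a function call, R ignores non-function objects while looking it up, so a function and a variable can share a name.

```r
add_100 <- function(x) x + 100

h <- function() {
  add_100 <- 10      # a non-function object with the same name
  add_100(add_100)   # call position finds the function; argument position finds 10
}
h()  # 110
```

Reusing one name for two things is confusing to read, so this is best treated as a curiosity rather than a pattern to copy.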


Lexical scoping (1): Name Masking

Names defined inside a function mask names defined outside a function.

x <- 10
y <- 20
fun <- function() {
  x <- 1
  y <- 2
  c(x, y)
}
fun()

## [1] 1 2

If a name isn’t defined inside a function, R looks one level up.
The same rules apply if a function is defined inside another function.
R keeps looking a "level" up, all the way to the global environment and, finally, the loaded packages.

🏠: functions help you prevent coding mistakes because variables created inside the body of a function are only valid there and are unaffected by variables with the same name outside the function.
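
A minimal sketch of the "one level up" lookup with a function defined inside another function (the names `outer_fun` and `inner_fun` are hypothetical):

```r
x <- 1
outer_fun <- function() {
  y <- 2
  inner_fun <- function() {
    z <- 3
    c(x, y, z)  # z is local, y is one level up, x is two levels up
  }
  inner_fun()
}
outer_fun()  # 1 2 3
```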


Lexical scoping (2): A fresh start

a <- 419
fun <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  a
}
fun()

## [1] 420

fun() # every run is a fresh start!

## [1] 420

We get the same value because each function run is completely independent of the others: a new environment is created every time the function is called, so a function cannot tell what happened the last time it ran. We'll see how to modify this behavior in later chapters.


Lexical scoping (3): Dynamic lookup

R looks for values when the function is run, not when the function is created.
This behavior is, as Hadley calls it, "annoying" because if you make a spelling mistake in your code, you won’t get an error message when you create the function. Depending on which variables exist in the global environment, you might not get one when you run it either.
You can use codetools::findGlobals() to list a function's unbound symbols (its external dependencies). To check that a function is self-contained, you can also set its environment to emptyenv(), an environment that contains nothing.
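
A minimal sketch of dynamic lookup and of the two tools mentioned above. The function `total_gallons` and the variable `barrels` are made up for illustration, and the example assumes the codetools package (normally shipped with R) is installed.

```r
total_gallons <- function() barrels * 31  # `barrels` does not exist yet: no error here

barrels <- 2
total_gallons()  # 62

barrels <- 10
total_gallons()  # 310, because the value is looked up at call time

# List the names the function needs from outside its own body
globals <- codetools::findGlobals(total_gallons)
globals

# With an empty environment even `*` cannot be found, so the call fails
environment(total_gallons) <- emptyenv()
res <- try(total_gallons(), silent = TRUE)
inherits(res, "try-error")  # TRUE
```

The failure in the empty environment shows that R uses the same lookup rules for everything, including operators such as `*`.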


Lazy evaluation (1)

Function arguments are evaluated lazily: an argument is only evaluated if and when it is actually used.

f <- function(a, b) {
  a^2
}
f(2)

## [1] 4

This function never actually uses the argument b, so calling f(2) will not produce an error because the 2 gets positionally matched to a.


Lazy evaluation (2)

Another example:

f <- function(a, b) {
  print(a)
  print(b)
}
f(45)

## [1] 45
## Error in print(b) : argument "b" is missing, with no default

45 was printed before the error was triggered. Why? Because b did not have to be evaluated until after print(a).

Once the function tried to evaluate print(b), it had to throw an error.


Lazy evaluation (3): Promises

Lazy evaluation is powered by a data structure called a promise.

A promise has 3 components:

- An expression, which gives rise to the delayed computation.
- An environment where the expression should be evaluated, i.e. the environment where the function is called.
- A value, which is computed and cached the first time the promise is accessed, when the expression is evaluated in the specified environment. This ensures that the promise is evaluated at most once.

🏠: Lazy evaluation via promises allows you to include intensive computations in function arguments which will only be evaluated when needed.
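
A minimal sketch of the "at most once" behaviour. All names here (`calls`, `slow_double`, `use_twice`, `ignore_arg`) are hypothetical; the counter records how often the argument expression is really evaluated.

```r
calls <- 0
slow_double <- function(x) {
  calls <<- calls + 1  # count how often the body actually runs
  x * 2
}

use_twice <- function(x) c(x, x)    # accesses the promise twice
ignore_arg <- function(x) "unused"  # never accesses the promise

use_twice(slow_double(20))   # 40 40
calls                        # 1: evaluated once, then the cached value is reused

ignore_arg(slow_double(20))  # "unused"
calls                        # still 1: the promise was never evaluated
```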


Lazy evaluation (4): Default arguments

gallons <- function(x, y) {
  result <- x * y
  print(paste(x, "barrels", "equal", result, "gallons of beer"))
}
gallons(8, 31)

## [1] "8 barrels equal 248 gallons of beer"

The same function, with a default argument:

gallons <- function(x, y = 31) {
  result <- x * y
  print(paste(x, "barrels", "equal", result, "gallons of beer"))
}
gallons(8)

## [1] "8 barrels equal 248 gallons of beer"

Here, the y argument is optional and takes the default value unless you specify otherwise.


Lazy evaluation (5): Default arguments

Because of lazy evaluation, default values can be defined in terms of:

- other arguments
- or variables defined later in the function

Even though many base R functions use this technique, Hadley does not recommend it because:

- it makes the code hard to read
- you need to know the exact order of evaluation to predict what will be returned
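
A minimal sketch of both kinds of default, using a hypothetical function `describe`: `gallons` defaults to an expression involving another argument, and `label` defaults to an expression involving variables that only exist inside the body.

```r
describe <- function(barrels = 1,
                     gallons = barrels * 31,
                     label = paste(prefix, unit)) {
  # `prefix` and `unit` are created here, before `label` is first used
  prefix <- "US"
  unit <- "gallons"
  paste(barrels, "barrels =", gallons, label)
}

describe()   # "1 barrels = 31 US gallons"
describe(2)  # "2 barrels = 62 US gallons"
```

This works only because `label` is not evaluated until the final paste() call, which is exactly the order-of-evaluation dependence that makes such code hard to read.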


Lazy evaluation (6): Missing arguments

You can use missing() to check whether an argument's value comes from the user or the default

fun <- function(x = 10) {
  list(missing(x), x)
}
str(fun())

## List of 2
##  $ : logi TRUE
##  $ : num 10

Returns TRUE because the argument's value comes from the default.

str(fun(10))

## List of 2
##  $ : logi FALSE
##  $ : num 10

Returns FALSE because the argument's value comes from the user.


dot-dot-dot (1)

green.plot <- function(x, y, ...) {
  plot(x, y, col="green", ...)
}
green.plot(1:5, 1:5, xlab="Are Very Useful", ylab="dot-dot-dot")

We passed xlab and ylab through to plot() thanks to the ellipsis (...), even though green.plot() does not define them as arguments.


dot-dot-dot (2)

Functions can have a special argument, ... (pronounced dot-dot-dot). With it, a function can take any number of additional arguments.
When is it used?
- to extend another function when you don't want to copy the entire argument list of the original function.
- when the number of arguments passed to the function cannot be known in advance.

❗: Any arguments after the ... must be named explicitly and cannot be partially matched.
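
A minimal sketch of the second use case and of the naming rule, with two hypothetical helpers: `count_args` captures an unknown number of arguments with list(...), and `join` has an argument placed after the dots.

```r
count_args <- function(...) {
  dots <- list(...)  # capture everything passed through ...
  length(dots)
}
count_args(1, "a", TRUE)  # 3

# `sep` comes after ..., so it must be spelled out in full
join <- function(..., sep = "-") paste(..., sep = sep)
join("a", "b", sep = "+")  # "a+b"
join("a", "b", se = "+")   # "a-b-+": `se` is not matched to `sep`, it ends up in ...
```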


Exiting a function (1)

A function can exit in one of two ways: success or failure.
- Success is when it returns a value
- Failure is when it throws an error
Return values can be classified in two ways:
- Implicit vs explicit
- Visible vs invisible


Return values (1): explicit

We're being explicit here by using return()

region_2 <- function(state) {
  northeast <- c("NY", "MA", "ME", "VT", "CT")
  if (state %in% northeast) {
    return("all good") # explicit because we call return()
  } else {
    return("not in the northeast")
  }
}
region_2("VT")

## [1] "all good"


Return values (2): implicit

The last evaluated expression is the return value:

region <- function(state) {
  northeast <- c("NY", "MA", "ME", "VT", "CT")
  if (state %in% northeast) {
    "all good" # implicit because we don't call return()
  } else {
    "not in the northeast"
  }
}
region("CA")

## [1] "not in the northeast"


Visible vs invisible values

Calling a function at the console prints its return value automatically:

fun <- function() 1
fun()

## [1] 1

But you can prevent that by wrapping the last value in invisible():

fun <- function() invisible(1)
fun()

You can always call print() or wrap the whole function call in parentheses to verify that the value is still returned.
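
A minimal sketch of those two checks, plus base R's withVisible(), which returns the value together with its visibility flag:

```r
fun <- function() invisible(1)

fun()         # prints nothing
print(fun())  # [1] 1
(fun())       # [1] 1

# withVisible() exposes the flag that controls automatic printing
withVisible(fun())$visible  # FALSE
```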


Errors

Function to compute the confidence interval for the mean:

d <- rpois(25,8)
d
GetCI <- function(x, level = 0.95) {
  if (level <= 0 || level >= 1) {
    stop("The 'level' argument must be greater than 0 and less than 1")
  }
  if (level < 0.5) {
    warning("Confidence levels are often close to 1, e.g. 0.95") 
  }
  m <- mean(x)
  n <- length(x)
  SE <- sd(x)/sqrt(n)
  upper <- 1 - (1-level)/2
  ci <- m + c(-1,1)*qt(upper, n-1)*SE
  return(list(mean=m, se=SE, ci=ci))
}
GetCI(d, 99) # throws an error: level must be greater than 0 and less than 1
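
To see the difference between the two signals in isolation, here is a minimal sketch that copies only the checks from GetCI() into a hypothetical helper, `check_level`. It assumes base R's tryCatch() and suppressWarnings(), which the slide itself does not use.

```r
check_level <- function(level) {
  if (level <= 0 || level >= 1) {
    stop("The 'level' argument must be greater than 0 and less than 1")
  }
  if (level < 0.5) {
    warning("Confidence levels are often close to 1, e.g. 0.95")
  }
  level
}

# stop() aborts the call; tryCatch() lets us look at the message instead
msg <- tryCatch(check_level(99), error = function(e) conditionMessage(e))
msg

# warning() signals a problem but the function still returns a value
res <- suppressWarnings(check_level(0.3))
res  # 0.3
```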


Exit handlers

Sorry.
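
This slide has no content of its own. For orientation only: in base R, an exit handler is registered with on.exit(), and the registered code runs when the function exits, whether it returns normally or fails. A minimal sketch with hypothetical names (`events`, `with_cleanup`):

```r
events <- character()

with_cleanup <- function(fail = FALSE) {
  events <<- c(events, "start")
  # Runs when the function exits, by return or by error
  on.exit(events <<- c(events, "cleanup"), add = TRUE)
  if (fail) stop("something went wrong")
  "done"
}

with_cleanup()                                 # "done"
try(with_cleanup(fail = TRUE), silent = TRUE)  # error is caught by try()
events  # "start" "cleanup" "start" "cleanup"
```

Passing add = TRUE keeps any handlers registered earlier instead of overwriting them.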


Function forms

Not all function calls look the same. There are 4 forms:

- Prefix: the function name comes before its arguments (the most common), like mean(x).
- Infix: the function name comes in between its arguments, like x + y (used by math operators and by user-defined functions whose names begin and end with %).
- Replacement: functions that replace values by assignment, like names(df) <- c("a", "b", "c").
- Special: functions like [[, if, and for, which have no consistent structure but play important roles in R's syntax.

🏠: there are 4 forms but everything can be written in prefix form.
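
A minimal sketch of rewriting each non-prefix form as a prefix call. Backticks are needed because these function names are not syntactically valid on their own.

```r
x <- 1:3

# Infix: x + 10
`+`(x, 10)                # 11 12 13

# Replacement: names(x) <- c("a", "b", "c")
x <- `names<-`(x, c("a", "b", "c"))
names(x)                  # "a" "b" "c"

# Special: x[[2]] and if (TRUE) "yes" else "no"
`[[`(x, 2)                # 2
`if`(TRUE, "yes", "no")   # "yes"
```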


Infix functions

R comes with a number of built-in infix operators: :, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, and <<-.

But you can create your own!

`%+%` <- function(a, b) paste0(a, b)
"Sour beers " %+% "are elite"

## [1] "Sour beers are elite"


Replacement functions

Replacement functions:
- act like they modify their arguments in place
- have the special name `xxx<-` and must have arguments named x and value
- must return the modified object

For example, the following function modifies the second element of a vector:

`second<-` <- function(x, value) {
  x[2] <- value
  x
}

Replacement functions are used by placing the function call on the left side of <-:

x <- 1:10
second(x) <- 5L
x

##  [1]  1  5  3  4  5  6  7  8  9 10
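
A replacement function can take extra arguments as long as they sit between x and value. A minimal sketch with a hypothetical `modify<-` function, followed by the prefix call that the assignment syntax stands for:

```r
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

x <- 1:10
modify(x, 1) <- 10L
x  # 10 2 3 4 5 6 7 8 9 10

# The same operation written in prefix form
x <- `modify<-`(x, 2, 20L)
x  # 10 20 3 4 5 6 7 8 9 10
```

The prefix version makes it visible that x is not modified in place: a modified copy is returned and assigned back to x.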


Special forms

How to rewrite a function in prefix form

Useful references

Good introductory overview on functions in R: https://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/functions.pdf
On lexical scoping in R
- On scoping in R: http://prl.ccs.neu.edu/blog/2019/09/10/scoping-in-r/
- Lexical and dynamic scope: http://prl.ccs.neu.edu/blog/2019/09/05/lexical-and-dynamic-scope/
- 4 kinds of scoping in R: http://prl.ccs.neu.edu/blog/2019/09/10/four-kinds-of-scoping-in-r/
- How R searches and finds stuff (many diagrams): http://blog.obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/


Thank you!


