Functions

Learning objectives:

  • How to make functions in R
  • What are the parts of a function
  • Nested functions

How to make a simple function in R

Let’s imagine the structure of a function:

The black dot on the left is the environment. The two blocks to the right are the function arguments.

Function components

Functions have three parts, formals(), body(), and environment().

Example

coffee_ratings%>%slice(1:3)%>%select(1:5)
#> # A tibble: 3 × 5
#>   total_cup_points species owner                    country_of_origin farm_name 
#>              <dbl> <chr>   <chr>                    <chr>             <chr>     
#> 1             90.6 Arabica metad plc                Ethiopia          "metad pl…
#> 2             89.9 Arabica metad plc                Ethiopia          "metad pl…
#> 3             89.8 Arabica grounds for health admin Guatemala         "san marc…
avg_points <- function(species){
  # this function is for calculating the mean
  avg <-  coffee_ratings %>% 
  filter(species == species) %>% 
  summarise(mean = mean(total_cup_points))
  
  return(avg)
}
avg_points("Arabica")
#> # A tibble: 1 × 1
#>    mean
#>   <dbl>
#> 1  82.1
formals(avg_points)
#> $species
body(avg_points)
#> {
#>     avg <- coffee_ratings %>% filter(species == species) %>% 
#>         summarise(mean = mean(total_cup_points))
#>     return(avg)
#> }
environment(avg_points)
#> <environment: R_GlobalEnv>

Functions uses attributes, one attribute used by base R is srcref, short for source reference. It points to the source code used to create the function. It contains code comments and other formatting.

attr(avg_points, "srcref")
#> NULL

Primitive functions

Are the core function in base R, such as sum()

sum
#> function (..., na.rm = FALSE)  .Primitive("sum")

Type of primitives:

  • builtin
  • special
typeof(sum)
#> [1] "builtin"

These core functions have components to NULL.

Anonymous functions

If you don’t provide a name to a function

lapply(mtcars%>%select(mpg,cyl), function(x) length(unique(x)))
#> $mpg
#> [1] 25
#> 
#> $cyl
#> [1] 3
vector_len <- function(x) {
  length(unique(x))
}
lapply(mtcars%>%select(mpg,cyl), vector_len)
#> $mpg
#> [1] 25
#> 
#> $cyl
#> [1] 3

Invoking a function

args <- unique(coffee_ratings$species) %>% 
  `[[`(1) %>% 
  as.list()
  
  
  
do.call(avg_points, args)
#> # A tibble: 1 × 1
#>    mean
#>   <dbl>
#> 1  82.1

Function composition

square <- function(x) x^2
deviation <- function(x) x - mean(x)
x <- runif(100)
sqrt(mean(square(deviation(x))))
#> [1] 0.2841442
out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out
#> [1] 0.2841442
x %>%
  deviation() %>%
  square() %>%
  mean() %>%
  sqrt()
#> [1] 0.2841442

More about functions insights

Lexical scoping

Rules

  • Name masking
  • Functions versus variables
  • A fresh start
  • Dynamic lookup

Debugging

This function

g12 <- function() x + 1
x <- 15
g12()
#> [1] 16
codetools::findGlobals(g12)
#> [1] "+" "x"

You can change the function’s environment to an environment which contains nothing:

# environment(g12) <- emptyenv()
# g12()
# Error in x + 1 : could not find function "+"

… (dot-dot-dot)

Example

i01 <- function(y, z) {
  list(y = y, z = z)
}
i02 <- function(x, ...) {
  i01(...)
}
str(i02(x = 1, y = 2, z = 3))
#> List of 2
#>  $ y: num 2
#>  $ z: num 3
#> List of 2
#>  $ y: num 2
#>  $ z: num 3

Exiting a function

  1. Implicit or explicit returns

  2. Invisibility (<- most famous function that returns an invisible value)

  3. stop() to stop a function with an error.

  4. Exit handlers

Function forms

Everything that exists is an object. Everything that happens is a function call. — John Chambers

Case Study: SIR model function

This is an interesting example taken from a course on Coursera: Infectious disease modelling-ICL

The purpose of this example is to show how to make a model passing through making a function.

First we need to load some useful libraries:

library(deSolve)
library(reshape2)

Then set the model inputs:

  • population size (N)
  • number of susceptable (S)
  • infected (I)
  • recovered (R)

And add the model parameters:

  • infection rate (\(\beta\))
  • recovery rate (\(\gamma\))
N<- 100000                  # population

state_values<- c(S = N -1,   # susceptible
                 I = 1,      # infected
                 R = 0)      # recovered

parameters<- c(beta = 1/2,  # infection rate days^-1
               gamma = 1/4) # recovery rate days^-1

Then we set the time as an important factor, which defines the length of time we are looking at this model run. It is intended as the time range in which the infections spread out, let’s say that we are aiming to investigate an infection period of 100 days.

times<- seq(0, 100, by = 1)

Finally, we set up the SIR model, the susceptable, infected and recovered model. How do we do that is passing the paramenters through a function of the time, and state.

Within the model function we calculate one more paramenter, the force of infection: \(\lambda\)

sir_model<- function(time, state, parameters){
  with(as.list(c(state, parameters)),{
    N<- S + I + R
    lambda = beta * I/N    # force of infection
    dS<- - lambda * S 
    dI<- lambda * S - gamma * I
    dR<- gamma * I
    return(list(c(dS,dI,dR)))
  })
}

Once we have our SIR model function ready, we can calculate the output of the model, with the help of the function ode() from {deSolve} package.

output<- as.data.frame(ode(y = state_values,
                           times = times,
                           func = sir_model,
                           parms = parameters))
output %>% head
#>   time        S        I         R
#> 1    0 99999.00 1.000000 0.0000000
#> 2    1 99998.43 1.284018 0.2840252
#> 3    2 99997.70 1.648696 0.6487171
#> 4    3 99996.77 2.116939 1.1169863
#> 5    4 99995.56 2.718152 1.7182450
#> 6    5 99994.02 3.490086 2.4902600

In addition to our builtin SIR model function we can have a look at:

?deSolve::ode()

It solves Ordinary Differential Equations.

deSolve:::ode
#> function (y, times, func, parms, method = c("lsoda", "lsode", 
#>     "lsodes", "lsodar", "vode", "daspk", "euler", "rk4", "ode23", 
#>     "ode45", "radau", "bdf", "bdf_d", "adams", "impAdams", "impAdams_d", 
#>     "iteration"), ...) 
#> {
#>     if (is.null(method)) 
#>         method <- "lsoda"
#>     if (is.list(method)) {
#>         if (!inherits(method, "rkMethod")) 
#>             stop("'method' should be given as string or as a list of class 'rkMethod'")
#>         out <- rk(y, times, func, parms, method = method, ...)
#>     }
#>     else if (is.function(method)) 
#>         out <- method(y, times, func, parms, ...)
#>     else if (is.complex(y)) 
#>         out <- switch(match.arg(method), vode = zvode(y, times, 
#>             func, parms, ...), bdf = zvode(y, times, func, parms, 
#>             mf = 22, ...), bdf_d = zvode(y, times, func, parms, 
#>             mf = 23, ...), adams = zvode(y, times, func, parms, 
#>             mf = 10, ...), impAdams = zvode(y, times, func, parms, 
#>             mf = 12, ...), impAdams_d = zvode(y, times, func, 
#>             parms, mf = 13, ...))
#>     else out <- switch(match.arg(method), lsoda = lsoda(y, times, 
#>         func, parms, ...), vode = vode(y, times, func, parms, 
#>         ...), lsode = lsode(y, times, func, parms, ...), lsodes = lsodes(y, 
#>         times, func, parms, ...), lsodar = lsodar(y, times, func, 
#>         parms, ...), daspk = daspk(y, times, func, parms, ...), 
#>         euler = rk(y, times, func, parms, method = "euler", ...), 
#>         rk4 = rk(y, times, func, parms, method = "rk4", ...), 
#>         ode23 = rk(y, times, func, parms, method = "ode23", ...), 
#>         ode45 = rk(y, times, func, parms, method = "ode45", ...), 
#>         radau = radau(y, times, func, parms, ...), bdf = lsode(y, 
#>             times, func, parms, mf = 22, ...), bdf_d = lsode(y, 
#>             times, func, parms, mf = 23, ...), adams = lsode(y, 
#>             times, func, parms, mf = 10, ...), impAdams = lsode(y, 
#>             times, func, parms, mf = 12, ...), impAdams_d = lsode(y, 
#>             times, func, parms, mf = 13, ...), iteration = iteration(y, 
#>             times, func, parms, ...))
#>     return(out)
#> }
#> <bytecode: 0x0000022b02095d58>
#> <environment: namespace:deSolve>
methods("ode")
#> Warning in .S3methods(generic.function, class, envir, all.names = all.names, :
#> function 'ode' appears not to be S3 generic; found functions that look like S3
#> methods
#> [1] ode.1D   ode.2D   ode.3D   ode.band
#> see '?methods' for accessing help and source code

With the help of the {reshape2} package we use the function melt() to reshape the output:

melt(output,id="time") %>% head
#>   time variable    value
#> 1    0        S 99999.00
#> 2    1        S 99998.43
#> 3    2        S 99997.70
#> 4    3        S 99996.77
#> 5    4        S 99995.56
#> 6    5        S 99994.02

The same as usign pivot_longer() function.

output%>%
  pivot_longer(cols = c("S","I","R"),
               names_to="variable",
               values_to="values") %>%
  arrange(desc(variable)) %>%
  head
#> # A tibble: 6 × 3
#>    time variable values
#>   <dbl> <chr>     <dbl>
#> 1     0 S        99999 
#> 2     1 S        99998.
#> 3     2 S        99998.
#> 4     3 S        99997.
#> 5     4 S        99996.
#> 6     5 S        99994.

Before to proceed with the visualization of the SIR model output we do a bit of investigations.

What if we want to see how melt() function works?

What instruments we can use to see inside the function and understand how it works?

Using just the function name melt or structure() function with melt as an argument, we obtain the same output. To select just the argument of the function we can do args(melt)

reshape2:::melt
#> function (data, ..., na.rm = FALSE, value.name = "value") 
#> {
#>     UseMethod("melt", data)
#> }
#> <bytecode: 0x0000022b08d810e8>
#> <environment: namespace:reshape2>
body(melt)
#> {
#>     UseMethod("melt", data)
#> }
formals(melt)
#> $data
#> 
#> 
#> $...
#> 
#> 
#> $na.rm
#> [1] FALSE
#> 
#> $value.name
#> [1] "value"
environment(melt)
#> <environment: namespace:reshape2>
typeof(melt)
#> [1] "closure"

“R functions simulate a closure by keeping an explicit reference to the environment that was active when the function was defined.”

ref: closures

Try with methods(), or print(methods(melt)): Non-visible functions are asterisked!

The S3 method name is followed by an asterisk * if the method definition is not exported from the package namespace in which the method is defined.

methods("melt", data)
#> [1] melt.array*      melt.data.frame* melt.default*    melt.list*      
#> [5] melt.matrix*     melt.table*     
#> see '?methods' for accessing help and source code
methods(class="table")
#>  [1] [             aperm         as.data.frame as_tibble     Axis         
#>  [6] coerce        initialize    lines         melt          plot         
#> [11] points        print         show          slotsFromS3   summary      
#> [16] tail         
#> see '?methods' for accessing help and source code
help(UseMethod)

We can access to some of the above calls with getAnywhere(), for example here is done for “melt.data.frame”:

getAnywhere("melt.data.frame")
#> A single object matching 'melt.data.frame' was found
#> It was found in the following places
#>   registered S3 method for melt from namespace reshape2
#>   namespace:reshape2
#> with value
#> 
#> function (data, id.vars, measure.vars, variable.name = "variable", 
#>     ..., na.rm = FALSE, value.name = "value", factorsAsStrings = TRUE) 
#> {
#>     vars <- melt_check(data, id.vars, measure.vars, variable.name, 
#>         value.name)
#>     id.ind <- match(vars$id, names(data))
#>     measure.ind <- match(vars$measure, names(data))
#>     if (!length(measure.ind)) {
#>         return(data[id.vars])
#>     }
#>     args <- normalize_melt_arguments(data, measure.ind, factorsAsStrings)
#>     measure.attributes <- args$measure.attributes
#>     factorsAsStrings <- args$factorsAsStrings
#>     valueAsFactor <- "factor" %in% measure.attributes$class
#>     df <- melt_dataframe(data, as.integer(id.ind - 1), as.integer(measure.ind - 
#>         1), as.character(variable.name), as.character(value.name), 
#>         as.pairlist(measure.attributes), as.logical(factorsAsStrings), 
#>         as.logical(valueAsFactor))
#>     if (na.rm) {
#>         return(df[!is.na(df[[value.name]]), ])
#>     }
#>     else {
#>         return(df)
#>     }
#> }
#> <bytecode: 0x0000022afff9dd58>
#> <environment: namespace:reshape2>

References:

Going back to out model output visualization.

output_full<- melt(output,id="time")
output_full$proportion<- output_full$value/sum(state_values)
ggplot(data = output, aes(x = time, y = I)) + 
  geom_line() + 
  xlab("Time(days)") +
  ylab("Number of Infected") + 
  labs("SIR Model: prevalence of infection")

ggplot(output_full, aes(time, proportion, color = variable, group = variable)) + 
  geom_line() + 
  xlab("Time(days)") +
  ylab("Prevalence") + 
  labs(color = "Compartment", title = "SIR Model")