Chapter 6 Functions

6.2.2 Primitives

If you are familiar with C, can you just write a function in C and use it in R? What does that process look like? I think this is part of a bigger question about the relationship between C and R.

Primitives are part of base R and can only be written by the R-core team. At its heart, R is built on this set of primitive C functions.
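A quick way to see the distinction from the R side, as a small sketch using only base R: primitives report a type of "builtin" or "special" and have no formals, body, or environment, while ordinary R functions are closures.

```r
# sum() and `[` are primitives implemented in C
is.primitive(sum)   # TRUE
typeof(sum)         # "builtin"
typeof(`[`)         # "special"

# mean() is an ordinary R function (a closure)
is.primitive(mean)  # FALSE
typeof(mean)        # "closure"

# Primitives have no formals, body, or environment
is.null(formals(sum))      # TRUE
is.null(body(sum))         # TRUE
is.null(environment(sum))  # TRUE
```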

You can use Rcpp to include C++ code in your R code, but the resulting functions aren’t primitives. There are also other techniques, which we’ll likely see covered in later chapters. Here’s an example using Rcpp.

[1] "IPAs suck"

Are there any non-base primitives? If so how is that possible!

No: primitive functions are only found in the base package, and only the R-core team can add new ones.

6.2.5.1 Exercises

  1. Q: Given a name, like "mean", match.fun() lets you find a function. Given a function, can you find its name? Why doesn’t that make sense in R?

    A: A name can only point to a single object, but an object can be pointed to by 0, 1, or many names. What are the names of the functions in the following block?

    ## function(x) sd(x) / mean(x)

There isn’t a 1 to 1 mapping between functions and names in R. Multiple names may point to the same function as we see for f1, f2, and f3. Also, each function has its own environment so it’s possible that two functions might have the same “code” but are not the same because they have different environments (or closures). Lastly, anonymous functions don’t have names so we’d have no way to look these up.

If our functions are all in the global environment, we could find their names by comparing function bodies with body(x) == body(y):

[1] f1  f2  f3

But that’s just deparsing the body into a string and comparing the values. So if you want to think of two functions as being equal if their deparsed body strings are the same, then that’s technically possible, but it is just like searching for every variable that has the value of 5 (possible but not efficient).

The main point is that name -> object is a one-way (non-unique) lookup in R. There’s no efficient way to go backwards. This is true for all values, not just functions.
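To make the brute-force reverse lookup concrete, here is a minimal sketch. The helper find_names() and the environment e are made up for illustration, and the sketch compares whole functions with identical() rather than deparsed bodies. It has to scan every binding in the environment, which is exactly why going backwards is inefficient.

```r
# A scratch environment holding three names for the same function
e <- new.env()
e$f1 <- function(x) sd(x) / mean(x)
e$f2 <- e$f1
e$f3 <- e$f1
e$five <- 5

# Hypothetical helper: return every name in `env` bound to `fn`
find_names <- function(fn, env) {
  nms <- ls(env)
  hits <- vapply(nms, function(nm) {
    obj <- get(nm, envir = env, inherits = FALSE)
    is.function(obj) && identical(obj, fn)
  }, logical(1))
  nms[hits]
}

find_names(e$f1, e)  # "f1" "f2" "f3"
```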

6.3 Function composition

When comparing nested, intermediate, and piping functions, it looks like Hadley flips the order of f() and g() between bullet points

It does look like he does that!

6.0.1 Nested

## g is: 4

6.0.2 Intermediate

This is written in the book as y <- f(x); g(y) but should be flipped to y <- g(x); f(y) if we are to follow the nested example

## g is: 4

6.0.3 Piping

This also needs to be flipped from x %>% f() %>% g() to x %>% g() %>% f()

## g is: 4
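The three forms can be compared side by side once the order is made consistent. This is a sketch with made-up stand-ins for f() and g(); it uses the native pipe |>, which assumes R 4.1 or later (with magrittr loaded, %>% would read the same way).

```r
# Hypothetical stand-ins: g() squares its input, f() adds one
g <- function(x) x^2
f <- function(x) x + 1
x <- 2

# Nested: g() runs first, then f()
nested <- f(g(x))

# Intermediate: same order, with the result of g() saved in between
y <- g(x)
intermediate <- f(y)

# Piping: read left to right, g() first, then f()
piped <- x |> g() |> f()

c(nested, intermediate, piped)  # 5 5 5
```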

6.4 Lexical scoping

“The scoping rules use a parse-time, rather than a run-time structure”? What is “parse-time” and “run-time”? How do they differ?

Parse-time is when the function gets defined: when the formals and body get set. Run-time is when it actually gets called. This function doesn’t get past parse-time because of the syntax error:

get_state <- function(in_df, state_name){
  out_df % in_df[in_df$state == state_name, ]

Error: unexpected input in:

"get_state <- function(in_df, state_name){
  out_df % in_df[in_df$state == state_name, ]"
  return(out_df)

Error: object 'out_df' not found
}

Error: unexpected '}' in "}"

This function will get parsed successfully but could misbehave at run-time if the input data frame doesn’t have a column named state. With iris, for example, it silently returns zero rows:

## [1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species     
## <0 rows> (or 0-length row.names)
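For reference, a sketch of the repaired function, assuming the stray % was meant to be the assignment arrow; the state name "Oregon" is an arbitrary example value. It parses cleanly, and the run-time problem only surfaces as an empty result, because iris$state is NULL and comparing NULL to a string yields a zero-length logical vector.

```r
get_state <- function(in_df, state_name) {
  # Keep only the rows whose `state` column matches `state_name`
  out_df <- in_df[in_df$state == state_name, ]
  return(out_df)
}

# iris has no `state` column, so no rows are selected
result <- get_state(iris, "Oregon")
nrow(result)  # 0
```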

Similarly, defining a function that calls a function from a package that isn’t loaded will not throw an error. The error only appears at run-time, when the function is called and the required package is still not loaded:

Without dplyr this will fail:

Error in select({: could not find function "select"

This will work:

## # A tibble: 5 x 1
##      x1
##   <dbl>
## 1 0.835
## 2 0.364
## 3 0.728
## 4 0.192
## 5 0.353
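The same behaviour can be reproduced without any packages. In this sketch, not_a_real_function is a deliberately made-up name: defining a function that calls it succeeds, and the error only appears when the function is actually run.

```r
# Defining the function succeeds even though `not_a_real_function`
# (a made-up name) does not exist anywhere
call_missing <- function(df) not_a_real_function(df)

# The failure only shows up at run-time, when the function is called
msg <- tryCatch(
  call_missing(iris),
  error = function(e) conditionMessage(e)
)
msg
```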

6.4.3 A fresh start

How would we change this code so that the second call of g11() is 2?

## [1] 1
## [1] 1
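One possible answer, offered as a sketch rather than the book's solution: keep a in an enclosing environment that outlives each call, and update it with the super assignment operator <<-. The factory make_g11() is a made-up helper.

```r
# Hypothetical factory: `a` lives in the enclosing environment,
# so it survives between calls to the returned function
make_g11 <- function() {
  a <- 0
  function() {
    a <<- a + 1
    a
  }
}

g11 <- make_g11()
first <- g11()
second <- g11()
c(first, second)  # 1 2
```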

6.5 Lazy evaluation

“This allows you to do things like include potentially expensive computations in function arguments that will only be evaluated if needed”

Does anyone have an example of this? We discussed a function that will only perform expensive tasks given the context of the function perhaps?

Maybe a situation where we can give a function default arguments, where sample is a stand-in for longer, expensive functions like different fancy modeling techniques? We can workshop this…

##  [1] 207 213 218 293 211 254 240 261 239 278
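A minimal sketch of that idea follows. The function bodies are guesses for illustration, and expensive_fit() is a made-up stand-in for a slow model fit. A counter shows that the default argument is only evaluated when the function actually uses it.

```r
# Counter kept in an environment so we can see when the fit runs
calls <- new.env()
calls$n <- 0

# Hypothetical stand-in for an expensive computation
expensive_fit <- function(x) {
  calls$n <- calls$n + 1
  sort(x, decreasing = TRUE)
}

# `fit` is a promise: it is only evaluated if `fancy` is TRUE
mega_model <- function(x, fancy = FALSE, fit = expensive_fit(x)) {
  if (fancy) fit else mean(x)
}

cheap <- mega_model(1:10)                  # 5.5
n_after_cheap <- calls$n                   # 0, the fit never ran
fancy <- mega_model(1:10, fancy = TRUE)    # 10 9 8 ... 1
n_after_fancy <- calls$n                   # 1
```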

6.5.1 Promises

Can we discuss the order in which this is happening? Is it that Calculating... is printed, then x*2, then x*2 again? I am still reading this as h03(double(20), double(20)), which is an incorrect mental model because the message is only printed once…

Explain what’s happening below in words, and restructure the promise image to make more sense.

## Calculating...
## double before
## $code
## h03(20)
## 
## $env
## <environment: R_GlobalEnv>
## 
## $evaled
## [1] FALSE
## 
## $value
## NULL
## 
## h03 before
## $code
## [1] 20
## 
## $env
## <environment: R_GlobalEnv>
## 
## $evaled
## [1] FALSE
## 
## $value
## NULL
## 
## h03 after
## $code
## [1] 20
## 
## $env
## NULL
## 
## $evaled
## [1] TRUE
## 
## $value
## [1] 20
## 
## double after
## $code
## h03(20)
## 
## $env
## NULL
## 
## $evaled
## [1] TRUE
## 
## $value
## [1] 20 20
## [1] 40 40
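To check the "evaluated only once" part of the mental model, here is a sketch assuming the book's definitions of double() and h03(), with an added counter (n_calls is not in the original). The promise double(20) is forced the first time x is used inside h03(); the second use of x reuses the cached value.

```r
n_calls <- 0

double <- function(x) {
  n_calls <<- n_calls + 1   # count how often the body runs
  message("Calculating...")
  x * 2
}

h03 <- function(x) {
  c(x, x)   # x is used twice, but the promise is evaluated once
}

out <- h03(double(20))
out      # 40 40
n_calls  # 1
```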

## [1] 20

var_doesnt_exist is a promise within g. We pass the promises within g along when we call f, but f never uses its second argument, so this runs without a problem. When would we want to leverage this behavior?

The variable var_doesnt_exist doesn’t exist, so the promise could never be evaluated, but we can use substitute to get the expression out of a promise! If we modify our function we can play with the expression contained in b:

## You entered var_doesnt_exist as `b`
## [1] 20
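A sketch of what such a modified function might look like; the exact original code is not shown above, so this version calls f directly. substitute(b) returns the unevaluated expression behind the promise, so the missing variable is never looked up.

```r
f <- function(a, b) {
  # Grab the expression supplied for `b` without evaluating it
  b_expr <- deparse(substitute(b))
  message("You entered ", b_expr, " as `b`")
  a
}

res <- f(20, var_doesnt_exist)
res  # 20
```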

We can even evaluate b and use it to create a dplyr like pull function:

##  [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
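One way such a pull-like function could be written in base R, as a sketch: pull_col() is a made-up name, and this is not how dplyr implements pull(). The captured expression is evaluated with the data frame as the first place to look for names.

```r
pull_col <- function(df, col) {
  # Capture the column name as an expression, then evaluate it
  # inside the data frame
  col_expr <- substitute(col)
  eval(col_expr, envir = df, enclos = parent.frame())
}

head(pull_col(iris, Species), 10)
```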

I am going through chapter 6 of the book and can’t figure out what is different in the explanation of these two chunks of code:

# (1)
g07 <- function(x) x + 1
g08 <- function() {
  g07 <- function(x) x + 100
  g07(10)
}
g08()
#> [1] 110

# (2)
y <- 10
h02 <- function(x) {
  y <- 100
  x + 1
}
h02(y)
#> [1] 11

In (1), the g07 function defined in the g08 function takes precedence over the g07 function defined in the global scope (i.e. the first line). So when g08 is called (last line), g07 defined in g08 is executed with the g07(10) call. This function adds the input (i.e. 10) to 100, which returns 110.

In (2), when h02 is called, the y from the global scope (i.e. first line) is passed into the h02 function, where it becomes the x argument. The assignment y <- 100 in the local scope is a “red herring” (i.e. it isn’t used by the function at all). Once it is clear that y <- 100 is not used, it follows that the value of 10 passed in as x (i.e. y from the global scope) produces the output 11 (10 plus 1).

Note that with (2), the function would fail if y <- 10 hadn’t been defined in the global environment. On the other hand, with (1), the function would not fail if the g07 function had not been defined in the global environment (because only the g07 function defined in the body of g08 matters).

6.5.2 Default arguments

I don’t quite understand why x = ls() is different from ls() here; aren’t we still assigning x = ls() but without specifying x?

## [1] "a" "x"
##  [1] "a"          "double"     "f"          "f1"         "f2"        
##  [6] "f3"         "func_1"     "g"          "g11"        "get_state" 
## [11] "h03"        "h05"        "mega_model" "plop"       "test_tbl"  
## [16] "y"

The difference is the environment in which the promise is evaluated. In both cases ls() only runs when x is evaluated inside h05. When ls() is supplied as an explicit argument, x is a promise whose environment is the calling (here global) environment, so ls() lists the global objects. When ls() is the default value, it is evaluated in the function’s local environment, so it lists the function’s own variables.
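A sketch of the two calls, assuming the book's definition of h05(). The default is evaluated inside the function, so it sees a and x; the explicit argument is evaluated where the call is made.

```r
h05 <- function(x = ls()) {
  a <- 1
  x
}

# Default argument: ls() runs inside h05, after `a` has been created
inside <- h05()
inside  # "a" "x"

# Explicit argument: ls() runs in the calling environment
outside <- h05(ls())
outside
```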

Hypothesis: does nesting ls() in h05 first evaluate ls() then evaluate h05() ?

[1] "a" "x" "y"
[1] "h05"
[1] "h05" "y"  
[1] "h05" "y" 
[1] "h05" "x"   "y" 
[1] "h05" "x"   "y" 

Notice in all of the latter calls, a is not returned - so it’s not evaluating ls() inside of the function.

6.5.4.3 Exercise

I understand this problem is showing us an example of name masking (the function doesn’t need to use the y = 0 argument because it gets y from within the definition of x), but I’m fuzzy on what exactly the ; does. What does the syntax {y <- 1; 2} mean? Could it be read as “set y <- 1 and x <- 2”?

## [1] 2 1

The curly brackets are an expression, which can be read as

## [1] 2

This returns 2 and assigns 1 to y. The semicolon can be read as a new line in the expression. When x is evaluated inside the function, it overwrites the argument value of y.

## 02
## [1] 2 1

Compare to:

## [1] 0 2

What is happening here:

  • The default value of x is “assign 1 to y, then implicitly return 2.”
  • The default value of y is 0.
  • Arguments only take their default values when they are first referenced without a supplied value. When you call the function as in the question, x ends up using its default, but y never does.
  • When you get to c(x, ...), x is evaluated. As a side effect, y is now 1 instead of its default value!
  • When you get to c(x, y), y is already 1, so the return value is 2 1.
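The walk-through above can be checked with the function from the exercise. The second function, f1_swapped(), is a made-up variant that references y before x, so y's default is used before x's side effect can overwrite it.

```r
y <- 10

f1 <- function(x = {y <- 1; 2}, y = 0) {
  c(x, y)
}

f1()  # 2 1
y     # 10, the global y is untouched

# Variant for comparison: y is evaluated first, so its default is used
f1_swapped <- function(x = {y <- 1; 2}, y = 0) {
  c(y, x)
}

f1_swapped()  # 0 2
```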

Using the original f1 function, if we write f1(x <- 5) we get 5 0. When you make that call, the function’s x argument gets set to the expression x <- 5 (instead of the default). When you get to c(x, ...), the x <- 5 call gets evaluated in the calling environment (global, most likely, unless you’re calling it from inside a function or something).

To see where x gets assigned, try this:

## [1] "Nothing to see here."
## Error: object 'x' not found

Since x is never used in this version, the x = {x <- 5} promise never gets evaluated, so x never gets set in the calling environment. But if you do the same thing with f1, x is now 5 in the calling environment.
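A self-contained sketch of this effect. All three function names are made up, and the calling environment here is the body of demo() rather than the global environment, so the example leaves no x behind.

```r
# Never touches its argument, so the promise is never evaluated
ignore_x <- function(x) "Nothing to see here."

# Uses its argument, which forces the promise
use_x <- function(x) x

demo <- function() {
  ignore_x(x <- 5)
  before <- exists("x", envir = environment(), inherits = FALSE)

  use_x(x <- 5)
  after <- exists("x", envir = environment(), inherits = FALSE)

  c(before = before, after = after)
}

demo()  # before: FALSE, after: TRUE
```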

Also note that calling the <- function returns the value (the second argument) invisibly, so y <- {x <- 5} assigns 5 to both y and x. I wouldn’t recommend ever doing this on purpose, but it’s useful to know for debugging weird cases.

A piece that ALMOST confused me was that the function’s default value only ever “exists” in the function’s environment, not in the calling environment, so the original case doesn’t change y to 1 globally. But f1({y <- 1; 2}) WILL change y globally… but does not change the value of y inside the function.

6.5.4.4 Exercise

I know this isn’t exactly needed to answer the question, but how do we access a function that has methods? For instance - here I want to dig into the hist function using hist

## function (x, ...) 
## UseMethod("hist")
## <bytecode: 0x7fabddb09ad0>
## <environment: namespace:graphics>

does not give me the actual contents of the function…

We need to access it using hist.<method>, for example hist.default.
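A sketch of how to find and retrieve a method with base R tools: methods() lists the available methods for a generic, and getS3method() returns a specific one even if it is not exported.

```r
# List the methods registered for the hist() generic
hist_methods <- methods("hist")
hist_methods

# Retrieve the default method; printing it shows the real implementation
hist_default <- getS3method("hist", "default")
is.function(hist_default)  # TRUE
names(formals(hist_default))
```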

6.6 dot dot dot

“(See also rlang::list2() to support splicing and to silently ignore trailing commas…” Can we come up with a simple use case for list2 here? The docs use list2(a = 1, a = 2, b = 3, b = 4, 5, 6) but how is this different from list?

## [1] TRUE

list2 is most helpful when we need to force environment variables with data variables. We can see this by creating a function that takes a variable number of arguments:

## [1] 1 2 3
## [1] 1 2 3

The main difference with list(…) is that list2(…) enables the !!! syntax to splice lists:

## [1] 1 2 3 4
Error in !x : invalid argument type

“lapply() uses ... to pass na.rm on to mean()” Um, how?

## List of 2
##  $ : num 2
##  $ : num 5

lapply() takes two main arguments: what you want to loop over and the function to apply to each element. By including ..., lapply() lets you supply additional arguments, which are passed on to the function being applied. In this case, na.rm = TRUE is passed to mean() every time it’s called in the loop.
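The forwarding can be made visible with a simplified re-implementation. simple_lapply() is a made-up helper, not the real lapply() source (which is implemented in C), but it passes ... along in the same spirit.

```r
x <- list(c(1, 3, NA), c(4, NA, 6))

# Hypothetical, simplified version of lapply()
simple_lapply <- function(X, FUN, ...) {
  out <- vector("list", length(X))
  for (i in seq_along(X)) {
    # Everything captured by ... is handed on to FUN unchanged
    out[[i]] <- FUN(X[[i]], ...)
  }
  out
}

str(simple_lapply(x, mean, na.rm = TRUE))
identical(
  simple_lapply(x, mean, na.rm = TRUE),
  lapply(x, mean, na.rm = TRUE)
)  # TRUE
```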

6.6.1.2 Exercise

I tried running browser(plot(1:10, col = "red")) to peek under the hood but only got Called from: top level in the console. What am I missing?

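We can use debugonce() instead! The first argument of browser() is a text label, not an expression to step into, so call debugonce(plot) and then run plot(1:10, col = "red").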

6.7.4 Exit handlers

“Always set add = TRUE when using on.exit(). If you don’t, each call to on.exit() will overwrite the previous exit handler.” What does this mean?

add = TRUE is important when you have more than one on.exit() call!

## a
## b
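A sketch that records which handlers actually run. run_handlers() and the variable seen are made up for illustration. With add = TRUE both handlers fire; with add = FALSE the second on.exit() call replaces the first.

```r
run_handlers <- function(add) {
  seen <- character()
  inner <- function() {
    on.exit(seen <<- c(seen, "a"), add = add)
    on.exit(seen <<- c(seen, "b"), add = add)
    invisible(NULL)
  }
  inner()
  seen
}

run_handlers(add = TRUE)   # "a" "b"
run_handlers(add = FALSE)  # "b"
```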

Can we go over this code? How does it not change your working directory after you run the function?

## [1] "/Users/mayagans/Documents/bookclub-Advanced_R/QandA"

The behavior of setwd “changing the working directory” is actually a side effect of the function - it invisibly returns the previous working directory as the value of the function (potentially for the exact purpose demonstrated). We can use this within our on.exit function to change back to the prior working directory!
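A sketch of the pattern, written along the lines of the book's with_dir(); tempdir() is used here only as a directory that is guaranteed to exist. setwd() returns the old directory, and the exit handler restores it.

```r
with_dir <- function(dir, code) {
  # setwd() changes directory and invisibly returns the previous one
  old <- setwd(dir)
  # Restore the previous directory when the function exits
  on.exit(setwd(old), add = TRUE)
  force(code)
}

before <- getwd()
inside <- with_dir(tempdir(), getwd())  # evaluated in the temp directory
after <- getwd()

identical(before, after)  # TRUE
```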

If one on.exit() handler fails, will R continue on to the next on.exit() handler so long as add = TRUE? Yes: if an on.exit() handler fails, it’ll continue on to the next one.

Error in f() : Error
yay, still called.

6.7.5.4 Exercise

This question is flagged as “started” let’s try to complete it! Hadley comments in the repo: “I think I’m more interested in supplying a path vs. a logical value here”.

Q: How does the chdir parameter of source() compare to with_dir()? Why might you prefer one approach to the other? The with_dir() approach was given in the book as

A: with_dir() takes a path to a working directory as an argument. First the working directory is changed accordingly. on.exit() then ensures that the working directory is reset to its initial value when the function exits.

with_dir gives you the flexibility to change the path to wherever you want (maybe a parent-level folder) whereas source(chdir=TRUE) changes the path to “where that file lives specifically”.

Given the following file structure:

Imagine I want to run import_data.R, but it needs to reference images/controlflow.png. we can do this by setting the wd to advRbookclub:

Or we can use:

but then, we’d need to include something like setwd(here::here()) in import_data.R so that it goes back to AdvancedR.Rproj and sets the working directory there.

In conclusion:

  • source is a base R function, so it reduces dependencies. Once set, you could use setwd("..") assuming you can have some confidence that it’s part of a repository or something?

  • with_dir is exported from the withr package, but it offers more fine-tuned control by taking a specific folder name as opposed to a boolean TRUE | FALSE.

6.7.5.5 Exercise

Can we go over the source code of capture.output and capture.output2?

There were several new terms here to me when going over this function:

  • stderr error output
  • stdout normal output
  • sink diverts R output to a connection.
  • textConnection allows R character vectors to be read as if they were being read from a text file.
    • They can capture R output to a character vector
    • They can be used to create a new character object or append to an existing one in the user’s workspace.
    • At all times the complete lines output to the connection are available in the R object. Closing the connection writes any remaining output to a final element of the character vector.

function (..., file = NULL, append = FALSE, type = c("output", "message"),
          split = FALSE)
{
  # Capture dots as unevaluated expressions
  # [-1L] removes the list()
  args <- substitute(list(...))[-1L]
  
  # match
  type <- match.arg(type)
  # set default return value
  rval <- NULL
  # set default closer
  closeit <- TRUE
  if (is.null(file))
    # If file is null, then create a write-only text connection object which will
    # save to the variable rval in the execution environment (local = TRUE).
    # see https://biostatmatt.com/R/R-conn-ints.pdf for gritty info on connections (and sinks)
    file <- textConnection("rval", "w", local = TRUE)
  else if (is.character(file))
    # if "file" is a character vector, then interpret it as a filename.  Open a
    # file connection in either append or write mode, depending on the value of
    # "append"
    file <- file(file, if (append)
      "a"
      else "w")
  else if (inherits(file, "connection")) {
    # if "file" is already a connection object, check if it is open.  If not, open it
    # in append mode, if specified, otherwise in write mode.
    # inherits refers to the S3 class system.
    #
    # Browse[2]> class(file)
    # [1] "textConnection" "connection"
    if (!isOpen(file))
      open(file, if (append)
        "a"
        else "w")
    # if the connection is already open, don't close it in this function.
    else closeit <- FALSE
  }
  # if you get here, then you misspecified "file"
  else stop("'file' must be NULL, a character string or a connection")
  # sink all output of type "type" into the connection "file".  If you would like
  # the output to continue to its original source, then "split" it.
  #
  # by default, messages (messages, warnings, errors) go to stderr and
  # everything else to stdout.
  sink(file, type = type, split = split)
  on.exit({
    # on exit, call sink with the same arguments and without "file" being specified.
    # this will cause the sink from the line before to terminate.
    sink(type = type, split = split)
    # Close the connection (always, unless "file" was provided as
    # an already open connection)
    if (closeit) close(file)
  })
  # store the calling environment in pf.  i.e. pf refers to the environment in
  # which capture.output was called.
  pf <- parent.frame()
  # define a local function which will evaluate its sole argument (expr) in the
  # parent frame.
  evalVis <- function(expr) withVisible(eval(expr, pf))
  # for each argument collected in the dot dot dot.
  #
  # use split = TRUE to help you debug at this point.  When you try to poke around
  # with the sink applied, you are rightfully stymied because all output is going
  # to the sink connection!
  for (i in seq_along(args)) {
    # store the argument in expr
    expr <- args[[i]]
    # based on the mode of the expr, evaluate it.
    tmp <- switch(mode(expr),
                  expression = lapply(expr, evalVis),
                  call = ,
                  name = list(evalVis(expr)),
                  stop("bad argument"))
    # print any visible values output during evaluation.
    # This print will be collected by the sink we set up earlier and saved to
    # the file connection.
    for (item in tmp) if (item$visible)
      print(item$value)
  }
  # calling on.exit with no arguments will clear the exit handler.
  # we are doing this because the on.exit was designed to safeguard the sink
  # and files being closed in the case of an exception when evaluating the
  # passed in arguments.
  on.exit()
  # undo the sink
  sink(type = type, split = split)
  # close the file if necessary
  if (closeit)
    close(file)
  # return the captured output or null invisibly otherwise.
  if (is.null(rval))
    invisible(NULL)
  else rval
}

The second function will always sink output to a temporary file, and then return the results by reading the file back in (and returning a character vector). It uses two exit handlers, one to clean up the temporary file, and one to remove the sink.
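A sketch of that temporary-file approach follows. capture_to_tempfile() is a made-up name, and this is not the book's capture.output2() verbatim: it opens the file connection explicitly and stops the sink before reading the file back, while still using two exit handlers.

```r
capture_to_tempfile <- function(code) {
  temp <- tempfile()
  # Handler 1: always delete the temporary file
  on.exit(unlink(temp), add = TRUE)

  con <- file(temp, open = "w")
  sink(con)
  sink_active <- TRUE
  # Handler 2: if `code` fails, still remove the sink and close the file.
  # after = FALSE makes this handler run before the file is deleted.
  on.exit(if (sink_active) { sink(); close(con) }, add = TRUE, after = FALSE)

  force(code)

  # Normal path: stop diverting output before reading the file back
  sink()
  close(con)
  sink_active <- FALSE
  readLines(temp)
}

out <- capture_to_tempfile(cat("a\nb\nc\n"))
out  # "a" "b" "c"
```

Unlike capture.output(), this sketch only captures standard output and takes a single expression.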

6.8.4 Replacement functions

Can we put into words the translation for

## [1] "a" "b" "c"
## [1] "a"   "two" "c"

Being equal to

We can dig into the source code, but the gist is that R implements these complex assignments in four steps:

  1. Copy x into a temporary variable `*tmp*`.
  2. `[<-`(names(`*tmp*`), 2, "two") modifies the second element of the names of `*tmp*`.
  3. `names<-`(`*tmp*`, <result of step 2>) assigns the result of step 2 as the names of `*tmp*`, and the result is assigned back to x.
  4. Clean up by removing the temporary variable.
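The steps above can be sketched by hand and compared with the built-in behaviour. The starting values of x and y are assumptions chosen to match the printed names.

```r
# Built-in form
x <- c(a = 1, b = 2, c = 3)
names(x)[2] <- "two"
names(x)  # "a" "two" "c"

# The same thing spelled out step by step
y <- c(a = 1, b = 2, c = 3)
`*tmp*` <- y                                           # step 1
new_names <- `[<-`(names(`*tmp*`), 2, "two")           # step 2
y <- `names<-`(`*tmp*`, new_names)                     # step 3
rm(`*tmp*`)                                            # step 4

identical(x, y)  # TRUE
```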

6.8.6.3 Exercise

Q: Explain why the following code fails:

```r
modify(get("x"), 1) <- 10
#> Error: target of assignment expands to non-language object
```

A: First, let’s define x and recall the definition of modify() from the textbook:

R internally transforms the code and the transformed code reproduces the error above.

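The error occurs during the assignment, because get("x") is not a valid assignment target: the innermost target of a replacement call must be a name such as x, not the string "x", and no corresponding replacement function `get<-` exists for get() either. To confirm this we can reproduce the error via the following simple example.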
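A sketch that reproduces both the working and the failing case, assuming the textbook definition of `modify<-`. The failing assignment is wrapped in tryCatch() so the script keeps running and the error message can be inspected.

```r
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

x <- 1:10

# Works: the target of the assignment is the name x
modify(x, 1) <- 10L
x[1]  # 10

# Fails: the target is a call to get(), not a name
res <- tryCatch(
  eval(quote(modify(get("x"), 1) <- 10)),
  error = function(e) conditionMessage(e)
)
res
```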

I don’t really see why this needs to be expanded upon….