class: center, middle, inverse, title-slide

# Advanced R
## Chapter 17: Metaprogramming - Big Picture
### Josh Pohlkamp-Hartt
### @JPohlkampHartt
### 2020-11-18

---

# Outline

- Metaprogramming Introduction
- Code Is Data and also a 🌲?! 🤯
- Evaluation of Code
- Intro To Quosures

---

## Prerequisites

- We will use **rlang** to introduce the user-friendly/tidy versions of the main methods in metaprogramming.
- The **lobstr** package is used to understand the structure of R code.

```r
library(rlang)
library(lobstr)
```

<img src="img/lobster_edit_for_web_jpg.jpg" width="483" />

---

## What is Metaprogramming?

- The idea is that code is data that can be inspected and modified programmatically.
- Ex: This is why we don't need to quote package names in `library()` and can write formulas in `glm()` with `+`, `*`, and `:`.
- We will focus on tidy evaluation rather than base R functions, to avoid some of R's ambiguous legacy code.

---

## Non-Standard Evaluation

- What is Non-Standard Evaluation (NSE)? Very roughly, it is to programmatically modify an expression or its meaning after it is issued but before it is executed.
- Example: how R knows to evaluate `hp > 250` inside our data frame at run time, rather than the traditional `mtcars[mtcars$hp > 250, c("mpg", "hp")]` method where we have to link `mtcars` to `hp`.

```r
# dplyr::select() is called directly, so no pipe operator needs to be attached
subset(dplyr::select(mtcars, mpg, hp), hp > 250)
```

```
##                 mpg  hp
## Ford Pantera L 15.8 264
## Maserati Bora  15.0 335
```

- Hadley opines that NSE is a sloppy definition for this behavior, so we will generally avoid this terminology in these chapters.
- Few languages allow as much flexibility and accessibility for NSE as R does. Python is jealous.

---

## Why use Metaprogramming?

- The ability to modify the meaning of code given context or inputs is extremely powerful
- This allows us to use many of the key features in the Tidyverse
- R does it by design, so why not use its full power?

<img src="img/more-power-scotty.png" width="356" />

---

## Code is Data: Expressions

- We can capture code and compute on it like other types of data
- Captured code is called an *expression*; we capture it with `rlang::expr()`.
```r
expr(mean(x, na.rm = TRUE))
```

```
## mean(x, na.rm = TRUE)
```

- An expression can be one of 4 types of objects:
  - calls (representing captured function calls)
  - symbols (names of an object)
  - scalar constants (length 1 atomic vectors)
  - pairlists (legacy version of lists used to store the arguments of a function)

---

## Code is Data: Enriching!

- Most often we want to capture code that was passed to a function as an argument. This does not work with `expr()`, which captures only what is typed inside it.

```r
capture_it <- function(x) {
  expr(x)
}
capture_it(a + b + c)
```

```
## x
```

- To capture code through a function call, where there is lazy evaluation, we need to use **enriched** expressions: `enexpr()`.

```r
capture_it <- function(x) {
  enexpr(x)
}
capture_it(a + b + c)
```

```
## a + b + c
```

---

## Code is Data: Inspect or Modify

- Once we have captured code, we can interact with it like most data objects
- Inspection:

```r
f <- expr(f(x = 1, y = 2))
f[[2]]
```

```
## [1] 1
```

```r
f$x
```

```
## [1] 1
```

- Modification:

```r
f$z <- 3
f
```

```
## f(x = 1, y = 2, z = 3)
```

---

## Code is Data: And 🌲's!

- Most programming languages represent code as a tree, often called the **abstract syntax tree**.
- To view the tree structure we use `lobstr::ast()`. Function calls form the branches of the tree and the leaves are symbols and constants.

```r
lobstr::ast(f(a, "b"))
```

```
## █─f
## ├─a
## └─"b"
```

- This works for every kind of call, including infix operators, because every call can be written in prefix form.

```r
lobstr::ast(1 + 2 * 3)
```

```
## █─`+`
## ├─1
## └─█─`*`
##   ├─2
##   └─3
```

---

## Code is Data: Code Generator

- We can use code to create new trees. There are two main tools: `rlang::call2()` and **unquoting**.
- `call2()` constructs a function call from its components: first the function to call, and then the arguments to call it with.

```r
call2("+", 1, 2)
```

```
## 1 + 2
```

```r
call2("+", 1, call2("*", 2, 3))
```

```
## 1 + 2 * 3
```

---

## Code is Data: !!
- `call2()` can be cumbersome for complex operations. An alternative is to build complex code trees by combining simpler code trees with a template. `expr()` and `enexpr()` have built-in support for this idea via `!!`, the *unquote operator*.
- Unquoting allows you to selectively evaluate parts of the expression that would otherwise be quoted, which effectively allows you to merge ASTs using a template AST.
- Basically, `!!x` inserts the code tree stored in `x` into the expression.

```r
xx <- expr(x + x)
yy <- expr(y + y)
expr(!!xx / !!yy)
```

```
## (x + x)/(y + y)
```

---

## Code is Data: !! Cont.

- Unquoting is even more useful when you wrap it up into a function.
- First we use `enexpr()` to capture the user’s expression, then `expr()` and `!!` to create a new expression using a template.

```r
cv <- function(var) {
  var <- enexpr(var)
  expr(sd(!!var) / mean(!!var))
}
cv(x + y)
```

```
## sd(x + y)/mean(x + y)
```

- Capturing (also known as quoting) and unquoting together make up **Quasiquotation**.
- Quasiquotation makes it easy to create functions that combine code written by the function’s author with code written by the function’s user.

---

## Evaluation: How to Evaluate

- We can create, modify, and inspect code... what if we want to run it?
- We can do that too with `base::eval()`.
- Evaluating code requires an expression and an environment that gives the symbols their definitions.

```r
# No seed is set, so the result changes from run to run
eval(cv(x + y), env(x = rnorm(1000, 0, 1), y = rnorm(1000, 0, 1)))
```

```
## [1] 177.778
```

- One advantage of evaluating code manually is that you can define the environment. There are two main reasons to do this:
  - To temporarily override functions to implement a domain specific language.
  - To add a data mask.
- A data mask is an environment containing user-supplied objects. Objects in the mask have precedence over objects in the environment (i.e. they mask those objects).

---

## Evaluation: Custom with Functions

- In our last example we bound the names `x` and `y` to random vectors of length 1000.
As we saw in Chapter 6, we can also bind names to functions.
- This allows us to override the behaviour of existing functions.

```r
string_math <- function(x) {
  # Child of the caller's environment in which the operators are redefined
  e <- env(
    caller_env(),
    `+` = function(x, y) paste0(x, y),
    `-` = function(x, y) gsub(y, "", x),
    `*` = function(x, y) strrep(x, y)
  )

  eval(enexpr(x), e)
}

What <- "Power!"
string_math("More " + What)
```

```
## [1] "More Power!"
```

```r
string_math(("xyz" - "y") * 3)
```

```
## [1] "xzxzxz"
```

---

## Evaluation: Custom with Data

- Another application is modifying evaluation to look for variables in a data frame instead of an environment. This idea powers `ggplot2::aes()` and `dplyr::mutate()`.
- It’s possible to use `eval()` for this, but this is less user friendly and can be restricting, so we’ll switch to `rlang::eval_tidy()` instead.
- The main difference with `eval_tidy()` is that it takes an expression, a **data mask**, and an environment, which reduces ambiguity about where each name is looked up.

```r
df <- data.frame(x = 1:5, y = rep(1, 5))
eval_tidy(expr(x + y), df, env(x = 11:15))
```

```
## [1] 2 3 4 5 6
```

- To be explicit about whether a name comes from the data mask or the environment, `eval_tidy()` provides the pronouns `.data` and `.env` for use in our expressions.

```r
df <- data.frame(x = 1:5, y = rep(1, 5))
eval_tidy(expr(.env$x + y), df, env(x = 11:15))
```

```
## [1] 12 13 14 15 16
```

---

## Quosures

- We can see an issue with our environments in the example below. We would like the value of `a` to be the one that is visible to the user (10), not the one internal to the function (1000).

```r
with2 <- function(df, expr) {
  a <- 1000
  eval_tidy(enexpr(expr), df)
}

df <- data.frame(x = 1:3)
a <- 10
with2(df, x + a)
```

```
## [1] 1001 1002 1003
```

- We can solve this by using a **quosure**, which bundles an expression with an environment.
- The name is a portmanteau of quoting and closure, because a quosure both quotes the expression and encloses the environment.
- Quosures turn the internal promise object into something that you can program with.
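---

## Quosures: Looking Inside

- To make the "expression plus environment" bundle concrete, here is a minimal sketch. `capture_quo()` is a hypothetical helper name introduced only for this illustration; `enquo()`, `quo_get_expr()`, `quo_get_env()` and `eval_tidy()` are rlang functions.

```r
library(rlang)

# Hypothetical helper (not from the book): capture the argument as a quosure
capture_quo <- function(x) {
  enquo(x)
}

a <- 10
q <- capture_quo(x + a)

quo_get_expr(q)  # the captured code: x + a
quo_get_env(q)   # the environment the code was written in, where `a` lives

# With a data mask, `x` is found in the data frame and
# `a` is found in the quosure's own environment
eval_tidy(q, data.frame(x = 1:3))
```

- Because the environment travels with the expression, the lookup of `a` does not depend on where `eval_tidy()` happens to be called.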
---

## Quosures Cont.

- `eval_tidy()` is designed to work with quosures: pass the quosure in place of the expression and do not supply an environment in the call.

```r
with2 <- function(df, expr) {
  a <- 1000
  eval_tidy(enquo(expr), df)
}

with2(df, x + a)
```

```
## [1] 11 12 13
```

- When using a data mask it is best practice to use a quosure.

---

## Summary

- Metaprogramming lets us modify code after definition and before evaluation
- Captured code = Expressions and they act like data (Chapter 18)
- Quasiquotation allows us to interactively generate code (Chapter 19)
- Evaluation allows for customization (Chapter 20)
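
---

## Summary: Putting It Together

- As a recap, a minimal sketch that chains the three ideas: capture, quasiquotation, and evaluation with a data mask. `cv_expr()` is a renamed copy of the `cv()` function from the quasiquotation slide, and using `mtcars` as the data mask is an assumption made for illustration.

```r
library(rlang)

# Capture the user's code, then unquote it into a template
cv_expr <- function(var) {
  var <- enexpr(var)
  expr(sd(!!var) / mean(!!var))
}

code <- cv_expr(hp)      # captured code behaves like data
code[[1]]                # inspect: the outermost function is `/`

# Evaluate the generated code, looking up `hp` in mtcars (the data mask)
result <- eval_tidy(code, mtcars)
result
```

- The generated expression is plain data until `eval_tidy()` runs it, so it can be inspected or modified first.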