18.3 Expressions

  • Collectively, the data structures present in the AST are called expressions.
  • These include:
    1. Constants
    2. Symbols
    3. Calls
    4. Pairlists

18.3.1 Constants

  • Scalar constants are the simplest component of the AST.
  • A constant is either NULL or a length-1 atomic vector (a scalar):
    • e.g., TRUE, 1L, 2.5, "x", or "hello".
  • We can test for a constant with rlang::is_syntactic_literal().
  • Constants are self-quoting in the sense that the expression used to represent a constant is the same constant:
identical(expr(TRUE), TRUE)
#> [1] TRUE
identical(expr(1), 1)
#> [1] TRUE
identical(expr(2L), 2L)
#> [1] TRUE
identical(expr("x"), "x")
#> [1] TRUE
identical(expr("hello"), "hello")
#> [1] TRUE
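
As a small sketch of the test mentioned above, the following applies rlang::is_syntactic_literal() to a few values. The inputs are chosen here for illustration; note that -1 is not a constant, because the parser reads it as a call to `-`.

```r
library(rlang)

# NULL and length-1 atomic vectors count as constants
is_syntactic_literal(NULL)
#> [1] TRUE
is_syntactic_literal("hello")
#> [1] TRUE

# A longer vector cannot be written as a single literal
is_syntactic_literal(c(1, 2))
#> [1] FALSE

# Symbols and calls are not constants
is_syntactic_literal(expr(x))
#> [1] FALSE
is_syntactic_literal(expr(-1))  # parsed as the call `-`(1)
#> [1] FALSE
```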

18.3.2 Symbols

  • A symbol represents the name of an object.
    • x
    • mtcars
    • mean
  • In base R, the terms symbol and name are used interchangeably (i.e., is.name() is identical to is.symbol()), but this book uses symbol consistently because “name” has many other meanings.
  • You can create a symbol in two ways:
    1. by capturing code that references an object with expr().
    2. by turning a string into a symbol with rlang::sym().
expr(x)
#> x
sym("x")
#> x
  • A symbol can be turned back into a string with as.character() or rlang::as_string().
  • as_string() has the advantage of clearly signalling that you’ll get a character vector of length 1.
as_string(expr(x))
#> [1] "x"
  • We can recognize a symbol because it is printed without quotes:
expr(x)
#> x
  • str() tells you that it is a symbol, and is.symbol() is TRUE:
str(expr(x))
#>  symbol x
is.symbol(expr(x))
#> [1] TRUE
  • The symbol type is not vectorised, i.e., a symbol is always length 1.
  • If you want multiple symbols, you’ll need to put them in a list, using rlang::syms().
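
A minimal sketch of working with several symbols at once, assuming the variable names "x" and "y" (chosen here for illustration). Because a symbol is always length 1, syms() returns a list, and converting back to strings means visiting each element in turn.

```r
library(rlang)

# syms() turns each string into a symbol and returns them in a list
vars <- syms(c("x", "y"))
length(vars)
#> [1] 2
is.symbol(vars[[1]])
#> [1] TRUE

# as_string() converts one symbol at a time, so apply it to each element
vapply(vars, as_string, character(1))
#> [1] "x" "y"
```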

Note that as_string() will not work on expressions which are not symbols.

as_string(expr(x+y))
#> Error in `as_string()`:
#> ! Can't convert a call to a string.

18.3.3 Calls

  • A call object represents a captured function call.
  • Call objects are a special type of list.
    • The first component specifies the function to call (usually a symbol, i.e., the name of the function).
    • The remaining elements are the arguments for that call.
  • Call objects create branches in the AST, because calls can be nested inside other calls.
  • You can identify a call object when printed because it looks just like a function call.
  • Confusingly, typeof() and str() report “language” for call objects (where we might expect “call”), but is.call() returns TRUE:
lobstr::ast(read.table("important.csv", row.names = FALSE))
#> █─read.table 
#> ├─"important.csv" 
#> └─row.names = FALSE
x <- expr(read.table("important.csv", row.names = FALSE))
typeof(x)
#> [1] "language"
is.call(x)
#> [1] TRUE
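
To illustrate how nesting creates branches, here is a small sketch using the made-up function names f, g and h. The code is only captured, never evaluated, so these functions do not need to exist.

```r
library(rlang)

# A call whose first argument is itself a call
y <- expr(f(g(1), h = 2))

is.call(y)
#> [1] TRUE

# The function position holds a symbol
is.symbol(y[[1]])
#> [1] TRUE

# The first argument is another call, i.e. a branch in the AST
is.call(y[[2]])
#> [1] TRUE
y[[2]][[1]]
#> g
```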

18.3.4 Subsetting

  • Calls generally behave like lists.
  • Since they are list-like, you can use standard subsetting tools.
  • The first element of the call object is the function to call, which is usually a symbol:
x[[1]]
#> read.table
is.symbol(x[[1]])
#> [1] TRUE
  • The remainder of the elements are the arguments:
is.symbol(x[-1])
#> [1] FALSE
as.list(x[-1])
#> [[1]]
#> [1] "important.csv"
#> 
#> $row.names
#> [1] FALSE
  • We can extract individual arguments with [[ or, if named, $:
x[[2]]
#> [1] "important.csv"
x$row.names
#> [1] FALSE
  • We can determine the number of arguments in a call object by subtracting 1 from its length:
length(x) - 1
#> [1] 2
  • Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching:
    • An argument could be in any position, and could be supplied with its full name, with an abbreviated name, or with no name.
  • To work around this problem, you can use rlang::call_standardise(), which standardises all arguments to use the full name (it is now deprecated in favour of rlang::call_match()):
rlang::call_standardise(x)
#> Warning: `call_standardise()` is deprecated as of rlang 0.4.11
#> This warning is displayed once every 8 hours.
#> read.table(file = "important.csv", row.names = FALSE)
  • But if the function uses ..., it’s not possible to standardise all arguments.
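
Since call_standardise() is deprecated, one alternative is base R's match.call(), which applies the same argument-matching rules when given the function definition and a captured call. This is a sketch rather than a drop-in replacement; it uses base quote() to capture the calls so that it runs without rlang, and the second call (abbreviated name, arguments reordered) is made up for illustration.

```r
# Capture the call from the text with base quote()
x <- quote(read.table("important.csv", row.names = FALSE))

# Match the arguments against read.table()'s formal arguments
std <- match.call(read.table, x)
std
#> read.table(file = "important.csv", row.names = FALSE)

# An abbreviated name and a different argument order give the same result
y <- quote(read.table(row = FALSE, "important.csv"))
match.call(read.table, y)
#> read.table(file = "important.csv", row.names = FALSE)

# After matching, arguments can be extracted reliably by full name
std$file
#> [1] "important.csv"
```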
  • Calls can be modified in the same way as lists:
x$header <- TRUE
x
#> read.table("important.csv", row.names = FALSE, header = TRUE)

18.3.5 Function position

  • The first element of the call object is the function position. This contains the function that will be called when the object is evaluated, and is usually a symbol.
lobstr::ast(foo())
#> █─foo
  • While R allows you to surround the name of the function with quotes, the parser converts it to a symbol:
lobstr::ast("foo"())
#> █─foo
  • However, sometimes the function doesn’t exist in the current environment and you need to do some computation to retrieve it:
    • For example, if the function is in another package, is a method of an R6 object, or is created by a function factory. In this case, the function position will be occupied by another call:
lobstr::ast(pkg::foo(1))
#> █─█─`::` 
#> │ ├─pkg 
#> │ └─foo 
#> └─1
lobstr::ast(obj$foo(1))
#> █─█─`$` 
#> │ ├─obj 
#> │ └─foo 
#> └─1
lobstr::ast(foo(1)(2))
#> █─█─foo 
#> │ └─1 
#> └─2
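
The same structure can be checked by subsetting. In this sketch, pkg and foo are placeholder names as in the trees above; nothing is evaluated, so they do not need to exist.

```r
library(rlang)

# Namespaced call: the function position is itself a call to `::`
x <- expr(pkg::foo(1))
is.call(x[[1]])
#> [1] TRUE
x[[1]]
#> pkg::foo
as_string(x[[1]][[1]])
#> [1] "::"

# Function factory: the function position is the call foo(1)
z <- expr(foo(1)(2))
is.call(z[[1]])
#> [1] TRUE
z[[1]]
#> foo(1)
```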

18.3.6 Constructing

  • You can construct a call object from its components using rlang::call2().
  • The first argument is the name of the function to call (either as a string, a symbol, or another call).
  • The remaining arguments will be passed along to the call:
call2("mean", x = expr(x), na.rm = TRUE)
#> mean(x = x, na.rm = TRUE)
call2(expr(base::mean), x = expr(x), na.rm = TRUE)
#> base::mean(x = x, na.rm = TRUE)
  • Infix calls created in this way still print as usual.
call2("<-", expr(x), 10)
#> x <- 10
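
Because any argument to call2() can itself be a call, nested calls can be built up piece by piece. A minimal sketch with arbitrary numbers:

```r
library(rlang)

# Build the inner call first, then pass it as an argument to the outer call
inner <- call2("*", 2, 3)
outer <- call2("+", 1, inner)
outer
#> 1 + 2 * 3

# The constructed call can be evaluated like any captured code
eval(outer)
#> [1] 7
```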