18.11 Parsing and grammar

  • Parsing - The process by which a computer language takes a string and constructs an expression. Parsing is governed by a set of rules known as a grammar.
  • We are going to use lobstr::ast() to explore some of the details of R’s grammar, and then show how you can transform back and forth between expressions and strings.
  • Operator precedence - Conventions used by the programming language to resolve ambiguity.
  • Infix functions introduce two sources of ambiguity.
  • The first source arises when different infix functions are combined: what does 1 + 2 * 3 yield? Do you get 9 (i.e. (1 + 2) * 3), or 7 (i.e. 1 + (2 * 3))? In other words, which of the two possible parse trees does R use?

  • Programming languages use conventions called operator precedence to resolve this ambiguity. We can use ast() to see what R does:
lobstr::ast(1 + 2 * 3)
#> █─`+` 
#> ├─1 
#> └─█─`*` 
#>   ├─2 
#>   └─3
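The same tree can be checked without lobstr. The following is a minimal sketch in base R, assuming only `quote()` and `eval()`; the variable names (`e`, `top_fun`, `rhs`, `rhs_fun`, `result`) are illustrative and not from the source.

```r
# Capture the expression without evaluating it (base R only).
e <- quote(1 + 2 * 3)

# A call can be indexed like a list: element 1 is the function being called.
top_fun <- as.character(e[[1]])    # the outermost call is `+`

# The third element is the right-hand argument of `+`, itself a call.
rhs <- e[[3]]
rhs_fun <- as.character(rhs[[1]])  # the nested call is `*`

# Evaluating the expression confirms that `*` is applied before `+`.
result <- eval(e)
print(result)
```

Because `*` sits deeper in the tree than `+`, it is evaluated first, which matches the 1 + (2 * 3) reading.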
  • For arithmetic, PEMDAS is pretty clear on what to do. The precedence of other operators isn’t as obvious.
  • There’s one particularly surprising case in R:
    • ! has a much lower precedence (i.e. it binds less tightly) than you might expect.
    • This allows you to write useful operations like:
lobstr::ast(!x %in% y)
#> █─`!` 
#> └─█─`%in%` 
#>   ├─x 
#>   └─y
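To see what the low precedence of `!` means in practice, here is a small base R sketch. The vectors `x` and `y` are made-up sample data, and the other names are illustrative only.

```r
# Hypothetical sample data for illustration.
x <- c(1, 5)
y <- 1:3

# Inspect the structure: `!` is the outer call, `%in%` the inner one.
e <- quote(!x %in% y)
outer_fun <- as.character(e[[1]])
inner_fun <- as.character(e[[2]][[1]])

# Because `!` binds less tightly, the membership test runs first
# and its result is then negated.
not_in <- !x %in% y
explicit <- !(x %in% y)

print(not_in)
print(identical(not_in, explicit))
```

The unparenthesised form gives the same answer as `!(x %in% y)`, which is why it works as a compact "not in" test.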
  • R has over 30 infix operators divided into 18 precedence groups.
  • While the details are described in ?Syntax, very few people have memorised the complete ordering.
  • If there’s any confusion, use parentheses!
# override PEMDAS
lobstr::ast((1 + 2) * 3)
#> █─`*` 
#> ├─█─`(` 
#> │ └─█─`+` 
#> │   ├─1 
#> │   └─2 
#> └─3
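The notes mention moving back and forth between strings and expressions. As a rough sketch, base R's `parse()` and `deparse()` can do this round trip; using these two functions is an assumption here, since the source does not say which helpers it goes on to use. All variable names are illustrative.

```r
# Start from a string containing R code.
s <- "(1 + 2) * 3"

# parse() returns an expression vector; take its first (only) element.
e <- parse(text = s, keep.source = FALSE)[[1]]

# The parentheses survive parsing as a call to the `(` function.
top_fun <- as.character(e[[1]])
paren_fun <- as.character(e[[2]][[1]])

# Evaluate the parsed expression, then turn it back into a string.
value <- eval(e)
back <- deparse(e)

print(value)
print(back)
```

The parenthesised sum is evaluated before the multiplication, and deparsing recovers the original text including the parentheses.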