Parsing and grammar
- Parsing - The process by which a computer language takes a string and constructs an expression. Parsing is governed by a set of rules known as a grammar.
- We are going to use lobstr::ast() to explore some of the details of R’s grammar, and then show how you can transform back and forth between expressions and strings.
- Operator precedence - Conventions used by the programming language to decide which operator binds more tightly when an expression could be grouped in more than one way.
- Infix functions introduce two sources of ambiguity.
- The first source of ambiguity is precedence: what does 1 + 2 * 3 yield? Do you get 9 (i.e. (1 + 2) * 3) or 7 (i.e. 1 + (2 * 3))? In other words, which of the two possible parse trees does R build?
- Programming languages use conventions called operator precedence to resolve this ambiguity. We can use ast() to see what R does:
lobstr::ast(1 + 2 * 3)
#> █─`+`
#> ├─1
#> └─█─`*`
#>   ├─2
#>   └─3
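The same tree can be inspected with base R alone, which also shows the round trip between strings and expressions. This is a minimal sketch that swaps in base `str2lang()` (available since R 3.6.0) and `deparse()` instead of lobstr; the variable names are illustrative only.

```r
# Parse a string into an expression (a call object) with base R.
e <- str2lang("1 + 2 * 3")

# The outermost call is `+`, so `*` was grouped first.
top_fn <- as.character(e[[1]])    # "+"
rhs    <- e[[3]]                  # the call 2 * 3
rhs_fn <- as.character(rhs[[1]])  # "*"

# Evaluating the tree gives 7, not 9.
result <- eval(e)

# deparse() turns the expression back into a string.
text <- deparse(e)                # "1 + 2 * 3"

cat(top_fn, rhs_fn, result, text, sep = "\n")
```

Indexing a call with `[[1]]` returns the function being called, and the remaining elements are its arguments, which mirrors the branches printed by `ast()`.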
- For arithmetic, the familiar PEMDAS convention settles it: * binds more tightly than +, so R computes 1 + (2 * 3) = 7. The precedence of R's other operators is much less obvious.
- There’s one particularly surprising case in R:
- ! has a much lower precedence (i.e. it binds less tightly) than you might expect.
- As a result, !x %in% y is parsed as !(x %in% y), which lets you write "not in" tests without extra parentheses:
lobstr::ast(!x %in% y)
#> █─`!`
#> └─█─`%in%`
#>   ├─x
#>   └─y
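To confirm that the negation applies to the whole `%in%` call, here is a short base R sketch; the vectors `x` and `y` are made-up sample data, not from the original notes.

```r
x <- c(1, 5, 10)
y <- c(5, 6, 7)

# The outermost call is `!`; its single argument is the %in% call.
e <- quote(!x %in% y)
top_fn   <- as.character(e[[1]])       # "!"
inner_fn <- as.character(e[[2]][[1]])  # "%in%"

# Same result as the explicitly parenthesised version.
not_in <- !x %in% y
stopifnot(identical(not_in, !(x %in% y)))
print(not_in)                          # TRUE FALSE TRUE
```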
- R has over 30 infix operators divided into 18 precedence groups.
- While the details are described in ?Syntax, very few people have memorised the complete ordering.
- If there’s any confusion, use parentheses!
# parentheses override the default precedence
lobstr::ast((1 + 2) * 3)
#> █─`*`
#> ├─█─`(`
#> │ └─█─`+`
#> │   ├─1
#> │   └─2
#> └─3
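Two points are worth checking directly in base R: `(` appears in the tree as an ordinary function call, and parentheses matter for other operators too. The `-2^2` case is an extra illustration based on the ordering in `?Syntax`, where `^` binds more tightly than unary minus; it is not part of the original notes.

```r
# `(` is an ordinary call in the parse tree.
e <- quote((1 + 2) * 3)
top_fn   <- as.character(e[[1]])       # "*"
paren_fn <- as.character(e[[2]][[1]])  # "("
grouped  <- eval(e)                    # 9

# ^ binds more tightly than unary minus, so -2^2 means -(2^2).
neg_sq <- -2^2                         # -4
sq_neg <- (-2)^2                       # 4

print(c(grouped = grouped, neg_sq = neg_sq, sq_neg = sq_neg))
```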