Learn the tools for controlling flow of execution.
Learn some technical pitfalls and (perhaps lesser known) useful features.
There are two main groups of flow control tools, choices and loops:
Choices (if, switch, ifelse, dplyr::if_else, dplyr::case_when) allow you to run different code depending on the input.
Loops (for, while, repeat) allow you to repeatedly run code.
if() and else
Use if to specify a block of code to be executed if a specified condition is true. Use else to specify a block of code to be executed if the same condition is false.
(Note braces are only needed for compound expressions)
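A minimal sketch of the basic form follows. The names `temperature`, `advice` and `label` are illustrative only and do not come from the original notes.

```r
temperature <- 25

# Compound expressions (more than one statement per branch) need braces
if (temperature > 20) {
  advice <- "t-shirt"
  label <- "warm"
} else {
  advice <- "jumper"
  label <- "cool"
}

# A single expression per branch can omit the braces
if (temperature > 20) print("warm") else print("cool")
#> [1] "warm"
```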
Can be expanded to more alternatives:
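One way to chain alternatives, modelled on the `grade()` example in the book (the thresholds are only illustrative):

```r
# Each `else if` is tested only when all earlier conditions were FALSE
grade <- function(x) {
  if (x > 90) {
    "A"
  } else if (x > 80) {
    "B"
  } else if (x > 50) {
    "C"
  } else {
    "F"
  }
}

grade(85)
#> [1] "B"
```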
Why does this work?
x <- 1:10
if (length(x)) "not empty" else "empty"
#> [1] "not empty"
x <- numeric()
if (length(x)) "not empty" else "empty"
#> [1] "empty"
if returns a value which can be assigned
The book recommends assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read.
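A short sketch of assigning the result of `if`, kept to one line as recommended (the names `x1` and `x2` are arbitrary):

```r
# The value of the chosen branch becomes the value of the whole expression
x1 <- if (TRUE) 1 else 2
x2 <- if (FALSE) 1 else 2

c(x1, x2)
#> [1] 1 2
```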
When you use the single argument form without an else statement, if invisibly (Section 6.7.2) returns NULL if the condition is FALSE. Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms:
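The idiom can be sketched with a `greet()` function along the lines of the one in the book; when `birthday` is FALSE the `if` yields NULL, which `paste0()` treats as an empty string:

```r
greet <- function(name, birthday = FALSE) {
  paste0(
    "Hi ", name,
    # No else branch: evaluates to NULL when birthday is FALSE
    if (birthday) " and HAPPY BIRTHDAY"
  )
}

greet("Maria", FALSE)
#> [1] "Hi Maria"
greet("Jaime", TRUE)
#> [1] "Hi Jaime and HAPPY BIRTHDAY"
```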
TRUE or FALSE
A single number gets coerced to a logical type: 0 becomes FALSE and any other non-missing number becomes TRUE.
If the condition cannot evaluate to a single TRUE or FALSE, an error is (usually) produced.
if ("text") 1
#> Error in if ("text") 1: argument is not interpretable as logical
if ("true") 1
#> [1] 1
if (numeric()) 1
#> Error in if (numeric()) 1: argument is of length zero
if (NULL) 1
#> Error in if (NULL) 1: argument is of length zero
if (NA) 1
#> Error in if (NA) 1: missing value where TRUE/FALSE needed
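In R versions before 4.2.0, the exception was a logical vector of length greater than 1, which only generated a warning (and used the first element) unless the environment variable _R_CHECK_LENGTH_1_CONDITION_ was set to true.
Since R 4.2.0, a condition of length greater than 1 is always an error.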
ifelse() is a vectorized version of if.
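A small sketch of the vectorised behaviour, following the example in the book: the test is evaluated element by element and each result is drawn from the matching position of the yes or no vector.

```r
x <- 1:10

# Replace multiples of 5 with "XXX", keep everything else as text
marked <- ifelse(x %% 5 == 0, "XXX", as.character(x))
marked
#>  [1] "1"   "2"   "3"   "4"   "XXX" "6"   "7"   "8"   "9"   "XXX"
```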
dplyr::if_else()
The book recommends using ifelse() “only when the yes and no vectors are the same type as it is otherwise hard to predict the output type.”
dplyr::if_else() enforces this recommendation.
For example:
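The example that followed here seems to have been lost, so this is a stand-in sketch rather than the original. It assumes dplyr is installed; `base_result` and `strict_result` are illustrative names. Base `ifelse()` silently coerces mixed types, while `dplyr::if_else()` refuses to combine them.

```r
# Base R: character and numeric branches are silently coerced to character
base_result <- ifelse(c(TRUE, FALSE), "a", 1)
base_result
#> [1] "a" "1"

# dplyr: incompatible branch types raise an error, caught here for illustration
strict_result <- tryCatch(
  dplyr::if_else(c(TRUE, FALSE), "a", 1),
  error = function(e) "type error"
)
strict_result
#> [1] "type error"
```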
Rather than stringing together multiple if - else if chains, you can often use switch.
The last component of a switch() should always throw an error, because an unmatched input would otherwise invisibly return NULL. The book recommends using switch() only with character inputs.
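The plotting code that follows calls a `centre()` helper whose definition is not shown in these notes. A plausible definition, modelled on the example in the `switch()` help page and offered here as an assumption, is:

```r
# Dispatch on a character `type`; the unnamed last argument is the fallback
centre <- function(x, type) {
  switch(type,
    mean = mean(x),
    median = median(x),
    trimmed = mean(x, trim = 0.1),
    stop("Invalid `type` value")
  )
}

centre(c(1, 2, 3, 10), "median")
#> [1] 2.5
```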
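# Assumes a centre(x, type) helper built on switch() has already been defined
library(ggplot2)

set.seed(123)
x <- rlnorm(100)

# Compute each measure of centre by dispatching on its name
centers <- data.frame(type = c("mean", "median", "trimmed"))
centers$value <- sapply(centers$type, \(t) centre(x, t))

# Density of x with one dashed vertical line per measure of centre
ggplot(data.frame(x), aes(x)) +
  geom_density() +
  geom_vline(
    data = centers,
    mapping = aes(color = type, xintercept = value),
    linewidth = 0.5, linetype = "dashed"
  ) +
  xlim(-1, 10) +
  theme_bw()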
Example from book of “falling through” to next value
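The book's example can be sketched as follows: an input whose right-hand side is empty falls through to the next non-empty value.

```r
legs <- function(x) {
  switch(x,
    cow = ,       # falls through to dog
    horse = ,     # falls through to dog
    dog = 4,
    human = ,     # falls through to chicken
    chicken = 2,
    plant = 0,
    stop("Unknown input")
  )
}

legs("cow")
#> [1] 4
legs("human")
#> [1] 2
```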
dplyr::case_when
case_when is a more general if_else and can often be used in place of multiple chained if_else calls or sapply’ing switch.
It uses a special syntax to allow any number of condition-vector pairs:
set.seed(123)
x <- rlnorm(100)
centers <- data.frame(type = c('mean', 'median', 'trimmed'))
centers$value <- dplyr::case_when(
  centers$type == "mean" ~ mean(x),
  centers$type == "median" ~ median(x),
  centers$type == "trimmed" ~ mean(x, trim = 0.1),
  .default = 1000
)
centers
#> type value
#> 1 mean 1.652545
#> 2 median 1.063744
#> 3 trimmed 1.300568
for (item in vector) perform_action
First example
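The loop that produced the output below appears to have been dropped from these notes. A loop consistent with that output would be:

```r
# Print the growing sequence 1, 1:2, ..., 1:5
for (i in 1:5) {
  print(1:i)
}
```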
#> [1] 1
#> [1] 1 2
#> [1] 1 2 3
#> [1] 1 2 3 4
#> [1] 1 2 3 4 5
Second example: terminate a for loop earlier
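The original code for this example is missing, so here is a sketch in the spirit of the book's version. The vector `seen` is an illustrative addition that records which iterations ran to completion.

```r
seen <- integer()

for (i in 1:10) {
  if (i < 3) next        # skip the rest of the body for 1 and 2
  seen <- c(seen, i)
  if (i >= 5) break      # leave the loop once 5 has been recorded
}

seen
#> [1] 3 4 5
```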
next skips the rest of the current iteration.
break exits the loop entirely.
When the following code is evaluated, what can you say about the vector being iterated?
xs <- c(1, 2, 3)
for (x in xs) {
  xs <- c(xs, x * 2)
}
xs
#> [1] 1 2 3 2 4 6
The vector being iterated is evaluated once, before the loop starts, so modifying xs inside the body does not change the number of iterations.
Preallocate output containers to avoid slow code.
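A sketch of preallocation, following the pattern used in the book: the output list is created at its final length before the loop, and each iteration fills one slot instead of growing the container.

```r
means <- c(1, 50, 20)

# Allocate the full-length container once
out <- vector("list", length(means))

for (i in seq_along(means)) {
  out[[i]] <- rnorm(10, means[[i]])
}

lengths(out)
#> [1] 10 10 10
```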
Beware that 1:length(v), when v has length 0, results in iterating backwards over 1:0, which is probably not what is intended. Use seq_along(v) instead.
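The difference can be checked by counting iterations over an empty vector (the counters `bad` and `good` are illustrative names):

```r
v <- numeric()

# 1:length(v) is 1:0, i.e. c(1, 0), so the body runs twice
bad <- 0
for (i in 1:length(v)) bad <- bad + 1

# seq_along(v) is integer(0), so the body never runs
good <- 0
for (i in seq_along(v)) good <- good + 1

c(bad, good)
#> [1] 2 0
```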
When iterating over S3 vectors, use [[ ]] yourself to avoid stripping attributes.
xs <- as.Date(c("2020-01-01", "2010-01-01"))
for (x in xs) {
  print(x)
}
#> [1] 18262
#> [1] 14610
vs.
for (i in seq_along(xs)) {
  print(xs[[i]])
}
#> [1] "2020-01-01"
#> [1] "2010-01-01"
while(condition) action: performs action while condition is TRUE.
repeat(action): repeats action forever (i.e. until it encounters break).
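As a rough illustration, the same summation can be written with each loop type. The variable names are arbitrary, and the for version is included only for comparison.

```r
# for: the iteration set is fixed up front
total_for <- 0
for (i in 1:3) total_for <- total_for + i

# while: the counter and the stopping condition are managed by hand
total_while <- 0
i <- 1
while (i <= 3) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat: no condition at all, so the exit needs an explicit break
total_repeat <- 0
i <- 1
repeat {
  if (i > 3) break
  total_repeat <- total_repeat + i
  i <- i + 1
}

c(total_for, total_while, total_repeat)
#> [1] 6 6 6
```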
Note that for can be rewritten as while, and while can be rewritten as repeat (this goes in one direction only!); however:
Good practice is to use the least-flexible solution to a problem, so you should use for wherever possible. BUT you shouldn’t even use for loops for data analysis tasks, as map() and apply() already provide less flexible solutions to most problems. (More in Chapter 9.)
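To make the comparison concrete, here is a sketch that computes column-like means with a for loop and then with `vapply()`, a base R member of the apply family, used here as a stand-in for the `map()` and `apply()` functions mentioned above. The list `xs` is made up for illustration.

```r
xs <- list(a = 1:3, b = 4:6, c = 7:9)

# for loop: preallocate, then fill one element per iteration
means_loop <- numeric(length(xs))
for (i in seq_along(xs)) {
  means_loop[[i]] <- mean(xs[[i]])
}

# vapply: the iteration and the output type are declared in one call
means_vapply <- vapply(xs, mean, numeric(1))

means_vapply
#> a b c 
#> 2 5 8 
```

The functional version cannot skip iterations or modify other state by accident, which is the sense in which it is less flexible.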