5.2 Bravo: a better script that works
- There’s a package that lurks within the original script.
- Suboptimal coding practices, like repetitive code and the mixing of code and data, make it hard to spot.
- A good first step is to refactor this code.
Next version of the script:
library(tidyverse)
infile <- "swim.csv"
dat <- read_csv(infile, col_types = cols(name = "c", where = "c", temp = "d"))
lookup_table <- tribble(
  ~where,     ~english,
  "beach",    "US",
  "coast",    "US",
  "seashore", "UK",
  "seaside",  "UK"
)
dat <- dat %>%
  left_join(lookup_table)
#> Joining, by = "where"
f_to_c <- function(x) (x - 32) * 5/9
dat <- dat %>%
  mutate(temp = if_else(english == "US", f_to_c(temp), temp))
dat
#> # A tibble: 5 × 4
#> name where temp english
#> <chr> <chr> <dbl> <chr>
#> 1 Adam beach 35 US
#> 2 Bess coast 32.8 US
#> 3 Cora seashore 28 UK
#> 4 Dale beach 29.4 US
#> 5 Evan seaside 31 UK
now <- Sys.time()
timestamp <- function(time) format(time, "%Y-%B-%d_%H-%M-%S")
outfile_path <- function(infile) {
  paste0(timestamp(now), "_", sub("(.*)([.]csv$)", "\\1_clean\\2", infile))
}
write_csv(dat, outfile_path(infile))
Key features of this code:
- It uses functions from the tidyverse packages readr and dplyr.
- Different “beach” words are stored in a lookup table. This makes it easier to add words in the future.
- The f_to_c(), timestamp(), and outfile_path() functions now hold the logic for converting temperatures and forming the timestamped output file name.
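To illustrate why the lookup table makes new words easy to add, here is a minimal sketch. The word "shore", its "US" label, and the small `dat_example` tibble (including the word "lido") are hypothetical and are not part of swim.csv. The `anti_join()` check is one possible way to find words the table does not yet cover; it is not something the script above does.

```r
library(tibble)
library(dplyr)

# Same lookup table as in the script above
lookup_table <- tribble(
  ~where,     ~english,
  "beach",    "US",
  "coast",    "US",
  "seashore", "UK",
  "seaside",  "UK"
)

# Adding a word is a one-row change to the data, not a change to the logic.
# "shore" is a made-up example, not a word from the original data.
lookup_table <- lookup_table %>%
  add_row(where = "shore", english = "US")

# Hypothetical input containing one word ("lido") that the table lacks
dat_example <- tibble(where = c("beach", "shore", "lido"))

# Rows whose `where` value has no match in the lookup table
unmatched <- anti_join(dat_example, lookup_table, by = "where")
```

Here `unmatched` holds the rows that would get a missing `english` value after the `left_join()`, which is a cue to extend the table.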
It is now easier to recognize the reusable bits of this script, i.e. the bits that have nothing to do with a specific input file, like swim.csv.
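One way to see that these bits are reusable is to call the helper functions on their own, without reading any file. The sketch below is a slight variation on the script, not the script itself: `outfile_path()` takes the time as a second argument (named `time` here, an assumption made for this example) instead of relying on the global `now`, and the file name "another_pool.csv" is made up.

```r
f_to_c <- function(x) (x - 32) * 5/9
timestamp <- function(time) format(time, "%Y-%B-%d_%H-%M-%S")

# Variation for illustration: the time is passed in rather than read from
# a global variable, so the function can be called with a fixed time
outfile_path <- function(infile, time = Sys.time()) {
  paste0(timestamp(time), "_", sub("(.*)([.]csv$)", "\\1_clean\\2", infile))
}

# 95 degrees Fahrenheit is 35 degrees Celsius
stopifnot(all.equal(f_to_c(95), 35))

# The same helpers work for any CSV file name, not just swim.csv
fixed_time <- as.POSIXct("2020-01-15 10:30:00", tz = "UTC")
swim_out <- outfile_path("swim.csv", fixed_time)
pool_out <- outfile_path("another_pool.csv", fixed_time)
```

Both results start with the same timestamp prefix and end in `_clean.csv`. The month name produced by `%B` depends on the locale, so the exact prefix can differ between machines.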