5.2 Bravo: a better script that works

  • There’s a package that lurks within the original script.
    • Suboptimal coding practices, like repetitive code and the mixing of code and data, make it hard to spot.
  • A good first step is to refactor this code.

Next version of the script:

library(tidyverse)

infile <- "swim.csv"
dat <- read_csv(infile, col_types = cols(name = "c", where = "c", temp = "d"))

lookup_table <- tribble(
      ~where, ~english,
     "beach",     "US",
     "coast",     "US",
  "seashore",     "UK",
   "seaside",     "UK"
)

dat <- dat %>% 
  left_join(lookup_table)
#> Joining, by = "where"

# Convert a temperature from Fahrenheit to Celsius
f_to_c <- function(x) (x - 32) * 5/9

dat <- dat %>% 
  mutate(temp = if_else(english == "US", f_to_c(temp), temp))
dat
#> # A tibble: 5 × 4
#>   name  where     temp english
#>   <chr> <chr>    <dbl> <chr>  
#> 1 Adam  beach     35   US     
#> 2 Bess  coast     32.8 US     
#> 3 Cora  seashore  28   UK     
#> 4 Dale  beach     29.4 US     
#> 5 Evan  seaside   31   UK

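# Capture the current time once, so the output file name is stable within a run
now <- Sys.time()
# Format a time as a file-name-friendly string
timestamp <- function(time) format(time, "%Y-%B-%d_%H-%M-%S")
# Prefix the timestamp and insert "_clean" before the .csv extension
outfile_path <- function(infile) {
  paste0(timestamp(now), "_", sub("(.*)([.]csv$)", "\\1_clean\\2", infile))
}
write_csv(dat, outfile_path(infile))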

Key features of this code:

  • It uses functions from tidyverse packages, such as readr and dplyr.
  • Different “beach” words are stored in a lookup table. This makes it easier to add words in the future.
  • f_to_c(), timestamp(), and outfile_path() functions now hold the logic for converting temperatures and forming the timestamped output file name.
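
To illustrate the point about the lookup table, here is a minimal sketch of adding a new word. The word "shore", its "US" label, and the one-row input tibble are hypothetical and exist only for illustration; they are not part of swim.csv.

```r
library(tibble)
library(dplyr)

lookup_table <- tribble(
      ~where, ~english,
     "beach",     "US",
     "coast",     "US",
  "seashore",     "UK",
   "seaside",     "UK"
)

# Hypothetical new entry: supporting another word is a one-row change to the
# data, with no change to the code that uses the table.
lookup_table <- lookup_table %>%
  add_row(where = "shore", english = "US")

# Made-up input standing in for a row of swim.csv.
new_dat <- tibble(name = "Finn", where = "shore", temp = 86)

# Naming the key explicitly also avoids the "Joining, by" message.
new_dat <- new_dat %>%
  left_join(lookup_table, by = "where")
```

Because the words live in data rather than in code, the join and the temperature conversion that follow it do not need to change when a word is added.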

It is now easier to recognize the reusable bits of this script, i.e. the bits that have nothing to do with a specific input file, like swim.csv.
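
As a rough check that the helper functions really are independent of any particular input file, the sketch below calls them on made-up inputs using only base R. The fixed time and the file name "other.csv" are assumptions chosen for illustration.

```r
# The three helpers, as defined in the script above.
f_to_c <- function(x) (x - 32) * 5/9
timestamp <- function(time) format(time, "%Y-%B-%d_%H-%M-%S")
outfile_path <- function(infile) {
  paste0(timestamp(now), "_", sub("(.*)([.]csv$)", "\\1_clean\\2", infile))
}

# Freezing point (32 F) and boiling point (212 F) of water.
celsius <- f_to_c(c(32, 212))

# A fixed, hypothetical time instead of Sys.time(), so the result is repeatable.
# Note that %B yields the full month name, which depends on the locale.
now <- as.POSIXct("2020-01-02 03:04:05", tz = "UTC")

# A hypothetical file name; nothing here depends on swim.csv.
out <- outfile_path("other.csv")

# The part of the name that does not depend on time or locale.
clean_name <- sub("(.*)([.]csv$)", "\\1_clean\\2", "other.csv")
```

Here `celsius` holds 0 and 100, and `clean_name` is "other_clean.csv". Note that outfile_path() still reads `now` from the global environment, which is one more thing a further refactoring could tidy up.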