24.4 Do as little as possible

  • Use a function tailored to a more specific type of input or output, or to a more specific problem:
    • rowSums(), colSums(), rowMeans(), and colMeans() are faster than the equivalent apply() calls (e.g. apply(m, 1, sum)) because they are vectorised
    • vapply() is faster than sapply() because you pre-specify the output type (FUN.VALUE), so it doesn't have to work out how to simplify the results
    • any(x == 10) is much faster than 10 %in% x because testing equality is simpler than testing set inclusion
  • Some functions coerce their inputs to a specific type, so if your input is not already that type, the function has to do extra work converting it
    • e.g. apply() always converts a data frame into a matrix
  • Tell functions what you already know, or exactly what you need, so they can skip unnecessary work:
    • read.csv(): specify known column types with colClasses. (Also consider switching to readr::read_csv() or data.table::fread(), which are considerably faster than read.csv().)
    • factor(): specify known levels with levels.
    • cut(): if you don’t need labels, don’t generate them (use labels = FALSE), or, even better, use findInterval(), as mentioned in the “see also” section of the documentation.
    • unlist(x, use.names = FALSE) is much faster than unlist(x).
    • interaction(): if you only need combinations that exist in the data, use drop = TRUE.
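A quick way to check the rowSums() point is to compare it with the equivalent apply() call on a made-up matrix. This is only a sketch: the matrix size is arbitrary, and the printed timings vary by machine and R version, so no numbers are quoted here.

```r
# Hypothetical test data: a 1000 x 100 numeric matrix
set.seed(1)
m <- matrix(runif(1e5), nrow = 1000)

# apply() calls sum() once per row from R code; rowSums() does the
# whole computation in a single call into compiled code
via_apply   <- apply(m, 1, sum)
via_rowsums <- rowSums(m)
stopifnot(isTRUE(all.equal(via_apply, via_rowsums)))

# Rough timing comparison; absolute numbers depend on your machine
system.time(for (i in 1:20) apply(m, 1, sum))
system.time(for (i in 1:20) rowSums(m))
```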
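To see what pre-specifying the output type means for vapply(), the sketch below computes the same means with sapply() and vapply() on a small made-up list. The FUN.VALUE template is the extra information vapply() gets. It also turns a wrong-shaped result into an error instead of a silently different kind of output.

```r
x <- list(a = 1:3, b = 4:6, c = 7:9)

# sapply() has to inspect the results to decide what to return (a vector,
# a matrix or a list); vapply() is told up front: one numeric value each
s <- sapply(x, mean)
v <- vapply(x, mean, FUN.VALUE = numeric(1))

# vapply() checks every result against the template, so a function that
# returns two values (range) fails loudly instead of producing a matrix
res <- tryCatch(vapply(x, range, FUN.VALUE = numeric(1)),
                error = function(e) "error: values must be length 1")
```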
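The any() versus %in% comparison can be reproduced along these lines. The vector is hypothetical. %in% is documented as a wrapper around match(), which is built for the general problem of matching many values against a table. The size of the speed difference depends on your R version, so treat the printed timings as indicative only.

```r
set.seed(1)
x <- sample(1e6)   # a random permutation of 1:1e6

# Both ask "does x contain 10?"
a <- any(x == 10)   # plain elementwise comparison, then any()
b <- 10 %in% x      # general set-membership test via match()

system.time(for (i in 1:20) any(x == 10))
system.time(for (i in 1:20) 10 %in% x)
```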
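The coercion done by apply() is easy to observe. With a data frame of mixed column types (a made-up example below), every row reaches the function as a character vector, because the data frame was first converted to a character matrix. If you call apply() repeatedly on the same numeric data frame, converting it to a matrix once up front avoids repeating that conversion on every call.

```r
df <- data.frame(n = c(1.5, 2.5), s = c("a", "b"))

# Each row arrives as a character vector, not as a mix of number and string
row_types <- apply(df, 1, function(row) class(row))

# For numeric data, convert once and reuse the matrix; the results are the same
num_df <- data.frame(x = 1:3, y = 4:6)
num_m  <- as.matrix(num_df)
sums_df <- apply(num_df, 1, sum)   # converts num_df to a matrix internally
sums_m  <- apply(num_m, 1, sum)    # no conversion needed
```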
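A minimal sketch of the colClasses advice follows. It uses a tiny inline CSV, read via the text argument that read.csv() passes on to read.table(). The column names and types are made up for illustration.

```r
csv <- "id,score,group\n1,0.5,a\n2,0.7,b\n3,0.9,a"

# Without colClasses, read.csv() has to guess each column's type from the data
guessed <- read.csv(text = csv)

# With colClasses, the types are given up front, so no guessing is needed
typed <- read.csv(text = csv,
                  colClasses = c("integer", "numeric", "character"))
```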
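For factor(), supplying levels means it no longer has to find the unique values and sort them. The sketch below uses made-up data. Note that the guessed levels are also sorted alphabetically, which is often not the order you want.

```r
set.seed(1)
x <- sample(c("low", "medium", "high"), 1e5, replace = TRUE)

# Without levels, factor() computes the unique values and sorts them
f1 <- factor(x)

# Supplying the known levels skips that step and fixes the order
f2 <- factor(x, levels = c("low", "medium", "high"))
```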
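The sketch below bins made-up data with cut(labels = FALSE) and with findInterval(). It assumes cut()'s default right-closed intervals, which is why findInterval() is called with left.open = TRUE. The two functions treat values outside the breaks differently (NA from cut(), 0 or length(breaks) from findInterval()), so the example keeps all values inside them.

```r
set.seed(1)
x <- runif(1e5, min = 0, max = 10)   # strictly inside (0, 10)
breaks <- c(0, 2.5, 5, 7.5, 10)

# Integer bin codes, without building label strings like "(0,2.5]"
codes_cut <- cut(x, breaks, labels = FALSE)

# Same codes from findInterval(); left.open = TRUE gives intervals (a, b]
codes_fi <- findInterval(x, breaks, left.open = TRUE)
```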
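To see the work that use.names = FALSE saves, consider a named list of named vectors (hypothetical data). By default, unlist() builds a combined name for every element of the result.

```r
x <- setNames(lapply(1:1000, function(i) c(a = i, b = i * 2)),
              paste0("item", 1:1000))

# Default: names are pasted together ("item1.a", "item1.b", ...)
with_names <- unlist(x)

# use.names = FALSE skips all of that string construction
without_names <- unlist(x, use.names = FALSE)
```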
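A small made-up example of why drop = TRUE helps with interaction(). By default every combination of levels becomes a level of the result, so the number of levels is the product of the inputs' level counts, however few combinations actually occur. With several factors this grows quickly.

```r
a <- factor(c("x", "y", "x"), levels = c("x", "y", "z"))
b <- factor(c("p", "q", "q"), levels = c("p", "q", "r"))

# Every possible combination: 3 x 3 = 9 levels, most of them unused
all_levels <- interaction(a, b)

# Only the combinations that appear in the data
seen_levels <- interaction(a, b, drop = TRUE)
```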