Controlling column types

  • readr will try to guess the type of each column (i.e. whether it’s a logical, number, string, etc.)

  • readr uses a heuristic to figure out the column types. For each column, it pulls the values of 1,000 rows and works through the following questions:

    • Does it contain only values such as F, T, FALSE, TRUE, false, true, f, or t? If so, it’s a logical.
    • Does it contain only numbers (e.g., 1, -4.5, 5e6, Inf)? If so, it’s a number.
    • Does it match the ISO8601 standard? If so, it’s a date or date-time.
    • Otherwise, it must be a string.
  • This heuristic works well on a clean dataset, but real-life data often contains values that cause it to guess the wrong type.
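The questions above can also be tried on individual character vectors. The sketch below uses readr's `guess_parser()`, which reports the parser readr would pick for a vector; the sample values are made up for illustration, and the guess for a whole file may differ slightly because `read_csv()` samples rows rather than inspecting a hand-picked vector.

```r
library(readr)

# Illustrative inputs only: one vector per question in the list above.
guesses <- c(
  logical = guess_parser(c("TRUE", "false")),
  numeric = guess_parser(c("1", "-4.5", "5e6")),
  date    = guess_parser(c("2021-01-15", "2021-02-15")),
  string  = guess_parser(c("abc", "def"))
)

# A named character vector: column kind -> name of the guessed parser.
guesses
```

Each vector falls through the questions in order, so a vector that is neither logical, numeric, nor ISO8601 ends up as a character.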

read_csv("
  logical,numeric,date,string
  TRUE,1,2021-01-15,abc
  false,4.5,2021-02-15,def
  T,Inf,2021-02-16,ghi
")
## Rows: 3 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): string
## dbl  (1): numeric
## lgl  (1): logical
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 3 × 4
##   logical numeric date       string
##   <lgl>     <dbl> <date>     <chr> 
## 1 TRUE        1   2021-01-15 abc   
## 2 FALSE       4.5 2021-02-15 def   
## 3 TRUE      Inf   2021-02-16 ghi
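
One common way the guess goes wrong is a missing value recorded as something other than `NA`. The following is a minimal sketch with hypothetical data (a single column `x` where `.` stands for "missing"); it shows two ways one might take control, either by telling readr what counts as missing via `na`, or by declaring the column type via `col_types`.

```r
library(readr)

# Hypothetical data: a missing value recorded as "." rather than NA.
csv <- "x
10
.
20
30"

# Default guess: "." is not a number, so the column is read as a string.
guessed <- read_csv(csv, show_col_types = FALSE)

# Option 1: declare "." as a missing-value marker; the remaining values
# are all numbers, so the column is guessed as a double.
fixed <- read_csv(csv, na = ".", show_col_types = FALSE)

# Option 2: force the type. readr emits a parsing warning for the "."
# row and stores NA there; problems(forced) lists the offending cells.
forced <- read_csv(csv, col_types = cols(x = col_double()))
```

Forcing the type first is a handy way to discover which values are blocking the guess; once they are known, listing them in `na` gives a clean read without warnings.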