19.5 Surrogate keys

  • A surrogate key is a custom made key where it is possible to identify unique information, such as the number of rows in a table, and it is made if a table lacks a primary key.

  • For flights, the combination of time_hour, carrier, and flight seems reasonable simple numeric surrogate key using the row number.

  • For example, flights2$flights_id is a surrogate key because it is custom made and uniquely identifies each observation in the flights table.

flights2 <- flights |> 
  mutate(id = row_number(), .before = 1)
flights2
## # A tibble: 336,776 × 20
##       id  year month   day dep_time sched_dep_time dep_delay arr_time
##    <int> <int> <int> <int>    <int>          <int>     <dbl>    <int>
##  1     1  2013     1     1      517            515         2      830
##  2     2  2013     1     1      533            529         4      850
##  3     3  2013     1     1      542            540         2      923
##  4     4  2013     1     1      544            545        -1     1004
##  5     5  2013     1     1      554            600        -6      812
##  6     6  2013     1     1      554            558        -4      740
##  7     7  2013     1     1      555            600        -5      913
##  8     8  2013     1     1      557            600        -3      709
##  9     9  2013     1     1      557            600        -3      838
## 10    10  2013     1     1      558            600        -2      753
## # ℹ 336,766 more rows
## # ℹ 12 more variables: sched_arr_time <int>, arr_delay <dbl>, carrier <chr>,
## #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>