4.2 The data

We’re exploring data from the National Electronic Injury Surveillance System (NEISS), which covers accidents reported from a sample of hospitals in the US. The data set is from Hadley Wickham’s GitHub repository.

For every accident / injured person we have

  • date,
  • age,
  • sex,
  • race,
  • body part,
  • diagnosis and
  • location (Home, School, Street Or Highway etc.)

as well as

  • primary product associated with the injury and
  • a brief story how the accident occured.

Further we have a weight attribute for an estimation how may people the current case represents if the dataset was scaled to the entire US population.

Code to download the data:
dir.create("neiss")

download <- function(name) {
  url <- "https://github.com/hadley/mastering-shiny/raw/main/neiss/"
  download.file(paste0(url, name), paste0("neiss/", name), quiet = TRUE)
}

download("injuries.tsv.gz")
download("population.tsv")
download("products.tsv")

Main data tibbles:

injuries <- vroom("neiss/injuries.tsv.gz")
injuries
## # A tibble: 255,064 × 10
##    trmt_date    age sex    race       body_part  diag  location prod_code weight
##    <date>     <dbl> <chr>  <chr>      <chr>      <chr> <chr>        <dbl>  <dbl>
##  1 2017-01-01    71 male   white      Upper Tru… Cont… Other P…      1807   77.7
##  2 2017-01-01    16 male   white      Lower Arm  Burn… Home           676   77.7
##  3 2017-01-01    58 male   white      Upper Tru… Cont… Home           649   77.7
##  4 2017-01-01    21 male   white      Lower Tru… Stra… Home          4076   77.7
##  5 2017-01-01    54 male   white      Head       Inte… Other P…      1807   77.7
##  6 2017-01-01    21 male   white      Hand       Frac… Home          1884   77.7
##  7 2017-01-01    35 female not stated Lower Tru… Stra… Home          1807   87.1
##  8 2017-01-01    62 female not stated Lower Arm  Lace… Home          4074   87.1
##  9 2017-01-01    22 male   not stated Knee       Disl… Home          4076   87.1
## 10 2017-01-01    58 female not stated Lower Leg  Frac… Home          1842   87.1
## # ℹ 255,054 more rows
## # ℹ 1 more variable: narrative <chr>

Product names:

products <- vroom("neiss/products.tsv")
prod_codes <- setNames(products$prod_code, products$title)
products
## # A tibble: 38 × 2
##    prod_code title                                   
##        <dbl> <chr>                                   
##  1       464 knives, not elsewhere classified        
##  2       474 tableware and accessories               
##  3       604 desks, chests, bureaus or buffets       
##  4       611 bathtubs or showers                     
##  5       649 toilets                                 
##  6       676 rugs or carpets, not specified          
##  7       679 sofas, couches, davenports, divans or st
##  8      1141 containers, not specified               
##  9      1200 sports or recreational activity, n.e.c. 
## 10      1205 basketball (activity, apparel or equip.)
## # ℹ 28 more rows

Population data:

population <- vroom("neiss/population.tsv")
population
## # A tibble: 170 × 3
##      age sex    population
##    <dbl> <chr>       <dbl>
##  1     0 female    1924145
##  2     0 male      2015150
##  3     1 female    1943534
##  4     1 male      2031718
##  5     2 female    1965150
##  6     2 male      2056625
##  7     3 female    1956281
##  8     3 male      2050474
##  9     4 female    1953782
## 10     4 male      2042001
## # ℹ 160 more rows