4.2 The data
We’re exploring data from the National Electronic Injury Surveillance System (NEISS), which covers accidents reported from a sample of hospitals in the US. The data set is from Hadley Wickham’s GitHub repository.
For every accident / injured person we have
- date,
- age,
- sex,
- race,
- body part,
- diagnosis and
- location (Home, School, Street Or Highway etc.)
as well as
- primary product associated with the injury and
- a brief story how the accident occured.
Further we have a weight attribute for an estimation how may people the current case represents if the dataset was scaled to the entire US population.
Code to download the data:
Main data tibbles:
## # A tibble: 255,064 × 10
## trmt_date age sex race body_part diag location prod_code weight
## <date> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 2017-01-01 71 male white Upper Tru… Cont… Other P… 1807 77.7
## 2 2017-01-01 16 male white Lower Arm Burn… Home 676 77.7
## 3 2017-01-01 58 male white Upper Tru… Cont… Home 649 77.7
## 4 2017-01-01 21 male white Lower Tru… Stra… Home 4076 77.7
## 5 2017-01-01 54 male white Head Inte… Other P… 1807 77.7
## 6 2017-01-01 21 male white Hand Frac… Home 1884 77.7
## 7 2017-01-01 35 female not stated Lower Tru… Stra… Home 1807 87.1
## 8 2017-01-01 62 female not stated Lower Arm Lace… Home 4074 87.1
## 9 2017-01-01 22 male not stated Knee Disl… Home 4076 87.1
## 10 2017-01-01 58 female not stated Lower Leg Frac… Home 1842 87.1
## # ℹ 255,054 more rows
## # ℹ 1 more variable: narrative <chr>
Product names:
products <- vroom("neiss/products.tsv")
prod_codes <- setNames(products$prod_code, products$title)
products
## # A tibble: 38 × 2
## prod_code title
## <dbl> <chr>
## 1 464 knives, not elsewhere classified
## 2 474 tableware and accessories
## 3 604 desks, chests, bureaus or buffets
## 4 611 bathtubs or showers
## 5 649 toilets
## 6 676 rugs or carpets, not specified
## 7 679 sofas, couches, davenports, divans or st
## 8 1141 containers, not specified
## 9 1200 sports or recreational activity, n.e.c.
## 10 1205 basketball (activity, apparel or equip.)
## # ℹ 28 more rows
Population data:
## # A tibble: 170 × 3
## age sex population
## <dbl> <chr> <dbl>
## 1 0 female 1924145
## 2 0 male 2015150
## 3 1 female 1943534
## 4 1 male 2031718
## 5 2 female 1965150
## 6 2 male 2056625
## 7 3 female 1956281
## 8 3 male 2050474
## 9 4 female 1953782
## 10 4 male 2042001
## # ℹ 160 more rows