9.1 BONUS: Attrition (decision tree classifier)
Let’s apply a decision tree as a classifier to the attrition dataset.
suppressMessages(library(tidymodels))
suppressMessages(library(tidyverse))
library(themis) # recipe steps for class imbalance (e.g. downsampling)
Load dataset
# load dataset
attrition <- modeldata::attrition
# clean names with `janitor` package
# optionally coerce ordered factor variables to numeric (left commented out)
# and move the target to the first column
attrition <- attrition %>%
  janitor::clean_names() %>%
  # mutate_if(is.ordered, as.numeric) %>%
  relocate(attrition, .before = everything())
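The commented-out mutate_if(is.ordered, as.numeric) step would replace each ordered factor with its integer level codes. Below is a minimal base-R sketch of what that coercion does to a single ordered factor. The satisfaction values are made up for illustration and only imitate the dataset's level style.

```r
# Toy ordered factor (hypothetical values), levels listed from lowest to highest
satisfaction <- factor(
  c("High", "Low", "Very_High", "Medium"),
  levels = c("Low", "Medium", "High", "Very_High"),
  ordered = TRUE
)

# as.numeric() returns each value's position in the level order,
# so Low = 1, Medium = 2, High = 3, Very_High = 4
satisfaction_num <- as.numeric(satisfaction)
print(satisfaction_num)
```

The coercion keeps the ordering but assumes equal spacing between levels. For a decision tree this mostly affects how the variable is stored: threshold splits on a numeric column depend only on the order of its values.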
First look at the dataset
attrition %>%
  glimpse()
## Rows: 1,470
## Columns: 31
## $ attrition <fct> Yes, No, Yes, No, No, No, No, No, No, No, N…
## $ age <int> 41, 49, 37, 33, 27, 32, 59, 30, 38, 36, 35,…
## $ business_travel <fct> Travel_Rarely, Travel_Frequently, Travel_Ra…
## $ daily_rate <int> 1102, 279, 1373, 1392, 591, 1005, 1324, 135…
## $ department <fct> Sales, Research_Development, Research_Devel…
## $ distance_from_home <int> 1, 8, 2, 3, 2, 2, 3, 24, 23, 27, 16, 15, 26…
## $ education <ord> College, Below_College, College, Master, Be…
## $ education_field <fct> Life_Sciences, Life_Sciences, Other, Life_S…
## $ environment_satisfaction <ord> Medium, High, Very_High, Very_High, Low, Ve…
## $ gender <fct> Female, Male, Male, Female, Male, Male, Fem…
## $ hourly_rate <int> 94, 61, 92, 56, 40, 79, 81, 67, 44, 94, 84,…
## $ job_involvement <ord> High, Medium, Medium, High, High, High, Ver…
## $ job_level <int> 2, 2, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 1, 1…
## $ job_role <fct> Sales_Executive, Research_Scientist, Labora…
## $ job_satisfaction <ord> Very_High, Medium, High, High, Medium, Very…
## $ marital_status <fct> Single, Married, Single, Married, Married, …
## $ monthly_income <int> 5993, 5130, 2090, 2909, 3468, 3068, 2670, 2…
## $ monthly_rate <int> 19479, 24907, 2396, 23159, 16632, 11864, 99…
## $ num_companies_worked <int> 8, 1, 6, 1, 9, 0, 4, 1, 0, 6, 0, 0, 1, 0, 5…
## $ over_time <fct> Yes, No, Yes, Yes, No, No, Yes, No, No, No,…
## $ percent_salary_hike <int> 11, 23, 15, 11, 12, 13, 20, 22, 21, 13, 13,…
## $ performance_rating <ord> Excellent, Outstanding, Excellent, Excellen…
## $ relationship_satisfaction <ord> Low, Very_High, Medium, High, Very_High, Hi…
## $ stock_option_level <int> 0, 1, 0, 0, 1, 0, 3, 1, 0, 2, 1, 0, 1, 1, 0…
## $ total_working_years <int> 8, 10, 7, 8, 6, 8, 12, 1, 10, 17, 6, 10, 5,…
## $ training_times_last_year <int> 0, 3, 3, 3, 3, 2, 3, 2, 2, 3, 5, 3, 1, 2, 4…
## $ work_life_balance <ord> Bad, Better, Better, Better, Better, Good, …
## $ years_at_company <int> 6, 10, 0, 8, 2, 7, 1, 1, 9, 7, 5, 9, 5, 2, …
## $ years_in_current_role <int> 4, 7, 0, 7, 2, 7, 0, 0, 7, 7, 4, 5, 2, 2, 2…
## $ years_since_last_promotion <int> 0, 1, 0, 3, 2, 3, 0, 0, 1, 7, 0, 0, 4, 1, 0…
## $ years_with_curr_manager <int> 5, 7, 0, 0, 2, 6, 0, 0, 8, 7, 3, 8, 3, 2, 3…
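The glimpse() output mixes three column types: <fct> (unordered factors), <ord> (ordered factors) and <int> (integers). Below is a small base-R sketch that tallies them. column_type_counts() is a hypothetical helper. The built-in esoph data frame stands in for attrition so the block runs without extra packages.

```r
# Hypothetical helper: count columns by their most specific class.
# For an ordered factor, class() returns c("ordered", "factor"),
# so taking the first element separates <ord> from <fct>.
column_type_counts <- function(df) {
  first_class <- vapply(df, function(col) class(col)[1], character(1))
  table(first_class)
}

# esoph (datasets package) contains three ordered factors plus count columns
esoph_types <- column_type_counts(esoph)
print(esoph_types)
```

Applied to the cleaned attrition data, it should agree with the glimpse() listing above: 8 factor, 7 ordered and 16 integer columns.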
Take a deeper look at each variable with skimr
skimr::skim(attrition) %>%
  knitr::kable()
skim_type | skim_variable | n_missing | complete_rate | factor.ordered | factor.n_unique | factor.top_counts | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
factor | attrition | 0 | 1 | FALSE | 2 | No: 1233, Yes: 237 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | business_travel | 0 | 1 | FALSE | 3 | Tra: 1043, Tra: 277, Non: 150 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | department | 0 | 1 | FALSE | 3 | Res: 961, Sal: 446, Hum: 63 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | education | 0 | 1 | TRUE | 5 | Bac: 572, Mas: 398, Col: 282, Bel: 170 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | education_field | 0 | 1 | FALSE | 6 | Lif: 606, Med: 464, Mar: 159, Tec: 132 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | environment_satisfaction | 0 | 1 | TRUE | 4 | Hig: 453, Ver: 446, Med: 287, Low: 284 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | gender | 0 | 1 | FALSE | 2 | Mal: 882, Fem: 588 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | job_involvement | 0 | 1 | TRUE | 4 | Hig: 868, Med: 375, Ver: 144, Low: 83 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | job_role | 0 | 1 | FALSE | 9 | Sal: 326, Res: 292, Lab: 259, Man: 145 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | job_satisfaction | 0 | 1 | TRUE | 4 | Ver: 459, Hig: 442, Low: 289, Med: 280 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | marital_status | 0 | 1 | FALSE | 3 | Mar: 673, Sin: 470, Div: 327 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | over_time | 0 | 1 | FALSE | 2 | No: 1054, Yes: 416 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | performance_rating | 0 | 1 | TRUE | 2 | Exc: 1244, Out: 226, Low: 0, Goo: 0 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | relationship_satisfaction | 0 | 1 | TRUE | 4 | Hig: 459, Ver: 432, Med: 303, Low: 276 | NA | NA | NA | NA | NA | NA | NA | NA |
factor | work_life_balance | 0 | 1 | TRUE | 4 | Bet: 893, Goo: 344, Bes: 153, Bad: 80 | NA | NA | NA | NA | NA | NA | NA | NA |
numeric | age | 0 | 1 | NA | NA | NA | 3.692381e+01 | 9.1353735 | 18 | 30 | 36.0 | 43.00 | 60 | ▂▇▇▃▂ |
numeric | daily_rate | 0 | 1 | NA | NA | NA | 8.024857e+02 | 403.5090999 | 102 | 465 | 802.0 | 1157.00 | 1499 | ▇▇▇▇▇ |
numeric | distance_from_home | 0 | 1 | NA | NA | NA | 9.192517e+00 | 8.1068644 | 1 | 2 | 7.0 | 14.00 | 29 | ▇▅▂▂▂ |
numeric | hourly_rate | 0 | 1 | NA | NA | NA | 6.589116e+01 | 20.3294276 | 30 | 48 | 66.0 | 83.75 | 100 | ▇▇▇▇▇ |
numeric | job_level | 0 | 1 | NA | NA | NA | 2.063946e+00 | 1.1069399 | 1 | 1 | 2.0 | 3.00 | 5 | ▇▇▃▂▁ |
numeric | monthly_income | 0 | 1 | NA | NA | NA | 6.502931e+03 | 4707.9567831 | 1009 | 2911 | 4919.0 | 8379.00 | 19999 | ▇▅▂▁▂ |
numeric | monthly_rate | 0 | 1 | NA | NA | NA | 1.431310e+04 | 7117.7860441 | 2094 | 8047 | 14235.5 | 20461.50 | 26999 | ▇▇▇▇▇ |
numeric | num_companies_worked | 0 | 1 | NA | NA | NA | 2.693197e+00 | 2.4980090 | 0 | 1 | 2.0 | 4.00 | 9 | ▇▃▂▂▁ |
numeric | percent_salary_hike | 0 | 1 | NA | NA | NA | 1.520952e+01 | 3.6599377 | 11 | 12 | 14.0 | 18.00 | 25 | ▇▅▃▂▁ |
numeric | stock_option_level | 0 | 1 | NA | NA | NA | 7.938776e-01 | 0.8520767 | 0 | 0 | 1.0 | 1.00 | 3 | ▇▇▁▂▁ |
numeric | total_working_years | 0 | 1 | NA | NA | NA | 1.127959e+01 | 7.7807817 | 0 | 6 | 10.0 | 15.00 | 40 | ▇▇▂▁▁ |
numeric | training_times_last_year | 0 | 1 | NA | NA | NA | 2.799320e+00 | 1.2892706 | 0 | 2 | 3.0 | 3.00 | 6 | ▂▇▇▂▃ |
numeric | years_at_company | 0 | 1 | NA | NA | NA | 7.008163e+00 | 6.1265252 | 0 | 3 | 5.0 | 9.00 | 40 | ▇▂▁▁▁ |
numeric | years_in_current_role | 0 | 1 | NA | NA | NA | 4.229252e+00 | 3.6231370 | 0 | 2 | 3.0 | 7.00 | 18 | ▇▃▂▁▁ |
numeric | years_since_last_promotion | 0 | 1 | NA | NA | NA | 2.187755e+00 | 3.2224303 | 0 | 0 | 1.0 | 3.00 | 15 | ▇▁▁▁▁ |
numeric | years_with_curr_manager | 0 | 1 | NA | NA | NA | 4.123129e+00 | 3.5681361 | 0 | 2 | 3.0 | 7.00 | 17 | ▇▂▅▁▁ |
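Each numeric row of the skim() table reports:

- the count and share of missing values;
- the mean and standard deviation;
- the quantiles p0, p25, p50, p75 and p100.

Below is a base-R sketch that computes those columns for one vector. skim_numeric_sketch() is a hypothetical helper, not part of skimr. It uses R's default quantile definition, which is assumed rather than confirmed to match skimr exactly.

```r
# Hypothetical helper approximating skimr's numeric summary columns
skim_numeric_sketch <- function(x) {
  n_missing <- sum(is.na(x))
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
  c(
    n_missing     = n_missing,
    complete_rate = 1 - n_missing / length(x),  # share of non-missing values
    mean          = mean(x, na.rm = TRUE),
    sd            = sd(x, na.rm = TRUE),
    p0 = q[1], p25 = q[2], p50 = q[3], p75 = q[4], p100 = q[5]
  )
}

# Toy vector with one missing value
print(skim_numeric_sketch(c(2, 4, NA, 6, 8)))
```

Called as skim_numeric_sketch(attrition$monthly_income), it should closely match the monthly_income row above. There the mean (about 6,503) lies well above the median (4,919), consistent with the right-skewed histogram.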
Count attrition (target)
attrition %>%
  count(attrition)
## attrition n
## 1 No 1233
## 2 Yes 237
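Our target (attrition) is highly imbalanced: 1,233 employees stayed (No, about 84%) and only 237 left (Yes, about 16%), a ratio of roughly 5.2 to 1.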
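One common response to this imbalance is to randomly discard majority-class rows until the classes are the same size. The themis package loaded above offers this as the recipe step step_downsample(). Below is a minimal base-R sketch of the idea. downsample_majority() is a hypothetical helper, and the toy data only copy the 1,233 / 237 class counts shown above.

```r
# Hypothetical helper: randomly downsample every class of `target` to the
# size of the smallest class (the idea behind themis::step_downsample()
# with its default under_ratio = 1)
downsample_majority <- function(df, target) {
  # row indices grouped by class; drop = TRUE ignores unused factor levels
  groups <- split(seq_len(nrow(df)), df[[target]], drop = TRUE)
  n_min <- min(lengths(groups))
  # draw n_min row indices per class without replacement
  keep <- unlist(
    lapply(groups, function(idx) idx[sample.int(length(idx), n_min)]),
    use.names = FALSE
  )
  df[sort(keep), , drop = FALSE]
}

# Toy data with the same class counts as attrition (1,233 No / 237 Yes)
set.seed(123)
toy <- data.frame(
  attrition = factor(rep(c("No", "Yes"), times = c(1233, 237))),
  x = rnorm(1470)
)

# Class proportions before sampling
print(round(prop.table(table(toy$attrition)), 3))

balanced <- downsample_majority(toy, "attrition")
print(table(balanced$attrition))
```

Apply sampling only to the training data so the evaluation data keep their natural class balance. The themis steps handle this by defaulting to skip = TRUE.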