Data preparation
The original dataset had 89 variables and 16,924 players.
Below is a preview of the slimmed-down version of this dataset used throughout the chapter:
Rows: 5,000
Columns: 42
$ nationality <fct> Argentina, Portugal, Brazil, Slovenia, Belg…
$ overall <dbl> 94, 93, 92, 91, 91, 91, 90, 90, 90, 90, 89,…
$ potential <dbl> 94, 93, 92, 93, 91, 91, 93, 91, 90, 90, 95,…
$ wage_eur <dbl> 565000, 405000, 290000, 125000, 470000, 370…
$ value_eur <dbl> 95500000, 58500000, 105500000, 77500000, 90…
$ age <dbl> 32, 34, 27, 26, 28, 28, 27, 27, 33, 27, 20,…
$ height_cm <dbl> 170, 187, 175, 188, 175, 181, 187, 193, 172…
$ weight_kg <dbl> 72, 83, 68, 87, 74, 70, 85, 92, 66, 71, 73,…
$ attacking_crossing <dbl> 88, 84, 87, 13, 81, 93, 18, 53, 86, 79, 78,…
$ attacking_finishing <dbl> 95, 94, 87, 11, 84, 82, 14, 52, 72, 90, 89,…
$ attacking_heading_accuracy <dbl> 70, 89, 62, 15, 61, 55, 11, 86, 55, 59, 77,…
$ attacking_short_passing <dbl> 92, 83, 87, 43, 89, 92, 61, 78, 92, 84, 82,…
$ attacking_volleys <dbl> 88, 87, 87, 13, 83, 82, 14, 45, 76, 79, 79,…
$ skill_dribbling <dbl> 97, 89, 96, 12, 95, 86, 21, 70, 87, 89, 91,…
$ skill_curve <dbl> 93, 81, 88, 13, 83, 85, 18, 60, 85, 83, 79,…
$ skill_fk_accuracy <dbl> 94, 76, 87, 14, 79, 83, 12, 70, 78, 69, 63,…
$ skill_long_passing <dbl> 92, 77, 81, 40, 83, 91, 63, 81, 88, 75, 70,…
$ skill_ball_control <dbl> 96, 92, 95, 30, 94, 91, 30, 76, 92, 89, 90,…
$ movement_acceleration <dbl> 91, 89, 94, 43, 94, 77, 38, 74, 77, 94, 96,…
$ movement_sprint_speed <dbl> 84, 91, 89, 60, 88, 76, 50, 79, 71, 92, 96,…
$ movement_agility <dbl> 93, 87, 96, 67, 95, 78, 37, 61, 92, 91, 92,…
$ movement_reactions <dbl> 95, 96, 92, 88, 90, 91, 86, 88, 89, 92, 89,…
$ movement_balance <dbl> 95, 71, 84, 49, 94, 76, 43, 53, 93, 88, 83,…
$ power_shot_power <dbl> 86, 95, 80, 59, 82, 91, 66, 81, 79, 80, 83,…
$ power_jumping <dbl> 68, 95, 61, 78, 56, 63, 79, 90, 68, 69, 76,…
$ power_stamina <dbl> 75, 85, 81, 41, 84, 89, 35, 75, 85, 85, 84,…
$ power_strength <dbl> 68, 78, 49, 78, 63, 74, 78, 92, 58, 73, 76,…
$ power_long_shots <dbl> 94, 93, 84, 12, 80, 90, 10, 64, 82, 84, 79,…
$ mentality_aggression <dbl> 48, 63, 51, 34, 54, 76, 43, 82, 62, 63, 62,…
$ mentality_interceptions <dbl> 40, 29, 36, 19, 41, 61, 22, 89, 82, 55, 38,…
$ mentality_positioning <dbl> 94, 95, 87, 11, 87, 88, 11, 47, 79, 92, 89,…
$ mentality_vision <dbl> 94, 82, 90, 65, 89, 94, 70, 65, 91, 84, 80,…
$ mentality_penalties <dbl> 75, 85, 90, 11, 88, 79, 25, 62, 82, 77, 70,…
$ mentality_composure <dbl> 96, 95, 94, 68, 91, 91, 70, 89, 92, 91, 84,…
$ defending_marking <dbl> 33, 28, 27, 27, 34, 68, 25, 91, 68, 38, 34,…
$ defending_standing_tackle <dbl> 37, 32, 26, 12, 27, 58, 13, 92, 76, 43, 34,…
$ defending_sliding_tackle <dbl> 26, 24, 29, 18, 22, 51, 10, 85, 71, 41, 32,…
$ goalkeeping_diving <dbl> 6, 7, 9, 87, 11, 15, 88, 13, 13, 14, 13, 7,…
$ goalkeeping_handling <dbl> 11, 11, 9, 92, 12, 13, 85, 10, 9, 14, 5, 11…
$ goalkeeping_kicking <dbl> 15, 15, 15, 78, 6, 5, 88, 13, 7, 9, 7, 7, 1…
$ goalkeeping_positioning <dbl> 14, 14, 15, 90, 8, 10, 88, 11, 14, 11, 11, …
$ goalkeeping_reflexes <dbl> 8, 11, 11, 89, 8, 13, 90, 11, 9, 14, 6, 5, …
Note: The fifa dataset referenced in the text appears to differ from the one currently available in the DALEX package. For instance, the field naming conventions differ, and the dimensions of the fifa data frame do not match those given in the text. We use the fifa dataset from the current DALEX package for presentation purposes.
Target Variable: Players’ Value
Player value, value_eur, is heavily right-skewed (skewness: 4.03).
We’ll apply a log transformation before modeling to reduce this skew.
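As a rough illustration of why the log transform helps, here is a minimal Python sketch. It computes one common skewness definition, g1 = m3 / m2^(3/2) with population central moments, before and after taking logs. The value_eur numbers below are hypothetical and not drawn from the dataset. The 4.03 figure quoted above may also have been computed with a different estimator (for example, a bias-adjusted variant), so the sketch shows the idea rather than reproducing that number.

```python
import math


def skewness(xs):
    """Skewness g1 = m3 / m2**1.5, using population (biased) central moments."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical, right-skewed player values in euros (NOT taken from the dataset):
# most players are cheap, one is extremely expensive.
value_eur = [1e5, 2e5, 3e5, 5e5, 1e6, 2e6, 5e6, 1e8]

# Natural log, the same transform as R's log(); every value is positive, so it is defined.
log_value = [math.log(v) for v in value_eur]

print(f"skewness of value_eur:      {skewness(value_eur):.2f}")
print(f"skewness of log(value_eur): {skewness(log_value):.2f}")
```

Because the model is then fit on the log scale, its predictions must be passed through exp() to map them back to euros.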
Key Feature Variables
We focus on four key variables:
- age: ranges from 16 to 41 and is roughly symmetric, with a median and mean of about 27
- movement_reactions: roughly symmetric
- skill_ball_control: bimodal, because goalkeepers score much lower on this skill than outfield players
- skill_dribbling: bimodal, for the same reason (goalkeepers' scores form a separate, lower cluster)
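One quick way to check the goalkeeper explanation for the bimodality is to split the ratings by a goalkeeper flag and compare group means. The Python sketch below does this using only the first 11 rows visible in the preview. The preview is truncated, so these rows are not the full dataset. The goalkeeper rule (an average goalkeeping_diving and goalkeeping_handling rating of at least 50) is a hypothetical heuristic, not a column in the preview above.

```python
from statistics import mean

# First 11 rows of the preview above (the glimpse output is truncated after these).
ball_control = [96, 92, 95, 30, 94, 91, 30, 76, 92, 89, 90]
dribbling = [97, 89, 96, 12, 95, 86, 21, 70, 87, 89, 91]
gk_diving = [6, 7, 9, 87, 11, 15, 88, 13, 13, 14, 13]
gk_handling = [11, 11, 9, 92, 12, 13, 85, 10, 9, 14, 5]

# Hypothetical rule: flag a player as a goalkeeper when the average of their
# goalkeeping_diving and goalkeeping_handling ratings is at least 50.
GK_THRESHOLD = 50
is_goalkeeper = [mean(pair) >= GK_THRESHOLD for pair in zip(gk_diving, gk_handling)]


def group_means(values, flags):
    """Return (goalkeeper mean, outfield mean) of `values`, split by `flags`."""
    keepers = [v for v, gk in zip(values, flags) if gk]
    outfield = [v for v, gk in zip(values, flags) if not gk]
    return mean(keepers), mean(outfield)


for name, values in [("skill_ball_control", ball_control), ("skill_dribbling", dribbling)]:
    gk_mean, out_mean = group_means(values, is_goalkeeper)
    print(f"{name}: goalkeepers {gk_mean:.1f}, outfield {out_mean:.1f}")
```

On these rows, the two players flagged as goalkeepers average 30 on skill_ball_control versus roughly 91 for everyone else. A gap of this size, repeated across the full dataset, is what produces a separate low mode in the distribution.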