Add new variables with mutate()
mutate() adds new columns that are computed from the values of existing columns. The resulting data frame contains both the existing and the new columns.
Compute gain (how much time a delayed flight made up in the air) and speed (in miles per hour):
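The call that produces the output below is not shown here. A minimal sketch of what it likely looks like, assuming flights is the data set from the nycflights13 package and dplyr is attached. The gain formula matches the one used later in this section. The speed formula is an inference from the column names and the printed values, assuming distance is in miles and air_time is in minutes:

```r
library(dplyr)
library(nycflights13)

flights |>
  mutate(
    # Minutes made up in the air: positive when the arrival delay
    # is smaller than the departure delay
    gain = dep_delay - arr_delay,
    # Assumed: distance in miles, air_time in minutes, so * 60 gives mph
    speed = distance / air_time * 60
  )
```

Because the result is not assigned, it is only printed.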
## # A tibble: 336,776 × 21
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## 7 2013 1 1 555 600 -5 913 854
## 8 2013 1 1 557 600 -3 709 723
## 9 2013 1 1 557 600 -3 838 846
## 10 2013 1 1 558 600 -2 753 745
## # ℹ 336,766 more rows
## # ℹ 13 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>, gain <dbl>, speed <dbl>
By default, mutate() adds new columns on the right-hand side of the data frame. Use the .before argument to add them to the left-hand side instead:
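A sketch of a call that would give the column order shown in the output below, assuming flights comes from nycflights13 and that gain and speed are defined as dep_delay - arr_delay and distance / air_time * 60. The value .before = 1 is an assumption; any position or column name that refers to the first column would have the same effect:

```r
library(dplyr)
library(nycflights13)

flights |>
  mutate(
    gain = dep_delay - arr_delay,
    speed = distance / air_time * 60,
    # Place the new columns before the column in position 1
    .before = 1
  )
```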
## # A tibble: 336,776 × 21
## gain speed year month day dep_time sched_dep_time dep_delay arr_time
## <dbl> <dbl> <int> <int> <int> <int> <int> <dbl> <int>
## 1 -9 370. 2013 1 1 517 515 2 830
## 2 -16 374. 2013 1 1 533 529 4 850
## 3 -31 408. 2013 1 1 542 540 2 923
## 4 17 517. 2013 1 1 544 545 -1 1004
## 5 19 394. 2013 1 1 554 600 -6 812
## 6 -16 288. 2013 1 1 554 558 -4 740
## 7 -24 404. 2013 1 1 555 600 -5 913
## 8 11 259. 2013 1 1 557 600 -3 709
## 9 5 405. 2013 1 1 557 600 -3 838
## 10 -10 319. 2013 1 1 558 600 -2 753
## # ℹ 336,766 more rows
## # ℹ 12 more variables: sched_arr_time <int>, arr_delay <dbl>, carrier <chr>,
## # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
Use .after to add the new variables after a given variable. In both .before and .after you can use a variable name instead of a position.
Add the new variables after day:
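A sketch of the corresponding call, again assuming flights from nycflights13 and the gain and speed definitions dep_delay - arr_delay and distance / air_time * 60. Here .after takes the bare column name day:

```r
library(dplyr)
library(nycflights13)

flights |>
  mutate(
    gain = dep_delay - arr_delay,
    speed = distance / air_time * 60,
    # Refer to the anchor column by name rather than by position
    .after = day
  )
```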
## # A tibble: 336,776 × 21
## year month day gain speed dep_time sched_dep_time dep_delay arr_time
## <int> <int> <int> <dbl> <dbl> <int> <int> <dbl> <int>
## 1 2013 1 1 -9 370. 517 515 2 830
## 2 2013 1 1 -16 374. 533 529 4 850
## 3 2013 1 1 -31 408. 542 540 2 923
## 4 2013 1 1 17 517. 544 545 -1 1004
## 5 2013 1 1 19 394. 554 600 -6 812
## 6 2013 1 1 -16 288. 554 558 -4 740
## 7 2013 1 1 -24 404. 555 600 -5 913
## 8 2013 1 1 11 259. 557 600 -3 709
## 9 2013 1 1 5 405. 557 600 -3 838
## 10 2013 1 1 -10 319. 558 600 -2 753
## # ℹ 336,766 more rows
## # ℹ 12 more variables: sched_arr_time <int>, arr_delay <dbl>, carrier <chr>,
## # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
You can use the .keep argument to control which variables are kept after a mutate() operation. Setting .keep = "used" retains only the columns that were involved in or created by the mutate() step. In the example below, those are dep_delay, arr_delay, air_time, gain, hours, and gain_per_hour.
flights |>
  mutate(
    gain = dep_delay - arr_delay,
    hours = air_time / 60,
    gain_per_hour = gain / hours,
    .keep = "used"
  )
## # A tibble: 336,776 × 6
## dep_delay arr_delay air_time gain hours gain_per_hour
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 11 227 -9 3.78 -2.38
## 2 4 20 227 -16 3.78 -4.23
## 3 2 33 160 -31 2.67 -11.6
## 4 -1 -18 183 17 3.05 5.57
## 5 -6 -25 116 19 1.93 9.83
## 6 -4 12 150 -16 2.5 -6.4
## 7 -5 19 158 -24 2.63 -9.11
## 8 -3 -14 53 11 0.883 12.5
## 9 -3 -8 140 5 2.33 2.14
## 10 -2 8 138 -10 2.3 -4.35
## # ℹ 336,766 more rows
If you don’t assign the result of the computation back to flights, the new variables will only be displayed, not stored. Consider whether to overwrite flights with the extra variables or to create a new object for future use.
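As an illustration of the second option, the sketch below stores the result in a new object. The name flights_gain is hypothetical, and flights is assumed to come from nycflights13:

```r
library(dplyr)
library(nycflights13)

# Store the result under a new name instead of only printing it
flights_gain <- flights |>
  mutate(
    gain = dep_delay - arr_delay,
    hours = air_time / 60,
    gain_per_hour = gain / hours
  )

# flights itself is unchanged; the new columns exist only in flights_gain
"gain" %in% names(flights)       # FALSE
"gain" %in% names(flights_gain)  # TRUE
```

Writing flights <- flights |> mutate(...) instead would overwrite the original object, which is convenient but means later code can no longer rely on flights having only its original columns.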