Slice functions
To extract specific rows within each group you can use the following functions:
df |> slice_head(n = 1) takes the first row from each group.
df |> slice_tail(n = 1) takes the last row in each group.
df |> slice_min(x, n = 1) takes the row with the smallest value of column x.
df |> slice_max(x, n = 1) takes the row with the largest value of column x.
df |> slice_sample(n = 1) takes one random row.
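To make the behaviour of these functions concrete, here is a small sketch on made-up data. The tibble df, its columns g and x, and the result names are hypothetical and chosen only for illustration; the sketch assumes dplyr is installed and R is version 4.1 or later (for the native pipe).

```r
library(dplyr)

# Hypothetical toy data: two groups, three rows each.
df <- tibble(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(3, 1, 2, 6, 4, 5)
)

by_g <- df |> group_by(g)

first_rows <- by_g |> slice_head(n = 1)    # first row of each group
last_rows  <- by_g |> slice_tail(n = 1)    # last row of each group
min_rows   <- by_g |> slice_min(x, n = 1)  # row with the smallest x per group
max_rows   <- by_g |> slice_max(x, n = 1)  # row with the largest x per group
```

slice_sample() is left out of the sketch because its result changes from run to run unless a random seed is set.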
You can vary n to select more than one row, or, instead of n, use prop = 0.1 to select (for example) 10% of the rows in each group. For example, the output below shows the flights that were most delayed upon arrival at each destination.
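A minimal sketch of code that would give output of the shape shown below. It assumes the flights data frame comes from the nycflights13 package and that dplyr is loaded; the name most_delayed is introduced here only for illustration.

```r
library(dplyr)
library(nycflights13)  # assumption: source of the flights data frame

most_delayed <- flights |>
  group_by(dest) |>               # one group per destination
  slice_max(arr_delay, n = 1) |>  # rows with the largest arrival delay in each group
  relocate(dest)                  # move dest to the front so it is easy to see

most_delayed
```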
## # A tibble: 108 × 19
## # Groups:   dest [105]
##    dest   year month   day dep_time sched_dep_time dep_delay arr_time
##    <chr> <int> <int> <int>    <int>          <int>     <dbl>    <int>
##  1 ABQ    2013     7    22     2145           2007        98      132
##  2 ACK    2013     7    23     1139            800       219     1250
##  3 ALB    2013     1    25      123           2000       323      229
##  4 ANC    2013     8    17     1740           1625        75     2042
##  5 ATL    2013     7    22     2257            759       898      121
##  6 AUS    2013     7    10     2056           1505       351     2347
##  7 AVL    2013     8    13     1156            832       204     1417
##  8 BDL    2013     2    21     1728           1316       252     1839
##  9 BGR    2013    12     1     1504           1056       248     1628
## 10 BHM    2013     4    10       25           1900       325      136
## # ℹ 98 more rows
## # ℹ 11 more variables: sched_arr_time <int>, arr_delay <dbl>, carrier <chr>,
## # flight <int>, tailnum <chr>, origin <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
In the above output we get 108 rows, but there are only 105 destinations. This is because slice_min() and slice_max() keep tied values, so n = 1 means "give us all rows with the highest value". If you want exactly one row per group, you can set with_ties = FALSE.
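The effect of ties can be seen on a small made-up example. The tibble df, its columns g and x, and the result names below are hypothetical; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: group "a" has two rows tied for the largest x.
df <- tibble(
  g = c("a", "a", "a", "b", "b"),
  x = c(5, 5, 1, 2, 3)
)

ties_kept <- df |>
  group_by(g) |>
  slice_max(x, n = 1)                     # default: both tied rows in "a" are kept

ties_dropped <- df |>
  group_by(g) |>
  slice_max(x, n = 1, with_ties = FALSE)  # exactly one row per group

nrow(ties_kept)     # 3 rows: two from "a", one from "b"
nrow(ties_dropped)  # 2 rows: one per group
```

The same thing happens in the flights output: a few destinations have more than one flight sharing the largest arrival delay, so the result has more rows than groups.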