4.3 STL features

A time series decomposition can be used to measure the strength of trend and seasonality in a time series. Recall that the decomposition is written as:

\(y_{t} = T_{t} + S_{t} + R_{t}\),

where \(T_{t}\) is the smoothed trend component, \(S_{t}\) is the seasonal component and \(R_{t}\) is a remainder component.
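
The strength measures themselves are not restated in this section. One common formulation, taken here as an assumption rather than from the text above, compares the variance of the remainder with the variance of the series after the other component has been removed:

\(F_T = \max\left(0,\, 1 - \frac{\text{Var}(R_{t})}{\text{Var}(T_{t} + R_{t})}\right)\), \(F_S = \max\left(0,\, 1 - \frac{\text{Var}(R_{t})}{\text{Var}(S_{t} + R_{t})}\right)\).

Both quantities lie between 0 and 1. Values near 1 indicate a strong trend (\(F_T\)) or strong seasonality (\(F_S\)), and values near 0 indicate that the remainder dominates. The symbols \(F_T\) and \(F_S\) are labels introduced for this sketch.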

  • Measures of the strength of trend and seasonality are useful, for example, when you have a large collection of time series and need to find the series with the most trend or the most seasonality.

  • These and other STL-based features are computed using the feat_stl() function.

library(fpp3)  # attaches tsibble, feasts, dplyr and ggplot2, among others
tourism %>% 
     features(Trips, feat_stl)
## # A tibble: 304 × 12
##    Region State Purpose trend_strength seasonal_strength_year seasonal_peak_year
##    <chr>  <chr> <chr>            <dbl>                  <dbl>              <dbl>
##  1 Adela… Sout… Busine…          0.464                  0.407                  3
##  2 Adela… Sout… Holiday          0.554                  0.619                  1
##  3 Adela… Sout… Other            0.746                  0.202                  2
##  4 Adela… Sout… Visiti…          0.435                  0.452                  1
##  5 Adela… Sout… Busine…          0.464                  0.179                  3
##  6 Adela… Sout… Holiday          0.528                  0.296                  2
##  7 Adela… Sout… Other            0.593                  0.404                  2
##  8 Adela… Sout… Visiti…          0.488                  0.254                  0
##  9 Alice… Nort… Busine…          0.534                  0.251                  0
## 10 Alice… Nort… Holiday          0.381                  0.832                  3
## # ℹ 294 more rows
## # ℹ 6 more variables: seasonal_trough_year <dbl>, spikiness <dbl>,
## #   linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
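
To make the two strength columns less of a black box, the calculation can be sketched by hand for a single series. This is a minimal sketch, not the code inside feat_stl(): it assumes that strength is one minus the ratio of the remainder variance to the variance of trend plus remainder (or seasonal plus remainder), floored at zero. It uses base R's `stl()` on the built-in quarterly `UKgas` series instead of the tourism data, and the seasonal window of 11 is an illustrative choice.

```r
# Quarterly example series shipped with base R (an assumption for
# illustration; the text works with the tourism tsibble instead).
y <- log(UKgas)

# STL decomposition. s.window = 11 is an illustrative setting and is not
# necessarily what feat_stl() uses.
dcmp <- stl(y, s.window = 11)
trend     <- as.numeric(dcmp$time.series[, "trend"])
seasonal  <- as.numeric(dcmp$time.series[, "seasonal"])
remainder <- as.numeric(dcmp$time.series[, "remainder"])

# Strength of trend: remainder variance relative to the variance of the
# seasonally adjusted series (trend + remainder), floored at zero.
trend_strength <- max(0, 1 - var(remainder) / var(trend + remainder))

# Strength of seasonality: the same idea applied to the detrended series
# (seasonal + remainder).
seasonal_strength <- max(0, 1 - var(remainder) / var(seasonal + remainder))

c(trend_strength = trend_strength, seasonal_strength = seasonal_strength)
```

Because the ratio of variances is never negative, both values are guaranteed to fall between 0 and 1, which is what makes them comparable across series of very different scales.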

We can then use these features in plots to identify which types of series are heavily trended and which are most seasonal.

tourism %>%
     features(Trips, feat_stl) %>%
     ggplot(aes(x = trend_strength, y = seasonal_strength_year,
                col = Purpose)) +
     geom_point() +
     facet_wrap(vars(State))

  • Clearly, holiday series are most seasonal, which is not surprising.

  • The strongest trends tend to be in Western Australia and Victoria.

The most seasonal series can also be easily identified and plotted.

tourism %>%
     features(Trips, feat_stl) %>%
     filter(
          seasonal_strength_year == max(seasonal_strength_year)
     ) %>%
     left_join(tourism, by = c("State", "Region", "Purpose")) %>%
     ggplot(aes(x = Quarter, y = Trips)) +
     geom_line() +
     facet_grid(vars(State, Region, Purpose))

The above plot shows holiday trips to the most popular ski region of Australia.

The feat_stl() function returns several features in addition to those discussed above.

- `seasonal_peak_year` indicates the timing of the peaks — which month or quarter contains the largest seasonal component. This tells us something about the nature of the seasonality. In the Australian tourism data, if Quarter 3 is the peak seasonal period, then people are travelling to the region in winter, whereas a peak in Quarter 1 suggests that the region is more popular in summer.

- `seasonal_trough_year` indicates the timing of the troughs — which month or quarter contains the smallest seasonal component.

- `spikiness` measures the prevalence of spikes in the remainder component \(R_{t}\) of the STL decomposition. It is the variance of the leave-one-out variances of \(R_{t}\).

- `linearity` measures the linearity of the trend component of the STL decomposition. It is based on the coefficient of a linear regression applied to the trend component.

- `curvature` measures the curvature of the trend component of the STL decomposition. It is based on the coefficient from an orthogonal quadratic regression applied to the trend component.

- `stl_e_acf1` is the first autocorrelation coefficient of the remainder series.

- `stl_e_acf10` is the sum of squares of the first ten autocorrelation coefficients of the remainder series.
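
The descriptions in the list above can be turned into a short by-hand calculation. The following is a minimal sketch under stated assumptions, not the implementation used by feat_stl(): it decomposes the built-in quarterly `UKgas` series with base R's `stl()` (seasonal window of 11 chosen for illustration), and every variable name is introduced here for the example. The values returned by feat_stl() may differ, for instance in how peak and trough periods are coded or in how the leave-one-out variances are computed.

```r
# Quarterly example series from base R (an illustrative assumption; the
# text uses the tourism tsibble).
y <- log(UKgas)
dcmp <- stl(y, s.window = 11)  # window chosen for illustration only

seasonal_ts <- dcmp$time.series[, "seasonal"]
trend       <- as.numeric(dcmp$time.series[, "trend"])
remainder   <- as.numeric(dcmp$time.series[, "remainder"])
n <- length(remainder)

# Peak and trough: the quarter (1 to 4) whose average seasonal component
# is largest or smallest.
quarter <- as.integer(cycle(seasonal_ts))
seasonal_by_quarter <- tapply(as.numeric(seasonal_ts), quarter, mean)
seasonal_peak   <- unname(which.max(seasonal_by_quarter))
seasonal_trough <- unname(which.min(seasonal_by_quarter))

# Spikiness: variance of the leave-one-out variances of the remainder.
loo_var <- vapply(seq_len(n), function(i) var(remainder[-i]), numeric(1))
spikiness <- var(loo_var)

# Linearity and curvature: linear and quadratic coefficients from a
# regression of the trend on orthogonal polynomials in time.
fit <- lm(trend ~ poly(seq_len(n), degree = 2))
linearity <- unname(coef(fit)[2])
curvature <- unname(coef(fit)[3])

# Autocorrelation features of the remainder (lag 0 is dropped).
r <- as.numeric(acf(remainder, lag.max = 10, plot = FALSE)$acf)[-1]
stl_e_acf1  <- r[1]
stl_e_acf10 <- sum(r^2)

c(seasonal_peak = seasonal_peak, seasonal_trough = seasonal_trough,
  spikiness = spikiness, linearity = linearity, curvature = curvature,
  stl_e_acf1 = stl_e_acf1, stl_e_acf10 = stl_e_acf10)
```

Using `poly()` rather than raw powers of time keeps the linear and quadratic terms uncorrelated, so the two coefficients can be read separately as measures of slope and bend.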