16.2 Complete pooled model
Notations:
\(j\) will indicate artist, \(j \in {1, 2 ..., 44}\)
\(i\) will indicate song for artist \(j\)
\(n_j\) Number of song we have for artist \(j\)
Example: Mia X, the first artist in our data set, has 4 songs -> \(n_1 = 4\)
ggplot(spotify, aes(x = popularity)) +
geom_density()
Even if the distribution is left skewed we will go with a Normal-Normal complete pooled model
\[Y_{ij}|\mu,\sigma \sim N (\mu, \sigma²)\]
\[\mu \sim N(50, 52^2)\]
\[\sigma \sim Exp(0.048)\]
\(\mu\) and \(\sigma\) are global parameter: they do not vary by artist:
\(\mu\): global mean popularity
\(\sigma\) : global standard deviation in popularity from song to song
<- stan_glm(
spotify_complete_pooled ~ 1, # trick is here \mu = beta_0 (intercept) with no X
popularity data = spotify, family = gaussian,
prior_intercept = normal(50, 2.5, # I do not understand 2.5
autoscale = TRUE),
prior_aux = exponential(1, autoscale = TRUE),
chains = 4, iter = 5000*2, seed = 84735)
<- tidy(spotify_complete_pooled,
complete_summary effects = c("fixed", "aux"),
conf.int = TRUE, conf.level = 0.80)
complete_summary
## # A tibble: 3 × 5
## term estimate std.error conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 58.4 1.10 57.0 59.8
## 2 sigma 20.7 0.776 19.7 21.7
## 3 mean_PPD 58.4 1.57 56.4 60.4
16.2.1 Quiz!!
3 artist:
- Mia X, artist with the lowest mean popularity in our data set
- Beyoncé, artist with nearly the highest mean popularity in our data set
- Mohsen Beats, an artist not in out data set
Using complete pooled model, what would be the approximate posterior predictive mean for a new song from this 3 artists?
<- spotify |>
artist_means group_by(artist) |>
summarize(count = n(), popularity = mean(popularity))
set.seed(84735)
<- posterior_predict(spotify_complete_pooled,
predictions_complete newdata = artist_means)
ppc_intervals(artist_means$popularity, yrep = predictions_complete,
prob_outer = 0.80) +
::scale_x_continuous(labels = artist_means$artist,
ggplot2breaks = 1:nrow(artist_means)) +
xaxis_text(angle = 90, hjust = 1)