16.5 Posterior prediction
What will be the popularity of new song of artist j (two cases: artist in the data / unknown artist)?
posterior_predict()
exist but first we do it by “hand”!
16.5.1 First case: Frank Ocean (j=39)
Yinew,j|μj,σy∼N(μij,(σ(i)y)2)
We have plenty of μij and σ(i)y with two sources of variability :
Not all song of Ocean are eqully popular (within-group sampling variability)
we do not know the exact mean and variability of Ocean song (posterior variability)
# Simulate Ocean's posterior predictive model
set.seed(84735)
<- spotify_hierarchical_df |>
ocean_chains rename(b = `b[(Intercept) artist:Frank_Ocean]`) |>
select(`(Intercept)`, b, sigma) |>
mutate(mu_ocean = `(Intercept)` + b,
y_ocean = rnorm(20000, mean = mu_ocean, sd = sigma)) # stuff that I always forget
# Check it out
head(ocean_chains, 3)
## (Intercept) b sigma mu_ocean y_ocean
## 1 49.93503 18.01483 14.51838 67.94986 77.63700
## 2 49.20877 19.15738 13.47368 68.36615 66.69757
## 3 51.66700 19.05753 14.27727 70.72453 81.87319
Then you summarize it:
|>
ocean_chains mean_qi(y_ocean, .width = 0.80)
## # A tibble: 1 × 6
## y_ocean .lower .upper .width .point .interval
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 69.4 51.4 87.6 0.8 mean qi
# to put into context
# the range of a new song is wider tham the average of ocean
|>
artist_summary_scaled filter(artist == "artist:Frank_Ocean")
## # A tibble: 1 × 7
## artist mu_j .lower .upper .width .point .interval
## <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 artist:Frank_Ocean 69.4 66.6 72.2 0.8 mean qi
16.5.2 Posterior prediction for an observed group
We do not have μj but we know new artist is an artist! And we now the range of mean popularity level among artist N(μ,σu) and we have 44 artists.
step 1: simulate μnewartist bt drawing into layer 2 of MCMC
step 2: simulate song popularity with Layer 1 and μnewartist
We are adding a new source of variability (not all artist are equally popular : between group)
set.seed(84735)
<- spotify_hierarchical_df |>
mohsen_chains mutate(sigma_mu = sqrt(`Sigma[artist:(Intercept),(Intercept)]`),
mu_mohsen = rnorm(20000, `(Intercept)`, sigma_mu), # new stuff
y_mohsen = rnorm(20000, mu_mohsen, sigma))
# Posterior predictive summaries
|>
mohsen_chains mean_qi(y_mohsen, .width = 0.80)
## # A tibble: 1 × 6
## y_mohsen .lower .upper .width .point .interval
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 52.3 25.9 78.8 0.8 mean qi
16.5.3 posterior_predict()
set.seed(84735)
<- posterior_predict(
prediction_shortcut
spotify_hierarchical,newdata = data.frame(artist = c("Frank Ocean", "Mohsen Beats")))
# Posterior predictive model plots
mcmc_areas(prediction_shortcut, prob = 0.8) +
::scale_y_discrete(labels = c("Frank Ocean", "Mohsen Beats")) ggplot2
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.