16.5 Posterior prediction

What will be the popularity of new song of artist j (two cases: artist in the data / unknown artist)?

posterior_predict() exist but first we do it by “hand”!

16.5.1 First case: Frank Ocean (j=39)

Yinew,j|μj,σyN(μij,(σ(i)y)2)

We have plenty of μij and σ(i)y with two sources of variability :

  • Not all song of Ocean are eqully popular (within-group sampling variability)

  • we do not know the exact mean and variability of Ocean song (posterior variability)

# Simulate Ocean's posterior predictive model
set.seed(84735)
ocean_chains <- spotify_hierarchical_df |>
  rename(b = `b[(Intercept) artist:Frank_Ocean]`) |> 
  select(`(Intercept)`, b, sigma) |> 
  mutate(mu_ocean = `(Intercept)` + b,
         y_ocean = rnorm(20000, mean = mu_ocean, sd = sigma)) # stuff that I always forget

# Check it out
head(ocean_chains, 3)
##   (Intercept)        b    sigma mu_ocean  y_ocean
## 1    49.93503 18.01483 14.51838 67.94986 77.63700
## 2    49.20877 19.15738 13.47368 68.36615 66.69757
## 3    51.66700 19.05753 14.27727 70.72453 81.87319

Then you summarize it:

ocean_chains |> 
  mean_qi(y_ocean, .width = 0.80) 
## # A tibble: 1 × 6
##   y_ocean .lower .upper .width .point .interval
##     <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1    69.4   51.4   87.6    0.8 mean   qi
# to put into context 
# the range of a new song is wider tham the average of ocean
artist_summary_scaled |>
  filter(artist == "artist:Frank_Ocean")
## # A tibble: 1 × 7
##   artist              mu_j .lower .upper .width .point .interval
##   <fct>              <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 artist:Frank_Ocean  69.4   66.6   72.2    0.8 mean   qi

16.5.2 Posterior prediction for an observed group

We do not have μj but we know new artist is an artist! And we now the range of mean popularity level among artist N(μ,σu) and we have 44 artists.

  • step 1: simulate μnewartist bt drawing into layer 2 of MCMC

  • step 2: simulate song popularity with Layer 1 and μnewartist

We are adding a new source of variability (not all artist are equally popular : between group)

set.seed(84735)
mohsen_chains <- spotify_hierarchical_df |>
  mutate(sigma_mu = sqrt(`Sigma[artist:(Intercept),(Intercept)]`),
         mu_mohsen = rnorm(20000, `(Intercept)`, sigma_mu), # new stuff
         y_mohsen = rnorm(20000, mu_mohsen, sigma))

# Posterior predictive summaries
mohsen_chains |> 
  mean_qi(y_mohsen, .width = 0.80)
## # A tibble: 1 × 6
##   y_mohsen .lower .upper .width .point .interval
##      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1     52.3   25.9   78.8    0.8 mean   qi

16.5.3 posterior_predict()

set.seed(84735)
prediction_shortcut <- posterior_predict(
  spotify_hierarchical,
  newdata = data.frame(artist = c("Frank Ocean", "Mohsen Beats")))

# Posterior predictive model plots
mcmc_areas(prediction_shortcut, prob = 0.8) +
  ggplot2::scale_y_discrete(labels = c("Frank Ocean", "Mohsen Beats"))
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.