16.4 Building the hierarchical model

16.4.1 The hierarchy

Layer 1: \(Y_{ij} | \mu_j , \sigma_y\) how song popularity varies WITHIN artist \(j\)

Layer 2: \(\mu_{j}|\mu, \sigma_\mu\) how typical popularity \(\mu_{j}\) varies BETWEEN artists

Layer 3: \(\mu, \sigma_y, \sigma_{\mu}\) prior models for shared global parameters

(order do not necessarily matter)

Layer 1:

\[Y_{ij}|\mu_j, \sigma_j \sim N(\mu_j, \sigma_y^2)\]

\(\mu_j\) mean song popularity for artist j

\(\sigma_y\) within group variability sd in popularity from song to song within each artist

-> if we stop here we have a “no pooled”

Layer 2:

\[\mu_{j}|\mu_{j}. \sigma_\mu \overset{ind}{\sim} N(\mu, \sigma_\mu^2)\] \(\mu\) global average: the means popularity ratings for the most average artist

\(\sigma_{u}\) \(between-group variability\), the standard deviation in mean popularity \(μj\) from artist to artist.

#Normal is not too bad
ggplot(artist_means, aes(x = popularity)) + 
  geom_density()

Layer 3 : Priors

\[\mu \sim N(50, 52^2)\] (this 52 ???)

\[\sigma_y \sim Exp(0.048) \]

\[ \sigma \sim Exp(1)\]

@realHollanders

This is a one way analyis of variance (ANOVA)

An other way to think about it:

\[Y_{ij}|\mu_j, \sigma_j \sim N(\mu_j, \sigma_y^2) \quad with \quad \mu_j = \mu +b_j\]

\[b_j | \sigma_\mu \overset{ind}{\sim} N(0, \sigma_\mu^2) \]

Example: if \(\mu\) = 55 and \(\mu_j\) = 65 \(b_j\) = 10

16.4.2 within- vs -between-group variability

Before we analyse just one source of variability (the individual level), now we have two sources (\(\sigma_\mu, \sigma_y\)). The first one is the sqrt(variance) within the group (song of an artist) and the second is the sqrt(variance) between group.

The total variance is :

\[Var(Y_{ij} = \sigma²_y + \sigma^2_u) \] Other way of thinking about is:

\(\frac{\sigma^2_y}{\sigma^2_\mu + \sigma^2_y}\) proportion of total variance explained by difference within each group

\(\frac{\sigma^2_\mu}{\sigma^2_\mu + \sigma^2_y}\) proportion of total variance explained by difference between groups

You have correlation in song popularity of the same artist (within group). And assuming each groups are independant we get:

\[Cor(Y_{ij}, Y_{kj}) = \frac{\sigma^2_\mu}{\sigma^2_\mu + \sigma^2_y}\] ## Posterior analysis

16.4.3 Posterior simulation

47 parameters:

  • 44 artists specific parameters (\(\mu_{j}\))
  • 3 global parameters (\(\mu, \sigma_j, sigma_\mu\))
spotify_hierarchical <- stan_glmer(
  popularity ~ (1 | artist), # this  is the part that tell that artist is a group not a predictor
  data = spotify, family = gaussian,
  prior_intercept = normal(50, 2.5, autoscale = TRUE),
  prior_aux = exponential(1, autoscale = TRUE),
  prior_covariance = decov(reg = 1, conc = 1, shape = 1, scale = 1), # stuff we will learn chapter 17 suppose to be equivalent to Exp(1)
  chains = 4, iter = 5000*2, seed = 84735)
# Confirm the prior tunings
prior_summary(spotify_hierarchical)
## Priors for model 'spotify_hierarchical' 
## ------
## Intercept (after predictors centered)
##   Specified prior:
##     ~ normal(location = 50, scale = 2.5)
##   Adjusted prior:
##     ~ normal(location = 50, scale = 52)
## 
## Auxiliary (sigma)
##   Specified prior:
##     ~ exponential(rate = 1)
##   Adjusted prior:
##     ~ exponential(rate = 0.048)
## 
## Covariance
##  ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
## ------
## See help('prior_summary.stanreg') for more details

We need to check the success of MCMC (saving you time here)

bayesplot::pp_check(spotify_hierarchical) + xlab("popularity")

Let inspect our simulation:

spotify_hierarchical_df <- as.data.frame(spotify_hierarchical)
dim(spotify_hierarchical_df)
## [1] 20000    47

16.4.4 Posterior analysis of global parameters

  • \(\mu = (intercept)\)
  • \(\sigma_y = sigma\)
  • \(\sigma_\mu² = Sigma[artist:(intercep),(Intercept)]\) Attention! here this is the variance
tidy(spotify_hierarchical, effects = "fixed" # for getting global
     , conf.int = TRUE, conf.level = 0.80)
## # A tibble: 1 × 5
##   term        estimate std.error conf.low conf.high
##   <chr>          <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept)     52.5      2.44     49.3      55.6
tidy(spotify_hierarchical, 
     effects = "ran_pars") # PARameters and RANdomness or variability)
## # A tibble: 2 × 3
##   term                    group    estimate
##   <chr>                   <chr>       <dbl>
## 1 sd_(Intercept).artist   artist       15.2
## 2 sd_Observation.Residual Residual     14.0

An other way:

15.1^2 / (15.1^2 + 14.0^2) # sigma_mu^2 / sigma_mu^2 + sigma_y^2 
## [1] 0.5377468
14.0^2 / (15.1^2 + 14.0^2) # sigma_y^2 / sigma_mu^2 + sigma_y^2
## [1] 0.4622532

16.4.5 posterior analysis of group specific

If you recall that \(\mu_j = \mu + b_{j}\)

We have \(\mu\) and \(b_j\) (check spotify_hierarchical_df)

 # RANdom Values
artist_summary <- tidy(spotify_hierarchical, effects = "ran_vals"
                       , conf.int = TRUE, conf.level = 0.80)

# Check out the results for the first & last 2 artists
# 80% intervall
# this produce a summary
artist_summary %>% 
  select(level, conf.low, conf.high) %>% 
  slice(1:2, 43:44)
## # A tibble: 4 × 3
##   level         conf.low conf.high
##   <chr>            <dbl>     <dbl>
## 1 Mia_X            -40.8     -23.2
## 2 Chris_Goldarg    -39.4     -26.8
## 3 Lil_Skies         11.0      30.2
## 4 Camilo            19.4      32.4
dim(artist_summary)
## [1] 44  7

Other way: combining simulations to simulate posterior of \(\mu_j\)

\[\mu_j = \mu + b_{j} = (Intercept) + b[(Intercept) \quad artist:j]\]

artist_chains <- spotify_hierarchical |>
  spread_draws(`(Intercept)`, b[,artist]) |> 
  mutate(mu_j = `(Intercept)` + b) 

dim(artist_chains)
## [1] 880000      7
artist_chains |>
  select(artist, `(Intercept)`, b, mu_j) |> 
  head(4)
## # A tibble: 4 × 4
## # Groups:   artist [4]
##   artist              `(Intercept)`     b  mu_j
##   <chr>                       <dbl> <dbl> <dbl>
## 1 artist:Alok                  49.9  12.6  62.5
## 2 artist:Atlas_Genius          49.9 -12.8  37.1
## 3 artist:Au/Ra                 49.9  10.7  60.6
## 4 artist:Beyoncé               49.9  22.9  72.8
# Get posterior summaries for mu_j
artist_summary_scaled <- artist_chains |> 
  select(-`(Intercept)`, -b) |> 
  mean_qi(.width = 0.80) |> 
  mutate(artist = fct_reorder(artist, mu_j))

# Check out the results
artist_summary_scaled |> 
  select(artist, mu_j, .lower, .upper) |>
  head(4)
## # A tibble: 4 × 4
##   artist               mu_j .lower .upper
##   <fct>               <dbl>  <dbl>  <dbl>
## 1 artist:Alok          64.3   60.2   68.4
## 2 artist:Atlas_Genius  47.1   38.9   55.1
## 3 artist:Au/Ra         59.5   52.2   66.9
## 4 artist:BUNT.         44.8   35.5   54.0
ggplot(artist_summary_scaled, 
       aes(x = artist, y = mu_j, ymin = .lower, ymax = .upper)) +
  geom_pointrange() +
  xaxis_text(angle = 90, hjust = 1)

16.4.5.1 QUiz !

Similar posterior mean but different 80%CI ?