19.6 Sparse autoencoders
Sparse autoencoders are used to extract the most influential feature representations, which helps to:
- Understand what the most unique features of a data set are.
- Highlight the unique signals across the features.
19.6.1 Mathematical description
With the tanh activation function, we consider a neuron active if its output is close to 1 and inactive if its output is close to -1. We can increase the number of inactive neurons by incorporating sparsity, which is measured as the average activation of the coding layer:

$$\hat{\rho} = \frac{1}{m}\sum_{i=1}^{m} A(X)$$
Let’s compute it for our example:
ae100_codings <- h2o.deepfeatures(best_model, features, layer = 1)
ae100_codings %>%
  as.data.frame() %>%
  tidyr::gather() %>%
  summarize(average_activation = mean(value))
## average_activation
## 1 -0.00677801
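As a plain illustration of the formula above, the average activation ρ̂ is simply the mean over all coding-layer outputs. Below is a minimal NumPy sketch using a small made-up matrix of tanh codings (the values are hypothetical, not the h2o output shown above):

```python
import numpy as np

# Hypothetical coding-layer outputs for m = 4 observations and
# 3 coding neurons; tanh outputs lie in (-1, 1).
codings = np.array([
    [ 0.9, -0.8,  0.1],
    [-0.7,  0.6, -0.2],
    [ 0.5, -0.4,  0.2],
    [-0.6,  0.5, -0.2],
])

# rho_hat = (1/m) * sum_i A(X): the mean activation across all
# observations and coding neurons; slightly negative here, analogous
# to the small negative average activation reported above.
rho_hat = codings.mean()
print(rho_hat)
```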
The most commonly used penalty is the Kullback-Leibler divergence (KL divergence), which measures the divergence between the target probability ρ that a neuron in the coding layer will activate and the actual probability ρ̂:

$$\sum \text{KL}(\rho \,\|\, \hat{\rho}) = \sum \rho \log\frac{\rho}{\hat{\rho}} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}}$$
Now we just need to add the penalty to our loss function with a parameter (β) to control the weight of the penalty.
$$\text{minimize}\left( L = f(X, X') + \beta \sum \text{KL}(\rho \,\|\, \hat{\rho}) \right)$$
Adding sparsity can force the model to represent each input as a combination of a smaller number of activations.
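To make the penalty concrete, here is a small Python sketch of the two formulas above. It is illustrative only: the target probability ρ, the per-neuron average activations, and the reconstruction loss value are made-up numbers, and the activations are assumed to lie in (0, 1) as the Bernoulli-style KL formula requires:

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    """Sum of KL(rho || rho_hat_j) over the coding neurons.

    rho:     target activation probability (scalar in (0, 1)).
    rho_hat: per-neuron average activations (array-like in (0, 1)).
    """
    rho_hat = np.asarray(rho_hat, dtype=float)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

# Illustrative values: target 10% activation, three coding neurons.
rho = 0.1
rho_hat = np.array([0.05, 0.10, 0.30])

reconstruction_loss = 0.013   # hypothetical f(X, X') value
beta = 0.05                   # weight of the sparsity penalty

# Penalized loss: L = f(X, X') + beta * sum KL(rho || rho_hat)
loss = reconstruction_loss + beta * kl_sparsity_penalty(rho, rho_hat)
```

A neuron whose average activation exactly matches the target (ρ̂ = ρ) contributes zero penalty; neurons that are too active (or too inactive) push the loss up, which is what drives the model toward sparser codings.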
19.6.2 Tuning the sparsity β parameter
- Defining an evaluation grid for the β parameter.
hyper_grid <- list(sparsity_beta = c(0.01, 0.05, 0.1, 0.2))
- Training a model for each option.
ae_sparsity_grid <- h2o.grid(
  algorithm = 'deeplearning',
  x = seq_along(features),
  training_frame = features,
  grid_id = 'sparsity_grid',
  autoencoder = TRUE,
  hidden = 100,
  activation = 'Tanh',
  hyper_params = hyper_grid,
  sparse = TRUE,
  average_activation = -0.1,
  ignore_const_cols = FALSE,
  seed = 123
)
- Identifying the best option.
h2o.getGrid('sparsity_grid', sort_by = 'mse', decreasing = FALSE)
## H2O Grid Details
## ================
##
## Grid ID: sparsity_grid
## Used hyper parameters:
## - sparsity_beta
## Number of models: 4
## Number of failed models: 0
##
## Hyper-Parameter Search Summary: ordered by increasing mse
## sparsity_beta model_ids mse
## 1 0.01 sparsity_grid_model_1 0.012982916169006953
## 2 0.2 sparsity_grid_model_4 0.01321464889160263
## 3 0.05 sparsity_grid_model_2 0.01337749148043942
## 4 0.1 sparsity_grid_model_3 0.013516631653257992
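Selecting the best β from the grid simply amounts to sorting the candidate models by reconstruction MSE, as `h2o.getGrid(sort_by = 'mse')` does above. A toy stand-alone sketch using the (rounded) MSE values from the table:

```python
# Rounded MSE values from the grid output above, keyed by sparsity_beta.
results = {0.01: 0.012983, 0.05: 0.013377, 0.1: 0.013517, 0.2: 0.013215}

# Rank candidates by increasing MSE and take the best one.
ranked = sorted(results.items(), key=lambda kv: kv[1])
best_beta, best_mse = ranked[0]
print(best_beta)  # the beta with the lowest reconstruction error
```

Here the smallest penalty weight (β = 0.01) wins, suggesting that for this data set only a light sparsity constraint improves the reconstruction.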