10.2 Single Layer Neural Network
Let's consider a dataset made of $p$ predictors
$$X = (X_1, X_2, X_3, \dots, X_p)$$
and build a non-linear function $f(X)$ to predict a response $Y$:
$$f(X) = \beta_0 + \sum_{k=1}^{K} \beta_k h_k(X)$$
where the $h_k(X)$ are the hidden-layer activations: $K$ transformations of the inputs, denoted $A_k$ for $k = 1, \dots, K$, which are not directly observed.
Each activation $A_k = h_k(X)$ is a non-linear transformation $g(z)$ of a linear function of the inputs:
$$A_k = h_k(X) = g\!\left(w_{k0} + \sum_{j=1}^{p} w_{kj} X_j\right)$$
The output layer is then a linear model that uses these activations $A_k$ as inputs, resulting in the function
$$f(X) = \beta_0 + \sum_{k=1}^{K} \beta_k A_k$$
where each $A_k$ is a different transformation $h_k(X)$ of the inputs.
$\beta_0, \dots, \beta_K$ and $w_{10}, \dots, w_{Kp}$ need to be estimated from data.
What about the activation function $g(z)$? There are various options, but the most commonly used ones are:
- sigmoid
$$g(z) = \frac{e^z}{1 + e^z}$$
- ReLU (rectified linear unit)
$$g(z) = (z)_+ = \begin{cases} 0 & \text{if } z < 0 \\ z & \text{otherwise} \end{cases}$$
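As a minimal sketch of these two functions (using NumPy; the function names are my own):

```python
import numpy as np

def sigmoid(z):
    # g(z) = e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # g(z) = (z)_+ : 0 when z < 0, z otherwise
    return np.maximum(0.0, z)

z = np.linspace(-4, 4, 9)
print(sigmoid(z))  # smooth values in (0, 1)
print(relu(z))     # zeros for negative z, identity for positive z
```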

Figure 10.1: Activation functions - Chap 10
This is the structure of a single-layer neural network. Here we can see the input layer, the hidden layer, and the output layer.

Figure 10.2: Single layer neural network - Chap 10
In this example, we see deep learning applied to a dosage/efficacy study, with the model parameters shown and the activation function in the middle.
The parameters can be estimated with backpropagation, which optimizes the weights (the coefficients $w_{kj}$) and the biases (the intercepts $w_{k0}$). We will cover this later in these notes. For now, we suppose the parameter values are known, and we work through the calculation of the deep learning model:
$$f(X) = \beta_0 + \sum_{k=1}^{K} \beta_k \, g\!\left(w_{k0} + \sum_{j=1}^{p} w_{kj} X_j\right)$$
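As a minimal sketch of this forward calculation (assuming ReLU as $g$ and made-up parameter values; all names here are hypothetical):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, w0, W, beta0, beta):
    # x:     (p,)   input vector
    # w0:    (K,)   hidden-layer intercepts (biases) w_k0
    # W:     (K, p) hidden-layer coefficients (weights) w_kj
    # beta0: scalar output-layer intercept
    # beta:  (K,)   output-layer coefficients
    A = relu(w0 + W @ x)      # activations A_k = g(w_k0 + sum_j w_kj * x_j)
    return beta0 + beta @ A   # f(X) = beta_0 + sum_k beta_k * A_k

# Toy example with p = 3 predictors and K = 2 hidden units.
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0])
w0 = rng.normal(size=2)
W = rng.normal(size=(2, 3))
beta0, beta = 0.5, np.array([1.0, -1.0])
print(forward(x, w0, W, beta0, beta))
```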

Figure 10.3: Neural network Pt.1 Inside the black box - Youtube video
This is from the book (p. 406), where you can see all the steps for calculating the estimated $f(X)$, supposing that we know the values of the parameters.

Figure 10.4: Neural network model fit calculation - Chap 10
Fitting a neural network with a quantitative response requires estimating the unknown parameters $w_{kj}$ and $\beta_k$ by minimizing the squared-error loss.
Squared-error loss:
$$\min \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2$$
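As a one-line sketch of this objective (hypothetical names; `y` holds the observed responses and `f_hat` the network's predictions):

```python
import numpy as np

def squared_error_loss(y, f_hat):
    # sum_i (y_i - f(x_i))^2 -- the quantity minimized over w_kj and beta_k
    return np.sum((y - f_hat) ** 2)

y = np.array([1.2, 0.7, 3.1])      # observed responses
f_hat = np.array([1.0, 0.9, 2.8])  # predictions f(x_i) for some parameter values
print(squared_error_loss(y, f_hat))  # ~0.17
```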
To train a neural network with a qualitative response, we instead minimize the negative multinomial log-likelihood, also known as the cross-entropy. We see this explained in the multilayer neural network section.
Negative multinomial log-likelihood:
$$-\sum_{i=1}^{n} \sum_{m=0}^{K} y_{im} \log\!\big(f_m(x_i)\big)$$
As deep learning models have the ability to fit flexible "squiggly" lines to the data, the estimated parameters can be passed through a special softmax function, which converts the output-layer values $Z_m$ into class probabilities:
$$f_m(X) = \Pr(Y = m \mid X) = \frac{e^{Z_m}}{\sum_{k=0}^{K} e^{Z_k}}$$
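As a minimal sketch tying the softmax and the cross-entropy together (NumPy; all names are my own, and I subtract `max(Z)` before exponentiating, a standard numerical-stability trick that leaves the result unchanged):

```python
import numpy as np

def softmax(Z):
    # f_m = e^{Z_m} / sum_k e^{Z_k}
    e = np.exp(Z - np.max(Z))  # shift by max(Z) for numerical stability
    return e / e.sum()

def cross_entropy(Y, F):
    # - sum_i sum_m y_im * log(f_m(x_i))
    # Y: (n, M) one-hot class indicators; F: (n, M) predicted probabilities
    return -np.sum(Y * np.log(F))

Z = np.array([2.0, 1.0, 0.1])  # output-layer values Z_m for one observation
probs = softmax(Z)
print(probs, probs.sum())      # class probabilities, summing to 1

Y = np.array([[1, 0, 0], [0, 0, 1]])              # two one-hot labels
F = np.array([[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]])  # predicted probabilities
print(cross_entropy(Y, F))
```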