The number of neurons in the input and output layers depends on the application.
1 input neuron with 1 output neuron: e.g. drug dosage and a binary response, like linear regression.
N input neurons, M output neurons: N is the number of features, M the number of categories in classification.
* Example: classification of images into digits. 28×28 = 784 input neurons from a 28×28 px image and 10 output neurons, one per digit.
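As a rough sketch of that digit example (the NumPy setup and variable names here are my own, not from the notes), the layer shapes would look like:

```python
import numpy as np

n_inputs = 28 * 28   # one input neuron per pixel -> 784
n_outputs = 10       # one output neuron per digit 0-9

# Hypothetical single-layer weights and biases, randomly initialized
W = np.random.randn(n_outputs, n_inputs)  # shape (10, 784)
b = np.random.randn(n_outputs)            # shape (10,)

x = np.random.rand(n_inputs)              # a flattened 28x28 image
z = W @ x + b                             # one weighted sum per digit
print(z.shape)                            # (10,)
```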
A perceptron takes several binary inputs $x_1, x_2, x_3$ and produces a single binary output.
Each input has an associated weight $w_1, w_2, w_3$ indicating the importance of that input to the output.
To calculate the output:
$$\text{output} = \begin{cases} 0 & \text{if } \sum_j w_j x_j \leq \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases}$$
A network of perceptrons can weigh up evidence and make decisions, like computing logical functions with binary operations such as AND, OR, or NAND gates.
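A minimal sketch of a single perceptron implementing the threshold rule above; the NAND weights and threshold ($w = (-2, -2)$, threshold $= -3$) are the standard textbook choice, assumed here rather than taken from these notes:

```python
# Perceptron threshold rule: output 1 if the weighted sum exceeds
# the threshold, else 0.
def perceptron(x, w, threshold):
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if s > threshold else 0

# NAND gate: output is 0 only when both inputs are 1.
# (w = (-2, -2), threshold = -3 is the usual textbook choice.)
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron((x1, x2), w=(-2, -2), threshold=-3))
# 0 0 1
# 0 1 1
# 1 0 1
# 1 1 0
```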
The output of a sigmoid neuron is defined by the sigmoid function:
$$\sigma(z) = \frac{1}{1+e^{-z}}, \qquad \sigma(w \cdot x + b) = \frac{1}{1 + \exp\left(-\sum_j w_j x_j - b\right)}$$
Inputs $x_j$ and a single output in the $[0,1]$ range.
Weights $w_j$ tell us how important each input is.
Bias $b$ tells us how high the weighted sum needs to be to activate the neuron.
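A minimal sketch of a sigmoid neuron in Python (the function names and sample values are my own, for illustration):

```python
import math

def sigmoid(z):
    """Squash z into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    """Weighted sum of inputs plus bias, passed through the sigmoid."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

print(sigmoid_neuron([0.5, 0.9], w=[2.0, -1.0], b=0.1))  # ~0.55
```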
ANN learning
From: But what is a neural network? | Deep learning chapter 1
Goal: find weights and biases so that the output of the network approximates $y(x)$ for all training inputs $x$.
To evaluate how well we’re achieving this goal, we define a cost function, also referred to as loss or objective function.
Common one: mean squared error
$$C(w,b) = \frac{1}{2n}\sum_x \|y(x) - a\|^2$$
We want $C(w,b) \approx 0$, so we want to minimize the function.
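A minimal sketch of this quadratic cost in Python, assuming the network outputs $a$ and targets $y(x)$ are stacked as rows of NumPy arrays (the layout is my assumption, not from the notes):

```python
import numpy as np

def mse_cost(y, a):
    """Quadratic cost C = (1/2n) * sum over x of ||y(x) - a||^2,
    with one training example per row."""
    n = len(y)
    return np.sum(np.linalg.norm(y - a, axis=1) ** 2) / (2 * n)

y = np.array([[0.0, 1.0], [1.0, 0.0]])  # desired outputs y(x)
a = np.array([[0.1, 0.8], [0.9, 0.2]])  # network outputs
print(mse_cost(y, a))  # 0.025 -> small, close to the goal C ≈ 0
```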
In the $x, y$ plane, the derivative is the slope of the function, i.e. its rate of change at a specific point.
With several variables, we take partial derivatives and apply the chain rule (from Leibniz).
A very simple example:
We take the derivative of the cost function with respect to each individual $w$ and $b$ and update them according to a learning rate.
We repeat until the change is really small or some other stopping condition is reached.
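A minimal sketch of this update loop on a toy one-parameter cost; the function $C(w) = (w - 3)^2$, learning rate, and stopping tolerance are arbitrary choices for illustration, not from the notes:

```python
def dC_dw(w):
    """Derivative of the toy cost C(w) = (w - 3)^2 with respect to w."""
    return 2 * (w - 3)

w = 0.0      # initial guess
eta = 0.1    # learning rate
for step in range(100):
    grad = dC_dw(w)
    w -= eta * grad           # update against the gradient
    if abs(grad) < 1e-6:      # stop once the change is really small
        break

print(w)  # ~3.0, the minimum of C
```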
Training Algorithm
What do we need?