Conditioning and Stability

  • Julia’s \ does not solve the least squares problem via the normal equations, because they can be unstable; for a tall matrix it is based on a QR factorization instead.

  • We need the condition number of a rectangular matrix. For a full-rank \mathbf{A}\in\mathbb{R}^{m\times n} it is defined in terms of the pseudoinverse \mathbf{A}^+, or equivalently the extreme singular values:

\kappa(\mathbf{A}) = \|\mathbf{A}\|\,\|\mathbf{A}^+\| = \frac{\sigma_1}{\sigma_n}

  • When the residuals are small, the conditioning of the least squares problem is close to \kappa(\mathbf{A}).

  • However, our algorithm uses \mathbf{A}^T\mathbf{A}, whose singular values are the squares of those of \mathbf{A}, so the effective condition number is amplified to \kappa(\mathbf{A}^T\mathbf{A}) = \kappa(\mathbf{A})^2. This is what can destabilize the normal equations (it greatly increases the sensitivity to small changes).

  • Demo:

using LinearAlgebra

# The columns are nearly linearly dependent (sin^2 + cos^2 = 1, up to the 1e-7
# frequency shift), so A is ill-conditioned.
t = range(0, 3, length=400);
f = [ x->sin(x)^2, x->cos((1+1e-7)*x)^2, x->1. ];
A = [ f(t) for t in t, f in f ];
κ = cond(A)
# 1.825322542326142e7
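
As a quick check of the definition above (my own addition, not part of the original demo), the condition number equals the ratio of the extreme singular values:

σ = svdvals(A);
@show σ[1] / σ[end]    # should agree with cond(A)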

Set up a fake problem with a known exact solution (zero residual):

x = [1.,2,1];
b = A*x;

Use backslash:

x_BS = A\b;
observed_error = norm(x_BS-x)/norm(x);
error_bound = κ*eps();
@show observed_error
@show error_bound
# observed_error = 2.279154063514702e-11
# error_bound = 4.053030227715619e-9
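
The backslash solution is well within the \kappa(\mathbf{A})\,\mathrm{eps}() error bound. As a sketch of why (my addition; I am assuming backslash is essentially doing a QR-based solve for this tall matrix), we can solve through an explicit thin QR factorization ourselves:

F = qr(A);                                        # A = Q*R with orthonormal columns of Q
x_QR = UpperTriangular(F.R) \ (Matrix(F.Q)'*b);   # back substitution on R*x = Q'*b
@show norm(x_QR - x)/norm(x)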

Now try it using normal equations:

# Form and solve the normal equations A'A x = A'b explicitly:
N = A'*A;
x_NE = N\(A'*b);
@show observed_err = norm(x_NE-x)/norm(x)
@show digits = -log10(observed_err)

# observed_err = norm(x_NE - x) / norm(x) = 2.719479911021037e-16
# digits = -(log10(observed_err)) = 15.565514144999181
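
For comparison (my addition), the normal matrix should have a condition number of roughly \kappa^2 \approx 3\times 10^{14}, which suggests an error bound of about \kappa^2\,\mathrm{eps}() \approx 0.07:

@show cond(N)
@show κ^2 * eps()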

That is odd. In the book this result was much less accurate. I suspect some change in how the backslash operator works, but I did not investigate further.

# lsnormal: our normal-equations solver (forward/back substitution on a
# triangular factor; see the sketch at the end of this section).
x_LSN = lsnormal(A,b);

@show observed_err = norm(x_LSN-x)/norm(x)
@show digits = -log10(observed_err)

# observed_err = norm(x_LSN - x) / norm(x) = 0.015371886019245154
# digits = -(log10(observed_err)) = 1.8132728444390442

If we use our own implementation, which solves the normal equations via forward/back substitution, we do see only a small number of accurate digits (consistent with the \kappa(\mathbf{A})^2\,\mathrm{eps}() bound above), so the Julia backslash operator is apparently doing something different here (now).
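
For reference, here is a minimal self-contained sketch of such a solver (my reconstruction; the book’s lsnormal uses its own forwardsub/backsub helpers, which I replace with LinearAlgebra’s triangular solves):

using LinearAlgebra

# Sketch of a Cholesky-based normal-equations least squares solver.
function lsnormal_sketch(A, b)
    N = Symmetric(A' * A)        # normal matrix; cond(N) = cond(A)^2
    z = A' * b
    R = cholesky(N).U            # N = R' * R with R upper triangular
    w = R' \ z                   # forward substitution (R' is lower triangular)
    return R \ w                 # back substitution
end

x_sketch = lsnormal_sketch(A, b);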