8.27 Now, the BART algorithm

In the first iteration of the BART algorithm, all $K$ trees are initialized to a single root node, with $\hat{f}^1_k(x) = \frac{1}{nK}\sum_{i=1}^{n}y_i$

- i.e., the mean of the response values divided by the total number of trees

Thus, for the first iteration ( $b = 1$ ), the overall prediction, obtained by summing over all $K$ trees, is just the mean of the response:

$\hat{f}^1(x) = \sum_{k=1}^K\hat{f}^1_k(x) = \sum_{k=1}^K\frac{1}{nK}\sum_{i=1}^{n}y_i = \frac{1}{n}\sum_{i=1}^{n}y_i$
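The initialization can be checked numerically with a minimal sketch. The function name `init_bart_trees`, the toy responses in `y`, and the choice `K = 4` are all assumptions made for illustration; they do not come from any BART library, and the sketch covers only the first iteration, not the later tree updates.

```python
def init_bart_trees(y, K):
    """Return the K initial root-node predictions (hypothetical helper).

    Each tree is a single root node predicting mean(y) / K,
    i.e. (1 / (n * K)) * sum(y).
    """
    n = len(y)
    root_value = sum(y) / (n * K)
    return [root_value] * K


# Made-up response values, for illustration only
y = [3.0, 5.0, 10.0]
K = 4

tree_preds = init_bart_trees(y, K)   # one constant prediction per tree
ensemble_pred = sum(tree_preds)      # f-hat^1(x): sum over the K trees

print(tree_preds)      # [1.5, 1.5, 1.5, 1.5]
print(ensemble_pred)   # 6.0, which equals the mean of y
```

Because every tree starts as a root node, these predictions do not depend on $x$: each tree contributes the same constant, and the $K$ constants sum back to the sample mean.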