11.6 Kaplan-Meier survival curve

Because some observations are censored, we don’t observe the true survival time $T$ for every subject, so we cannot just count up how many are alive at any given point in the study to estimate $S(t)$ .
Define:
- $d_j$ : the distinct observed times of death, in increasing order ( $d_1 < d_2 < \dots$ ).
- $r_j$ : the number of cases still ‘alive’ and not yet censored just before time $d_j$ . (at risk)
- $q_j$ : the number that die at time $d_j$ (typically just 1!)
The ratio $(r_j - q_j)/r_j$ is the fraction of those at risk that survive past time $d_j$ .
This fraction is an estimate of the probability $Pr(T> d_j | T> d_{j-1})$ .

Note that this uses only the cases not yet censored at time $d_j$ , but includes cases that could become censored later! It takes care of censoring ‘automatically’.

The text shows how one can decompose $S(d_k)$ into these more elemental probabilities:

$S(d_k) = Pr(T> d_k | T> d_{k-1}) \times ... \times Pr(T > d_2 | T > d_1)Pr(T> d_1)$

This leads to the Kaplan-Meier estimator:

$\hat{S}(d_k) = \prod_{j=1}^{k} \left(\frac{r_j- q_j}{r_j}\right)$
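
The product above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `kaplan_meier`, the toy data, and the tie convention (a case censored at exactly $d_j$ is still counted as at risk at $d_j$) are assumptions made for the example.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return (d_j, r_j, q_j, S_hat(d_j)) for each distinct death time.

    times  : observed time per subject (death time or censoring time)
    events : 1 if the subject died at that time, 0 if censored
    """
    # q_j: number of deaths at each distinct death time d_j
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    rows = []
    for d in sorted(deaths):
        # r_j: subjects whose observed time is at least d_j.
        # Assumed convention: a subject censored exactly at d_j is still at risk.
        r = sum(1 for t in times if t >= d)
        q = deaths[d]
        surv *= (r - q) / r  # multiply in the conditional survival fraction
        rows.append((d, r, q, surv))
    return rows

# Hypothetical toy data: five subjects, two of them censored (event = 0).
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
for d, r, q, s in kaplan_meier(times, events):
    print(f"d={d}  at risk={r}  deaths={q}  S_hat={s:.3f}")
```

In this toy data the subject censored at time 3 still counts in the risk set at $d_2 = 3$ and then drops out, and the subject censored at time 8 contributes to every risk set without ever counting as a death.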

Note also that:

$\ln\hat{S}(d_k) = \sum_{j=1}^{k} \ln \left(\frac{r_j- q_j}{r_j}\right)$
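
A quick numerical check of the log identity, as a sketch: the $(r_j, q_j)$ pairs below are made-up values for illustration, not data from the text.

```python
import math

# Hypothetical (r_j, q_j) pairs: number at risk and number of deaths at each d_j.
risk_and_deaths = [(5, 1), (4, 1), (2, 1)]

product = 1.0   # running product: S_hat(d_k)
log_sum = 0.0   # running sum: ln S_hat(d_k)
for r, q in risk_and_deaths:
    frac = (r - q) / r
    product *= frac
    # Note: math.log fails if frac == 0, i.e. if everyone at risk dies at d_j.
    log_sum += math.log(frac)

# Exponentiating the sum of logs recovers the product (up to rounding).
assert math.isclose(math.exp(log_sum), product)
```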