11.6 Kaplan-Meier survival curve
Because of censoring, we do not observe the true survival time \(T\) for every subject, so we cannot simply count how many are alive at a given point in the study to estimate \(S(t)\).
Define:
\(d_j\): the distinct times of death, in increasing order (\(d_1 < d_2 < \cdots\)).
\(r_j\): the number of cases still alive and not yet censored just before time \(d_j\) (the cases ‘at risk’).
\(q_j\): the number that die at time \(d_j\) (typically just 1!)
The ratio \((r_j - q_j)/r_j\) is the fraction of those at risk that survive past time \(d_j\).
This fraction is an estimate of the probability \(Pr(T> d_j | T> d_{j-1})\).
Note that this uses only the cases that are still uncensored at time \(d_j\), but it includes cases that could become censored later! This is how the estimator takes care of censoring ‘automatically’.
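To make the bookkeeping concrete, here is a minimal sketch on made-up data. The `times` and `events` values and the helper name `risk_table` are hypothetical and not from the text. For each distinct death time it tabulates the number at risk, the number of deaths, and the fraction surviving. It assumes the usual convention that a case censored at exactly \(d_j\) still counts as at risk at \(d_j\).

```python
# Hypothetical toy data: observed time and event indicator
# (1 = death observed, 0 = censored). Not taken from the text.
times = [2, 3, 3, 5, 6, 8, 9]
events = [1, 0, 1, 1, 0, 1, 0]


def risk_table(times, events):
    """Return a list of (d_j, r_j, q_j, (r_j - q_j) / r_j), one row per distinct death time."""
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    rows = []
    for d in death_times:
        # r_j: cases whose observed time is >= d_j, i.e. neither dead nor censored before d_j.
        r = sum(1 for t in times if t >= d)
        # q_j: deaths occurring exactly at d_j.
        q = sum(1 for t, e in zip(times, events) if t == d and e == 1)
        rows.append((d, r, q, (r - q) / r))
    return rows


table = risk_table(times, events)
for d, r, q, frac in table:
    print(f"d_j={d}  r_j={r}  q_j={q}  fraction surviving={frac:.3f}")
```

In this toy data the case censored at time 3 is in the risk set at \(d_j = 3\) and drops out afterwards, without ever being counted as a death.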
- The text shows how one can decompose \(S(d_k)\) into these more elemental probabilities:
\[ S(d_k) = Pr(T> d_k | T> d_{k-1}) \times ... \times Pr(T > d_2 | T > d_1)Pr(T> d_1) \]
- This leads to the Kaplan-Meier estimator:
\[ \hat{S}(d_k) = \prod_{j=1}^{k} \left( \frac{r_j- q_j}{r_j} \right) \]
- Note also that:
\[ \ln\hat{S}(d_k) = \sum_{j=1}^{k} \ln \left( \frac{r_j- q_j}{r_j} \right) \]
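The product and log-sum forms above can be sketched in a few lines of plain Python. This is an illustrative sketch, not a library implementation: the data and the function name `kaplan_meier` are hypothetical, and a case censored at exactly \(d_j\) is assumed to still be at risk at \(d_j\).

```python
import math

# Hypothetical toy data: observed time and event indicator
# (1 = death observed, 0 = censored). Not taken from the text.
times = [2, 3, 3, 5, 6, 8, 9]
events = [1, 0, 1, 1, 0, 1, 0]


def kaplan_meier(times, events):
    """Return [(d_k, S_hat(d_k), ln S_hat(d_k))] at each distinct death time d_k."""
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0      # running product of (r_j - q_j) / r_j
    log_surv = 0.0  # running sum of ln((r_j - q_j) / r_j)
    curve = []
    for d in death_times:
        r = sum(1 for t in times if t >= d)                             # r_j: at risk at d_j
        q = sum(1 for t, e in zip(times, events) if t == d and e == 1)  # q_j: deaths at d_j
        frac = (r - q) / r
        surv *= frac
        # If everyone at risk dies, the fraction is 0 and its log is -infinity.
        log_surv += math.log(frac) if frac > 0 else -math.inf
        curve.append((d, surv, log_surv))
    return curve


curve = kaplan_meier(times, events)
for d, s, log_s in curve:
    print(f"d_k={d}  S_hat={s:.4f}  exp(ln S_hat)={math.exp(log_s):.4f}")
```

The two printed columns should agree up to floating-point error, since exponentiating the sum of logs recovers the product. The estimate only changes at death times; between them \(\hat{S}\) stays flat, which gives the curve its step shape.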