11.6 Kaplan-Meier survival curve

  • Because of censoring, we do not observe \(T\) for every subject, so we cannot estimate \(S(t)\) simply by counting how many are still alive at each point in the study.

  • Define:

    • \(d_1 < d_2 < \dots < d_K\): the distinct times at which deaths occur.

    • \(r_j\): the number of cases still alive and not yet censored just before time \(d_j\) (the number at risk).

    • \(q_j\): the number that die at time \(d_j\) (typically just 1!)

  • The ratio \((r_j - q_j)/r_j\) is the fraction of those at risk that survive past time \(d_j\)

  • This fraction is an estimate of the probability \(Pr(T> d_j | T> d_{j-1})\)

Note that this fraction uses only the cases that are still uncensored at time \(d_j\), but that includes cases that could become censored later! This is how the estimator takes care of censoring ‘automatically’.
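As a small illustration of these definitions, the sketch below tabulates \(d_j\), \(r_j\) and \(q_j\) for a made-up data set. The function name `risk_table`, the toy `times` and `events` values, and the convention that a case censored exactly at \(d_j\) still counts as at risk at \(d_j\) are all assumptions for this example, not something taken from the text.

```python
def risk_table(times, events):
    """Return a list of (d_j, r_j, q_j) tuples.

    times  : observed time for each case (death or censoring time)
    events : 1 if the death was observed, 0 if the case was censored
    """
    # d_j: the distinct times at which at least one death was observed
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    table = []
    for d in death_times:
        # r_j: cases still in the study just before d (assumed convention:
        # a case censored exactly at d is still counted as at risk)
        r = sum(1 for t in times if t >= d)
        # q_j: observed deaths exactly at d
        q = sum(1 for t, e in zip(times, events) if t == d and e == 1)
        table.append((d, r, q))
    return table


# Hypothetical toy data: six cases, two of them censored (event = 0)
times = [2, 3, 3, 5, 7, 8]
events = [1, 0, 1, 1, 0, 1]
table = risk_table(times, events)
for d, r, q in table:
    print(f"d_j={d}  r_j={r}  q_j={q}  fraction surviving={(r - q) / r:.3f}")
```

The case censored at time 7 never contributes a death, but it is still counted in \(r_j\) at times 2, 3 and 5, which is the sense in which censored cases are used for as long as they are observed.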

  • The text shows how one can decompose \(S(d_k)\) into these more elemental probabilities:

\[ S(d_k) = Pr(T> d_k | T> d_{k-1}) \times \cdots \times Pr(T > d_2 | T > d_1) \times Pr(T> d_1) \]

  • This leads to the Kaplan-Meier estimator:

\[ \hat{S}(d_k) = \prod_{j=1}^{k} \left( \frac{r_j - q_j}{r_j} \right) \]
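The product formula can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a survival analysis library: the function name `kaplan_meier` and the toy data are hypothetical, and a case censored exactly at a death time is assumed to still be at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(d_k, S_hat(d_k)), ...] at each distinct death time.

    times  : observed time for each case (death or censoring time)
    events : 1 if the death was observed, 0 if the case was censored
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0  # running product of (r_j - q_j) / r_j
    curve = []
    for d in death_times:
        r = sum(1 for t in times if t >= d)  # r_j: at risk just before d
        q = sum(1 for t, e in zip(times, events) if t == d and e == 1)  # q_j
        surv *= (r - q) / r
        curve.append((d, surv))
    return curve


# Hypothetical toy data: six cases, two of them censored (event = 0)
times = [2, 3, 3, 5, 7, 8]
events = [1, 0, 1, 1, 0, 1]
km_curve = kaplan_meier(times, events)
for d, s in km_curve:
    print(f"S_hat({d}) = {s:.4f}")
```

The estimated curve is a step function: it stays flat between death times and drops only at each \(d_k\). In this toy data the last case at risk dies at time 8, so the estimate falls to zero there.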

  • Note also that:

\[ \ln \hat{S}(d_k) = \sum_{j=1}^{k} \ln \left( \frac{r_j - q_j}{r_j} \right) \]