11.6 Kaplan-Meier survival curve
Because some observations are censored, we do not observe $T$ for every subject, so we cannot simply count how many are alive at a given point in the study to estimate $S(t)$.
Define:
$d_j$: the distinct times of death, ordered so that $d_1 < d_2 < \dots$
$r_j$: the number of cases still alive and not yet censored just before time $d_j$ (the number at risk).
$q_j$: the number that die at time $d_j$ (typically just 1!)
The ratio $(r_j - q_j)/r_j$ is the fraction of those at risk that survive past time $d_j$.
This fraction is an estimate of the probability $\Pr(T > d_j \mid T > d_{j-1})$.
Note that this uses only cases that are uncensored at time $d_j$, but includes cases that could become censored later! It takes care of censoring ‘automatically’.
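As a rough illustration of these definitions, here is a minimal sketch that tabulates the death times, the number at risk, and the number of deaths from a small made-up dataset. The function name `risk_table` and the toy data are hypothetical, not from the text. It assumes the usual convention that a case censored at exactly a death time still counts as at risk at that time.

```python
def risk_table(times, observed):
    """Return a list of (d_j, r_j, q_j) tuples, one per distinct death time.

    times    : follow-up time for each case (death or censoring time)
    observed : True if the case died at that time, False if censored
    """
    # Distinct death times, in increasing order (censoring times are excluded).
    death_times = sorted({t for t, obs in zip(times, observed) if obs})
    table = []
    for d in death_times:
        # At risk: still in the study (not dead, not censored) just before d.
        r = sum(1 for t in times if t >= d)
        # Deaths occurring exactly at d.
        q = sum(1 for t, obs in zip(times, observed) if obs and t == d)
        table.append((d, r, q))
    return table


# Hypothetical toy data: 5 cases; the 2nd and 5th are censored.
times = [2, 3, 3, 5, 8]
observed = [True, False, True, True, False]

table = risk_table(times, observed)
# Estimated conditional survival fraction (r_j - q_j) / r_j at each death time.
fractions = [(r - q) / r for _, r, q in table]

for (d, r, q), f in zip(table, fractions):
    print(f"d={d}  at risk={r}  deaths={q}  fraction surviving={f:.2f}")
```

The case censored at time 8 still contributes to every risk set before it drops out, which is how the censored data get used.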
- The text shows how one can decompose $S(d_k)$ into these more elemental probabilities:
$$S(d_k) = \Pr(T > d_k \mid T > d_{k-1}) \times \dots \times \Pr(T > d_2 \mid T > d_1)\,\Pr(T > d_1)$$
- This leads to the Kaplan-Meier estimator:
$$\hat{S}(d_k) = \prod_{j=1}^{k} \left(\frac{r_j - q_j}{r_j}\right)$$
- Note also that:
$$\ln \hat{S}(d_k) = \sum_{j=1}^{k} \ln\left(\frac{r_j - q_j}{r_j}\right)$$
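The product and log-sum forms can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a replacement for a survival analysis library. The function names (`km_terms`, `km_curve`, `km_log_curve`) and the toy data are hypothetical. The log form assumes that not everyone at risk dies at any single death time, since otherwise the estimate is 0 and its log is undefined.

```python
import math


def km_terms(times, observed):
    """Return [(d_j, r_j, q_j)] for each distinct death time, in order."""
    death_times = sorted({t for t, obs in zip(times, observed) if obs})
    terms = []
    for d in death_times:
        r = sum(1 for t in times if t >= d)  # at risk just before d
        q = sum(1 for t, obs in zip(times, observed) if obs and t == d)  # deaths at d
        terms.append((d, r, q))
    return terms


def km_curve(times, observed):
    """Kaplan-Meier estimate at each death time, via the running product."""
    surv, curve = 1.0, []
    for d, r, q in km_terms(times, observed):
        surv *= (r - q) / r
        curve.append((d, surv))
    return curve


def km_log_curve(times, observed):
    """Log of the Kaplan-Meier estimate, via the running sum of logs."""
    log_surv, curve = 0.0, []
    for d, r, q in km_terms(times, observed):
        # math.log raises ValueError if q == r (estimate drops to 0).
        log_surv += math.log((r - q) / r)
        curve.append((d, log_surv))
    return curve


# Hypothetical toy data: 5 cases; the 2nd and 5th are censored.
times = [2, 3, 3, 5, 8]
observed = [True, False, True, True, False]

curve = km_curve(times, observed)
log_curve = km_log_curve(times, observed)

for (d, s), (_, ls) in zip(curve, log_curve):
    print(f"d={d}  S_hat={s:.3f}  exp(ln S_hat)={math.exp(ls):.3f}")
```

On this toy data the estimate steps down to roughly 0.8, 0.6, and 0.3 at times 2, 3, and 5, and exponentiating the log-sum gives the same values up to floating-point rounding.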