Windsor Diary 25

S-Plus Commands for Survival Estimation:


Estimating the Survival Function: One-sample non-parametric methods

We consider three methods for estimating a survivorship function S(t) = Pr(T \geq t) without resorting to parametric methods: the Kaplan-Meier method; the life-table (actuarial) estimator; and the cumulative hazard estimator.

(1) The Kaplan-Meier Estimator

It can be justified from 3 perspectives: as a product-limit estimator, via a likelihood argument, and as a "redistribute to the right" estimator. We will start with an intuitive motivation based on conditional probabilities, then review some of the other justifications.

Motivation: First, consider an example where there is no censoring. The following are times of remission (weeks) for 21 leukemia patients receiving control treatment (Table 1.1 of Cox & Oakes):

1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23

How would we estimate S(10), the probability that an individual survives to time 10 or later? What about \tilde{S}(8)? Let's construct a table of the times t.

Empirical Survival Function (no censoring): \tilde{S}(t) = \frac{\#\{\text{individuals with } T \geq t\}}{\text{total sample size}}.

Example for leukemia data (control arm).
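As a quick check, the empirical survival function can be evaluated directly on the control-arm data above. This is an illustrative Python sketch (the notes use S-Plus; the function name `empirical_survival` is ours):

```python
from fractions import Fraction

# Remission times (weeks) for the 21 control-arm leukemia patients above.
control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
           11, 11, 12, 12, 15, 17, 22, 23]

def empirical_survival(times, t):
    """S~(t) = (# individuals with T >= t) / (total sample size)."""
    return Fraction(sum(1 for x in times if x >= t), len(times))

print(empirical_survival(control, 10))   # 8/21
print(empirical_survival(control, 8))    # 12/21
```

With no censoring this is just a sample proportion, which is why exact fractions are convenient here.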

What if there is censoring?

Consider the treated group from Table 1.1 of Cox and Oakes: 6+, 6, 6, 6, 7, 9+, 10+, 10, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+. [Note: times with + are right censored]

We know S(6) = 21/21, because everyone survived at least to time 6. But we can't say S(7) = 17/21, because we don't know the status of the person who was censored at time 6.

In a 1958 paper in the Journal of the American Statistical Association, Kaplan and Meier proposed a way to non-parametrically estimate S(t), even in the presence of censoring. The method is based on the ideas of conditional probability.

\rightarrow Now, let's apply these ideas of conditional probability to estimate S(t). Let a_1 < a_2 < \dots denote the distinct death times, and suppose a_k < t \leq a_{k+1}. Then

S(t) = P(T \geq t) = P(T \geq a_{k+1}) = P(T \geq a_1, T \geq a_2, \dots, T \geq a_{k+1}) = P(T \geq a_1) \times \prod_{j=1}^{k} P(T \geq a_{j+1} \mid T \geq a_j) = \prod_{j=1}^{k} [1 - P(T = a_j \mid T \geq a_j)] = \prod_{j=1}^{k} (1 - \lambda_j),

so \hat{S}(t) \cong \prod_{j=1}^{k} \left(1 - \frac{d_j}{r_j}\right) = \prod_{j: a_j < t} \left(1 - \frac{d_j}{r_j}\right), where d_j is the number of deaths at a_j and r_j is the number at risk at a_j.

\rightarrow  Intuition behind the Kaplan-Meier Estimator: Think of dividing the observed time span of the study into a series of fine intervals so that there is a separate interval for each time of death or censoring. Using the law of conditional probability,

Pr(T ≥ t) =\prod_{j}Pr(survive j-th interval I_j | survived to start of I_j),

where the product is taken over all the intervals including or preceding time t.

\rightarrow  4 possibilities for each interval:

(1) No events (death or censoring) - conditional probability of surviving the interval is 1.

(2) Censoring - assume they survive to the end of the interval, so that the conditional probability of surviving the interval is 1.

(3) Death, but no censoring - conditional probability of not surviving the interval is # deaths (d) divided by #‘at risk’ (r) at the beginning of the interval. So the conditional probability of surviving the interval is 1−(d/r).

(4) Tied deaths and censoring - assume the censored observations last until the end of the interval, so that the conditional probability of surviving the interval is still 1 − (d/r).

\rightarrow  General Formula for j-th interval:

It turns out we can write a general formula for the conditional probability of surviving the j-th interval that holds for all 4 cases: 1-d_j/r_j.

We could use the same approach by grouping the event times into intervals (say, one interval for each month), and then counting up the number of deaths in each to estimate the probability of surviving the interval (this is called the life-table estimate). However, the assumption that those censored last until the end of the interval wouldn’t be quite accurate, so we would end up with a cruder approximation. As the intervals get finer and finer, the approximations made in estimating the probabilities of getting through each interval become smaller and smaller, so that the estimator converges to the true S(t). This intuition clarifies why an alternative name for the KM is the product limit estimator.

KM - Cox and Oakes example.

\rightarrow  Make a table with a row for every death or censoring time:

KM plot for treated leukemia patients.
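The notes build this table in S-Plus; as an illustration, here is a minimal Python sketch of the product-limit calculation on the treated-group data (the function name `kaplan_meier` and the tuple encoding are our own choices):

```python
from fractions import Fraction

# Treated group from Table 1.1 of Cox & Oakes: (time, event),
# where event = 1 is a death and event = 0 is a censored (+) time.
treated = [(6, 1), (6, 1), (6, 1), (6, 0), (7, 1), (9, 0), (10, 1), (10, 0),
           (11, 0), (13, 1), (16, 1), (17, 0), (19, 0), (20, 0), (22, 1),
           (23, 1), (25, 0), (32, 0), (32, 0), (34, 0), (35, 0)]

def kaplan_meier(data):
    """Return [(a_j, S-hat just after a_j)] over the distinct death times a_j."""
    death_times = sorted({t for t, e in data if e == 1})
    s, curve = Fraction(1), []
    for a in death_times:
        r = sum(1 for t, _ in data if t >= a)             # at risk just before a_j
        d = sum(1 for t, e in data if t == a and e == 1)  # deaths at a_j
        s *= 1 - Fraction(d, r)                           # product-limit update
        curve.append((a, s))
    return curve

for a, s in kaplan_meier(treated):
    print(a, float(s))
```

The estimate drops only at death times; censored times enter only through the risk sets r_j.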

Two Other Justifications for the KM Estimator (Cox and Oakes):

\rightarrow Likelihood justification: the likelihood is that of g independent binomials: L(\lambda) = \prod_{j=1}^{g} \lambda_j^{d_j}(1-\lambda_j)^{r_j-d_j}. Therefore, the maximum likelihood estimator of \lambda_j is \hat{\lambda}_j = d_j/r_j.

Now we plug in the MLE’s of \lambda to estimate S(t): \hat{S} (t)=\prod_{j:a_j<t}(1-\hat{\lambda }_j )=\prod_{j:a_j<t}(1-\frac{d_j}{r_j}  ).

\rightarrow Redistribute to the right justification (Efron, 1967)

Algorithm: The idea behind Efron's approach is to spread the contribution of each censored observation over all the possible event times to its right.

Step (1): arrange the n observed times (deaths or censoring) in increasing order. If there are ties, put censored after deaths.

Step (2): Assign weight (1/n) to each time.

Step (3): Moving from left to right, each time you encounter a censored observation, distribute its mass to all times to its right.

Step (4): Calculate \hat{S} _j by subtracting the final weight for time j from \hat{S} _{j-1}.

\rightarrow  Example of “redistribute to the right” algorithm

Consider the following event times: 2, 2.5+, 3, 3, 4, 4.5+, 5, 6, 7. The algorithm goes as follows:

This comes out the same as the product limit approach.
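Steps (1)-(4) can be sketched in Python and checked against the product-limit values on the same data. This is an illustrative implementation, not code from the notes:

```python
from fractions import Fraction

# (time, event) pairs; event = 0 marks the censored (+) observations.
data = [(2, 1), (2.5, 0), (3, 1), (3, 1), (4, 1), (4.5, 0),
        (5, 1), (6, 1), (7, 1)]

def redistribute_to_the_right(data):
    # Step (1): sort in increasing order, censored after deaths at tied times.
    obs = sorted(data, key=lambda te: (te[0], te[1] == 0))
    n = len(obs)
    # Step (2): assign weight 1/n to each time.
    w = [Fraction(1, n) for _ in obs]
    # Step (3): left to right, pass each censored weight to all times to its right.
    for i, (_, e) in enumerate(obs):
        if e == 0 and i < n - 1:
            share = w[i] / (n - 1 - i)
            for j in range(i + 1, n):
                w[j] += share
            w[i] = Fraction(0)
    # Step (4): S-hat_j = S-hat_{j-1} minus the final weight at time j.
    s, curve = Fraction(1), []
    for (t, _), wi in zip(obs, w):
        s -= wi
        curve.append((t, s))
    return curve

for t, s in redistribute_to_the_right(data):
    print(t, s)
```

At the death times 2, 3, 4, 5, 6, 7 this reproduces 8/9, 40/63, 32/63, 64/189, 32/189, 0, exactly the product-limit values (1-1/9), (1-1/9)(1-2/7), and so on.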

\rightarrow Properties of the KM estimator: with no censoring, \hat{S}(t) is just an estimated probability from a binomial distribution, so \hat{S}(t) ≃ N(S(t), S(t)[1-S(t)]/n).

How does censoring affect this? 

\hat{S}(t) is still approximately normal; the mean of \hat{S}(t) converges to the true S(t); the variance is a bit more complicated (since the denominator n includes some censored observations). In principle, we can construct 100(1-\alpha)% confidence bands about \hat{S}(t): \hat{S}(t) \pm z_{1-\alpha/2}\, se[\hat{S}(t)].

\implies Greenwood’s formula:

1. The KM estimator

\hat{S}(t) = \prod_{j:a_j<t}(1-\hat{\lambda}_j) = \prod_{j:a_j<t}(1-\frac{d_j}{r_j}).

2. Apply the usual binomial variance to each \hat{\lambda}_j

var(\hat{\lambda }_j )\approx \frac{\hat{\lambda}_j(1-\hat{\lambda}_j) }{r_j} .

3. Estimate its variance using the delta method

If Y is normal with mean \mu and variance \sigma ^2 ,  then g(Y) is approximately normally distributed with mean g(µ) and variance [h(\mu )]^2\sigma ^2, where h(µ)=g'(µ).

4. Two specific examples of the delta method

If Z = log(Y), then Z \sim N[log(\mu), (\frac{1}{\mu})^2 \sigma^2]; if Z = exp(Y), then Z \sim N[e^{\mu}, (e^{\mu})^2 \sigma^2].

The examples above use the following results from calculus:

\frac{d}{dx} log(u)=\frac{1}{u} (\frac{du}{dx} ) ; \frac{d}{dx}  e^u = e^u (\frac{du}{dx} ) .
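As a sanity check of the Z = log(Y) example, a small Monte Carlo simulation can compare the simulated variance of log(Y) with the delta-method value (\sigma/\mu)^2. The choices of \mu, \sigma, and sample size below are arbitrary illustrations:

```python
import math
import random

# Y ~ N(mu, sigma^2); the delta method predicts var(log Y) ~= (sigma/mu)^2.
random.seed(1)
mu, sigma, n = 10.0, 0.5, 200_000

z = [math.log(random.gauss(mu, sigma)) for _ in range(n)]
mean_z = sum(z) / n
var_z = sum((x - mean_z) ** 2 for x in z) / (n - 1)

print(var_z)               # simulated variance of log(Y)
print((sigma / mu) ** 2)   # delta-method approximation: 0.0025
```

The agreement is close here because \sigma/\mu is small; the delta method is a first-order approximation and degrades as the coefficient of variation grows.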

Instead of computing the variance of \hat{S}(t) directly (the product form makes this awkward), we will look at its log:

log[\hat{S}(t)] = \sum_{j:a_j<t} log(1-\hat{\lambda}_j).

Thus, by the approximate independence of the \hat{\lambda}_j and the Z = log(Y) transformation:

var\{log[\hat{S}(t)]\} = \sum_{j:a_j<t} var[log(1-\hat{\lambda}_j)] \approx \sum_{j:a_j<t} \left(\frac{1}{1-\hat{\lambda}_j}\right)^2 var(\hat{\lambda}_j) = \sum_{j:a_j<t} \frac{d_j}{(r_j-d_j)r_j}.

Then, by Z = exp(Y) applied to \hat{S}(t) = exp\{log[\hat{S}(t)]\}:

var[\hat{S}(t)] = [\hat{S}(t)]^2\, var\{log[\hat{S}(t)]\} = [\hat{S}(t)]^2 \sum_{j:a_j<t} \frac{d_j}{(r_j-d_j)r_j},

which is Greenwood's formula.
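Greenwood's formula is easy to evaluate numerically. The sketch below (illustrative Python, not the S-Plus commands the notes refer to) computes \hat{S}(t) and its Greenwood standard error for the treated group just after the deaths at t = 6:

```python
import math

# Treated group from Table 1.1 of Cox & Oakes: (time, event), event = 0 censored.
treated = [(6, 1), (6, 1), (6, 1), (6, 0), (7, 1), (9, 0), (10, 1), (10, 0),
           (11, 0), (13, 1), (16, 1), (17, 0), (19, 0), (20, 0), (22, 1),
           (23, 1), (25, 0), (32, 0), (32, 0), (34, 0), (35, 0)]

def km_with_greenwood(data, t):
    """Return (S-hat(t), Greenwood se[S-hat(t)])."""
    s, g = 1.0, 0.0
    for a in sorted({u for u, e in data if e == 1 and u < t}):
        r = sum(1 for u, _ in data if u >= a)             # at risk at a_j
        d = sum(1 for u, e in data if u == a and e == 1)  # deaths at a_j
        s *= 1 - d / r
        g += d / ((r - d) * r)   # Greenwood sum: d_j / ((r_j - d_j) r_j)
    return s, s * math.sqrt(g)

s, se = km_with_greenwood(treated, 6.5)   # just after the deaths at t = 6
print(round(s, 4), round(se, 4))          # 0.8571 0.0764
```

Here \hat{S} = 18/21 and the Greenwood sum has a single term, 3/((21-3)\cdot 21).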

5. Back to confidence intervals

Problem: This approach can yield values > 1 or < 0.

Better approach: Get a 95% confidence interval for L(t) = log(− log(S(t))). 

Since this quantity is unrestricted, the confidence interval will be in the proper range when we transform back. To see why this works, note the following:

① Since \hat{S}(t) is an estimated probability, 0 \leq \hat{S}(t) \leq 1;

② Taking the log of \hat{S}(t) gives bounds -\infty \leq log[\hat{S}(t)] \leq 0;

③ Taking the negative gives 0 \leq -log[\hat{S}(t)] \leq \infty;

④ Taking the log again gives -\infty \leq log[-log[\hat{S}(t)]] \leq \infty;

⑤ To transform back, reverse steps with S(t)=exp[-exp(L(t))].

Log-log Approach for Confidence Intervals:

① Define L(t) = log(− log(S(t))) ; 

② Form a 95% confidence interval for L(t) based on \hat{L}(t) , yielding [\hat{L}(t)-A ,\hat{L} (t)+A], where A=1.96se(\hat{L}(t) )

③ Since S(t) = exp(-exp(L(t))), the confidence bounds for the 95% CI on S(t) are: [exp(-e^{\hat{L}(t)+A}), exp(-e^{\hat{L}(t)-A})];

④ Substituting \hat{L} (t)=log[-log(\hat{S}(t) )] back into the above bounds, we get confidence bounds of ([\hat{S}(t) ]^{e^A},[\hat{S} (t)]^{e^{-A}});

⑤ More calculations of the variance functions: 

var(\hat{L}(t)) = var\{log(-log[\hat{S}(t)])\}; \quad var\{log[\hat{S}(t)]\} = \sum_{j:a_j<t} \frac{d_j}{(r_j-d_j)r_j};

⑥ Applying the delta method once more:

var(\hat{L}(t)) = var\{log(-log[\hat{S}(t)])\} = \{log[\hat{S}(t)]\}^{-2} \sum_{j:a_j<t} \frac{d_j}{(r_j-d_j)r_j}.

⑦ This yields the final version of the confidence interval:

\hat{S}(t) ^{e^{\pm 1.96se(\hat{L} (t))}}.
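The log-log interval can be sketched numerically as well; in the illustrative Python below (`loglog_ci` is our own helper name), the Greenwood sum just after t = 6 in the treated group is 3/((21-3)\cdot 21):

```python
import math

def loglog_ci(s_hat, greenwood_sum, z=1.96):
    """95% CI for S(t) through L(t) = log(-log S(t)), per steps (1)-(7) above."""
    se_L = math.sqrt(greenwood_sum) / abs(math.log(s_hat))  # delta method
    A = z * se_L
    return s_hat ** math.exp(A), s_hat ** math.exp(-A)      # (lower, upper)

# Just after the deaths at t = 6 in the treated group of Cox & Oakes:
lo, hi = loglog_ci(18 / 21, 3 / ((21 - 3) * 21))
print(round(lo, 3), round(hi, 3))   # both bounds stay inside (0, 1)
```

Unlike the naive interval \hat{S}(t) \pm 1.96\,se, these bounds can never escape [0, 1], because they are powers of a probability.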


(2) The Lifetable or Actuarial Estimator

Our goal is still to estimate survival function, hazard, and density function, but this is complicated by the fact that we don’t know exactly when during each time interval an event occurs.

+ one of the oldest techniques around

+ used by actuaries, demographers, etc.

+ applies when the data are grouped

Population Life Tables:

+ cohort life table - describes the mortality experience from birth to death for a particular cohort of people born at about the same time. People at risk at the start of interval are those who survived the previous interval.

+ current life table - constructed from (1) census information on the number of individuals alive at each age, for a given year and (2) vital statistics on the number of deaths or failures in a given year, by age. This type of lifetable is often reported in terms of a hypothetical cohort of 100,000 people.

Generally, censoring is not an issue for Population Life Tables.

Clinical Life tables - applies to grouped survival data from patients with specific diseases. Because patients can enter the study at different times, or be lost to follow-up, censoring must be allowed.

We could apply the K-M formula directly to the numbers in the table on the previous page, estimating S(t) as \hat{S}(t) = \prod_{j:a_j<t}(1-{d_j}/{r_j}). However, this approach is unsatisfactory for grouped data... it treats the problem as though it were in discrete time, with events happening only at 1 yr, 2 yr, etc. In fact, what we are trying to calculate here is the conditional probability of dying within the interval, given survival to the beginning of it.

Quantities estimated: conditional probability of dying, \hat{q}_j = d_j/r_j; conditional probability of surviving, \hat{p}_j = 1-\hat{q}_j; cumulative probability of surviving to t_j, \hat{S}(t_j) = \prod_{l \leq j}(1-d_l/r_l).

Other quantities estimated at the midpoint of the j-th interval:

- Hazard in the j-th interval: \hat{\lambda}(t_{mj}) = \hat{q}_j/[b_j(1-\hat{q}_j/2)], where b_j is the width of the j-th interval.

- Density at the midpoint of the j-th interval: 

\hat{f} (t_{mj})=\hat{\lambda } (t_{mj})\hat{S} (t_{mj})=\hat{\lambda } (t_{mj})[\hat{S} (t_j)+\hat{S} (t_{j-1})]/2.
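A tiny numeric illustration of these life-table quantities, with made-up values for b_j, r_j, and d_j (not data from the notes):

```python
# One hypothetical grouped interval: width b_j, r_j at risk at the start,
# d_j deaths inside it. All numbers are illustrative.
b, r, d = 1.0, 100, 10

q = d / r                           # conditional probability of dying, q-hat_j
p = 1 - q                           # conditional probability of surviving, p-hat_j
hazard_mid = q / (b * (1 - q / 2))  # hazard at the interval midpoint

print(q, p, round(hazard_mid, 4))   # 0.1 0.9 0.1053
```

The midpoint hazard exceeds q/b because the denominator credits those who die with, on average, half the interval of exposure.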

(3) Estimating the cumulative hazard — Nelson-Aalen estimator

Just as we did for the KM, think of dividing the observed time span of the study into a series of fine intervals so that there is only one event per interval:

Suppose we want to estimate \Lambda(t) = \int_{0}^{t} \lambda(u)\,du, the cumulative hazard at time t. Λ(t) can then be approximated by a sum: \hat{\Lambda}(t) = \sum_j \hat{\lambda}_j \Delta, where the sum is over intervals, \hat{\lambda}_j is the value of the hazard in the j-th interval, and ∆ is the width of each interval. Since \hat{\lambda}_j \Delta is approximately the probability of dying in the j-th interval, we can further approximate by \hat{\Lambda}(t) = \sum_j d_j/r_j. It follows that \hat{\Lambda}(t) will change only at death times, and hence we write the Nelson-Aalen estimator as:

\hat{\Lambda}_{NA}(t) = \sum_{j:a_j<t} d_j/r_j.

The Fleming-Harrington estimator of survival is \hat{S}_{FH}(t) = exp\{-\hat{\Lambda}_{NA}(t)\}. In general, this estimator of the survival function will be close to the Kaplan-Meier estimator, \hat{S}_{KM}(t). We can also go the other way: we can take the Kaplan-Meier estimate of S(t) and use it to calculate an alternative estimate of the cumulative hazard function: \hat{\Lambda}_{KM}(t) = -log\,\hat{S}_{KM}(t).
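The Nelson-Aalen and Fleming-Harrington estimates can be sketched in Python (illustrative only; the notes refer to S-Plus commands). Evaluating just after the deaths at t = 6 in the treated group shows \hat{S}_{FH} close to, and slightly above, the KM value:

```python
import math

# Treated group from Table 1.1 of Cox & Oakes: (time, event), event = 0 censored.
treated = [(6, 1), (6, 1), (6, 1), (6, 0), (7, 1), (9, 0), (10, 1), (10, 0),
           (11, 0), (13, 1), (16, 1), (17, 0), (19, 0), (20, 0), (22, 1),
           (23, 1), (25, 0), (32, 0), (32, 0), (34, 0), (35, 0)]

def nelson_aalen(data, t):
    """Lambda-hat_NA(t): sum of d_j / r_j over death times a_j < t."""
    total = 0.0
    for a in sorted({u for u, e in data if e == 1 and u < t}):
        r = sum(1 for u, _ in data if u >= a)
        d = sum(1 for u, e in data if u == a and e == 1)
        total += d / r
    return total

na = nelson_aalen(treated, 6.5)     # just after the deaths at t = 6
fh = math.exp(-na)                  # Fleming-Harrington estimate of S(t)
print(round(na, 4), round(fh, 4))   # 0.1429 0.8669 (KM gives 0.8571 here)
```

The inequality \hat{S}_{FH} \geq \hat{S}_{KM} holds at every step because exp(-d_j/r_j) \geq 1 - d_j/r_j.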

S-Plus Commands for the Fleming-Harrington Estimator.