Windsor Diary 29

Censoring and truncation

Right censoring occurs when a subject leaves the study before the event occurs, or the study ends before the event has occurred. For example, consider patients in a clinical trial studying the effect of treatments on stroke occurrence. The study ends after 5 years. Patients who have had no stroke by the end of the 5 years are censored. If a patient leaves the study at time $t_e$ , then we know only that the event occurs in $(t_e, \infty)$ . Left censoring is when the event of interest has already occurred before enrolment; this is very rarely encountered. Truncation is deliberate and due to study design. Right truncation occurs when the entire study population has already experienced the event of interest. Left truncation occurs when the subjects have been at risk before entering the study. Generally we deal with right censoring and sometimes left truncation.

Two types of independent right censoring:

Type I: completely random dropout and/or a fixed end-of-study time at which the event has not yet occurred.

Type II: study ends when a fixed number of events amongst the subjects has occurred.

Likelihood and Censoring

If the censoring mechanism is independent of the event process, then we have an easy way of dealing with it. Suppose that T is the time to event, with density $f$ and survival function $S$ , and that C is the time to the censoring event, with density $f_C$ and survival function $S_C$ . Assume that all subjects may have an event or be censored, so that for subject i only one of the pair $(\tilde{t}_i ,\tilde{c}_i )$ is observed. Since we observe the minimum time, we have the following expression for the likelihood (using independence):

$L=\prod_{\tilde{t}_i <\tilde{c}_i }f(\tilde{t}_i )S_C(\tilde{t}_i )\prod_{\tilde{c}_i <\tilde{t}_i }S(\tilde{c}_i )f_C(\tilde{c} _i)$ .

Now define the following random variable: δ = 1 if T < C, and δ = 0 if T > C.

For each subject we observe $t_i=\min(\tilde{t}_i ,\tilde{c}_i )$ and $\delta _i$ , observations from a continuous random variable and a binary random variable. In terms of these L becomes

$L=\prod_{i}h(t_i)^{\delta _i}S(t_i)\prod_{i}h_C(t_i)^{1-\delta _i}S_C(t_i)$

where we have used density = hazard × survival, that is $f(t)=h(t)S(t)$ . NB: if the censoring mechanism is independent (sometimes called non-informative) then we can ignore the second product on the right, as it gives us no information about the event time. In the remainder of the course we will assume that the censoring mechanism is independent.
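As a rough illustration of the reduced likelihood $\prod_i h(t_i)^{\delta_i}S(t_i)$ , the sketch below assumes a constant hazard $h(t)=\lambda$ (an exponential model, chosen here only for simplicity and not specified in the text). The data values and the function names are hypothetical.

```python
import math

def exp_censored_loglik(lam, times, deltas):
    """Log of prod_i h(t_i)^delta_i * S(t_i) when h(t) = lam, S(t) = exp(-lam * t)."""
    return sum(d * math.log(lam) - lam * t for t, d in zip(times, deltas))

def exp_mle(times, deltas):
    """Closed-form maximiser: number of events divided by total time at risk."""
    return sum(deltas) / sum(times)

# Hypothetical observations (t_i, delta_i); delta_i = 0 marks a censored time.
times = [2.0, 5.0, 5.0, 6.0, 7.0]
deltas = [1, 1, 1, 0, 1]

lam_hat = exp_mle(times, deltas)
print(lam_hat, exp_censored_loglik(lam_hat, times, deltas))
```

Censored subjects contribute only through $S(t_i)$ , so they add to the time at risk in the denominator but not to the event count.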

Demographic v. trial data

The time to event can literally be the age. In a clinical trial it will more typically be the time from admission to the trial. Slides show five patients A, B, C, D, E from a Sydney hospital pilot study, concerning treatment of bladder cancer. Each patient has their own zero time, the time at which the patient entered the study (accrual time). For each patient we record time to event of interest or censoring time, whichever is the smaller, and the status, δ = 1 if the event occurs and δ = 0 if the patient is censored.

Non-parametric estimators

If there are observations $x_{1:n}$ from a random sample then we define the empirical distribution

$\hat{F} (x)=\frac{1}{n} \#\left\{ x_i:x_i \leq x \right\}$ ,

where $\#$ denotes the number of elements in the set.
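A minimal sketch of the empirical distribution function for an uncensored sample; the sample values and the helper name `ecdf` are hypothetical.

```python
def ecdf(sample):
    """Return F_hat, where F_hat(x) is the proportion of sample values <= x."""
    n = len(sample)

    def F_hat(x):
        return sum(1 for xi in sample if xi <= x) / n

    return F_hat

# Hypothetical uncensored sample.
F_hat = ecdf([2, 5, 5, 7, 12])
print(F_hat(1), F_hat(5), F_hat(12))
```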

This is appropriate if no censoring occurs. However, if censoring occurs, it has to be taken into account. We measure the pair (X, δ) where X = min(T, C) and δ is as before: δ = 1 if T < C, and δ = 0 if T > C. Suppose that the observations are $(x_i,\delta _i)$ for i = 1, 2, ..., n. Then the likelihood is

$L=\prod_{i}f(x_i)^{\delta _i}S(x_i)^{1-\delta _i}=\prod_{i}f(x_i)^{\delta _i}(1-F(x_i))^{1-\delta _i}$ .

What follows is a heuristic argument allowing us to find an estimator for S, the survival function, which in the likelihood sense is the best that we can do. Suppose that there are failure times $0<t_1<t_2<...<t_i<...$ . Let $s_{i1},s_{i2},...,s_{ic_i}$ be the censoring times within the interval $[t_i,t_{i+1})$ and suppose that there are $d_i$ failures at time $t_i$ . Then the likelihood function becomes

$L=\prod_{fail}f(t_i)^{d_i}\prod_{i}[\prod_{k=1}^{c_i} (1-F(s_{ik}))]=\prod_{fail}(F(t_i)-F(t_{i-}))^{d_i} \prod_{i}[\prod_{k=1}^{c_i} (1-F(s_{ik}))]$

where we write $f(t_i)=F(t_i)-F(t_{i-})$ , the difference between the cdf at time $t_i$ and the cdf immediately before it. L is maximised by taking the cdf F(t) to be a step function, and therefore to come from a discrete distribution whose support is the observed failure times. For such a step function $F(t_{i-})=F(t_{i-1})$ and $F(s_{ik})=F(t_i)$ . Then

$L=\prod_{fail}[F(t_i)-F(t_{i-1})]^{d_i}\prod_{i}[1-F(t_i)]^{c_i}$ .

Let us consider the discrete case and let $h_i=\Pr(\text{fail at } t_i \mid \text{survived to } t_{i-})$ . Then

$S(t_i)=1-F(t_i)=\prod_{j=1}^{i} (1-h_j)$ ; $f(t_i)=h_i\prod_{j=1}^{i-1}(1-h_j)$ .

Finally we have

$L=\prod_{t_i}h_{i}^{d_i} (1-h_i)^{n_i-d_i}$

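where $n_i$ is the number at risk at time $t_i$ . This is usually referred to as the number in the risk set. Note $n_{i+1}+c_i+d_i=n_i$ . Each factor of L is maximised separately at $\hat{h}_i=\frac{d_i}{n_i}$ , which gives the Kaplan-Meier estimator $\hat{S}(t)=\prod_{t_i\leq t}(1-\frac{d_i}{n_i})$ .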

Invented data set

Suppose that we have 10 observations in the data set with failure times as follows:

2; 5; 5; 6+; 7; 12; 14+; 14+; 14+; 14+

Here + indicates a censored observation. Then we can calculate both estimators for S(t) at all time points. It is considered unsafe to extrapolate much beyond the last time point, 14, even with a large data set.
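As a sketch of the calculation for this data set, the code below computes the product-limit (Kaplan-Meier) estimate $\prod_{t_i\leq t}(1-d_i/n_i)$ at each distinct failure time. It assumes this is one of the estimators meant above; the function name `kaplan_meier` and the 0/1 coding of the status are choices made here for illustration.

```python
from collections import Counter

def kaplan_meier(times, deltas):
    """Return [(t_i, n_i, d_i, S_hat(t_i))] at each distinct failure time."""
    deaths = Counter(t for t, d in zip(times, deltas) if d == 1)
    rows, surv = [], 1.0
    for t in sorted(deaths):
        n_i = sum(1 for u in times if u >= t)  # size of the risk set at t
        d_i = deaths[t]                        # failures at t
        surv *= 1 - d_i / n_i
        rows.append((t, n_i, d_i, surv))
    return rows

# The invented data set: 2; 5; 5; 6+; 7; 12; 14+; 14+; 14+; 14+
times = [2, 5, 5, 6, 7, 12, 14, 14, 14, 14]
deltas = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # 0 marks a censored (+) observation

for t, n_i, d_i, s in kaplan_meier(times, deltas):
    print(t, n_i, d_i, round(s, 4))
```

The risk sets are 10, 9, 6 and 5 at times 2, 5, 7 and 12, and the estimate stays constant after the last failure time.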

Confidence Intervals

We need to find confidence intervals for the estimators of S(t) at each time point. We differentiate the log-likelihood and use likelihood theory,

$l=\sum_{i}d_i\log h_i+\sum_{i}(n_i-d_i)\log(1-h_i)$ ,

differentiated twice to find the Hessian matrix $\left\{ \frac{\partial^2 l}{\partial h_i \partial h_j} \right\}$ .

Note that since $l$ is a sum of functions of each individual hazard the Hessian must be diagonal. The estimators $\left\{ \hat{h}_{1:n} \right\}$ are asymptotically unbiased and are asymptotically jointly normally distributed with approximate variance $I^{-1}$ , where the information matrix is given by

$I=E[-\left\{ \frac{\partial^2 l}{\partial h_i \partial h_j} \right\} ]$ .

Since the Hessian is diagonal, the covariances are all asymptotically zero, and coupled with asymptotic normality, this ensures that all pairs $\hat{h} _i, \hat{h} _j$ are asymptotically independent.

$-\frac{\partial^2 l}{\partial h_i^2} =\frac{d_i}{h_i^2} +\frac{n_i-d_i}{(1-h_i)^2}$ .

We use the observed information J and replace $h_i$ in the above by its estimator $\hat{h}_i=\frac{d_i}{n_i}$ . Hence we have $V(\hat{h}_i ) \approx \frac{d_i(n_i-d_i)}{n_i^3}$ .
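The sketch below evaluates $V(\hat{h}_i)$ for the risk sets of the invented data set. As a step that is not derived in these notes, it also applies the delta method to $\log S(t)=\sum_{t_i\leq t}\log(1-h_i)$ , which gives the standard Greenwood approximation $V(\hat{S}(t))\approx \hat{S}(t)^2\sum_{t_i\leq t}\frac{d_i}{n_i(n_i-d_i)}$ . The function names are hypothetical, and the code assumes $d_i<n_i$ at every failure time.

```python
import math

def hazard_variances(risk_table):
    """risk_table: list of (n_i, d_i). Returns [(h_hat_i, V(h_hat_i))]."""
    return [(d / n, d * (n - d) / n**3) for n, d in risk_table]

def greenwood_se(risk_table):
    """Return [(S_hat(t_i), approximate standard error)] via Greenwood's formula."""
    surv, acc, out = 1.0, 0.0, []
    for n, d in risk_table:
        surv *= 1 - d / n
        acc += d / (n * (n - d))  # assumes d < n
        out.append((surv, surv * math.sqrt(acc)))
    return out

# (n_i, d_i) at failure times 2, 5, 7, 12 of the invented data set.
risk_table = [(10, 1), (9, 2), (6, 1), (5, 1)]

for (h, v), (s, se) in zip(hazard_variances(risk_table), greenwood_se(risk_table)):
    # Approximate 95% interval for S(t_i), using asymptotic normality.
    print(round(h, 4), round(v, 5), round(s, 4), (s - 1.96 * se, s + 1.96 * se))
```

A plain normal interval like this can fall outside [0, 1] when the risk set is small, so it should be read as a rough guide only.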

Actuarial estimator

The actuarial estimator is a further estimator for S(t). It is given as

$S^*(t)=\prod_{t_i\leq t}[1-\frac{d_i}{n_i-\frac{1}{2} c_i} ]$ .

The actuarial estimator is generally used by actuaries and demographers following a cohort from birth to death, where the intervals between consecutive time points are usually of constant length. Age will normally be the time variable, and hence the unit of time is 1 year.
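A minimal sketch of the actuarial formula, applied for illustration to the invented data set with the failure times 2, 5, 7, 12 taken as interval starts, so that $c_i$ counts the censorings in $[t_i,t_{i+1})$ . The function name and the table layout are assumptions made here.

```python
def actuarial(table):
    """table: list of (n_i, d_i, c_i). Returns S*(t_i) after each interval."""
    surv, out = 1.0, []
    for n, d, c in table:
        # Censored subjects count as at risk for half of the interval.
        surv *= 1 - d / (n - 0.5 * c)
        out.append(surv)
    return out

# (n_i, d_i, c_i): 6+ falls in [5, 7), and the four 14+ fall in [12, infinity).
table = [(10, 1, 0), (9, 2, 1), (6, 1, 0), (5, 1, 4)]
print([round(s, 4) for s in actuarial(table)])
```

With no censoring in an interval the factor reduces to $1-d_i/n_i$ , so the two estimators differ only where $c_i>0$ .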

Models: accelerated life & proportional hazards

We will generally have heterogeneous data, where parameter estimates depend on covariates measured for participants in a study. For example, age or sex may have an effect on time to event. A simple example would be where participants fall into two groups, such as treatment v. control or smoker v. non-smoker. There are two popular general classes of model, as in the heading above: accelerated life (AL) and proportional hazards (PH).

Accelerated Life models

Suppose there are (several) groups, labelled by index i. The accelerated life model has a survival curve for each group defined by $S_i(t)=S_0(\rho _i t)$ , where $S_0(t)$ is some baseline survival curve and $\rho _i$ is a constant specific to group i. If we plot $S_i$ against $\log t$ , $i=1,2,...,k$ , then we expect to see a horizontal shift, as

$S_i(t)=S_0 [e^{\log\rho _i+\log t}]$ .

Note too that each group has a different median lifetime, since, if $S_0(m) = 0.5$ , then

$S_i(\frac{m}{\rho _i} )=S_0(\rho _i \frac{m}{\rho _i} )=0.5$

giving a median for group i of $\frac{m}{\rho _i}$ . Similarly, if the $100\alpha\%$ quantile of the baseline survival function is $t_\alpha$ , then the $100\alpha\%$ quantile of group i is $\frac{t_\alpha }{\rho _i}$ .
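The quantile scaling can be checked numerically. The sketch below assumes a Weibull-type baseline $S_0(t)=e^{-t^{1.5}}$ , chosen only for illustration; the values of $\rho$ and the function names are hypothetical.

```python
import math

ALPHA = 1.5  # hypothetical shape of the illustrative baseline

def s0(t):
    """Illustrative baseline survival curve S_0(t) = exp(-t^ALPHA)."""
    return math.exp(-t**ALPHA)

def s_group(t, rho):
    """Accelerated life model: S_i(t) = S_0(rho_i * t)."""
    return s0(rho * t)

def baseline_quantile(p):
    """Time t with S_0(t) = p."""
    return (-math.log(p)) ** (1 / ALPHA)

m = baseline_quantile(0.5)  # baseline median
for rho in (0.5, 1.0, 2.0):
    # The group median is m / rho, so the survival there should be 0.5.
    print(rho, m / rho, s_group(m / rho, rho))
```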

Proportional Hazards models

In this model we assume that the hazards in the various groups are proportional so that

$h_i(t)=\rho _ih_0(t)$

where $h_0(t)$ is the baseline hazard. Hence we see that

$S_i(t)=S_0(t)^{\rho _i}$ .

Taking logs twice we get

$\log[-\log S_i(t)]=\log\rho _i+\log[-\log S_0(t)]$ .

So if we plot the left-hand side of the above equation, $\log[-\log S_i(t)]$ , for each group against either t or log t, we expect to see a vertical shift between groups.

Taking both models together it is clear that we should plot

$\log[-\log\hat{S}_i (t) ]$ against $\log t$

as then we can check for AL and PH in one plot. Generally $\hat{S} _i$ will be calculated as the Kaplan-Meier estimator for group i, and the survival function estimator for each group will be plotted on the same graph.

(i) If the accelerated life model is plausible we expect to see a horizontal shift between groups.

(ii) If the proportional hazards model is plausible we expect to see a vertical shift between groups.
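The two kinds of shift can be seen numerically. The sketch below uses a hypothetical log-logistic baseline $S_0(t)=1/(1+t^2)$ and $\rho=2$ , both chosen only for illustration, and works with true survival curves rather than Kaplan-Meier estimates.

```python
import math

def s0(t):
    """Hypothetical log-logistic baseline survival curve."""
    return 1.0 / (1.0 + t**2)

def cloglog(surv):
    """The plotted transform log[-log S]."""
    return math.log(-math.log(surv))

rho = 2.0
for log_t in (-1.0, 0.0, 1.0):
    t = math.exp(log_t)
    base = cloglog(s0(t))
    al = cloglog(s0(rho * t))   # accelerated life group: S_0(rho * t)
    ph = cloglog(s0(t) ** rho)  # proportional hazards group: S_0(t)^rho
    # Baseline curve read off at log t + log rho (a horizontal shift).
    base_shifted = cloglog(s0(math.exp(log_t + math.log(rho))))
    print(log_t, al - base_shifted, ph - base)
```

The first difference is zero up to rounding (the AL curve is the baseline moved horizontally by $\log\rho$ ), and the second equals $\log\rho$ at every time point (a vertical shift).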

Regression in parametric AL models

In general studies each observation will have measured explanatory factors such as age, smoking status, blood pressure and so on. We need to incorporate these into a model using some sort of generalised regression. It is usual to do so by making $\rho$ a function of the explanatory variables.

For each observation (say individual in a clinical trial) we set the scale parameter $\rho =\rho (\beta .x)$ , where $\beta.x$ is a linear predictor composed of a vector $x$ of known explanatory variables and an unknown vector $\beta$ of parameters which will be estimated. The most common link function is

$\log\rho = \beta .x$ , equivalently $\rho =e^{\beta .x}$ .

The idea is to mirror ordinary linear regression and find a baseline distribution which does not depend on $\rho$ , similar to looking at the error term in least squares regression. To give a derivation we will restrict to the Weibull distribution, but similar arguments work for all AL parametric models. We have $S(t)=e^{-(\rho t)^\alpha }$ .

Now let $Y=\alpha (\log T+\log\rho )$ and $y=\alpha (\log t+\log\rho )$ , so that $e^y=(\rho t)^\alpha$ . Then

$Pr(Y>y)=S_Y(y)=S(t)=e^{-(\rho t)^\alpha }=exp(-e^y)$ .

The distribution of Y does not depend on the parameters ρ and α. In the case of the Weibull distribution, the distribution of Y is called the extreme value distribution and is as above. In general we will write

$\log T=-\log\rho + \frac{1}{\alpha } Y$

for all AL parametric models, and Y has a distribution in each case which is independent of the model parameters.
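A small simulation can illustrate that Y is free of the parameters in the Weibull case. The sketch below draws T with $S(t)=e^{-(\rho t)^\alpha}$ using the standard library call `random.weibullvariate(scale, shape)` with scale $1/\rho$ and shape $\alpha$ , then compares the empirical tail of Y with $\exp(-e^y)$ . The parameter values, sample size and seed are arbitrary choices made here.

```python
import math
import random

def simulate_y(rho, alpha, n, seed=0):
    """Draw T ~ Weibull with S(t) = exp(-(rho t)^alpha); return Y = alpha(log T + log rho)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        t = rng.weibullvariate(1.0 / rho, alpha)  # scale 1/rho, shape alpha
        ys.append(alpha * (math.log(t) + math.log(rho)))
    return ys

def tail_prob(ys, y):
    """Empirical Pr(Y > y)."""
    return sum(1 for v in ys if v > y) / len(ys)

for rho, alpha in ((0.5, 0.8), (3.0, 2.5)):
    ys = simulate_y(rho, alpha, n=20000)
    # Both parameter settings should give roughly exp(-e^0) = exp(-1) at y = 0.
    print(rho, alpha, tail_prob(ys, 0.0), math.exp(-1.0))
```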

With real data (assuming right censoring only).

Censoring is assumed to be an independent mechanism, sometimes referred to as non-informative. The shape parameter α is assumed to be the same for each observation in the study. There are often very many covariates measured for each subject in a study. A row of data will have perhaps: response (event time $t_i$ and status $\delta _i$ , =1 if failure, =0 if censored); covariates (age, sex, systolic blood pressure, treatment), typically a mixture of categorical and continuous variables. Suppose that Weibull is a good fit. Then

$S(t)=e^{-(\rho t)^\alpha }$ and $\rho =e^{\beta .x}$ ;

$\beta .x=b_0+b_1x_{age}+b_2x_{sex}+b_3x_{sbp}+b_4x_{trt}$

where $b_0$ is the intercept and all regression coefficients $b_i$ are to be estimated, as well as α. Note this model assumes that α is the same for each subject. We have not shown, but could have, interaction terms such as $x_{age}\times x_{trt}$ . This interaction would allow a different effect of age according to treatment group.

Suppose subject $j$ has covariate vector $x_j$ and so scale parameter $\rho _j=e^{\beta .x_j}$ . This gives a likelihood

$L(\alpha ,\beta )=\prod_{j}(\alpha \rho _{j}^\alpha t _{j}^{\alpha -1} )^{\delta _j}e^{-(\rho _j t_j)^\alpha } =\prod_{j} (\alpha e^{\alpha \beta .x_j} t _{j}^{\alpha -1} )^{\delta _j}e^{-(e^{\beta .x_j}t_j)^\alpha }$ .

We can now look for MLEs for α and all components of the vector β, giving estimators $\hat{\alpha } ,\hat{\beta }$ together with their standard errors, obtained from the variances $V(\cdot )$ calculated from the observed information matrix. As already noted, we can test for α = 1 using

$2 \log\hat{L} _{weib}-2\log\hat{L}_{exp} \sim \chi ^2(1)$ , asymptotically.
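The log of the Weibull regression likelihood above can be written down directly. The sketch below evaluates it at a trial parameter value and gives the $\chi^2(1)$ tail probability for a likelihood ratio statistic, using the identity $\Pr(\chi^2_1>x)=\mathrm{erfc}(\sqrt{x/2})$ . The data rows, covariate layout and function names are hypothetical, and no maximisation is performed here: finding $\hat{\alpha},\hat{\beta}$ would need a numerical optimiser.

```python
import math

def weibull_loglik(alpha, beta, rows):
    """rows: list of (t_j, delta_j, x_j), with x_j a list starting with 1 for the intercept."""
    total = 0.0
    for t, delta, x in rows:
        eta = sum(b * xi for b, xi in zip(beta, x))  # beta . x_j = log rho_j
        # delta_j * log(alpha * rho_j^alpha * t_j^(alpha - 1))
        total += delta * (math.log(alpha) + alpha * eta + (alpha - 1) * math.log(t))
        # minus (rho_j * t_j)^alpha
        total -= (math.exp(eta) * t) ** alpha
    return total

def lrt_pvalue(loglik_weibull, loglik_exponential):
    """Tail probability of chi-square(1) for 2 log L_weib - 2 log L_exp."""
    stat = 2 * loglik_weibull - 2 * loglik_exponential
    return math.erfc(math.sqrt(max(stat, 0.0) / 2))

# Hypothetical rows: (time, status, [1, treatment indicator]).
rows = [(2.0, 1, [1.0, 0.0]), (5.0, 1, [1.0, 1.0]),
        (6.0, 0, [1.0, 0.0]), (7.0, 1, [1.0, 1.0])]

# Log-likelihood at a trial point (not the MLE).
print(weibull_loglik(1.2, [math.log(0.1), -0.3], rows))
```

Setting α = 1 in `weibull_loglik` gives the exponential regression log-likelihood, which is the null model in the test.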

Packages allow for Weibull, log-logistic and log-normal models, sometimes others. In recent years, a semi-parametric model has been developed in which the baseline survival function $S_0$ is modelled non-parametrically, and each subject has time t scaled to $\rho _jt$ .
