Censoring and truncation

Right censoring occurs when a subject leaves the study before an event occurs, or the study ends before the event has occurred. For example, consider patients in a clinical trial studying the effect of treatments on stroke occurrence. The study ends after 5 years. Those patients who have had no stroke by the end of the 5 years are censored. If a patient leaves the study at time $t_e$, then the event occurs in $(t_e, \infty)$.
Left censoring is when the event of interest has already occurred before enrolment. This is very rarely encountered.
Truncation is deliberate and due to study design. Right truncation occurs when the entire study population has already experienced the event of interest. Left truncation occurs when the subjects have been at risk before entering the study.
Generally we deal with right censoring and sometimes left truncation.
Two types of independent right censoring:
Type I: completely random dropout and/or a fixed end-of-study time at which no event has yet occurred.
Type II: the study ends when a fixed number of events amongst the subjects has occurred.
Likelihood and Censoring
If the censoring mechanism is independent of the event process, then we have an easy way of dealing with it. Suppose that T is the time to event and that C is the time to the censoring event. Assume that all subjects may have an event or be censored, so that for subject i only one of a pair of times $(\tilde t_i, \tilde c_i)$ is observed. Since we observe the minimum of the two, the likelihood (using independence) is
$$L = \prod_{\tilde t_i < \tilde c_i} f(\tilde t_i)\, S_C(\tilde t_i) \prod_{\tilde c_i < \tilde t_i} S(\tilde c_i)\, f_C(\tilde c_i),$$
where $f$ and $S$ are the density and survival function of T, and $f_C$ and $S_C$ are those of C.
Now define the following random variable: if T<C, δ=1; if T>C, δ=0.
For each subject we observe $t_i = \min(\tilde t_i, \tilde c_i)$ and $\delta_i$, observations from a continuous random variable and a binary random variable. In terms of these L becomes
$$L = \prod_i h(t_i)^{\delta_i} S(t_i) \prod_i h_C(t_i)^{1-\delta_i} S_C(t_i),$$
where we have used density = hazard × survival function, that is $f = hS$ and $f_C = h_C S_C$. NB If the censoring mechanism is independent (sometimes called non-informative) then we can ignore the second product on the right as it gives us no information about the event time. In the remainder of the course we will assume that the censoring mechanism is independent.
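As a minimal sketch of how the event-time part of such a likelihood is used, assume purely for illustration (the notes do not fix a distribution at this point) that event times are exponential with constant hazard λ, so that h(t) = λ and S(t) = exp(−λt). Ignoring the censoring product, the log-likelihood is Σ δ_i log λ − λ Σ t_i, which is maximised at λ̂ = Σ δ_i / Σ t_i. The function names and the small data set below are made up for the example.

```python
import math

def exp_loglik(lam, times, status):
    """Event-time log-likelihood under independent right censoring,
    ASSUMING exponential event times with hazard lam (illustrative choice)."""
    events = sum(status)      # number of observed events, sum of delta_i
    exposure = sum(times)     # total time at risk, sum of t_i
    return events * math.log(lam) - lam * exposure

def exp_mle(times, status):
    """Closed-form maximiser: observed events divided by total time at risk."""
    return sum(status) / sum(times)

# Made-up sample: observed times and status (1 = event, 0 = censored)
times = [2, 5, 5, 6, 7, 12, 14, 14, 14, 14]
status = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

lam_hat = exp_mle(times, status)
print(f"lambda_hat = {lam_hat:.4f}")
print(f"log-likelihood at lambda_hat = {exp_loglik(lam_hat, times, status):.4f}")
```

Censored subjects contribute only through the total time at risk, which is how the factor S(t_i) enters for every subject while h(t_i) enters only when δ_i = 1.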
Demographic v. trial data
The time to event can literally be the age. In a clinical trial it will more typically be the time from admission to the trial. Slides show five patients A, B, C, D, E from a Sydney hospital pilot study, concerning treatment of bladder cancer. Each patient has their own zero time, the time at which the patient entered the study (accrual time). For each patient we record time to event of interest or censoring time, whichever is the smaller, and the status, δ = 1 if the event occurs and δ = 0 if the patient is censored.
Non-parametric estimators
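If there are observations $x_1, \dots, x_n$ from a random sample then we define the empirical distribution
$$\hat F(x) = \frac{1}{n}\, \#\{x_i : x_i \le x\}.$$
This is appropriate if no censoring occurs. However if censoring occurs this has to be taken into account. We measure the pair (X, δ) where X = min(T, C) and δ is as before: if T<C, δ=1; if T>C, δ=0. Suppose that the observations are $(x_i, \delta_i)$ for i = 1,2,...,n. Then the likelihood is
$$L = \prod_i f(x_i)^{\delta_i} S(x_i)^{1-\delta_i} = \prod_i f(x_i)^{\delta_i} (1 - F(x_i))^{1-\delta_i}.$$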
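What follows is a heuristic argument allowing us to find an estimator for S, the survival function, which in the likelihood sense is the best that we can do. Suppose that there are failure times $(0 <)\ t_1 < t_2 < \dots < t_i < \dots$. Let $s_{i1}, s_{i2}, \dots, s_{ic_i}$ be the censoring times within the interval $[t_i, t_{i+1})$ and suppose that there are $d_i$ failures at time $t_i$. Then the likelihood function becomes
$$L = \prod_{\text{fail}} f(t_i)^{d_i} \prod_i \prod_{k=1}^{c_i} (1 - F(s_{ik})) = \prod_{\text{fail}} (F(t_i) - F(t_i-))^{d_i} \prod_i \prod_{k=1}^{c_i} (1 - F(s_{ik})),$$
where we write $f(t_i) = F(t_i) - F(t_i-)$, the difference between the cdf at time $t_i$ and the cdf immediately before it. Since F is increasing, for fixed values $F(t_i)$ at the failure times L is largest when $F(t_i-)$ and $F(s_{ik})$ are as small as possible, that is $F(t_i-) = F(t_{i-1})$ and $F(s_{ik}) = F(t_i)$. This maximises L by taking the cdf F(t) to be a step function, and therefore to come from a discrete distribution whose support is the set of failure times actually observed. Then
$$L = \prod_{\text{fail}} (F(t_i) - F(t_{i-1}))^{d_i} \prod_i (1 - F(t_i))^{c_i}.$$
Let us consider the discrete case and let Pr(fail at $t_i$ | survived to $t_i-$) = $h_i$. Then
$$S(t_i) = 1 - F(t_i) = \prod_{j=1}^{i} (1 - h_j);$$
$$f(t_i) = h_i \prod_{j=1}^{i-1} (1 - h_j).$$
Finally we have
$$L = \prod_{t_i} h_i^{d_i} (1 - h_i)^{n_i - d_i},$$
where $n_i$ is the number at risk at time $t_i$. This is usually referred to as the number in the risk set. Note
$$n_{i+1} + c_i + d_i = n_i.$$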
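The maximisation that this argument leads to can be written out as a short worked step. Here $h_i$ is the discrete hazard at the i-th failure time $t_i$, $d_i$ is the number of failures there and $n_i$ is the number in the risk set. The starting point is the binomial-type likelihood in the $h_i$, and the final product is the usual Kaplan-Meier form. This is a sketch of the calculation, not a full proof.

```latex
\begin{align*}
L &= \prod_{t_i} h_i^{d_i} (1 - h_i)^{n_i - d_i}, \\
\ell = \log L &= \sum_i \left\{ d_i \log h_i + (n_i - d_i) \log (1 - h_i) \right\}, \\
\frac{\partial \ell}{\partial h_i} &= \frac{d_i}{h_i} - \frac{n_i - d_i}{1 - h_i} = 0
  \quad \Longrightarrow \quad \hat h_i = \frac{d_i}{n_i}, \\
\hat S(t) &= \prod_{t_i \le t} (1 - \hat h_i) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i} \right).
\end{align*}
```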
Invented data set
Suppose that we have 10 observations in the data set, with observed (failure or censoring) times as follows:
2; 5; 5; 6+; 7; 12; 14+; 14+; 14+; 14+
Here + indicates a censored observation. Then we can calculate both estimators for S(t) at all time points. It is considered unsafe to extrapolate much beyond the last time point, 14, even with a large data set.
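A minimal sketch of the product-limit (Kaplan-Meier) calculation for this data set, assuming the estimator takes the form Ŝ(t) = ∏ (1 − d_i/n_i) over failure times t_i ≤ t, with d_i failures at t_i and n_i subjects still at risk just before t_i. The function name is made up for the example, and a subject censored at a failure time is counted as still at risk at that time.

```python
from collections import Counter

def kaplan_meier(times, status):
    """Return a list of (t_i, n_i, d_i, S_hat(t_i)) at each distinct failure time."""
    deaths = Counter(t for t, s in zip(times, status) if s == 1)
    rows = []
    surv = 1.0
    for t in sorted(deaths):
        # Risk set: everyone whose observed time is at least t
        n_at_risk = sum(1 for x in times if x >= t)
        d = deaths[t]
        surv *= 1 - d / n_at_risk
        rows.append((t, n_at_risk, d, surv))
    return rows

# The invented data set: 2, 5, 5, 6+, 7, 12, 14+, 14+, 14+, 14+
times = [2, 5, 5, 6, 7, 12, 14, 14, 14, 14]
status = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # 1 = failure, 0 = censored (+)

for t, n, d, s in kaplan_meier(times, status):
    print(f"t={t:>2}  n={n:>2}  d={d}  S={s:.4f}")
```

The risk sets are 10, 9, 6 and 5 at times 2, 5, 7 and 12, giving estimated survival probabilities of 0.9, 0.7, 0.5833 and 0.4667. The estimate does not reach zero because the largest observations are censored.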
Confidence Intervals
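We need to find confidence intervals for the estimators of S(t) at each time point. We differentiate the log-likelihood and use likelihood theory,
$$\ell = \sum_i \left\{ d_i \log h_i + (n_i - d_i) \log(1 - h_i) \right\},$$
differentiated twice to find the Hessian matrix $\left\{ \dfrac{\partial^2 \ell}{\partial h_i\, \partial h_j} \right\}$.
Note that since $\ell$ is a sum of functions of each individual hazard the Hessian must be diagonal. The estimators $\hat h_1, \hat h_2, \dots$ are asymptotically unbiased and are asymptotically jointly normally distributed with approximate variance $I^{-1}$, where the information matrix is given by
$$I = E\left( -\left\{ \frac{\partial^2 \ell}{\partial h_i\, \partial h_j} \right\} \right).$$
Since the Hessian is diagonal, the covariances are all asymptotically zero, and coupled with asymptotic normality, this ensures that all pairs $\hat h_i, \hat h_j$ ($i \ne j$) are asymptotically independent. The diagonal terms are
$$-\frac{\partial^2 \ell}{\partial h_i^2} = \frac{d_i}{h_i^2} + \frac{n_i - d_i}{(1 - h_i)^2}.$$
We use the observed information J and replace $h_i$ in the above by its estimator $\hat h_i = d_i / n_i$. Hence we have
$$\operatorname{var} \hat h_i \approx \frac{d_i (n_i - d_i)}{n_i^3}.$$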
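The variance approximation can be checked numerically. The sketch below assumes the discrete-hazard log-likelihood ℓ = Σ {d_i log h_i + (n_i − d_i) log(1 − h_i)}, with d_i failures and n_i at risk at the i-th failure time. It evaluates the observed information at ĥ_i = d_i/n_i and compares its inverse with d_i(n_i − d_i)/n_i³. The function names and the (n_i, d_i) pairs are illustrative.

```python
def observed_information(h, n, d):
    """Minus the second derivative of d*log(h) + (n-d)*log(1-h) with respect to h."""
    return d / h**2 + (n - d) / (1 - h) ** 2

def hazard_estimates(risk_sets):
    """For each (n_i, d_i) return (h_hat, var_hat), where var_hat = d(n-d)/n^3.

    Requires 0 < d < n so that the observed information is finite.
    """
    out = []
    for n, d in risk_sets:
        h_hat = d / n
        var_hat = d * (n - d) / n**3
        out.append((h_hat, var_hat))
    return out

# Illustrative (n_i, d_i) pairs at four failure times
risk_sets = [(10, 1), (9, 2), (6, 1), (5, 1)]
estimates = hazard_estimates(risk_sets)

for (n, d), (h_hat, var_hat) in zip(risk_sets, estimates):
    inv_info = 1 / observed_information(h_hat, n, d)
    print(f"n={n:>2} d={d}  h_hat={h_hat:.4f}  var={var_hat:.5f}  1/J={inv_info:.5f}")
```

The last two columns agree, since J evaluated at ĥ_i equals n_i³ / (d_i (n_i − d_i)).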
Actuarial estimator
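The actuarial estimator is a further estimator for S(t). It is given as
$$S^*(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i - \tfrac{1}{2} c_i} \right),$$
where $d_i$ is the number of failures, $c_i$ the number of censorings and $n_i$ the number at risk for the i-th interval.
The intervals between consecutive failure times are usually of constant length, and it is generally used by actuaries and demographers following a cohort from birth to death. Age will normally be the time variable and hence the unit of time is 1 year.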
Models: accelerated life & proportional hazards
We will generally have heterogeneous data, where parameter estimates depend on covariates measured for participants in a study. For example age or sex may have an effect on time to event. A simple example would be where participants fall into two groups such as treatment v. control, or smoker v. non-smoker. There are two popular general classes of model, named in the heading above: accelerated life (AL) and proportional hazards (PH).
Accelerated Life models
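Suppose there are (several) groups, labelled by index i. The accelerated life model has a survival curve for each group defined by $S_i(t) = S_0(\rho_i t)$, where $S_0(t)$ is some baseline survival curve and $\rho_i$ is a constant specific to group i. If we plot $S_i(t)$ against $\log t$ for each group i, then we expect to see a horizontal shift between groups, as
$$S_i(t) = S_0\left( e^{\log \rho_i + \log t} \right).$$
Note too that each group has a different median lifetime, since, if $S_0(m) = 0.5$,
$$S_i\left( \frac{m}{\rho_i} \right) = S_0\left( \rho_i \frac{m}{\rho_i} \right) = 0.5,$$
giving a median for group i of $m / \rho_i$. Similarly if the $100p\%$ quantile of the baseline survival function is $t_p$, then the $100p\%$ quantile of group i is $t_p / \rho_i$.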
Proportional Hazards models
In this model we assume that the hazards in the various groups are proportional so that
$$h_i(t) = \rho_i h_0(t),$$
where $h_0(t)$ is the baseline hazard. Hence we see that
$$S_i(t) = S_0(t)^{\rho_i}.$$
Taking logs twice we get
$$\log(-\log S_i(t)) = \log \rho_i + \log(-\log S_0(t)).$$
So if we plot $\log(-\log S_i(t))$ against either t or log t we expect to see a vertical shift between groups.
Taking both models together it is clear that we should plot $\log(-\log \hat S_i(t))$ against log t, as then we can check for AL and PH in one plot. Generally $\hat S_i$ will be calculated as the Kaplan-Meier estimator for group i, and the survival function estimator for each group will be plotted on the same graph.
(i) If the accelerated life model is plausible we expect to see a horizontal shift between groups.
(ii) If the proportional hazards model is plausible we expect to see a vertical shift between groups.
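The two kinds of shift can be illustrated with exact curves in place of estimates. The sketch below assumes a Weibull baseline S_0(t) = exp(−t^1.5) and a group constant ρ = 2, both chosen only for illustration. It builds one group by proportional hazards, S(t) = S_0(t)^ρ, and one by accelerated life, S(t) = S_0(ρt), and looks at log(−log S) as a function of u = log t.

```python
import math

ALPHA = 1.5  # assumed Weibull shape for the illustrative baseline
RHO = 2.0    # assumed group-specific constant

def s0(t):
    """Assumed baseline survival function S_0(t) = exp(-t^ALPHA)."""
    return math.exp(-t ** ALPHA)

def s_ph(t):
    """Proportional hazards group: S(t) = S_0(t)^RHO."""
    return s0(t) ** RHO

def s_al(t):
    """Accelerated life group: S(t) = S_0(RHO * t)."""
    return s0(RHO * t)

def curve(surv, u):
    """log(-log S) evaluated at log-time u, i.e. at t = exp(u)."""
    return math.log(-math.log(surv(math.exp(u))))

us = [math.log(t) for t in (0.2, 0.5, 1.0, 2.0)]

# PH: the group curve sits a constant log(RHO) above the baseline curve
vertical = [curve(s_ph, u) - curve(s0, u) for u in us]

# AL: the group curve at u equals the baseline curve at u + log(RHO)
horizontal = [curve(s_al, u) - curve(s0, u + math.log(RHO)) for u in us]

print("vertical gaps:", [round(v, 6) for v in vertical])
print("horizontal-shift residuals:", [round(h, 6) for h in horizontal])
```

Every vertical gap equals log ρ and every horizontal-shift residual is zero up to rounding. With a Weibull baseline the AL curve is also a vertical shift of the baseline, by ALPHA × log ρ, so on this plot Weibull data look consistent with both models.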
Regression in parametric AL models
In most studies each observation will have measured explanatory factors such as age, smoking status, blood pressure and so on. We need to incorporate these into a model using some sort of generalised regression. It is usual to do so by making $\rho$ a function of the explanatory variables.
For each observation (say an individual in a clinical trial) we set the scale parameter $\rho = \rho(\beta \cdot x)$, where $\beta \cdot x$ is a linear predictor composed of a vector $x$ of known explanatory variables and an unknown vector $\beta$ of parameters which will be estimated. The most common link function is
$$\log \rho = \beta \cdot x, \quad \text{equivalently} \quad \rho = e^{\beta \cdot x}.$$
The idea is to mirror ordinary linear regression and find a baseline distribution which does not depend on $\rho$, similar to looking at the error term in least squares regression. To give a derivation we will restrict to the Weibull distribution, but similar arguments work for all AL parametric models. We have
$$S(t) = \exp(-(\rho t)^\alpha).$$
Now let Y = α (log T + log ρ) and y = α (log t + log ρ), so that $(\rho t)^\alpha = e^y$ and
$$P(Y > y) = P(T > t) = \exp(-(\rho t)^\alpha) = \exp(-e^y).$$
The distribution of Y is independent of the parameters ρ and α. In the case of the Weibull distribution, this is called the extreme value distribution and is as above. In general we will write
$$\log T = -\log \rho + \frac{1}{\alpha} Y$$
for all AL parametric models, and Y has a distribution in each case which is independent of the model parameters.

With real data (assuming right censoring only).
Censoring is assumed to be an independent mechanism and is sometimes referred to as non-informative. The shape parameter α is assumed to be the same for each observation in the study. There are often very many covariates measured for each subject in a study. A row of data will have perhaps: response - event time $t$, status $\delta$ (=1 if failure, =0 if censored); covariates - age, sex, systolic blood pressure, treatment, giving a mixture of categorical and continuous variables amongst the covariates. Suppose that Weibull is a good fit. Then
$$S(t) = \exp(-(\rho t)^\alpha)$$
and
$$\log \rho = \beta \cdot x = b_0 + b_1 x_{\text{age}} + b_2 x_{\text{sex}} + b_3 x_{\text{sbp}} + b_4 x_{\text{trt}},$$
where $b_0$ is the intercept and all regression coefficients $b_i$ are to be estimated, as well as α. Note this model assumes that α is the same for each subject. We have not shown, but could have, interaction terms such as $x_{\text{age}} \times x_{\text{trt}}$. This interaction would allow a different effect of age according to treatment group.
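Suppose subject j has covariate vector $x_j$ and so scale parameter $\rho_j$ with $\log \rho_j = \beta \cdot x_j$. With observed time $t_j$ and status $\delta_j$ for subject j, this gives a likelihood
$$L(\alpha, \beta) = \prod_j \left( \alpha \rho_j^{\alpha} t_j^{\alpha - 1} \right)^{\delta_j} \exp(-(\rho_j t_j)^\alpha).$$
We can now look for mle's for α and all components of the vector β, giving estimators $\hat\alpha$, $\hat\beta$ together with their standard errors $\operatorname{se}(\hat\alpha)$, $\operatorname{se}(\hat\beta_i)$ calculated from the observed information matrix. As already noted we can test for α = 1 using
$$2 \log \hat L_{\text{weib}} - 2 \log \hat L_{\text{exp}} \sim \chi^2(1),$$
asymptotically.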
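A minimal sketch of the Weibull regression log-likelihood, assuming the parametrisation S(t) = exp(−(ρt)^α) with hazard αρ^α t^(α−1) and the log link log ρ_j = β · x_j. The function name, the data and the single treatment covariate are made up for the example. In practice the maximisation over (α, β) would be handed to a numerical optimiser, which is not shown here.

```python
import math

def weibull_loglik(alpha, beta, times, status, covariates):
    """Weibull AL regression log-likelihood under independent right censoring.

    log L = sum_j { delta_j * log(alpha * rho_j^alpha * t_j^(alpha - 1))
                    - (rho_j * t_j)^alpha },   with log rho_j = beta . x_j
    """
    total = 0.0
    for t, delta, x in zip(times, status, covariates):
        log_rho = sum(b * xi for b, xi in zip(beta, x))
        if delta == 1:
            # Log hazard, contributed only by subjects with an observed event
            total += math.log(alpha) + alpha * log_rho + (alpha - 1) * math.log(t)
        # Log survival, contributed by every subject
        total -= math.exp(alpha * (log_rho + math.log(t)))
    return total

# Made-up data: times, status (1 = event, 0 = censored), covariates (intercept, treatment)
times = [2, 5, 5, 6, 7, 12, 14, 14, 14, 14]
status = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
covariates = [(1.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 0.0), (1.0, 1.0),
              (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

# With alpha = 1 and no treatment effect the model reduces to the exponential
lam = 5 / 93
ll_exp = weibull_loglik(1.0, [math.log(lam), 0.0], times, status, covariates)
print(f"log-likelihood at alpha = 1: {ll_exp:.4f}")
```

Setting α = 1 gives the exponential special case, which is the null model in the likelihood ratio test for α = 1.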
Packages allow for Weibull, log-logistic and log-normal models, and sometimes others. In recent years, a semi-parametric model has been developed in which the baseline survival function is modelled non-parametrically, and each subject has time t scaled to $\rho_j t$.