Windsor Diary 17

Cox’s semiparametric proportional hazards model: larynx

Cox (1972) proposed the following model: $h(t|x)=e^{\beta x} h_0(t)$ , where $h_0(t)=h(t|0)$ is an arbitrary baseline hazard function.

Since $e^{\beta x}=\frac{h(t|x)}{h(t|0)}$ for all values of x, the hazard functions for different values of x are proportional to one another, and $e^\beta = \frac{h(t|x+1)}{h(t|x)}$ is the hazard ratio corresponding to a unit increase in the value of x.

The Cox model is semiparametric because $h_0(t)$ is arbitrary, while the effect of x enters only through the parametric factor $e^{\beta x}$. How do we estimate β for the Cox model?

$\rightarrow$ Case 1: No censoring, no ties.

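Use the rank likelihood; i.e., as under the exponential PH model, the log rank likelihood is still given by

$l_{rank} (\beta ) = \sum\nolimits_{j=1}^{n-1} \left\{ \beta x_{(j)} -\ln\left(\sum\nolimits_{i=j}^n e^{\beta x_{(i)}} \right)\right\}$ , where $x_{(j)}$ is the covariate of the subject with the $j$-th smallest failure time.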
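The log rank likelihood above can be evaluated directly. The following is a minimal sketch, assuming a single covariate, no censoring and no ties; the function name `log_rank_likelihood` and the toy data are hypothetical and are not part of the original notes.

```python
import math

def log_rank_likelihood(beta, times, x):
    """Log rank likelihood for uncensored, untied data with one covariate."""
    # Sort subjects by failure time so that position j holds x_(j).
    order = sorted(range(len(times)), key=lambda i: times[i])
    xs = [x[i] for i in order]
    total = 0.0
    # The j = n term is identically zero, so the sum stops at n - 1.
    for j in range(len(xs) - 1):
        risk_sum = sum(math.exp(beta * xi) for xi in xs[j:])
        total += beta * xs[j] - math.log(risk_sum)
    return total

# Hypothetical toy data: four failure times and a binary covariate.
times = [2.0, 5.0, 3.0, 8.0]
x = [1, 0, 1, 0]
print(log_rank_likelihood(0.0, times, x))
```

At $\beta = 0$ every ordering of the $n$ subjects is equally likely, so the value reduces to $-\ln(n!)$, which gives a quick sanity check.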

$\rightarrow$ Case 2: Right-censored data, only 1 death per timepoint.

The rank likelihood method can be generalized by summing over all possible rank vectors that are compatible with the observed censored data.

The generalized rank likelihood:

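$lik_{rank}(\beta )=\prod_{j=1}^k \frac{e^{\beta x_{(j)}}}{\sum _{i:t_i \geq t_{(j)}}e^{\beta x_i}} =\prod_{j=1}^k \frac{e^{\beta x_{(j)}}}{\sum _{i\in R(t_{(j)})}e^{\beta x_i}}$ , where $t_{(1)}<\cdots<t_{(k)}$ are the $k$ observed death times, $x_{(j)}$ is the covariate of the subject who dies at $t_{(j)}$, and $R(t_{(j)})$ is the risk set at $t_{(j)}$, i.e., the set of subjects still under observation just before $t_{(j)}$.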

There is another way (the partial likelihood approach) to derive the same expression with the advantage that it shows explicitly how the unknown baseline hazard function $h_0(t)$ is being eliminated from the likelihood. Again we will illustrate using the hypothetical example:

The general form of the partial likelihood, if all $d_j = 1$ , is

$lik_{partial}(\beta )=\prod_{j=1}^k \frac{e^{\beta x_{(j)}}}{\sum _{i\in R(t_{(j)})}e^{\beta x_i}} =lik_{rank}(\beta )$ .
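As a rough illustration of how censored subjects enter this expression, here is a minimal sketch of the log partial likelihood for one covariate with right censoring and at most one death per time point. The function name, the `events` indicator (1 for death, 0 for censoring) and the toy data are assumptions made for the example.

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Cox log partial likelihood, one covariate, no tied death times."""
    n = len(times)
    total = 0.0
    for j in range(n):
        if not events[j]:
            continue  # censored subjects contribute only through risk sets
        # Risk set R(t_j): everyone still under observation at t_j.
        risk_sum = sum(math.exp(beta * x[i])
                       for i in range(n) if times[i] >= times[j])
        total += beta * x[j] - math.log(risk_sum)
    return total

# Hypothetical toy data: the second subject is censored at time 3.
times = [2.0, 3.0, 5.0, 8.0]
events = [1, 0, 1, 1]
x = [1, 1, 0, 0]
print(log_partial_likelihood(0.0, times, events, x))
```

Note that the baseline hazard $h_0(t)$ never appears in the code: only the covariates of the subjects in each risk set are needed.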

$\rightarrow$ Case 3: Right-censored data, $d_j > 1$ for some j.

How are ties handled in the partial likelihood? There are three methods.

$\odot$ Method 1 (Breslow’s method): Let $D_j$ be the death set at time $t_{(j)}$, i.e., the set of $d_j$ persons who die at time $t_{(j)}$, and let $S_j=\sum\nolimits_{l\in D_j} x_l$ be the sum of $x_l$ over the death set $D_j$.

$lik_{partial}(\beta )=\prod_{j=1}^k \prod_{l \in D_j} \frac{e^{\beta x_l} }{\sum\nolimits_{i\in R(t_{(j)})} e^{\beta x_i} } =\prod_{j=1}^k \frac{e^{\beta S_j} }{\left(\sum\nolimits_{i\in R(t_{(j)})} e^{\beta x_i} \right)^{d_j}}$
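Breslow's formula can be sketched in a few lines. This is an illustrative implementation for a single covariate only; the function name and the toy data with tied death times are hypothetical.

```python
import math
from collections import defaultdict

def breslow_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood with Breslow's treatment of tied death times."""
    n = len(times)
    # Group the indices of subjects who die by their death time: D_j.
    deaths = defaultdict(list)
    for i in range(n):
        if events[i]:
            deaths[times[i]].append(i)
    total = 0.0
    for t_j, death_set in deaths.items():
        d_j = len(death_set)
        s_j = sum(x[l] for l in death_set)
        risk_sum = sum(math.exp(beta * x[i])
                       for i in range(n) if times[i] >= t_j)
        # Every tied death shares the same full risk-set denominator.
        total += beta * s_j - d_j * math.log(risk_sum)
    return total

# Hypothetical toy data: two deaths at time 1, two deaths at time 3.
times = [1, 1, 2, 3, 3]
events = [1, 1, 0, 1, 1]
x = [1, 0, 1, 0, 1]
print(breslow_log_partial_likelihood(0.0, times, events, x))
```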

$\otimes$ Method 2 (Efron’s method): By treating the $d_j$ tied failure times as $d_j$ distinct but infinitesimally close failure times, Efron’s approximation differs from Breslow’s method in how it treats the denominator. Efron’s method, the default in R's 'coxph', is more computationally intensive than Breslow’s.

$lik_{partial} (\beta )=\prod\nolimits_{j=1}^k e^{\beta S_j} [A \times B\times \cdot \cdot \cdot \times C]^{-1}$ , where, for each $j$, the bracket contains $d_j$ factors, the first, second and last of which are

$A= \sum\nolimits_{i \in R(t_{(j)})} e^{\beta x_i}$ , $B=\sum\nolimits_{i \in R(t_{(j)})} e^{\beta x_i}-\frac{1}{d_j} \Sigma _{i\in D_j}e^{\beta x_i}$ , $C=\sum\nolimits_{i \in R(t_{(j)})} e^{\beta x_i}-\frac{d_j-1}{d_j} \Sigma _{i\in D_j}e^{\beta x_i}$ .
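To make the sequence of factors $A, B, \ldots, C$ concrete, the sketch below subtracts a fraction $m/d_j$ of the death-set sum for $m = 0, 1, \ldots, d_j - 1$. It assumes a single covariate; the function name and toy data are hypothetical.

```python
import math
from collections import defaultdict

def efron_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood with Efron's approximation for tied death times."""
    n = len(times)
    deaths = defaultdict(list)
    for i in range(n):
        if events[i]:
            deaths[times[i]].append(i)
    total = 0.0
    for t_j, death_set in deaths.items():
        d_j = len(death_set)
        s_j = sum(x[l] for l in death_set)
        risk_sum = sum(math.exp(beta * x[i])
                       for i in range(n) if times[i] >= t_j)
        death_sum = sum(math.exp(beta * x[l]) for l in death_set)
        total += beta * s_j
        # Factors A, B, ..., C: remove m/d_j of the death-set sum, m = 0..d_j-1.
        for m in range(d_j):
            total -= math.log(risk_sum - (m / d_j) * death_sum)
    return total

# Hypothetical toy data: two deaths at time 1, two deaths at time 3.
times = [1, 1, 2, 3, 3]
events = [1, 1, 0, 1, 1]
x = [1, 0, 1, 0, 1]
print(efron_log_partial_likelihood(0.0, times, events, x))
```

When every $d_j = 1$ only the factor $A$ remains, so the result coincides with Breslow's method and with the untied partial likelihood.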

$\oplus$ Method 3 (Exact method): Method 3 is the most computationally intensive. If the number of ties is not excessive, the three methods should give similar results.

$lik_{partial}(\beta )=\prod_{j=1}^k \left\{ \prod\nolimits_{i \in D_j} e^{\beta x_i} \right\} \cdot \left\{ \sum _{Q\subset R(t_{(j)});\vert Q \vert =d_j } \prod _{i\in Q}e^{\beta x_i}\right\} ^{-1}=\prod_{j=1}^k \left\{ e^{\beta S_j} \right\} \cdot \left\{ \sum _{Q\subset R(t_{(j)});\vert Q \vert =d_j } \prod _{i\in Q}e^{\beta x_i}\right\} ^{-1}$
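The sum over all subsets $Q$ of size $d_j$ is what makes the exact method expensive. A minimal sketch, assuming a single covariate and enumerating the subsets by brute force with `itertools.combinations`, might look as follows; the function name and toy data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def exact_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood with the exact treatment of tied death times."""
    n = len(times)
    deaths = defaultdict(list)
    for i in range(n):
        if events[i]:
            deaths[times[i]].append(i)
    total = 0.0
    for t_j, death_set in deaths.items():
        d_j = len(death_set)
        s_j = sum(x[l] for l in death_set)
        risk_set = [i for i in range(n) if times[i] >= t_j]
        # Sum over every subset Q of the risk set with |Q| = d_j.
        denom = sum(math.exp(beta * sum(x[i] for i in q))
                    for q in combinations(risk_set, d_j))
        total += beta * s_j - math.log(denom)
    return total

# Hypothetical toy data: two deaths at time 1, two deaths at time 3.
times = [1, 1, 2, 3, 3]
events = [1, 1, 0, 1, 1]
x = [1, 0, 1, 0, 1]
print(exact_log_partial_likelihood(0.0, times, events, x))
```

The number of subsets at each death time is $\binom{|R(t_{(j)})|}{d_j}$, so brute-force enumeration is only practical for small risk sets or few ties.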

Department of Statistics and Applied Probability: stayz@nus.edu.sg

Laryngeal cancer example

Hypothesis testing:

(1) Overall test, H: $\beta _1=\beta _2=\beta _3=\beta _4=0$ :

LRT = 18.3 on 4 df (p = 0.001).

(2) Testing the effect of age only, i.e. H: $\beta _4=0$ . Wald test: $Z=\frac{\hat\beta_4 -0 }{SE(\hat\beta_4 )} =1.33$ ; LRT = 2{-187.7074 - (-188.6208)} = 1.8268 on 1 df (p = 0.1765).

(3) Testing the effect of stage of cancer only, i.e. H: $\beta _1=\beta _2=\beta _3=0$ :

LRT = 2{-187.7074 - (-195.5478)} = 15.68 on 3 df (p = 0.0013).
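The p-values quoted for these tests can be reproduced from the reported log-likelihoods. The sketch below uses only the Python standard library; the helper `chi2_sf` and the constant names are hypothetical, and the closed-form chi-square tail used here is valid for integer degrees of freedom only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df = 2m+1: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{m} (x/2)^(i-1/2) / Gamma(i+1/2)
    m = df // 2
    terms = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms

# Log partial likelihoods as reported in the text.
LL_FULL = -187.7074        # stage + age
LL_STAGE_ONLY = -188.6208  # age dropped
LL_AGE_ONLY = -195.5478    # stage dropped

lrt_age = 2 * (LL_FULL - LL_STAGE_ONLY)    # test (2), 1 df
lrt_stage = 2 * (LL_FULL - LL_AGE_ONLY)    # test (3), 3 df
wald_p_age = math.erfc(1.33 / math.sqrt(2))  # two-sided p for Z = 1.33

print(round(chi2_sf(18.3, 4), 4))          # test (1), 4 df
print(round(lrt_age, 4), round(chi2_sf(lrt_age, 1), 4))
print(round(lrt_stage, 2), round(chi2_sf(lrt_stage, 3), 4))
print(round(wald_p_age, 2))
```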

How to estimate baseline survival function under Cox model?

Interaction between a factor and a continuous variable:

Recall (p.23) that we entered stage of cancer (II, III, IV vs I) and age at diagnosis into the Cox model and saw that the effect of age on survival was not statistically significant (p=0.18). K&M added the interaction terms between stage of cancer and age to the model (read Tables 8.3 and 8.4 of K&M). The only significant interaction term is between Z1 (stage II vs I) and age. This means that the hazard ratio, or relative risk of dying, for a stage II patient relative to a stage I patient diagnosed at the same age depends on that age.

Interactions between continuous variables can be studied by adding product terms to the Cox model.

Using the estimates reported in Table 8.4 of K&M,

$\frac{h(t|\text{stage II},\text{age})}{h(t|\text{stage I},\text{age})} =\frac{h(t|z_1=1,z_2=0,z_3=0,z_4=\text{age},z_5=z_1 \cdot \text{age})}{h(t|z_1=0,z_2=0,z_3=0,z_4=\text{age},z_5=z_1 \cdot \text{age})} =e^{\beta _1+\beta _5 \cdot \text{age}}=e^{-7.382+0.1117 \cdot \text{age}}$

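$\frac{h(t|\text{stage II},\text{age}=60)}{h(t|\text{stage I},\text{age}=60)}=e^{-7.382+0.1117(60)}=e^{-0.680}\approx 0.507$ , whereas at age 76 the ratio is $e^{-7.382+0.1117(76)}=e^{1.1072}\approx 3.026$ .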
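Because the hazard ratio changes with age, it can help to tabulate it. The sketch below uses the two coefficients quoted from Table 8.4 of K&M; the function name and the grid of ages are hypothetical choices made for illustration.

```python
import math

BETA_1 = -7.382   # stage II vs I main effect, as quoted in the text
BETA_5 = 0.1117   # stage II x age interaction, as quoted in the text

def hazard_ratio_stage2_vs_stage1(age):
    """Hazard ratio of stage II relative to stage I at a given age."""
    return math.exp(BETA_1 + BETA_5 * age)

# Age at which the two stages have equal hazard (ratio = 1).
crossover_age = -BETA_1 / BETA_5

for age in (50, 60, 70, 76):
    print(age, round(hazard_ratio_stage2_vs_stage1(age), 3))
print(round(crossover_age, 1))
```

Under these estimates the ratio is below 1 for patients diagnosed before roughly age 66 and above 1 afterwards, which is what a significant stage II by age interaction means in practice.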
