Windsor Diary 17

Cox’s semiparametric proportional hazards model: larynx

Cox (1972) proposed the following model: h(t|x)=e^{\beta x} h_0(t), where h_0(t)=h(t|0) is an arbitrary baseline hazard function. 

Since e^{\beta x}=\frac{h(t|x)}{h(t|0)} for all values of x, the hazard functions for different values of x are proportional to one another, and e^\beta = \frac{h(t|x+1)}{h(t|x)} is the hazard ratio corresponding to a unit increase in the value of x.

The Cox model is semiparametric because h_0(t) is arbitrary, but the effect of x depends on the parameter \beta through e^{\beta x}.

How to estimate \beta for the Cox model?

\rightarrow  Case 1: No censoring, no ties.

Use the rank likelihood, i.e., as under the exponential PH model, the log rank likelihood is still given by

l_{rank}(\beta) = \sum_{j=1}^{n-1} \left\{ \beta x_{(j)} - \ln\left( \sum_{i=j}^{n} e^{\beta x_{(i)}} \right) \right\}.
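The log rank likelihood above can be evaluated directly once the covariates are ordered by failure time. The following is a minimal Python sketch, not code from the original notes; the function name `log_rank_likelihood` and the four-subject toy data are hypothetical and only serve to illustrate the formula.

```python
import math

def log_rank_likelihood(beta, times, x):
    """Log rank likelihood for uncensored data without ties (Case 1)."""
    # Order the covariates by increasing failure time, so that
    # x_sorted[j] plays the role of x_(j+1) in the formula.
    order = sorted(range(len(times)), key=lambda i: times[i])
    x_sorted = [x[i] for i in order]
    n = len(x_sorted)
    total = 0.0
    for j in range(n - 1):  # the j = n term is identically zero
        risk_sum = sum(math.exp(beta * x_sorted[i]) for i in range(j, n))
        total += beta * x_sorted[j] - math.log(risk_sum)
    return total

# Hypothetical toy data: four subjects, one binary covariate.
times = [2.0, 5.0, 3.0, 8.0]
x = [1, 0, 1, 0]

for beta in (0.0, 0.5, 1.0):
    print(beta, log_rank_likelihood(beta, times, x))
```

At \beta = 0 every ordering of the n failures is equally likely, so the value reduces to -\ln(n!), which is a convenient sanity check.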

\rightarrow  Case 2: Right-censored data, only 1 death per timepoint.

The rank likelihood method can be generalized by summing over all possible rank vectors that are compatible with the observed censored data.

The generalized rank likelihood:

lik_{rank}(\beta) = \prod_{j=1}^k \frac{e^{\beta x_{(j)}}}{\sum_{i: t_i \geq t_{(j)}} e^{\beta x_i}} = \prod_{j=1}^k \frac{e^{\beta x_{(j)}}}{\sum_{i \in R(t_{(j)})} e^{\beta x_i}},

where R(t_{(j)}) = \{ i : t_i \geq t_{(j)} \} is the risk set at the j-th ordered death time t_{(j)}.

There is another way (the partial likelihood approach) to derive the same expression with the advantage that it shows explicitly how the unknown baseline hazard function h_0(t) is being eliminated from the likelihood. Again we will illustrate using the hypothetical example:

The general form of the partial likelihood, if all d_j = 1, is

lik_{partial}(\beta) = \prod_{j=1}^k \frac{e^{\beta x_{(j)}}}{\sum_{i \in R(t_{(j)})} e^{\beta x_i}} = lik_{rank}(\beta).
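To make the role of the risk set concrete, here is a small Python sketch of the log partial likelihood for right-censored data with at most one death per time point. It is an illustration under stated assumptions: the function name, the 0/1 `events` coding (1 = death, 0 = censored), and the toy data are all hypothetical.

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Cox log partial likelihood, right-censored data, no tied deaths."""
    n = len(times)
    total = 0.0
    for j in range(n):
        if not events[j]:
            continue  # censored subjects enter only through the risk sets
        # Risk set R(t_j): everyone still under observation at time t_j.
        risk_sum = sum(math.exp(beta * x[i]) for i in range(n)
                       if times[i] >= times[j])
        total += beta * x[j] - math.log(risk_sum)
    return total

# Hypothetical toy data: the second subject is censored at t = 3.
times = [2.0, 3.0, 5.0, 8.0]
events = [1, 0, 1, 1]
x = [1, 1, 0, 0]

for beta in (0.0, 0.5, 1.0):
    print(beta, log_partial_likelihood(beta, times, events, x))
```

Note that the baseline hazard h_0(t) never appears in the computation: only the covariates of the subjects at risk at each death time are needed.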

\rightarrow  Case 3: Data right-censored, d_j > 1 for some j.

How to handle ties in partial likelihood? There are 3 methods.

\odot Method 1 (Breslow’s method): Let D_j be the death set at time t_{(j)}, i.e., the set of d_j persons who die at time t_{(j)}, and let S_j=\sum_{l\in D_j} x_l be the sum of x_l over the death set D_j. Then

lik_{partial}(\beta) = \prod_{j=1}^k \prod_{l \in D_j} \frac{e^{\beta x_l}}{\sum_{i \in R(t_{(j)})} e^{\beta x_i}} = \prod_{j=1}^k \frac{e^{\beta S_j}}{\left( \sum_{i \in R(t_{(j)})} e^{\beta x_i} \right)^{d_j}}.
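A minimal Python sketch of Breslow’s log partial likelihood follows. It is only an illustration of the formula above; the function name and the toy data with two tied death times are hypothetical.

```python
import math
from collections import defaultdict

def breslow_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood with Breslow's treatment of tied deaths."""
    n = len(times)
    # Death sets D_j: indices of the subjects who die at each distinct time.
    death_sets = defaultdict(list)
    for i in range(n):
        if events[i]:
            death_sets[times[i]].append(i)
    total = 0.0
    for t_j, D_j in death_sets.items():
        S_j = sum(x[l] for l in D_j)
        risk_sum = sum(math.exp(beta * x[i]) for i in range(n)
                       if times[i] >= t_j)
        # Every tied death shares the same full risk-set denominator.
        total += beta * S_j - len(D_j) * math.log(risk_sum)
    return total

# Hypothetical toy data: two deaths at t = 1 and two deaths at t = 3.
times = [1, 1, 2, 3, 3, 4]
events = [1, 1, 0, 1, 1, 1]
x = [1, 0, 1, 0, 1, 0]

for beta in (0.0, 0.5, 1.0):
    print(beta, breslow_log_partial_likelihood(beta, times, events, x))
```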

\otimes  Method 2 (Efron’s method): By thinking of the d_j tied failure times as d_j distinct but infinitesimally close failure times, Efron’s approximation differs from Breslow’s method in how it treats the denominator (Efron’s approximation is the default tie-handling method in R’s coxph). Efron’s method is more computationally intensive than Breslow’s.

lik_{partial}(\beta) = \prod_{j=1}^k e^{\beta S_j} [A \times B \times \cdots \times C]^{-1}, where, for each j,

A = \sum_{i \in R(t_{(j)})} e^{\beta x_i},

B = \sum_{i \in R(t_{(j)})} e^{\beta x_i} - \frac{1}{d_j} \sum_{i \in D_j} e^{\beta x_i},

C = \sum_{i \in R(t_{(j)})} e^{\beta x_i} - \frac{d_j-1}{d_j} \sum_{i \in D_j} e^{\beta x_i}.
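The factors A, B, ..., C can be generated in a loop: the r-th factor subtracts the fraction r/d_j of the death-set sum from the risk-set sum, for r = 0, 1, ..., d_j - 1. The Python sketch below illustrates this; the function name and toy data are hypothetical.

```python
import math
from collections import defaultdict

def efron_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood with Efron's approximation for tied deaths."""
    n = len(times)
    death_sets = defaultdict(list)
    for i in range(n):
        if events[i]:
            death_sets[times[i]].append(i)
    total = 0.0
    for t_j, D_j in death_sets.items():
        d_j = len(D_j)
        S_j = sum(x[l] for l in D_j)
        risk_sum = sum(math.exp(beta * x[i]) for i in range(n)
                       if times[i] >= t_j)
        death_sum = sum(math.exp(beta * x[i]) for i in D_j)
        total += beta * S_j
        # r = 0 gives A, r = 1 gives B, ..., r = d_j - 1 gives C.
        for r in range(d_j):
            total -= math.log(risk_sum - (r / d_j) * death_sum)
    return total

# Hypothetical toy data: two deaths at t = 1 and two deaths at t = 3.
times = [1, 1, 2, 3, 3, 4]
events = [1, 1, 0, 1, 1, 1]
x = [1, 0, 1, 0, 1, 0]

for beta in (0.0, 0.5, 1.0):
    print(beta, efron_log_partial_likelihood(beta, times, events, x))
```

When every d_j = 1 the inner loop has a single term (r = 0), so the result coincides with the untied partial likelihood.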

\oplus  Method 3 (Exact method): This method is the most computationally intensive of the three. If the number of ties is not excessive, the 3 methods should give similar results.

lik_{partial}(\beta) = \prod_{j=1}^k \left\{ \prod_{i \in D_j} e^{\beta x_i} \right\} \cdot \left\{ \sum_{Q \subset R(t_{(j)}); |Q| = d_j} \prod_{i \in Q} e^{\beta x_i} \right\}^{-1} = \prod_{j=1}^k \left\{ e^{\beta S_j} \right\} \cdot \left\{ \sum_{Q \subset R(t_{(j)}); |Q| = d_j} \prod_{i \in Q} e^{\beta x_i} \right\}^{-1}
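The denominator of the exact method sums over every subset Q of size d_j drawn from the risk set, which is why the cost grows quickly with the number of ties. A brute-force Python sketch (hypothetical function name and toy data, suitable only for small examples) makes this explicit:

```python
import math
from itertools import combinations
from collections import defaultdict

def exact_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood with the exact treatment of tied deaths."""
    n = len(times)
    death_sets = defaultdict(list)
    for i in range(n):
        if events[i]:
            death_sets[times[i]].append(i)
    total = 0.0
    for t_j, D_j in death_sets.items():
        d_j = len(D_j)
        S_j = sum(x[l] for l in D_j)
        risk_set = [i for i in range(n) if times[i] >= t_j]
        # Sum over all subsets Q of the risk set with |Q| = d_j;
        # the product of e^{beta x_i} over Q equals e^{beta * sum of x_i}.
        denom = sum(math.exp(beta * sum(x[i] for i in Q))
                    for Q in combinations(risk_set, d_j))
        total += beta * S_j - math.log(denom)
    return total

# Hypothetical toy data: two deaths at t = 1 and two deaths at t = 3.
times = [1, 1, 2, 3, 3, 4]
events = [1, 1, 0, 1, 1, 1]
x = [1, 0, 1, 0, 1, 0]

for beta in (0.0, 0.5, 1.0):
    print(beta, exact_log_partial_likelihood(beta, times, events, x))
```

With a risk set of size m and d_j tied deaths, the inner sum has \binom{m}{d_j} terms, so this direct enumeration is only practical when ties are few.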

Department of Statistics and Applied Probability: stayz@nus.edu.sg

Laryngeal cancer example

Hypothesis testing:

(1) Overall test, H: \beta_1=\beta_2=\beta_3=\beta_4=0

LRT = 18.3 on 4 df (p=0.001)

(2) Testing the effect of age only, i.e., H: \beta_4=0

Wald test: Z=\frac{\hat\beta_4 - 0}{SE(\hat\beta_4)} = 1.33; LRT = 2\{-187.7074-(-188.6208)\} = 1.8268 on 1 df (p=0.1765)

(3) Testing the effect of stage of cancer only, i.e., H: \beta_1=\beta_2=\beta_3=0

LRT = 2\{-187.7074-(-195.5478)\} = 15.68 on 3 df (p=0.0013)
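The likelihood ratio statistics and p-values above can be reproduced from the reported log partial likelihoods. The sketch below uses only the Python standard library; the helper names `chi2_sf` and `lrt` are hypothetical, and `chi2_sf` relies on the closed-form chi-square tail for integer degrees of freedom rather than a statistics package.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    half = stat / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # tail for 1 df
    else:
        q, k = math.exp(-half), 2              # tail for 2 df
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def lrt(loglik_full, loglik_reduced, df):
    """Likelihood ratio statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)

# Log partial likelihoods as reported in the text.
stat_age, p_age = lrt(-187.7074, -188.6208, df=1)
stat_stage, p_stage = lrt(-187.7074, -195.5478, df=3)
p_overall = chi2_sf(18.3, 4)

print("age:    ", round(stat_age, 4), round(p_age, 4))
print("stage:  ", round(stat_stage, 2), round(p_stage, 4))
print("overall:", 18.3, round(p_overall, 4))
```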

How to estimate baseline survival function under Cox model?

Interaction between a factor and a continuous variable:

Recall from p. 23: we entered stage of cancer (II, III, IV vs. I) and age at diagnosis into the Cox model and saw that the effect of age on survival was not statistically significant (p=0.18). K&M added interaction terms between stage of cancer and age to the model (see Tables 8.3 and 8.4 of K&M). The only significant interaction term is the one between Z_1 (stage II vs. I) and age. This means that the hazard ratio (relative risk) of dying for a stage II patient relative to a stage I patient diagnosed at the same age depends on that age.

Interactions between continuous variables can be studied by adding product terms to the Cox model.

Using the estimates reported in Table 8.4 of K&M,

\frac{h(t|\text{stage II}, age)}{h(t|\text{stage I}, age)} = \frac{h(t|z_1=1,z_2=0,z_3=0,z_4=age,z_5=z_1 \cdot age=age)}{h(t|z_1=0,z_2=0,z_3=0,z_4=age,z_5=z_1 \cdot age=0)} = e^{\beta_1+\beta_5 \, age} = e^{-7.382+0.1117 \, age}

\frac{h(t|\text{stage II}, age=76)}{h(t|\text{stage I}, age=76)} = e^{-7.382+0.1117(76)} = e^{1.1072} = 3.026.
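Because the log hazard ratio -7.382 + 0.1117 age is linear in age, the stage II vs. stage I comparison changes direction at some age. The short Python sketch below evaluates the ratio at a few ages using the two estimates quoted above; the function and constant names are hypothetical, and the ages are chosen only for illustration.

```python
import math

BETA_1 = -7.382   # stage II vs. I main effect (estimate quoted above)
BETA_5 = 0.1117   # stage II x age interaction (estimate quoted above)

def hazard_ratio_stage2_vs_stage1(age):
    """Hazard ratio of stage II relative to stage I at a given age."""
    return math.exp(BETA_1 + BETA_5 * age)

# Age at which the log hazard ratio is zero, i.e. the ratio equals 1.
crossover_age = -BETA_1 / BETA_5

for age in (50, 60, 70, 76, 80):
    print(age, round(hazard_ratio_stage2_vs_stage1(age), 3))
print("ratio equals 1 at age", round(crossover_age, 2))
```

Below the crossover age (about 66 years with these estimates) the fitted ratio is less than 1, and above it the ratio exceeds 1, which is what "the hazard ratio depends on age" means in practice.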
