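Generalized linear models with PROC GENMOD. In SAS, the dist= and link= options of the MODEL statement specify the response distribution and the link function, respectively.
Common dist values and their corresponding link functions, as listed in the SAS documentation: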
Traditional Linear Model
- response variable: a continuous variable
- distribution: normal
- link function: identity, g(μ)=μ
(Equivalent to the general linear model; in this case the MLE equals the OLS estimate.)
Logistic Regression
- response variable: a proportion
- distribution: binomial
- link function: logit, g(μ)=log(μ/(1-μ))
Poisson Regression in Log-Linear Model
- response variable: a count
- distribution: Poisson
- link function: log, g(μ)=log(μ)
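Beyond these standard pairings, a distribution can also be combined with a different link function.
Models and rate estimates under different dist/link combinations:
There are several options for computing the adjusted OR (odds ratio), RR (risk ratio), and RD (risk difference).
---GENMOD models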
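- (a) Identity Link model:
The estimated rate is π itself. Because the link is the identity, the estimated π may fall outside (0,1), which gives implausible estimates of the target rate.
- (b) Log Link model:
Estimated rates greater than 1 may occur.
- (c) Logit Link model:
Estimated rates always lie in (0,1), but the rate difference cannot be obtained by simple subtraction (that is, read off a β parameter), because the estimated rate involves the intercept and the covariates, which do not cancel.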
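Written out, the three links give the following rate estimates. This is a sketch in the notation used by the code comments in this section (L'β for the linear predictor); the symbols x, β_1 and η are introduced here only for illustration, with x a binary exposure, β_1 its coefficient, and η the remaining part of the linear predictor (intercept plus covariates).

```latex
\begin{aligned}
\text{identity:}\quad & \pi = L'\beta \\
\text{log:}\quad & \log \pi = L'\beta \;\Rightarrow\; \pi = \exp(L'\beta) \\
\text{logit:}\quad & \log\frac{\pi}{1-\pi} = L'\beta \;\Rightarrow\; \pi = \frac{\exp(L'\beta)}{1+\exp(L'\beta)} \\[4pt]
\text{logit RD:}\quad & \pi(x=1) - \pi(x=0) = \frac{e^{\eta+\beta_1}}{1+e^{\eta+\beta_1}} - \frac{e^{\eta}}{1+e^{\eta}}
\end{aligned}
```

The last line still depends on η, which is why the risk difference under the logit link changes with the covariate values and is not a single model parameter.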
- OR:
proc genmod descending;
model death=receptor stage2 stage3/dist=bin link=logit;
estimate 'OR receptor low vs. high' receptor 1/exp;
estimate 'OR stage2 vs stage1' stage2 1/exp;
estimate 'OR stage 3 vs stage1' stage3 1/exp;
/* Model: log(p/(1-p)) = L'β */
/* The estimated difference on the log-odds scale is back-transformed with exp */
/* EXP: requests that exp(L'β), its standard error, and its confidence limits be computed (SAS Help) */
run;
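The reason the EXP option yields an odds ratio can be shown in one step. The sketch below assumes a binary covariate x with coefficient β_1 and all other covariates held fixed; these symbols are illustrative and not part of the original program.

```latex
\log\frac{p(x=1)}{1-p(x=1)} - \log\frac{p(x=0)}{1-p(x=0)} = \beta_1
\quad\Rightarrow\quad
\mathrm{OR} = \frac{p(x=1)/\{1-p(x=1)\}}{p(x=0)/\{1-p(x=0)\}} = \exp(\beta_1)
```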
- RR:
proc genmod descending;
model death=receptor stage2 stage3/dist=bin link=log;
estimate 'RR receptor low vs. high' receptor 1/exp;
estimate 'RR stage2 vs stage1' stage2 1/exp;
estimate 'RR stage 3 vs stage1' stage3 1/exp;
/* Model: log(p) = L'β */
/* The linear parameter estimate β is log(RR) */
/* The estimated difference on the log scale is back-transformed with exp */
/* EXP: requests that exp(L'β), its standard error, and its confidence limits be computed (SAS Help) */
run;
- RD:
proc genmod descending;
model death=receptor stage2 stage3/dist=bin link=identity;
estimate 'RD receptor low vs. high' receptor 1;
estimate 'RD stage2 vs stage1' stage2 1;
estimate 'RD stage 3 vs stage1' stage3 1;
/* Model: p = L'β */
/* Here the linear parameter estimate β is itself the RD estimate */
/* (no EXP option, because no back-transformation is needed) */
run;
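For the log-link and identity-link programs, the interpretation of the coefficient follows directly from the model equation. As a sketch, assume a binary covariate x with coefficient β_1 and the other covariates held fixed (illustrative symbols only):

```latex
\begin{aligned}
\text{log link:}\quad & \log p(x=1) - \log p(x=0) = \beta_1
  \;\Rightarrow\; \mathrm{RR} = \frac{p(x=1)}{p(x=0)} = \exp(\beta_1) \\
\text{identity link:}\quad & p(x=1) - p(x=0) = \beta_1
  \;\Rightarrow\; \mathrm{RD} = \beta_1
\end{aligned}
```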
Reference: Binomial Regression in GLIM: Estimating Risk Ratios and Risk Differences.
For approaches when the model fails to converge, see "Model choices to obtain adjusted risk difference estimates from a binomial regression model with convergence problems: An assessment of methods of adjusted risk difference estimation".
When the model fails to converge, dist=bin can be replaced with dist=poisson.
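One common way to make this substitution is the approach often called modified Poisson regression, which pairs the Poisson model with robust (sandwich) standard errors through a REPEATED statement, because the Poisson variance is misspecified for a binary outcome. The sketch below is only an illustration: the data set name temp and the unique subject identifier id are assumptions not found in the original programs, and death is assumed to be numeric 0/1.

```sas
/* Sketch only: data set name (temp) and subject identifier (id) are assumed. */
/* death is assumed to be coded 0/1, so the Poisson mean is the event rate.   */
proc genmod data=temp;
  class id;
  model death = receptor stage2 stage3 / dist=poisson link=log;
  repeated subject=id / type=ind;   /* one record per subject: robust SEs */
  estimate 'RR receptor low vs. high' receptor 1 / exp;
  estimate 'RR stage2 vs stage1' stage2 1 / exp;
  estimate 'RR stage 3 vs stage1' stage3 1 / exp;
run;
```

The DESCENDING option is dropped here because it only controls the ordering of response levels for binomial and multinomial models.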
---Logistic models
Data: SAS help, logistic examples02
The OR is estimated directly from the logistic model.
proc logistic data=temp;
class treatment (ref='0')/param=ref;
model pain (event='Yes')=gender age treatment;
run;
and
proc genmod data=temp;
class treatment (ref='0')/param=ref;
model pain1=gender age treatment/dist=bin link=logit;
run;
give essentially the same results.
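The RD estimate is the average, over all individuals, of the difference between the predicted rates at x1=1 and at x1=0. If x1 is the group variable (1=treatment group, 0=control group), the RD is the rate with all subjects treated as the treatment group minus the rate with all subjects treated as the control group, regardless of the group originally assigned.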
Each subject's predicted probabilities can be obtained from the logistic model; the mean of their differences is the RD.
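In formula form, this averaging can be sketched as follows. The notation is introduced here for illustration only: n is the number of subjects, z_i the remaining covariates of subject i, and the hatted quantities are the fitted logistic coefficients.

```latex
\widehat{\mathrm{RD}} = \frac{1}{n}\sum_{i=1}^{n}\bigl[\hat p_i(x_1=1) - \hat p_i(x_1=0)\bigr],
\qquad
\hat p_i(x_1=a) = \frac{\exp(\hat\beta_0 + \hat\beta_1 a + z_i'\hat\gamma)}{1+\exp(\hat\beta_0 + \hat\beta_1 a + z_i'\hat\gamma)}
```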
e.g.
proc logistic data=temp;
class treatment (ref='0')/param=ref;
model pain (event='Yes')=gender age treatment;
output out=pred_p p=p;
run;
This is the logistic regression program used for the OR, with the predicted probabilities added to the output.
subject=1 -> gender=0,Age=68,treatment=0 -> p=0.578
subject=2 -> gender=0,Age=67,treatment=0 -> p=0.5355
... ...
The RD is computed from the predicted values of all subjects. Fitting the same data with GENMOD runs into non-convergence.
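One possible way to obtain both predicted values for every subject is to append two copies of the data with the treatment forced to 1 and to 0 and the response set to missing. PROC LOGISTIC does not use such rows for fitting, but the OUTPUT statement still computes predicted probabilities for them. This is a sketch under assumptions: treatment is taken to be numeric 0/1, and the names stacked, cf_arm, pred_cf, arm_means and mean_p are hypothetical. It gives the point estimate only, not its standard error.

```sas
/* Sketch only: assumes treatment is numeric 0/1; new names are illustrative. */
data stacked;
  set temp (in=orig) temp (in=cf1) temp (in=cf0);
  if orig then cf_arm = .;       /* original rows: used to fit the model      */
  else do;
    cf_arm = cf1;                /* 1 = all treated copy, 0 = all control copy */
    treatment = cf_arm;          /* force the counterfactual group            */
    call missing(pain);          /* missing response: scored but not fitted   */
  end;
run;

proc logistic data=stacked;
  class treatment (ref='0') / param=ref;
  model pain (event='Yes') = gender age treatment;
  output out=pred_cf p=p;
run;

/* Mean predicted probability in each counterfactual copy */
proc means data=pred_cf noprint nway;
  where cf_arm ne .;
  class cf_arm;
  var p;
  output out=arm_means mean=mean_p;
run;

/* RD = mean rate if all treated minus mean rate if all control */
proc sql;
  select t.mean_p - c.mean_p as RD
    from arm_means (where=(cf_arm=1)) as t,
         arm_means (where=(cf_arm=0)) as c;
quit;
```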
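Standard error of the RD: with the identity link it is obtained directly from the model. With the log and logit links it has to be computed with the delta method, which requires SAS/IML.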
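The delta method calculation for the logit link can be sketched as follows. The notation is an assumption made for illustration: g(β) is the averaged RD written as a function of the coefficients, Σ̂ is the estimated covariance matrix of β̂, x_{ia} is the covariate vector of subject i with the group variable set to a, and p_{ia} is the corresponding fitted probability. Covariates are treated as fixed.

```latex
\begin{aligned}
\widehat{\mathrm{Var}}\bigl(\widehat{\mathrm{RD}}\bigr)
  &\approx \nabla g(\hat\beta)'\,\hat\Sigma\,\nabla g(\hat\beta), \\
\nabla g(\hat\beta)
  &= \frac{1}{n}\sum_{i=1}^{n}\Bigl[\hat p_{i1}(1-\hat p_{i1})\,x_{i1} - \hat p_{i0}(1-\hat p_{i0})\,x_{i0}\Bigr]
\end{aligned}
```

The gradient uses the logistic derivative ∂p/∂β = p(1−p)x; the standard error is the square root of the variance.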
References:
1. An Illustration of Rate Difference Estimation with SAS in Logistic Regression (Delta Method with IML).
2. Performance of models for estimating absolute risk difference in multicenter trials with binary outcome.
3. Covariate-Adjusted Difference in Proportions from Clinical Trials Using Logistic Regression and Weighted Risk Differences.