倾向性评分匹配完整实例(R实现)

倾向性评分匹配(propensity score matching, PSM)主要是在随机对照试验(Randomized controlled trials,RCT)中用于衡量treat组和control组样本的其他各项特征(如年龄、体重、身高、人种等)的整体均衡性的度量。比如说研究一种药物对疾病的影响,在临床实验中,treat组和control组除了使用药物(安慰剂)不同外,其他的临床特征(如年龄、体重等)都应该基本是相似的,这样treat和control组才有可比性,进而才能验证药物的有效性。

如下图所示,该治疗方法实际上是无效的,但是由于分组中年龄的不平衡导致得出错误的结论。


    对于衡量同一特征的组间差异或者距离,我们通常使用标准均值误差(Standardized mean difference,SMD,PMC3144483)。
对于连续性特征变量公式如下:


对于离散型特征变量,公式如下:

下面我们使用tableone包来统计每个变量的标准均值误差SMD,数据来源是右心导管插入(right heart catheterization, RHC)数据,treat组是"RHC",control是"No RHC"(变量 swang1)

library(tableone)
## PS matching
library(Matching)
## Weighted analysis
library(survey)
library(reshape2)
library(ggplot2)
## Right heart cath dataset
rhc <- read.csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/rhc.csv")
# 待统计的协变量:
vars <- c("age","sex","race","edu","income","ninsclas","cat1","das2d3pc","dnr1",
          "ca","surv2md1","aps1","scoma1","wtkilo1","temp1","meanbp1","resp1",
          "hrt1","pafi1","paco21","ph1","wblc1","hema1","sod1","pot1","crea1",
          "bili1","alb1","resp","card","neuro","gastr","renal","meta","hema",
          "seps","trauma","ortho","cardiohx","chfhx","dementhx","psychhx",
          "chrpulhx","renalhx","liverhx","gibledhx","malighx","immunhx",
          "transhx","amihx")
## Construct a table
tabUnmatched <- CreateTableOne(vars = vars, strata = "swang1", data = rhc, test = FALSE)
## Show table with SMD
print(tabUnmatched, smd = TRUE)

下图是统计结果,第一列是协变量,第二列是按照有无RHC(treat、contol组)各变量的统计值(mean和SD),最后一列是SMD,可以看出treat和control组的age和sex差异都较小(<10%),income较高(>10%)。


接下来计算通过logit回归计算每个样本的倾向性分数(Propensity Score, PS),也就是被分配为RHC的概率.

## Fit model
psModel <- glm(formula = swang1 ~ age + sex + race + edu + income + ninsclas +
                 cat1 + das2d3pc + dnr1 + ca + surv2md1 + aps1 + scoma1 +
                 wtkilo1 + temp1 + meanbp1 + resp1 + hrt1 + pafi1 +
                 paco21 + ph1 + wblc1 + hema1 + sod1 + pot1 + crea1 +
                 bili1 + alb1 + resp + card + neuro + gastr + renal +
                 meta + hema + seps + trauma + ortho + cardiohx + chfhx +
                 dementhx + psychhx + chrpulhx + renalhx + liverhx + gibledhx +
                 malighx + immunhx + transhx + amihx,
                 family  = binomial(link = "logit"),
                 data    = rhc)

## Predicted probability of being assigned to RHC
rhc$pRhc <- predict(psModel, type = "response")
## Predicted probability of being assigned to no RHC
rhc$pNoRhc <- 1 - rhc$pRhc
## Predicted probability of being assigned to the treatment actually assigned (either RHC or no RHC)
rhc$pAssign <- NA
rhc$pAssign[rhc$swang1 == 1]    <- rhc$pRhc[rhc$swang1   == 1]
rhc$pAssign[rhc$swang1 == 0] <- rhc$pNoRhc[rhc$swang1 == 0]
## Smaller of pRhc vs pNoRhc for matching weight
rhc$pMin <- pmin(rhc$pRhc, rhc$pNoRhc)

然后使用Matching包进行匹配一致性样本(1:1匹配)。

listMatch <- Match(Tr       = (rhc$swang1 == 1),      # Need to be in 0,1
                   ## logit of PS,i.e., log(PS/(1-PS)) as matching scale
                   X        = log(rhc$pRhc / rhc$pNoRhc),
                   ## 1:1 matching
                   M        = 1,
                   ## caliper = 0.2 * SD(logit(PS))
                   caliper  = 0.2,
                   replace  = FALSE,
                   ties     = TRUE,
                   version  = "fast")
# determining if balance exists in any unmatched dataset and in matched datasets
mb <- MatchBalance(psModel$formula, data=rhc, match.out=listMatch, nboots=50)

将匹配的样本提取出来:

rhcMatched <- rhc[unlist(listMatch[c("index.treated","index.control")]), ]

再看下现在匹配后的SMD,现在所有变量的SMD都小于10%了。

## Construct a table
tabMatched <- CreateTableOne(vars = vars, strata = "swang1", data = rhcMatched, test = FALSE)
## Show table with SMD
print(tabMatched, smd = TRUE)


然后给样本进行加权,使得各组中的倾向性评分基本一致,进而消除混杂因素,作为标准平衡数据参考。一般有两种加权方法:逆概率处理加权法(the inverse probability of treatment weighting,IPTW)和标准化死亡比加权法(the standardized mortality ratio weighting,SMRW),本次我们是有IPTW的进阶版(PMID:26238958):

## Matching weight
rhc$mw <- rhc$pMin / rhc$pAssign
# IPTW:
rhc$mw1=ifelse(rhc$swang1==1,1/(rhc$pRhc),1/(1-rhc$pRhc))

## Weighted data
rhcSvy <- svydesign(ids = ~ 1, data = rhc, weights = ~ mw)
## Construct a table (This is a bit slow.)
tabWeighted <- svyCreateTableOne(vars = vars, strata = "swang1", data = rhcSvy, test = FALSE)
## Show table with SMD
print(tabWeighted, smd = TRUE)

加权后变量组间差异(很小):


进行作图比较 Unmatched、Mathced和Weighted结果:

library(data.table)
## Construct a data frame containing variable name and SMD from all methods
dataPlot <- data.table(variable  = rownames(ExtractSmd(tabUnmatched)),
                       Unmatched = ExtractSmd(tabUnmatched),
                       Matched   = ExtractSmd(tabMatched),
                       Weighted  = ExtractSmd(tabWeighted))
colnames(dataPlot) <- c("variable","Unmatched","Matched","Weighted")
## Create long-format data for ggplot2
dataPlotMelt <- melt(data          = dataPlot,
                     id.vars       = c("variable"),
                     variable.name = "Method",
                     value.name    = "SMD")

## Order variable names by magnitude of SMD
varNames <- as.character(dataPlot$variable)[order(dataPlot$Unmatched)]

## Order factor levels in the same order
dataPlotMelt$variable <- factor(dataPlotMelt$variable,
                                levels = varNames)

## Plot using ggplot2
ggplot(data = dataPlotMelt, mapping = aes(x = variable, y = SMD,
                                          group = Method, color = Method)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0.1, color = "black", size = 0.1) +
  coord_flip() +
  theme_bw() + theme(legend.key = element_blank())


可以看出加权后的"标准数据"和我们PSM后的结果基本是一致的。最后还看看右心导管插不插对于生存是否有影响,使用ShowRegTable函数计算风险比(hazard ratio,HR)[95% CI)]和pvalue。

## Unmatched model (unadjsuted)
glmUnmatched <- glm(formula = (death == "Yes") ~ swang1,
                    family  = binomial(link = "logit"),
                    data    = rhc)
## Matched model
glmMatched <- glm(formula = (death == "Yes") ~ swang1,
                  family  = binomial(link = "logit"),
                  data    = rhcMatched)
## Weighted model
glmWeighted <- svyglm(formula = (death == "Yes") ~ swang1,
                      family  = binomial(link = "logit"),
                      design    = rhcSvy)

## Show results together
resTogether <- list(Unmatched = ShowRegTable(glmUnmatched, printToggle = FALSE),
                    Matched   = ShowRegTable(glmMatched, printToggle = FALSE),
                    Weighted  = ShowRegTable(glmWeighted, printToggle = FALSE))
print(resTogether, quote = FALSE)

参考:
https://cran.r-project.org/web/packages/tableone/vignettes/smd.html
https://www.mediecogroup.com/method_topic_article_detail/131/


更多原创精彩视频敬请关注生信杂谈:

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 215,463评论 6 497
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,868评论 3 391
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 161,213评论 0 351
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,666评论 1 290
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,759评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,725评论 1 294
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,716评论 3 415
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,484评论 0 270
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,928评论 1 307
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,233评论 2 331
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,393评论 1 345
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,073评论 5 340
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,718评论 3 324
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,308评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,538评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,338评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,260评论 2 352