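Regression Discontinuity

Basic setup

x_{i} is the running variable; it may or may not be related to the outcome variable.

c is the cutoff.

D_{i} is the treatment variable, which is fully determined by the running variable: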

D_{i}=\left\{\begin{array}{ll}{1} & {\text { if } x_{i} \geq c} \\ {0} & {\text { if } x_{i}<c}\end{array}\right.

In a neighborhood of the cutoff, whether a unit is treated is as if controlled by the "hand of God", which makes the design a quasi-experiment.

Local average treatment effect (LATE)

\begin{aligned} \mathrm{LATE} & \equiv \mathrm{E}\left(y_{1 i}-y_{0 i} | x=c\right) \\ &=\mathrm{E}\left(y_{1 i} | x=c\right)-\mathrm{E}\left(y_{0 i} | x=c\right) \\ &=\lim _{x \rightarrow c+} \mathrm{E}\left(y_{1 i} | x\right)-\lim _{x \rightarrow c-} \mathrm{E}\left(y_{0 i} | x\right) \end{aligned}

Allow treated and untreated units to have different intercepts (\delta D_{i}) and different slopes (\gamma\left(x_{i}-c\right) D_{i}); the intercept shift \delta is the LATE.

y_{i}=\alpha+\beta\left(x_{i}-c\right)+\delta D_{i}+\gamma\left(x_{i}-c\right) D_{i}+\varepsilon_{i}
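The regression above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the data are hypothetical and noise-free, the helper names fit_line and sharp_rd are made up for this note, and treatment is assumed to switch on at x >= c. Because the model is fully interacted, OLS on it gives the same coefficients as fitting one straight line on each side of the cutoff, which is what the sketch does.

```python
def fit_line(xs, ys):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def sharp_rd(xs, ys, c):
    """Return (alpha, beta, delta, gamma) for
    y = alpha + beta*(x-c) + delta*D + gamma*(x-c)*D, with D = 1 if x >= c.
    Fitting one line per side is equivalent to OLS on the interacted model."""
    left = [(x - c, y) for x, y in zip(xs, ys) if x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if x >= c]
    a0, b0 = fit_line(*zip(*left))    # untreated side: alpha, beta
    a1, b1 = fit_line(*zip(*right))   # treated side: alpha+delta, beta+gamma
    return a0, b0, a1 - a0, b1 - b0


# Hypothetical noise-free data: jump of 2 and slope change of 0.3 at c = 0.
c = 0.0
xs = [(i + 0.5) / 50 - 1 for i in range(100)]   # grid on (-1, 1), no point at c
ys = [1 + 0.5 * (x - c) + (2 + 0.3 * (x - c)) * (x >= c) for x in xs]

alpha, beta, delta, gamma = sharp_rd(xs, ys, c)
print("delta (LATE estimate):", round(delta, 6))
print("gamma (slope change): ", round(gamma, 6))
```

With real data one would add noise-robust inference (for example, heteroskedasticity-robust standard errors), which this sketch leaves out.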

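Basic assumptions
  1. Discontinuity: the probability of being assigned to treatment jumps at the cutoff.
  2. Continuity: the relationship between the potential outcomes and the running variable is continuous, in particular at the cutoff.
  3. Local randomization: within a neighborhood \delta\left(x_{0}\right) of the cutoff x_{0} (written c above), treatment is independent of the potential outcomes:
    \left(Y_{1 i}, Y_{0 i}\right) \perp D_{i} | X_{i} \in \delta\left(x_{0}\right)
  4. Independence: near the cutoff, the potential outcomes and the potential treatments are independent of the running variable X:
    \left(Y_{1 i}, Y_{0 i}\right), D_{1 i}(x), D_{0 i}(x) \perp X_{i}, X_{i} \in \delta\left(x_{0}\right)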

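Sharp regression discontinuity

Definition

At the cutoff, the probability of receiving treatment jumps from 0 to 1.

Two problems arise
  • If the true regression function contains higher-order terms, a linear specification suffers from omitted-variable bias.
  • RD is only a local randomized experiment, so only observations near the cutoff are comparable.
Two remedies
  1. Add higher-order terms.
  2. Restrict x to the range (c-h, c+h), where h is the bandwidth.

This gives: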
\begin{aligned} y_{i}=& \alpha+\beta_{1}\left(x_{i}-c\right)+\delta D_{i}+\gamma_{1}\left(x_{i}-c\right) D_{i} \\ &+\beta_{2}\left(x_{i}-c\right)^{2}+\gamma_{2}\left(x_{i}-c\right)^{2} D_{i}+\varepsilon_{i} \quad(c-h<x<c+h) \end{aligned}

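Here the estimated coefficient \hat{\delta} on D_{i} is the estimator of the LATE; heteroskedasticity-robust standard errors can be used.

The troublesome optimal bandwidth: how to choose it

A small h may reduce bias, but with few observations the variance can increase.
A large h may reduce variance, but it includes many points far from x=c, so the bias increases.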
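The bias half of this trade-off can be seen in a small sketch. It assumes a hypothetical noise-free outcome y = x^3 + 2*D with the cutoff at 0, so any gap between the estimate and the true jump of 2 is pure approximation bias from fitting straight lines to a curve. The variance half is not shown, since there is no noise. The names fit_line and rd_jump are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_jump(xs, ys, c, h):
    """Difference of the two fitted intercepts at c, using only
    observations with c - h < x < c + h (equal weights)."""
    left = [(x - c, y) for x, y in zip(xs, ys) if c - h < x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if c <= x < c + h]
    a0, _ = fit_line(*zip(*left))
    a1, _ = fit_line(*zip(*right))
    return a1 - a0


c, true_jump = 0.0, 2.0
xs = [(i + 0.5) / 500 - 1 for i in range(1000)]      # grid on (-1, 1)
ys = [x ** 3 + true_jump * (x >= c) for x in xs]      # curved, noise-free

bandwidths = (0.1, 0.5, 1.0)
biases = [rd_jump(xs, ys, c, h) - true_jump for h in bandwidths]
for h, b in zip(bandwidths, biases):
    print("h =", h, " bias =", round(b, 4))
```

In this example the absolute bias grows as h widens, because the linear fit has to stretch over more of the curvature.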

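Nonparametric methods are now commonly used to choose the optimal bandwidth, by minimizing the mean squared error of the two regression functions at the cutoff: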

\min _{h} \mathrm{E}\left\{\left[\hat{m}_{1}(c)-m_{1}(c)\right]^{2}+\left[\hat{m}_{0}(c)-m_{0}(c)\right]^{2}\right\}

where m_{1}(x) \equiv \mathrm{E}\left(y_{1} | x\right), m_{0}(x) \equiv \mathrm{E}\left(y_{0} | x\right), \delta=m_{1}(c)-m_{0}(c), and \hat{\delta}=\hat{m}_{1}(c)-\hat{m}_{0}(c).

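Kernel-weighted estimation

\min _{\{\alpha, \beta, \delta, \gamma\}} \sum_{i=1}^{n} K\left[\left(x_{i}-c\right) / h\right]\left[y_{i}-\alpha-\beta\left(x_{i}-c\right)-\delta D_{i}-\gamma\left(x_{i}-c\right) D_{i}\right]^{2}

where K(\cdot) is a kernel function (for example, the triangular kernel). The second bracket is the residual, and the first bracket, K\left[\left(x_{i}-c\right) / h\right], is the weight. This is local linear regression: weighted least squares within a small neighborhood of c, with the weights set by the kernel. With a kernel such as the triangular one, observations closer to c receive larger weights.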
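A minimal sketch of this kernel-weighted local linear regression follows, with hypothetical noise-free data (y = x^3 + 2*D, cutoff at 0) and made-up helper names. As before, the fully interacted weighted problem is solved by one weighted line fit per side. The uniform kernel is included only for comparison; constant factors in the kernels are dropped because they cancel in weighted least squares.

```python
def triangular(u):
    """Triangular kernel: weight falls linearly to zero at |u| = 1."""
    return max(0.0, 1.0 - abs(u))


def uniform(u):
    """Rectangular kernel: equal weight inside the bandwidth."""
    return 1.0 if abs(u) < 1 else 0.0


def wls_line(xs, ys, ws):
    """Weighted least squares of y on x; returns (intercept, slope)."""
    sw = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / sw
    my = sum(w * y for y, w in zip(ys, ws)) / sw
    sxx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws))
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws))
    slope = sxy / sxx
    return my - slope * mx, slope


def local_linear_rd(xs, ys, c, h, kernel=triangular):
    """Jump at c: treated-side intercept minus untreated-side intercept,
    each from a kernel-weighted line fit (D = 1 if x >= c)."""
    intercepts = []
    for treated in (False, True):
        pts = [(x - c, y, kernel((x - c) / h))
               for x, y in zip(xs, ys)
               if (x >= c) == treated and kernel((x - c) / h) > 0]
        u, v, w = zip(*pts)
        intercepts.append(wls_line(u, v, w)[0])
    return intercepts[1] - intercepts[0]


c, h = 0.0, 0.5
xs = [(i + 0.5) / 500 - 1 for i in range(1000)]
ys = [x ** 3 + 2.0 * (x >= c) for x in xs]   # true jump is 2

est_tri = local_linear_rd(xs, ys, c, h, kernel=triangular)
est_uni = local_linear_rd(xs, ys, c, h, kernel=uniform)
print("triangular:", round(est_tri, 4), " uniform:", round(est_uni, 4))
```

In this particular example the triangular kernel lands closer to the true jump than the uniform kernel at the same h, because it down-weights points far from c; that ordering is a property of this example and is not guaranteed in general.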

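Choice of covariates

Including other covariates that affect y has the benefit of reducing the variance of the error term. The drawbacks:

  • If a covariate is endogenous, it contaminates the estimate.
  • If a covariate itself jumps at the cutoff, the estimate is biased outright, so one usually checks whether the conditional density of each covariate jumps at the cutoff.
What to report

(1) Local linear regression results with the triangular and the rectangular kernel.
(2) Results under different bandwidths: the optimal bandwidth, half of it, and twice it.
(3) Results with and without covariates.
(4) Specification tests: check whether the assignment variable and the conditional densities of the covariates are continuous at the cutoff.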

Fuzzy regression discontinuity

\begin{array}{l}{E(y | x)=E\left(y_{0} | x\right)+E\left[D\left(y_{1}-y_{0}\right) | x\right]} \\ {\quad=E\left(y_{0} | x\right)+E(D | x) \cdot E\left[\left(y_{1}-y_{0}\right) | x\right]}\end{array}

Average treatment effect at the cutoff
\mathrm{LATE} \equiv E\left[\left(y_{1}-y_{0}\right) | x=c\right]=\frac{\lim _{x \downarrow c} E(y | x)-\lim _{x \uparrow c} E(y | x)}{\lim _{x \downarrow c} E(D | x)-\lim _{x \uparrow c} E(D | x)}
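The ratio above can be sketched as the jump in the outcome divided by the jump in the treatment rate, each estimated by a local linear fit on both sides of the cutoff. The data below are hypothetical and deterministic: one unit in five is treated to the left of the cutoff, four in five to the right, and the treatment effect is a constant 3. The helper names are made up for this note.

```python
def fit_line(xs, ys):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def jump_at_cutoff(xs, vs, c, h):
    """Right-limit minus left-limit of E(v | x) at c, from two line fits
    restricted to c - h < x < c + h."""
    left = [(x - c, v) for x, v in zip(xs, vs) if c - h < x < c]
    right = [(x - c, v) for x, v in zip(xs, vs) if c <= x < c + h]
    return fit_line(*zip(*right))[0] - fit_line(*zip(*left))[0]


def fuzzy_rd(xs, ys, ds, c, h):
    """Ratio of the outcome jump to the treatment-rate jump."""
    return jump_at_cutoff(xs, ys, c, h) / jump_at_cutoff(xs, ds, c, h)


c, h = 0.0, 0.5
xs = [(i + 0.5) / 500 - 1 for i in range(1000)]
# Imperfect compliance: 20% treated below the cutoff, 80% at or above it.
ds = [(1 if i % 5 == 0 else 0) if x < c else (0 if i % 5 == 0 else 1)
      for i, x in enumerate(xs)]
ys = [1 + 0.5 * x + 3 * d for x, d in zip(xs, ds)]   # constant effect of 3

first_stage = jump_at_cutoff(xs, ds, c, h)
late = fuzzy_rd(xs, ys, ds, c, h)
print("jump in treatment rate:", round(first_stage, 3))
print("fuzzy RD estimate:     ", round(late, 6))
```

The outcome jump alone understates the effect here, since only part of the sample changes treatment status at the cutoff; dividing by the jump in the treatment rate rescales it.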

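General steps for RDD

(1) Graphical analysis

  • Plot Y against X to see whether there is a discontinuity (rdplot); usually the outcome variable is averaged within each bin.
  • Plot each covariate against X to see whether the covariates show a discontinuity.
  • Plot the distribution of the running variable X to see whether it jumps visibly around the cutoff.

(2) Estimating the causal effect

  • Nonparametric regression at the boundary (rarely used)
  • Local linear regression
  • Local polynomial regression

(3) Robustness checks: covariate continuity test (run an RD regression on each covariate), continuity test for the distribution of the running variable (McCrary), placebo cutoffs (at the midpoint of the left and right bandwidths), and bandwidth sensitivity (try different bandwidths).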
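As one illustration of these checks, the placebo-cutoff test can be sketched as follows. The assumption made here is that a placebo cutoff is placed at the midpoint of the bandwidth on one side of the true cutoff and that only observations from that side are used, so the true jump cannot leak into the placebo estimate. Data and helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def jump_at(xs, ys, c, h):
    """Right-limit minus left-limit of the fitted lines at c,
    using observations with c - h < x < c + h."""
    left = [(x - c, y) for x, y in zip(xs, ys) if c - h < x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if c <= x < c + h]
    return fit_line(*zip(*right))[0] - fit_line(*zip(*left))[0]


true_c, h = 0.0, 0.5
xs = [(i + 0.5) / 500 - 1 for i in range(1000)]
ys = [1 + 0.5 * x + 2.0 * (x >= true_c) for x in xs]   # jump only at true_c

real_jump = jump_at(xs, ys, true_c, h)

# Placebo on the right: cutoff at true_c + h/2, treated-side data only.
rx, ry = zip(*[(x, y) for x, y in zip(xs, ys) if x >= true_c])
placebo_right = jump_at(rx, ry, true_c + h / 2, h / 2)

# Placebo on the left: cutoff at true_c - h/2, untreated-side data only.
lx, ly = zip(*[(x, y) for x, y in zip(xs, ys) if x < true_c])
placebo_left = jump_at(lx, ly, true_c - h / 2, h / 2)

print("real:", round(real_jump, 6))
print("placebo left/right:", round(placebo_left, 6), round(placebo_right, 6))
```

A placebo estimate that is clearly non-zero in real data would suggest the jump at the true cutoff may reflect something other than the treatment.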

Software implementation

* RD plot
rdplot depvar runvar [if] [in] [, c(cutoff) p(pvalue) kernel(kernelfn)]
* Optimal bandwidth selection
rdbwselect depvar runvar [if] [in] [, c(cutoff) p(pvalue) q(qvalue) deriv(dvalue) fuzzy(fuzzyvar [sharpbw]) covs(covars) kernel(kernelfn) weights(weightsvar)
                   bwselect(bwmethod) scaleregul(scaleregulvalue) vce(vcemethod) all]
/*
 c(cutoff) specifies the RD cutoff.  The default is c(0).

    p(pvalue) specifies the order of the local polynomial used to construct the point estimator.  The default is p(1) (local linear regression).

    q(qvalue) specifies the order of the local polynomial used to construct the bias correction.  The default is q(2) (local quadratic regression).

    deriv(dvalue) specifies the order of the derivative of the regression functions to be estimated.  The default is deriv(0) (sharp RD, or fuzzy RD if fuzzy() is also
        specified).  Setting deriv(1) results in estimation of a kink RD design (up to scale) or a fuzzy kink RD if fuzzy() is also specified.

    fuzzy(fuzzyvar [sharpbw]) specifies the treatment status variable used to implement fuzzy RD estimation (or fuzzy kink RD if deriv(1) is also specified).  The
        default is sharp RD design.  If the sharpbw option is set, the fuzzy RD estimation is performed using a bandwidth selection procedure for the sharp RD model.
        This option is automatically selected if there is perfect compliance at either side of the threshold.

    covs(covars) specifies additional covariates to be used for estimation and inference.

    kernel(kernelfn) specifies the kernel function used to construct the local polynomial estimators.  kernelfn may be triangular, epanechnikov, or uniform.  The
        default is kernel(triangular).

*/
* RD estimation
rdrobust depvar runvar [if] [in] [, c(cutoff) p(pvalue) q(qvalue) deriv(dvalue) fuzzy(fuzzyvar [sharpbw]) covs(covars) kernel(kernelfn) weights(weightsvar)
                h(hvalueL hvalueR) b(bvalueL bvalueR) rho(rhovalue) scalepar(scaleparvalue) bwselect(bwmethod) scaleregul(scaleregulvalue) vce(vcemethod) level(level)
                all]

/*
c(cutoff) specifies the RD cutoff.  The default is c(0).

    p(pvalue) specifies the order of the local polynomial used to construct the point estimator.  The default is p(1) (local linear regression).

    q(qvalue) specifies the order of the local polynomial used to construct the bias correction.  The default is q(2) (local quadratic regression).

    deriv(dvalue) specifies the order of the derivative of the regression functions to be estimated.  The default is deriv(0) (sharp RD, or fuzzy RD if fuzzy() is also
        specified).  Setting deriv(1) results in estimation of a kink RD design (up to scale), or fuzzy kink RD if fuzzy() is also specified.

    fuzzy(fuzzyvar [sharpbw]) specifies the treatment status variable used to implement fuzzy RD estimation (or fuzzy kink RD if deriv(1) is also specified).  The
        default is sharp RD design.  If the sharpbw option is set, the fuzzy RD estimation is performed using a bandwidth selection procedure for the sharp RD model.
        This option is automatically selected if there is perfect compliance at either side of the threshold.

    covs(covars) specifies additional covariates to be used for estimation and inference.

    kernel(kernelfn) specifies the kernel function used to construct the local polynomial estimators.  kernelfn may be triangular, epanechnikov, or uniform.  The
        default is kernel(triangular).

    weights(weightsvar) specifies the variable used for optional weighting of the estimation procedure.  The unit-specific weights multiply the kernel function.

    h(hvalueL hvalueR) specifies the main bandwidth, h, to be used on the left and on the right of the cutoff, respectively.  If only one value is specified, then this
        value is used on both sides.  If not specified, the bandwidth(s) h is computed by the companion command rdbwselect.

    b(bvalueL bvalueR) specifies the bias bandwidth, b, to be used on the left and on the right of the cutoff, respectively.  If only one value is specified, then this
        value is used on both sides.  If not specified, bandwidth(s) b is computed by the companion command rdbwselect.

    rho(rhovalue) specifies the value of rho so that the bias bandwidth, b, equals b=h/rho.  The default is rho(1) if h is specified but b is not.

    scalepar(scaleparvalue) specifies the scaling factor for the RD parameter of interest.  This option is useful when the population parameter of interest involves a
        known multiplicative factor (for example, sharp kink RD).  The default is scalepar(1) (no scaling).

    bwselect(bwmethod) specifies the bandwidth selection procedure to be used.  By default, it computes both h and b, unless rho is specified, in which case it
        computes only the h and sets b=h/rho.  For details on implementation, see Calonico, Cattaneo, and Titiunik (2014b); Calonico, Cattaneo, and Farrell
        (forthcoming); and Calonico et al. (2016), and the companion software articles.  bwmethod may be one of the following:

*/