Regression Discontinuity Design (RDD)

Basic setup

$x_{i}$ is the running variable; it may or may not be related to the outcome variable.

$c$ is the cutoff.

$D_{i}$ is the treatment variable, which is completely determined by the running variable:

$D_{i}=\left\{\begin{array}{ll}{1} & {\text { if } x_{i} \geq c} \\ {0} & {\text { if } x_{i}<c}\end{array}\right.$

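Near the cutoff, whether a unit is treated is as if decided by the "hand of God" rather than by the unit itself, so RD can be viewed as a quasi-experiment.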

Local average treatment effect (LATE)

$\begin{aligned} \mathrm{LATE} & \equiv \mathrm{E}\left(y_{1 i}-y_{0 i} | x=c\right) \\ &=\mathrm{E}\left(y_{1 i} | x=c\right)-\mathrm{E}\left(y_{0 i} | x=c\right) \\ &=\lim _{x \rightarrow c+} \mathrm{E}\left(y_{1 i} | x\right)-\lim _{x \rightarrow c-} \mathrm{E}\left(y_{0 i} | x\right) \end{aligned}$

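Allow the treated and untreated groups to have different intercepts (through $D_i$) and different slopes (through $\gamma\left(x_{i}-c\right) D_{i}$). Because $x_i$ is centered at $c$, the intercept shift $\delta$ is the LATE: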

$y_{i}=\alpha+\beta\left(x_{i}-c\right)+\delta D_{i}+\gamma\left(x_{i}-c\right) D_{i}+\varepsilon_{i}$
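The regression above can be checked on simulated data. A minimal sketch in Python with NumPy, assuming treatment is assigned when $x_i \ge c$; the helper name `sharp_rd_ols` and all data-generating values are hypothetical, chosen only for illustration.

```python
import numpy as np

def sharp_rd_ols(y, x, c=0.0):
    """OLS fit of y = alpha + beta*(x-c) + delta*D + gamma*(x-c)*D + e.

    Hypothetical helper for a sharp design with D = 1 when x >= c.
    Returns the coefficients in the order (alpha, beta, delta, gamma).
    """
    xc = x - c
    d = (x >= c).astype(float)
    X = np.column_stack([np.ones_like(xc), xc, d, xc * d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data: true LATE delta = 2, slope 0.5 below c and 1.5 above c
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, n)
d = (x >= 0).astype(float)
y = 1 + 0.5 * x + 2 * d + 1.0 * x * d + rng.normal(0, 0.5, n)

alpha, beta, delta, gamma = sharp_rd_ols(y, x, c=0.0)
print(f"delta_hat = {delta:.3f}, gamma_hat = {gamma:.3f}")
```

Centering matters: with $x_i - c$ in the regression, the coefficient on $D_i$ is the gap between the two fitted lines exactly at the cutoff; without centering it would measure the gap at $x = 0$.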

Key assumptions

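Discontinuity: the probability of being assigned to treatment jumps at the cutoff.
Continuity: the conditional expectations of the potential outcomes, $\mathrm{E}(y_{1i}|x)$ and $\mathrm{E}(y_{0i}|x)$, are continuous in $x$ (in particular at $c$); the observed outcome jumps only because treatment switches on.
Local randomization:
$\left(Y_{1 i}, Y_{0 i}\right) \perp D_{i} \mid X_{i} \in \delta\left(x_{0}\right)$, where $\delta(x_0)$ denotes a small neighborhood of the cutoff $x_0 = c$ (not the treatment effect $\delta$).
Independence: near the cutoff, the potential outcomes and the potential treatment statuses are independent of the running variable $X$:
$\left(Y_{1 i}, Y_{0 i}\right), D_{1 i}(x), D_{0 i}(x) \perp X_{i} \mid X_{i} \in \delta\left(x_{0}\right)$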

Sharp RD

Definition

At the cutoff, the probability of receiving treatment jumps from 0 to 1.

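Two problems arise:

If the true regression function contains higher-order terms, a linear specification omits them, which causes omitted-variable bias.
RD is only a local randomized experiment, so observations far from the cutoff are not comparable and should carry little weight.

Two remedies:

1. Add higher-order polynomial terms.

2. Restrict $x$ to the window $(c-h, c+h)$, where $h$ is the bandwidth.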

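Combining both remedies (a quadratic in $x_i - c$ with separate coefficients on each side, estimated inside the window) gives: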
$\begin{aligned} y_{i}=& \alpha+\beta_{1}\left(x_{i}-c\right)+\delta D_{i}+\gamma_{1}\left(x_{i}-c\right) D_{i} \\ &+\beta_{2}\left(x_{i}-c\right)^{2}+\gamma_{2}\left(x_{i}-c\right)^{2} D_{i}+\varepsilon_{i} \quad(c-h<x<c+h) \end{aligned}$

Here $\delta$ is the LATE, estimated by the OLS coefficient $\hat{\delta}$; use heteroskedasticity-robust standard errors for inference.
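A sketch of the window-restricted quadratic model with heteroskedasticity-robust (HC1) standard errors, in Python/NumPy. The helper name `rd_poly_window`, the choice of HC1 and the simulated heteroskedastic data are assumptions for illustration; HC1 is the same sandwich that Stata's `regress ..., vce(robust)` reports. Regressors are ordered (1, $D$, then the polynomial terms), so $\hat\delta$ is the second coefficient here.

```python
import numpy as np

def rd_poly_window(y, x, c, h, p=2):
    """OLS of the RD polynomial model on c-h < x < c+h with an HC1 robust SE.

    Regressors: 1, D, and (x-c)^k, (x-c)^k * D for k = 1..p, with D = 1 if x >= c.
    Returns (delta_hat, robust standard error of delta_hat).
    """
    keep = (x > c - h) & (x < c + h)
    xc, yk = x[keep] - c, y[keep]
    d = (xc >= 0).astype(float)
    cols = [np.ones_like(xc), d]
    for k in range(1, p + 1):
        cols += [xc**k, xc**k * d]
    X = np.column_stack(cols)
    n, m = X.shape
    coef, *_ = np.linalg.lstsq(X, yk, rcond=None)
    resid = yk - X @ coef
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)     # sum_i e_i^2 x_i x_i'
    V = bread @ meat @ bread * n / (n - m)     # HC1 = HC0 scaled by n/(n-m)
    return coef[1], np.sqrt(V[1, 1])

rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-1, 1, n)
d = (x >= 0).astype(float)
noise_sd = 0.3 + 0.5 * np.abs(x)               # heteroskedastic errors
y = 1 + x + 0.8 * x**2 + 1.5 * d + rng.normal(0, 1, n) * noise_sd

delta_hat, se = rd_poly_window(y, x, c=0.0, h=0.5, p=2)
print(f"delta_hat = {delta_hat:.3f} (HC1 SE = {se:.3f})")
```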

The troublesome part: choosing the optimal bandwidth

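Small $h$: possibly less biased, but with few observations the variance can be large.
Large $h$: smaller variance, but many points far from $x=c$ are included, so the bias grows.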

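Today the optimal bandwidth is usually chosen nonparametrically, by minimizing the mean squared error (MSE) of the estimated regression functions at the cutoff: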

$\min _{h} \mathrm{E}\left\{\left[\hat{m}_{1}(c)-m_{1}(c)\right]^{2}+\left[\hat{m}_{0}(c)-m_{0}(c)\right]^{2}\right\}$

where $m_{1}(x) \equiv \mathrm{E}\left(y_{1} | x\right)$, $m_{0}(x) \equiv \mathrm{E}\left(y_{0} | x\right)$, $\delta=m_{1}(c)-m_{0}(c)$, and $\hat{\delta}=\hat{m}_{1}(c)-\hat{m}_{0}(c)$.
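The criterion above cannot be evaluated on real data because $m_1$ and $m_0$ are unknown, but in a simulation where they are known it can be approximated by Monte Carlo (here, for simplicity, as the MSE of $\hat\delta$ itself). This sketch assumes a hypothetical data-generating process in which only the treated side is curved, a uniform kernel, and made-up helper names `local_linear_delta` and `mc_mse`. Data-driven selectors such as `rdbwselect` estimate this trade-off from the sample instead.

```python
import numpy as np

def local_linear_delta(y, x, c, h):
    """delta_hat from the local linear RD model fitted by OLS on c-h < x < c+h."""
    keep = np.abs(x - c) < h
    xc, yk = x[keep] - c, y[keep]
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), xc, d, xc * d])
    coef, *_ = np.linalg.lstsq(X, yk, rcond=None)
    return coef[2]

def mc_mse(h, reps=200, n=1000, true_delta=1.0, seed=1):
    """Monte Carlo estimate of E[(delta_hat - delta)^2] for bandwidth h."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(-1, 1, n)
        d = (x >= 0).astype(float)
        # The treated side is curved (3x^2), so a linear fit is biased when h is large
        y = x + true_delta * d + 3 * x**2 * d + rng.normal(0, 0.5, n)
        est[r] = local_linear_delta(y, x, 0.0, h)
    return np.mean((est - true_delta) ** 2)

for h in (0.05, 0.3, 1.0):
    print(f"h = {h:.2f}: MSE ~ {mc_mse(h):.4f}")
```

In this setup the middle bandwidth should give the smallest MSE: $h = 0.05$ is dominated by variance and $h = 1.0$ by the bias from the curvature.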

Kernel-weighted estimation

$\min_{\{\alpha, \beta, \delta, \gamma\}} \sum_{i=1}^{n} K\left[\left(x_{i}-c\right) / h\right]\left[y_{i}-\alpha-\beta\left(x_{i}-c\right)-\delta D_{i}-\gamma\left(x_{i}-c\right) D_{i}\right]^{2}$

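where $K(\cdot)$ is a kernel function, for example the triangular kernel $K(u)=(1-|u|)\,\mathbf{1}\{|u| \le 1\}$. The second bracket is the (squared) residual, and the first bracket, $K[(x_i-c)/h]$, is the weight. This is local linear regression: weighted least squares in a small neighborhood of $c$, with weights given by the kernel, so observations closer to $c$ receive larger weights.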
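The weighted objective above is ordinary weighted least squares, so it can be solved by scaling each row by $\sqrt{K[(x_i-c)/h]}$ and running OLS. A minimal NumPy sketch under that reading; the helper names and the simulated data are hypothetical. The uniform kernel is included because it reproduces the unweighted regression on the window $(c-h, c+h)$.

```python
import numpy as np

def triangular(u):
    """Triangular kernel: K(u) = 1 - |u| for |u| <= 1, zero otherwise."""
    return np.clip(1 - np.abs(u), 0, None)

def uniform(u):
    """Rectangular (uniform) kernel; its constant 1/2 cancels in WLS, so it is dropped."""
    return (np.abs(u) <= 1).astype(float)

def kernel_rd(y, x, c, h, kernel=triangular):
    """Minimize sum K((x-c)/h) * (y - a - b(x-c) - delta*D - gamma(x-c)D)^2; return delta."""
    w = kernel((x - c) / h)
    keep = w > 0                        # observations outside the window get zero weight
    xc, yk, sw = x[keep] - c, y[keep], np.sqrt(w[keep])
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), xc, d, xc * d])
    # WLS = OLS after scaling each row by sqrt(weight)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], yk * sw, rcond=None)
    return coef[2]

rng = np.random.default_rng(2)
n = 5000
x = rng.uniform(-1, 1, n)
y = np.sin(2 * x) + 1.0 * (x >= 0) + rng.normal(0, 0.3, n)   # true jump = 1

est_tri = kernel_rd(y, x, c=0.0, h=0.4, kernel=triangular)
est_uni = kernel_rd(y, x, c=0.0, h=0.4, kernel=uniform)
print(f"triangular: {est_tri:.3f}, uniform: {est_uni:.3f}")
```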

Choosing covariates

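Adding other covariates that affect $y$ reduces the variance of the error term. The drawbacks:

An endogenous covariate can contaminate the estimate.
If a covariate itself jumps at the cutoff, including it biases the estimate directly, so one usually checks whether the conditional distribution of each covariate jumps at the cutoff.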
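A covariate balance check simply reruns the RD regression with the covariate as the outcome; a significant jump is a warning sign. A sketch in NumPy with a triangular kernel and an HC0-type sandwich standard error for weighted least squares; the helper `rd_jump` and the two simulated covariates are hypothetical.

```python
import numpy as np

def rd_jump(v, x, c, h):
    """Triangular-kernel local linear jump of v at c, with an HC0-type WLS standard error."""
    w = np.clip(1 - np.abs((x - c) / h), 0, None)
    keep = w > 0
    xc, vk, wk = x[keep] - c, v[keep], w[keep]
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), xc, d, xc * d])
    XtW = X.T * wk                      # column i of X' scaled by w_i
    bread = np.linalg.inv(XtW @ X)
    coef = bread @ (XtW @ vk)
    e = vk - X @ coef
    meat = (XtW * e**2) @ XtW.T         # sum_i w_i^2 e_i^2 x_i x_i'
    V = bread @ meat @ bread
    return coef[2], np.sqrt(V[2, 2])

rng = np.random.default_rng(3)
n = 4000
x = rng.uniform(-1, 1, n)
d = (x >= 0).astype(float)
z_smooth = 0.5 * x + rng.normal(0, 0.3, n)            # balanced covariate
z_jump = 0.5 * x + 0.4 * d + rng.normal(0, 0.3, n)    # covariate that jumps by 0.4

for name, z in (("z_smooth", z_smooth), ("z_jump", z_jump)):
    est, se = rd_jump(z, x, c=0.0, h=0.5)
    print(f"{name}: jump = {est:.3f}, t = {est / se:.2f}")
```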

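What to report:

(1) Local linear regression results with the triangular and the rectangular (uniform) kernel;
(2) Results for different bandwidths: the optimal bandwidth, half of it, and twice it;
(3) Results with and without covariates;
(4) Specification tests: whether the conditional densities of the running variable and of the covariates are continuous at the cutoff.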
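Items (1) to (3) of this checklist amount to re-running one estimator over a small grid of kernels, bandwidths and covariate sets. A sketch in NumPy: the helper `rd_estimate`, the placeholder `h_opt = 0.3` (in practice taken from a bandwidth selector such as `rdbwselect`) and the simulated covariate `z` are all assumptions.

```python
import numpy as np

def rd_estimate(y, x, c, h, kernel="triangular", covs=None):
    """Kernel-weighted local linear RD estimate of delta, optionally with covariates."""
    u = (x - c) / h
    if kernel == "triangular":
        w = np.clip(1 - np.abs(u), 0, None)
    elif kernel == "uniform":
        w = (np.abs(u) <= 1).astype(float)
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    keep = w > 0
    xc, yk, sw = x[keep] - c, y[keep], np.sqrt(w[keep])
    d = (xc >= 0).astype(float)
    cols = [np.ones_like(xc), xc, d, xc * d]
    if covs is not None:
        cols.append(covs[keep])           # covariates enter linearly
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], yk * sw, rcond=None)
    return coef[2]

rng = np.random.default_rng(4)
n = 20000
x = rng.uniform(-1, 1, n)
z = rng.normal(0, 1, n)                   # covariate balanced at the cutoff
d = (x >= 0).astype(float)
y = 0.5 + 0.8 * x + 1.0 * d + 0.6 * z + rng.normal(0, 0.5, n)

h_opt = 0.3   # placeholder value; not computed here
rows = []
for kernel in ("triangular", "uniform"):
    for h in (h_opt / 2, h_opt, 2 * h_opt):
        for label, covs in (("no covs", None), ("with z", z)):
            est = rd_estimate(y, x, 0.0, h, kernel, covs)
            rows.append((kernel, h, label, est))
            print(f"{kernel:<10} h={h:.2f} {label:<8} delta_hat={est:.3f}")
```

With a valid design the estimates should move little across rows; adding a covariate that is balanced at the cutoff mainly shrinks the sampling noise rather than shifting the point estimate.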

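Fuzzy RD

In a fuzzy design the probability of treatment jumps at the cutoff, but by less than one (not from 0 to 1). Writing the observed outcome as $y = y_0 + D(y_1 - y_0)$, its conditional mean is: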

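$\begin{array}{l}{E(y | x)=E\left(y_{0} | x\right)+E\left[D\left(y_{1}-y_{0}\right) | x\right]} \\ {\quad=E\left(y_{0} | x\right)+E(D | x) \cdot E\left[\left(y_{1}-y_{0}\right) | x\right]}\end{array}$

where the second equality assumes that, conditional on $x$, treatment status $D$ is independent of the individual gain $y_1 - y_0$.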

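Taking limits from the right and from the left of $c$ and subtracting, the local average treatment effect is the jump in the outcome divided by the jump in the treatment probability: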
$L A T E \equiv E\left[\left(y_{1}-y_{0}\right) | x=c\right]=\frac{\lim _{x \downarrow c} E(y | x)-\lim _{x \uparrow c} E(y | x)}{\lim _{x \downarrow c} E(D | x)-\lim _{x \uparrow c} E(D | x)}$
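The ratio above can be estimated by fitting the same local linear regression twice, once with $y$ and once with $D$ as the dependent variable, and dividing the two jumps. A minimal NumPy sketch with a uniform kernel; the helper names, the bandwidth and the simulated compliance pattern are hypothetical. With the same bandwidth, kernel and local linear terms in both fits, this ratio coincides with a just-identified 2SLS estimate that uses $1\{x \ge c\}$ as the instrument for $D$.

```python
import numpy as np

def ll_jump(v, x, c, h):
    """Jump at c in a local linear fit (uniform kernel) of v on x."""
    keep = np.abs(x - c) < h
    xc, vk = x[keep] - c, v[keep]
    z = (xc >= 0).astype(float)           # above-cutoff indicator
    X = np.column_stack([np.ones_like(xc), xc, z, xc * z])
    coef, *_ = np.linalg.lstsq(X, vk, rcond=None)
    return coef[2]

def fuzzy_rd(y, d, x, c, h):
    """Wald-type fuzzy RD estimate: jump in E(y|x) divided by jump in E(D|x)."""
    return ll_jump(y, x, c, h) / ll_jump(d, x, c, h)

rng = np.random.default_rng(5)
n = 40000
x = rng.uniform(-1, 1, n)
p_treat = 0.2 + 0.6 * (x >= 0)            # P(D=1|x) jumps from 0.2 to 0.8
d = (rng.random(n) < p_treat).astype(float)
y = 1 + x + 2.0 * d + rng.normal(0, 0.5, n)   # constant treatment effect of 2

late_hat = fuzzy_rd(y, d, x, c=0.0, h=0.5)
print(f"first-stage jump = {ll_jump(d, x, 0.0, 0.5):.3f}, LATE_hat = {late_hat:.3f}")
```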

General steps of an RDD analysis

(1) Graphical analysis

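Plot Y against X and check for a jump at the cutoff (rdplot); usually the plot shows the average outcome within each bin of X.
Plot each covariate against X and check that the covariates do not jump at the cutoff.
Plot the distribution of the running variable X and check for an obvious jump in density on either side of the cutoff.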
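A minimal version of the binned plot: average the outcome within equal-width bins on each side of the cutoff, so that no bin straddles $c$. The helper `binned_means`, the fixed number of bins and the simulated data are assumptions; `rdplot` additionally chooses the number of bins from the data and overlays polynomial fits. The same function can be applied to a covariate in place of Y.

```python
import numpy as np

def binned_means(y, x, c, nbins=20):
    """Equal-width bins on each side of c; returns (bin centers, bin means), sorted by x.

    Bins never straddle the cutoff, so a jump at c shows up between
    the last bin on the left and the first bin on the right.
    """
    centers, means = [], []
    left = x < c
    for mask, lo, hi in ((left, x[left].min(), c), (~left, c, x[~left].max())):
        xs, ys = x[mask], y[mask]
        edges = np.linspace(lo, hi, nbins + 1)
        idx = np.clip(np.searchsorted(edges, xs, side="right") - 1, 0, nbins - 1)
        for b in range(nbins):
            in_bin = idx == b
            if in_bin.any():              # skip empty bins
                centers.append((edges[b] + edges[b + 1]) / 2)
                means.append(ys[in_bin].mean())
    return np.array(centers), np.array(means)

rng = np.random.default_rng(6)
n = 10000
x = rng.uniform(-1, 1, n)
y = x + 1.0 * (x >= 0) + rng.normal(0, 0.2, n)   # true jump = 1
centers, means = binned_means(y, x, c=0.0, nbins=20)
print(f"gap between the bins adjacent to the cutoff: {means[20] - means[19]:.3f}")
```

Passing `centers` and `means` to any scatter-plot function gives the picture; here the gap between the two bins next to the cutoff also includes the slope across one bin width on each side, so it is only a rough visual guide to $\delta$.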

(2) Estimating the causal effect

Boundary nonparametric regression (rarely used)
Local linear regression
Local polynomial regression

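(3) Robustness checks: covariate continuity tests (run an RD regression with each covariate as the outcome); a continuity test for the density of the running variable (the McCrary test); placebo cutoffs (fake cutoffs placed, e.g., at the midpoint of the window to the left or to the right of the true cutoff); and bandwidth sensitivity (re-estimate with different bandwidths).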
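Two of these checks can be sketched in a few lines of NumPy. The placebo test reruns the local linear estimator at a fake cutoff inside the untreated region, where no jump should appear. The density check here is only a crude comparison of counts just below and just above $c$ (reasonable only if the density is roughly flat within $\pm h$), not the McCrary local linear density test, which is available, for example, through Stata's `rddensity`. All helper names, bandwidths and simulated data are hypothetical.

```python
import numpy as np

def ll_jump_se(y, x, c, h):
    """Local linear jump at c (uniform kernel) and its HC0 robust SE."""
    keep = np.abs(x - c) < h
    xc, yk = x[keep] - c, y[keep]
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), xc, d, xc * d])
    bread = np.linalg.inv(X.T @ X)
    coef = bread @ X.T @ yk
    e = yk - X @ coef
    V = bread @ ((X.T * e**2) @ X) @ bread
    return coef[2], np.sqrt(V[2, 2])

def count_z(x, c, h):
    """Crude density check: standardized difference of counts in [c-h, c) and [c, c+h)."""
    n_left = np.sum((x >= c - h) & (x < c))
    n_right = np.sum((x >= c) & (x < c + h))
    return (n_right - n_left) / np.sqrt(n_left + n_right)

rng = np.random.default_rng(7)
n = 10000
x = rng.uniform(-1, 1, n)
y = x + 1.0 * (x >= 0) + rng.normal(0, 0.3, n)

# Placebo cutoff at -0.5, using only untreated units (x < 0)
below = x < 0
est_p, se_p = ll_jump_se(y[below], x[below], c=-0.5, h=0.25)
est_t, se_t = ll_jump_se(y, x, c=0.0, h=0.25)
print(f"placebo t = {est_p / se_p:.2f}, true-cutoff t = {est_t / se_t:.2f}")

# Manipulation example: about half of the units just below 0 move just above it
x_m = x.copy()
moved = (x > -0.05) & (x < 0) & (rng.random(n) < 0.5)
x_m[moved] = -x_m[moved]
print(f"count z: clean = {count_z(x, 0.0, 0.1):.2f}, manipulated = {count_z(x_m, 0.0, 0.1):.2f}")
```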

Software implementation (Stata)

*RD plot
rdplot depvar runvar [if] [in] [, c(cutoff) p(pvalue) kernel(kernelfn)]

*Optimal bandwidth selection
rdbwselect depvar runvar [if] [in] [, c(cutoff) p(pvalue) q(qvalue) deriv(dvalue) fuzzy(fuzzyvar [sharpbw]) covs(covars) kernel(kernelfn) weights(weightsvar)
                   bwselect(bwmethod) scaleregul(scaleregulvalue) vce(vcemethod) all]
/*
 c(cutoff) specifies the RD cutoff.  The default is c(0).

    p(pvalue) specifies the order of the local polynomial used to construct the point estimator.  The default is p(1) (local linear regression).

    q(qvalue) specifies the order of the local polynomial used to construct the bias correction.  The default is q(2) (local quadratic regression).

    deriv(dvalue) specifies the order of the derivative of the regression functions to be estimated.  The default is deriv(0) (sharp RD, or fuzzy RD if fuzzy() is also
        specified).  Setting deriv(1) results in estimation of a kink RD design (up to scale) or a fuzzy kink RD if fuzzy() is also specified.

    fuzzy(fuzzyvar [sharpbw]) specifies the treatment status variable used to implement fuzzy RD estimation (or fuzzy kink RD if deriv(1) is also specified).  The
        default is sharp RD design.  If the sharpbw option is set, the fuzzy RD estimation is performed using a bandwidth selection procedure for the sharp RD model.
        This option is automatically selected if there is perfect compliance at either side of the threshold.

    covs(covars) specifies additional covariates to be used for estimation and inference.

    kernel(kernelfn) specifies the kernel function used to construct the local polynomial estimators.  kernelfn may be triangular, epanechnikov, or uniform.  The
        default is kernel(triangular).

*/

*RD estimation
rdrobust depvar runvar [if] [in] [, c(cutoff) p(pvalue) q(qvalue) deriv(dvalue) fuzzy(fuzzyvar [sharpbw]) covs(covars) kernel(kernelfn) weights(weightsvar)
                h(hvalueL hvalueR) b(bvalueL bvalueR) rho(rhovalue) scalepar(scaleparvalue) bwselect(bwmethod) scaleregul(scaleregulvalue) vce(vcemethod) level(level)
                all]

/*
c(cutoff) specifies the RD cutoff.  The default is c(0).

    p(pvalue) specifies the order of the local polynomial used to construct the point estimator.  The default is p(1) (local linear regression).

    q(qvalue) specifies the order of the local polynomial used to construct the bias correction.  The default is q(2) (local quadratic regression).

    deriv(dvalue) specifies the order of the derivative of the regression functions to be estimated.  The default is deriv(0) (sharp RD, or fuzzy RD if fuzzy() is also
        specified).  Setting deriv(1) results in estimation of a kink RD design (up to scale), or fuzzy kink RD if fuzzy() is also specified.

    fuzzy(fuzzyvar [sharpbw]) specifies the treatment status variable used to implement fuzzy RD estimation (or fuzzy kink RD if deriv(1) is also specified).  The
        default is sharp RD design.  If the sharpbw option is set, the fuzzy RD estimation is performed using a bandwidth selection procedure for the sharp RD model.
        This option is automatically selected if there is perfect compliance at either side of the threshold.

    covs(covars) specifies additional covariates to be used for estimation and inference.

    kernel(kernelfn) specifies the kernel function used to construct the local polynomial estimators.  kernelfn may be triangular, epanechnikov, or uniform.  The
        default is kernel(triangular).

    weights(weightsvar) specifies the variable used for optional weighting of the estimation procedure.  The unit-specific weights multiply the kernel function.

    h(hvalueL hvalueR) specifies the main bandwidth, h, to be used on the left and on the right of the cutoff, respectively.  If only one value is specified, then this
        value is used on both sides.  If not specified, the bandwidth(s) h is computed by the companion command rdbwselect.

    b(bvalueL bvalueR) specifies the bias bandwidth, b, to be used on the left and on the right of the cutoff, respectively.  If only one value is specified, then this
        value is used on both sides.  If not specified, bandwidth(s) b is computed by the companion command rdbwselect.

    rho(rhovalue) specifies the value of rho so that the bias bandwidth, b, equals b=h/rho.  The default is rho(1) if h is specified but b is not.

    scalepar(scaleparvalue) specifies the scaling factor for the RD parameter of interest.  This option is useful when the population parameter of interest involves a
        known multiplicative factor (for example, sharp kink RD).  The default is scalepar(1) (no scaling).

    bwselect(bwmethod) specifies the bandwidth selection procedure to be used.  By default, it computes both h and b, unless rho is specified, in which case it
        computes only the h and sets b=h/rho.  For details on implementation, see Calonico, Cattaneo, and Titiunik (2014b); Calonico, Cattaneo, and Farrell
        (forthcoming); and Calonico et al. (2016), and the companion software articles.  bwmethod may be one of the following:

*/