◾PanelData
◾Fixed Effects Regression
◾Regression with Time Fixed Effects
◾The Fixed Effects Regression Assumptions
面板数据(Panel Data),是截面数据与时间序列数据综合起来的一种数据类型。它有时间序列和截面两个维度,当这类数据按两个维度排列时,是排在一个平面上,与只有一个维度的数据排在一条线上有着明显的不同,整个表格像是一个面板。
pacman::p_load(AER,plm,stargazer)
data(Fatalities)
# 获取维度检查结构
#该数据集由对34个变量的336个观测结果组成
is.data.frame(Fatalities)
dim(Fatalities)
str(Fatalities)
head(Fatalities)
# summarize 变量 'state' and 'year'
summary(Fatalities[, c(1, 2)])
# 确认数据是平衡面板
years <- length(levels(Fatalities$year))
states <- length(levels(Fatalities$state))
years*states == nrow(Fatalities)
> is.data.frame(Fatalities)
[1] TRUE
> dim(Fatalities)
[1] 336 37
> summary(Fatalities[, c(1, 2)])
state year
al : 7 1982:48
az : 7 1983:48
ar : 7 1984:48
ca : 7 1985:48
co : 7 1986:48
ct : 7 1987:48
(Other):294 1988:48
实例:交通死亡和酒精税
# 定义死亡率
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000
# 数据子集
Fatalities1982 <- subset(Fatalities, year == "1982")
Fatalities1988 <- subset(Fatalities, year == "1988")
# 使用1982年和1988年的数据来估计简单的回归模型
fatal1982_mod <- lm(fatal_rate ~ beertax, data = Fatalities1982)
fatal1988_mod <- lm(fatal_rate ~ beertax, data = Fatalities1988)
coeftest(fatal1982_mod, vcov. = vcovHC, type = "HC1")
coeftest(fatal1988_mod, vcov. = vcovHC, type = "HC1")
# 绘制观测结果,添加1982年数据的估计回归线
plot(x = Fatalities1982$beertax,
y = Fatalities1982$fatal_rate,
xlab = "Beer tax (in 1988 dollars)",
ylab = "Fatality rate (fatalities per 10000)",
main = "Traffic Fatality Rates and Beer Taxes in 1982",
ylim = c(0, 4.5),
pch = 20,
col = "steelblue")
abline(fatal1982_mod, lwd = 1.5)
# 绘制观测结果,添加1988年数据的估计回归
plot(x = Fatalities1988$beertax,
y = Fatalities1988$fatal_rate,
xlab = "Beer tax (in 1988 dollars)",
ylab = "Fatality rate (fatalities per 10000)",
main = "Traffic Fatality Rates and Beer Taxes in 1988",
ylim = c(0, 4.5),
pch = 20,
col = "steelblue")
abline(fatal1988_mod, lwd = 1.5)
> coeftest(fatal1982_mod, vcov. = vcovHC, type = "HC1")
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.01038 0.14957 13.4408 <2e-16 ***
beertax 0.14846 0.13261 1.1196 0.2687
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> coeftest(fatal1988_mod, vcov. = vcovHC, type = "HC1")
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.85907 0.11461 16.2205 < 2.2e-16 ***
beertax 0.43875 0.12786 3.4314 0.001279 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


结论:
1.两年啤酒税与死亡率呈正关系
2.1988年数据的啤酒税估计系数几乎是1982年数据的三倍
3.估计出现偏差
具有两个时间段的面板数据:“前后”比较
# 计算差异,通过相减消除偏差
diff_fatal_rate <- Fatalities1988$fatal_rate - Fatalities1982$fatal_rate
diff_beertax <- Fatalities1988$beertax - Fatalities1982$beertax
# 使用差异来估计回归
fatal_diff_mod <- lm(diff_fatal_rate ~ diff_beertax)
coeftest(fatal_diff_mod, vcov = vcovHC, type = "HC1")
# 绘制差异数据图
plot(x = diff_beertax,
y = diff_fatal_rate,
xlab = "Change in beer tax (in 1988 dollars)",
ylab = "Change in fatality rate (fatalities per 10000)",
main = "Changes in Traffic Fatality Rates and Beer Taxes in 1982-1988",
xlim = c(-0.6, 0.6),
ylim = c(-1.5, 1),
pch = 20,
col = "steelblue")
# 添加回归线
abline(fatal_diff_mod, lwd = 1.5)
> coeftest(fatal_diff_mod, vcov = vcovHC, type = "HC1")
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.072037 0.065355 -1.1022 0.276091
diff_beertax -1.040973 0.355006 -2.9323 0.005229 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

啤酒税的估计系数现在是负的;
它的解释是,通过$1提高啤酒税会导致交通死亡人数每1万人减少1.04人。
固定效应回归(Fixed Effects Regression)
固定效应模型(Fixed Effects Regression Model),如果对于不同的截面或不同的时间序列,模型的截距不同,则可以采用在模型中添加虚拟变量的方法估计回归参数。该模型刻画了不同个体的特殊影响,而且这个影响不随样本变化。
pacman::p_load(plm)
# 用plm()估计固定效应回归model = "within"
fatal_fe_mod <- plm(fatal_rate ~ beertax,
data = Fatalities,
index = c("state", "year"),
model = "within")
# print summary using robust standard errors
coeftest(fatal_fe_mod, vcov. = vcovHC, type = "HC1")
#混合估计模型进行估计model = "pooling"
mod <- plm(lfatal_rate~ beertax,
data = Fatalities,
index = c("state", "year"),
model = "pooling")
#如果要判断固定效应模型是否比混合估计模型更好,可采用F检验
pFtest(fatal_fe_mod, mod)
> coeftest(fatal_fe_mod, vcov. = vcovHC, type = "HC1")
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
beertax -0.65587 0.28880 -2.271 0.02388 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
解释:啤酒税上的系数为负且显著。
实际啤酒税每增加$1,交通死亡人数估计为每1万人减少0.66人。
随时间固定效应的回归(Regression with Time Fixed Effects)
在plm()调用中,设置另一个参数effect = "twoways"来包含实体和时间。
# 建立一个时间和实体组合的固定效应回归模型
# 通过lm()
fatal_tefe_lm_mod <- lm(fatal_rate ~ beertax + state + year - 1, data = Fatalities)
fatal_tefe_lm_mod
# 通过plm()
fatal_tefe_mod <- plm(fatal_rate ~ beertax,
data = Fatalities,
index = c("state", "year"),
model = "within",
effect = "twoways")
coeftest(fatal_tefe_mod, vcov = vcovHC, type = "HC1")
# 检查 'state' 和 'year'的类型
class(Fatalities$state)
class(Fatalities$year)
> coeftest(fatal_tefe_mod, vcov = vcovHC, type = "HC1")
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
beertax -0.63998 0.35015 -1.8277 0.06865 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
实例:酒后驾驶法和交通死亡法
# 离散最低法定饮酒年龄cut()
Fatalities$drinkagec <- cut(Fatalities$drinkage,
breaks = 18:22,
include.lowest = TRUE,
right = FALSE)
# 将最低饮酒年龄[21,22]设定为基准水平
Fatalities$drinkagec <- relevel(Fatalities$drinkagec, "[21,22]")
# 曼达多里监狱还是社区服务机构?
Fatalities$punish <- with(Fatalities, factor(jail == "yes" | service == "yes",
labels = c("no", "yes")))
# 1982年和1988年所有变量的观察值
Fatalities_1982_1988 <- Fatalities[with(Fatalities, year == 1982 | year == 1988), ]
#使用plm()来估计所有7个模型
#第4个模型最好,但是不显著
fatalities_mod1 <- lm(fatal_rate ~ beertax, data = Fatalities)
fatalities_mod2 <- plm(fatal_rate ~ beertax + state, data = Fatalities) #只有state固定效应
fatalities_mod3 <- plm(fatal_rate ~ beertax + state + year,
index = c("state","year"), #固定效应state、year
model = "within",
effect = "twoways",
data = Fatalities)
fatalities_mod4 <- plm(fatal_rate ~ beertax + state + year + drinkagec
+ punish + miles + unemp + log(income), #增加控制变量
index = c("state", "year"),
model = "within",
effect = "twoways",
data = Fatalities)
fatalities_mod5 <- plm(fatal_rate ~ beertax + state + year + drinkagec
+ punish + miles, #减掉两个控制变量unemp、log(income)
index = c("state", "year"),
model = "within",
effect = "twoways",
data = Fatalities)
fatalities_mod6 <- plm(fatal_rate ~ beertax + year + drinkage #减掉控制变量state
+ punish + miles + unemp + log(income),
index = c("state", "year"),
model = "within",
effect = "twoways",
data = Fatalities)
fatalities_mod7 <- plm(fatal_rate ~ beertax + state + year + drinkagec
+ punish + miles + unemp + log(income),
index = c("state", "year"),
model = "within",
effect = "twoways",
data = Fatalities_1982_1988) #只用了2年数据
#使用stargazer() (Hlavac, 2018)来生成全面的表格展示结果
library(stargazer)
# 建立标准差集
rob_se <- list(sqrt(diag(vcovHC(fatalities_mod1, type = "HC1"))),
sqrt(diag(vcovHC(fatalities_mod2, type = "HC1"))),
sqrt(diag(vcovHC(fatalities_mod3, type = "HC1"))),
sqrt(diag(vcovHC(fatalities_mod4, type = "HC1"))),
sqrt(diag(vcovHC(fatalities_mod5, type = "HC1"))),
sqrt(diag(vcovHC(fatalities_mod6, type = "HC1"))),
sqrt(diag(vcovHC(fatalities_mod7, type = "HC1"))))
# 生成表格
stargazer(fatalities_mod1, fatalities_mod2, fatalities_mod3,
fatalities_mod4, fatalities_mod5, fatalities_mod6, fatalities_mod7,
digits = 3,
header = FALSE,
type = "html",
out = "day3_2.doc",
se = rob_se,
title = "Linear Panel Regression Models of Traffic Fatalities due to Drunk Driving",
model.numbers = FALSE,
column.labels = c("(1)", "(2)", "(3)", "(4)", "(5)", "(6)", "(7)"))
# 检验drinkage是否没有解释能力
linearHypothesis(fatalities_mod4,
test = "F",
c("drinkagec[18,19)=0", "drinkagec[19,20)=0", "drinkagec[20,21)"),
vcov. = vcovHC, type = "HC1")
# 检验经济指标是否没有解释能力
linearHypothesis(fatalities_mod4,
test = "F",
c("log(income)", "unemp"),
vcov. = vcovHC, type = "HC1")
cut():切割将x的范围划分为时间间隔,并根据其所处的时间间隔对x中的值进行编码。参数:
breaks:两个或更多个唯一切割点或单个数字(大于或等于2)的数字向量,给出x被切割的间隔的个数。breaks:采用fivenum():返回五个数据:最小值、下四分位数、中位数、上四分位数、最大值。
labels:为区间数,打标签
ordered_result:逻辑:结果应该是一个有序的因素吗?
>linearHypothesis(fatalities_mod4,
test = "F",
c("drinkagec[18,19)=0", "drinkagec[19,20)=0", "drinkagec[20,21)"),
vcov. = vcovHC, type = "HC1")
Linear hypothesis test
Hypothesis:
drinkagec[18,19) = 0
drinkagec[19,20) = 0
drinkagec[20,21) = 0
Model 1: restricted model
Model 2: fatal_rate ~ beertax + state + year + drinkagec + punish + miles +
unemp + log(income)
Note: Coefficient covariance matrix supplied.
Res.Df Df F Pr(>F)
1 276
2 273 3 0.3782 0.7688
>linearHypothesis(fatalities_mod4,
test = "F",
c("log(income)", "unemp"),
vcov. = vcovHC, type = "HC1")
Linear hypothesis test
Hypothesis:
log(income) = 0
unemp = 0
Model 1: restricted model
Model 2: fatal_rate ~ beertax + state + year + drinkagec + punish + miles +
unemp + log(income)
Note: Coefficient covariance matrix supplied.
Res.Df Df F Pr(>F)
1 275
2 273 2 31.577 4.609e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1