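This article covers the use of instrumental variables in R, the data processing involved, and how to interpret the models. (The data-processing steps in this example come up very often.)
There are two examples. Example 1 has 6 questions and uses the fertil2 dataset available in R (via the wooldridge package). Example 2 has 3 questions and uses eitc.dta, a dataset in Stata format. This article covers Example 2; see the previous article for Example 1.
Example 2
1.1 Questions
The dataset is eitc.dta. The dependent variable is children and educ is an explanatory variable (whether it is endogenous is discussed in questions 1-6), along with other explanatory variables such as age.
1.2 My answers
1.3 R code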
> library(tidyverse) # ggplot(), %>%, mutate(), and friends
> library(scales) # Format numbers with functions like comma(), percent(), and dollar()
> library(broom) # Convert models to data frames
> library(wooldridge) # Econometrics-related datasets like injury
> library(stargazer)
> library(foreign)
> library(readstata13)
> eitc = read.dta13("C:\\Users\\LENOVO\\Desktop\\eitc.dta")
>
> head(eitc,2)
state year urate children nonwhite finc earn age ed work unearn
1 11 1991 7.6 0 1 18714.394 18714.3943 26 10 1 0.000000
2 12 1991 7.2 1 0 4838.568 471.3656 22 9 1 4.367203
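#Question 7: What are the averages of work, family income (finc), earnings (earn), nonwhite, education (ed), and age for each level of children? How do these groups differ?
#7.1 Split the number of children into 3 categories: 0, 1, and 2+.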
> eitc <- eitc %>% mutate(children_cat = case_when(
+ children == 0 ~ "0",
+ children == 1 ~ "1",
+ children >= 2 ~ "2+"
+ ))
#7.2 Select the observations with 0 children and compute the means of work, finc, and the other variables
> eitc %>%
+ filter(children =="0")%>%
+ summarize(mean_0_work = mean(work),
+ mean_0_finc = mean(finc),
+ mean_0_earn = mean(earn),
+ mean_0_nonwhite = mean(nonwhite),
+ mean_0_ed = mean(ed),
+ mean_0_age =mean(age)
+ )
mean_0_work mean_0_finc mean_0_earn mean_0_nonwhite mean_0_ed mean_0_age
1 0.5744896 18559.86 13760.26 0.515944 8.548676 38.49823
#7.3 Select the observations with 1 child and compute the means of work, finc, and the other variables
> eitc %>%
+ filter(children =="1")%>%
+ summarize(mean_1_work = mean(work),
+ mean_1_finc = mean(finc),
+ mean_1_earn = mean(earn),
+ mean_1_nonwhite = mean(nonwhite),
+ mean_1_ed = mean(ed),
+ mean_1_age =mean(age)
+ )
mean_1_work mean_1_finc mean_1_earn mean_1_nonwhite mean_1_ed mean_1_age
1 0.5376063 13941.57 9928.279 0.5964683 8.992479 33.75899
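#7.4 Select the observations with 2 children and compute the means of work, finc, and the other variables
#    (this filter matches exactly 2 children; filtering on children_cat == "2+" would cover the whole 2+ category)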
> eitc %>%
+ filter(children =="2")%>%
+ summarize(mean_2_work = mean(work),
+ mean_2_finc = mean(finc),
+ mean_2_earn = mean(earn),
+ mean_2_nonwhite = mean(nonwhite),
+ mean_2_ed = mean(ed),
+ mean_2_age =mean(age)
+ )
mean_2_work mean_2_finc mean_2_earn mean_2_nonwhite mean_2_ed mean_2_age
1 0.4782972 12357.29 7487.978 0.6527546 9.082638 32.26002
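The three filter-and-summarize steps above can also be collapsed into one grouped summary. The sketch below is one possible way to do it, assuming dplyr 1.0 or later (for across()). The data frame toy is made-up illustration data, not rows from eitc.dta, so its numbers carry no meaning.

```r
library(dplyr)

# Hypothetical toy data standing in for eitc (NOT real eitc.dta rows)
toy <- data.frame(
  children = c(0, 0, 1, 1, 2, 3),
  work     = c(1, 0, 1, 1, 0, 1),
  finc     = c(20000, 10000, 15000, 5000, 8000, 12000),
  earn     = c(15000, 0, 9000, 3000, 0, 7000),
  nonwhite = c(0, 1, 1, 0, 1, 1),
  ed       = c(10, 8, 9, 11, 7, 9),
  age      = c(40, 36, 30, 34, 28, 33)
)

group_means <- toy %>%
  # Same three categories as in step 7.1
  mutate(children_cat = case_when(
    children == 0 ~ "0",
    children == 1 ~ "1",
    children >= 2 ~ "2+"
  )) %>%
  group_by(children_cat) %>%
  # One row per category: the mean of each variable plus the group size
  summarize(across(c(work, finc, earn, nonwhite, ed, age), mean),
            n = n())

print(group_means)
```

Grouping on children_cat puts everyone with two or more children into the "2+" row, and the n column shows how many observations each mean is based on.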
>
> #Question 8: Create a new variable named any_kids (TRUE or 1 if children > 0) and a time variable named after_1993 (TRUE or 1 if year > 1993).
> any_kids = (eitc$children > 0)*1
> eitc = cbind(eitc,any_kids)
>
> # Compare year against 1993, as the question requires
> after_1993 = (eitc$year > 1993)*1
> eitc = cbind(eitc,after_1993)
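The two indicators asked for in question 8 can also be added in a single mutate() call, which matches the pipeline style used earlier. This is a minimal sketch on made-up data; the data frame toy is hypothetical and only stands in for eitc.

```r
library(dplyr)

# Hypothetical toy data (NOT real eitc.dta rows)
toy <- data.frame(
  children = c(0, 1, 3, 0),
  year     = c(1991, 1993, 1994, 1996)
)

toy <- toy %>%
  mutate(
    any_kids   = as.integer(children > 0),  # 1 if the woman has any children
    after_1993 = as.integer(year > 1993)    # 1 for years strictly after 1993
  )

print(toy)
```

Note that the year 1993 itself gets after_1993 = 0, because the question specifies year > 1993.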
>
>
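> #Question 9: Create a new dataset that shows, for each year, the average proportion of employed women (work) in the treatment and control groups (i.e., with and without children).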
> eitc %>%
+ filter(any_kids =="1")%>%
+ summarize(mean_any_kid_1_work = mean(work)
+ )
mean_any_kid_1_work
1 0.4664279
>
> eitc %>%
+ filter(any_kids =="0")%>%
+ summarize(mean_any_kid_0_work = mean(work)
+ )
mean_any_kid_0_work
1 0.5744896
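Question 9 asks for the averages by year as well as by group, whereas the two summaries above pool all years together. One possible way to build the year-by-group dataset is group_by(year, any_kids). The sketch below uses made-up data; toy is hypothetical and is not taken from eitc.dta.

```r
library(dplyr)

# Hypothetical toy data (NOT real eitc.dta rows)
toy <- data.frame(
  year     = c(1991, 1991, 1991, 1991, 1994, 1994, 1994, 1994),
  any_kids = c(0, 0, 1, 1, 0, 0, 1, 1),
  work     = c(1, 0, 0, 0, 1, 1, 1, 0)
)

work_by_year <- toy %>%
  group_by(year, any_kids) %>%
  # One row per (year, group) cell: the share of women who work
  summarize(mean_work = mean(work), .groups = "drop")

print(work_by_year)
```

Applied to the real data, the same pipeline would start from eitc instead of toy, giving one row for each combination of year and any_kids.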
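These are my own answers, and I am not sure whether they are correct. If any readers who know this material can comment or help, I would be very grateful. Let's keep learning together.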