Notes on data error-checking techniques (by Stata)_14Nov2019

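Logic checks differ a lot from one variable to the next, so handle them case by case as they come up.
Missing values and outliers are both covered by the examples below. The examples are easy to run on your own machine. The only extra requirement is one external command, installed as follows: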

  1. Use a computer with an internet connection
  2. Open Stata
  3. Type in the Command window: ssc install fmiss

Note: the command only needs to be installed once.
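
If you are unsure whether fmiss is already installed, a small guard like the one below can go at the top of a do-file. This is only a sketch: it assumes the machine can reach SSC when the install is actually needed.

capture which fmiss /*_rc is nonzero when fmiss cannot be found*/
if _rc != 0 {
    ssc install fmiss
}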

Checking for missing values

/*make a dataset that contains missing values*/
sysuse auto, clear
set more off
replace make = "" in 1
replace price = . in 2
replace price = . in 7
replace mpg = . in 7
replace rep78 = . in 10
replace headroom = . in 11
replace trunk = . in 9
replace weight = . in 5

fmiss /*a quick overview is sometimes enough*/

misstable summarize, generate(miss_) /*reports numeric variables only; the blank in the string variable make is not picked up here*/
keep miss_*

egen obs_with_na = rowtotal(*)
drop if obs_with_na == 0
list /*I personally prefer this form of the result; consider saving it to .xlsx*/

/*optional*/
drop obs_with_na
tostring *, replace
foreach v of varlist * { /*foreach replaces the obsolete for command*/
    replace `v' = "" if `v' == "0"
    replace `v' = "missing" if `v' == "1"
}
list

export excel using "chk_missing.xlsx", replace
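
The listing above keeps only the miss_* indicators, so the original observation numbers are lost once rows are dropped, and string variables are not covered. A minimal sketch of one alternative follows. The names obs_id and n_missing are hypothetical, and it assumes the current sort order is enough to identify observations. It counts missing values per row with the missing() function, which accepts both string and numeric variables.

sysuse auto, clear
set more off
replace make = "" in 1
replace price = . in 2

gen long obs_id = _n /*hypothetical ID based on the current sort order*/
gen byte n_missing = 0
foreach v of varlist make price mpg rep78 headroom trunk weight {
    qui: replace n_missing = n_missing + missing(`v') /*missing() returns 1 for "" or any numeric missing value*/
}
list obs_id make price mpg rep78 headroom trunk weight if n_missing > 0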

Batch-exporting box plots, which may help with error checking (histograms may also help)

sysuse auto, clear
set more off
cap log close
des

!rmdir /s /q chk_miss_figure /*Windows shell command; /q skips the confirmation prompt. Be careful: everything in this directory will be removed*/
!mkdir chk_miss_figure

foreach var of varlist price mpg rep head tru wei len turn dis gear fore {
    graph box `var'
    graph export "chk_miss_figure/`var'.png", replace
}
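
The same loop can be adapted for histograms. The sketch below assumes a separate output folder, chk_hist_figure, which is a hypothetical name. It uses Stata's own mkdir under capture, so it does not depend on the Windows shell and does not stop if the folder already exists.

sysuse auto, clear
set more off
capture mkdir chk_hist_figure /*hypothetical folder name; capture ignores the error if it already exists*/

foreach var of varlist price mpg rep78 headroom trunk weight length turn displacement gear_ratio {
    histogram `var', frequency
    graph export "chk_hist_figure/`var'.png", replace
}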

Batch-checking for outliers

sysuse auto, clear
set more off
cap log close

gen ID = _n
gen test = _n + 1
des

/*revise the following four lines to set things up*/
log using chk_outlier.txt, text replace /*file name for the results*/
global threshold 2 /*a value more than this many SDs from the mean is considered an outlier; 2 or 3 recommended*/
global must_shown ID test /*variables shown in the results, usually used to identify observations*/
global chk_vars price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign /*variables to be checked, very important; full names keep the printed labels unambiguous*/

/*do not change any of the code below unless you understand exactly what you are doing*/
foreach var of global chk_vars {
    qui: su `var'
    
    cap drop temp
    qui: gen temp = 1 if abs(`var' - r(mean)) > $threshold * r(sd) & `var' != .
    qui: su temp
    
    if r(sum) != 0 {
        di "===========`var'==============="
        list $must_shown `var' if temp == 1
        di "==============================="
    }
}

log close

/*another example: flag observations that are outliers on any checked variable; helpful in your analysis*/
gen outlier = 0

foreach var of global chk_vars {
    qui: su `var'
    qui: replace outlier = 1 if abs(`var' - r(mean)) > $threshold * r(sd) & `var' != .
}

tab outlier
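
To see what the rule does for a single variable, the cutoffs can be printed directly. This is a sketch for price only, with the threshold hard-coded to 2 as in the setup above. The local macro names lo and hi are hypothetical.

sysuse auto, clear
qui: su price
local lo = r(mean) - 2 * r(sd)
local hi = r(mean) + 2 * r(sd)
di "lower cutoff: " %9.2f `lo'
di "upper cutoff: " %9.2f `hi'
count if (price < `lo' | price > `hi') & !missing(price) /*number of observations the rule would flag*/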