Preparing the data
affairs: numeric. How often engaged in extramarital sexual intercourse during the past year? 0 = none, 1 = once, 2 = twice, 3 = 3 times, 7 = 4–10 times, 12 = monthly, 12 = weekly, 12 = daily.
gender: factor indicating gender.
age: numeric variable coding age in years: 17.5 = under 20, 22 = 20–24, 27 = 25–29, 32 = 30–34, 37 = 35–39, 42 = 40–44, 47 = 45–49, 52 = 50–54, 57 = 55 or over.
yearsmarried: numeric variable coding number of years married: 0.125 = 3 months or less, 0.417 = 4–6 months, 0.75 = 6 months–1 year, 1.5 = 1–2 years, 4 = 3–5 years, 7 = 6–8 years, 10 = 9–11 years, 15 = 12 or more years.
children: factor. Are there children in the marriage?
religiousness: numeric variable coding religiousness: 1 = anti, 2 = not at all, 3 = slightly, 4 = somewhat, 5 = very.
education: numeric variable coding level of education: 9 = grade school, 12 = high school graduate, 14 = some college, 16 = college graduate, 17 = some graduate work, 18 = master's degree, 20 = Ph.D., M.D., or other advanced degree.
occupation: numeric variable coding occupation according to Hollingshead classification (reverse numbering).
rating: numeric variable coding self rating of marriage: 1 = very unhappy, 2 = somewhat unhappy, 3 = average, 4 = happier than average, 5 = very happy.
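The variable list above matches the `Affairs` data set shipped with the AER package, and the console output further down shows `affairs` with only the values 0 and 1. The post does not show how the data were loaded or recoded, so the following is only a sketch under two assumptions: that the data come from AER, and that any nonzero count was recoded to 1.

```r
# Assumption: the data are the Affairs data set from the AER package.
library(AER)
data("Affairs", package = "AER")
data <- Affairs

# Assumption: any nonzero count is treated as class 1, zero as class 0.
data$affairs <- factor(ifelse(data$affairs > 0, 1, 0))

str(data)            # column types: factors vs. numerics
table(data$affairs)  # class counts of the recoded response
```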
Summarize the dependent and independent variables
Compute the p-values
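The post names these two steps but gives no code for them. One common way to do both is `summary()` for the descriptive statistics and a logistic regression for per-variable p-values. This is an assumption about the method, not the author's actual code; `d`, `fit`, and `pvals` are illustrative names.

```r
# Assumption: AER's Affairs data, with the response recoded to 0/1.
library(AER)
data("Affairs", package = "AER")
d <- Affairs
d$affairs <- factor(ifelse(d$affairs > 0, 1, 0))

# Descriptive statistics for the response and all predictors.
summary(d)

# Logistic regression on all predictors; the Wald-test p-values sit in
# the "Pr(>|z|)" column of the coefficient table.
fit <- glm(affairs ~ ., data = d, family = binomial())
pvals <- summary(fit)$coefficients[, "Pr(>|z|)"]
round(pvals, 4)
```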
Validation
Data can be used to mislead, but the data themselves do not lie. The check is as follows:
> table(data$affairs)/nrow(data)  # class proportions of the response in the full data set
0 1
0.750416 0.249584
> table(dataTrain$affairs)/nrow(dataTrain)  # training set: close to the full-set proportions
0 1
0.7546778 0.2453222
> table(dataTest$affairs)/nrow(dataTest)  # test set
0 1
0.7333333 0.2666667
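The code that produced `dataTrain` and `dataTest` is not shown. The sketch below shows one way to make such a split and run the same proportion check. It uses base R only, on a made-up stand-in data frame; `toy`, the seed, and the 80/20 ratio are all assumptions for illustration.

```r
set.seed(1)  # hypothetical seed, only for reproducibility of the sketch

# Toy stand-in for the real data: 75 zeros and 25 ones.
toy <- data.frame(affairs = factor(rep(c(0, 1), times = c(75, 25))),
                  x = rnorm(100))

# Simple random 80/20 split by row index.
idx <- sample(nrow(toy), size = round(0.8 * nrow(toy)))
toyTrain <- toy[idx, ]
toyTest  <- toy[-idx, ]

# Class proportions in each subset, side by side.
prop <- function(d) table(d$affairs) / nrow(d)
rbind(full = prop(toy), train = prop(toyTrain), test = prop(toyTest))
```

If the training and test proportions drift far from the full-set proportions, a stratified split (for example caret's `createDataPartition`) is the usual alternative.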
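The details are as follows:
library(caret)
# Estimate the centering and scaling parameters on the training set only
preProcValues <- preProcess(dataTrain, method = c('center', 'scale'))
# Apply the same parameters to both the training set and the test set
trainTransformed <- predict(preProcValues, dataTrain)
testTransformed <- predict(preProcValues, dataTest)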
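Note that the test set is transformed with parameters learned from the training set. The base R sketch below shows roughly what centering and scaling amount to for a single numeric column. The values are made up, and this is an illustration of the idea rather than caret's internal code.

```r
# Illustrative values only (they happen to be valid age codes).
train_x <- c(22, 27, 32, 37, 42)
test_x  <- c(27, 57)

# Parameters are estimated from the training data only.
mu  <- mean(train_x)
sdv <- sd(train_x)

# The same mean and standard deviation are applied to both sets.
train_z <- (train_x - mu) / sdv
test_z  <- (test_x - mu) / sdv

train_z
test_z
```

The transformed training column has mean 0 and standard deviation 1. The test column generally does not, which is expected.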
# 4. Variable selection
subsets <- c(2, 5, 8, 15, 20)  # candidate numbers of predictors to try
ctrl <- rfeControl(functions = rfFuncs,  # random forest
                   method = 'cv')        # cross-validation
x <- trainTransformed[, -which(colnames(trainTransformed) %in% "affairs")]  # drop the affairs column
y <- trainTransformed[, "affairs"]
profile <- rfe(x, y, sizes = subsets, rfeControl = ctrl)
profile$optVariables  # the selected predictors
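To see why a particular subset was chosen, it can help to look at the resampled performance for each subset size. The sketch below runs the same kind of recursive feature elimination on the built-in `iris` data so that it is self-contained. The data set, the names `ctrl_demo` and `profile_demo`, the seed, and the 5-fold setting are all assumptions for illustration.

```r
library(caret)
library(randomForest)

set.seed(123)  # hypothetical seed

# Random-forest-based RFE with 5-fold cross-validation.
ctrl_demo <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
profile_demo <- rfe(x = iris[, 1:4], y = iris$Species,
                    sizes = c(1, 2, 3), rfeControl = ctrl_demo)

predictors(profile_demo)  # names of the selected variables
profile_demo$results      # resampled accuracy for each subset size
```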
# 5. Model training and tuning
data.train <- trainTransformed[, c(profile$optVariables, 'affairs')]
data.test <- testTransformed[, c(profile$optVariables, 'affairs')]
## Random forest ##
set.seed(45645)
gbmFit1 <- train(affairs ~ ., data = data.train, method = 'rf')  # despite the name, this is a random forest
# On the training set
importance <- varImp(gbmFit1, scale = FALSE)
plot(importance, xlab = 'Importance')
The result is as shown in the figure. Go ahead and use it, DS and ZN. It turns out that a rogue is not scary; what is scary is a rogue with an education, and no ordinary education at that. Hahaha~
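The code above builds `data.test` but never uses it. A natural next step is to predict on the held-out set and look at a confusion matrix. The sketch below does this on the built-in `iris` data so that it runs on its own. The data set, the `demo`-style names, and the 5-fold resampling are assumptions for illustration and are not part of the original workflow.

```r
library(caret)

set.seed(45645)

# Stratified 80/20 split on the stand-in data.
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
demo.train <- iris[idx, ]
demo.test  <- iris[-idx, ]

# Random forest tuned with 5-fold cross-validation.
fit_demo <- train(Species ~ ., data = demo.train, method = "rf",
                  trControl = trainControl(method = "cv", number = 5))

# Predict on the held-out rows and compare with the true labels.
pred <- predict(fit_demo, newdata = demo.test)
cm <- confusionMatrix(pred, demo.test$Species)
cm$overall["Accuracy"]
```

With the real data, the analogous calls would be `predict(gbmFit1, newdata = data.test)` followed by `confusionMatrix()` against `data.test$affairs`, assuming `affairs` is a factor.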
If it works for you, leave me a comment: