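Data Mining
A typical data mining workflow:
- Business understanding: what is the background, and what is the problem meant to solve?
- Data understanding: what data is available, which of it is relevant, is there enough of it, and is it correct?
- Data preprocessing: cleaning and transforming the data, including feature selection
- Modeling: build a classification or regression model
- Model evaluation: how well does the model perform (for example KS, AUC)?
- Model deployment: put the trained model to use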
Data processing
Print the number of rows and columns of a dataset
# Helper: report the shape of a data frame as "rows x columns"
nelems=function(d) paste(nrow(d),"x",ncol(d))
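A quick usage sketch for the helper, using the iris data frame that ships with base R (iris is only a stand-in here, it is not the dataset used in these notes):

```r
# Same helper as above, repeated so the snippet runs on its own
nelems <- function(d) paste(nrow(d), "x", ncol(d))

# iris has 150 rows and 5 columns
print(nelems(iris))  # "150 x 5"

# Subsetting changes the reported shape
small <- iris[1:10, c("Sepal.Length", "Species")]
print(nelems(small))  # "10 x 2"
```

Calling it before and after each cleaning step is a cheap way to see how many rows a step dropped.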
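Handling missing values
library(rminer)  # provides imputation()
# bank3 is assumed to be a data frame with NA values in its "age" column
# 1. Drop every row that contains a missing value
bank4=na.omit(bank3)
# 2. Fill missing ages with the mean of the observed ages
meanage=mean(bank3$age,na.rm=TRUE)
bank5=imputation("value",bank3,"age",Value=meanage)
# 3. Hot-deck imputation: replace each NA with the value found in the most similar case (1-nearest neighbor)
bank6=imputation("hotdeck",bank3,"age")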
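To make the first two strategies concrete, here is a minimal base R sketch on a toy data frame. The data frame bank and its values are made up for illustration, and the mean fill is written by hand rather than through rminer, so treat it as an equivalent idea, not as what imputation() does internally.

```r
# Toy data with one missing age (hypothetical values)
bank <- data.frame(age = c(30, NA, 50, 40),
                   job = c("a", "b", "a", "c"))

# 1. Drop incomplete rows
dropped <- na.omit(bank)

# 2. Fill missing ages with the mean of the observed ages
meanage <- mean(bank$age, na.rm = TRUE)
filled <- bank
filled$age[is.na(filled$age)] <- meanage

print(nrow(dropped))  # 3
print(filled$age)     # 30 40 50 40
```

Dropping rows is simplest but loses data; mean filling keeps every row but shrinks the variance of the filled column.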
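Modeling
fit: trains a model and can also tune its hyperparameters
predict: uses a fitted model to make predictions
mining: runs several fit and predict executions, according to a validation method and a number of runs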
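The idea behind mining, several fit and predict rounds under a validation scheme, can be sketched in base R as a repeated holdout. This is only an illustration of the concept, not rminer's implementation; the mtcars data, the lm model, the 2/3 split ratio and the MAE score are all assumptions chosen to keep the example self-contained.

```r
set.seed(1)  # make the random splits reproducible

runs <- 5        # number of fit/predict repetitions
ratio <- 2 / 3   # share of rows used for training in each run
mae <- numeric(runs)

for (i in seq_len(runs)) {
  # Random holdout split
  train_idx <- sample(nrow(mtcars), size = floor(ratio * nrow(mtcars)))
  train <- mtcars[train_idx, ]
  test <- mtcars[-train_idx, ]

  model <- lm(mpg ~ wt + hp, data = train)  # fit
  pred <- predict(model, newdata = test)    # predict
  mae[i] <- mean(abs(test$mpg - pred))      # score on held-out rows only
}

print(mae)        # one error value per run
print(mean(mae))  # average over all runs
```

Averaging over several runs gives a steadier estimate than a single split, which is the reason to repeat the procedure.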
library(rminer)
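# math and cmath are assumed to be data frames prepared earlier;
# inputs and bout are assumed to hold the input and output column indices
# ctree: conditional inference tree
B2=fit(schoolsup~.,math[,c(inputs,bout)],model="ctree")
# rpart: decision tree
B1=fit(schoolsup~.,math[,c(inputs,bout)],model="rpart")
# mlpe: multilayer perceptron ensemble
B3=fit(schoolsup~.,math[,c(inputs,bout)],model="mlpe")
# ksvm: support vector machine
B4=fit(schoolsup~.,math[,c(inputs,bout)],model="ksvm")
# randomForest: random forest
C3=fit(Mjob~.,cmath,model="randomForest")
To try a different algorithm, only the model argument needs to change.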
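Evaluation
# Fit a decision tree, then score its predictions
B1=fit(schoolsup~.,math[,c(inputs,bout)],model="rpart")
test <- math[,c(inputs,bout)]
# True labels. The name schoolsup.1 is kept from the original notes; R appends .1 when the same column is selected twice, so adjust it to match your data
y <- test$schoolsup.1
# Note: test contains the same rows used for training, so these metrics are optimistic
P1=predict(B1,test)
m=mmetric(y,P1,metric=c("ALL"))
This returns every metric that mmetric can compute for the task.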
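To see what a few of those classification metrics mean, they can be computed by hand from a confusion matrix in base R. The labels below are made up for illustration, and the snippet does not call mmetric, so the names acc, recall and precision are just local variables.

```r
# Hypothetical true labels and predicted labels
lv <- c("no", "yes")
y    <- factor(c("yes", "no", "no",  "yes", "no", "no"), levels = lv)
pred <- factor(c("yes", "no", "yes", "yes", "no", "no"), levels = lv)

# Confusion matrix: rows are actual classes, columns are predicted classes
conf <- table(actual = y, predicted = pred)
print(conf)

acc       <- sum(diag(conf)) / sum(conf)              # share of correct predictions
recall    <- conf["yes", "yes"] / sum(conf["yes", ])  # true positive rate
precision <- conf["yes", "yes"] / sum(conf[, "yes"])  # share of predicted "yes" that are right

print(c(accuracy = acc, recall = recall, precision = precision))
```

Here five of the six predictions are correct, both actual "yes" cases are found, and one of the three predicted "yes" cases is wrong.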
Values accepted by the model argument, as listed in the rminer help:
- naive – most common class (classification) or mean output value (regression)
- ctree – conditional inference tree (classification and regression, uses ctree from party package)
- cv.glmnet – generalized linear model with lasso or elasticnet regularization (classification and regression, uses cv.glmnet from glmnet package; note: cross-validation is used to automatically set the lambda parameter that is needed to compute the predictions)
- rpart or dt – decision tree (classification and regression, uses rpart from rpart package)
- kknn or knn – k-nearest neighbor (classification and regression, uses kknn from kknn package)
- ksvm or svm – support vector machine (classification and regression, uses ksvm from kernlab package)
- mlp – multilayer perceptron with one hidden layer (classification and regression, uses nnet from nnet package)
- mlpe – multilayer perceptron ensemble (classification and regression, uses nnet from nnet package)
- randomForest or randomforest – random forest algorithm (classification and regression, uses randomForest from randomForest package)
- xgboost – eXtreme Gradient Boosting (Tree) (classification and regression, uses xgboost from xgboost package; note: nrounds parameter is set by default to 2)
- bagging – bagging (classification, uses bagging from adabag package)
- boosting – boosting (classification, uses boosting from adabag package)
- lda – linear discriminant analysis (classification, uses lda from MASS package)
- multinom or lr – logistic regression (classification, uses multinom from nnet package)
- naiveBayes or naivebayes – naive bayes (classification, uses naiveBayes from e1071 package)
- qda – quadratic discriminant analysis (classification, uses qda from MASS package)
- cubist – M5 rule-based model (regression, uses cubist from Cubist package)
- lm – standard multiple/linear regression (uses lm)
- mr – multiple regression (regression, equivalent to lm but uses nnet from nnet package with zero hidden nodes and linear output function)
- mars – multivariate adaptive regression splines (regression, uses mars from mda package)
- pcr – principal component regression (regression, uses pcr from pls package)
- plsr – partial least squares regression (regression, uses plsr from pls package)
- cppls – canonical powered partial least squares (regression, uses cppls from pls package)
- rvm – relevance vector machine (regression, uses rvm from kernlab package)
Further reading:
https://repositorium.sdum.uminho.pt/bitstream/1822/36210/1/rminer-tutorial.pdf
