Will Miners See the Smurfs?


data() # list all datasets in the currently loaded packages

data(package = .packages(all.available = TRUE)) # list all datasets in every installed package

y = rep(c(1, 2, 3), c(20, 20, 20))

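Generates twenty 1s, twenty 2s, and twenty 3s.

Removing missing values

na.omit(A)   or   A[complete.cases(A),]

rnorm() generates normally distributed random numbers; the count, mean, and standard deviation can all be set.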

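cor() computes the matrix of pairwise correlation coefficients between variables.

# Centering the data:  scale(data,center=T,scale=F)

#### Standardizing the data:  scale(data,center=T,scale=T), or with the default arguments scale(data)

Variables are usually standardized before running PCA.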
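To make the difference between centering and standardizing concrete, here is a minimal Python sketch of what R's scale() does to a single column. The function name scale_column is made up for illustration; it assumes a plain list of numbers with at least two values.

```python
import math

def scale_column(values, center=True, scale=True):
    """Center and/or standardize one numeric column, in the spirit of R's scale()."""
    n = len(values)
    mean = sum(values) / n if center else 0.0
    shifted = [v - mean for v in values]
    if not scale:
        return shifted
    # Divide by the sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum(s * s for s in shifted) / (n - 1))
    return [s / sd for s in shifted]

centered = scale_column([2, 4, 6], scale=False)   # only subtract the mean
standardized = scale_column([2, 4, 6])            # subtract the mean, divide by sd
```

After standardizing, every column has mean 0 and standard deviation 1, so no variable dominates the PCA just because of its units.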

y=c(rep(-1,10),rep(1,10))

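rep is the repeat function: here -1 is repeated ten times, followed by 1 repeated ten times.

Unsupervised learning: only x values are available, with no response y.

Two main types of unsupervised learning: cluster analysis and principal component analysis.

A qualitative response variable: qualitative variables are also called categorical variables.

In linear regression the dependent variable (Y) is continuous, while the independent variables (X) can be either continuous or categorical.

Logistic regression is the reverse of linear regression: the dependent variable must be categorical and cannot be continuous. The categorical variable can be binary or multi-class, and a multi-class variable can be ordered or unordered.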

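Least squares (https://www.zhihu.com/question/37031188)

Drop a vertical line from each point to the fitted line and minimize the sum of the squared vertical distances (y - y_hat)^2, where y_hat is the fitted value.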
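As a small illustration of the least-squares idea, the sketch below fits a straight line to one predictor using the closed-form solution and computes the sum of squared vertical distances it minimizes. The function names fit_line and rss are hypothetical and the data are made up.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def rss(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs, ys = [1, 2, 3], [1, 2, 2]
intercept, slope = fit_line(xs, ys)
best = rss(xs, ys, intercept, slope)
```

Any other line, for example one with a slightly different slope, gives a larger value of rss on the same points.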

Decision trees (https://blog.csdn.net/u010089444/article/details/53241218)

ID3 algorithm

Splitting criterion: branch on the feature with the largest information gain.

https://blog.csdn.net/xiaohukun/article/details/78055132

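Information gain = information entropy - conditional entropy

When learning a decision tree, information gain is an important criterion for feature selection. It is defined as how much information a feature brings to the classification system: the more information it brings, the more important the feature, and the larger its information gain.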

https://www.zhihu.com/question/22104055

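The larger the information entropy, the more disordered the events.

The smaller the information entropy, the more ordered the events.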
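The definition "information gain = entropy minus conditional entropy" can be sketched in a few lines of Python. This is only an illustration for a categorical feature; the function names and the toy data are assumptions, not taken from the linked articles.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditional on the feature."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature_values, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

labels = ["a", "a", "b", "b"]
useful = information_gain(["x", "x", "y", "y"], labels)   # feature separates the classes
useless = information_gain(["x", "y", "x", "y"], labels)  # feature tells us nothing
```

ID3 would branch on the first feature, because it removes all the uncertainty about the label while the second removes none.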

https://blog.csdn.net/wxn704414736/article/details/80512705

CART

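The smaller the Gini index, the purer the node.

The split point with the smallest Gini index is the optimal split point; use it to divide the data into two subsets.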
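A minimal sketch of how such a split point could be found for one numeric feature: try each midpoint between consecutive distinct values and keep the one with the smallest weighted Gini index. The function names are hypothetical, and real CART implementations are more efficient than this brute-force loop.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split value <= threshold."""
    n = len(values)
    candidates = sorted(set(values))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or weighted < best[1]:
            best = (threshold, weighted)
    return best

threshold, score = best_split([1, 2, 3, 4], ["a", "a", "b", "b"])
```

Here the split at 2.5 leaves two pure subsets, so its weighted Gini index is 0.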

Classification, regression: supervised learning

Clustering: unsupervised learning

https://blog.csdn.net/chenKFKevin/article/details/70547549

————————————

PCA: a dimensionality-reduction tool

The covariance matrix: the key to implementing PCA

https://www.zhihu.com/question/41120789

pinkyjie.com/2011/02/24/covariance-pca/

### prcomp(data,scale=TRUE)    scale=TRUE standardizes the data

prcomp     the function for principal component analysis (PCA)
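To show why the covariance matrix is central to PCA, the sketch below builds the covariance matrix by hand and approximates its leading eigenvector (the first principal component direction) with power iteration. This is an illustrative stand-in, not how prcomp works internally; the function names and the fixed starting vector are assumptions, and the iteration can fail if the start happens to be orthogonal to the leading eigenvector.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations (rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    p = len(cov)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-zero start (assumption)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(p))
    return eigenvalue, v

cov = covariance_matrix([[1, 1], [2, 2], [3, 3]])
eigenvalue, direction = first_component(cov)
```

The eigenvalue is the variance explained by that component, and the eigenvector gives the loadings of the original variables.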

————————————

Confusion matrix

https://www.zhihu.com/question/36883196
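A confusion matrix simply counts how often each actual class is predicted as each class. A minimal sketch with made-up labels follows; the function names are hypothetical, rows are actual classes and columns are predicted classes.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Return (classes, matrix) where matrix[i][j] counts actual i predicted as j."""
    counts = Counter(zip(actual, predicted))
    classes = sorted(set(actual) | set(predicted))
    return classes, [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(actual, predicted):
    """Share of observations whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 0, 1]
classes, matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
```

The diagonal of the matrix holds the correct predictions, so accuracy is the diagonal sum divided by the total count.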

Support vector machines    (text classification problems)

https://www.zhihu.com/question/21094489

knn 

kmeans                    https://zhuanlan.zhihu.com/p/31580379

import numpy as np

import pandas as pd

import sys

import matplotlib.pyplot as plt

import seaborn as sns

wine=pd.read_excel(r'D:\未名学院\第4节课\作业材料\winequality-white.xlsx')

wine.info()

Xvar=wine[['fixed acidity','volatile acidity','citric acid','residual sugar','chlorides','free sulfur dioxide','total sulfur dioxide','density','pH','sulphates','alcohol']]

Yvar=wine['quality']

t=Xvar.corr()

# Plot a heatmap of the absolute correlations

plt.figure(figsize=(10,8))

sns.heatmap(np.abs(t),annot=True)

https://www.kaggle.com/xvivancos/tutorial-clustering-wines-with-k-means

https://www.kaggle.com/maitree/wine-quality-selection

cov_sdc=cov(wine)

eigen(cov_sdc)

res.pca <- PCA(wine[,-12], graph = TRUE)

eig.val <- get_eigenvalue(res.pca) # get_eigenvalue() comes from the factoextra package

eig.val

# Data import

wine= read.csv('winequality-white.csv',header=TRUE)

# or, if a data frame named winequality_white already exists in the workspace:

wine=winequality_white

#data cleaning

wine = wine[complete.cases(wine),]

#PCA

library(stringr)

library(FactoMineR)

# Plot

res.pca <- PCA(wine[,-12], graph = TRUE)#delete Y=quality, plot the PCA graph

sdc=scale(wine)

pca.d=prcomp(sdc)

summary(pca.d)

# Dimensionality reduction after PCA: drop columns 9 to 11

wine=wine[,-9:-11]

# Check the distribution of the qualitative variable (quality)

hist(wine$quality)

# Split the data by quality class

wine0 = wine[wine$quality==3,]

wine1 = wine[wine$quality==4,]

wine2 = wine[wine$quality==5,]

wine3 = wine[wine$quality==6,]

wine4 = wine[wine$quality==7,]

wine5 = wine[wine$quality==8,]

# Sampling: give every row a random label from 1 to 10

label0= sample(c(1:10),nrow(wine0),replace= TRUE)

label1= sample(c(1:10),nrow(wine1),replace= TRUE)

label2= sample(c(1:10),nrow(wine2),replace= TRUE)

label3= sample(c(1:10),nrow(wine3),replace= TRUE)

label4= sample(c(1:10),nrow(wine4),replace= TRUE)

label5= sample(c(1:10),nrow(wine5),replace= TRUE)

wine0_train = wine0[label0<=5,]

wine0_test = wine0[label0>5,]

wine1_train = wine1[label1<=5,]

wine1_test = wine1[label1>5,]

wine2_train = wine2[label2<=5,]

wine2_test = wine2[label2>5,]

wine3_train = wine3[label3<=5,]

wine3_test = wine3[label3>5,]

wine4_train = wine4[label4<=5,]

wine4_test = wine4[label4>5,]

wine5_train = wine5[label5<=5,]

wine5_test = wine5[label5>5,]

wine_train = rbind(wine0_train,wine1_train,wine2_train,wine3_train,wine4_train,wine5_train)

wine_test = rbind(wine0_test,wine1_test,wine2_test,wine3_test,wine4_test,wine5_test)
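The per-class splitting above is a stratified train/test split: each quality class is divided separately so both sets keep roughly the same class mix. A compact Python sketch of the same idea is given below. It is not a translation of the R code: it shuffles each class and cuts it at a fixed fraction instead of drawing random labels, and the function name, seed, and toy rows are assumptions.

```python
import random

def stratified_split(rows, label_of, train_fraction=0.5, seed=0):
    """Split rows into train and test so each class is divided by train_fraction."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    train, test = [], []
    for group in by_class.values():
        shuffled = group[:]          # copy so the input order is left untouched
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test

# Toy data: (row id, class label) pairs with three classes of ten rows each.
rows = [(i, i % 3) for i in range(30)]
train, test = stratified_split(rows, label_of=lambda row: row[1])
```

Looping over the classes also avoids copy-and-paste slips that are easy to make when each class has its own hand-written line.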

library(nnet) # provides multinom() for multinomial logistic regression

re_log = multinom(quality~.,data= wine_train)

wine_train$quality = as.factor(wine_train$quality)

library(randomForest)

re_rf = randomForest(quality~.,data = wine_train,ntree=5)

######################################

library(rpart)

library(rattle)

library(rpart.plot)

library(RColorBrewer)

#########################################

# Grow the tree with the ID3-style criterion (information gain)

re_id3 <-rpart(quality~.,data=wine_train,method="class", parms=list(split="information"))

fancyRpartPlot(re_id3)

########################################

# Grow the tree with the CART criterion (Gini index)

re_CART = rpart(quality~.,data= wine_train,method = "class",parms = list(split="gini"),control=rpart.control(cp=0.000001))

fancyRpartPlot(re_CART,main = "CART")

min_idx = which.min(re_CART$cptable[,4]) # column 4 of cptable is the cross-validated error (xerror)

re_CART_f = prune(re_CART,cp=re_CART$cptable[min_idx,1])

pred_id3 = predict(re_id3,newdata = wine_test)

pred_CART = predict(re_CART,newdata = wine_test,type="class")

table(wine_test$quality,pred_CART)

wine_train$quality= as.factor(wine_train$quality)

re_rf = randomForest(quality~.,data=wine_train,ntree=50)

pred_rf=predict(re_rf,newdata=wine_test,type="prob")

wine$quality

linear regression

install.packages('corrgram') # run once if the package is not installed yet

library(ggplot2) # Data visualization

library(readr) # CSV file I/O, e.g. the read_csv function

library(corrgram)

library(lattice) #required for nearest neighbors

library(FNN) # nearest neighbors techniques

library(pROC) # to make ROC curve

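# read.csv() turns spaces in column names into dots; if the names keep their spaces, wrap them in backticks instead

linear_quality = lm(quality ~ fixed.acidity+volatile.acidity+citric.acid+residual.sugar+chlorides+free.sulfur.dioxide+total.sulfur.dioxide+density, data=wine)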

corrgram(wine, lower.panel=panel.shade, upper.panel=panel.ellipse)

wine$poor <- wine$quality <= 4

wine$okay <- wine$quality == 5 | wine$quality == 6

wine$good <- wine$quality >= 7

head(wine)

summary(wine)

#############   KNN

class_knn10 = knn(train=wine[,1:8], test=wine[,1:8], cl=wine$good, k =10)

class_knn20 = knn(train=wine[,1:8],test=wine[,1:8], cl = wine$good, k=20)

table(wine$good,class_knn10)

table(wine$good,class_knn20)
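For reference, the idea behind the knn() calls above is a majority vote among the k closest training rows. The pure-Python sketch below uses Euclidean distance on made-up points; the function name and data are hypothetical and it is not the FNN implementation.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Predict the label of query by majority vote among its k nearest neighbors."""
    distances = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    nearest = [train_y[i] for _, i in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [False, False, False, True, True, True]
near_origin = knn_predict(train_x, train_y, (0.2, 0.2), k=3)
near_cluster = knn_predict(train_x, train_y, (5.5, 5.5), k=3)
```

Note that predicting on the same rows used for training, as in the R calls above, gives an optimistic accuracy; a held-out test set gives a fairer estimate.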

########################################

wine123=winequality_white

wine123$poor <- wine123$quality <= 4

wine123$okay <- wine123$quality == 5 | wine123$quality == 6

wine123$good <- wine123$quality >= 7

library(rpart) #for trees

tree1 = rpart(good~  alcohol + sulphates+ pH , data = wine123, method="class")

rpart.plot(tree1)

summary(tree1)

pred1 = predict(tree1,newdata=wine123,type="class")

summary(pred1)

summary(wine123$good)

# Compare the accuracy of the models

# Column names with spaces are not valid in a formula; the dotted names assume read.csv()-style columns (use backticks if the names keep their spaces)

tree2 = rpart(good ~ alcohol + volatile.acidity + citric.acid + pH, data = wine123, method="class")

tree2 = rpart(good ~ alcohol + volatile.acidity + citric.acid + sulphates, data = wine123, method="class")

rpart.plot(tree2)

pred2 = predict(tree2,newdata=wine123,type="class")

summary(pred2)

summary(wine123$good)


Covariance computation

Eigenvectors

Vector transpose

