Machine Learning with the Iris Dataset (2019-12-21)

1. Introduction

In this post I continue learning machine learning: I analyze the Iris dataset with KNeighborsClassifier and use the trained model to predict labels for unseen data.

2. KNeighborsClassifier

2.1 Train and test

Train

#KNeighborsClassifier
#import the iris dataset first
#then create a random index order with numpy's random.permutation() function
#the dataset is split into two parts: a training set and a test set
#140 training samples and 10 test samples

import numpy as np
from sklearn import datasets

np.random.seed(0)
iris = datasets.load_iris()
x = iris.data
y = iris.target
i = np.random.permutation(len(iris.data))

x_train = x[i[:-10]]
y_train = y[i[:-10]]
x_test = x[i[-10:]]
y_test = y[i[-10:]]
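
As an aside, scikit-learn also provides a built-in helper for this kind of split; a minimal sketch using train_test_split (note that random_state=0 here draws a different random split than the manual permutation above):

#equivalent 140/10 split using scikit-learn's helper
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=10, random_state=0)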

#import KNeighborsClassifier
#train with the fit() function

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
Figure 1. Training with KNeighborsClassifier.
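
By the way, KNeighborsClassifier() with no arguments uses its default of n_neighbors=5; a different k can be set explicitly:

#the default is n_neighbors=5; set k explicitly if desired
knn = KNeighborsClassifier(n_neighbors=5)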

Test (predict)

knn.predict(x_test)
Figure 2. Predictions after training.

Test data

y_test
Figure 3. Test data of the Iris dataset.

As we can see, 9 of the 10 test samples were predicted correctly (90% accuracy).
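
Instead of comparing the predictions and y_test by eye, the accuracy can be checked directly with the classifier's built-in score() method:

#mean accuracy on the test set
print(knn.score(x_test, y_test))    #prints 0.9 for this split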

2.2 Show the data in a 2D scatter plot and draw the decision boundary

Plot the decision boundary learned by KNeighborsClassifier

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
x = iris.data[:,:2]
y = iris.target

x_min,x_max = x[:,0].min()-0.5, x[:,0].max()+0.5
y_min,y_max = x[:,1].min()-0.5, x[:,1].max()+0.5

cmap_light = ListedColormap(['#AAAAFF','#AAFFAA','#FFAAAA'])
h = 0.2

#build a grid of points covering the feature space, with step h
xx,yy = np.meshgrid(np.arange(x_min,x_max,h), np.arange(y_min,y_max,h))
knn = KNeighborsClassifier()
knn.fit(x,y)
#predict the class of every grid point, then reshape back to the grid
Z = knn.predict(np.c_[xx.ravel(),yy.ravel()])
Z = Z.reshape(xx.shape)
#color each grid cell by its predicted class
plt.figure()
plt.pcolormesh(xx,yy,Z,cmap = cmap_light)
Figure 4. Decision boundary.

Plot the training points

#plot the training points
plt.scatter(x[:,0],x[:,1],c = y)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())
plt.show()
Figure 5. Plot of the training points.

Plot the decision boundary and training points

#show the data in 2D scatter, plot the decision boundary
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
x = iris.data[:,:2]
y = iris.target

x_min,x_max = x[:,0].min()-0.5, x[:,0].max()+0.5
y_min,y_max = x[:,1].min()-0.5, x[:,1].max()+0.5

cmap_light = ListedColormap(['#AAAAFF','#AAFFAA','#FFAAAA'])
h = 0.2

#build a grid of points covering the feature space, with step h
xx,yy = np.meshgrid(np.arange(x_min,x_max,h), np.arange(y_min,y_max,h))
knn = KNeighborsClassifier()
knn.fit(x,y)
#predict the class of every grid point, then reshape back to the grid
Z = knn.predict(np.c_[xx.ravel(),yy.ravel()])
Z = Z.reshape(xx.shape)
#color each grid cell by its predicted class
plt.figure()
plt.pcolormesh(xx,yy,Z,cmap = cmap_light)

#plot the training points
plt.scatter(x[:,0],x[:,1],c = y)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())
plt.show()
Figure 6. Decision boundary and training points.
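
As a side note, newer scikit-learn versions (1.1 and later) can produce the same kind of plot with less boilerplate via DecisionBoundaryDisplay; a minimal sketch, assuming the fitted knn, the two-feature x, and cmap_light from above:

#alternative decision-boundary plot (scikit-learn >= 1.1)
from sklearn.inspection import DecisionBoundaryDisplay
DecisionBoundaryDisplay.from_estimator(knn, x, cmap=cmap_light)
plt.scatter(x[:,0], x[:,1], c=y)
plt.show()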

3. Conclusion

As we can see in Figure 6, most of the training points fall inside the decision region of their own class. KNeighborsClassifier performs remarkably well here. Great!

(This post follows the tutorial at http://zhuanlan.zhihu.com/p/31785188. Many thanks to the author '数据之禅' for sharing this very useful content.)
