1. Introduction
In this post I continue learning machine learning by analyzing the Iris dataset with KNeighborsClassifier and using the trained model to predict new data.
2. KNeighborsClassifier
2.1 Train and test
Train
#KNeighborsClassifier
#import the iris dataset first
#then create a random permutation of the sample indices with numpy's random.permutation()
#the dataset is split into two parts: a training set and a test set
#140 training samples and 10 test samples
import numpy as np
from sklearn import datasets
np.random.seed(0)
iris = datasets.load_iris()
x = iris.data
y = iris.target
i = np.random.permutation(len(iris.data))
x_train = x[i[:-10]]
y_train = y[i[:-10]]
x_test = x[i[-10:]]
y_test = y[i[-10:]]
#import KNeighborsClassifier
#train the classifier with the fit() function
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
Figure 1. Training by KNeighborsClassifier.
Test (predict)
knn.predict(x_test)
Figure 2. Predictions after training.
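To see where a single prediction comes from, we can ask the classifier which training samples it consulted. A minimal sketch, assuming the knn, x_test, and y_train objects from the code above (kneighbors() returns the distances and indices of the nearest training samples; the default model looks at 5 neighbors):
#inspect the nearest neighbors of the first test sample
distances, indices = knn.kneighbors(x_test[:1])
print(distances)        #distances to the 5 nearest training samples
print(indices)          #their row positions in x_train
print(y_train[indices]) #their labels; the majority label becomes the prediction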
Test data
y_test
Figure 3. Test data of the iris dataset.
As we can see, 9 of the 10 test samples (90%) were predicted correctly.
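The accuracy can also be checked programmatically instead of comparing the two printouts by eye. A minimal sketch, assuming the knn, x_test, and y_test objects from the code above:
#fraction of correct predictions on the test set
import numpy as np
print(np.mean(knn.predict(x_test) == y_test)) #0.9 for this split
#score() computes the same mean accuracy directly
print(knn.score(x_test, y_test))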
2.2 Show the data in a 2D scatter plot and plot the decision boundary
Plot the decision boundary learned by KNeighborsClassifier
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
iris = datasets.load_iris()
x = iris.data[:,:2] #keep only the first two features (sepal length and width) so the data can be shown in 2D
y = iris.target
#extend the plot range slightly beyond the data
x_min,x_max = x[:,0].min()-0.5, x[:,0].max()+0.5
y_min,y_max = x[:,1].min()-0.5, x[:,1].max()+0.5
cmap_light = ListedColormap(['#AAAAFF','#AAFFAA','#FFAAAA'])
h = 0.2 #step size of the grid
xx,yy = np.meshgrid(np.arange(x_min,x_max,h), np.arange(y_min,y_max,h))
knn = KNeighborsClassifier()
knn.fit(x,y)
#predict the class of every grid point and color the plane by predicted class
Z = knn.predict(np.c_[xx.ravel(),yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx,yy,Z,cmap = cmap_light)
Figure 4. Decision boundary.
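The line Z = knn.predict(np.c_[xx.ravel(),yy.ravel()]) is the heart of this plot: it flattens the two coordinate grids and stacks them column-wise, so each row is one (x, y) grid point for the classifier to label. A tiny sketch on a 2x3 grid to make the shapes concrete:
#how meshgrid output is turned into a list of points
import numpy as np
xx,yy = np.meshgrid(np.arange(0,3), np.arange(0,2))
pts = np.c_[xx.ravel(),yy.ravel()]
print(pts.shape) #(6, 2): one row per grid point
print(pts)       #[[0 0] [1 0] [2 0] [0 1] [1 1] [2 1]]
#predictions on pts can then be reshaped back with .reshape(xx.shape)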
Plot the training points
#plot the training points
plt.scatter(x[:,0],x[:,1],c = y)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())
plt.show()
Figure 5. Plot of the training points.
Plot the decision boundary and training points
#show the data in 2D scatter, plot the decision boundary
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
iris = datasets.load_iris()
x = iris.data[:,:2]
y = iris.target
x_min,x_max = x[:,0].min()-0.5, x[:,0].max()+0.5
y_min,y_max = x[:,1].min()-0.5, x[:,1].max()+0.5
cmap_light = ListedColormap(['#AAAAFF','#AAFFAA','#FFAAAA'])
h = 0.2
xx,yy = np.meshgrid(np.arange(x_min,x_max,h), np.arange(y_min,y_max,h))
knn = KNeighborsClassifier()
knn.fit(x,y)
Z = knn.predict(np.c_[xx.ravel(),yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx,yy,Z,cmap = cmap_light)
#plot the training points
plt.scatter(x[:,0],x[:,1],c = y)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())
plt.show()
Figure 6. Decision boundary and training points.
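By default KNeighborsClassifier uses n_neighbors=5. A quick experiment is to refit with different values of k and redraw the boundary; this is a minimal sketch reusing x, y, xx, yy, and cmap_light from the script above (smaller k gives a more jagged boundary, larger k a smoother one):
#refit with different numbers of neighbors and redraw the decision boundary
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors = k)
    knn.fit(x, y)
    Z = knn.predict(np.c_[xx.ravel(),yy.ravel()]).reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap = cmap_light)
    plt.title('n_neighbors = %d' % k)
    plt.show()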
3. Conclusion
As we can see in Figure 6, most of the training points fall inside the decision region of their own class. KNeighborsClassifier handles this dataset very well. Great!
(This post follows the tutorial at http://zhuanlan.zhihu.com/p/31785188. Many thanks to the author '数据之禅' for writing and sharing this very useful tutorial.)