1. Introduction
In this post I continue learning machine learning by analyzing the Iris dataset with KNeighborsClassifier and using the trained model to predict new data.
2. KNeighborsClassifier
2.1 Train and test
Train
#KNeighborsClassifier
#import the iris dataset first
#then create a random permutation of the sample indices with numpy's random.permutation()
#the dataset is split into two parts: a training set and a test set
#140 training samples and 10 test samples
import numpy as np
from sklearn import datasets
np.random.seed(0)
iris = datasets.load_iris()
x = iris.data
y = iris.target
i = np.random.permutation(len(iris.data))
x_train = x[i[:-10]]
y_train = y[i[:-10]]
x_test = x[i[-10:]]
y_test = y[i[-10:]]
#import KNeighborsClassifier
#train the classifier with the fit() function
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
Figure 1. Training by KNeighborsClassifier.
Test (predict)
knn.predict(x_test)
Figure 2. Predictions after training.
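To see where a single prediction comes from, we can ask the classifier which training samples it consulted. A minimal sketch, assuming the knn, x_test, and y_train objects from the code above (kneighbors() returns the distances and indices of the nearest training samples; the default model looks at 5 neighbors):
#inspect the nearest neighbors of the first test sample
distances, indices = knn.kneighbors(x_test[:1])
print(distances)        #distances to the 5 nearest training samples
print(indices)          #their row positions in x_train
print(y_train[indices]) #their labels; the majority label becomes the prediction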
Test data
y_test
Figure 3. Test data of the iris dataset.
As we can see, 9 of the 10 test samples (90%) were predicted correctly.
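The accuracy can also be checked programmatically instead of comparing the two printouts by eye. A minimal sketch, assuming the knn, x_test, and y_test objects from the code above:
#fraction of correct predictions on the test set
import numpy as np
print(np.mean(knn.predict(x_test) == y_test)) #0.9 for this split
#score() computes the same mean accuracy directly
print(knn.score(x_test, y_test))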
2.2 Show the data in a 2D scatter plot and plot the decision boundary
Plot the decision boundary learned by KNeighborsClassifier
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
iris = datasets.load_iris()
x = iris.data[:,:2] #keep only the first two features (sepal length and width) so the data can be shown in 2D
y = iris.target
#extend the plot range slightly beyond the data
x_min,x_max = x[:,0].min()-0.5, x[:,0].max()+0.5
y_min,y_max = x[:,1].min()-0.5, x[:,1].max()+0.5
cmap_light = ListedColormap(['#AAAAFF','#AAFFAA','#FFAAAA'])
h = 0.2 #step size of the grid
xx,yy = np.meshgrid(np.arange(x_min,x_max,h), np.arange(y_min,y_max,h))
knn = KNeighborsClassifier()
knn.fit(x,y)
#predict the class of every grid point and color the plane by predicted class
Z = knn.predict(np.c_[xx.ravel(),yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx,yy,Z,cmap = cmap_light)
Figure 4. Decision boundary.
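The line Z = knn.predict(np.c_[xx.ravel(),yy.ravel()]) is the heart of this plot: it flattens the two coordinate grids and stacks them column-wise, so each row is one (x, y) grid point for the classifier to label. A tiny sketch on a 2x3 grid to make the shapes concrete:
#how meshgrid output is turned into a list of points
import numpy as np
xx,yy = np.meshgrid(np.arange(0,3), np.arange(0,2))
pts = np.c_[xx.ravel(),yy.ravel()]
print(pts.shape) #(6, 2): one row per grid point
print(pts)       #[[0 0] [1 0] [2 0] [0 1] [1 1] [2 1]]
#predictions on pts can then be reshaped back with .reshape(xx.shape)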
Plot the training points
#plot the training points
plt.scatter(x[:,0],x[:,1],c = y)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())
plt.show()
Figure 5. Plot of the training points.
Plot the decision boundary and training points
#show the data in 2D scatter, plot the decision boundary
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
iris = datasets.load_iris()
x = iris.data[:,:2]
y = iris.target
x_min,x_max = x[:,0].min()-0.5, x[:,0].max()+0.5
y_min,y_max = x[:,1].min()-0.5, x[:,1].max()+0.5
cmap_light = ListedColormap(['#AAAAFF','#AAFFAA','#FFAAAA'])
h = 0.2
xx,yy = np.meshgrid(np.arange(x_min,x_max,h), np.arange(y_min,y_max,h))
knn = KNeighborsClassifier()
knn.fit(x,y)
Z = knn.predict(np.c_[xx.ravel(),yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx,yy,Z,cmap = cmap_light)
#plot the training points
plt.scatter(x[:,0],x[:,1],c = y)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())
plt.show()
Figure 6. Decision boundary and training points.
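By default KNeighborsClassifier uses n_neighbors=5. A quick experiment is to refit with different values of k and redraw the boundary; this is a minimal sketch reusing x, y, xx, yy, and cmap_light from the script above (smaller k gives a more jagged boundary, larger k a smoother one):
#refit with different numbers of neighbors and redraw the decision boundary
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors = k)
    knn.fit(x, y)
    Z = knn.predict(np.c_[xx.ravel(),yy.ravel()]).reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap = cmap_light)
    plt.title('n_neighbors = %d' % k)
    plt.show()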
3. Conclusion
As we can see in Figure 6, most of the training points fall inside the decision region of their own class. KNeighborsClassifier handles this dataset very well. Great!
(This post follows the tutorial at http://zhuanlan.zhihu.com/p/31785188. Many thanks to the author '数据之禅' for writing and sharing this very useful tutorial.)