Preface
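This program is based on the handwritten digit image recognition experiment and uses a support vector machine (SVM) model for the classification task.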
The program runs smoothly under Python 3.6. The changes needed for Python 2.x are noted in the code comments.
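The Python 2.x note mostly concerns where `train_test_split` lives. If one script has to run on both old and new setups, a common pattern is to try the modern import first and fall back to the old module. This is a minimal sketch that assumes this import is the only incompatibility. The fallback module `sklearn.cross_validation` exists only in scikit-learn versions before 0.20. The split parameters are copied from the program below.

```python
# Version-tolerant import of train_test_split (a sketch, not part of the original program)
try:
    # scikit-learn >= 0.18
    from sklearn.model_selection import train_test_split
except ImportError:
    # older scikit-learn (common in Python 2.7 setups); this module was removed in 0.20
    from sklearn.cross_validation import train_test_split

from sklearn.datasets import load_digits

digits = load_digits()
# same split as the main program: 25% test data, fixed random seed
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=33)
print(X_train.shape, X_test.shape)
```

With 1797 digit images in total, 450 end up in the test set. This matches the `support` total in the program output.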
requirements:pandas,numpy,scikit-learn
If you want to see implementations of other classic algorithms, check out my other collections.
Analysis of experimental results
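Thanks to its elegant model assumption, the SVM can sift through massive or even high-dimensional data and keep only the small number of training samples that matter most for the prediction task (the support vectors). This saves the memory the model needs for learning and also improves its predictive performance. In this experiment, LinearSVC reaches an accuracy of about 95.3% on the test set. This advantage comes at a price, however: a higher computational cost in CPU resources and computation time.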
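LinearSVC does not expose which training samples become support vectors. To show how few samples the model actually keeps, the sketch below therefore swaps in `sklearn.svm.SVC` with `kernel="linear"`. Note that this uses a different solver (libsvm, one-vs-one) from the LinearSVC used in the program. It runs on the same split and the same standardization. The exact counts depend on the scikit-learn version and on the default `C=1.0`, so treat them as illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=33)

# same standardization as the main program
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

# SVC with a linear kernel stores its support vectors explicitly (LinearSVC does not)
svc = SVC(kernel="linear")
svc.fit(X_train, y_train)

n_sv = svc.support_vectors_.shape[0]
print("training samples:", X_train.shape[0])
print("support vectors:", n_sv)
print("support vectors per class:", svc.n_support_)
print("test accuracy:", svc.score(X_test, y_test))
```

Only the support vectors are needed to define the decision function. The remaining training samples could be discarded after fitting without changing the predictions.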
Source code
#import the handwritten digits loader from sklearn.datasets
from sklearn.datasets import load_digits
#load digits data
digits=load_digits()
#check the number of samples and the feature dimension of the data
#print(digits.data.shape)
#data preprocessing
#note: in old scikit-learn versions (before 0.18, typical of Python 2.7 setups), import from sklearn.cross_validation instead of sklearn.model_selection
#from sklearn.cross_validation import train_test_split #DeprecationWarning
from sklearn.model_selection import train_test_split #use train_test_split from sklearn.model_selection to split the data
#randomly take 25 percent of the data for testing and use the rest for training
X_train,X_test,y_train,y_test = train_test_split(digits.data,digits.target,test_size=0.25,random_state=33)
#check the sizes of the training set and the test set
#print(y_train.shape)
#print(y_test.shape)
#import the data standardization module from sklearn.preprocessing
from sklearn.preprocessing import StandardScaler
#import the SVM classifier LinearSVC, which is based on a linear hypothesis
from sklearn.svm import LinearSVC
#standardizing data
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)
#initializing LinearSVC
lsvc = LinearSVC()
#train the SVM classifier
lsvc.fit(X_train,y_train)
#predicting digits and saving results in variable y_predict
y_predict=lsvc.predict(X_test)
#get the accuracy with the score method of the lsvc model
print('The accuracy of Linear SVC is',lsvc.score(X_test,y_test))
from sklearn.metrics import classification_report
#get precision, recall and f1-score with the classification_report function
print(classification_report(y_test,y_predict,target_names=digits.target_names.astype(str)))
Program output on Ubuntu 16.04 with Python 3.6:
The accuracy of Linear SVC is 0.9533333333333334
             precision    recall  f1-score   support
          0       0.92      1.00      0.96        35
          1       0.96      0.98      0.97        54
          2       0.98      1.00      0.99        44
          3       0.93      0.93      0.93        46
          4       0.97      1.00      0.99        35
          5       0.94      0.94      0.94        48
          6       0.96      0.98      0.97        51
          7       0.92      1.00      0.96        35
          8       0.98      0.84      0.91        58
          9       0.95      0.91      0.93        44
avg / total       0.95      0.95      0.95       450
[Finished in 0.6s]
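The precision and recall columns in this report can be reproduced from a confusion matrix. In scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels. For digit k, precision is the diagonal entry divided by its column sum (everything predicted as k). Recall is the diagonal entry divided by its row sum (everything that truly is k). F1 is their harmonic mean. The sketch below reruns the same program and checks the hand computation against `precision_score` and `recall_score`. Its numbers may differ slightly from the output above depending on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import confusion_matrix, precision_score, recall_score

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=33)
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

lsvc = LinearSVC()
lsvc.fit(X_train, y_train)
y_predict = lsvc.predict(X_test)

# rows = true digit, columns = predicted digit
cm = confusion_matrix(y_test, y_predict)
tp = np.diag(cm)
precision = tp / cm.sum(axis=0)  # correct / all samples predicted as k
recall = tp / cm.sum(axis=1)     # correct / all samples that truly are k
f1 = 2 * precision * recall / (precision + recall)

print(np.round(precision, 2))
print(np.round(recall, 2))
print(np.round(f1, 2))
# compare with scikit-learn's own per-class metrics
print(np.allclose(precision, precision_score(y_test, y_predict, average=None)))
print(np.allclose(recall, recall_score(y_test, y_predict, average=None)))
```

The row sums of `cm` are the `support` column. The "avg / total" row in the report is the support-weighted average of the per-class values.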
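Corrections are welcome, including mistakes in the English and in the program. Questions are welcome too. Let's keep working hard and improve together.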
I typed this program entirely by hand, character by character. Please credit the source when reposting.