Prepare the data

Import packages
import pandas as pd
import numpy as np
Load the data
iris_data = pd.read_csv('iris.csv')
Extract the features
X = iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].values
print(X.shape)
(150, 4)
Extract the labels
y = iris_data['label'].values
print(y)
Output
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3]
Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/4, random_state=10)
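print('Samples in the original dataset:', X.shape[0])
print('Samples in the training set:', X_train.shape[0])
print('Samples in the test set:', X_test.shape[0])
Output
Samples in the original dataset: 150
Samples in the training set: 112
Samples in the test set: 38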
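The 112/38 split follows from test_size=1/4 applied to 150 samples: 150 × 0.25 = 37.5, and the test count is rounded up. The sketch below reproduces that arithmetic. The helper name split_sizes is hypothetical, and the rounding rule is an assumption that matches the counts printed above; it is not scikit-learn's own code.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size (hypothetical helper)."""
    # Assumption: the test count is rounded up, the rest goes to training.
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = split_sizes(150, 1 / 4)
print(n_train, n_test)
```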
Choose a model: the K-nearest neighbors (KNN) algorithm
from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier()
Train the model
knn_model.fit(X_train, y_train)
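To make clearer what the classifier does at prediction time, here is a minimal pure-Python sketch of the K-nearest neighbors idea: measure the Euclidean distance from a query point to every training point, take the k closest, and return the majority label. The function knn_predict and the toy points are hypothetical illustrations, not scikit-learn's implementation; k=5 mirrors the default n_neighbors of KNeighborsClassifier, and tie handling here may differ from the library's.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training label with its Euclidean distance to the query point.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance and keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy 2-D data (made up for illustration): two well-separated clusters.
toy_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
toy_y = [1, 1, 1, 2, 2, 2]

print(knn_predict(toy_X, toy_y, (1.1, 1.0), k=3))
print(knn_predict(toy_X, toy_y, (5.1, 5.0), k=3))
```

Calling fit on a KNN model mostly stores the training data (possibly in an index structure); the distance computation happens when predict is called.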
Test the model
Evaluate the model on the test set
y_pred = knn_model.predict(X_test)
print('Actual:', y_test)
print('Predicted:', y_pred)
Output
Actual: [2 3 1 2 1 2 2 2 1 2 2 3 2 1 1 3 2 1 1 1 3 3 3 1 2 1 2 2 2 3 2 2 3 3 3 1 3
3]
Predicted: [2 3 1 2 1 2 3 2 1 2 2 3 2 1 1 3 2 1 1 1 3 3 3 1 2 1 2 2 2 3 2 2 3 3 3 1 3
3]
Model accuracy
from sklearn.metrics import accuracy_score
acc = accuracy_score(y_test, y_pred)
print('Accuracy:', acc)
Output
Accuracy: 0.9736842105263158
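Accuracy is simply the fraction of test samples whose predicted label equals the true label. As a cross-check, the sketch below recomputes it by hand from the two arrays printed in the test step; the variable names are chosen here for illustration. Only the seventh sample differs (true 2, predicted 3), so 37 of 38 predictions are correct.

```python
# Labels copied from the test output shown earlier in this walkthrough.
y_true = [2, 3, 1, 2, 1, 2, 2, 2, 1, 2, 2, 3, 2, 1, 1, 3, 2, 1, 1,
          1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 3, 2, 2, 3, 3, 3, 1, 3, 3]
y_hat = [2, 3, 1, 2, 1, 2, 3, 2, 1, 2, 2, 3, 2, 1, 1, 3, 2, 1, 1,
         1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 3, 2, 2, 3, 3, 3, 1, 3, 3]

# Count positions where the prediction matches the true label.
n_correct = sum(t == p for t, p in zip(y_true, y_hat))
manual_acc = n_correct / len(y_true)

# Indices of the misclassified samples (0-based).
wrong = [i for i, (t, p) in enumerate(zip(y_true, y_hat)) if t != p]

print(n_correct, len(y_true), manual_acc, wrong)
```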