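Overview:
Today I suddenly remembered an iris classifier I once wrote using the BP (backpropagation) algorithm. I also recently went back over the ideas and implementation of linear models while preparing for interviews, so this post puts the two together in three parts:
(1) Introduction to the dataset
(2) Implementing the classifier in TensorFlow
(3) Evaluating the model in TensorFlow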
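(1) Introduction to the dataset
The Iris dataset is a well-known dataset in pattern recognition research. It contains 150 samples. The first four columns are sepal length, sepal width, petal length, and petal width, the four attributes used to identify the flower; the fifth column is the class (Setosa, Versicolour, or Virginica). In other words, the task is to identify the species of an iris from the measured sepal and petal dimensions.
The dataset can be downloaded directly from the UCI repository, where the default format is a comma-separated text file. It can also be imported from the datasets module of sklearn: from sklearn.datasets import load_iris. For convenience, this post loads it through sklearn.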
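For readers who take the UCI route instead, the comma-separated format can be parsed with the standard library alone. The sketch below is an illustration rather than part of the original workflow: the three sample rows (one per class) are inlined instead of being read from the downloaded file, and the helper name `parse_iris_rows` is hypothetical.

```python
import csv
import io

# Three rows in the UCI file layout, one per class. They are inlined here
# for illustration; in practice the text would come from the downloaded file.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

def parse_iris_rows(text):
    """Split comma-separated iris rows into (features, label) pairs."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        features = [float(v) for v in record[:4]]  # the four measurements
        label = record[4]                          # class name in column 5
        rows.append((features, label))
    return rows

rows = parse_iris_rows(SAMPLE)
for features, label in rows:
    print(features, label)
```

Loading through sklearn returns the same four measurements per sample, but with the class already encoded as an integer instead of a name.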
(2) Implementing the classifier in TensorFlow
step1 Import the required libraries
#-*-coding:UTF-8-*-
# Import the required libraries
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
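# Note: the code below uses the TensorFlow 1.x graph API (tf.Session, tf.placeholder).
# Under TensorFlow 2.x, `import tensorflow.compat.v1 as tf` followed by
# `tf.disable_v2_behavior()` is one way to run it.
import tensorflow as tf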
from tensorflow.python.framework import ops
ops.reset_default_graph()
step2 Load the data
# Load the iris data
# iris.target = {0, 1, 2}, where '0' is setosa
# iris.data ~ [sepal.length, sepal.width, petal.length, petal.width]
iris = datasets.load_iris()
binary_target = np.array([1. if x==0 else 0. for x in iris.target])
iris_2d = np.array([[x[2], x[3]] for x in iris.data])
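The two list comprehensions above build a binary "setosa or not" label and keep only petal length and petal width. As a minimal sketch of the same idea with vectorized NumPy operations, the arrays `target` and `data` below are small stand-ins with the same layout as `iris.target` and `iris.data`; they are assumptions for illustration, not the full dataset.

```python
import numpy as np

# Small stand-ins for iris.target and iris.data (same layout:
# one row per flower, four measurement columns).
target = np.array([0, 1, 2, 0])
data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
    [4.9, 3.0, 1.4, 0.2],
])

# 1.0 where the class is 0 (setosa), 0.0 otherwise
binary_target = (target == 0).astype(float)

# Keep columns 2 and 3: petal length and petal width
iris_2d = data[:, 2:4]

print(binary_target)   # [1. 0. 0. 1.]
print(iris_2d.shape)   # (4, 2)
```

Both forms produce the same arrays; the sliced version simply avoids the Python-level loop.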
step3 Preprocessing
# Declare batch size
batch_size = 20
# Create a session to run the graph
sess = tf.Session()
# Declare placeholders
x1_data = tf.placeholder(shape=[None, 1], dtype=tf.float32)
x2_data = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
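Each placeholder has shape [None, 1]: the first dimension is left open so any batch size can be fed, and the second is a single column. The NumPy sketch below shows one way to shape a sampled batch into such columns. The toy `points` and `labels` arrays and the seeded generator are assumptions made only so the example is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Toy (x1, x2) rows and labels standing in for iris_2d and binary_target
points = np.array([[1.4, 0.2], [4.7, 1.4], [6.0, 2.5], [1.3, 0.2]])
labels = np.array([1.0, 0.0, 0.0, 1.0])

batch_size = 3
# Sample row indices with replacement (the default behaviour of choice)
idx = rng.choice(len(points), size=batch_size)

# reshape(-1, 1) turns a 1-D array of length n into an (n, 1) column
x1_batch = points[idx, 0].reshape(-1, 1)
x2_batch = points[idx, 1].reshape(-1, 1)
y_batch = labels[idx].reshape(-1, 1)

print(x1_batch.shape, x2_batch.shape, y_batch.shape)  # (3, 1) (3, 1) (3, 1)
```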
step4 Create the linear model
# Create variables A and b; the decision boundary is 0 = x1 - (A*x2 + b)
A = tf.Variable(tf.random_normal(shape=[1, 1]))
b = tf.Variable(tf.random_normal(shape=[1, 1]))
# Add model to graph:
# x1 - (A*x2 + b), a linear model whose output is used as the logit
my_mult = tf.matmul(x2_data, A)
my_add = tf.add(my_mult, b)
my_output = tf.subtract(x1_data, my_add)
# Loss: sigmoid cross-entropy between the logits and the 0/1 labels
xentropy = tf.nn.sigmoid_cross_entropy_with_logits(logits=my_output, labels=y_target)
# Create Optimizer
my_opt = tf.train.GradientDescentOptimizer(0.05)
train_step = my_opt.minimize(xentropy)
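To make the loss concrete, here is a NumPy sketch of what sigmoid cross-entropy computes for a logit x and a label z. The numerically stable form, max(x, 0) - x*z + log(1 + exp(-|x|)), is the one given in the TensorFlow documentation for this op. The function names and sample values below are hypothetical and only serve to compare that form against the textbook definition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def naive_xentropy(logits, labels):
    """Textbook form: -z*log(p) - (1-z)*log(1-p), with p = sigmoid(logits)."""
    p = sigmoid(logits)
    return -labels * np.log(p) - (1.0 - labels) * np.log(1.0 - p)

def stable_xentropy(logits, labels):
    """Equivalent form: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return (np.maximum(logits, 0.0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# Column-shaped toy inputs, matching the [None, 1] placeholders
logits = np.array([[-2.0], [0.0], [3.0]])
labels = np.array([[0.0], [1.0], [1.0]])

print(naive_xentropy(logits, labels).ravel())
print(stable_xentropy(logits, labels).ravel())
```

The two functions agree on moderate inputs; the stable form matters when a logit is large in magnitude, where the textbook form can hit log(0).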
step5 Train the model
# Initialize the variables A and b before training
init = tf.global_variables_initializer()
sess.run(init)
# Run Loop
for i in range(1000):
    # Sample a random mini-batch of row indices (with replacement)
    rand_index = np.random.choice(len(iris_2d), size=batch_size)
    rand_x = iris_2d[rand_index]
    # Split the two features into column vectors of shape (batch_size, 1)
    rand_x1 = np.array([[x[0]] for x in rand_x])
    rand_x2 = np.array([[x[1]] for x in rand_x])
    rand_y = np.array([[y] for y in binary_target[rand_index]])
    sess.run(train_step, feed_dict={x1_data: rand_x1, x2_data: rand_x2, y_target: rand_y})
    # Report the current parameters every 200 steps
    if (i+1)%200==0:
        print('Step #' + str(i+1) + ' A = ' + str(sess.run(A)) + ', b = ' + str(sess.run(b)))
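TensorFlow derives the gradients for each training step automatically. As a rough illustration of what one such update does for this model, the NumPy sketch below writes the gradient out by hand for the logit x1 - (A*x2 + b). It is a simplified stand-in, not the post's implementation: A and b are plain scalars, the per-sample losses are summed, the toy batch is made up, and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss_sum(A, b, x1, x2, y):
    """Summed sigmoid cross-entropy of the logits x1 - (A*x2 + b)."""
    logits = x1 - (A * x2 + b)
    return np.sum(np.maximum(logits, 0.0) - logits * y
                  + np.log1p(np.exp(-np.abs(logits))))

def gradient_step(A, b, x1, x2, y, lr=0.05):
    """One plain gradient-descent update of the scalars A and b."""
    logits = x1 - (A * x2 + b)
    err = sigmoid(logits) - y      # d(loss)/d(logit) for each sample
    grad_A = np.sum(err * -x2)     # chain rule: d(logit)/dA = -x2
    grad_b = np.sum(err * -1.0)    # chain rule: d(logit)/db = -1
    return A - lr * grad_A, b - lr * grad_b

# Toy batch of (petal length, petal width); label 1.0 marks setosa-like points
x1 = np.array([1.4, 1.5, 4.7, 6.0])
x2 = np.array([0.2, 0.2, 1.4, 2.5])
y = np.array([1.0, 1.0, 0.0, 0.0])

A, b = 0.0, 0.0
before = loss_sum(A, b, x1, x2, y)
for _ in range(200):
    A, b = gradient_step(A, b, x1, x2, y)
after = loss_sum(A, b, x1, x2, y)
print('loss before:', before, 'loss after:', after)
```

With the learned values, a point is classified as setosa when x1 - (A*x2 + b) is positive, since that corresponds to a sigmoid output above 0.5.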