The TensorFlow Programming Framework

The TensorFlow deep learning framework is organized into roughly four layers, structured as follows:

The TensorFlow Programming Environment

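When training models with TensorFlow, the official documentation recommends two high-level APIs:

Estimators: the model training API. It provides methods for training a model, evaluating it, and using it to make predictions.

Datasets: the dataset API. It provides methods for loading data and feeding it to a model, and it works well together with Estimators.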

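Iris Classification: Overview

The figure below shows three iris species: Iris setosa, Iris versicolor, and Iris virginica. The length and width of the sepals and petals are enough to tell which species a flower belongs to.

Three different iris species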

The Dataset

The Iris dataset contains four features and one label:

Features, which describe the flower's physical measurements

⊙ sepal length

⊙ sepal width

⊙ petal length

⊙ petal width

Label, which identifies the species

⊙ Iris setosa (0)

⊙ Iris versicolor (1)

⊙ Iris virginica (2)

The Iris dataset

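The Algorithm

The deep neural network classifier is shown in the figure below. It has:

⊙ 2 hidden layers

⊙ 10 nodes in each hidden layer

Topology of the deep neural network for iris classification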
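To get a feel for how small this network is, the sketch below counts its trainable parameters. It assumes fully connected layers with one bias per node, which is the usual reading of such a topology; the helper name `count_parameters` is made up for this illustration.

```python
# Layer sizes taken from the text: 4 input features, two hidden layers of
# 10 nodes each, and 3 output classes.
layer_sizes = [4, 10, 10, 3]


def count_parameters(sizes):
    """Count weights and biases of a fully connected network (assumed layout)."""
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # one weight per connection between layers
        total += fan_out           # one bias per node in the receiving layer
    return total


print(count_parameters(layer_sizes))  # 193
```

Under these assumptions the three weight matrices and bias vectors contribute 50, 110, and 33 parameters respectively.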

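Inference

Given the measurements of an iris that has not been labeled by hand, the trained model produces a result like the following:

⊙ 0.03 for Iris Setosa

⊙ 0.95 for Iris Versicolor

⊙ 0.02 for Iris Virginica

These are the probabilities the model assigns to each species, and they sum to 1.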
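Probabilities that sum to 1 are what a softmax output layer produces. The sketch below assumes the classifier ends in a softmax, the usual choice for multi-class models; the logit values are invented purely for illustration and are not taken from a trained model.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest score keeps exp() from overflowing.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores for Setosa, Versicolor, and Virginica.
logits = [0.5, 4.0, 0.1]
probs = softmax(logits)

print(probs)
print(sum(probs))
```

The class with the largest score always receives the largest probability, so the predicted species is simply the index of the maximum.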

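Using Estimators

Estimator is TensorFlow's high-level model training API. It hides details such as data initialization, logging, and saving and restoring models, so you can focus on training. Using a premade Estimator takes the following steps:

⊙ Create one or more input functions

⊙ Define the model's feature columns

⊙ Instantiate an Estimator, passing in the feature columns and hyperparameters

⊙ Call a method on the Estimator, passing the appropriate input function as an argument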

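Creating Input Functions

An input function supplies data for training, evaluation, and prediction. It usually returns a two-element tuple:

features: a dictionary

⊙ each key is the name of a feature

⊙ each value is an array holding that feature's values for all examples

labels: an array holding the label of every example

A simple input function looks like this: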

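import numpy as np

def input_evaluation_set():
    # Two examples; each array holds one feature's values for both of them.
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    # One species id per example.
    labels = np.array([2, 1])
    return features, labels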
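Because every feature array must line up with the labels, a quick consistency check can catch malformed input early. The following is a minimal sketch using plain lists; `check_input_pair` is a hypothetical helper, not part of TensorFlow.

```python
def check_input_pair(features, labels):
    """Return the number of examples, or raise if a feature column is misaligned."""
    num_examples = len(labels)
    for name, values in features.items():
        if len(values) != num_examples:
            raise ValueError(
                "feature %r has %d values but there are %d labels"
                % (name, len(values), num_examples))
    return num_examples


features = {'SepalLength': [6.4, 5.0],
            'SepalWidth': [2.8, 2.3],
            'PetalLength': [5.6, 3.3],
            'PetalWidth': [2.2, 1.0]}
labels = [2, 1]

num_examples = check_input_pair(features, labels)
print(num_examples)  # 2
```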

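TensorFlow recommends building input functions with the Dataset API, which is particularly helpful for parsing a variety of input sources. Its class hierarchy is as follows:

Dataset API class hierarchy

Dataset: the base class, with methods for creating and transforming datasets

TextLineDataset: reads lines from text files

TFRecordDataset: reads records from TFRecord files

FixedLengthRecordDataset: reads fixed-length records from binary files

Iterator: provides a way to step through the dataset one element at a time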
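To make the text-file case concrete, here is a rough pure-Python sketch of what reading line-oriented data amounts to: each CSV line is split into four feature values and one label. It does not use TensorFlow; the column order and the `Species` column name are assumptions made for this illustration.

```python
import csv
import io

# Assumed column order: four features followed by the integer label.
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',
                    'PetalLength', 'PetalWidth', 'Species']


def parse_iris_lines(lines):
    """Parse CSV lines into a (features, labels) pair."""
    feature_names = CSV_COLUMN_NAMES[:-1]
    features = {name: [] for name in feature_names}
    labels = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        *values, label = row
        for name, value in zip(feature_names, values):
            features[name].append(float(value))
        labels.append(int(label))
    return features, labels


sample = io.StringIO("6.4,2.8,5.6,2.2,2\n5.0,2.3,3.3,1.0,1\n")
features, labels = parse_iris_lines(sample)
print(features)
print(labels)
```

The result has the same shape an input function is expected to return: a dictionary of per-feature value lists and a list of labels.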

An input function built with the Dataset API:

import tensorflow as tf

def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    # Build the Iterator, and return the read end of the pipeline.
    # Note: make_one_shot_iterator() is a TensorFlow 1.x method.
    return dataset.make_one_shot_iterator().get_next()
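The effect of chaining shuffle, repeat, and batch can be imitated in plain Python. This is only a conceptual sketch: it assumes the shuffle buffer is at least as large as the dataset (so each pass is fully shuffled) and that the data is reshuffled on every pass; `shuffle_repeat_batch` is a made-up name, not a TensorFlow function.

```python
import itertools
import random


def shuffle_repeat_batch(examples, batch_size, seed=None):
    """Yield an endless stream of batches drawn from reshuffled passes."""
    examples = list(examples)
    if not examples:
        raise ValueError("examples must not be empty")
    rng = random.Random(seed)

    def element_stream():
        while True:                  # repeat(): start over when data runs out
            epoch = list(examples)
            rng.shuffle(epoch)       # shuffle(): new random order on each pass
            yield from epoch

    stream = element_stream()
    while True:                      # batch(): group consecutive elements
        yield list(itertools.islice(stream, batch_size))


examples = [0, 1, 2, 3, 4]
batches = list(itertools.islice(
    shuffle_repeat_batch(examples, batch_size=4, seed=0), 3))
for batch in batches:
    print(batch)
```

Because repetition happens before batching, a batch may straddle two passes over the data, and every batch has the full size no matter how the dataset length relates to the batch size.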

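Defining Feature Columns

Feature columns tell the Estimator which features to expect as input.

Iris classification uses four features, so the feature columns are generated like this: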

# train_x.keys() yields the feature names; create one numeric column per feature.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

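Instantiating an Estimator

Iris classification is a typical classification problem. TensorFlow ships with several premade classifier Estimators:

⊙ tf.estimator.DNNClassifier: deep models for multi-class classification

⊙ tf.estimator.DNNLinearCombinedClassifier: wide-and-deep models

⊙ tf.estimator.LinearClassifier: linear classification models

For iris classification, tf.estimator.DNNClassifier is the best fit: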

# Build 2 hidden layer DNN with 10, 10 units respectively.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)

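Training, Evaluation, and Prediction

Now that we have an Estimator instance, we can:

⊙ train the model

⊙ evaluate the trained model

⊙ use the trained model to make predictions

Training the Model

Call the Estimator's train method to start training: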

# Train the Model.
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_x, train_y, args.batch_size),
    steps=args.train_steps)

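Here the input function is wrapped in a lambda so that its arguments are captured and train receives a callable that takes no arguments. The steps argument tells train to stop after that many training steps.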
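The wrapping trick is ordinary Python and can be tried without TensorFlow. In this sketch `fake_train` is a hypothetical stand-in that, like the real train method, calls its input function with no arguments; `functools.partial` is shown as an equivalent alternative to the lambda.

```python
import functools


def train_input_fn(features, labels, batch_size):
    """Stand-in input function that simply echoes its arguments."""
    return features, labels, batch_size


def fake_train(input_fn):
    """Hypothetical stand-in for a train method: calls input_fn with no arguments."""
    return input_fn()


train_x = {'SepalLength': [6.4, 5.0]}
train_y = [2, 1]

# Both wrappers bind the arguments now and defer the call until later.
via_lambda = fake_train(lambda: train_input_fn(train_x, train_y, 100))
via_partial = fake_train(functools.partial(train_input_fn, train_x, train_y, 100))

print(via_lambda == via_partial)  # True
```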

Evaluating the Trained Model

Once the model is trained, we need to measure its accuracy. The following snippet evaluates the trained model:

# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_x, test_y, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

Running the code prints:

Test set accuracy: 0.967
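Accuracy here is the fraction of test examples that were classified correctly. The sketch below recomputes such a figure by hand; the 30-example test set and the single misclassification are assumptions chosen so that the number matches the output above, not data from the article.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Assumed test set: 10 examples of each species (ids 0, 1, 2).
actual = [0] * 10 + [1] * 10 + [2] * 10
predicted = list(actual)
predicted[15] = 2  # pretend one Versicolor was mistaken for Virginica

print('Test set accuracy: {:0.3f}'.format(accuracy(predicted, actual)))
# Test set accuracy: 0.967
```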

Making Predictions with the Trained Model

With the model trained, we can use it for prediction: feed it unlabeled iris measurements and call predict:

# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

# No labels exist at prediction time. Passing labels=None assumes the
# signature eval_input_fn(features, labels, batch_size) used for evaluation.
predictions = classifier.predict(
    input_fn=lambda: iris_data.eval_input_fn(predict_x,
                                             labels=None,
                                             batch_size=args.batch_size))

Iterate over predictions to read the results:

for pred_dict, expec in zip(predictions, expected):
    template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')
    # The id of the most likely class, and the probability assigned to it.
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    print(template.format(iris_data.SPECIES[class_id], 100 * probability, expec))
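The lookup logic in that loop can be exercised on a hand-written prediction dictionary. The sketch below reuses the example probabilities from the Inference section; the `SPECIES` list is a local stand-in for `iris_data.SPECIES`, and `describe` is a helper invented for this illustration.

```python
# Local stand-in for iris_data.SPECIES, indexed by class id.
SPECIES = ['Setosa', 'Versicolor', 'Virginica']


def describe(pred_dict, expected):
    """Format one prediction the same way the loop above does."""
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    return 'Prediction is "{}" ({:.1f}%), expected "{}"'.format(
        SPECIES[class_id], 100 * probability, expected)


# Hand-written example: class 1 is the most likely with probability 0.95.
pred = {'class_ids': [1], 'probabilities': [0.03, 0.95, 0.02]}
message = describe(pred, 'Versicolor')
print(message)
# Prediction is "Versicolor" (95.0%), expected "Versicolor"
```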