Integrating TensorRT into TensorFlow, and How It Works

Versions: CUDA 10.0, cuDNN 7.3, TensorFlow 1.13, TensorRT 5.0.2.6, on a T4 GPU

Usage: in TensorFlow, import the graph returned by TensorRT in place of the original graph.

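```python
# Replace the original frozen graph with the TensorRT-optimized one
tf.import_graph_def(self.convFP16Graph(output_graph_def), name="")

def convFP16Graph(self, inGraph):
    # convRTGraph is the author's own helper (not shown), presumably wrapping create_inference_graph
    return self.convRTGraph("FP16", inGraph)
```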

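The create_inference_graph function takes a frozen TensorFlow graph as input and returns a graph in which the parts TensorRT supports have been optimized and replaced by TensorRT nodes. Its parameters are: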

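input_graph_def: the frozen TensorFlow graph

outputs: a list of output node names as strings, e.g. ["resnet_v1_50/predictions/Reshape_1"]

max_batch_size: integer, the maximum input batch size, e.g. 16

max_workspace_size_bytes: integer, the maximum amount of GPU memory (in bytes) that TensorRT may allocate

precision_mode: string, one of "FP32", "FP16" or "INT8"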
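The author's convRTGraph helper is not shown. Below is a minimal sketch of what such a wrapper might look like, assuming TensorFlow 1.13, where TF-TRT is exposed as tensorflow.contrib.tensorrt. The names build_trt_kwargs and conv_rt_graph, and the default batch size and workspace size, are hypothetical choices for illustration only.

```python
# Hypothetical sketch of a convRTGraph-style wrapper (not the author's code).
# Assumes TensorFlow 1.13 with tensorflow.contrib.tensorrt available at call time.

VALID_PRECISIONS = ("FP32", "FP16", "INT8")


def build_trt_kwargs(graph_def, outputs, precision_mode,
                     max_batch_size=16, workspace_bytes=1 << 30,
                     dynamic=True):
    """Collect the create_inference_graph arguments described above."""
    if precision_mode not in VALID_PRECISIONS:
        raise ValueError("precision_mode must be one of %s" % (VALID_PRECISIONS,))
    return {
        "input_graph_def": graph_def,                 # frozen GraphDef
        "outputs": list(outputs),                     # output node names
        "max_batch_size": max_batch_size,             # largest batch the engines must handle
        "max_workspace_size_bytes": workspace_bytes,  # GPU memory TensorRT may use
        "precision_mode": precision_mode,             # "FP32", "FP16" or "INT8"
        "is_dynamic_op": dynamic,                     # build engines at runtime, when input shapes are known
    }


def conv_rt_graph(precision_mode, in_graph, outputs):
    """Return in_graph with TensorRT-compatible subgraphs replaced by TensorRT engine ops."""
    # Imported lazily so build_trt_kwargs can be used without TensorFlow installed.
    from tensorflow.contrib import tensorrt as trt
    return trt.create_inference_graph(
        **build_trt_kwargs(in_graph, outputs, precision_mode))
```

Under these assumptions, conv_rt_graph("FP16", output_graph_def, ["resnet_v1_50/predictions/Reshape_1"]) would play the role of the author's convFP16Graph call. Keeping the argument building in a separate function makes the parameter choices easy to inspect or log before the (slow) conversion runs.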

Problem: during conversion the log showed the following message:

W tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3710] Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,?,?,2048] has an unknown non-batch dimension at dim 1

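Fix: pass is_dynamic_op=True to create_inference_graph, after which the conversion works. In this dynamic mode the TensorRT engines are built at runtime, when the actual input shapes are known, instead of at conversion time, which would require every non-batch dimension to be fixed.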
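To see why the message points at dim 1, here is a simplified model of the check, written as an assumption-based illustration rather than TensorFlow's actual code: in static mode, dimension 0 (the batch) may be unknown, but every later dimension must be known at conversion time. The function names are hypothetical.

```python
# Simplified model (not TensorFlow's implementation) of the static-shape check
# behind the message above. Unknown dimensions are represented as None.

from typing import Optional, Sequence


def first_unknown_non_batch_dim(shape: Sequence[Optional[int]]) -> Optional[int]:
    """Return the index of the first unknown dim after the batch dim, or None."""
    for i, dim in enumerate(shape[1:], start=1):
        if dim is None:
            return i
    return None


def can_build_static_engine(shape: Sequence[Optional[int]]) -> bool:
    """Static mode builds engines at conversion time, so all non-batch dims must be known."""
    return first_unknown_non_batch_dim(shape) is None


# The tensor from the log: [?, ?, ?, 2048]
shape = [None, None, None, 2048]
print(first_unknown_non_batch_dim(shape))           # 1, matching "at dim 1" in the log
print(can_build_static_engine(shape))               # False, so is_dynamic_op=True is needed
print(can_build_static_engine([None, 7, 7, 2048]))  # True
```

If the spatial size of the input were fixed (for example a 7x7x2048 feature map), static mode would be possible and engines could be built ahead of time.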

How it works:

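In practice, the log clearly shows that the number of nodes and edges in the graph drops substantially after conversion. Reference (NVIDIA TensorRT Developer Guide): https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html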
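The reduction can also be measured directly instead of read from the log. A minimal sketch, assuming a GraphDef-like object: a TensorFlow GraphDef has a repeated node field, and each NodeDef has op (its op type) and input (the names of the tensors it consumes). TF-TRT engine nodes have op type TRTEngineOp. The graph_stats helper and the toy graphs below are hypothetical; with real graphs you would pass output_graph_def and the converted graph.

```python
# Hypothetical helper: count nodes, edges and TensorRT engine nodes in a GraphDef-like object.
from types import SimpleNamespace


def graph_stats(graph_def):
    nodes = list(graph_def.node)
    edges = sum(len(n.input) for n in nodes)  # one edge per input reference (control inputs included)
    trt_ops = sum(1 for n in nodes if n.op == "TRTEngineOp")
    return {"nodes": len(nodes), "edges": edges, "trt_engine_ops": trt_ops}


def _node(name, op, *inputs):
    # Stand-in for a NodeDef, so the example runs without TensorFlow.
    return SimpleNamespace(name=name, op=op, input=list(inputs))


before = SimpleNamespace(node=[
    _node("input", "Placeholder"),
    _node("w", "Const"),
    _node("b", "Const"),
    _node("conv", "Conv2D", "input", "w"),
    _node("bias", "BiasAdd", "conv", "b"),
    _node("relu", "Relu", "bias"),
])
after = SimpleNamespace(node=[
    _node("input", "Placeholder"),
    _node("trt", "TRTEngineOp", "input"),  # the whole conv block collapsed into one engine
])

print(graph_stats(before))  # {'nodes': 6, 'edges': 5, 'trt_engine_ops': 0}
print(graph_stats(after))   # {'nodes': 2, 'edges': 1, 'trt_engine_ops': 1}
```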

According to the guide, the build phase performs the following optimizations on the layer graph:

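1. Elimination of layers whose outputs are not used.

2. Fusion of convolution, bias and ReLU operations: the three operations are merged into a single layer.

3. Aggregation of operations with sufficiently similar parameters and the same source tensor (for example, the 1x1 convolutions in GoogleNet v5's inception module): operations that read the same input and have similar parameters are combined into one larger operation.

4. Merging of concatenation layers by directing layer outputs to the correct eventual destination: each layer writes its output directly into its place in the concatenated result, so the concatenation itself no longer needs a separate copy.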
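Optimizations 1 and 2 can be illustrated on a toy graph. This is a sketch under stated assumptions, not TensorRT's implementation: a graph is a hypothetical dict mapping layer name to (op, list of input names), and the op names Conv, BiasAdd, Relu and ConvBiasRelu are made up for the example.

```python
# Toy illustration (not TensorRT's implementation) of dead-layer elimination
# and conv -> bias -> ReLU fusion. A graph maps layer name -> (op, input names).


def eliminate_dead_layers(graph, outputs):
    """Keep only layers reachable backwards from the requested outputs."""
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in live or name not in graph:
            continue
        live.add(name)
        stack.extend(graph[name][1])
    return {n: graph[n] for n in graph if n in live}


def consumers(graph, name):
    """Names of the layers that read the output of `name`."""
    return [n for n, (_, ins) in graph.items() if name in ins]


def fuse_conv_bias_relu(graph):
    """Replace each Conv -> BiasAdd -> Relu chain (single consumer at each step) with one layer."""
    graph = dict(graph)
    for conv, (op, conv_inputs) in list(graph.items()):
        if op != "Conv" or conv not in graph:
            continue
        users = consumers(graph, conv)
        if len(users) != 1 or graph[users[0]][0] != "BiasAdd":
            continue  # the conv output is needed elsewhere, so it cannot be fused away
        bias = users[0]
        users = consumers(graph, bias)
        if len(users) != 1 or graph[users[0]][0] != "Relu":
            continue
        relu = users[0]
        extra = [i for i in graph[bias][1] if i != conv]  # the bias tensor
        # The fused layer keeps the ReLU's name, so downstream layers need no rewiring.
        graph[relu] = ("ConvBiasRelu", conv_inputs + extra)
        del graph[conv], graph[bias]
    return graph


net = {
    "x": ("Input", []),
    "w": ("Const", []),
    "b": ("Const", []),
    "conv": ("Conv", ["x", "w"]),
    "bias": ("BiasAdd", ["conv", "b"]),
    "relu": ("Relu", ["bias"]),
    "debug": ("Identity", ["conv"]),  # never requested as an output, so it is dead
}

optimized = fuse_conv_bias_relu(eliminate_dead_layers(net, ["relu"]))
print(sorted(optimized))   # ['b', 'relu', 'w', 'x']
print(optimized["relu"])   # ('ConvBiasRelu', ['x', 'w', 'b'])
```

The order of the two passes matters in this toy: while the unused debug layer still reads the convolution output, the convolution has two consumers and cannot be fused, so fuse_conv_bias_relu(net) on its own leaves the graph unchanged.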
