TensorFlow 2 vs. PyTorch: Basic Operations Compared

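Defining tensors
tensorflow: tf.constant() and tf.convert_to_tensor() accept a scalar, a list, or a NumPy array. Note that although the dtype argument sets the type, it does not convert incompatible values: x = tf.constant([1, 2.2], dtype=tf.int16) raises an error. To be safe, use tf.cast() for type conversion.
pytorch: th.tensor() also accepts a scalar, a list, or a NumPy array, and takes a dtype argument to set the type.
String type
tensorflow has a built-in string type; pytorch does not.
Bool type
tensorflow can only build a boolean tensor from True and False, whereas pytorch can take numeric values and convert them to booleans through dtype. In both frameworks a scalar boolean tensor can be used directly in a conditional statement.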

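Differences in some common functions

tensorflow's zeros requires the shape as a single tuple or list, because its second positional argument is the dtype; pytorch's zeros gives the same result whether or not the sizes are wrapped in a tuple.
ones is used the same way as zeros.
ones_like and zeros_like are used the same way in both frameworks.
To fill with a constant, tensorflow uses fill and pytorch uses full; pytorch can also fill in place with the tensor method fill_().
For random numbers, tensorflow uses its own random module to sample normal and uniform distributions, and the distribution parameters can be specified. pytorch calls the functions directly on th: th.randn and th.rand only produce the standard normal distribution and the uniform distribution on [0, 1), so other parameters need rescaling or a function such as th.normal.
For sequences, tensorflow uses range, while pytorch uses arange as numpy does. Both produce half-open intervals [start, stop) and take (start, stop, step).
Indexing and slicing work the same way. Assigning to a slice is awkward in tensorflow; see scatter_nd_update (tf.tensor_scatter_nd_update for tensors) and py_function.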

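Dimension transforms

Adding a dimension: tensorflow uses expand_dims, with the keyword axis as in numpy, while pytorch uses unsqueeze with the keyword dim.
Removing a dimension: only dimensions of length 1 can be removed; the method is called squeeze in both tensorflow and pytorch.
Changing the view: both use reshape. In tensorflow the result shares storage. In pytorch both view and reshape return a tensor with the new shape, but view shares data and requires a contiguous input, whereas it cannot be known in advance whether reshape copies the data, so prefer view where possible.
Permuting dimensions: tensorflow uses transpose, whose argument is the new order of the axis indices; this is equivalent to pytorch's permute, while pytorch's transpose only swaps two dimensions.
Replicating data: tensorflow uses tile, with the same signature as numpy, while pytorch provides similar functionality through expand, expand_as, and repeat.
Broadcasting: tensorflow and pytorch follow the same mechanism.
Basic arithmetic: element-wise operations, including addition, subtraction, multiplication, division, powers, exponentials, and logarithms, are used the same way in tensorflow and pytorch.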

Below are two toy examples that use the basic operations of tensorflow and pytorch respectively: fitting a four-class classification problem with a 3-layer MLP.

# -*- coding: utf-8 -*-
import tensorflow as tf    # tensorflow
import torch as th         # pytorch
import numpy as np         # numpy
# Build the dataset
x_list = np.random.randn(30, 784)
w1_gt = np.random.randn(784, 256)
w2_gt = np.random.randn(256, 128)
w3_gt = np.random.randn(128, 4)
y_output = np.dot(np.abs(np.dot(np.abs(np.dot(x_list, w1_gt)), w2_gt)), w3_gt)
y_list = np.argmax(y_output, axis=1)      # split into 4 classes

lr = 1e-4     # learning rate

# Define the learnable parameters for tensorflow
w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256], dtype=tf.float32))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 4], stddev=0.1))
b3 = tf.Variable(tf.zeros([4]))

# Define the learnable parameters for pytorch; to match tensorflow, reuse the same initial values.
# th.tensor copies the data, so these parameters do not share memory with the tensorflow variables.
w1_th = th.tensor(w1.numpy(), requires_grad=True)
w2_th = th.tensor(w2.numpy(), requires_grad=True)
w3_th = th.tensor(w3.numpy(), requires_grad=True)
b1_th = th.zeros(256, requires_grad=True)
b2_th = th.zeros(128, requires_grad=True)
b3_th = th.zeros(4, requires_grad=True)

# tensorflow forward pass and training loop
def forward(x, y):
    h1 = tf.nn.relu(tf.matmul(x, w1) + b1)
    h2 = tf.nn.relu(tf.matmul(h1, w2) + b2)
    h3 = tf.nn.softmax(tf.matmul(h2, w3) + b3)
    loss = tf.pow(h3 - y, 2)
    return tf.reduce_mean(loss)

x_tf_list = tf.constant(x_list, dtype=tf.float32)
y_tf_list = tf.constant(y_list)

for epoch in range(30):
    for i, (x, y) in enumerate(zip(x_tf_list, y_tf_list)):
        # Record the forward pass on the tape
        with tf.GradientTape() as tape:
            loss = forward(tf.expand_dims(x, axis=0), tf.one_hot(y, depth=4))
        # Compute the gradients outside the tape context, then take a plain SGD step
        grad = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        w1.assign_sub(lr * grad[0])
        b1.assign_sub(lr * grad[1])
        w2.assign_sub(lr * grad[2])
        b2.assign_sub(lr * grad[3])
        w3.assign_sub(lr * grad[4])
        b3.assign_sub(lr * grad[5])
    print('loss in {}-th epoch: {}'.format(epoch, loss.numpy()))

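# pytorch forward pass and training loop
def forward_th(x, y):
    h1 = th.clamp(th.matmul(x, w1_th) + b1_th, 0)
    h2 = th.clamp(th.matmul(h1, w2_th) + b2_th, 0)
    # Pass dim explicitly; relying on the implicit softmax dimension is deprecated
    h3 = th.nn.functional.softmax(th.matmul(h2, w3_th) + b3_th, dim=-1)
    return th.pow(h3 - y, 2).mean()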

x_th_list = th.from_numpy(x_list).float()
y_th_list = th.nn.functional.one_hot(th.from_numpy(y_list).long(), 4)
for epoch in range(30):
    for i, (x, y) in enumerate(zip(x_th_list, y_th_list)):
        loss = forward_th(x.unsqueeze(0), y)
        loss.backward()
        # Plain SGD step; assigning through .data keeps the update out of the autograd graph
        for p in (w1_th, w2_th, w3_th, b1_th, b2_th, b3_th):
            p.data = p.data - lr * p.grad
            # Reset the accumulated gradient before the next sample
            p.grad.zero_()

    print('loss in {}-th epoch: {}'.format(epoch, loss.item()))

The two outputs are almost identical, differing only on the order of 1e-7.
