Overview
Inspired by label propagation, the authors propose a pipeline in which an MLP is trained first and its output is then smoothed over the graph; the resulting node classification scores can surpass GNNs. The idea is roughly as follows (a compact restatement in formulas is given right after the list):
1. Train an MLP with the node features x and node labels y.
2. Take the MLP prediction z = MLP(x), smooth the residual e = y - z, and update z = z + e.
3. Smooth z.
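In the notation used in the Detail section below (with $S$ the normalized adjacency matrix used for propagation, $L$ the training nodes and $U$ the rest), the three steps can be summarized as

$$
\begin{aligned}
Z &= \mathrm{MLP}(X) \\
Z^{(r)} &= Z + \hat{E}, \quad \text{where } \hat{E} \text{ is } E \ (E_L = Y_L - Z_L,\ E_U = 0) \text{ propagated over } S \\
\hat{Y} &= G \text{ propagated over } S, \quad \text{where } G_L = Y_L,\ G_U = Z^{(r)}_U
\end{aligned}
$$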
Detail
- MLP stage: the first step is the MLP itself, a three-layer network (two hidden MLPLayer blocks plus a linear output layer), structured as follows:
import torch.nn as nn
import torch.nn.functional as F

class MLPLayer(nn.Module):
    """Linear -> BatchNorm -> ReLU -> Dropout."""
    def __init__(self, input_dim, output_dim, dropout):
        super(MLPLayer, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)
        self.bn = nn.BatchNorm1d(output_dim)
        self.dropout = dropout

    def forward(self, x):
        x = self.linear(x)
        x = self.bn(x)
        x = F.relu(x, inplace=True)
        x = F.dropout(x, p=self.dropout, training=self.training)
        return x

class MLP(nn.Module):
    def __init__(self, n_features, hidden_dim, n_labels, dropout):
        super(MLP, self).__init__()
        self.layer1 = MLPLayer(n_features, hidden_dim, dropout)
        self.layer2 = MLPLayer(hidden_dim, hidden_dim, dropout)
        self.layer3 = nn.Linear(hidden_dim, n_labels)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        # output class probabilities (soft predictions)
        return F.softmax(x, dim=1)
Here x is the node feature matrix and y are the node labels; z = MLP(x) serves as the prediction of y. This MLP is trained to completion before the next stages.
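A minimal training-loop sketch for this stage, assuming full-batch training, an Adam optimizer, and a boolean train_mask over nodes; the hyperparameters are illustrative, not the ones from the paper:

import torch
import torch.nn.functional as F

# assumed tensors: x is (n_nodes, n_features) float, y is (n_nodes,) long,
# train_mask is a (n_nodes,) boolean tensor
model = MLP(n_features=x.size(1), hidden_dim=64,
            n_labels=int(y.max()) + 1, dropout=0.5)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    probs = model(x)
    # the MLP outputs probabilities, so take the log before the NLL loss
    loss = F.nll_loss(torch.log(probs[train_mask] + 1e-12), y[train_mask])
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    z = model(x)  # soft predictions fed into the correct and smooth stages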
- Correct stage: after the MLP has been trained, the second step smooths the residual. $E \in \mathbb{R}^{n \times c}$ is the residual matrix, whose entry in row $i$ and column $j$ is the residual of node $i$ on label $j$. With $L$ denoting the training set and $U$ the validation and test sets, $E$ is defined as

$$E_L = Y_L - Z_L, \qquad E_U = 0.$$

Starting from $E^{(0)} = E$ and writing $S = D^{-1/2} A D^{-1/2}$ for the normalized adjacency matrix, the original correct step given by the authors is the iteration

$$E^{(t+1)} = (1 - \alpha_1)\, E + \alpha_1\, S E^{(t)},$$

which usually converges within a dozen or so iterations. Because propagation changes the variance of the smoothed residual $\hat{E}$, it has to be rescaled. The paper describes two scaling methods. The first is called Autoscale: with $\sigma = \frac{1}{|L|} \sum_{i \in L} \lVert e_i \rVert_1$ estimated on the training residuals, the corrected prediction is

$$Z^{(r)}_i = Z_i + \sigma \, \frac{\hat{E}_i}{\lVert \hat{E}_i \rVert_1}.$$

The second is called FDiff-scale: the residual on the training nodes is held fixed during propagation,

$$E^{(t+1)}_U = \big(S E^{(t)}\big)_U, \qquad E^{(t+1)}_L = E_L,$$

and the corrected prediction is $Z^{(r)} = Z + s\,\hat{E}$, where $s$ is a hand-tuned hyperparameter. A numpy implementation of both variants follows:
import numpy as np

def _correct_autoscale(y, z, train_mask, propagation_matrix, alpha, n_iters=50):
    # residual is only defined on training nodes; zero elsewhere
    e = np.where(train_mask, y - z, np.zeros(shape=z.shape))
    e_l1 = np.linalg.norm(e, ord=1, axis=1)
    num_train = train_mask[:, 0].sum()
    # average L1 norm of the training residuals, used as the target scale
    sigma = e_l1.sum() / num_train
    e_init = e.copy()
    for _ in range(n_iters):
        e = (1.0 - alpha) * e_init + alpha * (propagation_matrix @ e)
    # rescale each row of the smoothed residual back to the training-set scale
    scale = sigma / np.abs(e).sum(axis=1, keepdims=True)
    scale[np.isinf(scale) | (scale > 1000)] = 1.0
    return z + scale * e

def _correct_fdiff_scale(y, z, train_mask, propagation_matrix, scale, n_iters=50):
    e = np.where(train_mask, y - z, np.zeros(shape=z.shape))
    e_init = e.copy()
    for _ in range(n_iters):
        e = propagation_matrix @ e
        # keep the residual on training nodes fixed at its initial value
        e = np.where(train_mask, e_init, e)
    return z + scale * e
- Smooth stage: let $G$ be the best guess, with $G_L = Y_L$ (ground-truth labels on the training nodes) and $G_U = Z^{(r)}_U$ (corrected predictions elsewhere), and initial value $G^{(0)} = G$. $G$ is smoothed with the same kind of iteration:

$$G^{(t+1)} = (1 - \alpha_2)\, G + \alpha_2\, S G^{(t)}.$$

A numpy implementation:
def _smooth(y, z, train_mask, propagation_matrix, alpha, n_iters=50):
    h = np.where(train_mask, y, z)
    h_init = h.copy()
    for _ in range(n_iters):
        h = (1.0 - alpha) * h_init + alpha * (propagation_matrix @ h)
    return h
The final hard prediction for node $i$ is $\hat{y}_i = \arg\max_j H_{ij}$, where $H$ is the smoothed matrix returned by _smooth.
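A minimal end-to-end sketch of how the pieces above fit together, assuming a dense numpy adjacency matrix adj, one-hot labels y_onehot, soft MLP predictions z (as a numpy array), and a boolean node mask train_idx; the variable names, the choice of Autoscale, the omission of self-loops, and the value of alpha are illustrative assumptions:

import numpy as np

# symmetrically normalized adjacency S = D^{-1/2} A D^{-1/2}
# (adding self-loops is a design choice; none are added here)
deg = adj.sum(axis=1)
d_inv_sqrt = np.zeros_like(deg, dtype=float)
d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
S = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

# broadcast the node-level mask to the shape of the label matrix
train_mask = np.repeat(train_idx[:, None], y_onehot.shape[1], axis=1)

z_corrected = _correct_autoscale(y_onehot, z, train_mask, S, alpha=0.8)
h = _smooth(y_onehot, z_corrected, train_mask, S, alpha=0.8)
y_pred = h.argmax(axis=1)  # final hard prediction per node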
Comment
I reimplemented the method myself, with essentially no reference to the official code, and got scores on Cora, Citeseer and Pubmed close to those reported in the paper. Interestingly, after removing the correct step the scores actually went up! An MLP plus smooth is then essentially the same as APPNP; the difference is that in APPNP the gradients during training flow through MLP + smooth, while here the training stage is only the MLP. I therefore also ran semi-supervised experiments with a lower label rate (20 labeled nodes per class). After hyperparameter tuning, MLP + smooth roughly matched APPNP's scores. But without the correct step the paper probably could not have been published, because the model would be too simple.
Reference
Paper on arXiv
Re1: Reading the C&S (Correct and Smooth) paper - CSDN
Official code for the paper
The CorrectAndSmooth class in PyG