This post covers the application of Recurrent Neural Networks (GRU) to recommender systems, focusing on how the training samples are constructed and how the loss functions are designed.
1. Sample Construction
1.1 Session-Parallel Mini-Batches
Sessions are laid out in parallel across the mini-batch; whenever a session is exhausted, the next unused session takes over its slot — for example, session4 is placed right after session2 ends. This differs from the sliding-window scheme common in natural language processing, mainly for two reasons (a minimal sketch of the batching scheme follows the list):
- session lengths vary widely;
- compared with a sliding window, this scheme better captures how a session evolves over time.
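Below is a minimal sketch of session-parallel mini-batching, assuming `sessions` is a list of item-id lists (one list per session, chronologically ordered, each with at least two items). The names `sessions`, `batch_size`, and the generator itself are illustrative, not taken from the paper's code.

import numpy as np

def session_parallel_batches(sessions, batch_size):
    next_session = batch_size                                  # index of the next unused session
    active = list(range(min(batch_size, len(sessions))))       # one session per batch slot
    pos = [0] * len(active)                                     # current position inside each active session

    while True:
        x = [sessions[s][p] for s, p in zip(active, pos)]       # current item of each slot
        y = [sessions[s][p + 1] for s, p in zip(active, pos)]   # next item (target) of each slot
        yield np.array(x), np.array(y)

        # Advance every slot; when a session runs out, the next session fills its slot.
        for i in range(len(active)):
            pos[i] += 1
            if pos[i] + 1 >= len(sessions[active[i]]):
                if next_session >= len(sessions):
                    return                                      # no sessions left to fill this slot
                active[i] = next_session
                pos[i] = 0
                next_session += 1

In the full implementation, whenever a slot switches to a new session, the corresponding row of the GRU hidden state also has to be reset to zero.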
1.2 Sampling On The Output
- Negatives are sampled according to item popularity: a popular item is more likely to have been seen by the user, so the absence of any positive feedback on it is a stronger signal that the user does not like it.
- Instead of drawing separate negative samples for each positive example, the target items of the other examples in the mini-batch are used directly as negatives (see the sketch below).
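A minimal numpy sketch of "other examples in the batch as negatives": scores are computed only against the batch's target items, so the diagonal holds the positive scores and the off-diagonal entries act as negatives. The names `hidden`, `item_emb`, and `y` are illustrative.

import numpy as np

batch, d, n_items = 4, 8, 100
rng = np.random.default_rng(0)
hidden = rng.normal(size=(batch, d))          # GRU output for each session in the batch
item_emb = rng.normal(size=(n_items, d))      # output (softmax) item embeddings
y = np.array([3, 17, 42, 7])                  # target item of each example

scores = hidden @ item_emb[y].T               # [batch, batch]: row i scored against all targets
positives = np.diag(scores)                   # score of example i on its own target
# Off-diagonal entries are the scores on the other examples' targets,
# which serve as the negative samples.

Because the batch targets are drawn from real interactions, this implicitly samples negatives in proportion to item popularity, matching the first bullet above.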
2. Loss Function Design
Ranking losses fall into three categories: pointwise, pairwise, and listwise.
- pointwise: Pointwise ranking estimates the score or the rank of items independently of each other and the loss is defined in a way so that the rank of relevant items should be low.
- pairwise: Pairwise ranking compares the score or the rank of pairs of a positive and a negative item and the loss enforces that the rank of the positive item should be lower than that of the negative one.
- listwise: Listwise ranking uses the scores and ranks of all items and compares them to the perfect ordering. As it includes sorting, it is usually computationally more expensive and thus not used often.
The paper reports that the pointwise loss (i.e. cross-entropy) is unstable, and that pairwise losses work better. Two pairwise losses are used (a small numeric sketch follows the formulas below):
BPR:
$$ L = -\frac{1}{N_S}\sum_{j=1}^{N_S}\log\sigma(\hat{r}_{s,i}-\hat{r}_{s,j}) $$
TOP1 (designed by the authors; a regularization term on the negative samples pushes their scores toward zero):
$$ L = \frac{1}{N_S}\sum_{j=1}^{N_S}\Big(\sigma(\hat{r}_{s,j}-\hat{r}_{s,i}) + \sigma(\hat{r}_{s,j}^2)\Big) $$
where $N_S$ is the number of negative samples, $\hat{r}_{s,k}$ is the score on item $k$, $i$ denotes the positive sample, and $j$ denotes a negative sample.
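A minimal numpy sketch of the two pairwise losses for a single positive item, following the formulas above directly; `r_pos` and `r_neg` are illustrative names.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

r_pos = 2.1                                   # \hat{r}_{s,i}: score of the positive item
r_neg = np.array([0.3, -0.5, 1.2])            # \hat{r}_{s,j}: scores of N_S negative items

bpr  = -np.mean(np.log(sigmoid(r_pos - r_neg)))          # BPR loss
top1 =  np.mean(sigmoid(r_neg - r_pos) + sigmoid(r_neg ** 2))  # TOP1 loss
print(bpr, top1)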
3. Model Architecture
Notes:
- 1-of-N encoding: one-hot input; the length of the input vector equals the number of items.
- Feeding (short-circuiting) the embedding vector into every GRU layer improves model performance (a minimal sketch of this shortcut follows).
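One simple way to realize this shortcut — not necessarily the paper's exact formulation — is to add the input embedding to each layer's input at every step. The sketch below assumes `gru_layers` is a list of callables `h_new = gru(x, h_prev)` and that the embedding and hidden sizes match; all names are illustrative.

def multi_layer_step(emb, states, gru_layers):
    new_states = []
    layer_in = emb
    for gru, h_prev in zip(gru_layers, states):
        h = gru(layer_in, h_prev)            # one GRU step for this layer
        new_states.append(h)
        layer_in = h + emb                   # shortcut: inject the embedding into the next layer's input
    return new_states[-1], new_states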
4. Implementation
Reference: https://github.com/Songweiping/GRU4Rec_TensorFlow
4.1 Data processing
import os
import numpy as np
import pandas as pd

def _load_data(f, max_len):
    """
    Data format in file f:
        SessionId\tItemId\tTimestamp\n
    """
    # Build (or reload) the item-id -> integer-id mapping; id 0 is reserved for padding.
    if os.path.exists('item2id.map'):
        item2idmap = {}
        for line in open('item2id.map'):
            k, v = line.strip().split('\t')
            item2idmap[k] = int(v)
    else:
        items = get_item()  # assumed helper that returns all item ids
        item2idmap = dict(zip(items, range(1, 1 + len(items))))
        with open('item2id.map', 'w') as fout:
            for k, v in item2idmap.items():
                fout.write(str(k) + '\t' + str(v) + '\n')
    n_items = len(item2idmap)

    data = pd.read_csv(f, sep='\t', dtype={0: str, 1: str, 2: np.float32})
    data['ItemId'] = data['ItemId'].map(item2idmap)
    # Group each session's items in chronological order: {SessionId: [item, item, ...]}
    data = data.sort_values(by=['Timestamp']).groupby('SessionId')['ItemId'].apply(list).to_dict()

    new_x = []
    new_y = []
    for k, v in data.items():
        # The input is the session shifted by one step; the target is the next item.
        x = v[:-1]
        y = v[1:]
        if len(x) < 2:
            continue
        # Pad short sessions with 0 and truncate long ones to max_len.
        padded_len = max_len - len(x)
        if padded_len > 0:
            x.extend([0] * padded_len)
            y.extend([0] * padded_len)
        new_x.append(x[:max_len])
        new_y.append(y[:max_len])
    return (new_x, new_y, n_items)
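Example usage (the file name and max_len value here are illustrative, not from the repo):

train_x, train_y, n_items = _load_data('train.tsv', max_len=20)
print(len(train_x), n_items)   # number of padded sessions and number of distinct items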
4.2 Model:
# These are methods of the GRU4Rec model class in the repo above (TF 1.x API);
# they assume `import numpy as np`, `import tensorflow as tf`,
# and `rnn_cell = tf.nn.rnn_cell`.

############################LOSS FUNCTIONS######################
def cross_entropy(self, yhat):
    # tf.diag_part returns the diagonal of the tensor, i.e. each example's
    # score on its own target item.
    return tf.reduce_mean(-tf.log(tf.diag_part(yhat) + 1e-24))

def bpr(self, yhat):
    # Each column of yhatT is compared against the positive scores on the diagonal.
    yhatT = tf.transpose(yhat)
    return tf.reduce_mean(-tf.log(tf.nn.sigmoid(tf.diag_part(yhat) - yhatT)))

def top1(self, yhat):
    yhatT = tf.transpose(yhat)
    term1 = tf.reduce_mean(tf.nn.sigmoid(-tf.diag_part(yhat) + yhatT) + tf.nn.sigmoid(yhatT ** 2), axis=0)
    # Remove the contribution of the positive item being compared with itself.
    term2 = tf.nn.sigmoid(tf.diag_part(yhat) ** 2) / self.batch_size
    return tf.reduce_mean(term1 - term2)

def build_model(self):
    self.X = tf.placeholder(tf.int32, [self.batch_size], name='input')
    self.Y = tf.placeholder(tf.int32, [self.batch_size], name='output')
    self.state = [tf.placeholder(tf.float32, [self.batch_size, self.rnn_size], name='rnn_state') for _ in range(self.layers)]
    self.global_step = tf.Variable(0, name='global_step', trainable=False)
    with tf.variable_scope('gru_layer'):
        sigma = self.sigma if self.sigma != 0 else np.sqrt(6.0 / (self.n_items + self.rnn_size))
        if self.init_as_normal:
            initializer = tf.random_normal_initializer(mean=0, stddev=sigma)
        else:
            initializer = tf.random_uniform_initializer(minval=-sigma, maxval=sigma)
        embedding = tf.get_variable('embedding', [self.n_items, self.rnn_size], initializer=initializer)
        softmax_W = tf.get_variable('softmax_w', [self.n_items, self.rnn_size], initializer=initializer)
        softmax_b = tf.get_variable('softmax_b', [self.n_items], initializer=tf.constant_initializer(0.0))

        cell = rnn_cell.GRUCell(self.rnn_size, activation=self.hidden_act)
        drop_cell = rnn_cell.DropoutWrapper(cell, output_keep_prob=self.dropout_p_hidden)
        stacked_cell = rnn_cell.MultiRNNCell([drop_cell] * self.layers)

        inputs = tf.nn.embedding_lookup(embedding, self.X)        # [batch_size, rnn_size]
        output, state = stacked_cell(inputs, tuple(self.state))   # one GRU step: [batch_size, rnn_size]
        self.final_state = state

    if self.is_training:
        '''
        Use other examples of the minibatch as negative samples.
        '''
        sampled_W = tf.nn.embedding_lookup(softmax_W, self.Y)
        sampled_b = tf.nn.embedding_lookup(softmax_b, self.Y)
        # Score only against the batch's target items: [batch_size, batch_size],
        # positives on the diagonal, negatives off the diagonal.
        logits = tf.matmul(output, sampled_W, transpose_b=True) + sampled_b
        self.yhat = self.final_activation(logits)
        self.cost = self.loss_function(self.yhat)
    else:
        # At prediction time, score against all items: [batch_size, n_items].
        logits = tf.matmul(output, softmax_W, transpose_b=True) + softmax_b
        self.yhat = self.final_activation(logits)
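Since build_model processes one step per call, the recurrent state has to be carried across steps through the state placeholders. A rough sketch of the training step, assuming a TF 1.x session; `sess`, `model`, `train_op`, `batch_x`, and `batch_y` are illustrative names, not verbatim from the repo:

state = [np.zeros((model.batch_size, model.rnn_size), dtype=np.float32) for _ in range(model.layers)]
feed = {model.X: batch_x, model.Y: batch_y}
for i in range(model.layers):
    feed[model.state[i]] = state[i]                 # feed last step's hidden state back in
cost, state, _ = sess.run([model.cost, model.final_state, train_op], feed_dict=feed)
# When a session in the batch ends, its row in every state tensor is reset to zero
# before the next step, matching the session-parallel batching in section 1.1.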