Related Pins at Pinterest: The Evolution of a Real-World Recommender System

小结

主要讲述pinterest在Related Pins这个场景的演进过程。pinterest的总保存RP占比从10%涨到了40%。Related Pins Save Propensity,是他们主要的优化目标。Related Pins leverages this human-curated content to provide personalized recommendations of pins based on a given query pin

系统总览

一共分为三块、候选产生、memboost和rank，注意看三者在系统中的先后顺序

候选
规模从10亿到1000
memboost
memorizes past engagement on specific query and result pairs.
rank
maximize our target engagement metric of Save Propensity

图片.png

候选演进过程

最早候选主要是基于pins在boards中的共现，后来引入memboost和ltr之后，候选的问题从precision慢慢往recall的方向靠拢，加入了更多新的候选来源

board共现
1）用mapreduce来计算两个pin在boards中的共现，同时基于分类和文本的匹配程度来进行相关性加权，有两个问题a)长尾问题;b)更多的依赖基础信息的相关性得分
2）随机游走Pixie。a）使用一些规则去掉了一些高相关和低相关的节点;b)参考了twitter的游走方法，超过10w次的随机游走的结果聚合，每次都重设概率
3）优缺点
优点是recall还不错，缺点1）board有时宽泛了，而且board的内容容易随着用户的兴趣转义而变化；2）board有时太狭义了，比如whiskey和cocktail有时候在不同的board中
session共现
用户在时序上的行为串联可以解决board太宽泛和狭窄的问题
Pin2Vec is a learned embedding of the N most popular (head) pins in a d-dimensional space, with the goal of mini- mizing the distance between pins that are saved in the same session.
we consider pins that are saved by the same user within a certain time window to be related (保存行为而不是点击行为！！)，it captures a large amount of user behavior in a compact vector representation.
补充候选
为了解决两大问题：1）the cold start problem: rare pins do not have a lot of candidates because they do not appear on many boards.2）after we added ranking, we wanted to expand our candidate sets in the cases where diversity of results would lead to more engagement.
1）基于搜索的候选
We generate candidates by leveraging Pinterest’s text-based search, using the query pin’s annotations (words from the web link or description) as query tokens. Each popular search query is backed by a precomputed set of pins from Pinterest Search
2）视觉相似候选
a) If the query image is a near-duplicate, then we add the Related Pins recommendations for the duplicate image to the results.
b) use the Visual Search backend to return visually similar images, based on a nearest-neighbor lookup

区域候选
the content activation problem: rare pins do not show up as candidates because they do not appear on many boards.
we generate additional candidate sets segmented by locale for many of the above generation techniques
解决内容激活问题的方法还有gender-specific content or fresh content.

memboost演进过程

we built Memboost to memorize the best result pins for each query，Memboost as a whole introduces significant system com- plexity by adding feedback loops in the system。
1）使用clicks over expected clicks (COEC) 来解决位置和平台的偏差

图片.png

2）具体行为考虑到点击、长点击、关闭和保存，具体的计算方法为：

图片.png

3）如果加入了new ranker或者时间推移或者系统改变，其实会对memboost这种历史累计的得分产生影响，解决方法是把memboost作为一个feature喂给ranker
4）memboost insertion主要是为了解决一些召回和排序不存在的优质内容进行回流

ranking演进过程

概览
假设ranking是未来最有可能提升效果的部分，第一个版本效果提升了30%。第一个版本只用了pin的原始数据，后面加上了Memboost and user data，包括用户的行为数据（最近搜索）。特征包括原始特征（主题、分类）、归一化特征（memboost）、one-hot编码特征、相关性特征（query和candidate的主题相关性）等等。
在实际线上使用中主要是有三个大的问题需要处理：

训练数据如何选择？用memboost分数还是独立session作为训练数据
学习目标用 pointwise 还是 pairwise？
模型类型用线性模型还是树模型

进化过程

图片.png

1）. Memboost training data, relevance pair labels, pairwise loss, and linear RankSVM model

pins (r1,rn),(rn,rrand) for each query, where r1,rn are the results with highest and lowest Memboost scores,
respectively, for a given query. rrand is a randomly generated popular pin from Pinterest, added to stabilize the rankings, as suggested in [14].
We reasoned that pins with low Memboost scores would still be more relevant than purely random pins
优点：
the training data was fairly clean
we could use a much smaller corpus and train a model within minutes on a single machine.
缺点：
Memboost data inherently precludes personalization because it is aggregated over many users, losing the association with individual users and session context（很难模拟不同用户对结果不同的反应）
only popular content had enough interaction for reliable Memboost data.
2）. individual Related Pins sessions
logged session consists of the query pin, viewing user, and recent action context, and a list of result pins.
save > long click > click > closeup > impression only.
we trim the logged set of pins, taking each engaged pin as well as two pins immediately preceding it in rank order, under the assumption that the user probably saw the pins immediately preceding the pin they engaged with（只选连续出现的pair）
3）. Moved to a RankNet GBDT Model
线性模型的缺点
they force the score to depend linearly on each feature.
cannot make use of features that only depend on the query and not the candidate pin， It is time consuming to engineer these feature crosses
树模型的优点
allowing non-linear response to individual features,
decision trees also inherently consider interactions between features, corresponding to the depth of the tree.
4）. Moved to pointwise classification loss, binary labels, and logistic GBDT model
closeups and clicks seemed counterproductive since these actions may not reflect save propensity.
We found that giving examples simple binary labels (“saved” or “not saved”) and reweighting positive examples to combat class imbalance proved effective at increasing save propensity.

模型偏差
the model that is currently deployed dramatically impacts the training examples produced for future models.

the first ranking model, the logs reflected user’s engagement with results ranked only by the candidate generator
Over the following months, the training data only reflected engagement with pins that were highly ranked by the existing model
training pins no longer matched the distribution of pins ranked at serving time.
解决方法
a small percentage of traffic for “unbiased data collection，show a random sample，randomly ordered without ranking，each user is served unranked pins on only a small random subset of queries.

成功的评测指标
迭代越快越好，线上的ab测试对于ranking的评价方法就是看保存率，但是online需要数天，所以离线测试就很重要了。

sample individual Related Pins sessions, but choose a distinct range of dates that follows the training date range, and a slightly different sampling strategy
For each session we rescore the pins that the user actually saw, using the models under test, then measure the agreement between the scores and the logged user behavior.
We examined the directionality as well as the magnitude of the difference predicted by offline evaluation, and compared it to the actual experiment results.
We found that PR AUC metrics are extremely predictive of closeups and clickthroughs in A/B experiments, but we had difficulty predicting the save behav- ior using offline evaluation

serving框架
离线与在线相结合

图片.png

ranking与memboost的关系

ranking和memboost理论上来说都是进行排序的，在pinterest的应用中，这俩模块也一直在。我们可以看到

第三阶段其实是ranking先pre-compute做一个粗排，然后用memboost进行微调（其实还有一种做法就是用memboost进行粗排再用ranking精排，不知道为什么pinterest为什么这么选择？）
第四阶段直接变成joint-learning了，memboost的分数变成了ranking的一个特征，进行联合训练。

挑战

Changing Anything Changes Everything
inputs are never really independent，Improving another compo- nent may actually result in worse overall performance.
Our general solution is to jointly train/automate as much of the system as possible for each experiment.

To avoid other changes resulting in hyperparameters becoming subop- timal, we implemented a parallelized system for automated hyperparameter tuning
“Improvements” to the raw data can harm our results since our downstream model is trained on the old definition of the feature. Even if a bug is fixed
changing or introducing a candidate generator can cause the ranker to become worse, since the training data distribution will no longer match the distribution of data ranked at serving time。Our current solution is to insert the new candidates into the training data collection for some time before running an experiment with a newly trained model.

内容冷启动

We dive deeper into how we solved for content activation for the particular case of localization.
SWAP:we check if there is a local alternative with the same image.
BOOST: artificially promote them to a higher position in the result set.
BLEND: producing a segmented corpus of pins for each language，blend local candidates into the results at various ratios.

TODO

pinterest的最新方向好像是GCN

参考资料和深入阅读

Pinterest推荐系统四年进化之路
 Wtf: The who to follow service at twitter
Visual search at pinterest
Visual discovery at pinterest

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 221,198评论 6赞 514
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 94,334评论 3赞 398
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 167,643评论 0赞 360
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 59,495评论 1赞 296
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 68,502评论 6赞 397
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 52,156评论 1赞 308
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 40,743评论 3赞 421
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 39,659评论 0赞 276
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 46,200评论 1赞 319
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 38,282评论 3赞 340
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 40,424评论 1赞 352
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 36,107评论 5赞 349
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 41,789评论 3赞 333
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 32,264评论 0赞 23
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 33,390评论 1赞 271
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 48,798评论 3赞 376
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 45,435评论 2赞 359