Understanding word2vec

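1: word2vec can be used to mine sequential data, such as product browsing histories or app download histories. Training word2vec on these sequences yields a vector representation for each product or app, which can then be used for recommendation and personalized display.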
http://ginobefunny.com/post/learning_word2vec/
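
To make the idea concrete, here is a minimal sketch of how browsing sessions can be treated as "sentences" whose "words" are item IDs. It only shows how skip-gram (center, context) training pairs would be extracted with a fixed window; it does not train any vectors. The function name `skipgram_pairs` and the sample sessions are hypothetical, and the sketch skips the random window shrinking that the reference word2vec implementation applies.

```python
from collections import Counter


def skipgram_pairs(sessions, window=2):
    """Yield (center, context) pairs, treating each session as a sentence."""
    for session in sessions:
        for i, center in enumerate(session):
            # Context = items at most `window` positions away from the center.
            lo = max(0, i - window)
            hi = min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield center, session[j]


# Hypothetical browsing sessions: each inner list is one user's ordered views.
sessions = [
    ["phone", "phone_case", "charger"],
    ["phone", "charger", "earbuds"],
]

pairs = list(skipgram_pairs(sessions, window=1))
pair_counts = Counter(pairs)

for (center, context), count in sorted(pair_counts.items()):
    print(center, context, count)
```

Items that keep appearing in each other's windows end up with similar vectors after training, which is what makes the result usable for recommendation. Widening `window` pairs items that are further apart in the same session.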

2: Some practical tips
There are no universal rules of thumb, since what makes a set of word vectors good for one purpose might not be best for another. (For example, word vectors that do best on the analogies test may not also do best on a topical-classification task that works on some mean of word vectors.)

That said:

  • be sure to use the latest gensim; earlier versions could be significantly slower on very-short text examples (like tweets)

  • larger window sizes seem to position words closer according to topical-domain/field-of-use/semantic similarity; shorter window sizes position words closer based on functional/syntactic similarity (words that serve the same role in a sentence)

  • as your dataset gets larger, sometimes very-small values of window and negative are just as good (or better) and faster than larger values

  • as your dataset gets larger, more-aggressive frequent-word downsampling (the 'sample' parameter becoming smaller but not zero) can offer both speed and quality benefits (by spending fewer training cycles on redundant well-represented words)

  • it's typical to use more than one iteration, but as your data gets larger (and if you're confident word/word-senses are randomly distributed from front to back) the benefits of extra iterations will lessen

  • Gordon
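
The frequent-word downsampling tip above can be illustrated with a small sketch. It assumes the keep-probability formula from the original word2vec C code, `(sqrt(f / sample) + 1) * sample / f`, where `f` is the word's share of all tokens; whether a given gensim version uses exactly this formula is an assumption here. The helper `keep_probability` and the word counts are hypothetical.

```python
import math


def keep_probability(word_count, total_words, sample=1e-3):
    """Probability of keeping one occurrence of a word during training."""
    if sample <= 0:
        # Assumption: a non-positive `sample` disables downsampling entirely.
        return 1.0
    freq = word_count / total_words  # the word's share of all tokens
    prob = (math.sqrt(freq / sample) + 1) * (sample / freq)
    return min(1.0, prob)


total = 1_000_000  # hypothetical corpus size in tokens
for word, count in [("the", 50_000), ("vector", 500)]:
    for sample in (1e-3, 1e-5):
        p = keep_probability(count, total, sample)
        print(f"{word:>6}  sample={sample:g}  keep={p:.3f}")
```

A smaller `sample` discards more occurrences of very common words, so fewer training steps are spent on them, while words whose frequency is below the threshold are always kept.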