2017.10.13
Short term goal: (1) Finish the experiment: build topic evolution roads. (2) Think about how to use the topic evolution method on Danmaku datasets.
1. Think about and discuss how to crawl the Danmaku datasets with Songyang and Prof. Hu.
2. Have a rest due to headache and diarrhea.
2017.10.12
Short term goal: (1) Finish the experiment: build topic evolution roads. (2) Think about how to use the topic evolution method on Danmaku datasets.
1. Read the paper <Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction> to learn how to add temporal information to MF approaches. Still have some confusion, so talked with Guowang (a student of Prof. Lv) about this paper. We will discuss it next week.
2. Learn the basic method of hierarchical agglomerative clustering.
3. Ran into some difficulties with the experiment and sought advice from ZhouJie, who gave me some valuable suggestions.
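
As a note to self on item 2, here is a minimal sketch of hierarchical agglomerative clustering. It assumes average linkage and Euclidean distance, which are only one common choice; the function names (`euclidean`, `average_linkage`, `agglomerative`) and the toy points are made up for illustration and are not taken from the experiment.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the two
    closest clusters until only n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # combinations yields index pairs (a, b) with a < b
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a
    return clusters


# Hypothetical toy data: two tight pairs and one isolated point.
toy_points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.3, 5.0), (10.0, 0.0)]
result = agglomerative(toy_points, 3)
print(result)
```

This naive version recomputes all linkages on every merge, so it is only suitable for small inputs; a library implementation would be preferable for the real datasets.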
2017.10.11
- Download some Danmaku datasets, compare the context difference between cartoon videos and
- Attend the paper sharing class.
2017.10.10
1. Prepare the slides for the paper sharing class.
2. Discuss with Songyang how to get Danmaku datasets and what we can analyze. Summarize the feasible methods.
3. Attend the party member meeting.
2017.10.09
- Read the papers and prepare the presentation for the paper sharing class.
- Summarize the papers about Danmaku.
2017.09.29
- Read the papers <Semantic Connection Based Topic Evolution> (AAAI 2017) and <Probabilistic Non-Negative Matrix Factorization and Its Robust Extensions for Topic Modeling> (AAAI 2017). The first paper is simple but effective: in order to model the topic evolution, they use the LDA method to extract the topic-document distributions and use the DeepWalk method to build a semantic connection graph.
- Read the paper of <Video-based Evanescent, Anonymous, Asynchronous Social Interaction: Motivation and Adaption to Medium> (CSCW 2017).
- And talk with Meihong Wang about the text
2017.9.29 Weekly report
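1. Experiment. Working on topic evolution and computing the similarity matrix; the current approach still feels too simple.
2. Read papers about video "Danmaku" and, together with the postdoc's research, see whether our model can be used to analyze how the content and topics of Danmaku change over time.
3. Read the latest AAAI 2017 papers.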
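a. AAAI 2017 papers on Topic Evolution
<Probabilistic Non-Negative Matrix Factorization and Its Robust Extensions for Topic Modeling> (AAAI 2017)
Uses NMF for topic extraction. The novelty is theoretical: it adds probabilistic inference. However, the experimental results do not look good: accuracy on 20news at K=300 is only 25%, and on Reuters only 55%. The method I tried reaches 67% on 20news (some papers report an ACC of 80%).
<Semantic Connection Based Topic Evolution> (AAAI 2017)
The idea of this paper is also simple: 1) after splitting the documents by time, use LDA to learn the topic-word distributions; 2) build a semantic connection graph: compute the KL distance between adjacent documents and add an edge when it falls below the threshold, with words as feature labels; 3) use DeepWalk to compute paths and visualize them.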
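b. Papers on deep learning (preparing the presentation for Prof. Wu's paper sharing class next week)
4. Looked at the SemEval tasks. Task 5 gives a set of documents covering a long time span and asks to extract events (killing, injuring) from them and count their number and participants. The dataset has not been released yet.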
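
A rough sketch of step 2) of the <Semantic Connection Based Topic Evolution> notes above (the KL-thresholded graph), written only to check my own understanding. It assumes that the nodes are topics of adjacent time slices, each given as a word distribution; that reading, the names `kl_divergence` and `build_connection_edges`, the toy distributions and the threshold value are all my assumptions, not the paper's. The LDA and DeepWalk steps are left out.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.
    eps guards against log(0); it is a smoothing assumption."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def build_connection_edges(slices, threshold):
    """slices[t][k] is the word distribution of topic k in time slice t.
    Connect topic i of slice t to topic j of slice t+1 when
    KL(topic_i || topic_j) is below the threshold."""
    edges = []
    for t in range(len(slices) - 1):
        for i, p in enumerate(slices[t]):
            for j, q in enumerate(slices[t + 1]):
                d = kl_divergence(p, q)
                if d < threshold:
                    edges.append(((t, i), (t + 1, j), d))
    return edges


# Hypothetical toy input: 2 time slices, 2 topics each, 3-word vocabulary.
toy_slices = [
    [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]],
]
edges = build_connection_edges(toy_slices, threshold=0.1)
for src, dst, d in edges:
    print(src, "->", dst, round(d, 4))
```

KL divergence is asymmetric, so the direction (earlier slice to later slice) matters here; a symmetric variant such as Jensen-Shannon divergence could be swapped in if the graph should be undirected.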
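- Read A-class papers published in 2017 related to word2vector to see how others are improving word embedding. It turns out that quite a lot of people are working on word representations now, and many new methods have appeared: some directly modify the original language model, some add manifold learning in the vector space, and so on.
- Continue studying anomaly detection and NLP fundamentals.
3. Looked at the SemEval tasks. Tasks 1, 2 and 3 seem relatively easier to do, but there seems to be no room left for publishing a paper on them, so I have not decided which one to do.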