This post collects publicly available corpora and datasets from the internet for use in machine learning research.

Dictionaries
THUOCL: Tsinghua University Open Chinese Lexicon
Chinese-English dictionary: MDBG English <-> Chinese dictionary

Chinese text classification dataset: THUCNews, http://thuctc.thunlp.org

Youtube Bounding Boxes
Google QuickDraw Data
DeepMind Open Source Datasets
Google Speech Commands Dataset
Atomic Visual Actions
Several updates to the Open Images data set
Nsynth dataset of annotated musical notes
Quora Question Pairs

Corpora, by sennchi