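Around the 2020 Spring Festival, the spread of the coronavirus in China was impossible to ignore, and news outlets large and small reported on it one after another. The more of these reports you read, the more anxious you get. So why not use a word cloud to pull out the keywords and trace the outline of the epidemic?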
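1. Copy 20-odd related articles from the major news platforms into a local text file to serve as the data source;
2. Generate the word cloud with kumo, a word-cloud library written in Java;
3. Analyze the resulting word cloud.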
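Step 1 produces many separate articles, while the program below reads a single file, D:\news.txt. If you save each article as its own .txt file, a small helper like the following can merge them. This is a sketch that is not part of the original workflow: the folder D:\news and the class name MergeNews are hypothetical, and the articles are assumed to be saved as UTF-8 (Files.readAllLines throws an IOException on malformed UTF-8 input).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MergeNews {

    /**
     * Concatenates every .txt file in sourceDir (sorted by file name) into target,
     * adding a blank line after each article. Returns the number of articles merged.
     * Assumes all files are UTF-8 encoded.
     */
    public static int merge(Path sourceDir, Path target) throws IOException {
        Path normalizedTarget = target.toAbsolutePath().normalize();
        List<Path> articles;
        try (Stream<Path> files = Files.list(sourceDir)) {
            articles = files
                    .filter(p -> p.toString().endsWith(".txt"))
                    // skip the output file in case it lives in the same folder
                    .filter(p -> !p.toAbsolutePath().normalize().equals(normalizedTarget))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path article : articles) {
            lines.addAll(Files.readAllLines(article, StandardCharsets.UTF_8));
            lines.add(""); // blank line between articles
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
        return articles.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical folder holding one copied article per .txt file
        int count = merge(Paths.get("D:\\news"), Paths.get("D:\\news.txt"));
        System.out.println("Merged " + count + " articles");
    }
}
```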
import com.kennycason.kumo.CollisionMode;
import com.kennycason.kumo.WordCloud;
import com.kennycason.kumo.WordFrequency;
import com.kennycason.kumo.bg.RectangleBackground;
import com.kennycason.kumo.font.KumoFont;
import com.kennycason.kumo.font.scale.LinearFontScalar;
import com.kennycason.kumo.nlp.FrequencyAnalyzer;
import com.kennycason.kumo.nlp.filter.Filter;
import com.kennycason.kumo.nlp.tokenizers.ChineseWordTokenizer;
import com.kennycason.kumo.palette.LinearGradientColorPalette;
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Font;
import java.io.IOException;
import java.util.List;

public class WordCloudTest {
    public static void main(String[] args) throws IOException {
        // Build the frequency analyzer; the values below are starting points, tune them to your data
        FrequencyAnalyzer frequencyAnalyzer = new FrequencyAnalyzer();
        // Maximum number of keywords to return (and therefore display)
        frequencyAnalyzer.setWordFrequenciesToReturn(500);
        // Drop tokens made up only of digits (dates, case counts, etc.)
        Filter filter = new Filter() {
            @Override
            public boolean test(String s) {
                return !s.matches("\\d+");
            }
        };
        frequencyAnalyzer.setFilter(filter);
        // Minimum word length in characters (not a repetition threshold): with 3,
        // two-character words such as "疫情" are discarded; use 2 to keep them
        frequencyAnalyzer.setMinWordLength(3);
        // Use the Chinese tokenizer (from the kumo-tokenizers module)
        frequencyAnalyzer.setWordTokenizer(new ChineseWordTokenizer());
        final List<WordFrequency> wordFrequencies = frequencyAnalyzer.load("D:\\news.txt");

        // Set up an 800 x 600 canvas
        Dimension dimension = new Dimension(800, 600);
        WordCloud wordCloud = new WordCloud(dimension, CollisionMode.RECTANGLE);
        wordCloud.setPadding(0);
        wordCloud.setBackgroundColor(new Color(255, 255, 255));
        // A CJK-capable font is required; if "STSong-Light" is not installed, Java
        // substitutes a default font and Chinese characters may render as boxes.
        // (The original passed style 5, which is not a valid style and falls back to PLAIN.)
        Font font = new Font("STSong-Light", Font.PLAIN, 100);
        wordCloud.setKumoFont(new KumoFont(font));
        wordCloud.setBackground(new RectangleBackground(dimension));
        // Alternative: circular background (requires importing com.kennycason.kumo.bg.CircleBackground)
        // wordCloud.setBackground(new CircleBackground(255));
        // Colors fade from gray to green over 300 gradient steps
        wordCloud.setColorPalette(new LinearGradientColorPalette(Color.GRAY, Color.GREEN, 300));
        // Font sizes range from 20 (least frequent word) to 100 (most frequent word)
        wordCloud.setFontScalar(new LinearFontScalar(20, 100));
        wordCloud.build(wordFrequencies);
        wordCloud.writeToFile("D:\\news.png");
    }
}
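To make the analyzer settings concrete, here is a plain-Java sketch of the counting step, written without kumo so it runs on its own. It is not kumo's implementation: the class FrequencySketch and its WordCount pair are hypothetical, and the input is assumed to be already tokenized (a hard-coded list stands in for ChineseWordTokenizer output). It applies the same two rules as the program above, a minimum word length and the all-digits filter, then keeps the N most frequent words.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FrequencySketch {

    /** Simple word/count pair, standing in for kumo's WordFrequency. */
    public static final class WordCount {
        public final String word;
        public final int count;

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + "=" + count;
        }
    }

    /**
     * Counts already-tokenized words, dropping tokens shorter than minWordLength
     * and tokens made only of digits, then returns the topN most frequent words
     * (ties broken alphabetically).
     */
    public static List<WordCount> topWords(List<String> tokens, int minWordLength, int topN) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            if (token.length() < minWordLength) continue; // like setMinWordLength
            if (token.matches("\\d+")) continue;          // like the digit Filter
            counts.merge(token, 1, Integer::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(topN)
                .map(e -> new WordCount(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical tokens, as a Chinese tokenizer might emit them
        List<String> tokens = Arrays.asList("新型冠状病毒", "疫情", "2020", "新型冠状病毒",
                "口罩", "确诊病例", "新型冠状病毒", "确诊病例");
        System.out.println(topWords(tokens, 3, 500));
    }
}
```

With a minimum length of 3 the sketch prints [新型冠状病毒=3, 确诊病例=2]: "疫情" and "口罩" are dropped for being two characters long, and "2020" is dropped by the digit filter. This is the same trade-off behind setMinWordLength(3) in the program above.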
Word cloud result
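In the word cloud, the larger a keyword's font, the more often it appears in the source text. Pick out the keywords you care about and look up the details on Google or Baidu. That way you won't be running around like a headless fly, growing more panicked the more scattered news you read.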
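The "larger font, more frequent" reading comes from LinearFontScalar(20, 100). As far as the class name and constructor arguments suggest, it maps word counts linearly onto font sizes between 20 and 100; the sketch below shows such a mapping in plain Java. It is an assumption, not code taken from kumo, and the names FontScaleSketch and scale are hypothetical.

```java
public class FontScaleSketch {

    /**
     * Maps a word's count linearly onto [minFont, maxFont]: the least frequent
     * word gets minFont, the most frequent gets maxFont.
     */
    public static float scale(int count, int minCount, int maxCount, int minFont, int maxFont) {
        if (maxCount == minCount) {
            return maxFont; // all words equally frequent; avoid dividing by zero
        }
        double position = (double) (count - minCount) / (maxCount - minCount);
        return (float) (minFont + position * (maxFont - minFont));
    }

    public static void main(String[] args) {
        // Counts 1, 4 and 7 mapped onto the 20..100 range used in the article
        for (int count : new int[] {1, 4, 7}) {
            System.out.println(count + " -> " + scale(count, 1, 7, 20, 100));
        }
    }
}
```

Running it prints 1 -> 20.0, 4 -> 60.0 and 7 -> 100.0: a word halfway between the rarest and the most common count lands halfway between the two font sizes.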
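PS: The more source data you collect, the more representative the keyword frequencies become. If you're interested, copy more articles yourself or write a crawler to collect related news, and get the results you're after.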
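For the crawler route, a minimal starting point using Java 11's built-in java.net.http.HttpClient could look like this. It is a hypothetical sketch (NewsFetcher, extractText and fetchAndAppend are illustrative names). It strips tags with regular expressions, which is crude and will also keep navigation menus and other page clutter. HttpResponse.BodyHandlers.ofString() decodes the body using the charset in the Content-Type header (UTF-8 if none is given), so pages that declare GBK only in a meta tag may come out garbled.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NewsFetcher {

    /** Crude HTML-to-text: drops script/style blocks and tags, collapses whitespace. */
    public static String extractText(String html) {
        return html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]+>", " ")
                .replace("&nbsp;", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /** Downloads a page and appends its extracted text to target as UTF-8. */
    public static void fetchAndAppend(String url, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        String text = extractText(response.body()) + System.lineSeparator();
        Files.write(target, text.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws Exception {
        // Article URLs are passed on the command line; D:\news.txt is the word cloud's input file
        for (String url : args) {
            fetchAndAppend(url, Paths.get("D:\\news.txt"));
        }
    }
}
```

Each page's text is appended to D:\news.txt, so the word-cloud program can load the accumulated articles as before.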
If you found this interesting, remember to give it a like~