Python jieba: removing stop words and generating a word cloud

Reading and writing files

Store the text to be read in info.txt; content is a str.

with open('info.txt', 'r', encoding='utf-8') as file1:  # the with statement closes the file automatically
    content = file1.read()  # read the whole file into one str

The output file is output.txt; content_after is the string to write.

with open('output.txt', 'w', encoding='utf-8') as file2:
    file2.write(content_after+"\n")

Word segmentation

import jieba

# call jieba.cut, which returns a generator of tokens
sentence_seged = jieba.cut(content)

Removing stop words

Building the stop word list
Put the stop words in stop.txt, one word per line.

# stopwords holds the stop words (a set keeps membership tests fast)
with open('stop.txt', 'r', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}

Iterating over the tokens to drop stop words

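outstr = ''  # the resulting string

for word in sentence_seged:
    if word not in stopwords:
        outstr += word + " "

content_after = outstr  # assumed to be the content_after string used elsewhere in this post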
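
The filtering step above can also be wrapped in a small function. The following is only a sketch, not the author's code: remove_stopwords, sample_tokens and sample_stopwords are hypothetical names, and the sample tokens stand in for the generator that jieba.cut returns. Unlike the loop above, it also drops whitespace-only tokens, which is an assumption about what is wanted in the output.

```python
def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not stop words, joined by single spaces."""
    stopword_set = set(stopwords)  # set membership is O(1) on average
    # Skip whitespace-only tokens (an assumption) and anything in the stop word set.
    kept = [t for t in tokens if t.strip() and t not in stopword_set]
    return " ".join(kept)


# A generator standing in for the output of jieba.cut (hypothetical sample tokens).
sample_tokens = (t for t in ["我", "喜欢", "的", "Python", " ", "和", "词云"])
sample_stopwords = ["的", "和"]

cleaned = remove_stopwords(sample_tokens, sample_stopwords)
print(cleaned)
```

Because the tokens arrive as a generator, they can be consumed only once. If the segmented words are needed again later, convert them to a list first.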

Generating the word cloud

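import numpy as np
from PIL import Image
from wordcloud import WordCloud

images = Image.open("something.png")  # open the saved image to use as the mask
maskImages = np.array(images)  # convert it to a NumPy array
wc = WordCloud(font_path="msyh.ttc", background_color="white", max_words=100, max_font_size=100, mask=maskImages).generate(content_after)  # generate the word cloud
wc.to_file('wordCloudPic.png')  # save it to a local image file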
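
Before rendering, it can help to look at the word counts that will drive the font sizes. The sketch below counts space-separated words with the standard library; word_frequencies and sample_text are hypothetical names, and the sample string stands in for content_after. The resulting dict should be usable with WordCloud.generate_from_frequencies instead of generate, although that step is not shown here.

```python
from collections import Counter


def word_frequencies(text, top_n=100):
    """Count space-separated words and return the top_n most common as a dict."""
    counts = Counter(text.split())
    return dict(counts.most_common(top_n))


# Hypothetical sample; in the article this would be content_after.
sample_text = "词云 Python 词云 分词 Python 词云"
freqs = word_frequencies(sample_text)
print(freqs)
```

Counting the words yourself makes it easy to spot stop words that slipped through the filter before an image is produced.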

Last edited: 2019.01.21 18:32:20
