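Another day without the score cutoffs being released. Anxiety maxed out, so all I can do is look at my idol to calm down.
Never underestimate the power of fans: on Baidu Tieba I found the ultimate Word document containing every one of Jay's songs! From 《Jay》 (2000) to 《周杰伦的床边故事》 (Jay Chou's Bedtime Stories, 2016), it collects the lyrics of a full 161 songs!
So let's see what Jay Chou's songs are actually singing about!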
# Load the required packages
import numpy as np
import pandas as pd
import xlrd  # for reading Excel data
import openpyxl  # for writing Excel data
import re
import jieba
import jieba.analyse
#jieba.enable_parallel()  # uncomment to enable parallel tokenization
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
from wordcloud import WordCloud  # word cloud package
import warnings
warnings.filterwarnings("ignore")
# Load the data
songs = pd.read_excel("data/周杰伦歌词.xlsx")
# Keep only the Chinese characters in the lyrics (the 歌词 column)
songs['歌词'] = songs['歌词'].apply(lambda x: re.sub(r'[^\u4e00-\u9fa5]+', ' ',str(x), flags=re.U))
# Tokenize the lyrics with jieba; the space-separated tokens go into the 分词 column
songs['分词'] = songs['歌词'].apply(lambda x: ' '.join(jieba.cut(x.strip())))
all_train = list(songs.分词.values)
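To make the Chinese-only filter above less of a black box, here is a small standalone sketch of what that regular expression does to one line of text. The helper name `keep_chinese` and the sample string are made up for illustration; only the pattern itself comes from the cleaning step.

```python
import re

# Same pattern as in the cleaning step: one or more characters outside U+4E00..U+9FA5.
NON_CHINESE = re.compile(r'[^\u4e00-\u9fa5]+')

def keep_chinese(text):
    """Replace every run of non-Chinese characters with a single space."""
    return NON_CHINESE.sub(' ', str(text))

# Made-up sample line (not taken from the lyrics file).
sample = "我爱Jay Chou 的歌 2016!"
cleaned = keep_chinese(sample)
print(repr(cleaned))          # '我爱 的歌 '
print(repr(cleaned.strip()))  # '我爱 的歌'

# An empty Excel cell arrives as NaN; str() turns it into 'nan', which is then wiped out.
print(repr(keep_chinese(float('nan'))))  # ' '
```

English words, digits, and punctuation all collapse into single spaces, which is why the tokenization step calls `strip()` before handing the text to jieba.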
# Extract keywords with TextRank
jieba.analyse.set_stop_words('./data/stopwords.txt')
key_words = jieba.analyse.textrank(' '.join(all_train), topK=200, withWeight=True, allowPOS=('n','an','v'))
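For anyone curious what TextRank is doing under the hood, here is a minimal pure-Python sketch of the idea: words that co-occur within a small window are linked in a graph, and a PageRank-style iteration scores each word by how strongly its neighbours point to it. This is a simplified illustration, not jieba's actual implementation. The function name `textrank_sketch` and the `window`, `damping`, and `iterations` defaults are assumptions chosen for the example, and it skips the part-of-speech and stop-word filtering that `jieba.analyse.textrank` applies.

```python
from collections import defaultdict

def textrank_sketch(tokens, window=5, damping=0.85, iterations=10):
    """Rank tokens with a simplified TextRank over a co-occurrence graph."""
    # Undirected edge weight = how often two distinct words fall in the same window.
    weights = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                weights[word][other] += 1.0
                weights[other][word] += 1.0
    if not weights:
        return []

    # Start from a uniform score, then let each word collect score from its neighbours.
    scores = {word: 1.0 / len(weights) for word in weights}
    for _ in range(iterations):
        new_scores = {}
        for word in weights:
            incoming = sum(
                weights[nb][word] / sum(weights[nb].values()) * scores[nb]
                for nb in weights[word]
            )
            new_scores[word] = (1 - damping) + damping * incoming
        scores = new_scores

    # Highest score first, as (word, weight) pairs.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up token list: 爱 sits next to every other word, so it ranks first.
ranked = textrank_sketch("爱 雨 爱 风 爱 夜".split(), window=2)
print(ranked[0][0])  # 爱
```

The output has the same shape as `key_words` when `withWeight=True` is set: a list of `(word, weight)` pairs sorted from most to least important.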
# Visualize the keywords
from matplotlib.pyplot import imread  # scipy.misc.imread is gone from current SciPy releases
matplotlib.rcParams['figure.figsize'] = (12.0, 12.0)
from wordcloud import WordCloud,ImageColorGenerator
bimg=imread('data/jay3.jpg')
wordcloud=WordCloud(background_color="white",mask=bimg,font_path='data/simhei.ttf',max_font_size=130)
word_frequence = {x[0]:x[1] for x in key_words}
wordcloud=wordcloud.fit_words(word_frequence)
bimgColors=ImageColorGenerator(bimg)
plt.axis("off")
plt.imshow(wordcloud.recolor(color_func=bimgColors))
Speaking of lyrics, Fang Wenshan (方文山) wrote 86 of Jay Chou's songs!!! Let's see what Wenshan likes to write about!!
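# Row positions of the songs whose lyricist (the 填词 column) is 方文山
fang = songs[songs['填词'] == '方文山'].index
# Relies on songs keeping its default 0..n-1 index, so labels match positions in all_train
fang_lyrics = [all_train[i] for i in fang]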
fang_words = jieba.analyse.textrank(' '.join(fang_lyrics), topK=100, withWeight=True,
                                    allowPOS=('n','an','v'))
# Plot
bimg=imread('data/fang2.jpg')
wordcloud=WordCloud(background_color="white",mask=bimg,font_path='data/simhei.ttf',max_font_size=130)
word_frequence = {x[0]:x[1] for x in fang_words}
wordcloud=wordcloud.fit_words(word_frequence)
bimgColors=ImageColorGenerator(bimg)
plt.axis("off")
plt.imshow(wordcloud.recolor(color_func=bimgColors))
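Two word clouds are hard to compare by eye, so one possible follow-up is to list the keywords that show up for Fang Wenshan but not in the overall list. The sketch below assumes both inputs are lists of `(word, weight)` pairs, the shape `textrank` returns with `withWeight=True`. The helper name `distinctive_keywords` is hypothetical, and the sample pairs are placeholders, not real results from the lyrics.

```python
def distinctive_keywords(subset_pairs, overall_pairs, top_n=10):
    """Return the subset's keywords that are absent from the overall keyword list."""
    overall_words = {word for word, _ in overall_pairs}
    only_in_subset = [(w, wt) for w, wt in subset_pairs if w not in overall_words]
    # Heaviest keywords first, capped at top_n.
    return sorted(only_in_subset, key=lambda pair: pair[1], reverse=True)[:top_n]

# Placeholder data for illustration only (NOT actual keyword weights).
overall_sample = [("爱", 1.0), ("世界", 0.8), ("回忆", 0.6)]
fang_sample = [("爱", 1.0), ("月光", 0.7), ("回忆", 0.5), ("江南", 0.4)]

print(distinctive_keywords(fang_sample, overall_sample))
# [('月光', 0.7), ('江南', 0.4)]
```

With the real data, the call would be `distinctive_keywords(fang_words, key_words)`.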
All frazzled and dazed. I've got the stories; have you got the score cutoffs?