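One day I (Shan Ge) was browsing IT news and saw that someone had used Python and R to analyze their WeChat friends. That piqued my interest, so I had a go myself. I don't know R, though, so I drew the charts with Python. (Reference: http://www.sohu.com/a/154250476_467794)
I started out on Windows, but jieba and wordcloud would not install there, so I switched to a Mac.
Step 1: install itchat
The easy way is pip: pip install itchat
Step 2: fetch your WeChat friends' profiles and save them as JSON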
import itchat
import json
if __name__ == '__main__':
    # Save the fetched profiles as JSON so that later debugging runs do not have to reconnect to WeChat every time
    f = open("C:\\Users\\Samuel\\Desktop\\friends.json", encoding="UTF-8", mode="w")
    itchat.login()  # pops up a QR code to scan in order to log in to WeChat
    friends = itchat.get_friends(update=True)[0:]  # list of friend profiles
    json.dump(friends, fp=f)  # save as JSON
    f.close()
![gender.png](http://upload-images.jianshu.io/upload_images/6409065-b6c686e33427cdd5.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
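The point of saving to JSON is to avoid logging in to WeChat on every debugging run. A minimal sketch of that caching idea follows. The helper name `load_or_fetch` and the `fake_fetch` stand-in are hypothetical, introduced only for illustration; in real use the fetch function would wrap the `itchat.login()` and `itchat.get_friends(update=True)` calls shown above.

```python
import json
import os
import tempfile


def load_or_fetch(path, fetch):
    """Return cached data from `path`; if the file is missing, call `fetch()` and cache the result."""
    if os.path.exists(path):
        with open(path, encoding="UTF-8") as f:
            return json.load(f)
    data = fetch()
    with open(path, mode="w", encoding="UTF-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the saved file
        json.dump(data, f, ensure_ascii=False)
    return data


def fake_fetch():
    # Hypothetical stand-in for the itchat login + get_friends call; made-up records
    return [{"NickName": "me", "Sex": 1}, {"NickName": "friend", "Sex": 2}]


cache_path = os.path.join(tempfile.mkdtemp(), "friends.json")
first = load_or_fetch(cache_path, fake_fetch)   # file missing: fake_fetch is called
second = load_or_fetch(cache_path, lambda: [])  # file present: the fetch function is not called
```

Using `with` also closes the file even if an exception is raised, which the explicit `open`/`close` pair does not guarantee.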
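Let's play: analyzing the gender ratio
This is the code for Mac. Windows deals with garbled Chinese text a little differently from Mac; everything else is the same.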
# coding:utf-8
import json
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
# Font helper for rendering Chinese text on Mac; not needed if all labels are in English
def getChineseFont():
    return FontProperties(fname='/System/Library/Fonts/PingFang.ttc')
f = open("/Users/sam/Desktop/friends.json", encoding="UTF-8", mode="r")
friends = json.load(fp=f)
f.close()
male = female = other = 0
for friend in friends[1:]:
    sex = friend["Sex"]
    if sex == 1:
        male += 1
    elif sex == 2:
        female += 1
    else:
        other += 1
# Total number of friends
total = len(friends[1:])
# Print the gender ratio of my friends (the labels read: male, female, unspecified)
print("男性好友: %.2f%%" % (float(male) / total * 100))
print("女性好友: %.2f%%" % (float(female) / total * 100))
print("不明性别好友: %.2f%%" % (float(other) / total * 100))
# On Windows, set a Chinese-capable font globally instead (this needs `import matplotlib`);
# the `fontproperties` arguments below are then unnecessary.
#matplotlib.rcParams['font.sans-serif'] = ['SimHei']
plt.xticks((0, 1, 2),('其它', '男', '女'), fontproperties=getChineseFont())
plt.title('微信朋友圈性别比例分析', fontproperties=getChineseFont())
# Pass positions and heights positionally: the `left` keyword is not accepted by newer matplotlib releases
plt.bar((0, 1, 2), (other/total * 100, male/total * 100, female/total * 100), color=('yellow', 'blue', 'red'))
plt.ylabel("百分比 %",fontproperties=getChineseFont())
plt.show()
Output (percentages of male, female, and unspecified-gender friends):
男性好友: 49.50%
女性好友: 38.25%
不明性别好友: 12.25%
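Since the only difference between the Mac and Windows versions is how the Chinese font is set up, the choice can be made at run time. The sketch below assumes the two settings used in this post (the PingFang font file on Mac, the SimHei family name on Windows). The function name `chinese_font_settings` and its return format are hypothetical, and the fallback for other systems simply leaves the font unset.

```python
import platform


def chinese_font_settings(system=None):
    """Pick font settings for Chinese labels based on the operating system name."""
    system = system or platform.system()
    if system == "Darwin":
        # Mac: pass FontProperties(fname=font_file) to each labelling call
        return {"font_file": "/System/Library/Fonts/PingFang.ttc", "rc_family": None}
    if system == "Windows":
        # Windows: set matplotlib.rcParams['font.sans-serif'] = [rc_family] once
        return {"font_file": None, "rc_family": "SimHei"}
    # Other systems are not covered here; choose an installed CJK font yourself
    return {"font_file": None, "rc_family": None}


settings = chinese_font_settings()
```

The function itself does not import matplotlib, so it can be checked on a machine without a display.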
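Next: a custom word cloud of my WeChat friends' signatures
This one is fun. The charts in the referenced article are too elaborate, come without source code, and are done in R, so I'll skip those. Instead, let's find out which words show up most often in everyone's signatures and turn them into a word cloud.
Many signatures (the Signature field) contain emoji, which appear in the data as markup with tokens such as emoji, span, and class. These irrelevant tokens have to be stripped out first. Symbols such as <>/= also have to go, which a simple regular expression takes care of. After that, join all the signatures into a single string, text. Enough talk, on to the code.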
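To make the cleaning step concrete, here is a small sketch applied to one made-up signature. The sample string and the `clean_signature` helper are assumptions for illustration; in particular, the `<span class="emoji emoji1f60a"></span>` form is my assumption of how an emoji appears in the raw data.

```python
import re

MARKUP_WORDS = ("span", "class", "emoji")
# Hex emoji codes such as 1f60a, plus the leftover markup symbols
NOISE = re.compile(r"1f\d+\w*|[<>/=]")


def clean_signature(raw):
    """Strip emoji markup tokens and symbols from one signature string."""
    text = raw.strip()
    for word in MARKUP_WORDS:
        text = text.replace(word, "")
    return NOISE.sub("", text)


sample = 'Keep going<span class="emoji emoji1f60a"></span>'  # made-up signature
cleaned = clean_signature(sample)
print(cleaned)  # Keep going " "
```

The quote characters survive because the pattern only covers `<>/=`. Plain substring replacement is also blunt: it would turn a word like "classic" into "ic", which is acceptable for a toy word cloud but worth knowing.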
First install jieba and wordcloud:
pip install jieba
pip install wordcloud
# -*- coding:utf-8 -*-
import json
import matplotlib.pyplot as plt
import jieba
from wordcloud import WordCloud, ImageColorGenerator
import numpy as np
import PIL.Image as Image
import re
# Load the JSON file
f = open("/Users/sam/Desktop/friends.json", encoding="UTF-8", mode="r")
friends = json.load(fp=f)
f.close()
# Clean each signature, then segment the combined text with jieba
rep = re.compile(r"1f\d+\w*|[<>/=]")  # emoji codes such as 1f60a, plus markup symbols
siglist = []
for i in friends:
    signature = i["Signature"].strip().replace("span", "").replace("class", "").replace("emoji", "")
    signature = rep.sub("", signature)
    siglist.append(signature)
text = "".join(siglist)
wordlist = jieba.cut(text, cut_all=True)
word_space_split = " ".join(wordlist)
# Use an image as the template: WordCloud masks out its pure-white areas, and ImageColorGenerator takes the word colors from it.
coloring = np.array(Image.open("/Users/sam/Desktop/wechat.jpg"))
my_wordcloud = WordCloud(background_color="white", max_words=2000,
                         mask=coloring, max_font_size=60, random_state=42, scale=2,
                         font_path="/Library/Fonts/Songti.ttc").generate(word_space_split)
image_colors = ImageColorGenerator(coloring)
# recolor() updates the cloud in place, so a single imshow call is enough
plt.imshow(my_wordcloud.recolor(color_func=image_colors))
plt.axis("off")
plt.show()
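The word cloud sizes words by how often they occur, but it does not show the actual counts. A rough sketch for listing the top words directly from a space-separated string like word_space_split is below. The `top_words` helper, its `min_len` filter, and the sample text are assumptions for illustration, not part of the original script.

```python
from collections import Counter


def top_words(word_space_split, n=3, min_len=2):
    """Count space-separated words and return the n most frequent ones."""
    # Dropping single-character tokens removes most particles and stray symbols
    words = [w for w in word_space_split.split() if len(w) >= min_len]
    return Counter(words).most_common(n)


sample = "努力 生活 努力 的 快乐 生活 努力"  # made-up, already-segmented text
top = top_words(sample, n=2)
print(top)  # [('努力', 3), ('生活', 2)]
```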
And that's it! All done! Mwah!