Learning Python: Day 3

First, fix yesterday's failed wordcloud installation


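A failed install prints the details of the error. It turned out I had picked the wrong build: I needed the 32-bit package but had installed the 64-bit one.
After uninstalling and reinstalling, wordcloud installed successfully. Yeah!!!
Now to continue the word cloud left unfinished yesterday.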

# Draw a word cloud
from wordcloud import WordCloud
text = 'He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish. In the first forty days a boy had been with him. But after forty days without a fish the boy’s parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fish the first week. It made the boy sad to see the old man come in each day with his skiff empty and he always went down to help him carry either the coiled lines or the gaff and harpoon and the sail that was furled around the mast. The sail was patched with flour sacks and, furled, it looked like the flag of permanent defeat.'
wc = WordCloud().generate(text)  # generate() is an instance method, so create a WordCloud first
wc.to_file('老人与海.png')

Running the program writes the image file.


Word cloud for the novel Romance of the Three Kingdoms

Set font_path='msyh.ttc' (the Microsoft YaHei font) here;
otherwise the Chinese text is rendered as garbled characters.

import imageio
import jieba
from wordcloud import WordCloud

mask = imageio.imread('./china.jpg')  # image whose shape the word cloud will fill
with open('./novel/threekingdom.txt', 'r', encoding='utf-8') as f:
    words = f.read()
    # print(len(words))  # character count: about 550,000
    words_list = jieba.lcut(words)
    # print(len(words_list))  # token count after segmentation: about 350,000
    print(words_list)
    # join the tokens into one space-separated string
    novel_words = " ".join(words_list)
    # print(novel_words)
    # configure the cloud through the WordCloud() arguments
    wc = WordCloud(
        font_path='msyh.ttc',
        background_color='white',
        width=800,
        height=600,
        mask=mask
    ).generate(novel_words)
    wc.to_file('三国词云.png')

Running the program writes the image file.



Now for today's material

Top-10 character analysis for Three Kingdoms

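    1. Read the novel text
    2. Segment it into words
    3. Filter the words: drop irrelevant words and merge duplicate segmentations
    4. Sort
    5. Draw conclusions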
import jieba
# 1. Read the novel text
with open('./novel/threekingdom.txt', 'r', encoding='utf-8') as f:
    words = f.read()
    counts = {}  # e.g. {'曹操': 234, '回寨': 56}
    # 2. Segment into words
    words_list = jieba.lcut(words)
    for word in words_list:
        if len(word) <= 1:
            continue
        else:
            # Update the count stored in the dict:
            # counts[word] = previous value for this key + 1
            # counts[word] = counts[word] + 1 raises KeyError when the key is missing
            # dict.get(k) returns None for a missing key; get(k, 0) returns 0 instead
            counts[word] = counts.get(word, 0) + 1
    print(counts)
    # 3. Filter the words: drop irrelevant words, merge duplicate segmentations
    # 4. Sort the list of (word, count) tuples
    items = list(counts.items())
    print('List before sorting:', items)
    def sort_by_count(x):
        return x[1]
    items.sort(key=sort_by_count, reverse=True)
    for i in range(20):
        # sequence unpacking
        role, count = items[i]
        print(role, count)
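
The counting loop above can also be written with collections.Counter from the standard library. The sketch below assumes the text has already been segmented: the tokens list is made-up sample data standing in for the output of jieba.lcut, and the helper name top_words is hypothetical.

```python
from collections import Counter

def top_words(tokens, n=3):
    """Count tokens longer than one character and return the n most common."""
    counts = Counter(t for t in tokens if len(t) > 1)
    # most_common sorts by count, highest first
    return counts.most_common(n)

# Made-up sample tokens, standing in for jieba.lcut(words)
tokens = ['曹操', '曰', '孔明', '曹操', '玄德', '孔明', '曹操', ',']
print(top_words(tokens))
```

Counter handles the "key not seen yet" case internally, so no get(word, 0) is needed.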

Exclude the tokens that are not character names, merge the different names of the same character, then rank the top 10

exclude = {"将军", "却说", "丞相", "二人", "不可", "荆州", "不能", "如此", "商议",
           "如何", "主公", "军士", "军马", "左右", "次日", "引兵", "大喜", "天下",
           "东吴", "于是", "今日", "不敢", "魏兵", "陛下", "都督", "人马", "不知",
           "孔明曰", "玄德曰", "刘备", "云长"}

# fold the counts of alternative names into one name per character
counts['孔明'] = counts['孔明'] + counts['孔明曰']
counts['玄德'] = counts['玄德'] + counts['玄德曰'] + counts['刘备']
counts['关公'] = counts['关公'] + counts['云长']
# drop the non-name words and the aliases that were just merged
for word in exclude:
    del counts[word]
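
Both counts[key] and del counts[key] raise KeyError if a word never appeared in the text. A more defensive variant is sketched below. The helper name merge_and_filter, the aliases mapping, and the sample counts are illustrative assumptions, not part of the original program.

```python
def merge_and_filter(counts, aliases, exclude):
    """Fold alias counts into a canonical name, then drop excluded words."""
    result = dict(counts)  # work on a copy so the input stays unchanged
    for alias, name in aliases.items():
        # pop(alias, 0) removes the alias and tolerates a missing key
        result[name] = result.get(name, 0) + result.pop(alias, 0)
    for word in exclude:
        result.pop(word, None)  # no KeyError if the word is absent
    return result

# Made-up counts for illustration
sample = {'孔明': 10, '孔明曰': 5, '玄德': 7, '刘备': 3, '将军': 20}
aliases = {'孔明曰': '孔明', '玄德曰': '玄德', '刘备': '玄德'}
merged = merge_and_filter(sample, aliases, exclude={'将军', '却说'})
print(merged)
```

Because the aliases are removed as they are merged, they no longer need to be listed in exclude.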

Finally, draw the word cloud. The final code:

import jieba
from wordcloud import WordCloud
import imageio
# 1. Read the novel text
with open('./novel/threekingdom.txt', 'r', encoding='utf-8') as f:
    words = f.read()
    counts = {}  # e.g. {'曹操': 234, '回寨': 56}
    exclude = {"将军", "却说", "丞相", "二人", "不可", "荆州", "不能", "如此", "商议",
               "如何", "主公", "军士", "军马", "左右", "次日", "引兵", "大喜", "天下",
               "东吴", "于是", "今日", "不敢", "魏兵", "陛下", "都督", "人马", "不知",
               "孔明曰", "玄德曰", "刘备", "云长"}

    # 2. Segment into words
    words_list = jieba.lcut(words)
    for word in words_list:
        if len(word) <= 1:
            continue
        else:
            # Update the count stored in the dict:
            # counts[word] = previous value for this key + 1
            # counts[word] = counts[word] + 1 raises KeyError when the key is missing
            # dict.get(k) returns None for a missing key; get(k, 0) returns 0 instead
            counts[word] = counts.get(word, 0) + 1
    print(counts)
    # 3. Filter the words: merge names of the same character, drop irrelevant words
    counts['孔明'] = counts['孔明'] + counts['孔明曰']
    counts['玄德'] = counts['玄德'] + counts['玄德曰'] + counts['刘备']
    counts['关公'] = counts['关公'] + counts['云长']
    for word in exclude:
        del counts[word]
    # 4. Sort the list of (word, count) tuples
    items = list(counts.items())
    print('List before sorting:', items)
    def sort_by_count(x):
        return x[1]
    items.sort(key=sort_by_count, reverse=True)

    li = []  # e.g. ['孔明', '孔明', '孔明', '曹操', ...]
    for i in range(10):
        # sequence unpacking
        role, count = items[i]
        print(role, count)
        # _ tells the reader that the loop variable is not used in the body
        for _ in range(count):
            li.append(role)
    # 5. Draw conclusions
    mask = imageio.imread('./china.jpg')
    text = ' '.join(li)
    WordCloud(
        font_path='msyh.ttc',
        background_color='white',
        width=800,
        height=600,
        mask=mask,
        # do not pair adjacent words into two-word phrases (bigrams)
        collocations=False
    ).generate(text).to_file('top10.png')

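The collocations=False argument must not be omitted. By default WordCloud also counts pairs of adjacent words (bigrams), and since the input text here is the same name repeated many times, those pairs would show up in the picture as doubled names.

Setting it to False turns off this pairing of adjacent repeated words.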
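
Repeating each name count times and joining the result into a string is one way to hand frequencies to WordCloud. The library also provides generate_from_frequencies, which takes a word-to-count mapping directly, so no repeated text is needed. The following is a sketch under assumptions: top_n and save_cloud are hypothetical helper names, the sample counts are made up, and save_cloud needs the wordcloud package and the msyh.ttc font to be available.

```python
def top_n(counts, n=10):
    """Return the n highest-count entries as a dict, highest first."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])

def save_cloud(freqs, path):
    """Render a word-to-count mapping to an image file."""
    # third-party import kept local so top_n can be used without wordcloud
    from wordcloud import WordCloud
    WordCloud(
        font_path='msyh.ttc',
        background_color='white',
        width=800,
        height=600,
    ).generate_from_frequencies(freqs).to_file(path)

# Made-up counts for illustration
sample_counts = {'曹操': 30, '孔明': 25, '玄德': 20, '关公': 5}
freqs = top_n(sample_counts, n=3)
print(freqs)
# save_cloud(freqs, 'top3.png')  # uncomment once wordcloud and the font are installed
```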

Anonymous functions

Syntax: lambda x1, x2, ..., xn: expression. A lambda can take any number of parameters, but its body is a single expression.
eg1: add two numbers

sum_number = lambda x1, x2: x1 + x2
print(sum_number(2, 3))

eg2: sort a list of tuples by the second element, in descending order

name_info_list = [
    ('张三', 4500),
    ('李四', 9900),
    ('王五', 2000),
    ('赵六', 5500),
]
name_info_list.sort(key=lambda x: x[1], reverse=True)
print(name_info_list)

eg3: sort a list of dicts by the "age" value, in descending order

stu_info = [
    {"name": 'zhangsan', "age": 18},
    {"name": 'lisi', "age": 30},
    {"name": 'wangwu', "age": 99},
    {"name": 'tiaqi', "age": 3},

]
stu_info.sort(key=lambda x: x['age'], reverse=True)
print(stu_info)

So the sort in the Three Kingdoms top-10 character analysis can be simplified as well

    def sort_by_count(x):
        return x[1]
    items.sort(key=sort_by_count, reverse=True)

After the simplification:

    items.sort(key=lambda x: x[1], reverse=True)
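
For the common case of sorting by one position of a tuple, operator.itemgetter from the standard library is an alternative to a lambda. A small sketch with made-up counts:

```python
from operator import itemgetter

# Made-up (name, count) pairs for illustration
items = [('曹操', 30), ('关公', 5), ('孔明', 25)]
# itemgetter(1) builds a function equivalent to lambda x: x[1]
items.sort(key=itemgetter(1), reverse=True)
print(items)
```

Which form to use is mostly a matter of taste; the lambda keeps the indexing visible at the call site.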

List comprehensions

Previously we built a list with an ordinary for loop

li = []
for i in range(10):
    li.append(i)
print(li)

With a list comprehension, a single statement achieves the same result

# [expression for variable in iterable], optionally followed by a condition
print([i for i in range(10)])

List comprehensions with a condition

For example, to pick out all the even numbers, the usual approach is

li = []
for i in range(10):
    if i%2 == 0:
        li.append(i)
print(li)

With a list comprehension it becomes

print([i for i in range(10) if i%2 == 0])
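
The trailing if filters elements out. To keep every element but choose a value for each one, a conditional expression goes at the front instead. A short sketch of the two positions:

```python
nums = list(range(6))

# trailing if: keeps only the elements that pass the test
evens = [i for i in nums if i % 2 == 0]

# conditional expression in front: keeps every element, picks a value for each
labels = ['even' if i % 2 == 0 else 'odd' for i in nums]

print(evens)
print(labels)
```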

eg: keep the numbers greater than 0 in a list

from random import randint
num_list = [randint(-10, 10) for _ in range(10)]
print(num_list)
print([i for i in num_list if i > 0])

Running the program prints the random list, then the list of its positive numbers.


Dict comprehensions

eg1: generate scores for 100 students

from random import randint
stu_grades = {'student{}'.format(i): randint(50, 100) for i in range(1, 101)}
print(stu_grades)

Running it prints the generated dictionary.



eg2: select all students who scored above 60

from random import randint
stu_grades = {'student{}'.format(i): randint(50, 100) for i in range(1, 101)}
print({k: v for k, v in stu_grades.items() if v > 60})
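
Because randint produces different scores on every run, a fixed dictionary makes the behaviour easier to check. The sketch below uses made-up scores and shows that a dict comprehension can transform values as well as filter entries.

```python
# Made-up scores for illustration
scores = {'student1': 55, 'student2': 78, 'student3': 60, 'student4': 92}

# filter: keep the entries scoring above 60 (60 itself is excluded)
passed = {k: v for k, v in scores.items() if v > 60}

# transform: map every score to a pass/fail label
labels = {k: ('pass' if v > 60 else 'fail') for k, v in scores.items()}

print(passed)
print(labels)
```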

Running it prints only the students who scored above 60.


Matplotlib

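Matplotlib is a 2D plotting library for Python. It produces publication-quality figures in a variety of hardcopy formats and in interactive environments across platforms.
With Matplotlib, a few lines of code are enough to generate plots, histograms, power spectra, bar charts, error charts, scatter plots and more.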

Plot a sine curve over [0, 2π] using 100 points

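Note: do not give your Python file the same name as a library (for example matplotlib.py). The import would then find your own file instead of the library and fail.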
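
When an import behaves strangely, one quick check is to ask Python which file it would load for a module name. The sketch below uses importlib.util.find_spec from the standard library. The helper name module_origin is hypothetical, and json is used only because it ships with Python.

```python
import importlib.util

def module_origin(name):
    """Return the file path Python would import for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# If this showed a path inside your own project folder for a library name,
# a local file would be shadowing the real library.
print(module_origin('json'))
```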

import numpy as np
from matplotlib import pyplot as plt
plt.rcParams["font.sans-serif"] = ['SimHei']  # a font that can render the Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # draw minus signs correctly with this font

# Plot a sine curve over [0, 2π] using 100 points
# np.linspace returns an evenly spaced sequence over a closed interval (both endpoints included)
x = np.linspace(0, 2*np.pi, num=100)
print(x)
y = np.sin(x)
# plt.plot(x, y)
# plt.show()

# sine and cosine in the same coordinate system
cosy = np.cos(x)
plt.plot(x, y, color='g', linestyle='--', label='sin(x)')
plt.plot(x, cosy, color='r', label='cos(x)')
plt.xlabel('时间(s)')
plt.ylabel('电压(v)')
plt.title('欢迎来到python世界')
# legend
plt.legend()
plt.show()

Running it shows both curves in one figure.
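
np.linspace(start, stop, num) includes both endpoints by default, so the spacing between points is (stop - start) / (num - 1), not (stop - start) / num. The pure-Python sketch below reproduces that arithmetic. linspace_py is a hypothetical stand-in written for illustration, not NumPy's implementation.

```python
import math

def linspace_py(start, stop, num):
    """Evenly spaced values over the closed interval [start, stop]."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)  # num points leave num - 1 gaps
    return [start + i * step for i in range(num)]

xs = linspace_py(0, 2 * math.pi, 100)
print(len(xs), xs[0], xs[-1])
```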


Bar chart

from matplotlib import pyplot as plt
plt.rcParams["font.sans-serif"] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
import numpy as np
import string
from random import randint
# print(string.ascii_uppercase[0:6])
# ['A', 'B', 'C'...]
x = ['口红{}'.format(x) for x in string.ascii_uppercase[0:5]]

y = [randint(200, 500) for _ in range(5)]
print(x)
print(y)
plt.xlabel('口红品牌')
plt.ylabel('价格(元)')
plt.bar(x, y)
plt.show()

Pie chart

from matplotlib import pyplot as plt
plt.rcParams["font.sans-serif"] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
import numpy as np
import string
from random import randint
counts = [randint(3500, 9000) for _ in range(9)]
labels = ['员工{}'.format(x) for x in string.ascii_lowercase[:9]]
# how far each wedge is pushed out from the center (as a fraction of the radius)
explode = [0.1, 0, 0, 0, 0, 0, 0, 0, 0]
colors = ['red', 'purple', 'blue', 'yellow', 'gray', 'green']
plt.pie(counts, explode=explode, colors=colors, shadow=True, labels=labels, autopct='%1.1f%%')
plt.legend(loc=1)  # legend location (1 = upper right)
plt.axis('equal')
plt.show()
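
plt.pie scales the values to fractions of their sum, and autopct='%1.1f%%' formats each wedge's percentage with one decimal place. The sketch below reproduces that calculation without drawing anything. The figures are made up and wedge_labels is a hypothetical helper name.

```python
def wedge_labels(values, fmt='%1.1f%%'):
    """Return the percentage text each wedge would get for the given format."""
    total = sum(values)
    return [fmt % (100 * v / total) for v in values]

# Made-up values for illustration
counts = [5000, 3000, 2000]
print(wedge_labels(counts))
```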

Scatter plot

from matplotlib import pyplot as plt
plt.rcParams["font.sans-serif"] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
import numpy as np
x = np.random.normal(0, 1, 100)
y = np.random.normal(0, 1, 100)
# alpha sets the marker opacity (0 = fully transparent, 1 = opaque)
plt.scatter(x, y, alpha=0.5)
plt.show()

Pie chart of the Three Kingdoms top 10

import jieba
from matplotlib import pyplot as plt
plt.rcParams["font.sans-serif"] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# 1. Read the novel text
with open('./novel/threekingdom.txt', 'r', encoding='utf-8') as f:
    words = f.read()
    counts = {}  # e.g. {'曹操': 234, '回寨': 56}
    exclude = {"将军", "却说", "丞相", "二人", "不可", "荆州", "不能", "如此", "商议",
               "如何", "主公", "军士", "军马", "左右", "次日", "引兵", "大喜", "天下",
               "东吴", "于是", "今日", "不敢", "魏兵", "陛下", "都督", "人马", "不知",
               "孔明曰", "玄德曰", "刘备", "云长"}

    # 2. Segment into words
    words_list = jieba.lcut(words)
    for word in words_list:
        if len(word) <= 1:
            continue
        else:
            # Update the count stored in the dict:
            # counts[word] = previous value for this key + 1
            # counts[word] = counts[word] + 1 raises KeyError when the key is missing
            # dict.get(k) returns None for a missing key; get(k, 0) returns 0 instead
            counts[word] = counts.get(word, 0) + 1
    # print(counts)
    # 3. Filter the words: merge names of the same character, drop irrelevant words
    counts['孔明'] = counts['孔明'] + counts['孔明曰']
    counts['玄德'] = counts['玄德'] + counts['玄德曰'] + counts['刘备']
    counts['关公'] = counts['关公'] + counts['云长']
    for word in exclude:
        del counts[word]
    # 4. Sort the list of (word, count) tuples
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    roles = []
    cishu = []  # occurrence counts, in the same order as roles
    for i in range(10):
        # sequence unpacking
        role, count = items[i]
        roles.append(role)
        cishu.append(count)

    print(roles)
    print(cishu)
    # how far each wedge is pushed out from the center (as a fraction of the radius)
    explode = [0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    colors = ['red', 'purple', 'blue', 'yellow', 'gray', 'green']
    plt.pie(cishu, explode=explode, colors=colors, shadow=True, labels=roles, autopct='%1.1f%%')
    plt.legend(loc=2)  # legend location (2 = upper left)
    plt.axis('equal')
    plt.show()

Running it displays the pie chart.

Dream of the Red Chamber: top-10 character analysis and top-10 pie chart

import jieba
from wordcloud import WordCloud
import imageio
from matplotlib import pyplot as plt
plt.rcParams["font.sans-serif"] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# 1. Read the novel text
with open('./novel/hongloumeng.txt', 'r', encoding='utf-8') as f:
    words = f.read()
    counts = {}  # maps each word to its occurrence count
    exclude = {"什么", "一个", "我们", "你们", "如今", "说道", "知道", "起来", "这里",
               "出来", "众人", "那里", "自己", "一面", "只见", "太太", "两个", "没有",
               "怎么", "不是", "不知", "这个", "听见", "这样", "进来", "咱们", "就是",
               "老太太", "东西", "告诉", "回来", "只是", "大家", "姑娘", "奶奶", "凤姐儿"}

    # 2. Segment into words
    words_list = jieba.lcut(words)
    for word in words_list:
        if len(word) <= 1:
            continue
        else:
            # Update the count stored in the dict:
            # counts[word] = previous value for this key + 1
            # counts[word] = counts[word] + 1 raises KeyError when the key is missing
            # dict.get(k) returns None for a missing key; get(k, 0) returns 0 instead
            counts[word] = counts.get(word, 0) + 1
    # print(counts)
    # 3. Filter the words: merge names of the same character, drop irrelevant words
    counts['贾母'] = counts['贾母'] + counts['老太太']
    counts['黛玉'] = counts['黛玉'] + counts['林黛玉']
    counts['宝玉'] = counts['宝玉'] + counts['贾宝玉']
    counts['宝钗'] = counts['宝钗'] + counts['薛宝钗']
    counts['老爷'] = counts['老爷'] + counts['贾政']
    counts['王夫人'] = counts['王夫人'] + counts['太太']
    counts['凤姐'] = counts['凤姐儿'] + counts['凤姐'] + counts['王熙凤']
    for word in exclude:
        del counts[word]
    # 4. Sort the list of (word, count) tuples
    items = list(counts.items())
    # print('List before sorting:', items)
    # def sort_by_count(x):
    #     return x[1]
    # items.sort(key=sort_by_count, reverse=True)
    items.sort(key=lambda x: x[1], reverse=True)
    li = []  # e.g. ['宝玉', '宝玉', '宝玉', '贾母', ...]
    roles = []
    cishu = []  # occurrence counts, in the same order as roles
    for i in range(10):
        # sequence unpacking
        role, count = items[i]
        roles.append(role)
        cishu.append(count)
        print(role, count)
        # _ tells the reader that the loop variable is not used in the body
        for _ in range(count):
            li.append(role)
    # 5. Draw conclusions
    mask = imageio.imread('./china.jpg')
    text = ' '.join(li)

    WordCloud(
        font_path='msyh.ttc',
        background_color='white',
        width=800,
        height=600,
        mask=mask,
        # do not pair adjacent words into two-word phrases (bigrams)
        collocations=False
    ).generate(text).to_file('hlm_top10.png')

    print(roles)
    print(cishu)
    # how far each wedge is pushed out from the center (as a fraction of the radius)
    explode = [0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    colors = ['red', 'purple', 'blue', 'yellow', 'gray', 'green']
    plt.pie(cishu, explode=explode, colors=colors, shadow=True, labels=roles, autopct='%1.1f%%')
    plt.legend(loc=2)  # legend location (2 = upper left)
    plt.axis('equal')
    plt.show()

Running it saves the word cloud to hlm_top10.png and displays the pie chart.

© Copyright belongs to the author. For reprints or content cooperation, please contact the author.