python学习之文章数据分析

通常我们在进行NLP学习的时候,会经常的处理一些语料,同时也会对这些语料进行一些分析,今天的这篇文章我们通过分析quora上的Andrew NG的一个回答来实际操作一下:

原文复制如下:

Deep Learning is an amazing tool that is helping numerous groups create exciting AI applications. It is helping us build self-driving cars, accurate speech recognition, computers that can understand images, and much more.Despite all the recent progress, I still see huge untapped opportunities ahead. There're many projects in precision agriculture, consumer finance, medicine, ... where I see a clear opportunity for deep learning to have a big impact, but that none of us have had time to focus on yet. So I'm confident deep learning isn't going to "plateau" anytime soon and that it'll continue to grow rapidly.Deep Learning has also been overhyped. Because neural networks are very technical and hard to explain, many of us used to explain it by drawing an analogy to the human brain. But we have pretty much no idea how the biological brain works. UC Berkeley's Michael Jordan calls deep learning a "cartoon" of the biological brain--a vastly oversimplified version of something we don't even understand--and I agree. Despite the media hype, we're nowhere near being able to build human-level intelligence. Because we fundamentally don't know how the brain works, attempts to blindly replicate what little we know in a computer also has not resulted in particularly useful AI systems. Instead, the most effective deep learning work today has made its progress by drawing from CS and engineering principles and at most a touch of biological inspiration, rather than try to blindly copy biology.Concretely, if you hear someone say "The brain does X. My system also does X. Thus we're on a path to building the brain," my advice is to run away!Many of the ideas used in deep learning have been around for decades. Why is it taking off only now? Two of the key drivers of its progress are: (i) scale of data and (ii) scale of computation. With our society spending more time on websites and mobile devices, for the past two decades we've been rapidly accumulating data. It was only recently that we figured out how to scale computation so as to build deep learning algorithms that can take advantage of this voluminous amount of data.This has now put us in two positive feedback loops, which is accelerating the progress of deep learning:First, now that we have huge machines to absorb huge amounts of data, the value of big data is clearer. This creates a greater incentive to acquire more data, which in turn creates a greater incentive to build bigger/faster neural networks.Second, that we have fast deep learning implementations also speeds up innovation, and accelerates deep learning's research progress. Many people underestimate the impact of computer systems investments in deep learning. When carrying out deep learning research, we start out not knowing what algorithms will and won't work, and our job is to run a lot of experiments and figure it out. If we have an efficient compute infrastructure that lets you run an experiment in a day rather than a week, then your research progress could be almost 7x as fast!This is why around 2008 my group at Stanford started advocating shifting deep learning to GPUs (this was really controversial at that time; but now everyone does it); and I'm now advocating shifting to HPC (High Performance Computing/Supercomputing) tactics for scaling up deep learning. Machine learning should embrace HPC. These methods will make researchers more efficient and help accelerate the progress of our whole field.To summarize: Deep learning has already helped AI made tremendous progress. But the best is still to come!

我们分析这篇文章有两个需求,一个是分析一篇文章当中的词频,另外一个是每一个词出现的次数,而我们也将奔着这两个目标去处理:

这里我们要用到matplotlib这个模块来进行图像的绘制:

1:分词处理

英文文章一个好处是他们每个词之间会有空格来进行区分,但是词和词之间往往会有句号,逗号这样的标点来去干扰,因此我们是通过string这个模块来去除标点和空格,其中string.punctuation是去除标点,string.whitespace是去除空格.至于hist[word]=hist.get(word,0)+1,这句话等同于上边的if-else,这里记录的是每一个单词和这个单词出现的次数.

结果如下:

2:排序处理

这一个函数是在上文获取了每一个单词和这个单词出现的次数之后,他不是有顺序的,,在这里我们要用数组的排序来处理一下,数组有一个sort()函数,可以从大到小进行排序.

结果如下:

3:绘图处理:

这里用的matplotlib绘图大家都很熟悉了,绘制出来

其实本来应该下边包含有标签,比如下边:

这样应该是最好的,但是我换了一台电脑后发现最下边的标签实在是太丑了,拥挤不堪,于是就去掉了,如果有兴趣的小伙伴可以自己再加上.

完整代码如下:


©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 216,919评论 6 502
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,567评论 3 392
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 163,316评论 0 353
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,294评论 1 292
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,318评论 6 390
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,245评论 1 299
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,120评论 3 418
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,964评论 0 275
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,376评论 1 313
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,592评论 2 333
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,764评论 1 348
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,460评论 5 344
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,070评论 3 327
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,697评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,846评论 1 269
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,819评论 2 370
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,665评论 2 354

推荐阅读更多精彩内容

  • rljs by sennchi Timeline of History Part One The Cognitiv...
    sennchi阅读 7,324评论 0 10
  • 我准备了三十三首情诗 和打算托付余生的深情 要是这雨不会停 我愿意在这里等你 等到白雪公主爱上骑士 要是明天不会来...
    三日月见森阅读 372评论 0 0
  • 本文是活的优雅系列中的第2篇,这个系列是在阅读《活的优雅,做时间的主人》一书的反思。以书为镜,剖析自己的问题,给出...
    IT生涯刘老师阅读 517评论 4 3
  • “又见到你了,真巧。” 银子小姐说这话的时候是在春天了,我正闭着眼坐在公园的长椅上,享受久违的假期,被她一句问候惊...
    卿月未溟阅读 460评论 0 1