Natural Language Processing with Python: 1.8 Exercises



5. Compare the lexical diversity scores for humor and romance fiction in Table 1-1. Which genre is more lexically diverse?

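  • [x] romance fiction: 8.3
  • [√] humor: 4.3

Assuming the Table 1-1 score is len(text) / len(set(text)), it measures how many times each word type is used on average, so the lower score marks the more lexically diverse genre: humor.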
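
To make the comparison concrete, here is a minimal sketch of a tokens-per-type score on two toy token lists. It assumes the Table 1-1 score is len(text) / len(set(text)); the function name `tokens_per_type` and the sample data are made up for illustration.

```python
def tokens_per_type(tokens):
    """Average number of times each distinct token is used."""
    return len(tokens) / len(set(tokens))

# Hypothetical toy data: one very repetitive text, one with no repeats.
repetitive = "la la la la la la".split()
varied = "one two three four five six".split()

print(tokens_per_type(repetitive))  # 6.0: a single type used six times
print(tokens_per_type(varied))      # 1.0: every token is a new type
```

Under this measure the text with the lower score repeats itself less.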

6. Produce a dispersion plot of the four main protagonists in Sense and Sensibility: Elinor, Marianne, Edward, and Willoughby. What can you observe about the different roles played by the males and females in this novel? Can you identify the couples?

from nltk.book import *  # provides text1 to text9; plotting also needs Matplotlib
text2.dispersion_plot(["Elinor", "Marianne", "Edward", "Willoughby"])
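
As a rough sketch of what a dispersion plot displays, the word offsets it draws can be collected by hand. The helper name `word_offsets` and the toy token list are assumptions for illustration and are not part of NLTK.

```python
def word_offsets(tokens, targets):
    """Map each target word to the list of positions where it occurs."""
    offsets = {target: [] for target in targets}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets

# Toy stand-in for text2; the real novel has far more tokens.
tokens = "Elinor saw Edward . Marianne saw Willoughby . Elinor smiled".split()
offsets = word_offsets(tokens, ["Elinor", "Marianne", "Edward", "Willoughby"])
for name, positions in offsets.items():
    print(name, positions)
```

Names whose offsets cluster in the same stretches of the text tend to appear in the same scenes, which is the pattern to look for when pairing up the characters.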

7. Find the collocations in text5.

>>> text5.collocations()
wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;
cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys
wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime
Player; Player 12%; dont know; lez gurls; long time
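
A collocation is a word pair that occurs together unusually often. As a much simpler stand-in for whatever scoring collocations() applies internally, the sketch below only counts adjacent pairs; `frequent_bigrams`, its `min_count` parameter, and the sample tokens are hypothetical.

```python
from collections import Counter

def frequent_bigrams(tokens, min_count=2):
    """Return adjacent word pairs seen at least min_count times, most frequent first."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_count]

# Toy chat-like token list, not taken from text5.
tokens = "wanna chat now or wanna chat later last night was a long time".split()
print(frequent_bigrams(tokens))  # [(('wanna', 'chat'), 2)]
```

Raw counts favor pairs of very common words, which is why a real collocation finder normally weighs pair frequency against the frequency of each word on its own.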

8. Consider the following Python expression: len(set(text4)). State the purpose of this expression. Describe the two steps involved in performing this computation.

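It gives the number of word types (distinct tokens) in text4.
Step 1: set(text4) builds the vocabulary of text4, i.e. the set of its word types.
Step 2: len() returns the size of that vocabulary, which is the number of word types.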
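
The two steps can be tried on a small token list without loading NLTK. The list below is invented for illustration; note that a set treats "The" and "the" as different types and counts punctuation tokens too.

```python
tokens = ["The", "cat", "saw", "the", "dog", ".", "The", "dog", "ran", "."]

# Step 1: collapse duplicates into a vocabulary of distinct tokens.
vocabulary = set(tokens)

# Step 2: count the entries in that vocabulary.
type_count = len(vocabulary)
print(len(tokens), type_count)  # 10 7

# Lowercasing first merges "The" and "the" into one type.
lowercase_type_count = len(set(token.lower() for token in tokens))
print(lowercase_type_count)  # 6
```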


9.


25. ◑ Define sent to be the list of words ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']. Now write code to perform the following tasks:

a. Print all words beginning with sh.
[w for w in sent if w.startswith('sh')]
b. Print all words longer than four characters.
  • 1st Solution
[w for w in sent if len(w) > 4]
  • 2nd Solution
for ab in sent:
    if len(ab) > 4:  # "longer than four" means strictly greater than 4
        print(ab)

26. ◑ What does the following Python code do? sum([len(w) for w in text1]) Can you use it to work out the average word length of a text?

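It adds up the lengths of all tokens in text1, punctuation tokens included: 999044 characters in total.

Dividing that total by the number of tokens gives the average word length. Under Python 2, / between two integers truncates, which is why the result below is 3; with float division (Python 3, or from __future__ import division) the quotient keeps its fractional part.

>>> sum([len(w) for w in text1])/len(text1)
3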
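
Here is a small sketch of the same calculation with true division, assuming Python 3, where / between integers returns a float. The function name `average_word_length` is made up for illustration, and the sentence from exercise 25 stands in for text1.

```python
def average_word_length(tokens):
    """Total number of characters divided by the number of tokens."""
    return sum(len(w) for w in tokens) / len(tokens)

sent = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']
print(average_word_length(sent))  # 3.75 (30 characters over 8 tokens)
```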
