Natural Language Processing with Python: Section 1.8 Exercises

5. Compare the lexical diversity scores for humor and romance fiction in Table 1-1. Which genre is more lexically diverse?

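[√] humor: 4.3
[x] romance fiction: 8.3

The scores in Table 1-1 are tokens per word type, so a lower score means each word is reused less often. Humor, at 4.3, is therefore the more lexically diverse genre.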
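
How the score behaves can be seen in a small sketch. It assumes the score is tokens divided by word types, which is how the numbers 4.3 and 8.3 appear to be computed; tokens_per_type and the two toy lists are hypothetical names and data made up for illustration.

```python
def tokens_per_type(tokens):
    """Average number of times each distinct token is used."""
    # float() keeps the division exact under Python 2 as well.
    return len(tokens) / float(len(set(tokens)))

# Toy data (invented): one list never repeats a word, the other repeats each word.
varied = ['a', 'b', 'c', 'd']
repetitive = ['a', 'a', 'b', 'b']

print(tokens_per_type(varied))      # 1.0
print(tokens_per_type(repetitive))  # 2.0
```

The list with no repetition gets the lower score, so under this definition a lower score means more lexical diversity.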

6. Produce a dispersion plot of the four main protagonists in Sense and Sensibility: Elinor, Marianne, Edward, and Willoughby. What can you observe about the different roles played by the males and females in this novel? Can you identify the couples?

text2.dispersion_plot(["Elinor","Marianne","Edward","Willoughby"])
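
A dispersion plot marks, for each word, the positions (token offsets from the start of the text) at which it occurs. The sketch below computes those offsets in plain Python, without NLTK or plotting. word_offsets is a hypothetical helper and the token list is an invented toy sentence, not text from the novel.

```python
def word_offsets(tokens, targets):
    """Map each target word to the list of token positions where it occurs."""
    offsets = dict((t, []) for t in targets)
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets

# Invented toy tokens, used only to show the shape of the result.
tokens = "Elinor met Edward . Marianne met Willoughby . Elinor wrote".split()
names = ["Elinor", "Marianne", "Edward", "Willoughby"]

offsets = word_offsets(tokens, names)
for name in names:
    print(name + ": " + str(offsets[name]))
```

Names whose offsets cluster in the same stretches of a text appear together in the plot, which is the pattern to look for when identifying couples.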

7. Find the collocations in text5.

text5.collocations()

wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;
cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys
wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime
Player; Player 12%; dont know; lez gurls; long time

8. Consider the following Python expression: len(set(text4)). State the purpose of this expression. Describe the two steps involved in performing this computation.

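It gives the number of word types (distinct tokens) in text4.
Step 1: set(text4) builds the vocabulary of text4, the set of its distinct word types.
Step 2: len() counts the size of that vocabulary, which is the number of word types.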
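
The two steps can be tried on a toy list without loading text4. The token list below is invented for illustration.

```python
# Invented toy tokens; 'the' appears twice.
tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']

# Step 1: collapse duplicates into a vocabulary of distinct tokens.
vocabulary = set(tokens)

# Step 2: count the distinct tokens.
num_types = len(vocabulary)
print(num_types)  # 6

# Tokens are compared exactly, so case variants count as separate types.
print(len(set(['The', 'the'])))  # 2
```

Punctuation tokens such as '.' are counted as types too, which is why the book speaks of word types rather than words.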

25. ◑ Define sent to be the list of words ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']. Now write code to perform the following tasks:

a. Print all words beginning with sh.

[w for w in sent if w.startswith('sh')]

b. Print all words longer than four characters

1st Solution

[w for w in sent if len(w) > 4]

2nd Solution

for ab in sent:
    if len(ab) > 4:
        print ab,
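
For reference, a self-contained version of both parts with sent defined. It reads "longer than four characters" as strictly more than four, and uses the print() call form so that it runs under Python 3. The variable names are arbitrary.

```python
sent = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']

# a. Words beginning with 'sh'.
starts_with_sh = [w for w in sent if w.startswith('sh')]

# b. Words with strictly more than four characters.
longer_than_four = [w for w in sent if len(w) > 4]

print(starts_with_sh)    # ['she', 'shells', 'shore']
print(longer_than_four)  # ['sells', 'shells', 'shore']
```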

26. ◑ What does the following Python code do? sum([len(w) for w in text1]) Can you use it to work out the average word length of a text?

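sum([len(w) for w in text1]) adds up the length of every token in text1: 999044 characters in total, counting punctuation tokens as well as words. Dividing that total by the number of tokens, len(text1), gives the average word length. Under Python 2 the division is integer division, so the result below is truncated to 3; converting either operand to float gives roughly 3.83.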

>>> sum([len(w) for w in text1])/len(text1)
3
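
The effect of integer versus float division is easier to see on a toy list. This sketch uses an invented four-word list instead of text1 and is written to behave the same under Python 2 and Python 3.

```python
# Invented toy tokens with lengths 3, 5, 3 and 6.
tokens = ['she', 'sells', 'sea', 'shells']

total_chars = sum(len(w) for w in tokens)   # 17

# Floor division truncates, like "/" on two ints in Python 2.
truncated = total_chars // len(tokens)

# Converting one operand to float keeps the fractional part.
average = total_chars / float(len(tokens))

print(truncated)  # 4
print(average)    # 4.25
```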