Python nltk 英文词性分析

在NLP任务中,常需要分析单词的词性,借助nltk库的pos_tag方法可以较好地实现。

以下是一个例子:

import nltk
line = 'i love this world which was beloved by all the people here'
tokens = nltk.word_tokenize(line)
# ['i', 'love', 'this', 'world', 'which', 'was', 'beloved', 'by', 
# 'all', 'the', 'people', 'here']
pos_tags = nltk.pos_tag(tokens)
# [('i', 'RB'), ('love', 'VBP'), ('this', 'DT'), ('world', 'NN'), ('which', 'WDT'), 
# ('was', 'VBD'), ('beloved', 'VBN'), ('by', 'IN'), ('all', 'PDT'), ('the', 'DT'),
# ('people', 'NNS'), ('here', 'RB')]
for word,pos in pos_tags:
     if (pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS'):
             print word,pos
# world NN
# people NNS

作为nltk的替代,TextBlob库能够更进一步进行词组划分,例如“computer science”会被当做一个单词,而非"computer"和"science"

from textblob import TextBlob
txt = """Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the inter
actions between computers and human (natural) languages."""
blob = TextBlob(txt)
print(blob.noun_phrases)
# [u'natural language processing', 'nlp', u'computer science', u'artificial intelligence', u'computational linguistics']

更多例子请参考nltk官方教科书第五章
其中pos_tag分析出来的词性含义按照宾夕法尼亚大学tag词性对照表

tag 含义
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential there
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
NNPS Proper noun, plural
PDT Predeterminer
POS Possessive ending
PRP Personal pronoun
PRP$ Possessive pronoun
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
RP Particle
SYM Symbol
TO to
UH Interjection
VB Verb, base form
VBD Verb, past tense
VBG Verb, gerund or present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
WDT Wh-determiner
WP Wh-pronoun
WP$ Possessive wh-pronoun
WRB Wh-adverb
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容