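More Complex User Input
This chapter is about analyzing user input, which starts to feel a little like artificial intelligence.
When a user types a command, "open door" and "open the door" should mean the same thing; it is now up to the program to work that out.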
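The first step is to look at how English is put together:
A sentence is made up of words
Words are separated by spaces
Words fall into categories such as verbs, nouns, modifiers, and numbers
The meaning of a sentence is governed by its grammar
So to analyze a sentence, first split it into words, then work out the type of each word, and finally reassemble the typed words into a command.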
Getting user input and splitting it into words
stuff = raw_input('> ')
words = stuff.split()  # returns a list of words
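As a quick check of how split() behaves, the sketch below uses a hard-coded string in place of raw_input (an assumption, so that it runs without a prompt). Called with no arguments, split() breaks on any run of whitespace and ignores leading and trailing spaces.

```python
# Hard-coded stand-in for the user's input (assumption: no interactive prompt).
stuff = "  open   the door "

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading/trailing whitespace.
words = stuff.split()
print(words)  # ['open', 'the', 'door']

# An empty or all-space input yields an empty list rather than [''].
blank = "   ".split()
print(blank)  # []
```

This means the scanner does not need to clean up extra spaces itself before classifying words.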
Analyzing word types
Each word is stored together with its type as a (type, word) tuple
first_word = ('direction','north')
second_word = ('verb','go')
sentence = [first_word,second_word]
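To illustrate why this format is convenient, here is a minimal sketch of how "open door" and "open the door" can be made to compare equal once every word carries a type. The two sentences are tagged by hand, and drop_stops is a hypothetical helper that is not part of the book's code.

```python
# Two hand-tagged sentences in the (type, word) format used above.
# The tags for 'open' and 'door' are assumptions made for this example.
short_form = [('verb', 'open'), ('noun', 'door')]
long_form = [('verb', 'open'), ('stop', 'the'), ('noun', 'door')]


def drop_stops(sentence):
    """Return the sentence without its 'stop' entries (hypothetical helper)."""
    return [(kind, word) for kind, word in sentence if kind != 'stop']


# Each pair unpacks into its type and its word.
for kind, word in long_form:
    print('%s: %s' % (kind, word))

# Once stop words are removed, both phrasings compare equal.
print(drop_stops(short_form) == drop_stops(long_form))  # True
```

Because the type travels with the word, later stages can filter or match on the type without looking the word up again.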
Unit tests
The book provides the following test cases:
from nose.tools import *
from EX48 import lexicon


def test_directions():
    assert_equal(lexicon.scan("north"), [('direction', 'north')])
    result = lexicon.scan("north south east")
    assert_equal(result, [('direction', 'north'),
                          ('direction', 'south'),
                          ('direction', 'east')])


def test_verbs():
    assert_equal(lexicon.scan("go"), [('verb', 'go')])
    result = lexicon.scan("go kill eat")
    assert_equal(result, [('verb', 'go'),
                          ('verb', 'kill'),
                          ('verb', 'eat')])


def test_stops():
    assert_equal(lexicon.scan("the"), [('stop', 'the')])
    result = lexicon.scan("the in of")
    assert_equal(result, [('stop', 'the'),
                          ('stop', 'in'),
                          ('stop', 'of')])


def test_nouns():
    assert_equal(lexicon.scan("bear"), [('noun', 'bear')])
    result = lexicon.scan("bear princess")
    assert_equal(result, [('noun', 'bear'),
                          ('noun', 'princess')])


def test_numbers():
    assert_equal(lexicon.scan('1234'), [('number', 1234)])
    result = lexicon.scan("3 91234")
    assert_equal(result, [('number', 3),
                          ('number', 91234)])


def test_errors():
    assert_equal(lexicon.scan('ASDFADFASDF'), [('error', 'ASDFADFASDF')])
    result = lexicon.scan("bear IAS princess")
    assert_equal(result, [('noun', 'bear'),
                          ('error', 'IAS'),
                          ('noun', 'princess')])
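The lexicon scanner is written to satisfy these test cases.
The assert_equal calls show that:
lexicon has a scan function that takes one string argument
The word types are 'direction', 'number', 'noun', 'stop', 'verb', and 'error'
One more type, named 'unkown', is added to collect words that are not in the predefined vocabulary
scan returns a list whose elements are (type, word) tuples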
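The lexicon scanner
There should be predefined lists that hold the known words for each type
After the user's input is read, it is split into words, each word is looked up in the predefined lists to find its type, and the resulting (type, word) tuples are returned as a list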
def scan(stuff):
    sentence = []
    directions = ['north', 'south', 'east']
    verbs = ['go', 'kill', 'eat']
    stops = ['in', 'of', 'the']
    nouns = ['bear', 'princess']
    numbers = [3, 91234, 1234]
    errors = ['IAS', 'ASDFADFASDF']
    words = stuff.split()
    for word in words:
        if word in directions:
            sentence.append(('direction', word))
        elif word in verbs:
            sentence.append(('verb', word))
        elif word in stops:
            sentence.append(('stop', word))
        elif word in nouns:
            sentence.append(('noun', word))
        elif word in errors:
            sentence.append(('error', word))
        # Note: int(word) raises ValueError for a non-numeric word,
        # so the else branch is only reached for numbers not in the list.
        elif int(word) in numbers:
            sentence.append(('number', int(word)))
        else:
            sentence.append(('unkown', word))
    return sentence
Running nosetests
damao@damao:~/Documents/ex48$ nosetests
.........
----------------------------------------------------------------------
Ran 9 tests in 0.005s

OK
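This scanner can be improved. In the version above, int(word) raises ValueError for any non-numeric word that is in none of the lists (for example "open"), and only the three hard-coded numbers are recognized. The version below tries the integer conversion first and falls back to the word lists when it fails.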
def scan(stuff):
    sentence = []
    directions = ['north', 'south', 'east']
    verbs = ['go', 'kill', 'eat']
    stops = ['in', 'of', 'the']
    nouns = ['bear', 'princess']
    errors = ['IAS', 'ASDFADFASDF']
    words = stuff.split()
    for word in words:
        try:
            # Any integer is accepted, so no list of numbers is needed.
            intword = int(word)
            sentence.append(('number', intword))
        except ValueError:
            # Not a number: classify the word using the lists.
            if word in directions:
                sentence.append(('direction', word))
            elif word in verbs:
                sentence.append(('verb', word))
            elif word in stops:
                sentence.append(('stop', word))
            elif word in nouns:
                sentence.append(('noun', word))
            elif word in errors:
                sentence.append(('error', word))
            else:
                sentence.append(('unkown', word))
    return sentence


# Run the examples only when the file is executed directly,
# so that importing lexicon from the tests prints nothing.
if __name__ == '__main__':
    print(scan("go north"))
    print(scan("kill the princess"))
    print(scan("eat the bear"))
    print(scan("open the door and smack the bear in the nose"))
    print(scan("open 1234 door"))
Output when the file is run on its own
damao@damao:~/Documents/ex48/EX48$ python lexicon.py
[('verb', 'go'), ('direction', 'north')]
[('verb', 'kill'), ('stop', 'the'), ('noun', 'princess')]
[('verb', 'eat'), ('stop', 'the'), ('noun', 'bear')]
[('unkown', 'open'), ('stop', 'the'), ('unkown', 'door'), ('unkown', 'and'), ('unkown', 'smack'), ('stop', 'the'), ('noun', 'bear'), ('stop', 'in'), ('stop', 'the'), ('unkown', 'nose')]
[('unkown', 'open'), ('number', 1234), ('unkown', 'door')]
The list of tuples is printed as expected.
The code lives in a new project named EX48, generated from the skeleton directory.
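As a possible further simplification, the chain of if/elif checks could be replaced by a single dictionary lookup. The sketch below is one way to do that, not the book's solution: LEXICON and convert_number are names introduced here for illustration, and the fallback tag keeps the 'unkown' spelling used in this post.

```python
# Hypothetical lookup table: each known word maps directly to its type.
LEXICON = {
    'north': 'direction', 'south': 'direction', 'east': 'direction',
    'go': 'verb', 'kill': 'verb', 'eat': 'verb',
    'in': 'stop', 'of': 'stop', 'the': 'stop',
    'bear': 'noun', 'princess': 'noun',
    'IAS': 'error', 'ASDFADFASDF': 'error',
}


def convert_number(word):
    """Return word as an int, or None if it is not a valid integer."""
    try:
        return int(word)
    except ValueError:
        return None


def scan(stuff):
    sentence = []
    for word in stuff.split():
        number = convert_number(word)
        if number is not None:
            sentence.append(('number', number))
        else:
            # dict.get falls back to 'unkown' for words not in the table.
            sentence.append((LEXICON.get(word, 'unkown'), word))
    return sentence


print(scan("go north"))        # [('verb', 'go'), ('direction', 'north')]
print(scan("open 1234 door"))  # [('unkown', 'open'), ('number', 1234), ('unkown', 'door')]
```

With this layout, adding a word means adding one dictionary entry, and the lookup cost does not grow with the number of word lists.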