- Install nltk
pip install nltk
- Download nltk_data
git clone https://github.com/nltk/nltk_data.git
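- Extract the archive (only needed if the repository was downloaded as a zip file; git clone already produces a plain directory)
unzip nltk_data-gh-pages.zip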
- Rename the packages directory to nltk_data
cd ./nltk_data-gh-pages
mv ./packages ./nltk_data
- Check the paths where NLTK looks for nltk_data by running the following Python commands
import nltk
print(nltk.data.path)
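The printed list usually mixes directories that exist with ones that do not. A minimal sketch for telling them apart, using only the standard library; the helper name split_existing is hypothetical and not part of NLTK:

```python
from pathlib import Path


def split_existing(paths):
    """Split candidate directories into (existing, missing) lists."""
    existing, missing = [], []
    for p in paths:
        # is_dir() is False for missing paths and for plain files
        (existing if Path(p).is_dir() else missing).append(str(p))
    return existing, missing
```

With NLTK installed, it could be called as split_existing(nltk.data.path).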
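- Copy the downloaded nltk_data directory to any one of the search paths. Note that the paths printed in the previous step end in .../nltk_data/, but that directory does not exist if nltk_data has never been installed, so copy nltk_data into its parent directory instead. For example: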
% If the output of the previous step looks like this, pick one of the paths:
[..., '/usr/local/share/nltk_data', ...]
% The path /usr/local/share/nltk_data may not exist. To verify, run on the command line:
ls /usr/local/share/nltk_data
% If the result is:
ls: /usr/local/share/nltk_data: No such file or directory
% then the path does not exist,
% so copy nltk_data into /usr/local/share:
cp -r ./nltk_data /usr/local/share
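The copy can also be done from Python. A sketch assuming Python 3.8 or later (for dirs_exist_ok); copy_nltk_data is a hypothetical helper, and writing under /usr/local/share typically needs elevated permissions:

```python
import shutil
from pathlib import Path


def copy_nltk_data(src, parent):
    """Copy the directory `src` to `parent`/nltk_data and return the new path."""
    target = Path(parent) / "nltk_data"
    # dirs_exist_ok=True merges into an existing nltk_data instead of failing
    shutil.copytree(src, target, dirs_exist_ok=True)
    return target
```

For the example above, the call would be copy_nltk_data("./nltk_data", "/usr/local/share").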
- Before using a module, extract the archive for that module in place, for example:
unzip ./nltk_data/tokenizers/punkt.zip -d ./nltk_data/tokenizers/
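The same extraction can be sketched with the standard zipfile module, unpacking each archive into the directory that contains it. The name extract_in_place is hypothetical, and the comment assumes the archive holds a single top-level folder such as punkt/:

```python
import zipfile
from pathlib import Path


def extract_in_place(zip_path):
    """Extract an archive into the directory that contains it."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        # e.g. tokenizers/punkt.zip unpacks next to itself as tokenizers/punkt/
        zf.extractall(zip_path.parent)
    return zip_path.parent
```

For the tokenizer above, the call would be extract_in_place("./nltk_data/tokenizers/punkt.zip").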
- Test the installation
from nltk.book import *
# The installation succeeded if the following output appears
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
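The test above loads all of the book corpora at once. To check a single resource instead, one option is nltk.data.find, which raises LookupError when the resource is not on any search path. In this sketch resource_available is a hypothetical helper, and it also returns False when NLTK itself cannot be imported:

```python
def resource_available(resource):
    """Return True if NLTK can locate `resource`, e.g. "tokenizers/punkt"."""
    try:
        import nltk
    except ImportError:
        # NLTK is not installed, so nothing can be located
        return False
    try:
        nltk.data.find(resource)
        return True
    except LookupError:
        # nltk.data.find signals a missing resource with LookupError
        return False
```

For example, resource_available("tokenizers/punkt") reports whether the tokenizer extracted earlier can be found.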