1. Install dependencies
[root@iZ2zegaforshlunfo6xw8qZ ~]# yum -y groupinstall "Development tools"
[root@hadron ~]# yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel --skip-broken
2. Install Python (omitted here; plenty of tutorials are available online)
3. Install the Scrapy crawler framework
[root@iZ2zegaforshlunfo6xw8qZ ~]# pip3 install scrapy
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting scrapy
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/9a/d3/5af102af577f57f706fcb302ea47d40e09355778488de904b3594d4e48d2/Scrapy-2.1.0-py2.py3-none-any.whl (239 kB)
     |████████████████████████████████| 239 kB 3.8 MB/s
Collecting service-identity>=16.0.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/e9/7c/2195b890023e098f9618d43ebc337d83c8b38d414326685339eb024db2f6/service_identity-18.1.0-py2.py3-none-any.whl (11 kB)
Collecting parsel>=1.5.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/23/1e/9b39d64cbab79d4362cdd7be7f5e9623d45c4a53b3f7522cd8210df52d8e/parsel-1.6.0-py2.py3-none-any.whl (13 kB)
Collecting w3lib>=1.17.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/a3/59/b6b14521090e7f42669cafdb84b0ab89301a42f1f1a82fcf5856661ea3a7/w3lib-1.22.0-py2.py3-none-any.whl (20 kB)
Requirement already satisfied: lxml>=3.5.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (4.5.0)
Collecting PyDispatcher>=2.0.5
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/cd/37/39aca520918ce1935bea9c356bcbb7ed7e52ad4e31bff9b943dfc8e7115b/PyDispatcher-2.0.5.tar.gz (34 kB)
Collecting cssselect>=0.9.1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/3b/d4/3b5c17f00cce85b9a1e6f91096e1cc8e8ede2e1be8e96b87ce1ed09e92c5/cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting zope.interface>=4.1.3
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/45/87/0d0c79724621056e39ac0385d0171fba3e92645b7947b143347aecf3069f/zope.interface-5.1.0-cp38-cp38-manylinux2010_x86_64.whl (243 kB)
     |████████████████████████████████| 243 kB 90.2 MB/s
Requirement already satisfied: cryptography>=2.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (2.9.2)
Collecting protego>=0.1.15
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/db/6e/bf6d5e4d7cf233b785719aaec2c38f027b9c2ed980a0015ec1a1cced4893/Protego-0.1.16.tar.gz (3.2 MB)
     |████████████████████████████████| 3.2 MB 34.6 MB/s
Collecting Twisted>=17.9.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/4a/b4/4973c7ccb5be2ec0abc779b7d5f9d5f24b17b0349e23240cfc9dc3bd83cc/Twisted-20.3.0.tar.bz2 (3.1 MB)
     |████████████████████████████████| 3.1 MB 3.8 MB/s
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-pzjcmemj/Twisted/setup.py'"'"'; __file__='"'"'/tmp/pip-install-pzjcmemj/Twisted/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-1wa67ju3
         cwd: /tmp/pip-install-pzjcmemj/Twisted/
    Complete output (33 lines):
    WARNING: The repository located at mirrors.cloud.aliyuncs.com is not a trusted or secure host and is being ignored. If this repository is available via HTTPS we recommend you use HTTPS instead, otherwise you may silence this warning and allow it anyway with '--trusted-host mirrors.cloud.aliyuncs.com'.
    ERROR: Could not find a version that satisfies the requirement incremental>=16.10.1 (from versions: none)
    ERROR: No matching distribution found for incremental>=16.10.1
    Traceback (most recent call last):
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/installer.py", line 128, in fetch_build_egg
        subprocess.check_call(cmd)
      File "/usr/local/python3/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpn_8n87uq', '--quiet', '--index-url', 'http://mirrors.cloud.aliyuncs.com/pypi/simple/', 'incremental>=16.10.1']' returned non-zero exit status 1.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-pzjcmemj/Twisted/setup.py", line 20, in <module>
        setuptools.setup(**_setup["getSetupArgs"]())
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/__init__.py", line 143, in setup
        _install_setup_requires(attrs)
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/__init__.py", line 138, in _install_setup_requires
        dist.fetch_build_eggs(dist.setup_requires)
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/dist.py", line 695, in fetch_build_eggs
        resolved_dists = pkg_resources.working_set.resolve(
      File "/usr/local/python3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 781, in resolve
        dist = best[req.key] = env.best_match(
      File "/usr/local/python3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1066, in best_match
        return self.obtain(req, installer)
      File "/usr/local/python3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1078, in obtain
        return installer(requirement)
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/dist.py", line 754, in fetch_build_egg
        return fetch_build_egg(self, req)
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/installer.py", line 130, in fetch_build_egg
        raise DistutilsError(str(e))
    distutils.errors.DistutilsError: Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpn_8n87uq', '--quiet', '--index-url', 'http://mirrors.cloud.aliyuncs.com/pypi/simple/', 'incremental>=16.10.1']' returned non-zero exit status 1.
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
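The key part of the error message is:
DistutilsError: Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpn_8n87uq', '--quiet', '--index-url', 'http://mirrors.cloud.aliyuncs.com/pypi/simple/', 'incremental>=16.10.1']' returned non-zero exit status 1.
Roughly, this says that building Twisted requires incremental>=16.10.1, that requirement could not be satisfied, and so the command exited with status 1. The WARNING a few lines earlier in the log points to the cause: the pip subprocess that setuptools started to fetch incremental ignored the mirror at mirrors.cloud.aliyuncs.com because it is neither a trusted nor a secure (HTTPS) host, so it found no versions at all ("from versions: none"). The fix is to install the latest incremental yourself first, so that the requirement is already satisfied when Twisted is built: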
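A requirement such as incremental>=16.10.1 compares versions component by component, so 16.10.1 is newer than 16.9.0 even though a plain string comparison says otherwise. As a small illustration, here is a sketch of such a check in shell; the function name version_ge is hypothetical (not part of pip or Scrapy), and it assumes GNU sort, whose -V option orders version strings naturally (available on CentOS).

```shell
#!/usr/bin/env bash
# version_ge A B (hypothetical helper): succeed (exit 0) if version A >= version B.
# Relies on GNU sort -V, which orders "16.9.0" before "16.10.1".
version_ge() {
    # After a version sort, the smaller of the two must be B for A >= B to hold.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible usage against the requirement from the log above:
# installed=$(pip3 show incremental 2>/dev/null | awk '/^Version:/ {print $2}')
# if version_ge "${installed:-0}" 16.10.1; then echo "incremental is new enough"; fi
```

With this check, 17.5.0 (the version that ends up installed below) satisfies >=16.10.1, while 16.9.0 does not.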
[root@iZ2zegaforshlunfo6xw8qZ ~]# pip3 install incremental
Once that succeeds, run pip3 install scrapy again:
[root@iZ2zegaforshlunfo6xw8qZ ~]# pip3 install scrapy
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting scrapy
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/9a/d3/5af102af577f57f706fcb302ea47d40e09355778488de904b3594d4e48d2/Scrapy-2.1.0-py2.py3-none-any.whl (239 kB)
     |████████████████████████████████| 239 kB 4.3 MB/s
Collecting queuelib>=1.4.2
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/4c/85/ae64e9145f39dd6d14f8af3fa809a270ef3729f3b90b3c0cf5aa242ab0d4/queuelib-1.5.0-py2.py3-none-any.whl (13 kB)
Collecting service-identity>=16.0.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/e9/7c/2195b890023e098f9618d43ebc337d83c8b38d414326685339eb024db2f6/service_identity-18.1.0-py2.py3-none-any.whl (11 kB)
Collecting w3lib>=1.17.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/a3/59/b6b14521090e7f42669cafdb84b0ab89301a42f1f1a82fcf5856661ea3a7/w3lib-1.22.0-py2.py3-none-any.whl (20 kB)
Collecting PyDispatcher>=2.0.5
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/cd/37/39aca520918ce1935bea9c356bcbb7ed7e52ad4e31bff9b943dfc8e7115b/PyDispatcher-2.0.5.tar.gz (34 kB)
Collecting zope.interface>=4.1.3
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/45/87/0d0c79724621056e39ac0385d0171fba3e92645b7947b143347aecf3069f/zope.interface-5.1.0-cp38-cp38-manylinux2010_x86_64.whl (243 kB)
     |████████████████████████████████| 243 kB 8.8 MB/s
Collecting protego>=0.1.15
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/db/6e/bf6d5e4d7cf233b785719aaec2c38f027b9c2ed980a0015ec1a1cced4893/Protego-0.1.16.tar.gz (3.2 MB)
     |████████████████████████████████| 3.2 MB 36.1 MB/s
Requirement already satisfied: cryptography>=2.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (2.9.2)
Requirement already satisfied: pyOpenSSL>=16.2.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (19.1.0)
Collecting Twisted>=17.9.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/4a/b4/4973c7ccb5be2ec0abc779b7d5f9d5f24b17b0349e23240cfc9dc3bd83cc/Twisted-20.3.0.tar.bz2 (3.1 MB)
     |████████████████████████████████| 3.1 MB 4.4 MB/s
Requirement already satisfied: lxml>=3.5.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (4.5.0)
Collecting cssselect>=0.9.1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/3b/d4/3b5c17f00cce85b9a1e6f91096e1cc8e8ede2e1be8e96b87ce1ed09e92c5/cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting parsel>=1.5.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/23/1e/9b39d64cbab79d4362cdd7be7f5e9623d45c4a53b3f7522cd8210df52d8e/parsel-1.6.0-py2.py3-none-any.whl (13 kB)
Collecting pyasn1-modules
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/95/de/214830a981892a3e286c3794f41ae67a4495df1108c3da8a9f62159b9a9d/pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
     |████████████████████████████████| 155 kB 8.7 MB/s
Collecting attrs>=16.0.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/a2/db/4313ab3be961f7a763066401fb77f7748373b6094076ae2bda2806988af6/attrs-19.3.0-py2.py3-none-any.whl (39 kB)
Collecting pyasn1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/62/1e/a94a8d635fa3ce4cfc7f506003548d0a2447ae76fd5ca53932970fe3053f/pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
     |████████████████████████████████| 77 kB 78.2 MB/s
Requirement already satisfied: six>=1.4.1 in /usr/local/python3/lib/python3.8/site-packages (from w3lib>=1.17.0->scrapy) (1.14.0)
Requirement already satisfied: setuptools in /usr/local/python3/lib/python3.8/site-packages (from zope.interface>=4.1.3->scrapy) (47.1.1)
Requirement already satisfied: cffi!=1.11.3,>=1.8 in /usr/local/python3/lib/python3.8/site-packages (from cryptography>=2.0->scrapy) (1.14.0)
Collecting constantly>=15.1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/b9/65/48c1909d0c0aeae6c10213340ce682db01b48ea900a7d9fce7a7910ff318/constantly-15.1.0-py2.py3-none-any.whl (7.9 kB)
Requirement already satisfied: incremental>=16.10.1 in /usr/local/python3/lib/python3.8/site-packages (from Twisted>=17.9.0->scrapy) (17.5.0)
Collecting Automat>=0.3.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/dd/83/5f6f3c1a562674d65efc320257bdc0873ec53147835aeef7762fe7585273/Automat-20.2.0-py2.py3-none-any.whl (31 kB)
Collecting hyperlink>=17.1.1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/7f/91/e916ca10a2de1cb7101a9b24da546fb90ee14629e23160086cf3361c4fb8/hyperlink-19.0.0-py2.py3-none-any.whl (38 kB)
Collecting PyHamcrest!=1.10.0,>=1.9.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/40/16/e54cc65891f01cb62893540f44ffd3e8dab0a22443e1b438f1a9f5574bee/PyHamcrest-2.0.2-py3-none-any.whl (52 kB)
     |████████████████████████████████| 52 kB 34.0 MB/s
Requirement already satisfied: pycparser in /usr/local/python3/lib/python3.8/site-packages (from cffi!=1.11.3,>=1.8->cryptography>=2.0->scrapy) (2.20)
Requirement already satisfied: idna>=2.5 in /usr/local/python3/lib/python3.8/site-packages (from hyperlink>=17.1.1->Twisted>=17.9.0->scrapy) (2.9)
Building wheels for collected packages: PyDispatcher, protego, Twisted
  Building wheel for PyDispatcher (setup.py) ... done
  Created wheel for PyDispatcher: filename=PyDispatcher-2.0.5-py3-none-any.whl size=11515 sha256=8e02fc1fe7a7c370afdb7a9ca1444165cd92e91eea545f280a31a3a094a1dcde
  Stored in directory: /root/.cache/pip/wheels/f4/8a/2f/3888c02609d5e31c3ce52d11e865ced1e67e2b7ad196145414
  Building wheel for protego (setup.py) ... done
  Created wheel for protego: filename=Protego-0.1.16-py3-none-any.whl size=7765 sha256=ea5d8dd4472010aeb8f67012c31223234721c77b90db8168bfb48c5017f4aecb
  Stored in directory: /root/.cache/pip/wheels/7b/e4/16/a1b04c3547b913e8898894d84c92efe23ed1b52db62cfaf2e9
  Building wheel for Twisted (setup.py) ... done
  Created wheel for Twisted: filename=Twisted-20.3.0-cp38-cp38-linux_x86_64.whl size=3076155 sha256=7e2cd1aae813872858c2dd344bb0a92e41289a90e0b03f98315d2cec879c9f42
  Stored in directory: /root/.cache/pip/wheels/03/e1/89/0c492632a418a54778123a939e3cac6719e7a93795661175a1
Successfully built PyDispatcher protego Twisted
Installing collected packages: queuelib, pyasn1, pyasn1-modules, attrs, service-identity, w3lib, PyDispatcher, zope.interface, protego, constantly, Automat, hyperlink, PyHamcrest, Twisted, cssselect, parsel, scrapy
Successfully installed Automat-20.2.0 PyDispatcher-2.0.5 PyHamcrest-2.0.2 Twisted-20.3.0 attrs-19.3.0 constantly-15.1.0 cssselect-1.1.0 hyperlink-19.0.0 parsel-1.6.0 protego-0.1.16 pyasn1-0.4.8 pyasn1-modules-0.2.8 queuelib-1.5.0 scrapy-2.1.0 service-identity-18.1.0 w3lib-1.22.0 zope.interface-5.1.0
4. Check where Scrapy is installed
[root@iZ2zegaforshlunfo6xw8qZ ~]# whereis scrapy
scrapy: /usr/local/python3/bin/scrapy
5. Verify the Scrapy version (it is shown in the first line of the output)
[root@iZ2zegaforshlunfo6xw8qZ ~]# /usr/local/python3/bin/scrapy -v
Scrapy 2.1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
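At this point scrapy can only be run by its full path, /usr/local/python3/bin/scrapy, because that directory is not on PATH. There are two ways to make the bare scrapy command work:
(1) Add Scrapy's directory to the PATH environment variable
vi /etc/profile
Append the following lines to the end of the file:
export SCRAPY_HOME=/usr/local/python3/
export PATH=$PATH:$SCRAPY_HOME/bin
or, more simply:
export PATH=$PATH:/usr/local/python3/bin
Then run source /etc/profile so that the change takes effect in the current shell.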
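The plain export line appends /usr/local/python3/bin every time /etc/profile is sourced, so sourcing it repeatedly in the same shell leaves duplicate entries in PATH. Those are harmless but untidy. A minimal sketch of a guarded alternative is shown below; the function name path_append_once is hypothetical, and the snippet assumes bash (the default shell on CentOS).

```shell
#!/usr/bin/env bash
# path_append_once DIR (hypothetical helper): append DIR to PATH only if it is
# not already one of the colon-separated entries.
path_append_once() {
    case ":$PATH:" in
        *":$1:"*) ;;                    # already present: leave PATH unchanged
        *) PATH="${PATH:+$PATH:}$1" ;;  # append; avoid a leading ':' if PATH was empty
    esac
    export PATH
}

# Could replace the export lines appended to /etc/profile:
path_append_once /usr/local/python3/bin
```

Wrapping ":$PATH:" in colons on both sides means the pattern matches whole entries only, so /usr/local/python3/bin is not mistaken for a prefix of some longer directory name.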
(2) Create a symbolic link in /usr/bin, which is already on PATH
[root@iZ2zegaforshlunfo6xw8qZ ~]# ln -s /usr/local/python3/bin/scrapy /usr/bin/scrapy
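Run a second time, the ln -s command above fails with "File exists", and ln -sf would silently overwrite whatever is already at /usr/bin/scrapy. A defensive variant could be sketched as follows. The function name link_cmd is hypothetical; the sketch assumes GNU ln (for -n) and only replaces an existing symlink, never a regular file.

```shell
#!/usr/bin/env bash
# link_cmd TARGET LINK (hypothetical helper): point LINK at TARGET as a symlink,
# refusing to overwrite a regular file that already sits at LINK.
link_cmd() {
    local target=$1 link=$2
    if [ ! -x "$target" ]; then
        echo "link_cmd: $target is missing or not executable" >&2
        return 1
    fi
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "link_cmd: $link exists and is not a symlink; leaving it alone" >&2
        return 1
    fi
    # -f replaces an existing (possibly dangling) symlink; -n stops ln from
    # descending into LINK if it is a symlink to a directory.
    ln -sfn "$target" "$link"
}

# Equivalent of the step above (run as root):
# link_cmd /usr/local/python3/bin/scrapy /usr/bin/scrapy
```

Checking that the target is executable first also catches a mistyped path before a dangling link is created.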
Now run scrapy -v in the console:
[root@iZ2zegaforshlunfo6xw8qZ ~]# scrapy -v
Scrapy 2.1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
If you see output like the above, Scrapy has been installed successfully. Now go start your crawling journey!