Remote access settings
Locate the configuration file
sudo find / -name default_scrapyd.conf
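The path to the configuration file is shown in the figure below:
Edit the configuration file. The default is bind_address = 127.0.0.1, so scrapyd only accepts connections from the local machine.
To allow remote access, change it to bind_address = 0.0.0.0: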
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
# bind_address = 127.0.0.1
bind_address = 0.0.0.0
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
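As a quick sanity check after editing, the setting can be read back with Python's standard configparser. This is only a minimal sketch: the helper names bind_address_of and allows_remote_access are made up for illustration, and the sample text is a shortened copy of the config above.

```python
import configparser


def bind_address_of(conf_text: str) -> str:
    """Return the bind_address option from scrapyd config text (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Fall back to the default mentioned above when the option is missing.
    return parser.get("scrapyd", "bind_address", fallback="127.0.0.1")


def allows_remote_access(conf_text: str) -> bool:
    """True when the daemon is not bound to a loopback-only address."""
    return bind_address_of(conf_text) not in ("127.0.0.1", "localhost", "::1")


# Shortened copy of the edited configuration, used only as sample input.
sample = """
[scrapyd]
# bind_address = 127.0.0.1
bind_address = 0.0.0.0
http_port = 6800
"""

print(bind_address_of(sample), allows_remote_access(sample))
```

To check the real file, read the path returned by the find command and pass its contents to allows_remote_access.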
Start scrapyd
scrapyd
Open a browser and go to ip:6800, where ip is the address of the machine running scrapyd. The result is shown below:
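The same check can be scripted instead of done in a browser, using the daemonstatus.json service listed in the [services] section above. The following is a rough sketch with the standard library only; the function names are illustrative, and the check assumes the response is a JSON object with a "status" field, as in the server responses shown in this post.

```python
import json
from urllib.request import urlopen


def daemon_status_url(host: str, port: int = 6800) -> str:
    """Build the URL of the daemonstatus.json endpoint."""
    return f"http://{host}:{port}/daemonstatus.json"


def is_daemon_ok(body: str) -> bool:
    """Interpret a daemonstatus.json response body."""
    return json.loads(body).get("status") == "ok"


def check_daemon(host: str, port: int = 6800, timeout: float = 5.0) -> bool:
    """Query the daemon over HTTP; raises URLError if it is unreachable."""
    with urlopen(daemon_status_url(host, port), timeout=timeout) as response:
        return is_daemon_ok(response.read().decode("utf-8"))
```

Calling check_daemon with the address of the scrapyd host from another machine should return True once bind_address has been changed; a connection error suggests the daemon is still bound to 127.0.0.1 or a firewall is in the way.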
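Configure the local Scrapy project
Change to the project directory, open scrapy.cfg, and change the line # url = http://localhost:6800/
to url = http://<target ip>:6800/
In this example the target IP address is 192.168.137.239. After editing, the file looks like this: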
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.io/en/latest/deploy.html
[settings]
default = meizhuang.settings
[deploy]
url = http://192.168.137.239:6800/
project = meizhuang
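To see which deploy targets a scrapy.cfg defines, the file can be parsed with configparser. The sketch below is a simplified illustration, not scrapyd-deploy's actual implementation: it treats an unnamed [deploy] section as the target named default and a [deploy:name] section as the target name. The function deploy_targets is a hypothetical helper.

```python
import configparser


def deploy_targets(cfg_text: str) -> dict:
    """Map target name -> options for every deploy section in scrapy.cfg text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section == "deploy":
            name = "default"  # unnamed section is the default target
        elif section.startswith("deploy:"):
            name = section.split(":", 1)[1]
        else:
            continue  # e.g. [settings]
        targets[name] = dict(parser[section])
    return targets


# The scrapy.cfg content shown above, used as sample input.
sample_cfg = """
[settings]
default = meizhuang.settings
[deploy]
url = http://192.168.137.239:6800/
project = meizhuang
"""

for name, options in deploy_targets(sample_cfg).items():
    print(name, options["url"])
```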
Install scrapyd-client
pip install scrapyd-client
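Copy the scrapyd-deploy file into the project root directory. With Anaconda on Windows, installing scrapyd-client places the scrapyd-deploy file in C:\ProgramData\Anaconda3\Scripts
Run the command python scrapyd-deploy -l
The output is as follows: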
D:\project\python\meizhuang>python scrapyd-deploy -l
default http://192.168.137.239:6800/
Deploy the spider
scrapyd-deploy <target> -p <project> --version <version>
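- target is the target name that follows deploy in the configuration file above; with a plain [deploy] section the target is default.
- project can be any name you like; it does not have to match the name of the Scrapy project.
- version is a custom version number; if omitted, it defaults to the current timestamp.
Note: do not keep unrelated .py files in the spider directory, because they can cause the deployment to fail. After a successful deployment, a setup.py file is generated in the current directory; it can be deleted.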
D:\project\python\meizhuang>python scrapyd-deploy
Packing version 1542160941
Deploying to project "meizhuang" in http://192.168.137.239:6800/addversion.json
Server response (200):
{"project": "meizhuang", "status": "ok", "version": "1542160941", "node_name": "raspberrypi", "spiders": 2}
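The version in the output above, 1542160941, is the default timestamp version. Assuming it is a Unix time in seconds (an assumption based on the note about version, not checked against the scrapyd-client source), it can be converted to a readable date like this:

```python
import time
from datetime import datetime, timezone


def default_version() -> str:
    """A timestamp-style version string, assumed to mirror the default."""
    return str(int(time.time()))


def version_to_utc(version: str) -> datetime:
    """Interpret a timestamp version as a UTC datetime."""
    return datetime.fromtimestamp(int(version), tz=timezone.utc)


print(version_to_utc("1542160941").isoformat())  # 2018-11-14T02:02:21+00:00
```

Timestamp versions sort in deployment order, which makes it easy to tell which upload is the latest.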
Run a spider job
D:\project\python\meizhuang>curl http://192.168.137.239:6800/schedule.json -d project=meizhuang -d spider=gcfts
{"status": "ok", "jobid": "6a7b48b4e7b411e89e4db827eb8b4dc9", "node_name": "raspberrypi"}
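The curl call above sends a form-encoded POST to schedule.json. A rough Python equivalent using only the standard library might look like the following; the function names are illustrative, and the error handling assumes the response carries "status" and "jobid" fields as in the output above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url: str, project: str, spider: str) -> Request:
    """Build the POST request that curl -d project=... -d spider=... sends."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(base_url.rstrip("/") + "/schedule.json", data=data)


def schedule(base_url: str, project: str, spider: str, timeout: float = 10.0) -> str:
    """Schedule a spider run and return the job id reported by the server."""
    request = build_schedule_request(base_url, project, spider)
    with urlopen(request, timeout=timeout) as response:
        result = json.loads(response.read().decode("utf-8"))
    if result.get("status") != "ok":
        raise RuntimeError(f"scheduling failed: {result}")
    return result["jobid"]
```

For the example in this post, the call would be schedule("http://192.168.137.239:6800/", "meizhuang", "gcfts").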
Screenshot of the running job