Note: the browser version and the driver version must match, otherwise you will get errors.
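1. Firefox
Firefox and its matching geckodriver:
Firefox downloads: http://ftp.mozilla.org/pub/firefox/releases/
geckodriver downloads: https://github.com/mozilla/geckodriver/releases
After downloading geckodriver, extract it into the Firefox installation folder. To find where Firefox is installed, run in bash: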
whereis firefox
## firefox: /usr/bin/firefox /usr/lib/firefox /etc/firefox /usr/share/man/man1/firefox.1.gz
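By default it is the /usr/lib/firefox folder. If you are unsure, check where the launcher points (ll is Ubuntu's default alias for ls -alF):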
ll /usr/bin/firefox
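The extraction step can be sketched as a small shell function. This is only a minimal sketch: install_driver is a hypothetical helper name, the archive file name in the usage comment is an assumed example, and writing into /usr/lib/firefox normally requires root permissions.

```shell
# Unpack a .tar.gz driver archive into a target directory and make the
# driver executable. install_driver is a hypothetical helper, not part of
# geckodriver or Selenium.
# Usage: install_driver ARCHIVE DEST_DIR DRIVER_NAME
install_driver() (
    archive=$1
    dest=$2
    name=$3
    # Fail early if the archive is missing.
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; exit 1; }
    mkdir -p "$dest" || exit 1
    tar -xzf "$archive" -C "$dest" || exit 1
    chmod +x "$dest/$name"
)

# Example call (assumed file name; run from a root shell if the target is
# /usr/lib/firefox):
# install_driver ~/Downloads/geckodriver-v0.27.0-linux64.tar.gz /usr/lib/firefox geckodriver
```

The function body runs in a subshell, so its variables and the early exits do not leak into the calling shell.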
2. Chrome
chromedriver downloads: http://npm.taobao.org/mirrors/chromedriver/
After downloading, extract chromedriver into the Chrome installation directory. To locate that directory:
which google-chrome-stable
## /usr/bin/google-chrome-stable
This is not the real directory, though, only a symbolic link. Query again:
ll /usr/bin/google-chrome-stable
## lrwxrwxrwx 1 root root 32 5月 15 2018 /usr/bin/google-chrome-stable -> /opt/google/chrome/google-chrome*
The real installation directory is /opt/google/chrome/.
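The two lookup steps (find the command, then follow the symbolic link) can be combined. The following is a minimal sketch, assuming GNU readlink with the -f option is available (as it is on Ubuntu/Debian); real_install_dir is a hypothetical helper name.

```shell
# Print the directory that really contains a command, following any
# symbolic links. real_install_dir is a hypothetical helper name.
real_install_dir() (
    # Look the command up on PATH (an absolute path also works).
    path=$(command -v "$1") || { echo "command not found: $1" >&2; exit 1; }
    # readlink -f resolves every symlink in the chain to the final target.
    dirname "$(readlink -f "$path")"
)

# Example calls:
# real_install_dir google-chrome-stable
# real_install_dir firefox
```

This avoids reading the ls output by hand, and it still works when the link chain has more than one hop.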
Google Chrome releases, for matching against chromedriver versions:
Google Chrome Linux version: 64-bit deb for Ubuntu/Debian
Version | Size | Date |
---|---|---|
86.0.4240.75 | 67.85 MB | 2020-10-07 |
84.0.4147.135 | 66.36 MB | 2020-08-20 |
83.0.4103.116 | 65.47 MB | 2020-07-06 |
81.0.4044.92 | 63.58 MB | 2020-04-13 |
80.0.3987.149 | 60.21 MB | 2020-03-23 |
79.0.3945.88 | 59.3 MB | 2019-12-29 |
78.0.3904.97 | 59.49 MB | 2019-11-12 |
76.0.3809.100 | 56.72 MB | 2019-08-15 |
75.0.3770.80 | 56.21 MB | 2019-06-05 |
71.0.3578.80 | 53.98 MB | 2018-12-11 |
70.0.3538.77 | 53.46 MB | 2018-11-06 |
69.0.3497.92 | 52.27 MB | 2018-09-16 |
68.0.3440.84 | 51.57 MB | 2020-04-29 |
67.0.3396.79 | 50.1 MB | 2020-04-29 |
66.0.3359.181 | 49.91 MB | 2020-04-29 |
65.0.3325.181 | 49.72 MB | 2020-04-29 |
64.0.3282.140 | 49.29 MB | 2020-04-29 |
63.0.3239.108 | 46.76 MB | 2020-04-29 |
62.0.3202.75 | 46.47 MB | 2020-04-29 |
61.0.3163.79 | 62.5 MB | 2020-04-29 |
60.0.3112.90 | 55.65 MB | 2020-04-29 |
59.0.3071.86 | 58.02 MB | 2020-04-29 |
58.0.3029.96 | 51.44 MB | 2020-04-29 |
57.0.2987.133 | 45.13 MB | 2020-04-29 |
56.0.2924.87 | 43.77 MB | 2020-04-29 |
55.0.2883.75 | 43.96 MB | 2020-04-29 |
54.0.2840.71 | 43.42 MB | 2020-04-29 |
53.0.2785.116 | 47.87 MB | 2020-04-29 |
52.0.2743.116 | 46.98 MB | 2020-04-29 |
51.0.2704.84 | 47.17 MB | 2020-04-29 |
50.0.2661.75 | 46.12 MB | 2020-04-29 |
49.0.2623.75 | 46.5 MB | 2020-04-29 |
48.0.2564.109 | 45.84 MB | 2020-04-29 |
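A quick way to check the version match is to compare major versions. The sketch below assumes that both google-chrome-stable and chromedriver are on PATH and that each prints its version with --version; major_version and same_major are hypothetical helper names. It only makes sense for Chrome 73 and later, where chromedriver follows Chrome's version numbers; the older chromedriver 2.x releases used their own numbering.

```shell
# Extract the major version from text such as "Google Chrome 86.0.4240.75".
major_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1 | cut -d. -f1
}

# Succeed only if both version strings contain the same major version.
same_major() {
    [ -n "$(major_version "$1")" ] &&
        [ "$(major_version "$1")" = "$(major_version "$2")" ]
}

# Compare the installed browser and driver, but only when both are on PATH.
if command -v google-chrome-stable >/dev/null 2>&1 &&
   command -v chromedriver >/dev/null 2>&1; then
    if same_major "$(google-chrome-stable --version)" "$(chromedriver --version)"; then
        echo "Chrome and chromedriver major versions match"
    else
        echo "Version mismatch: download the chromedriver for your Chrome version" >&2
    fi
fi
```

If either program is missing, the comparison is skipped silently and only the two helper functions are defined.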
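3. Usage
Java must be installed first. Download: https://www.java.com/en/download/ (search online for installation and configuration instructions).
Download the Selenium standalone server and put it in a location of your choice. Download: http://www.seleniumhq.org/download/
Then start selenium-server-standalone manually to launch Selenium. From R or RStudio, run: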
system("java -jar ~/selenium-server-standalone-4.0.0-alpha-1.jar", wait = FALSE)
Or run it manually in a bash terminal, keeping the window open while the server is running:
java -jar ~/selenium-server-standalone-4.0.0-alpha-1.jar
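Before starting the server it can help to verify that Java and the jar file are actually in place. This is a minimal sketch; check_selenium_prereqs and start_selenium are hypothetical helper names, and the log file path is an assumption.

```shell
# Report what is missing before the Selenium server can be started.
# Prints one line per problem and fails if anything is missing.
check_selenium_prereqs() (
    jar=$1
    status=0
    command -v java >/dev/null 2>&1 || { echo "java not found on PATH"; status=1; }
    [ -f "$jar" ] || { echo "jar not found: $jar"; status=1; }
    exit "$status"
)

# Start the server in the background only when everything is in place.
# The log location ($HOME/selenium.log) is an assumed choice.
start_selenium() {
    check_selenium_prereqs "$1" || return 1
    nohup java -jar "$1" > "$HOME/selenium.log" 2>&1 &
}

# Example call:
# start_selenium "$HOME/selenium-server-standalone-4.0.0-alpha-1.jar"
```

Running the server with nohup in the background is a design choice for this sketch: unlike the foreground command, it keeps running after the terminal window is closed.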
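Then connect from R:
library(pacman)
p_load(RSelenium)
# The Selenium server was already started above:
# system("java -jar ~/selenium-server-standalone-4.0.0-alpha-1.jar", wait = FALSE)
# Create a client for the Selenium server (localhost:4444 by default) that drives Chrome
remDr <- remoteDriver(browserName = "chrome")
# Open the browser
remDr$open()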
You are now ready to start scraping.