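Notes
The reference answers in this article were written against Chrome at a resolution of 1920×1080; behavior may differ in other environments.
Code for this article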
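- Reference book downloads:
Download: the best AI data-collection (web scraping) reference books of 2018
Learning Selenium Testing Tools with Python-2014.pdf
Selenium自动化测试 基于 Python 语言 - 2018.pdf (Selenium Automated Testing, Based on the Python Language)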
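Selenium hands-on exercise: bypassing the browser UI to query more records
Enter "." in the "Name" box and "300" in "Page size". "Page size" is a drop-down list, so the answer rewrites one of its options with JavaScript instead of typing into it.
Reference answer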
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Discussion: free DingTalk group 21745728, QQ groups 144081101 and 567351477
# CreateDate: 2018-10-20
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get('http://example.webscraping.com/places/default/search')
# Type "." into the Name box
driver.find_element(By.ID, 'search_term').send_keys('.')
# Rewrite the text of the second "Page size" option to 300
js = "document.getElementById('page_size').options[1].text = '300';"
driver.execute_script(js)
driver.find_element(By.ID, 'search').click()
# The implicit wait lets this lookup poll for up to 30 s until result links appear
links = driver.find_elements(By.CSS_SELECTOR, '#results a')
countries = [link.text for link in links]
print(len(countries))
print(countries)
# End the browser session
driver.quit()
Reference book for this example: 用Python写网络爬虫.pdf (Web Scraping with Python)
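The answer above hard-codes the element id, option index and value inside the JavaScript string. A minimal sketch of an alternative, assuming the same page structure, passes them as extra arguments to `execute_script()`, which exposes them to the script as `arguments[0]`, `arguments[1]`, and so on. The helper `set_option_text` and the `RecordingDriver` stand-in are hypothetical names introduced only for illustration; the stand-in records calls so the snippet runs without a browser.

```python
class RecordingDriver:
    """Hypothetical stand-in for a WebDriver; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        # Same call shape as Selenium's WebDriver.execute_script(script, *args)
        self.calls.append((script, args))


# The values arrive in the page as arguments[0..2] instead of being
# concatenated into the script text.
SET_OPTION_TEXT_JS = (
    "document.getElementById(arguments[0])"
    ".options[arguments[1]].text = arguments[2];"
)


def set_option_text(driver, select_id, index, text):
    """Overwrite the label of one <option> inside the <select> with this id."""
    driver.execute_script(SET_OPTION_TEXT_JS, select_id, index, str(text))


demo = RecordingDriver()
set_option_text(demo, "page_size", 1, 300)
script, args = demo.calls[0]
print(args)  # ('page_size', 1, '300')
```

With a real driver the call would be `set_option_text(driver, 'page_size', 1, 300)`, placed before the click on the search button.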
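Selenium hands-on exercise: loading everything in an infinite-scroll box (JavaScript implementation)
- Open: http://www.webscrapingfordatascience.com/complexjavascript/
- Scrape all the content in the box. Scrolling down inside the box with the mouse wheel loads more content, until everything has been loaded.
Reference answer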
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Discussion: free DingTalk group 21745728, QQ groups 144081101 and 567351477
# CreateDate: 2018-10-18
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

class at_least_n_elements_found(object):
    def __init__(self, locator, n):
        self.locator = locator
        self.n = n

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        if len(elements) >= self.n:
            return elements
        else:
            return False

url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
driver = webdriver.Chrome()
driver.get(url)
# Use an implicit wait for cases where we don't use an explicit one
driver.implicitly_wait(10)
div_element = driver.find_element(By.CLASS_NAME, 'infinite-scroll')
quotes_locator = (By.CSS_SELECTOR, ".quote:not(.decode)")
nr_quotes = 0
# Start with an empty list so the code after the loop still works if nothing loads
all_quotes = []
while True:
    # Scroll down to the bottom
    driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight',
                          div_element)
    # Try to fetch at least nr_quotes+1 quotes
    try:
        all_quotes = WebDriverWait(driver, 3).until(
            at_least_n_elements_found(quotes_locator, nr_quotes + 1))
    except TimeoutException:
        # No new quotes found within 3 seconds, assume this is all there is
        print("... done!")
        break
    # Otherwise, update the quote counter
    nr_quotes = len(all_quotes)
    print("... now seeing", nr_quotes, "quotes")
# all_quotes will contain all the quote elements
print(len(all_quotes), 'quotes found\n')
for quote in all_quotes:
    print(quote.text)
input('Press ENTER to close the automated browser')
driver.quit()
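To see why the loop above terminates, it can help to run the same logic without a browser. The following is a rough sketch under stated assumptions: `FakeDriver`, `wait_until` and `collect_all` are hypothetical names made up for this illustration, and `wait_until` is only a simplified imitation of what `WebDriverWait(...).until(...)` does (poll a callable until it returns something truthy, otherwise raise on timeout). The condition class follows the one in the answer.

```python
import time


class at_least_n_elements_found:
    """Truthy (the element list) once at least n elements match the locator."""

    def __init__(self, locator, n):
        self.locator = locator
        self.n = n

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        return elements if len(elements) >= self.n else False


class FakeDriver:
    """Hypothetical stand-in for a browser: each scroll reveals one batch."""

    def __init__(self, batches):
        self.pending = list(batches)
        self.visible = []

    def scroll(self):
        if self.pending:
            self.visible.extend(self.pending.pop(0))

    def find_elements(self, by, value):
        # The locator is ignored here; a real driver would query the page.
        return list(self.visible)


def wait_until(driver, condition, timeout=0.2, poll=0.01):
    """Simplified polling loop: return the first truthy result or raise."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met in time")
        time.sleep(poll)


def collect_all(driver, locator):
    seen = []
    while True:
        driver.scroll()
        try:
            # Ask for at least one more element than we already have.
            seen = wait_until(
                driver, at_least_n_elements_found(locator, len(seen) + 1))
        except TimeoutError:
            # Nothing new appeared in time, so assume everything is loaded.
            return seen


quotes = collect_all(FakeDriver([["q1", "q2"], ["q3"], ["q4", "q5"]]),
                     ("css selector", ".quote"))
print(len(quotes))  # 5
```

The key design choice is asking for one element more than the current count: as long as a scroll loads anything, the condition succeeds quickly, and the only way out of the loop is a timeout after a scroll that loaded nothing.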
- Interview Q&A
1. What is execute_script() used for?