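Background
- Python
- Selenium
- Screenshots
Problem
- The site responds slowly, so scrolling screenshots come out incomplete
Goal
- Wait until the page has finished loading before taking the screenshot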
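Process
- None of the approaches below worked well; the screenshots still came out incomplete
- Stages of a page load
- API requests fetch the data
- The DOM loads
- Images download
- The page renders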
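- Selenium wait methods
- Explicit wait
# Wait for the product detail element to appear in the DOM
# (presence only means the element exists, not that its images have loaded)
WebDriverWait(self.browser, 60).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.extra-image')))
- time.sleep
time.sleep(10)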
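To illustrate why these waits fall short, here is a minimal sketch of a polling wait on `document.readyState`. The helper names `page_is_complete` and `poll_until` are hypothetical and not part of Selenium; `driver` is assumed to be any object with an `execute_script` method, such as `self.browser`. A callable like `page_is_complete` can also be passed to `WebDriverWait(...).until(...)`. Note that `readyState` reaching `complete` only covers resources referenced by the initial document, so images inserted after a later API response can still be missing.

```python
import time


def page_is_complete(driver):
    """Return True once the browser reports document.readyState == 'complete'."""
    return driver.execute_script("return document.readyState") == "complete"


def poll_until(driver, condition, timeout=60.0, interval=0.5):
    """Call condition(driver) repeatedly until it is truthy or the timeout expires.

    Returns True if the condition was met, False on timeout.
    The condition is always checked at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition(driver):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage might look like `poll_until(self.browser, page_is_complete, timeout=60)`, followed by a separate check for the images themselves.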
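- Check whether the scrolling screenshot is complete
- When the product detail section is empty, the screenshot is 938 pixels high
- Height: 938 - 40 (the address bar added during image editing) = 898
- If the height is unchanged across three consecutive screenshots, treat the check as passed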
for retry in range(1, 10):
    current_url = self.browser.current_url
    height = self.browser.execute_script(js_height)
    self.logger.info(
        '=====> scroll screenshot height is {},url is {}'.format(height, current_url))
    if height > 898:
        self.logger.info(
            '=====> check height > 898 , scroll screenshot url is: {}'.format(current_url))
        # Start the comparison from the height just measured
        old_height = height
        same_add = 0
        for check_times in range(1, 5):
            html_pic_bytes = self.selenium_firefox.get_screenshot_scroll(scroll=500,
                                                                         scroll_interval=3)
            new_height = self.browser.execute_script(js_height)
            if old_height == new_height:
                same_add = same_add + 1
                self.logger.info(
                    '=====> retry:scroll screenshot height is same,retry {} times,url is: {}'.format(
                        check_times, current_url))
            else:
                # The height changed, so the run of identical heights starts over
                same_add = 0
                self.logger.info(
                    '=====> error:scroll screenshot,retry {} times,url is: {}'.format(
                        check_times, current_url))
            # Three identical heights in a row count as a pass
            if same_add >= 3:
                self.logger.info(
                    '=====> correct:scroll screenshot url is: {}'.format(current_url))
                break
            old_height = new_height
            time.sleep(5)
        break
    else:
        self.logger.info(
            '=====> check height <= 898 , scroll screenshot,retry {} times, url is: {}'.format(
                retry, current_url))
        # Wait longer when there is more than one image
        if len(imgs) > 1:
            time.sleep(10)
        else:
            time.sleep(5)
        # Take another scrolling screenshot with a smaller scroll step
        html_pic_bytes = self.selenium_firefox.get_screenshot_scroll(scroll=200,
                                                                     scroll_interval=3)
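The height check above can be isolated into a small helper, which makes the "three identical heights in a row" rule easier to see. This is only a sketch: the name `wait_for_stable_height` and its parameters are hypothetical, and `get_height` is assumed to be any zero-argument callable that returns the current page height. As the post notes, a stable height still does not prove that the images inside the page have loaded.

```python
import time


def wait_for_stable_height(get_height, min_height=898, required_same=3,
                           max_checks=10, interval=5.0):
    """Return the page height once it is stable, or None if it never settles.

    "Stable" means the height is greater than min_height and the last
    required_same consecutive reads all returned the same value.
    """
    previous = None
    same_reads = 0
    for attempt in range(max_checks):
        if attempt > 0:
            time.sleep(interval)  # give the page time to load more content
        height = get_height()
        if height == previous:
            same_reads += 1
        else:
            same_reads = 1  # a new value starts a new run
        if height > min_height and same_reads >= required_same:
            return height
        previous = height
    return None
```

With the names used in this post, the callable could be `lambda: self.browser.execute_script(js_height)`.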
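Solution
- Use JS to check whether the images have finished loading
window.isAllImgLoaded = function (root, imgs) {
    // imgs: comma-separated list of the src values we expect to find under root
    const expectedImgs = imgs.split(",");
    const imgArr = Array.from(root.querySelectorAll("img"));
    // Status per expected src: "0" = not loaded yet, "1" = loaded
    const result = {};
    for (const src of expectedImgs) {
        result[src] = "0";
    }
    const resultKeys = Object.keys(result);
    for (const img of imgArr) {
        const src = img.getAttribute("src");
        // Skip <img> elements that have no src attribute
        if (src === null) {
            continue;
        }
        // Skip images we are not waiting for
        if (resultKeys.indexOf(src) === -1) {
            continue;
        }
        // Note: complete is also true for an image that failed to load
        if (img.complete) {
            result[src] = "1";
        }
    }
    return result;
};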
- Call the JS from Selenium and read back the status
# Register the helper on window so that later calls can reuse it
self.browser.execute_script(
    '''window.isAllImgLoaded = function(root,imgs){const expectedImgs=imgs.split(",");const imgArr=Array.from(root.querySelectorAll("img"));const result={};for(const src of expectedImgs){result[src]="0"}const resultKeys=Object.keys(result);for(const img of imgArr){const src=img.getAttribute("src");if(src===null){continue}if(resultKeys.indexOf(src)===-1){continue}if(img.complete){result[src]="1"}}return result}''')
if len(imgs) > 0:
    # Poll once per second (99 attempts at most) until every expected image reports "1"
    for i in range(1, 100):
        # `app` is resolved inside the page; the src list is passed as arguments[0]
        # so that quotes in a URL cannot break the script
        result: dict = self.browser.execute_script(
            "return window.isAllImgLoaded(app, arguments[0])", ','.join(pic_arr))
        if all(r == "1" for r in result.values()):
            break
        time.sleep(1)
html_pic_bytes = self.selenium_firefox.get_screenshot_scroll(scroll=500, scroll_interval=1)
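The polling loop can also be wrapped in a reusable function that reports which images are still pending, which helps when logging a timeout. The following is a sketch under a few assumptions: `wait_for_images` and `IMG_STATUS_JS` are hypothetical names, the script searches the whole `document` rather than the `app` root used above, and `driver` is any object whose `execute_script(script, *args)` behaves like Selenium's (a list argument arrives in the page as a JS array, and a returned JS object comes back as a Python dict).

```python
import time

# Runs in the page: arguments[0] is the list of expected src values.
IMG_STATUS_JS = """
const wanted = arguments[0];
const status = {};
for (const src of wanted) { status[src] = false; }
for (const img of document.querySelectorAll("img")) {
    const src = img.getAttribute("src");
    if (src !== null && wanted.includes(src) && img.complete) {
        status[src] = true;
    }
}
return status;
"""


def wait_for_images(driver, srcs, timeout=100.0, interval=1.0):
    """Poll until every src in srcs is reported complete or the timeout expires.

    Returns the list of srcs that are still pending (empty on success).
    """
    srcs = list(srcs)
    if not srcs:
        return []  # nothing to wait for
    deadline = time.monotonic() + timeout
    while True:
        status = driver.execute_script(IMG_STATUS_JS, srcs)
        pending = [src for src in srcs if not status.get(src)]
        if not pending or time.monotonic() >= deadline:
            return pending
        time.sleep(interval)
```

A caller could then write `pending = wait_for_images(self.browser, pic_arr)` and log `pending` before taking the screenshot.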
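Summary
Selenium's built-in waits cannot tell whether the images have finished loading
Combining Selenium with JS works very well
Acknowledgements
- Thanks to the experts who helped
- Drew on many articles found online
- If this helped you, please give it a like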