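0x00 First attempt
I originally planned to locate the CAPTCHA from a screenshot, and had already written the template-matching function (see the post "python+opencv 暴力模板匹配", on brute-force template matching with Python and OpenCV). It then turned out that this CAPTCHA can be located directly with XPath, so simulating a user click with selenium is enough. The code is on GitHub.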
0x01 Starting over
Since the element can be located, the first thing to try is moving the mouse straight to it and clicking:
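# assumes: from selenium.webdriver.common.by import By
#          from selenium.webdriver.common.action_chains import ActionChains
btn = driver.find_element(By.XPATH, '//div[@aria-label="点击按钮进行验证"]')
ActionChains(driver).move_to_element(btn).click().perform()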
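Of course it could not be that simple. Sure enough, verification failed and a second verification was required. The second verification is also a Geetest CAPTCHA, and one of two kinds appears at random: a word-selection CAPTCHA or a slider CAPTCHA. This time, though, we focus on the click CAPTCHA.
So my guess was that it uses mouse movement to tell humans from bots. To test this, I moved the mouse quickly onto the button and stopped dead without slowing down, making myself look like a machine; sure enough, verification failed on repeated attempts. Then I accelerated slowly at the start, slowed down when approaching the button, and came to a gentle stop, like a car obeying traffic rules, and passed verification with ease.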
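To make the "slow start, fast middle, slow stop" idea concrete, here is a minimal sketch that splits a distance into per-step offsets using a smoothstep easing curve. This is an illustration only and not the method used in this post (which replays a recorded human track); the function name `eased_offsets` and the choice of smoothstep are assumptions.

```python
def eased_offsets(distance, steps):
    """Split an integer distance into integer per-step offsets that
    start small, grow in the middle, and shrink again at the end."""
    def ease(t):
        # smoothstep: zero slope at t=0 and t=1, steepest at t=0.5
        return t * t * (3 - 2 * t)

    # absolute positions along the path, rounded to whole pixels
    positions = [round(distance * ease(i / steps)) for i in range(steps + 1)]
    # relative offsets between neighbouring positions
    return [b - a for a, b in zip(positions, positions[1:])]


offsets = eased_offsets(200, 20)
print(offsets)
```

The offsets always sum to the requested distance, so feeding them one by one to a relative-move call would end exactly at the target.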
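0x02 Approach
Now that the guess has been tentatively confirmed, the next step is to imitate mouse movement. However, selenium moves the mouse by relative offsets, and it provides no function or method for reading the current mouse position. So the initial idea was:
- Move the mouse to the element first
- Imitate a user's mouse movement: wander around in a loop and come back
- Click to pass the CAPTCHA
The idea is clear, so on to the implementation. Steps one and three are simple; the focus is step two, simulating mouse movement. Since only the relative position of each step is needed, assume the user's initial mouse position is (0, 0), then periodically record the mouse position and compute the difference between adjacent coordinates. That difference is the relative offset.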
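The differencing step on its own can be sketched as follows, assuming the samples are plain (x, y) tuples. The helper name `to_relative` is hypothetical and only for illustration; the full recorder comes right after.

```python
def to_relative(points):
    """Convert absolute (x, y) samples into offsets from the previous sample.
    The first offset is (0, 0) so the output has the same length as the input."""
    if not points:
        return []
    offsets = [(0, 0)]
    for (prev_x, prev_y), (x, y) in zip(points, points[1:]):
        offsets.append((x - prev_x, y - prev_y))
    return offsets


samples = [(0, 0), (3, 1), (8, 4), (10, 4)]
relative = to_relative(samples)
print(relative)  # [(0, 0), (3, 1), (5, 3), (2, 0)]
```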
import json
import time

import pyautogui as pag


class MouseTracker(object):
    """
    Record a mouse track that selenium can replay to move the mouse like a human.
    The data is saved to a text file as JSON. The format of the result:
    [[(x1, y1), delay1], [(x2, y2), delay2], [(x3, y3), delay3], ...]
    where each (x, y) is the offset relative to the previous point.
    """

    def __init__(self, filename='track.txt', period=0.01, max_stop_time=0.5):
        """
        :param filename: the file to save the mouse track to.
        :param period: the fixed interval (seconds) at which the mouse position is sampled.
        :param max_stop_time: how long (seconds) the mouse must stay still
                              before the recording is considered finished.
        """
        self.period = period
        self.filename = filename
        self.stop_num = int(max_stop_time / period)
        self.res = []
        # the start point of the mouse
        self.start_point = tuple(pag.position())
        # the previously sampled point
        self.previous_point = self.start_point
        # the recorded track (absolute positions)
        self.track = []
        # the time spent at each point
        self.sleep_time = []
        # the track begins at the start point
        self.track.append(self.start_point)

    def record(self):
        """
        Record the relative displacement of the user's mouse at a fixed interval.
        """
        print('Move your mouse to start recording, stop moving to finish')
        # number of consecutive samples at the same position
        num = 0
        # loop until the mouse has stayed still longer than max_stop_time
        while True:
            new = tuple(pag.position())
            time.sleep(self.period)
            # samples taken at the start point are ignored,
            # so recording begins once the mouse leaves it
            if new == self.start_point:
                continue
            if new == self.previous_point:
                num = num + 1
            else:
                self.track.append(new)
                # time spent at the previous point
                self.sleep_time.append(num * self.period)
                num = 1
                self.previous_point = new
            if num > self.stop_num:
                break
        # the last point has no delay after it
        self.sleep_time.append(0)

        # subtract two points, e.g. (3, 2) - (2, 1) is (1, 1)
        def tuple_minus(a, b):
            return (a[0] - b[0], a[1] - b[1])

        # relative displacement: the difference between neighbouring coordinates
        diff = [tuple_minus(self.track[i], self.track[i - 1])
                for i in range(1, len(self.track))]
        # make the diff list the same length as sleep_time
        diff.insert(0, (0, 0))
        # build the result list
        for i in range(len(self.track)):
            self.res.append((diff[i], self.sleep_time[i]))

    def print_res(self):
        for i in self.res:
            print(i)

    def save(self):
        """ Save the results to file """
        with open(self.filename, 'w') as f:
            json.dump(self.res, f)

    def generate(self):
        # record first, so that there is something to print and save
        self.record()
        self.print_res()
        self.save()


if __name__ == "__main__":
    MouseTracker().generate()
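Then run the program and play a mouse that obeys the "traffic rules": accelerate gently, stop slowly, move the mouse around in a loop, and return to somewhere near the starting point. This records the relative offset of the mouse in every period. After a fair amount of tuning I chose a default period of 0.01 s; a custom period can also be passed in at initialization. Recording ends once the mouse has stayed still for a certain time, which is the max_stop_time parameter of the class initializer. A value of 0.5 to 1 s feels about right and not tedious. The default in the code above is 0.5 s.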
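Before replaying a recording, it can help to check that the track really does end near where it started, since the click happens wherever the replay finishes. Below is a minimal sketch for that check, assuming the track has the `[[offset, delay], ...]` shape described above (note that JSON stores the tuples as lists). The helper name `summarize_track` and the inline sample data are made up for illustration.

```python
import json


def summarize_track(track):
    """Return the point count, net displacement, and total delay of a recorded track."""
    net_x = sum(offset[0] for offset, _ in track)
    net_y = sum(offset[1] for offset, _ in track)
    total_delay = sum(delay for _, delay in track)
    return {
        'points': len(track),
        'net_offset': (net_x, net_y),
        'total_delay': total_delay,
    }


# made-up sample in the same shape that json.load() would return from track.txt
example = json.loads('[[[0, 0], 0], [[3, 1], 0.01], [[-3, -1], 0.02]]')
summary = summarize_track(example)
print(summary)
```

A net offset close to (0, 0) means the replay ends roughly on the element the mouse was first moved to.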
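0x03 Cracking it
Next, use selenium to simulate a user logging in: open the login page, then simulate the user typing and submitting the form. I won't go over selenium usage here; straight to the code.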
import json
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class Crack(object):
    """
    Crack the geetest click CAPTCHA and log in automatically.
    Please make sure the mouse-track file has been generated in this path.
    """

    def __init__(self, username, password, trackfilename='track.txt', proxy=''):
        # Get login information and init variables.
        self.username = username
        self.password = password
        self.cookies = ''
        # This is the login entrance
        self.url = 'https://passport.weibo.cn/signin/login'
        # read track data
        self.trackfilename = trackfilename
        self.track = []
        with open(self.trackfilename, 'r') as f:
            self.track = json.load(f)
        # Start Chrome, optionally through a proxy.
        # Uncomment the --headless line to run without a window.
        chrome_options = Options()
        # chrome_options.add_argument('--headless')
        if proxy:
            chrome_options.add_argument('--proxy-server=' + proxy)
            print('proxy set success')
        self.driver = webdriver.Chrome(options=chrome_options)
        self.wait = WebDriverWait(self.driver, 6)
        # Wait for chrome to start, then open the login entrance.
        self.driver.implicitly_wait(5)
        self.driver.set_page_load_timeout(10)
        try:
            self.driver.get(self.url)
        except Exception as e:
            print("Target URL cannot be reached: ", e)
            self.__del__()

    def __del__(self):
        """ Destroy the web browser """
        print(self.cookies)
        self.driver.close()

    def wait_for_main_page(self):
        """
        Wait for the main page to load.
        :return: True if the main page loads within the given time, else False.
        """
        try:
            self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'main-wrap')))
            return True
        except Exception:
            return False

    def move(self):
        """ Move the mouse along the recorded track, then click """
        for offset, sleeptime in self.track:
            x, y = offset
            ActionChains(self.driver).move_by_offset(x, y).perform()
            time.sleep(sleeptime)
        ActionChains(self.driver).click().perform()

    def login(self):
        """ Log in to weibo with selenium """
        # Wait until the login button is present.
        WebDriverWait(self.driver, 15, 0.5).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="loginAction"]')))
        # Input username and password.
        print('Inputting username and password...', end='')
        username_area = self.driver.find_element(By.XPATH, '//*[@id="loginName"]')
        username_area.send_keys(self.username)
        time.sleep(1)
        psw_area = self.driver.find_element(By.XPATH, '//*[@id="loginPassword"]')
        psw_area.send_keys(self.password)
        print('Ok')
        # Submit login form.
        print('Posting form data...', end='')
        btn = self.driver.find_element(By.XPATH, '//*[@id="loginAction"]')
        btn.click()
        print('Ok')
        # If there is a CAPTCHA, then crack it.
        # (str.find() returns -1, which is truthy, when nothing is found,
        # so a membership test is used instead.)
        if 'captcha' in self.driver.current_url.lower():
            print('CAPTCHA has been detected, need crack.')
            self.crack()
        else:
            ret = self.wait_for_main_page()
            self.cookies = self.driver.get_cookies()

    def crack(self):
        """ Crack the click CAPTCHA """
        captcha_button = (By.XPATH, '//div[@aria-label="点击按钮进行验证"]')
        # Wait for the page to load.
        print('Loading CAPTCHA...', end='')
        WebDriverWait(self.driver, 10, 0.5).until(
            EC.presence_of_element_located(captcha_button))
        print('Complete')
        # find the button and move to it
        print('Cracking...', end='')
        btn = self.driver.find_element(*captcha_button)
        ActionChains(self.driver).move_to_element(btn).perform()
        self.move()
        ActionChains(self.driver).click().perform()
        print('Complete')
        # wait for the page and get cookies
        ret = self.wait_for_main_page()
        if ret:
            print('Cracking success!')
        # The word-selection CAPTCHA appeared, second verification needed.
        elif self.driver.find_elements(By.CLASS_NAME, 'geetest_commit_tip'):
            print('Cracking failed, Choose-Word-CAPTCHA has appeared!')
            self.__del__()
            # crack_choose_CAPTCH()
        # The slider CAPTCHA appeared, second verification needed.
        elif self.driver.find_elements(By.CLASS_NAME, 'geetest_slider_track'):
            print('Cracking failed, Slider-CAPTCHA has appeared!')
            self.__del__()
            # crack_slide_CAPTCH()
        else:
            print('Unknown Error!')
            self.__del__()
            # log_error()
        if ret:
            cookies_dict = {}
            cookies = self.driver.get_cookies()
            for d in cookies:
                cookies_dict[d['name']] = d['value']
            print('Get cookies:', cookies_dict)
            self.cookies = json.dumps(cookies_dict)


if __name__ == '__main__':
    Crack('xxxxxxxxxx@sina.com', 'xxxxxxxx').login()
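Note that the class takes a proxy parameter, so the crack can be run through a proxy. Repeated experiments showed that logging in several times in a short period from the same IP tends to trigger the second verification. The second CAPTCHA comes in two forms: a click-selection CAPTCHA, where you pick the characters that appear in an image, and a slider CAPTCHA (relatively easy to crack). Either one may appear, but if you log in too frequently it is usually the first. The code leaves extension points for both kinds of second verification, to be filled in later.
Proxies can be obtained by building and maintaining your own proxy pool, although not many usable ones are left. To use a proxy, pass it in when initializing the class: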
if __name__ == '__main__':
    # the proxy goes here, as ip:port
    proxy = "xxx.xxx.xxx.xxx:xxxxx"
    proxy = "http://" + proxy
    Crack('xxxxxxxxxx@sina.com', 'xxxxxxxx', proxy=proxy).login()
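The snippet above prepends "http://" unconditionally, which would produce a malformed value if an entry from the proxy pool already carries a scheme. A small helper along these lines could guard against that. It is a sketch only; the name `normalize_proxy` and its `default_scheme` parameter are hypothetical and not part of the original code.

```python
def normalize_proxy(proxy, default_scheme='http'):
    """Return the proxy string with a scheme, adding default_scheme only if none is present."""
    proxy = proxy.strip()
    if not proxy:
        # an empty string means "no proxy", which Crack treats as falsy
        return ''
    if '://' in proxy:
        return proxy
    return default_scheme + '://' + proxy


print(normalize_proxy('127.0.0.1:8080'))           # http://127.0.0.1:8080
print(normalize_proxy('socks5://127.0.0.1:1080'))  # socks5://127.0.0.1:1080
```

The result can then be passed as the proxy argument in place of the manual string concatenation.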