Python + selenium: Cracking the Geetest Click CAPTCHA

0x00 First Attempt

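I originally planned to locate the button from a screenshot, and had already written the template-matching function (see the post "python+opencv brute-force template matching"). Then I discovered that this CAPTCHA can be located directly through XPath, so it is enough to simulate a user click with selenium. The code is on GitHub.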

Click CAPTCHA

0x01 Starting Over


Since the element can be located, try moving the mouse straight to it and clicking:

btn = driver.find_element(By.XPATH, '//div[@aria-label="点击按钮进行验证"]')
ActionChains(driver).move_to_element(btn).click().perform()

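Naturally, it could not be that simple. Sure enough, verification failed and a secondary verification was required. The secondary verification is also a Geetest CAPTCHA and randomly takes one of two forms: a character-selection CAPTCHA or a slider CAPTCHA. This time, though, we focus on the click CAPTCHA.
So I guessed that it tells humans from bots by how the mouse moves. To test this, I moved the mouse quickly onto the button and stopped dead without slowing down, making myself look like a machine. Sure enough, repeated attempts all failed verification. Then I accelerated slowly at the start, decelerated when approaching the button, and came to a gentle stop, like a car obeying traffic rules, and passed verification with ease.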

0x02 Approach

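Now that the guess has been tentatively confirmed, the next step is to find a way to imitate mouse movement. However, selenium moves the mouse by a specified relative offset, and it provides no function or method for getting the mouse position. So the initial idea is this: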

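1. First move the mouse onto the element
2. Imitate a user's mouse movement: wander around in a loop and come back
3. Click to pass the CAPTCHA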

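The idea is clear, so we can start implementing it. The first and third steps are simple; the focus is on the second step, simulating mouse movement. Since we only need the relative position of each step, assume the user's initial mouse position is (0, 0). Then we periodically record the mouse position and compute the difference between adjacent coordinates, which gives the relative displacement.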
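The difference step on its own can be sketched as follows. This is a minimal illustration rather than the recorder itself; `to_relative` and the sample coordinates are hypothetical names and values chosen for the example.

```python
def to_relative(points):
    """Convert absolute (x, y) samples into per-step (dx, dy) offsets."""
    if not points:
        return []
    offsets = [(0, 0)]  # the first sample has no predecessor
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        offsets.append((x1 - x0, y1 - y0))
    return offsets


# hypothetical samples, with the first position taken as (0, 0)
samples = [(0, 0), (2, 1), (5, 3), (5, 3), (4, 2)]
print(to_relative(samples))
# -> [(0, 0), (2, 1), (3, 2), (0, 0), (-1, -1)]
```

The leading (0, 0) keeps the offset list the same length as the list of samples, so each offset can later be paired with a delay.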

import pyautogui as pag
import json
import time

class MouseTracker(object):
    """
    Record mouse tracks that can be replayed in selenium to move the mouse like a human.
    The data is saved to a text file as JSON. The format of the result:
    [[[dx1, dy1], delay1], [[dx2, dy2], delay2], [[dx3, dy3], delay3]...]
    where (dx, dy) is the offset from the previous point and delay is in seconds.
    """
    def __init__(self, filename='track.txt', period=0.01, max_stop_time=0.5):
        """
        :param filename: the file in which to save the mouse track.
        :param period: the sampling interval, in seconds, for recording the mouse position.
        :param max_stop_time: how long, in seconds, the mouse must stay still for the recording to be considered finished.
        """

        self.period = period
        self.filename = filename
        self.stop_num = int(max_stop_time/period)
        self.res = []

        # record start point of mouse
        self.start_point = tuple(pag.position())
        # the most recently sampled point
        self.previous_point = self.start_point
        # absolute positions of the recorded track
        self.track = []
        # how long the mouse stayed at each recorded point
        self.sleep_time = []
        # the track begins at the start point
        self.track.append(self.start_point)

    def record(self):
        """
        Record the relative displacement of the user's mouse at a fixed interval.
        """

        print('Move your mouse to start recording, stop moving to finish')

        # number of consecutive samples at the same position
        num = 0
        # loop until the mouse has stayed still longer than max_stop_time
        while True:
            new = tuple(pag.position())
            time.sleep(self.period)
            # wait for the first movement; later returns to the start point still count
            if len(self.track) == 1 and new == self.start_point:
                continue
            if new == self.previous_point:
                num = num + 1
            else:
                self.track.append(new)
                self.sleep_time.append(num*self.period)
                num = 1
            self.previous_point = new
            if num > self.stop_num:
                break
        self.sleep_time.append(0)

        # subtract two points, e.g. (3,2)-(2,1) is (1,1)
        tuple_minus = lambda x,y:(x[0]-y[0],x[1]-y[1])
        # indices of every point that has a predecessor
        _range = range(1,len(self.track))
        # get relative displacement, i.e. the coordinate difference between neighbours
        diff = [tuple_minus(self.track[x],self.track[x-1]) for x in _range]
        # make sure the diff list is as long as sleep_time
        diff.insert(0,(0,0))
        # build the result list
        for i in range(len(self.track)):
            self.res.append((diff[i], self.sleep_time[i]))


    def print_res(self):
        for i in self.res:
            print(i)

    # save results to file
    def save(self):
        with open(self.filename, 'w') as f:
            json.dump(self.res, f)

    def generate(self):
        # record first, then show and save what was recorded
        self.record()
        self.print_res()
        self.save()

if __name__ == "__main__":
    MouseTracker().generate()

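Then run the program and play a mouse that obeys the "traffic rules": accelerate gently, stop slowly, and move the mouse in a loop back to somewhere near the starting point. This records the relative displacement of the mouse in each period. After several rounds of tuning I chose a default period of 0.01 s; a custom recording period can also be passed at initialization. If the mouse stays still for a certain time, the recording is considered finished; this is the max_stop_time parameter of the class initializer. A value of 0.5 to 1 s feels about right without dragging on. The default in the code above is 0.5 s.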
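Before replaying a recording, it may be worth checking that the track really ends near where it started, since the replay begins on the button and should finish there too. The sketch below assumes the file layout written by MouseTracker.save(), a JSON list of [[dx, dy], delay] entries. `load_track` and `summarize_track` are hypothetical helper names, and the sample track is made up.

```python
import json


def load_track(filename='track.txt'):
    """Read a track saved as a JSON list of [[dx, dy], delay] entries."""
    with open(filename) as f:
        return json.load(f)


def summarize_track(track):
    """Return (net_dx, net_dy, total_delay) for a list of [[dx, dy], delay]."""
    net_dx = sum(offset[0] for offset, _ in track)
    net_dy = sum(offset[1] for offset, _ in track)
    total_delay = sum(delay for _, delay in track)
    return net_dx, net_dy, total_delay


# made-up track: three steps that end one pixel below the start
sample = [[[0, 0], 0], [[3, 4], 0.5], [[-3, -3], 0.25]]
print(summarize_track(sample))
# -> (0, 1, 0.75)
```

A net offset of a few pixels should still leave the pointer on the button; a large one suggests the recording did not return to its starting point and is worth redoing.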

0x03 Cracking

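Next, use selenium to simulate a user logging in. First find the login page, then simulate the user typing and submitting the form. I won't go over how to use selenium; here is the code.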

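import json
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

class Crack(object):
    """
    Crack geetest click CAPTCHA and auto login.
    Please make sure you have generated the mouse-track file in this path.
    """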
    def __init__(self, username, password, trackfilename='track.txt', proxy=''):
        # Get login information and init variable.
        self.username = username
        self.password = password
        self.cookies = ''

        # This is login entrance
        self.url = 'https://passport.weibo.cn/signin/login'

        # read track data
        self.trackfilename = trackfilename
        self.track = []
        with open(self.trackfilename,'r') as f:
            self.track = json.load(f)

        # start Chrome, adding the proxy if one was given.
        chrome_options = Options()
        #chrome_options.add_argument('--headless')
        if proxy:
            chrome_options.add_argument('--proxy-server='+proxy)
            print('proxy set success')
        self.driver = webdriver.Chrome(options=chrome_options)
        self.wait = WebDriverWait(self.driver, 6)

        # Waiting for chrome to open and open login entrance.
        self.driver.implicitly_wait(5)
        # driver.get() takes only the URL; the 10 here is presumably meant as a
        # page-load timeout in seconds, so it is set separately
        self.driver.set_page_load_timeout(10)
        try:
            self.driver.get(self.url)
        except Exception as e:
            print("Target URL cannot be reached: ", e)
            self.__del__()
                                                                                                               
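    def __del__(self):
        """ Destroy the web browser (safe to call more than once) """
        driver = getattr(self, 'driver', None)
        if driver is not None:
            print(self.cookies)
            driver.quit()
            self.driver = None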
                                                                                                               
    def wait_for_main_page(self):
        """
        Wait for the main page to load.
        :return: True if the main page loads within the given time, otherwise False.
        """
        try:
            self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'main-wrap')))
            return True
        except Exception:
            # typically a timeout raised by WebDriverWait
            return False
                                                                                                               
    def move(self):
        """ Move mouse by using given track """
        for offset, sleeptime in self.track:
            x, y = offset
            ActionChains(self.driver).move_by_offset(x,y).perform()
            time.sleep(sleeptime)
        # no click here: crack() clicks once after the track has been replayed
                                                                                                               
    def login(self):
        """ Login weibo by selenium """

        # Waiting until the presence of login button.
        WebDriverWait(self.driver, 15, 0.5).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="loginAction"]')))

        # Input username and password.
        print('Inputting username and password...', end='')
        username_area = self.driver.find_element(By.XPATH, '//*[@id="loginName"]')
        username_area.send_keys(self.username)
        time.sleep(1)
        psw_area = self.driver.find_element(By.XPATH, '//*[@id="loginPassword"]')
        psw_area.send_keys(self.password)
        print('Ok')

        # Submit login form.
        print('Posting form data...', end='')
        btn = self.driver.find_element(By.XPATH, '//*[@id="loginAction"]')
        btn.click()
        print('Ok')

        # If there is a CAPTCHA, then crack it.
        # str.find() returns -1 (truthy) when the text is missing, so test
        # membership instead; the comparison is case-insensitive.
        if 'captcha' in self.driver.current_url.lower():
            print('CAPTCHA has been detected, need crack.')
            self.crack()
        else:
            self.wait_for_main_page()
            self.cookies = self.driver.get_cookies()
                                                                                                               
    def crack(self):
        """ Crack the click CAPTCHA """
        captcha_xpath = '//div[@aria-label="点击按钮进行验证"]'

        # Waiting for page loading.
        print('Loading CAPTCHA...', end='')
        WebDriverWait(self.driver, 10, 0.5).until(
            EC.presence_of_element_located((By.XPATH, captcha_xpath)))
        print('Complete')

        # find button and move to the button
        print('Cracking...', end='')
        btn = self.driver.find_element(By.XPATH, captcha_xpath)
        ActionChains(self.driver).move_to_element(btn).perform()
        self.move()
        ActionChains(self.driver).click().perform()
        print('Complete')

        # wait for the main page and get cookies
        ret = self.wait_for_main_page()
        if ret:
            print('Cracking success!')
        # Choose-Word-CAPTCHA has appeared, need second verification.
        # find_elements() returns an empty list when nothing matches.
        elif self.driver.find_elements(By.CLASS_NAME, 'geetest_commit_tip'):
            print('Cracking failed, Choose-Word-CAPTCHA has appeared!')
            self.__del__()
            # crack_choose_CAPTCH()
        # Slide-CAPTCHA has appeared, need second verification.
        elif self.driver.find_elements(By.CLASS_NAME, 'geetest_slider_track'):
            print('Cracking failed, Slider-CAPTCHA has appeared!')
            self.__del__()
            # crack_slide_CAPTCH()
        else:
            print('Unknown Error!')
            self.__del__()
            # log_error()

        if ret:
            cookies_dict = {}
            cookies = self.driver.get_cookies()
            for d in cookies:
                cookies_dict[d['name']] = d['value']
            print('Get cookies:', cookies_dict)
            self.cookies = json.dumps(cookies_dict)
                                                                                                               
                                                                                                               
if __name__ == '__main__':                                                                                                                            
    Crack('xxxxxxxxxx@sina.com', 'xxxxxxxx').login()
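On success, crack() stores the cookies as a JSON string of name/value pairs. One way such a string might be reused later is to turn it into a Cookie header value, as sketched here. `cookie_header` is a hypothetical helper and the cookie names are placeholders, not real weibo cookie names.

```python
import json


def cookie_header(cookies_json):
    """Build a Cookie header value from a JSON string of {name: value} pairs."""
    cookies = json.loads(cookies_json)
    return '; '.join('{}={}'.format(name, value) for name, value in cookies.items())


# placeholder cookies, in the same shape as json.dumps(cookies_dict)
sample = json.dumps({'name_a': 'value_a', 'name_b': 'value_b'})
print(cookie_header(sample))
# -> name_a=value_a; name_b=value_b
```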

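Note that I added a proxy parameter to the class, so the crack can be run through a proxy. Repeated experiments showed that logging in several times from the same IP within a short period tends to trigger secondary verification. The secondary CAPTCHA takes two forms: one is a point-and-select CAPTCHA, where you pick out the characters shown in the image, and the other is a slider CAPTCHA (relatively easy to crack). Which one appears is not fixed, but if you log in too frequently it is usually the first kind. The code leaves an extension point for each of the two, to be filled in later.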

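Proxies can be obtained by building and maintaining your own proxy pool, though not many usable ones remain. To use a proxy, pass it as a parameter when initializing the class: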

if __name__ == '__main__':
    # the proxy, as ip:port
    proxy = "xxx.xxx.xxx.xxx:xxxxx"
    proxy = "http://" + proxy
    Crack('xxxxxxxxxx@sina.com', 'xxxxxxxx', proxy=proxy).login()
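If proxies come from a pool in mixed formats, the "http://" prefixing above could be wrapped in a small helper so that entries which already carry a scheme are left alone. This is only a sketch; `normalize_proxy` is a hypothetical name, and the addresses are examples.

```python
def normalize_proxy(proxy, scheme='http'):
    """Return the proxy with a scheme prefix, or '' when no proxy is given."""
    proxy = proxy.strip()
    if not proxy:
        return ''  # matches the class default proxy='' (no proxy)
    if '://' in proxy:
        return proxy  # already has a scheme, keep it
    return '{}://{}'.format(scheme, proxy)


print(normalize_proxy('127.0.0.1:8080'))
# -> http://127.0.0.1:8080
print(normalize_proxy('socks5://127.0.0.1:1080'))
# -> socks5://127.0.0.1:1080
```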