How to Make Your Pandas Loops Faster

This post is reposted from "How To Make Your Pandas Loop 71803 Times Faster".

Conclusion - NumPy vectorization is the fastest

Speed comparison of five looping approaches

Standard loop

def soc_loop(leaguedf, TEAM):
    # Placeholder column; a string so the column dtype can hold the labels
    leaguedf['Draws'] = ''
    draws = leaguedf.columns.get_loc('Draws')
    for row in range(0, len(leaguedf)):
        if ((leaguedf['HomeTeam'].iloc[row] == TEAM) & (leaguedf['FTR'].iloc[row] == 'D')) | \
            ((leaguedf['AwayTeam'].iloc[row] == TEAM) & (leaguedf['FTR'].iloc[row] == 'D')):
            # Assign through the frame itself; chained ['Draws'].iloc[row] = ... may write to a copy
            leaguedf.iloc[row, draws] = 'Draw'
        elif ((leaguedf['HomeTeam'].iloc[row] == TEAM) & (leaguedf['FTR'].iloc[row] != 'D')) | \
            ((leaguedf['AwayTeam'].iloc[row] == TEAM) & (leaguedf['FTR'].iloc[row] != 'D')):
            leaguedf.iloc[row, draws] = 'No_Draw'
        else:
            leaguedf.iloc[row, draws] = 'No_Game'
    return leaguedf

Figure: The standard loop

Pandas built-in function iterrows() - 300x faster

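def soc_iter(TEAM, home, away, ftr):
    # Called once per row with scalar values:
    # TEAM, row['HomeTeam'], row['AwayTeam'], row['FTR']
    # Plain boolean tests; wrapping a condition in [...] builds a one-element
    # list, which is always truthy, so every row would come back as 'Draw'.
    if ((home == TEAM) and (ftr == 'D')) or ((away == TEAM) and (ftr == 'D')):
        result = 'Draw'
    elif ((home == TEAM) and (ftr != 'D')) or ((away == TEAM) and (ftr != 'D')):
        result = 'No_Draw'
    else:
        result = 'No_Game'
    return result

Figure: The Pandas Built-In Function: iterrows()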
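The function above only classifies a single match; the loop that feeds it rows is not shown. A minimal sketch of how it might be driven with iterrows() follows. The three-row DataFrame and the team name 'Arsenal' are made-up sample values, and the per-row function is restated here only so the snippet runs on its own.

```python
import pandas as pd

def soc_iter(TEAM, home, away, ftr):
    # Classify one match from TEAM's point of view (scalar inputs).
    if ((home == TEAM) and (ftr == 'D')) or ((away == TEAM) and (ftr == 'D')):
        return 'Draw'
    elif ((home == TEAM) and (ftr != 'D')) or ((away == TEAM) and (ftr != 'D')):
        return 'No_Draw'
    return 'No_Game'

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({
    'HomeTeam': ['Arsenal', 'Chelsea', 'Everton'],
    'AwayTeam': ['Chelsea', 'Arsenal', 'Fulham'],
    'FTR': ['D', 'H', 'A'],
})

# iterrows() yields (index, row) pairs, where row is a Series.
draw_series = []
for _, row in df.iterrows():
    draw_series.append(
        soc_iter('Arsenal', row['HomeTeam'], row['AwayTeam'], row['FTR'])
    )
df['Draws'] = draw_series
```

Collecting the results in a list and assigning the column once avoids writing into the DataFrame on every iteration.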

apply() method - 800x faster

Figure: The apply() Method
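No code survives for this step, so here is a rough sketch of what an apply()-based version could look like. It assumes the same per-row classifier as in the iterrows() step (restated so the snippet is self-contained) and uses made-up sample data; it is an illustration, not the original author's listing.

```python
import pandas as pd

def soc_iter(TEAM, home, away, ftr):
    # Classify one match from TEAM's point of view (scalar inputs).
    if ((home == TEAM) and (ftr == 'D')) or ((away == TEAM) and (ftr == 'D')):
        return 'Draw'
    elif ((home == TEAM) and (ftr != 'D')) or ((away == TEAM) and (ftr != 'D')):
        return 'No_Draw'
    return 'No_Game'

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({
    'HomeTeam': ['Arsenal', 'Chelsea', 'Everton'],
    'AwayTeam': ['Chelsea', 'Arsenal', 'Fulham'],
    'FTR': ['D', 'H', 'A'],
})

# axis=1 passes each row to the lambda as a Series.
df['Draws'] = df.apply(
    lambda row: soc_iter('Arsenal', row['HomeTeam'], row['AwayTeam'], row['FTR']),
    axis=1,
)
```

apply() still calls the Python function once per row, which is why it sits between an explicit loop and true vectorization in the comparison.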

Pandas vectorization - 9,000x faster

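def soc_iter(df, TEAM, home, away, ftr):
    # home, away and ftr are whole columns here, e.g.
    # soc_iter(df, TEAM, df['HomeTeam'], df['AwayTeam'], df['FTR'])
    df['Draws'] = 'No_Game'
    # Boolean masks select all matching rows at once
    df.loc[((home == TEAM) & (ftr == 'D')) | ((away == TEAM) & (ftr == 'D')), 'Draws'] = 'Draw'
    df.loc[((home == TEAM) & (ftr != 'D')) | ((away == TEAM) & (ftr != 'D')), 'Draws'] = 'No_Draw'
    return df

Figure: Pandas Vectorization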

NumPy vectorization - 70,000x faster

Figure: NumPy Vectorization
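The listing for this step is also missing. One way to sketch the idea is to pull the columns out as NumPy arrays and build the labels with np.where, so that no pandas index handling is involved. The function name soc_np and the sample data are hypothetical, and the original article may have structured its NumPy version differently.

```python
import numpy as np
import pandas as pd

def soc_np(TEAM, home, away, ftr):
    # home, away, ftr are NumPy arrays; every comparison is elementwise.
    played = (home == TEAM) | (away == TEAM)
    drawn = ftr == 'D'
    # Nested np.where: Draw if played and drawn, No_Draw if played, else No_Game.
    return np.where(played & drawn, 'Draw',
                    np.where(played, 'No_Draw', 'No_Game'))

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({
    'HomeTeam': ['Arsenal', 'Chelsea', 'Everton'],
    'AwayTeam': ['Chelsea', 'Arsenal', 'Fulham'],
    'FTR': ['D', 'H', 'A'],
})

# to_numpy() hands over the underlying values without the pandas index.
df['Draws'] = soc_np(
    'Arsenal',
    df['HomeTeam'].to_numpy(),
    df['AwayTeam'].to_numpy(),
    df['FTR'].to_numpy(),
)
```

Working on raw arrays skips the index alignment and type checks that pandas performs on Series, which is the usual explanation for the extra speed over the pandas-vectorized version.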