Speed Up Your Python Scripts

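I recently had to write some nested for loops, and because the workload was fairly heavy, they took a long time to run. So I searched Google for ways to speed up Python for loops and found a very handy module: Numba.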


Numba makes Python code fast

Official site: http://numba.pydata.org/

If you have not installed it yet, you can run pip install numba --user. If you already have Anaconda3, the python3 that ships with it already includes this module.

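Tip: managing modules and software with Anaconda takes care of environment conflicts and saves time and effort. Here is a short installation guide for Linux: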

# download from tsinghua mirror site
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-5.3.1-Linux-x86_64.sh
# check the help message
bash Anaconda3-5.3.1-Linux-x86_64.sh -h
# install to the default location, or add -p to install into a custom directory that does not exist yet
bash Anaconda3-5.3.1-Linux-x86_64.sh
# make conda available in new shells (adjust the path to your own install location)
echo ". /home/saber/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc

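Numba is easy to use; typically you use it to speed up a single function. To speed up a function x, just add the decorator @jit on the line before its def (a single line of code).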

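The small example below, written by the author, sums the integers from a1 up to (but not including) a2 and uses the time module to measure each function's run time: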

from numba import jit
import time

# define func_A without Numba
def func_A(a1, a2):
    A_result = 0
    for i in range(a1, a2):
        A_result += i
    return A_result

# define func_A1 with Numba:
# the only change is the @jit decorator
@jit
def func_A1(a1, a2):
    A1_result = 0
    for i in range(a1, a2):
        A1_result += i
    return A1_result

# record the elapsed time of one call
def time_func(func_A_i, *args):
    start = time.time()
    func_A_i(*args)
    end = time.time()
    print("Elapsed time of func %s is %.4e" % (func_A_i.__name__, end - start))


time_func(func_A, 1, 10000000)
time_func(func_A, 1, 10000000)
print()
time_func(func_A1, 1, 10000000)
time_func(func_A1, 1, 10000000)

As you can see, the bodies of the two functions are identical; the only real difference is the @jit line added before func_A1.

The output is as follows:

Elapsed time of func func_A is 5.4757e-01
Elapsed time of func func_A is 5.3267e-01

Elapsed time of func func_A1 is 5.3686e-02
Elapsed time of func func_A1 is 4.7684e-06

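Attentive readers may have noticed that I ran each function twice. The two func_A timings are nearly identical, while the second func_A1 call is about four orders of magnitude faster than the first. The first call includes the time Numba spends compiling the function, so only the second timing reflects the execution time of the compiled function.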
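Because the first call carries the compilation cost, it can be convenient to time it separately from later calls. The helper below is one possible sketch, not part of the original script: the names benchmark, repeats, and sum_range are hypothetical, and time.perf_counter is used instead of time.time because it is intended for measuring short intervals. A plain Python function stands in here so the snippet runs without Numba; passing a @jit function such as func_A1 should show the gap between the first call and later ones.

```python
import time

def benchmark(func, *args, repeats=5):
    """Return (first_call_seconds, best_later_seconds) for func(*args)."""
    start = time.perf_counter()
    func(*args)  # first call: includes JIT compilation, if there is any
    first = time.perf_counter() - start

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)  # later calls: run the already prepared function
        best = min(best, time.perf_counter() - start)
    return first, best

# Hypothetical stand-in for func_A / func_A1 from the article
def sum_range(a1, a2):
    total = 0
    for i in range(a1, a2):
        total += i
    return total

first, best = benchmark(sum_range, 1, 100000)
print("first call: %.4e s, best of later calls: %.4e s" % (first, best))
```

Reporting the best of several later calls reduces the influence of one-off system noise on the measurement.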

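Put simply, the first time a Numba-decorated function is called, Numba translates it into faster machine code. This compilation step takes some time. Numba then caches the compiled version, and the next time the function is called with arguments of the same types it reuses that version directly. The official explanation reads: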

First, recall that Numba has to compile your function for the argument types given before it executes the machine code version of your function, this takes time. However, once the compilation has taken place Numba caches the machine code version of your function for the particular types of arguments presented. If it is called again the with same types, it can reuse the cached version instead of having to compile again.
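The per-type behaviour described in the quote can be observed with a small experiment. The sketch below assumes a toy function named add (hypothetical, not from the article) and falls back to a no-op decorator when Numba is not installed, so the fallback is an assumption made purely to keep the snippet runnable. With Numba present, the signatures attribute of the compiled function lists one entry per combination of argument types seen so far.

```python
try:
    from numba import jit
except ImportError:
    # Assumption: no-op stand-in so the sketch still runs without Numba
    def jit(func):
        return func

@jit
def add(a, b):
    return a + b

int_result = add(1, 2)        # first call with integers: compiled for integer arguments
float_result = add(1.0, 2.0)  # new argument types: compiled again for floats
add(3, 4)                     # same types as the first call: compiled version is reused

# Numba functions expose the signatures compiled so far; the stand-in does not
if hasattr(add, "signatures"):
    print(add.signatures)
```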

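So overall, the speedup from Numba is substantial. In this example even the first call, compilation included, is roughly ten times faster than the plain Python version, and later calls are faster still. That is good news for anyone who needs to speed up a Python script.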
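Since nested for loops were the original motivation, here is a minimal sketch of the same idea applied to a double loop. The function sum_of_products is a made-up example, not the author's actual workload. It uses njit, Numba's shorthand for compiling in nopython mode, and again falls back to a no-op decorator (an assumption for portability) when Numba is missing.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: no-op stand-in so the sketch still runs without Numba
    def njit(func):
        return func

@njit
def sum_of_products(n):
    """Sum i * j over all pairs with 0 <= i < n and 0 <= j < n."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

print(sum_of_products(100))
```

The loop body is written exactly as it would be in plain Python; only the decorator changes, which matches the usage shown with func_A1.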

