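Because of the GIL, multithreaded Python code can only use one processor at a time, but numpy is an exception:
http://scipy-cookbook.readthedocs.io/items/ParallelProgramming.html This article covers parallel computation with numpy. My understanding of it is summarized below: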
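numpy's own array operations can bypass the GIL
numpy's internals are written in C, and its array operations run in compiled code rather than in the Python interpreter, so they are not bound by the GIL while they run and can make use of multiple cores. numpy also uses BLAS (the Basic Linear Algebra Subroutines) internally, which speeds the computation up further.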
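Threads: numpy array operations, like IO, release the GIL
As I understand it, numpy keeps running even after it has released the GIL, because its computation does not depend on the interpreter. Meanwhile, other threads can use the interpreter, and if those threads also run numpy code, that numpy code releases the GIL in the same way.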
While a thread is waiting for IO (for you to type something, say, or for something to come in the network) python releases the GIL so other threads can run. And, more importantly for us, while numpy is doing an array operation, python also releases the GIL. Thus if you tell one thread to do (B and C are both numpy arrays):
>>> A = B + C
>>> print(A)
During the print operations and the % formatting operation, no other thread can execute. But during the A = B + C, another thread can run - and if you've written your code in a numpy style, much of the calculation will be done in a few array operations like A = B + C. Thus you can actually get a speedup from using multiple threads.
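To make the quoted point concrete, here is a minimal sketch of splitting one array operation across two threads. The function name `threaded_exp` and the chunking scheme are my own illustration, not code from the cookbook, and it assumes a 1-D float array. Whether it is actually faster depends on the array size and the machine, so no timings are claimed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_exp(x, n_threads=2):
    """Compute np.exp(x) for a 1-D float array, one slice per worker thread."""
    out = np.empty_like(x)
    # Slice boundaries that cut x into n_threads contiguous pieces.
    bounds = np.linspace(0, len(x), n_threads + 1, dtype=int)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # The loop inside np.exp runs in C; numpy can release the GIL there,
        # so the other thread is free to work on its own slice meanwhile.
        np.exp(x[lo:hi], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))  # list() waits for all slices
    return out


if __name__ == "__main__":
    data = np.ones(1_000_000)
    print(np.allclose(threaded_exp(data), np.exp(data)))
```

Each thread writes into its own slice of `out`, so no locking is needed.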
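Processes naturally solve the parallelism problem even better
numpy arrays can also be shared between processes; exactly how to share them is a topic for later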
It is possible to share memory between processes, including numpy arrays
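One way to do this on Python 3.8+ is the standard library's `multiprocessing.shared_memory`, which is not necessarily the mechanism the cookbook had in mind. The sketch below is a minimal illustration under that assumption. The helper names `create_shared_array` and `attach_shared_array` are hypothetical. To keep it short, both handles live in one process here; in real use the attach step would run inside a worker process that was given the block's name.

```python
from multiprocessing import shared_memory

import numpy as np


def create_shared_array(shape, dtype=np.float64):
    """Allocate a shared-memory block and wrap it in a numpy array."""
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr[:] = 0
    return shm, arr


def attach_shared_array(name, shape, dtype=np.float64):
    """Attach to an existing block by name (what a worker process would do)."""
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, arr


def demo():
    owner_shm, owner = create_shared_array((4,))
    worker_shm, view = attach_shared_array(owner_shm.name, (4,))
    view[:] = [1.0, 2.0, 3.0, 4.0]  # write through the second handle
    total = float(owner.sum())      # visible through the first, no copy made
    del owner, view                 # drop the array views before closing
    worker_shm.close()
    owner_shm.close()
    owner_shm.unlink()              # free the block once nobody needs it
    return total


if __name__ == "__main__":
    print(demo())
```

Only the block's name, the shape, and the dtype need to be passed to the other process; the array data itself is never pickled or copied.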
The final example is especially good:
Comparison
Here is a very basic comparison which illustrates the effect of the GIL (on a dual core machine).
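# timings.py
import math

import numpy as np


def f(x):
    # Pure-Python loop: holds the GIL the whole time
    print(x)
    y = [1] * 10000000
    [math.exp(i) for i in y]


def g(x):
    # One numpy array operation: releases the GIL while it computes
    print(x)
    y = np.ones(10000000)
    np.exp(y)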
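# IPython session; handythread.py is the helper module from the same cookbook page
from handythread import foreach
from multiprocessing import Pool  # the original imports "processing", the module's old name
from timings import f, g


def fornorm(f, l):
    # Plain serial loop, used as the baseline
    for i in l:
        f(i)


%time fornorm(g, range(100))
%time fornorm(f, range(10))
%time foreach(g, range(100), threads=2)
%time foreach(f, range(10), threads=2)
p = Pool(2)
%time p.map(g, range(100))
%time p.map(f, range(10))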
| | 100 * g() | 10 * f() |
|---|---|---|
| normal | 43.5s | 48s |
| 2 threads | 31s | 71.5s |
| 2 processes | 27s | 31.23s |
For function f(), which does not release the GIL, threading actually performs worse than serial code, presumably due to the overhead of context switching. However, using 2 processes does provide a significant speedup. For function g(), which uses numpy and releases the GIL, both threads and processes provide a significant speedup, although multiprocessing is slightly faster.
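For readers without `handythread`, the serial-versus-threads half of this comparison can be approximated with the standard library alone. This is a rough sketch of my own, not the cookbook's script: `compare` and `timed` are hypothetical helpers, and the problem size `N` is far smaller than the 10,000,000 elements used above so that it finishes quickly. Actual timings vary by machine, so none are shown.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np

N = 200_000  # assumption: much smaller than the quoted benchmark


def f(x):
    # Pure-Python loop: holds the GIL throughout
    return sum(math.exp(1.0) for _ in range(N))


def g(x):
    # Single numpy array operation
    return float(np.exp(np.ones(N)).sum())


def timed(run):
    """Return the wall-clock seconds taken by calling run()."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def compare(func, calls=4, threads=2):
    """Time `calls` invocations of func serially and on a thread pool."""
    serial = timed(lambda: [func(i) for i in range(calls)])
    with ThreadPoolExecutor(max_workers=threads) as pool:
        threaded = timed(lambda: list(pool.map(func, range(calls))))
    return {"serial": serial, "threads": threaded}


if __name__ == "__main__":
    for func in (f, g):
        print(func.__name__, compare(func))
```

The process column of the table could be reproduced in the same shape by swapping in `concurrent.futures.ProcessPoolExecutor`, with the call kept under the `if __name__ == "__main__":` guard.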
I wrote a similar example of my own that can be run directly (python3.6): https://gist.github.com/miniyk2012/4a2edf98493d91c60af06232b6c69582
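Note:
The article above assumes that numpy cannot use multiple cores by itself, so Python threads have to be written to get numpy running on multiple cores.
In fact, numpy itself can also use multiple cores; see this article: https://roman-kh.github.io/numpy-multicore/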
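A quick way to check this on your own machine is sketched below. It assumes numpy is linked against a multithreaded BLAS such as OpenBLAS or MKL, which is common but not guaranteed, and that the BLAS honors the `OMP_NUM_THREADS` environment variable. The function name `blas_matmul` is my own.

```python
import os

# Assumption: the BLAS library reads its thread-count setting when it is
# loaded, so the variable has to be set before numpy is first imported.
os.environ.setdefault("OMP_NUM_THREADS", "2")

import numpy as np


def blas_matmul(n=500, seed=0):
    """Multiply two random n x n float64 matrices."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    return a @ b  # float64 matrix products are handed to the BLAS library


if __name__ == "__main__":
    np.show_config()  # prints which BLAS/LAPACK this numpy build uses
    print(blas_matmul().shape)
```

Watching CPU usage while a large product runs shows whether more than one core is busy, with no Python threads involved.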