Multiprocessing and Multithreading in Python

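1. Processes and Threads

A process is the basic unit of system resource allocation.

A thread is the basic unit of independent execution and scheduling.

When a program starts running, it becomes a process.

Without multithreaded programming, a process has exactly one thread, the main thread. With multithreaded programming, a process contains several threads (including the main thread).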
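
As a small illustration of the relationship described above, the sketch below starts two threads and has each one record the process ID it sees together with its own thread name. The names `report`, `records` and `worker-N` are hypothetical, chosen only for this example.

```python
import os
import threading

records = []  # (pid, thread name) pairs, one per call to report()


def report() -> None:
    # Every thread inside one process sees the same process ID.
    records.append((os.getpid(), threading.current_thread().name))


report()  # called from the thread that runs this script
workers = [threading.Thread(target=report, name=f"worker-{i}") for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

for pid, name in records:
    print(f"pid={pid} thread={name}")
```

All three lines should show the same pid but different thread names, which is what "one process, several threads" means in practice.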

2. The GIL (Global Interpreter Lock)

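Purpose: to stop threads from executing simultaneously, so that only one thread is executing Python code at any given moment. (The GIL is a feature of CPython, the standard interpreter.)

Result: Python multithreading is often called "fake" multithreading. A thread can run only after it has won the GIL, so the threads of a Python program take turns executing rather than running at the same time on several CPU cores.

Reason: the threads of a process share data. When several threads access the same data they compete for it, and the data may be touched by more than one thread at once and end up corrupted. This is what makes code thread-unsafe. The simplest way for the interpreter to keep its shared state consistent across threads is a single global lock, the GIL. The price is Python's "fake" multithreading: the threads may be scheduled on different CPU cores, but at any instant only one core is actually doing work for them.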
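
The race described above can be made concrete with a shared counter. Note that the GIL does not make a compound operation such as `value += 1` atomic, so application data generally still needs its own lock. The following is a minimal sketch, not a definitive pattern; the `Counter` class and `add_many` function are hypothetical names introduced for illustration.

```python
import threading


class Counter:
    """A shared counter protected by an explicit lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # "value += 1" is a read-modify-write sequence, so it is guarded here.
        with self._lock:
            self.value += 1


def add_many(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=add_many, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 4 threads x 10_000 increments
```

With the lock in place the final value is always 40000. Removing the `with self._lock:` line may lose updates, depending on the interpreter version and timing.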

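Summary:

(a) Because of the GIL, multithreading performs well mainly in I/O-bound scenarios, since a thread releases the GIL while it is blocked on I/O.

(b) Multiprocessing in Python can take advantage of multiple CPU cores.

(c) The GIL is a global interpreter lock. Every thread must acquire it before it can execute, which guarantees that only one thread is executing Python code at any moment.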
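
Point (a) can be checked with a rough timing experiment. In the sketch below, `time.sleep` stands in for a blocking I/O call (an assumption made to keep the example self-contained), and `fake_io` is a hypothetical name. The exact timings will vary from machine to machine.

```python
import threading
import time


def fake_io(delay: float) -> None:
    # time.sleep stands in for blocking I/O; the GIL is released while sleeping.
    time.sleep(delay)


delays = [0.2, 0.2, 0.2, 0.2]

# Run the four "I/O calls" one after another.
start = time.perf_counter()
for d in delays:
    fake_io(d)
sequential = time.perf_counter() - start

# Run the same four calls in four threads.
start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(d,)) for d in delays]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The sequential run takes roughly the sum of the delays, while the threaded run takes roughly the longest single delay, because the waits overlap.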

3. Multithreading in Python

(a) Creating threads with threading.Thread()

import time
import threading

def printNumber(n: int) -> None:
    while True:
        print(n)
        time.sleep(n)

for i in range(1, 3):
    t = threading.Thread(target=printNumber, args=(i, ))
    t.start()

(b) Creating threads by subclassing threading.Thread

import time
import threading

class MyThread(threading.Thread):

    def __init__(self, n):
        self.n = n
        # Note: the parent class initializer must be called, otherwise the thread cannot be created
        super().__init__()

    def run(self) -> None:
        while True:
            print(self.n)
            time.sleep(self.n)

for i in range(1, 3):
    t = MyThread(i)
    t.start()

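(c) The main thread and child threads
A process always has at least one thread, the main thread. When the program reaches t.start() for the first time it creates a child thread, so the number of active threads becomes 2. When t.start() runs a second time the program creates another child thread, so the final number of active threads is 3.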

import time
import threading

class MyThread(threading.Thread):

    def __init__(self, n):
        self.n = n
        super().__init__()

    def run(self) -> None:
        while True:
            _count = threading.active_count()
            print(self.n, f"active thread count: {_count}")
            time.sleep(self.n)

for i in range(1, 3):
    t = MyThread(i)
    t.start()
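
The example above runs forever, which makes the counts hard to inspect. The variation below is a sketch that uses threads which finish on request, so the active thread count can be sampled before, during and after their lifetime. The names `release` and `wait_for_release` are hypothetical, and a `threading.Event` is used here only to keep the workers alive for a controlled period.

```python
import threading

release = threading.Event()  # workers stay alive until this is set


def wait_for_release() -> None:
    release.wait()


before = threading.active_count()

workers = [threading.Thread(target=wait_for_release) for _ in range(2)]
for w in workers:
    w.start()
during = threading.active_count()  # two more threads than before

release.set()  # let both workers finish
for w in workers:
    w.join()
after = threading.active_count()  # back to the starting count

print(f"before={before} during={during} after={after}")
print([t.name for t in threading.enumerate()])
```

Run as a plain script, this typically reports 1, 3 and 1, matching the count of "main thread plus two child threads" described above.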

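(d) Daemon threads
A daemon thread, also called a background thread, exists to serve other threads. Once the other threads are gone, a daemon thread has no reason to exist, so daemon threads die when the last non-daemon thread dies. Threads created from the main thread are non-daemon by default. A thread can be made a daemon by setting its daemon attribute to True before it is started (the older setDaemon(True) call does the same thing but has been deprecated since Python 3.10).
In the example below, the main thread can finish as soon as it has executed the final print call. At that point the two child threads, both marked as daemons, are killed and the program exits.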

import time
import threading

class MyThread(threading.Thread):

    def __init__(self, n):
        self.n = n
        super().__init__()

    def run(self) -> None:
        while True:
            _count = threading.active_count()
            print(self.n, f"active thread count: {_count}")
            time.sleep(self.n)

for i in range(1, 3):
    t = MyThread(i)
    t.daemon = True  # mark as a daemon thread (replaces the deprecated t.setDaemon(True))
    t.start()
print("Done!")

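If only one of the two child threads is made a daemon and the other is left as a non-daemon thread, the non-daemon thread keeps running in the foreground. The daemon thread therefore still has something to serve, and it keeps running as well.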

import time
import threading

class MyThread(threading.Thread):

    def __init__(self, n):
        self.n = n
        super().__init__()

    def run(self) -> None:
        while True:
            _count = threading.active_count()
            print(self.n, f"active thread count: {_count}")
            time.sleep(self.n)

for i in range(1, 3):
    t = MyThread(i)
    if i == 1:
        t.daemon = True  # make only one of the threads a daemon (replaces the deprecated t.setDaemon(True))
    t.start()
print("Done!")
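
The daemon flag can also be passed directly to the `threading.Thread` constructor. The sketch below pairs an endless daemon thread with a short non-daemon one, and shows that the daemon is still alive when the foreground work ends. The names `heartbeat`, `finite_job` and `ticks` are hypothetical, and the daemon deliberately avoids printing so that nothing is writing to stdout when the interpreter shuts down.

```python
import threading
import time

ticks = []  # timestamps recorded by the background thread


def heartbeat() -> None:
    # Loops forever; acceptable only because the thread is a daemon.
    while True:
        ticks.append(time.monotonic())
        time.sleep(0.05)


def finite_job() -> None:
    time.sleep(0.2)


background = threading.Thread(target=heartbeat, daemon=True)
foreground = threading.Thread(target=finite_job)  # inherits the creator's daemon flag
background.start()
foreground.start()

foreground.join()
print(f"daemon flags: {background.daemon}, {foreground.daemon}")
print(f"heartbeat alive: {background.is_alive()}, ticks so far: {len(ticks)}")
# Once the main thread ends, no non-daemon thread is left and the daemon is stopped.
```

Because daemon threads are stopped abruptly at exit, they are a poor fit for work that must release resources or flush data cleanly.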

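(e) The join() method
join() blocks the calling thread (here the main thread) until the thread whose join() method was called has finished. An optional timeout argument limits how long the caller waits.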

import time
import threading

class MyThread(threading.Thread):

    def __init__(self, n):
        self.n = n
        super().__init__()

    def run(self) -> None:
        while True:
            _count = threading.active_count()
            print(f"thread-{self.n}", f"active thread count: {_count}")
            time.sleep(self.n)

for i in range(1, 3):
    t = MyThread(i)
    t.start()
    t.join(3)
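
One detail worth knowing about the timeout: `join()` always returns `None`, so the only way to tell whether the wait timed out is to call `is_alive()` afterwards. A minimal sketch, with `slow_task` as a hypothetical stand-in for real work:

```python
import threading
import time


def slow_task() -> None:
    time.sleep(0.5)


t = threading.Thread(target=slow_task)
t.start()

t.join(timeout=0.1)       # gives up waiting after about 0.1 s
timed_out = t.is_alive()  # True means the timeout expired first
print(f"still running after timeout: {timed_out}")

t.join()                  # no timeout: block until the thread really ends
finished = not t.is_alive()
print(f"finished after second join: {finished}")
```

Calling `join()` more than once on the same thread is allowed, which is what makes the "wait briefly, check, wait again" pattern possible.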

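4. Multiprocessing in Python

The multiprocessing module offers both local and remote concurrency. Because it uses subprocesses instead of threads, it effectively sidesteps the Global Interpreter Lock (GIL), which lets the programmer make full use of the multiple processors on a machine.
A simple example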

# importing the multiprocessing module 
import multiprocessing 

def print_cube(num): 
    print("Cube: {}".format(num * num * num)) 

def print_square(num): 
    print("Square: {}".format(num * num)) 

if __name__ == "__main__": 
    # creating processes 
    p1 = multiprocessing.Process(target=print_square, args=(10, )) 
    p2 = multiprocessing.Process(target=print_cube, args=(10, )) 

    # starting process 1&2
    p1.start() 
    p2.start() 

    # wait until process 1&2 is finished 
    p1.join() 
    p2.join() 

    # both processes finished 
    print("Done!")

Passing arguments to a process (args must be a tuple, hence the trailing comma for a single argument)

p1 = multiprocessing.Process(target=print_square, args=(10, ))
p2 = multiprocessing.Process(target=print_cube, args=(10, ))

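Multiprocessing with a Pool

from multiprocessing import Pool
import os
import time

def sub_process(n):
    time.sleep(1)
    print('Task %s is running in process %s' % (n, os.getpid()))

# The guard is required on platforms that start workers by re-importing this module
if __name__ == "__main__":
    # Maximum number of worker processes running at the same time
    p = Pool(4)
    for i in range(5):
        p.apply_async(sub_process, args=(i + 1,))
    # No more tasks can be submitted after close()
    p.close()
    # Wait for all worker processes to finish
    p.join()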
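
The Pool example above discards whatever the workers return, and an exception raised inside a worker would go unnoticed. A sketch of how results might be collected is shown below: `apply_async` returns an `AsyncResult`, and calling `get()` on it waits for the value and re-raises any error from the worker. The `square` and `main` functions are hypothetical names used for illustration.

```python
from multiprocessing import Pool


def square(n: int) -> int:
    return n * n


def main() -> None:
    # The context manager tears the pool down when the block exits.
    with Pool(processes=4) as pool:
        # One AsyncResult per submitted task; get() blocks until it is ready.
        pending = [pool.apply_async(square, args=(i,)) for i in range(5)]
        results = [r.get(timeout=10) for r in pending]

        # map() does the same in one call and preserves input order.
        mapped = pool.map(square, range(5))

    print(results)  # [0, 1, 4, 9, 16]
    print(mapped)   # [0, 1, 4, 9, 16]


if __name__ == "__main__":
    main()
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes.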
