Advanced Python Topics (Part 8)

Thread Synchronization: Introduction to Condition

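Another important tool in multithreading is the condition variable, Condition.
A Condition is a lock used in Python multithreaded programming for more complex communication between threads.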

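import threading

cond = threading.Condition()

# API overview only: in real code, notify() and wait() are called from different threads
with cond:
    cond.notify()  # wake one thread waiting on cond
    cond.wait()    # release the lock and block until notified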

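A Condition involves two layers of locks. The underlying lock is released when a thread calls wait(). On top of that, every call to wait() allocates a new lock and places it in the Condition's waiter queue, where the thread blocks until a call to notify() wakes it. Once woken, the thread re-acquires the underlying lock before wait() returns.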

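For more details on Condition, please consult other references. (The author has not yet fully worked out how it operates internally; apologies.)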
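As a rough illustration of the wait/notify handshake described above, here is a minimal sketch with one producer thread and one consumer thread. The names `run_demo`, `messages`, and `received` are made up for this example and are not from the original.

```python
import threading


def run_demo():
    cond = threading.Condition()
    messages = []   # shared state, only touched while holding cond
    received = []

    def consumer():
        with cond:
            # wait_for() releases the lock while blocked and re-checks the
            # predicate after each wake-up, so it is safe even if the
            # producer has already run before the consumer gets here
            cond.wait_for(lambda: len(messages) > 0)
            received.append(messages.pop(0))

    def producer():
        with cond:
            messages.append("hello")
            cond.notify()  # wake one thread waiting on cond

    t_consumer = threading.Thread(target=consumer)
    t_producer = threading.Thread(target=producer)
    t_consumer.start()
    t_producer.start()
    t_consumer.join()
    t_producer.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

Using wait_for() with a predicate instead of a bare wait() is a design choice here: it avoids a missed notification when the producer happens to run first.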

Thread Synchronization: Introduction to Semaphore

Semaphore.
A Semaphore is a lock that limits how many threads may enter a section of code at the same time.

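Take file access as an example: writing is usually restricted to a single thread, while reading can be allowed for several threads at once.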
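To make the idea of limiting the number of threads inside a section concrete, here is a minimal sketch. It assumes six worker threads and a limit of two; the names `run_demo`, `worker`, and `peak` are illustrative only, and time.sleep stands in for real work.

```python
import threading
import time


def run_demo(num_workers=6, max_concurrent=2):
    sem = threading.Semaphore(max_concurrent)
    state_lock = threading.Lock()  # protects the two counters below
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with sem:  # blocks while max_concurrent threads are already inside
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.05)  # stand-in for real work
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak  # highest number of threads seen inside the section


if __name__ == "__main__":
    print("peak concurrency:", run_demo())
```

With max_concurrent=2, the reported peak should never exceed 2, no matter how many workers are started.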

ThreadPoolExecutor Thread Pool

Multithreading vs. Multiprocessing

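For computation-heavy, CPU-bound work, use multiple processes. In CPython, the GIL lets only one thread execute Python bytecode at a time, so threads do not speed up this kind of work.
For IO-bound work, use multiple threads.

Switching between processes costs more than switching between threads, so prefer threads whenever they are sufficient.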

A CPU-intensive operation:

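import time
from concurrent.futures import ProcessPoolExecutor, as_completed


def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    # CPU-bound work, so a process pool is used here;
    # swap in ThreadPoolExecutor to compare the timings
    with ProcessPoolExecutor(3) as executor:
        start_time = time.time()
        all_task = [executor.submit(fib, num) for num in range(1, 10)]
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {}".format(data))

        print("elapsed time: {}".format(time.time() - start_time))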

Simulating an IO operation:

import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def random_sleep(n):
    time.sleep(n)
    return n


if __name__ == "__main__":
    # IO-bound work, so a thread pool is used here;
    # swap in ProcessPoolExecutor to compare the timings
    with ThreadPoolExecutor(3) as executor:
        start_time = time.time()
        all_task = [executor.submit(random_sleep, num) for num in [2] * 30]
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {}".format(data))

        print("elapsed time: {}".format(time.time() - start_time))

multiprocessing: Multiple Processes

A child process can be created with os.fork. fork is only available on Linux/Unix.

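import os
import time

# fork() creates a child process; it is only available on Linux/Unix
pid = os.fork()
print("a")
if pid == 0:
    print("Child process id: {}, parent process id: {}.".format(os.getpid(), os.getppid()))
else:
    print("I am the parent process; the child I forked has id: {}.".format(pid))

time.sleep(2)

Output:
a
I am the parent process; the child I forked has id: 3093.
a
Child process id: 3093, parent process id: 3092.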

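The output shows "a" printed twice. Once pid = os.fork() has executed, a child process exists. It holds a copy of the parent's data and continues from the statement after the fork, so the parent prints "a" once and the child prints it once more.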

Multiprocess programming:

import multiprocessing
import time


# Multiprocess programming
def get_html(n):
    time.sleep(n)
    print("sub_progress success")
    return n


if __name__ == "__main__":
    progress = multiprocessing.Process(target=get_html, args=(2,))
    print(progress.pid)  # None: the process has not been started yet
    progress.start()
    print(progress.pid)  # the child's pid, available once it has started
    progress.join()  # wait for the child to finish
    print("main progress end")

Using a process pool:

import multiprocessing
import time


# Multiprocess programming
def get_html(n):
    time.sleep(n)
    print("sub_progress success")
    return n


if __name__ == "__main__":
    # Use a process pool with one worker per CPU core
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    result = pool.apply_async(get_html, args=(3,))

    # Stop accepting new tasks, then wait for all tasks to finish
    pool.close()
    pool.join()

    print(result.get())

Another way to use a process pool:

import multiprocessing
import time


# Multiprocess programming
def get_html(n):
    time.sleep(n)
    print("sub_progress success")
    return n


if __name__ == "__main__":
    # Use a process pool; imap_unordered yields results as tasks finish
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    for result in pool.imap_unordered(get_html, [1, 5, 3]):
        print("{} sleep success".format(result))

Inter-Process Communication: Queue, Pipe, Manager

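Sharing global variables does not work across processes, although it does work across threads: each process has its own memory space, so a change made in one process is not visible in another.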
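The point about globals can be checked with a small experiment. The sketch below is only an illustration: the names `counter`, `increment`, and `run_demo` are made up for this example, and the "fork" start method is assumed so that the sketch stays self-contained on Linux/Unix. A shared Value is used solely to report back what the child saw.

```python
import multiprocessing

counter = 0  # module-level global, copied into the child on fork


def increment(child_view):
    global counter
    counter += 1                # changes only the child's own copy
    child_view.value = counter  # report the child's value to the parent


def run_demo():
    # Assumption: the "fork" start method is available (Linux/Unix)
    ctx = multiprocessing.get_context("fork")
    child_view = ctx.Value("i", -1)  # shared integer for reporting only
    child = ctx.Process(target=increment, args=(child_view,))
    child.start()
    child.join()
    # The parent's global is untouched even though the child incremented it
    return counter, child_view.value


if __name__ == "__main__":
    parent_counter, child_counter = run_demo()
    print("parent sees {}, child saw {}".format(parent_counter, child_counter))
```

The child increments its own copy of the global, while the parent's copy keeps its original value, which is why an explicit communication channel is needed between processes.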

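Inter-process communication resembles inter-thread communication in some ways and differs in others. The key difference is that the communication classes and synchronization locks used earlier for threads cannot be used between processes.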

Using the Queue from multiprocessing for inter-process communication

import time
from multiprocessing import Process, Queue, Pool, Manager, Pipe


def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()

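Be sure to use the Queue from multiprocessing. The queue module obtained with import queue will not work here, because it is designed for threads within a single process.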

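Inter-process communication in a Pool requires the Manager's Queue

The Queue from multiprocessing cannot be used with a Pool. Processes in a Pool must communicate through the Queue created by a Manager.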

import time
from multiprocessing import Manager, Pool


def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    # A Queue created by a Manager can be passed to Pool workers
    queue = Manager().Queue(10)
    pool = Pool(2)

    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))

    pool.close()
    pool.join()

Using a Manager so that several processes can modify the same variable:

from multiprocessing import Manager, Process


def add_data(p_dict, key, value):
    p_dict[key] = value


if __name__ == "__main__":
    # Manager().dict() returns a proxy that several processes can modify
    progress_dict = Manager().dict()

    first_progress = Process(target=add_data, args=(progress_dict, "a", 22))
    second_progress = Process(target=add_data, args=(progress_dict, "b", 23))

    first_progress.start()
    second_progress.start()
    first_progress.join()
    second_progress.join()

    print(progress_dict)

As shown, the two processes each fill in a value in the same dict, and the main process prints the final dict.

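Inter-process communication through a Pipe:

A Pipe performs better than a Queue.
A Pipe connects exactly two endpoints, so it only suits communication between two processes.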

from multiprocessing import Pipe, Process


def producer(pipe):
    pipe.send("a")

def consumer(pipe):
    print(pipe.recv())


if __name__ == "__main__":
    receive_pipe, send_pipe = Pipe()
    # A Pipe only suits communication between two processes
    my_producer = Process(target=producer, args=(send_pipe,))
    my_consumer = Process(target=consumer, args=(receive_pipe,))

    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()

The value "a" sent by the my_producer process is received and printed correctly by the my_consumer process.
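As a side note on the two ends returned by Pipe(), the following sketch exercises both connection objects inside a single process, purely to keep it short; in practice each end would be handed to a different process. The name `run_demo` and the variable names are illustrative only.

```python
from multiprocessing import Pipe


def run_demo():
    # By default Pipe() is duplex: both ends can send and receive
    left, right = Pipe()
    left.send("ping")
    first = right.recv()
    right.send("pong")
    second = left.recv()

    # With duplex=False, the first end can only receive
    # and the second end can only send
    recv_end, send_end = Pipe(duplex=False)
    send_end.send({"n": 1})
    third = recv_end.recv()

    for conn in (left, right, recv_end, send_end):
        conn.close()
    return first, second, third


if __name__ == "__main__":
    print(run_demo())
```

Any picklable object can be passed through send(), which is why a dict works here just as the string does.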
