SHELL的并行编程:通过启用多个并行的后台子进程,实现任务的并行处理。
并发编程的模式:
- 简单模式
- 批处理模式
- 轮询模式
- 队列模式
简单模式
将多个任务放在后台,以子进程的方式进行执行。可以看成是简单模式的并发编程。
#!/bin/bash
function log() {
echo "$(date '+%Y-%m-%d %H:%M:%S')" $@
}
function do_working() {
log "I am working for $1 seconds."
sleep $1
log "I am working ok."
}
function main() {
for ((i=0; i<5; ++i))
do
do_working $i &
done
wait
}
main
运行结果:
2019-08-28 21:33:10 I am working for 0 seconds.
2019-08-28 21:33:10 I am working for 1 seconds.
2019-08-28 21:33:10 I am working for 4 seconds.
2019-08-28 21:33:10 I am working for 2 seconds.
2019-08-28 21:33:10 I am working for 3 seconds.
2019-08-28 21:33:10 I am working ok.
2019-08-28 21:33:11 I am working ok.
2019-08-28 21:33:12 I am working ok.
2019-08-28 21:33:13 I am working ok.
2019-08-28 21:33:14 I am working ok.
优势:编程简单、直观,对于少量的后台并发任务,可以直接使用该并发模式。
劣势:如果并发任务的数量较大,这种模式会存在消耗过多的系统资源,拖慢系统的运行。过多的资源消耗,也会让并发任务失败。
批处理模式
#!/bin/bash
function log() {
echo "$(date '+%Y-%m-%d %H:%M:%S')" $@
}
function do_working() {
log "I am working for $1 seconds."
sleep $1
log "I am working ok."
}
function main() {
local i=1
while ((i<=6))
do
for ((j=0; j<2; ++j))
do
do_working $i &
i=$(($i+1))
done
wait
done
wait
}
main
运行结果:
2019-08-28 21:40:31 I am working for 1 seconds.
2019-08-28 21:40:31 I am working for 2 seconds.
2019-08-28 21:40:32 I am working ok.
2019-08-28 21:40:33 I am working ok.
2019-08-28 21:40:33 I am working for 3 seconds.
2019-08-28 21:40:33 I am working for 4 seconds.
2019-08-28 21:40:36 I am working ok.
2019-08-28 21:40:37 I am working ok.
2019-08-28 21:40:37 I am working for 6 seconds.
2019-08-28 21:40:37 I am working for 5 seconds.
2019-08-28 21:40:42 I am working ok.
2019-08-28 21:40:43 I am working ok.
优势:编程简单、直观,对于并发任务的耗时相差不多时,可以直接使用该并发模式。
劣势:每一批次的并发任务,是按照耗时最长的那个任务计算的。如果任务的耗时相差较大,那么使用这种方式会明显的具有“木桶效应”带来的性能降低问题。
轮询模式
#!/bin/bash
function log() {
echo "$(date '+%Y-%m-%d %H:%M:%S')" $@
}
function do_working() {
log "I am working for $1 seconds."
sleep $1
log "I am working ok."
}
function main() {
local i=1
while ((i<=6))
do
local joblist=($(jobs -p))
while ((${#joblist[*]} >= 2))
do
sleep 0.1
joblist=($(jobs -p))
done
do_working $i &
i=$(($i+1))
done
wait
}
main
运行输出:
2019-08-28 21:48:42 I am working for 1 seconds.
2019-08-28 21:48:42 I am working for 2 seconds.
2019-08-28 21:48:43 I am working ok.
2019-08-28 21:48:43 I am working for 3 seconds.
2019-08-28 21:48:44 I am working ok.
2019-08-28 21:48:44 I am working for 4 seconds.
2019-08-28 21:48:46 I am working ok.
2019-08-28 21:48:46 I am working for 5 seconds.
2019-08-28 21:48:48 I am working ok.
2019-08-28 21:48:48 I am working for 6 seconds.
2019-08-28 21:48:51 I am working ok.
2019-08-28 21:48:54 I am working ok.
优势:避免了并发任务运行时间的差异问题,对于性能要求不是特别严格的场景,轮询/睡眠的方式可以解决绝大部分的问题,且这种模式编码简单,逻辑直观。
劣势:轮询的方式检查后台子进程的数量,存在一定的性能损耗。同时,jobs -p
并没有区分子进程的类型。若父进程启动的后台进程中,存在不同类型的任务,那么这种方式就不太适合了。
队列模式
#!/bin/bash
Q_FD=8
Q_SIZE=2
function fifo_init() {
mkfifo $$.fifo.tmp
eval exec "${Q_FD}""<>$$.fifo.tmp"
rm -rf $$.fifo.tmp
trap fifo_clean SIGINT SIGHUP SIGQUIT SIGKILL
for ((i=0; i<${Q_SIZE}; ++i))
do
fifo_release
done
}
function fifo_free() {
log "fifo free ok"
exec 3<&- #关闭文件描述符的读
exec 3>&- #关闭文件描述符的写
}
function fifo_clean() {
log "fifo clean ok"
exec 3<&- #关闭文件描述符的读
exec 3>&- #关闭文件描述符的写
kill 0
kill 9 $$
}
function fifo_acquire() {
read -u${Q_FD}
}
function fifo_release() {
echo >&${Q_FD}
}
function log() {
echo "$(date '+%Y-%m-%d %H:%M:%S')" $@
}
function do_working() {
log "I am working for $1 seconds."
sleep $1
log "I am working ok for $1 seconds."
fifo_release
}
function main() {
fifo_init
for ((i=1; i<=6; ++i))
do
fifo_acquire
do_working $i &
done
wait
fifo_free
}
main
运行结果:
2019-08-28 22:10:16 I am working for 1 seconds.
2019-08-28 22:10:16 I am working for 2 seconds.
2019-08-28 22:10:17 I am working ok for 1 seconds.
2019-08-28 22:10:17 I am working for 3 seconds.
2019-08-28 22:10:18 I am working ok for 2 seconds.
2019-08-28 22:10:18 I am working for 4 seconds.
2019-08-28 22:10:20 I am working ok for 3 seconds.
2019-08-28 22:10:20 I am working for 5 seconds.
2019-08-28 22:10:22 I am working ok for 4 seconds.
2019-08-28 22:10:22 I am working for 6 seconds.
2019-08-28 22:10:25 I am working ok for 5 seconds.
2019-08-28 22:10:28 I am working ok for 6 seconds.
2019-08-28 22:10:28 fifo free ok
优势:使用命名管道的方式,避免了上面轮询模式下的效率问题。通过把命名管道看成是FIFO队列,可以从系统层面实现生产者-消费者模式。应用范围可以覆盖所有的并发场景。
劣势:编程复杂,需要掌握的系统知识较多。
优化
- 并发任务数根据CPU内核的数量,自动推断使用几个后台进程进行并发处理;
- 并发处理的结果,需要通过一个队列保存起来,以便后续的定位和调试跟踪;