Efficient Code Course Contents
Chapter1. Benchmarking
Chapter2. Fundamentals of efficient R code
Chapter3. Looking inside your code
Chapter4. Multithreaded computing
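Checking the number of CPU threads
We'll use the parallel package for multithreading. The server used in the course has 8 threads. By default, detectCores() counts logical CPUs (hardware threads), not physical cores.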
# Load the parallel package
library(parallel)
# Store the number of cores in the object no_of_cores
no_of_cores <- detectCores()
# Print no_of_cores
no_of_cores
[1] 8
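A minimal sketch of turning the core count into a worker count. Several parts of it are assumptions and do not come from the course. The safe_cores() helper is hypothetical, and so is the rule of thumb of leaving one core for the master R process. The fallback to 1 exists because detectCores() can return NA when it cannot determine the count.

```r
library(parallel)

# Logical CPUs (hardware threads): what detectCores() returns by default
n_logical <- detectCores(logical = TRUE)

# Physical cores: may be NA, or equal to the logical count on platforms
# where the two cannot be told apart
n_physical <- detectCores(logical = FALSE)

# Hypothetical helper: fall back to 1 core if the count is unknown (NA)
safe_cores <- function(n) if (is.na(n)) 1L else as.integer(n)

# Assumed rule of thumb (not from the course): keep one core free
n_workers <- max(1L, safe_cores(n_logical) - 1L)
n_workers
```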
Standard parallel workflow
detectCores()
[1] 8
# Create a cluster via makeCluster
cl <- makeCluster(2)
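# Parallel version of apply(dd, 2, median): the median of each column of dd
# (dd is a numeric data frame defined earlier in the course, not shown here;
# judging by the output it has 10 columns)
parApply(cl, dd, 2, median)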
[1] -0.053946179 -0.168234607 -0.056308656 -0.103888726 0.202869314
[6] 0.019541928 -0.258089759 -0.006198904 -0.054646615 0.094430957
# Stop the cluster
stopCluster(cl)
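- Determine how many threads are available (detectCores())
- Create a cluster with makeCluster(), sized to the thread count or to what the task actually needs
- Run the work with parallel-specific functions such as parApply()
- Stop the cluster (stopCluster())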
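The four steps above can be wrapped in one function so that the cluster always gets stopped. This is a minimal sketch under two assumptions. First, par_col_medians() is a hypothetical helper. Second, the course's dd is not shown, so a random 10-column matrix stands in for it. The on.exit() call makes stopCluster() run even if parApply() fails.

```r
library(parallel)

# Hypothetical helper: column medians computed on a temporary cluster
par_col_medians <- function(x, n_workers = 2L) {
  cl <- makeCluster(n_workers)   # create the cluster
  on.exit(stopCluster(cl))       # stop it on exit, even after an error
  parApply(cl, x, 2, median)     # parallel counterpart of apply()
}

# Hypothetical stand-in for the course's dd: 100 rows x 10 columns
set.seed(1)
dd <- matrix(rnorm(1000), ncol = 10)

res_par <- par_col_medians(dd)
res_ser <- apply(dd, 2, median)
```

The parallel and serial results should be identical. Only the way the columns are distributed differs.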
If the code calls a function you wrote yourself, you need one extra step: export that function to the cluster with clusterExport().
library("parallel")
# Create a cluster via makeCluster (2 cores)
cl <- makeCluster(2)
# Export the user-defined play() function (defined earlier in the course, not shown) to the workers
clusterExport(cl, "play")
# Rewrite sapply() as parSapply()
res <- parSapply(cl, 1:100, function(i) play())
# Stop the cluster
stopCluster(cl)
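The course's play() is not shown, so this sketch uses a hypothetical stand-in that sums a few dice. It also shows why the export is needed. The workers start as fresh R sessions with an empty global environment. The anonymous function(i) play() looks up play by name on each worker. For that lookup to succeed, play and any global variable it uses must be exported. Here that global variable is the hypothetical n_dice.

```r
library(parallel)

# Hypothetical stand-ins for the course's play(); n_dice is a global
# variable that play() depends on
n_dice <- 2
play <- function() {
  sum(sample(1:6, n_dice, replace = TRUE))
}

cl <- makeCluster(2)

# Export both the function and the global it reads
clusterExport(cl, c("play", "n_dice"))

res <- parSapply(cl, 1:100, function(i) play())

stopCluster(cl)
```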
Finally, here is an example comparing how much faster the multithreaded version actually is than the single-threaded one.
# Set the number of games to play
no_of_games <- 1e5
## Time serial version
system.time(serial <- sapply(1:no_of_games, function(i) play()))
user system elapsed
9.370 0.016 9.512
## Set up cluster
cl <- makeCluster(4)
clusterExport(cl, "play")
## Time parallel version
system.time(par <- parSapply(cl, 1:no_of_games, function(i) play()))
user system elapsed
0.064 0.008 3.216
## Stop cluster
stopCluster(cl)
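With 4 workers, the elapsed (wall-clock) time drops from 9.5 s to 3.2 s, so the parallel run is roughly 3 times as fast as the serial one. The parallel run's user time is tiny only because system.time() measures the master R process; the actual work happens in the worker processes.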
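Below is a minimal sketch of the same serial-versus-parallel comparison, written so it runs quickly. slow_task() is a hypothetical stand-in for play(). speedup() is a hypothetical helper that divides the serial elapsed time by the parallel elapsed time. The sketch uses 2 workers instead of 4. No timing output is claimed, because it depends on the machine.

```r
library(parallel)

# Hypothetical stand-in for play(): some CPU work per call
slow_task <- function(i) {
  x <- rnorm(1e4)
  mean(sort(x))
}

n_tasks <- 200  # kept small so the sketch runs quickly

# Hypothetical helper: ratio of wall-clock ("elapsed") times
speedup <- function(t_serial, t_parallel) {
  t_serial[["elapsed"]] / t_parallel[["elapsed"]]
}

t_serial <- system.time(res_serial <- sapply(seq_len(n_tasks), slow_task))

cl <- makeCluster(2)
# slow_task is passed directly as FUN, so it is serialized to the workers
# and needs no clusterExport()
t_par <- system.time(res_par <- parSapply(cl, seq_len(n_tasks), slow_task))
stopCluster(cl)

speedup(t_serial, t_par)

# The course's recorded timings give 9.512 / 3.216, about 2.96
recorded <- 9.512 / 3.216
```

Comparing elapsed rather than user time matters here, because user time of the parallel run leaves out the workers' CPU time.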
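Of course, multithreading is not always faster. Starting workers and shipping data to and from them has overhead, so for small or quick tasks the serial version can win. The advantage is most noticeable when the code would otherwise be a for loop (or sapply()) over many independent, reasonably expensive iterations.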
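Here is a minimal sketch of rewriting such a for loop with parSapply(). work() is a hypothetical per-iteration function. It is deterministic and independent of the other iterations, which is what makes the loop safe to parallelize and lets the two results be compared exactly. The final lines time a trivially cheap task (sqrt) both ways to show the overhead case. No timings are claimed for them.

```r
library(parallel)

# Hypothetical per-iteration work: deterministic, no shared state
work <- function(i) sum(sqrt(seq_len(i * 1000)))

n <- 40

# 1. Original serial for loop
res_for <- numeric(n)
for (i in seq_len(n)) {
  res_for[i] <- work(i)
}

# 2. Same computation with parSapply(): the indices are split into
#    chunks that the workers process independently
cl <- makeCluster(2)
res_par <- parSapply(cl, seq_len(n), work)

# 3. Counter-example: sqrt() is so cheap that sending data to the workers
#    and collecting results typically costs more than the work itself
t_tiny_serial <- system.time(sapply(1:1e4, sqrt))
t_tiny_par <- system.time(parSapply(cl, 1:1e4, sqrt))
stopCluster(cl)
```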