Spark是粗粒度的,即在默认情况下会预先分配好资源,再进行计算。
好处是资源提前分配好,有计算任务时就直接使用计算资源,不用再考虑资源分配。
不好的地方是,高峰值和低峰值时需要的资源是不一样的。资源如果是针对高峰值情况下考虑的,那势必在低峰值情况下会有大量的资源浪费。
Twitter最近推出了会秒杀Storm的Heron,非常值得关注。因为Heron能有更好的资源分配、 更低的延迟。Heron在语义上兼容了Storm,即原来在Storm上开发的应用程序可以马上在Heron上使用。Storm绝对要成为历史了。Heron的主要开发语言是C++、Java、Python。其API支持Java。
SparkCore的入口SparkContext:
// Optionally scale number of executors dynamically based on workload. Exposed for testing.
val dynamicAllocationEnabled = Utils.isDynamicAllocationEnabled(_conf)
if (!dynamicAllocationEnabled && _conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
logWarning("Dynamic Allocation and num executors both set, thus dynamic allocation disabled.")
}
_executorAllocationManager =
if (dynamicAllocationEnabled) {
Some(new ExecutorAllocationManager(this, listenerBus, _conf))
} else {
None
}
已经支持资源的动态分配。
Utils.isDynamicAllocationEnabled:
/**
* Return whether dynamic allocation is enabled in the given conf
* Dynamic allocation and explicitly setting the number of executors are inherently
* incompatible. In environments where dynamic allocation is turned on by default,
* the latter should override the former (SPARK-9092).
*/
defisDynamicAllocationEnabled(conf: SparkConf): Boolean = {
conf.getBoolean("spark.dynamicAllocation.enabled", false) &&
conf.getInt("spark.executor.instances", 0) == 0
}
ExecutorAllocationManager:
...
// Clock used to schedule when executors should be added and removed
private var clock: Clock = newSystemClock()
...
有个时钟,基于时钟的定时器会不断的扫描Executor的情况,每过一段时间去看看资源情况。
Master.schedule:
/**
* Schedule the currently available resources among waiting apps. This method will be called
* every time a new app joins or resource availability changes.
*/
private defschedule(): Unit = {
if (state != RecoveryState.ALIVE) { return }
// Drivers take strict precedence over executors
val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
for (driver <- waitingDrivers) {
if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
launchDriver(worker, driver)
waitingDrivers -= driver
}
}
}
startExecutorsOnWorkers()
}
原先默认的用于分配资源。
ExecutorAllocaionManager:
// Polling loop interval (ms)
private val intervalMillis: Long = 100
...
// A timestamp for each executor of when the executor should be removed, indexed by the ID
// This is set when an executor is no longer running a task, or when it first registers
private valremoveTimes= new mutable.HashMap[String, Long]
...
// Clock used to schedule when executors should be added and removed
private var clock: Clock = new SystemClock()
...
// Executor that handles the scheduling task.
private val executor =
ThreadUtils.newDaemonSingleThreadScheduledExecutor("spark-dynamic-executor-allocation")
...
removeTimes中有Executor的ID。
executor中有定时器,不断执行schedule。默认周期是intervalMillis(100ms)
ExecutorAllocaionManager.start:
/**
* Register for scheduler callbacks to decide when to add and remove executors, and start
* the scheduling task.
*/
defstart(): Unit = {
listenerBus.addListener(listener)
val scheduleTask = newRunnable() {
override def run(): Unit = {
try {
schedule()
} catch {
case ct: ControlThrowable =>
throw ct
case t: Throwable =>
logWarning(s"Uncaught exception in thread ${Thread.currentThread().getName}", t)
}
}
}
executor.scheduleAtFixedRate(scheduleTask, 0,intervalMillis, TimeUnit.MILLISECONDS)
}
ExecutorAllocaionManager.schedule:
/**
* This is called at a fixed interval to regulate the number of pending executor requests
* and number of executors running.
*
* First, adjust our requested executors based on the add time and our current needs.
* Then, if the remove time for an existing executor has expired, kill the executor.
*
* This is factored out into its own method for testing.
*/
private defschedule(): Unit = synchronized {
val now = clock.getTimeMillis
updateAndSyncNumExecutorsTarget(now)
removeTimes.retain { case (executorId, expireTime) =>
val expired = now >= expireTime
if (expired) {
initializing = false
removeExecutor(executorId)
}
!expired
}
}
这个内部方法会被周期性的触发执行。
实际生产环境下,动态资源分配可能要自己做好定制。
SparkStreaming的动态调整的复杂之处在于,即使在batch duration内刚做了调整,但可能本batch duration马上就会过期。
你可以考虑改变执行周期(intervalMills),来动态调整。在一个batchduration中要对数据分片,可以算一下已拥有闲置的core,如果不够,则可以申请增加Executor,从而把任务分配到新增的Executor。
也可以考量针对上一个batchduration的资源需求情况,因为峰值出现时,往往会延续在多个连续的batch duration中。考量上一个batch duration的情况,用某种算法来动态调整后续的batch duration的资源。修改Spark Streaming可以设计StreamingContext的新子类。
其实前面的动态资源分配的定制方式做起来不容易,可能仍不太合适。
备注:
资料来源于:DT_大数据梦工厂(Spark发行版本定制)
更多私密内容,请关注微信公众号:DT_Spark
如果您对大数据Spark感兴趣,可以免费听由王家林老师每天晚上20:00开设的Spark永久免费公开课,地址YY房间号:68917580