The Spark UI contains the following tabs:
1. Jobs
1.1 Scheduling Mode
https://spark.apache.org/docs/latest/job-scheduling.html
1. Scheduling Across Applications
2. Scheduling Within an Application
FIFO: first in, first out — jobs run in submission order, and the running job holds all available resources
Each job is divided into “stages” (e.g. map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, etc
FAIR (since Spark 0.8): multiple jobs can receive resources at the same time
Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.
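The round-robin sharing described above can be sketched with a toy scheduler. This is a pure-Python illustration of the scheduling idea, not Spark's implementation; the job names and slot model are made up:

```python
from collections import deque

def fifo(jobs):
    """FIFO: all tasks of the first-submitted job run before the next job's."""
    return [name for name, tasks in jobs for _ in range(tasks)]

def fair(jobs):
    """FAIR: one task per job per round, so every job makes progress."""
    queues = deque(jobs)
    order = []
    while queues:
        name, remaining = queues.popleft()
        order.append(name)              # this job gets the next free slot
        if remaining > 1:
            queues.append((name, remaining - 1))
    return order

jobs = [("long_job", 4), ("short_job", 2)]
print(fifo(jobs))  # long_job occupies every slot until it finishes
print(fair(jobs))  # short_job's tasks are interleaved and finish early
```

Under FIFO the short job waits for all four of the long job's tasks; under FAIR its two tasks are scheduled in the first two rounds, which is why FAIR gives short jobs good response times in multi-user settings.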
Set spark.scheduler.mode=fair; jobs can additionally be assigned to pools to control their priority
spark.scheduler.pool — see "Configuring Pool Properties" in the job-scheduling docs
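A minimal pool configuration might look like the following (the pool names "production" and "adhoc" and their weights are example values, not defaults):

```xml
<!-- conf/fairscheduler.xml -->
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>     <!-- gets ~2x the share of an equal-weight pool -->
    <minShare>4</minShare> <!-- guaranteed minimum of 4 cores -->
  </pool>
  <pool name="adhoc">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

Enable it with spark.scheduler.mode=FAIR and point spark.scheduler.allocation.file at the XML; a job can then select its pool at runtime with sc.setLocalProperty("spark.scheduler.pool", "production").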
1.2 How Jobs Are Divided
An application is split into jobs by its action operators: the number of jobs equals the number of actions invoked
https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions
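The lazy-evaluation idea behind this can be sketched in plain Python (a toy stand-in, not Spark's API): transformations only record work, while each action actually runs the pipeline, which is why each action produces one job.

```python
class ToyRDD:
    """Toy stand-in for an RDD: transformations are lazy, actions execute."""
    def __init__(self, data, ops=(), log=None):
        self.data, self.ops = data, ops
        self.log = log if log is not None else []  # records triggered "jobs"

    def map(self, f):
        # Transformation: just remember f; nothing is computed yet.
        return ToyRDD(self.data, self.ops + (f,), self.log)

    def _run(self):
        out = self.data
        for f in self.ops:
            out = [f(x) for x in out]
        return out

    def collect(self):
        # Action: materializes the pipeline -> one job.
        self.log.append("job")
        return self._run()

    def count(self):
        # Another action -> another job, re-running the pipeline.
        self.log.append("job")
        return len(self._run())

rdd = ToyRDD([1, 2, 3]).map(lambda x: x * 2)
rdd.collect()        # first action  -> job 1
rdd.count()          # second action -> job 2
print(len(rdd.log))  # two actions, two jobs
```

Note that count() re-executes the map here, mirroring how Spark recomputes a lineage for each action unless the RDD is cached.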
2. Stages
1. Total Time Across All Tasks: 0.4 s
a) Total execution time summed across all tasks in the current stage (not wall-clock time)
2. Locality Level Summary: Process local: 8
a) Data locality level of the tasks (e.g. PROCESS_LOCAL means the data is in the same JVM as the running code): https://spark.apache.org/docs/latest/tuning.html#data-locality
3. Shuffle Read: 1533.0 B / 23 (bytes read / records read via shuffle)
4. GC Time
a) https://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning
5. Peak Execution Memory: peak memory used during execution, e.g. for shuffles, joins, and aggregations
6. Scheduler Delay: scheduling overhead
This includes the time to ship the task from the scheduler to an executor, plus the time to send the task result from the executor back to the scheduler
7. Task Deserialization Time: time spent deserializing the task on the executor
8. Result Serialization Time: time spent serializing the task result before it is sent back
3. Storage
4. Environment
5. Executors
6. SQL