The Spark UI contains the following tabs:
1. Jobs
1.1 Scheduling Mode
https://spark.apache.org/docs/latest/job-scheduling.html
1. Scheduling Across Applications
2. Scheduling Within an Application
FIFO: first in, first out — jobs run in submission order, and the running job holds all available resources
Each job is divided into “stages” (e.g. map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, etc
FAIR (since Spark 0.8): multiple jobs can receive resources at the same time
Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.
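The round-robin sharing described above can be sketched with a toy scheduler. This is a pure-Python illustration of the scheduling idea, not Spark's implementation; the job names and slot model are made up:

```python
from collections import deque

def fifo(jobs):
    """FIFO: all tasks of the first-submitted job run before the next job's."""
    return [name for name, tasks in jobs for _ in range(tasks)]

def fair(jobs):
    """FAIR: one task per job per round, so every job makes progress."""
    queues = deque(jobs)
    order = []
    while queues:
        name, remaining = queues.popleft()
        order.append(name)              # this job gets the next free slot
        if remaining > 1:
            queues.append((name, remaining - 1))
    return order

jobs = [("long_job", 4), ("short_job", 2)]
print(fifo(jobs))  # long_job occupies every slot until it finishes
print(fair(jobs))  # short_job's tasks are interleaved and finish early
```

Under FIFO the short job waits for all four of the long job's tasks; under FAIR its two tasks are scheduled in the first two rounds, which is why FAIR gives short jobs good response times in multi-user settings.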
Set spark.scheduler.mode=fair; jobs can additionally be assigned to pools to control their priority
spark.scheduler.pool — see "Configuring Pool Properties" in the job-scheduling docs
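A minimal pool configuration might look like the following (the pool names "production" and "adhoc" and their weights are example values, not defaults):

```xml
<!-- conf/fairscheduler.xml -->
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>     <!-- gets ~2x the share of an equal-weight pool -->
    <minShare>4</minShare> <!-- guaranteed minimum of 4 cores -->
  </pool>
  <pool name="adhoc">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

Enable it with spark.scheduler.mode=FAIR and point spark.scheduler.allocation.file at the XML; a job can then select its pool at runtime with sc.setLocalProperty("spark.scheduler.pool", "production").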
1.2 How Jobs Are Divided
An application is split into jobs by its action operators: the number of jobs equals the number of actions invoked
https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions
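The lazy-evaluation idea behind this can be sketched in plain Python (a toy stand-in, not Spark's API): transformations only record work, while each action actually runs the pipeline, which is why each action produces one job.

```python
class ToyRDD:
    """Toy stand-in for an RDD: transformations are lazy, actions execute."""
    def __init__(self, data, ops=(), log=None):
        self.data, self.ops = data, ops
        self.log = log if log is not None else []  # records triggered "jobs"

    def map(self, f):
        # Transformation: just remember f; nothing is computed yet.
        return ToyRDD(self.data, self.ops + (f,), self.log)

    def _run(self):
        out = self.data
        for f in self.ops:
            out = [f(x) for x in out]
        return out

    def collect(self):
        # Action: materializes the pipeline -> one job.
        self.log.append("job")
        return self._run()

    def count(self):
        # Another action -> another job, re-running the pipeline.
        self.log.append("job")
        return len(self._run())

rdd = ToyRDD([1, 2, 3]).map(lambda x: x * 2)
rdd.collect()        # first action  -> job 1
rdd.count()          # second action -> job 2
print(len(rdd.log))  # two actions, two jobs
```

Note that count() re-executes the map here, mirroring how Spark recomputes a lineage for each action unless the RDD is cached.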
2. Stages
1. Total Time Across All Tasks: 0.4 s
a) Total execution time summed across all tasks in the current stage (not wall-clock time)
2. Locality Level Summary: Process local: 8
a) Data locality level of the tasks (e.g. PROCESS_LOCAL means the data is in the same JVM as the running code): https://spark.apache.org/docs/latest/tuning.html#data-locality
3. Shuffle Read: 1533.0 B / 23 (bytes read / records read via shuffle)
4. GC Time
a) https://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning
5. Peak Execution Memory: peak memory used during execution, e.g. for shuffles, joins, and aggregations
6. Scheduler Delay: scheduling overhead
This includes the time to ship the task from the scheduler to an executor, plus the time to send the task result from the executor back to the scheduler
7. Task Deserialization Time: time spent deserializing the task on the executor
8. Result Serialization Time: time spent serializing the task result before it is sent back
3. Storage
4. Environment
5. Executors
6. SQL