When running Spark in cluster or client mode, the job got stuck at the point shown below, without reporting any exception or error.
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Registering RDD 1 (map at UserAction.scala:598)
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Got job 0 (collect at UserAction.scala:609) with 1 output partitions
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (collect at UserAction.scala:609)
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[1] at map at UserAction.scala:598), which has no missing parents
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.8 KB, free 365.9 MB)
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 365.9 MB)
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.81.77.67:17664 (size: 2.3 KB, free: 366.3 MB)
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[1] at map at UserAction.scala:598) (first 15 tasks are for partitions Vector(0))
16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
16-11-2018 15:14:37 CST noah-dp-spark INFO - 18/11/16 15:14:37 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.81.174.117:39678) with ID 1
16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop-slave1:46294 with 366.3 MB RAM, BlockManagerId(1, hadoop-slave1, 46294, None)
16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop-slave1, executor 1, partition 0, RACK_LOCAL, 5811 bytes)
16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop-slave1:46294 (size: 2.3 KB, free: 366.3 MB)
16-11-2018 15:14:43 CST noah-dp-spark INFO - 18/11/16 15:14:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop-slave1:46294 (size: 32.8 KB, free: 366.3 MB)
The next step is to find the logs. The job is stuck on the hadoop-slave1 node, so that is where we go to look for log information.
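As a rough sketch of how to read the hanging host off a driver log like the one above: the snippet below collects every "Starting task" line and drops the tasks that were later reported as finished. The helper name unfinished_tasks is made up for illustration, and the "Finished task ... (TID n)" line format is an assumption about what TaskSetManager prints on success; failed or lost tasks are not handled.

```python
import re

# Matches the line TaskSetManager logs when it launches a task, e.g.
# "Starting task 0.0 in stage 0.0 (TID 0, hadoop-slave1, executor 1, partition 0, ...)"
STARTED = re.compile(
    r"Starting task \S+ in stage \S+ "
    r"\(TID (?P<tid>\d+), (?P<host>[^,]+), executor (?P<executor>[^,]+),"
)
# Assumed format of the corresponding success line.
FINISHED = re.compile(r"Finished task \S+ in stage \S+ \(TID (?P<tid>\d+)\)")


def unfinished_tasks(driver_log):
    """Return {tid: (host, executor_id)} for tasks that started but never finished."""
    pending = {}
    for line in driver_log.splitlines():
        started = STARTED.search(line)
        if started:
            tid = int(started.group("tid"))
            pending[tid] = (started.group("host"), started.group("executor"))
            continue
        finished = FINISHED.search(line)
        if finished:
            # The task completed, so it is not the one that hangs.
            pending.pop(int(finished.group("tid")), None)
    return pending
```

Fed the log above, this would leave TID 0 on hadoop-slave1 (executor 1) as the only pending task, which matches what we concluded by eye.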
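In Spark on YARN mode, each executor corresponds to one YARN container, so on the executor's node run ps -ef|grep spark.yarn.app.container.log.dir. If several applications may be running on that node, filter further by application id. The command shows the executor's process information, which includes the log path, for example: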
-Djava.io.tmpdir=/data1/hadoop/yarn/local/usercache/ocdp/appcache/application_1521424748238_0051/container_e07_1521424748238_0051_01_000002/tmp
'-Dspark.history.ui.port=18080' '-Dspark.driver.port=59555'
-Dspark.yarn.app.container.log.dir=/data1/hadoop/yarn/log/application_1521424748238_0051/container_e07_1521424748238_0051_01_000002
In other words, this executor's logs are in the /data1/hadoop/yarn/log/application_1521424748238_0051/container_e07_1521424748238_0051_01_000002 directory. With that, we have found the executor's runtime logs.
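The same lookup can be scripted. The sketch below is only an illustration: it pulls the value of -Dspark.yarn.app.container.log.dir out of ps -ef output and optionally filters by application id. The function names container_log_dirs and local_container_log_dirs are hypothetical, and it assumes ps prints the full command line of each process.

```python
import re
import subprocess

# Captures the path that follows the JVM flag, stopping at whitespace or a quote.
LOG_DIR_FLAG = re.compile(r"-Dspark\.yarn\.app\.container\.log\.dir=([^\s'\"]+)")


def container_log_dirs(ps_output, app_id=None):
    """Return the distinct container log directories found in `ps -ef` output."""
    dirs = []
    for line in ps_output.splitlines():
        # Optional extra filter when several applications share the node.
        if app_id is not None and app_id not in line:
            continue
        for path in LOG_DIR_FLAG.findall(line):
            if path not in dirs:  # the same container can appear in several processes
                dirs.append(path)
    return dirs


def local_container_log_dirs(app_id=None):
    """Run `ps -ef` on the current node and extract the log directories."""
    ps_output = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    return container_log_dirs(ps_output, app_id)
```

On the executor node, calling local_container_log_dirs("application_1521424748238_0051") would be expected to return the directory shown above. The grep process itself is not picked up, because its command line contains neither the -D prefix nor the "=" that the pattern requires.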
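I also ran into another problem: when I launched the job in cluster mode, it failed after about 14 seconds. I wanted to look at the logs inside the container, but they had already been deleted, because by default they are removed as soon as the run ends. In CDH I changed the YARN setting yarn.nodemanager.delete.debug-delay-sec = 1000, which keeps them for 1000 seconds after the application finishes.
With that setting changed, you can see the debug logs of finished runs.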
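To confirm that the change actually reached a NodeManager, one option is to read the value back from the yarn-site.xml that the NodeManager uses. The sketch below assumes the standard Hadoop configuration layout (property elements with name and value children); the helper name yarn_property is made up, and the location of the file depends on your installation.

```python
import xml.etree.ElementTree as ET


def yarn_property(yarn_site_xml, name):
    """Return the value of a property in yarn-site.xml content, or None if absent."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


# Example with an inline document instead of a real file.
sample = """
<configuration>
  <property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>1000</value>
  </property>
</configuration>
"""
print(yarn_property(sample, "yarn.nodemanager.delete.debug-delay-sec"))
```

For a real file, read it in binary mode and pass the bytes to yarn_property. A result of None means the property is not set in that file, in which case the default applies and the container directories are deleted right after the run.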
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
10-10-2019 15:52:42 CST noah-dp-spark INFO - at org.apache.spark.deploy.SparkSubmitArguments.handle(SparkSubmitArguments.scala:410)
10-10-2019 15:52:42 CST noah-dp-spark INFO - at org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:163)
10-10-2019 15:52:42 CST noah-dp-spark INFO - at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
10-10-2019 15:52:42 CST noah-dp-spark INFO - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
10-10-2019 15:52:42 CST noah-dp-spark INFO - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
10-10-2019 15:52:42 CST noah-dp-spark INFO - Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
10-10-2019 15:52:42 CST noah-dp-spark INFO - at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
10-10-2019 15:52:42 CST noah-dp-spark INFO - at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
10-10-2019 15:52:42 CST noah-dp-spark INFO - at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
10-10-2019 15:52:42 CST noah-dp-spark INFO - at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
10-10-2019 15:52:42 CST noah-dp-spark INFO - ... 5 more
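The trace shows that spark-submit cannot load org.apache.hadoop.fs.FSDataInputStream, which means the Hadoop classes are not on its classpath.
Solution: https://blog.csdn.net/wiki347552913/article/details/88605749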