[PDF Giveaway] Spark & Hadoop Summit Selected Talks PDF Collection - Blog - Yunqi Community - Alibaba Cloud https://yq.aliyun.com/articles/72207?spm=5176.100239.blogcont71098.13.Kt7Srt
//Download link
【Spark Summit East 2017】Debugging PySpark
//p13
● Error messages reported to the console*
● Log messages reported to the console*
● Log messages on the workers - access through the
Spark Web UI or Spark History Server :)
//p16
● Use yarn logs -applicationId <application ID> to fetch the logs once YARN log aggregation has collected them
● Or set up the Spark history server
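● Or set yarn.nodemanager.delete.debug-delay-sec so the NodeManager waits that many seconds before deleting a finished application's local log directory :)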
//p17
● Most of the time Spark's default log output tells you things you already know
● Or don’t need to know
● You can dynamically control the log level with
sc.setLogLevel
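A minimal sketch of the sc.setLogLevel point above. The helper name quiet_spark is hypothetical and not from the slides; it only assumes that sc is a SparkContext (or anything exposing setLogLevel) and checks the level against the names listed in the Spark API docs.

```python
# Log level names accepted by SparkContext.setLogLevel, per the Spark API docs.
VALID_LEVELS = {"ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"}


def quiet_spark(sc, level="WARN"):
    """Set the log level on a SparkContext-like object (hypothetical helper)."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError("unknown log level: %r" % level)
    # The only Spark call made here; it changes the level at runtime.
    sc.setLogLevel(level)
    return level
```

In a PySpark shell this would be quiet_spark(sc), or simply sc.setLogLevel("WARN") without the wrapper.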
//p25
//p28
Regardless of language
● Can be difficult to determine which element failed
● Stack trace sometimes helps (it did this time)
● take(1) + count() are your friends - but a lot of work :(
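One way to read the take(1) + count() bullet: because transformations are lazy, forcing each intermediate RDD in turn shows which step actually fails. The sketch below is an assumption about how that might look, not the speaker's code; force_stage and the stage labels are hypothetical, and rdd only needs to expose take() and count() as a PySpark RDD does.

```python
def force_stage(rdd, name):
    """Force evaluation of one intermediate RDD and report whether it fails.

    take(1) evaluates only as many partitions as it needs, so it is a cheap
    first check; count() then evaluates every element. `name` is a label
    chosen by the caller to identify the stage in the result.
    """
    try:
        rdd.take(1)
        n = rdd.count()
    except Exception as exc:  # the driver sees the worker error wrapped in an exception
        return (name, "failed", repr(exc))
    return (name, "ok", n)
```

Calling force_stage(parsed, "parsed"), then force_stage(divided, "divided"), and so on after each transformation narrows down the failing step. Each call re-runs the lineage unless the RDD is cached, which is the "a lot of work" part.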
//p30
● spark-testing-base is on pip now for your happy test
adventures
//p31
Adding your own logging:
● Java users use Log4J & friends
● Python users: use logging library (or even print!)
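A small sketch of the Python logging bullet, using only the standard logging library. The function parse_record and the logger name "my_job" are made up for illustration; the idea is to record the offending input so the failing element can be identified later.

```python
import logging


def parse_record(line):
    """Parse an integer, logging the offending input instead of raising."""
    try:
        return int(line)
    except ValueError:
        # Look the logger up inside the function so nothing extra has to be
        # shipped to the workers along with it. "my_job" is a hypothetical name.
        logging.getLogger("my_job").warning("could not parse record: %r", line)
        return None
```

Used as rdd.map(parse_record), the warnings would be emitted on the workers rather than the driver, so they would be expected in the worker logs (Spark Web UI or History Server) and not on the console.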