SchemaCompatibilityException: Unable to validate the rewritten record

Spark 3.2.3
Hudi 0.11.0

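A Spark job writing to Hudi failed at commit time. The .hoodie directory contains the commit.requested and inflight files for the instant, but no completed commit file: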

-rw-r--r--@ 1 lqq  staff  1572  5 23 09:54 20230512145004274.rollback
-rw-r--r--@ 1 lqq  staff     0  5 23 09:54 20230512145004274.rollback.inflight
-rw-r--r--@ 1 lqq  staff  1384  5 23 09:54 20230512145004274.rollback.requested
-rw-r--r--@ 1 lqq  staff     0  5 23 09:54 20230522173618331.commit.requested
-rw-r--r--@ 1 lqq  staff  3123  5 23 09:54 20230522173618331.inflight
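
The same check can be scripted. Below is a minimal sketch, assuming that a timeline file is pending when its name ends in .requested or .inflight and completed otherwise. PendingInstants is a hypothetical helper written for this post, not part of Hudi.

```scala
// Hypothetical helper (not a Hudi API): list instants that never completed.
object PendingInstants {
  // Assumption: a timeline file is pending if it ends in .requested or .inflight.
  private def isPending(fileName: String): Boolean =
    fileName.endsWith(".requested") || fileName.endsWith(".inflight")

  // Returns the instant timestamps for which every timeline file is still pending.
  def incomplete(fileNames: Seq[String]): Seq[String] =
    fileNames
      .groupBy(_.takeWhile(_ != '.'))            // group files by instant timestamp
      .collect {
        case (ts, files)
            if ts.nonEmpty && ts.forall(_.isDigit) && files.forall(isPending) => ts
      }
      .toSeq
      .sorted
}
```

Passing in the file names from the listing above should report only 20230522173618331, since the rollback instant 20230512145004274 does have a completed .rollback file.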

The log shows an error entry, but it does not include the actual error details:

 ERROR HoodieSparkSqlWriter$: UPSERT failed with errors

Reading the source code shows that the detailed errors are only logged at TRACE level:

  private def commitAndPerformPostOperations(spark: SparkSession,
                                             schema: StructType,
                                             writeResult: HoodieWriteResult,
                                             parameters: Map[String, String],
                                             client: SparkRDDWriteClient[HoodieRecordPayload[Nothing]],
                                             tableConfig: HoodieTableConfig,
                                             jsc: JavaSparkContext,
                                             tableInstantInfo: TableInstantInfo
                                            ): (Boolean, common.util.Option[java.lang.String], common.util.Option[java.lang.String]) = {

    // Log line emitted by the log.trace call below once TRACE is enabled:
    // 23/05/25 11:57:45 TRACE HoodieSparkSqlWriter$: Printing out the top 100 errors
    // ........
    } else {
      log.error(s"${tableInstantInfo.operation} failed with errors")
      if (log.isTraceEnabled) {
        log.trace("Printing out the top 100 errors")
        writeResult.getWriteStatuses.rdd.filter(ws => ws.hasErrors)
          .take(100)
          .foreach(ws => {
            log.trace("Global error :", ws.getGlobalError)
            if (ws.getErrors.size() > 0) {
              ws.getErrors.foreach(kt =>
                log.trace(s"Error for key: ${kt._1}", kt._2))
            }
          })
      }
      (false, common.util.Option.empty(), common.util.Option.empty())
    }

After lowering the log level (see https://www.jianshu.com/u/c2bc3695bc47) and rerunning the job, the actual error was printed:
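
One way to lower the level from code is sketched below. It assumes the log4j 1.x API is on the classpath (as it is with Spark 3.2.x) and that the logger is named after the fully qualified class, org.apache.hudi.HoodieSparkSqlWriter$. EnableHudiTrace is a hypothetical helper name.

```scala
import org.apache.log4j.{Level, Logger}

// Hypothetical helper: raise only the Hudi writer's logger to TRACE.
object EnableHudiTrace {
  // Assumed logger name: the fully qualified class behind "HoodieSparkSqlWriter$" in the log.
  val WriterLogger = "org.apache.hudi.HoodieSparkSqlWriter$"

  // Call on the driver before the write; the error-printing code runs there.
  def apply(): Unit =
    Logger.getLogger(WriterLogger).setLevel(Level.TRACE)
}
```

Calling EnableHudiTrace() before df.write should be enough on the driver, provided no appender threshold filters out TRACE messages. The equivalent log4j.properties entry would be log4j.logger.org.apache.hudi.HoodieSparkSqlWriter$=TRACE.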

org.apache.hudi.exception.SchemaCompatibilityException: Unable to validate the rewritten record {"gender": "male",  "id": 708075384135690,  "count": null} against schema {{"name":"id","type":["null","long"],{"name":"gender","type":["null","string"],"default":null},{"name":"count","type":["null","int"],"default":null}}

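Cause: the schemas are incompatible. The count field was previously written to Hudi as int, but the new batch declared it as long, so the write failed.
Fix: change the field back to int, or drop the Hudi table and write it again from scratch.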
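
To catch this kind of mismatch before writing, the field types of the existing table and the incoming batch can be compared up front. The sketch below is a simplified illustration, not Hudi's actual validation logic: SchemaDiff is a hypothetical helper, and schemas are modelled as plain field-name to type-name maps.

```scala
// Hypothetical pre-write check (not a Hudi API): report fields whose type changed.
object SchemaDiff {
  // Returns (field, tableType, incomingType) for each field that exists in both
  // schemas but with a different type, sorted by field name.
  def changedTypes(table: Map[String, String],
                   incoming: Map[String, String]): Seq[(String, String, String)] =
    incoming.toSeq.sortBy(_._1).collect {
      case (name, newType) if table.get(name).exists(_ != newType) =>
        (name, table(name), newType)
    }
}
```

For the schemas in the error above, the table side would be id -> long, gender -> string, count -> int and the incoming side the same except count -> long, so the check should flag only count. In a Spark job the maps could be built from the DataFrame schemas, and the fix is then to cast count back to int before the write.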
