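[Hive] HiveServer2 Out-of-Memory Issues: A Summary

1. Introduction

Users access HiveServer2 (version 3.1.2) through Beeline to run offline SQL jobs. After about a week of continuous operation, HiveServer2 ran out of memory (OOM), which seriously affected data queries and report generation. The problem was finally resolved after several rounds of fixes. I have collected the issues that were fixed here, so that others who hit the same problem have a starting point.

2. Cases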

2.1 HIVE-16455

HiveServer2 leaks file handles when the ADD JAR statement is used.

[root@host-10-17-80-111 ~]# lsof -p 29588 | grep "(deleted)"
java    29588 hive  391u   REG              252,3    125987  2099944 /tmp/57d98f5b-1e53-44e2-876b-6b4323ac24db_resources/hive-contrib.jar (deleted)
java    29588 hive  392u   REG              252,3    125987  2099946 /tmp/eb3184ad-7f15-4a77-a10d-87717ae634d1_resources/hive-contrib.jar (deleted)
java    29588 hive  393r   REG              252,3    125987  2099825 /tmp/e29dccfc-5708-4254-addb-7a8988fc0500_resources/hive-contrib.jar (deleted)
java    29588 hive  394r   REG              252,3    125987  2099833 /tmp/5153dd4a-a606-4f53-b02c-d606e7e56985_resources/hive-contrib.jar (deleted)
java    29588 hive  395r   REG              252,3    125987  2099827 /tmp/ff3cdb05-917f-43c0-830a-b293bf397a23_resources/hive-contrib.jar (deleted)
java    29588 hive  396r   REG              252,3    125987  2099822 /tmp/60531b66-5985-421e-8eb5-eeac31fdf964_resources/hive-contrib.jar (deleted)
java    29588 hive  397r   REG              252,3    125987  2099831 /tmp/78878921-455c-438c-9735-447566ed8381_resources/hive-contrib.jar (deleted)
java    29588 hive  399r   REG              252,3    125987  2099835 /tmp/0e5d7990-30cc-4248-9058-587f7f1ff211_resources/hive-contrib.jar (deleted)
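To watch this leak grow over time, the lsof listing can be summarized per JAR. The following is a minimal sketch, assuming the output format shown above (path ending in `_resources/<name>.jar (deleted)`). The function name `count_deleted_jars` is made up for illustration and is not part of Hive.

```python
import re
from collections import Counter

# Matches the NAME column of an lsof line for a session-resource JAR that has
# been unlinked from disk but is still held open by the process.
DELETED_JAR = re.compile(r"(\S+_resources/\S+\.jar) \(deleted\)$")


def count_deleted_jars(lsof_output: str) -> Counter:
    """Count deleted-but-still-open JAR handles, grouped by JAR file name."""
    counts = Counter()
    for line in lsof_output.splitlines():
        match = DELETED_JAR.search(line.strip())
        if match:
            # Every session has its own UUID-named directory, so group by the
            # base file name rather than by the full path.
            jar_name = match.group(1).rsplit("/", 1)[-1]
            counts[jar_name] += 1
    return counts
```

Feeding it the text produced by `lsof -p <pid>` at regular intervals gives a per-JAR count. A count that keeps rising while sessions are being closed is consistent with the handle leak described here.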

2.2 HIVE-24236

This one is hard to reproduce. A connection leak may occur only under certain specific conditions.

2020-09-29T18:44:26,563 INFO  [Heartbeater-0]: txn.TxnHandler (TxnHandler.java:checkRetryable(3733)) - Non-retryable error in heartbeat(HeartbeatRequest(lockid:0, txnid:11908)) : Cannot get a connection, general error (SQLState=null, ErrorCode=0)
2020-09-29T18:44:26,564 ERROR [Heartbeater-0]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to select from transaction database org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, general error
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:118)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3605)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3598)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2739)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
        at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at com.sun.proxy.$Proxy63.heartbeat(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3247)
        at sun.reflect.GeneratedMethodAccessor414.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
        at com.sun.proxy.$Proxy64.heartbeat(Unknown Source)
        at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:671)
        at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1102)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
        at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1101)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

2.3 HIVE-24552

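A call to loadDynamicPartitions (Hive.java) spawns multiple threads to handle the file moves. These threads may open HiveMetaStore connections, and if those connections are not closed promptly, a large number of them pile up.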

2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43901
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43900
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43899
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43898
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43897
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="transport.TIOStreamTransport" level="WARN" thread="Finalizer"] Error closing output stream.
java.net.SocketException: Socket closed
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
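The "current connections" counter in these log lines is a convenient signal for spotting the pile-up early. The sketch below assumes the message text shown above. The helper names and the threshold of 1000 are arbitrary choices for illustration, not Hive defaults.

```python
import re
from typing import List, Optional

# The HiveMetaStoreClient log message ends with the running connection count.
CONNECTIONS = re.compile(r"current connections: (\d+)")


def connection_counts(log_text: str) -> List[int]:
    """Return every logged 'current connections' value, in log order."""
    return [int(m.group(1)) for m in CONNECTIONS.finditer(log_text)]


def peak_over_threshold(log_text: str, threshold: int = 1000) -> Optional[int]:
    """Return the peak connection count if it exceeds the threshold, else None."""
    counts = connection_counts(log_text)
    if not counts:
        return None
    peak = max(counts)
    return peak if peak > threshold else None
```

In the excerpt above the connections are being closed by the Finalizer thread, so they are released only when garbage collection happens to run. A peak in the tens of thousands, as logged here, indicates that clients are being created much faster than they are explicitly closed.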

2.4 HIVE-24858

If a UDF JAR is registered in a session and a temporary function is created from it, the UDFClassLoader is not garbage-collected when the session is closed.

Class Name                                                                                                                          | Shallow Heap | Retained Heap
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
contextClassLoader org.apache.hive.service.server.ThreadWithGarbageCleanup @ 0x7164deb50  HiveServer2-Handler-Pool: Thread-72 Thread|          128 |        79,072
referent java.util.WeakHashMap$Entry @ 0x7164e67d0                                                                                  |           40 |           824
'- [6] java.util.WeakHashMap$Entry[16] @ 0x71581aac0                                                                                |           80 |         5,056
   '- table java.util.WeakHashMap @ 0x71580f510                                                                                     |           48 |         6,920
      '- CACHE_CLASSES class org.apache.hadoop.conf.Configuration @ 0x71580f3d8                                                     |           64 |        74,528
-------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.5 HIVE-26404

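HiveMetaStore becomes unresponsive because of long JVM garbage-collection pauses. org.apache.hadoop.conf.Configuration objects occupy too much heap memory, which creates an OOM risk.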

 Class Name                                                                             | Shallow Heap | Retained Heap
----------------------------------------------------------------------------------------------------------------------
org.apache.hadoop.fs.FileSystem$Cache @ 0x45403fe70                                    |           32 |   108,671,824
|- <class> class org.apache.hadoop.fs.FileSystem$Cache @ 0x45410c3e0                   |            8 |           544
'- map java.util.HashMap @ 0x453ffb598                                                 |           48 |    92,777,232
   |- <class> class java.util.HashMap @ 0x4520382c8 System Class                       |           40 |           168
   |- entrySet java.util.HashMap$EntrySet @ 0x454077848                                |           16 |            16
   '- table java.util.HashMap$Node[32768] @ 0x463585b68                                |      131,088 |    92,777,168
      |- class java.util.HashMap$Node[] @ 0x4520b7790                                  |            0 |             0
      '- [1786] java.util.HashMap$Node @ 0x451998ce0                                   |           32 |         9,968
         |- <class> class java.util.HashMap$Node @ 0x4520b7728 System Class            |            8 |            32
         '- value org.apache.hadoop.hdfs.DistributedFileSystem @ 0x452990178           |           56 |         4,976
            |- <class> class org.apache.hadoop.hdfs.DistributedFileSystem @ 0x45402e290|            8 |         4,664
            |- uri java.net.URI @ 0x451a05cd0  hdfs://nameservice1                     |           80 |           432
            |- dfs org.apache.hadoop.hdfs.DFSClient @ 0x451f5d9b8                      |          128 |         3,824
            '- conf org.apache.hadoop.hive.conf.HiveConf @ 0x453a34b38                 |           80 |       250,160
----------------------------------------------------------------------------------------------------------------------

2.6 HIVE-22275

When a single Hive session executes multiple SQL statements, OperationManager.queryIdOperation is not cleaned up properly, which creates an OOM risk.

2019-09-13T08:37:36,785 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=dfed4c18-a284-4640-9f4a-1a20527105f9]
2019-09-13T08:37:38,432 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083736_c49cf3cc-cfe8-48a1-bd22-8b924dfb0396 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=dfed4c18-a284-4640-9f4a-1a20527105f9] with tag: null
2019-09-13T08:37:38,469 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=24d0030c-0e49-45fb-a918-2276f0941cfb]
2019-09-13T08:37:52,662 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b983802c-1dec-4fa0-8680-d05ab555321b]
2019-09-13T08:37:56,239 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=75dbc531-2964-47b2-84d7-85b59f88999c]
2019-09-13T08:38:30,791 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b697c801-7da0-4544-bcfa-442eb1d3bd77]
2019-09-13T08:39:10,187 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=bda93c8f-0822-4592-a61c-4701720a1a5c]
2019-09-13T08:39:15,471 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=24d0030c-0e49-45fb-a918-2276f0941cfb] with tag: null
2019-09-13T08:39:15,507 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b983802c-1dec-4fa0-8680-d05ab555321b] with tag: null
2019-09-13T08:39:15,538 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=75dbc531-2964-47b2-84d7-85b59f88999c] with tag: null
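One way to read a log like this is to pair each "Adding operation" line with its "Removed queryId" line. The sketch below assumes the two message formats shown above. The function name `audit_operations` is hypothetical. It reports handles that were added but never removed, and query IDs that were removed more than once.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

ADDED = re.compile(
    r"Adding operation: OperationHandle \[.*?getHandleIdentifier\(\)=([0-9a-f-]+)\]"
)
REMOVED = re.compile(
    r"Removed queryId: (\S+) corresponding to operation: "
    r"OperationHandle \[.*?getHandleIdentifier\(\)=([0-9a-f-]+)\]"
)


def audit_operations(log_text: str) -> Tuple[List[str], Dict[str, int]]:
    """Return (handles added but never removed, query IDs removed more than once)."""
    added: List[str] = []
    removed_handles = set()
    removals_per_query_id: Counter = Counter()
    for line in log_text.splitlines():
        removed = REMOVED.search(line)
        if removed:
            removals_per_query_id[removed.group(1)] += 1
            removed_handles.add(removed.group(2))
            continue
        new_op = ADDED.search(line)
        if new_op:
            added.append(new_op.group(1))
    never_removed = [h for h in added if h not in removed_handles]
    repeated = {q: n for q, n in removals_per_query_id.items() if n > 1}
    return never_removed, repeated
```

In the excerpt above, three different operation handles are removed under the same query ID (hive_20190913083910_...), which is the kind of mismatch this check surfaces. Handles reported as never removed may simply still be running when the log ends, so treat that list as a hint rather than proof of a leak.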

2.7 HIVE-24590

Log output files are not closed or deleted properly, so Log4j RandomAccessFileManager instances take up too much heap space, which creates an OOM risk.

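3. Summary

I run HiveServer2 3.1.2. This version has quite a few memory-leak problems, so you can apply the fixes from the cases above and rebuild. If you run into other bugs or performance problems, it is worth checking with the Hive community.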
