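Elasticsearch Troubleshooting Cases

Allocation explain output for shard 0 of index app-log-2024.05.20.18 (the shard is started on elk-02 but cannot be rebalanced):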
{"index":"app-log-2024.05.20.18","shard":0,"primary":true,"current_state":"started","current_node":{"id":"lkdWWSzjS9iGFcvZaaVvrA","name":"elk-02.zgzf.com","transport_address":"172.19.70.11:9800","attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"weight_ranking":2},"can_remain_on_current_node":"yes","can_rebalance_cluster":"no","can_rebalance_cluster_decisions":[{"decider":"rebalance_only_when_active","decision":"NO","explanation":"rebalancing is not allowed until all replicas in the cluster are active"},{"decider":"cluster_rebalance","decision":"NO","explanation":"the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"}],"can_rebalance_to_other_node":"no","rebalance_explanation":"rebalancing is not allowed, even though there is at least one node on which the shard can be allocated","node_allocation_decisions":[{"node_id":"Fyk0dv4hSUaqHVMqsAqXdg","node_name":"elk-01.zgzf.com","transport_address":"172.19.70.10:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"yes","weight_ranking":1},{"node_id":"NwmszklQTqqAQ1p9xzYXZw","node_name":"elk-03.zgzf.com","transport_address":"172.19.70.12:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"worse_balance","weight_ranking":3}]}
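
The response above is dense, so it can help to pull out only the deciders that voted NO on rebalancing. The following is a minimal sketch, assuming the response has already been parsed into a Python dict; the function name `rebalance_blockers` and the trimmed `sample_explain` dict are illustrative and not part of the original article.

```python
from typing import Dict, List


def rebalance_blockers(explain: Dict) -> List[str]:
    """Return 'decider: explanation' for every decider that voted NO on rebalancing."""
    return [
        f"{d['decider']}: {d['explanation']}"
        for d in explain.get("can_rebalance_cluster_decisions", [])
        if d.get("decision") == "NO"
    ]


# Trimmed copy of the explain response shown above (only the fields used here).
sample_explain = {
    "index": "app-log-2024.05.20.18",
    "shard": 0,
    "current_state": "started",
    "can_rebalance_cluster": "no",
    "can_rebalance_cluster_decisions": [
        {
            "decider": "rebalance_only_when_active",
            "decision": "NO",
            "explanation": "rebalancing is not allowed until all replicas in the cluster are active",
        },
        {
            "decider": "cluster_rebalance",
            "decision": "NO",
            "explanation": "the cluster has unassigned shards and cluster setting "
            "[cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]",
        },
    ],
}

for line in rebalance_blockers(sample_explain):
    print(line)
```

Both deciders point at the same condition: rebalancing is held back while other shards in the cluster are still unassigned.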

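Adjust the cluster settings

If you are sure that all nodes are healthy and ready to accept shards, you can change the cluster settings to allow rebalancing. For example, temporarily modify the cluster.routing.allocation.allow_rebalance setting: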

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.allow_rebalance": "always"
  }
}

Restore the default setting

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.allow_rebalance": "indices_all_active"
  }
}

Allocation explain output showing why replica shard 1 of index app-log-2024.05.19.09 is unassigned:
{"index":"app-log-2024.05.19.09","shard":1,"primary":false,"current_state":"unassigned","unassigned_info":{"reason":"ALLOCATION_FAILED","at":"2024-06-03T13:48:36.982Z","failed_allocation_attempts":5,"details":"failed shard on node [Fyk0dv4hSUaqHVMqsAqXdg]: failed recovery, failure RecoveryFailedException[[app-log-2024.05.19.09][1]: Recovery failed from {elk-02.zgzf.com}{lkdWWSzjS9iGFcvZaaVvrA}{VcX4s-mXTt65du9F6NI5XA}{172.19.70.11}{172.19.70.11:9800}{dilm}{ml.machine_memory=66854977536, ml.max_open_jobs=20, xpack.installed=true} into {elk-01.zgzf.com}{Fyk0dv4hSUaqHVMqsAqXdg}{QvoljIiTSOaGvOvzXuObkw}{172.19.70.10}{172.19.70.10:9800}{dilm}{ml.machine_memory=66854977536, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk-02.zgzf.com][172.19.70.11:9800][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk-01.zgzf.com][172.19.70.10:9800][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; ","last_allocation_status":"no_attempt"},"can_allocate":"no","allocate_explanation":"cannot allocate because allocation is not permitted to any of the nodes","node_allocation_decisions":[{"node_id":"Fyk0dv4hSUaqHVMqsAqXdg","node_name":"elk-01.zgzf.com","transport_address":"172.19.70.10:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"no","deciders":[{"decider":"max_retry","decision":"NO","explanation":"shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-06-03T13:48:36.982Z], failed_attempts[5], 
failed_nodes[[Fyk0dv4hSUaqHVMqsAqXdg]], delayed=false, details[failed shard on node [Fyk0dv4hSUaqHVMqsAqXdg]: failed recovery, failure RecoveryFailedException[[app-log-2024.05.19.09][1]: Recovery failed from {elk-02.zgzf.com}{lkdWWSzjS9iGFcvZaaVvrA}{VcX4s-mXTt65du9F6NI5XA}{172.19.70.11}{172.19.70.11:9800}{dilm}{ml.machine_memory=66854977536, ml.max_open_jobs=20, xpack.installed=true} into {elk-01.zgzf.com}{Fyk0dv4hSUaqHVMqsAqXdg}{QvoljIiTSOaGvOvzXuObkw}{172.19.70.10}{172.19.70.10:9800}{dilm}{ml.machine_memory=66854977536, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk-02.zgzf.com][172.19.70.11:9800][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk-01.zgzf.com][172.19.70.10:9800][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; ], allocation_status[no_attempt]]]"}]},{"node_id":"NwmszklQTqqAQ1p9xzYXZw","node_name":"elk-03.zgzf.com","transport_address":"172.19.70.12:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"no","deciders":[{"decider":"max_retry","decision":"NO","explanation":"shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-06-03T13:48:36.982Z], failed_attempts[5], failed_nodes[[Fyk0dv4hSUaqHVMqsAqXdg]], delayed=false, details[failed shard on node [Fyk0dv4hSUaqHVMqsAqXdg]: failed recovery, failure RecoveryFailedException[[app-log-2024.05.19.09][1]: Recovery failed from 
{elk-02.zgzf.com}{lkdWWSzjS9iGFcvZaaVvrA}{VcX4s-mXTt65du9F6NI5XA}{172.19.70.11}{172.19.70.11:9800}{dilm}{ml.machine_memory=66854977536, ml.max_open_jobs=20, xpack.installed=true} into {elk-01.zgzf.com}{Fyk0dv4hSUaqHVMqsAqXdg}{QvoljIiTSOaGvOvzXuObkw}{172.19.70.10}{172.19.70.10:9800}{dilm}{ml.machine_memory=66854977536, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk-02.zgzf.com][172.19.70.11:9800][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk-01.zgzf.com][172.19.70.10:9800][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; ], allocation_status[no_attempt]]]"}]},{"node_id":"lkdWWSzjS9iGFcvZaaVvrA","node_name":"elk-02.zgzf.com","transport_address":"172.19.70.11:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"no","deciders":[{"decider":"max_retry","decision":"NO","explanation":"shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-06-03T13:48:36.982Z], failed_attempts[5], failed_nodes[[Fyk0dv4hSUaqHVMqsAqXdg]], delayed=false, details[failed shard on node [Fyk0dv4hSUaqHVMqsAqXdg]: failed recovery, failure RecoveryFailedException[[app-log-2024.05.19.09][1]: Recovery failed from {elk-02.zgzf.com}{lkdWWSzjS9iGFcvZaaVvrA}{VcX4s-mXTt65du9F6NI5XA}{172.19.70.11}{172.19.70.11:9800}{dilm}{ml.machine_memory=66854977536, ml.max_open_jobs=20, xpack.installed=true} into 
{elk-01.zgzf.com}{Fyk0dv4hSUaqHVMqsAqXdg}{QvoljIiTSOaGvOvzXuObkw}{172.19.70.10}{172.19.70.10:9800}{dilm}{ml.machine_memory=66854977536, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk-02.zgzf.com][172.19.70.11:9800][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk-01.zgzf.com][172.19.70.10:9800][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; ], allocation_status[no_attempt]]]"},{"decider":"same_shard","decision":"NO","explanation":"the shard cannot be allocated to the same node on which a copy of the shard already exists [[app-log-2024.05.19.09][1], node[lkdWWSzjS9iGFcvZaaVvrA], [P], s[STARTED], a[id=luKIA-plTle91E77OPn1GA]]"}]}]}
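
The `details` field above chains exceptions with `nested:`, and the last entry in the chain is usually the most specific one. As a rough sketch (the helper name `innermost_cause` is an assumption, and the sample string is a shortened version of the `details` value above), the innermost exception can be extracted like this:

```python
def innermost_cause(details: str) -> str:
    """Return the last exception in a failure string chained with 'nested:'."""
    parts = [p.strip().rstrip(";").strip() for p in details.split("nested:")]
    parts = [p for p in parts if p]  # drop empty fragments
    return parts[-1] if parts else ""


# Shortened version of the "details" value from the response above.
sample_details = (
    "failed recovery, failure RecoveryFailedException[[app-log-2024.05.19.09][1]: Recovery failed]; "
    "nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; "
    "nested: EngineCreationFailureException[failed to open reader on writer]; "
    "nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/"
    "T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; "
)

print(innermost_cause(sample_details))
```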

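The root cause is that too many files were open during recovery (Too many open files), so the replica shard could not be allocated. After five failed attempts the max_retry decider blocks further automatic retries on every node.

Check and raise the file descriptor limit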

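ulimit -n          # check the current limit
ulimit -n 65536    # temporary change (current shell session only)

For a persistent change, add the following to /etc/security/limits.conf:

* soft nofile 65536
* hard nofile 65536

The running Elasticsearch process keeps its old limit until it is restarted.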
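
To confirm that the new limit has reached the Elasticsearch process itself, the `process` section of `GET /_nodes/stats/process` reports `open_file_descriptors` and `max_file_descriptors` for each node. The sketch below computes the usage ratio per node from such a response. The function name `fd_usage` is an assumption, and the numbers in `sample_stats` are made up for illustration, not taken from this cluster.

```python
from typing import Dict


def fd_usage(nodes_stats: Dict) -> Dict[str, float]:
    """Map node name to open/max file descriptor ratio from a nodes stats response."""
    usage = {}
    for node in nodes_stats.get("nodes", {}).values():
        proc = node.get("process", {})
        open_fds = proc.get("open_file_descriptors")
        max_fds = proc.get("max_file_descriptors")
        # Skip nodes that do not report usable values.
        if open_fds is None or max_fds is None or max_fds <= 0:
            continue
        usage[node.get("name", "unknown")] = open_fds / max_fds
    return usage


# Hypothetical response fragment; the counts are illustrative only.
sample_stats = {
    "nodes": {
        "Fyk0dv4hSUaqHVMqsAqXdg": {
            "name": "elk-01.zgzf.com",
            "process": {"open_file_descriptors": 60000, "max_file_descriptors": 65535},
        },
        "lkdWWSzjS9iGFcvZaaVvrA": {
            "name": "elk-02.zgzf.com",
            "process": {"open_file_descriptors": 12000, "max_file_descriptors": 65535},
        },
    }
}

for name, ratio in fd_usage(sample_stats).items():
    print(f"{name}: {ratio:.1%} of file descriptors in use")
```

A node that stays close to its limit after the change probably holds too many shards or segments, and raising the limit again only postpones the problem.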

After raising the file descriptor limit, manually retry the failed shard allocation:

POST /_cluster/reroute?retry_failed=true

After the retry, monitor the cluster health to make sure all shards are allocated:

GET /_cluster/health
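
When reading the health response, the fields that matter here are `status`, `unassigned_shards`, and `initializing_shards`. A minimal sketch of that check, assuming the response has been parsed into a dict (the function name `allocation_complete` and both sample dicts are illustrative):

```python
from typing import Dict


def allocation_complete(health: Dict) -> bool:
    """True when the cluster is green and no shard is unassigned or still initializing."""
    return (
        health.get("status") == "green"
        and health.get("unassigned_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
    )


# Illustrative responses, trimmed to the fields used above.
still_recovering = {"status": "yellow", "unassigned_shards": 0, "initializing_shards": 1}
recovered = {"status": "green", "unassigned_shards": 0, "initializing_shards": 0}

print(allocation_complete(still_recovering))
print(allocation_complete(recovered))
```

If the status stays yellow, running the allocation explain request again shows whether the replica failed for the same reason.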