Case Study: Long GC Pauses Caused by Frequent Object Allocation

This article analyzes a case in which frequent object allocation led to excessively long GC pauses.

Symptoms and Analysis

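GC pauses were far too long: the average pause was close to 4 seconds, and 13% of collections took more than 10 seconds. An excerpt of the GC log follows: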

[PSYoungGen: 457878K->126656K(489472K)] 1746043K->1453131K(1887744K), 12.1965757 secs] [Times: user=5.59 sys=0.52, real=12.19 secs] 
154415.774: [GC (Allocation Failure) 
Desired survivor size 212860928 bytes, new threshold 1 (max 15)
[PSYoungGen: 376192K->65968K(484864K)] 1702667K->1392499K(1883136K), 0.1665513 secs] [Times: user=0.10 sys=0.00, real=0.17 secs] 
154416.838: [GC (Allocation Failure) 
Desired survivor size 235929600 bytes, new threshold 1 (max 15)
[PSYoungGen: 341424K->196182K(445952K)] 1667955K->1523034K(1844224K), 1.7996294 secs] [Times: user=0.89 sys=0.03, real=1.80 secs] 
154419.456: [GC (Allocation Failure) 
Desired survivor size 225968128 bytes, new threshold 1 (max 15)
[PSYoungGen: 434262K->121776K(468480K)] 1761114K->1486938K(1866752K), 23.0844304 secs] [Times: user=10.75 sys=0.81, real=23.09 secs] 
154442.541: [Full GC (Ergonomics) [PSYoungGen: 121776K->0K(468480K)] [ParOldGen: 1365162K->208108K(1398272K)] 1486938K->208108K(1866752K), [Metaspace: 93615K->93615K(1132544K)], 23.5955214 secs] [Times: user=3.76 sys=5.30, real=23.59 secs] 
154504.670: [GC (Allocation Failure) 
Desired survivor size 217579520 bytes, new threshold 1 (max 15)
[PSYoungGen: 200553K->106368K(486400K)] 408662K->314476K(1884672K), 1.0664613 secs] [Times: user=0.39 sys=0.13, real=1.06 secs] 
154507.542: [GC (Allocation Failure) 
Desired survivor size 218103808 bytes, new threshold 1 (max 15)
[PSYoungGen: 372096K->144182K(478208K)] 580204K->425927K(1876480K), 5.7999561 secs] [Times: user=1.12 sys=1.55, real=5.80 secs] 
154514.037: [GC (Allocation Failure) 
Desired survivor size 213909504 bytes, new threshold 1 (max 15)
[PSYoungGen: 409910K->87920K(489984K)] 691655K->407999K(1888256K), 10.1020217 secs] [Times: user=4.46 sys=0.61, real=10.11 secs] 
154563.240: [GC (Allocation Failure) 
Desired survivor size 213385216 bytes, new threshold 1 (max 15)
[PSYoungGen: 328380K->65952K(485888K)] 648460K->386087K(1884160K), 0.0918412 secs] [Times: user=0.04 sys=0.01, real=0.09 secs] 
154564.037: [GC (Allocation Failure) 
Desired survivor size 219676672 bytes, new threshold 1 (max 15)
[PSYoungGen: 342944K->153558K(478208K)] 663079K->474022K(1876480K), 3.1948641 secs] [Times: user=0.72 sys=0.69, real=3.19 secs] 
154568.135: [GC (Allocation Failure) 
Desired survivor size 212336640 bytes, new threshold 1 (max 15)
[PSYoungGen: 423382K->98528K(484352K)] 743846K->457302K(1882624K), 13.4085860 secs] [Times: user=6.04 sys=0.69, real=13.41 secs] 
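
The average pause and the share of pauses above 10 seconds can be computed from the `real=` field of each log entry. The following is a minimal sketch, assuming log lines in the ParallelGC format shown above; the class name `GcPauseStats` and its methods are hypothetical helpers, not part of any tool used in the original analysis. Because the log above is only an excerpt, running this on it will not reproduce the 4-second and 13% figures, which come from the full log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts wall-clock pauses ("real=... secs") from GC log lines. */
final class GcPauseStats {

    // Matches the wall-clock field of the [Times: ...] section, e.g. "real=12.19 secs".
    private static final Pattern REAL = Pattern.compile("real=(\\d+(?:\\.\\d+)?) secs");

    /** Returns every wall-clock pause found in the given lines, in seconds. */
    static List<Double> pauses(List<String> logLines) {
        List<Double> result = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = REAL.matcher(line);
            while (m.find()) {
                result.add(Double.parseDouble(m.group(1)));
            }
        }
        return result;
    }

    /** Arithmetic mean of the pauses, or 0 when there are none. */
    static double average(List<Double> pauses) {
        if (pauses.isEmpty()) {
            return 0;
        }
        double sum = 0;
        for (double p : pauses) {
            sum += p;
        }
        return sum / pauses.size();
    }

    /** Fraction (0..1) of pauses strictly longer than the threshold. */
    static double shareAbove(List<Double> pauses, double thresholdSecs) {
        if (pauses.isEmpty()) {
            return 0;
        }
        long count = pauses.stream().filter(p -> p > thresholdSecs).count();
        return (double) count / pauses.size();
    }

    public static void main(String[] args) {
        // Three entries copied from the excerpt above.
        List<String> sample = Arrays.asList(
            "[PSYoungGen: 457878K->126656K(489472K)] 1746043K->1453131K(1887744K), 12.1965757 secs] [Times: user=5.59 sys=0.52, real=12.19 secs]",
            "[PSYoungGen: 376192K->65968K(484864K)] 1702667K->1392499K(1883136K), 0.1665513 secs] [Times: user=0.10 sys=0.00, real=0.17 secs]",
            "[PSYoungGen: 341424K->196182K(445952K)] 1667955K->1523034K(1844224K), 1.7996294 secs] [Times: user=0.89 sys=0.03, real=1.80 secs]");
        List<Double> pauses = pauses(sample);
        System.out.printf("average pause: %.2f s%n", average(pauses));
        System.out.printf("share above 10 s: %.0f%%%n", shareAbove(pauses, 10.0) * 100);
    }
}
```

Note that in several entries `real` is much larger than `user + sys`, so the collector threads spent most of the pause waiting rather than doing work.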

After dumping the heap with jmap, I analyzed it with MAT and looked at thread_overview.


[Figure: MAT thread_overview view]

It shows that ElasticsearchJestHealthIndicator.doHealthCheck was holding a large number of objects that had not been released.
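
The dump in this case was taken with jmap. As an alternative that was not used in the original analysis, a HotSpot JVM can also write the same kind of .hprof file from inside the process through `HotSpotDiagnosticMXBean`. The sketch below assumes a HotSpot-based JDK; the class name `HeapDumper` and the file name `app.hprof` are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: writes a heap dump of the current JVM for later analysis in MAT. */
final class HeapDumper {

    /**
     * Dumps the heap to the given file. The file must not exist yet,
     * and newer JDKs require the name to end in ".hprof".
     *
     * @param liveOnly when true, only reachable objects are written
     */
    static Path dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Use a fresh temporary directory so the target file is guaranteed not to exist.
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir.resolve("app.hprof"), true);
        System.out.println("Heap dump written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

When the goal is to see short-lived garbage such as the health-check responses, passing `false` for `liveOnly` keeps unreachable objects in the dump as well.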

Debugging and Reproduction

Local reproduction

spring-context-4.3.7.RELEASE-sources.jar!/org/springframework/jmx/export/SpringModelMBean.java

    /**
     * Switches the {@link Thread#getContextClassLoader() context ClassLoader} for the
     * managed resources {@link ClassLoader} before allowing the invocation to occur.
     * @see javax.management.modelmbean.ModelMBean#invoke
     */
    @Override
    public Object invoke(String opName, Object[] opArgs, String[] sig)
            throws MBeanException, ReflectionException {

        ClassLoader currentClassLoader = Thread.currentThread().getContextClassLoader();
        try {
            Thread.currentThread().setContextClassLoader(this.managedResourceClassLoader);
            return super.invoke(opName, opArgs, sig);
        }
        finally {
            Thread.currentThread().setContextClassLoader(currentClassLoader);
        }
    }

spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/endpoint/jmx/DataEndpointMBean.java

@ManagedResource
public class DataEndpointMBean extends EndpointMBean {

    /**
     * Create a new {@link DataEndpointMBean} instance.
     * @param beanName the bean name
     * @param endpoint the endpoint to wrap
     * @param objectMapper the {@link ObjectMapper} used to convert the payload
     */
    public DataEndpointMBean(String beanName, Endpoint<?> endpoint,
            ObjectMapper objectMapper) {
        super(beanName, endpoint, objectMapper);
    }

    @ManagedAttribute(description = "Invoke the underlying endpoint")
    public Object getData() {
        return convert(getEndpoint().invoke());
    }

}

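Locally, the problem reproduces only when IDEA's Enable JMX Agent option is turned on; in that setup the health endpoint is reached over JMX through the DataEndpointMBean.getData() path shown above.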

healthEndpoint

spring-boot-admin-server-1.4.6-sources.jar!/de/codecentric/boot/admin/registry/StatusUpdateApplicationListener.java

    private long updatePeriod = 10_000L;

    public void startStatusUpdate() {
        if (scheduledTask != null && !scheduledTask.isDone()) {
            return;
        }

        scheduledTask = taskScheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                statusUpdater.updateStatusForAllApplications();
            }
        }, updatePeriod);
        LOGGER.debug("Scheduled status-updater task for every {}ms", updatePeriod);

    }

spring-boot-admin-server-1.4.6-sources.jar!/de/codecentric/boot/admin/registry/StatusUpdater.java

    public void updateStatusForAllApplications() {
        long now = System.currentTimeMillis();
        for (Application application : store.findAll()) {
            if (now - statusLifetime > application.getStatusInfo().getTimestamp()) {
                updateStatus(application);
            }
        }
    }

    public void updateStatus(Application application) {
        StatusInfo oldStatus = application.getStatusInfo();
        StatusInfo newStatus = queryStatus(application);

        Application newState = Application.create(application).withStatusInfo(newStatus).build();
        store.save(newState);

        if (!newStatus.equals(oldStatus)) {
            publisher.publishEvent(
                    new ClientApplicationStatusChangedEvent(newState, oldStatus, newStatus));
        }
    }

    protected StatusInfo queryStatus(Application application) {
        LOGGER.trace("Updating status for {}", application);

        try {
            @SuppressWarnings("unchecked")
            ResponseEntity<Map<String, Object>> response = restTemplate.getForEntity(
                    application.getHealthUrl(), (Class<Map<String, Object>>) (Class<?>) Map.class);
            LOGGER.debug("/health for {} responded with {}", application, response);

            if (response.hasBody() && response.getBody().get("status") instanceof String) {
                return StatusInfo.valueOf((String) response.getBody().get("status"));
            } else if (response.getStatusCode().is2xxSuccessful()) {
                return StatusInfo.ofUp();
            } else {
                return StatusInfo.ofDown();
            }

        } catch (Exception ex) {
            if ("OFFLINE".equals(application.getStatusInfo().getStatus())) {
                LOGGER.debug("Couldn't retrieve status for {}", application, ex);
            } else {
                LOGGER.warn("Couldn't retrieve status for {}", application, ex);
            }
            return StatusInfo.ofOffline();
        }
    }

As shown above, admin-server registers a scheduled task that calls /health periodically.
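
How often a given application is actually queried depends on two values in the code above: the task runs every `updatePeriod` milliseconds, but `updateStatusForAllApplications` only re-queries applications whose last status is older than `statusLifetime`. The sketch below isolates that comparison. The class `StalenessGate`, the map of timestamps, and the concrete numbers are hypothetical; only the expression `now - statusLifetime > timestamp` is taken from the source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the staleness check in updateStatusForAllApplications. */
final class StalenessGate {

    /**
     * Returns the names of applications whose last status timestamp is
     * older than statusLifetime, using the same comparison as the source.
     */
    static List<String> dueForRefresh(Map<String, Long> lastStatusTimestamps,
                                      long now, long statusLifetime) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastStatusTimestamps.entrySet()) {
            if (now - statusLifetime > entry.getValue()) {
                due.add(entry.getKey());
            }
        }
        return due;
    }

    public static void main(String[] args) {
        Map<String, Long> timestamps = new LinkedHashMap<>();
        timestamps.put("app-a", 0L);      // status last refreshed at t = 0 ms
        timestamps.put("app-b", 9_000L);  // status last refreshed at t = 9000 ms
        // Assumed value for illustration only: a status lifetime of 10 seconds.
        System.out.println(dueForRefresh(timestamps, 10_001L, 10_000L));
    }
}
```

Each application that passes this check triggers one HTTP call to its /health URL, and therefore one run of every health indicator registered in that application.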

Problem Analysis

ElasticsearchHealthIndicatorConfiguration

spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/autoconfigure/ElasticsearchHealthIndicatorConfiguration.java

class ElasticsearchHealthIndicatorConfiguration {

    @Configuration
    @ConditionalOnBean(Client.class)
    @ConditionalOnEnabledHealthIndicator("elasticsearch")
    @EnableConfigurationProperties(ElasticsearchHealthIndicatorProperties.class)
    static class ElasticsearchClientHealthIndicatorConfiguration extends
            CompositeHealthIndicatorConfiguration<ElasticsearchHealthIndicator, Client> {

        private final Map<String, Client> clients;

        private final ElasticsearchHealthIndicatorProperties properties;

        ElasticsearchClientHealthIndicatorConfiguration(Map<String, Client> clients,
                ElasticsearchHealthIndicatorProperties properties) {
            this.clients = clients;
            this.properties = properties;
        }

        @Bean
        @ConditionalOnMissingBean(name = "elasticsearchHealthIndicator")
        public HealthIndicator elasticsearchHealthIndicator() {
            return createHealthIndicator(this.clients);
        }

        @Override
        protected ElasticsearchHealthIndicator createHealthIndicator(Client client) {
            return new ElasticsearchHealthIndicator(client, this.properties);
        }

    }

    @Configuration
    @ConditionalOnBean(JestClient.class)
    @ConditionalOnEnabledHealthIndicator("elasticsearch")
    static class ElasticsearchJestHealthIndicatorConfiguration extends
            CompositeHealthIndicatorConfiguration<ElasticsearchJestHealthIndicator, JestClient> {

        private final Map<String, JestClient> clients;

        ElasticsearchJestHealthIndicatorConfiguration(Map<String, JestClient> clients) {
            this.clients = clients;
        }

        @Bean
        @ConditionalOnMissingBean(name = "elasticsearchHealthIndicator")
        public HealthIndicator elasticsearchHealthIndicator() {
            return createHealthIndicator(this.clients);
        }

        @Override
        protected ElasticsearchJestHealthIndicator createHealthIndicator(
                JestClient client) {
            return new ElasticsearchJestHealthIndicator(client);
        }

    }

}

ElasticsearchHealthIndicator

spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/health/ElasticsearchHealthIndicator.java

    @Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        List<String> indices = this.properties.getIndices();
        ClusterHealthResponse response = this.client.admin().cluster()
                .health(Requests.clusterHealthRequest(indices.isEmpty() ? allIndices
                        : indices.toArray(new String[indices.size()])))
                .actionGet(this.properties.getResponseTimeout());

        switch (response.getStatus()) {
        case GREEN:
        case YELLOW:
            builder.up();
            break;
        case RED:
        default:
            builder.down();
            break;
        }
        builder.withDetail("clusterName", response.getClusterName());
        builder.withDetail("numberOfNodes", response.getNumberOfNodes());
        builder.withDetail("numberOfDataNodes", response.getNumberOfDataNodes());
        builder.withDetail("activePrimaryShards", response.getActivePrimaryShards());
        builder.withDetail("activeShards", response.getActiveShards());
        builder.withDetail("relocatingShards", response.getRelocatingShards());
        builder.withDetail("initializingShards", response.getInitializingShards());
        builder.withDetail("unassignedShards", response.getUnassignedShards());
    }

spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/health/ElasticsearchHealthIndicatorProperties.java

@ConfigurationProperties(prefix = "management.health.elasticsearch", ignoreUnknownFields = false)
public class ElasticsearchHealthIndicatorProperties {

    /**
     * Comma-separated index names.
     */
    private List<String> indices = new ArrayList<String>();

    /**
     * Time, in milliseconds, to wait for a response from the cluster.
     */
    private long responseTimeout = 100L;

    //......

}

If indices are specified, only their health is queried, for example /index1,index2/_stats; if none are specified, all indices are queried, which is a potential pitfall.
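
The fallback comes from the ternary expression in `doHealthCheck`: an empty `indices` list selects `allIndices`. The sketch below reproduces only that selection logic in isolation. `IndexSelection` and `targetIndices` are hypothetical names, and the value `{"_all"}` for the fallback array is an assumption, since the field's declaration is not part of the excerpt above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the index selection done in doHealthCheck. */
final class IndexSelection {

    // Assumption: mirrors the indicator's allIndices field, taken here to be {"_all"}.
    private static final String[] ALL_INDICES = {"_all"};

    /** An empty configuration falls back to every index; otherwise the configured names are used. */
    static String[] targetIndices(List<String> configured) {
        return configured.isEmpty() ? ALL_INDICES : configured.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Nothing configured: the check covers all indices.
        System.out.println(Arrays.toString(targetIndices(Collections.<String>emptyList())));
        // Explicit list: the check is limited to the named indices.
        System.out.println(Arrays.toString(targetIndices(Arrays.asList("index1", "index2"))));
    }
}
```

Given the `@ConfigurationProperties` prefix and field name shown above, the list is bound from the property `management.health.elasticsearch.indices`, and it defaults to empty.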

ElasticsearchJestHealthIndicator

spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/health/ElasticsearchJestHealthIndicator.java

    @Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        JestResult aliases = this.jestClient.execute(new Stats.Builder().build());
        JsonElement root = this.jsonParser.parse(aliases.getJsonString());
        JsonObject shards = root.getAsJsonObject().get("_shards").getAsJsonObject();
        int failedShards = shards.get("failed").getAsInt();
        if (failedShards != 0) {
            builder.outOfService();
        }
        else {
            builder.up();
        }
    }

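The HTTP request that Jest ends up sending is /_all/_stats, and this is the root of the problem: it asks for the statistics of every index. On a large Elasticsearch platform the response is huge: nearly 5,000 entries, with the raw JSON text alone exceeding 20 MB. Combined with the scheduled /health polling, new objects are created continuously and young GCs become very frequent, producing symptoms that look like a memory leak.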

Summary

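Periodically monitoring an application's health is a good thing, but pay attention to the polling frequency and to how each health check is actually implemented. The Elasticsearch one is easy to get bitten by: it effectively produces a fixed amount of garbage on a schedule, and when that outpaces garbage collection it looks like a memory leak and puts a heavy burden on the application's GC.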
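
One way to decouple the cost of an expensive check from the polling frequency is to cache its result for a fixed time. The following is a minimal sketch under stated assumptions, not something the original case used: `CachedCheck` is a hypothetical class, the 60-second time-to-live is an arbitrary example, and the clock is injected only to make the behaviour easy to verify.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical wrapper that caches the result of an expensive check for a fixed time-to-live. */
final class CachedCheck<T> {

    private final Supplier<T> delegate;
    private final long ttlMillis;
    private final LongSupplier clock;

    private T cached;
    private long expiresAt;
    private boolean loaded;

    CachedCheck(Supplier<T> delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, calling the delegate only on first use or after expiry. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now >= expiresAt) {
            cached = delegate.get();
            expiresAt = now + ttlMillis;
            loaded = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for an expensive health check; it only counts invocations here.
        CachedCheck<String> check = new CachedCheck<String>(
                () -> "UP after " + calls.incrementAndGet() + " real call(s)",
                60_000L,
                System::currentTimeMillis);
        for (int i = 0; i < 5; i++) {
            check.get(); // simulates five polls arriving in quick succession
        }
        System.out.println(check.get());
    }
}
```

Caching only lowers how often the large response is fetched; it does not shrink the response itself. Since both configurations above are guarded by `@ConditionalOnEnabledHealthIndicator("elasticsearch")`, the indicator can also be switched off entirely with the property `management.health.elasticsearch.enabled=false`.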
