问题:
最近在测试环境上出现一个随机宕机的问题,而且还没有出现异常日志,唯一的就是tomcat的关闭的日志:
20170919:10:53:46.327 [DubboServerHandler-120.26.209.77:20882-thread-195] [com.souche.knox.rule.async.biz.SynchronousResolver:85] INFO ===规则脚本执行成功===,taskid:831501,结果:null
20170919:10:53:46.356 [DubboServerHandler-120.26.209.77:20882-thread-195] [com.souche.knox.invoke.service.impl.RuleServiceImpl:97] INFO ********RuleService end result:{"code":"tangeche_down_payment_ratio","data":{"dscDownPaymentRatio":18},"ruleId":121,"ruleName":"弹个车首付比例配置规则","ruleVersion":"1.0.0","status":"FINISHED","success":true,"taskId":831501}
20170919:10:53:52.608 [DubboServerHandler-120.26.209.77:20882-thread-197] [com.souche.knox.invoke.service.impl.RuleServiceImpl:65] INFO ********RuleService start ruleCode:tangeche_formula,param:{canzhi=100, carType=1, fee=200, guidePrice=100, repurchasePrice=null, loanPeriod=12, bargainPrice=null}
20170919:10:53:52.630 [DubboServerHandler-120.26.209.77:20882-thread-197] [com.souche.knox.rule.async.biz.SynchronousResolver:85] INFO ===规则脚本执行成功===,taskid:831502,结果:null
20170919:10:53:52.653 [DubboServerHandler-120.26.209.77:20882-thread-197] [com.souche.knox.invoke.service.impl.RuleServiceImpl:97] INFO ********RuleService end result:{"code":"tangeche_formula","data":{"SXXZ":100},"ruleId":152,"ruleName":"弹个车授信公式配置规则","ruleVersion":"2.0.0","status":"FINISHED","success":true,"taskId":831502}
20170919:10:53:52.659 [DubboServerHandler-120.26.209.77:20882-thread-198] [com.souche.knox.invoke.service.impl.RuleServiceImpl:65] INFO ********RuleService start ruleCode:tangeche_down_payment_ratio,param:{alipayDownPaymentRatio=17, idCardNo=430482199105209271}
20170919:10:53:52.678 [DubboServerHandler-120.26.209.77:20882-thread-198] [com.souche.knox.rule.async.biz.SynchronousResolver:85] INFO ===规则脚本执行成功===,taskid:831503,结果:null
20170919:10:53:52.700 [DubboServerHandler-120.26.209.77:20882-thread-198] [com.souche.knox.invoke.service.impl.RuleServiceImpl:97] INFO ********RuleService end result:{"code":"tangeche_down_payment_ratio","data":{"dscDownPaymentRatio":18},"ruleId":121,"ruleName":"弹个车首付比例配置规则","ruleVersion":"1.0.0","status":"FINISHED","success":true,"taskId":831503}
19-Sep-2017 10:53:53.374 INFO [Thread-10] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-4025"]
19-Sep-2017 10:53:53.382 INFO [Thread-10] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["ajp-nio-4030"]
20170919:10:53:53.415 [DubboShutdownHook] [org.apache.zookeeper.ZooKeeper:684] INFO Session: 0x15afe4318e5c818 closed
20170919:10:53:53.415 [DEFAULT.handleRunTaskJob_Scheduler_QuartzSchedulerThread-EventThread] [org.apache.zookeeper.ClientCnxn$EventThread:519] INFO EventThread shut down for session: 0x15afe4318e5c818
19-Sep-2017 10:53:53.434 INFO [Thread-10] org.apache.catalina.core.StandardService.stopInternal Stopping service Catalina
20170919:10:53:53.476 [DubboShutdownHook] [org.apache.zookeeper.ZooKeeper:684] INFO Session: 0x15afe4318e5c813 closed
20170919:10:53:53.476 [localhost-startStop-1-EventThread] [org.apache.zookeeper.ClientCnxn$EventThread:519] INFO EventThread shut down for session: 0x15afe4318e5c813
19-Sep-2017 10:53:53.509 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [ROOT] appears to have started a thread named [Curator-ConnectionStateManager-0] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
org.apache.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:245)
org.apache.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:43)
org.apache.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:111)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
其中加粗部分是异常的开始点,下面是正常的进行tomcat关闭,
自己的环境配置:
环境:
Java version: 1.8.0_77
Server version: Apache Tomcat/8.0.39
jvm配置:
/opt/souche/java/bin/java
-Djava.util.logging.config.file=/home/souche/server/knox-rule-service/conf/logging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djdk.tls.ephemeralDHKeySize=2048
-Djava.protocol.handler.pkgs=org.apache.catalina.webresources
-server
-Xms512m
-Xmx512m
-XX:PermSize=256m
-XX:MaxPermSize=256m
-Xmn256m
-Djava.endorsed.dirs=/home/souche/server/knox-rule-service/endorsed
-classpath /home/souche/server/knox-rule-service/bin/bootstrap.jar:/home/souche/server/knox-rule-service/bin/tomcat-juli.jar
-Dcatalina.base=/home/souche/server/knox-rule-service
-Dcatalina.home=/home/souche/server/knox-rule-service
-Djava.io.tmpdir=/home/souche/server/knox-rule-service/temp
org.apache.catalina.startup.Bootstrap start
********************************************************************************************************************************************
个人思路:
首先排查jvm的方面的东西,于是对jvm做了一些优化,但是还是没有作用,还是会出现这种异常
/opt/souche/java/bin/java
-Djava.util.logging.config.file=/home/souche/server/knox-rule-service/conf/logging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djdk.tls.ephemeralDHKeySize=2048
-Djava.protocol.handler.pkgs=org.apache.catalina.webresources
-server
-Xms1024m
-Xmx1024m
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:PermSize=64m
-XX:MaxPermSize=128m
-Xmn512m
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/souche/server/knox-rule-service/logs/heapdump/heap.bin
-Xloggc:/home/souche/server/knox-rule-service/logs/gc.log -Djava.endorsed.dirs=/home/souche/server/knox-rule-service/endorsed
-classpath /home/souche/server/knox-rule-service/bin/bootstrap.jar:/home/souche/server/knox-rule-service/bin/tomcat-juli.jar
-Dcatalina.base=/home/souche/server/knox-rule-service -Dcatalina.home=/home/souche/server/knox-rule-service
-Djava.io.tmpdir=/home/souche/server/knox-rule-service/temp
org.apache.catalina.startup.Bootstrap start
这里修改主要是参考链接
http://vekergu.blog.51cto.com/9966832/1619640
这里主要是对堆的分配和GC日志定位添加日志和进行定位,但是发现,我这里的问题并不是上面的这个,我这里的打印都是很正常的
2017-09-19T22:01:06.826+0800: 3833.058: [GC (Allocation Failure) [PSYoungGen: 523661K->333K(523776K)] 768894K->245566K(1048064K), 0.0046172 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2017-09-19T22:01:17.471+0800: 3843.702: [GC (Allocation Failure) [PSYoungGen: 523597K->410K(523776K)] 768830K->245659K(1048064K), 0.0055236 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2017-09-19T22:01:18.107+0800: 3844.339: [GC (Allocation Failure) [PSYoungGen: 523674K->506K(523776K)] 768923K->245771K(1048064K), 0.0070453 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2017-09-19T22:01:18.766+0800: 3844.998: [GC (Allocation Failure) [PSYoungGen: 523770K->346K(522752K)] 769035K->245611K(1047040K), 0.0053823 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2017-09-19T22:01:29.408+0800: 3855.640: [GC (Allocation Failure) [PSYoungGen: 522586K->365K(523264K)] 767851K->245630K(1047552K), 0.0049888 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2017-09-19T22:01:30.078+0800: 3856.310: [GC (Allocation Failure) [PSYoungGen: 522605K->384K(523776K)] 767870K->245657K(1048064K), 0.0054023 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
太多就不截取了
根据异常关键字Pausing ProtocolHandler 搜索到相关的文章
后来参考了几个文章
https://stackoverflow.com/questions/30759220/tomcat8-shutdown-randomly-with-abstractprotocol-pause
上面这个跟自己的问题很像,但是我们这边没有办法更改权限。他这里将web用户更改为root之后就Ok ,但是我这边是测试环境有问题,但是线上的没有,线上的和本地测试的权限都是一样的。
问题遗留。。。。。。。
参考链接
https://link.zhihu.com/?target=http%3A//hongjiang.info/why-kill-2-cannot-stop-tomcat/
http://vekergu.blog.51cto.com/9966832/1619640
https://stackoverflow.com/questions/30759220/tomcat8-shutdown-randomly-with-abstractprotocol-pause
http://www.365mini.com/page/tomcat-connector-mode.htm (tomcat优化)