(文章内容部分来源自于网络,如有侵权,请告知删除)
前言
去年负责的一个项目老出现线程阻塞,线上三天两头服务内存溢出问题,于是开始了一系列的优化之旅。现在,将这段时间所学的东西进行总结。
-
基本概念:
-
dump基本概念
在故障定位(尤其是out of memory)和性能分析的时候,经常会用到一些文件来帮助我们排除代码问题。这些文件记录了JVM运行期间的内存占用、线程执行等情况,这就是我们常说的dump文件。常用的有heap dump和thread dump(也叫javacore,或java dump)。我们可以这么理解:heap dump记录内存信息的,thread dump是记录CPU信息的。-
thread dump
- 主要记录JVM实例在某一时刻各个线程执行的情况,以栈的形式显示调用信息。thread dump是一个文本文件,通过对thread dump文件的分析,可以分析出程序各个线程调用以及各个线程所处的状态与锁的情况,从而定位具体线上的问题线程所发生的调用过程与代码问题,从而进行进一步的修改调整
-
thread dump
-
dump基本概念
jstack
1.介绍
-
jstack是java虚拟机自带的一种堆栈跟踪工具。jstack用于打印出给定的java进程ID或core file或远程调试服务的Java堆栈信息,其命令如下:
[root@dev18 ~]# jstack -h Usage: jstack [-l] <pid> (to connect to running process) jstack -F [-m] [-l] <pid> (to connect to a hung process) jstack [-m] [-l] <executable> <core> (to connect to a core file) jstack [-m] [-l] [server_id@]<remote server IP or hostname> (to connect to a remote debug server) Options: -F to force a thread dump. Use when jstack <pid> does not respond (process is hung) -m to print both java and native frames (mixed mode) -l long listing. Prints additional information about locks -h or -help to print this help message
-
基本语法:
jstack [option] <pid>
-
主要功能:
- 针对活着的进程做本地的或远程的线程dump
- 针对core文件做线程dump
jstack用于生成java虚拟机当前时刻的线程快照。线程快照是当前java虚拟机内每一条线程正在执行的方法堆栈的集合,生成线程快照的主要目的是定位线程出现长时间停顿的原因,如线程间死锁、死循环、请求外部资源导致的长时间等待等。 线程出现停顿的时候通过jstack来查看各个线程的调用堆栈,就可以知道没有响应的线程到底在后台做什么事情,或者等待什么资源。 如果java程序崩溃生成core文件,jstack工具可以用来获得core文件的java stack和native stack的信息,从而可以轻松地知道java程序是如何崩溃和在程序何处发生问题。另外,jstack工具还可以附属到正在运行的java程序中,看到当时运行的java程序的java stack和native stack的信息, 如果现在运行的java程序呈现hung的状态,jstack是非常有用的。
2.线程状态
-
想要通过jstack命令来分析线程的情况的话,首先要知道在线程栈中,线程都会哪些状态,下面这些状态是我们使用jstack命令查看线程堆栈信息时可能会看到的线程的几种状态:
- NEW:未获得执行的线程,该状态不会出现在Dump中
- RUNNABLE:获取时间片,正在执行的线程
- BLOCKED:受阻塞的线程,通常在等待锁的获取时会处于BLOCKED状态
- WATING:无限期等待另一个线程执行特定操作,WATING时将会释放当前锁资源
- TIMED_WATING:有时限的等待另一个线程的特定操作,WATING时将会释放当前锁资源
- TERMINATED:已退出的线程
-
Monitor
在多线程的 JAVA程序中,实现线程之间的同步,就要说说 Monitor。 Monitor是 Java中用以实现线程之间的互斥与协作的主要手段,它可以看成是对象或者 Class的锁。每一个对象都有,也仅有一个 monitor。下面这个图,描述了线程和 Monitor之间关系,以 及线程的状态转换图:
- 进入区(Entrt Set):表示线程通过synchronized要求获取对象的锁。如果对象未被锁住,则迚入拥有者;否则则在进入区等待。一旦对象锁被其他线程释放,立即参与竞争。
- 拥有者(The Owner):表示某一线程成功竞争到对象锁。
- 等待区(Wait Set):表示线程通过对象的wait方法,释放对象的锁,并在等待区等待被唤醒。
从图中可以看出,一个 Monitor在某个时刻,只能被一个线程拥有,该线程就是
Active Thread
,而其它线程都是Waiting Thread
,分别在两个队列Entry Set
和Wait Set
里面等候。在Entry Set
中等待的线程状态是Waiting for monitor entry
,而在Wait Set
中等待的线程状态是Object.wait()
。 先看Entry Set
里面的线程。我们称被 synchronized保护起来的代码段为临界区。当一个线程申请进入临界区时,它就进入了Entry Set
队列。对应的 code就像:synchronized(obj) { ......... }
-
调用栈修饰信息
表示线程在方法调用时,额外的重要的操作。线程Dump分析的重要信息。修饰上方的方法调用,从中我们可以通过观察调用栈的修饰信息去获取更多线程在内存中的相关情况
- locked <地址> 目标:申请对象锁成功,拥有对象锁
- waiting to lock <地址> 目标:申请对象锁未成功,等待获取锁的信息
- waiting on <地址> 目标:申请对象锁成功后,释放锁幵在等待区等待
- parking to wait for <地址> 目标 : 线程进入等待状态,等待另一个条件发生来将自己唤醒
3.线程栈说明
-
实例一:waiting for monitor entry
"nioEventLoopGroup-1-87" daemon prio=10 tid=0x00007f44dc002800 nid=0x25c3 waiting for monitor entry [0x00007f44846f1000] java.lang.Thread.State: BLOCKED (on object monitor) at sun.security.util.MemoryCache.get(Cache.java:378) - waiting to lock <0x00000005ab7b2db0> (a sun.security.util.MemoryCache) at sun.security.ssl.SSLSessionContextImpl.getSession(SSLSessionContextImpl.java:73) at sun.security.ssl.SSLSessionContextImpl.get(SSLSessionContextImpl.java:146) at sun.security.ssl.ServerHandshaker.clientHello(ServerHandshaker.java:458) at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:167) at sun.security.ssl.Handshaker.processLoop(Handshaker.java:878) at sun.security.ssl.Handshaker$1.run(Handshaker.java:818) at sun.security.ssl.Handshaker$1.run(Handshaker.java:816) at java.security.AccessController.doPrivileged(Native Method) at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1237) - locked <0x0000000654384260> (a sun.security.ssl.SSLEngineImpl) at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1101) at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:960) at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:891) 。。。
-
waiting for monitor entry
意味着它 在等待进入一个临界区 ,所以它在”Entry Set“队列中等待 - 当线程栈信息中出现
locked <0x0000000654384260> (a sun.security.ssl.SSLEngineImpl)
,表示获取了当前的对象锁,进入了零界资源区 - 在
waiting to lock <0x00000005ab7b2db0>
中表示该线程在等候0x00000005ab7b2db0
的锁资源被释放,获取该锁,为此,线程进入了Blocked状态
-
-
实例二:Waiting on condition
"http-nio-8083-exec-90" daemon prio=6 tid=0x0000000015a73800 nid=0x18dbc waiting on condition [0x000000001fe7f000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000007f8df2970> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:104) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:32) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:745)
-
WAITING (parking)说明它在等待另一个条件的发生,来把自己唤醒,例如调用了
sleep(N)
方法 - 由此可以看出,当前线程正处于
WAITING
状态,线程被挂起,结合parking to wait for
来看,本线程在等待某个条件的发生,来把自己唤醒。java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
从这段代码可以知道,当前tomcat线程处于空闲状态,正在等待一个请求的接入
"pool-1-thread-1" prio=10 tid=0x00007f45d484b000 nid=0x254c waiting on condition [0x00007f4608235000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000005aaac48b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
- TIMED_WAITING (parking)指当前线程处于等待状态,但这里指定了等待的时间,到达指定的时间后自动退出等待状态 ,其他同上
-
WAITING (parking)说明它在等待另一个条件的发生,来把自己唤醒,例如调用了
-
**实例三:in Obejct.wait() **
"Thread-7" daemon prio=10 tid=0x00007f45d4cec000 nid=0x254d in Object.wait() [0x00007f4608129000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000005aaac87d0> (a io.netty.channel.AbstractChannel$CloseFuture) at java.lang.Object.wait(Object.java:503) at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254) - locked <0x00000005aaac87d0> (a io.netty.channel.AbstractChannel$CloseFuture) at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:129) at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:28) at io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:218) at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:117) at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:28) at com.test.netty.server.Server.bind(Server.java:92) at com.test.netty.server.Server.run(Server.java:64) at java.lang.Thread.run(Thread.java:745)
- 表明当前先线程调用了
java.lang.Object.wait
方法,线程处于等待的状态,等候唤醒
- 表明当前先线程调用了
4.实例分析
-
DBCP连接池不够用导致的线程等待
"http-nio-8083-exec-98" daemon prio=6 tid=0x0000000012899000 nid=0x186bc in Object.wait() [0x000000002157d000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000007e1a242e8> (a org.apache.commons.pool.impl.GenericObjectPool$Latch) at java.lang.Object.wait(Object.java:503) at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1104) - locked <0x00000007e1a242e8> (a org.apache.commons.pool.impl.GenericObjectPool$Latch) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044) at org.springframework.orm.hibernate3.LocalDataSourceConnectionProvider.getConnection(LocalDataSourceConnectionProvider.java:82) at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:446) at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:167) at org.hibernate.jdbc.JDBCContext.connection(JDBCContext.java:160) at org.hibernate.transaction.JDBCTransaction.begin(JDBCTransaction.java:81) at org.hibernate.impl.SessionImpl.beginTransaction(SessionImpl.java:1473) at org.springframework.orm.hibernate3.HibernateTransactionManager.doBegin(HibernateTransactionManager.java:558) at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:374) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:125) at com.test.persistence.TransactionTemplateEx.executeWithinNewTrans(Unknown Source) at com.test.sys.webcontroller.OperateLogFilter.doFilter(OperateLogFilter.java:81) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.test.sys.webcontroller.SafetyFilter.doFilter(SafetyFilter.java:64) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.test.web.filter.HttpCacheOptionsFilter.doFilter(Unknown Source) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.test.web.filter.CharacterEncodingFilter.doFilterInternal(Unknown Source) at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.test.web.filter.LogExceptionFilter.doFilter(Unknown Source) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.springframework.web.filter.AbstractRequestLoggingFilter.doFilterInternal(AbstractRequestLoggingFilter.java:156) at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.test.web.filter.WebContextFilter.doFilter(Unknown Source) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:218) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:506) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:962) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:452) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1087) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1760) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1719) - locked <0x00000007d560e0a8> (a org.apache.tomcat.util.net.NioChannel) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:745)
-
项目在多个并发请求过程中出现了大量的等待,打开页面变得缓慢,为此,通过jstack命令,当出现页面等待的情况问题时候,在服务器上打印出当前的线程栈信息,通过分析发现,线程中出现了大量WAITING线程,为此发现,这批大量的WAITING中都有大致相同的栈信息
通过翻看源代码发现,当前线程在获取连接池的时候,由于连接池连接数量不够,导致资源等待,修改DBCP连接池的连接数配置后,页面响应问题得以解决
-
-
socketRead0 挂起线程导致大量线程堆积问题
"pool-2-thread-40" prio=10 tid=0x00007f12b8049000 nid=0x62e2 runnable [0x00007f11207f2000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) - locked <0x0000000599d46fd8> (a java.io.BufferedInputStream) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) - locked <0x0000000599d47088> (a sun.net.www.protocol.http.HttpURLConnection) at com.test..thread.test.testController.sendMessage(testController.java:408) at com.test..thread.test.testController.progress(TestController.java:267) at com.test..thread.test.testController.runService(TestController.java:107) at com.test..thread.test.testThread.run(TestThread.java:29) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
- 服务端程序是一个调用第三方功能对外提供服务的接口服务端,使用
HttpClient
作为调用第三方服务的工具,在平时测试环境下并未发现问题,但是当出现大量的并发请求的环境下,系统后台后台调用第三方服务的线程出现了大量的堆积,打印线程日志,都处于RUNNABLE
状态,并没有太大的出入,但是经过多次的打印线程日志发现,线程处于该状态的时间过长,由堆栈信息可以看出大量的线程阻塞在socketRead0
,推断可能是由于网络延迟,服务连接后没有响应造成java.net.SocketInputStream.socketRead0
一直阻塞等待读取的状态。当客户端与远程服务连接出现长时间未连接成功或读写出现长时间阻塞时,就应该停止连接或读写,最终,查看源代码,发现只设置了线程的请求连接的等待时间,并未设置请求响应等待时间导致,最终设置后问题得以解决。
- 服务端程序是一个调用第三方功能对外提供服务的接口服务端,使用