A Summary of Practical Experience with the jstack Command
Date: July 9, 2023

Basic usage of the jstack command

The jstack command itself is simple to use. Its information content and complexity lie mainly in analyzing the thread dump it produces.

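# Most basic usage
sudo -u xxx jstack {vmid}
# Extract a thread dump from a core dump
# (JDK 8 syntax: the java executable that produced the core, then the core file;
#  newer JDKs use jhsdb jstack --exe ... --core ...)
sudo -u xxx jstack {java_executable} core_file_path
# In addition to the basic output, also show ownership of AbstractOwnableSynchronizer locks
# This may take considerably longer
sudo -u xxx jstack -l {vmid}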

Analyzing the structure of jstack output

First, here are a few typical thread dump examples.
A running thread (RUNNABLE):

"elasticsearch[datanode-39][[xxx_index_v4][9]: Lucene Merge Thread #2403]" #45061 daemon prio=5 os_prio=0 tid=0x00007fb968213800 nid=0x249ca runnable [0x00007fb6843c2000]
   java.lang.Thread.State: RUNNABLE
        ...
        at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)

A thread blocked on a java.util.concurrent.locks.Condition:

"DubboServerHandler-10.64.16.66:20779-thread-510" #631 daemon prio=5 os_prio=0 tid=0x00007fb6f4ce5800 nid=0x1743 waiting on condition [0x00007fb68ed2f000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e2978ef0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        ...

A thread blocked on an intrinsic (monitor) lock:

"qtp302870502-26-acceptor-0@45ff00a-ServerConnector@63475ace{HTTP/1.1}{0.0.0.0:9088}" #26 prio=5 os_prio=0 tid=0x00007f1830d3a800 nid=0xdf64 waiting for monitor entry [0x00007f16b5ef9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)
        - waiting to lock <0x00000000c07549f8> (a java.lang.Object)
        at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:377)
        ...
        at java.lang.Thread.run(Thread.java:745)

A thread waiting in Object.wait():

"JFR request timer" #6 daemon prio=5 os_prio=0 tid=0x00007fc2f6b1f800 nid=0x18070 in Object.wait() [0x00007fb9aa96b000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00007fba6b50ea38> (a java.util.TaskQueue)
        at java.lang.Object.wait(Object.java:502)
        at java.util.TimerThread.mainLoop(Timer.java:526)
        - locked <0x00007fba6b50ea38> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)

The four thread dumps above include both running and blocked threads, so they cover a wide range of cases and are representative. Next, let's go through the jstack output in detail.

Structure of the output

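Let's first describe the overall structure of the jstack output, using the fourth thread above as an example.
The first part records basic information about the thread; the meaning of each element, from left to right, is annotated in the comment above it. The most important field is nid, which maps the Java thread to the operating system: on Linux it is the same as the PID of the corresponding lightweight process (LWP), just written in hexadecimal instead of decimal. This makes thread-level performance diagnosis possible; see the section "Helper script for thread performance diagnosis" later in this article. Note that tid is not an OS thread id: it is the address of the JVM's internal thread structure.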

//|---thread name---| |-creation order-| |-daemon?-| |---priority---| |---thread address---| |-Linux LWP id-| |-thread action-|
  "JFR request timer" #6                 daemon      prio=5 os_prio=0 tid=0x00007fc2f6b1f800 nid=0x18070      in Object.wait() [0x00007fb9aa96b000]
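To make the field layout concrete, here is a minimal Java sketch that parses such a header line with a regular expression. It assumes the JDK 8 format shown above (newer JDKs insert extra fields such as cpu= and elapsed= between prio= and tid=, which the lazy match skips); the class and field names are hypothetical, and the regex is an illustration, not an official grammar of jstack output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parse the header (first) line of one Java thread entry in a jstack dump.
// Assumes the JDK 8 layout; GC and other native threads have no #N / prio= fields.
final class ThreadHeaderParser {

    static final class Header {
        final String name;     // thread name
        final long number;     // #N: java.lang.Thread id, assigned in creation order
        final boolean daemon;  // true if the "daemon" keyword is present
        final int prio;        // Java priority (prio=)
        final String tid;      // address of the JVM's internal thread structure
        final long lwpId;      // nid converted to decimal, as shown by top -H
        final String action;   // e.g. "runnable", "in Object.wait()"

        Header(String name, long number, boolean daemon, int prio,
               String tid, long lwpId, String action) {
            this.name = name;
            this.number = number;
            this.daemon = daemon;
            this.prio = prio;
            this.tid = tid;
            this.lwpId = lwpId;
            this.action = action;
        }
    }

    private static final Pattern HEADER = Pattern.compile(
        "^\"(?<name>.*)\"\\s+#(?<num>\\d+)\\s+(?<daemon>daemon\\s+)?"
        + "prio=(?<prio>\\d+).*?tid=(?<tid>0x[0-9a-f]+)\\s+nid=(?<nid>0x[0-9a-f]+)\\s+"
        + "(?<action>.*?)\\s*(?:\\[0x[0-9a-f]+\\])?\\s*$");

    static Header parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a Java thread header: " + line);
        }
        return new Header(
            m.group("name"),
            Long.parseLong(m.group("num")),
            m.group("daemon") != null,
            Integer.parseInt(m.group("prio")),
            m.group("tid"),
            Long.parseLong(m.group("nid").substring(2), 16), // strip "0x", parse as hex
            m.group("action"));
    }

    public static void main(String[] args) {
        Header h = parse("\"JFR request timer\" #6 daemon prio=5 os_prio=0 "
            + "tid=0x00007fc2f6b1f800 nid=0x18070 in Object.wait() [0x00007fb9aa96b000]");
        System.out.println(h.name + " -> LWP " + h.lwpId + ", action: " + h.action);
    }
}
```

For the line above, lwpId is 98416, the decimal id that top -H would show for the same thread.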

The second part shows the thread's current state:

java.lang.Thread.State: WAITING (on object monitor)

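The third part is mainly the thread's call stack. The important elements are the annotations attached to certain key frames (see "Important call-stack annotations" below), which provide the basis for diagnosing deadlocks.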

        at java.lang.Object.wait(Native Method)
        - waiting on <0x00007fba6b50ea38> (a java.util.TaskQueue)
        at java.lang.Object.wait(Object.java:502)
        at java.util.TimerThread.mainLoop(Timer.java:526)
        - locked <0x00007fba6b50ea38> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)

Thread actions

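The thread action is recorded at the end of the first line of each thread entry. It generally falls into one of the following categories:

  1. runnable: the thread is competing for CPU; it may be running, or ready and waiting to be scheduled. A thread executing native code, such as a blocking socket read, is also reported as runnable.

  2. sleeping: the thread has called Thread.sleep() and is sleeping.

  3. waiting for monitor entry [0x...]: the thread is trying to acquire an intrinsic (synchronized) lock and has entered the Entry Set.

  4. in Object.wait() [0x...]: the thread has called object.wait(), released the intrinsic lock, and entered the Wait Set, where it waits to be notified.

  5. waiting on condition [0x...]: the thread is blocked by a thread-blocking primitive. This is unrelated to the JVM's intrinsic locks; it is the locking mechanism of the java.util.concurrent package introduced in JDK 5.

Note that the hex address in square brackets at the end of the line is not the address of a lock or resource. HotSpot prints an estimate of the thread's stack region there: the address of its last Java stack frame, rounded down to a 4 KB page. That is why it looks like a stack address (0x00007f...) rather than a heap address, and why threads with no Java frames, such as the JIT compiler threads shown later, print [0x0000000000000000]. The address of the object involved appears in the call-stack annotations instead, for example waiting to lock <0x...>.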
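The difference between the Entry Set (waiting for monitor entry) and the Wait Set (in Object.wait()) can be reproduced with a small Java sketch. The Java API does not expose jstack's action text, so the sketch reports Thread.getState() instead; the class name, thread names, and the 5-second polling limit are illustrative choices. If you add a pause before notifyAll() and run jstack against the process, the two threads should show the corresponding actions.

```java
// Sketch: reproduce the two monitor-related cases on one shared lock object.
final class MonitorStatesDemo {
    private static final Object LOCK = new Object();

    // Poll until t reaches the expected state, giving up after about 5 seconds.
    static Thread.State awaitState(Thread t, Thread.State expected) throws InterruptedException {
        for (int i = 0; i < 500 && t.getState() != expected; i++) {
            Thread.sleep(10);
        }
        return t.getState();
    }

    // Returns {state of the waiting thread, state of the blocked thread}.
    static Thread.State[] run() throws InterruptedException {
        // Enters LOCK, then calls wait(), which releases LOCK: Wait Set, "in Object.wait()"
        Thread waiter = new Thread(() -> {
            synchronized (LOCK) {
                try {
                    LOCK.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "demo-waiter");
        waiter.setDaemon(true);
        waiter.start();
        Thread.State waiterState = awaitState(waiter, Thread.State.WAITING);

        // Tries to enter LOCK while main holds it: Entry Set, "waiting for monitor entry"
        Thread blocker = new Thread(() -> {
            synchronized (LOCK) {
                // nothing to do; entering the monitor is the point
            }
        }, "demo-blocker");
        blocker.setDaemon(true);
        Thread.State blockerState;
        synchronized (LOCK) {
            blocker.start();
            blockerState = awaitState(blocker, Thread.State.BLOCKED);
            LOCK.notifyAll(); // the waiter moves back to the Entry Set
        }
        waiter.join(5000);
        blocker.join(5000);
        return new Thread.State[] {waiterState, blockerState};
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] states = run();
        System.out.println("demo-waiter:  " + states[0]);
        System.out.println("demo-blocker: " + states[1]);
    }
}
```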

Thread states

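The thread state is recorded on the second line of each entry and begins with java.lang.Thread.State. It generally falls into one of the following categories:

  1. RUNNABLE: usually appears together with the action runnable.

  2. BLOCKED (on object monitor): usually appears together with the action waiting for monitor entry. There is no fixed method at the top of the call stack, because the synchronized keyword can be applied to any method or block.

  3. WAITING (on object monitor) or TIMED_WAITING (on object monitor): usually appears together with the action in Object.wait() [0x...], and the top frame of the call stack is at java.lang.Object.wait(Native Method), indicating a call to object.wait().
    The difference between WAITING and TIMED_WAITING is whether a timeout was given, i.e. wait() versus wait(long timeout).

  4. WAITING (parking) or TIMED_WAITING (parking): usually appears together with the action waiting on condition [0x...], and the top frame of the call stack is usually at sun.misc.Unsafe.park(Native Method).
    Unsafe.park is the thread-blocking primitive used mainly by java.util.concurrent.locks.AbstractQueuedSynchronizer (through LockSupport.park). Many AQS-based synchronizers, such as ReentrantLock, Condition, CountDownLatch, and Semaphore, can put threads into this state.
    The difference between WAITING and TIMED_WAITING here is the same as in item 3.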
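The parking states can be reproduced in the same way. This hypothetical sketch parks one thread inside ReentrantLock.lock() (an AQS-based lock held by the main thread) and another in LockSupport.parkNanos(). In a jstack dump both would be expected to show the action waiting on condition with Unsafe.park at the top of the stack, the first as WAITING (parking) and the second as TIMED_WAITING (parking); the sketch itself only checks Thread.getState().

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: put threads into the parking states via AQS (ReentrantLock) and LockSupport.
final class ParkingStatesDemo {

    static Thread.State awaitState(Thread t, Thread.State expected) throws InterruptedException {
        for (int i = 0; i < 500 && t.getState() != expected; i++) {
            Thread.sleep(10);
        }
        return t.getState();
    }

    // Returns {state of the lock contender, state of the timed parker}.
    static Thread.State[] run() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        AtomicBoolean done = new AtomicBoolean(false);

        // Contends for a lock held by main; AQS parks it: WAITING (parking)
        Thread contender = new Thread(() -> {
            lock.lock();
            lock.unlock();
        }, "demo-contender");
        // Parks with a timeout, in a loop because park may return spuriously: TIMED_WAITING (parking)
        Thread timed = new Thread(() -> {
            while (!done.get()) {
                LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
            }
        }, "demo-timed-parker");
        contender.setDaemon(true);
        timed.setDaemon(true);

        Thread.State contenderState;
        Thread.State timedState;
        lock.lock();
        try {
            contender.start();
            timed.start();
            contenderState = awaitState(contender, Thread.State.WAITING);
            timedState = awaitState(timed, Thread.State.TIMED_WAITING);
        } finally {
            lock.unlock(); // lets the contender acquire the lock and finish
        }
        done.set(true);
        LockSupport.unpark(timed);
        contender.join(5000);
        timed.join(5000);
        return new Thread.State[] {contenderState, timedState};
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] states = run();
        System.out.println("demo-contender:    " + states[0]);
        System.out.println("demo-timed-parker: " + states[1]);
    }
}
```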

Important call-stack annotations

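In the third part of each entry, the call stack, the state of lock-related resources is highlighted with extra annotation lines. These are closely tied to the thread's action and state, and generally fall into the following categories:

  1. locked <0x...>: the thread has acquired the intrinsic lock and is its owner.

  2. parking to wait for <0x...>: the thread is blocked by a thread-blocking primitive; usually appears with the action waiting on condition.

  3. waiting to lock <0x...>: the thread is waiting in the Entry Set for an intrinsic lock; usually appears with the action waiting for monitor entry.

  4. waiting on <0x...>: the thread is waiting in the Wait Set to be notified; usually appears with the action in Object.wait() [0x...].
    The waiting on annotation often appears together with a locked annotation, as in the fourth thread dump listed earlier: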

    at java.lang.Object.wait(Native Method)
      - waiting on <0x00007fba6b50ea38> (a java.util.TaskQueue)
    at java.lang.Object.wait(Object.java:502)
    at java.util.TimerThread.mainLoop(Timer.java:526)
      - locked <0x00007fba6b50ea38> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:505)

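This is because the thread first acquired the intrinsic lock (locked) and then released it again by calling object.wait() (waiting on), so both annotations appear in the call stack for the same address.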
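Pairs like this can also be spotted mechanically. The following hypothetical helper scans one thread's stack text and returns the addresses that carry both a locked and a waiting on annotation; it is a sketch that assumes the annotation format shown above.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: scan one thread's call stack (as text) for monitor annotations.
final class StackAnnotations {
    // Matches "- locked <0x...>" and "- waiting on <0x...>" lines.
    private static final Pattern ANNOTATION = Pattern.compile(
        "^\\s*- (locked|waiting on)\\s+<(0x[0-9a-f]+)>");

    // Addresses that this thread both "locked" and is "waiting on": it entered the
    // monitor, then released it again by calling Object.wait().
    static Set<String> lockedThenWaited(String stack) {
        Set<String> locked = new LinkedHashSet<>();
        Set<String> waitingOn = new LinkedHashSet<>();
        for (String line : stack.split("\\R")) {
            Matcher m = ANNOTATION.matcher(line);
            if (!m.find()) {
                continue; // an ordinary "at ..." frame or another annotation
            }
            if (m.group(1).equals("locked")) {
                locked.add(m.group(2));
            } else {
                waitingOn.add(m.group(2));
            }
        }
        locked.retainAll(waitingOn);
        return locked;
    }

    public static void main(String[] args) {
        String stack = String.join("\n",
            "at java.lang.Object.wait(Native Method)",
            "- waiting on <0x00007fba6b50ea38> (a java.util.TaskQueue)",
            "at java.lang.Object.wait(Object.java:502)",
            "at java.util.TimerThread.mainLoop(Timer.java:526)",
            "- locked <0x00007fba6b50ea38> (a java.util.TaskQueue)",
            "at java.util.TimerThread.run(Timer.java:505)");
        System.out.println(lockedThenWaited(stack)); // [0x00007fba6b50ea38]
    }
}
```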

Deadlock detection output

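Before JDK 5, Doug Lea's java.util.concurrent package had not yet become part of the JDK, so "lock" meant only the JVM's intrinsic monitor locks. If such a deadlock occurs in the process, jstack prints the deadlock detection information: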

Found one Java-level deadlock:
=============================
"Thread-xxx":
  waiting to lock monitor 0x00007f0134003ae8 (object 0x00000007d6aa2c98, a java.lang.Object),
  which is held by "Thread-yyy"
"Thread-yyy":
  waiting to lock monitor 0x00007f0134006168 (object 0x00000007d6aa2ca8, a java.lang.Object),
  which is held by "Thread-xxx"

Java stack information for the threads listed above:
===================================================
"Thread-xxx":
    ...
"Thread-yyy":
    ...

Found 1 deadlock.

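Later, Doug Lea's java.util.concurrent package was released, and Java locks came to include, besides intrinsic locks, various forms based on AbstractOwnableSynchronizer. Because these were new, the JDK 5 jstack did not provide deadlock detection for AQS-based synchronizers; that support was only completed in JDK 6.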
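The same JVM deadlock detection, covering both monitors and ownable synchronizers, is exposed through ThreadMXBean.findDeadlockedThreads(). The sketch below, with hypothetical class and method names, deadlocks two daemon threads on two ReentrantLocks acquired in opposite order and then asks the JVM for the cycle. If you keep the process alive (for example, by sleeping at the end of main), running jstack against it should likewise report a Java-level deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: create a deterministic ReentrantLock deadlock and detect it via JMX.
final class DeadlockDemo {

    // Takes `first`, waits until the other thread holds its first lock too, then takes `second`.
    private static void lockBoth(ReentrantLock first, ReentrantLock second, CountDownLatch bothHoldFirst) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            second.lock(); // blocks forever: the other thread holds `second` and wants `first`
            second.unlock();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    static long[] createDeadlockAndDetect() throws InterruptedException {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(a, b, bothHoldFirst), "Thread-xxx");
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHoldFirst), "Thread-yyy");
        t1.setDaemon(true); // daemon threads, so the stuck pair does not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        for (int i = 0; i < 500 && ids == null; i++) { // poll for up to about 5 seconds
            Thread.sleep(10);
            ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        }
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = createDeadlockAndDetect();
        if (ids == null) {
            System.out.println("no deadlock detected");
            return;
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                + ", held by " + info.getLockOwnerName());
        }
    }
}
```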

Regardless of the kind of Java application, some threads are almost always present:

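VM Thread and VM Periodic Task Thread
These are JVM-internal native threads that sit above all user threads: the VM Thread executes VM operations, such as garbage collection pauses and thread dumps, many of which require all Java threads to be stopped at a safepoint.
VM Periodic Task Thread is typically used by the JVM for sampling/profiling, collecting runtime statistics that inform JIT optimization decisions.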

C1 / C2 CompilerThread
The JVM's JIT (just-in-time) compiler threads:

"C1 CompilerThread2" #10 daemon prio=9 os_prio=0 tid=0x00007feb34114000 nid=0x18b2 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #9 daemon prio=9 os_prio=0 tid=0x00007feb34112000 nid=0x18b1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #8 daemon prio=9 os_prio=0 tid=0x00007feb3410f800 nid=0x18b0 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

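Reference Handler thread and Finalizer thread
These two threads are used by the JVM to handle instances that override Object.finalize(), giving such instances a last chance to run finalize() before they are reclaimed.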

The Reference Handler thread puts pending references onto their reference queues (this covers soft, weak, and phantom references in general, not only the internal references that track finalizable objects):

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f91e007f000 nid=0xa80 in Object.wait() [0x...]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
        - locked <0x00000000c0495140> (a java.lang.ref.Reference$Lock)

The Finalizer thread takes objects from the reference queue and executes their finalize() methods:

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f91e0081000 nid=0xa81 in Object.wait() [0x...]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        - locked <0x00000000c008db88> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

GC threads
The threads here vary with the garbage collector in use (the number of threads depends on the number of CPU cores):

# Parallel Scavenge
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f91e0021000 nid=0xa7a runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f91e0023000 nid=0xa7b runnable

# ParNew
"Gang worker#0 (Parallel GC Threads)" os_prio=0 tid=0x00007feb3401e800 nid=0x18a4 runnable
"Gang worker#1 (Parallel GC Threads)" os_prio=0 tid=0x00007feb34020000 nid=0x18a5 runnable

# CMS
"Concurrent Mark-Sweep GC Thread" os_prio=0 tid=0x00007feb34066800 nid=0x18a8 runnable

# G1
"G1 Main Concurrent Mark GC Thread" os_prio=0 tid=0x00007fc2f4091800 nid=0x1805e runnable
"Gang worker#0 (G1 Parallel Marking Threads)" os_prio=0 tid=0x00007fc2f4093800 nid=0x1805f runnable
"Gang worker#1 (G1 Parallel Marking Threads)" os_prio=0 tid=0x00007fc2f4095800 nid=0x18060 runnable
"G1 Concurrent Refinement Thread#0" os_prio=0 tid=0x00007fc2f4079000 nid=0x1805d runnable
"G1 Concurrent Refinement Thread#1" os_prio=0 tid=0x00007fc2f4077000 nid=0x1805c runnable
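Because the GC thread names depend on the collector, it can help to confirm which collector a JVM is actually running. A minimal sketch using the standard GarbageCollectorMXBean API follows (the class name is hypothetical). The reported names, for example PS Scavenge / PS MarkSweep for Parallel, ParNew / ConcurrentMarkSweep for ParNew with CMS, or G1 Young Generation / G1 Old Generation for G1, identify the collector rather than the GC threads themselves, which are native threads invisible to this API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the collectors of the current JVM and the CPU count that drives GC thread counts.
final class GcInfo {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
    }
}
```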

These are the threads commonly found in a Java process.

Taking a thread dump from code

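Besides jstack, there are other ways to take a thread dump of a Java process. If you wrap one of them in an HTTP endpoint, you can view the thread dump directly in a browser without logging in to the host.
Using the JMX API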

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public void threadDump() {
    ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
    // dumpAllThreads(lockedMonitors, lockedSynchronizers): include lock ownership, like jstack -l
    for (ThreadInfo threadInfo : threadMxBean.dumpAllThreads(true, true)) {
        // deal with threadInfo.toString(); note that it prints at most 8 stack frames
    }
}

Using Thread.getAllStackTraces()

import java.util.Map;

public void threadDump() {
    for (Map.Entry<Thread, StackTraceElement[]> stackTrace : Thread.getAllStackTraces().entrySet()) {
        Thread thread = stackTrace.getKey();
        StackTraceElement[] stack = stackTrace.getValue();
        // skip the thread that is taking the dump
        if (thread.equals(Thread.currentThread())) {
            continue;
        }
        // deal with thread
        for (StackTraceElement stackTraceElement : stack) {
            // deal with stackTraceElement
        }
    }
}
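As a sketch of the HTTP idea mentioned above, the JDK's built-in com.sun.net.httpserver server can serve a text dump built from ThreadMXBean. The /threaddump path, the port, the class name, and the output format are assumptions for illustration. Because ThreadInfo.toString() prints at most 8 stack frames, the sketch writes out the full stack itself. It binds to 127.0.0.1: a thread dump exposes internal details, so expose it more widely only behind appropriate access control.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Sketch: serve a jstack-like thread dump over HTTP using only JDK classes.
final class ThreadDumpEndpoint {

    // Builds a text dump with full stacks (ThreadInfo.toString() would truncate them).
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" #")
              .append(info.getThreadId()).append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Starts a server on 127.0.0.1:port (0 = any free port) with a hypothetical /threaddump path.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/threaddump", exchange -> {
            byte[] body = dump().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Open http://127.0.0.1:" + server.getAddress().getPort() + "/threaddump");
    }
}
```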

Helper script for thread performance diagnosis

Another important use of jstack is hot-thread analysis: finding the threads that consume the most CPU.
First, here is how to do it manually:

  • Use top to find the thread ids with high CPU usage:

    # -p pid: only show the specified process
    # -H: show individual threads
    top -H -p {pid}
    # Press P to sort by CPU usage, and note the first few PIDs (lightweight process ids)

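  • Convert the id to hexadecimal:

         # Convert the recorded decimal pid to hexadecimal
         thread_id_0x=$(printf "%x" "$thread_id")
         # Alternative with bc; bc prints uppercase hex digits, while jstack prints nid in lowercase
         thread_id_0x=$(echo "obase=16; $thread_id" | bc | tr 'A-F' 'a-f')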

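  • Because each thread's nid in the thread dump corresponds one-to-one with a Linux lightweight process pid (one in hexadecimal, the other in decimal), you can search the thread dump for nid=0x followed by the converted thread_id_0x to locate the problematic thread.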
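The conversion and lookup steps above can be sketched in Java as follows. The helper names are hypothetical, and the sketch assumes the jstack output is already available as a string (for example, saved with jstack {pid} > dump.txt).

```java
// Sketch: map a decimal LWP id from top -H to the matching jstack entry header.
final class NidLookup {

    // jstack prints nid in lowercase hex (nid=0x18070); Long.toHexString is lowercase too.
    static String toNid(long lwpId) {
        return "nid=0x" + Long.toHexString(lwpId);
    }

    // Returns the header line whose nid equals lwpId, or null if there is none.
    static String findHeader(String jstackOutput, long lwpId) {
        String needle = toNid(lwpId);
        for (String line : jstackOutput.split("\\R")) {
            int i = line.indexOf(needle);
            int end = i + needle.length();
            // require a token boundary so that nid=0x1807 does not match nid=0x18070
            if (i >= 0 && (end == line.length() || line.charAt(end) == ' ')) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String dump = "\"JFR request timer\" #6 daemon prio=5 os_prio=0 tid=0x00007fc2f6b1f800 "
            + "nid=0x18070 in Object.wait() [0x00007fb9aa96b000]\n"
            + "   java.lang.Thread.State: WAITING (on object monitor)";
        System.out.println(toNid(98416));            // nid=0x18070
        System.out.println(findHeader(dump, 98416)); // the "JFR request timer" header line
    }
}
```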

The following script sorts the threads of a given JVM process by CPU usage from high to low and prints the top n:

#!/bin/bash
# Usage: <script> <jvm_pid> <jvm_user> [top_threads]
default_lines=10
default_stack_lines=15
jvm_pid=$1
jvm_user=$2
top_threads=${3:-$default_lines}

# One batch-mode snapshot of per-thread (-H) usage; top sorts by %CPU by default.
# grep keeps only the thread rows owned by jvm_user, dropping top's summary header.
threads_top_capture=$(top -b -n1 -H -p "$jvm_pid" | grep "$jvm_user" | head -n "$top_threads")
jstack_output=$(sudo -i -u "$jvm_user" jstack "$jvm_pid")
# Strip terminal escape sequences, normalize whitespace, keep "PID %CPU" (columns 1 and 9)
top_output=$(echo "$threads_top_capture" | perl -pe 's/\e\[?.*?[\@-~] ?//g' | awk '{gsub(/^ +/,"");print}' | awk '{gsub(/ +|[+-]/," ");print}' | cut -d " " -f 1,9)

echo "***********************************************************"
uptime
echo "Analyzing top $top_threads threads"
echo "***********************************************************"
printf '%s\n' "$top_output" | while IFS= read -r line
do
    pid=$(echo "$line" | cut -d " " -f 1)
    hexapid=$(printf "%x" "$pid")
    cpu=$(echo "$line" | cut -d " " -f 2)
    echo -n "$cpu% [$pid] "
    # Print this thread's entry: from its header (nid=0x<hex>) up to, not including, the next header
    echo "$jstack_output" | grep "tid.*0x$hexapid " -A "$default_stack_lines" | sed -n -e '/0x'"$hexapid"'/,/tid/ p' | head -n -1
done

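There are several versions of this script, and at my company a copy is kept at a fixed path on every host. Because of a confidentiality agreement that source cannot be published; the version shown above is adapted from code shared by Wang Rui, a technical expert at Meituan-Dianping, in a Q&A session.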
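For a process you control, a similar ranking can be obtained in-process without top or jstack. This sketch, with hypothetical names, samples ThreadMXBean.getThreadCpuTime() twice and ranks Java threads by the CPU time consumed in between, expressed as a percentage of one core in the spirit of top's %CPU. Unlike the script, it only sees Java threads (not GC or other native JVM threads), and it reports thread names rather than nids.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: an in-process "top -H" that ranks Java threads by CPU used during an interval.
final class HotThreads {

    static List<String> top(int n, long intervalMs) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time is not supported");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id)); // cumulative nanoseconds; -1 if the thread died
        }
        Thread.sleep(intervalMs);

        List<long[]> deltas = new ArrayList<>(); // each entry: {thread id, CPU nanos in interval}
        for (long id : mx.getAllThreadIds()) {
            long now = mx.getThreadCpuTime(id);
            if (now < 0) {
                continue;
            }
            long prev = Math.max(0L, before.getOrDefault(id, 0L)); // threads started later count from 0
            deltas.add(new long[] {id, now - prev});
        }
        deltas.sort((a, b) -> Long.compare(b[1], a[1])); // highest CPU first

        List<String> result = new ArrayList<>();
        for (long[] d : deltas.subList(0, Math.min(n, deltas.size()))) {
            ThreadInfo info = mx.getThreadInfo(d[0]);
            String name = info == null ? "<exited>" : info.getThreadName();
            result.add(String.format("%.1f%% %s", 100.0 * d[1] / (intervalMs * 1_000_000.0), name));
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String line : top(10, 1000)) {
            System.out.println(line);
        }
    }
}
```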

Original article: https://my.oschina.net/u/3825800/blog/4402756
