字节面试:CPU飙升100%如何排查

作者:微信小助手

发布时间:2024-12-25T22:15:16


CPU 飙升是一个常见的问题。

在生产环境中,会出现由代码问题导致CPU占用很高,该如何诊断出是哪行java代码导致? 是大家的一项重要基本功,也是大家面试中的家常骗饭。

如果连CPU 飙升的问题都回答不清楚, 都支支吾吾, 面试就很难通过了。

CPU100%定位的两大神器

想要定位到具体是哪一行的代码导致, 一般都会使用下面的两大神器

  • 通常使用的jvm自带的工具jstack,
  • 还有一种就是开源神器arthas,

一般而言,arthas还有其它的功能,所以选择它多一点.


CPU 飙升100%的解决思路与方法论


使用jstack 解决CPU 100%问题实操

使用jstack 解决CPU 100%问题,在方法论上要用到两个命令,

  • top 命令查看TOP N线程,
  • jstack命令查看堆栈信息

1.jstack命令讲解

命令jstack是java堆栈的跟踪工具,可以打印出程序中所有线程的堆栈信息,包括线程状态,调用栈信息,锁信息等。

jstack可以诊断线程死锁、内存泄漏等问题

命令格式: jstack [options] pid

常用例子: jstack -l pid,查看线程的堆栈信息

堆栈信息解读:

[root@192-168-65-185 ~]# jstack -294402024-12-17 13:48:51Full thread dump OpenJDK 64-Bit Server VM (25.412-b08 mixed mode):
"Attach Listener" #12 daemon prio=9 os_prio=0 tid=0x00007f097c001000 nid=0x763a runnable [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"DestroyJavaVM" #11 prio=5 os_prio=0 tid=0x00007f09e004b800 nid=0x7301 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"pool-1-thread-1" #10 prio=5 os_prio=0 tid=0x00007f09e0166000 nid=0x7316 runnable [0x00007f09cc9cb000]   java.lang.Thread.State: RUNNABLE at com.tuling.learnjuc.demo.JVMCPU$CPUTask.run(JVMCPU.java:16) - locked <0x0000000757d645b8> (a java.lang.Object) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)
   Locked ownable synchronizers: - <0x0000000757d65d00> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"Service Thread" #9 daemon prio=9 os_prio=0 tid=0x00007f09e0140000 nid=0x7314 runnable [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"C1 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007f09e0133000 nid=0x7313 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f09e0131000 nid=0x7312 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f09e012f000 nid=0x7311 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f09e012c800 nid=0x7310 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f09e011e000 nid=0x730f runnable [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f09e00f2000 nid=0x730e in Object.wait() [0x00007f09cd1d3000]   java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x0000000757d08f00> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144) - locked <0x0000000757d08f00> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:188)
   Locked ownable synchronizers: - None
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f09e00ed000 nid=0x730d in Object.wait() [0x00007f09cd2d4000]   java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x0000000757d06b98> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference.tryHandlePending(Reference.java:191) - locked <0x0000000757d06b98> (a java.lang.ref.Reference$Lock) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
   Locked ownable synchronizers: - None
"VM Thread" os_prio=0 tid=0x00007f09e00e3800 nid=0x730c runnable 
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f09e005e000 nid=0x7302 runnable 
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f09e0060000 nid=0x7303 runnable 
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f09e0062000 nid=0x7304 runnable 
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f09e0064000 nid=0x7305 runnable 
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f09e0065800 nid=0x7306 runnable 
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f09e0067800 nid=0x7307 runnable 
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f09e0069800 nid=0x7308 runnable 
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f09e006b800 nid=0x7309 runnable 
"GC task thread#8 (ParallelGC)" os_prio=0 tid=0x00007f09e006d000 nid=0x730a runnable 
"GC task thread#9 (ParallelGC)" os_prio=0 tid=0x00007f09e006f000 nid=0x730b runnable 
"VM Periodic Task Thread" os_prio=0 tid=0x00007f09e0142800 nid=0x7315 waiting on condition 
JNI global references5

你会发现上面的信息其实是一段一段的,摘取其中的一段为大家说明:

"pool-1-thread-1" #10 prio=5 os_prio=0 tid=0x00007f09e0166000 nid=0x7316 runnable [0x00007f09cc9cb000]   java.lang.Thread.StateRUNNABLE	at com.tuling.learnjuc.demo.JVMCPU$CPUTask.run(JVMCPU.java:16)	- locked <0x0000000757d645b8> (a java.lang.Object)	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)	at java.lang.Thread.run(Thread.java:750)
   Locked ownable synchronizers: - <0x0000000757d65d00> (a java.util.concurrent.ThreadPoolExecutor$Worker)

pool-1-thread-1:线程名称

#1:当前线程ID,从main开始,jvm会根据线程创建的顺序为其线程编号

prio:优先级的顺序,一般默认是5

os_prio:线程对应系统的优先级

tid:java内的线程id

nid:操作系统级别的线程id,是一个十六进制

关于线程的信息:

NEW:线程新建,还没开始运行

RUNNABLE:正在java虚拟机中运行的线程

BLOCKED:被阻塞,正在等待监视器锁的线程

WAITING:无限期等待另一个线程执行特定操作的线程

TIMED_WAITING:等待另一个线程执行操作达到指定等待时间的线程

TERMINATED:已经退出的线程

我们这里关注的最多的就是nid


2.使用jstack解决CPU占用很高的问题并定位具体行数

先来看一段代码:

package com.tuling.learnjuc.demo;
import java.util.concurrent.ExecutorService;import java.util.concurrent.Executors;
public class JVMCPU {    private static ExecutorService service = Executors.newFixedThreadPool(5);    private static Object lock = new Object();    public static class CPUTask implements Runnable{
        @Override        public void run() {            synchronized (lock){                long sum = 0L;                while(true){                    sum +=1;                }            }        }    }    public static void main(String[] args) {
        CPUTask cpuTask = new CPUTask();        service.execute(cpuTask);
    }}

将这段代码上传到linux服务器上编译

注意:如果提示javac: 未找到命令... ,执行yum install java-1.8.0-openjdk-devel

使用nohup java com.tuling.learnjuc.demo.JVMCPU &运行

使用top命令可以看到cpu被打满了

知道了进程的PID,如何找到进程下是哪个线程呢?可以使用命令top -Hp 32208,如下所示

如果能获取到线程名,下面这步可以忽略,不需要根据线程id找线程

从上面的图可以看到,cpu占用最多的线程是32230这个线程id,接下来就是使用jstack命令来查看程序的所有堆栈信息,但是,这里需要有一个注意的点,32230这个是一个十进制的,使用jstack看到的nid是十六进制,所以我们需要转换,可以使用printf "%x\n"这个命令

接下来使用jstack -l 32208打印堆栈信息

[root@192-168-65-185 ~]# jstack -322082024-12-17 14:04:09Full thread dump OpenJDK 64-Bit Server VM (25.412-b08 mixed mode):
"Attach Listener" #12 daemon prio=9 os_prio=0 tid=0x00007fb498001000 nid=0x4e5 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"DestroyJavaVM" #11 prio=5 os_prio=0 tid=0x00007fb4fc04b800 nid=0x7dd1 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"pool-1-thread-1" #10 prio=5 os_prio=0 tid=0x00007fb4fc16e000 nid=0x7de6 runnable [0x00007fb4e87fe000]   java.lang.Thread.State: RUNNABLE at com.tuling.learnjuc.demo.JVMCPU$CPUTask.run(JVMCPU.java:16) - locked <0x0000000757d645b8> (a java.lang.Object) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)
   Locked ownable synchronizers: - <0x0000000757d65d00> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"Service Thread" #9 daemon prio=9 os_prio=0 tid=0x00007fb4fc140000 nid=0x7de4 runnable [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"C1 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007fb4fc133000 nid=0x7de3 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007fb4fc131000 nid=0x7de2 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007fb4fc12f000 nid=0x7de1 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007fb4fc12c800 nid=0x7de0 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007fb4fc11e000 nid=0x7ddf runnable [0x0000000000000000]   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers: - None
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fb4fc0f2000 nid=0x7dde in Object.wait() [0x00007fb4e92d4000]   java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x0000000757d08f00> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144) - locked <0x0000000757d08f00> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:188)
   Locked ownable synchronizers: - None
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fb4fc0ed000 nid=0x7ddd in Object.wait() [0x00007fb4e93d5000]   java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x0000000757d06b98> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference.tryHandlePending(Reference.java:191) - locked <0x0000000757d06b98> (a java.lang.ref.Reference$Lock) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
   Locked ownable synchronizers: - None
"VM Thread" os_prio=0 tid=0x00007fb4fc0e3800 nid=0x7ddc runnable 
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb4fc05e000 nid=0x7dd2 runnable 
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb4fc060000 nid=0x7dd3 runnable 
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fb4fc062000 nid=0x7dd4 runnable 
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb4fc064000 nid=0x7dd5 runnable 
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fb4fc065800 nid=0x7dd6 runnable 
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fb4fc067800 nid=0x7dd7 runnable 
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fb4fc069800 nid=0x7dd8 runnable 
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fb4fc06b800 nid=0x7dd9 runnable 
"GC task thread#8 (ParallelGC)" os_prio=0 tid=0x00007fb4fc06d000 nid=0x7dda runnable 
"GC task thread#9 (ParallelGC)" os_prio=0 tid=0x00007fb4fc06f000 nid=0x7ddb runnable 
"VM Periodic Task Thread" os_prio=0 tid=0x00007fb4fc142800 nid=0x7de5 waiting on condition 
JNI global references5

从上面的信息中,可以看到转换的结果和nid是一致的,

从下面的信息中就可以看到问题其实是出现在JVMCPU.java的16行左右,这里给出的是16行,但是实际情况是16行的附近

"pool-1-thread-1" #10 prio=5 os_prio=0 tid=0x00007fb4fc16e000 nid=0x7de6 runnable [0x00007fb4e87fe000]   java.lang.Thread.StateRUNNABLE	at com.tuling.learnjuc.demo.JVMCPU$CPUTask.run(JVMCPU.java:16)	- locked <0x0000000757d645b8> (a java.lang.Object)	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)	at java.lang.Thread.run(Thread.java:750)
   Locked ownable synchronizers: - <0x0000000757d65d00> (a java.util.concurrent.ThreadPoolExecutor$Worker)

结合代码来看一下就很容易问题

使用arthas解决CPU占用很高的问题,定位具体代码行

官方文档:https://arthas.aliyun.com/

使用arthas解决CPU 100%问题,在方法论上要用到两个命令,

  • dashboard 命令查看TOP N线程,
  • thread 命令查看堆栈信息

先来运行arthas

curl -O https://arthas.aliyun.com/arthas-boot.jarjava -jar arthas-boot.jar

输入1显示如下

输入dashboard命令可以看到是哪个线程占用cpu最高

接下来输入thread -n 1,表示最忙的线程并打印信息

从上面的图中可以看到arthas和jstack展示的信息差不多,都定位到了JVMCPU.java的16行程序