最近又遇到一次hung,dmesg中堆栈如下:
[176223.913270] ------------[ cut here ]------------ [176223.913280] WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:300 dev_watchdog+0x248/0x260-----注意cpu号是10 [176223.913282] NETDEV WATCHDOG: eth4 (ixgbe): transmit queue 5 timed out----------------warn的内容 [176223.913283] Modules linked in: witdriver(OE) mysendmsg(OE) fuse tipc(OE) ossmod(OE) iptable_filter mptctl mptbase bonding dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support skx_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev ses enclosure sg mei_me mei shpchp lpc_ich i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_cpufreq acpi_pad tcp_bbr sch_fq binfmt_misc ip_tables ext4 mbcache jbd2 raid1 sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci libata ixgbe(OE) i40e(OE) mpt3sas(OE)
[176223.913340] ptp raid_class pps_core scsi_transport_sas i2c_core dca [last unloaded: witdriver]
[176223.913347] CPU: 10 PID: 0 Comm: swapper/10 Tainted: G OE ------------ 3.10.0-693.21.1.el7.x86_64 #1 [176223.913348] Hardware name: ZTE ZXCDN/SC621DI-16F, BIOS 1.0b 09/21/2017 [176223.913350] Call Trace:
[176223.913351]
[176224.918915] ixgbe 0000:86:00.1 eth5: Fake Tx hang detected with timeout of 5 seconds
[176224.918917] ixgbe 0000:af:00.1 eth7: Fake Tx hang detected with timeout of 5 seconds
[176224.918948] ixgbe 0000:af:00.0 eth6: Fake Tx hang detected with timeout of 5 seconds
[176224.934872] ixgbe 0000:18:00.1 eth1: Fake Tx hang detected with timeout of 5 seconds
[176226.403293] hrtimer_gaq_time = 912735, soft_irq_time = 42216, cpu = 7 [176233.921764] ixgbe 0000:86:00.0 eth4: Fake Tx hang detected with timeout of 10 seconds
[176234.911438] ixgbe 0000:af:00.1 eth7: Fake Tx hang detected with timeout of 10 seconds
[176234.911441] ixgbe 0000:86:00.1 eth5: Fake Tx hang detected with timeout of 10 seconds
[176234.911474] ixgbe 0000:af:00.0 eth6: Fake Tx hang detected with timeout of 10 seconds
[176234.943364] ixgbe 0000:18:00.1 eth1: Fake Tx hang detected with timeout of 10 seconds
[176237.664127] hrtimer_gaq_time = 854268, soft_irq_time = 72171, cpu = 43
如果只盯着这个堆栈看,可以看出,网卡的 dev_watchdog 函数检测到了eth4的queue 5 出现了 trans_timeout。
超时的检测周期,不同的设备是不一样的,intel的ixgbe对应的超时时间是 ixgbe_main.c 中设置的
netdev->watchdog_timeo = 5 * HZ;
检测的原理就是,每当发包成功的时候,设置 txq->trans_start ,如果这个值在timer到期检测的时候,和当前时间相差 一个超时周期(默认是5s,各种驱动可能不同),则认为出现发送
超时。依赖于定时器软中断,检测的对象是处于running状态的网卡已经处于stop状态的发送队列,话说网卡如果没有running,检测也没有意义。
static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq, bool more)
{ const struct net_device_ops *ops = dev->netdev_ops; int rc; /*__netdev_start_xmit 里面就完全是使用driver 的ops去发包了,其实到此为止,一个skb已经从netdevice层送到driver层了,接下来会等待driver的返回*/ rc = __netdev_start_xmit(ops, skb, dev, more); if (rc == NETDEV_TX_OK)
txq_trans_update(txq); return rc;
}
static inline void txq_trans_update(struct netdev_queue *txq)
{ if (txq->xmit_lock_owner != -1)
txq->trans_start = jiffies;---------------------更新当前发送队列txq的最后发送时间
}
ixgbe驱动会检测是否pending,也就是有的tx中的ring没变动了,那么打印warning。
接下来,又有一个堆栈:
[176242.505583] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [kthread_send/9:178838]-----------注意这个时候是cpu9
[176242.506532] Modules linked in: witdriver(OE) mysendmsg(OE) fuse tipc(OE) ossmod(OE) iptable_filter mptctl mptbase bonding dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support skx_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev ses enclosure sg mei_me mei shpchp lpc_ich i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_cpufreq acpi_pad tcp_bbr sch_fq binfmt_misc ip_tables ext4 mbcache jbd2 raid1 sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci libata ixgbe(OE) i40e(OE) mpt3sas(OE)
[176242.506588] ptp raid_class pps_core scsi_transport_sas i2c_core dca [last unloaded: witdriver]
[176242.506597] CPU: 9 PID: 178838 Comm: kthread_send/9 Tainted: G W OE ------------ 3.10.0-693.21.1.el7.x86_64 #1 [176242.506599] Hardware name: ZTE ZXCDN/SC621DI-16F, BIOS 1.0b 09/21/2017 [176242.506601] task: ffff88290f7ddee0 ti: ffff882b680d0000 task.ti: ffff882b680d0000
[176242.506603] RIP: 0010:[
[176242.506614] RBP: ffff882fbe843dc0 R08: 0000000000000101 R09: ffff8801363f4000
[176242.506616] R10: 0000000000000000 R11: ffff882fbe843da8 R12: ffff882fbe843d38
[176242.506617] R13: ffffffff816c6732 R14: ffff882fbe843dc0 R15: ffff882fb80a0b40
[176242.506619] FS: 0000000000000000(0000) GS:ffff882fbe840000(0000) knlGS:0000000000000000 [176242.506621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [176242.506622] CR2: 00007fd86fd46316 CR3: 0000000001a0a000 CR4: 00000000003607e0
[176242.506624] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [176242.506626] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [176242.506627] Call Trace:
[176242.506629]
[176242.506635] [
[176242.506702] [
[176242.506710] [
[176242.506715] [
[176242.506719] [
[176242.506723] [
[176242.506726] [
[176242.506729] [
[176242.506733] [
[176242.506736] [
分析这个堆栈,dev_watchdog 这次拿不到锁了,已经持续了22s,走查代码,它需要哪一把锁呢?
etif_tx_lock(dev);
static inline void netif_tx_lock(struct net_device *dev)
{
unsigned int i; int cpu;
spin\_lock(&dev->tx\_global\_lock);
cpu \= smp\_processor\_id(); for (i = 0; i < dev->num\_tx\_queues; i++) { struct netdev\_queue \*txq = netdev\_get\_tx\_queue(dev, i); /\* We are the only thread of execution doing a
\* freeze, but we have to grab the \_xmit\_lock in
\* order to synchronize with threads which are in
\* the ->hard\_start\_xmit() handler and already
\* checked the frozen bit. \*/ \_\_netif\_tx\_lock(txq, cpu);
set\_bit(\_\_QUEUE\_STATE\_FROZEN, &txq->state);
\_\_netif\_tx\_unlock(txq);
}
}
那这把锁被谁拿了呢?被 wit_kthread_xmit_fn 函数拿了,因为我们正在发包啊。
也就是说,我们的内核线程发包的时候,拿了锁,结果被中断,然后硬中断处理dev_watchdog的时候,也要拿这把锁,死锁了。
接下来,又引发其他cpu死锁:(每次进入_raw_spin_lock,rdi中必然存放的就是锁,x86反汇编看的话确实是rdi,其他架构不是如此)
[176246.611936] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [a.out:189281]-------------------------------cpu25
[176246.612901] Modules linked in: witdriver(OE) mysendmsg(OE) fuse tipc(OE) ossmod(OE) iptable_filter mptctl mptbase bonding dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support skx_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev ses enclosure sg mei_me mei shpchp lpc_ich i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_cpufreq acpi_pad tcp_bbr sch_fq binfmt_misc ip_tables ext4 mbcache jbd2 raid1 sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci libata ixgbe(OE) i40e(OE) mpt3sas(OE)
[176246.612961] ptp raid_class pps_core scsi_transport_sas i2c_core dca [last unloaded: witdriver]
[176246.612969] CPU: 25 PID: 189281 Comm: a.out Tainted: G W OEL ------------ 3.10.0-693.21.1.el7.x86_64 #1 [176246.612970] Hardware name: ZTE ZXCDN/SC621DI-16F, BIOS 1.0b 09/21/2017 [176246.612973] task: ffff882ba14a0000 ti: ffff882ba92c8000 task.ti: ffff882ba92c8000
[176246.612974] RIP: 0010:[
[176246.612985] RDX: 0000000000c90101 RSI: 0000000000000101 RDI: ffff882fb80a0bc0------------大家都在抢的香饽饽存放在此,就是这把锁
[176246.612986] RBP: ffff882ba92cb978 R08: ffff882fbe959700 R09: 0000000000000000 [176246.612988] R10: 0000000000000001 R11: ffffea017c3e5580 R12: ffff882c18a85c00
[176246.612989] R13: 00000000000005dc R14: ffff882ba2091100 R15: ffffffff811e63d1
[176246.612991] FS: 00007fbf25f9b700(0000) GS:ffff882fbe940000(0000) knlGS:0000000000000000 [176246.612993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [176246.612995] CR2: 0000000000c79000 CR3: 000000593a2ec000 CR4: 00000000003607e0
[176246.612996] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [176246.612998] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [176246.612999] Call Trace:
[176246.613006] [
[176246.613036] [
[176246.613039] [
[176246.613084] [
[176246.613090] [
由于我们拿的是dev的锁,在没开启抢占的情况下不调度出去,但也并没有关中断,这样其他cpu在系统调用中要发包,也会去尝试拿这把锁,好吧,你也死锁了,等着吧。
可以想象,后续其他cpu要拿这把锁的话,大家都会挂,一个cpu的softlock导致其他cpu全挂,只要你这个cpu来拿那锁就会如此。
由于我们拿的是spin_lock,并没有关中断,所以它还可以响应中断,看如下堆栈就明白,最早打印网卡异常的9号cpu,堆栈在死的时候是如下这样的:
crash> bt
PID: 178838 TASK: ffff88290f7ddee0 CPU: 9 COMMAND: "kthread_send/9" #0 [ffff882fbe843a70] machine_kexec at ffffffff8105d77b
#1 [ffff882fbe843ad0] __crash_kexec at ffffffff8110aca2
#2 [ffff882fbe843ba0] panic at ffffffff816ad52f
#3 [ffff882fbe843c20] watchdog_timer_fn at ffffffff81135a51----------------------------hardlock检测中hrtimer的回调
#4 [ffff882fbe843c58] __hrtimer_run_queues at ffffffff810b93a6
#5 [ffff882fbe843cb0] hrtimer_interrupt at ffffffff810b993f
#6 [ffff882fbe843cf8] local_apic_timer_interrupt at ffffffff8105467b
#7 [ffff882fbe843d10] smp_apic_timer_interrupt at ffffffff816c9e83
#8 [ffff882fbe843d28] apic_timer_interrupt at ffffffff816c6732--------------------------死之前又增加的堆栈,这个是硬中断
#9 [ffff882fbe843dc8] queued_spin_lock_slowpath at ffffffff816adeee---------------------一开始打印hung的堆栈,在自旋
#10 [ffff882fbe843dd8] _raw_spin_lock at ffffffff816bb080
#11 [ffff882fbe843de8] dev_watchdog at ffffffff815bca52
#12 [ffff882fbe843e28] call_timer_fn at ffffffff8109a9c8
#13 [ffff882fbe843e60] run_timer_softirq at ffffffff8109ceed
#14 [ffff882fbe843ed8] __do_softirq at ffffffff8109404d
#15 [ffff882fbe843f48] call_softirq at ffffffff816c8afc
#16 [ffff882fbe843f60] do_softirq at ffffffff8102d435
#17 [ffff882fbe843f80] irq_exit at ffffffff81094495
#18 [ffff882fbe843f98] smp_apic_timer_interrupt at ffffffff816c9e88
#19 [ffff882fbe843fb0] apic_timer_interrupt at ffffffff816c6732 ---
[exception RIP: ixgbe_xmit_frame_ring+83]
RIP: ffffffffc01299e3 RSP: ffff882b680d3cd0 RFLAGS: 00000212 RAX: 0000000000000562 RBX: 0000000000000001 RCX: 000000000000403d
RDX: ffff882fb9331c00 RSI: ffff8828d7b8fac0 RDI: 0000000000000001 RBP: ffff882b680d3d48 R8: 0000000000000008 R9: 0000a0a5447b9d78
R10: ffff8828c6e84f00 R11: 000000002b3000b8 R12: ffff8828c0291b00
R13: 0000000022300000 R14: 0000000000000001 R15: ffff882b680d3cc0
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018 #21 [ffff882b680d3d50] ixgbe_xmit_frame at ffffffffc012a918 [ixgbe]
#22 [ffff882b680d3d80] wit_send_tasklet at ffffffffc043b63c [witdriver]
#23 [ffff882b680d3e78] wit_kthread_xmit_fn at ffffffffc043ba95 [witdriver]
#24 [ffff882b680d3ec8] kthread at ffffffff810b5241
#25 [ffff882b680d3f50] ret_from_fork at ffffffff816c5577
总结:
1、解决办法很简单,就是在 wit_kthread_xmit_fn 函数中,加了一个关闭软中断就行,因为就是软中断和sys拿锁冲突。
2、分析软锁,最好从dmesg中开始分析,那个时候时间还早,会按时间顺序来打印堆栈,比起直接crash来分析要快一些,特别是在对自己模块代码比较熟悉的情况下。
手机扫一扫
移动阅读更方便
你可能感兴趣的文章