Troubleshooting "Detected Tx Unit Hang"
Date: 2023-07-17

Goal:

Point skb->data at memory that has already been allocated, rather than having alloc_skb() allocate the data buffer.

Part of the code:

         /*
          * build a new sk_buff
          */
         //struct sk_buff *send_skb = kmem_cache_alloc_node(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA, NUMA_NO_NODE);
         struct sk_buff *send_skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA);

         if (!send_skb) {
             //spin_unlock(&lock);
             return NF_DROP;
         }

         //printk("what2\n");
         /* initialize the sk_buff header the same way __alloc_skb() does */
         memset(send_skb, 0, offsetof(struct sk_buff, tail));
         atomic_set(&send_skb->users, 1);
         send_skb->cloned = 0;

         /* point head and data into the pre-allocated buffer (the offsets are missing from the listing) */
         send_skb->head = mmap_buf + ...;
         send_skb->data = mmap_buf + ...;

In the last two lines of the listing, mmap_buf is memory that was allocated in advance.

The NIC driver then logs the following errors in /var/log/messages:

Sep :: 10g-host2 kernel: ixgbe ::00.0: eth2: Detected Tx Unit Hang
Sep :: 10g-host2 kernel: Tx Queue <>
Sep :: 10g-host2 kernel: TDH, TDT <>, <1ea>
Sep :: 10g-host2 kernel: next_to_use <1ea>
Sep :: 10g-host2 kernel: next_to_clean <>
Sep :: 10g-host2 kernel: ixgbe ::00.0: eth2: Detected Tx Unit Hang
Sep :: 10g-host2 kernel: Tx Queue <>
Sep :: 10g-host2 kernel: TDH, TDT <>, <1eb>
Sep :: 10g-host2 kernel: next_to_use <1eb>
Sep :: 10g-host2 kernel: next_to_clean <>
Sep :: 10g-host2 kernel: ixgbe ::00.0: eth2: Detected Tx Unit Hang
Sep :: 10g-host2 kernel: Tx Queue <>
Sep :: 10g-host2 kernel: TDH, TDT <>, <1ea>
Sep :: 10g-host2 kernel: next_to_use <1ea>
Sep :: 10g-host2 kernel: next_to_clean <>
Sep :: 10g-host2 kernel: ixgbe ::00.0: eth2: Detected Tx Unit Hang
Sep :: 10g-host2 kernel: Tx Queue <>
Sep :: 10g-host2 kernel: TDH, TDT <>, <1ea>
Sep :: 10g-host2 kernel: next_to_use <1ea>
Sep :: 10g-host2 kernel: next_to_clean <>
Sep :: 10g-host2 kernel: ixgbe ::00.0: eth2: Detected Tx Unit Hang
Sep :: 10g-host2 kernel: Tx Queue <>
Sep :: 10g-host2 kernel: TDH, TDT <>, <1ef>
Sep :: 10g-host2 kernel: next_to_use <1ef>
Sep :: 10g-host2 kernel: next_to_clean <>
Sep :: 10g-host2 kernel: ixgbe ::00.0: eth2: Detected Tx Unit Hang
Sep :: 10g-host2 kernel: Tx Queue <>
Sep :: 10g-host2 kernel: TDH, TDT <>, <1ec>
Sep :: 10g-host2 kernel: next_to_use <1ec>
Sep :: 10g-host2 kernel: next_to_clean <>
Sep :: 10g-host2 kernel: ixgbe ::00.0: eth2: Detected Tx Unit Hang

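After ruling out the other possible causes, the problem was traced to how mmap_buf was allocated. Allocating it with vmalloc() is wrong; after switching to kmalloc() everything works normally.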

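Section 12.5 of Linux Kernel Development (《Linux内核设计与实现》) explains why. The likely reason is that the NIC, which reads packet data by DMA using physical addresses, needs the buffer to be physically contiguous. kmalloc() returns physically contiguous memory, whereas vmalloc() only guarantees that the virtual addresses are contiguous.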
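
The contiguity argument can be illustrated outside the kernel with a toy model. The sketch below is plain userspace C, not driver code: toy_mapping, toy_virt_to_phys() and toy_phys_contiguous() are hypothetical names invented for this illustration, and the 4096-byte page size and frame numbers are assumptions. It describes a buffer as one physical frame number per virtual page and checks whether a byte range could be handed to a device as a single physical segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size for the model */

/* Toy page table: frames[i] is the physical frame backing virtual page i. */
struct toy_mapping {
    const uint64_t *frames;
    size_t npages;
};

/* Translate a byte offset inside the buffer into a toy physical address. */
static uint64_t toy_virt_to_phys(const struct toy_mapping *m, size_t voff)
{
    assert(voff / TOY_PAGE_SIZE < m->npages);
    return m->frames[voff / TOY_PAGE_SIZE] * TOY_PAGE_SIZE
           + voff % TOY_PAGE_SIZE;
}

/*
 * Return true if the bytes [voff, voff + len) sit at consecutive physical
 * addresses, i.e. every page the range touches is backed by the frame
 * immediately after the previous one.
 */
static bool toy_phys_contiguous(const struct toy_mapping *m,
                                size_t voff, size_t len)
{
    assert(len > 0);
    size_t first = voff / TOY_PAGE_SIZE;
    size_t last = (voff + len - 1) / TOY_PAGE_SIZE;
    assert(last < m->npages);

    for (size_t p = first + 1; p <= last; p++) {
        if (m->frames[p] != m->frames[p - 1] + 1)
            return false;  /* physical gap at this page boundary */
    }
    return true;
}
```

In this model, a mapping with frames {100, 101, 102, 103} (the kmalloc()-like case) is contiguous for any range. A mapping with frames {100, 57, 912, 3} (the vmalloc()-like case) is contiguous only for ranges that stay inside one page, so a packet that straddles a page boundary would not be a single physical segment.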