Understanding RFS

1. Background

When a NIC receives a packet, processing goes through three stages:

- The NIC raises a hardware interrupt to notify a CPU that a packet has arrived

- The packet is processed in softirq context

- The packet is finally consumed by the user-space program

On an SMP system, these three stages may end up running on three different CPUs.

The goal of RFS (Receive Flow Steering) is to keep these stages on the same CPU, raising CPU cache hit rates and thereby reducing network latency: with RFS enabled, packet processing is steered to the CPU on which the consuming application runs.

2. How it works

When the user program calls recvmsg() or sendmsg(), RFS stores the id of the CPU that the program is currently running on in a hash table. When a packet belonging to that program's flow later arrives, RFS looks up the CPU id in the hash table and places the packet on that CPU's input queue, so the packet is processed on a CPU whose caches are already warm.
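
For RFS to do anything, the two flow tables first have to be sized via /proc/sys/net/core/rps_sock_flow_entries (global table) and /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt (per-queue table), as described in Documentation/networking/scaling.txt. Below is a minimal user-space sketch of that configuration; the device eth0, queue rx-0 and the sizes are example values only.

#include <stdio.h>

/* Hypothetical helper: write one value into a procfs/sysfs tunable. */
static int write_val(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    fputs(val, f);
    return fclose(f);
}

int main(void)
{
    /* global table: number of rps_sock_flow_table entries */
    write_val("/proc/sys/net/core/rps_sock_flow_entries", "32768");
    /* per-RX-queue table: entries in that queue's rps_dev_flow_table
     * (repeat for each rx-<n> queue of the device) */
    write_val("/sys/class/net/eth0/queues/rx-0/rps_flow_cnt", "2048");
    return 0;
}

In practice this is usually done from a boot script; the point is simply that both tables must be non-zero, otherwise the lookup in get_rps_cpu() falls through and RFS does nothing.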

3. Key data structures

/*
 * The rps_sock_flow_table contains mappings of flows to the last CPU
 * on which they were processed by the application (set in recvmsg).
 */
struct rps_sock_flow_table {
    unsigned int mask;
    u16 ents[];
};
#define RPS_SOCK_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_sock_flow_table) + \
    ((_num) * sizeof(u16)))

struct rps_sock_flow_table implements a hash table. RFS declares a single global instance of it (rps_sock_flow_table), which records, for every socket, the CPU on which the application last ran.
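
The recording side is a single hash-table store. The sketch below closely follows the 3.x-era helper rps_record_sock_flow() from include/linux/netdevice.h (newer kernels also fold part of the hash into the stored entry), so treat it as illustrative rather than authoritative:

/* Illustrative, after the 3.x-era rps_record_sock_flow(): remember the CPU
 * the caller is currently running on for this flow hash. */
static inline void rps_record_sock_flow(struct rps_sock_flow_table *table,
                                        u32 hash)
{
    if (table && hash) {
        unsigned int cpu, index = hash & table->mask;

        /* only a hint: preemption can move us to another CPU at any time */
        cpu = raw_smp_processor_id();

        if (table->ents[index] != cpu)
            table->ents[index] = cpu;
    }
}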

/*
 * The rps_dev_flow structure contains the mapping of a flow to a CPU, the
 * tail pointer for that CPU's input queue at the time of last enqueue, and
 * a hardware filter index.
 */
struct rps_dev_flow {
    u16 cpu;                 /* CPU this flow was last steered to */
    u16 filter;
    unsigned int last_qtail; /* that CPU's input queue tail when this flow's
                              * last packet was enqueued */
};
#define RPS_NO_FILTER 0xffff

/*
 * The rps_dev_flow_table structure contains a table of flow mappings.
 */
struct rps_dev_flow_table {
    unsigned int mask;
    struct rcu_head rcu;
    struct rps_dev_flow flows[]; /* the hash table of flow entries */
};
#define RPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_dev_flow_table) + \
    ((_num) * sizeof(struct rps_dev_flow)))

struct rps_dev_flow_table, by contrast, is per device receive queue: each RX queue with RFS configured has its own table of rps_dev_flow entries, indexed by the packet hash.
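
For illustration, this is roughly how a per-queue table with count entries is sized and initialized when rps_flow_cnt is written. The helper name alloc_dev_flow_table is made up here, and both RCU publication into the queue and freeing of any old table are omitted:

#include <linux/netdevice.h>
#include <linux/vmalloc.h>

/* Illustrative sketch: build a per-RX-queue flow table with "count" entries
 * (count must be a power of two), as the rps_flow_cnt sysfs handler does. */
static struct rps_dev_flow_table *alloc_dev_flow_table(unsigned int count)
{
    struct rps_dev_flow_table *table;
    unsigned int i;

    table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(count));
    if (!table)
        return NULL;

    table->mask = count - 1;              /* lookups use "hash & mask" */
    for (i = 0; i < count; i++)
        table->flows[i].cpu = RPS_NO_CPU; /* nothing steered yet */

    return table;
}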

4. Implementation

When the user program calls recvmsg() or sendmsg(), the CPU id is recorded:

int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
                 size_t size, int flags)
{
    struct sock *sk = sock->sk;
    int addr_len = 0;
    int err;

    sock_rps_record_flow(sk);   /* record the current CPU id for this flow */

    err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
                               flags & ~MSG_DONTWAIT, &addr_len);
    if (err >= 0)
        msg->msg_namelen = addr_len;
    return err;
}
EXPORT_SYMBOL(inet_recvmsg);
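
sock_rps_record_flow() itself is small: it takes the RCU-protected global table and records the current CPU under the socket's receive hash. Roughly, following 3.x-era kernels (details vary between versions):

/* Roughly how sock_rps_record_flow() reaches the global table (3.x era). */
static inline void sock_rps_record_flow(const struct sock *sk)
{
#ifdef CONFIG_RPS
    struct rps_sock_flow_table *sock_flow_table;

    rcu_read_lock();
    sock_flow_table = rcu_dereference(rps_sock_flow_table);
    /* effectively: ents[sk->sk_rxhash & mask] = raw_smp_processor_id() */
    rps_record_sock_flow(sock_flow_table, sk->sk_rxhash);
    rcu_read_unlock();
#endif
}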

When a packet for the flow arrives, get_rps_cpu() is called to choose the target CPU. Its key part is:

    hash = skb_get_hash(skb);
    if (!hash)
        goto done;

    flow_table = rcu_dereference(rxqueue->rps_flow_table);   /* per-queue hash table */
    sock_flow_table = rcu_dereference(rps_sock_flow_table);  /* global hash table */
    if (flow_table && sock_flow_table) {
        u16 next_cpu;
        struct rps_dev_flow *rflow;

        rflow = &flow_table->flows[hash & flow_table->mask];
        tcpu = rflow->cpu;

        /* CPU on which the user program last ran */
        next_cpu = sock_flow_table->ents[hash & sock_flow_table->mask];

        /*
         * If the desired CPU (where last recvmsg was done) is
         * different from current CPU (one in the rx-queue flow
         * table entry), switch if one of the following holds:
         *   - Current CPU is unset (equal to RPS_NO_CPU).
         *   - Current CPU is offline.
         *   - The current CPU's queue tail has advanced beyond the
         *     last packet that was enqueued using this table entry.
         *     This guarantees that all previous packets for the flow
         *     have been dequeued, thus preserving in order delivery.
         */
        if (unlikely(tcpu != next_cpu) &&
            (tcpu == RPS_NO_CPU || !cpu_online(tcpu) ||
             ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
                    rflow->last_qtail)) >= 0)) {
            tcpu = next_cpu;
            rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
        }

        if (tcpu != RPS_NO_CPU && cpu_online(tcpu)) {
            *rflowp = rflow;
            cpu = tcpu;
            goto done;
        }
    }
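
When the flow is moved, set_rps_cpu() records the new CPU and takes a snapshot of that CPU's queue position, so the next reorder check has a baseline. Ignoring the accelerated-RFS (hardware filter) path, it boils down to roughly the following simplified sketch, not the verbatim kernel function:

/* Simplified sketch of set_rps_cpu() with the ndo_rx_flow_steer path left out. */
static struct rps_dev_flow *
set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
            struct rps_dev_flow *rflow, u16 next_cpu)
{
    if (next_cpu != RPS_NO_CPU)
        /* remember where the new CPU's queue stands right now, so a later
         * switch away from next_cpu can tell when this flow has drained */
        rflow->last_qtail =
            per_cpu(softnet_data, next_cpu).input_queue_head;

    rflow->cpu = next_cpu;
    return rflow;
}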

The hardest part of the get_rps_cpu() excerpt is the comparison of input_queue_head against rflow->last_qtail. struct softnet_data manages each CPU's incoming and outgoing traffic, and two of its fields matter here:

#ifdef CONFIG_RPS
    /* Elements below can be accessed between CPUs for RPS */
    struct call_single_data csd ____cacheline_aligned_in_smp;
    struct softnet_data     *rps_ipi_next;
    unsigned int            cpu;
    unsigned int            input_queue_head; /* queue head: count of packets dequeued */
    unsigned int            input_queue_tail; /* queue tail: count of packets enqueued */
#endif
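
For reference, these two counters are advanced by a pair of small helpers: input_queue_head is bumped each time a packet is taken off the backlog (process_backlog), and input_queue_tail each time one is enqueued (enqueue_to_backlog), which also hands the new tail back so rflow->last_qtail can be updated. Roughly, as in 3.x-era include/linux/netdevice.h:

static inline void input_queue_head_incr(struct softnet_data *sd)
{
#ifdef CONFIG_RPS
    sd->input_queue_head++;            /* one more packet dequeued on this CPU */
#endif
}

static inline void input_queue_tail_incr_save(struct softnet_data *sd,
                                              unsigned int *qtail)
{
#ifdef CONFIG_RPS
    *qtail = ++sd->input_queue_tail;   /* report the enqueue position, which
                                        * ends up in rflow->last_qtail */
#endif
}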

The expression (int)(per_cpu(softnet_data, tcpu).input_queue_head - rflow->last_qtail) computes how far CPU tcpu's dequeue counter has advanced past the position at which this flow's last packet was enqueued on that CPU. If the difference is greater than or equal to zero, every packet of this flow that was queued on tcpu has already been dequeued, so moving the flow to another CPU cannot produce out-of-order packets. The whole if statement exists precisely to prevent such reordering: when several processes or threads service the same socket from different CPUs, the CPU id recorded for that socket keeps changing, and without this check packets of one flow could be processed on two CPUs at once.
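
Note that the check is written as a signed cast of an unsigned difference rather than a plain head >= last_qtail, so it keeps working when the 32-bit counters wrap around. A small stand-alone illustration of that idiom, with arbitrary toy values:

#include <stdio.h>

int main(void)
{
    /* head has wrapped past 2^32: it was 0xfffffffb, then 10 packets were dequeued */
    unsigned int head = 5;
    /* tail position recorded for the flow just before the wrap */
    unsigned int last_qtail = 0xfffffffd;

    printf("head >= last_qtail       : %d\n", head >= last_qtail);            /* 0: wrong */
    printf("(int)(head - last_qtail) : %d\n", (int)(head - last_qtail));      /* 8 */
    printf("flow fully drained       : %d\n", (int)(head - last_qtail) >= 0); /* 1: correct */
    return 0;
}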
