1. Background
When a NIC receives a packet, processing passes through three stages:
- The NIC raises a hardware interrupt to notify the CPU that a packet has arrived
- A softirq processes the packet
- A user-space program consumes the packet
On an SMP system these three stages may run on three different CPUs, as shown in the figure below:
The goal of RFS is to raise the CPU cache hit rate and thereby reduce network latency. With RFS enabled, the effect is as follows:
2. How it works
When a user program calls recvmsg() or sendmsg(), RFS stores the id of the CPU the program is running on in a hash table. When a packet belonging to that program arrives, RFS looks the CPU id back up in the hash table and places the packet on that CPU's queue, so the packet is processed where the application's cache is warm.
3. Key data structures
/*
* The rps_sock_flow_table contains mappings of flows to the last CPU
* on which they were processed by the application (set in recvmsg).
*/
struct rps_sock_flow_table {
	unsigned int mask;
	u16 ents[];
};
#define RPS_SOCK_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_sock_flow_table) + \
	((_num) * sizeof(u16)))
The rps_sock_flow_table structure implements a hash table; RFS declares a single global instance of it to record the last CPU used for every socket.
/*
* The rps_dev_flow structure contains the mapping of a flow to a CPU, the
* tail pointer for that CPU's input queue at the time of last enqueue, and
* a hardware filter index.
*/
struct rps_dev_flow {
	u16 cpu;                 /* last CPU that handled this flow */
	u16 filter;
	unsigned int last_qtail; /* queue enqueue counter at this flow's last enqueue */
};
#define RPS_NO_FILTER 0xffff
/*
* The rps_dev_flow_table structure contains a table of flow mappings.
*/
struct rps_dev_flow_table {
	unsigned int mask;
	struct rcu_head rcu;
	struct rps_dev_flow flows[]; /* the hash table itself */
};
#define RPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_dev_flow_table) + \
	((_num) * sizeof(struct rps_dev_flow)))
The rps_dev_flow_table structure is per device queue: each receive queue owns one such table.
4. Implementation
The CPU id is recorded when the user program calls recvmsg() or sendmsg().
int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
		 size_t size, int flags)
{
	struct sock *sk = sock->sk;
	int addr_len = 0;
	int err;

	sock_rps_record_flow(sk); /* record the current CPU id */

	err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
				   flags & ~MSG_DONTWAIT, &addr_len);
	if (err >= 0)
		msg->msg_namelen = addr_len;
	return err;
}
EXPORT_SYMBOL(inet_recvmsg);
When a packet arrives for the flow, get_rps_cpu() is called to choose a suitable CPU id. Its key code is as follows:
	hash = skb_get_hash(skb);
	if (!hash)
		goto done;

	flow_table = rcu_dereference(rxqueue->rps_flow_table);  /* per-queue hash table */
	sock_flow_table = rcu_dereference(rps_sock_flow_table); /* global hash table */
	if (flow_table && sock_flow_table) {
		u16 next_cpu;
		struct rps_dev_flow *rflow;

		rflow = &flow_table->flows[hash & flow_table->mask];
		tcpu = rflow->cpu;

		/* the CPU the user program last ran on */
		next_cpu = sock_flow_table->ents[hash & sock_flow_table->mask];

		/*
		 * If the desired CPU (where last recvmsg was done) is
		 * different from current CPU (one in the rx-queue flow
		 * table entry), switch if one of the following holds:
		 *   - Current CPU is unset (equal to RPS_NO_CPU).
		 *   - Current CPU is offline.
		 *   - The current CPU's queue tail has advanced beyond the
		 *     last packet that was enqueued using this table entry.
		 *     This guarantees that all previous packets for the flow
		 *     have been dequeued, thus preserving in order delivery.
		 */
		if (unlikely(tcpu != next_cpu) &&
		    (tcpu == RPS_NO_CPU || !cpu_online(tcpu) ||
		     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
			    rflow->last_qtail)) >= 0)) {
			tcpu = next_cpu;
			rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
		}

		if (tcpu != RPS_NO_CPU && cpu_online(tcpu)) {
			*rflowp = rflow;
			cpu = tcpu;
			goto done;
		}
	}
The queue-head test in the if condition above is the hardest part to follow. The softnet_data structure manages a CPU's incoming and outgoing traffic, and two of its fields are the key:
#ifdef CONFIG_RPS
	/* Elements below can be accessed between CPUs for RPS */
	struct call_single_data	csd ____cacheline_aligned_in_smp;
	struct softnet_data	*rps_ipi_next;
	unsigned int		cpu;
	unsigned int		input_queue_head; /* dequeue counter: packets removed so far */
	unsigned int		input_queue_tail; /* enqueue counter: packets added so far */
#endif
The expression per_cpu(softnet_data, tcpu).input_queue_head gives the number of packets dequeued so far on CPU tcpu, while rflow->last_qtail records the value of the enqueue counter when this socket's flow last queued a packet on the device queue. If the dequeue count has caught up with that enqueue position, every earlier packet of the flow has already been processed, so no packet can be handled out of order. The if statement exists precisely to prevent such reordering: when several processes or threads service the same socket, the CPU id recorded for that socket keeps changing, and blindly following it could let a later packet overtake an earlier one still waiting on the old CPU's queue.