[io_uring] Source Code Analysis of the liburing User-Space Library
Date: 2023-08-25

This article is based on liburing 2.1.

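As summarized previously, the general flow for using io_uring is as follows:

  • Use functions such as open and fstat to open the file and inspect its metadata

    • io_uring replaces the read/write interfaces, so the subsequent io_uring operations act on the fd returned by open
  • Initialize the struct io_uring ring structure with io_uring_queue_init

  • Initialize the struct iovec *iovecs array, which holds the user-space buffer pointers and lengths

  • Get an sqe with io_uring_get_sqe

  • Fill the sqe with the opcode, buffer and offset using io_uring_prep_#OP

    • [Optional] Attach user_data to the sqe with io_uring_sqe_set_data (it is returned in the corresponding cqe)
  • Submit all pending sqes in the ring with io_uring_submit

  • Retrieve a cqe with io_uring_wait_cqe or io_uring_peek_cqe

    • io_uring_wait_cqe blocks the calling thread until a cqe is available
    • io_uring_peek_cqe does not block; if no cqe is currently available, it returns an error
    • io_uring_cqe_get_data extracts user_data from the cqe
  • Mark the current cqe as consumed with io_uring_cqe_seen so that it is not processed twice

  • Once all IO has completed, destroy the ring with io_uring_queue_exit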

  • Call graph of io_uring_queue_init

    io_uring_queue_init
      → io_uring_queue_init_params
          → __sys_io_uring_setup → syscall → (traps into the kernel) io_uring_setup
          → io_uring_queue_mmap → io_uring_mmap → mmap
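  • What the function does

    io_uring_queue_init passes the queue depth and any extra flags to the kernel, where io_uring_setup initializes the io_uring instance. It then uses mmap to map the SQ, CQ and SQEs, which were set up in the kernel, into user space.

    The flags passed at initialization determine how io_uring operates:

    • IORING_SETUP_IOPOLL: with this flag set, files must subsequently be opened with O_DIRECT only, and the file system's file_operations must register an iopoll function; otherwise IO submission fails. Once enabled, the kernel calls the registered iopoll function to actively poll the device driver for IO completion. For when iopoll is triggered, see the io_uring kernel source code analysis
    • IORING_SETUP_SQPOLL: starts a dedicated kernel thread, io_sq_thread, which actively polls the SQ and issues the IO to the device driver. This greatly reduces the system call overhead of submitting IO (while the kernel thread is running, submitting IO needs no system call; the thread may go to sleep, however, and a system call is then needed to wake it up)
    • IORING_SETUP_SQ_AFF: when IORING_SETUP_SQPOLL is set, enables the sq_thread_cpu field, which selects the CPU that the kernel thread io_sq_thread runs on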
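The dependency between the SQPOLL and SQ_AFF flags can be illustrated with a small sketch. It does not link against liburing: `struct model_params` is a hypothetical stand-in for the `flags` and `sq_thread_cpu` fields of `struct io_uring_params`, and the helper names are made up for this example. The flag values mirror those in `<linux/io_uring.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Setup flag bits, mirroring <linux/io_uring.h>. */
#define IORING_SETUP_IOPOLL (1U << 0)
#define IORING_SETUP_SQPOLL (1U << 1)
#define IORING_SETUP_SQ_AFF (1U << 2)

static_assert((IORING_SETUP_SQPOLL & IORING_SETUP_SQ_AFF) == 0,
              "flags must occupy distinct bits");

/* Hypothetical stand-in for two fields of struct io_uring_params. */
struct model_params {
    uint32_t flags;
    uint32_t sq_thread_cpu;
};

/* Build parameters for SQPOLL mode; cpu < 0 means "do not pin io_sq_thread". */
static struct model_params model_sqpoll_params(int cpu)
{
    struct model_params p = { .flags = IORING_SETUP_SQPOLL, .sq_thread_cpu = 0 };

    if (cpu >= 0) {
        /* sq_thread_cpu is only honored when SQ_AFF is set as well. */
        p.flags |= IORING_SETUP_SQ_AFF;
        p.sq_thread_cpu = (uint32_t)cpu;
    }
    return p;
}

/* SQ_AFF is only meaningful together with SQPOLL. */
static bool model_flags_consistent(uint32_t flags)
{
    return !(flags & IORING_SETUP_SQ_AFF) || (flags & IORING_SETUP_SQPOLL);
}
```

With real liburing, the equivalent values would be placed in a `struct io_uring_params` and passed to io_uring_queue_init_params.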

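io_uring_get_sqe: because the SQ has already been mapped into user space with mmap, this function only needs io_uring_smp_load_acquire when reading sq->khead to get a consistent view, while sq->sqe_tail is used only in user space and can be read directly. It decides from sq->khead and sq->sqe_tail whether the SQ is full; if it is not, it hands out the sqe at sq->sqe_tail and then advances sq->sqe_tail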
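The full-ring check described above can be sketched in plain C11 without liburing. This is a simplified model, not the library's code: `struct model_sq`, `struct model_sqe`, `model_get_sqe` and the fixed `RING_ENTRIES` are names assumed for illustration, and the acquire load stands in for io_uring_smp_load_acquire.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_ENTRIES 4u
static_assert((RING_ENTRIES & (RING_ENTRIES - 1)) == 0,
              "ring size must be a power of two so masking works");

/* Hypothetical, stripped-down sqe. */
struct model_sqe {
    unsigned long long user_data;
};

/* Hypothetical, stripped-down SQ: khead is shared with the "kernel",
 * sqe_tail is private to user space. */
struct model_sq {
    _Atomic unsigned khead;
    unsigned sqe_tail;
    struct model_sqe sqes[RING_ENTRIES];
};

/* Return the next free sqe, or NULL if the ring is full. */
static struct model_sqe *model_get_sqe(struct model_sq *sq)
{
    /* khead is written by the other side, so load it with acquire semantics. */
    unsigned head = atomic_load_explicit(&sq->khead, memory_order_acquire);
    unsigned next = sq->sqe_tail + 1;

    /* Unsigned subtraction stays correct when the indices wrap around. */
    if (next - head > RING_ENTRIES)
        return NULL;

    struct model_sqe *sqe = &sq->sqes[sq->sqe_tail & (RING_ENTRIES - 1)];
    sq->sqe_tail = next;
    return sqe;
}
```

In this model, four calls on an empty ring succeed and the fifth returns NULL until khead advances.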

io_uring_prep_#OP: calls io_uring_prep_rw to fill the sqe with the opcode, fd, buffer pointer, offset and related fields

io_uring_sqe_set_data: assigns directly to sqe->user_data

  • Call graph of io_uring_submit

    io_uring_submit
      → __io_uring_submit_and_wait
          → __io_uring_flush_sq
          → __io_uring_submit
              → sq_ring_needs_enter
              → __sys_io_uring_enter → __sys_io_uring_enter2 → syscall → (traps into the kernel) io_uring_enter

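  • What the functions do

    • __io_uring_flush_sq

      Fills sq->array one entry at a time for the sqes between sq->sqe_head and sq->sqe_tail, then publishes the new sq->ktail in a single update, and returns the number of sqes the kernel has not yet consumed (sq->ktail - sq->khead)

    • sq_ring_needs_enter

      Determines whether the kernel thread io_sq_thread is enabled and running (not asleep):

      • First it checks whether the user-space ring->flags has IORING_SETUP_SQPOLL set, which tells whether io_sq_thread is enabled
      • Then it checks whether the kernel-side ring->sq.kflags has IORING_SQ_NEED_WAKEUP set, which tells whether io_sq_thread needs to be woken up

      When io_sq_thread is enabled and running, io_uring_submit ends here. The __sys_io_uring_enter system call is skipped, which removes the system call overhead from IO submission

    • __sys_io_uring_enter

      Traps into the kernel through a system call and passes its arguments to the kernel's io_uring_enter function. It is mainly used to submit IO and to collect IO completions; the exact behavior depends on the ring->flags configured at initialization. For a detailed analysis, see the io_uring kernel source code analysis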
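The two user-space steps of submission, flushing the SQ and deciding whether a system call is needed, can be sketched as follows. This is a self-contained model rather than liburing's implementation: `struct model_ring` and the `model_*` functions are hypothetical names, the ring size is fixed, and C11 atomics stand in for liburing's barrier helpers. The flag values mirror `<linux/io_uring.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define IORING_SETUP_SQPOLL    (1U << 1)
#define IORING_SQ_NEED_WAKEUP  (1U << 0)
#define IORING_ENTER_SQ_WAKEUP (1U << 1)

#define RING_ENTRIES 8u
#define RING_MASK    (RING_ENTRIES - 1)
static_assert((RING_ENTRIES & RING_MASK) == 0, "ring size must be a power of two");

/* Hypothetical, stripped-down ring state. */
struct model_ring {
    unsigned setup_flags;      /* models ring->flags */
    _Atomic unsigned kflags;   /* models ring->sq.kflags, written by the kernel */
    _Atomic unsigned khead;    /* consumed by the kernel */
    _Atomic unsigned ktail;    /* published by user space */
    unsigned sqe_head;
    unsigned sqe_tail;
    unsigned array[RING_ENTRIES];
};

/* Publish every prepared sqe index; return how many the kernel has not consumed. */
static unsigned model_flush_sq(struct model_ring *r)
{
    unsigned ktail = atomic_load_explicit(&r->ktail, memory_order_relaxed);
    unsigned to_submit = r->sqe_tail - r->sqe_head;

    for (; to_submit > 0; to_submit--) {
        r->array[ktail & RING_MASK] = r->sqe_head & RING_MASK;
        ktail++;
        r->sqe_head++;
    }

    /* One release store makes all array entries visible together. */
    atomic_store_explicit(&r->ktail, ktail, memory_order_release);
    return ktail - atomic_load_explicit(&r->khead, memory_order_relaxed);
}

/* Decide whether submission must enter the kernel. */
static bool model_needs_enter(struct model_ring *r, unsigned *enter_flags)
{
    /* Without SQPOLL nobody polls the SQ, so a system call is always needed. */
    if (!(r->setup_flags & IORING_SETUP_SQPOLL))
        return true;

    /* With SQPOLL, enter only if the polling thread went to sleep. */
    if (atomic_load_explicit(&r->kflags, memory_order_relaxed) & IORING_SQ_NEED_WAKEUP) {
        *enter_flags |= IORING_ENTER_SQ_WAKEUP;
        return true;
    }
    return false;
}
```

A submit path built on this model would call `model_flush_sq` first and issue the enter system call only when `model_needs_enter` returns true.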

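io_uring_wait_cqe: checks in user space whether a new cqe is available. If one is already there, no system call is needed; otherwise the call enters the kernel through io_uring_enter with IORING_ENTER_GETEVENTS and blocks the calling thread until a new cqe arrives or an error occurs

io_uring_peek_cqe: checks only once, in user space, whether a new cqe is available, without trapping into the kernel. If there is no new cqe, it returns the failure code -errno

io_uring_cqe_get_data: once the IO completes, user_data is copied from the sqe into the corresponding cqe, so this function simply reads cqe->user_data

io_uring_cqe_seen: advances cq->khead so that the current cqe is not fetched again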
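The completion side can be modeled the same way. In the sketch below, `struct model_cq`, `struct model_cqe` and the `model_*` functions are hypothetical names for illustration. `model_post_cqe` plays the role of the kernel posting a completion, and returning -EAGAIN for an empty ring follows what io_uring_peek_cqe does in liburing.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define CQ_ENTRIES 4u
#define CQ_MASK    (CQ_ENTRIES - 1)
static_assert((CQ_ENTRIES & CQ_MASK) == 0, "ring size must be a power of two");

/* Hypothetical, stripped-down cqe. */
struct model_cqe {
    unsigned long long user_data;
    int res;
};

/* Hypothetical, stripped-down CQ. */
struct model_cq {
    _Atomic unsigned khead;   /* consumer index, advanced by user space */
    _Atomic unsigned ktail;   /* producer index, advanced by the kernel */
    struct model_cqe cqes[CQ_ENTRIES];
};

/* Stand-in for the kernel: copy user_data into a cqe and publish it. */
static void model_post_cqe(struct model_cq *cq, unsigned long long user_data, int res)
{
    unsigned tail = atomic_load_explicit(&cq->ktail, memory_order_relaxed);

    cq->cqes[tail & CQ_MASK] = (struct model_cqe){ .user_data = user_data, .res = res };
    atomic_store_explicit(&cq->ktail, tail + 1, memory_order_release);
}

/* Non-blocking check: 0 and *out set on success, -EAGAIN if the CQ is empty. */
static int model_peek_cqe(struct model_cq *cq, struct model_cqe **out)
{
    unsigned tail = atomic_load_explicit(&cq->ktail, memory_order_acquire);
    unsigned head = atomic_load_explicit(&cq->khead, memory_order_relaxed);

    if (head == tail) {
        *out = NULL;
        return -EAGAIN;
    }
    *out = &cq->cqes[head & CQ_MASK];
    return 0;
}

/* Mark the current cqe as consumed by advancing the head. */
static void model_cqe_seen(struct model_cq *cq)
{
    unsigned head = atomic_load_explicit(&cq->khead, memory_order_relaxed);

    atomic_store_explicit(&cq->khead, head + 1, memory_order_release);
}
```

Peeking twice without calling `model_cqe_seen` returns the same cqe both times, which is why the head must be advanced after each completion is handled.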

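io_uring_queue_exit: first uses munmap to unmap the SQ, CQ and SQEs that were mapped with mmap during initialization, then closes the fd of the io_uring instance with close. close ends up calling io_uring_release, which is registered for that fd, to free the io_uring instance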

Author: ywang_wnlo
Link: https://ywang-wnlo.github.io/posts/d7259d1d.html
License: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!