MySQL Startup Process Explained, Part 3: Starting the InnoDB Storage Engine
Date: April 14, 2022

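InnoDB starts up as follows:

1. Initialize innobase_hton, a pointer of type handlerton, so that the server layer can call the storage engine's interface.

2. Check and initialize the InnoDB-related parameters. These cover the system tablespace, the temporary tablespace, the undo tablespaces, the redo log files, the doublewrite buffer, and so on.

3. Call innobase_start_or_create_for_mysql() to start InnoDB, or to create it if it does not exist yet.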
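At its core, step 1 fills in a table of function pointers that the server later calls through. The sketch below models that pattern in isolation. MiniHandlerton, demo_engine_init and server_commit are hypothetical names, not MySQL's API. The real handlerton has many more members, as the assignments in innobase_init() show.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified stand-in for MySQL's handlerton:
// a table of function pointers that the server layer calls through.
struct MiniHandlerton {
    int state = 0;                        // 0 = not ready, 1 = ready
    int (*commit)(int trx_id) = nullptr;  // hook used for COMMIT
    int (*rollback)(int trx_id) = nullptr;// hook used for ROLLBACK
    const char* (*name)() = nullptr;      // engine name
};

// Engine-side implementations (hypothetical).
static int demo_commit(int trx_id) { return trx_id >= 0 ? 0 : 1; }
static int demo_rollback(int trx_id) { return trx_id >= 0 ? 0 : 1; }
static const char* demo_name() { return "DemoEngine"; }

// Same shape as innobase_init(void* p): the server passes an opaque
// pointer, and the engine casts it and installs its callbacks.
int demo_engine_init(void* p) {
    assert(p != nullptr);
    MiniHandlerton* hton = static_cast<MiniHandlerton*>(p);
    hton->state = 1;
    hton->commit = demo_commit;
    hton->rollback = demo_rollback;
    hton->name = demo_name;
    return 0;  // 0 on success, like innobase_init()
}

// Server-side dispatch: the server knows only the table, not the engine.
int server_commit(const MiniHandlerton& hton, int trx_id) {
    if (hton.state != 1 || hton.commit == nullptr) {
        return -1;  // engine not initialized
    }
    return hton.commit(trx_id);
}
```

The server only ever calls through the table. As a result, the engine can change its internals freely, provided the hooks keep their signatures.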

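innobase_start_or_create_for_mysql() proceeds as follows:

1. Reset the start state.

2. Process innodb_flush_method. Production servers typically use O_DIRECT or O_DIRECT_NO_FSYNC.

3. Set the maximum number of InnoDB threads.

4. Adjust innodb_buffer_pool_instances and innodb_buffer_pool_size.

5. Adjust the number of innodb_page_cleaners based on srv_buf_pool_instances.

6. Call srv_boot() to boot the InnoDB server, which initializes the related parameters and components.

7. Initialize the asynchronous I/O subsystem.

8. Create the InnoDB buffer pool. An error is reported if there is not enough memory.

9. Call fsp_init and log_init to initialize the file space (fsp) system and the redo log system.

10. Call recv_sys_create and recv_sys_init to create and initialize the recovery system.

11. Call lock_sys_create to create the lock system.

12. Call os_thread_create to create the I/O threads.

13. Call buf_flush_page_cleaner_init to initialize the page cleaner system, then create the buf_flush_page_cleaner_coordinator and buf_flush_page_cleaner_worker threads.

14. Wait for the page cleaner to become active.

15. Call check_file_spec to check whether the data files (ibdata1, ibdata2, and so on) exist, and decide whether a new database must be created.

16. If a new database must be created, check whether redo log files and undo tablespaces already exist.

17. Call srv_sys_space.open_or_create() to open or create the data files (ibdata...). If no new database is being created, read flushed_lsn from the ibdata1 file.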

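18. If create_new_db is true:

18.1 Synchronously flush dirty pages from the tail of the flush list of every buffer pool instance.

18.2 Get the current LSN.

18.3 Create the redo log files.

19. If create_new_db is false, open the redo log files.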

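20. Call fil_space_create to create the in-memory space object for the redo log.

21. Add the redo log files to the redo log space.

22. Initialize the redo log group.

23. Call fil_open_log_and_system_tablespace_files to open all log files and the system tablespace data files.

24. Call srv_undo_tablespaces_init to open the undo tablespaces. Once all undo files have been found and opened, add them all to the file management system.

25. Call trx_sys_file_format_init to initialize the file_format_max variable.

26. Create the trx_sys instance and initialize purge_queue and the mutex.

27. If create_new_db is true:

27.1 Call fsp_header_init to allocate space at the beginning of the ibdata file. System modules such as the transaction system are stored and managed there.

27.2 Call trx_sys_create_sys_pages to create the transaction system page, which is the 6th page of ibdata (page number 5).

27.3 Call trx_sys_init_at_db_start to create and initialize the in-memory transaction system structures.

27.4 Call trx_purge_sys_create to create and initialize the purge system.

27.5 Call dict_create to create a new data dictionary and initialize the change buffer.

28. Invalidate the whole buffer pool so that any page read earlier is re-read during recovery. This is a lightweight operation: at this point the LRU list holds only one page and the flush list holds none.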

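29. Call recv_recovery_from_checkpoint_start() to start recovery:

29.1 Initialize the flush red-black tree so that pages can be inserted into the flush list quickly during recovery.

29.2 Find the latest checkpoint in the log groups.

29.3 Read the redo log page that holds the latest checkpoint into log_sys->checkpoint_buf.

29.4 Get checkpoint_lsn and checkpoint_no.

29.5 Read the redo log from checkpoint_lsn onward into a hash table.

29.6 Check the tablespaces needed for crash recovery and process the pages in the doublewrite buffer. For each such page, verify the integrity of the corresponding data page; if that page is corrupt, restore it from the doublewrite copy. Also start the background thread recv_writer_thread to flush dirty pages from the buffer pool.

29.7 Copy the log segment from the most recent log group to the other groups. Currently there is only one log group.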

30. Clear the pages in the doublewrite buffer, then call dict_boot to initialize the data dictionary system and the change buffer.

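31. Call trx_sys_init_at_db_start to create and initialize the transaction system.

32. Call recv_apply_hashed_log_recs to apply the redo log.

33. Call trx_purge_sys_create to create the purge system (trx_purge_sys).

34. Call recv_recovery_from_checkpoint_finish to complete recovery from the checkpoint:

34.1 Make sure the recv_writer thread has finished.

34.2 Wait for flushing to complete, so that all dirty pages have been flushed.

34.3 Wait for the recv_writer thread to terminate.

34.4 Free the flush red-black tree.

34.5 Roll back any transactions on data dictionary tables so that those tables are not left locked. The data dictionary latch should guarantee that only one data dictionary transaction is active at a time.

35. Call recv_recovery_rollback_active to roll back incomplete, uncommitted transactions. This runs in a background thread.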

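36. Call srv_open_tmp_tablespace to open the temporary tablespace.

37. Call trx_sys_create_rsegs to create the rollback segments.

38. Create the lock wait timeout thread. Its thread function is lock_wait_timeout_thread.

39. Create the semaphore timeout monitor thread, which prints a warning when a semaphore wait lasts too long. Its thread function is srv_error_monitor_thread.

40. Create the master thread. Its thread function is srv_master_thread.

41. Create the purge threads, srv_purge_coordinator_thread and srv_worker_thread.

42. Call srv_start_wait_for_purge_to_start to wait for the purge system to start.

43. Create the buffer pool dump/load thread. Its thread function is buf_dump_thread.

44. Create the statistics collection thread. Its thread function is dict_stats_thread.

45. Call fts_optimize_init to create the full-text search optimize thread. Its thread function is fts_optimize_thread.

46. Create the thread that resizes the buffer pool dynamically. Its thread function is buf_resize_thread.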
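Steps 4 and 5 come down to a little arithmetic. The following is a simplified, non-Windows model of that logic, built on three assumptions:

- BUF_POOL_SIZE_THRESHOLD is 1 GiB, as the source comments state.
- The default value of innodb_buffer_pool_instances is 0, also as the source comments state.
- buf_pool_size_align() rounds the pool size up to a multiple of chunk size × instances. The minimum-size clamp is omitted.

BufPoolConfig and adjust_buf_pool are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the settings touched by steps 4 and 5.
struct BufPoolConfig {
    uint64_t size;           // innodb_buffer_pool_size, bytes
    uint64_t instances;      // innodb_buffer_pool_instances (0 = default)
    uint64_t chunk_unit;     // buffer pool chunk size, bytes
    uint64_t page_cleaners;  // innodb_page_cleaners
};

constexpr uint64_t kThreshold = 1024ULL * 1024 * 1024;  // assumed BUF_POOL_SIZE_THRESHOLD
constexpr uint64_t kDefaultInstances = 0;                // assumed default instances value

BufPoolConfig adjust_buf_pool(BufPoolConfig c) {
    // Step 4a: choose the number of buffer pool instances.
    if (c.size >= kThreshold) {
        if (c.instances == kDefaultInstances) {
            c.instances = 8;  // default to 8 instances for pools >= 1 GiB
        }
    } else {
        c.instances = 1;      // small pools always use a single instance
    }
    // Step 4b: shrink the chunk unit if one chunk per instance
    // would already exceed the pool size.
    if (c.chunk_unit * c.instances > c.size) {
        c.chunk_unit = c.size / c.instances;
        if (c.size % c.instances != 0) {
            ++c.chunk_unit;
        }
    }
    // Step 4c: round the pool size up to a multiple of
    // chunk_unit * instances (assumed buf_pool_size_align() behaviour).
    const uint64_t m = c.chunk_unit * c.instances;
    assert(m > 0);
    if (c.size % m != 0) {
        c.size = (c.size / m + 1) * m;
    }
    // Step 5: there cannot be more page cleaners than instances.
    c.page_cleaners = std::min(c.page_cleaners, c.instances);
    return c;
}
```

For example, a 3 GiB pool with the default instance count and a 128 MiB chunk size ends up with 8 instances and keeps its size. A pool of 3 GiB + 100 MiB is rounded up to 4 GiB.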
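Steps 29.5 and 32 follow the classic redo pattern:

1. Collect log records from the checkpoint onward.
2. Group them by page.
3. Replay each page's records, skipping any change the page already contains (its LSN is not older than the record's).

The toy model below illustrates only that idea, and makes several simplifications:

- RedoRec, Page, PageRecs, hash_log_recs and apply_hashed_log_recs are hypothetical names.
- A `std::map` stands in for the hash table.
- Each record is an integer delta, whereas real InnoDB records are typed page changes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy redo record: "add delta to the value stored on page
// (space_id, page_no)", logged at the given LSN.
struct RedoRec {
    uint64_t lsn;
    uint32_t space_id;
    uint32_t page_no;
    int64_t delta;
};

// Hypothetical toy page: LSN of the newest change it contains, plus data.
struct Page {
    uint64_t lsn;
    int64_t value;
};

using PageKey = std::pair<uint32_t, uint32_t>;  // (space_id, page_no)
using PageRecs = std::map<PageKey, std::vector<RedoRec>>;

// Step 29.5 (simplified): scan the log from checkpoint_lsn and bucket the
// records by page, preserving log order within each bucket.
PageRecs hash_log_recs(const std::vector<RedoRec>& log,
                       uint64_t checkpoint_lsn) {
    PageRecs by_page;
    uint64_t prev_lsn = 0;
    for (const RedoRec& r : log) {
        assert(r.lsn >= prev_lsn);  // the log is written sequentially
        prev_lsn = r.lsn;
        if (r.lsn >= checkpoint_lsn) {
            by_page[PageKey(r.space_id, r.page_no)].push_back(r);
        }
    }
    return by_page;
}

// Step 32 (simplified): replay each page's records, skipping changes the
// page already contains. Returns how many records were applied.
std::size_t apply_hashed_log_recs(const PageRecs& by_page,
                                  std::map<PageKey, Page>& pages) {
    std::size_t applied = 0;
    for (const auto& entry : by_page) {
        Page& page = pages[entry.first];  // unseen pages start zeroed (LSN 0)
        for (const RedoRec& r : entry.second) {
            if (r.lsn > page.lsn) {
                page.value += r.delta;
                page.lsn = r.lsn;
                ++applied;
            }
        }
    }
    return applied;
}
```

Many records can touch the same page. Grouping them by page means each page needs to be read only once, after which it can be patched in LSN order.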

The InnoDB startup code lives in the innobase_init() function in ha_innodb.cc. Its source is as follows:

/*********************************************************************//**
Initializes the InnoDB plugin.
Opens an InnoDB database.
@return 0 on success, 1 on failure */
static
int
innobase_init(
/*==========*/
void *p) /*!< in: InnoDB handlerton */
{
static char current_dir[3]; /*!< Set if using current lib */
int err;
char *default_path;
uint format_id;
ulong num_pll_degree;
// Initialize innobase_hton so that the server layer can call InnoDB's interface
DBUG_ENTER("innobase_init");
handlerton* innobase_hton= (handlerton*) p;
innodb_hton_ptr = innobase_hton;

innobase_hton->state = SHOW_OPTION_YES;
innobase_hton->db_type = DB_TYPE_INNODB;
innobase_hton->savepoint_offset = sizeof(trx_named_savept_t);
innobase_hton->close_connection = innobase_close_connection;
innobase_hton->kill_connection = innobase_kill_connection;
innobase_hton->savepoint_set = innobase_savepoint;
innobase_hton->savepoint_rollback = innobase_rollback_to_savepoint;

innobase_hton->savepoint_rollback_can_release_mdl =
    innobase_rollback_to_savepoint_can_release_mdl;

innobase_hton->savepoint_release = innobase_release_savepoint;
innobase_hton->commit = innobase_commit;
innobase_hton->rollback = innobase_rollback;
innobase_hton->prepare = innobase_xa_prepare;
innobase_hton->recover = innobase_xa_recover;
innobase_hton->commit_by_xid = innobase_commit_by_xid;
innobase_hton->rollback_by_xid = innobase_rollback_by_xid;
innobase_hton->create = innobase_create_handler;
innobase_hton->alter_tablespace = innobase_alter_tablespace;
innobase_hton->drop_database = innobase_drop_database;
innobase_hton->panic = innobase_end;
innobase_hton->partition_flags= innobase_partition_flags;

innobase_hton->start_consistent_snapshot =
    innobase_start_trx_and_assign_read_view;

innobase_hton->flush_logs = innobase_flush_logs;
innobase_hton->show_status = innobase_show_status;
innobase_hton->fill_is_table = innobase_fill_i_s_table;
innobase_hton->flags =
    HTON_SUPPORTS_EXTENDED_KEYS | HTON_SUPPORTS_FOREIGN_KEYS |
    HTON_SUPPORTS_TABLE_ENCRYPTION;

innobase_hton->release_temporary_latches =
    innobase_release_temporary_latches;
innobase_hton->replace_native_transaction_in_thd =
    innodb_replace_trx_in_thd;
innobase_hton->data = &innodb_api_cb;
innobase_hton->is_reserved_db_name= innobase_check_reserved_file_name;

innobase_hton->is_supported_system_table=
    innobase_is_supported_system_table;

innobase_hton->rotate_encryption_master_key =
    innobase_encryption_key_rotation;

ut_a(DATA_MYSQL_TRUE_VARCHAR == (ulint)MYSQL_TYPE_VARCHAR);

#ifndef NDEBUG
static const char test_filename[] = "-@";
char test_tablename[sizeof test_filename
    + sizeof(srv_mysql50_table_name_prefix) - 1];
if ((sizeof(test_tablename)) - 1
        != filename_to_tablename(test_filename,
                                 test_tablename,
                                 sizeof(test_tablename), true)
    || strncmp(test_tablename,
               srv_mysql50_table_name_prefix,
               sizeof(srv_mysql50_table_name_prefix) - 1)
    || strcmp(test_tablename
              + sizeof(srv_mysql50_table_name_prefix) - 1,
              test_filename)) {

    sql_print_error("tablename encoding has been changed");
    DBUG_RETURN(innobase_init_abort());
}

#endif /* NDEBUG */

/* Check that values don't overflow on 32-bit systems. */
if (sizeof(ulint) == 4) {
    if (innobase_buffer_pool_size > UINT_MAX32) {
        sql_print_error(
            "innodb_buffer_pool_size can't be over 4GB"
            " on 32-bit systems");

        DBUG_RETURN(innobase_init_abort());
    }
}

os_file_set_umask(my_umask);

/* Setup the memory alloc/free tracing mechanisms before calling
any functions that could possibly allocate memory. */
ut_new_boot();

/* First calculate the default path for innodb_data_home_dir etc.,
in case the user has not given any value.

Note that when using the embedded server, the datadirectory is not
necessarily the current directory of this program. */

if (mysqld_embedded) {
    default_path = mysql_real_data_home;
} else {
    /* It's better to use current lib, to keep paths short */
    current_dir[0] = FN_CURLIB;
    current_dir[1] = FN_LIBCHAR;
    current_dir[2] = 0;
    default_path = current_dir;
}

ut_a(default_path);

fil_path_to_mysql_datadir = default_path;
folder_mysql_datadir = fil_path_to_mysql_datadir;

/* Set InnoDB initialization parameters according to the values
read from MySQL .cnf file */

/* The default dir for data files is the datadir of MySQL */
srv_data_home = innobase_data_home_dir
    ? innobase_data_home_dir : default_path;

/*--------------- Shared tablespaces -------------------------
Shared tablespaces: the system tablespace and the shared
temporary tablespace.
*/

/* Check that the value of system variable innodb_page_size was
set correctly.  Its value was put into srv_page_size. If valid,
return the associated srv_page_size_shift. */
srv_page_size_shift = innodb_page_size_validate(srv_page_size);
if (!srv_page_size_shift) {
    sql_print_error("InnoDB: Invalid page size=%lu.\n",
            srv_page_size);
    DBUG_RETURN(innobase_init_abort());
}

/* Set default InnoDB temp data file size to 12 MB and let it be
auto-extending.
(That is: the default system data file, ibdata1, is 12 MB and
auto-extending.) */
if (!innobase_data_file_path) {
    innobase_data_file_path = (char*) "ibdata1:12M:autoextend";
}

/* This is the first time univ_page_size is used.
It was initialized to 16k pages before srv_page_size was set */
univ_page_size.copy_from(
    page_size_t(srv_page_size, srv_page_size, false));
// Set the space_id of the system tablespace
srv_sys_space.set_space_id(TRX_SYS_SPACE);

/* Create the filespace flags.
Set the system tablespace's flags, name and path. */
ulint fsp_flags = fsp_flags_init(
    univ_page_size, false, false, false, false);
srv_sys_space.set_flags(fsp_flags);

srv_sys_space.set_name(reserved_system_space_name);
srv_sys_space.set_path(srv_data_home);

/* Supports raw devices */
if (!srv_sys_space.parse_params(innobase_data_file_path, true)) {
    ib::error() << "Unable to parse innodb_data_file_path="
            << innobase_data_file_path;
    DBUG_RETURN(innobase_init_abort());
}

/* Set default InnoDB temp data file size to 12 MB and let it be
auto-extending. */
if (!innobase_temp_data_file_path) {
    innobase_temp_data_file_path = (char*) "ibtmp1:12M:autoextend";
}

/* We set the temporary tablespace id later, after recovery.
The temp tablespace doesn't support raw devices.
Set the name and path. */
srv_tmp_space.set_name(reserved_temporary_space_name);
srv_tmp_space.set_path(srv_data_home);

/* Create the filespace flags with the temp flag set. */
fsp_flags = fsp_flags_init(
    univ_page_size, false, false, false, true);
srv_tmp_space.set_flags(fsp_flags);

if (!srv_tmp_space.parse_params(innobase_temp_data_file_path, false)) {
    ib::error() << "Unable to parse innodb_temp_data_file_path="
            << innobase_temp_data_file_path;
    DBUG_RETURN(innobase_init_abort());
}

/* Perform all sanity check before we take action of deleting files*/
// Check whether the system and temporary tablespaces share a data file.
if (srv_sys_space.intersection(&srv_tmp_space)) {
    sql_print_error("%s and %s file names seem to be the same.",
        srv_tmp_space.name(), srv_sys_space.name());
    DBUG_RETURN(innobase_init_abort());
}

/* ------------ UNDO tablespaces files --------------------- */
// Undo tablespace directory
if (!srv_undo_dir) {
    srv_undo_dir = default_path;
}
// Normalize the undo tablespace directory path
os_normalize_path(srv_undo_dir);

if (strchr(srv_undo_dir, ';')) {
    sql_print_error("syntax error in innodb_undo_directory");
    DBUG_RETURN(innobase_init_abort());
}

/* -------------- All log files --------------------------- */

/* The default dir for log files is the datadir of MySQL */
if (!srv_log_group_home_dir) {
    srv_log_group_home_dir = default_path;
}
// Normalize the redo log directory path
os_normalize_path(srv_log_group_home_dir);

if (strchr(srv_log_group_home_dir, ';')) {
    sql_print_error("syntax error in innodb_log_group_home_dir");
    DBUG_RETURN(innobase_init_abort());
}

if (!innobase_large_prefix) {
    ib::warn() << deprecated_large_prefix;
}

if (!THDVAR(NULL, support_xa)) {
    ib::warn() << deprecated_innodb_support_xa_off;
    THDVAR(NULL, support_xa) = TRUE;
}

if (innobase_file_format_name != innodb_file_format_default) {
    ib::warn() << deprecated_file_format;
}

/* Validate the file format by animal name
(validates innodb_file_format, the InnoDB file format) */
if (innobase_file_format_name != NULL) {

    format_id = innobase_file_format_name_lookup(
        innobase_file_format_name);

    if (format_id > UNIV_FORMAT_MAX) {

        sql_print_error("InnoDB: wrong innodb_file_format.");

        DBUG_RETURN(innobase_init_abort());
    }
} else {
    /* Set it to the default file format id. Though this
    should never happen. */
    format_id = 0;
}

srv_file_format = format_id;

/* Given the type of innobase_file_format_name we have little
choice but to cast away the constness from the returned name.
innobase_file_format_name is used in the MySQL set variable
interface and so can't be const. */

innobase_file_format_name =
    (char*) trx_sys_file_format_id_to_name(format_id);

/* Check innobase_file_format_check variable */
if (!innobase_file_format_check) {
    ib::warn() << deprecated_file_format_check;

    /* Set the value to disable checking. */
    srv_max_file_format_at_startup = UNIV_FORMAT_MAX + 1;

} else {

    /* Set the value to the lowest supported format. */
    srv_max_file_format_at_startup = UNIV_FORMAT_MIN;
}

if (innobase_file_format_max != innodb_file_format_max_default) {
    ib::warn() << deprecated_file_format_max;
}

/* Did the user specify a format name that we support?
As a side effect it will update the variable
srv_max_file_format_at_startup */
if (innobase_file_format_validate_and_set(
        innobase_file_format_max) < 0) {

    sql_print_error("InnoDB: invalid"
            " innodb_file_format_max value:"
            " should be any value up to %s or its"
            " equivalent numeric id",
            trx_sys_file_format_id_to_name(
                UNIV_FORMAT_MAX));

    DBUG_RETURN(innobase_init_abort());
}
/* InnoDB change buffering (innodb_change_buffering) */
if (innobase_change_buffering) {
    ulint use;

    for (use = 0;
         use < UT_ARR_SIZE(innobase_change_buffering_values);
         use++) {
        if (!innobase_strcasecmp(
                innobase_change_buffering,
                innobase_change_buffering_values[use])) {
            ibuf_use = (ibuf_use_t) use;
            goto innobase_change_buffering_inited_ok;
        }
    }

    sql_print_error("InnoDB: invalid value"
            " innodb_change_buffering=%s",
            innobase_change_buffering);
    DBUG_RETURN(innobase_init_abort());
}

innobase_change_buffering_inited_ok:
// Default: innodb_change_buffering = ALL
ut_a((ulint) ibuf_use < UT_ARR_SIZE(innobase_change_buffering_values));
innobase_change_buffering = (char*)
    innobase_change_buffering_values[ibuf_use];

/* Check that interdependent parameters have sane values:
srv_max_buf_pool_modified_pct vs. srv_max_dirty_pages_pct_lwm, and
srv_max_io_capacity vs. srv_io_capacity (and SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT) */
if (srv_max_buf_pool_modified_pct < srv_max_dirty_pages_pct_lwm) {
    sql_print_warning("InnoDB: innodb_max_dirty_pages_pct_lwm"
              " cannot be set higher than"
              " innodb_max_dirty_pages_pct.\n"
              "InnoDB: Setting"
              " innodb_max_dirty_pages_pct_lwm to %lf\n",
              srv_max_buf_pool_modified_pct);

    srv_max_dirty_pages_pct_lwm = srv_max_buf_pool_modified_pct;
}

if (srv_max_io_capacity == SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT) {

    if (srv_io_capacity >= SRV_MAX_IO_CAPACITY_LIMIT / 2) {
        /* Avoid overflow. */
        srv_max_io_capacity = SRV_MAX_IO_CAPACITY_LIMIT;
    } else {
        /* The user has not set the value. We should
        set it based on innodb_io_capacity. */
        srv_max_io_capacity =
            ut_max(2 * srv_io_capacity, 2000UL);
    }

} else if (srv_max_io_capacity < srv_io_capacity) {
    sql_print_warning("InnoDB: innodb_io_capacity"
              " cannot be set higher than"
              " innodb_io_capacity_max.\n"
              "InnoDB: Setting"
              " innodb_io_capacity to %lu\n",
              srv_max_io_capacity);

    srv_io_capacity = srv_max_io_capacity;
}
// Check the innodb_buffer_pool_filename setting
if (!is_filename_allowed(srv_buf_dump_filename,
             strlen(srv_buf_dump_filename), FALSE)) {
    sql_print_error("InnoDB: innodb_buffer_pool_filename"
        " cannot have colon (:) in the file name.");
    DBUG_RETURN(innobase_init_abort());
}

/* --------------------------------------------------
Settings handled below: innodb_flush_method, innodb_log_file_size,
innodb_log_write_ahead_size, innodb_log_buffer_size,
innodb_buffer_pool_size, innodb_read_io_threads,
innodb_write_io_threads, innodb_doublewrite, innodb_log_checksums,
innodb_rollback_on_timeout, innodb_locks_unsafe_for_binlog,
innodb_open_files, the InnoDB monitor settings, innodb_old_blocks_pct
and innodb_undo_logs.
*/

srv_file_flush_method_str = innobase_file_flush_method;

srv_log_file_size = (ib_uint64_t) innobase_log_file_size;

if (UNIV_PAGE_SIZE_DEF != srv_page_size) {
    ib::warn() << "innodb-page-size has been changed from the"
        " default value " << UNIV_PAGE_SIZE_DEF << " to "
        << srv_page_size << ".";
}

if (srv_log_write_ahead_size > srv_page_size) {
    srv_log_write_ahead_size = srv_page_size;
} else {
    ulong srv_log_write_ahead_size_tmp = OS_FILE_LOG_BLOCK_SIZE;

    while (srv_log_write_ahead_size_tmp
           < srv_log_write_ahead_size) {
        srv_log_write_ahead_size_tmp
            = srv_log_write_ahead_size_tmp * 2;
    }
    if (srv_log_write_ahead_size_tmp
        != srv_log_write_ahead_size) {
        srv_log_write_ahead_size
            = srv_log_write_ahead_size_tmp / 2;
    }
}

srv_log_buffer_size = (ulint) innobase_log_buffer_size;

srv_buf_pool_size = (ulint) innobase_buffer_pool_size;

srv_n_read_io_threads = (ulint) innobase_read_io_threads;
srv_n_write_io_threads = (ulint) innobase_write_io_threads;

srv_use_doublewrite_buf = (ibool) innobase_use_doublewrite;

if (!innobase_use_checksums) {
    ib::warn() << "Setting innodb_checksums to OFF is DEPRECATED."
        " This option may be removed in future releases. You"
        " should set innodb_checksum_algorithm=NONE instead.";
    srv_checksum_algorithm = SRV_CHECKSUM_ALGORITHM_NONE;
}

innodb_log_checksums_func_update(innodb_log_checksums);

#ifdef HAVE_LINUX_LARGE_PAGES
if ((os_use_large_pages = my_use_large_pages)) {
    os_large_page_size = opt_large_page_size;
}
#endif

row_rollback_on_timeout = (ibool) innobase_rollback_on_timeout;

srv_locks_unsafe_for_binlog = (ibool) innobase_locks_unsafe_for_binlog;
if (innobase_locks_unsafe_for_binlog) {
    ib::warn() << "Using innodb_locks_unsafe_for_binlog is"
        " DEPRECATED. This option may be removed in future"
        " releases. Please use READ COMMITTED transaction"
        " isolation level instead; " << SET_TRANSACTION_MSG;
}

if (innobase_open_files < 10) {
    innobase_open_files = 300;
    if (srv_file_per_table && table_cache_size > 300) {
        innobase_open_files = table_cache_size;
    }
}

if (innobase_open_files > (long) open_files_limit) {
    ib::warn() << "innodb_open_files should not be greater"
               " than the open_files_limit.\n";
    if (innobase_open_files > (long) table_cache_size) {
        innobase_open_files = table_cache_size;
    }
}

srv_max_n_open_files = (ulint) innobase_open_files;
srv_innodb_status = (ibool) innobase_create_status_file;

srv_print_verbose_log = mysqld_embedded ? 0 : 1;

/* Round up fts_sort_pll_degree to nearest power of 2 number */
for (num_pll_degree = 1;
     num_pll_degree < fts_sort_pll_degree;
     num_pll_degree <<= 1) {

    /* No op */
}

fts_sort_pll_degree = num_pll_degree;

/* Store the default charset-collation number of this MySQL
installation */
data_mysql_default_charset_coll = (ulint) default_charset_info->number;
// Initialize the default value of innodb_commit_concurrency (limits concurrent commits)
innobase_commit_concurrency_init_default();

// Initialize the global os_event infrastructure
os_event_global_init();

/* Set buffer pool size to default for fast startup when mysqld is
run with --help --verbose options. */
ulint srv_buf_pool_size_org = 0;
if (opt_help && opt_verbose
    && srv_buf_pool_size > srv_buf_pool_def_size) {
    ib::warn() << "Setting innodb_buf_pool_size to "
        << srv_buf_pool_def_size << " for fast startup, "
        << "when running with --help --verbose options.";
    srv_buf_pool_size_org = srv_buf_pool_size;
    srv_buf_pool_size = srv_buf_pool_def_size;
}

/* Since we in this module access directly the fields of a trx
struct, and due to different headers and flags it might happen that
ib_mutex_t has a different size in this module and in InnoDB
modules, we check at run time that the size is the same in
these compilation modules. */
// Start InnoDB, or create a new database if none exists yet
err = innobase_start_or_create_for_mysql();
// Write the (aligned) buffer pool size back to innobase_buffer_pool_size
if (srv_buf_pool_size_org != 0) {
    /* Set the original value back to show in help. */
    srv_buf_pool_size_org =
        buf_pool_size_align(srv_buf_pool_size_org);
    innobase_buffer_pool_size =
        static_cast<long long>(srv_buf_pool_size_org);
} else {
    innobase_buffer_pool_size =
        static_cast<long long>(srv_buf_pool_size);
}

if (err != DB_SUCCESS) {
    DBUG_RETURN(innobase_init_abort());
}

/* Create mutex to protect encryption master_key_id. */
mutex_create(LATCH_ID_MASTER_KEY_ID_MUTEX, &master_key_id_mutex);

/* Adjust the innodb_undo_logs config object */
innobase_undo_logs_init_default_max();

innobase_old_blocks_pct = static_cast<uint>(
    buf_LRU_old_ratio_update(innobase_old_blocks_pct, TRUE));

ibuf_max_size_update(srv_change_buffer_max_size);

innobase_open_tables = hash_create(200);
mysql_mutex_init(innobase_share_mutex_key.m_value,
         &innobase_share_mutex,
         MY_MUTEX_INIT_FAST);
mysql_mutex_init(commit_cond_mutex_key.m_value,
         &commit_cond_m, MY_MUTEX_INIT_FAST);
mysql_cond_init(commit_cond_key.m_value, &commit_cond);

innodb_inited= 1;

#ifdef MYSQL_DYNAMIC_PLUGIN
if (innobase_hton != p) {
    innobase_hton = reinterpret_cast<handlerton*>(p);
    *innobase_hton = *innodb_hton_ptr;
}
#endif /* MYSQL_DYNAMIC_PLUGIN */

/* Get the current high water mark format. */
innobase_file_format_max = (char*) trx_sys_file_format_max_get();

/* Currently, monitor counter information are not persistent.
(InnoDB monitor) */
memset(monitor_set_tbl, 0, sizeof monitor_set_tbl);

memset(innodb_counter_value, 0, sizeof innodb_counter_value);

/* Do this as late as possible so server is fully starts up,
since  we might get some initial stats if user choose to turn
on some counters from start up */
if (innobase_enable_monitor_counter) {
    innodb_enable_monitor_at_startup(
        innobase_enable_monitor_counter);
}

/* Turn on monitor counters that are default on */
srv_mon_default_on();

/* Unit Tests */

#ifdef UNIV_ENABLE_UNIT_TEST_GET_PARENT_DIR
unit_test_os_file_get_parent_dir();
#endif /* UNIV_ENABLE_UNIT_TEST_GET_PARENT_DIR */

#ifdef UNIV_ENABLE_UNIT_TEST_MAKE_FILEPATH
test_make_filepath();
#endif /*UNIV_ENABLE_UNIT_TEST_MAKE_FILEPATH */

#ifdef UNIV_ENABLE_DICT_STATS_TEST
test_dict_stats_all();
#endif /*UNIV_ENABLE_DICT_STATS_TEST */

#ifdef UNIV_ENABLE_UNIT_TEST_ROW_RAW_FORMAT_INT

# ifdef HAVE_UT_CHRONO_T
test_row_raw_format_int();
# endif /* HAVE_UT_CHRONO_T */

#endif /* UNIV_ENABLE_UNIT_TEST_ROW_RAW_FORMAT_INT */

#ifndef UNIV_HOTBACKUP
#ifdef _WIN32
if (ut_win_init_time()) {
DBUG_RETURN(innobase_init_abort());
}
#endif /* _WIN32 */
#endif /* !UNIV_HOTBACKUP */

DBUG_RETURN(0);
}
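Three of the normalizations in innobase_init() above are easy to check in isolation:

- rounding innodb_log_write_ahead_size;
- deriving innodb_io_capacity_max when it is unset;
- rounding fts_sort_pll_degree up to a power of two.

The sketch mirrors that code under a few assumptions. The function and struct names are hypothetical, and OS_FILE_LOG_BLOCK_SIZE is assumed to be 512 bytes. The "unset" marker and the upper limit for innodb_io_capacity_max are assumed stand-ins for SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT and SRV_MAX_IO_CAPACITY_LIMIT, so their values here are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr uint64_t kLogBlockSize = 512;            // assumed OS_FILE_LOG_BLOCK_SIZE
constexpr uint64_t kIoCapacityUnset = UINT64_MAX;  // assumed "not set by user" marker
constexpr uint64_t kIoCapacityLimit = UINT64_MAX;  // assumed upper limit

// innodb_log_write_ahead_size: capped at the page size, otherwise rounded
// down to a power of two, starting the search from the log block size.
uint64_t normalize_write_ahead(uint64_t w, uint64_t page_size) {
    assert(page_size >= kLogBlockSize);
    if (w > page_size) {
        return page_size;
    }
    uint64_t tmp = kLogBlockSize;
    while (tmp < w) {
        tmp *= 2;
    }
    return tmp != w ? tmp / 2 : w;
}

// Hypothetical pair of the two related settings.
struct IoCapacity {
    uint64_t io_capacity;      // innodb_io_capacity
    uint64_t io_capacity_max;  // innodb_io_capacity_max
};

IoCapacity normalize_io_capacity(IoCapacity c) {
    if (c.io_capacity_max == kIoCapacityUnset) {
        if (c.io_capacity >= kIoCapacityLimit / 2) {
            c.io_capacity_max = kIoCapacityLimit;  // 2 * io_capacity would overflow
        } else {
            c.io_capacity_max = std::max<uint64_t>(2 * c.io_capacity, 2000);
        }
    } else if (c.io_capacity_max < c.io_capacity) {
        c.io_capacity = c.io_capacity_max;  // lower io_capacity to the maximum
    }
    return c;
}

// fts_sort_pll_degree: round up to the nearest power of two.
uint64_t round_up_pow2(uint64_t d) {
    uint64_t n = 1;
    while (n < d) {
        n <<= 1;
    }
    return n;
}
```

For instance, with 16 KB pages a write-ahead size of 3000 becomes 2048. Likewise, innodb_io_capacity = 200 with no explicit maximum yields an innodb_io_capacity_max of 2000.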

The source of innobase_start_or_create_for_mysql() is analyzed below:

dberr_t
innobase_start_or_create_for_mysql(void)
/*====================================*/
{
bool create_new_db = false;
lsn_t flushed_lsn;
ulint sum_of_data_file_sizes;
ulint tablespace_size_in_header;
dberr_t err;
ulint srv_n_log_files_found = srv_n_log_files;
mtr_t mtr;
purge_pq_t* purge_queue;
char logfilename[10000];
char* logfile0 = NULL;
size_t dirnamelen;
unsigned i = 0;

/* Reset the start state. */
srv_start_state = SRV_START_STATE_NONE;
// SRV_FORCE_NO_LOG_REDO: do not roll forward (apply) the redo log
if (srv_force_recovery == SRV_FORCE_NO_LOG_REDO) {
    srv_read_only_mode = true;
}
// high_level_read_only: set in read-only mode, or when forced
// recovery is above SRV_FORCE_NO_TRX_UNDO
high_level_read_only = srv_read_only_mode
    || srv_force_recovery > SRV_FORCE_NO_TRX_UNDO;
// In read-only mode nothing is written except intrinsic tables,
// so the doublewrite buffer is turned off.
if (srv_read_only_mode) {
    ib::info() << "Started in read only mode";

    /* There is no write except to intrinsic table and so turn-off
    doublewrite mechanism completely. */
    srv_use_doublewrite_buf = FALSE;
}

#ifdef _WIN32
srv_use_native_aio = TRUE;

#elif defined(LINUX_NATIVE_AIO)

if (srv_use_native_aio) {
    ib::info() << "Using Linux native AIO";
}

#else
/* Currently native AIO is supported only on windows and linux
and that also when the support is compiled in. In all other
cases, we ignore the setting of innodb_use_native_aio. */
srv_use_native_aio = FALSE;
#endif /* _WIN32 */

/* Register performance schema stages before any real work has been
started which may need to be instrumented. */
mysql_stage_register("innodb", srv_stages, UT_ARR_SIZE(srv_stages));
/**
Handle innodb_flush_method.
Production servers usually set it to O_DIRECT or O_DIRECT_NO_FSYNC.
*/
if (srv_file_flush_method_str == NULL) {
    /* These are the default options */

#ifndef _WIN32
    srv_unix_file_flush_method = SRV_UNIX_FSYNC;
} else if (0 == ut_strcmp(srv_file_flush_method_str, "fsync")) {
    srv_unix_file_flush_method = SRV_UNIX_FSYNC;

} else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DSYNC")) {
    srv_unix_file_flush_method = SRV_UNIX_O_DSYNC;

} else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DIRECT")) {
    srv_unix_file_flush_method = SRV_UNIX_O_DIRECT;

} else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DIRECT_NO_FSYNC")) {
    srv_unix_file_flush_method = SRV_UNIX_O_DIRECT_NO_FSYNC;

} else if (0 == ut_strcmp(srv_file_flush_method_str, "littlesync")) {
    srv_unix_file_flush_method = SRV_UNIX_LITTLESYNC;

} else if (0 == ut_strcmp(srv_file_flush_method_str, "nosync")) {
    srv_unix_file_flush_method = SRV_UNIX_NOSYNC;

#else
    srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
} else if (0 == ut_strcmp(srv_file_flush_method_str, "normal")) {
    srv_win_file_flush_method = SRV_WIN_IO_NORMAL;
    srv_use_native_aio = FALSE;

} else if (0 == ut_strcmp(srv_file_flush_method_str, "unbuffered")) {
    srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
    srv_use_native_aio = FALSE;

} else if (0 == ut_strcmp(srv_file_flush_method_str,
              "async_unbuffered")) {
    srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;

#endif /* _WIN32 */
} else {
    ib::error() << "Unrecognized value "
        << srv_file_flush_method_str
        << " for innodb_flush_method";
    return(srv_init_abort(DB_ERROR));
}

/* Note that the call srv_boot() also changes the values of
some variables to the units used by InnoDB internally */

/* Set the maximum number of threads which can wait for a semaphore
inside InnoDB: this is the 'sync wait array' size, as well as the
maximum number of threads that can wait in the 'srv_conc array' for
their time to enter InnoDB. */

srv_max_n_threads = 1   /* io_ibuf_thread */
        + 1 /* io_log_thread */
        + 1 /* lock_wait_timeout_thread */
        + 1 /* srv_error_monitor_thread */
        + 1 /* srv_monitor_thread */
        + 1 /* srv_master_thread */
        + 1 /* srv_purge_coordinator_thread */
        + 1 /* buf_dump_thread */
        + 1 /* dict_stats_thread */
        + 1 /* fts_optimize_thread */
        + 1 /* recv_writer_thread */
        + 1 /* trx_rollback_or_clean_all_recovered */
        + 128 /* added as margin, for use of
              InnoDB Memcached etc. */
        + max_connections
        + srv_n_read_io_threads
        + srv_n_write_io_threads
        + srv_n_purge_threads
        + srv_n_page_cleaners
        /* FTS Parallel Sort */
        + fts_sort_pll_degree * FTS_NUM_AUX_INDEX
          * max_connections;
/**
    Reset innodb_buffer_pool_instances
*/
if (srv_buf_pool_size >= BUF_POOL_SIZE_THRESHOLD) {

    if (srv_buf_pool_instances == srv_buf_pool_instances_default) {

#if defined(_WIN32) && !defined(_WIN64)
        /* Do not allocate too large of a buffer pool on
        Windows 32-bit systems, which can have trouble
        allocating larger single contiguous memory blocks. */
        srv_buf_pool_instances = ut_min(
            static_cast<ulong>(MAX_BUFFER_POOLS),
            static_cast<ulong>(srv_buf_pool_size
                               / (128 * 1024 * 1024)));
#else /* defined(_WIN32) && !defined(_WIN64) */
        /* Default to 8 instances when size > 1GB. */
        srv_buf_pool_instances = 8;
#endif /* defined(_WIN32) && !defined(_WIN64) */
    }
} else {
    /* If buffer pool is less than 1 GiB, assume fewer
    threads. Also use only one buffer pool instance. */
    if (srv_buf_pool_instances != srv_buf_pool_instances_default
        && srv_buf_pool_instances != 1) {
        /* We can't distinguish whether the user has explicitly
        started mysqld with --innodb-buffer-pool-instances=0,
        (srv_buf_pool_instances_default is 0) or has not
        specified that option at all. Thus we have the
        limitation that if the user started with =0, we
        will not emit a warning here, but we should actually
        do so. */
        ib::info()
            << "Adjusting innodb_buffer_pool_instances"
            " from " << srv_buf_pool_instances << " to 1"
            " since innodb_buffer_pool_size is less than "
            << BUF_POOL_SIZE_THRESHOLD / (1024 * 1024)
            << " MiB";
    }

    srv_buf_pool_instances = 1;
}
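Standalone sketch (not InnoDB code): the instance-count rule above reduces to a small decision function. It covers only the non-Windows branch and relies on two assumptions taken from the source comments: BUF_POOL_SIZE_THRESHOLD is 1 GiB, and srv_buf_pool_instances_default (the "not set" value) is 0. The function name is hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kGiB = 1024ULL * 1024 * 1024;

// Sketch of the non-Windows path for choosing innodb_buffer_pool_instances.
// Assumes BUF_POOL_SIZE_THRESHOLD == 1 GiB and a "not set" default of 0.
constexpr std::uint64_t adjust_buf_pool_instances(std::uint64_t pool_size,
                                                  std::uint64_t instances,
                                                  std::uint64_t default_instances = 0,
                                                  std::uint64_t threshold = kGiB) {
    return pool_size >= threshold
        ? (instances == default_instances ? 8 : instances)  // large pool: 8 unless the user set a value
        : 1;                                                  // small pool: always a single instance
}
```

On a pool of at least 1 GiB an explicit user setting is kept. Below the threshold, any value is forced to 1, and the code above logs an info message only when the user asked for something other than the default or 1.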
// Adjust srv_buf_pool_chunk_unit: shrink it if instances * chunk unit exceeds the pool size.
if (srv_buf_pool_chunk_unit * srv_buf_pool_instances
    > srv_buf_pool_size) {
    /* Size unit of buffer pool is larger than srv_buf_pool_size.
    adjust srv_buf_pool_chunk_unit for srv_buf_pool_size. */
    srv_buf_pool_chunk_unit
        = static_cast<ulong>(srv_buf_pool_size)
          / srv_buf_pool_instances;
    if (srv_buf_pool_size % srv_buf_pool_instances != 0) {
        ++srv_buf_pool_chunk_unit;
    }
}
// Align srv_buf_pool_size based on srv_buf_pool_chunk_unit
srv_buf_pool_size = buf_pool_size_align(srv_buf_pool_size);
// Limit innodb_page_cleaners to srv_buf_pool_instances
if (srv_n_page_cleaners > srv_buf_pool_instances) {
    /* limit of page_cleaner parallelizability
    is number of buffer pool instances. */
    srv_n_page_cleaners = srv_buf_pool_instances;
}
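Standalone sketch (not InnoDB code): the three adjustments above (chunk unit, size alignment, page-cleaner cap) can be modelled on plain integers. `BufPoolConfig` and `adjust_buf_pool` are hypothetical. The alignment step assumes that buf_pool_size_align() rounds the size up to a multiple of instances × chunk unit. Its real definition is not part of this excerpt and may also clamp to a minimum size.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical plain-integer view of the buffer pool settings (sizes in bytes).
struct BufPoolConfig {
    std::uint64_t size;
    std::uint64_t instances;
    std::uint64_t chunk_unit;
    std::uint64_t page_cleaners;
};

constexpr BufPoolConfig adjust_buf_pool(BufPoolConfig c) {
    // 1. Shrink the chunk unit when instances * chunk_unit exceeds the pool size
    //    (ceiling division, like the ++srv_buf_pool_chunk_unit step).
    if (c.chunk_unit * c.instances > c.size) {
        c.chunk_unit = c.size / c.instances + (c.size % c.instances != 0 ? 1 : 0);
    }
    // 2. Assumed behaviour of buf_pool_size_align(): round the size up to a
    //    multiple of instances * chunk_unit.
    const std::uint64_t m = c.instances * c.chunk_unit;
    if (c.size % m != 0) {
        c.size = (c.size / m + 1) * m;
    }
    // 3. Never more page cleaners than buffer pool instances.
    c.page_cleaners = std::min(c.page_cleaners, c.instances);
    return c;
}
```

For example, 3000 MiB with 8 instances and 128 MiB chunks keeps its chunk size and, under this assumption, is rounded up to 3072 MiB (3 × 8 × 128 MiB).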
/**
Start the InnoDB server: initialize the related parameters and components.
*/
srv_boot();

ib::info() << (ut_crc32_sse2_enabled ? "Using" : "Not using")
    << " CPU crc32 instructions";
// InnoDB monitor output file and temporary files
if (!srv_read_only_mode) {

    mutex_create(LATCH_ID_SRV_MONITOR_FILE,
             &srv_monitor_file_mutex);

    if (srv_innodb_status) {

        srv_monitor_file_name = static_cast<char*>(
            ut_malloc_nokey(
                strlen(fil_path_to_mysql_datadir)
                + 20 + sizeof "/innodb_status."));

        sprintf(srv_monitor_file_name,
            "%s/innodb_status." ULINTPF,
            fil_path_to_mysql_datadir,
            os_proc_get_number());

        srv_monitor_file = fopen(srv_monitor_file_name, "w+");

        if (!srv_monitor_file) {
            ib::error() << "Unable to create "
                << srv_monitor_file_name << ": "
                << strerror(errno);
            return(srv_init_abort(DB_ERROR));
        }
    } else {

        srv_monitor_file_name = NULL;
        srv_monitor_file = os_file_create_tmpfile(NULL);

        if (!srv_monitor_file) {
            return(srv_init_abort(DB_ERROR));
        }
    }

    mutex_create(LATCH_ID_SRV_DICT_TMPFILE,
             &srv_dict_tmpfile_mutex);

    srv_dict_tmpfile = os_file_create_tmpfile(NULL);

    if (!srv_dict_tmpfile) {
        return(srv_init_abort(DB_ERROR));
    }

    mutex_create(LATCH_ID_SRV_MISC_TMPFILE,
             &srv_misc_tmpfile_mutex);

    srv_misc_tmpfile = os_file_create_tmpfile(NULL);

    if (!srv_misc_tmpfile) {
        return(srv_init_abort(DB_ERROR));
    }
}
/**
file_io_threads
*/
// innodb_read_io_threads & innodb_write_io_threads
srv_n_file_io_threads = srv_n_read_io_threads;

srv_n_file_io_threads += srv_n_write_io_threads;
// If not read-only, add the log and ibuf IO threads
if (!srv_read_only_mode) {
    /* Add the log and ibuf IO threads. */
    srv_n_file_io_threads += 2;
} else {
    ib::info() << "Disabling background log and ibuf IO write"
        << " threads.";
}

ut_a(srv_n_file_io_threads <= SRV_MAX_N_IO_THREADS);
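Standalone sketch (not InnoDB code): the file I/O thread count is the sum of read and write threads, plus one log and one insert-buffer thread unless the server is read-only. The helper names are hypothetical. The upper limit is a parameter because the real value of SRV_MAX_N_IO_THREADS is not shown in this excerpt, so the 130 in the check is only an example.

```cpp
#include <cstdint>

// Read + write threads, plus the log and ibuf I/O threads when not read-only.
constexpr std::uint64_t file_io_thread_count(std::uint64_t n_read,
                                             std::uint64_t n_write,
                                             bool read_only) {
    return n_read + n_write + (read_only ? 0 : 2);
}

// Mirrors ut_a(srv_n_file_io_threads <= SRV_MAX_N_IO_THREADS) with the limit
// passed in explicitly.
constexpr bool within_io_thread_limit(std::uint64_t n, std::uint64_t max_n) {
    return n <= max_n;
}
```

The I/O handler loop later in this function starts one io_handler_thread per counted slot, so this number is also the number of I/O threads created.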
// Initialize the asynchronous I/O subsystem.
if (!os_aio_init(srv_n_read_io_threads,
         srv_n_write_io_threads,
         SRV_MAX_N_PENDING_SYNC_IOS)) {

    ib::error() << "Cannot initialize AIO sub-system";

    return(srv_init_abort(DB_ERROR));
}
// Initialize the in-memory tablespace cache
fil_init(srv_file_per_table ? 50000 : 5000, srv_max_n_open_files);

double  size;
char    unit;
// Express innodb_buffer_pool_size and the chunk size in G or M for the log message
if (srv_buf_pool_size >= 1024 * 1024 * 1024) {
    size = ((double) srv_buf_pool_size) / (1024 * 1024 * 1024);
    unit = 'G';
} else {
    size = ((double) srv_buf_pool_size) / (1024 * 1024);
    unit = 'M';
}

double  chunk_size;
char    chunk_unit;

if (srv_buf_pool_chunk_unit >= 1024 * 1024 * 1024) {
    chunk_size = srv_buf_pool_chunk_unit / 1024.0 / 1024 / 1024;
    chunk_unit = 'G';
} else {
    chunk_size = srv_buf_pool_chunk_unit / 1024.0 / 1024;
    chunk_unit = 'M';
}

ib::info() << "Initializing buffer pool, total size = "
    << size << unit << ", instances = " << srv_buf_pool_instances
    << ", chunk size = " << chunk_size << chunk_unit;
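Standalone sketch (not InnoDB code): the G/M unit selection in the log line above can be reproduced with `std::ostringstream`. The sketch does not use InnoDB's `ib::info` machinery, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Sizes of at least 1 GiB are shown in G, everything else in M.
inline std::string human_size(std::uint64_t bytes) {
    const std::uint64_t gib = 1024ULL * 1024 * 1024;
    std::ostringstream out;
    if (bytes >= gib) {
        out << static_cast<double>(bytes) / gib << 'G';
    } else {
        out << static_cast<double>(bytes) / (1024 * 1024) << 'M';
    }
    return out.str();
}

// Rebuilds the "Initializing buffer pool" message text.
inline std::string buffer_pool_banner(std::uint64_t size, std::uint64_t instances,
                                      std::uint64_t chunk_unit) {
    std::ostringstream out;
    out << "Initializing buffer pool, total size = " << human_size(size)
        << ", instances = " << instances
        << ", chunk size = " << human_size(chunk_unit);
    return out.str();
}
```

A `double` is streamed in the default format, so whole numbers print without a decimal point (`8G`) and a fractional size such as 1.5 GiB prints as `1.5G`.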
// Create the InnoDB buffer pool; an error is reported if there is not enough memory
err = buf_pool_init(srv_buf_pool_size, srv_buf_pool_instances);

if (err != DB_SUCCESS) {
    ib::error() << "Cannot allocate memory for the buffer pool";

    return(srv_init_abort(DB_ERROR));
}

ib::info() << "Completed initialization of buffer pool";

// Initialize the fsp system and the redo log system
fsp_init();
log_init();
// Create the recovery system, then initialize it for a recovery operation
recv_sys_create();
recv_sys_init(buf_pool_get_curr_size());
// Create the lock system at database startup
lock_sys_create(srv_lock_table_size);
// Start state: lock system (lock-timeout thread)
srv_start_state_set(SRV_START_STATE_LOCK_SYS);

/* Create i/o-handler threads. */
for (ulint t = 0; t < srv_n_file_io_threads; ++t) {

    n[t] = t;

    os_thread_create(io_handler_thread, n + t, thread_ids + t);
}

/* Even in read-only mode there could be flush job generated by
intrinsic table operations.
Initialize the page_cleaner.
*/
buf_flush_page_cleaner_init();
// Create the buf_flush_page_cleaner_coordinator thread
os_thread_create(buf_flush_page_cleaner_coordinator,
         NULL, NULL);
// Create the buf_flush_page_cleaner_worker threads
for (i = 1; i < srv_n_page_cleaners; ++i) {
    os_thread_create(buf_flush_page_cleaner_worker,
             NULL, NULL);
}

/* Make sure page cleaner is active. */
while (!buf_page_cleaner_is_active) {
    os_thread_sleep(10000);
}
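Standalone sketch (not InnoDB code): the wait on buf_page_cleaner_is_active is a simple polling loop. The sketch uses `std::atomic<bool>` in place of the flag and `std::this_thread::sleep_for` in place of os_thread_sleep(), which takes microseconds, so 10000 means 10 ms. `wait_until_active` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Poll `is_active` until another thread sets it, sleeping `interval` between
// checks. Returns how many times it slept.
inline int wait_until_active(const std::atomic<bool>& is_active,
                             std::chrono::microseconds interval =
                                 std::chrono::microseconds(10000)) {
    int sleeps = 0;
    while (!is_active.load(std::memory_order_acquire)) {
        std::this_thread::sleep_for(interval);
        ++sleeps;
    }
    return sleeps;
}
```

Polling is a reasonable choice for a one-off startup handshake like this one. A condition variable would avoid the sleep granularity, but it would add another synchronization object for no practical gain.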
// Start state: I/O threads started
srv_start_state_set(SRV_START_STATE_IO);

// Normalize the data home directory path
os_normalize_path(srv_data_home);

/* Check if the data files exist or not (ibdata1, ibdata2, ...)
and decide whether a new database has to be created. */
err = srv_sys_space.check_file_spec(
    &create_new_db, MIN_EXPECTED_TABLESPACE_SIZE);

if (err != DB_SUCCESS) {
    return(srv_init_abort(DB_ERROR));
}
// For an existing database, incomplete transactions still have to be rolled back
srv_startup_is_before_trx_rollback_phase = !create_new_db;

/* Check if undo tablespaces and redo log files exist before creating
a new system tablespace */
if (create_new_db) {
    err = srv_check_undo_redo_logs_exists();
    if (err != DB_SUCCESS) {
        return(srv_init_abort(DB_ERROR));
    }
    recv_sys_debug_free();
}

/* Open or create the data files. */
ulint   sum_of_new_sizes;
// Open or create the ibdata files; for an existing database, read flushed_lsn from ibdata1
err = srv_sys_space.open_or_create(
    false, create_new_db, &sum_of_new_sizes, &flushed_lsn);

switch (err) {
case DB_SUCCESS:
    break;
case DB_CANNOT_OPEN_FILE:
    ib::error()
        << "Could not open or create the system tablespace. If"
        " you tried to add new data files to the system"
        " tablespace, and it failed here, you should now"
        " edit innodb_data_file_path in my.cnf back to what"
        " it was, and remove the new ibdata files InnoDB"
        " created in this failed attempt. InnoDB only wrote"
        " those files full of zeros, but did not yet use"
        " them in any way. But be careful: do not remove"
        " old data files which contain your precious data!";
    /* fall through */
default:
    /* Other errors might come from Datafile::validate_first_page() */
    return(srv_init_abort(err));
}

dirnamelen = strlen(srv_log_group_home_dir);
ut_a(dirnamelen < (sizeof logfilename) - 10 - sizeof "ib_logfile");
memcpy(logfilename, srv_log_group_home_dir, dirnamelen);

/* Add a path separator if needed. */
if (dirnamelen && logfilename[dirnamelen - 1] != OS_PATH_SEPARATOR) {
    logfilename[dirnamelen++] = OS_PATH_SEPARATOR;
}
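Standalone sketch (not InnoDB code): redo log file names are built from the log directory, a separator appended only when the directory is non-empty and does not already end with one, and then `ib_logfile<N>`. The sketch uses `std::string` in place of the fixed `logfilename` buffer, so it needs no `ut_a` length bound. `redo_log_file_name` is hypothetical, and `'/'` is assumed as OS_PATH_SEPARATOR (it is `'\\'` on Windows).

```cpp
#include <cassert>
#include <string>

// Directory + optional separator + "ib_logfile<n>".
inline std::string redo_log_file_name(std::string dir, unsigned n,
                                      char separator = '/') {
    if (!dir.empty() && dir.back() != separator) {
        dir.push_back(separator);
    }
    return dir + "ib_logfile" + std::to_string(n);
}
```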

srv_log_file_size_requested = srv_log_file_size;

if (create_new_db) {
    /**
        Creating a new database:
    */
    // Synchronously flush dirty blocks from the tail of the flush list of every buffer pool instance.
    buf_flush_sync_all_buf_pools();
    // Get the current LSN
    flushed_lsn = log_get_lsn();
    // Create the redo log files
    err = create_log_files(
        logfilename, dirnamelen, flushed_lsn, logfile0);

    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }
} else {
    // Not creating a new database: open the existing redo log files
    for (i = 0; i < SRV_N_LOG_FILES_MAX; i++) {
        os_offset_t   size;
        os_file_stat_t   stat_info;

        sprintf(logfilename + dirnamelen,
            "ib_logfile%u", i);
        // Get the status of the log file
        err = os_file_get_status(
            logfilename, &stat_info, false,
            srv_read_only_mode);

        if (err == DB_NOT_FOUND) {
            if (i == 0) {
                if (flushed_lsn
                    < static_cast<lsn_t>(1000)) {
                    ib::error()
                        << "Cannot create"
                        " log files because"
                        " data files are"
                        " corrupt or the"
                        " database was not"
                        " shut down cleanly"
                        " after creating"
                        " the data files.";
                    return(srv_init_abort(
                        DB_ERROR));
                }

                err = create_log_files(
                    logfilename, dirnamelen,
                    flushed_lsn, logfile0);

                if (err != DB_SUCCESS) {
                    return(srv_init_abort(err));
                }

                create_log_files_rename(
                    logfilename, dirnamelen,
                    flushed_lsn, logfile0);

                /* Suppress the message about
                crash recovery. */
                flushed_lsn = log_get_lsn();
                goto files_checked;
            } else if (i < 2) {
                /* must have at least 2 log files */
                ib::error() << "Only one log file"
                    " found.";
                return(srv_init_abort(err));
            }

            /* opened all files */
            break;
        }
        // Check the log file mode
        if (!srv_file_check_mode(logfilename)) {
            return(srv_init_abort(DB_ERROR));
        }
        // Open the redo log file
        err = open_log_file(&files[i], logfilename, &size);

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }

        ut_a(size != (os_offset_t) -1);

        if (size & ((1 << UNIV_PAGE_SIZE_SHIFT) - 1)) {

            ib::error() << "Log file " << logfilename
                << " size " << size << " is not a"
                " multiple of innodb_page_size";
            return(srv_init_abort(DB_ERROR));
        }

        size >>= UNIV_PAGE_SIZE_SHIFT;

        if (i == 0) {
            srv_log_file_size = size;
        } else if (size != srv_log_file_size) {

            ib::error() << "Log file " << logfilename
                << " is of different size "
                << (size << UNIV_PAGE_SIZE_SHIFT)
                << " bytes than other log files "
                << (srv_log_file_size
                    << UNIV_PAGE_SIZE_SHIFT)
                << " bytes!";
            return(srv_init_abort(DB_ERROR));
        }
    }
    // Number of log files found
    srv_n_log_files_found = i;

    /* Create the in-memory file space objects. */

    sprintf(logfilename + dirnamelen, "ib_logfile%u", 0);

    /* Disable the doublewrite buffer for log files. */
    fil_space_t* log_space = fil_space_create(
        "innodb_redo_log",
        SRV_LOG_SPACE_FIRST_ID,
        fsp_flags_set_page_size(0, univ_page_size),
        FIL_TYPE_LOG);

    ut_a(fil_validate());
    ut_a(log_space);

    /* srv_log_file_size is measured in pages; if page size is 16KB,
    then we have a limit of 64TB on 32 bit systems */
    ut_a(srv_log_file_size <= ULINT_MAX);
    // Add the log files to the redo log space
    for (unsigned j = 0; j < i; j++) {
        sprintf(logfilename + dirnamelen, "ib_logfile%u", j);

        if (!fil_node_create(logfilename,
                     (ulint) srv_log_file_size,
                     log_space, false, false)) {
            return(srv_init_abort(DB_ERROR));
        }
    }
    // Initialize the redo log group
    if (!log_group_init(0, i, srv_log_file_size * UNIV_PAGE_SIZE,
                SRV_LOG_SPACE_FIRST_ID)) {
        return(srv_init_abort(DB_ERROR));
    }
}
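Standalone sketch (not InnoDB code): the loop over existing ib_logfile* files applies three checks. There must be at least two files, each size must be a whole number of pages, and all sizes must be equal. The hypothetical function below takes the file sizes in bytes and returns the common size in pages (what srv_log_file_size ends up holding), or 0 when a check fails. The shift of 14 (16 KiB pages) used in the checks is only an example value for UNIV_PAGE_SIZE_SHIFT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

inline std::uint64_t common_log_file_pages(const std::vector<std::uint64_t>& sizes,
                                           unsigned page_size_shift) {
    if (sizes.size() < 2) {
        return 0;  // "Only one log file found." (zero files takes the create path)
    }
    const std::uint64_t page_mask = (std::uint64_t{1} << page_size_shift) - 1;
    std::uint64_t pages = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        if (sizes[i] & page_mask) {
            return 0;  // not a multiple of innodb_page_size
        }
        const std::uint64_t file_pages = sizes[i] >> page_size_shift;
        if (i == 0) {
            pages = file_pages;
        } else if (file_pages != pages) {
            return 0;  // log files of different sizes
        }
    }
    return pages;
}
```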

files_checked:
/* Open all log files and data files in the system
tablespace: we keep them open until database
shutdown */
fil_open_log_and_system_tablespace_files();
// Open the undo tablespaces; once all undo files have been found and opened, add them all to the file management system
err = srv_undo_tablespaces_init(
    create_new_db,
    srv_undo_tablespaces,
    &srv_undo_tablespaces_open);

/* If the force recovery is set very high then we carry on regardless
of all errors. Basically this is fingers crossed mode.
What follows deals with data recovery. */

if (err != DB_SUCCESS
    && srv_force_recovery < SRV_FORCE_NO_UNDO_LOG_SCAN) {

    return(srv_init_abort(err));
}
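Standalone sketch (not InnoDB code): comparisons against srv_force_recovery like the one above recur throughout the rest of this function. The sketch collects the thresholds that appear in this excerpt into one function. The numeric values of the SRV_FORCE_* constants follow the documented innodb_force_recovery levels. `RecoveryPlan` and `plan_for` are hypothetical.

```cpp
// Documented innodb_force_recovery levels.
enum ForceRecoveryLevel : int {
    SRV_FORCE_IGNORE_CORRUPT = 1,
    SRV_FORCE_NO_BACKGROUND = 2,
    SRV_FORCE_NO_TRX_UNDO = 3,
    SRV_FORCE_NO_IBUF_MERGE = 4,
    SRV_FORCE_NO_UNDO_LOG_SCAN = 5,
    SRV_FORCE_NO_LOG_REDO = 6,
};

// Which startup steps in this excerpt still run at a given level.
struct RecoveryPlan {
    bool abort_on_undo_tablespace_error;  // srv_undo_tablespaces_init() failure is fatal
    bool apply_redo;                      // recv_apply_hashed_log_recs(), key rotation
    bool check_sys_tablespaces;           // SYS_TABLESPACES / .ibd checks during recovery
    bool start_purge_threads;             // purge coordinator and workers (if not read-only)
};

constexpr RecoveryPlan plan_for(int force_recovery) {
    return RecoveryPlan{
        force_recovery < SRV_FORCE_NO_UNDO_LOG_SCAN,
        force_recovery < SRV_FORCE_NO_LOG_REDO,
        force_recovery < SRV_FORCE_NO_IBUF_MERGE,
        force_recovery < SRV_FORCE_NO_BACKGROUND,
    };
}
```

Each level disables everything that the lower levels disable and more. That is why a single `<` comparison is enough at every decision point.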

/* Initialize objects used by dict stats gathering thread, which
can also be used by recovery if it tries to drop some table */
if (!srv_read_only_mode) {
    dict_stats_thread_init();
}
// Initialize the file_format_max variable.
trx_sys_file_format_init();
// Create the trx_sys instance and initialize its purge_queue and mutex
trx_sys_create();

if (create_new_db) {

    ut_a(!srv_read_only_mode);

    mtr_start(&mtr);

    bool ret = fsp_header_init(0, sum_of_new_sizes, &mtr);

    mtr_commit(&mtr);

    if (!ret) {
        return(srv_init_abort(DB_ERROR));
    }

    /* To maintain backward compatibility we create only
    the first rollback segment before the double write buffer.
    All the remaining rollback segments will be created later,
    after the double write buffer has been created. */
    trx_sys_create_sys_pages();

    purge_queue = trx_sys_init_at_db_start();

    DBUG_EXECUTE_IF("check_no_undo",
            ut_ad(purge_queue->empty());
            );

    /* The purge system needs to create the purge view and
    therefore requires that the trx_sys is inited. */

    trx_purge_sys_create(srv_n_purge_threads, purge_queue);

    err = dict_create();

    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }

    buf_flush_sync_all_buf_pools();

    flushed_lsn = log_get_lsn();

    fil_write_flushed_lsn(flushed_lsn);

    create_log_files_rename(
        logfilename, dirnamelen, flushed_lsn, logfile0);

} else {

    /* Check if we support the max format that is stamped
    on the system tablespace.
    Note:  We are NOT allowed to make any modifications to
    the TRX_SYS_PAGE_NO page before recovery  because this
    page also contains the max_trx_id etc. important system
    variables that are required for recovery.  We need to
    ensure that we return the system to a state where normal
    recovery is guaranteed to work. We do this by
    invalidating the buffer cache, this will force the
    reread of the page and restoration to its last known
    consistent state, this is REQUIRED for the recovery
    process to work. */
    err = trx_sys_file_format_max_check(
        srv_max_file_format_at_startup);

    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }

    /* Invalidate the buffer pool to ensure that we reread
    the page that we read above, during recovery.
    Note that this is not as heavy weight as it seems. At
    this point there will be only ONE page in the buf_LRU
    and there must be no page in the buf_flush list. */
    buf_pool_invalidate();

    /* Scan and locate truncate log files. Parsed located files
    and add table to truncate information to central vector for
    truncate fix-up action post recovery. */
    err = TruncateLogParser::scan_and_parse(srv_log_group_home_dir);
    if (err != DB_SUCCESS) {

        return(srv_init_abort(DB_ERROR));
    }

    /* We always try to do a recovery, even if the database had
    been shut down normally: this is the normal startup path */
    /**
    Start recovery from the checkpoint (flushed_lsn was read from ibdata1):
    1. Initialize the red-black tree used for fast insertion into the flush list during recovery.
    2. Find the latest checkpoint in the log groups.
    3. Read the redo log page holding the latest checkpoint into log_sys->checkpoint_buf.
    4. Get checkpoint_lsn and checkpoint_no.
    5. Read the redo log starting at checkpoint_lsn into a hash table.
    6. Check the tablespaces needed for crash recovery and process the pages in the doublewrite buffer:
       for each one, the integrity of the corresponding data page is checked, and a damaged page is
       restored from its doublewrite copy. The background thread recv_writer_thread is also spawned
       to flush dirty pages from the buffer pool.
    7. Copy the log segment from the most up-to-date log group to the other groups; currently there is only one log group.
    */
    err = recv_recovery_from_checkpoint_start(flushed_lsn);

    // Discard the doublewrite buffer pages kept by the recovery system
    recv_sys->dblwr.pages.clear();
    // Boot the data dictionary system, which also initializes the change buffer
    if (err == DB_SUCCESS) {
        /* Initialize the change buffer. */
        err = dict_boot();
    }

    if (err != DB_SUCCESS) {

        /* A tablespace was not found during recovery. The
        user must force recovery. */

        if (err == DB_TABLESPACE_NOT_FOUND) {

            srv_fatal_error();

            ut_error;
        }

        return(srv_init_abort(DB_ERROR));
    }
    // Create and initialize the transaction system.
    purge_queue = trx_sys_init_at_db_start();

    if (srv_force_recovery < SRV_FORCE_NO_LOG_REDO) {
        /* Apply the hashed log records to the
        respective file pages, for the last batch of
        recv_group_scan_log_recs(). */
        // Apply the redo log to complete crash recovery.
        recv_apply_hashed_log_recs(TRUE);
        DBUG_PRINT("ib_log", ("apply completed"));

        if (recv_needed_recovery) {
            // Example output: Last MySQL binlog file position 0 894036112, file name mysql-bin.002128
            trx_sys_print_mysql_binlog_offset();
        }
    }

    if (recv_sys->found_corrupt_log) {
        ib::warn()
            << "The log file may have been corrupt and it"
            " is possible that the log scan or parsing"
            " did not proceed far enough in recovery."
            " Please run CHECK TABLE on your InnoDB tables"
            " to check that they are ok!"
            " It may be safest to recover your"
            " InnoDB database from a backup!";
    }

    /* The purge system needs to create the purge view and
    therefore requires that the trx_sys is inited. */
    // Create trx_purge_sys
    trx_purge_sys_create(srv_n_purge_threads, purge_queue);

    /* recv_recovery_from_checkpoint_finish needs trx lists which
    are initialized in trx_sys_init_at_db_start(). */
    /*
        Finish recovery:
        1. Make sure the recv_writer thread is done.
        2. Wait for the flush operations to complete, so that all dirty pages have been flushed.
        3. Wait for the recv_writer thread to terminate.
        4. Free the flush red-black tree.
        5. Roll back any transactions on data dictionary tables so that those tables are not left locked.
           The data dictionary latch should guarantee that only one data dictionary transaction is active at a time.
    */
    recv_recovery_from_checkpoint_finish();

    /* Fix-up truncate of tables in the system tablespace
    if server crashed while truncate was active. The non-
    system tables are done after tablespace discovery. Do
    this now because this procedure assumes that no pages
    have changed since redo recovery.  Tablespace discovery
    can do updates to pages in the system tablespace.*/
    // Fix up interrupted TRUNCATEs of tables in the system tablespace
    err = truncate_t::fixup_tables_in_system_tablespace();

    if (srv_force_recovery < SRV_FORCE_NO_IBUF_MERGE) {
        /* Open or Create SYS_TABLESPACES and SYS_DATAFILES
        so that tablespace names and other metadata can be
        found. */
        srv_sys_tablespaces_open = true;
        // Create or check the SYS_TABLESPACES and SYS_DATAFILES system tables
        err = dict_create_or_check_sys_tablespace();
        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }

        /* The following call is necessary for the insert
        buffer to work with multiple tablespaces. We must
        know the mapping between space id's and .ibd file
        names.

        In a crash recovery, we check that the info in data
        dictionary is consistent with what we already know
        about space id's from the calls to fil_ibd_load().

        In a normal startup, we create the space objects for
        every table in the InnoDB data dictionary that has
        an .ibd file.

        We also determine the maximum tablespace id used.

        The 'validate' flag indicates that when a tablespace
        is opened, we also read the header page and validate
        the contents to the data dictionary. This is time
        consuming, especially for databases with lots of ibd
        files.  So only do it after a crash and not forcing
        recovery.  Open rw transactions at this point is not
        a good reason to validate. */
        bool validate = recv_needed_recovery
            && srv_force_recovery == 0;

        dict_check_tablespaces_and_store_max_id(validate);
    }

    /* Rotate the encryption key for recovery. It's because
    server could crash in middle of key rotation. Some tablespace
    didn't complete key rotation. Here, we will resume the
    rotation. */
    if (!srv_read_only_mode
        && srv_force_recovery < SRV_FORCE_NO_LOG_REDO) {
        fil_encryption_rotate();
    }

    /* Fix-up truncate of table if server crashed while truncate
    was active. */
    err = truncate_t::fixup_tables_in_non_system_tablespace();

    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }

    if (!srv_force_recovery
        && !recv_sys->found_corrupt_log
        && (srv_log_file_size_requested != srv_log_file_size
        || srv_n_log_files_found != srv_n_log_files)) {

        /* Prepare to replace the redo log files. */

        if (srv_read_only_mode) {
            ib::error() << "Cannot resize log files"
                " in read-only mode.";
            return(srv_init_abort(DB_READ_ONLY));
        }

        /* Prepare to delete the old redo log files */
        flushed_lsn = srv_prepare_to_delete_redo_log_files(i);

        /* Prohibit redo log writes from any other
        threads until creating a log checkpoint at the
        end of create_log_files(). */
        ut_d(recv_no_log_write = true);
        ut_ad(!buf_pool_check_no_pending_io());

        RECOVERY_CRASH(3);

        /* Stamp the LSN to the data files. */
        fil_write_flushed_lsn(flushed_lsn);

        RECOVERY_CRASH(4);

        /* Close and free the redo log files, so that
        we can replace them. */
        fil_close_log_files(true);

        RECOVERY_CRASH(5);

        /* Free the old log file space. */
        log_group_close_all();

        ib::warn() << "Starting to delete and rewrite log"
            " files.";

        srv_log_file_size = srv_log_file_size_requested;

        err = create_log_files(
            logfilename, dirnamelen, flushed_lsn,
            logfile0);

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }

        create_log_files_rename(
            logfilename, dirnamelen, flushed_lsn,
            logfile0);
    }
    // Roll back incomplete, uncommitted transactions; this is done in a background thread.
    recv_recovery_rollback_active();

    /* It is possible that file_format tag has never
    been set. In this case we initialize it to minimum
    value.  Important to note that we can do it ONLY after
    we have finished the recovery process so that the
    image of TRX_SYS_PAGE_NO is not stale. */
    trx_sys_file_format_tag_init();
}
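Standalone sketch (not InnoDB code): an existing database has its redo log files deleted and recreated under four conditions. Forced recovery must be off, no corrupt log may have been found, and either the configured file size or the configured file count must differ from what was found on disk. The predicate below restates that condition. Its name is hypothetical, and sizes are in pages, as srv_log_file_size is.

```cpp
#include <cstdint>

constexpr bool should_rewrite_redo_logs(int force_recovery, bool found_corrupt_log,
                                        std::uint64_t requested_size_pages,
                                        std::uint64_t found_size_pages,
                                        std::uint64_t configured_files,
                                        std::uint64_t found_files) {
    return force_recovery == 0
        && !found_corrupt_log
        && (requested_size_pages != found_size_pages
            || configured_files != found_files);
}
```

When this condition holds in read-only mode, the code above does not rewrite anything. It stops with "Cannot resize log files in read-only mode." instead.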

if (!create_new_db) {
    /* Check and reset any no-redo rseg slot on disk used by
    pre-5.7.2 redo resg with no data to purge. */
    trx_rseg_reset_pending();
}

if (!create_new_db && sum_of_new_sizes > 0) {
    /* New data file(s) were added */
    mtr_start(&mtr);

    fsp_header_inc_size(0, sum_of_new_sizes, &mtr);

    mtr_commit(&mtr);

    /* Immediately write the log record about increased tablespace
    size to disk, so that it is durable even if mysqld would crash
    quickly */

    log_buffer_flush_to_disk();
}

/* Open temp-tablespace and keep it open until shutdown. */
err = srv_open_tmp_tablespace(create_new_db, &srv_tmp_space);

if (err != DB_SUCCESS) {
    return(srv_init_abort(err));
}

/* Create the doublewrite buffer to a new tablespace */
if (buf_dblwr == NULL && !buf_dblwr_create()) {
    return(srv_init_abort(DB_ERROR));
}

/* Here the double write buffer has already been created and so
any new rollback segments will be allocated after the double
write buffer. The default segment should already exist.
We create the new segments only if it's a new database or
the database was shutdown cleanly. */

/* Note: When creating the extra rollback segments during an upgrade
we violate the latching order, even if the change buffer is empty.
We make an exception in sync0sync.cc and check srv_is_being_started
for that violation. It cannot create a deadlock because we are still
running in single threaded mode essentially. Only the IO threads
should be running at this stage. */

/* Deprecate innodb_undo_logs.  But still use it if it is set to
non-default and innodb_rollback_segments is default. */
ut_a(srv_rollback_segments > 0);
ut_a(srv_rollback_segments <= TRX_SYS_N_RSEGS);
ut_a(srv_undo_logs > 0);
ut_a(srv_undo_logs <= TRX_SYS_N_RSEGS);
if (srv_undo_logs < TRX_SYS_N_RSEGS) {
    ib::warn() << deprecated_undo_logs;
    if (srv_rollback_segments == TRX_SYS_N_RSEGS) {
        srv_rollback_segments = srv_undo_logs;
    }
}

/* The number of rsegs that exist in InnoDB is given by status
variable srv_available_undo_logs. The number of rsegs to use can
be set using the dynamic global variable srv_rollback_segments. */
// Create the rollback segments
srv_available_undo_logs = trx_sys_create_rsegs(
    srv_undo_tablespaces, srv_rollback_segments, srv_tmp_undo_logs);

if (srv_available_undo_logs == ULINT_UNDEFINED) {
    /* Can only happen if server is read only. */
    ut_a(srv_read_only_mode);
    srv_rollback_segments = ULONG_UNDEFINED;
} else if (srv_available_undo_logs < srv_rollback_segments
       && !srv_force_recovery && !recv_needed_recovery) {
    ib::error() << "System or UNDO tablespace is running of out"
            << " of space";
    /* Should due to out of file space. */
    return(srv_init_abort(DB_ERROR));
}
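Standalone sketch (not InnoDB code): the deprecated innodb_undo_logs value is folded into innodb_rollback_segments as follows. If innodb_undo_logs is below the maximum, a deprecation warning is issued. Its value then replaces innodb_rollback_segments only if the latter was left at its default, which the code treats as TRX_SYS_N_RSEGS. `RsegSetting` and `effective_rollback_segments` are hypothetical, and the checks assume TRX_SYS_N_RSEGS is 128.

```cpp
// Resulting setting plus whether the deprecation warning would be printed.
struct RsegSetting {
    unsigned long rollback_segments;
    bool deprecation_warning;
};

// n_rsegs stands for TRX_SYS_N_RSEGS (both the maximum and the default).
constexpr RsegSetting effective_rollback_segments(unsigned long rollback_segments,
                                                  unsigned long undo_logs,
                                                  unsigned long n_rsegs) {
    return undo_logs < n_rsegs
        ? RsegSetting{rollback_segments == n_rsegs ? undo_logs : rollback_segments, true}
        : RsegSetting{rollback_segments, false};
}
```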

srv_startup_is_before_trx_rollback_phase = false;

if (!srv_read_only_mode) {
    /* Create the thread which watches the timeouts
    for lock waits (lock_wait_timeout_thread) */
    os_thread_create(
        lock_wait_timeout_thread,
        NULL, thread_ids + 2 + SRV_MAX_N_IO_THREADS);

    /* Create the thread which warns of long semaphore waits
    (srv_error_monitor_thread) */
    os_thread_create(
        srv_error_monitor_thread,
        NULL, thread_ids + 3 + SRV_MAX_N_IO_THREADS);

    /* Create the thread which prints InnoDB monitor info
    (srv_monitor_thread) */
    os_thread_create(
        srv_monitor_thread,
        NULL, thread_ids + 4 + SRV_MAX_N_IO_THREADS);

    srv_start_state_set(SRV_START_STATE_MONITOR);
}

/* Create the SYS_FOREIGN and SYS_FOREIGN_COLS system tables */
err = dict_create_or_check_foreign_constraint_tables();
if (err != DB_SUCCESS) {
    return(srv_init_abort(err));
}

/* Create the SYS_TABLESPACES system table */
err = dict_create_or_check_sys_tablespace();
if (err != DB_SUCCESS) {
    return(srv_init_abort(err));
}
srv_sys_tablespaces_open = true;

/* Create the SYS_VIRTUAL system table */
err = dict_create_or_check_sys_virtual();
if (err != DB_SUCCESS) {
    return(srv_init_abort(err));
}

srv_is_being_started = false;

ut_a(trx_purge_state() == PURGE_STATE_INIT);

/* Create the master thread which does purge and other utility
operations */

if (!srv_read_only_mode) {

    os_thread_create(
        srv_master_thread,
        NULL, thread_ids + (1 + SRV_MAX_N_IO_THREADS));

    srv_start_state_set(SRV_START_STATE_MASTER);
}
// Create the purge_coordinator thread and the purge worker threads
if (!srv_read_only_mode
    && srv_force_recovery < SRV_FORCE_NO_BACKGROUND) {

    os_thread_create(
        srv_purge_coordinator_thread,
        NULL, thread_ids + 5 + SRV_MAX_N_IO_THREADS);

    ut_a(UT_ARR_SIZE(thread_ids)
         > 5 + srv_n_purge_threads + SRV_MAX_N_IO_THREADS);

    /* We've already created the purge coordinator thread above. */
    for (i = 1; i < srv_n_purge_threads; ++i) {
        os_thread_create(
            srv_worker_thread, NULL,
            thread_ids + 5 + i + SRV_MAX_N_IO_THREADS);
    }
    // Wait for the purge threads to start
    srv_start_wait_for_purge_to_start();

    srv_start_state_set(SRV_START_STATE_PURGE);
} else {
    purge_sys->state = PURGE_STATE_DISABLED;
}

/\* wake main loop of page cleaner up  
    唤醒 page cleaner 主循环  
\*/  
os\_event\_set(buf\_flush\_event);

sum_of_data_file_sizes = srv_sys_space.get_sum_of_sizes();
ut_a(sum_of_new_sizes != ULINT_UNDEFINED);

tablespace_size_in_header = fsp_header_get_tablespace_size();

if (!srv_read_only_mode
    && !srv_sys_space.can_auto_extend_last_file()
    && sum_of_data_file_sizes != tablespace_size_in_header) {

    ib::error() << "Tablespace size stored in header is "
        << tablespace_size_in_header << " pages, but the sum"
        " of data file sizes is " << sum_of_data_file_sizes
        << " pages";

    if (srv_force_recovery == 0
        && sum_of_data_file_sizes < tablespace_size_in_header) {
        /* This is a fatal error, the tail of a tablespace is
        missing */

        ib::error()
            << "Cannot start InnoDB."
            " The tail of the system tablespace is"
            " missing. Have you edited"
            " innodb_data_file_path in my.cnf in an"
            " inappropriate way, removing"
            " ibdata files from there?"
            " You can set innodb_force_recovery=1"
            " in my.cnf to force"
            " a startup if you are trying"
            " to recover a badly corrupt database.";

        return(srv_init_abort(DB_ERROR));
    }
}

if (!srv_read_only_mode
    && srv_sys_space.can_auto_extend_last_file()
    && sum_of_data_file_sizes < tablespace_size_in_header) {

    ib::error() << "Tablespace size stored in header is "
        << tablespace_size_in_header << " pages, but the sum"
        " of data file sizes is only "
        << sum_of_data_file_sizes << " pages";

    if (srv_force_recovery == 0) {

        ib::error()
            << "Cannot start InnoDB. The tail of"
            " the system tablespace is"
            " missing. Have you edited"
            " innodb_data_file_path in my.cnf in an"
            " InnoDB: inappropriate way, removing"
            " ibdata files from there?"
            " You can set innodb_force_recovery=1"
            " in my.cnf to force"
            " InnoDB: a startup if you are trying to"
            " recover a badly corrupt database.";

        return(srv_init_abort(DB_ERROR));
    }
}

if (srv_print_verbose_log) {
    ib::info() << INNODB_VERSION_STR
        << " started; log sequence number "
        << srv_start_lsn;
}

if (srv_force_recovery > 0) {
    ib::info() << "!!! innodb_force_recovery is set to "
        << srv_force_recovery << " !!!";
}

if (srv_force_recovery == 0) {
    /* In the insert buffer we may have even bigger tablespace
    id's, because we may have dropped those tablespaces, but
    insert buffer merge has not had time to clean the records from
    the ibuf tree. */

    ibuf_update_max_tablespace_id();
}

if (!srv_read_only_mode) {
    if (create_new_db) {
        srv_buffer_pool_load_at_startup = FALSE;
    }

    /* Create the buffer pool dump/load thread */
    os_thread_create(buf_dump_thread, NULL, NULL);

    /* Create the dict stats gathering thread */
    os_thread_create(dict_stats_thread, NULL, NULL);

    /* Create the thread that will optimize the FTS sub-system. */
    fts_optimize_init();

    srv_start_state_set(SRV_START_STATE_STAT);
}

/* Create the buffer pool resize thread */
os_thread_create(buf_resize_thread, NULL, NULL);

srv_was_started = TRUE;
return(DB_SUCCESS);

}
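The purge branch at the start of this listing creates one coordinator thread (srv_purge_coordinator_thread) plus srv_n_purge_threads - 1 workers (srv_worker_thread), stores their ids at offset 5 + i + SRV_MAX_N_IO_THREADS in thread_ids, and then blocks in srv_start_wait_for_purge_to_start(). Below is a minimal C++ sketch of that launch-then-wait pattern using std::thread and std::promise. It is not InnoDB code: kMaxIoThreads, purge_slot and launch_purge_threads are hypothetical names, 130 is only a placeholder for SRV_MAX_N_IO_THREADS, and the sketch assumes "purge has started" simply means the coordinator has signalled that it is running.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// Placeholder for SRV_MAX_N_IO_THREADS (hypothetical value for this sketch).
constexpr std::size_t kMaxIoThreads = 130;

// Slot in the thread-id array used by purge thread i; i == 0 is the
// coordinator, matching the "5 + i + SRV_MAX_N_IO_THREADS" offsets above.
constexpr std::size_t purge_slot(std::size_t i) { return 5 + i + kMaxIoThreads; }

// Starts one coordinator and (n_purge - 1) workers, appending them to `out`,
// and returns only after the coordinator has reported that it is running.
// Returns the thread-id slots that were used, coordinator first.
std::vector<std::size_t> launch_purge_threads(std::size_t n_purge,
                                              std::size_t array_size,
                                              std::vector<std::thread>& out) {
    if (n_purge == 0) {
        throw std::invalid_argument("need at least the coordinator thread");
    }
    // Same bound as the ut_a() in the listing, checked before any thread
    // is created so nothing is left running on failure.
    if (array_size <= purge_slot(n_purge)) {
        throw std::out_of_range("thread id array too small");
    }

    std::vector<std::size_t> slots;
    std::promise<void> started;
    std::future<void> coordinator_running = started.get_future();

    // Coordinator: report "running" first; real code would then start purging.
    out.emplace_back([p = std::move(started)]() mutable { p.set_value(); });
    slots.push_back(purge_slot(0));

    // Workers: the coordinator already occupies index 0, so start at 1.
    for (std::size_t i = 1; i < n_purge; ++i) {
        out.emplace_back([] { /* a real worker would wait for purge tasks */ });
        slots.push_back(purge_slot(i));
    }

    // Plays the role of srv_start_wait_for_purge_to_start().
    coordinator_running.wait();
    return slots;
}
```

With four purge threads and the placeholder value, the coordinator lands in slot 135 and the workers in slots 136 to 138; the caller owns the threads and joins them later. A promise/future pair is used because its shared state outlives whichever side finishes first, so the coordinator may exit early without leaving the waiting thread holding a dangling object.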
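os_event_set(buf_flush_event) releases the page cleaner's main loop, which waits on that event. As far as I understand it, InnoDB's os_event is a mutex plus a condition variable that stays signalled until it is explicitly reset. The sketch below is a simplified C++ manual-reset event of that kind. The class name Event is hypothetical, and the real os_event also returns a signal count from reset so that a waiter cannot miss a set that happens between reset and wait; this sketch leaves that out.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Manual-reset event: after set(), every current and future waiter passes
// through until reset() is called. (Hypothetical, simplified os_event.)
class Event {
public:
    void set() {
        std::lock_guard<std::mutex> lock(mu_);
        signaled_ = true;
        cv_.notify_all();  // wake everyone blocked in wait()/wait_for()
    }

    void reset() {
        std::lock_guard<std::mutex> lock(mu_);
        signaled_ = false;
    }

    bool is_set() {
        std::lock_guard<std::mutex> lock(mu_);
        return signaled_;
    }

    // Blocks until the event is set; the predicate guards against
    // spurious wake-ups.
    void wait() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return signaled_; });
    }

    // Like wait(), but gives up after `timeout`; returns whether it was set.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mu_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```

A cleaner-style thread would call wait() (or wait_for() to also wake up periodically), and the startup thread's set() plays the role of os_event_set(buf_flush_event) in the listing.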
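The two system tablespace size checks in the listing reduce to a small decision over five inputs: the summed data file size, the size recorded in the header, whether the last file can autoextend, read-only mode, and innodb_force_recovery. The function below restates that logic so the cases are easier to compare. SizeCheck and check_system_tablespace_size are hypothetical names; sizes are page counts, as in the log messages, and kWarn stands for "ib::error() is logged but startup continues".

```cpp
#include <cassert>
#include <cstdint>

// Outcome of the system tablespace size check (names are hypothetical).
enum class SizeCheck {
    kOk,     // nothing is logged
    kWarn,   // ib::error() is logged but startup continues
    kAbort,  // startup fails with srv_init_abort(DB_ERROR)
};

// Restates the two checks from the listing; sizes are in pages.
SizeCheck check_system_tablespace_size(std::uint64_t sum_of_data_file_sizes,
                                       std::uint64_t size_in_header,
                                       bool can_auto_extend_last_file,
                                       bool read_only_mode,
                                       unsigned force_recovery) {
    if (read_only_mode) {
        return SizeCheck::kOk;  // both checks are skipped in read-only mode
    }

    if (!can_auto_extend_last_file) {
        // Fixed-size files: any mismatch is reported...
        if (sum_of_data_file_sizes == size_in_header) {
            return SizeCheck::kOk;
        }
        // ...but only a missing tail is fatal without force recovery.
        if (force_recovery == 0 && sum_of_data_file_sizes < size_in_header) {
            return SizeCheck::kAbort;
        }
        return SizeCheck::kWarn;
    }

    // Autoextending last file: files larger than the header are not
    // reported; files smaller than the header mean the tail is missing.
    if (sum_of_data_file_sizes < size_in_header) {
        return force_recovery == 0 ? SizeCheck::kAbort : SizeCheck::kWarn;
    }
    return SizeCheck::kOk;
}
```

For example, if a fixed-size system tablespace has 768 pages recorded in its header (12 MiB at the default 16 KiB page size) but its data files add up to only 640 pages, startup aborts unless innodb_force_recovery is non-zero. If the files are larger than the header, only the error message is printed.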
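srv_start_state_set() records which groups of background threads have been started: SRV_START_STATE_PURGE after the purge threads, and SRV_START_STATE_STAT after the buffer pool dump, dict stats and FTS optimize threads. Presumably this lets shutdown, or an aborted startup, deal only with threads that actually exist. Assuming the state is kept as a bit mask, the idea can be sketched as follows. The enumerator values and the StartTracker class are illustrative and are not taken from the InnoDB headers.

```cpp
#include <cassert>
#include <cstdint>

// One bit per group of background threads (values are illustrative;
// other thread groups would get their own bits).
enum StartState : std::uint32_t {
    kStartNone  = 0,
    kStartPurge = 1u << 0,  // cf. SRV_START_STATE_PURGE
    kStartStat  = 1u << 1,  // cf. SRV_START_STATE_STAT
};

// Hypothetical tracker mirroring srv_start_state_set()-style bookkeeping.
class StartTracker {
public:
    // Marks a thread group as started; bits are only ever added.
    void set(StartState s) { state_ |= s; }

    // True if that thread group was started.
    bool is_set(StartState s) const { return (state_ & s) != 0; }

private:
    std::uint32_t state_ = kStartNone;
};
```

In the else branch of the listing, purge is marked PURGE_STATE_DISABLED and the purge bit is never set. In this model, a check on kStartPurge would therefore correctly report that no purge threads are running.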
