最近遇到一个问题,某个表在全表扫描的时候数据库crash,设置innodb_force_recovery=6后能够正常访问,但是访问的数据却少了很多。那么分析一下是否有这种可能性,我们以8036为例子。
一、关于page损坏的检测
如果ibdata,数据字典等信息正确,如果只是用户表的page损坏,我们的数据库通常是能够启动,在数据库启动的时候只会检测每个表的第一个page,如果出现错误依旧可以启动数据库,当物理读取访问到需要的page的时候,才进行检测,因此这种情况下通常是可以启动数据库的。这个检测通常分为如下一些步骤,
- 比对page头尾的LSN,如果LSN不一致则直接标记为损坏页
- 比对page的LSN和当前系统的LSN,如果page的LSN大于当前系统的LSN,则报错,但是不会crash
- 比对page头尾的CRC值,如果不相等直接标记为损坏页
- 比对page的CRC值和根据page信息计算出来的CRC值,如果不相等直接标记为损坏页
- 如果page损坏则会打印出page的详细信息,供分析用
如果page标记为损坏,则终止访问数据,数据库crash,但是如果innodb_force_recovery不等于0的情况下即便是损坏的页会返回给读取者,日志依旧会输出,这个时候由读取者自行判断,出现各种错误的可能性就太多了,我们对一些错误进行测试。
二、测试场景
1、主键page rec next offset 损坏
这里的rec next offset用于page内部读取下一条记录,存储在rec header中暂用2个字节,我们可以修改物理文件中某行数据的这个值,重启数据库后测试读取这个表,
- innodb_force_recovery=0,则直接访问损坏的page的直接crash
- innodb_force_recovery=6,可能跳过损坏的page的数据继续读取,数据部分丢失,但是访问完成,这似乎说明了如果page损坏的情况下是有可能读取到部分数据的,报错如下,
InnoDB: Page may be an index page where index id is 889
2025-06-10T11:10:43.519150+08:00 7 [Note] [MY-011907] [InnoDB] It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.
2025-06-10T11:10:43.519242+08:00 7 [Note] [MY-011860] [InnoDB] Record in index `PRIMARY` of table `test`.`testmul`, page [page id: space=334, page number=5], at offset 297
2025-06-10T11:10:43.519275+08:00 7 [ERROR] [MY-013049] [InnoDB] Field 1 len is 4, should be 6
2025-06-10T11:10:43.519303+08:00 7 [Note] [MY-012860] [InnoDB] Index corruption: rec offs 297 next offs 4393, page no 5, index `PRIMARY` of table `test`.`testmul`. We try to skip the record.
或者
2025-06-10T11:47:50.971865+08:00 7 [Note] [MY-011878] [InnoDB] Uncompressed page, stored checksum in field1 992061719, calculated checksums for field1: crc32 79074980/751448067, innodb 203637893, none 3735928559, stored checksum in field2 992061719, calculated checksums for field2: crc32 79074980/751448067, innodb 718119291, none 3735928559, page LSN 0 1046151093, low 4 bytes of LSN at page end 1046151093, page number (if stored to page already) 5, space id (if created with >= MySQL-4.1.1 and stored already) 335
InnoDB: Page may be an index page where index id is 890
2025-06-10T11:47:50.971928+08:00 7 [Note] [MY-011907] [InnoDB] It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.
2025-06-10T11:47:50.972007+08:00 7 [Note] [MY-012859] [InnoDB] Index corruption: rec offs 1562 next offs 0, page no 5, index `PRIMARY` of table `test`.`testmul`. We try to skip the rest of the page.
而访问的数据明显比正常的时候少了很一些记录如下,报错中页明显有了 skip record和skip page的报错。
正常访问
mysql> select count(*) from testmul;
+----------+
| count(*) |
+----------+
| 896 |
+----------+
1 row in set (0.02 sec)
模拟错误后
mysql> select count(*) from testmul;
+----------+
| count(*) |
+----------+
| 707 |
+----------+
1 row in set (0.03 sec)
- innodb_force_recovery=6,也可能依旧crash
2025-06-10T11:08:47.831881+08:00 7 [Note] [MY-011878] [InnoDB] Uncompressed page, stored checksum in field1 201425712, calculated checksums for field1: crc32 865516675/2541663936, innodb 875666169, none 3735928559, stored checksum in field2 201425712, calculated checksums for field2: crc32 865516675/2541663936, innodb 1884839653, none 3735928559, page LSN 0 1045854495, low 4 bytes of LSN at page end 1045854495, page number (if stored to page already) 5, space id (if created with >= MySQL-4.1.1 and stored already) 334
InnoDB: Page may be an index page where index id is 889
2025-06-10T11:08:47.831969+08:00 7 [Note] [MY-011907] [InnoDB] It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.
2025-06-10T11:08:47.832279+08:00 7 [ERROR] [MY-013183] [InnoDB] Assertion failure: rec.cc:384 thread 140736649066240
2、主键page next page 损坏
每个page的FIL HEADER中,有next page和prev page的信息,用于在innodb B+树中将page链接为双向链表,分别占用4字节。同样修改物理文件中的next page值,重启数据库后测试读取这个表,
- innodb_force_recovery=0,则直接访问损坏的page的直接crash
- innodb_force_recovery=6,则直接访问损坏的page的直接crash,报错如下
2025-06-10T11:40:48.555629+08:00 7 [Note] [MY-011878] [InnoDB] Uncompressed page, stored checksum in field1 992061719, calculated checksums for field1: crc32 2105446543/1662032065, innodb 466328039, none 3735928559, stored checksum in field2 992061719, calculated checksums for field2: crc32 2105446543/1662032065, innodb 24260494, none 3735928559, page LSN 0 1046151093, low 4 bytes of LSN at page end 1046151093, page number (if stored to page already) 5, space id (if created with >= MySQL-4.1.1 and stored already) 335
InnoDB: Page may be an index page where index id is 890
2025-06-10T11:40:48.555691+08:00 7 [Note] [MY-011907] [InnoDB] It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.
2025-06-10T11:40:48.556066+08:00 7 [ERROR] [MY-013183] [InnoDB] Assertion failure: btr0pcur.cc:336:btr_page_get_prev(next_page, mtr) == get_block()->page.id.page_no() thread 140736649066240
3、page LSN 比当前系统LSN更新
这种情况下虽然报错了,但是依旧能够读取到数据
2025-06-10T16:07:48.160501+08:00 9 [ERROR] [MY-011971] [InnoDB] Tablespace 'mysql' Page [page id: space=4294967294, page number=176] log sequence number 1048872667 is in the future! Current system log sequence number 1048872666.
2025-06-10T16:07:48.160598+08:00 9 [ERROR] [MY-011972] [InnoDB] Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB redo log files. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.
4、二级索引page损坏
这种情况下如果使用全表扫描并不会导致数据库crash,但是如果使用到索引损坏的page则可能导致数据库crash,这种情况下我们可以考虑删除所有的二级索引,重建索引即可。
5、启动检测第一个page损坏
一个tablespace的page 0 包含了fsp 信息,其中有大量关于extent和inode信息,同时也包含详细的256 extent的信息,大概如下
page 0独有 fsp header: offset FIL_PAGE_DATA fsp_header_t 也就是fsp header
FSP_HEADER_OFFSET = FIL_PAGE_DATA(38)
constexpr uint32_t FSP_SPACE_ID = 0; //表空间ID
constexpr uint32_t FSP_NOT_USED = 4; //没有使用
constexpr uint32_t FSP_SIZE = 8; //总共的page数量
constexpr uint32_t FSP_FREE_LIMIT = 12; //初始化的limit
constexpr uint32_t FSP_SPACE_FLAGS = 16; //
constexpr uint32_t FSP_FRAG_N_USED = 20; //在碎片extent链表中使用的page数量,应该是碎片extent中使用的page数量
constexpr uint32_t FSP_FREE = 24; //链表 free extent 再分配给segment(inode) 之前 那么分配extent 还是从fsp header中获取的
constexpr uint32_t FSP_FREE_FRAG = 24 + FLST_BASE_NODE_SIZE; //这里是根节点地址 4 + 2 * FIL_ADDR_SIZE (4+2*6 = 16) 链表 不属于任何 segment,被共享使用,碎片extent
constexpr uint32_t FSP_FULL_FRAG = 24 + 2 * FLST_BASE_NODE_SIZE; //这里是跟节点地址 4 + 2 * FIL_ADDR_SIZE (4+2*6 = 16) 链表 不属于任何 segment,被共享使用,碎片extent
constexpr uint32_t FSP_SEG_ID = 24 + 3 * FLST_BASE_NODE_SIZE; //当前segment id的最大值
constexpr uint32_t FSP_SEG_INODES_FULL = 32 + 3 * FLST_BASE_NODE_SIZE; //满inode页面的链表 inode page --> inode page 分配segment的时候需要先从inode 链表中获取inode 才能得到segment ID,如果空闲的可以用空闲的
constexpr uint32_t FSP_SEG_INODES_FREE = 32 + 4 * FLST_BASE_NODE_SIZE; //半满inode页面的链表 inode page ---> inode page 一个inode page 大概能容纳85的inode(segment ) 信息,如果分配完了需要新分配
constexpr uint32_t FSP_HEADER_SIZE = 32 + 5 * FLST_BASE_NODE_SIZE; 122 (16*5=80+32=112)
offset 112+38
256个 XDES
XDES[0]
XDES_ID 0 8 //segment id
XDES_FLST_NODE 8 12 2 * FIL_ADDR_SIZE 12
XDES_STATE (FLST_NODE_SIZE + 8) 20 4 //XDES_NOT_INITED、XDES_FREE、XDES_FREE_FRAG、XDES_FULL_FRAG、XDES_FSEG、XDES_FSEG_FRAG 中的一种
XDES_BITMAP (FLST_NODE_SIZE + 12) 24 16 //16字节 128位 每2位 表示一个page的状态 管理64个page
XDES[1]
...
XDES[255]
我们可以随意修改page 0的信息,这样CRC检测都会出错,报错如下,数据库可以启动,但是这个tablespace 已经不加载了(Datafile::validate_first_page),如果有大量的数据文件可能这个过程有点耗时。
)
2025-06-10T14:36:55.830045+08:00 1 [ERROR] [MY-012224] [InnoDB] Checksum mismatch in datafile: ./test/testmul.ibd, Space ID:335, Flags: 16417. Please refer to http://dev.mysql.com/doc/refman/8.0/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2025-06-10T14:36:55.830204+08:00 1 [ERROR] [MY-012592] [InnoDB] Operating system error number 2 in a file operation.
2025-06-10T14:36:55.830266+08:00 1 [ERROR] [MY-012593] [InnoDB] The error means the system cannot find the path specified.
2025-06-10T14:36:55.830310+08:00 1 [ERROR] [MY-012131] [InnoDB] Could not find a valid tablespace file for `test/testmul`. Please refer to http://dev.mysql.com/doc/refman/8.0/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2025-06-10T14:36:55.830356+08:00 1 [Note] [MY-012355] [InnoDB] Tablespace 335, name 'test/testmul', unable to open file './test/testmul.ibd' - Data structure corruption
2025-06-10T14:36:55.830561+08:00 1 [Note] [MY-012356] [InnoDB] Scanned 83 tablespaces. Validated 82. Found 1 missing.
访问表不能访问
mysql> select * from testmul;
ERROR 1812 (HY000): Tablespace is missing for table `test`.`testmul`.
三、备份
buf_page_io_complete
->auto buf_pool = buf_pool_from_bpage(bpage);
获取page所在的 buffer 实例
->bool uncompressed = (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE)
是否为非压缩page
->const auto io_type = bpage->is_io_fix_read_as_opposed_to_write() ? BUF_IO_READ : BUF_IO_WRITE
获取IO类型,因为本处为某个page完成IO后修改type的地方,因此不需要加锁
->flush_type = buf_page_get_flush_type(bpage)
BUF_FLUSH_LRU/BUF_FLUSH_LIST/BUF_FLUSH_SINGLE_PAGE/BUF_FLUSH_N_TYPES
->if (io_type == BUF_IO_READ)
->if (bpage->size.is_compressed()) {
如果是压缩page 不考虑
->else
->frame = reinterpret_cast<buf_block_t *>(bpage)->frame
获取page的frame
->read_page_no = mach_read_from_4(frame + FIL_PAGE_OFFSET)
从fil header中FIL_PAGE_OFFSET获取page no
->read_space_id = mach_read_from_4(frame + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID);
从fil header中FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID获取space id
->BlockReporter reporter = BlockReporter(true, frame, bpage->size,fsp_is_checksum_disabled(bpage->id.space()))
m_check_lsn ture
m_read_buf frame
m_page_size bpage->size
m_skip_checksum fsp_is_checksum_disabled(bpage->id.space())
->is_corrupted = reporter.is_corrupted();
->memcmp(m_read_buf + FIL_PAGE_LSN + 4,m_read_buf + m_page_size.logical() - FIL_PAGE_END_LSN_OLD_CHKSUM + 4,4)
比对头部和尾部的LSN
-> return (true);
如果不相等直接返回错误
->buf_page_lsn_check(m_check_lsn, m_read_buf)
->if (check_lsn && recv_lsn_checks_on)
->const lsn_t page_lsn = mach_read_from_8(read_buf + FIL_PAGE_LSN)
从fil header获取page最新一次修改的lsn
->current_lsn = log_get_lsn(*log_sys)
获取当前的系统的lsn
->if (current_lsn < page_lsn)
如果page 的lsn大于了当前的lsn
->space_id = mach_read_from_4(read_buf + FIL_PAGE_SPACE_ID)
获取fil header 中的space id
->const page_no_t page_no = mach_read_from_4(read_buf + FIL_PAGE_OFFSET)
获取fil header 中的page_no
->auto space = fil_space_get(space_id);
->auto shard = fil_system->shard_by_id(space_id);
获取某个instance,然后上shard锁获取file space结构
->fil_space_t *space = shard->get_space_by_id(space_id)
->ib::error(ER_IB_MSG_146) << "Tablespace '" << space->name << "'" << " Page " << page_id_t(space_id, page_no) << " log sequence number "
<< page_lsn << " is in the future! Current system" << " log sequence number " << current_lsn << ".";
触发错误日志,但是虽然触发了错误日志但是并没有
返回错误
->ib::error(ER_IB_MSG_147)<< "Your database may be corrupt or you may have copied the InnoDB"<< " tablespace but not the InnoDB redo log files. "
<< FORCE_RECOVERY_MSG;
->if (srv_checksum_algorithm == SRV_CHECKSUM_ALGORITHM_NONE || m_skip_checksum)
如果 innodb_checksum_algorithm 设置为NONE 则不进行CRC32计算校验
->return (false);
直接返回false 不需要检测了
->const auto checksum_field1 = mach_read_from_4(m_read_buf + FIL_PAGE_SPACE_OR_CHKSUM)
获取fil header中的checksum
->const auto checksum_field2 = mach_read_from_4(m_read_buf + m_page_size.logical() - FIL_PAGE_END_LSN_OLD_CHKSUM)
获取尾部的checksum
->const page_id_t page_id(mach_read_from_4(m_read_buf + FIL_PAGE_SPACE_ID), mach_read_from_4(m_read_buf + FIL_PAGE_OFFSET))
合并space id和 page no 为page_id
->const srv_checksum_algorithm_t curr_algo = static_cast<srv_checksum_algorithm_t>(srv_checksum_algorithm)
获取当前的校验算法 默认为crc32 innodb_checksum_algorithm
->switch (curr_algo)
case SRV_CHECKSUM_ALGORITHM_CRC32:
case SRV_CHECKSUM_ALGORITHM_STRICT_CRC32:
如果为crc32 算法
->if (is_checksum_valid_crc32(checksum_field1, checksum_field2, curr_algo, false))
这里首先检查checksum_field1和checksum_field2是否相等,
然后校验 frame中计算的crc32是否和checksum_field1匹配
->if (checksum_field1 != checksum_field2)
-> return (false)
如果不相等直接返回
->const auto crc32 = buf_calc_page_crc32(m_read_buf, use_legacy_big_endian)
use_legacy_big_endian为false
ut_crc32_init用于确定使用的crc32算法
->return (checksum_field1 == crc32)
检查page中的crc32和计算的crc32是否相等
->如果检查失败
print_crc32_fail
"Fail; page %" PRIuMAX " invalid (fails crc32 checksum)
报错page无效
->return (true)
返回page损坏 true
-> if (!compressed_page)
如果是非压缩页,进行报错
->报错
Database page corruption on disk or a failed file read of page
You may have to recover from a backup
->buf_page_print(frame, bpage->size, BUF_PAGE_PRINT_NO_CRASH)
打印page的内容
->报错
It is also possible that your operating system has corrupted
its own file cache and rebooting your computer removes the error.
If the corrupt page is an index page.You can also try to fix the
corruption by dumping, dropping,and reimporting the corrupt table.
You can use CHECK TABLE to scan your table for corruption.
->if (srv_force_recovery < SRV_FORCE_IGNORE_CORRUPT)
如果没有调整srv_force_recovery参数,这里就进行
返回false了,若设置了force recovery则继续
但是读取的数据可能有问题,导致读取page的时候
也会报各种错误