这两天遇到mysql宕机的问题,导致数据出现不一致的现象,结果发现有个特别重要的参数innodb_flush_log_at_trx_commit
特别需要关注,默认情况下这个参数是1,即严格保证数据库的一致性,事务在提交之后立即将log buffer的数据写入到log file,同时调用文件系统的flush操作。
仔细分析一下mysql和文件系统的关系,基本上分为write和flush两个主要的操作,mysql管理自己的log buffer,文件系统管理log file,mysql事务提交后会调用write方法将数据从log buffer写入到log file,即持久化;学过操作系统的同学都知道文件系统为了提高IO的效率,本身会对每个文件做缓存,所以要想保证数据真正落地到磁盘上,有时会多一步flush操作,这个flush操作可能是文件系统自己flush,也可能是mysql通过系统调用强制让文件系统去flush。
但是出现故障的时候innodb_flush_log_at_trx_commit
参数的设置为2,根据下面文档的解释,为2时mysql会在事务提交之后立即向文件系统写入log file,但并不会立即调用文件系统的flush操作,而是由定时任务调度每隔1秒flush一次,这就会引发数据丢失问题:当操作系统挂掉时,恰好文件系统自己也没有调用flush操作,那么那部分写入到文件缓存的数据就会丢失。如果mysql自己挂掉,由于log buffer中的数据已经写入文件系统,只要文件系统不挂掉,数据还是在的。
The default value of 1 is required for full ACID compliance. With this value, the contents of the InnoDB
log buffer are written out to thelog file at each transaction commit and the log file is flushed to disk.
With a value of 0, the contents of the InnoDB
log buffer are written to the log file approximately once per second and the log file is flushed to disk. No writes from the log buffer to the log file are performed at transaction commit. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions with any mysqld process crash.
With a value of 2, the contents of the InnoDB
log buffer are written to the log file after each transaction commit and the log file is flushed to disk approximately once per second. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions in an operating system crash or a power outage.
翻开mysql的源代码,很好地解释了innodb_flush_log_at_trx_commit
为0、1、2时mysql是如何处理数据落地的问题的。innodb_flush_log_at_trx_commit
为0时,什么都不做,等着定时任务调度去每隔1秒write和flush,这种效率最高,但是丢失数据的风险最大,mysql进程挂掉数据就有可能丢到,更不用说操作系统挂掉;为1时事务提交后立即write和flush,能保证强一致,但IO效率很差;为2时事务提交后立即write,但是每秒flush一次,IO效率有所提升,但是仍有丢失数据的风险,大部分互联网应用都是这个值。
/**********************************************************************//**
If required, flushes the log to disk based on the value of
innodb_flush_log_at_trx_commit. */
static
void
trx_flush_log_if_needed_low(
/*========================*/
lsn_t lsn) /*!< in: lsn up to which logs are to be
flushed. */
{
switch (srv_flush_log_at_trx_commit) {
case 0:
/* Do nothing */
break;
case 1:
/* Write the log and optionally flush it to disk */
log_write_up_to(lsn, LOG_WAIT_ONE_GROUP,
srv_unix_file_flush_method != SRV_UNIX_NOSYNC);
break;
case 2:
/* Write the log but do not flush it to disk */
log_write_up_to(lsn, LOG_WAIT_ONE_GROUP, FALSE);
break;
default:
ut_error;
}
}
建议对数据一致性要求比较高的,还是把innodb_flush_log_at_trx_commit
设置为1,对一致性要求相对较弱的可以设置为2。