一次MYSQL频繁宕机的排查

背景

近期,兄弟部门反馈关于我们产品自带的MYSQL数据库的问题。他们指出,数据库在每天凌晨12点会出现宕机现象。这一现象也是最新一段才出现的,这段时间只是增加了一些自动化程序。频繁宕机的问题影响了兄弟部门的工作流程,也产生了一定的负面影响。为了解决这一问题,我们需要深入调查并给出解决措施。


排查过程

确认MYSQL所在的服务器资源情况

我们了解到这段时间做的变更中提到增加了一些自动化的程序,是不是自动化程序中涉及到操作数据库,造成系统资源不足,被系统杀掉?于是查看各类系统资源。无论是磁盘还是内存都有较大剩余,也查看了部分系统日志,均没有因为资源不足杀进程的日志,所以应该不是此原因导致宕机。

结论:不是因为系统资源导致MYSQL宕机

排查MYSQL错误日志

在my.ini中指定当mysql发生故障时的的错误日志地址,具体配置方式如下:

# start error log

log-error=mysql.err

mysql.err内容如下

2024-01-08  9:30:46 0 [Note] InnoDB: Uses event mutexes

2024-01-08  9:30:46 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

2024-01-08  9:30:46 0 [Note] InnoDB: Number of pools: 1

2024-01-08  9:30:46 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

2024-01-08  9:30:46 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)

2024-01-08  9:30:46 0 [Note] InnoDB: Using Linux native AIO

2024-01-08  9:30:46 0 [Note] InnoDB: Initializing buffer pool, total size = 536870912, chunk size = 134217728

2024-01-08  9:30:46 0 [Note] InnoDB: Completed initialization of buffer pool

2024-01-08  9:30:46 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=296938092837,296938109566

2024-01-08  9:30:47 0 [Note] InnoDB: Transaction 57863035 was in the XA prepared state.

2024-01-08  9:30:47 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

2024-01-08  9:30:47 0 [Note] InnoDB: Trx id counter is 57863036

2024-01-08  9:30:47 0 [Note] InnoDB: Starting final batch to recover 54726 pages from redo log.

2024-01-08  9:30:49 0 [Note] InnoDB: Last binlog file './mysql-bin.000078', position 111348635

2024-01-08  9:30:49 0 [Note] InnoDB: 128 rollback segments are active.

2024-01-08  9:30:49 0 [Note] InnoDB: Starting in background the rollback of recovered transactions

2024-01-08  9:30:49 0 [Note] InnoDB: Rollback of non-prepared transactions completed

2024-01-08  9:30:49 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"

2024-01-08  9:30:49 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2024-01-08  9:30:49 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2024-01-08  9:30:49 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.

2024-01-08  9:30:49 0 [Note] InnoDB: 10.5.11 started; log sequence number 297037605262; transaction id 57863037

2024-01-08  9:30:49 0 [Note] InnoDB: Loading buffer pool(s) from /home/hsrisk/disk01/see/mysql/data/ib_buffer_pool

2024-01-08  9:30:49 0 [Note] Plugin 'FEEDBACK' is disabled.

2024-01-08  9:30:49 0 [Note] Recovering after a crash using mysql-bin

2024-01-08  9:30:49 0 [Note] InnoDB: Buffer pool(s) load completed at 240108  9:30:49

2024-01-08  9:30:49 0 [Note] Starting crash recovery...

2024-01-08  9:30:49 0 [Note] InnoDB: Starting recovery for XA transactions...

2024-01-08  9:30:49 0 [Note] InnoDB: Transaction 57863035 in prepared state after recovery

2024-01-08  9:30:49 0 [Note] InnoDB: Transaction contains changes to 1 rows

2024-01-08  9:30:49 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2024-01-08  9:30:49 0 [Note] Found 1 prepared transaction(s) in InnoDB

2024-01-08  9:30:49 0 [Note] Crash recovery finished.

2024-01-08  9:30:49 0 [Note] Server socket created on IP: '::'.

2024-01-08  9:30:49 0 [Note] Reading of all Master_info entries succeeded

2024-01-08  9:30:49 0 [Note] Added new Master_info '' to hash table

2024-01-08  9:30:49 0 [Note] ./mysqld: ready for connections.

Version: '10.5.11-MariaDB-log'  socket: '/tmp/see20/mysql.sock'  port: 3306  MariaDB Server

这段日志是客户因为数据库宕机后手动重启的日志。这个日志中发现有Recovering after a crash using mysql-bin,Starting crash recovery,Crash recovery finished 等字样,表示数据库之前确实是异常关闭的,但是这种异常是MYSQL SERVER预料之外的。预料之外的含义是这种异常数据库未定义,如果是意料之中的异常的话,日志中会留下更多的信息,例如下图,虽然没写明是什么原因导致的数据库崩溃,但是他清楚的告诉了报错的地点以及保留了帮助定位的信息。

2023-09-17 21:36:12 0x7f384413a700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/lock/lock0lock.cc line 7014

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/

InnoDB: about forcing recovery.

230917 21:36:12 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.


To report this bug, see https://mariadb.com/kb/en/reporting-bugs


We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.


Server version: 10.3.10-MariaDB-log

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=322

max_threads=6002

thread_count=261

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 13325602 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.


Thread pointer: 0x7f35980009a8

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7f3844139e30 thread_stack 0x49000

mysys/stacktrace.c:270(my_print_stacktrace)[0x556c2c650b4b]

sql/signal_handler.cc:168(handle_fatal_signal)[0x556c2c13bba7]

sigaction.c:0(__restore_rt)[0x7f3849863680]

:0(__GI_raise)[0x7f384866f207]

:0(__GI_abort)[0x7f38486708f8]

ut/ut0rbt.cc:461(rbt_eject_node(ib_rbt_node_t*, ib_rbt_node_t*) [clone .part.4])[0x556c2bea97b9]

lock/lock0lock.cc:7014(DeadlockChecker::check_and_resolve(ib_lock_t const*, trx_t*))[0x556c2c319ec2]

lock/lock0lock.cc:1741(lock_rec_enqueue_waiting(ib_lock_t*, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*, lock_prdt*))[0x556c2c31aaf4]

lock/lock0lock.cc:1990(lock_rec_lock(bool, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*))[0x556c2c31df48]

row/row0sel.cc:1284(sel_set_rec_lock(btr_pcur_t*, unsigned char const*, dict_index_t*, unsigned long const*, unsigned long, unsigned long, que_thr_t*, mtr_t*))[0x556c2c3b783c]

row/row0sel.cc:5047(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x556c2c3b8a10]

handler/ha_innodb.cc:9264(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function))[0x556c2c2e11fc]

sql/handler.cc:2798(handler::ha_rnd_pos(unsigned char*, unsigned char*))[0x556c2c140c82]

sql/handler.h:3568(handler::rnd_pos_by_record(unsigned char*))[0x556c2c14855b]

sql/sql_class.h:6384(handler::ha_rnd_pos_by_record(unsigned char*))[0x556c2c219170]

sql/log_event.cc:14265(Update_rows_log_event::do_exec_row(rpl_group_info*))[0x556c2c2194f4]

sql/log_event.cc:11514(Rows_log_event::do_apply_event(rpl_group_info*))[0x556c2c20ddcc]

sql/log_event.h:5159(Log_event::apply_event(rpl_group_info*))[0x556c2bef5c5c]

sql/slave.cc:3920(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x556c2bee83db]

sql/mysqld.h:808(thread_safe_increment64)[0x556c2c09073e]

sql/rpl_parallel.cc:1300(handle_rpl_parallel_thread)[0x556c2c0931fb]

pthread_create.c:0(start_thread)[0x7f384985bdd5]

*** stack smashing detected ***: /home/hundsun/mysql/mysql/bin/mysqld terminated

结论:是异常宕机,并且该异常不是系统定义,无法被MYSQL SERVER所捕获。


排查MYSQL BINLOG日志

在my.ini中指定当开启binlog日志,具体配置方式如下:

log-bin=mysql-bin  #开启binlog

重启MYSQL后会创建如下格式的binlog文件

mysql-bin.000001

mysql-bin.000002

MYSQL生成的binlog是二进制文件,无法直接查看。可以使用mysqlbinlog工具将binlog二进制数据导出进行查看,例如

./mysqlbinlog ../data/mysql-bin.000004 --start-datetime='2020-02-12 23:30:00' --stop-datetime='2025-02-12:23:50:00' --base64-output=DECODE-ROWS  > 4.txt

可以按照时间维度将数据导出到文件中进行查看。下面是2个binlog的导出数据


mysql-bin.000041这个文件有203M,最后的修改时间是2023-12-24 23:59:15。此binlog导出文件的末尾数据如下

#231224 23:59:15 server id 1  end_log_pos 83409660 CRC32 0x595aa790 Table_map: `acm`.`tb_lock` mapped to number 58

# at 83409660

#231224 23:59:15 server id 1  end_log_pos 83410156 CRC32 0xed3ad18a Update_rows: table id 58 flags: STMT_END_F

# Number of rows: 1

# at 83410156

#231224 23:59:15 server id 1  end_log_pos 83410187 CRC32 0x539c3bfa Xid = 1904727

COMMIT/*!*/;

DELIMITER ;

# End of log file

ROLLBACK /* added by mysqlbinlog */;

/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

mysql-bin.000042这个文件有1G,最后的修改时间是2023-12-25 19:03:03。此binlog导出文件的末尾数据如下

还有 27% 的精彩内容
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
禁止转载,如需转载请通过简信或评论联系作者。
支付 ¥10.00 继续阅读
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 216,125评论 6 498
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,293评论 3 392
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 162,054评论 0 351
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,077评论 1 291
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,096评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,062评论 1 295
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,988评论 3 417
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,817评论 0 273
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,266评论 1 310
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,486评论 2 331
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,646评论 1 347
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,375评论 5 342
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,974评论 3 325
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,621评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,796评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,642评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,538评论 2 352

推荐阅读更多精彩内容