服务器异常重启MYSQL数据库异常处理

in 互联网技术 with 0 comment  访问: 5,354 次

问题分析解决

昨天突然服务器重启了,最后导致的就是Zabbix的数据库MYSQL库表坏了,然后MYSQL就启动不了了。启动不了咋整,看log呗,报什么异常情况,查看error如下:

2017-09-21 14:41:18 4255 [Note] InnoDB: The InnoDB memory heap is disabled
2017-09-21 14:41:18 4255 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-09-21 14:41:18 4255 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-09-21 14:41:18 4255 [Note] InnoDB: CPU does not support crc32 instructions
2017-09-21 14:41:18 4255 [Note] InnoDB: Using Linux native AIO
2017-09-21 14:41:18 4255 [Note] InnoDB: Initializing buffer pool, size = 256.0M
2017-09-21 14:41:18 4255 [Note] InnoDB: Completed initialization of buffer pool
2017-09-21 14:41:18 4255 [Note] InnoDB: Highest supported file format is Barracuda.
2017-09-21 14:41:18 4255 [Note] InnoDB: Log scan progressed past the checkpoint lsn 491181006779
2017-09-21 14:41:18 4255 [Note] InnoDB: Database was not shutdown normally!
2017-09-21 14:41:18 4255 [Note] InnoDB: Starting crash recovery.
2017-09-21 14:41:18 4255 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-09-21 14:41:18 4255 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace mysql/slave_master_info uses space ID: 4 at filepath: ./mysql/slave_master_info.ibd. Cannot open tablespace zabbix/groups which uses space ID: 4 at filepath: ./zabbix/groups.ibd
2017-09-21 14:41:18 7fa123271720  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: Error: could not open single-table tablespace file ./zabbix/groups.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
InnoDB: To fix the problem and start mysqld:
InnoDB: 1) If there is a permission problem in the file and mysqld cannot
InnoDB: open the file, you should modify the permissions.
InnoDB: 2) If the table is not needed, or you can restore it from a backup,
InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
InnoDB: crash recovery and ignore that table.
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
170921 14:41:18 mysqld_safe mysqld from pid file /data/appData/mysql/zabbix_server.pid ended
170921 14:43:16 mysqld_safe Starting mysqld daemon with databases from /data/appData/mysql
2017-09-21 14:43:16 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-09-21 14:43:16 4985 [Note] Plugin 'FEDERATED' is disabled.
2017-09-21 14:43:16 7fe0b7d51720 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.

从log中可以看出来,一些InnoDB 表的表空间都有问题,log还明确的告诉你了you can set innodb_force_recovery > 0 in my.cnf and force InnoDB to continue crash recovery here.

那就试试呗,在/etc/my.cnf[mysqld]下增加 innodb_force_recovery = 1 试试,添加完成后,果然MYSQL启动ok了。

启动ok了,但是凭经验应该会发现好多表,是非ok状态的,具体情况如下:

mysql> check table groups;
+---------------+-------+----------+---------------------------------------------------------+
| Table         | Op    | Msg_type | Msg_text                                                |
+---------------+-------+----------+---------------------------------------------------------+
| zabbix.groups | check | Warning  | InnoDB: Tablespace is missing for table 'zabbix/groups' |
| zabbix.groups | check | Error    | Table 'zabbix.groups' doesn't exist                     |
| zabbix.groups | check | status   | Operation failed                                        |
+---------------+-------+----------+---------------------------------------------------------+
3 rows in set (0.00 sec)

果不其然,就拿上面的groups表状态来说,就是缺少表空间,那咋办,看看repair能够修复不:

mysql> repair table groups;
+---------------+--------+----------+---------------------------------------------------------+
| Table         | Op     | Msg_type | Msg_text                                                |
+---------------+--------+----------+---------------------------------------------------------+
| zabbix.groups | repair | Warning  | InnoDB: Tablespace is missing for table 'zabbix/groups' |
| zabbix.groups | repair | Error    | Table 'zabbix.groups' doesn't exist                     |
| zabbix.groups | repair | status   | Operation failed                                        |
+---------------+--------+----------+---------------------------------------------------------+
3 rows in set (0.00 sec)

但是不起作用啊,只能再次看看log然后分析问题了,log内容如下:

InnoDB: Error: could not open single-table tablespace file ./zabbix/groups.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.

还是跟之前一样,没辙只能上Google找找资料看看了;从网上看到差不多的文章说需要设置如下:

innodb_force_recovery = 6
innodb_purge_thread = 1

先不管什么意思了,先配置上看看能不能修复表了,配置之后果然服务也启动起来了,表也修复好了,具体check table结果如下:

zabbix.acknowledges check   status  OK
zabbix.actions  check   status  OK
zabbix.alerts   check   status  OK
zabbix.application_discovery    check   status  OK
zabbix.application_prototype    check   status  OK
zabbix.application_template check   status  OK
zabbix.applications check   status  OK
zabbix.auditlog check   status  OK
zabbix.auditlog_details check   status  OK
zabbix.autoreg_host check   status  OK
zabbix.conditions   check   status  OK
zabbix.config   check   status  OK
zabbix.corr_condition   check   status  OK
zabbix.corr_condition_group check   status  OK
zabbix.corr_condition_tag   check   status  OK
zabbix.corr_condition_tagpair   check   status  OK
zabbix.corr_condition_tagvalue  check   status  OK
zabbix.corr_operation   check   status  OK
zabbix.correlation  check   status  OK
zabbix.dbversion    check   status  OK
zabbix.dchecks  check   status  OK
zabbix.dhosts   check   status  OK
zabbix.drules   check   status  OK
zabbix.dservices    check   status  OK
zabbix.escalations  check   status  OK
zabbix.event_recovery   check   status  OK
zabbix.event_tag    check   status  OK
zabbix.events   check   status  OK
zabbix.expressions  check   status  OK
zabbix.functions    check   status  OK
zabbix.globalmacro  check   status  OK
zabbix.globalvars   check   status  OK
zabbix.graph_discovery  check   status  OK
zabbix.graph_theme  check   status  OK
zabbix.graphs   check   status  OK
zabbix.graphs_items check   status  OK
zabbix.group_discovery  check   status  OK
zabbix.group_prototype  check   status  OK
zabbix.groups   check   status  OK
zabbix.history  check   status  OK
zabbix.history_log  check   status  OK
zabbix.history_str  check   status  OK
zabbix.history_text check   status  OK
zabbix.history_uint check   status  OK
zabbix.host_discovery   check   status  OK
zabbix.host_inventory   check   status  OK
zabbix.hostmacro    check   status  OK
zabbix.hosts    check   status  OK
zabbix.hosts_groups check   status  OK
zabbix.hosts_templates  check   status  OK
zabbix.housekeeper  check   status  OK
zabbix.httpstep check   status  OK
zabbix.httpstepitem check   status  OK
zabbix.httptest check   status  OK
zabbix.httptestitem check   status  OK
zabbix.icon_map check   status  OK
zabbix.icon_mapping check   status  OK
zabbix.ids  check   status  OK
zabbix.images   check   status  OK
zabbix.interface    check   status  OK
zabbix.interface_discovery  check   status  OK
zabbix.item_application_prototype   check   status  OK
zabbix.item_condition   check   status  OK
zabbix.item_discovery   check   status  OK
zabbix.items    check   status  OK
zabbix.items_applications   check   status  OK
zabbix.maintenances check   status  OK
zabbix.maintenances_groups  check   status  OK
zabbix.maintenances_hosts   check   status  OK
zabbix.maintenances_windows check   status  OK
zabbix.mappings check   status  OK
zabbix.media    check   status  OK
zabbix.media_type   check   status  OK
zabbix.opcommand    check   status  OK
zabbix.opcommand_grp    check   status  OK
zabbix.opcommand_hst    check   status  OK
zabbix.opconditions check   status  OK
zabbix.operations   check   status  OK
zabbix.opgroup  check   status  OK
zabbix.opinventory  check   status  OK
zabbix.opmessage    check   status  OK
zabbix.opmessage_grp    check   status  OK
zabbix.opmessage_usr    check   status  OK
zabbix.optemplate   check   status  OK
zabbix.problem  check   status  OK
zabbix.problem_tag  check   status  OK
zabbix.profiles check   status  OK
zabbix.proxy_autoreg_host   check   status  OK
zabbix.proxy_dhistory   check   status  OK
zabbix.proxy_history    check   status  OK
zabbix.regexps  check   status  OK
zabbix.rights   check   status  OK
zabbix.screen_user  check   status  OK
zabbix.screen_usrgrp    check   status  OK
zabbix.screens  check   status  OK
zabbix.screens_items    check   status  OK
zabbix.scripts  check   status  OK
zabbix.service_alarms   check   status  OK
zabbix.services check   status  OK
zabbix.services_links   check   status  OK
zabbix.services_times   check   status  OK
zabbix.sessions check   status  OK
zabbix.slides   check   status  OK
zabbix.slideshow_user   check   status  OK
zabbix.slideshow_usrgrp check   status  OK
zabbix.slideshows   check   status  OK
zabbix.sysmap_element_url   check   status  OK
zabbix.sysmap_url   check   status  OK
zabbix.sysmap_user  check   status  OK
zabbix.sysmap_usrgrp    check   status  OK
zabbix.sysmaps  check   status  OK
zabbix.sysmaps_elements check   status  OK
zabbix.sysmaps_link_triggers    check   status  OK
zabbix.sysmaps_links    check   status  OK
zabbix.task check   status  OK
zabbix.task_close_problem   check   status  OK
zabbix.timeperiods  check   status  OK
zabbix.trends   check   status  OK
zabbix.trends_uint  check   status  OK
zabbix.trigger_depends  check   status  OK
zabbix.trigger_discovery    check   status  OK
zabbix.trigger_tag  check   status  OK
zabbix.triggers check   status  OK
zabbix.users    check   status  OK
zabbix.users_groups check   status  OK
zabbix.usrgrp   check   status  OK
zabbix.valuemaps    check   status  OK

统计了一下Zabbix库下的127张表,状态都是OK的,那目前来看是没有问题了,手动查询了groups、history_text等几个表查询都是没有问题的。

: 在check table的时候,遇到大数据的表会比较慢,耐心等待即可!

看起来基本没有问题了,那咱就先把Zabbix Server启动起来看看吧,然后用tail命令MYSQL的error日志中还会有什么异常情况。

麻蛋,发现还有错误,日记如下:

2017-09-21 15:59:21 7f31b1d29700 InnoDB: Error: Table "mysql"."innodb_table_stats" not found.
2017-09-21 15:59:21 7f31b1d29700 InnoDB: Recalculation of persistent statistics requested for table "zabbix"."escalations" but the required persistent statistics storage is not present or is corrupted. Using transient stats instead.
2017-09-21 15:59:31 7f31b1d29700 InnoDB: Error: Table "mysql"."innodb_table_stats" not found.
2017-09-21 15:59:31 7f31b1d29700 InnoDB: Recalculation of persistent statistics requested for table "zabbix"."housekeeper" but the required persistent statistics storage is not present or is corrupted. Using transient stats instead.
2017-09-21 15:59:49 7f31b6933700  InnoDB: Error: table `mysql`.`innodb_table_stats` does not exist in the InnoDB internal
InnoDB: data dictionary though MySQL is trying to drop it.
InnoDB: Have you copied the .frm file of the table to the
InnoDB: MySQL database directory from another database?
InnoDB: You can look for further help from
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html
2017-09-21 15:31:05 7f31b68b1700 InnoDB: Error: Table "mysql"."innodb_index_stats" not found.
2017-09-21 15:31:05 7f31b68b1700 InnoDB: Error: Fetch of persistent statistics requested for table "zabbix"."problem_tag" but the required system tables mysql.innodb_table_stats and mysql.innodb_index_stats are not present or have unexpected structure. Using transient stats instead.
2017-09-21 16:00:04 8996 [Warning] InnoDB: Cannot open table mysql/slave_relay_log_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
2017-09-21 16:00:04 8996 [Warning] InnoDB: Cannot open table mysql/slave_master_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
2017-09-21 16:00:04 8996 [Warning] InnoDB: Cannot open table mysql/slave_worker_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.

仔细看看都是一些系统表的报错innodb_table_stats,innodb_index_stats,slave_master_info,slave_worker_info 等表的报错。

这种情况之前遇到过,需要清理系统表数据和删除表,然后导入mysql_system_tables.sql即可,操作如下:

1、登录数据库,进入mysql库,执行如下SQL删除5张表

mysql> use mysql;
mysql> drop table if exists innodb_index_stats; 
mysql> drop table if exists innodb_table_stats; 
mysql> drop table if exists slave_master_info; 
mysql> drop table if exists slave_relay_log_info; 
mysql> drop table if exists slave_worker_info;

执行完后,可以用show tables查看一下,看表的数据是否已经比删除之前减少了,如果减少了,说明你成功了!

记住,一定要是drop table if exists

2、停止数据库,进入到数据库数据文件所在目录,删除上面5个表所对应的idb文件

/etc/init.d/mysqld stop
cd /data/appData/mysql/mysql
rm -rf innodb_index_stats.ibd innodb_table_stats.ibd slave_master_info.ibd slave_relay_log_info.ibd slave_worker_info.ibd

3、重新启动数据库,进入到mysql库,重建上面被删除的表结构
数据库的建表脚本在mysql软件的安装目录的share目录下或者mysql的安装包的script目录下,我的mysql软件的安装路径为/data/app/mysql-3307/

# /etc/init.d/mysqld start
# mysql -uxx -poo -P 3307
mysql> use mysql;
mysql> source /data/app/mysql-3307/share/mysql/mysql_system_tables.sql;
mysql> show tables;
+---------------------------+
| Tables_in_mysql           |
+---------------------------+
| columns_priv              |
| db                        |
| event                     |
| func                      |
| general_log               |
| help_category             |
| help_keyword              |
| help_relation             |
| help_topic                |
| innodb_index_stats        |
| innodb_table_stats        |
| ndb_binlog_index          |
| plugin                    |
| proc                      |
| procs_priv                |
| proxies_priv              |
| servers                   |
| slave_master_info         |
| slave_relay_log_info      |
| slave_worker_info         |
| slow_log                  |
| tables_priv               |
| time_zone                 |
| time_zone_leap_second     |
| time_zone_name            |
| time_zone_transition      |
| time_zone_transition_type |
| user                      |
+---------------------------+
28 rows in set (0.00 sec)

mysql> desc innodb_table_stats;
+--------------------------+---------------------+------+-----+-------------------+-----------------------------+
| Field                    | Type                | Null | Key | Default           | Extra                       |
+--------------------------+---------------------+------+-----+-------------------+-----------------------------+
| database_name            | varchar(64)         | NO   | PRI | NULL              |                             |
| table_name               | varchar(64)         | NO   | PRI | NULL              |                             |
| last_update              | timestamp           | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| n_rows                   | bigint(20) unsigned | NO   |     | NULL              |                             |
| clustered_index_size     | bigint(20) unsigned | NO   |     | NULL              |                             |
| sum_of_other_index_sizes | bigint(20) unsigned | NO   |     | NULL              |                             |
+--------------------------+---------------------+------+-----+-------------------+-----------------------------+
6 rows in set (0.00 sec)

说明表都正常了,再次查看mysql报错日志,就会发现没有了关于这系统表的报错日志,到这里就所有的故障和错误都处理好了。

分析总结

MYSQL参数之innodb_force_recovery
innodb_force_recovery影响整个InnoDB存储引擎的恢复状况。默认为0,表示当需要恢复时执行所有的.

innodb_force_recovery可以设置为1-6,大的数字包含前面所有数字的影响。当设置参数值大于0后,可以对表进行select,create,drop操作,但insert,update或者delete这类操作是不允许的。

  1. (SRV_FORCE_IGNORE_CORRUPT):忽略检查到的corrupt页。
  2. (SRV_FORCE_NO_BACKGROUND):阻止主线程的运行,如主线程需要执行full purge操作,会导致crash。
  3. (SRV_FORCE_NO_TRX_UNDO):不执行事务回滚操作。
  4. (SRV_FORCE_NO_IBUF_MERGE):不执行插入缓冲的合并操作。
  5. (SRV_FORCE_NO_UNDO_LOG_SCAN):不查看重做日志,InnoDB存储引擎会将未提交的事务视为已提交。
  6. (SRV_FORCE_NO_LOG_REDO):不执行前滚的操作。

官网介绍:https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

MYSQL参数之innodb_purge_threads
innodb_purge_threads 将purge线程从master线程分离出来,提高cpu使用率提升存储引擎性能,innodb1.2之后可以设置多个purge线程。

这里的一个重要知识点就是 对 innodb_force_recovery 参数的理解了,要是遇到数据损坏甚至是其他的损坏。可能上面的方法不行了,需要尝试另一个方法:insert into tb select * from ta limit X;甚至是dump出去,再load回来。

预防建议:
1、如果MYSQL所在机器配置了阵列卡的话,记得配电池
2、MYSQL配置文件记住设置双1

innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

innodb_flush_log_at_trx_commit参数解析:
抱怨Innodb比MyISAM慢 100倍?那么你大概是忘了调整这个值。默认值1的意思是每一次事务提交或事务外的指令都需要把日志写入(flush)硬盘,这是很费时的,但是可以保障数据的高可用,特别是使用电池供电缓存(Battery backed up cache)时。设成2对于很多运用,特别是从MyISAM表转过来的是可以的,它的意思是不写入硬盘而是写入系统缓存。日志仍然会每秒flush到硬盘,所以你一般不会丢失超过1-2秒的更新。设成0会更快一点,但安全方面比较差,即使MySQL挂了也可能会丢失事务的数据。而值2只会在整个操作系统挂了时才可能丢数据。

sync_binlog参数解析:
sync_binlog=1 or N
This makes MySQL synchronize the binary log’s contents to disk each time it commits a transaction

默认情况下,并不是每次写入时都将binlog与硬盘同步。因此如果操作系统或机器(不仅仅是MySQL服务器)崩溃,有可能binlog中最后的语句丢失了。要想防止这种情况,你可以使用sync_binlog全局变量(1是最安全的值,但也是最慢的),使binlog在每N次binlog写入后与硬盘同步。即使sync_binlog设置为1,出现崩溃时,也有可能表内容和binlog内容之间存在不一致性。如果使用InnoDB表,MySQL服务器 处理COMMIT语句,它将整个事务写入binlog并将事务提交到InnoDB中。如果在两次操作之间出现崩溃,重启时,事务被InnoDB回滚,但仍 然存在binlog中。可以用--innodb-safe-binlog选项来增加InnoDB表内容和binlog之间的一致性。(注释:在MySQL 5.1中不需要--innodb-safe-binlog;由于引入了XA事务支持,该选项作废了),该选项可以提供更大程度的安全,使每个事务的 binlog(sync_binlog =1)和(默认情况为真)InnoDB日志与硬盘同步,该选项的效果是崩溃后重启时,在滚回事务后,MySQL服务器从binlog剪切回滚的 InnoDB事务。这样可以确保binlog反馈InnoDB表的确切数据等,并使从服务器保持与主服务器保持同步(不接收 回滚的语句)。

故障时建议
故障时,innodb_force_recovery设置从1-6,逐级递增,而不是一上来就设置为6。

WeZan