Saturday, November 5, 2016

How to Recover Corrupted InnoDB Partition Tablespace in Replication Setup

This week I've got a question that sounded basically like this:
"Is it possible to just copy the entire partition from the replicated server?"
Let me share some background story. As it happens sometimes, user had a huge table with many partitions, let's say hundreds of gigabytes in size each, and one of them got unfortunately corrupted. It happened in a replication setup on master, but lucky they were, they had used innodb_file_per_table=1 and they had a slave that was more or less in sync with master. This allowed to reconfigure replication and continue to work, but the task remained to eventually put master back in use and get correct data in the corrupted partition. Let's assume that dumping and reloading data from one of instances in replication setup is not a desired option, as it will take too much time comparing to just copying the partition tablespace file. Hence the question above...
Side note: Let's assume for simplicity that corrupted partition does not get changes at the moment, but even if it does we can replay them theoretically (by careful processing of binary logs, for example).
My quick and simple answer was that surely it is possible with InnoDB from MySQL 5.6, and there is a great blog post that provide a lot of relevant details and steps, this one by my former colleague from Percona Jervin Real. I remember I had to deal with orphaned or corrupted partitions more than once in the past, while working for Percona. It is NOT possible to import them directly in MySQL 5.6, and I've even reported this as Bug #70196, but in 5.7.4+ after the fix for Bug #52422 it is possible. Anyway, nice partition exchange feature, as Jervin explained, helps to overcome this limitation easily in MySQL 5.6 as well.

User decided to try this, but surely dealing with corrupted InnoDB tables is not as easy as with non-corrupted ones. There many things that could go not as expected and require additional steps. For example, hardly any partition exchange is possible for a table with corrupted partition - I've suggested to do this for a clean table without any corruptions.

I still feel that things may go wrong, so I decided to make a test and show step by step how it works and what may go wrong in the process. To do this I've used one of replication sandboxes at hand, with MySQL 5.6.28 and, after making sure replication works on both slaves, created simple partitioned table on master with the following statements:
create table tp(id int, val int, ts datetime, primary key(id, ts)) partition by range (year(ts)) (partition p0 values less than (2006), partition p1 values less than (2015), partition px values less than maxvalue);
insert into tp values(1, 10, '2005-10-01');
insert into tp values(2, 10, '2014-10-02');
insert into tp values(3, 30, now());
This is what I've got on master:
master [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
|  2 |   10 | 2014-10-02 00:00:00 |
|  3 |   30 | 2016-11-05 17:34:30 |
+----+------+---------------------+
3 rows in set (0,00 sec)

master [localhost] {msandbox} (test) > select * from tp partition(px);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  3 |   30 | 2016-11-05 17:34:30 |
+----+------+---------------------+
1 row in set (0,00 sec)

master [localhost] {msandbox} (test) > select * from tp partition(p0);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)

master [localhost] {msandbox} (test) > select * from tp partition(p1);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  2 |   10 | 2014-10-02 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)

Then I checked that data are replicated on slave that I am going to corrupt soon: 
slave1 [localhost] {msandbox} (test) > select * from tp partition(p1);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  2 |   10 | 2014-10-02 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)
It's time to stop MySQL server and corrupt the data in partition p1:
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./stop
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ls -l data/test
total 152012
...

-rw-rw---- 1 openxs openxs     8610 лис  5 17:34 tp.frm
-rw-rw---- 1 openxs openxs       32 лис  5 17:34 tp.par
-rw-rw---- 1 openxs openxs    98304 лис  5 17:34 tp#P#p0.ibd
-rw-rw---- 1 openxs openxs    98304 лис  5 17:34 tp#P#p1.ibd
-rw-rw---- 1 openxs openxs    98304 лис  5 17:34 tp#P#px.ibd

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ hexdump -C data/test/tp#P#p1.ibd | more
00000000  22 e2 b8 a3 00 00 00 00  00 00 00 00 00 00 00 00  |"...............|
00000010  00 00 00 00 0f f4 ae 86  00 08 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 0c 00 00  00 0c 00 00 00 00 00 00  |................|
00000030  00 06 00 00 00 40 00 00  00 00 00 00 00 04 00 00  |.....@..........|
00000040  00 00 ff ff ff ff 00 00  ff ff ff ff 00 00 00 00  |................|
00000050  00 01 00 00 00 00 00 9e  00 00 00 00 00 9e 00 00  |................|
...

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ dd if=/dev/zero of=data/test/tp#P#p1.ibd bs=1 count=98304
98304+0 records in
98304+0 records out
98304 bytes (98 kB) copied, 0,253177 s, 388 kB/s
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ hexdump -C data/test/tp#P#p1.ibd | more
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00018000
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ls -l data/test/tp*       
-rw-rw---- 1 openxs openxs  8610 лис  5 17:34 data/test/tp.frm
-rw-rw---- 1 openxs openxs    32 лис  5 17:34 data/test/tp.par
-rw-rw---- 1 openxs openxs 98304 лис  5 17:34 data/test/tp#P#p0.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 17:41 data/test/tp#P#p1.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 17:34 data/test/tp#P#px.ibd
I just filled entire partition file with zeroes and verified it's content  before and after with hexdump. I think it's bad enough kid of corruption - there is no way to recover data, they are gone entirely, and system areas of .ibd file will surely not match what InnoDB data dictionary may thing about the tablespace.

Now I can try to start slave back and access data in the table:
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./start
. sandbox server started
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ tail data/msandbox.err
2016-11-05 18:15:11 8092 [Note] Server hostname (bind-address): '127.0.0.1'; port: 22294
2016-11-05 18:15:11 8092 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2016-11-05 18:15:11 8092 [Note] Server socket created on IP: '127.0.0.1'.
2016-11-05 18:15:11 8092 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql_sandbox22294-relay-bin' to avoid this problem.
2016-11-05 18:15:11 8092 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2016-11-05 18:15:11 8092 [Note] Slave I/O thread: connected to master 'rsandbox@127.0.0.1:22293',replication started in log 'mysql-bin.000005' at position 1015
2016-11-05 18:15:11 8092 [Note] Event Scheduler: Loaded 0 events
2016-11-05 18:15:11 8092 [Note] /home/openxs/5.6.28/bin/mysqld: ready for connections.
Version: '5.6.28-log'  socket: '/tmp/mysql_sandbox22294.sock'  port: 22294  MySQL Community Server (GPL)
2016-11-05 18:15:12 8092 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000005' at position 1015, relay log './mysql_sandbox22294-relay-bin.000236' position: 1178
The table was not accessed during recovery stage (as slave was stopped cleanly), so we have no hints in the error log about the corruption. But as soon as we try to access the table:

slave1 [localhost] {msandbox} (test) > select * from tp;
ERROR 2013 (HY000): Lost connection to MySQL server during query
In the error log I see:

InnoDB: Error: tablespace id is 12 in the data dictionary
InnoDB: but in file ./test/tp#P#p1.ibd it is 0!
2016-11-05 18:17:03 7f0e9c257700  InnoDB: Assertion failure in thread 139700725970688 in file fil0fil.cc line 796
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:17:03 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=3
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68108 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x276e860
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f0e9c256e50 thread_stack 0x40000
/home/openxs/5.6.28/bin/mysqld(my_print_stacktrace+0x35)[0x90f695]
/home/openxs/5.6.28/bin/mysqld(handle_fatal_signal+0x3d8)[0x674fc8]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f0eb0d99330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f0eaf993c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f0eaf997028]
/home/openxs/5.6.28/bin/mysqld[0xae994e]
/home/openxs/5.6.28/bin/mysqld[0xae9ade]
/home/openxs/5.6.28/bin/mysqld[0xaf0419]
/home/openxs/5.6.28/bin/mysqld[0xabb7fb]
/home/openxs/5.6.28/bin/mysqld[0xabbe6b]
/home/openxs/5.6.28/bin/mysqld[0xaa996a]
/home/openxs/5.6.28/bin/mysqld[0xa9720f]
/home/openxs/5.6.28/bin/mysqld[0xa40b2e]
/home/openxs/5.6.28/bin/mysqld[0x99fca5]
/home/openxs/5.6.28/bin/mysqld[0x997d89]
/home/openxs/5.6.28/bin/mysqld[0x99f9d9]
/home/openxs/5.6.28/bin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x9c)[0x58c52c]/home/openxs/5.6.28/bin/mysqld(_ZN12ha_partition8rnd_nextEPh+0x41)[0xb48441]
/home/openxs/5.6.28/bin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x64)[0x58c4f4]
/home/openxs/5.6.28/bin/mysqld(_Z13rr_sequentialP11READ_RECORD+0x37)[0x8400a7]
/home/openxs/5.6.28/bin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x181)[0x6d3891]
/home/openxs/5.6.28/bin/mysqld(_ZN4JOIN4execEv+0x391)[0x6d16b1]
/home/openxs/5.6.28/bin/mysqld[0x718349]
/home/openxs/5.6.28/bin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_P10SQL_I_ListI8st_orderESB_S7_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xbc)[0x718e0c]
/home/openxs/5.6.28/bin/mysqld(_Z13handle_selectP3THDP13select_resultm+0x175)[0x719015]
/home/openxs/5.6.28/bin/mysqld[0x6f3769]
/home/openxs/5.6.28/bin/mysqld(_Z21mysql_execute_commandP3THD+0x3575)[0x6f7f25]
/home/openxs/5.6.28/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x338)[0x6fba58]
/home/openxs/5.6.28/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xc60)[0x6fce30]
/home/openxs/5.6.28/bin/mysqld(_Z10do_commandP3THD+0xd7)[0x6fec27]
/home/openxs/5.6.28/bin/mysqld(_Z24do_handle_one_connectionP3THD+0x116)[0x6c5da6]
/home/openxs/5.6.28/bin/mysqld(handle_one_connection+0x45)[0x6c5e85]
/home/openxs/5.6.28/bin/mysqld(pfs_spawn_thread+0x126)[0x989be6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f0eb0d91184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0eafa5737d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f0e74005050): select * from tp
Connection ID (thread ID): 3
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
161105 18:17:04 mysqld_safe Number of processes running now: 0
161105 18:17:04 mysqld_safe mysqld restarted
So, assertion failure happened because tablespace id of deliberately corrupted partition does not match InnoDB expectations in the data dictionary. Server was restart, but this time it was not successful:

...
2016-11-05 18:17:04 8146 [Note] InnoDB: Database was not shutdown normally!
2016-11-05 18:17:04 8146 [Note] InnoDB: Starting crash recovery.
2016-11-05 18:17:04 8146 [Note] InnoDB: Reading tablespace information from the .ibd files...
2016-11-05 18:17:04 8146 [ERROR] InnoDB: space header page consists of zero bytes in tablespace ./test/tp#P#p1.ibd (table test/tp#P#p1)
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:1024 Pages to analyze:64
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 1024, Possible space_id count:0
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:2048 Pages to analyze:48
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 2048, Possible space_id count:0
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:4096 Pages to analyze:24
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 4096, Possible space_id count:0
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:8192 Pages to analyze:12
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 8192, Possible space_id count:0
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:16384 Pages to analyze:6
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 16384, Possible space_id count:0
2016-11-05 18:17:04 7f0f0f745780  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: Error: could not open single-table tablespace file ./test/tp#P#p1.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
InnoDB: To fix the problem and start mysqld:
InnoDB: 1) If there is a permission problem in the file and mysqld cannot
InnoDB: open the file, you should modify the permissions.
InnoDB: 2) If the table is not needed, or you can restore it from a backup,
InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
InnoDB: crash recovery and ignore that table.
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
161105 18:17:04 mysqld_safe mysqld from pid file /home/openxs/sandboxes/rsandbox_mysql-5_6_28/node1/data/mysql_sandbox22294.pid ended
Now, what shell we do? There is no reason to try to start again and again. We should check that partition .ibd file is there (it is), does have proper permissions (it does) and looks like a a proper tablespace. We can check that if we have proper tablespace file for the same partition elsewhere, search for known strings of data in the file etc. In this case hexdump already shown corruption of data without any chance to restore anything. Time to ask a question that started this post...

With the answer in mind (yes, we can get the data from partition .ibd file from non-corrupted table), how to proceed in practice? First of all, we need MySQL server that is started and works, so that we can run SQL statements. This one can not be started normally and, if we plan to restore data on it (for whatever reason) we have to start it first. To do so we have to try to use innodb_force_recovery settings starting for 1. Depending on how and when corruption happened 1 may not be enough. In general I would not recommend to proceed to any value more than 4. Also it makes sense to copy all files related to corrupted table and, ideally, shared tabespace to a safe place. Changes done with forced recovery may corrupt data even more, so it's good to have a way to roll back and start again. I copied all .ibd files for the table to /tmp directory, but in realirty make sure you just rename/move them inside the same physical filesystem, if you care about huge size and time to copy:
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ cp data/test/tp*.ibd /tmp/
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ls -l /tmp/tp*.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 18:49 /tmp/tp#P#p0.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 18:49 /tmp/tp#P#p1.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 18:49 /tmp/tp#P#px.ibd
Then I set innodb_force_recovery to 1 in [mysqld] section of my configuration file and started the server:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ grep innodb_force my.sandbox.cnf
innodb_force_recovery=1
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./start
. sandbox server started
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ tail data/msandbox.err    
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so that
InnoDB: innodb_force_... is removed.
2016-11-05 18:25:25 8491 [ERROR] Error writing relay log configuration.
2016-11-05 18:25:25 8491 [ERROR] Error reading relay log configuration.
2016-11-05 18:25:25 8491 [ERROR] Failed to initialize the master info structure
2016-11-05 18:25:25 8491 [Note] Check error log for additional messages. You will not be able to start replication until the issue is resolved and the server restarted.
2016-11-05 18:25:25 8491 [Note] Event Scheduler: Loaded 0 events
2016-11-05 18:25:25 8491 [Note] /home/openxs/5.6.28/bin/mysqld: ready for connections.
Version: '5.6.28-log'  socket: '/tmp/mysql_sandbox22294.sock'  port: 22294  MySQL Community Server (GPL)
It seems server started, but when we try to access the table strange thing happens:
slave1 [localhost] {msandbox} (test) > select * from tp;
ERROR 1146 (42S02): Table 'test.tp' doesn't exist
slave1 [localhost] {msandbox} (test) > show tables;
+----------------+
| Tables_in_test |
+----------------+
| t              |
| t1             |
| t2             |
| t_local        |
| tp             |
+----------------+
5 rows in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > show create table tp\G
ERROR 1146 (42S02): Table 'test.tp' doesn't exist
The table is NOT there and there at the same time. This indicates some problem with InnoDB data dictionary. In the error log we see:
2016-11-05 18:26:14 8491 [ERROR] InnoDB: Failed to find tablespace for table '"test"."tp" /* Partition "p1" */' in the cache. Attempting to load the tablespace with space id 12.
2016-11-05 18:26:14 8491 [ERROR] InnoDB: In file './test/tp#P#p1.ibd', tablespace id and flags are 0 and 0, but in the InnoDB data dictionary they are 12 and 0.
Have you moved InnoDB .ibd files around without using the commands DISCARD TABLESPACE and IMPORT TABLESPACE? Please refer to http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2016-11-05 18:26:14 7f434af03700  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
2016-11-05 18:26:14 8491 [ERROR] InnoDB: Could not find a valid tablespace file for 'test/tp#P#p1'. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2016-11-05 18:26:14 7f434af03700 InnoDB: cannot calculate statistics for table "test"."tp" /* Partition "p1" */ because the .ibd file is missing. For help, please refer to http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html
2016-11-05 18:26:18 7f434af03700 InnoDB: cannot calculate statistics for table "test"."tp" /* Partition "p1" */ because the .ibd file is missing. For help, please refer to http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html
2016-11-05 18:26:51 7f434af03700 InnoDB: cannot calculate statistics for table "test"."tp" /* Partition "p1" */ because the .ibd file is missing. For help, please refer to http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html
Forced recovery mode is designed to be used for one of two main actions with the problematic table: either to select some remaining data from it (we tried and can not do this, and we don't case as we have data on other server in replication setup), or to drop it (we also copied .ibd files elsewhere if you remember). Nothing works with the table, but DROP does work:
slave1 [localhost] {msandbox} (test) > drop table tp;
Query OK, 0 rows affected (0,32 sec)
Now we can try to restart without forced recovery and try to recreate the table and link orphaned .ibd files back:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./stop
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ vi my.sandbox.cnf
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./start. sandbox server started
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ grep innodb_force my.sandbox.cnf
#innodb_force_recovery=1
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ tail data/msandbox.err   
2016-11-05 18:51:50 8861 [Note] Server hostname (bind-address): '127.0.0.1'; port: 22294
2016-11-05 18:51:50 8861 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2016-11-05 18:51:50 8861 [Note] Server socket created on IP: '127.0.0.1'.
2016-11-05 18:51:50 8861 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql_sandbox22294-relay-bin' to avoid this problem.
2016-11-05 18:51:50 8861 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2016-11-05 18:51:50 8861 [Note] Slave I/O thread: connected to master 'rsandbox@127.0.0.1:22293',replication started in log 'mysql-bin.000005' at position 1015
2016-11-05 18:51:50 8861 [Note] Event Scheduler: Loaded 0 events
2016-11-05 18:51:50 8861 [Note] /home/openxs/5.6.28/bin/mysqld: ready for connections.
Version: '5.6.28-log'  socket: '/tmp/mysql_sandbox22294.sock'  port: 22294  MySQL Community Server (GPL)
2016-11-05 18:51:51 8861 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000005' at position 1015, relay log './mysql_sandbox22294-relay-bin.000237' position: 4
It looks promising so far. We can get DDL for the table from other server in replication setup (or any other source, like older dump) and try to create the table in a hurry:
master [localhost] {msandbox} (test) > show create table tp\G
*************************** 1. row ***************************
       Table: tp
Create Table: CREATE TABLE `tp` (
  `id` int(11) NOT NULL DEFAULT '0',
  `val` int(11) DEFAULT NULL,
  `ts` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`,`ts`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (year(ts))
(PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2015) ENGINE = InnoDB,
 PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > show tables;
+----------------+
| Tables_in_test |
+----------------+
| t              |
| t1             |
| t2             |
| t_local        |
+----------------+
4 rows in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > CREATE TABLE `tp` (   `id` int(11) NOT NULL DEFAULT '0',   `val` int(11) DEFAULT NULL,   `ts` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',   PRIMARY KEY (`id`,`ts`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (year(ts)) (PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB,  PARTITION p1 VALUES LESS THAN (2015) ENGINE = InnoDB,  PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
ERROR 1813 (HY000): Tablespace for table '`test`.`tp` /* Partition `p1` */' exists. Please DISCARD the tablespace before IMPORT.
So, looks like DROP TABLE worked but corrupted partition file remained? Yes, it remained:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ls -l data/test/tp*.ibd 
-rw-rw---- 1 openxs openxs 98304 лис  5 17:41 data/test/tp#P#p1.ibd
We have to remove it and then try again:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ rm data/test/tp*.ibd

slave1 [localhost] {msandbox} (test) > CREATE TABLE `tp` (   `id` int(11) NOT NULL DEFAULT '0',   `val` int(11) DEFAULT NULL,   `ts` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',   PRIMARY KEY (`id`,`ts`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (year(ts)) (PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB,  PARTITION p1 VALUES LESS THAN (2015) ENGINE = InnoDB,  PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
Query OK, 0 rows affected (0,75 sec)
Now we can mostly follow Jervin's blog post and create non-partitioned table of the same structure with the following DDL statements:

create table tp_tmp like tp;
alter table tp_tmp remove partitioning;alter table tp_tmp discard tablespace;
At this moment we can re-import partitions one by one. In reality instead of copy we may use rename or move for those partitions that were non-corrupted.
slave1 [localhost] {msandbox} (test) > \! cp /tmp/tp#P#p0.ibd data/test/tp_tmp.ibd
slave1 [localhost] {msandbox} (test) > \! ls -l data/test/tp_tmp.*             
-rw-rw---- 1 openxs openxs  8610 лис  5 19:05 data/test/tp_tmp.frm
-rw-rw---- 1 openxs openxs 98304 лис  5 19:07 data/test/tp_tmp.ibd
slave1 [localhost] {msandbox} (test) > alter table tp_tmp import tablespace;
Query OK, 0 rows affected, 1 warning (0,34 sec)

slave1 [localhost] {msandbox} (test) > show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1810
Message: InnoDB: IO Read error: (2, No such file or directory) Error opening './test/tp_tmp.cfg', will attempt to import without schema verification
1 row in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)
We get the warning, but it can be safely ignored - we know for sure that table structure matches .ibd file imported. Now we are ready to exchange emprty partiton with proper partiton data in the table:

slave1 [localhost] {msandbox} (test) > alter table tp exchange partition p0 with table tp_tmp;
Query OK, 0 rows affected (0,51 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
Empty set (0,00 sec)

slave1 [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
+----+------+---------------------+
1 row in set (0,01 sec)
Then we can do the same for other partition(s) stored on the same server, all but corrupted one:
slave1 [localhost] {msandbox} (test) > alter table tp_tmp discard tablespace;
Query OK, 0 rows affected (0,05 sec)

slave1 [localhost] {msandbox} (test) > \! cp /tmp/tp#P#px.ibd data/test/tp_tmp.ibd
slave1 [localhost] {msandbox} (test) > alter table tp_tmp import tablespace;
Query OK, 0 rows affected, 1 warning (0,35 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  3 |   30 | 2016-11-05 17:34:30 |
+----+------+---------------------+
1 row in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > alter table tp exchange partition px with table tp_tmp;Query OK, 0 rows affected (0,53 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;                   
Empty set (0,00 sec)

slave1 [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
|  3 |   30 | 2016-11-05 17:34:30 |
+----+------+---------------------+
2 rows in set (0,00 sec)
The file for corrupted partititon we should copy form elsewhere, in my case it's a datadir on master sandbox instance:
slave1 [localhost] {msandbox} (test) > alter table tp_tmp discard tablespace;  
Query OK, 0 rows affected (0,04 sec)

slave1 [localhost] {msandbox} (test) > \! cp ../master/data/test/tp#P#p1.ibd data/test/tp_tmp.ibd
slave1 [localhost] {msandbox} (test) > alter table tp_tmp import tablespace;   
Query OK, 0 rows affected, 1 warning (0,36 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  2 |   10 | 2014-10-02 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > alter table tp exchange partition px with table tp_tmp;
ERROR 1737 (HY000): Found a row that does not match the partition
slave1 [localhost] {msandbox} (test) > alter table tp exchange partition p1 with table tp_tmp;
Query OK, 0 rows affected (0,47 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
Empty set (0,00 sec)

slave1 [localhost] {msandbox} (test) > select * from tp partition(p1);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  2 |   10 | 2014-10-02 00:00:00 |
+----+------+---------------------+
1 row in set (0,01 sec)
As you can see from my small mistake above, if data imported do not match range partition condition, they are not imported and you get error. We are done with restoring the data and now we can checkif replication works. Let me change all the data on master:
master [localhost] {msandbox} (test) > update tp set val = val + 1;
Query OK, 3 rows affected (0,03 sec)
Rows matched: 3  Changed: 3  Warnings: 0

master [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   11 | 2005-10-01 00:00:00 |
|  2 |   11 | 2014-10-02 00:00:00 |
|  3 |   31 | 2016-11-05 17:34:30 |
+----+------+---------------------+
3 rows in set (0,00 sec)
and make sure they are replicated to slave:
slave1 [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   11 | 2005-10-01 00:00:00 |
|  2 |   11 | 2014-10-02 00:00:00 |
|  3 |   31 | 2016-11-05 17:34:30 |
+----+------+---------------------+
3 rows in set (0,00 sec)

In the error log we'll see only messages related to importing tablespaces, no errors of any kind:
...
2016-11-05 19:01:02 9201 [ERROR] InnoDB: The file './test/tp#P#p1.ibd' already exists though the corresponding table did not exist in the InnoDB data dictionary. Have you moved InnoDB .ibd files around without using the SQL commands DISCARD TABLESPACE and IMPORT TABLESPACE, or did mysqld crash in the middle of CREATE TABLE? You can resolve the problem by removing the file './test/tp#P#p1.ibd' under the 'datadir' of MySQL.
2016-11-05 19:07:58 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:07:58 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:07:58 9201 [Note] InnoDB: Phase I - Update all pages
2016-11-05 19:07:58 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:07:58 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:07:58 9201 [Note] InnoDB: Phase III - Flush changes to disk
2016-11-05 19:07:58 9201 [Note] InnoDB: Phase IV - Flush complete
2016-11-05 19:10:48 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:10:48 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:10:48 9201 [Note] InnoDB: Phase I - Update all pages
2016-11-05 19:10:48 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:10:48 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:10:48 9201 [Note] InnoDB: Phase III - Flush changes to disk
2016-11-05 19:10:49 9201 [Note] InnoDB: Phase IV - Flush complete
2016-11-05 19:12:42 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:12:42 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:12:42 9201 [Note] InnoDB: Phase I - Update all pages
2016-11-05 19:12:42 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:12:42 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:12:42 9201 [Note] InnoDB: Phase III - Flush changes to disk
2016-11-05 19:12:42 9201 [Note] InnoDB: Phase IV - Flush complete
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$
I am too tired to write a nice summary right now, so I'll write it in some other post. Now I (and you, my readers) have a detailed enough reference of all boring steps involved in recovery of corrupted partition from .ibd file taken from other server in replication setup.

2 comments:

  1. Hello, Valeriy!
    Fine post, thanks for that.
    Misprint: "messages related to imprting tablespaces"

    ReplyDelete