1.背景概述 目前需要搭建一個從庫,由於單表數據量較大,時間比較有限,考慮到導入導出的時間,並且GreatSQL支持並行load data的功能,能夠加速數據的導入,因此決定使用 select into outfile 和 load data 的方式進行數據的遷移; 在數據導入完成後進行數據同步,從 ...
1.背景概述
目前需要搭建一個從庫,由於單表數據量較大,時間比較有限,考慮到導入導出的時間,並且GreatSQL支持並行load data的功能,能夠加速數據的導入,因此決定使用 select into outfile 和 load data 的方式進行數據的遷移;
在數據導入完成後進行數據同步,從庫發生報錯 1032 找不到記錄,但是登錄從庫中可以查詢到此條記錄,這裡就很奇怪;
最後通過解析relaylog,根據relaylog中的update記錄,以每個欄位為查詢條件進行查詢,發現是由於NULL值列導致的,主庫這列的值是 NULL,從庫在導入後導成了字元串"NULL",因此導致回放update操作時匹配不到數據而報錯1032.
2.問題復現
本次測試基於 GreatSQL 8.0.32-24
2.1 初始化2個單機實例
略
2.2 主節點創建測試表
greatsql> create database test;
greatsql> use test;
greatsql> create table t1 (id int,
name varchar(30),
age int,
addr varchar(30),
school varchar(30),
unique key (id)) engine=innodb;
greatsql> insert into t1 values
(1,'小紅',10,'北京','一中'),
(2,'小綠',11,'北京','一中'),
(3,'小黃',12,'北京',NULL),
(4,'小藍',13,'北京',NULL),
(5,'小黑',14,'北京',NULL);
2.3 查看數據
greatsql> select * from t1;
+----+--------+------+--------+--------+
| id | name | age | addr | school |
+----+--------+------+--------+--------+
| 1 | 小紅 | 10 | 北京 | 一中 |
| 2 | 小綠 | 11 | 北京 | 一中 |
| 3 | 小黃 | 12 | 北京 | NULL |
| 4 | 小藍 | 13 | 北京 | NULL |
| 5 | 小黑 | 14 | 北京 | NULL |
+----+--------+------+--------+--------+
5 rows in set (0.00 sec)
2.4 主節點導出數據
greatsql> select * from test.t1 into outfile '/greatsql/t1.csv' FIELDS TERMINATED BY '|+|' ESCAPED BY '' LINES TERMINATED BY '/*rowsxxx*/';
2.5 查看導出的數據
$ cat t1.csv
1|+|小紅|+|10|+|北京|+|一中/*rowsxxx*/2|+|小綠|+|11|+|北京|+|一中/*rowsxxx*/3|+|小黃|+|12|+|北京|+|NULL/*rowsxxx*/4|+|小藍|+|13|+|北京|+|NULL/*rowsxxx*/5|+|小黑|+|14|+|北京|+|NULL/*rowsxxx
可以看到導出的數據中包含 NULL
2.6 從庫創建表並導入數據
greatsql> create database test;
use test;
create table t1 (id int,
name varchar(30),
age int,
addr varchar(30),
school varchar(30),
unique key (id)) engine=innodb;
導入數據
greatsql> load data infile '/greatsql/t1.csv' into table test.t1 fields terminated by '|+|' ESCAPED BY '' lines terminated by '/*rowsxxx*/';
2.7 從庫查詢數據
greatsql> select * from test.t1;
+----+--------+------+--------+--------+
| id | name | age | addr | school |
+----+--------+------+--------+--------+
| 1 | 小紅 | 10 | 北京 | 一中 |
| 2 | 小綠 | 11 | 北京 | 一中 |
| 3 | 小黃 | 12 | 北京 | NULL |
| 4 | 小藍 | 13 | 北京 | NULL |
| 5 | 小黑 | 14 | 北京 | NULL |
+----+--------+------+--------+--------+
5 rows in set (0.00 sec)
2.8 從庫建立複製
greatsql> reset master;
Query OK, 0 rows affected (0.04 sec)
greatsql> set global gtid_purged='b94e6517-68dd-11ee-b43b-00163ecb92e3:1-5755';
Query OK, 0 rows affected (0.00 sec)
greatsql> show master status;
+---------------+----------+--------------+------------------+---------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+---------------------------------------------+
| binlog.000001 | 153 | | | b94e6517-68dd-11ee-b43b-00163ecb92e3:1-5755 |
+---------------+----------+--------------+------------------+---------------------------------------------+
1 row in set (0.00 sec)
greatsql> change master to master_user='root',master_password='greatdb',master_host='192.168.137.162',master_port=6001,master_auto_position=1;
Query OK, 0 rows affected, 7 warnings (0.02 sec)
greatsql> start slave;
Query OK, 0 rows affected, 1 warning (0.04 sec)
greatsql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 192.168.137.162
Master_User: root
Master_Port: 6001
Connect_Retry: 60
Master_Log_File: binlog.000002
Read_Master_Log_Pos: 1861574
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 395
Relay_Master_Log_File: binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
......
2.9 主庫更新數據
greatsql> update test.t1 set name='小小黑' where id=5;
2.10 從庫查看複製狀態
greatsql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.137.162
Master_User: root
Master_Port: 6001
Connect_Retry: 60
Master_Log_File: binlog.000002
Read_Master_Log_Pos: 1863564
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 395
Relay_Master_Log_File: binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'b94e6517-68dd-11ee-b43b-00163ecb92e3:5756' at master log binlog.000002, end_log_pos 1863537. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
......
greatsql> select * from performance_schema.replication_applier_status_by_worker limit 1\G
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 1
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 1032
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'b94e6517-68dd-11ee-b43b-00163ecb92e3:5756' at master log binlog.000002, end_log_pos 1863537; Could not execute Update_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1863537
LAST_ERROR_TIMESTAMP: 2023-10-17 10:02:46.396166
LAST_APPLIED_TRANSACTION:
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION: b94e6517-68dd-11ee-b43b-00163ecb92e3:5756
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2023-10-17 10:02:46.392331
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2023-10-17 10:02:46.392331
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2023-10-17 10:02:46.393814
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
1 row in set (0.00 sec)
可以看到從庫發生1032報錯,找不到記錄
2.11 解析從庫relay log
#231017 10:02:46 server id 1 end_log_pos 1863456 Table_map: `test`.`t1` mapped to number 180
\# has_generated_invisible_primary_key=0
\# at 673
\#231017 10:02:46 server id 1 end_log_pos 1863537 Update_rows: table id 180 flags: STMT_END_F
\### UPDATE `test`.`t1`
\### WHERE
\### @1=5 /* INT meta=0 nullable=1 is_null=0 */
\### @2='小黑' /* VARSTRING(120) meta=120 nullable=1 is_null=0 */
\### @3=14 /* INT meta=0 nullable=1 is_null=0 */
\### @4='北京' /* VARSTRING(120) meta=120 nullable=1 is_null=0 */
\### @5=NULL /* VARSTRING(120) meta=120 nullable=1 is_null=1 */
\### SET
\### @1=5 /* INT meta=0 nullable=1 is_null=0 */
\### @2='小小黑' /* VARSTRING(120) meta=120 nullable=1 is_null=0 */
\### @3=14 /* INT meta=0 nullable=1 is_null=0 */
\### @4='北京' /* VARSTRING(120) meta=120 nullable=1 is_null=0 */
\### @5=NULL /* VARSTRING(120) meta=120 nullable=1 is_null=1 */
\# at 754
\#231017 10:02:46 server id 1 end_log_pos 1863564 Xid = 5940
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by greatsqlbinlog */ /*!*/;
DELIMITER ;
\# End of log file
可以看到update更新前的數據與從庫的數據一致,那麼為什麼還會報錯 找不到記錄呢?
2.12 根據relay log中的內容去從庫查詢數據
greatsql> select * from test.t1 where id=5;
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
1 row in set (0.01 sec)
greatsql> select * from test.t1 where name='小黑';
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
1 row in set (0.01 sec)
greatsql> select * from test.t1 where age=14;
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
1 row in set (0.00 sec)
greatsql> select * from test.t1 where addr='北京';
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 1 | 小紅 | 10 | 北京 | 一中 |
| 2 | 小綠 | 11 | 北京 | 一中 |
| 3 | 小黃 | 12 | 北京 | NULL |
| 4 | 小藍 | 13 | 北京 | NULL |
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
5 rows in set (0.00 sec)
greatsql> select * from test.t1 where school is null;
Empty set (0.01 sec)
greatsql> select * from test.t1 where school='null';
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 3 | 小黃 | 12 | 北京 | NULL |
| 4 | 小藍 | 13 | 北京 | NULL |
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
3 rows in set (0.00 sec)
可以看到,根據null值作為查詢條件時,匹配不到數據; 根據字元串"null" 進行匹配是可以匹配到數據
2.13 去主庫進行查詢
greatsql> select * from test.t1 where school is null;
+------+-----------+------+--------+--------+
| id | name | age | addr | school |
+------+-----------+------+--------+--------+
| 3 | 小黃 | 12 | 北京 | NULL |
| 4 | 小藍 | 13 | 北京 | NULL |
| 5 | 小小黑 | 14 | 北京 | NULL |
+------+-----------+------+--------+--------+
3 rows in set (0.00 sec)
greatsql> select * from test.t1 where school='null';
Empty set (0.00 sec)
在主庫查詢的結果與從庫相反
可以得出結論,由於從庫導入的數據將NULL值列的數據導入成了字元串 NULL,因此導致主從數據出現了不一致。
2.14 修複從庫
greatsql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)
greatsql> update test.t1 set school=NULL where school='null';
Query OK, 3 rows affected (0.01 sec)
Rows matched: 3 Changed: 3 Warnings: 0
greatsql> start slave;
Query OK, 0 rows affected, 1 warning (0.04 sec)
greatsql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.137.162
Master_User: root
Master_Port: 6001
Connect_Retry: 60
Master_Log_File: binlog.000002
Read_Master_Log_Pos: 1863564
Relay_Log_File: relaylog.000003
Relay_Log_Pos: 435
Relay_Master_Log_File: binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
......
可以看到主從狀態已經恢復正常
3.總結
1.如果FIELDS ESCAPED BY字元為空字元,則沒有字元被轉義,並且NULL被作為NULL輸出,而不是\N;這也是導致此次主從報錯的原因。
2.如果這張表使用的是主鍵而不是唯一索引,即使某些列被導入為字元串NULL,也不會報錯。
3.如果這張表沒有索引或有普通索引,則會報錯。
Enjoy GreatSQL