一、問題發現 在一次數據遷移中,用到了INSERT INTO t1 SELECT * FROM t2這樣的 SQL 用來搬遷大表,為了提高插入效率關閉了Binlog,考慮用多線程來插入提高速度。表的類型信息和插入效率如下所示。 測試環境: Linux node-76-11 4.19.90-17.ky ...
一、問題發現
在一次數據遷移中,用到了INSERT INTO t1 SELECT * FROM t2
這樣的 SQL 用來搬遷大表,為了提高插入效率關閉了Binlog,考慮用多線程來插入提高速度。表的類型信息和插入效率如下所示。
測試環境:
- Linux node-76-11 4.19.90-17.ky10.aarch64,128核CPU,512G記憶體。
GreatSQL參數配置如下(為降低 I/O 因素影響,關閉 Binlog):
#**********************Performance*********************
#******connect
max_connections=10000
max_connect_errors=1000000
open_files_limit=65535
back_log=1500
table_definition_cache=10000
thread_stack=256K
thread_cache_size=3000
#******session
sort_buffer_size=4M
join_buffer_size=4M
read_buffer_size=4M
read_rnd_buffer_size=4M
bulk_insert_buffer_size=64M
tmp_table_size=64M
max_heap_table_size=64M
net_buffer_length=16K
max_allowed_packet=1G
#******timeout
lock_wait_timeout=600
connect_timeout=10
interactive_timeout=31536000
wait_timeout=31536000
net_read_timeout=86400
net_write_timeout=86400
net_retry_count=10
#**********************InnoDB**************************
transaction_isolation=READ-COMMITTED
innodb_buffer_pool_size=200G
innodb_buffer_pool_instances=16
innodb_max_dirty_pages_pct=90
innodb_flush_log_at_trx_commit=0
innodb_log_buffer_size=1G
innodb_page_cleaners=8
innodb_buffer_pool_dump_at_shutdown=ON
innodb_buffer_pool_load_at_startup=ON
innodb_buffer_pool_dump_pct=100
innodb_checksum_algorithm=NONE
innodb_log_checksums=NO
innodb_undo_log_truncate=OFF
innodb_change_buffering = none
innodb_spin_wait_delay=6
innodb_spin_wait_pause_multiplier=50
innodb_sync_spin_loops=30
#******feature
innodb_open_files=65535
innodb_flush_method=O_DIRECT
innodb_flush_neighbors=0
innodb_flush_sync=ON
innodb_io_capacity=20000
innodb_io_capacity_max=40000
innodb_lru_scan_depth=9000
innodb_lock_wait_timeout=30
innodb_print_all_deadlocks=ON
innodb_online_alter_log_max_size=4G
innodb_thread_concurrency=0
innodb_read_io_threads=32
innodb_write_io_threads=32
innodb_doublewrite=ON
innodb_doublewrite_pages=64
innodb_adaptive_hash_index=OFF
innodb_status_file=OFF
1、窄表 + 有自增主鍵
greatsql> CREATE TABLE t1 (
c1 int invisible auto_increment primary key,
c2 int,
str1 int DEFAULT(100) NOT NULL,
str2 int DEFAULT(100) NOT NULL,
str3 int DEFAULT(100) NOT NULL,
str4 int DEFAULT(100) NOT NULL
) engine=InnoDB;
greatsql> CREATE TABLE t2 LIKE t1;
行平均長度約 30 位元組
行數 | 插入sql | 線程數 | 總用時 | 解釋 |
---|---|---|---|---|
1000萬行 | insert into t2 select * from t1; | 單線程 | 1 min 46.47 sec | |
1000萬行 | insert into t2 select * from t1; | 2 | 1 min 15.45 sec | |
1000萬行 | insert into t2 select * from t1; | 6 | 55.74 sec | |
1000萬行 | insert into t2 select * from t1; | 10 | 45.34 sec | |
1000萬行 | insert into t2 select * from t1; | 16 | 43.78 sec | 最佳 |
1000萬行 | insert into t2 select * from t1; | 2 | 1 min 11.62 sec | |
1000萬行 | insert into t2 select * from t1; | 6 | 1 min 2.53 sec | |
1000萬行 | insert into t2 select * from t1; | 10 | 1 min 0.55 sec | |
1000萬行 | insert into t2 select * from t1; | 16 | 55.46 sec | |
1000萬行 | insert into t2 select * from t1; | 2 | 1 min 24.13 sec | |
1000萬行 | insert into t2 select * from t1; | 6 | 1 min 14.43 sec | |
1000萬行 | insert into t2 select * from t1; | 10 | 1 min 10.64 sec |
2、中等寬度表 + 有自增主鍵
greatsql> CREATE TABLE t1 (
c1 int invisible auto_increment primary key,
c2 int,
str1 varchar(512) CHARACTER SET latin1 DEFAULT(repeat('a',512)) NOT NULL
) engine=InnoDB;
greatsql> CREATE TABLE t2 LIKE t1;
行平均長度約 500多位元組
行數 | 插入sql | 線程數 | 總用時 | 解釋 |
---|---|---|---|---|
1000萬行 | insert into t2 select * from t1; | 單線程 | 4 min 30.41 sec | |
1000萬行 | insert into t2 select * from t1; | 2 | 4 min 36.58 sec | 效率無提升 |
1000萬行 | insert into t2 select * from t1; | 6 | 4 min 46.58 sec | 效率無提升 |
1000萬行 | insert into t2 select * from t1; | 10 | 4 min 50.43 sec | 效率無提升 |
1000萬行 | insert into t2 select * from t1; | 2 | 4 min 26.28 sec | 效率無提升 |
1000萬行 | insert into t2 select * from t1; | 6 | 4 min 38.84 sec | 效率無提升 |
1000萬行 | insert into t2 select * from t1; | 10 | 4 min 49.42 sec | 效率無提升 |
1000萬行 | insert into t2 select * from t1; | 2 | 4 min 25.67 sec | 效率無提升 |
1000萬行 | insert into t2 select * from t1; | 6 | 4 min 40.42 sec | 效率無提升 |
1000萬行 | insert into t2 select * from t1; | 10 | 4 min 46.70 sec | 效率無提升 |
3、寬表 + 有自增主鍵
greatsql> CREATE TABLE t1 (
c1 int invisible auto_increment primary key,
c2 int,
str1 varchar(1024) CHARACTER SET latin1 DEFAULT(repeat('a1',512)) NOT NULL,
str2 varchar(1024) CHARACTER SET latin1 DEFAULT(repeat('b2',512)) NOT NULL,
str3 varchar(1024) CHARACTER SET latin1 DEFAULT(repeat('c3',512)) NOT NULL,
str4 varchar(1024) CHARACTER SET latin1 DEFAULT(repeat('c3',512)) NOT NULL
) engine=InnoDB;
greatsql> CREATE TABLE t2 LIKE t1;
行平均長度約 5000多位元組
行數 | 插入sql | 線程數 | 總用時/s | 解釋 |
---|---|---|---|---|
100萬行 | insert into t2 select * from t1; | 單線程 | 4 min 32.90 sec | |
100萬行 | insert into t2 select * from t1; | 2 | 4 min 52.36 sec | 性能劣化 |
100萬行 | insert into t2 select * from t1; | 2 | 4 min 52.31 sec | 性能劣化 |
100萬行 | insert into t2 select * from t1; | 6 | 5 min 36.07 sec | 性能劣化 |
100萬行 | insert into t2 select * from t1; | 2 | 4 min 48.34 sec | 性能劣化 |
4、窄表 + 無主鍵(GIPK關)
greatsql> CREATE TABLE t1 (
c1 int invisible auto_increment primary key,
c2 int,
str1 int DEFAULT(100) NOT NULL,
str2 int DEFAULT(100) NOT NULL,
str3 int DEFAULT(100) NOT NULL,
str4 int DEFAULT(100) NOT NULL
) engine=InnoDB;
greatsql> CREATE TABLE t2 (
c2 int,
str1 int DEFAULT(100) NOT NULL,
str2 int DEFAULT(100) NOT NULL,
str3 int DEFAULT(100) NOT NULL,
str4 int DEFAULT(100) NOT NULL
) engine=InnoDB;
行平均寬度約 30 位元組,沒用 GIPKs 特性
行數 | 插入sql | 線程數 | 總用時 | 解釋 |
---|---|---|---|---|
1000萬行 | insert into t2 select * from t1; | 單線程 | 1 min 29.08 sec | |
1000萬行 | insert into t2 select * from t1; | 2 | 1 min 38.84 sec | 性能劣化 |
1000萬行 | insert into t2 select * from t1; | 6 | 2 min 18.88 sec | 性能劣化 |
1000萬行 | insert into t2 select * from t1; | 10 | 2 min 14.51 sec | 性能劣化 |
1000萬行 | insert into t2 select * from t1; | 2 | 1 min 35.96 sec | 性能劣化 |
1000萬行 | insert into t2 select * from t1; | 6 | 2 min 3.55 sec | 性能劣化 |
1000萬行 | insert into t2 select * from t1; | 10 | 2 min 1.52 sec | 性能劣化 |
5、結論
從上面的測試結果可以看出,無主鍵表併發插入劣化嚴重,有主鍵的情況下,窄表的併發性能提升效果好,但是對於寬表反而會造成劣化。
二、問題調查過程
1、首先調查無主鍵的情況下長數據造成性能劣化的情況。用perf查看造成性能劣化的瓶頸在哪裡。
+ 12.55% 0.01% mysqld mysqld [.] lock_clust_rec_read_check_and_lock
+ 12.34% 0.13% mysqld [kernel.kallsyms] [k] x64_sys_call
+ 12.03% 0.01% mysqld mysqld [.] Buf_fetch<Buf_fetch_normal>::mtr_add_page
+ 11.92% 0.01% mysqld mysqld [.] btr_cur_ins_lock_and_undo
+ 11.27% 0.19% mysqld [kernel.kallsyms] [k] __x64_sys_futex
- 11.18% 11.11% mysqld mysqld [.] ut_delay
10.89% start_thread
pfs_spawn_thread
handle_connection
do_command
dispatch_command
dispatch_sql_command
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_dml::execute_inner
Query_expression::execute
- Query_expression::ExecuteIteratorQuery
- 10.75% Query_result_insert::send_data
write_record
handler::ha_write_row
ha_innobase::write_row
row_insert_for_mysql
row_insert_for_mysql_using_ins_graph
row_ins_step
row_ins
row_ins_index_entry_step
row_ins_index_entry
row_ins_clust_index_entry
- row_ins_clust_index_entry_low
- 10.16% btr_pcur_t::open
- btr_cur_search_to_nth_level
- 10.10% buf_page_get_gen
- 10.08% Buf_fetch<Buf_fetch_normal>::single_page
- 10.07% Buf_fetch<Buf_fetch_normal>::mtr_add_page
rw_lock_x_lock_gen
- pfs_rw_lock_x_lock_func
- 10.07% rw_lock_x_lock_func
ut_delay
+ 11.08% 0.21% mysqld [kernel.kallsyms] [k] do_futex
+ 10.90% 0.02% mysqld mysqld [.] rw_lock_x_lock_func
+ 10.90% 0.00% mysqld mysqld [.] pfs_rw_lock_x_lock_func
+ 10.90% 0.01% mysqld mysqld [.] rw_lock_x_lock_gen
+ 9.44% 0.01% mysqld mysqld [.] locksys::owns_page_shard
+ 9.39% 0.05% mysqld mysqld [.] locksys::Latches::owns_page_shard
+ 9.29% 0.64% mysqld mysqld [.] PolicyMutex<TTASEventMutex<GenericPolicy> >::trylock
+ 7.66% 0.02% mysqld mysqld [.] locksys::rec_queue_latch_and_validate
+ 7.65% 0.17% mysqld mysqld [.] rw_lock_debug_mutex_exit
+ 7.39% 0.07% mysqld mysqld [.] trx_undo_report_row_operation
+ 7.17% 0.01% mysqld mysqld [.] buf_pool_validate_instance
+ 7.17% 0.00% mysqld mysqld [.] buf_validate
- 6.63% 5.87% mysqld mysqld [.] unlikely
- 5.84% start_thread
pfs_spawn_thread
handle_connection
do_command
dispatch_command
dispatch_sql_command
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_dml::execute_inner
Query_expression::execute
- Query_expression::ExecuteIteratorQuery
- 4.44% Query_result_insert::send_data
write_record
handler::ha_write_row
ha_innobase::write_row
- row_insert_for_mysql
- 4.43% row_insert_for_mysql_using_ins_graph
- 4.42% row_ins_step
- 4.41% row_ins
- 4.41% row_ins_index_entry_step
- 4.35% row_ins_index_entry
row_ins_clust_index_entry
- row_ins_clust_index_entry_low
- 1.89% btr_pcur_t::open
- btr_cur_search_to_nth_level
- 1.08% page_cur_search_with_match
0.52% dtuple_t::compare
- 1.87% btr_cur_optimistic_insert
1.13% page_cur_tuple_insert
0.56% btr_cur_ins_lock_and_undo
+ 1.40% TableScanIterator::Read
0.76% unlikely
+ 6.38% 0.45% mysqld [kernel.kallsyms] [k] futex_wait
+ 5.88% 0.02% mysqld mysqld [.] Buf_fetch_normal::get
+ 5.76% 0.01% mysqld mysqld [.] lock_rec_lock
可以看到多線程數據插入時候,在無主鍵的情況下頻繁的索引分裂影響性能,所以導致性能劣化嚴重。
那麼有自增主鍵的情況下性能是什麼情況呢?
看一下有自增主鍵的情況下性能情況,在有自增主鍵的時候與上面無主鍵的性能瓶頸處差不多,但是沒有那麼嚴重的頻繁的索引分裂的性能問題。
+ 10.16% 0.67% mysqld mysqld [.] PolicyMutex<TTASEventMutex<GenericPolicy> >::trylock
+ 9.25% 0.03% mysqld mysqld [.] trx_undo_report_row_operation
+ 8.49% 0.31% mysqld [kernel.kallsyms] [k] futex_wait
+ 7.77% 0.00% mysqld libstdc++.so.6.0.32 [.] 0x00007ffff7958793
+ 7.59% 0.17% mysqld mysqld [.] rw_lock_debug_mutex_exit
- 7.30% 6.87% mysqld mysqld [.] unlikely
6.58% start_thread
pfs_spawn_thread
handle_connection
do_command
dispatch_command
dispatch_sql_command
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_dml::execute_inner
Query_expression::execute
- Query_expression::ExecuteIteratorQuery
- 5.01% Query_result_insert::send_data
- 5.00% write_record
handler::ha_write_row
- ha_innobase::write_row
- 4.99% row_insert_for_mysql
- row_insert_for_mysql_using_ins_graph
- 4.96% row_ins_step
- row_ins
- 4.95% row_ins_index_entry_step
- 4.91% row_ins_index_entry
row_ins_clust_index_entry
- row_ins_clust_index_entry_low
- 2.63% btr_cur_optimistic_insert
- 1.35% page_cur_tuple_insert
0.67% page_cur_insert_rec_low
- 0.87% btr_cur_ins_lock_and_undo
0.59% trx_undo_report_row_operation
- 1.22% btr_pcur_t::open
1.20% btr_cur_search_to_nth_level
- 0.55% btr_cur_pessimistic_insert
- 0.55% btr_page_split_and_insert
0.51% page_validate
+ 1.57% TableScanIterator::Read
+ 7.06% 0.03% mysqld mysqld [.] mtr_t::commit
+ 7.02% 0.00% mysqld mysqld [.] buf_validate
+ 7.01% 0.02% mysqld mysqld [.] buf_pool_validate_instance
我們知道對自增主鍵插入新數據時,是先獲取一個區段鎖,這樣可以避免頻繁持鎖造成的性能劣化。而無顯式定義主鍵表(其他可用於聚集索引的非空唯一索引也沒有)時,會採用實例級的 DB_ROW_ID 作為該表的聚集索引,這個 DB_ROW_ID 每插入一行都需要請求加鎖,因此會比自增主鍵表更加耗時。所以上面可以看到無主鍵表的多線程插入性能反而比單線程劣化。
2、接著調查有主鍵的情況下寬表造成性能劣化的情況。用perf查看造成性能劣化的瓶頸在哪裡。
+ 12.66% 0.05% mysqld [kernel.kallsyms] [k] do_syscall_64
- 12.65% 12.30% mysqld mysqld [.] ut_delay
- 7.46% start_thread
pfs_spawn_thread
handle_connection
do_command
dispatch_command
dispatch_sql_command
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_dml::execute_inner
Query_expression::execute
- Query_expression::ExecuteIteratorQuery
- 7.27% Query_result_insert::send_data
write_record
handler::ha_write_row
ha_innobase::write_row
row_insert_for_mysql
row_insert_for_mysql_using_ins_graph
row_ins_step
row_ins
row_ins_index_entry_step
row_ins_index_entry
row_ins_clust_index_entry
- row_ins_clust_index_entry_low
- 4.53% btr_pcur_t::open
- btr_cur_search_to_nth_level
- 2.52% mtr_t::sx_lock
rw_lock_sx_lock_gen
- pfs_rw_lock_sx_lock_func
+ 2.52% rw_lock_sx_lock_func
- 1.85% buf_page_get_gen
- 1.84% Buf_fetch<Buf_fetch_normal>::single_page
- 1.81% Buf_fetch<Buf_fetch_normal>::mtr_add_page
rw_lock_s_lock_gen
rw_lock_s_lock_func
rw_lock_s_lock_spin
ut_delay
- 2.07% btr_height_get
btr_root_block_get
btr_block_get
btr_block_get_func
- buf_page_get_gen
- 2.06% Buf_fetch<Buf_fetch_normal>::single_page
- 2.05% Buf_fetch<Buf_fetch_normal>::mtr_add_page
rw_lock_s_lock_gen
rw_lock_s_lock_func
rw_lock_s_lock_spin
ut_delay
0.50% btr_cur_optimistic_insert
- 4.85% 0x7ffff7958793
- 4.70% std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread, void (*)(unsigned long), unsigned long> > >::_M_run
std::thread::_Invoker<std::tuple<Detached_thread, void (*)(unsigned long), unsigned long> >::operator()
std::thread::_Invoker<std::tuple<Detached_thread, void (*)(unsigned long), unsigned long> >::_M_invoke<0ul, 1ul, 2ul>
std::__invoke<Detached_thread, void (*)(unsigned long), unsigned long>
std::__invoke_impl<void, Detached_thread, void (*)(unsigned long), unsigned long>
Detached_thread::operator()<void (*)(unsigned long), unsigned long>
std::_Bind<void (*(unsigned long))(unsigned long)>::operator()<, void>
std::_Bind<void (*(unsigned long))(unsigned long)>::__call<void, , 0ul>
std::__invoke<void (*&)(unsigned long), unsigned long&>
- std::__invoke_impl<void, void (*&)(unsigned long), unsigned long&>
- 4.53% io_handler_thread
- fil_aio_wait
- 4.14% buf_page_io_complete
- 1.48% buf_flush_write_complete
- 1.48% mutex_enter_inline<PolicyMutex<TTASEventMutex<GenericPolicy> > >
PolicyMutex<TTASEventMutex<GenericPolicy> >::enter
TTASEventMutex<GenericPolicy>::enter
TTASEventMutex<GenericPolicy>::spin_and_try_lock
TTASEventMutex<GenericPolicy>::is_free
ut_delay
- 1.38% ibuf_merge_or_delete_for_page
- 1.19% ibuf_bitmap_get_map_page
buf_page_get_gen
Buf_fetch<Buf_fetch_normal>::single_page
Buf_fetch<Buf_fetch_normal>::mtr_add_page
rw_lock_x_lock_gen
pfs_rw_lock_x_lock_func
rw_lock_x_lock_func
ut_delay
- 1.04% mutex_enter_inline<PolicyMutex<TTASEventMutex<GenericPolicy> > >
PolicyMutex<TTASEventMutex<GenericPolicy> >::enter
TTASEventMutex<GenericPolicy>::enter
TTASEventMutex<GenericPolicy>::spin_and_try_lock
TTASEventMutex<GenericPolicy>::is_free
ut_delay
+ 12.57% 0.41% mysqld [kernel.kallsyms] [k] x64_sys_call
+ 9.93% 0.00% mysqld mysqld [.] buf_validate
+ 9.91% 0.06% mysqld mysqld [.] buf_pool_validate_instance
+ 9.73% 0.11% mysqld [kernel.kallsyms] [k] __x64_sys_futex
+ 9.72% 0.30% mysqld mysqld [.] rw_lock_debug_mutex_enter
+ 9.61% 0.13% mysqld [kernel.kallsyms] [k] do_futex
+ 8.23% 0.03% mysqld mysqld [.] TableScanIterator::Read
+ 8.20% 0.02% mysqld mysqld [.] handler::ha_rnd_next
+ 8.17% 0.00% mysqld mysqld [.] ha_innobase::rnd_next
+ 8.17% 0.00% mysqld mysqld [.] ha_innobase::general_fetch
+ 8.08% 0.05% mysqld mysqld [.] row_search_mvcc
+ 7.75% 0.02% mysqld mysqld [.] buf_page_t::Latching_rules_helpers::assert_latches_let_distinguish
+ 7.71% 0.03% mysqld mysqld [.] buf_page_t::is_io_fix_write
+ 7.41% 0.04% mysqld mysqld [.] ut::Stateful_latching_rules<buf_io_fix, 3ul>::assert_latches_let_distinguish
+ 7.10% 0.01% mysqld mysqld [.] btr_cur_optimistic_insert
+ 6.98% 0.02% mysqld mysqld [.] Buf_fetch_normal::get
+ 6.76% 0.29% mysqld mysqld [.] PolicyMutex<TTASEventMutex<GenericPolicy> >::trylock
+ 6.50% 0.02% mysqld mysqld [.] Buf_fetch<Buf_fetch_normal>::mtr_add_page
+ 6.15% 0.04% mysqld mysqld [.] ut::Stateful_latching_rules<buf_io_fix, 3ul>::assert_latches_let_distinguish
+ 6.11% 0.02% mysqld mysqld [.] ut::Stateful_latching_rules<buf_io_fix, 3ul>::can_leave
+ 5.92% 0.27% mysqld [kernel.kallsyms] [k] futex_wait
+ 5.89% 0.11% mysqld mysqld [.] rw_lock_debug_mutex_exit
+ 5.89% 0.01% mysqld mysqld [.] btr_block_get
+ 5.88% 0.00% mysqld mysqld [.] btr_block_get_func
+ 5.74% 0.02% mysqld mysqld [.] rw_lock_s_lock_func
+ 5.50% 0.01% mysqld mysqld [.] PolicyMutex<TTASEventMutex<GenericPolicy> >::enter
- 5.49% 4.80% mysqld mysqld [.] unlikely
- 3.89% start_thread
pfs_spawn_thread
handle_connection
do_command
dispatch_command
dispatch_sql_command
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_dml::execute_inner
Query_expression::execute
- Query_expression::ExecuteIteratorQuery
- 3.37% Query_result_insert::send_data
write_record
handler::ha_write_row
ha_innobase::write_row
row_insert_for_mysql
row_insert_for_mysql_using_ins_graph
row_ins_step
row_ins
- row_ins_index_entry_step
- 3.33% row_ins_index_entry
row_ins_clust_index_entry
- row_ins_clust_index_entry_low
- 1.34% btr_pcur_t::open
- 1.34% btr_cur_search_to_nth_level
0.87% page_cur_search_with_match
- 1.26% btr_cur_pessimistic_insert
1.19% btr_page_split_and_insert
+ 0.52% TableScanIterator::Read
0.90% 0x7ffff7958793
0.69% unlikely
可以看到除了上面第一點提到的索引分裂的影響外,還增加了頁分裂以及頁讀寫相關的 I/O 操作影響性能,寬表情況下對於讀寫性能比短數據更加依賴機器的性能和配置,並且會隨著寫入數據的增加降低寫入效率。
同樣的,看看單線程情況下寬表的 perf 情況。
+ 8.05% 0.02% mysqld mysqld [.] ut::Stateful_latching_rules<buf_io_fix, 3ul>::can_leave
+ 7.95% 0.00% mysqld mysqld [.] mutex_enter_inline<PolicyMutex<TTASEventMutex<GenericPolicy> > >
+ 7.81% 0.03% mysqld [kernel.kallsyms] [k] x64_sys_call
- 7.50% 7.50% mysqld mysqld [.] ut_delay
- 7.41% 0x7ffff7958793
- 7.36% std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread, void (*)(unsigned long), unsigned long> > >::_M_run
std::thread::_Invoker<std::tuple<Detached_thread, void (*)(unsigned long), unsigned long> >::operator()
std::thread::_Invoker<std::tuple<Detached_thread, void (*)(unsigned long), unsigned long> >::_M_invoke<0ul, 1ul, 2ul>
std::__invoke<Detached_thread, void (*)(unsigned long), unsigned long>
std::__invoke_impl<void, Detached_thread, void (*)(unsigned long), unsigned long>
Detached_thread::operator()<void (*)(unsigned long), unsigned long>
std::_Bind<void (*(unsigned long))(unsigned long)>::operator()<, void>
std::_Bind<void (*(unsigned long))(unsigned long)>::__call<void, , 0ul>
std::__invoke<void (*&)(unsigned long), unsigned long&>
- std::__invoke_impl<void, void (*&)(unsigned long), unsigned long&>
- 7.34% io_handler_thread
fil_aio_wait
- buf_page_io_complete
- 5.01% buf_flush_write_complete
mutex_enter_inline<PolicyMutex<TTASEventMutex<GenericPolicy> > >
PolicyMutex<TTASEventMutex<GenericPolicy> >::enter
TTASEventMutex<GenericPolicy>::enter
TTASEventMutex<GenericPolicy>::spin_and_try_lock
TTASEventMutex<GenericPolicy>::is_free
ut_delay
- 1.25% ibuf_merge_or_delete_for_page
ibuf_bitmap_get_map_page
buf_page_get_gen
Buf_fetch<Buf_fetch_normal>::single_page
Buf_fetch<Buf_fetch_normal>::mtr_add_page
rw_lock_x_lock_gen
pfs_rw_lock_x_lock_func
rw_lock_x_lock_func
ut_delay
- 1.08% mutex_enter_inline<PolicyMutex<TTASEventMutex<GenericPolicy> > >
PolicyMutex<TTASEventMutex<GenericPolicy> >::enter
TTASEventMutex<GenericPolicy>::enter
TTASEventMutex<GenericPolicy>::spin_and_try_lock
TTASEventMutex<GenericPolicy>::is_free
ut_delay
+ 7.30% 0.04% mysqld mysqld [.] buf_page_t::Latching_rules_helpers::assert_latches_let_distinguish
+ 4.37% 0.00% mysqld mysqld [.] Fil_system::flush_file_spaces
- 4.14% 3.88% mysqld mysqld [.] unlikely
- 2.75% start_thread
pfs_spawn_thread
handle_connection
do_command
dispatch_command
dispatch_sql_command
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_dml::execute_inner
Query_expression::execute
- Query_expression::ExecuteIteratorQuery
- 2.46% Query_result_insert::send_data
write_record
handler::ha_write_row
ha_innobase::write_row
row_insert_for_mysql
row_insert_for_mysql_using_ins_graph
row_ins_step
- row_ins
- 2.44% row_ins_index_entry_step
- 2.41% row_ins_index_entry
row_ins_clust_index_entry
- row_ins_clust_index_entry_low
- 0.97% btr_pcur_t::open
- btr_cur_search_to_nth_level
0.55% page_cur_search_with_match
- 0.80% btr_cur_pessimistic_insert
0.70% btr_page_split_and_insert
- 1.13% 0x7ffff7958793
- 1.02% std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread, void (*)(unsigned long), unsigned long> > >::_M_run
std::thread::_Invoker<std::tuple<Detached_thread, void (*)(unsigned long), unsigned long> >::operator()
std::thread::_Invoker<std::tuple<Detached_thread, void (*)(unsigned long), unsigned long> >::_M_invoke<0ul, 1ul, 2ul>
std::__invoke<Detached_thread, void (*)(unsigned long), unsigned long>
std::__invoke_impl<void, Detached_thread, void (*)(unsigned long), unsigned long>
Detached_thread::operator()<void (*)(unsigned long), unsigned long>
std::_Bind<void (*(unsigned long))(unsigned long)>::operator()<, void>
std::_Bind<void (*(unsigned long))(unsigned long)>::__call<void, , 0ul>
std::__invoke<void (*&)(unsigned long), unsigned long&>
- std::__invoke_impl<void, void (*&)(unsigned long), unsigned long&>
- 0.70% io_handler_thread
- fil_aio_wait
- 0.65% buf_page_io_complete
buf_flush_write_complete
dblwr::write_complete
Double_write::write_complete
fil_flush_file_spaces
Fil_system::flush_file_spaces
- Fil_shard::flush_file_spaces
+ 0.65% Fil_shard::acquire
+ 3.97% 0.58% mysqld mysqld [.] Fil_shard::flush_file_spaces
對比多線程和單線程的堆棧信息,可以看到多線程下搶 I/O 資源情況變嚴重,同時索引分裂和頁分裂更嚴重,要花更多時間進行頁位置比較、計算數據占用的空間和插入,另外由於搶鎖導致多線程性能劣化更嚴重。
三、問題解決方案
通過上面分析發現,要想保證多線程插入速度比單線程快需要滿足一定的條件:
- 目標表有主鍵,當主鍵列沒有設置自增屬性的情況下,每個線程間的主鍵值需要是離散且單個線程內主鍵值需要連續,這是為了保證索引插入的性能最高。比如線程1主鍵值為[1-1000],線程2主鍵值為[10000-11000],線程3主鍵值為[30000-31000]這樣排列效率會比[1-3000]分三個線程高很多。
- 如果目標表主鍵是自增主鍵的話,因為無法定義自增主鍵的值,只能按照插入時候確定鍵值,因此自增主鍵會比非自增主鍵多線程離散值性能慢,但是因為自增主鍵持有的是區段鎖,因此不會頻繁持鎖方面性能又會比非自增主鍵好。實際使用的時候還是要看源表的數據分佈情況以及目標表結構來決定要怎麼分配多線程的插入主鍵值。
- 目標表是寬表時,從上面的測試數據來看單行數據超過 520Bytes 多線程就開始劣化了,因此多線程插入只針對窄表有效。
四、問題總結
一般我們都認為多線程插入數據肯定比單線程插入快,但是通過本次的測試和分析可以看到實際情況需要結合資料庫參數配置、索引、存儲頁以及磁碟的 I/O 性能等一起看,還要看源表主鍵值分佈情況和目標表的表結構,並不是所有多線程插入都能產生更好的效果節省時間,有時候反而可能多線程比單線程更耗時間以及資源。
Enjoy GreatSQL