【Author】
Wang Dong: a database specialist at Ctrip's Technical Support Center, with a strong interest in troubleshooting difficult database problems and in building automated, intelligent database operations tooling.
【Problem Description】
Recently one of our MySQL instances hung briefly: the number of concurrently running threads spiked and the connection count surged within moments.
While investigating, we noticed page_cleaner timeout messages in the error log, which caught our attention:
2019-08-24T23:47:09.361836+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 24915ms. The settings might not be optimal. (flushed=182 and evicted=0, during the time.)
2019-08-24T23:47:16.211740+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4849ms. The settings might not be optimal. (flushed=240 and evicted=0, during the time.)
2019-08-24T23:47:23.362286+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6151ms. The settings might not be optimal. (flushed=215 and evicted=0, during the time.)
【Problem Analysis】
1. How the page_cleaner messages in the error log are produced
From the source file storage/innobase/buf/buf0flu.cc we can see that the message above is printed to the error log when the condition curr_time > next_loop_time + 3000 holds, i.e. when a loop iteration takes more than 4 seconds. next_loop_time is set 1000ms ahead, meaning a page flush pass is supposed to run once per second.
if (ret_sleep == OS_SYNC_TIME_EXCEEDED) {
    ulint curr_time = ut_time_ms();

    if (curr_time > next_loop_time + 3000) {
        if (warn_count == 0) {
            ib::info() << "page_cleaner: 1000ms"
                " intended loop took "
                << 1000 + curr_time
                   - next_loop_time
                << "ms. The settings might not"
                " be optimal. (flushed="
                << n_flushed_last
                << " and evicted="
                << n_evicted
                << ", during the time.)";

            if (warn_interval > 300) {
                warn_interval = 600;
            } else {
                warn_interval *= 2;
            }

            warn_count = warn_interval;
        } else {
            --warn_count;
        }
    } else {
        /* reset counter */
        warn_interval = 1;
        warn_count = 0;
    }

    next_loop_time = curr_time + 1000;
    n_flushed_last = n_evicted = 0;
}
The second half of the message, (flushed=182 and evicted=0, during the time.), corresponds to the variables n_flushed_last and n_evicted, which are accumulated from the two variables below:
n_evicted += n_flushed_lru;
n_flushed_last += n_flushed_list;
As the source comments show, n_flushed_lru is the number of pages flushed from the tail of the LRU list, which is the value reported as evicted=0 in the log.
n_flushed_list is the number of pages flushed from the flush_list, i.e. dirty pages, which is the value reported as flushed=182 in the log.
/**
Wait until all flush requests are finished.
@param n_flushed_lru number of pages flushed from the end of the LRU list.
@param n_flushed_list number of pages flushed from the end of the
flush_list.
@return true if all flush_list flushing batch were success. */
static
bool
pc_wait_finished(
    ulint* n_flushed_lru,
    ulint* n_flushed_list)
From pc_wait_finished we can see that the page_cleaner thread performs both LRU list flushing and flush_list flushing, and has to wait for both parts to complete.
2. MySQL 5.7.4 introduced the multi-threaded page cleaner, but because LRU list flushing and dirty page (flush_list) flushing are still coupled, performance problems remain under heavy load or when a buffer pool instance holds hot data.
1) LRU list flushing runs first and flush_list flushing second, and the two are mutually exclusive: while flush_list flushing is in progress, LRU list flushing cannot proceed and has to wait for the next loop iteration.
2) In addition, the page cleaner coordinator waits for all page cleaner worker threads to finish before it responds to further flush requests, so if one buffer pool instance is particularly hot, the page cleaner cannot respond to it promptly.
Percona Server has made many optimizations to LRU list flushing.
3. Analyzing the binlog of the affected instance shows that nothing was logged between 2019-08-24 23:46:15 and 2019-08-24 23:47:25, which means the MySQL service could not process requests normally during that window and was briefly hung.
mysqlbinlog -vvv binlog --start-datetime='2019-08-24 23:46:15' --stop-datetime='2019-08-24 23:47:25'|less
The counter metrics show that concurrent threads piled up and QPS throughput dropped during that window; once the MySQL service recovered, the backlog of requests was released all at once, pushing the number of concurrent connections even higher.
4. Judging from Innodb_buffer_pool_pages_misc and Innodb_buffer_pool_pages_free, the buffer pool released about 16KB × (546893 − 310868) = 3,776,400KB (roughly 3.6GB) of memory within one minute during the problem window.
Presumably the LRU list mutex was held during that time, blocking the page cleaner thread in its LRU list flushing phase, which is why the page_cleaner loop appeared to take so long.
Innodb_buffer_pool_pages_misc is related to the adaptive hash index; the official documentation describes it as follows:
• Innodb_buffer_pool_pages_misc
The number of pages in the InnoDB buffer pool that are busy because they have been allocated for administrative overhead, such as row locks or the adaptive hash index. This value can also be calculated as Innodb_buffer_pool_pages_total − Innodb_buffer_pool_pages_free − Innodb_buffer_pool_pages_data. When using compressed tables, Innodb_buffer_pool_pages_misc may report an out-of-bounds value (Bug #59550).
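As a quick sanity check of the arithmetic above, the freed memory can be recomputed from the two page-count samples used in that formula (a minimal bash sketch; the 16KB factor assumes the default innodb_page_size):
# Recompute the estimate from the two page-count samples in the formula above.
# 16KB per page assumes the default innodb_page_size.
echo "$(( (546893 - 310868) * 16 )) KB freed"    # prints "3776400 KB freed", roughly 3.6GB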
5. Why would the AHI release such a large amount of memory in such a short time? From the slow query log we found a DROP TABLE during that window. The dropped table was only about 50GB, which is not huge, so why would DROP TABLE make the MySQL service hang briefly, and how much impact does it have on server performance? We ran a reproduction test.
【Reproducing the Problem】
To verify this further, we simulated the impact of DROP TABLE on a high-concurrency MySQL workload in a test environment.
1. Using sysbench for load generation, we first created eight tables of 200 million rows each in the test environment, about 48GB per table.
2. With innodb_adaptive_hash_index enabled, we ran the olap template for one hour to warm the AHI memory for the eight target tables.
3. We then restarted the load threads, accessing only the sbtest1 table, and observed how MySQL handled the traffic.
4. In a new session we ran drop table test.sbtest2; dropping the 48GB table took 64.84 seconds.
5. A custom script sampled Innodb_buffer_pool_pages_misc and Innodb_buffer_pool_pages_free every second (see the sketch after this list). During the DROP TABLE, Innodb_buffer_pool_pages_misc fell sharply while Innodb_buffer_pool_pages_free rose by roughly the same amount, as shown in the figure below.
6. While the DROP TABLE was running, MySQL was effectively hung and QPS dropped to zero for an extended period.
7. The page_cleaner messages also reappeared in the error log.
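For reference, here is a minimal sketch of the kind of per-second sampling script used in step 5 (connection details are omitted and assumed to come from the default client configuration; the script only illustrates the idea):
#!/bin/bash
# Sample the two buffer pool page counters once per second while the DROP TABLE
# runs, so the fall in pages_misc and the matching rise in pages_free are visible.
while true; do
    ts=$(date '+%F %T')
    vals=$(mysql -N -e "SHOW GLOBAL STATUS WHERE Variable_name IN
        ('Innodb_buffer_pool_pages_misc','Innodb_buffer_pool_pages_free')" \
        | awk '{printf "%s=%s ", $1, $2}')
    echo "$ts $vals"
    sleep 1
done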
At this point the problem had been reproduced.
【Why MySQL Hung Briefly】
1. During the load test we captured pstack output, SHOW ENGINE INNODB STATUS, and the relevant mutex information from the performance_schema table events_waits_summary_global_by_event_name (a capture sketch follows).
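A minimal sketch of that capture step (output paths are illustrative assumptions, not the exact commands used during the test):
#!/bin/bash
# Capture thread stacks, InnoDB status, and the global wait-event summary
# while the server appears hung.
pid=$(pidof mysqld)
ts=$(date +%s)
pstack "$pid" > /tmp/mysqld_pstack.$ts
mysql -e "SHOW ENGINE INNODB STATUS\G" > /tmp/innodb_status.$ts
mysql -e "SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT
          FROM performance_schema.events_waits_summary_global_by_event_name
          ORDER BY SUM_TIMER_WAIT DESC LIMIT 20" > /tmp/mutex_waits.$ts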
2. In the SEMAPHORES section we can see that during the hang a large number of threads requesting an S-lock were blocked by a lock held by the same thread, 140037411514112, for as long as 64 seconds.
--Thread 140037475792640 has waited at row0purge.cc line 862 for 64.00 seconds the semaphore:
S-lock on RW-latch at 0x966f6e38 created in file dict0dict.cc line 1183
a writer (thread id 140037411514112) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /mysql-5.7.23/storage/innobase/row/row0mysql.cc line 4253
--Thread 140037563102976 has waited at srv0srv.cc line 1982 for 57.00 seconds the semaphore:
X-lock on RW-latch at 0x966f6e38 created in file dict0dict.cc line 1183
a writer (thread id 140037411514112) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /mysql-5.7.23/storage/innobase/row/row0mysql.cc line 4253
3. The ROW OPERATIONS section shows that MySQL's Main Thread was blocked by the same thread, 140037411514112, stuck in the state "enforcing dict cache limit".
3 queries inside InnoDB, 0 queries in queue
17 read views open inside InnoDB
Process ID=39257, Main thread ID=140037563102976, state: enforcing dict cache limit
Number of rows inserted 1870023456, updated 44052478, deleted 22022445, read 9301843315
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
4. The SQL being executed by thread 140037411514112 was the drop table test.sbtest2 statement, which shows that the locks held during DROP TABLE blocked the Main Thread as well as other threads.
---TRANSACTION 44734025, ACTIVE 64 sec dropping table
10 lock struct(s), heap size 1136, 7 row lock(s), undo log entries 1
MySQL thread id 440836, OS thread handle 140037411514112, query id 475074428 localhost root checking permissions
drop table test.sbtest2
5. Below is the simplified call stack captured for the DROP TABLE. Comparing the captures shows that for the whole 64 seconds the DROP TABLE was executing btr_search_drop_page_hash_index, purging the corresponding AHI entries.
Thread 32 (Thread 0x7f5d002b2700 (LWP 8156)):
#0 ha_remove_all_nodes_to_page
#1 btr_search_drop_page_hash_index
#2 btr_search_drop_page_hash_when_freed
#3 fseg_free_extent
#4 fseg_free_step
#5 btr_free_but_not_root
#6 btr_free_if_exists
#7 dict_drop_index_tree
#8 row_upd_clust_step
#9 row_upd
#10 row_upd_step
#11 que_thr_step
#12 que_run_threads_low
#13 que_run_threads
#14 que_eval_sql
#15 row_drop_table_for_mysql
#16 ha_innobase::delete_table
#17 ha_delete_table
#18 mysql_rm_table_no_locks
#19 mysql_rm_table
#20 mysql_execute_command
#21 mysql_parse
#22 dispatch_command
#23 do_command
#24 handle_connection
#25 pfs_spawn_thread
#26 start_thread ()
#27 clone ()
6. From the code we can see that when DROP TABLE calls row_drop_table_for_mysql, it takes an exclusive lock on the data dictionary at row_mysql_lock_data_dictionary(trx).
row_drop_table_for_mysql(
    const char*     name,
    trx_t*          trx,
    bool            drop_db,
    bool            nonatomic,
    dict_table_t*   handler)
{
    dberr_t         err;
    dict_foreign_t* foreign;
    dict_table_t*   table = NULL;
    char*           filepath = NULL;
    char*           tablename = NULL;
    bool            locked_dictionary = false;
    pars_info_t*    info = NULL;
    mem_heap_t*     heap = NULL;
    bool            is_intrinsic_temp_table = false;

    DBUG_ENTER("row_drop_table_for_mysql");
    DBUG_PRINT("row_drop_table_for_mysql", ("table: '%s'", name));

    ut_a(name != NULL);

    /* Serialize data dictionary operations with dictionary mutex:
    no deadlocks can occur then in these operations */

    trx->op_info = "dropping table";

    if (handler != NULL && dict_table_is_intrinsic(handler)) {
        table = handler;
        is_intrinsic_temp_table = true;
    }

    if (table == NULL) {
        if (trx->dict_operation_lock_mode != RW_X_LATCH) {
            /* Prevent foreign key checks etc. while we are
            dropping the table */

            row_mysql_lock_data_dictionary(trx);

            locked_dictionary = true;
            nonatomic = true;
        }
7. Take the Main Thread as an example: it was blocked while trying to acquire the X-lock on the RW-latch in the call to srv_master_evict_from_table_cache.
/********************************************************************//**
Make room in the table cache by evicting an unused table.
@return number of tables evicted. */
static
ulint
srv_master_evict_from_table_cache(
/*==============================*/
    ulint   pct_check)  /*!< in: max percent to check */
{
    ulint   n_tables_evicted = 0;

    rw_lock_x_lock(dict_operation_lock);

    dict_mutex_enter_for_mysql();

    n_tables_evicted = dict_make_room_in_cache(
        innobase_get_table_cache_size(), pct_check);

    dict_mutex_exit_for_mysql();

    rw_lock_x_unlock(dict_operation_lock);

    return(n_tables_evicted);
}
8. The comment on dict_operation_lock shows that table create and drop operations reserve it in X mode, while some background threads, such as the Main Thread when checking the dict cache, also need to acquire the X lock on dict_operation_lock and are therefore blocked.
/** @brief the data dictionary rw-latch protecting dict_sys
table create, drop, etc. reserve this in X-mode; implicit or
backround operations purge, rollback, foreign key checks reserve this
in S-mode; we cannot trust that MySQL protects implicit or background
operations a table drop since MySQL does not know of them; therefore
we need this; NOTE: a transaction which reserves this must keep book
on the mode in trx_t::dict_operation_lock_mode */
rw_lock_t dict_operation_lock;
9. Meanwhile, user threads were suspended because they could not obtain the locks they needed; when a lock cannot be acquired immediately, row_mysql_handle_errors is called to pass the error code up to the upper layer for handling.
/****************************************************************//**
Handles user errors and lock waits detected by the database engine.
@return true if it was a lock wait and we should continue running the
query thread and in that case the thr is ALREADY in the running state. */
bool
row_mysql_handle_errors
Below is the simplified call stack of a user thread:
Thread 29 (Thread 0x7f5d001ef700 (LWP 8159)):
#0 pthread_cond_wait@@GLIBC_2.3.2
#1 wait
#2 os_event::wait_low
#3 os_event_wait_low
#4 lock_wait_suspend_thread
#5 row_mysql_handle_errors
#6 row_search_mvcc
#7 ha_innobase::index_read
#8 handler::ha_index_read_map
#9 handler::read_range_first
#10 handler::multi_range_read_next
#11 QUICK_RANGE_SELECT::get_next
#12 rr_quick
#13 mysql_update
#14 Sql_cmd_update::try_single_table_update
#15 Sql_cmd_update::execute
#16 mysql_execute_command
#17 mysql_parse
#18 dispatch_command
#19 do_command
#20 handle_connection
#21 pfs_spawn_thread
#22 start_thread
#23 clone
10. For the page_cleaner background threads we did not capture any obvious blocking relationship; we only saw the normal call stacks shown below.
Thread 55 (Thread 0x7f5c7fe15700 (LWP 39287)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1 os_event::timed_wait
#2 os_event::wait_time_low
#3 os_event_wait_time_low
#4 pc_sleep_if_needed
#5 buf_flush_page_cleaner_coordinator
#6 start_thread
#7 clone
Thread 54 (Thread 0x7f5c7f614700 (LWP 39288)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
#1 wait
#2 os_event::wait_low
#3 os_event_wait_low
#4 buf_flush_page_cleaner_worker
#5 start_thread
#6 clone
【Conclusion & Workaround】
The brief MySQL hang triggered by DROP TABLE occurs because dropping a table that occupies a large amount of AHI space invokes the AHI cleanup, which takes a long time; for the whole duration the X lock on dict_operation_lock is held, blocking other background threads and user threads.
Once the DROP TABLE finishes and the lock is released, the backlog of user threads runs all at once, producing the momentary spike in concurrent threads and connections.
To avoid the problem, consider disabling AHI before running DROP TABLE; a minimal sketch follows.
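A minimal sketch of that workaround (the table name is taken from the test above; whether and when to re-enable AHI afterwards depends on the workload):
#!/bin/bash
# Disable the adaptive hash index before dropping a large, hot table, then
# restore it afterwards. Per the analysis above, this avoids the long AHI
# purge inside DROP TABLE while dict_operation_lock is held in X mode.
mysql -e "SET GLOBAL innodb_adaptive_hash_index = OFF"
mysql -e "DROP TABLE test.sbtest2"
mysql -e "SET GLOBAL innodb_adaptive_hash_index = ON"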