MySQL8.0 優化器介紹（三）_ZenDei技術網路在線

GreatSQL社區原創內容未經授權不得隨意使用，轉載請聯繫小編並註明來源。 GreatSQL是MySQL的國產分支版本，使用上與MySQL一致。作者：奧特曼愛小怪獸文章來源：GreatSQL社區原創往期回顧 MySQL8.0 優化器介紹（一） MySQL8.0 優化器介紹（二）本篇將進一 ...

GreatSQL社區原創內容未經授權不得隨意使用，轉載請聯繫小編並註明來源。
GreatSQL是MySQL的國產分支版本，使用上與MySQL一致。
作者：奧特曼愛小怪獸
文章來源：GreatSQL社區原創

往期回顧

MySQL8.0 優化器介紹（一）

MySQL8.0 優化器介紹（二）

本篇將進一步深入介紹優化器相關的join優化

為更好的理解本篇內容需要提前看一下以下內容：

單表訪問的方法，參考《MySQL 是怎樣運行的：從根兒上理解 MySQL》第10章"單表訪問方法"
更多select語句級別的優化細節見（https://dev.mysql.com/doc/refman/8.0/en/select-optimization.html）

為了讓讀者對join優化有更深的瞭解，章節里的sql例子，留了一些思考和動手的問題。可能大家得到的答案會不同，但探索未知的過程，方式應該是一樣的。

join優化（Join Optimizations）

MySQL可以使用Join Optimizations來改進上次分享過的join algorithms，或者決定如何執行部分查詢。本次主要介紹三種經常用到的join Optimizations，更多的 join type 見下麵的鏈接：（https://dev.mysql.com/doc/refman/8.0/en/explain-output.html#explain-join-types）

index merge

通常MySQL只會對每個表使用一個索引。但是，如果對同一表中的多個列在where後有條件限制，並且沒有覆蓋所有列的單個索引，無論選哪個索引都不是最佳的。對於這些情況，MySQL支持索引合併 (index merge)。select a,b,c from t where a=x1 and b=x2 and c=x3,這種情況下，建立一個多列的複合索引 index_abc 比使用 index merge +index_a+index_b+index_c 性能更好。

Index merge 支持三種演算法見下表

查詢計劃使用index merge 時，會在explain sql 的 access type 列有"index_merge"，key 列會包含所有參與merge的列， key_length 包含一個所用索引的最長關鍵部分的列表。舉個Intersection例子：

Intersection

以下代碼塊註釋中提到的知識點略多

##無論optimizer 是否選擇  index merge 取決於index  statistics. 
## index  statistics 是從哪個試圖獲得呢？mysql.innodb_index_stats  還是 information_schema.statistics 
## 還是 information_schema.INNODB_SYS_TABLESTATS？ 
## 可以參考 https://www.cnblogs.com/ClassicMan/articles/15871403.html
## index_dive  eq_range_index_dive_limit 這兩個參數有什麼作用？
##意味著即使返回相同STATEMENT_DIGEST_TEXT的sql查詢語句, WHERE語句後面跟不同的值，得到的查詢計劃可能是不一樣的  ##比如select * from people where name='唯一值'；
##select * from people where name='超級多的重覆值'

## 同理index statistics 的改變會讓同一個查詢走不同的執行計劃，
## 體現在 select a,b from t where a=1 and b=1  有時走了 index merges，有時沒走。

CREATE TABLE `payment` (
 `payment_id` smallint unsigned NOT NULL,
 `customer_id` smallint unsigned NOT NULL,
 `staff_id` tinyint unsigned NOT NULL,
 `rental_id` int(DEFAULT NULL,
 `amount` decimal(5,2) NOT NULL,
 `payment_date` datetime NOT NULL,
 `last_update` timestamp NULL,
 PRIMARY KEY (`payment_id`),
 KEY `idx_fk_staff_id` (`staff_id`),
 KEY `idx_fk_customer_id` (`customer_id`),
 KEY `fk_payment_rental` (`rental_id`)
) ENGINE=InnoDB;

## case1  等值查詢
SELECT *
  FROM sakila.payment
 WHERE staff_id = 1
   AND customer_id = 75;
 
 mysql> EXPLAIN SELECT *
                  FROM sakila.payment
                 WHERE staff_id = 1
                   AND customer_id = 75\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: payment
 partitions: NULL
 type: index_merge
 possible_keys: idx_fk_staff_id,idx_fk_customer_id
 key: idx_fk_customer_id,idx_fk_staff_id
 key_len: 2,1
 ref: NULL
 rows: 20
 filtered: 100
 Extra: Using intersect(idx_fk_customer_id,idx_fk_staff_id); Using 
where 1 row in set, 1 warning (0.0007 sec)
 
mysql> EXPLAIN FORMAT=TREE
 SELECT *
 FROM sakila.payment
 WHERE staff_id = 1
 AND customer_id = 75\G
**************************** 1. row ****************************
EXPLAIN: -> Filter: ((sakila.payment.customer_id = 75) and (sakila.payment.staff_id = 1)) (cost=14.48 rows=20)
             -> Index range scan on payment using intersect(idx_fk_customer_id,idx_fk_staff_id) (cost=14.48 rows=20)
1 row in set (0.0004 sec)

##註意"Index range scan on payment",兩個等值查詢條件，為啥觸發了rang scan？


## case2  下麵的sql範圍查詢也能用到index  merge 嗎？執行計劃 自己下去測試驗證
SELECT *
 FROM sakila.payment
 WHERE payment_id > 10
 AND customer_id = 318;

Union Algorithm

##case1 等值查詢
SELECT *
 FROM sakila.payment
 WHERE staff_id = 1
 OR customer_id = 318;
 
 mysql> EXPLAIN
 SELECT *
 FROM sakila.payment
 WHERE staff_id = 1
 OR customer_id = 318\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: payment
 partitions: NULL
 type: index_merge
 possible_keys: idx_fk_staff_id,idx_fk_customer_id
 key: idx_fk_staff_id,idx_fk_customer_id
 key_len: 1,2
 ref: NULL
 rows: 8069
 filtered: 100
 Extra: Using union(idx_fk_staff_id,idx_fk_customer_id); Using where
1 row in set, 1 warning (0.0008 sec)
 
 mysql> EXPLAIN FORMAT=TREE
 SELECT *
 FROM sakila.payment
 WHERE staff_id = 1
 OR customer_id = 318\G
**************************** 1. row ****************************
EXPLAIN: -> Filter: ((sakila.payment.staff_id = 1) or (sakila.payment.customer_id = 318)) (cost=2236.18 rows=8069)
             -> Index range scan on payment using union(idx_fk_staff_id,idx_fk_customer_id) (cost=2236.18 rows=8069)
1 row in set (0.0010 sec)

 ## case2 範圍查詢也能用到index  merge 嗎？執行計劃 自己下去測試驗證， 
 ## 有主鍵參與後，和Intersection 章節的case2 執行計劃中用到的索引個數有啥不同？
SELECT *
 FROM sakila.payment
 WHERE payment_id > 15000
 OR customer_id = 318;

Sort-Union Algorithm

SELECT *
 FROM sakila.payment
 WHERE customer_id < 30
 OR rental_id < 10;
 
 mysql> EXPLAIN
 SELECT *
 FROM sakila.payment
 WHERE customer_id < 30
 OR rental_id < 10\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: payment
 partitions: NULL
 type: index_merge
possible_keys: idx_fk_customer_id,fk_payment_rental
 key: idx_fk_customer_id,fk_payment_rental
 key_len: 2,5
 ref: NULL
 rows: 826
 filtered: 100
 Extra: Using sort_union(idx_fk_customer_id,fk_payment_rental); 
Using where 1 row in set, 1 warning (0.0009 sec)

mysql> EXPLAIN FORMAT=TREE
 SELECT *
 FROM sakila.payment
 WHERE customer_id < 30
 OR rental_id < 10\G
**************************** 1. row *****************************
EXPLAIN: -> Filter: ((sakila.payment.customer_id < 30) or (sakila.payment.rental_id < 10)) (cost=1040.52 rows=826)
             -> Index range scan on payment using sort_union(idx_fk_customer_id,fk_payment_rental) (cost=1040.52 rows=826)
1 row in set (0.0005 sec)

Multi-Range Read (MRR)

多範圍讀取（MRR）優化旨在減少對輔助索引進行範圍掃描所導致的隨機I/O量。優化讀取索引

首先，根據行id（InnoDB的聚集索引）對鍵進行排序，然後按行的存儲順序檢索行。多量程讀取優化

可以用於範圍掃描和使用索引的等值連接。不支持虛擬生成列上的輔助索引。

使用InnoDB進行多範圍讀取優化的主要用例是用於沒有覆蓋索引的磁碟綁定查詢（ disk-bound queries 另外一個層面對disk-bound 的優化，詳細可見：https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-diskio.html）。優化的效果取決於需要多少行以及存儲器的查找時間。MySQL將會估算（estimate）是否有用。然而，成本估算在過於悲觀而不是過於樂觀的一面，因此可能有必要提供幫助優化器做出正確決策的信息。

有兩個 optimizer switches 控制MRR優化

mrr: Whether the optimizer is allowed to use the Multi-Range Read optimization. The default is ON.
mrr_cost_based: Whether the decision to use the Multi-Range Read optimization is cost based. You can disable this option to always use the optimization when it is supported. The default is ON

可以用MRR() 和NO_MRR() 兩個optimizer switches 來控製表級別or 索引級別的 Multi-Range Read ，舉個例子：

mysql> EXPLAIN
 SELECT /*+ MRR(city) */
 *
 FROM world.city
 WHERE CountryCode BETWEEN 'AUS' AND 'CHN'\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: city
 partitions: NULL
 type: range
possible_keys: CountryCode
 key: CountryCode
 key_len: 3
 ref: NULL
 rows: 812
 filtered: 100
 Extra: Using index condition; Using MRR
1 row in set, 1 warning (0.0006 sec)


有必要使用MRR（）優化器提示或禁用基於MRR_cost_based的優化器開關。
示例中查詢的估計行數太小，沒有MRR的hint時，基於成本的優化無法使用MRR。
只能顯示用hint來干預查詢計劃使用MRR。
當MRR的優化被使用時, MySQL需要用到random read buffer來存儲indexes. 
有一個參數可以影響MRR的性能 read_rnd_buffer_size.

Batched Key Access (BKA)

可以簡單認為 BKA=BNL+MRR .這使得可以以與非索引連接類似的方式將連接緩衝區用於索引連接，並使用多範圍讀取優化來減少隨機I/O的數量。BKA 用於大量 disk-bound 查詢的場景。但是，沒有明確的說明來確定優化何時有幫助，何時會導致性能下降。

可以借鑒一下國外知名dba在MySQL 優化方面的blog（http://oysteing.blogspot.com/2012/04/improved-dbt-3-results-with-mysql-565.html）

MRR 在最優時，查詢耗時減少20%,最糟糕時查詢耗時增加2/3。

BKA 主要的獲益在一個相對較窄的查詢範圍，而其他查詢的性能可能會降低，因此預設情況下禁用該優化。

（可以簡單理解為 MySQL5.6時，bka優化帶來的收益小於bka帶來的成本開銷）除非確定開啟bka能來提升時，再用hint BKA() 來啟用。session級別開啟：

SET SESSION optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on';

一個使用BKA的例子

mysql> EXPLAIN
 SELECT /*+ BKA(ci) */
 co.Code, co.Name AS Country,
 ci.Name AS City
 FROM world.country co
 INNER JOIN world.city ci
 ON ci.CountryCode = co.Code\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: co
 partitions: NULL
 type: ALL
 possible_keys: PRIMARY
 key: NULL
 key_len: NULL
 ref: NULL
 rows: 239
 filtered: 100
 Extra: NULL
**************************** 2. row *****************************
 id: 1
 select_type: SIMPLE
 table: ci
 partitions: NULL
 type: ref
 possible_keys: CountryCode
 key: CountryCode
 key_len: 3
 ref: world.co.Code
 rows: 18
 filtered: 100
 Extra: Using join buffer (Batched Key Access)
2 rows in set, 1 warning (0.0007 sec)

註意看執行計劃中Extra 的關鍵字 "Using join buffer",說明 join_buffer_size 會影響BKA 特性的性能。從全局怎麼調整join_buffer_size，並又能充分利用上BKA，是一個極大的挑戰。調優最常見的問題，搞定了A sql,又引出了其他問題，比如記憶體使用率過高。

其他join優化

MySQL 還自動支持其他join 優化，一旦對查詢有性能幫助，優化器會自動選擇他們，一般不需要手動。

瞭解一下其他join的優化方式，有助於我們在遇到sql性能問題時，可以適當給與優化器，一些有用的hint。

具體有哪些join 優化方式，可以查看explain 輸出中的Extra 的內容說明。本文可能列舉的不全，精力有限只做了一些簡單的介紹，具體細節需要查看官網，以及大量的實踐。

Condition Filtering 條件過濾當一個表有兩個或多個與之相關聯的條件，並且一個索引可以用於部分條件時，使用條件過濾優化。啟用條件過濾後，在估計表的總體過濾時，將考慮其餘條件的過濾效果。
- Optimizer Switch: condition_fanout_filter – enabled by default
- Optimizer Hints: None
- EXPLAIN Output: None
Derived Merge 優化器可以將派生表(derived table)、視圖引用和公共表表達式合併到它們所屬的查詢塊中。優化的替代方法是物化表（materialize the table）、視圖引用或公共表表達式。
- Optimizer Switch: derived_merge – enabled by default.
- Optimizer Hints: MERGE(), NO_MERGE().
- EXPLAIN Output: The query plan reflects that the derived table has been merged
Engine Condition Pushdown 此優化將條件向下推到存儲引擎。目前僅NDBCluster存儲引擎支持它。
Index Condition Pushdown

官方文檔中給的例子和解釋如下：people表中（zipcode，lastname，firstname）構成一個索引 SELECT * FROM people WHERE zipcode='95054' AND lastname LIKE '%etrunia%' AND address LIKE '%Main Street%';

如果沒有使用索引下推技術，則MySQL會通過zipcode='95054'從存儲引擎中查詢對應的數據，返回到MySQL服務端，然後MySQL服務端基於lastname LIKE '%etrunia%'和address LIKE '%Main Street%'來判斷數據是否符合條件。

如果使用了索引下推技術，則MySQL首先會返回符合zipcode='95054'的索引，然後根據lastname LIKE '%etrunia%'和address LIKE '%Main Street%'來判斷索引是否符合條件。如果符合條件，則根據該索引來定位對應的數據，如果不符合，則直接reject掉。

有了索引下推優化，可以在有like條件查詢的情況下，減少回表次數。

該優化也用於二級索引的範圍條件。
- Optimizer Switch: index_condition_pushdown – enabled by default.
- Optimizer Hints: NO_ICP().
- EXPLAIN Output: The traditional format has Using index condition in the Extra column, and the JSON format sets the index_condition field with the index condition that is pushed
Index Extensions InnoDB中的所有二級非唯一索引都將主鍵列附加到索引中。當啟用索引擴展優化時，MySQL會將主鍵列視為索引的一部分。
- Optimizer Switch: use_index_extensions – enabled by default
- Optimizer Hints: None
- EXPLAIN Output: None
Index Visibility 當表具有不可見的索引（ invisible index）時，預設情況下，優化器在創建查詢計劃時不會考慮它。如果啟用了索引可見性優化器開關，則將考慮不可見的索引。例如，這可以用於測試已添加但尚未可見的索引的效果。
- Optimizer Switch: use_invisible_indexes – disabled by default
- Optimizer Hints: None
- EXPLAIN Output: None
Loose Index Scan 在某些情況下，MySQL可以使用部分索引來提高聚合數據或包含DISTINCT子句的查詢的性能。這要求列用於通過形成多列索引的左首碼以及不用於分組的其他列來分組數據。當存在GROUP BY子句時，只有MIN（）和MAX（）聚合函數才能使用這個特性。

distinct效率更高還是group by效率更高？
- Optimizer Switch: None.
- Optimizer Hints: NO_RANGE_OPTIMIZATION() disables the loose index scan optimization as well as index merges and range scans.
- EXPLAIN Output: The traditional format has Using index for group-by in the Extra column. The JSON format sets the using_index_for_group_by field to true.
Range Access Method 範圍優化與其他優化略有不同，因為它被認為是一種訪問方法。MySQL將只掃描表或索引的一個或多個部分，而不是執行完整的表或索引掃描。範圍訪問方法通常用於涉及運算符>、>=、<、=<、BETWEEN、IN（），為NULL、LIKE等 range-Optimization 與 index merge Optimization 具有同樣的重要性。（https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html）
- Optimizer Switch: None.
- Optimizer Hints: NO_RANGE_OPTIMIZATION() – this also disables the loose index scan and index merge optimizations. It does however not disable the skip scan optimization even though that also uses range access.
- EXPLAIN Output: The access method is set to range. range_optimizer_max_mem_size 可以限制 range access使用的記憶體。預設是8M
Semijoin 半聯接優化用於IN和EXIST條件。支持四個策略:

當subquery materialization 開啟時， Semijoin 會儘可能的使用 materialization策略。EXISTS 在MySQL8.0.16 以後支持半聯接。NOT EXISTS 在MySQL8.0.17 以後支持半聯接。每種策略，都可以以參數的形式，用於SEMIJOIN() and NO_SEMIJOIN() hint

SEMIJOIN(DUPSWEEDOUT):The duplicate weedout strategy executes the semijoin as if it is a normal join and removes the duplicates using a temporary table. EXPLAIN Output: The traditional format has Start temporary and End temporary in the Extra column for the tables involved. The JSON-formatted output uses a block named duplicates_removal

SEMIJOIN(FIRSTMATCH)：The first match strategy returns the first match for each value rather than all values. EXPLAIN Output: The traditional format has FirstMatch(...) in the Extra column where the value between parentheses is the name of the reference table. The JSON format sets the value of the first_match field to the name of the reference table

SEMIJOIN(LOOSESCAN)：The loose scan strategy uses an index to choose a single value from each of the subquery’s value groups. EXPLAIN Output: The traditional format has LooseScan(m..n) in the Extra column where m and n indicate which parts of the index are used for the loose scan. The JSON format sets the loosescan field equal to true

半連接特別怕 null 值，Oracle 經常在以下方面出問題：
- where null in (a,b,c,null), null exists (null) .
- sum（null) 返回null,count(null) 返回 0
- materialization
- duplicate weedout
- first match
- loose scan (不要和 loose index scan optimization混淆)。
Skip Scan

Skip Scan MySQL 8.0.13 引入，工作方式類似loose index scan.當多列索引的第二列上存在範圍條件，但第一列上沒有條件時使用。Skip Scan將整個索引掃描轉換為一系列範圍掃描（對索引中第一列的每個值進行一次範圍掃描）。

scan in the Extra column, and the JSON format sets the using_index_for_skip_scan field to true
- Optimizer Switch: skip_scan – enabled by default.
- Optimizer Hints: SKIP_SCAN(), NO_SKIP_SCAN().
- EXPLAIN Output: The traditional format has Using index for skip
Subquery Materialization

子查詢物化策略將子查詢的結果存儲在內部臨時表中。如果可能的話，優化器將在臨時表上添加一個自動生成的哈希索引，將使其快速連接到查詢的其餘部分。

當啟用了subquery_materialization_cost_based優化器開關（預設開）時，優化器將使用估計的成本來決定是使用Subquery Materialization 還是使用 IN-to-EXIST子查詢轉換（將IN條件重寫為EXISTS）。

當開關關閉時，優化器總是選擇Subquery Materialization。
- Optimizer Switch: materialization – enabled by default.
- Optimizer Hints: SUBQUERY(MATERIALIZATION).
- EXPLAIN Output: The traditional format has MATERIALIZED as the select type. The JSON format creates a block named materialized_from_subquery.

還有我們可以用哪些方法影響優化器，下篇文章再續。

Enjoy GreatSQL