說明:本文的所有步驟不適用於生產環境,僅用於個人測試環境的快速部署和學習,下述操作過程在Oracle Linux 7.9上安裝Oracle 11.2.0.4單實例為例。 1 安裝環境檢查 安裝環境的檢查可以參考官方文檔Oracle Database Quick Installation Guide ...
Semi-Join Subquery優化策略
Semi-Join Subquery(半連接子查詢):對應IN或EXISTS子查詢,僅需要檢查"外表記錄"在"子查詢結果集"中是否存在匹配記錄,不需要計算"子查詢結果集"中記錄匹配次數,也不需要返回"子查詢結果集"中匹配記錄內容
在MariaDB(MySQL)中,常用優化Semi-Join(半連接)的策略有:
- First Match
- Table Pullout
- Semi-join Materialization
- Loose Scan
- Duplicate Weedout
First Match策略
當迴圈"外部查詢結果集"的每條記錄去"子查詢中"確認"是否匹配"時,只需要找到第一條匹配記錄(First Match)既可跳出子查詢。
如下麵查詢:
SELECT * FROM Country
WHERE Country.code IN (
SELECT City.Country
FROM City
WHERE City.Population > 1*1000*1000
)
AND Country.continent='Europe'
如果不使用First Match策略,當處理到Country表上滿足" Country.continent='Europe' "條件的德國(Deu)記錄時,會掃描City表上滿足" City.Population > 110001000 AND City.Country='DEU' "的所有記錄,再根據匹配記錄總數返回"是否匹配"結果:
如果使用First Match策略,當處理到Country表上滿足" Country.continent='Europe' "條件的德國(Deu)記錄時,會掃描City表上滿足" City.Population > 110001000 AND City.Country='DEU' "的第一條記錄"Berlin"後,立即返回"是否匹配"結果:
在MariaDB上使用First Match策略的查詢的執行計劃為:
MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 1*1000*1000) and Country.continent='Europe';
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
| 1 | PRIMARY | Country | ref | PRIMARY,continent | continent | 17 | const | 60 | Using index condition |
| 1 | PRIMARY | City | ref | Population,Country | Country | 3 | world.Country.Code | 18 | Using where; FirstMatch(Country) |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
2 rows in set (0.00 sec)
MariaDB的執行計劃中會有明顯的FirstMatch標識。
在MySQL上使用First Match策略的查詢的執行計劃為:
MySQL [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 1*1000*1000) and Country.continent='Europe';
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
| 1 | PRIMARY | Country | ref | continent | continent | 17 | const | 60 | Using index condition; Using where |
| 2 | DEPENDENT SUBQUERY | City | index_subquery | Population,Country | Country | 3 | func | 18 | Using where |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
2 rows in set (0.01 sec)
MariaDB的執行計劃中僅顯示為依賴子查詢(DEPENDENT SUBQUERY)
First Match策略和將IN子查詢轉換為EXISTS依賴子查詢很相似,但兩者還是存在明顯差異,並非所有EXISTS操作都能使用First Match策略,如子查詢中使用GROUP BY相關的聚合函數時,需要先完成GROUP BY操作才能確認"是否匹配"。
Table Pullout策略
當子查詢的查詢列表項只有主鍵或唯一索引鍵時,能推算出"子查詢結果集"不存在重覆記錄,因此可以將子查詢改為關聯查詢,即將子查詢中的表上提到關聯查詢。
對於查詢:
SELECT *
FROM City
WHERE City.Country IN (
SELECT Country.Code
FROM Country
WHERE Country.Population < 100*1000
);
在MariaDB 5.2 和MySQL 5.6版本及之前版本上,執行計劃為:
MySQL [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000);
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
| 1 | PRIMARY | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using where |
| 2 | DEPENDENT SUBQUERY | Country | unique_subquery | PRIMARY,Population | PRIMARY | 3 | func | 1 | Using where |
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
2 rows in set (0.00 sec)
如果Country.Code是主鍵或唯一索引,則可以將SQL改寫為:
SELECT City.*
FROM City
INNER JOIN Country
ON City.Country=Country.Code
WHERE Country.Population < 100*1000;
改為關聯查詢後,可以根據兩張關聯表的統計數據來選擇驅動表和被驅動表,因此在MariaDB 5.3或MySQL 5.7版本,執行計劃為:
MariaDB [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000);
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
| 1 | PRIMARY | Country | range | PRIMARY,Population | Population | 4 | NULL | 37 | Using index condition |
| 1 | PRIMARY | City | ref | Country | Country | 3 | world.Country.Code | 18 | |
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
2 rows in set (0.00 sec)
Materialization策略
在使用Table Pullout策略時,需要能明確推算出"子查詢結果集"不存在重覆記錄時才能將"子查詢"改為"關聯查詢",如果將"子查詢結果集"通過臨時表去重固化後消除重覆記錄,則可以將子查詢轉換為"關聯查詢",即Materialization策略。
如對於查詢:
SELECT * FROM Country
WHERE Country.code IN (
SELECT City.Country
FROM City
WHERE City.Population > 1*1000*1000
)
AND Country.continent='Europe'
在轉換為"關聯查詢"後,按照"關聯查詢"中臨時表是否為"驅動表"可以將Semi-join Materialization策略細分為:
- Materialization/scan 策略,將臨時表作為"驅動表",遍歷臨時表中每條記錄去另外關聯表中查找匹配記錄。
- Materialization/lookup 策略,將臨時表作為"被驅動表",遍歷另外的關聯表在臨時表中查詢匹配記錄。
使用Materialization/scan 策略時,MariaDB 查詢計劃為:
MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 7*1000*1000);
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 15 | |
| 1 | PRIMARY | Country | eq_ref | PRIMARY | PRIMARY | 3 | world.City.Country | 1 | |
| 2 | MATERIALIZED | City | range | Population,Country | Population | 4 | NULL | 15 | Using index condition |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
3 rows in set (0.01 sec)
使用Materialization/lookup 策略時,MariaDB 查詢計劃為:
MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 1*1000*1000) ;
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| 1 | PRIMARY | Country | ALL | PRIMARY | NULL | NULL | NULL | 239 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 3 | func | 1 | |
| 2 | MATERIALIZED | City | range | Population,Country | Population | 4 | NULL | 238 | Using index condition |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
3 rows in set (0.00 sec)
Loose Scan策略
在Materialization/scan 策略時,需要先將"子查詢結果集"移除重覆記錄並固化到臨時表,再作為驅動表進行關聯查詢。MySQL特性Index Loose Scan能在一次掃描中得跳過重覆索引鍵得到"沒有重覆記錄的臨時結果集",Loose Scan策略基於Index Loose Scan特性保證關聯查詢不會出現"重覆關聯問題"。
如對於查詢:
SELECT * FROM Country
WHERE Country.code IN (
SELECT country_code FROM Satellite
)
如果Satellite.country_code 存在索引,基於Index Loose Scan特性則能快速獲得"SELECT DISTINCT country_code FROM Satellite"的效果,如圖所示:
使用Loose Scan 策略時,MariaDB 查詢計劃為:
MariaDB [world]> explain select * from Country where Country.code in (select country_code from Satellite);
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
| 1 | PRIMARY | Satellite | index | country_code | country_code | 9 | NULL | 932 | Using where; Using index; LooseScan |
| 1 | PRIMARY | Country | eq_ref | PRIMARY | PRIMARY | 3 | world.Satellite.country_code | 1 | Using index condition |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
Loose Scan 策略和Materialization/scan 策略區別:
- Materialization/scan 策略:先將子查詢的查詢結果固化去重後,再作為驅動表與外部表進行關聯查詢,查詢使用到臨時表。
- Loose Scan 策略:在對子查詢的表進行Index Loose Scan操作過程中,直接將遍歷到的記錄與與外部表進行關聯查詢,查詢未使用到臨時表。
Duplicate Weedout策略
當無法根據表結構信息推算出"子查詢結果集"不存在重覆記錄時,如果將子查詢改寫為關聯查詢,則會導致"外表記錄"被關聯匹配多次而產生重覆記錄,可以通過將關聯結果集插入到"帶有唯一索引的臨時表"的方式來移除重覆記錄,保證最終查詢結果的準確性。
對於查詢:
SELECT *
FROM Country
WHERE Country.code IN (
SELECT City.Country
FROM City
WHERE City.Population > 0.33 * Country.Population
AND City.Population > 1*1000*1000
);
可以改寫為:
CREATE tmp_Country LIKE Country;
INSERT IGNORE INTO tmp_Country
SELECT Country.*
FROM Country
INNER JOIN City
ON Country.code = City.Country
WHERE City.Population > 0.33 * Country.Population
AND City.Population > 1*1000*1000
SELECT * FROM tmp_Country;
如圖所示:
使用Duplicate Weedout 策略時,MariaDB 查詢計劃為:
explain select * from Country where Country.code IN (select City.Country from City where City.Population > 0.33 * Country.Population and City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: City
type: range
possible_keys: Population,Country
key: Population
key_len: 4
ref: NULL
rows: 238
Extra: Using index condition; Start temporary
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: Country
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: world.City.Country
rows: 1
Extra: Using where; End temporary
2 rows in set (0.00 sec)
學習總結
First Match策略通過"找到第一條匹配記錄即返回"的方式來跳過無效子查詢掃描。
除First Match策略外都是子查詢轉換為關聯查詢來優化提升查詢效率,按照不同查詢場景採用不同策略來"避免重覆記錄":
- Table Pullout策略,通過唯一索引和主鍵索引邏輯來確認"子查詢結果集"中重覆記錄。
- Materialization策略,通過臨時表來移除"子查詢結果集"中重覆記錄。
- Loose Scan策略,通過Index Loose Scan特性來跳過"子查詢結果集"中重覆記錄。
- Duplicate Weedout策略,通過臨時表來將移除"關聯查詢結果集"中重覆記錄。
參考資料
Semi-join Subquery Optimizations