Reflections on SQL Subqueries

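While listening to Chen Huajun's lecture today, I came away with a lot to think about. One of the topics was "choosing between different execution plans (subqueries)". We use subqueries all the time in day-to-day work, so what approaches are there for optimizing them?

For example, here are the tables used in today's experiment:

Table T1 has 10000 rows, with a btree index on the id column
Table T2 has 1000 rows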

postgres=# create table t1(id int primary key, info text, reg_time timestamp);
CREATE TABLE
postgres=# create table t2(id int, name text);
CREATE TABLE
postgres=# insert into t1 select generate_series(1, 10000),'lottu', now();
INSERT 0 10000
postgres=# insert into t2 select (random()*1000)::int, 'lottu'||id  from generate_series(1,1000) id;
INSERT 0 1000
postgres=# create index ind_t1_id on t1(id);
CREATE INDEX

The SQL under test:

select * from t1 where id in (select id from t2);

Rewriting the SQL

Let's first look at the execution plan of this SQL:

postgres=# explain (analyze,verbose,costs,timing) select * from t1 where id in (select id from t2);
                                                             QUERY PLAN                                                             
----------------------------------------------------------------------
 Merge Join  (cost=54.25..99.73 rows=628 width=18) (actual time=1.319..2.365 rows=628 loops=1)
   Output: t1.id, t1.info, t1.reg_time
   Inner Unique: true
   Merge Cond: (t1.id = t2.id)
   ->  Index Scan using ind_t1_id on public.t1  (cost=0.29..337.29 rows=10000 width=18) (actual time=0.014..0.421 rows=997 loops=1)
         Output: t1.id, t1.info, t1.reg_time
   ->  Sort  (cost=53.97..55.54 rows=628 width=4) (actual time=1.298..1.387 rows=628 loops=1)
         Output: t2.id
         Sort Key: t2.id
         Sort Method: quicksort  Memory: 54kB
         ->  HashAggregate  (cost=18.50..24.78 rows=628 width=4) (actual time=0.730..0.877 rows=628 loops=1)
               Output: t2.id
               Group Key: t2.id
               ->  Seq Scan on public.t2  (cost=0.00..16.00 rows=1000 width=4) (actual time=0.013..0.267 rows=1000 loops=1)
                     Output: t2.id
 Planning Time: 0.454 ms
 Execution Time: 2.507 ms
(17 rows)

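The execution plan tells us quite a lot:

The query returns only 628 rows; the HashAggregate node collapses the 1000 rows of t2 into 628 distinct ids.
Execution time is 2.507 ms.
The two tables are combined with a Merge Join; because t2 has no index and its rows are stored in no particular order, its ids have to be sorted in memory (quicksort, 54kB).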
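The 628 figure can be checked directly against t2. A minimal sketch, assuming the t2 table created above; the column aliases total_rows and distinct_ids are only illustrative:

```sql
-- Compare the total row count of t2 with the number of distinct ids.
-- The gap between the two is the number of duplicate rows that the
-- HashAggregate node removes before the Merge Join.
select count(*)           as total_rows,
       count(distinct id) as distinct_ids
from t2;
```

For the data set behind the plan above, distinct_ids corresponds to the 628 rows reported by the HashAggregate node. Since t2 is filled with random(), a fresh load will give a somewhat different number.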

Using a join

If a subquery is executed in a loop and makes the SQL slow, you can try rewriting it as an equivalent join:

postgres=# explain (analyze,verbose,costs,timing) select t1,* from t1 , t2 where t1.id = t2.id ;
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=66.11..117.17 rows=1000 width=72) (actual time=0.601..2.184 rows=1000 loops=1)
   Output: t1.*, t1.id, t1.info, t1.reg_time, t2.id, t2.name
   Merge Cond: (t1.id = t2.id)
   ->  Index Scan using ind_t1_id on public.t1  (cost=0.29..337.29 rows=10000 width=60) (actual time=0.021..0.726 rows=997 loops=1)
         Output: t1.*, t1.id, t1.info, t1.reg_time
   ->  Sort  (cost=65.83..68.33 rows=1000 width=12) (actual time=0.573..0.721 rows=1000 loops=1)
         Output: t2.id, t2.name
         Sort Key: t2.id
         Sort Method: quicksort  Memory: 71kB
         ->  Seq Scan on public.t2  (cost=0.00..16.00 rows=1000 width=12) (actual time=0.013..0.226 rows=1000 loops=1)
               Output: t2.id, t2.name
 Planning Time: 0.288 ms
 Execution Time: 2.421 ms
(13 rows)

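Performance improves slightly (2.421 ms versus 2.507 ms), but the two statements are not actually equivalent: T2 contains duplicate ids, so the final result set has 1000 rows rather than the 628 above. Note also that the select list was typed as t1,* (the whole-row value of t1 followed by every column of both tables, as the Output line shows); t1.* was presumably intended.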
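To make a join return the same rows as the IN form, the duplicates in t2 have to be removed first, or the condition has to be written as a semi-join. The two statements below are a sketch of those options and were not part of the original experiment, so no timings are claimed for them; the alias d is illustrative.

```sql
-- Option 1: de-duplicate t2.id before joining, so each t1 row matches at most once.
select t1.*
from t1
join (select distinct id from t2) as d on d.id = t1.id;

-- Option 2: EXISTS only asks whether at least one matching t2 row exists,
-- so duplicate ids in t2 cannot multiply the t1 rows.
select t1.*
from t1
where exists (select 1 from t2 where t2.id = t1.id);
```

Both forms return each matching t1 row once, like the original IN query.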

Rewriting with an array

postgres=# explain (analyze,verbose,costs,timing) select * from t1 where id = any(array(select id from t2));
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t1_id on public.t1  (cost=16.29..59.03 rows=10 width=18) (actual time=0.418..1.108 rows=628 loops=1)
   Output: t1.id, t1.info, t1.reg_time
   Index Cond: (t1.id = ANY ($0))
   InitPlan 1 (returns $0)
     ->  Seq Scan on public.t2  (cost=0.00..16.00 rows=1000 width=4) (actual time=0.014..0.127 rows=1000 loops=1)
           Output: t2.id
 Planning Time: 0.106 ms
 Execution Time: 1.178 ms
(8 rows)

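The result is equivalent to the first SQL, it takes only 1.178 ms, and the plan contains no Sort or HashAggregate step. It is the best of the three in this test, and the one to pick here.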
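One way to confirm that the array form really returns the same rows as the IN form is to compare the two result sets in both directions with EXCEPT. This is only a sketch of such a check; the alias differing_rows is illustrative.

```sql
-- Rows returned by one form but not the other, counted in both directions.
-- A result of 0 means the two queries return the same set of rows.
select count(*) as differing_rows
from (
    (select * from t1 where id in (select id from t2)
     except
     select * from t1 where id = any (array(select id from t2)))
    union all
    (select * from t1 where id = any (array(select id from t2))
     except
     select * from t1 where id in (select id from t2))
) as diff;
```

EXCEPT compares sets and discards duplicates, which is sufficient here because t1.id is a primary key and every t1 row is unique.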

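A change of approach

So far t2 has had only 1000 rows, with ids no greater than 1000. Now suppose t2 has 1,000,000 rows or even more, and the ids are no longer bounded.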

select * from t1 where id in (select id from t2 where id <= 1000);
or
with t as
(select id from t2 where id <= 1000)
select t1.* from t1 where id in (select id from t);

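I believe many people would still write it this way. Neither form is good: you get the job done in a single SQL statement, but it is slow. At this point someone will say that you can build an index on t2. That works, and it does improve efficiency a lot. But if t2 does not already have such an index, there is no need to create one just for this requirement.

Instead, I suggest storing the result of this query in a helper table: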

select id from t2 where id <= 1000;

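The helper table can be a temporary table, a regular table, or a materialized view.

The advantage is that the data blocks of t2 are not scanned over and over; a single scan is enough.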
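The helper-table idea can be sketched as follows. The names t2_small and mv_t2_small are hypothetical, and the distinct is an optional addition that keeps the helper table free of duplicate ids; neither comes from the original lecture.

```sql
-- Variant 1: temporary table, visible only to the current session.
create temporary table t2_small as
select distinct id from t2 where id <= 1000;

-- Collect statistics so the planner knows how small the helper table is.
analyze t2_small;

-- Later queries read the small helper table instead of scanning t2 again.
select t1.* from t1 where id in (select id from t2_small);

-- Variant 2: materialized view, kept across sessions.
create materialized view mv_t2_small as
select distinct id from t2 where id <= 1000;

-- Re-run the stored query whenever t2 has changed enough to matter.
refresh materialized view mv_t2_small;

select t1.* from t1 where id in (select id from mv_t2_small);
```

A temporary table suits a one-off batch within a single session, while a materialized view fits a filter that many sessions reuse and that can tolerate being slightly stale between refreshes.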
