在in\not in\except這種查詢結構時,如果涉及到的數據量較大,建議堅決用連接left join/inner join等替換掉,否則查詢效率十分低下。 ...
新系統上線,用戶基數16萬,各種查詢timeout。打開砂鍋問到底,直接看sql語句吧,都是淚呀,一大堆in\not in\except。這裡總結一下,怎麼替換掉in\not in\except。
1. in/except->left join
查詢目的:
根據
- 客戶表(Customer,按照站點、冊本劃分,16萬數據)
- 水錶表(Meter,16萬數據)
- 水錶抄表數據表(Meter_Data,遠傳表每天更新,27萬數據)
關聯查詢,查詢某天某個冊本下水錶未上傳抄表數據的用戶。
原查詢結構
select *
from Customer cs
where
cs.Group_No = '冊本編號' and
cs.Customer_No in
(
select Customer_No
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
where cs.Group_No = '冊本編號'
except
select Customer_No
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號'
)
原查詢思路
- 查詢出目標冊本已上傳數據的用戶編號
select Customer_No
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號'
- 查詢出目標冊本全部用戶編號
select Customer_No
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
where cs.Group_No = '冊本編號'
- 全部用戶編號中排除已上傳數據的用戶編號,即為未上傳數據的用戶編號
全部用戶編號 except 已抄表的用戶編號
- 查詢出在未抄表用戶編號集合中的用戶信息。
select *
from Customer cs
where
cs.Group_No = '冊本編號' and
cs.Customer_No in
(全部用戶編號 except 已抄表的用戶編號)
思路倒是沒有問題,但是in+except查詢效率不要太慢了,本來想測試個時間,結果執行了幾分鐘愣是沒出結果,直接終止掉了
優化查詢結構
其實in\not in\except這些語法在查詢中使用,效率不高是公認的事實,但是可能是由於語義比較明顯吧,很多人還是喜歡這樣用。我們這裡使用left join來替代in+except。這裡就來改掉上面的查詢:
select cs.*
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
left join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號' and md.meter_no is null;
優化查詢思路
- 用left join代替in+except,通過left join獲取目標冊本下全部用戶的信息,並與當天上傳的抄表數據進行連接;
- 連接中,右表為空即抄表數據為空的,即為當前未上傳數據的客戶信息;
left join on expression where expression 執行時,首先確保左表數據全部返回,然後應用on後指定的條件。因此,on的條件如果是對左表數據的過濾,是無效的;對右表數據的過濾是有效的。對左表數據的過濾條件,需要放到where條件中。
2. not in->left join
上面in+except的寫法,可以使用not in簡化一下,但是一樣效率不高。這裡想要說明的是not in也可以很方便的使用left join替換。
not in結構
select *
from Customer cs
where
cs.Group_No = '冊本編號' and
cs.Customer_No not in
(
select Customer_No
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號'
)
left join結構
select cs.*
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
left join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號' and md.meter_no is null;
3. in->inner join
查詢目的
還是上面的查詢背景,這裡查詢某天某個冊本已經上傳抄表數據的用戶信息。
in結構
select *
from Customer cs
where
cs.Group_No = '冊本編號' and
cs.Customer_No in
(
select Customer_No
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號'
)
這裡使用in不夠高效,但是我們使用left join是否可以呢?
left join結構
select cs.*
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
left join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號' and md.meter_no is not null;
left join結構的話,這裡需要使用is not null作為篩選條件。但是is not null同樣非常低效。因此我們使用inner join
inner join結構
select cs.*
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號';
inner join通過連接操作,直接獲取到已上傳抄表數據的用戶信息。
4. not in -> in -> inner join
前面的查詢場景中,我們預設的條件是未上傳抄表數據的用戶,當天在meter_data表是沒有記錄的。現在假設我們每天凌晨初始化meter_data表,設置抄表數值預設為零,抄表數據上傳預設為state=0未上傳。上傳後,更新抄表數值和抄表狀態state=1。
這時,我們來優化上面的not in查詢結構還有另外一種思路。
not in結構
select *
from Customer cs
where
cs.Group_No = '冊本編號' and
cs.Customer_No not in
(
select Customer_No
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號' and meter.state=1
)
in結構
通過篩選條件取反,變換not in->in
select *
from Customer cs
where
cs.Group_No = '冊本編號' and
cs.Customer_No in
(
select Customer_No
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號' and meter.state=0
)
inner join結構
select cs.*
from Customer cs
left join Meter me on cs.Customer_No = me.Customer_No
inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
where cs.Group_NO='冊本編號' and meter.state=0;
5. 總結如下
上面的查詢結構拆分出來後,大家可能覺得這麼簡單的sql怎麼可能寫成這個沙雕。其實真實業務系統,還有關聯其他將近10張表。這裡想說的是,在in\not in\except這種查詢結構時,如果涉及到的數據量較大,建議堅決用連接替換。
- ... in (all except sub)... 查詢結構可以轉換為->left join
- ... not in ... 查詢結構可以轉換為->left join
- ... not in ... 查詢也可以轉換為 in -> inner join,這裡需要確認轉換查詢條件時,是否有對應的數據
- ... in 查詢結構可以轉換為->inner join