本文分享自華為雲社區《【SQL優化】為什麼有時候無法走執行性能更優的hashjoin》,作者: leapdb。 1. hash join通常優於nestloop join 通常nestloop join的複雜度是O(N方),hash join時間複雜度是O(N),所以我們一般傾向於使用hash jo ...
本文分享自華為雲社區《【SQL優化】為什麼有時候無法走執行性能更優的hashjoin》,作者: leapdb。
1. hash join通常優於nestloop join
通常nestloop join的複雜度是O(N方),hash join時間複雜度是O(N),所以我們一般傾向於使用hash join。 在SQL腳本調優過程中通常有兩種方式,強制走hash join方式:1. 在session級關閉nestloop方式,set enable_nestloop to off;
2. 在SQL中通過 /*+ hashjoin(a b) */ 方式,讓a和b表走hash join;CREATE DATABASE test_td WITH DBCOMPATIBILITY='td'; create table dim_day(day_code char(8)); create table dwr_rpo as select current_date - 1 as day_code; --返回了date類型 test_td=# \d+ dwr_rpo Table "public.dwr_rpo" Column | Type | Modifiers | Storage | Stats target | Description ----------+------+-----------+---------+--------------+------------- day_code | date | | plain | | Has OIDs: no Distribute By: ROUND ROBIN Location Nodes: ALL DATANODES Options: orientation=row, compression=no explain select * from dwr_rpo a left join dim_day c on c.day_code = a.day_code; id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ---+----------------------------------------------+---------+------------+----------+---------+-------------- 1 | -> Streaming (type: GATHER) | 1310148 | | | 1694 | 279235196.70 2 | -> Nested Loop Left Join (3, 4) | 1310148 | | 1MB | 1694 | 279229682.93 3 | -> Seq Scan on dwr_rpo a | 1310148 | | 1MB | 1676 | 46589.16 4 | -> Materialize | 109575 | | 16MB | 22 | 3747.76 5 | -> Streaming(type: BROADCAST) | 109575 | | 2MB | 22 | 3565.14 6 | -> Seq Scan on dim_day c | 36525 | | 1MB | 22 | 272.75 Predicate Information (identified by plan id) ----------------------------------------------------------------------------- 2 --Nested Loop Left Join (3, 4) Join Filter: ((c.day_code)::timestamp without time zone = a.day_code)可是,以上SQL無論用哪種方式都走不上hash join。我們需要看一下,join兩端的數據類型是否支持hash比較。 1. 為什麼有時候無法走執行性能更優的hashjoin
不同數據類型計算hash函數不同,互不相容的數據類型無法進行hash比較。
2. 為什麼hashjoin秒級,nestloop需要兩個小時nestloop複雜度:131w * 10w = 1310億
hashjoin複雜度:131w
所以兩種方式性能差距很大。 3. 為什麼有類型轉換,還不能hash join看似類型相近,但由於兩端的精度,格式,有無時區等不一樣,無法認為直接相等。
4. 都哪些數據類型間的join不支持hash?select oprname,oprkind,oprcanhash, (select typname from pg_type where oid=oprleft) oprleft, (select typname from pg_type where oid=oprright) oprright from pg_operator where oprname='=' and oprcanhash='f'; oprname | oprkind | oprcanhash | oprleft | oprright ---------+---------+------------+---------------+--------------- = | b | f | xid | int8 = | b | f | xid32 | int4 = | b | f | tid | tid = | b | f | box | box = | b | f | path | path = | b | f | tinterval | tinterval = | b | f | money | money = | b | f | circle | circle = | b | f | lseg | lseg = | b | f | line | line = | b | f | bit | bit = | b | f | varbit | varbit = | b | f | date | timestamp = | b | f | date | timestamptz = | b | f | timestamp | date = | b | f | timestamptz | date = | b | f | timestamp | timestamptz = | b | f | timestamptz | timestamp = | b | f | tsvector | tsvector = | b | f | tsquery | tsquery = | b | f | record | record = | b | f | hll | hll = | b | f | hll_hashval | hll_hashval = | b | f | roaringbitmap | roaringbitmap (24 rows)主要是timestamp, timestamptz, date間互相join是無法走hash。其它數據類型不常見。 開發建議:join兩端的數據類型儘量一致或互相相容。 5. 為什麼 oracle 相容模式沒有問題,td相容模式有問題?
current_date 在TD相容模式下為date類型;
current_date 在Oracle相容模式下為timestamp類型;