本文分享自華為雲社區《GaussDB(DWS)性能調優:表掃描時過濾行數過多引起的性能瓶頸問題案例》,作者: O泡果奶~ 。 1、【問題描述】 SQL語句執行過程中,對12億數據量的大表進行掃描,過濾99%的數據僅留617行數據,性能瓶頸位於掃描該表這裡。 2、【原始語句】 set search_p ...
本文分享自華為雲社區《GaussDB(DWS)性能調優:表掃描時過濾行數過多引起的性能瓶頸問題案例》,作者: O泡果奶~ 。
1、【問題描述】
SQL語句執行過程中,對12億數據量的大表進行掃描,過濾99%的數據僅留617行數據,性能瓶頸位於掃描該表這裡。
2、【原始語句】
set search_path = 'bi_dashboard'; WITH F_SRV_DB_DIM_PRD_D AS (SELECT EXTERNAL_NAME FROM ( SELECT MKT_NAME EXTERNAL_NAME FROM BI_DASHBOARD.DM_MSS_ITEM_PRODUCT_D PRD WHERE PRD.COMPANY_BRAND =any(array[string_to_array('HUAWEI',',')]) AND PRD.MKT_NAME =any(array[string_to_array('暢享 60,暢享 50,暢享 60X,暢享 60 Pro,暢享 50 Pro,暢享 50z,nova 10z,暢享 20e,暢享20 Pro,暢享 10e,暢享10 Plus,暢享20 SE,暢享10,nova 11i,暢享20 Plus,暢享9 Plus,暢享20 5G,nova Y90,暢享 10S,nova Y70,暢享Z,暢享 9S,nova 8 SE 活力版,麥芒9 5G,Y9s,麥芒9 5G',',')]) ) WHERE EXTERNAL_NAME<>'SNULL' GROUP BY EXTERNAL_NAME), V_PERIOD AS ( SELECT PERIOD_ID AS PERIOD_ID_M, LEAST(TO_CHAR(PERIOD_END_DATE, 'YYYYMMDD'), '20230630') AS PERIOD_ID, PERIOD_ID AS DATES FROM BI_DASHBOARD.RPT_TML_ACCOUNT_PERIOD_D WHERE PERIOD_TYPE = 'M' AND PERIOD_ID BETWEEN 202207 AND 202306 ), V_DATA_BASE AS ( SELECT A.PERIOD_ID, IFNULL(A.CHANNEL_NAME, 'SNULL') AS DISTRIBUTOR_CHANNEL_NAME, SUM(A.SO_QTY_MTD) AS SO_QTY, SUM(DECODE(A.PERIOD_ID, 20230630, A.SO_QTY_MTD)) AS SO_QTY_ORDER select count(*) FROM DM_MSS_CN_PC_REP_RP_ST_D_F A INNER JOIN F_SRV_DB_DIM_PRD_D PRD ON A.EXTERNAL_NAME = PRD.EXTERNAL_NAME WHERE 1 = 1 AND A.CHANNEL_ID IN ('100013388802') AND A.ORG_KEY IN (10000651) AND A.SALES_FLAG IN ('1', '0') AND A.PERIOD_ID IN (20220731,20221031,20220930,20220831,20221130,20221231,20230131,20230228,20230430,20230331,20230531,20230630) AND (A.SO_QTY_MTD <> 0) -- 過濾所有日期SO_QTY為0的數據 GROUP BY A.PERIOD_ID, IFNULL(A.CHANNEL_NAME, 'SNULL') ), V_DATA AS ( SELECT PERIOD_ID, NVL(DISTRIBUTOR_CHANNEL_NAME, 'Total') AS DISTRIBUTOR_CHANNEL_NAME, SUM(SO_QTY) AS SO_QTY, SUM(SO_QTY_ORDER) AS SO_QTY_ORDER FROM V_DATA_BASE A GROUP BY GROUPING SETS ((PERIOD_ID), (PERIOD_ID, DISTRIBUTOR_CHANNEL_NAME)) ) SELECT STRING_AGG(P.DATES, ',' ORDER BY P.PERIOD_ID_M) AS PERIOD_LIST, B.DISTRIBUTOR_CHANNEL_NAME, STRING_AGG(NVL(TO_CHAR(ROUND(A.SO_QTY)), '0'), ',' ORDER BY P.PERIOD_ID_M) AS SO_QTY FROM V_PERIOD P FULL JOIN (SELECT DISTINCT DISTRIBUTOR_CHANNEL_NAME FROM V_DATA) B ON 1 = 1 LEFT JOIN V_DATA A ON A.PERIOD_ID = P.PERIOD_ID AND A.DISTRIBUTOR_CHANNEL_NAME = B.DISTRIBUTOR_CHANNEL_NAME GROUP BY B.DISTRIBUTOR_CHANNEL_NAME ORDER BY DECODE(B.DISTRIBUTOR_CHANNEL_NAME, 'Total', 0, 'SOURCE IS NULL', 2, '源為空', 3, 'SNULL', 4, 1), SUM(A.SO_QTY_ORDER) DESC NULLS LAST LIMIT 50 OFFSET 0
3、【性能分析】
從上圖的performance執行計劃中可以看出(完整執行計劃放在附件一),該SQL語句慢在掃描表a(bi_dashboard.dm_mss_cn_pc_rep_rp_st_d_f_test)。掃描時過濾條件包括:sales_flag、so_qty_mtd、channel_id、org_key、period_id,該表上原本的局部聚簇鍵PCK只包含了period_id,並沒有包括其餘三個過濾條件之一,因此,可以調整PCK,以減少掃描表a的執行時間。
補充:局部聚簇鍵
局部聚簇 (Partial Cluster Key, 簡稱PCK),列存儲下一種通過min/max稀疏索引實現基表快速掃描的索引技術。Partial Cluster Key可以指定多列,但是一般不建議超過2列。PCK適用於列存大表點查詢加速。另外,查看語句中where條件中in值較多(12個),在DWS中,in後面的條件預設就只能是5個,超過6個就過濾不下推,此時,可以用or將12個值改寫,
A.PERIOD_ID IN (20220731,20221031,20220930,20220831,20221130) or A.PERIOD_ID IN (20221231,20230131,20230228,20230430,20230331) or A.PERIOD_ID IN (20230531,20230630)
此時,SQL語句執行時間減少為487ms,完整performance計劃如附件二所示。
- 附件:優化後—performance.txt 466.64KB
- 附件:優化前—performance.txt 449.47KB