Building on my earlier post 數據壓縮簡要 (a brief introduction to data compression), I wanted to automate the evaluation of data compression, which led to this post.

The whitepaper recommends page compression for large tables and indexes that meet the following criteria:

- Scan operations account for 75% or more of all operations on the table or index
- Update operations account for 20% or less of all operations on the table or index

Note that these are the whitepaper's conclusions and recommendations; treat them as a reference only, as one consideration among the best practices.
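To get a quick sense of where a specific table stands relative to these thresholds, a minimal sketch along the following lines can be run against sys.dm_db_index_operational_stats (dbo.MyTable is a placeholder; the full script later in this post performs the same calculation across the whole database):

```sql
-- Rough scan/update ratios for one table's indexes (dbo.MyTable is a placeholder).
-- range_scan_count approximates scans; leaf_update_count approximates updates.
select
    object_name(s.object_id) as table_name,
    s.index_id,
    s.range_scan_count * 100.0 / nullif(
        s.leaf_delete_count + s.leaf_insert_count + s.leaf_update_count
        + s.range_scan_count + s.singleton_lookup_count + s.leaf_page_merge_count, 0) as scan_pct,
    s.leaf_update_count * 100.0 / nullif(
        s.leaf_delete_count + s.leaf_insert_count + s.leaf_update_count
        + s.range_scan_count + s.singleton_lookup_count + s.leaf_page_merge_count, 0) as update_pct
from sys.dm_db_index_operational_stats(db_id(), object_id('dbo.MyTable'), null, null) s;
```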
The original author of this script is Louis Li, but his script has some limitations, so I modified it as follows:

- Changed the auxiliary tables from user tables to temporary tables
- Only analyze partitions with at least 1,000 pages
- Extended the scope to all tables and indexes rather than only heaps and clustered indexes
- Changed the granularity of the analysis to the partition level
- Added statistics on the space used by each partition
- Modified the generated statements to add performance options: MAXDOP=8, SORT_IN_TEMPDB=ON (see the example after this list)
- Modified the filter condition. The original only analyzed partitions with Scan > 75%, which filtered out append-only log-style tables (S ≈ 0%, U ≈ 0%); I changed it to (S > 75% or Update < 20%).
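For reference, the statements the modified script emits take the following shape (the table and index names here are made up for illustration):

```sql
-- Hypothetical output for a non-partitioned index:
ALTER INDEX [IX_Orders_CustomerId] ON [dbo].[Orders]
    REBUILD WITH (DATA_COMPRESSION = PAGE, MAXDOP = 8, SORT_IN_TEMPDB = ON)

-- Hypothetical output for one partition of a partitioned index:
ALTER INDEX [IX_Orders_CustomerId] ON [dbo].[Orders]
    REBUILD PARTITION = 3 WITH (DATA_COMPRESSION = PAGE, MAXDOP = 8, SORT_IN_TEMPDB = ON)
```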
The script below finds the objects that meet the following criteria and generates the corresponding compression scripts:

1. Scan all indexes in the current database and find the ones that meet both of the following conditions:
- The index uses at least 1,000 pages
- SELECT operations account for more than 75% of all operations on the index, or UPDATE operations account for less than 20% of all operations on the index
Note that the granularity here is per partition: if the table or index is partitioned, the judgment is made at the partition level.
2. For each index found in step 1, estimate the space that page and row compression would each save (expressed as a percentage); a standalone example of the estimating call is sketched after this list.
3. Compare the row and page figures and make a recommendation: if an index has no UPDATE operations, or page compression saves at least 10 percentage points more space than row compression, page compression is recommended; all other indexes get row compression.
4. The script's output has two parts: the first lists the indexes recommended for compression; the second shows the recommended compression type and the corresponding script for each.
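Step 2's estimates come from sp_estimate_data_compression_savings. For reference, a minimal standalone call for a single index and partition looks like the following (dbo.Orders, index_id 1, and partition 1 are placeholder values):

```sql
-- Estimate page-compression savings for a single index and partition
-- (dbo.Orders, index_id 1, and partition 1 are placeholders).
-- Returns current vs. requested sizes in KB, plus the sampled sizes.
exec sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'Orders',
    @index_id         = 1,
    @partition_number = 1,
    @data_compression = 'PAGE';
```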
```sql
--Collect all index stats
if object_id('tempdb..#index_estimates') is not null
    drop table #index_estimates
go

create table #index_estimates
(
    database_name sysname not null,
    [schema_name] sysname not null,
    table_name sysname not null,
    index_id int not null,
    partition_number int not null,
    update_pct decimal(5,2) not null,
    select_pct decimal(5,2) not null,
    used_size_kb int not null,
    constraint pk_index_estimates primary key (database_name, [schema_name], table_name, index_id, partition_number)
);
go

insert into #index_estimates
select
    db_name() as database_name,
    schema_name(t.schema_id) as [schema_name],
    t.name,
    i.index_id,
    p.partition_number,
    i.leaf_update_count * 100.0 /
        (i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count
         + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count) as UpdatePct,
    i.range_scan_count * 100.0 /
        (i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count
         + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count) as SelectPct,
    p.used_page_count * 8 as used_size_kb
from sys.dm_db_index_operational_stats(db_id(), null, null, null) i
inner join sys.tables t
    on i.object_id = t.object_id
inner join sys.dm_db_partition_stats p
    on i.object_id = p.object_id
   and i.index_id = p.index_id
   and i.partition_number = p.partition_number
where i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count
      + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count > 0
  and p.used_page_count >= 1000 -- only consider partitions of at least 1000 pages
  --and i.index_id < 2 --only consider heap and clustered index
  and (
        -- Select > 75% or Update < 20% (multiply by 1.0 to avoid integer division)
        i.range_scan_count * 1.0 /
            (i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count
             + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count) > .75
        or
        i.leaf_update_count * 1.0 /
            (i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count
             + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count) < .2
      )
order by t.name, i.index_id
go

--Show data compression candidates
select * from #index_estimates;

--Prepare 2 intermediate tables for row compression and page compression estimates
if object_id('tempdb..#page_compression_estimates') is not null
    drop table #page_compression_estimates;
go

create table #page_compression_estimates
(
    [object_name] sysname not null,
    [schema_name] sysname not null,
    index_id int not null,
    partition_number int not null,
    [size_with_current_compression_setting(KB)] bigint not null,
    [size_with_requested_compression_setting(KB)] bigint not null,
    [sample_size_with_current_compression_setting(KB)] bigint not null,
    [sample_size_with_requested_compression_setting(KB)] bigint not null,
    constraint pk_page_compression_estimates primary key ([object_name], [schema_name], index_id, partition_number)
);
go

if object_id('tempdb..#row_compression_estimates') is not null
    drop table #row_compression_estimates;
go

create table #row_compression_estimates
(
    [object_name] sysname not null,
    [schema_name] sysname not null,
    index_id int not null,
    partition_number int not null,
    [size_with_current_compression_setting(KB)] bigint not null,
    [size_with_requested_compression_setting(KB)] bigint not null,
    [sample_size_with_current_compression_setting(KB)] bigint not null,
    [sample_size_with_requested_compression_setting(KB)] bigint not null,
    constraint pk_row_compression_estimates primary key ([object_name], [schema_name], index_id, partition_number)
);
go

--Use cursor and dynamic sql to get estimates (this took 9:18 on my laptop)
declare @script_template nvarchar(max) = 'insert ###compression_mode##_compression_estimates exec sp_estimate_data_compression_savings ''##schema_name##'',''##table_name##'',##index_id##,##partition_number##,''##compression_mode##''';
declare @executable_script nvarchar(max);
declare @schema sysname, @table sysname, @index_id smallint, @partition_number smallint, @compression_mode nvarchar(20);

declare cur cursor fast_forward for
    select i.[schema_name], i.[table_name], i.index_id, i.partition_number, em.estimate_mode
    from #index_estimates i
    cross join (values ('row'), ('page')) as em(estimate_mode)
    group by i.[schema_name], i.[table_name], em.estimate_mode, i.index_id, i.partition_number;

open cur;
fetch next from cur into @schema, @table, @index_id, @partition_number, @compression_mode;
while (@@fetch_status = 0)
begin
    set @executable_script = replace(replace(replace(replace(replace(@script_template,
        '##schema_name##', @schema),
        '##table_name##', @table),
        '##compression_mode##', @compression_mode),
        '##index_id##', @index_id),
        '##partition_number##', @partition_number);
    print @executable_script;
    exec (@executable_script);
    fetch next from cur into @schema, @table, @index_id, @partition_number, @compression_mode;
end
close cur;
deallocate cur;

--Show estimates and proposed data compression
with all_estimates as
(
    select
        '[' + i.schema_name + '].[' + i.table_name + ']' as table_name,
        case when i.index_id > 0 then '[' + idx.name + ']' else null end as index_name,
        i.partition_number,
        i.select_pct,
        i.update_pct,
        case when r.[size_with_current_compression_setting(KB)] > 0
             then 100 - r.[size_with_requested_compression_setting(KB)] * 100.0
                  / r.[size_with_current_compression_setting(KB)]
             else 0.0
        end as row_compression_saving_pct,
        case when p.[size_with_current_compression_setting(KB)] > 0
             then 100 - p.[size_with_requested_compression_setting(KB)] * 100.0
                  / p.[size_with_current_compression_setting(KB)]
             else 0.0
        end as page_compression_saving_pct,
        case when ps.name is null then 0 else 1 end as is_partitioned
    from #index_estimates i
    inner join #row_compression_estimates r
        on i.schema_name = r.schema_name
       and i.table_name = r.object_name
       and i.index_id = r.index_id
       and i.partition_number = r.partition_number
    inner join #page_compression_estimates p
        on i.schema_name = p.schema_name
       and i.table_name = p.object_name
       and i.index_id = p.index_id
       and i.partition_number = p.partition_number
    inner join sys.indexes idx
        on i.index_id = idx.index_id
       and object_name(idx.object_id) = i.table_name
    left join sys.partition_schemes ps
        on idx.data_space_id = ps.data_space_id
),
recommend_compression as
(
    select
        table_name,
        index_name,
        select_pct,
        update_pct,
        row_compression_saving_pct,
        page_compression_saving_pct,
        partition_number,
        is_partitioned,
        case when update_pct = 0 then 'Page'
             when update_pct >= 20 then 'Row'
             when update_pct > 0 and update_pct < 20
                  and page_compression_saving_pct - row_compression_saving_pct < 10 then 'Row'
             else 'Page'
        end as recommended_data_compression
    from all_estimates
    where row_compression_saving_pct > 0
      and page_compression_saving_pct > 0
)
select
    table_name,
    index_name,
    select_pct,
    update_pct,
    cast(row_compression_saving_pct as decimal(5,2)) as row_compression_saving_pct,
    cast(page_compression_saving_pct as decimal(5,2)) as page_compression_saving_pct,
    recommended_data_compression,
    case
        when index_name is null and is_partitioned = 0 then
            'ALTER TABLE ' + table_name + ' REBUILD WITH (DATA_COMPRESSION = '
            + recommended_data_compression + ', MAXDOP = 8)'
        when index_name is null and is_partitioned = 1 then
            'ALTER TABLE ' + table_name + ' REBUILD PARTITION = ' + cast(partition_number as varchar(2))
            + ' WITH (DATA_COMPRESSION = ' + recommended_data_compression + ', MAXDOP = 8)'
        when index_name is not null and is_partitioned = 0 then
            'ALTER INDEX ' + index_name + ' ON ' + table_name + ' REBUILD WITH (DATA_COMPRESSION = '
            + recommended_data_compression + ', MAXDOP = 8, SORT_IN_TEMPDB = ON)'
        when index_name is not null and is_partitioned = 1 then
            'ALTER INDEX ' + index_name + ' ON ' + table_name + ' REBUILD PARTITION = '
            + cast(partition_number as varchar(2)) + ' WITH (DATA_COMPRESSION = '
            + recommended_data_compression + ', MAXDOP = 8, SORT_IN_TEMPDB = ON)'
    end collate database_default as [statement]
from recommend_compression
order by table_name

--Clean up
drop table #index_estimates;
drop table #page_compression_estimates;
drop table #row_compression_estimates;
GO
```
Note:

How long this script takes to run depends on the number of objects analyzed and the amount of data involved. You may also find that its results differ somewhat from the estimates in SSMS under Storage - Compression. Both approaches use sp_estimate_data_compression_savings, but SSMS does not specify the @index_id parameter, so it estimates the total across all objects in the table or partition, which is quite inaccurate for tables with multiple indexes.
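To make the difference concrete, here is a per-index call next to an SSMS-style call that leaves @index_id unspecified (dbo.Orders is a placeholder):

```sql
-- Per-index, per-partition estimate, as used by the script in this post
-- (dbo.Orders is a placeholder):
exec sp_estimate_data_compression_savings 'dbo', 'Orders', 1, null, 'PAGE';

-- SSMS-style call: with @index_id = null the procedure covers every index
-- on the table, and SSMS works from those totals rather than per index.
exec sp_estimate_data_compression_savings 'dbo', 'Orders', null, null, 'PAGE';
```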
Summary:

1. I have used this script in many production environments and it has behaved well in all of them. Even so, if you plan to use it, evaluate it carefully first.
2. Data compression also interacts with replication, AlwaysOn, columnstore, and more, but that is a story for another post.
3. Data compression does not compress off-row LOB data. To compress that, you can only compress on the application side, or use FILESTREAM on a compressed volume. SQL Server 2016 introduces the new COMPRESS/DECOMPRESS functions for compressing individual values (a quick sketch follows), but they are not meant to solve off-row LOB compression.
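As a minimal illustration of those two functions: COMPRESS returns a GZIP-compressed varbinary(max), and DECOMPRESS reverses it, after which the value must be cast back to its original type.

```sql
-- COMPRESS gzip-compresses a single value into varbinary(max);
-- DECOMPRESS reverses it, and the result must be cast back to the original type.
declare @blob varbinary(max) = compress(N'a long, repetitive block of text');
select cast(decompress(@blob) as nvarchar(max)) as original_text;
```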