在關係型 DB中,分區表經常使用DateKey(int 數據類型)作為Partition Column,每個月的數據填充到同一個Partition中,由於在Fore-End呈現的報表大多數是基於Month的查詢,按照Month分區的設計能夠提高查詢性能,但是,如果,前任DBA沒有創建Schedule ...
在關係型 DB中,分區表經常使用DateKey(int 數據類型)作為Partition Column,每個月的數據填充到同一個Partition中,由於在Fore-End呈現的報表大多數是基於Month的查詢,按照Month分區的設計能夠提高查詢性能,但是,如果,前任DBA沒有創建Schedule來維護Partition Function,不能增加新的Partition,所有新增加的數據都會插入到最後一個Partition中,導致最後一個Partition填充的數據量十分大,降低了分區表的查詢性能。
一,最佳實踐(Best Practices )
微軟建議,避免對已填充的分區執行split或merge操作。在分區表的兩端都保持空的分區(Empty Partition),第一個分區和最後一個分區是Empty Partition。通過對尾端的Empty Partition進行Split操作,就能在尾端增加分區,並且不會產生數據移動;當將數據從前端的第二個分區中歸檔後,第一個分區和第二個分區都是Empty Partition,Empty Partition進行Merge操作,不會產生數據的移動。
Always keep empty partitions at both ends of the partition range to guarantee that the partition split (before loading new data) and partition merge (after unloading old data) do not incur any data movement. Avoid splitting or merging populated partitions. This can be extremely inefficient, as this may cause as much as four times more log generation, and may also cause severe locking.
因此,受此Best Practices的影響,DB開發人員在創建分區表時,一般的做法是只創建特定數量的分區,並且只保持兩端是Empty Partition,例如:
CREATE PARTITION FUNCTION [funcPartition_DateKey](int) AS RANGE RIGHT FOR VALUES (20100701, 20100801, <.....> , 20150501, 20150601) GO CREATE PARTITION SCHEME [schePartition_DataKey] AS PARTITION [funcPartition_DateKey] TO ([PRIMARY], <....>, [PRIMARY]) GO
改進建議:在SQL Server 2012中,在一個Table或索引上,最多能夠創建15 000個分區(In SQL Server 2012, a table or index can have a maximum of 15,000 partitions),未雨綢繆,我們可以按照月份,一次性創建360個分區,每月一個分區,每年12個分區,服務30年,一勞永逸,當然,也可以創建Schedule,前提是,你必須保證Schedule按時執行成功。
如果,必須要對已填充的分區表,增加分區,要怎麼做?
方法1,直接修改Partition Function,通過拆分末端Partition來增加分區
由於很多 Big Table 使用相同的Partition Schema進行分區,簡單地從末端Partition為起點,逐個增加分區,在短時間內會產生海量的Disk IO操作,對系統產生極大的影響,例如
declare @DateKey int set @DateKey=20150701 while @DateKey<20200101 begin alter partition scheme [schePartition_DataKey] Next Used [primary];
alter partition function [funcPartition_DateKey]() split range(@DateKey);
set @DateKey=cast(convert(nvarchar(8),dateadd(month,1,cast(cast(@DateKey as nvarchar(8)) as date)),112) as int); end
方法2,更改分區表的Partition Schema
創建新的Partition function 和 Partition Schema,逐個Table修改其Partition Schema,這個方法(Workaround),雖然實現過程比較繁瑣,但是對系統性能的副作用最小,將影響控制在當前操作的Target Table。
Script1,創建新的Partition設計
--create Partition function declare @DateKeyList varchar(max) declare @DateKey int --set initial DateKey set @DateKey=20140701;
while @DateKey<20200101 begin set @DateKeyList=coalesce(@DateKeyList+','+cast(@DateKey as varchar(8)),cast(@DateKey as varchar(8))) --Increase iterator set @DateKey=cast(convert(nvarchar(8),dateadd(month,1,cast(cast(@DateKey as nvarchar(8)) as date)),112) as int); end --print DateKey List --select @DateKeyList declare @sql nvarchar(max) set @sql=N' CREATE PARTITION FUNCTION [funcPartition_new_DateKey](int) AS RANGE RIGHT FOR VALUES ('+@DateKeyList+N');' EXEC sys.sp_executesql @sql GO --create partition schema CREATE PARTITION SCHEME [schePartition_new_DataKey] AS PARTITION [funcPartition_new_DateKey] all TO ([PRIMARY]); GO
Script2,逐個更新Table的Patition Schema
由於Patition Table中,可能存在不止一個Index,在rebuild table時,使用 drop_existing=on 能夠減少分區表上nonclustered indexes的重建時間。
--rebuild table create unique clustered index [PK__SchemaName_TableName_KeyColumn] on SchemaName.TableName([KeyColumn],[CreatedDateKey]) with(data_compression=page,drop_existing=on) on [schePartition_new_DataKey]([CreatedDateKey]); --rebuild columnstore index CREATE NONCLUSTERED COLUMNSTORE INDEX [idx_ColumnStore_SchemaName_TableName] ON [SchemaName].[TableName] ( column list.... )
三,在同一個文件組中創建分區
在一個文件組中創建表的所有分區,每個分區在物理上都是獨立的存儲對象,只不過這些獨立的存儲對象位於同一個FileGroup。
1,創建Patition Schema時,使用 ALL 關鍵字指定只能指定一個FileGroup,所有的Partition 都創建在同一個FileGroup上;在Patition Schema創建成功之後,預設會將該FileGroup標記為Next Used
ALL Specifies that all partitions map to the filegroup provided in file_group_name, or to the primary filegroup if [PRIMARY] is specified. If ALL is specified, only one file_group_name can be specified.
If [PRIMARY] is specified, the partition is stored on the primary filegroup. If ALL is specified, only one file_group_name can be specified. Partitions are assigned to filegroups, starting with partition 1, in the order in which the filegroups are listed in [,...n]. The same file_group_name can be specified more than one time in [,...n].
2,在Patition schema中,只有一個FileGroup會被指定(Mark)為Next Used
如果FileGroup被指定為Next Used,意味著分區表的下一個分區將會創建在該FileGroup上。在創建Patition Schema時,指定 ALL關鍵字,不僅指定將表的所有分區都創建在同一個FileGroup上,而且,還將該FileGroup指定為Next Used。
If ALL is specified, the sole file_group_name maintains its NEXT USED property for this partition_function_name. The NEXT USED filegroup will receive an additional partition if one is created in an ALTER PARTITION FUNCTION statement.
3,在執行Patition Split 操作時,必須存在一個FileGroup被指定為Next Used,否則,Split 操作失敗
在指定Next Used時,要註意:
- 在創建Partition Scheme時,指定ALL關鍵字,只會將一個FileGroup指定為Next Used;
- 可以多次指定Next Used,最後一次指定的FileGroup是Next Used;
- 一個FileGroup可以被多次指定為Next Used;即使該File Group已經用於存儲(Hold)分區的數據; A filegroup that already holds partitions can be designated to hold additional partitions.
- 在完成一次Partition Split操作之後,之前的Next Used 已被使用,當前不存在被指定為Next Used的FileGroup,必須顯式指定一個FileGroup作為新的Next Used,才能繼續執行Partition Split操作;
If you create all the partitions in the same filegroup, that filegroup is initially assigned to be the NEXT USED filegroup automatically. However, after a split operation is performed, there is no longer a designated NEXT USED filegroup. You must explicitly assign the filegroup to be the NEXT USED filegroup by using ALTER PARITION SCHEME or a subsequent split operation will fail.
顯式將一個FileGroup 指定為 Next Used
ALTER PARTITION SCHEME partition_scheme_name
NEXT USED [ filegroup_name ] [ ; ]
Appendix
使用Alter Partition Function 命令執行拆分或合併分區的操作,每次操作,只能拆分一個,或合併一個分區:
ALTER PARTITION FUNCTION partition_function_name() { SPLIT RANGE ( boundary_value ) | MERGE RANGE ( boundary_value ) } [ ; ]
ALTER PARTITION FUNCTION repartitions any tables and indexes that use the function in a single atomic operation. However, this operation occurs offline, and depending on the extent of repartitioning, may be resource-intensive.
參考doc:
Rebuilding Existing Partitioned Tables to a New Partition Scheme
ALTER PARTITION FUNCTION (Transact-SQL)
ALTER PARTITION SCHEME (Transact-SQL)