What does it mean when the SQL Server 2017 error log contains messages such as "Parallel redo is started for database 'xxx' with worker pool size [2]." and "Parallel redo is shutdown for database 'xxx' with worker pool size [2]."? For example:
Date      2020/5/16 11:07:38
Log       SQL Server (Current - 2020/5/16 11:08:00)
Source    spid33s
Message   Parallel redo is started for database 'YourSQLDba' with worker pool size [2].

Date      2020/5/16 11:07:38
Log       SQL Server (Current - 2020/5/16 11:08:00)
Source    spid33s
Message   Parallel redo is shutdown for database 'YourSQLDba' with worker pool size [2].
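This comes down to the concept of parallel redo, which the official documentation describes in detail. An excerpt follows (see the references for the full text):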
When availability group was initially released with SQL Server 2012, the transaction log redo was handled by a single redo thread for each database in an AG secondary replica. This redo model is also called as serial redo. In SQL Server 2016, the redo model was enhanced with multiple parallel redo worker threads per database to share the redo workload. In addition, each database has a new helper worker thread for handling the dirty page disk flush IO. This new redo model is called parallel redo. With the new parallel redo model that is the default setting since SQL Server 2016, workloads with highly concurrent small transactions are expected to achieve better redo performance. When the transaction redo operation is CPU intensive, such as when data encryption and/or data compression are enabled, parallel redo has even higher redo throughput (Redone Bytes/sec) compared to serial redo. Moreover, indirect checkpoint allows parallel redo to offload more disk IO (and IO waits for slow disk) to its helper worker thread and frees main redo thread to enumerate more received log records in secondary replica. It further speeds up the redo performance. However parallel redo, which enables multi-threading model, has an associated cost.
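These are informational messages, added in SQL Server 2017, that relate to parallel redo for availability groups. But our SQL Server instance is a standalone instance, not a node in an AG, so why do parallel redo messages appear at all? When a database starts up, its parallel redo threads are started; the database then checks whether it belongs to an availability group, and because it does not, the parallel redo threads are shut down again.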
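To confirm that a database really is not part of an availability group, one option is to check the replica_id and group_database_id columns of sys.databases, which are NULL for databases that do not belong to an AG. A minimal sketch, assuming you can query sys.databases on the instance:

```sql
-- List every database and whether it participates in an availability group.
-- replica_id and group_database_id are NULL when the database is not in an AG.
SELECT d.name AS database_name
      ,d.replica_id
      ,d.group_database_id
      ,CASE WHEN d.replica_id IS NULL THEN 'not in an AG'
            ELSE 'AG database'
       END AS ag_membership
FROM sys.databases AS d
ORDER BY d.name;
```

On a standalone instance like the one described here, every row would be expected to show 'not in an AG'.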
So after the instance restarts, you will see "Parallel redo is started for database 'xxxx' with worker pool size [2]." in the error log, immediately followed by "Parallel redo is shutdown for database 'xxxx' with worker pool size [2]."
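To see these start/shutdown pairs without scrolling through the whole log, the error log can be searched from T-SQL. The sketch below uses sys.sp_readerrorlog, which is an undocumented (though widely used) system procedure; its parameter order (log number, log type, search string 1, search string 2) is an assumption based on common usage and could change between versions.

```sql
-- Search the current error log (log number 0) of type 1 (SQL Server error log)
-- for entries containing 'Parallel redo'.
-- NOTE: sys.sp_readerrorlog is undocumented; parameters may change.
EXEC sys.sp_readerrorlog 0, 1, N'Parallel redo';
GO

-- Narrow the search to a single database, e.g. YourSQLDba.
-- Both search strings must appear in the same log entry.
EXEC sys.sp_readerrorlog 0, 1, N'Parallel redo', N'YourSQLDba';
GO
```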
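There is one more case that produces these messages: a user database with the AUTO_CLOSE option enabled. Below, I turn on AUTO_CLOSE for the YourSQLDba database and then check the setting: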
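USE [master];
GO
-- Enable AUTO_CLOSE on YourSQLDba (used here only to reproduce the messages)
ALTER DATABASE [YourSQLDba] SET AUTO_CLOSE ON WITH NO_WAIT;
GO
-- Check the setting: is_auto_close_on = 1 means AUTO_CLOSE is enabled
SELECT d.name AS database_name
      ,SUSER_SNAME(d.owner_sid) AS database_owner
      ,d.create_date AS create_date
      ,d.collation_name AS collation_name
      ,d.state_desc AS state_desc
      ,d.is_auto_close_on AS is_auto_close_on
FROM sys.databases AS d;
GO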
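With AUTO_CLOSE on, the database is closed after the last session disconnects and reopened the next time a session accesses it, so the error log fills up with these messages. Turning AUTO_CLOSE off stops this flood of messages, but you will still see them once per database when the SQL Server instance starts.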
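A short sketch of that fix: first find which databases still have AUTO_CLOSE enabled, then turn it off. YourSQLDba is the example database from above; substitute your own database names.

```sql
USE [master];
GO
-- Find databases that still have AUTO_CLOSE enabled.
SELECT d.name AS database_name
FROM sys.databases AS d
WHERE d.is_auto_close_on = 1;
GO
-- Turn AUTO_CLOSE off so the database stays open between sessions
-- instead of being closed and reopened (logging the parallel redo
-- messages) each time it is accessed.
ALTER DATABASE [YourSQLDba] SET AUTO_CLOSE OFF WITH NO_WAIT;
GO
```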
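Parallel redo can be disabled by enabling trace flag 3459. Note that this trace flag applies only to SQL Server 2016, SQL Server 2017, and later versions. It is recommended to enable it as a global trace flag at startup with the -T command-line option, which keeps the flag active across server restarts. SQL Server must be restarted for the trace flag to take effect this way.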
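After adding -T3459 as a startup parameter (for example through SQL Server Configuration Manager) and restarting, you can check that the flag is active with DBCC TRACESTATUS. DBCC TRACEON with -1 also sets the flag globally at runtime, but that setting is lost at the next restart; how quickly the redo model switches after a runtime change is worth verifying in your own environment. A minimal sketch:

```sql
-- Check whether trace flag 3459 is enabled globally.
-- In the result, Status = 1 and Global = 1 mean the flag is on.
DBCC TRACESTATUS (3459, -1);
GO
-- Optional: enable the flag globally at runtime.
-- This is NOT persisted; use the -T3459 startup parameter for that.
DBCC TRACEON (3459, -1);
GO
-- Verify again.
DBCC TRACESTATUS (3459, -1);
GO
```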
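Also be aware of a parallel redo bug in certain builds, "FIX: Parallel redo does not work after you disable Trace Flag 3459 in an instance of SQL Server". Hopefully you will not hit it during testing, because it would skew your results (for the affected builds, see the official link in the references):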
Assume that you use Always On Availability Groups in Microsoft SQL Server. After you switch to serial redo from parallel redo by enabling Trace Flag 3459, serial redo works as expected. However, when you switch back to parallel redo by disabling Trace Flag 3459, parallel redo does not work. If you restart the instance of SQL Server, parallel redo works as expected.
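To find out whether your instance could be affected, you first need its exact build, which you can then compare with the builds listed in the KB article. A sketch using SERVERPROPERTY:

```sql
-- Show the current build so it can be compared with the
-- builds that contain the fix, as listed in the KB article.
SELECT SERVERPROPERTY('ProductVersion')     AS product_version
      ,SERVERPROPERTY('ProductLevel')       AS product_level
      ,SERVERPROPERTY('ProductUpdateLevel') AS cumulative_update
      ,@@VERSION                            AS version_string;
```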
References:
https://docs.microsoft.com/zh-cn/archive/blogs/sql_server_team/sql-server-20162017-availability-group-secondary-replica-redo-model-and-performance
https://dba.stackexchange.com/questions/239181/messages-about-parallel-redo
https://support.microsoft.com/en-us/help/4339858/fix-parallel-redo-does-not-work-after-you-disable-trace-flag-3459-in-a