1. Online DDL Space Requirements
Disk space requirements for online DDL operations are outlined below. The requirements do not apply to operations that are performed instantly (ALGORITHM=INSTANT).
• Temporary log files:
A temporary log file records concurrent DML when an online DDL operation creates an index or alters a table. The temporary log file is extended as required, in increments of innodb_sort_buffer_size, up to a maximum specified by innodb_online_alter_log_max_size. If the operation takes a long time and concurrent DML modifies the table so much that the size of the temporary log file exceeds the value of innodb_online_alter_log_max_size, the online DDL operation fails with a DB_ONLINE_LOG_TOO_BIG error, and uncommitted concurrent DML operations are rolled back. A larger innodb_online_alter_log_max_size setting permits more DML during an online DDL operation, but it also extends the period at the end of the DDL operation during which the table is locked to apply the logged DML.
The innodb_sort_buffer_size variable also defines the size of the temporary log file read buffer and write buffer.
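The statements below are a minimal sketch of inspecting both variables and raising the log ceiling before a long index build on a busy table. They assume MySQL 8.0 and the table t1 and column c1 used later in these notes. The 512 MB value is an arbitrary illustration, not a recommendation. innodb_online_alter_log_max_size is a dynamic global variable. innodb_sort_buffer_size is read-only at runtime and can only be changed at server startup.

```sql
-- Growth increment and ceiling of the online DDL log
-- (server defaults: 1048576 and 134217728 bytes).
SELECT @@GLOBAL.innodb_sort_buffer_size,
       @@GLOBAL.innodb_online_alter_log_max_size;

-- Allow up to 512 MB (536870912 bytes) of logged concurrent DML; example value only.
-- A larger ceiling also lengthens the locked apply phase at the end of the DDL.
SET GLOBAL innodb_online_alter_log_max_size = 536870912;

-- Index build on the example table t1 while concurrent DML is allowed.
ALTER TABLE t1 ADD INDEX i1(c1), ALGORITHM=INPLACE, LOCK=NONE;
```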
• Temporary sort files:
Online DDL operations that rebuild the table write temporary sort files to the MySQL temporary directory ($TMPDIR on Unix, %TEMP% on Windows, or the directory specified by --tmpdir) during index creation. Temporary sort files are not created in the directory that contains the original table. Each temporary sort file is large enough to hold one column of data, and each sort file is removed when its data is merged into the final table or index. Operations involving temporary sort files may require temporary space equal to the amount of data in the table plus indexes. An error is reported if an online DDL operation uses all of the available disk space on the file system where the data directory resides.
If the MySQL temporary directory is not large enough to hold the sort files, set tmpdir to a different directory. Alternatively, define a separate temporary directory for online DDL operations using innodb_tmpdir. This option was introduced to help avoid temporary directory overflows that could occur as a result of large temporary sort files.
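Here is a minimal sketch, assuming MySQL 8.0, of checking where sort files currently go and redirecting them for one session. The path /mnt/ddl_scratch is hypothetical. It must already exist and be writable by mysqld, and setting the session value of innodb_tmpdir may require additional privileges. ALTER TABLE ... FORCE is used here simply as an example of a table-rebuilding operation.

```sql
-- Current locations: innodb_tmpdir is NULL unless set, in which case tmpdir is used.
SELECT @@GLOBAL.tmpdir, @@SESSION.innodb_tmpdir;

-- Send this session's online DDL sort files to a larger file system
-- (hypothetical path; it must exist and be writable by mysqld).
SET SESSION innodb_tmpdir = '/mnt/ddl_scratch';

-- A table-rebuilding online DDL operation (null rebuild of the example table t1).
ALTER TABLE t1 FORCE, ALGORITHM=INPLACE, LOCK=NONE;
```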
• Intermediate table files:
Some online DDL operations that rebuild the table create a temporary intermediate table file in the same directory as the original table. An intermediate table file may require space equal to the size of the original table. Intermediate table file names begin with the #sql-ib prefix and only appear briefly during the online DDL operation.
The innodb_tmpdir option is not applicable to intermediate table files.
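Because neither tmpdir nor innodb_tmpdir moves the intermediate file, it helps to estimate how much space a rebuild could need. The query below is a rough sketch that assumes a hypothetical schema named `test`. DATA_LENGTH and INDEX_LENGTH are InnoDB estimates, and MySQL 8.0 caches them for information_schema_stats_expiry seconds, so the session value is set to 0 to read current figures.

```sql
-- Read fresh (uncached) table statistics for this session.
SET SESSION information_schema_stats_expiry = 0;

-- Approximate size of t1 in the hypothetical schema `test`. A rebuild may need
-- about the table size again for the #sql-ib intermediate file in the table's
-- directory, and up to data plus indexes in the temporary directory for sort files.
SELECT TABLE_NAME,
       ROUND(DATA_LENGTH / 1048576, 1)                  AS data_mb,
       ROUND(INDEX_LENGTH / 1048576, 1)                 AS index_mb,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1048576, 1) AS total_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';

-- For tables in the default location the intermediate file is written under
-- the data directory; check free space on this file system.
SELECT @@GLOBAL.datadir;
```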
2. Online DDL Memory Management
Online DDL operations that create or rebuild secondary indexes allocate temporary buffers during different phases of index creation. The innodb_ddl_buffer_size variable, introduced in MySQL 8.0.27, defines the maximum buffer size for online DDL operations. The default setting is 1048576 bytes (1 MB). The setting applies to buffers created by threads executing online DDL operations. Defining an appropriate buffer size limit avoids potential out-of-memory errors for online DDL operations that create or rebuild secondary indexes. The maximum buffer size per DDL thread is the maximum buffer size divided by the number of DDL threads (innodb_ddl_buffer_size/innodb_ddl_threads).
Prior to MySQL 8.0.27, the innodb_sort_buffer_size variable defines the buffer size for online DDL operations that create or rebuild secondary indexes.
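This sketch assumes MySQL 8.0.27 or later, where both variables are session-scoped. It shows the per-thread arithmetic and then raises the total budget for a single session. The 8 MB value is an arbitrary example, and t1, c1, and i1 are the example names used elsewhere in these notes.

```sql
-- Per-thread limit = innodb_ddl_buffer_size / innodb_ddl_threads.
-- With the defaults this is 1048576 / 4 = 262144 bytes per thread.
SELECT @@SESSION.innodb_ddl_buffer_size / @@SESSION.innodb_ddl_threads
       AS bytes_per_ddl_thread;

-- Raise the total budget to 8 MB for this session only (example value),
-- i.e. 8388608 / 4 = 2097152 bytes (2 MB) per thread with the default 4 threads.
SET SESSION innodb_ddl_buffer_size = 8388608;
ALTER TABLE t1 ADD INDEX i1(c1), ALGORITHM=INPLACE, LOCK=NONE;
```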
3. Configuring Parallel Threads for Online DDL Operations
The workflow of an online DDL operation that creates or rebuilds a secondary index involves:
• Scanning the clustered index and writing data to temporary sort files
• Sorting the data
• Loading sorted data from the temporary sort files into the secondary index
The number of parallel threads that can be used to scan the clustered index is defined by the innodb_parallel_read_threads variable. The default setting is 4. The maximum setting is 256, which is the maximum number of threads for all sessions combined. The actual number of threads that scan the clustered index is the number defined by the innodb_parallel_read_threads setting or the number of index subtrees to scan, whichever is smaller. If the thread limit is reached, sessions fall back to using a single thread.
The number of parallel threads that sort and load data is controlled by the innodb_ddl_threads variable, introduced in MySQL 8.0.27. The default setting is 4. Prior to MySQL 8.0.27, sort and load operations are single-threaded.
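The following is a minimal per-session sketch, assuming MySQL 8.0.27 or later, that tunes both phases of the secondary index workflow before an index build. The thread counts are arbitrary examples. innodb_ddl_buffer_size is raised alongside innodb_ddl_threads so that each sort/load thread keeps a 1 MB share. Tables affected by the limitations listed next gain nothing from these settings.

```sql
-- Scan phase: up to 8 threads; the actual number is the smaller of this
-- value and the number of index subtrees to scan.
SET SESSION innodb_parallel_read_threads = 8;

-- Sort and load phases: 8 threads (MySQL 8.0.27 and later).
SET SESSION innodb_ddl_threads = 8;

-- Keep 1 MB per sort/load thread: 8 threads x 1048576 bytes.
SET SESSION innodb_ddl_buffer_size = 8388608;

ALTER TABLE t1 ADD INDEX i1(c1), ALGORITHM=INPLACE, LOCK=NONE;
```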
The following limitations apply:
• Parallel threads are not supported for building indexes that include virtual columns.
• Parallel threads are not supported for full-text index creation.
• Parallel threads are not supported for spatial index creation.
• Parallel scan is not supported on tables defined with virtual columns.
• Parallel scan is not supported on tables defined with a full-text index.
• Parallel scan is not supported on tables defined with a spatial index.
4. Simplifying DDL Statements with Online DDL
Before the introduction of online DDL, it was common practice to combine many DDL operations into a single ALTER TABLE statement. Because each ALTER TABLE statement involved copying and rebuilding the table, it was more efficient to make several changes to the same table at once, so that all of them could be done with a single rebuild of the table. The downside was that SQL code involving DDL operations was harder to maintain and to reuse in different scripts. If the specific changes were different each time, you might have to construct a new complex ALTER TABLE statement for each slightly different scenario.
For DDL operations that can be done online, you can separate them into individual ALTER TABLE statements for easier scripting and maintenance, without sacrificing efficiency. For example, you might take a complicated statement such as:
ALTER TABLE t1 ADD INDEX i1(c1), ADD UNIQUE INDEX i2(c2), CHANGE c4_old_name c4_new_name INTEGER UNSIGNED;
and break it down into simpler parts that can be tested and performed independently, such as:
ALTER TABLE t1 ADD INDEX i1(c1);
ALTER TABLE t1 ADD UNIQUE INDEX i2(c2);
ALTER TABLE t1 CHANGE c4_old_name c4_new_name INTEGER UNSIGNED;
You might still use multi-part ALTER TABLE statements for:
• Operations that must be performed in a specific sequence, such as creating an index followed by a foreign key constraint that uses that index.
• Operations that all use the same specific LOCK clause and that you want to succeed or fail as a group.
• Operations that cannot be performed online, that is, that still use the table-copy method.
• Operations for which you specify ALGORITHM=COPY or old_alter_table=1, to force the table-copying behavior if needed for precise backward-compatibility in specialized scenarios.
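The three cases above could look like the sketch below. The tables parent(id) and child(parent_id), the column c3, and the index name i3 are hypothetical. With foreign_key_checks enabled, adding a foreign key is only supported by the COPY algorithm, so the first statement copies the table.

```sql
-- Ordered changes in one statement on hypothetical tables: create the index,
-- then the foreign key that can use it.
ALTER TABLE child
  ADD INDEX idx_parent_id (parent_id),
  ADD CONSTRAINT fk_child_parent FOREIGN KEY (parent_id) REFERENCES parent (id);

-- Changes that must share LOCK=NONE: if either cannot run without blocking
-- DML, the whole statement fails and neither index is added.
ALTER TABLE t1 ADD INDEX i1(c1), ADD UNIQUE INDEX i2(c2), LOCK=NONE;

-- Force the table-copy method explicitly (c3 and i3 are hypothetical).
ALTER TABLE t1 ADD INDEX i3(c3), ALGORITHM=COPY;
```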
5. Online DDL Failure Conditions
The failure of an online DDL operation is typically due to one of the following conditions:
• An ALGORITHM clause specifies an algorithm that is not compatible with the particular type of DDL operation or storage engine.
• A LOCK clause specifies a low degree of locking (SHARED or NONE) that is not compatible with the particular type of DDL operation.
• A timeout occurs while waiting for an exclusive lock on the table, which may be needed briefly during the initial and final phases of the DDL operation.
• The tmpdir or innodb_tmpdir file system runs out of disk space while MySQL writes temporary sort files to disk during index creation.
• The operation takes a long time and concurrent DML modifies the table so much that the size of the temporary online log exceeds the value of the innodb_online_alter_log_max_size configuration option. This condition causes a DB_ONLINE_LOG_TOO_BIG error.
• Concurrent DML makes changes to the table that are allowed with the original table definition, but not with the new one. The operation only fails at the very end, when MySQL tries to apply all the changes from concurrent DML statements. For example, you might insert duplicate values into a column while a unique index is being created, or you might insert NULL values into a column while creating a primary key index on that column. The changes made by the concurrent DML take precedence, and the ALTER TABLE operation is effectively rolled back.
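This sketch, assuming MySQL 8.0 and the example table t1, guards against two of the conditions above. lock_wait_timeout bounds how long the statement waits for metadata locks, including the exclusive lock. Explicit ALGORITHM and LOCK clauses make the statement fail immediately if the operation cannot honor them, instead of MySQL choosing a more restrictive method on its own. The 10-second timeout is an arbitrary example.

```sql
-- Give up waiting for the exclusive metadata lock after 10 seconds instead of
-- the default lock_wait_timeout of 31536000 seconds (one year).
SET SESSION lock_wait_timeout = 10;

-- Errors out at once if this operation cannot run in place without blocking DML.
-- If another session inserts a duplicate c2 value while the index is built,
-- the statement fails only at the end, when the logged DML is applied.
ALTER TABLE t1 ADD UNIQUE INDEX i2(c2), ALGORITHM=INPLACE, LOCK=NONE;
```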
6. Online DDL Limitations
The following limitations apply to online DDL operations:
• The table is copied when creating an index on a TEMPORARY TABLE.
• The ALTER TABLE clause LOCK=NONE is not permitted if there are ON...CASCADE or ON...SET NULL constraints on the table.
• Before an in-place online DDL operation can finish, it must wait for transactions that hold metadata locks on the table to commit or roll back. An online DDL operation may briefly require an exclusive metadata lock on the table during its execution phase, and always requires one in the final phase of the operation when updating the table definition. Consequently, transactions holding metadata locks on the table can cause an online DDL operation to block. The transactions that hold metadata locks on the table may have been started before or during the online DDL operation. A long-running or inactive transaction that holds a metadata lock on the table can cause an online DDL operation to time out.
• When running an in-place online DDL operation, the thread that runs the ALTER TABLE statement applies an online log of DML operations that were run concurrently on the same table from other connection threads. When the DML operations are applied, it is possible to encounter a duplicate key entry error (ERROR 1062 (23000): Duplicate entry), even if the duplicate entry is only temporary and would be reverted by a later entry in the online log. This is similar to the idea of a foreign key constraint check in InnoDB in which constraints must hold during a transaction.
• OPTIMIZE TABLE for an InnoDB table is mapped to an ALTER TABLE operation to rebuild the table and update index statistics and free unused space in the clustered index. Secondary indexes are not created as efficiently because keys are inserted in the order they appeared in the primary key. OPTIMIZE TABLE is supported with the addition of online DDL support for rebuilding regular and partitioned InnoDB tables.
• Tables created before MySQL 5.6 that include temporal columns (DATE, DATETIME or TIMESTAMP) and have not been rebuilt using ALGORITHM=COPY do not support ALGORITHM=INPLACE. In this case, an ALTER TABLE ... ALGORITHM=INPLACE operation returns the following error:
ERROR 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY.
• The following limitations are generally applicable to online DDL operations on large tables that involve rebuilding the table:
- There is no mechanism to pause an online DDL operation or to throttle I/O or CPU usage for an online DDL operation.
- Rollback of an online DDL operation can be expensive should the operation fail.
- Long running online DDL operations can cause replication lag. An online DDL operation must finish running on the source before it is run on the replica. Also, DML that was processed concurrently on the source is only processed on the replica after the DDL operation on the replica is completed.
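Two of the limitations above can be handled from SQL, as in the sketch below. It assumes MySQL 8.0 with the default metadata lock instrumentation enabled, and the schema name `test` is hypothetical. The first query lists sessions holding or waiting for metadata locks on t1. A PENDING row is a waiter such as a blocked ALTER TABLE, and GRANTED rows are the holders it is waiting for. The second statement performs the one-time copy rebuild for a pre-5.6 table with temporal columns.

```sql
-- Holders (GRANTED) and waiters (PENDING) of metadata locks on test.t1,
-- mapped to processlist IDs that can be inspected or passed to KILL.
SELECT ml.LOCK_TYPE, ml.LOCK_DURATION, ml.LOCK_STATUS,
       t.PROCESSLIST_ID, t.PROCESSLIST_INFO
FROM performance_schema.metadata_locks AS ml
JOIN performance_schema.threads AS t ON t.THREAD_ID = ml.OWNER_THREAD_ID
WHERE ml.OBJECT_SCHEMA = 'test' AND ml.OBJECT_NAME = 't1';

-- One-time rebuild with the copy method for a table created before MySQL 5.6
-- with DATE, DATETIME or TIMESTAMP columns.
ALTER TABLE t1 FORCE, ALGORITHM=COPY;
```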