簡介 資料庫索引對於優化資料庫性能至關重要。它們通過提供表中行的快速訪問路徑來幫助加快數據檢索速度。瞭解索引的工作原理、類型及其最佳實踐可以顯著提高資料庫查詢的效率。 什麼是索引? 索引是一種數據結構,可以提高資料庫表上數據檢索操作的速度。它就像書中的索引一樣,讓您無需掃描整個文本即可快速找到信息。 ...
書接上回
• Recursive (JSON property: recursive)
This indicates that the row applies to the recursive SELECT part of a recursive common table expression.
• Rematerialize (JSON property: rematerialize)
Rematerialize (X,...) is displayed in the EXPLAIN row for table T, where X is any lateral【ˈlætərəl 橫向的;側面的;向側面移動的;】 derived table whose rematerialization is triggered when a new row of T is read. For example:
SELECT ... FROM t, LATERAL (derived table that refers to t) AS dt ...
The content of the derived table is rematerialized to bring it up to date each time a new row of t is processed by the top query.
• Scanned N databases (JSON property: message)
This indicates how many directory scans the server performs when processing a query for INFORMATION_SCHEMA tables.The value of N can be 0, 1, or all.
• Select tables optimized away (JSON property: message)
The optimizer determined 1) that at most one row should be returned, and 2) that to produce this row, a deterministic【dɪˌtɜːrmɪˈnɪstɪk (思想、解釋等)基於決定論的;(力量、因素)不可抗拒的,不可逆轉的;】 set of rows must be read. When the rows to be read can be read during the optimization phase (for example, by reading index rows), there is no need to read any tables during query execution.
The first condition is fulfilled when the query is implicitly grouped (contains an aggregate function but no GROUP BY clause). The second condition is fulfilled when one row lookup is performed per index used. The number of indexes read determines the number of rows to read.
Consider the following implicitly grouped query:
SELECT MIN(c1), MIN(c2) FROM t1;
Suppose that MIN(c1) can be retrieved by reading one index row and MIN(c2) can be retrieved by reading one row from a different index. That is, for each column c1 and c2, there exists an index where the column is the first column of the index. In this case, one row is returned, produced by reading two deterministic rows.
This Extra value does not occur if the rows to read are not deterministic. Consider this query:
SELECT MIN(c2) FROM t1 WHERE c1 <= 10;
Suppose that (c1, c2) is a covering index. Using this index, all rows with c1 <= 10 must be scanned to find the minimum c2 value. By contrast【ˈkɑːntræst , kənˈtræst 對比;對照;(電視屏幕的)圖像明暗對比度,反差;(攝影或繪畫中的)顏色反差,明暗對比;明顯的差異;明顯不同的人(或事物);】, consider this query:
SELECT MIN(c2) FROM t1 WHERE c1 = 10;
In this case, the first index row with c1 = 10 contains the minimum c2 value. Only one row must be read to produce the returned row.
For storage engines that maintain an exact row count per table (such as MyISAM, but not InnoDB), this Extra value can occur for COUNT(*) queries for which the WHERE clause is missing or always true and there is no GROUP BY clause. (This is an instance of an implicitly grouped query where the storage engine influences whether a deterministic number of rows can be read.)
• Skip_open_table, Open_frm_only, Open_full_table (JSON property: message)
These values indicate file-opening optimizations that apply to queries for INFORMATION_SCHEMA tables.
• Skip_open_table: Table files do not need to be opened. The information is already available from the data dictionary.
• Open_frm_only: Only the data dictionary need be read for table information.
• Open_full_table: Unoptimized information lookup. Table information must be read from the data dictionary and by reading table files.
• Start temporary, End temporary (JSON property: message)
This indicates temporary table use for the semijoin Duplicate Weedout【wiːd aʊt 淘汰;清除;剔除;】 strategy.
• unique row not found (JSON property: message)
For a query such as SELECT ... FROM tbl_name, no rows satisfy the condition for a UNIQUE index or PRIMARY KEY on the table.
• Using filesort (JSON property: using_filesort)
MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and pointer to the row for all rows that match the WHERE clause. The keys then are sorted and the rows are retrieved in sorted order.
• Using index (JSON property: using_index)
The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.
For InnoDB tables that have a user-defined clustered index, that index can be used even when Using index is absent from the Extra column. This is the case if type is index and key is PRIMARY.
Information about any covering indexes used is shown for EXPLAIN FORMAT=TRADITIONAL and EXPLAIN FORMAT=JSON. Beginning with MySQL 8.0.27, it is also shown for EXPLAIN FORMAT=TREE.
• Using index condition (JSON property: using_index_condition)
Tables are read by accessing index tuples and testing them first to determine whether to read full table rows. In this way, index information is used to defer (“push down”) reading full table rows unless it is necessary.
• Using index for group-by (JSON property: using_index_for_group_by)
Similar to the Using index table access method, Using index for group-by indicates that MySQL found an index that can be used to retrieve all columns of a GROUP BY or DISTINCT query without any extra disk access to the actual table. Additionally, the index is used in the most efficient way so that for each group, only a few index entries are read.
• Using index for skip scan (JSON property: using_index_for_skip_scan)
Indicates that the Skip Scan access method is used.
• Using join buffer (Block Nested Loop), Using join buffer (Batched Key Access), Using join buffer (hash join) (JSON property: using_join_buffer)
Tables from earlier joins are read in portions into the join buffer, and then their rows are used from the buffer to perform the join with the current table. (Block Nested Loop) indicates use of the Block Nested-Loop algorithm, (Batched Key Access) indicates use of the Batched Key Access algorithm, and (hash join) indicates use of a hash join. That is, the keys from the table on the preceding line of the EXPLAIN output are buffered, and the matching rows are fetched in batches from the table represented by the line in which Using join buffer appears.
In JSON-formatted output, the value of using_join_buffer is always one of Block Nested Loop, Batched Key Access, or hash join.
Hash joins are available beginning with MySQL 8.0.18; the Block Nested-Loop algorithm is not used in MySQL 8.0.20 or later MySQL releases.
• Using MRR (JSON property: message)
Tables are read using the Multi-Range Read optimization strategy.
• Using sort_union(...), Using union(...), Using intersect(...) (JSON property: message)
These indicate the particular algorithm showing how index scans are merged for the index_merge join type.
• Using temporary (JSON property: using_temporary_table)
To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently.
• Using where (JSON property: attached_condition)
A WHERE clause is used to restrict which rows to match against the next table or send to the client. Unless you specifically intend to fetch or examine all rows from the table, you may have something wrong in your query if the Extra value is not Using where and the table join type is ALL or index.
Using where has no direct counterpart in JSON-formatted output; the attached_condition property contains any WHERE condition used.
• Using where with pushed condition (JSON property: message)
This item applies to NDB tables only. It means that NDB Cluster is using the Condition Pushdown optimization to improve the efficiency of a direct comparison between a nonindexed column and a constant. In such cases, the condition is “pushed down” to the cluster's data nodes and is evaluated on all data nodes simultaneously. This eliminates the need to send nonmatching rows over the network, and can speed up such queries by a factor of 5 to 10 times over cases where Condition Pushdown could be but is not used.
• Zero limit (JSON property: message)
The query had a LIMIT 0 clause and cannot select any rows.
2.4 EXPLAIN Output Interpretation
You can get a good indication【ˌɪndɪˈkeɪʃn 指示;表明;顯示;標示;象徵;】 of how good a join is by taking the product of the values in the rows column of the EXPLAIN output. This should tell you roughly【ˈrʌfli 大致;大約;粗略地;差不多;粗暴地;粗魯地;粗糙地;凹凸不平地;】 how many rows MySQL must examine to execute the query. If you restrict queries with the max_join_size system variable, this row product also is used to determine which multiple-table SELECT statements to execute and which to abort.
The following example shows how a multiple-table join can be optimized progressively【prəˈɡresɪvli 逐步地;愈益;持續穩定地;】 based on the information provided by EXPLAIN.
Suppose that you have the SELECT statement shown here and that you plan to examine it using EXPLAIN:
EXPLAIN SELECT tt.TicketNumber, tt.TimeIn, tt.ProjectReference, tt.EstimatedShipDate, tt.ActualShipDate, tt.ClientID, tt.ServiceCodes, tt.RepetitiveID, tt.CurrentProcess, tt.CurrentDPPerson, tt.RecordVolume, tt.DPPrinted, et.COUNTRY, et_1.COUNTRY, do.CUSTNAME FROM tt, et, et AS et_1, do WHERE tt.SubmitTime IS NULL AND tt.ActualPC = et.EMPLOYID AND tt.AssignedPC = et_1.EMPLOYID AND tt.ClientID = do.CUSTNMBR;
For this example, make the following assumptions【əˈsʌmpʃənz 假設;假定;擔任;(責任的)承擔;(權力的)獲得;】:
• The columns being compared have been declared as follows.
• The tables have the following indexes.
• The tt.ActualPC values are not evenly【ˈiːvnli 均勻;平均地;平靜地;平滑地;相等地;有規律地;鎮靜地;】 distributed.
Initially【ɪˈnɪʃəli 開始;最初;起初;】, before any optimizations have been performed, the EXPLAIN statement produces the following information:
table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 do ALL PRIMARY NULL NULL NULL 2135 et_1 ALL PRIMARY NULL NULL NULL 74 tt ALL AssignedPC, NULL NULL NULL 3872 Range checked for each record (index map: 0x23) ClientID, ActualPC
Because type is ALL for each table, this output indicates that MySQL is generating【ˈdʒenəreɪtɪŋ 產生;引起;】 a Cartesian【kɑːrˈtiːziən 笛卡爾坐標系;笛卡爾;直角;笛卡爾的;笛卡兒積;】 product of all the tables; that is, every combination of rows. This takes quite a long time, because the product of the number of rows in each table must be examined. For the case at hand, this product is 74 × 2135 × 74 × 3872 = 45,268,558,720 rows. If the tables were bigger, you can only imagine how long it would take.
One problem here is that MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. tt.ActualPC is declared as CHAR(10) and et.EMPLOYID is CHAR(15), so there is a length mismatch.
To fix this disparity【dɪˈspærəti 差距;(尤指因不公正對待引起的)不同,不等,差異,懸殊;】 between column lengths, use ALTER TABLE to lengthen ActualPC from 10 characters to 15 characters:
mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);
Now tt.ActualPC and et.EMPLOYID are both VARCHAR(15). Executing the EXPLAIN statement again produces this result:
table type possible_keys key key_len ref rows Extra tt ALL AssignedPC, NULL NULL NULL 3872 Using where ClientID, ActualPC do ALL PRIMARY NULL NULL NULL 2135 Range checked for each record (index map: 0x1) et_1 ALL PRIMARY NULL NULL NULL 74 Range checked for each record (index map: 0x1) et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1
This is not perfect, but is much better: The product of the rows values is less by a factor of 74. This version executes in a couple of seconds.
A second alteration can be made to eliminate the column length mismatches for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID = do.CUSTNMBR comparisons:
mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15), MODIFY ClientID VARCHAR(15);
After that modification, EXPLAIN produces the output shown here:
table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 tt ref AssignedPC, ActualPC 15 et.EMPLOYID 52 Using Where ClientID, ActualPC et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
At this point, the query is optimized almost as well as possible. The remaining problem is that, by default, MySQL assumes that values in the tt.ActualPC column are evenly distributed, and that is not the case for the tt table. Fortunately, it is easy to tell MySQL to analyze the key distribution:
mysql> ANALYZE TABLE tt;
With the additional index information, the join is perfect and EXPLAIN produces this result:
table type possible_keys key key_len ref rows Extra tt ALL AssignedPC NULL NULL NULL 3872 Using Where ClientID, ActualPC et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
The rows column in the output from EXPLAIN is an educated【ˈedʒukeɪtɪd 受過…教育(或訓練)的;受過良好教育(或訓練)的;有教養的;上過…學校的;】 guess from the MySQL join optimizer. Check whether the numbers are even close to the truth by comparing the rows product with the actual number of rows that the query returns. If the numbers are quite different, you might get better performance by using STRAIGHT_JOIN in your SELECT statement and trying to list the tables in a different order in the FROM clause. (However, STRAIGHT_JOIN may prevent indexes from being used because it disables semijoin transformations.
It is possible in some cases to execute statements that modify data when EXPLAIN SELECT is used with a subquery.
3 Extended EXPLAIN Output Format
The EXPLAIN statement produces extra (“extended”) information that is not part of EXPLAIN output but can be viewed by issuing a SHOW WARNINGS statement following EXPLAIN. As of MySQL 8.0.12, extended information is available for SELECT, DELETE, INSERT, REPLACE, and UPDATE statements. Prior to 8.0.12, extended information is available only for SELECT statements.
The Message value in SHOW WARNINGS output displays how the optimizer qualifies【ˈkwɑːlɪfaɪz 合格;使合格;有權,使有權(做某事);使具備資格;取得資格(或學歷);】 table and column names in the SELECT statement, what the SELECT looks like after the application of rewriting and optimization rules, and possibly other notes about the optimization process.
The extended information displayable with a SHOW WARNINGS statement following EXPLAIN is produced only for SELECT statements. SHOW WARNINGS displays an empty result for other explainable statements (DELETE, INSERT, REPLACE, and UPDATE).
Here is an example of extended EXPLAIN output:
mysql> EXPLAIN SELECT t1.a, t1.a IN (SELECT t2.a FROM t2) FROM t1\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 4 filtered: 100.00 Extra: Using index *************************** 2. row *************************** id: 2 select_type: SUBQUERY table: t2 type: index possible_keys: a key: a key_len: 5 ref: NULL rows: 3 filtered: 100.00 Extra: Using index 2 rows in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS\G *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `test`.`t1`.`a` AS `a`, <in_optimizer>(`test`.`t1`.`a`,`test`.`t1`.`a` in ( <materialize> (/* select#2 */ select `test`.`t2`.`a` from `test`.`t2` where 1 having 1 ), <primary_index_lookup>(`test`.`t1`.`a` in <temporary table> on <auto_key> where ((`test`.`t1`.`a` = `materialized-subquery`.`a`))))) AS `t1.a IN (SELECT t2.a FROM t2)` from `test`.`t1` 1 row in set (0.00 sec)
Because the statement displayed by SHOW WARNINGS may contain special markers to provide information about query rewriting or optimizer actions, the statement is not necessarily valid SQL and is not intended to be executed. The output may also include rows with Message values that provide additional non-SQL explanatory notes about actions taken by the optimizer.
The following list describes special markers that can appear in the extended output displayed by SHOW WARNINGS:
• <auto_key>
An automatically generated key for a temporary table.
• (expr)
The expression (such as a scalar subquery) is executed once and the resulting value is saved in memory for later use. For results consisting of multiple values, a temporary table may be created and is shown instead.
• (query fragment)
The subquery predicate【ˈpredɪkət , ˈpredɪkeɪt 斷言;使基於;使以…為依據;表明;闡明;】 is converted to an EXISTS predicate and the subquery is transformed so that it can be used together with the EXISTS predicate.
• <in_optimizer>(query fragment)
This is an internal optimizer object with no user significance.
• <index_lookup>(query fragment)
The query fragment is processed using an index lookup to find qualifying rows.
• (condition, expr1, expr2)
If the condition is true, evaluate to expr1, otherwise expr2.
• <is_not_null_test>(expr)
A test to verify that the expression does not evaluate to NULL.
• <materialize> (query fragment)
Subquery materialization is used.
• `materialized-subquery`.col_name
A reference to the column col_name in an internal temporary table materialized to hold the result from evaluating a subquery.
• <primary_index_lookup>(query fragment)
The query fragment is processed using a primary key lookup to find qualifying rows.
• <ref_null_helper>(expr)
This is an internal optimizer object with no user significance.
• /* select#N */ select_stmt
The SELECT is associated with the row in non-extended EXPLAIN output that has an id value of N
• outer_tables semi join (inner_tables)
A semijoin operation. inner_tables shows the tables that were not pulled out.
• <temporary table>
This represents an internal temporary table created to cache an intermediate result.
When some tables are of const or system type, expressions involving columns from these tables are evaluated early by the optimizer and are not part of the displayed statement. However, with FORMAT=JSON, some const table accesses are displayed as a ref access that uses a const value.
4 Obtaining Execution Plan Information for a Named Connection
To obtain the execution plan for an explainable【可解釋的;可說明的;】 statement executing in a named connection, use this statement:
EXPLAIN [options] FOR CONNECTION connection_id;
EXPLAIN FOR CONNECTION returns the EXPLAIN information that is currently being used to execute a query in a given connection. Because of changes to data (and supporting statistics) it may produce a different result from running EXPLAIN on the equivalent【ɪˈkwɪvələnt (價值、數量、意義、重要性等)相同的;相等的;】 query text. This difference in behavior【bɪ'heɪvjər 行為;性能;表現;(生物的)習性;】 can be useful in diagnosing【ˌdaɪəɡˈnoʊsɪŋ 診斷(疾病);判斷(問題的原因);】 more transient【ˈtrænʃnt 短暫的;臨時的;轉瞬即逝的;暫住的;過往的;倏忽;】 performance problems. For example, if you are running a statement in one session that is taking a long time to complete, using EXPLAIN FOR CONNECTION in another session may yield【jiːld 產生(收益、效益等);屈服;提供;(受壓)活動,變形,彎曲,折斷;讓步;放棄;出產(作物);給(大路上的車輛)讓路;繳出;】 useful information about the cause of the delay.
connection_id is the connection identifier, as obtained from the INFORMATION_SCHEMA PROCESSLIST table or the SHOW PROCESSLIST statement. If you have the PROCESS privilege, you can specify the identifier for any connection. Otherwise, you can specify the identifier only for your own connections. In all cases, you must have sufficient privileges to explain the query on the specified connection.
If the named connection is not executing a statement, the result is empty. Otherwise, EXPLAIN FOR CONNECTION applies only if the statement being executed in the named connection is explainable. This includes SELECT, DELETE, INSERT, REPLACE, and UPDATE. (However, EXPLAIN FOR CONNECTION does not work for prepared statements, even prepared statements of those types.)
If the named connection is executing an explainable statement, the output is what you would obtain by using EXPLAIN on the statement itself.
If the named connection is executing a statement that is not explainable, an error occurs. For example, you cannot name the connection identifier for your current session because EXPLAIN is not explainable:
mysql> SELECT CONNECTION_ID(); +-----------------+ | CONNECTION_ID() | +-----------------+ | 373 | +-----------------+ 1 row in set (0.00 sec) mysql> EXPLAIN FOR CONNECTION 373; ERROR 1889 (HY000): EXPLAIN FOR CONNECTION command is supported only for SELECT/UPDATE/INSERT/DELETE/REPLACE
The Com_explain_other status variable indicates the number of EXPLAIN FOR CONNECTION statements executed.
5 Estimating Query Performance
In most cases, you can estimate query performance by counting disk seeks. For small tables, you can usually find a row in one disk seek (because the index is probably cached). For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row: log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1.
In MySQL, an index block is usually 1,024 bytes and the data pointer is usually four bytes. For a 500,000-row table with a key value length of three bytes (the size of MEDIUMINT), the formula indicates log(500,000)/log(1024/3*2/(3+4)) + 1 = 4 seeks.
This index would require storage of about 500,000 * 7 * 3/2 = 5.2MB (assuming a typical index buffer fill ratio of 2/3), so you probably have much of the index in memory and so need only one or two calls to read data to find the row.
For writes, however, you need four seek requests to find where to place a new index value and normally two seeks to update the index and write the row.
The preceding discussion does not mean that your application performance slowly degenerates【dɪˈdʒenəreɪts 惡化;衰退;蛻變;】 by log N. As long as everything is cached by the OS or the MySQL server, things become only marginally【ˈmɑːrdʒɪnəli 輕微地;很少地;微不足道地;】 slower as the table gets bigger. After the data gets too big to be cached, things start to go much slower until your applications are bound only by disk seeks (which increase by log N). To avoid this, increase the key cache size as the data grows. For MyISAM tables, the key cache size is controlled by the key_buffer_size system variable.