Counting all rows in a table
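In MySQL, the same SQL statement can get different execution plans depending on the storage engine:
For MyISAM, concurrency is controlled with table locks, so concurrent threads running the same query at the same point in time all get the same result. MyISAM also stores the table's total row count, so a COUNT(*) over the whole table can return that value directly.
For InnoDB, concurrency is controlled with MVCC and row locks, so concurrent threads running the same query at the same point in time can get different results, because each session has its own read view. InnoDB does not store a total row count, so every such query must scan the whole table or all entries of some index.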
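The InnoDB point can be made concrete with a toy model in Python. This is not InnoDB's actual read-view algorithm, which works with transaction-ID ranges and a list of active transactions. Each row records the hypothetical transaction that inserted it and, optionally, the one that deleted it. A reader counts only changes made by itself or by transactions in its snapshot. Three readers looking at the same rows at the same moment get three different totals, so a single stored count cannot serve all of them. The names `Row`, `visible`, and `count_rows` are illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Row:
    row_id: int
    created_by: int                   # hypothetical id of the inserting transaction
    deleted_by: Optional[int] = None  # hypothetical id of the deleting transaction


def visible(row: Row, snapshot: FrozenSet[int], own_txn: int) -> bool:
    """Toy rule: a change is seen only if the reader made it itself or the
    transaction that made it is in the reader's snapshot of committed work."""
    def sees(txn: Optional[int]) -> bool:
        return txn is not None and (txn == own_txn or txn in snapshot)

    return sees(row.created_by) and not sees(row.deleted_by)


def count_rows(rows: List[Row], snapshot: FrozenSet[int], own_txn: int) -> int:
    # No stored total: each COUNT(*) walks every row and checks visibility.
    return sum(1 for r in rows if visible(r, snapshot, own_txn))


committed = frozenset({1, 2})  # transactions 1 and 2 have committed
rows = [
    Row(1, created_by=1),
    Row(2, created_by=1, deleted_by=3),  # deleted by still-open transaction 3
    Row(3, created_by=2),
    Row(4, created_by=4),                # inserted by still-open transaction 4
]

print(count_rows(rows, committed, own_txn=3))  # 2: its own delete hides row 2
print(count_rows(rows, committed, own_txn=4))  # 4: it sees its own insert
print(count_rows(rows, committed, own_txn=5))  # 3: a new reader sees neither change
```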
For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.
Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.
InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.
For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause.
This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.
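The counter-table approach from the manual can be sketched as follows. Python's built-in `sqlite3` stands in for MySQL only so that the example runs on its own. The table and function names (`tb_items`, `row_counter`, `insert_item`, `delete_item`, `fast_count`) are hypothetical. The key design choice is that each data change and its counter update share one transaction, so they commit or roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_items (id INTEGER PRIMARY KEY, c1 INTEGER NOT NULL);
    CREATE TABLE row_counter (table_name TEXT PRIMARY KEY, row_count INTEGER NOT NULL);
    INSERT INTO row_counter VALUES ('tb_items', 0);
""")


def insert_item(conn: sqlite3.Connection, c1: int) -> None:
    # "with conn" commits both statements together, or rolls both back on error.
    with conn:
        conn.execute("INSERT INTO tb_items (c1) VALUES (?)", (c1,))
        conn.execute(
            "UPDATE row_counter SET row_count = row_count + 1 WHERE table_name = 'tb_items'"
        )


def delete_item(conn: sqlite3.Connection, item_id: int) -> None:
    with conn:
        cur = conn.execute("DELETE FROM tb_items WHERE id = ?", (item_id,))
        # Subtract only the number of rows actually deleted (0 if the id is absent).
        conn.execute(
            "UPDATE row_counter SET row_count = row_count - ? WHERE table_name = 'tb_items'",
            (cur.rowcount,),
        )


def fast_count(conn: sqlite3.Connection) -> int:
    # Reads one counter row instead of scanning tb_items.
    return conn.execute(
        "SELECT row_count FROM row_counter WHERE table_name = 'tb_items'"
    ).fetchone()[0]


for value in (10, 20, 30):
    insert_item(conn, value)   # ids 1, 2, 3
delete_item(conn, 2)
delete_item(conn, 99)          # no such row, counter unchanged
print(fast_count(conn), conn.execute("SELECT COUNT(*) FROM tb_items").fetchone()[0])  # 2 2
```

With InnoDB, every writer would update the same `row_counter` row. That is the contention the manual warns about when thousands of concurrent transactions update the counter.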
Consider the test table TB101:
CREATE TABLE `tb101` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `C1` int(11) NOT NULL,
  `C2` int(11) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=MyISAM AUTO_INCREMENT=140001 DEFAULT CHARSET=utf8
Without a WHERE clause, COUNT(*), COUNT(1), COUNT(ID), and COUNT(C1) all get this execution plan:
mysql> EXPLAIN SELECT COUNT(*) FROM TB101 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: NULL
partitions: NULL
type: NULL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
filtered: NULL
Extra: Select tables optimized away
1 row in set, 1 warning (0.00 sec)
Without a WHERE clause, COUNT(C2) gets this execution plan:
mysql> EXPLAIN SELECT COUNT(C2) FROM TB101 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: TB101
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 140000
filtered: 100.00
Extra: NULL
1 row in set, 1 warning (0.00 sec)
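As the plans show, for MyISAM without a WHERE clause, if C1 is declared NOT NULL, COUNT(C1) is handled the same way as COUNT(*) and COUNT(1). The result comes from the stored row count, so Extra shows Select tables optimized away. C2 allows NULL, so COUNT(C2) must read every row (type: ALL, rows: 140000) to skip the NULL values.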
Running the same test on InnoDB, without a WHERE clause:
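1. ID is a NOT NULL primary key, so COUNT(ID) has the same execution plan as COUNT(1) and COUNT(*) and returns the same result.
2. C1 is NOT NULL, so COUNT(C1) returns the same result as COUNT(1) and COUNT(*), but its execution plan is different.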
Similarities and differences among COUNT(expr) forms
1. COUNT(1) and COUNT(*) are equivalent: their execution plans and execution efficiency are exactly the same.
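Personally, I recommend COUNT(1) over COUNT(*) because it is simple and intuitive. Others recommend COUNT(*) because it is the standard row-count syntax defined by SQL-92 and is the form recommended by the Alibaba Java Development Manual.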
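2. COUNT(*) and COUNT(C1) are not necessarily equivalent; their execution plans and results can differ.
COUNT(*): returns the number of rows that satisfy the WHERE condition, regardless of NULL values.
COUNT(C1): returns the number of rows that satisfy the WHERE condition and whose C1 is not NULL; rows where C1 is NULL are not counted.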
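The difference between these forms is standard SQL behavior, so any SQL engine can be used to check it. The sketch below uses Python's built-in SQLite purely for convenience. It uses a hypothetical table `tb_demo` shaped like TB101 (C1 NOT NULL, C2 nullable). It reproduces the result sets, not MySQL's execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_demo (id INTEGER PRIMARY KEY, c1 INTEGER NOT NULL, c2 INTEGER)"
)
conn.executemany(
    "INSERT INTO tb_demo (c1, c2) VALUES (?, ?)",
    [(1, 10), (2, None), (3, 30), (4, None)],
)

counts = "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(c1), COUNT(c2) FROM tb_demo"

# No WHERE clause: only COUNT(c2) drops the two rows whose c2 is NULL.
print(conn.execute(counts).fetchone())  # (4, 4, 4, 4, 2)

# With a WHERE clause, the same rule applies to the rows that pass the filter.
print(conn.execute(counts + " WHERE c1 >= 2").fetchone())  # (3, 3, 3, 3, 1)
```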
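Another way to look at it:
Every MyISAM or InnoDB table identifies its rows with a non-NULL, unique value, either an explicit primary key or an implicit row ID. So every row has at least one non-NULL column (or column group), and COUNT(*) never has to consider NULL values.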
An interesting extension: if C1 is NOT NULL, COUNT(C1) returns the same result as COUNT(1). Does MySQL optimize for this?
Consider a test table with this structure:
CREATE TABLE `tb01` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `C1` int(11) NOT NULL,
  `C2` int(11) NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `IDX_C2` (`C2`)
) ENGINE=InnoDB AUTO_INCREMENT=16384 DEFAULT CHARSET=utf8
Compare the execution plans of COUNT(*) and COUNT(C1):
mysql> EXPLAIN SELECT COUNT(*) FROM TB01 WHERE C2<100 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: TB01
partitions: NULL
type: range
possible_keys: IDX_C2
key: IDX_C2
key_len: 4
ref: NULL
rows: 99
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT COUNT(C1) FROM TB01 WHERE C2<100 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: TB01
partitions: NULL
type: range
possible_keys: IDX_C2
key: IDX_C2
key_len: 4
ref: NULL
rows: 99
filtered: 100.00
Extra: Using index condition
1 row in set, 1 warning (0.00 sec)
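The plans show that COUNT(*) can be answered from IDX_C2 alone, so Extra shows Using index. COUNT(C1) uses IDX_C2 to filter rows and then has to look up each matching row in the table to read C1, so Extra shows Using index condition. In other words, even though C1 is NOT NULL, MySQL does not rewrite COUNT(C1) into COUNT(*) in this case.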
For this test, MyISAM and InnoDB give the same results.
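A toy model shows why COUNT(C1) needs the extra lookup. It uses plain Python data structures, not InnoDB's B+trees. The secondary index `IDX_C2` holds only (C2, ID) pairs, like an InnoDB secondary index that carries the primary key. Any query that needs C1 must therefore fetch the row through the primary key. The helper names and the 10-row data set are made up for illustration.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Toy "clustered index": primary key ID -> (C1, C2).
clustered: Dict[int, Tuple[int, int]] = {i: (i * 10, i) for i in range(1, 11)}

# Toy "IDX_C2": sorted (C2, ID) entries; C1 is not stored here.
idx_c2: List[Tuple[int, int]] = sorted((c2, pk) for pk, (_, c2) in clustered.items())


def range_end(upper: int) -> int:
    # Position of the first entry with C2 >= upper, i.e. the end of "C2 < upper".
    return bisect_left(idx_c2, (upper,))


def count_star(upper: int) -> Tuple[int, int]:
    """COUNT(*) ... WHERE C2 < upper: the index entries alone are enough."""
    return range_end(upper), 0  # (count, row lookups)


def count_c1(upper: int) -> Tuple[int, int]:
    """COUNT(C1) ... WHERE C2 < upper: C1 has to be read from the row."""
    count = lookups = 0
    for _, pk in idx_c2[: range_end(upper)]:
        c1, _ = clustered[pk]  # the "table lookup" through the primary key
        lookups += 1
        if c1 is not None:     # COUNT(col) skips NULLs; C1 is NOT NULL but still read
            count += 1
    return count, lookups


print(count_star(4))  # (3, 0)
print(count_c1(4))    # (3, 3)
```

Both queries return the same count, but only COUNT(C1) pays for row lookups. This mirrors the author's reading of the Using index versus Using index condition plans.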
COUNT(DISTINCT ...)
The MySQL documentation describes it as follows:
COUNT(DISTINCT expr,[expr...]) Returns a count of the number of rows with different non-NULL expr values. In MySQL, you can obtain the number of distinct expression combinations that do not contain NULL by giving a list of expressions. In standard SQL, you would have to do a concatenation of all expressions inside COUNT(DISTINCT ...).
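MySQL accepts:
SELECT COUNT(DISTINCT ID,C1) FROM TB02;
but rejects the following with a syntax error, because COUNT without DISTINCT takes a single expression (or *):
SELECT COUNT(ID,C1) FROM TB02;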
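The semantics described in the manual can be modeled in a few lines of Python. Distinct combinations are counted, and any combination containing NULL is skipped; `None` stands in for NULL. `count_distinct` is a hypothetical helper, not a MySQL API. The second part shows one caveat of the concatenation workaround the manual mentions for standard SQL: concatenating values without a separator can merge different combinations.

```python
from typing import Iterable, Optional, Sequence


def count_distinct(rows: Iterable[Sequence[Optional[object]]]) -> int:
    """Model of COUNT(DISTINCT e1, e2, ...): count distinct value combinations,
    ignoring every combination in which any expression is NULL (None)."""
    return len({tuple(r) for r in rows if all(v is not None for v in r)})


rows = [(1, "a"), (1, "a"), (1, "b"), (2, None), (None, "a")]
print(count_distinct(rows))  # 2: only (1, 'a') and (1, 'b') qualify

# Concatenating without a separator can make different pairs collide.
pairs = [(1, 23), (12, 3)]
print(count_distinct(pairs))                # 2
print(len({f"{a}{b}" for a, b in pairs}))  # 1: both become "123"
```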
Summary
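1. For both InnoDB and MyISAM, COUNT(1) and COUNT(*) are equivalent in every scenario, with the same performance and execution plan.
2. When counting all rows (no WHERE clause), MyISAM stores the table's total row count, so the query returns quickly without scanning data. InnoDB has to scan the whole table or all entries of some index, so the query can take a while.
3. For MyISAM without a WHERE clause, if column C1 is NOT NULL, COUNT(C1) is executed the same way as COUNT(*).
4. For InnoDB, if C1 is the primary key, COUNT(C1) and COUNT(*) have the same execution plan and efficiency. If C1 is merely NOT NULL, their plans and efficiency are not necessarily the same; they can only match when the query uses an index on C1.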
References:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html#function_count
https://dev.mysql.com/doc/refman/5.7/en/create-index.html
https://mp.weixin.qq.com/s/IOHvtel2KLNi-Ol4UBivbQ