統計一個表的數據量是經常遇到的需求,但是不同的表設計及不同的寫法,統計性能差別會有較大的差異,下麵就簡單通過實驗進行測試(大家測試的時候註意緩存的情況,否則影響測試結果)。 1、 準備工作 為了後續測試工作的進行,先準備幾張用於測試的表及數據,為了使測試數據具有參考意義,建議測試表的數據量大一點,以 ...
統計一個表的數據量是經常遇到的需求,但是不同的表設計及不同的寫法,統計性能差別會有較大的差異,下麵就簡單通過實驗進行測試(大家測試的時候註意緩存的情況,否則影響測試結果)。
1、 準備工作
為了後續測試工作的進行,先準備幾張用於測試的表及數據,為了使測試數據具有參考意義,建議測試表的數據量大一點,以免查詢時間太小,因此,可以繼續使用之前常用的連續數生成大法,如下:
/* 創建連續數表 */ CREATE TABLE nums(id INT primary key); /* 生成連續數的存儲過程,優化過後的 */ DELIMITER $$ CREATE PROCEDURE `sp_createNum`(cnt INT ) BEGIN DECLARE i INT DEFAULT 1; TRUNCATE TABLE nums; INSERT INTO nums SELECT i; WHILE i < cnt DO BEGIN INSERT INTO nums SELECT id + i FROM nums WHERE id + i<=cnt; SET i = i*2; END; END WHILE; END$$ DELIMITER ;
生成數據,本次準備生成1kw條記錄
/* 調用存儲過程 */ mysql> call sp_createNum(10000000); Query OK, 1611392 rows affected (32.07 sec)
如果逐條迴圈,那時間相當長,大家可以自行測試,參考鏈接 效率提升16800倍的連續整數生成方法
1.1 創建innodb表
生成3張表innodb表,如下:
nums_1表只有字元串主鍵欄位
/* 生成只有一個字元串類型欄位主鍵的表nums_1 */ mysql> create table nums_1 (p1 varchar(32) primary key ) engine=innodb; Query OK, 0 rows affected (0.01 sec) /* 導入數據,將id通過md5函數轉換為字元串 */ mysql> insert into nums_1 select md5(id) from nums; Query OK, 10000000 rows affected (1 min 12.63 sec) Records: 10000000 Duplicates: 0 Warnings: 0
nums_2表有5個欄位 ,其中主鍵為字元串類型欄位的p1,其他欄位為整型的id,非空的c1,可為空的c2,可為空的c3。
其中c1,c2欄位內容完全一致,差別是欄位約束不一樣(c1不可為空,c2可為空),c3與c1,c2的差別在於c1中aa開頭的值在c3中為null,其他內容一樣。
/* 創建表nums_2 */ mysql> create table nums_2(p1 varchar(32) primary key ,id int ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb; Query OK, 0 rows affected (1.03 sec) /*導入數據 */ mysql> insert into nums_2(id,p1,c1,c2,c3) select id,md5(id),left(md5(id),10),left(md5(id),10),if(,left(md5(id),10) like 'aa%',null,,left(md5(id),10)) from nums; Query OK, 10000000 rows affected (5 min 6.68 sec) Records: 10000000 Duplicates: 0 Warnings: 0
nums_3表的內容與nums_2完全一樣,區別在於主鍵欄位不一樣,c3表為整型的id
/* 創建表nums_3 */ mysql> create table nums_3(p1 varchar(32) ,id int primary key ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb; Query OK, 0 rows affected (0.01 sec) /* 因為內容完全一致,直接從nums_2 中導入 */ mysql> insert into nums_3 select * from nums_2; Query OK, 10000000 rows affected (3 min 18.81 sec) Records: 10000000 Duplicates: 0 Warnings: 0
1.2 創建MyISAM引擎表
再創建一張MyISAM的表,表結構及內容均與nums_2也一致,只是引擎為MyISAM。
/* 創建MyISAM引擎的nums_4表*/ mysql> create table nums_4(p1 varchar(32) not null primary key ,id int ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=MyISAM; Query OK, 0 rows affected (0.00 sec) /* 直接從nums_2表導入數據 */ mysql> insert into nums_4 select * from nums_2; Query OK, 10000000 rows affected (3 min 16.78 sec) Records: 10000000 Duplicates: 0 Warnings: 0
2、 查詢一張表數據量的方法
查詢一張表的數據量有如下幾種:
查詢大致數據量,可以查統計信息,2.1中會介紹具體方法
精確查找數據量,則可以通過count(主鍵欄位),count(*), count(1) [這裡的1可以替換為任意常量]
2.1 非精確查詢
如果只是查一張表大致有多少數據,尤其是很大的表 只是查詢其表屬於什麼量級的(百萬、千萬還是上億條),可以直接查詢統計信息,查詢方式有如下幾種:
查詢索引信息,其中Cardinality 為大致數據量(查看主鍵PRIMARY行的值,如果為多列的複合主鍵,則查看最後一列的Cardinality 值)
mysql> show index from nums_2; +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | nums_2 | 0 | PRIMARY | 1 | p1 | A | 9936693 | NULL | NULL | | BTREE | | | +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 1 row in set (0.00 sec)
查看表狀態,其中Rows為大致數據量
mysql> show table status like 'nums_2'; +--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+ | Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment | +--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+ | nums_2 | InnoDB | 10 | Dynamic | 9936693 | 111 | 1105182720 | 0 | 2250178560 | 4194304 | NULL | 2020-04-04 19:31:34 | NULL | NULL | utf8_general_ci | NULL | | | +--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+ 1 row in set (0.00 sec)
直接查看STATISTICS或TABLES表,內容與查看索引信息或表狀態類似,其中TABLE_ROWS的內容為大致的數據量
mysql> select * from information_schema.tables where table_schema='testdb' and table_name like 'nums_2'; +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+ | TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT | +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+ | def | testdb | nums_2 | BASE TABLE | InnoDB | 10 | Dynamic | 9936693 | 111 | 1105182720 | 0 | 2250178560 | 4194304 | NULL | 2020-04-04 19:31:34 | NULL | NULL | utf8_general_ci | NULL | | | +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+ 1 row in set (0.00 sec)
註意:
- innodb引起的表通過以上3種方式均可查詢對應表的大致數據量,且結果相同,因為均是取自相同的統計信息
- MyISAM表的結果是精確值(表數據量,不包含其他欄位)
mysql> select * from information_schema.tables where table_schema='testdb' and table_name like 'nums_4'; +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+ | TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT | +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+ | def | testdb | nums_4 | BASE TABLE | MyISAM | 10 | Dynamic | 10000000 | 75 | 759686336 | 281474976710655 | 854995968 | 0 | NULL | 2020-04-04 19:20:23 | 2020-04-04 19:21:45 | 2020-04-04 19:23:45 | utf8_general_ci | NULL | | | +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+ 1 row in set (0.00 sec)
2.2 精確查找
因為2.1中innodb的表查詢的結果都是統計值,非準備值,實際工作中大多數情況下需要統計精確值,那麼查詢精確值的方法有如下幾種,且所有引擎的表都適用。
count(主鍵)
mysql> select count(p1) from nums_2; +-----------+ | count(p1) | +-----------+ | 10000000 | +-----------+ 1 row in set (1.60 sec)
count(1)
其中的1可以是任意常量,例如 count(2),count('a‘)等
mysql> select count(1) from nums_2; +----------+ | count(1) | +----------+ | 10000000 | +----------+ 1 row in set (1.45 sec)
count(*)
mysql> select count(*) from nums_2; +----------+ | count(*) | +----------+ | 10000000 | +----------+ 1 row in set (1.52 sec)
3、 count的性能對比
對比 count(主鍵) count(1) count(*) count(非空欄位) count(可為空欄位) 性能對比
3.1 MyISAM引擎表
3.1.1 查詢整張表數據量
如果想精確查詢一張MyISAM表的數據量,使用 count(主鍵) count(1) count(*) 效率均一致,直接查出準確結果,耗時幾乎為0s
mysql> select count(p1) from nums_4; +-----------+ | count(p1) | +-----------+ | 10000000 | +-----------+ 1 row in set (0.00 sec) mysql> select count(1) from nums_4; +----------+ | count(1) | +----------+ | 10000000 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from nums_4; +----------+ | count(*) | +----------+ | 10000000 | +----------+ 1 row in set (0.00 sec)
執行計劃也均一致,可以看出沒有通過主鍵或其他索引掃描的方式統計
mysql> explain select count(*) from nums_4; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> explain select count(p1) from nums_4; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> explain select count(1) from nums_4; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ 1 row in set, 1 warning (0.00 sec)
小結:
MyISAM的方法查整表數據量效率情況為 count(主鍵)= count(1) = count(*)
3.1.2 查詢部分數據
查詢部分數據的時候則無法直接從統計信息獲取,因此耗時情況大致如下:
mysql> select count(p1) from nums_4 where p1 like 'aa%'; +-----------+ | count(p1) | +-----------+ | 39208 | +-----------+ 1 row in set (0.14 sec) mysql> select count(1) from nums_4 where p1 like 'aa%'; +----------+ | count(1) | +----------+ | 39208 | +----------+ 1 row in set (0.13 sec) mysql> select count(*) from nums_4 where p1 like 'aa%'; +----------+ | count(*) | +----------+ | 39208 | +----------+ 1 row in set (0.13 sec)
執行計劃其實均一樣:
mysql> explain select count(1) from nums_4 where p1 like 'aa%'; +----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+ | 1 | SIMPLE | nums_4 | NULL | range | PRIMARY | PRIMARY | 98 | NULL | 42603 | 100.00 | Using where; Using index | +----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+ 1 row in set, 1 warning (0.00 sec)
小結: MyISAM引擎表統計部分數據的時候直接得出數據量,也許掃描數據進行統計,幾種寫法效率相近。
3.2 innodb引擎表
innodb引擎因為要支持MVCC,因此不能整表數據量持久化保存,每次查詢均需遍歷統計,但是不同的寫法,查詢效率是有差別的,後面將進行不同維度進行對比。
3.2.1 不同寫法的性能對比
通過 count(主鍵),count(1) , count(*) 對比查詢效率
mysql> select count(p1) from nums_2 ; +-----------+ | count(p1) | +-----------+ | 10000000 | +-----------+ 1 row in set (1.68 sec) mysql> select count(1) from nums_2 ; +----------+ | count(1) | +----------+ | 10000000 | +----------+ 1 row in set (1.37 sec) mysql> select count(*) from nums_2 ; +----------+ | count(*) | +----------+ | 10000000 | +----------+ 1 row in set (1.38 sec)
簡單的對比發現,查詢性能結果為 count(主鍵) < count(1) ≈ count(*)
但是查看執行計劃都是如下情況
mysql> explain select count(p1) from nums_2; +----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | nums_2 | NULL