背景 最近有個同事對字元串加索引,加完後,發現多了個奇奇怪怪的數字執行的SQL如下: alter table string_index_test add index `idx_name` (`name`) USING BTREE; 這個奇怪數字就是191,它很是疑惑,也沒指定索引的長度通過查看MyS ...
背景
最近有個同事對字元串加索引,加完後,發現多了個奇奇怪怪的數字
執行的SQL如下:
alter table string_index_test add index `idx_name` (`name`) USING BTREE;
這個奇怪數字就是191,它很是疑惑,也沒指定索引的長度
通過查看MySQL官方文檔
InnoDB has a maximum index length of 767 bytes for tables that use COMPACT or REDUNDANT row format, so for utf8mb3 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have utf8mb3 columns with indexes longer than 191 characters, you must index a smaller number of characters.
In an InnoDB table that uses COMPACT or REDUNDANT row format, these column and index definitions are legal:
col1 VARCHAR(500) CHARACTER SET utf8, INDEX (col1(255))
To use utf8mb4 instead, the index must be smaller:
col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))
大概意思就是InnoDB最大索引長度為 767 位元組數,用的編碼是utf8mb4,則可以存儲191個字元(767/4 約等於 191),編碼欄位長度超出最大索引長度後MySQL 預設在普通索引追加了191
思考
1、MySQL中如何提高字元串查詢效率?
對字元串加索引?
一般情況下,是不建議在字元串加索引,占空間
如果一定要加,建議可以指定長度,前提是字元串前面部分區分度好的話,此時這類索引就叫首碼索引
2、首碼索引有什麼問題?
區分度不好的話,很容易發生碰撞,進而引發一系列問題
我們再通過執行計劃來分析一波
上面分別演示了首碼索引和普通索引在只有where條件、order by和group by不同執行情況,可以看到Extra的說明,首碼索引只有where條件,無法使用覆蓋索引,order by會使用filesort,group by會使用temporary和filesort
總的來說,首碼索引無法使用覆蓋索引,進而導致order by和group by要使用文件排序,甚至臨時表首碼索引有這麼些問題,不指定長度?怎麼處理?
分析
準備了單表100W的數據進行測試
使用性能壓力測試工具mysqlslap
性能測試腳本
mysqlslap -uroot -p --concurrency=100,200 --iterations=1 --number-of-queries=1 --create-schema=test --query=C:\xxx\query.sql
–concurrency=100,200 測試併發的線程數/客戶端數,第一次100,第二次200
–iterations=1 指定測試重覆次數1次
–number-of-queries=1 指定每個線程執行的 SQL 語句數量上限(不精確)
–create-schema=test 指定查詢的資料庫test
1、不加索引
查詢的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name=‘forlan’;
Benchmark
Average number of seconds to run all queries: 8.328 seconds
Minimum number of seconds to run all queries: 8.328 seconds
Maximum number of seconds to run all queries: 8.328 seconds
Number of clients running queries: 100
Average number of queries per client: 0
Benchmark
Average number of seconds to run all queries: 18.078 seconds
Minimum number of seconds to run all queries: 18.078 seconds
Maximum number of seconds to run all queries: 18.078 seconds
Number of clients running queries: 200
Average number of queries per client: 0
2、加字元串索引
alter table string_index_test add index idx_name
(name
) USING BTREE;
查詢的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name=‘forlan’;
Benchmark
Average number of seconds to run all queries: 0.250 seconds
Minimum number of seconds to run all queries: 0.250 seconds
Maximum number of seconds to run all queries: 0.250 seconds
Number of clients running queries: 100
Average number of queries per client: 0
Benchmark
Average number of seconds to run all queries: 1.438 seconds
Minimum number of seconds to run all queries: 1.438 seconds
Maximum number of seconds to run all queries: 1.438 seconds
Number of clients running queries: 200
Average number of queries per client: 0
3、使用CRC32創建索引
CRC全稱為Cyclic Redundancy Check,又叫迴圈冗餘校驗。
CRC32是CRC演算法的一種,返回值的範圍0~2^32-1,使用bigint存儲
加一個name_crc32列,創建這個列的所有,索引空間小很多,利用整型加速查詢
加索引:alter table string_index_test add index idx_nam_crc32
(name_crc32
) USING BTREE;
查詢的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name_crc32=CRC32(‘forlan’) and name=‘forlan’;因為CRC32存在發生碰撞,所以加上name條件,才能篩選出正確的數據
Benchmark
Average number of seconds to run all queries: 0.266 seconds
Minimum number of seconds to run all queries: 0.266 seconds
Maximum number of seconds to run all queries: 0.266 seconds
Number of clients running queries: 100
Average number of queries per client: 0
Benchmark
Average number of seconds to run all queries: 0.390 seconds
Minimum number of seconds to run all queries: 0.390 seconds
Maximum number of seconds to run all queries: 0.390 seconds
Number of clients running queries: 200
Average number of queries per client: 0
總結
- 通過對字元串加索引,可以提高查詢效率,但需要註意指定長度,無法使用覆蓋索引
- 通過使用CRC32,需要額外存一個欄位,將字元串轉為整數存儲,節省空間,效率提升並不是很大,但存在碰撞問題,可以加多字元串篩選條件
- -對於CRC32存在碰撞問題,可以使用CRC64減少碰撞,但需要安裝 common_schema database函數庫