在之前的這篇文章Oracle/MySQL/PG/SQL Server關係資料庫中NULL與空字元串的區別[1]中,簡單對比、介紹了一下MySQL的NULL/空格/尾部空格處理方式,主要對比了NULL與長度為空的字元串,其實很多地方沒有分析到位就一筆帶過了。這篇文章重新來細說一下MySQL的尾部空格處 ...
在之前的這篇文章Oracle/MySQL/PG/SQL Server關係資料庫中NULL與空字元串的區別[1]中,簡單對比、介紹了一下MySQL的NULL/空格/尾部空格處理方式,主要對比了NULL與長度為空的字元串,其實很多地方沒有分析到位就一筆帶過了。這篇文章重新來細說一下MySQL的尾部空格處理方式。在MySQL中,有幾個因素會影響MySQL如何處理空格。這裡簡單淺析一下.
MySQL的排序規則有一個屬性Pad Attributes屬性,這個屬性的設置會影響資料庫如何處理尾部空格(是否忽略尾部空格),如下官方文檔[2]描述
Collation Pad Attributes
Collations based on UCA 9.0.0 and higher are faster than collations based on UCA versions prior to 9.0.0. They also have a pad attribute of NO PAD, in contrast to PAD SPACE as used in collations based on UCA versions prior to 9.0.0. For comparison of nonbinary strings, NO PAD collations treat spaces at the end of strings like any other character (see Trailing Space Handling in Comparisons).
Comparison of nonbinary string values (CHAR, VARCHAR, and TEXT) that have a NO PAD collation differ from other collations with respect to trailing spaces. For example, 'a' and 'a ' compare as different strings, not the same string. This can be seen using the binary collations for utf8mb4. The pad attribute for utf8mb4_bin is PAD SPACE, whereas for utf8mb4_0900_bin it is NO PAD. Consequently, operations involving utf8mb4_0900_bin do not add trailing spaces, and comparisons involving strings with trailing spaces may differ for the two collations
官方文檔[3],關於比較中尾部空格處理介紹如下:
Trailing Space Handling in Comparisons
MySQL collations have a pad attribute, which has a value of PAD SPACE or NO PAD:
• Most MySQL collations have a pad attribute of PAD SPACE.
• The Unicode collations based on UCA 9.0.0 and higher have a pad attribute of NO PAD; see Section 10.10.1, “Unicode Character Sets”.
For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings:
• For PAD SPACE collations, trailing spaces are insignificant in comparisons; strings are compared without regard to trailing spaces.
• NO PAD collations treat trailing spaces as significant in comparisons, like any other character.
The differing behaviors can be demonstrated using the two utf8mb4 binary collations, one of which is PAD SPACE, the other of which is NO PAD.
我們先來看看PAD SPACE與NO PAD的具體意義:
PAD SPACE:在排序和比較運算中,忽略字元串尾部空格。 NO PAD:在排序和比較運算中,字元串尾部空格當成普通字元,不能忽略。
SELECT *
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4';
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.33 |
+-----------+
1 row in set (0.00 sec)
mysql> SELECT *
-> FROM INFORMATION_SCHEMA.COLLATIONS
-> WHERE CHARACTER_SET_NAME = 'utf8mb4'
-> AND COLLATION_NAME IN('utf8mb4_general_ci','utf8mb4_0900_ai_ci');
+--------------------+--------------------+-----+------------+-------------+---------+---------------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
+--------------------+--------------------+-----+------------+-------------+---------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD |
| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE |
+--------------------+--------------------+-----+------------+-------------+---------+---------------+
2 rows in set (0.00 sec)
mysql>
下麵我們來測試演練一下,首先新建一個測試資料庫gsp
CREATE DATABASE IF NOT EXISTS gsp
DEFAULT CHARACTER SET utf8mb4
DEFAULT COLLATE utf8mb4_general_ci;
mysql> create table test(id int not null, name varchar(16));
Query OK, 0 rows affected (0.01 sec)
mysql> insert into test(id , name) values(1, null);
Query OK, 1 row affected (0.00 sec)
mysql> insert into test(id , name) values(2, '');
Query OK, 1 row affected (0.01 sec)
mysql> insert into test(id , name) values(3, ' '); -- 包含一個空格
Query OK, 1 row affected (0.00 sec)
mysql> insert into test(id , name) values(4, ' ');-- 包含兩個空格
Query OK, 1 row affected (0.00 sec)
mysql>
#length獲取字元串位元組長度的內置函數,
#char_length獲取字元串長度的內置函數
#hex將字元或數字轉換為十六進位格式
mysql> select id, length(name), char_length(name), hex(name) from test;
+----+--------------+-------------------+-----------+
| id | length(name) | char_length(name) | hex(name) |
+----+--------------+-------------------+-----------+
| 1 | NULL | NULL | NULL |
| 2 | 0 | 0 | |
| 3 | 1 | 1 | 20 |
| 4 | 2 | 2 | 2020 |
+----+--------------+-------------------+-----------+
4 rows in set (0.00 sec)
mysql> select * from test where name='';
+----+------+
| id | name |
+----+------+
| 2 | |
| 3 | |
| 4 | |
+----+------+
3 rows in set (0.00 sec)
mysql>
mysql> select * from test where name=' '; -- 包含一個空格
+----+------+
| id | name |
+----+------+
| 2 | |
| 3 | |
| 4 | |
+----+------+
3 rows in set (0.00 sec)
mysql> select * from test where name=' '; -- 包含兩個空格
+----+------+
| id | name |
+----+------+
| 2 | |
| 3 | |
| 4 | |
+----+------+
3 rows in set (0.00 sec)
mysql>
官方文檔中也要一個例子簡單說明,兩者比較時,如何處理尾部空格。如下所示,相當直觀、明瞭:
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_bin;
mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
| 1 |
+------------+
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_0900_bin;
mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
| 0 |
+------------+
我們上面的例子中,gsp資料庫的排序規則為utf8mb4_general_ci,它的Pad Attributes屬性為PAD SPACE,在排序和比較運算中,資料庫將忽略字元串尾部空格,所以上面幾個SQL的查詢結果是一樣的。如果你資料庫的排序規則選擇utf8mb4_0900_ai_ci,那麼Pad Attributes屬性為NO PAD,此時,在排序和比較運算中,字元串尾部空格當成普通字元處理,不能忽略.我們知道MySQL的排序規則分為伺服器級別,資料庫級別、表級別、列級別、SQL語句級別。優先順序順序為:SQL語句級別>列級別>表級別>資料庫級別>伺服器級別。你可以重新建一個資料庫,設置資料庫排序規則為utf8mb4_0900_ai_ci,也可以修改其他級別的排序規則...。這裡我們就直接修改表的列級別排序規則。腳本如下所示:
drop table test;
create table test(id int not null, name varchar(16) collate utf8mb4_0900_ai_ci );
insert into test(id , name) values(1, null);
insert into test(id , name) values(2, '');
insert into test(id , name) values(3, ' '); -- 包含一個空格
insert into test(id , name) values(4, ' ');-- 包含兩個空格
然後,我們測試一下,如下所示,此時由於資料庫將字元串尾部空格當成普通字元,所以此時的查詢結果就不同了。如下所示
mysql> select * from test where name=''; -- 長度為零的空字元串
+----+------+
| id | name |
+----+------+
| 2 | |
+----+------+
1 row in set (0.00 sec)
mysql> select * from test where name=' '; -- 含一個空格
+----+------+
| id | name |
+----+------+
| 3 | |
+----+------+
1 row in set (0.00 sec)
mysql> select * from test where name=' ';-- 包含兩個空格
+----+------+
| id | name |
+----+------+
| 4 | |
+----+------+
1 row in set (0.00 sec)
mysql>
另外,我們這裡測試的是VARCHAR類型,如果欄位類型為CHAR呢?其實呢,對於CHAR類型和VARCHA類型,它們的存儲略有區別:
CHAR(N):當插入的字元數小於N,它會在字元串的右邊補充空格,直到總字元數達到N再進行存儲;當查詢返回數據時預設會將字元串尾部的空格去掉,除非SQL_MODE設置PAD_CHAR_TO_FULL_LENGTH。
VARCHAR(N):當插入的字元數小於N,它不會在字元串的右邊補充空格,insert內容原封不動的進行存儲;如果原本字元串右邊有空格,在存儲和查詢返回時都會保留空格
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
PAD_CHAR_TO_FULL_LENGTH
By default, trailing spaces are trimmed from CHAR column values on retrieval. If PAD_CHAR_TO_FULL_LENGTH is enabled, trimming does not occur and retrieved CHAR values are padded to their full length. This mode does not apply to VARCHAR columns, for which trailing spaces are retained on retrieval.
Note: As of MySQL 8.0.13, PAD_CHAR_TO_FULL_LENGTH is deprecated. Expect it to be removed in a future version of MySQL.
註意事項:
PAD_CHAR_TO_FULL_LENGTH隻影響CHAR類型,不影響VARCHAR類型。 MySQL 8.0.13後,PAD_CHAR_TO_FULL_LENGTH參數過時/廢棄了。這個參數可能在後續的MySQL版本中被移除。
當前版本(MySQL 8.0.33)中,暫時還可以在SQL_MODE中設置這個參數,不過預設不會設置此參數。那麼我們來測試驗證一下:
drop table test;
create table test(id int not null, name char(16) );
insert into test(id , name) values(1, null);
insert into test(id , name) values(2, '');
insert into test(id , name) values(3, ' '); -- 包含一個空格
insert into test(id , name) values(4, ' ');-- 包含兩個空格
mysql> show variables like 'sql_mode'\G
*************************** 1. row ***************************
Variable_name: sql_mode
Value: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
1 row in set (0.00 sec)
mysql> --如下所示,SQL_MODE沒有設置PAD_CHAR_TO_FULL_LENGTH時,查詢返回數據時預設會將字元串尾部的空格去掉,所以你看到長度為零
mysql> select id, length(name), char_length(name),hex(name) from test;
+----+--------------+-------------------+-----------+
| id | length(name) | char_length(name) | hex(name) |
+----+--------------+-------------------+-----------+
| 1 | NULL | NULL | NULL |
| 2 | 0 | 0 | |
| 3 | 0 | 0 | |
| 4 | 0 | 0 | |
+----+--------------+-------------------+-----------+
4 rows in set (0.00 sec)
mysql>
那麼我們手工設置一下當前會話的SQL_MODE,然後對比測試一下:
mysql> set sql_mode='PAD_CHAR_TO_FULL_LENGTH';
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> show variables like 'sql_mode';
+---------------+-------------------------+
| Variable_name | Value |
+---------------+-------------------------+
| sql_mode | PAD_CHAR_TO_FULL_LENGTH |
+---------------+-------------------------+
1 row in set (0.01 sec)
mysql> select id, length(name), char_length(name),hex(name) from test;
+----+--------------+-------------------+----------------------------------+
| id | length(name) | char_length(name) | hex(name) |
+----+--------------+-------------------+----------------------------------+
| 1 | NULL | NULL | NULL |
| 2 | 16 | 16 | 20202020202020202020202020202020 |
| 3 | 16 | 16 | 20202020202020202020202020202020 |
| 4 | 16 | 16 | 20202020202020202020202020202020 |
+----+--------------+-------------------+----------------------------------+
4 rows in set (0.00 sec)
mysql>
通過上面的分析講述,我們知道當資料庫的排序規則的Pad Attributes屬性為NO PAD時,此時SQL_MODE的PAD_CHAR_TO_FULL_LENGTH設置與否將會影響查詢結果。我們新建一個gsp2資料庫,資料庫排序規則為utf8mb4_0900_ai_ci,下麵我們通過實驗對比一下就知道了: