新聞中心
本文轉(zhuǎn)載自微信公眾號「DBA隨筆」,作者DBA隨筆。轉(zhuǎn)載本文請聯(lián)系DBA隨筆公眾號。

讓客戶滿意是我們工作的目標(biāo),不斷超越客戶的期望值來自于我們對這個行業(yè)的熱愛。我們立志把好的技術(shù)通過有效、簡單的方式提供給客戶,將通過不懈努力成為客戶在信息化領(lǐng)域值得信任、有價值的長期合作伙伴,公司提供的服務(wù)項目有:域名申請、虛擬主機、營銷軟件、網(wǎng)站建設(shè)、棲霞網(wǎng)站維護、網(wǎng)站推廣。
今天在線上遇到了一個MySQL字符比較的問題,感覺很有意思,專門研究了下,估計大家都沒有遇到過,這里跟大家分享一下。
1.背景
背景介紹:
MySQL里面有一張表,根據(jù)where條件匹配查詢某一條記錄的時候,手誤輸入了一個空格,發(fā)現(xiàn)這一條數(shù)據(jù)仍然能查出來,我建了一個測試表,還原如下:
22:57:02> create table t00 (id int primary key,name varchar(10));
Query OK, 0 rows affected (0.01 sec)
22:57:11> insert into t00 values (1,'aaa'),(2,'bbb');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
22:57:22> select * from t00 where name='aaa';
+----+------+
| id | name |
+----+------+
| 1 | aaa |
+----+------+
1 row in set (0.00 sec)
22:57:32> select * from t00 where name='aaa ';
+----+------+
| id | name |
+----+------+
| 1 | aaa |
+----+------+
1 row in set (0.00 sec)
插入(1,'aaa')這條記錄,使用where='aaa'和'aaa '這兩個條件去匹配,居然都能夠查到這條記錄。
一開始我懷疑是這個8.0.19版本MySQL實例配置有問題,換了一個5.5低版本的MySQL實例,再次測試,還是復(fù)現(xiàn)這個問題??磥聿皇前姹旧系膯栴},一定是某種配置的問題。
晚上回到家,又用了自己搭建的一個8.0.22版本的MySQL實例重新執(zhí)行上面的命令,竟然驚奇的發(fā)現(xiàn),不復(fù)現(xiàn)了。暈死。8.0.22版本測試的結(jié)果是:
23:35:30>>select * from t0;
+------+------+
| id | name |
+------+------+
| 1 | aaa |
| 2 | bbb |
+------+------+
2 rows in set (0.01 sec)
23:35:34>>select * from t0 where name='aaa';
+------+------+
| id | name |
+------+------+
| 1 | aaa |
+------+------+
1 row in set (0.00 sec)
23:35:46>>select * from t0 where name='aaa ';
Empty set (0.00 sec)
2.分析思路
1)為什么'aaa'和'aaa '一樣?
首先我用命令在MySQL上檢測了一下這兩個字符串在MySQL中是否一樣:
### MySQL實例一
23:39:09> select 'aaa' = 'aaa ';
+------------------+
| 'aaa' = 'aaa ' |
+------------------+
| 1 |
+------------------+
1 row in set (0.00 sec)
### MySQL實例二
23:35:54>>select 'aaa' = 'aaa ';
+------------------+
| 'aaa' = 'aaa ' |
+------------------+
| 0 |
+------------------+
1 row in set (0.00 sec)
從上面的結(jié)果可以看出來,這兩個實例上,關(guān)于字符的比較規(guī)則不一樣。
到這里,可能部分同學(xué)就已經(jīng)知道答案了。不過還是往下再看看。
2)比較規(guī)則哪里不一樣?
我們可以用下面的命令,先看一下utf8相關(guān)的字符集下的比較規(guī)則,如下:
23:45:18> show collation like 'utf8%';
+----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD |
| utf8mb4_0900_as_ci | utf8mb4 | 305 | | Yes | 0 | NO PAD |
| utf8mb4_0900_as_cs | utf8mb4 | 278 | | Yes | 0 | NO PAD |
| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 | PAD SPACE |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 | PAD SPACE |
| utf8mb4_cs_0900_ai_ci | utf8mb4 | 266 | | Yes | 0 | NO PAD |
| utf8mb4_cs_0900_as_cs | utf8mb4 | 289 | | Yes | 0 | NO PAD |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 | PAD SPACE |
........
| utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 | PAD SPACE |
+----------------------------+---------+-----+---------+----------+---------+---------------+
103 rows in set (0.00 sec)
在最后一列,我們可以看到一個pad屬性,這個屬性里面包含2個值,分別是no pad 和pad space。
3)嘗試去官方文檔中查找這倆屬性的意思
果然,不出意外,找到了一些蛛絲馬跡:
https://dev.mysql.com/doc/refman/8.0/en/char.html
To determine the pad attribute for a collation, use the INFORMATION_SCHEMA COLLATIONS table, which has a PAD_ATTRIBUTE column.
For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings. NO PAD collations treat trailing spaces as significant in comparisons, like any other character. PAD SPACE collations treat trailing spaces as insignificant in comparisons; strings are compared without regard to trailing spaces.
上面這段話描述的意思大概是:
要確定排序規(guī)則的填充屬性,請使用 information_schema.collations 表,該表具有 pad_attribute 列。
對于非二進制字符串(char,varchar和text),字符串的填充屬性決定了比較字符串末尾空格時的處理方式。
NO PAD 排序規(guī)則將尾隨空格視為重要的比較,更加嚴(yán)格,就像任何其他字符一樣;
PAD SPACE 排序規(guī)則在比較中將尾隨空格視為無關(guān)緊要,比較字符串時不考慮尾隨空格,也就是有無空格一個樣。
這里我們就可以根據(jù)實際使用的比較規(guī)則來查看對應(yīng)的pad屬性了:
先看實例一:
### MySQL實例一
00:01:31>show variables like '%colla%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.01 sec)
00:01:45>select collation_name,character_set_name,pad_attribute from information_schema.collations where collation_name like 'utf8_gen
eral_ci';
+-----------------+--------------------+---------------+
| collation_name | character_set_name | pad_attribute |
+-----------------+--------------------+---------------+
| utf8_general_ci | utf8 | PAD SPACE |
+-----------------+--------------------+---------------+
1 row in set (0.00 sec)
再來看實例二:
### 實例二
mysql--root@localhost:(none) 23:53:52>>show variables like '%colla%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)
00:03:47>>select collation_name,character_set_name,pad_attribute from information_schema.collations where collation_name like 'utf8mb4_0900_ai_ci';
+--------------------+--------------------+---------------+
| collation_name | character_set_name | pad_attribute |
+--------------------+--------------------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4 | NO PAD |
+--------------------+--------------------+---------------+
1 row in set (0.00 sec)
到這里,真相大白。
實例一的連接比較規(guī)則是utf8_general_ci,對應(yīng)的填充規(guī)則是pad space屬性,代表字符比較過程中,末尾空格不重要,所以加不加空格結(jié)果都是一樣的;
實例二的連接比較規(guī)則是utf8mb4_0900_ai_ci,對應(yīng)的填充規(guī)則是no pad屬性,代表字符比較過程中,末尾空格重要,所以加不加空格結(jié)果不一樣。
3.如何讓字符匹配更嚴(yán)格?
1)修改連接的比較規(guī)則為utf8mb4_0900_ai_ci,當(dāng)然,這個修改需要搭配默認字符集
這個方案比較容易理解,不贅述。
2)使用like模糊匹配進行比較
3)where條件之前,添加binary關(guān)鍵字
上述2、3兩種方法可見下面的測試:
00:19:13>select * from t00;
+----+------+
| id | name |
+----+------+
| 1 | aaa |
| 2 | bbb |
+----+------+
2 rows in set (0.00 sec)
00:19:18>select * from t00 where name='aaa';
+----+------+
| id | name |
+----+------+
| 1 | aaa |
+----+------+
1 row in set (0.00 sec)
00:19:28>select * from t00 where name='aaa ';
+----+------+
| id | name |
+----+------+
| 1 | aaa |
+----+------+
1 row in set (0.00 sec)
### 下面兩種方案,可以防止'aaa '匹配到'aaa'
00:19:31>select * from t00 where name like 'aaa ';
Empty set (0.00 sec)
00:19:57>select * from t00 where binary name = 'aaa ';
Empty set (0.00 sec)
今天文章就到這里吧。
新聞名稱:MySQL關(guān)閉,kill還是kill-9?
當(dāng)前URL:http://m.fisionsoft.com.cn/article/dhjsepi.html


咨詢
建站咨詢
