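This morning a colleague asked me for some SQL to remove duplicate rows in MySQL. I remembered an earlier post of mine on MySQL deduplication that used export/import plus a unique index, but that approach is too disruptive to the business, so I wrote a stored procedure to delete the duplicates instead. That ended up taking the whole morning. A bug like this is frustrating and a real waste of time.
Here is a brief description of the process. The logic for deleting duplicates is simple:
1. Using the duplicate criteria, find the smallest primary key (usually the ID column) in each group of duplicate records.
2. Among the records that match those criteria, delete every one whose primary key is greater than that minimum.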
Suppose I have a table, leo.test, and need to delete the duplicate records whose start_time and end_time are both identical.
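The table definition is not reproduced in the text, so the following is only a minimal sketch of what it might look like. The column types, the AUTO_INCREMENT key, and the sample rows are all assumptions; only the names leo.test, id, start_time, and end_time come from the procedure itself. The final query is step 1 of the logic above run on its own.

```sql
-- Hypothetical schema and data, for illustration only.
CREATE DATABASE IF NOT EXISTS leo;

CREATE TABLE IF NOT EXISTS leo.test (
    id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    start_time DATETIME,
    end_time   DATETIME
);

-- Three copies of one time range, plus one unique row.
INSERT INTO leo.test (start_time, end_time) VALUES
    ('2018-08-01 08:00:00', '2018-08-01 09:00:00'),
    ('2018-08-01 08:00:00', '2018-08-01 09:00:00'),
    ('2018-08-01 08:00:00', '2018-08-01 09:00:00'),
    ('2018-08-01 10:00:00', '2018-08-01 11:00:00');

-- Step 1: one row per duplicate group, with the smallest id in the group.
SELECT start_time, end_time, MIN(id) AS min_id, COUNT(*) AS cnt
FROM leo.test
GROUP BY start_time, end_time
HAVING cnt > 1;
```

On a freshly created table with exactly these rows, the query should return a single group (the 08:00 to 09:00 range) with three members.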
The stored procedure then looks like this:
    DELIMITER //
    DROP PROCEDURE IF EXISTS Del_Dup_FOR_TEST //
    CREATE PROCEDURE Del_Dup_FOR_TEST()
    BEGIN
        -- The v_ prefix keeps the variables distinct from the column names
        DECLARE min_id INT;
        DECLARE v_start_time, v_end_time DATETIME;
        DECLARE v_count INT;
        DECLARE done INT DEFAULT 0;
        -- One row per duplicate group, with the smallest id in that group
        DECLARE my_cur CURSOR FOR
            SELECT start_time, end_time, min(id), count(1) AS count
            FROM leo.test
            GROUP BY start_time, end_time
            HAVING count > 1;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

        OPEN my_cur;
        myloop: LOOP
            FETCH my_cur INTO v_start_time, v_end_time, min_id, v_count;
            IF done = 1 THEN
                LEAVE myloop;
            END IF;
            -- Keep the row with the smallest id and delete the rest of the group
            DELETE FROM leo.test
            WHERE start_time = v_start_time
              AND end_time = v_end_time
              AND id > min_id;
            COMMIT;
        END LOOP myloop;
        CLOSE my_cur;
    END //
    DELIMITER ;
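The logic is clear: for each duplicate group in turn, delete the records whose primary key is greater than the group's smallest primary key.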
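A short usage sketch, assuming the procedure above has been created and leo.test holds some duplicates: call it, then rerun the grouping query to confirm nothing is left over.

```sql
-- Run the cleanup.
CALL Del_Dup_FOR_TEST();

-- Verify: this should return no rows once every duplicate group
-- has been reduced to its smallest id.
SELECT start_time, end_time, COUNT(*) AS cnt
FROM leo.test
GROUP BY start_time, end_time
HAVING cnt > 1;
```

One caveat to keep in mind: the DELETE compares with `=`, which never matches NULL, so groups where start_time or end_time is NULL would still show up in this check. MySQL's null-safe operator `<=>` is one way to handle that case if it applies to your data.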
But while writing it I ran into a really nasty bug. My first version looked like this:
    DELIMITER //
    DROP PROCEDURE IF EXISTS Del_Dup_FOR_TEST //
    CREATE PROCEDURE Del_Dup_FOR_TEST()
    BEGIN
        DECLARE min_id INT;
        -- Problem: these names match the table's columns and the cursor's alias
        DECLARE start_time, end_time DATETIME;
        DECLARE count INT;
        DECLARE done INT DEFAULT 0;
        DECLARE my_cur CURSOR FOR
            SELECT start_time, end_time, min(id), count(1) AS count
            FROM leo.test
            GROUP BY start_time, end_time
            HAVING count > 1;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

        OPEN my_cur;
        myloop: LOOP
            FETCH my_cur INTO start_time, end_time, min_id, count;
            IF done = 1 THEN
                LEAVE myloop;
            END IF;
            DELETE FROM leo.test
            WHERE start_time = start_time
              AND end_time = end_time
              AND id > min_id;
            COMMIT;
        END LOOP myloop;
        CLOSE my_cur;
    END //
    DELIMITER ;
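The only difference is in the names of the declared variables. That is:
The variables you FETCH INTO must never share a name with a column or column alias returned by the SQL statement that defines the CURSOR. In other words, a variable name can be neither a column that already exists in the table nor an alias used in the cursor definition (such as count in this example). If either condition is violated, the variables all come back NULL after FETCH INTO. You can verify this by adding a SELECT right after the FETCH INTO to print the variables.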
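The check suggested above can be sketched as a separate, read-only procedure. The name Debug_Fetch_FOR_TEST is made up for this example; it reuses the problematic variable names, fetches once, and prints what the variables hold. It deletes nothing.

```sql
DELIMITER //
DROP PROCEDURE IF EXISTS Debug_Fetch_FOR_TEST //
CREATE PROCEDURE Debug_Fetch_FOR_TEST()
BEGIN
    DECLARE min_id INT;
    -- Deliberately reuses the column names and the alias to reproduce the issue
    DECLARE start_time, end_time DATETIME;
    DECLARE count INT;
    DECLARE done INT DEFAULT 0;
    DECLARE my_cur CURSOR FOR
        SELECT start_time, end_time, MIN(id), COUNT(1) AS count
        FROM leo.test
        GROUP BY start_time, end_time
        HAVING count > 1;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN my_cur;
    FETCH my_cur INTO start_time, end_time, min_id, count;
    -- Print the variables straight after the FETCH, plus the handler flag
    SELECT start_time, end_time, min_id, count, done;
    CLOSE my_cur;
END //
DELIMITER ;

CALL Debug_Fetch_FOR_TEST();
```

Printing done alongside the variables also shows whether the cursor returned any row at all, which helps separate "fetched NULLs" from "fetched nothing".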
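Before tracking this down, I went to the official documentation to check whether my syntax was wrong: https://dev.mysql.com/doc/refman/5.5/en/cursors.html . The syntax was fine, but the second-to-last comment on that page suggested a hidden bug involving column names. The last comment disputes that it is a bug, but with no other option I made the change above based on the bug report, and the procedure then worked.
The bug reports in question are MySQL Bug #28227 and Bug #5967.
Now back to that last comment under the official documentation. At first I thought the rebuttal was complete nonsense: what idiot says this is not a bug? After thinking it over, though, both of them are right. It can fairly be called a bug, and the idiot was me.
Here are the last two comments on the page (as of 2018-08-01):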
Posted by Brent Roady on May 9, 2012

It should be noted that the local variable names used in FETCH [cursor] INTO must be different than the variable names used in the SELECT statement defining the CURSOR. Otherwise the values will be NULL. In this example,

    DECLARE a VARCHAR(255);
    DECLARE cur1 CURSOR FOR SELECT a FROM table1;
    FETCH cur1 INTO a;

the value of a after the FETCH will be NULL. This is also described here: http://bugs.mysql.com/bug.php?id=28227

Posted by Jérémi Lassausaie on February 3, 2015

Answer for Brent Roady : I don't see any bug in the bahaviour described.

    DECLARE a VARCHAR(255);
    /* you declare a variable "a" without a specified default value, a=NULL */
    DECLARE cur1 CURSOR FOR SELECT a FROM table1;
    /* You declare a cursor that selects "a" FROM a table */
    OPEN cur1;
    /* You execute your cursor query, a warning is raised because a is ambiguously defined but you don't see it */
    FETCH cur1 INTO a;
    /* you put your unique field in your unique row into a (basically you do "SET a=a;") so a is still NULL */

There is no bug report, just a misunderstanding.
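Brent ran into the same behavior I did, and he linked the bug report.
Jérémi (a programmer, I would guess) replies that this is plainly a misunderstanding: once you declare a variable a (initial value NULL), FETCH INTO a amounts to SET a=a, and no programming language can do anything useful with that.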
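The name resolution behind this can be seen without a cursor. The MySQL manual states that when a statement refers to a column and a declared local variable of the same name, the reference is interpreted as the variable. The sketch below is an illustration under assumptions: table1 and its contents are hypothetical (borrowed from the comment's example), shadow_demo is a made-up name, and a scratch database is assumed to be selected.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS table1 (a VARCHAR(255));
INSERT INTO table1 (a) VALUES ('from the table');

DELIMITER //
DROP PROCEDURE IF EXISTS shadow_demo //
CREATE PROCEDURE shadow_demo()
BEGIN
    -- Same name as the column table1.a
    DECLARE a VARCHAR(255) DEFAULT 'from the variable';
    DECLARE v_result VARCHAR(255);

    -- Inside the procedure, the unqualified name "a" refers to the
    -- local variable rather than the column.
    SELECT a INTO v_result FROM table1 LIMIT 1;
    SELECT v_result;
END //
DELIMITER ;

CALL shadow_demo();
```

Going by the manual's description, v_result should end up holding the variable's value rather than the row's. With no DEFAULT on the variable, that value is NULL, which matches what Brent and I both saw.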
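So prefixing the variables you declare in a stored procedure is a good habit. Back when I wrote Oracle stored procedures I always used a v_ prefix, and in SQL Server variables carry the @ prefix, yet when it came to MySQL I forgot. This one is worth remembering.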