Letting GPT-4 Fix a Golang "Data Race" (Continued) - 5 Minutes a Day to Master GPT Programming, Part 7

1. Or So I Thought
2. The "Data Race" That Won't Go Away
3. The Usual Routine: Shut the Door and Unleash GPT-4
4. Summary

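1. Or So I Thought

I assumed GoPool (https://github.com/devchat-ai/gopool) would be a flash in the pan and then gather dust on GitHub.

The story of how GoPool came about: "In Just Three Days, I Used GPT-4 to Generate the Best-Performing Golang Worker Pool on the Web, Easily Beating a 10k-Star GitHub Project"

Over the past couple of days, though, more and more people have started trying GoPool, found some bugs, and reported them on GitHub.

So today I'll continue sharing how I resolve issues with GPT-4. Whether you are interested in Golang or in GPT-4, what follows should be right up your alley.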

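2. The "Data Race" That Won't Go Away

I had only just published an article about data races:

"Letting GPT-4 Fix a Golang "Data Race""

That article described how I used GPT-4 to fix a data race in a unit test, after which I opened a PR:

Refactor TestGoPoolWithRetry to use atomic operations

Before I could even take a sip of water, a user had filed a new issue:

The TestGoPoolWithMutex test case fails the race test once WithTimeout is added

Well, well. Looks like I won't be leaving work early today.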

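3. The Usual Routine: Shut the Door and Unleash GPT-4

Below I use DevChat to interact with GPT-4. If you are curious about DevChat, see this article:

"DevChat: Seamlessly Integrating GPT-4 into VS Code to Greatly Improve Your Programming Experience"

Registering for DevChat is free: it is currently in public beta and there is no charge.

3.1 Reproducing the Problem

In the issue, the user says that after adding WithTimeout(10*time.Microsecond) to the TestGoPoolWithMutex() test case, he ran into a data race. The code: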

func TestGoPoolWithMutex(t *testing.T) {
	pool := NewGoPool(100, WithLock(new(sync.Mutex)), WithTimeout(10*time.Microsecond))
	defer pool.Release()
	for i := 0; i < 1000; i++ {
		pool.AddTask(func() (interface{}, error) {
			time.Sleep(10 * time.Millisecond)
			return nil, nil
		})
	}
	pool.Wait()
}

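I can reproduce this reliably on my machine. The next step is to simplify the scenario and test the WithTimeout(10*time.Microsecond) feature with a dedicated UT.

3.2 Having GPT-4 Write a Unit Test

I put the request to DevChat. Its complete answer was:

To add a test function named TestGoPoolWithTimeout(), we need to consider the following points:

We need to create a GoPool with a timeout option.
We need to add a task whose execution time exceeds the configured timeout, so that we can test whether the timeout works.
We need to check whether the task failed because of the timeout.

Here is the added TestGoPoolWithTimeout() function: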

func TestGoPoolWithTimeout(t *testing.T) {
    var taskError = errors.New("task error")
    var taskRun = false

    pool := NewGoPool(100, WithTimeout(100*time.Millisecond), WithErrorCallback(func(err error) {
        if err != taskError {
            t.Errorf("Expected error %v, but got %v", taskError, err)
        }
        taskRun = true
    }))
    defer pool.Release()

    pool.AddTask(func() (interface{}, error) {
        time.Sleep(200 * time.Millisecond)
        return nil, taskError
    })

    pool.Wait()

    if !taskRun {
        t.Errorf("Expected task to run and timeout, but it did not run")
    }
}

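This test function first creates a GoPool with a 100-millisecond timeout. It then adds a task that takes 200 milliseconds to finish, which exceeds the configured timeout. If the task fails because of the timeout, the error callback is invoked and sets taskRun to true. Finally, the test checks whether taskRun is true; if it is not, the task did not run, which is an error.

This code "looks polished but doesn't run". Whether you can spot the mistake decides whether the code ends up being usable. Did you spot it? (The answer is revealed below.)

I then said to DevChat:

This test case does not pass. I noticed that in WithErrorCallback you report an error when err != taskError. On timeout, however, the error the task returns is not taskError but "task timed out", so should != be changed to ==?

DevChat's answer:

You are right, I made a mistake. In this test case the error we should expect is "task timed out", not taskError. So in WithErrorCallback we should check whether the error equals "task timed out". Here is the corrected test function: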

func TestGoPoolWithTimeout(t *testing.T) {
    var taskRun = false

    pool := NewGoPool(100, WithTimeout(100*time.Millisecond), WithErrorCallback(func(err error) {
        if err.Error() != "task timed out" {
            t.Errorf("Expected error 'task timed out', but got %v", err)
        }
        taskRun = true
    }))
    defer pool.Release()

    pool.AddTask(func() (interface{}, error) {
        time.Sleep(200 * time.Millisecond)
        return nil, nil
    })

    pool.Wait()

    if !taskRun {
        t.Errorf("Expected task to run and timeout, but it did not run")
    }
}

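In this corrected version, WithErrorCallback checks whether the error equals "task timed out" and reports an error if it does not. We also changed the added task so that it returns no error, because the error we expect comes from the timeout, not from the task itself.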
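Where the "task timed out" error comes from is not shown in this article. As a rough sketch, assuming the pool enforces the timeout by racing the task against a timer, the mechanism might look like the code below. runWithTimeout and errTimeout are hypothetical names, not GoPool's API; only the message text comes from the conversation above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is a hypothetical sentinel; the article only tells us the message text.
var errTimeout = errors.New("task timed out")

// runWithTimeout runs task in its own goroutine and returns errTimeout
// if the task has not finished within d.
func runWithTimeout(task func() (interface{}, error), d time.Duration) (interface{}, error) {
	type result struct {
		val interface{}
		err error
	}
	done := make(chan result, 1) // buffered so a late task can still send and exit
	go func() {
		v, err := task()
		done <- result{v, err}
	}()
	select {
	case r := <-done:
		return r.val, r.err
	case <-time.After(d):
		return nil, errTimeout
	}
}

func main() {
	_, err := runWithTimeout(func() (interface{}, error) {
		time.Sleep(200 * time.Millisecond)
		return nil, errors.New("task error") // not observed: the timer fires first
	}, 100*time.Millisecond)
	fmt.Println(err, errors.Is(err, errTimeout))
}
```

Under this assumption, whatever the slow task returns is discarded once the timer wins, which is why the callback sees the timeout error rather than the task's own error. Comparing against a sentinel with errors.Is is also less brittle than matching the message string, if the library exposes one.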

At this point TestGoPoolWithTimeout() actually passes. But once I added -race to the go test command, things got interesting:

WARNING: DATA RACE

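A bug written by GPT-4 is a bug for GPT-4 to fix!

So, in all earnestness (that is, in one big dump), I sent the error output to DevChat:

This test case passes, but running it under the race detector reports an error: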

go test -race ./...
==================
WARNING: DATA RACE
Write at 0x00c00001a14f by goroutine 641:
  github.com/devchat-ai/gopool.TestGoPoolWithTimeout.func1()
      /Users/danielhu/go/mycode/gopool/gopool_test.go:160 +0xf4
  github.com/devchat-ai/gopool.(*worker).handleResult()
      /Users/danielhu/go/mycode/gopool/worker.go:92 +0x94
  github.com/devchat-ai/gopool.(*worker).start.func1()
      /Users/danielhu/go/mycode/gopool/worker.go:27 +0xa4

Previous read at 0x00c00001a14f by goroutine 541:
  github.com/devchat-ai/gopool.TestGoPoolWithTimeout()
      /Users/danielhu/go/mycode/gopool/gopool_test.go:171 +0x1a8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x188
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1629 +0x40

Goroutine 641 (running) created at:
  github.com/devchat-ai/gopool.(*worker).start()
      /Users/danielhu/go/mycode/gopool/worker.go:23 +0xdc
  github.com/devchat-ai/gopool.NewGoPool()
      /Users/danielhu/go/mycode/gopool/gopool.go:75 +0x490
  github.com/devchat-ai/gopool.TestGoPoolWithTimeout()
      /Users/danielhu/go/mycode/gopool/gopool_test.go:156 +0x138
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x188
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1629 +0x40

Goroutine 541 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1629 +0x5e4
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2036 +0x80
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x188
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2034 +0x700
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1906 +0x950
  main.main()
      _testmain.go:63 +0x300
==================
--- FAIL: TestGoPoolWithTimeout (0.11s)
    gopool_test.go:172: Expected task to run and timeout, but it did not run
    testing.go:1446: race detected during execution of test
FAIL
FAIL    github.com/devchat-ai/gopool    1.267s
FAIL

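This time GPT-4 did not say sorry. Outrageous. With that attitude, one star:

DevChat's complete answer this time:

This data race is caused by multiple goroutines reading and writing the taskRun variable at the same time. In Go, when several goroutines read and write the same variable concurrently, you need some form of synchronization to keep the data consistent.

In this test case we can use the atomic operations from the sync/atomic package to solve the problem. Here is the corrected test function: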

import "sync/atomic"

func TestGoPoolWithTimeout(t *testing.T) {
    var taskRun int32

    pool := NewGoPool(100, WithTimeout(100*time.Millisecond), WithErrorCallback(func(err error) {
        if err.Error() != "task timed out" {
            t.Errorf("Expected error 'task timed out', but got %v", err)
        }
        atomic.StoreInt32(&taskRun, 1)
    }))
    defer pool.Release()

    pool.AddTask(func() (interface{}, error) {
        time.Sleep(200 * time.Millisecond)
        return nil, nil
    })

    pool.Wait()

    if atomic.LoadInt32(&taskRun) == 0 {
        t.Errorf("Expected task to run and timeout, but it did not run")
    }
}

In this corrected version, we use StoreInt32 and LoadInt32 from the sync/atomic package to update and read taskRun safely. That way, even when several goroutines operate on the variable at once, there is no data race.
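The same idea can be tried without GoPool. The sketch below assumes Go 1.19 or later, where sync/atomic provides the atomic.Bool type, which reads a little more naturally than an int32 used as a flag. waitForFlag is a hypothetical helper: a goroutine stands in for the error callback, and the caller polls the flag much as the test reads taskRun.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitForFlag starts a goroutine that sets a flag (a stand-in for the error
// callback) and polls the flag from the calling goroutine until it is set
// or the timeout expires.
func waitForFlag(timeout time.Duration) bool {
	var taskRun atomic.Bool // zero value is false

	go func() {
		time.Sleep(10 * time.Millisecond) // pretend the task is timing out
		taskRun.Store(true)
	}()

	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if taskRun.Load() { // atomic read: no data race with Store
			return true
		}
		time.Sleep(time.Millisecond)
	}
	return taskRun.Load()
}

func main() {
	fmt.Println(waitForFlag(time.Second))
}
```

Running this with go run -race should stay quiet, whereas swapping atomic.Bool for a plain bool would give the detector the same unsynchronized write and read it reported above.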

Fine, at least the DATA RACE is gone.

3.3 Fixing the Logic Hole in Wait()

The DATA RACE is fixed, but TestGoPoolWithTimeout() now produces:

$ go test -race ./...
--- FAIL: TestGoPoolWithTimeout (0.11s)
    gopool_test.go:172: Expected task to run and timeout, but it did not run
FAIL
FAIL    github.com/devchat-ai/gopool    1.203s
FAIL

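Now it is getting brain-melting. Let's throw this log at GPT-4 as well.

DevChat's answer this time:

It looks like the test case failed because the expected task did not run. This may be because pool.Wait() returned before the task timed out. The implementation of pool.Wait() simply waits for the task queue to become empty, but that does not guarantee that every task has finished executing, especially when a task times out.

To solve this, we need to add logic to pool.Wait() that ensures all tasks have finished executing. One possible solution is a sync.WaitGroup that tracks the number of running tasks. When a task starts executing we call WaitGroup.Add(1), and when it finishes we call WaitGroup.Done(). pool.Wait() can then call WaitGroup.Wait() to wait for all tasks to complete.

This change touches several functions, so I suggest you make it according to the specifics of your codebase. If you need more specific help, provide more code context and I will be happy to help.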
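For reference, the WaitGroup idea can be sketched on a toy pool. miniPool, newMiniPool, and runTasks are hypothetical names and not GoPool's code; the sketch also calls Add(1) when a task is queued rather than when it starts, so that Wait() cannot miss a task that is still sitting in the queue.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// miniPool is a hypothetical stand-in for GoPool, reduced to what Wait() needs.
type miniPool struct {
	tasks chan func()
	wg    sync.WaitGroup // counts tasks that were added but have not finished
}

func newMiniPool(workers int) *miniPool {
	p := &miniPool{tasks: make(chan func(), 16)}
	for i := 0; i < workers; i++ {
		go func() {
			for task := range p.tasks {
				task()
				p.wg.Done() // the task has finished, not merely been dequeued
			}
		}()
	}
	return p
}

// AddTask registers the task before queuing it, so Wait cannot miss it.
func (p *miniPool) AddTask(task func()) {
	p.wg.Add(1)
	p.tasks <- task
}

// Wait blocks until every added task has finished running.
func (p *miniPool) Wait() { p.wg.Wait() }

// runTasks adds n short tasks and reports how many had finished when Wait returned.
func runTasks(n int) int32 {
	p := newMiniPool(4)
	var finished int32
	for i := 0; i < n; i++ {
		p.AddTask(func() {
			time.Sleep(5 * time.Millisecond)
			atomic.AddInt32(&finished, 1)
		})
	}
	p.Wait()
	return atomic.LoadInt32(&finished)
}

func main() {
	fmt.Println(runTasks(10))
}
```

With this shape Wait() needs no polling and no sleep interval to tune; the cost is that every code path that finishes a task, including a timeout path, has to call Done() exactly once.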

GPT-4 says the problem lies in pool.Wait(), but its proposed solution gave me pause. The current Wait() method is:

// Wait waits for all tasks to be dispatched.
func (p *goPool) Wait() {
	for len(p.taskQueue) > 0 {
		time.Sleep(100 * time.Millisecond)
	}
}

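The logic here waits for taskQueue to become empty, that is, for every task to be dispatched to a worker that will execute it. GPT-4's analysis is right: the "wait for all tasks to finish executing" logic is indeed missing.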
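The gap between "dispatched" and "finished" is easy to show in isolation. The following sketch uses a plain buffered channel as the queue; queueEmptyWhileRunning is a hypothetical helper, and the channels started, release, and finished exist only to make the timing deterministic.

```go
package main

import "fmt"

// queueEmptyWhileRunning hands one task to a worker and then inspects the
// queue while that task is still blocked in the middle of its work.
// It returns the queue length and whether the task was still running.
func queueEmptyWhileRunning() (int, bool) {
	queue := make(chan func(), 1)
	started := make(chan struct{})  // closed when the task begins
	release := make(chan struct{})  // closed to let the task finish
	finished := make(chan struct{}) // closed when the task has returned

	go func() { // the worker
		task := <-queue
		task()
		close(finished)
	}()

	queue <- func() {
		close(started)
		<-release // simulate a long-running task
	}

	<-started // the task has been dequeued and is now running
	queued := len(queue)

	stillRunning := true
	select {
	case <-finished:
		stillRunning = false
	default:
	}

	close(release)
	<-finished
	return queued, stillRunning
}

func main() {
	queued, running := queueEmptyWhileRunning()
	fmt.Println(queued, running)
}
```

The queue length is 0 while the task is still running, which is exactly the state in which a Wait() that only checks len(p.taskQueue) returns too early.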

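I said to DevChat:

Executing a task involves a popWorker() call, and completing a task involves a pushWorker() call. So if len(goPool.workerStack) == len(goPool.workers), doesn't that also mean all tasks are done?

DevChat answered:

Your understanding is correct. In this particular goPool implementation, a worker is popped from workerStack whenever a task starts executing, and pushed back onto workerStack when the task completes. So if len(goPool.workerStack) == len(goPool.workers), it does mean that all tasks have completed.

You can therefore modify Wait() so that it waits until every worker has returned to workerStack. Here is the modified Wait() function: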

// Wait waits for all tasks to be dispatched and completed.
func (p *goPool) Wait() {
    for len(p.taskQueue) > 0 || len(p.workerStack) != len(p.workers) {
        time.Sleep(100 * time.Millisecond)
    }
}

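In this modified version, Wait() waits until the task queue is empty and every worker has returned to workerStack. So when Wait() returns, we can be sure all tasks have completed.

After making this change I ran the Race Detector again, with this result: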

$ go test -race -run TestGoPoolWithTimeout
==================
WARNING: DATA RACE
Read at 0x00c000146298 by goroutine 6:
  github.com/devchat-ai/gopool.(*goPool).Wait()
      /Users/danielhu/go/mycode/gopool/gopool.go:89 +0x7c
  github.com/devchat-ai/gopool.TestGoPoolWithTimeout()
      /Users/danielhu/go/mycode/gopool/gopool_test.go:169 +0x1a0
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x188
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1629 +0x40

Previous write at 0x00c000146298 by goroutine 118:
  github.com/devchat-ai/gopool.(*goPool).popWorker()
      /Users/danielhu/go/mycode/gopool/gopool.go:113 +0xec
  github.com/devchat-ai/gopool.(*goPool).dispatch()
      /Users/danielhu/go/mycode/gopool/gopool.go:164 +0x10c
  github.com/devchat-ai/gopool.NewGoPool.func2()
      /Users/danielhu/go/mycode/gopool/gopool.go:78 +0x34

Goroutine 6 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1629 +0x5e4
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2036 +0x80
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x188
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2034 +0x700
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1906 +0x950
  main.main()
      _testmain.go:63 +0x300

Goroutine 118 (running) created at:
  github.com/devchat-ai/gopool.NewGoPool()
      /Users/danielhu/go/mycode/gopool/gopool.go:78 +0x660
  github.com/devchat-ai/gopool.TestGoPoolWithTimeout()
      /Users/danielhu/go/mycode/gopool/gopool_test.go:156 +0x138
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x188
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1629 +0x40
==================
--- FAIL: TestGoPoolWithTimeout (0.21s)
    testing.go:1446: race detected during execution of test
FAIL
exit status 1
FAIL    github.com/devchat-ai/gopool    0.519s

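Ouch. Am I locked in a feud with DATA RACE today???

I pasted this log straight to GPT-4, and GPT-4 sighed: it looks like we have run into a data race again.

DevChat's complete answer here:

It looks like we have run into a data race again. This time Wait() and popWorker() read and write workerStack at the same time. To solve this, we need a mutex to protect the data whenever workerStack is accessed.

Here are the modified Wait() and popWorker() functions: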

import "sync"

type goPool struct {
    // other fields...
    workerStackMutex sync.Mutex
}

// Wait waits for all tasks to be dispatched and completed.
func (p *goPool) Wait() {
    for {
        p.workerStackMutex.Lock()
        workerStackLen := len(p.workerStack)
        p.workerStackMutex.Unlock()

        if len(p.taskQueue) == 0 && workerStackLen == len(p.workers) {
            break
        }

        time.Sleep(100 * time.Millisecond)
    }
}

// popWorker pops a worker from the worker stack.
func (p *goPool) popWorker() *worker {
    p.workerStackMutex.Lock()
    defer p.workerStackMutex.Unlock()

    if len(p.workerStack) == 0 {
        return nil
    }

    w := p.workerStack[len(p.workerStack)-1]
    p.workerStack = p.workerStack[:len(p.workerStack)-1]
    return w
}

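In this modified version, we add a workerStackMutex field to the goPool struct to provide mutual exclusion when workerStack is accessed. Both Wait() and popWorker() use this mutex to protect reads and writes of workerStack, which avoids the data race.

This change looks like it would work, but it is not very elegant. Once again it shows that "GPT cannot do work that lies beyond your own understanding". The code is good overall but flawed; if you can read it and tweak it, GPT makes you more productive. Otherwise, chatting with GPT is a waste of breath.

I then said to DevChat:

But goPool already has a lock field. Can't we just use that?

This time GPT-4 did say "sorry" to me. I am gratified: she is minding her manners again.

DevChat answered:

Sorry, I did not notice that the goPool struct already has a lock field. You can absolutely use this existing mutex to protect access to workerStack, with no need to add a new one.

Here are the modified Wait() and popWorker() functions, using the lock field to protect access to workerStack: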

// Wait waits for all tasks to be dispatched and completed.
func (p *goPool) Wait() {
    for {
        p.lock.Lock()
        workerStackLen := len(p.workerStack)
        p.lock.Unlock()

        if len(p.taskQueue) == 0 && workerStackLen == len(p.workers) {
            break
        }

        time.Sleep(100 * time.Millisecond)
    }
}

// popWorker pops a worker from the worker stack.
func (p *goPool) popWorker() *worker {
    p.lock.Lock()
    defer p.lock.Unlock()

    if len(p.workerStack) == 0 {
        return nil
    }

    w := p.workerStack[len(p.workerStack)-1]
    p.workerStack = p.workerStack[:len(p.workerStack)-1]
    return w
}

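In this modified version, we use the lock field to protect reads and writes of workerStack, which avoids the data race.

This answer carries a bit of a "hangover": the earlier reply had modified popWorker(), so this one includes a modification too. In fact the method needs no change. My original code is: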

func (p *goPool) popWorker() int {
	p.lock.Lock()
	workerIndex := p.workerStack[len(p.workerStack)-1]
	p.workerStack = p.workerStack[:len(p.workerStack)-1]
	p.lock.Unlock()
	return workerIndex
}

The len(p.workerStack) == 0 check is redundant, because dispatch(), the method that calls popWorker(), looks like this:

// dispatch dispatches tasks to workers.
func (p *goPool) dispatch() {
	for t := range p.taskQueue {
		p.cond.L.Lock()
		for len(p.workerStack) == 0 {
			p.cond.Wait()
		}
		p.cond.L.Unlock()
		workerIndex := p.popWorker()
		p.workers[workerIndex].taskQueue <- t
	}
}

Here the len(p.workerStack) == 0 loop guarantees that len(p.workerStack) != 0 by the time popWorker() is called.
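The wait-until-non-empty pattern in dispatch() is the standard sync.Cond idiom. Here is a self-contained sketch of it; idleWorkers, push, and popBlocking are hypothetical names. Unlike the dispatch() shown above, this version holds the lock across both the emptiness check and the pop, which is one way to keep the guarantee even if there were more than one caller.

```go
package main

import (
	"fmt"
	"sync"
)

// idleWorkers is a hypothetical stand-in for the pool's worker stack.
type idleWorkers struct {
	mu    sync.Mutex
	cond  *sync.Cond
	stack []int
}

func newIdleWorkers() *idleWorkers {
	w := &idleWorkers{}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// push returns a worker index to the stack and wakes one waiting popper.
func (w *idleWorkers) push(i int) {
	w.mu.Lock()
	w.stack = append(w.stack, i)
	w.mu.Unlock()
	w.cond.Signal()
}

// popBlocking waits until the stack is non-empty, then pops under the same lock.
func (w *idleWorkers) popBlocking() int {
	w.mu.Lock()
	defer w.mu.Unlock()
	for len(w.stack) == 0 {
		w.cond.Wait() // releases mu while waiting, reacquires it before returning
	}
	i := w.stack[len(w.stack)-1]
	w.stack = w.stack[:len(w.stack)-1]
	return i
}

func main() {
	w := newIdleWorkers()
	go w.push(7) // a worker becomes idle on another goroutine
	fmt.Println(w.popBlocking())
}
```

The check sits in a for loop rather than an if because a goroutine woken from Wait() has to re-test the condition before it relies on it.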

So in the end I adopted the Wait() method GPT-4 provided:

func (p *goPool) Wait() {
    for {
        p.lock.Lock()
        workerStackLen := len(p.workerStack)
        p.lock.Unlock()

        if len(p.taskQueue) == 0 && workerStackLen == len(p.workers) {
            break
        }

        time.Sleep(100 * time.Millisecond)
    }
}
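The essential point of this Wait() is that the length read takes the same lock as the code that pushes and pops. A minimal sketch of that rule, outside of GoPool, is shown here; idleStack and churn are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// idleStack is a hypothetical stand-in for the pool's workerStack plus its lock.
type idleStack struct {
	lock  sync.Mutex
	items []int
}

func (s *idleStack) push(i int) {
	s.lock.Lock()
	s.items = append(s.items, i)
	s.lock.Unlock()
}

// pop removes the top item; ok is false when the stack is empty.
func (s *idleStack) pop() (item int, ok bool) {
	s.lock.Lock()
	defer s.lock.Unlock()
	if len(s.items) == 0 {
		return 0, false
	}
	item = s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return item, true
}

// size reads the length under the same lock the writers use.
func (s *idleStack) size() int {
	s.lock.Lock()
	defer s.lock.Unlock()
	return len(s.items)
}

// churn lets n goroutines pop and push concurrently while the caller polls
// size(), then returns the final size once every goroutine is done.
func churn(n, rounds int) int {
	s := &idleStack{}
	for i := 0; i < n; i++ {
		s.push(i)
	}
	var wg sync.WaitGroup
	for g := 0; g < n; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := 0; r < rounds; r++ {
				if item, ok := s.pop(); ok {
					s.push(item)
				}
			}
		}()
	}
	for i := 0; i < rounds; i++ {
		_ = s.size() // concurrent reader, like Wait() polling the stack
	}
	wg.Wait()
	return s.size()
}

func main() {
	fmt.Println(churn(8, 1000))
}
```

One caveat, judging only from the snippets shown in this article: the lock makes the length read race-free, but the two conditions in Wait() are still checked separately. Between dispatch() receiving a task from taskQueue and calling popWorker(), the queue can be empty while every worker is still on the stack, so a poll that lands in that window could return early. A count of in-flight tasks, such as the WaitGroup approach DevChat suggested earlier, would close that window.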

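Finally, I opened this PR: Improve task execution and testing in goPool

4. Summary

As I keep saying: in most cases GPT cannot solve problems that lie beyond your own understanding.

Unless the code is fairly simple, if you cannot understand what GPT wrote, the code is most likely unusable too. Out of 100 lines GPT gives you, one may well be wrong. If you can find that line and fix it, you have saved yourself the time of writing the other 99. If you cannot, those 100 lines that don't run are worthless to you.

In short, GPT can save you a lot of work, but it cannot do all the work for you.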