"Go Interview Questions" 17 - Goroutine Leak Detection in Practice: How to Check Your Code for Goroutine Leaks

Introduction

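"Memory balloons after the service has been up for a while", "CPU usage climbs for no obvious reason", "requests keep getting slower"... Do these nightmares sound familiar from running Go services? Very often the culprit is goroutine leaks that pile up silently. For any experienced Go developer, knowing how to detect and prevent goroutine leaks is an essential skill.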

This article takes a hands-on approach: through complete code examples, you will learn how to detect, locate, and fix goroutine leaks.

What Is a Goroutine Leak?

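A goroutine leak happens when a program starts goroutines that can never exit, so the resources they hold are never reclaimed: each one keeps its stack (which starts at 2 KB and can grow) plus everything it references, which the garbage collector cannot free while the goroutine is alive. In a long-running service, even if each leaked goroutine costs very little, they keep piling up and eventually cause serious problems.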

Let's start with a typical leak:

package main

import (
    "fmt"
    "net/http"
    "time"
)

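// Leaky version: the worker goroutine can never exit
func leakyHandler(w http.ResponseWriter, r *http.Request) {
    result := make(chan string) // unbuffered: a send blocks until someone receives

    go func() {
        // Simulate slow work
        time.Sleep(10 * time.Second)
        // Once the handler has timed out nobody receives, so this send blocks forever
        result <- "Work done"
    }()

    select {
    case res := <-result:
        w.Write([]byte(res))
    case <-time.After(1 * time.Second):
        // The handler returns, and the goroutine above is stuck for good: a leak
        w.Write([]byte("Request timed out"))
    }
}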

func main() {
    http.HandleFunc("/leak", leakyHandler)
    fmt.Println("Server started at :8080")
    http.ListenAndServe(":8080", nil)
}

Method 1: Real-Time Monitoring with the runtime Package

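The runtime package reports how many goroutines currently exist through runtime.NumGoroutine(). The program below prints that number every 2 seconds while serving a handler that does not leak: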

package main

import (
    "fmt"
    "net/http"
    "runtime"
    "time"
)

// Prints the current goroutine count every 2 seconds
func monitorGoroutines() {
    for {
        time.Sleep(2 * time.Second)
        num := runtime.NumGoroutine()
        fmt.Printf("Current goroutines: %d\n", num)
    }
}

// Leak-free version: the worker can always finish
func properHandler(w http.ResponseWriter, r *http.Request) {
    done := make(chan struct{})
    
    go func() {
        // close never blocks, so the worker exits even if the handler has already timed out
        defer close(done)
        time.Sleep(2 * time.Second)
        fmt.Println("Work done properly")
    }()
    
    select {
    case <-done:
    case <-time.After(3 * time.Second):
        fmt.Println("Work timeout")
    }
    
    w.Write([]byte("Request processed properly"))
}

func main() {
    go monitorGoroutines()
    
    http.HandleFunc("/proper", properHandler)
    fmt.Println("Server started at :8080")
    http.ListenAndServe(":8080", nil)
}

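Run the program and hit /proper several times: the count rises while requests are in flight and settles back to roughly the same baseline once they finish, because every worker goroutine exits after about 2 seconds.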

Method 2: In-Depth Analysis with pprof

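pprof is Go's built-in profiling tool. Its goroutine profile lists the stack trace and state of every goroutine, which makes it the most direct way to see where goroutines are stuck: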

package main

import (
    "fmt"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
    "time"
)

func createLeak() {
    go func() {
        ch := make(chan struct{})
        <-ch // blocks forever: nothing ever sends on ch, so this goroutine leaks
    }()
}

func main() {
    // Create one leaked goroutine every second
    go func() {
        for {
            createLeak()
            time.Sleep(time.Second)
        }
    }()
    
    fmt.Println("Server started at :6060")
    fmt.Println("Access goroutine info: http://localhost:6060/debug/pprof/goroutine?debug=2")
    http.ListenAndServe(":6060", nil)
}

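Once it is running, open http://localhost:6060/debug/pprof/goroutine?debug=2. It prints the full stack of every goroutine together with its state; the leaked ones show up as an ever-growing number of goroutines in the [chan receive] state whose stacks point into createLeak, which leads you straight to the source of the leak.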

Method 3: Leak Detection in Tests

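Leak checks can also be built into your tests. The idea is to compare the goroutine count before and after running the code under test (put this in a file whose name ends in _test.go and run it with go test):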

package main

import (
    "runtime"
    "testing"
    "time"
)

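func TestGoroutineLeak(t *testing.T) {
    // Record the goroutine count before the test runs
    initialGoroutines := runtime.NumGoroutine()
    
    // Run the operation that may leak
    createLeak()
    
    // Give goroutines that are supposed to finish a moment to exit
    time.Sleep(100 * time.Millisecond)
    
    // Compare goroutine counts
    finalGoroutines := runtime.NumGoroutine()
    
    // With the createLeak below, this check fails on purpose: the leak is detected
    if finalGoroutines > initialGoroutines {
        t.Errorf("Possible goroutine leak: initial %d, final %d", 
            initialGoroutines, finalGoroutines)
    }
}

// The code under test; in a real project it lives in the non-test file
func createLeak() {
    go func() {
        select {} // blocks forever
    }()
}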

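A Complete Hands-On Example: Fixing a Real Leak Scenario

A background worker written as a bare for loop with no exit condition runs until the process dies. The version below ties every worker to a context and tracks them with a sync.WaitGroup, so Stop() can cancel them and then wait until all of them have returned: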

package main

import (
    "context"
    "fmt"
    "net/http"
    "runtime"
    "sync"
    "time"
)

type WorkerManager struct {
    wg     sync.WaitGroup
    cancel context.CancelFunc
}

func NewWorkerManager() *WorkerManager {
    return &WorkerManager{}
}

// Correct implementation: a context controls the workers' lifetime
func (wm *WorkerManager) StartWorkers(num int) {
    ctx, cancel := context.WithCancel(context.Background())
    wm.cancel = cancel
    
    for i := 0; i < num; i++ {
        wm.wg.Add(1)
        go wm.worker(ctx, i)
    }
}

func (wm *WorkerManager) worker(ctx context.Context, id int) {
    defer wm.wg.Done()
    
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()
    
    for {
        select {
        case <-ticker.C:
            fmt.Printf("Worker %d working...\n", id)
        case <-ctx.Done():
            fmt.Printf("Worker %d shutting down...\n", id)
            return
        }
    }
}

func (wm *WorkerManager) Stop() {
    if wm.cancel != nil {
        wm.cancel()
    }
    wm.wg.Wait()
    fmt.Println("All workers stopped")
}

func main() {
    // Start the monitor
    go func() {
        for {
            time.Sleep(2 * time.Second)
            fmt.Printf("Current goroutines: %d\n", runtime.NumGoroutine())
        }
    }()
    
    manager := NewWorkerManager()
    manager.StartWorkers(3)
    
    // Let the workers run for a while
    time.Sleep(5 * time.Second)
    
    // Graceful shutdown: cancel the context and wait for every worker
    manager.Stop()
    
    // Check the final state: only main and the monitor goroutine should remain
    time.Sleep(1 * time.Second)
    fmt.Printf("Final goroutines: %d\n", runtime.NumGoroutine())
}

Best Practices for Preventing Goroutine Leaks

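Always use context: give every goroutine that may run for a long time a way to exit, typically by selecting on ctx.Done()
Use sync.WaitGroup: wait for every goroutine you start, so shutdown does not return while workers are still running
Set timeouts: put a reasonable time limit on every blocking operation
Monitor continuously: track the goroutine count in production
Review code: pay special attention to goroutine lifecycles during code review; for every go statement, ask how and when that goroutine exits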

Hands-On Troubleshooting Steps

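When you suspect a goroutine leak, work through the following steps (the URLs assume net/http/pprof is served on port 6060, as in Method 2):

Monitor in real time: watch how runtime.NumGoroutine() changes; a count that keeps climbing under steady load is the first warning sign
Get stack traces: open http://localhost:6060/debug/pprof/goroutine?debug=2 to see the full stack of every goroutine and its state, such as [chan receive] or [select]
Find the hot spot: open http://localhost:6060/debug/pprof/goroutine?debug=1, which groups goroutines with identical stacks and prefixes each group with a count; the group whose count keeps growing is usually the leak
Load test: drive the service with wrk or ab and check whether the goroutine count keeps growing and fails to drop back after the load stops
Narrow it down: locate the leak by commenting out code paths or bisecting until the growth disappears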

Questions for Discussion

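Have you ever run into a goroutine leak in your own projects? How did you find and fix it? Share your experience and lessons learned in the comments! And remember: prevention beats cure. Good concurrency habits and regular code reviews are the best way to avoid goroutine leaks.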

