Go Memory Model: Understanding Happens-Before and Data Races

Introduction

The Go memory model defines when a write to a variable in one goroutine is guaranteed to be visible to a read in another goroutine. Without understanding happens-before relationships, you can write code that appears correct but has data races that manifest only under load, on specific hardware, or after compiler optimization.

The Problem: Visibility Without Synchronization

package main

import "fmt"

var ready bool
var data int

func setup() {
    data = 42      // write to data
    ready = true   // write to ready
}

func reader() {
    for !ready {}  // spin until ready
    fmt.Println(data) // might print 0, not 42!
}

// This is WRONG. The Go memory model does NOT guarantee
// that the write to data is visible to another goroutine
// just because it observes ready == true. The compiler and
// CPU may reorder the writes, and the spin loop may never
// observe ready == true at all.

Happens-Before in Go

// A happens-before B means: all writes visible to A are visible at B

// Guarantees in Go:
// 1. Within a goroutine: sequential (program order)
// 2. Goroutine creation: everything before go f() happens-before f() starts
// 3. Channel operations: send happens-before receive completes
// 4. sync primitives: Unlock happens-before subsequent Lock
// 5. sync.Once: the single call to f() happens-before any Do(f) returns

// CORRECT: using a channel to establish happens-before
func correct() {
    ch := make(chan struct{})
    data := 0

    go func() {
        data = 42
        ch <- struct{}{} // send happens-before receive completes
    }()

    <-ch // receive establishes happens-before
    fmt.Println(data) // safe: always prints 42
}
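The guarantee list above also covers sync primitives, and sync.WaitGroup gives the same edge as the channel version: each call to Done happens-before the Wait it unblocks returns. A minimal sketch (the function name `correctWG` is my own):

```go
package main

import (
	"fmt"
	"sync"
)

// correctWG establishes the happens-before edge with sync.WaitGroup:
// each Done happens-before the Wait call it unblocks returns.
func correctWG() int {
	var wg sync.WaitGroup
	data := 0

	wg.Add(1)
	go func() {
		defer wg.Done()
		data = 42 // write, then Done
	}()

	wg.Wait()   // returns happens-after Done
	return data // safe: always 42
}

func main() {
	fmt.Println(correctWG())
}
```

Channels express transfer of data; WaitGroup expresses "all of these finished", which is usually the clearer choice when no value needs to flow back.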

Data Races

import "sync"

// RACE: concurrent write and read without synchronization
var counter int

func increment() {
    counter++ // read-modify-write: NOT atomic
}

func main() {
    for i := 0; i < 1000; i++ {
        go increment()
    }
    // counter value is unpredictable: data race, and main may
    // even exit before the goroutines finish
}

// Detect races with -race flag:
// go run -race main.go
// go test -race ./...

// CORRECT: use atomic operations
import "sync/atomic"

var counter int64

func incrementAtomic() {
    atomic.AddInt64(&counter, 1)
}

// CORRECT: use mutex
var (
    mu      sync.Mutex
    counter int
)

func incrementMutex() {
    mu.Lock()
    counter++
    mu.Unlock()
}
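Putting the atomic version together with a WaitGroup makes the result deterministic end to end: Wait happens-after every Done, so the final read observes every increment. A runnable sketch (function name `countAtomic` is my own):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomic runs n atomic increments in separate goroutines and
// waits for all of them; Wait happens-after every Done, so the
// final read sees every AddInt64.
func countAtomic(n int) int64 {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&counter, 1)
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countAtomic(1000)) // deterministically 1000
}
```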

sync/atomic: Low-Level Primitives

import (
    "fmt"
    "sync/atomic"
)

// Atomic operations establish happens-before:
// atomic store in goroutine A happens-before atomic load
// of the same variable that observes the stored value in goroutine B

type AtomicBool struct {
    v int32
}

func (b *AtomicBool) Store(val bool) {
    if val {
        atomic.StoreInt32(&b.v, 1)
    } else {
        atomic.StoreInt32(&b.v, 0)
    }
}

func (b *AtomicBool) Load() bool {
    return atomic.LoadInt32(&b.v) != 0
}

// Correct flag-based signaling (atomic.Bool, Go 1.19+):
var ready atomic.Bool
var data int

func producer() {
    data = 42
    ready.Store(true) // store establishes happens-before
}

func consumer() {
    for !ready.Load() {}
    fmt.Println(data) // safe: happens-after ready.Store(true)
}
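Wired together, the producer/consumer pair is deterministic: the Store(true) happens-before the Load that observes true, so the write to data is guaranteed visible. A self-contained demo (function name `produceConsume` is my own; requires Go 1.19+ for atomic.Bool):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// produceConsume: the atomic Store(true) happens-before the Load
// that observes true, which makes the plain write to data visible.
func produceConsume() int {
	var ready atomic.Bool
	data := 0

	go func() {
		data = 42
		ready.Store(true) // publish after the write
	}()

	for !ready.Load() { // spin: fine for a demo, wasteful in real code
	}
	return data
}

func main() {
	fmt.Println(produceConsume()) // 42
}
```

In real code, prefer a channel or WaitGroup over a spin loop; the loop here just keeps the example focused on the atomic happens-before edge.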

sync.Mutex Memory Guarantees

// Unlock happens-before any subsequent Lock on the same mutex
// This provides the sequencing needed for safe shared state

type SafeCache struct {
    mu    sync.RWMutex
    items map[string]string
}

// NewSafeCache initializes the map; Set on a zero-value SafeCache
// would panic with "assignment to entry in nil map".
func NewSafeCache() *SafeCache {
    return &SafeCache{items: make(map[string]string)}
}

func (c *SafeCache) Get(key string) (string, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    v, ok := c.items[key]
    return v, ok
}

func (c *SafeCache) Set(key, value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.items[key] = value
}

// Pattern: copy-on-write for read-heavy workloads
type COWConfig struct {
    ptr atomic.Pointer[map[string]string]
}

func (c *COWConfig) Get(key string) string {
    cfg := c.ptr.Load()
    if cfg == nil {
        return ""
    }
    return (*cfg)[key]
}

func (c *COWConfig) Set(key, value string) {
    for {
        old := c.ptr.Load()
        var newMap map[string]string
        if old != nil {
            newMap = make(map[string]string, len(*old)+1)
            for k, v := range *old {
                newMap[k] = v
            }
        } else {
            newMap = make(map[string]string)
        }
        newMap[key] = value
        if c.ptr.CompareAndSwap(old, &newMap) {
            return
        }
        // CAS failed: another writer modified it, retry
    }
}
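The same snapshot idea works for any immutable value, not just maps: readers Load a pointer to a snapshot, a writer builds a fresh one and publishes it with Store. A stripped-down sketch (the `config` type and `publish` function are my own; requires Go 1.19+ for atomic.Pointer):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type config struct {
	timeoutSec int
}

// publish demonstrates the snapshot pattern: readers Load an
// immutable snapshot; a writer builds a new one and Stores it.
func publish() int {
	var current atomic.Pointer[config]
	current.Store(&config{timeoutSec: 30})

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// never mutate the old snapshot; replace it wholesale
		current.Store(&config{timeoutSec: 60})
	}()
	wg.Wait()

	return current.Load().timeoutSec
}

func main() {
	fmt.Println(publish()) // 60 once the writer has finished
}
```

The invariant that makes this safe is that a snapshot is never mutated after it is published; writers only swap in fresh copies.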

Channel Synchronization Patterns

// Unbuffered channel: send blocks until receiver is ready
// The receive completing happens-after the send

// Pattern: worker pool with results
func workerPool(jobs []Job, numWorkers int) []Result {
    jobCh := make(chan Job, len(jobs))
    resultCh := make(chan Result, len(jobs))

    // Start workers
    for i := 0; i < numWorkers; i++ {
        go func() {
            for job := range jobCh {
                resultCh <- process(job)
            }
        }()
    }

    // Send jobs
    for _, job := range jobs {
        jobCh <- job
    }
    close(jobCh)

    // Collect results
    results := make([]Result, 0, len(jobs))
    for range jobs {
        results = append(results, <-resultCh)
    }
    return results
}

// Pattern: fan-out, fan-in
func fanOut(input <-chan int, workers int) []<-chan int {
    outputs := make([]<-chan int, workers)
    for i := range outputs {
        ch := make(chan int)
        outputs[i] = ch
        go func() {
            for v := range input {
                ch <- v * v
            }
            close(ch)
        }()
    }
    return outputs
}

func merge(channels ...<-chan int) <-chan int {
    out := make(chan int)
    var wg sync.WaitGroup
    for _, ch := range channels {
        wg.Add(1)
        go func(c <-chan int) {
            defer wg.Done()
            for v := range c {
                out <- v
            }
        }(ch)
    }
    go func() {
        wg.Wait()
        close(out)
    }()
    return out
}
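The fan-out/fan-in pieces above can be collapsed into one compact, runnable pipeline. This sketch (names `squareSum`, `in`, `out` are my own) fans numbers out to a pool of workers, squares them, fans results back in, and sums them; the order of individual results is nondeterministic, but the sum is not:

```go
package main

import (
	"fmt"
	"sync"
)

// squareSum fans nums out to `workers` goroutines, squares each
// value, fans the results back in over one channel, and sums them.
func squareSum(nums []int, workers int) int {
	in := make(chan int)
	out := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range in {
				out <- v * v
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out) // close happens-after every worker's last send
	}()

	go func() {
		for _, v := range nums {
			in <- v
		}
		close(in) // lets the workers' range loops terminate
	}()

	sum := 0
	for v := range out {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(squareSum([]int{1, 2, 3, 4}, 3)) // 1+4+9+16 = 30
}
```

Note that only the goroutine that knows all sends are done may close a channel; here the WaitGroup provides exactly that knowledge for `out`.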

sync.Once

// sync.Once guarantees: f() executes exactly once across goroutines,
// and the completion of f() happens-before any call to Do(f) returns

type Singleton struct {
    once     sync.Once
    instance *ExpensiveResource
}

func (s *Singleton) Get() *ExpensiveResource {
    s.once.Do(func() {
        s.instance = initExpensiveResource()
    })
    return s.instance
}

// Safe for concurrent use: first caller initializes, others wait and then
// observe the fully initialized instance
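Since Go 1.21, sync.OnceValue packages this pattern without an explicit struct: the wrapped function runs once and its result is cached for every caller, with the same happens-before guarantee. A short sketch (the variable name `loadConfig` and its return value are my own):

```go
package main

import (
	"fmt"
	"sync"
)

// loadConfig runs the wrapped function at most once; every call
// returns the cached result.
var loadConfig = sync.OnceValue(func() string {
	// imagine an expensive load here
	return "config-v1"
})

func main() {
	fmt.Println(loadConfig()) // first call runs the function
	fmt.Println(loadConfig()) // later calls return the cached value
}
```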

Race Detector in CI

# .github/workflows/test.yml
- name: Test with race detector
  run: go test -race -count=1 ./...

# The race detector:
# - ~5-20x slower, ~5-10x more memory
# - Detects races at runtime (not all code paths exercised)
# - Essential in CI to catch races before they reach production
# - Use -count=1 to prevent test caching with race detector

Common Race Patterns to Avoid

// RACE before Go 1.22: closing over the loop variable
for _, item := range items {
    go func() {
        process(item) // item is shared; all goroutines may see the last value
    }()
}

// CORRECT: capture the loop variable (unnecessary since Go 1.22,
// which gives each iteration its own variable)
for _, item := range items {
    item := item // new variable per iteration
    go func() {
        process(item)
    }()
}

// RACE: concurrent map read and write
cache := make(map[string]int)
go func() { cache["key"] = 1 }()
go func() { _ = cache["key"] }() // runtime may throw: concurrent map read and map write

// CORRECT: use sync.Map (or a mutex-guarded map) for concurrent access
var cache sync.Map
go func() { cache.Store("key", 1) }()
go func() { cache.Load("key") }()
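sync.Map is tuned for keys that are written once and read many times, or for disjoint key sets per goroutine; for mixed or write-heavy workloads, an ordinary map behind a mutex is usually simpler and at least as fast. A sketch of that alternative (the `counterMap` type and `hammer` helper are my own):

```go
package main

import (
	"fmt"
	"sync"
)

// counterMap guards an ordinary map with a mutex: the Unlock in
// one method happens-before the next Lock, so every goroutine
// sees a consistent map.
type counterMap struct {
	mu sync.Mutex
	m  map[string]int
}

func (c *counterMap) inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

func (c *counterMap) get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

// hammer increments one key from n goroutines and returns the total.
func hammer(n int) int {
	c := &counterMap{m: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc("hits")
		}()
	}
	wg.Wait()
	return c.get("hits")
}

func main() {
	fmt.Println(hammer(100)) // always 100
}
```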

Conclusion

The Go memory model defines correctness, not just performance. Code that appears to work without synchronization may silently produce incorrect results under different hardware, compiler versions, or load. Always use -race in tests. Use channels for communication between goroutines and sync.Mutex/sync.RWMutex for protecting shared state. Use sync/atomic for simple flags and counters. Understand that only operations that establish a happens-before relationship guarantee visibility across goroutines.
