Mutex in Go

Go Mutex

1. Overview of Go's `sync.Mutex`

Go's sync.Mutex is implemented in a way that it avoids system calls whenever possible by trying to handle mutexes in user space. However, it can fall back to system-level operations like kernel threads or sleeping/waking up threads when necessary.

Here's how the code behaves at the runtime level when you call mutex.Lock().

2. Definition of `sync.Mutex`

In Go, the sync.Mutex is defined like this (simplified):

type Mutex struct {
    state int32
    sema  uint32
}

The state field is a combination of flags that indicate whether the mutex is locked or unlocked and whether other goroutines are waiting.
The sema field is used to coordinate sleeping/waking of goroutines.

const (
	mutexLocked = 1 << iota // mutex is locked
	mutexWoken
	mutexStarving
	mutexWaiterShift = iota

	// Mutex fairness.
	//
	// Mutex can be in 2 modes of operations: normal and starvation.
	// In normal mode waiters are queued in FIFO order, but a woken up waiter
	// does not own the mutex and competes with new arriving goroutines over
	// the ownership. New arriving goroutines have an advantage -- they are
	// already running on CPU and there can be lots of them, so a woken up
	// waiter has good chances of losing. In such case it is queued at front
	// of the wait queue. If a waiter fails to acquire the mutex for more than 1ms,
	// it switches mutex to the starvation mode.
	//
	// In starvation mode ownership of the mutex is directly handed off from
	// the unlocking goroutine to the waiter at the front of the queue.
	// New arriving goroutines don't try to acquire the mutex even if it appears
	// to be unlocked, and don't try to spin. Instead they queue themselves at
	// the tail of the wait queue.
	//
	// If a waiter receives ownership of the mutex and sees that either
	// (1) it is the last waiter in the queue, or (2) it waited for less than 1 ms,
	// it switches mutex back to normal operation mode.
	//
	// Normal mode has considerably better performance as a goroutine can acquire
	// a mutex several times in a row even if there are blocked waiters.
	// Starvation mode is important to prevent pathological cases of tail latency.
	starvationThresholdNs = 1e6
)

3. Lock Implementation (`mutex.Lock()`)

The Lock() function works by manipulating the state of the mutex. Here’s the high-level Go code:

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
	// Fast path: grab unlocked mutex.
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		if race.Enabled {
			race.Acquire(unsafe.Pointer(m))
		}
		return
	}
	// Slow path (outlined so that the fast path can be inlined)
	m.lockSlow()
}

4. Fast Path (User-Space Locking)

Go first tries the fast path, where it attempts to acquire the lock using an atomic operation. In this case, atomic.CompareAndSwapInt32() (CAS) is used to check if the mutex is unlocked (m.state == 0) and, if so, it sets the lock by updating the state.

Compare-and-Swap (CAS): This is an atomic instruction provided by the CPU, which checks whether the value of m.state is 0 (unlocked) and swaps it to 1 (locked). If successful, the function returns immediately, and the lock is acquired without any system calls or blocking.

atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked)

If the CAS operation is successful, the lock is acquired, and no kernel intervention is required. This is fast because it doesn't involve the OS and remains in user space.

5. Slow Path (Contention Handling)

If the fast path fails (meaning another goroutine already holds the lock), the code enters the slow path (m.lockSlow()), which handles contention between multiple goroutines trying to acquire the lock.

func (m *Mutex) lockSlow() {
	var waitStartTime int64
	starving := false
	awoke := false
	iter := 0
	old := m.state
	for {
		// Don't spin in starvation mode, ownership is handed off to waiters
		// so we won't be able to acquire the mutex anyway.
		if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
			// Active spinning makes sense.
			// Try to set mutexWoken flag to inform Unlock
			// to not wake other blocked goroutines.
			if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
				atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
				awoke = true
			}
			runtime_doSpin()
			iter++
			old = m.state
			continue
		}
		new := old
		// Don't try to acquire starving mutex, new arriving goroutines must queue.
		if old&mutexStarving == 0 {
			new |= mutexLocked
		}
		if old&(mutexLocked|mutexStarving) != 0 {
			new += 1 << mutexWaiterShift
		}
		// The current goroutine switches mutex to starvation mode.
		// But if the mutex is currently unlocked, don't do the switch.
		// Unlock expects that starving mutex has waiters, which will not
		// be true in this case.
		if starving && old&mutexLocked != 0 {
			new |= mutexStarving
		}
		if awoke {
			// The goroutine has been woken from sleep,
			// so we need to reset the flag in either case.
			if new&mutexWoken == 0 {
				throw("sync: inconsistent mutex state")
			}
			new &^= mutexWoken
		}
		if atomic.CompareAndSwapInt32(&m.state, old, new) {
			if old&(mutexLocked|mutexStarving) == 0 {
				break // locked the mutex with CAS
			}
			// If we were already waiting before, queue at the front of the queue.
			queueLifo := waitStartTime != 0
			if waitStartTime == 0 {
				waitStartTime = runtime_nanotime()
			}
			runtime_SemacquireMutex(&m.sema, queueLifo, 1)
			starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs
			old = m.state
			if old&mutexStarving != 0 {
				// If this goroutine was woken and mutex is in starvation mode,
				// ownership was handed off to us but mutex is in somewhat
				// inconsistent state: mutexLocked is not set and we are still
				// accounted as waiter. Fix that.
				if old&(mutexLocked|mutexWoken) != 0 || old>>mutexWaiterShift == 0 {
					throw("sync: inconsistent mutex state")
				}
				delta := int32(mutexLocked - 1<<mutexWaiterShift)
				if !starving || old>>mutexWaiterShift == 1 {
					// Exit starvation mode.
					// Critical to do it here and consider wait time.
					// Starvation mode is so inefficient, that two goroutines
					// can go lock-step infinitely once they switch mutex
					// to starvation mode.
					delta -= mutexStarving
				}
				atomic.AddInt32(&m.state, delta)
				break
			}
			awoke = true
			iter = 0
		} else {
			old = m.state
		}
	}

	if race.Enabled {
		race.Acquire(unsafe.Pointer(m))
	}
}

runtime_SemacquireMutex(): If another goroutine holds the lock, Go’s runtime will call this function to put the current goroutine to sleep. It uses the sema field to handle semaphore-like synchronization. When the mutex is unlocked, the waiting goroutine will be woken up and will retry acquiring the lock.

In this slow path, the thread will either spin (check again if the lock becomes available) or sleep, depending on the circumstances.

Detailed Walkthrough of the Code:

if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
    // Spinning: actively retry to acquire the lock.
    if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
        atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
        awoke = true // Mark as awake to prevent unnecessary wakes.
    }
    runtime_doSpin() // Do the actual spin.
    iter++ // Keep track of spinning attempts.
    old = m.state // Update to the latest state.
    continue
}

Active Spinning: If the mutex is locked but not in starvation mode, and spinning is allowed, the goroutine will spin in an attempt to acquire the lock. If the lock isn’t available after multiple spins, the goroutine gives up spinning and prepares to sleep.

if starving && old&mutexLocked != 0 {
    new |= mutexStarving // Switch to starvation mode if needed.
}

Starvation Mode: If the goroutine has been waiting for too long, the mutex switches to starvation mode. This prioritizes existing waiters over newly arriving goroutines, ensuring fairness.

if atomic.CompareAndSwapInt32(&m.state, old, new) {
    if old&(mutexLocked|mutexStarving) == 0 {
        break // Successfully locked the mutex with CAS.
    }
    // Sleep and wait for the lock to be released by others.
    runtime_SemacquireMutex(&m.sema, queueLifo, 1)
    starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs
    old = m.state // Check the updated state.
}

Acquiring the Lock: If the goroutine successfully updates the mutex state using the atomic compare-and-swap (CAS) operation, it has acquired the lock. If not, it puts itself to sleep using the semaphore (runtime_SemacquireMutex()), waiting for the lock to become available.

Purpose of `mutexWoken`

The mutexWoken flag is used in the Go sync.Mutex implementation to manage goroutines waiting to acquire a lock. Its purpose is to ensure that only one waiting goroutine is woken up when the mutex becomes available. This avoids unnecessary contention and ensures efficient scheduling.

When a goroutine attempts to acquire a lock and the lock is already held by another goroutine, the goroutine goes to sleep (or waits), and the mutexWoken flag ensures that the next goroutine in line is properly woken up once the lock is released.

How `mutexWoken` is Turned On and Off

i. Setting mutexWoken (Turning It On): When a goroutine is about to be woken up, the mutexWoken flag is set to notify that this particular goroutine has been woken up. This occurs in the slow path (lockSlow()), where the goroutine may go into a waiting queue if the lock is already held.

Here's the relevant snippet:

if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
    atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
    awoke = true
}

This part checks:

If the awoke flag is not set yet (i.e., the goroutine hasn’t been woken up yet).
If no other goroutine is already marked as woken (old & mutexWoken == 0).
If there are goroutines waiting in the queue (old >> mutexWaiterShift != 0).

If these conditions are met, it atomically sets the mutexWoken flag using CompareAndSwapInt32(). This ensures only one goroutine is woken up at a time to acquire the lock.

ii. Checking and Clearing mutexWoken (Turning It Off): Once the goroutine has been woken up and is now competing for the lock, the mutexWoken flag needs to be cleared to allow other waiting goroutines to be woken up in the future.

Here's the code snippet:

if awoke {
    // The goroutine has been woken from sleep, so we need to reset the flag.
    if new&mutexWoken == 0 {
        throw("sync: inconsistent mutex state")
    }
    new &^= mutexWoken
}

Check: First, it checks if the mutexWoken flag is set. If it’s not (new & mutexWoken == 0), this indicates an inconsistency because the goroutine believes it was woken up (awoke is true), but the flag doesn’t reflect that, so it throws an error (throw("sync: inconsistent mutex state")).
Clearing: If everything is consistent, the mutexWoken flag is cleared (new &^= mutexWoken). This ensures that future goroutines can be woken up when the lock is released, and no more than one goroutine is woken at a time.

Why This Mechanism Is Used:

i. Efficiency and Fairness: Only one goroutine should be woken up at a time when a lock is released. Without this mechanism, multiple goroutines might be woken up, causing them to contend for the lock unnecessarily. This can lead to thundering herd problems, where many goroutines wake up and compete for the same lock, leading to inefficiencies. ii. Avoiding Multiple Wakeups: The mutexWoken flag prevents the same goroutine from being woken up multiple times. Once a goroutine has been marked as "woken up," it sets the awoke flag and clears mutexWoken to ensure that other waiting goroutines can be considered for wakeup next. iii. State Consistency: The checks on mutexWoken help ensure that the mutex's internal state remains consistent. If the flag is not set properly (either too early or too late), it indicates a problem with the mutex's state, which could lead to incorrect behavior in multi-goroutine scenarios. This is why there's a strict check that throws an error if the state is inconsistent.

What Is Starvation Mode?

Starvation mode is a special state that the mutex can enter when a goroutine has been waiting too long (more than 1 millisecond by default) to acquire the lock. This mode is designed to ensure fairness by preventing newly arriving goroutines from repeatedly acquiring the lock before older, waiting goroutines, thus avoiding starvation (where a goroutine waits indefinitely for the lock).

How Starvation Mode Works:

i. Normal Operation:

In normal mode, when a mutex is unlocked, new goroutines that arrive (those trying to acquire the lock) compete with goroutines that are already waiting in line. The new goroutines may have a slight advantage since they are already running and don’t need to be woken up, which could lead to long wait times for goroutines that have already been waiting.

ii. Transition to Starvation Mode:

When a goroutine has been waiting for longer than a predefined threshold (1 millisecond by default), it sets the mutex to starvation mode. This is indicated by the mutexStarving bit in the mutex.state.
Once in starvation mode, newly arriving goroutines are no longer allowed to acquire the lock. The lock is handed directly to the goroutine that has been waiting the longest (the one at the front of the queue). This ensures that waiting goroutines are prioritized over new arrivals.

iii. Mutex Ownership Hand-off:

In starvation mode, when the mutex is unlocked, ownership of the mutex is immediately transferred to the first waiting goroutine, bypassing the normal competition with new goroutines.
Newly arriving goroutines are added to the back of the queue and do not compete for the lock while the mutex is in starvation mode.

iv. Exiting Starvation Mode:

The mutex exits starvation mode when a goroutine successfully acquires the lock, and either:
- It was the last goroutine in the wait queue.
- It waited for less than 1 millisecond.
This allows the system to return to normal operation mode (with FIFO-style queueing), balancing performance and fairness.

6. System Calls and Blocking

If the lock is held for a long time, the goroutine might block (via the runtime_SemacquireMutex() function), and the Go runtime will coordinate with the OS to put the goroutine to sleep. When the lock becomes available, the OS will wake the goroutine up, and it will retry locking.

7. Unlocking (`mutex.Unlock()`)

When mutex.Unlock() is called, it does the reverse:

// Unlock unlocks m.
// It is a run-time error if m is not locked on entry to Unlock.
//
// A locked [Mutex] is not associated with a particular goroutine.
// It is allowed for one goroutine to lock a Mutex and then
// arrange for another goroutine to unlock it.
func (m *Mutex) Unlock() {
	if race.Enabled {
		_ = m.state
		race.Release(unsafe.Pointer(m))
	}

	// Fast path: drop lock bit.
	new := atomic.AddInt32(&m.state, -mutexLocked)
	if new != 0 {
		// Outlined slow path to allow inlining the fast path.
		// To hide unlockSlow during tracing we skip one extra frame when tracing GoUnblock.
		m.unlockSlow(new)
	}
}

Atomic Subtraction: The unlock operation decrements the lock state (m.state), marking the mutex as unlocked.
Semaphore Release: If there are other goroutines waiting (new != 0), the runtime_Semrelease() function is called to wake them up and let them retry locking.
Slow Path Unlock

func (m *Mutex) unlockSlow(new int32) {
	if (new+mutexLocked)&mutexLocked == 0 {
		fatal("sync: unlock of unlocked mutex")
	}
	if new&mutexStarving == 0 {
		old := new
		for {
			// If there are no waiters or a goroutine has already
			// been woken or grabbed the lock, no need to wake anyone.
			// In starvation mode ownership is directly handed off from unlocking
			// goroutine to the next waiter. We are not part of this chain,
			// since we did not observe mutexStarving when we unlocked the mutex above.
			// So get off the way.
			if old>>mutexWaiterShift == 0 || old&(mutexLocked|mutexWoken|mutexStarving) != 0 {
				return
			}
			// Grab the right to wake someone.
			new = (old - 1<<mutexWaiterShift) | mutexWoken
			if atomic.CompareAndSwapInt32(&m.state, old, new) {
				runtime_Semrelease(&m.sema, false, 1)
				return
			}
			old = m.state
		}
	} else {
		// Starving mode: handoff mutex ownership to the next waiter, and yield
		// our time slice so that the next waiter can start to run immediately.
		// Note: mutexLocked is not set, the waiter will set it after wakeup.
		// But mutex is still considered locked if mutexStarving is set,
		// so new coming goroutines won't acquire it.
		runtime_Semrelease(&m.sema, true, 1)
	}
}

8. Memory Barriers

Memory barriers are automatically handled by the atomic operations (atomic.CompareAndSwapInt32() and atomic.AddInt32()) in Go. These barriers ensure proper memory visibility between goroutines, so changes made before the mutex is unlocked are visible to other goroutines after they acquire the lock.

Summary of the Mutex Lock Mechanism in Go:

Fast Path (CAS): Tries to acquire the lock using atomic compare-and-swap (CAS) in user space.
Slow Path: If the lock is already held, the goroutine might either spin (recheck lock) or block (sleep), using semaphores for synchronization.
System Calls: If the slow path is taken and the lock is unavailable, the Go runtime uses system calls to put the goroutine to sleep and wake it when the lock is free.
Unlocking: On unlocking, the lock state is atomically modified, and waiting goroutines (if any) are woken up using semaphore-like mechanisms.

This lock mechanism balances efficiency (by avoiding system calls with CAS) and fairness (by eventually blocking goroutines to avoid CPU wastage).