Threads at the OS Level: Kernel Threads, User Threads, and Scheduling
A kernel thread is the unit the OS schedules: it gets CPU time directly. A user thread is managed by a runtime library and must be mapped to a kernel thread to run. M:N threading (Go goroutines, Java virtual threads) multiplexes M user threads onto N kernel threads, reducing context-switch cost. Kernel threads are scheduled preemptively: the OS can interrupt a running thread whenever its time slice expires, regardless of which process the thread belongs to.
Kernel threads vs user threads
A kernel thread (also called an OS thread or 1:1 thread) is directly managed by the OS scheduler. It appears in /proc/<pid>/task/ on Linux. The OS allocates CPU time to kernel threads and can preempt them.
A user thread (or green thread) is managed entirely by a user-space runtime. The OS doesn't know it exists. The runtime multiplexes user threads onto kernel threads.
| Property | Kernel thread | User thread |
|---|---|---|
| Scheduled by | OS scheduler | User-space runtime |
| Context switch cost | ~1–10 µs (syscall, kernel stack swap) | ~100 ns (no syscall, register swap only) |
| Stack size (default) | 1–8 MB | 2–8 KB (growable) |
| Blocks on syscall | Blocks only that thread | Blocks the whole kernel thread and every user thread on it (N:1), or is scheduled around (M:N) |
| Visibility to OS | Yes: appears in top, ps | No |
| Limit | ~thousands per process | ~millions per process |
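Because kernel threads are visible to the OS, a process can count its own by listing /proc/self/task. The following is a minimal sketch, assuming a Linux system with /proc mounted; `kernelThreadCount` is a hypothetical helper name, and on other platforms the call simply returns an error.

```go
package main

import (
	"fmt"
	"os"
)

// kernelThreadCount returns the number of kernel threads in the current
// process by counting the entries in /proc/self/task (Linux only).
func kernelThreadCount() (int, error) {
	entries, err := os.ReadDir("/proc/self/task")
	if err != nil {
		return 0, err // for example, not running on Linux
	}
	return len(entries), nil
}

func main() {
	n, err := kernelThreadCount()
	if err != nil {
		fmt.Println("could not read /proc/self/task:", err)
		return
	}
	fmt.Println("kernel threads in this process:", n)
}
```

Goroutines never show up in this listing; only the OS threads the Go runtime created to run them do.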
Thread models: 1:1, N:1, M:N
1:1 model (Java platform threads, C pthreads, Rust):
user thread → kernel thread → CPU
One kernel thread per user thread. Full OS scheduling, blocking I/O is fine.
Cost: 8 MB default stack × 1000 threads = 8 GB of reserved virtual address space (physical memory is committed only as stack pages are touched).
N:1 model (early green threads, old Python greenlets):
N user threads → 1 kernel thread → CPU
All user threads share one OS thread. No parallelism on multicore.
A blocking syscall in any user thread stalls all others.
M:N model (Go goroutines, Java virtual threads JDK 21+, Erlang processes):
M goroutines → N OS threads → N CPU cores
Runtime schedules goroutines across OS threads (at most GOMAXPROCS of them run Go code at once; default = CPU count).
A blocking syscall blocks its OS thread; the scheduler hands the remaining goroutines to another thread.
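To make the M:N ratio concrete, the sketch below parks a large number of goroutines on a channel and compares the live goroutine count with GOMAXPROCS. `parkMany` is a hypothetical helper written for this illustration, and the figure of 100,000 goroutines is an arbitrary choice.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parkMany starts n goroutines that block on a channel, records how many
// goroutines are alive while they wait, then releases them.
func parkMany(n int) int {
	release := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // parked by the Go scheduler; no OS thread is held
		}()
	}
	live := runtime.NumGoroutine() // includes the calling goroutine
	close(release)
	wg.Wait()
	return live
}

func main() {
	// GOMAXPROCS(0) reads the current setting without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("live goroutines:", parkMany(100000))
}
```

The goroutine count is several orders of magnitude larger than GOMAXPROCS, which is the M and the N of the model respectively.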
Thread scheduling vs process scheduling
Process scheduling determines which process gets CPU time. Thread scheduling determines which thread within a process (or across processes) gets CPU time. On Linux with 1:1 threads, the kernel schedules threads directly — processes are just thread groups.
Process A: [thread 1] [thread 2] [thread 3]
Process B: [thread 1] [thread 2]
Linux CFS scheduler sees 5 schedulable entities.
Each gets a share of CPU time based on priority (nice value) and vruntime.
Threads within Process A compete with threads from Process B.
Within a process, threads share virtual address space, file descriptors, and signal handlers. Each thread has its own:
- Stack (kernel stack for syscalls, user stack for function calls)
- Registers (saved/restored on context switch)
- Thread-local storage (TLS)
- Signal mask
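The split between shared and per-thread state can be sketched in Go, with goroutines standing in for threads (an assumption of this example; goroutines are user threads, but they share the process address space in the same way). `sharedAndPrivate` is a hypothetical helper: each worker counts in a local variable that nobody else can see, then publishes the result into a shared variable that needs a lock.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedAndPrivate runs `workers` goroutines. Each counts to `steps` in a
// local variable and then adds that count to a total shared by all of them.
func sharedAndPrivate(workers, steps int) int {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		shared int // one copy in the shared address space
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := 0 // private to this goroutine, so no lock is needed
			for i := 0; i < steps; i++ {
				local++
			}
			mu.Lock() // shared data must be synchronized
			shared += local
			mu.Unlock()
		}()
	}
	wg.Wait()
	return shared
}

func main() {
	fmt.Println(sharedAndPrivate(8, 1000))
}
```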
Go's M:N scheduler parks OS threads on blocking syscalls to prevent goroutines from stalling
Deep Dive: Go Runtime / OS Scheduling
When a goroutine calls a blocking syscall (read, write, accept), Go's runtime can't yield cooperatively, because the syscall blocks the OS thread. Go handles this with two mechanisms: (1) For network I/O, Go uses non-blocking syscalls internally plus epoll/kqueue; goroutines park on the Go scheduler, not the OS thread. (2) For other blocking syscalls (file I/O, cgo), Go detaches the OS thread from the processor (P), allowing another OS thread to pick up runnable goroutines. The blocking thread rejoins a thread pool after the syscall returns.
Prerequisites
- goroutines
- GOMAXPROCS
- epoll
- Linux syscalls
Key Points
- GOMAXPROCS controls the number of OS threads actively running Go code (default: CPU count).
- Network I/O: goroutines use epoll/kqueue via Go's netpoller — no OS thread is blocked.
- File/cgo blocking syscalls: the blocked OS thread gives up its P, and the scheduler creates or reuses another thread to run the remaining goroutines.
- runtime.LockOSThread(): pins a goroutine to a specific OS thread — needed for C libraries that use thread-local state.
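The last point can be sketched as a small wrapper. `onLockedThread` is a hypothetical helper, not part of the Go standard library; it uses only `runtime.LockOSThread` and `runtime.UnlockOSThread`, and the function passed in stands for a call into a C library that relies on thread-local state.

```go
package main

import (
	"fmt"
	"runtime"
)

// onLockedThread runs fn on a goroutine that stays wired to one OS thread
// for the whole call, then returns fn's result.
func onLockedThread(fn func() int) int {
	result := make(chan int)
	go func() {
		runtime.LockOSThread()         // no other goroutine runs on this thread
		defer runtime.UnlockOSThread() // release the thread when fn is done
		result <- fn()
	}()
	return <-result
}

func main() {
	v := onLockedThread(func() int { return 6 * 7 })
	fmt.Println(v)
}
```

Running the pinned work on a dedicated goroutine keeps the caller free to be scheduled normally.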
Context switch cost
A context switch between kernel threads requires:
1. Save current thread's registers, stack pointer, program counter
2. Switch kernel stack
3. Update page table (if switching processes)
4. Invalidate TLB (partial or full, depending on CPU architecture)
5. Load new thread's registers
Thread context switches within the same process skip steps 3 and 4 (the page table is shared), making them cheaper than process switches but still requiring kernel entry (~2,000–5,000 cycles on modern hardware).
User thread context switches (goroutine-to-goroutine within the same OS thread) swap only the goroutine stack pointer and registers in user space — no kernel entry, ~200 cycles.
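A rough way to observe the user-space switch cost is to bounce a token between two goroutines. The sketch below is an approximation under stated assumptions: `pingPong` is a hypothetical helper, GOMAXPROCS is temporarily set to 1 so that each handoff is a goroutine switch rather than a cross-core wakeup, and the measured time includes channel overhead on top of the switch itself.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// pingPong passes a token back and forth between two goroutines `rounds`
// times and returns the number of handoffs and the total elapsed time.
func pingPong(rounds int) (handoffs int, elapsed time.Duration) {
	// Assumption for this sketch: with one P, only one OS thread runs Go
	// code at a time. The previous setting is restored on return.
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	ping, pong := make(chan struct{}), make(chan struct{})
	go func() {
		for range ping {
			pong <- struct{}{}
		}
	}()

	start := time.Now()
	for i := 0; i < rounds; i++ {
		ping <- struct{}{} // hand control to the other goroutine
		<-pong             // wait for it to hand control back
		handoffs += 2
	}
	elapsed = time.Since(start)
	close(ping) // lets the helper goroutine exit
	return handoffs, elapsed
}

func main() {
	n, d := pingPong(1_000_000)
	fmt.Printf("%d handoffs, about %v each\n", n, d/time.Duration(n))
}
```

The per-handoff figure varies by machine and Go version, so treat it as an order-of-magnitude check rather than a benchmark.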
A program spawns 10,000 goroutines, each sleeping 1 second then printing. GOMAXPROCS=4. How many OS threads are running Go code at peak?
Difficulty: medium
GOMAXPROCS limits OS threads actively running Go code. Sleeping goroutines are parked by the Go scheduler.
A. 10,000: one OS thread per goroutine
Incorrect. Go uses M:N threading. The runtime does not create an OS thread per goroutine. GOMAXPROCS controls how many OS threads run Go code concurrently.
B. 4: GOMAXPROCS=4 means at most 4 OS threads run Go code; sleeping goroutines are parked by the scheduler
Correct! Sleeping goroutines call time.Sleep(), which parks them on a timer heap in the Go scheduler, so they consume no OS thread. At peak, only 4 OS threads are running Go code (one per logical processor P). If all 10,000 goroutines wake simultaneously, they queue on the 4 Ps and run at most 4 at a time rather than on 10,000 simultaneous OS threads.
C. 1: goroutines are cooperative and only one runs at a time
Incorrect. Go goroutines run in parallel on multiple OS threads when GOMAXPROCS > 1. They are also preemptible: since Go 1.14 the runtime uses asynchronous, signal-based preemption, so a goroutine can be interrupted even inside a loop that makes no function calls.
D. 4 + 10,000: 4 for Go code plus one OS thread per sleeping goroutine
Incorrect. Sleeping goroutines do not hold OS threads. time.Sleep() parks the goroutine on the Go scheduler's timer heap without blocking any OS thread.
Hint: What does GOMAXPROCS control? What happens to a goroutine that calls time.Sleep()?
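The scenario in the question can be checked empirically. The sketch below is a simplified version of the program described (the print step is omitted), and `threadsWhileSleeping` is a hypothetical helper. It reads the runtime's "threadcreate" profile, which counts OS threads created so far; that number is an upper bound on threads running Go code and also includes runtime helper threads, so expect a value somewhat above GOMAXPROCS but nowhere near the goroutine count.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// threadsWhileSleeping starts n goroutines that each sleep for sleepMs
// milliseconds, samples the OS thread count halfway through, waits for the
// goroutines to finish, and returns the sample.
func threadsWhileSleeping(n, sleepMs int) int {
	d := time.Duration(sleepMs) * time.Millisecond
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(d) // parks the goroutine; no OS thread is held
		}()
	}
	time.Sleep(d / 2) // sample while the goroutines are still asleep
	// Counts OS threads the runtime has created since the program started.
	threads := pprof.Lookup("threadcreate").Count()
	wg.Wait()
	return threads
}

func main() {
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("OS threads created:", threadsWhileSleeping(10000, 1000))
}
```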