Linux Kernel Macros: SYSCALL_DEFINE, IS_ERR, and Why Macros Beat Functions Here — Blog

SYSCALL_DEFINE: why not just a function?

int __sys_socket(int family, int type, int protocol)
{
    struct socket *sock;
    int flags;

    sock = __sys_socket_create(family, type,
                   update_socket_protocol(family, type, protocol));
    if (IS_ERR(sock))
        return PTR_ERR(sock);

    /* Simplified: flag bits such as SOCK_CLOEXEC travel in the upper bits of type. */
    flags = type & ~SOCK_TYPE_MASK;
    return sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
}

SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
{
    return __sys_socket(family, type, protocol);
}

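SYSCALL_DEFINE3(socket, ...) expands to architecture-specific code that:

Declares the syscall entry point (sys_socket, or an architecture-prefixed name such as __x64_sys_socket on x86-64)
Handles architecture differences in argument passing: arguments arrive as register-sized values and are cast to the declared parameter types, and x86-64 and ARM64 use different registers
Emits syscall metadata (name and argument types) for the kernel's syscall tracepoints when CONFIG_FTRACE_SYSCALLS is enabled
Provides consistent naming so the syscall table entry for __NR_socket can reference the generated entry point on every architecture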

The underlying logic lives in __sys_socket(), a regular function that other kernel code can call directly (for example, the socketcall() multiplexer). The macro wrapper handles the syscall entry machinery.
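The wrapper idea can be imitated in user space. The following is a heavily simplified sketch, not the kernel's expansion: MY_SYSCALL_DEFINE3, my_sys_add3, and my_do_sys_add3 are hypothetical names invented for illustration, and the real macro differs per architecture and does considerably more.

```c
#include <assert.h>

/*
 * Hypothetical imitation of the SYSCALL_DEFINE3 pattern (all my_* names
 * are invented here). One macro invocation emits two functions:
 *   my_sys_<name>    - entry stub taking raw, register-sized longs
 *   my_do_sys_<name> - typed body written by the programmer
 */
#define MY_SYSCALL_DEFINE3(name, t1, a1, t2, a2, t3, a3)              \
    static long my_do_sys_##name(t1 a1, t2 a2, t3 a3);               \
    long my_sys_##name(long r1, long r2, long r3)                     \
    {                                                                 \
        /* Cast each raw value to its declared parameter type. */    \
        return my_do_sys_##name((t1)r1, (t2)r2, (t3)r3);              \
    }                                                                 \
    static long my_do_sys_##name(t1 a1, t2 a2, t3 a3)

/* The braces below become the body of my_do_sys_add3(). */
MY_SYSCALL_DEFINE3(add3, int, a, int, b, int, c)
{
    return (long)a + b + c;
}

/* Usage: a dispatcher would call the stub with raw long values. */
void demo_add3(void)
{
    assert(my_sys_add3(1, 2, 3) == 6);
}
```

A plain function could not do this, because the stub's name and the parameter types are built from the macro arguments by token pasting.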

IS_ERR / PTR_ERR: error codes in pointers

Kernel functions that return pointers face a problem: how to signal an error without a separate error return channel? Two options: return NULL (loses the error code), or use a special region of the address space.

Linux uses the latter. Valid kernel pointers never fall in the top 4095 bytes of the virtual address space (0xFFFFF001 to 0xFFFFFFFF on 32-bit), because the largest encodable errno, MAX_ERRNO, is 4095. This region is reserved for error-encoded pointers:

// Encoding on the failing side: return ERR_PTR(-errno), i.e. (void *)(long)(-errno)
sock = __sys_socket_create(family, type, protocol);
if (IS_ERR(sock)) {
    int err = PTR_ERR(sock);  // extracts -EINVAL, -ENOMEM, etc.
    return err;               // returns the negative errno
}

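// How IS_ERR works (simplified; the kernel implements IS_ERR, PTR_ERR and
// ERR_PTR as static inline functions in include/linux/err.h):
#define IS_ERR(ptr) ((unsigned long)(ptr) >= (unsigned long)(-MAX_ERRNO))

// PTR_ERR extracts the negative errno:
#define PTR_ERR(ptr) ((long)(ptr))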

A pointer of -ENOMEM (-12) is 0xFFFFFFF4 on 32-bit — in the reserved error region. IS_ERR(ptr) checks if the pointer is in that region. This pattern lets functions return a valid pointer on success and an error-encoded pointer on failure, checked with a single if (IS_ERR(...)).
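The round trip can be tried outside the kernel. This is a minimal user-space sketch that reimplements the three helpers with intptr_t/uintptr_t for portability; struct widget, widget_create, and widget_probe are hypothetical names made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for the kernel helpers of the same name. */
static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* True only for the top MAX_ERRNO addresses, i.e. values -4095..-1. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct widget { int id; };  /* hypothetical object type */

/* Returns a valid pointer on success, an error-encoded pointer on failure. */
struct widget *widget_create(int id)
{
    struct widget *w;

    if (id < 0)
        return ERR_PTR(-EINVAL);
    w = malloc(sizeof(*w));
    if (!w)
        return ERR_PTR(-ENOMEM);
    w->id = id;
    return w;
}

/* Caller side: returns 0 on success or a negative errno, kernel style. */
int widget_probe(int id)
{
    struct widget *w = widget_create(id);

    if (IS_ERR(w))
        return (int)PTR_ERR(w);
    free(w);
    return 0;
}
```

Note that an error pointer is not NULL and NULL is not an error pointer, so a function's callers need to know which convention it follows.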

container_of: walking from member to container

// Given a pointer to a list_head member, find the containing struct
#define container_of(ptr, type, member) ({                    \
    const typeof(((type *)0)->member) *__mptr = (ptr);        \
    (type *)((char *)__mptr - offsetof(type, member));        \
})

struct net_device {
    char name[IFNAMSIZ];
    struct list_head dev_list;  // embedded list node
    int ifindex;
    // ...
};

// Given a list_head *, find the net_device it's embedded in:
struct net_device *dev = container_of(list_ptr, struct net_device, dev_list);

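This is impossible with a regular function: a function cannot take a type name or a member name as an argument, so the offsetof computation and the cast to the container type have to be written out in the expansion at each call site. The macro uses typeof so that passing a pointer of the wrong member type triggers a compiler diagnostic, and the offset itself is a compile-time constant.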
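A small user-space sketch of the same pattern follows. It uses a portable container_of without the typeof check (so it gives up the type diagnostic), and struct list_node, struct device, and sum_indices are hypothetical stand-ins rather than kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Portable variant: no typeof, so no check that ptr matches the member type. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal singly linked node, embedded inside the objects it links. */
struct list_node {
    struct list_node *next;
};

struct device {
    char name[16];
    struct list_node node;  /* embedded list node */
    int index;
};

/* Walk the embedded nodes and sum the index of each containing device. */
int sum_indices(struct list_node *head)
{
    int sum = 0;

    for (struct list_node *n = head; n != NULL; n = n->next) {
        /* Step back from the member to the start of its container. */
        struct device *dev = container_of(n, struct device, node);
        sum += dev->index;
    }
    return sum;
}
```

The list code only ever sees struct list_node pointers; the macro is what recovers the enclosing object, whatever its type happens to be.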

Kernel macros often replace functions because they need compile-time type information or must generate different code per call site

Concept: Systems Programming

Three reasons kernels use macros where functions would be insufficient: (1) Generic type handling: container_of works on any struct/member pair; a function would need void * and lose type safety. (2) Compile-time evaluation at the call site: BUILD_BUG_ON(sizeof(x) != 4) becomes a compile-time check at the exact line; a function call can't do this. (3) Automatic argument capture: __FILE__ and __LINE__ expand where the macro is used, so macros such as WARN_ON() can record the call site; a function only sees its own __LINE__.
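Reasons (2) and (3) can be sketched in user space. MY_BUILD_BUG_ON below uses the classic negative-array-size trick, which is only one way to get a compile-time failure and is not necessarily how current kernels implement BUILD_BUG_ON; MY_WARN, record_warning, and last_warning are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Compile-time check: the array size becomes -1 (an error) if cond is true. */
#define MY_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Where the most recent warning was raised. */
struct call_site {
    const char *file;
    int line;
};
struct call_site last_warning;

/* An ordinary function: it only knows the location it is told about. */
void record_warning(const char *file, int line, const char *msg)
{
    last_warning.file = file;
    last_warning.line = line;
    fprintf(stderr, "%s:%d: %s\n", file, line, msg);
}

/* The macro expands at the call site, so __FILE__/__LINE__ name the caller. */
#define MY_WARN(msg) record_warning(__FILE__, __LINE__, (msg))

/* Raises a warning and returns the line number the warning should report. */
int warn_and_report_line(void)
{
    int line;

    MY_BUILD_BUG_ON(sizeof(unsigned char) != 1);  /* compiles: cond is false */
    line = __LINE__ + 1;  /* line number of the MY_WARN call below */
    MY_WARN("something looks off");
    return line;
}
```

Changing the condition to something true, such as sizeof(unsigned char) != 2, should make the build fail at that line instead of producing a runtime error.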

Prerequisites

C preprocessor
sizeof and offsetof
Kernel data structures

Key Points

SYSCALL_DEFINE: generates architecture-specific syscall entry points plus tracing metadata.
IS_ERR/PTR_ERR: uses the top 4095 bytes of the virtual address space to encode an errno in a pointer.
container_of: subtract member offset from member pointer to get struct start — compile-time type safety.
BUILD_BUG_ON: forces compile error if condition is true — impossible with a function call.

A kernel function returns a struct socket *. The caller checks `if (!sock)` to detect errors. What's wrong with this approach compared to IS_ERR?

medium

The kernel function uses ERR_PTR(-ENOMEM) or ERR_PTR(-EINVAL) on failure instead of NULL.

A. Nothing: checking for NULL is the standard way to detect pointer errors
Incorrect. NULL checking only works if the function returns NULL on error. Kernel functions using ERR_PTR return a non-NULL pointer (an error-encoded address at the very top of virtual memory). `!sock` is false for an error pointer, so the error goes undetected.
B. The caller misses errors: ERR_PTR returns a non-NULL pointer. IS_ERR detects the error; !sock doesn't. The caller then dereferences the error-encoded pointer, causing a kernel oops or panic.
Correct! ERR_PTR(-ENOMEM) returns a pointer like 0xFFFFFFF4, which is non-NULL. `if (!sock)` evaluates to false, so the caller thinks it has a valid socket. It then tries to use the socket (dereference, pass to other functions), which reads from or writes to an invalid address, causing a kernel oops or panic. Always use IS_ERR() for functions documented to use ERR_PTR. Then use PTR_ERR() to get the errno: `int err = PTR_ERR(sock); return err;`
C. IS_ERR is just a style preference; both check the same condition
Incorrect. IS_ERR and `!ptr` check different conditions. IS_ERR checks if the pointer is in the error-pointer range. `!ptr` checks if the pointer is NULL. ERR_PTR(-ENOMEM) is non-NULL, so only IS_ERR catches it.
D. NULL pointers are invalid in the kernel; all kernel pointers are non-NULL
Incorrect. NULL pointers exist in the kernel and are used in some error conventions. Some functions return NULL on allocation failure. The key is knowing which convention a function uses: NULL or ERR_PTR. The mismatch fails in both directions: IS_ERR(NULL) is false because NULL (0) is not in the error range, so IS_ERR misses a failure reported as NULL. For functions that can return either, the kernel provides IS_ERR_OR_NULL().

Hint: What value does ERR_PTR(-ENOMEM) return? Is it NULL? What does !sock evaluate to for that value?