Linux Kernel Scheduling Explained: Policies, Priorities, and Preemption

🧠 Overview

This article provides a high-level introduction to process scheduling in the Linux kernel, drawing primarily from Linux Kernel Development by Robert Love. The focus is on core concepts, scheduling policies, and execution behavior rather than low-level implementation details.

The goal is to build conceptual clarity around how Linux allocates CPU time fairly and efficiently across diverse workloads.

⚙️ What Is the Scheduler?

The scheduler is a core kernel subsystem responsible for dividing the finite resource of processor time among all runnable processes. Its responsibilities include deciding:

Which process runs next
When execution begins
How long a process is allowed to run

Effective scheduling directly impacts system responsiveness, throughput, and perceived performance.

📊 Scheduling Policies

I/O-Bound vs. CPU-Bound Workloads

Linux scheduling behavior depends heavily on how a process uses the CPU:

I/O-Bound Processes
Spend most of their lifetime waiting for I/O operations such as user input, disk access, or network events. Examples include GUI applications and terminal shells. These processes run frequently but for short bursts.
CPU-Bound Processes
Spend most of their time executing instructions. Typical examples include video encoding, compression, and scientific computation. These workloads tend to run continuously until preempted.

Distinguishing between these behaviors is essential for maintaining both interactivity and throughput.
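One rough way to see the difference from user space is to compare the CPU time a piece of work consumes with the wall-clock time it takes. The sketch below is only an illustration: `cpu_fraction`, `io_like`, and `cpu_like` are hypothetical names, and a sleep stands in for real I/O waits.

```python
import time

def cpu_fraction(work):
    """Run work() and return CPU time used divided by wall-clock time elapsed."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

def io_like():
    # Sleeping stands in for waiting on a key press, disk, or network.
    time.sleep(0.2)

def cpu_like():
    # A busy loop stands in for encoding or compression work.
    deadline = time.perf_counter() + 0.2
    counter = 0
    while time.perf_counter() < deadline:
        counter += 1

if __name__ == "__main__":
    print(f"I/O-like workload: {cpu_fraction(io_like):.2f}")
    print(f"CPU-like workload: {cpu_fraction(cpu_like):.2f}")
```

A fraction near zero suggests a task that mostly waits, while a fraction near one suggests a task that would keep running until preempted. The exact numbers depend on system load.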

Process Priority

Linux uses two independent priority domains:

Nice Value
Ranges from -20 (highest priority) to +19 (lowest priority). A higher nice value means the process yields more CPU time to others. The default nice value is 0.
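Real-Time Priority (RTPRIO)
Ranges from 0 to 99, where a higher number represents a higher priority. This scale applies only to tasks running under a real-time scheduling policy.

The nice value can be inspected using:

ps -elf

Here, NI represents the nice value and PRI represents the priority as ps reports it, which is not the real-time priority. To see the real-time priority as well, request the RTPRIO column explicitly:

ps -eo pid,ni,pri,rtprio,comm

Tasks without a real-time policy show - in the RTPRIO column.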
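The nice value and real-time priority described above can also be read programmatically. The following is a minimal sketch using Python's standard `os` module on Linux; `describe_priority` and `POLICY_NAMES` are hypothetical helper names introduced for illustration.

```python
import os

POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",   # default policy for normal tasks
    os.SCHED_FIFO: "SCHED_FIFO",     # real-time, first in first out
    os.SCHED_RR: "SCHED_RR",         # real-time, round robin
}

def describe_priority(pid=0):
    """Return nice value, policy name, and real-time priority for pid (0 = caller)."""
    policy = os.sched_getscheduler(pid)
    return {
        "nice": os.getpriority(os.PRIO_PROCESS, pid),
        "policy": POLICY_NAMES.get(policy, f"policy {policy}"),
        "rt_priority": os.sched_getparam(pid).sched_priority,
    }

if __name__ == "__main__":
    print(describe_priority())
    low = os.sched_get_priority_min(os.SCHED_FIFO)
    high = os.sched_get_priority_max(os.SCHED_FIFO)
    print(f"SCHED_FIFO priorities accepted by this system: {low}..{high}")
```

For an ordinary process the policy is normally `SCHED_OTHER` and the real-time priority is 0, which reflects that the two scales are separate.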

Timeslice

A timeslice defines how long a task may execute before being preempted.

Excessively long timeslices reduce responsiveness
Excessively short timeslices increase context-switch overhead

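The Completely Fair Scheduler (CFS) balances this trade-off dynamically: instead of assigning fixed timeslices, it gives each task a share of the processor that depends on the task's weight and on how many other tasks are runnable.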
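To make the trade-off concrete, here is a simplified sketch of how a proportional scheduler could turn nice values into timeslices. It rests on assumptions: the 20 ms target latency and 1 ms minimum granularity are illustrative values, the 1.25 factor only approximates the kernel's nice-to-weight table, and `nice_to_weight` and `timeslices_ms` are hypothetical names.

```python
def nice_to_weight(nice):
    """Approximate load weight: 1024 at nice 0, shrinking about 1.25x per nice step."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be between -20 and 19")
    return 1024 / (1.25 ** nice)

def timeslices_ms(nice_values, target_latency_ms=20.0, min_granularity_ms=1.0):
    """Split one target-latency period among tasks in proportion to their weights."""
    weights = [nice_to_weight(n) for n in nice_values]
    total = sum(weights)
    # The floor keeps slices from shrinking until context switches dominate.
    return [max(min_granularity_ms, target_latency_ms * w / total) for w in weights]

if __name__ == "__main__":
    print(timeslices_ms([0, 0]))
    print(timeslices_ms([0, 5]))
```

With these assumed values, two tasks at nice 0 split the period evenly, while a nice 0 task running beside a nice 5 task receives roughly 15 ms to the other's 5 ms. Note that when the floor applies, the slices no longer sum to the target latency.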

🔄 Scheduling Strategy in Practice

Consider two processes:

A text editor (I/O-bound)
A video encoder (CPU-bound)

Even with identical nice values, Linux treats these workloads differently under the Completely Fair Scheduler (CFS).

Because the editor spends most of its time sleeping, it accumulates very little CPU usage. When a user presses a key and the editor wakes up, CFS detects that it has not received its fair share of processor time and immediately preempts the video encoder. This behavior ensures low latency and high interactivity.
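The editor and encoder scenario can be played out with a toy simulation. This is a sketch under simplifying assumptions, not the kernel's algorithm: both tasks have equal weight, time advances in 1 ms steps, each task's `vruntime` simply counts the CPU time it has received, and the scheduler always picks the runnable task with the smallest value. The names `Task`, `pick_next`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    vruntime: float = 0.0   # CPU time received so far, in ms
    runnable: bool = True

def pick_next(tasks):
    """Return the runnable task that has received the least CPU time, or None."""
    runnable = [t for t in tasks if t.runnable]
    return min(runnable, key=lambda t: t.vruntime) if runnable else None

def simulate(total_ms=12, keypress_at=(5, 9)):
    editor = Task("editor", runnable=False)   # asleep, waiting for input
    encoder = Task("encoder")                 # always has work to do
    tasks = [editor, encoder]
    timeline = []
    for now in range(total_ms):
        if now in keypress_at:
            editor.runnable = True            # a key press wakes the editor
        current = pick_next(tasks)
        current.vruntime += 1.0               # the chosen task runs for 1 ms
        timeline.append(current.name)
        if current is editor:
            editor.runnable = False           # key handled, editor sleeps again
    return timeline

if __name__ == "__main__":
    print(simulate())
```

In this model the editor runs in exactly the millisecond in which each key press arrives, because its accumulated runtime is far below the encoder's. The real scheduler also limits how much credit a long sleeper can carry, which this sketch leaves out.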

🧩 Scheduling Classes in Linux

Linux organizes scheduling policies into classes, ordered by priority from highest to lowest:

Priority	Class	Description
1	Stop	Reserved for stop-machine operations; cannot be preempted
2	Deadline	Uses Earliest Deadline First (EDF) for strict timing guarantees
3	Real-Time	Includes `SCHED_FIFO` and `SCHED_RR` policies
4	Fair (CFS)	Default class for normal user processes
5	Idle	Runs only when no other task is runnable

Each class implements its own scheduling logic while integrating into a unified framework.
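The ordering in the table can be sketched as a loop that asks each class in turn whether it has something to run. This is a simplified illustration: `CLASS_ORDER` and `pick_next_task` here are hypothetical Python names, and taking the first queued task stands in for each class's own selection logic (EDF, real-time priority, or fairness).

```python
CLASS_ORDER = ["stop", "deadline", "real-time", "fair", "idle"]

def pick_next_task(run_queues):
    """Return (class, task) from the highest-priority class with a runnable task.

    run_queues maps a class name to a list of runnable task names.
    """
    for cls in CLASS_ORDER:
        queue = run_queues.get(cls, [])
        if queue:
            # Placeholder for the class's own policy: just take the first task.
            return cls, queue[0]
    return None

if __name__ == "__main__":
    queues = {"fair": ["bash", "encoder"], "real-time": ["audio"], "idle": ["idle"]}
    print(pick_next_task(queues))
```

A runnable task in a higher class always wins over any task in a lower class, which is why a misbehaving real-time task can starve normal processes.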

🔀 Preemption and Context Switching

A context switch occurs when the kernel transitions execution from one runnable process to another. This operation is orchestrated by context_switch() and consists of two major steps:

switch_mm()
Updates the virtual memory mappings for the new process.
switch_to()
Saves and restores CPU registers, stack pointers, and processor state.
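Context switches are also visible from user space. The sketch below reads the per-process counters exposed through `getrusage`: voluntary switches happen when the process gives up the CPU (for example by blocking), and involuntary switches happen when it is preempted. The helper names `context_switches` and `measure` are hypothetical.

```python
import resource
import time

def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw

def measure(work):
    """Run work() and return how many switches of each kind occurred meanwhile."""
    vol_before, invol_before = context_switches()
    work()
    vol_after, invol_after = context_switches()
    return vol_after - vol_before, invol_after - invol_before

if __name__ == "__main__":
    vol, invol = measure(lambda: time.sleep(0.05))
    print(f"voluntary={vol} involuntary={invol}")
```

A sleeping or I/O-heavy workload tends to accumulate voluntary switches, while a CPU-bound loop on a busy machine tends to accumulate involuntary ones. The counts vary from run to run.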

User-Space Preemption

User preemption occurs when the kernel is preparing to return to user space and the need_resched flag is set. This commonly happens:

On return from a system call
On return from an interrupt handler

Kernel-Space Preemption

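Since Linux 2.6, the kernel itself can be preemptible. When kernel preemption is enabled at build time (CONFIG_PREEMPT), a task may be preempted even while executing kernel code, provided it is not holding a lock or otherwise marked non-preemptible. The kernel tracks this with a per-task counter, preempt_count, which must be zero for preemption to be safe.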

Kernel preemption can occur:

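When an interrupt handler exits, before returning to kernel code
When kernel code releases a lock and becomes preemptible again
When a task in the kernel explicitly calls schedule()
When a task in the kernel blocks, which results in a call to schedule()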

This design significantly improves latency on modern systems.
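The interplay between need_resched and the rule about held locks can be modeled with a small toy class. This is a sketch for intuition only, not kernel code: `ToyTask` and its methods are hypothetical, and a simple counter stands in for the kernel's preemption counter.

```python
class ToyTask:
    """Toy model of the two pieces of state that gate preemption."""

    def __init__(self):
        self.need_resched = False   # set when a better candidate becomes runnable
        self.preempt_count = 0      # greater than 0 while a lock is held
        self.preemptions = 0        # how many times this toy task was preempted

    def _maybe_preempt(self):
        # Preempt only if a reschedule is pending and no lock is held.
        if self.need_resched and self.preempt_count == 0:
            self.preemptions += 1   # stands in for a call to schedule()
            self.need_resched = False

    def lock(self):
        self.preempt_count += 1

    def unlock(self):
        self.preempt_count -= 1
        self._maybe_preempt()       # code has just become preemptible again

    def interrupt(self, wakes_higher_priority_task):
        if wakes_higher_priority_task:
            self.need_resched = True
        self._maybe_preempt()       # checked on the way out of the handler

if __name__ == "__main__":
    task = ToyTask()
    task.lock()
    task.interrupt(wakes_higher_priority_task=True)
    print("preempted while holding lock:", task.preemptions)
    task.unlock()
    print("preempted after unlock:", task.preemptions)
```

In this model an interrupt that arrives while a lock is held only records the request, and the preemption happens at the moment the lock is released.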

✅ Summary

Linux scheduling is fundamentally about balancing fairness, throughput, and responsiveness. By combining scheduling classes, priorities, and preemption in both user space and kernel space, the kernel keeps interactive tasks responsive while CPU-intensive workloads continue to make progress.

Understanding these principles provides essential context for performance tuning, real-time workloads, and kernel-level debugging.