Linux 7.2 Scheduler CAS: Storage, Rust, and Kernel Performance Fixes

The Linux 7.2 merge window in mid-2026 introduces a tightly focused set of optimizations across the scheduler, storage stack, and Rust infrastructure. Rather than large-scale feature additions, this release is defined by surgical fixes targeting long-standing architectural assumptions that no longer match modern hardware and workload behavior.

🧠 Cache-Aware Scheduling (CAS) and CPU Topology Evolution

One of the most significant changes is the introduction of Cache-Aware Scheduling (CAS) under CONFIG_SCHED_CACHE, addressing a structural blind spot in the Linux scheduler around modern multi-LLC CPU designs.

Legacy Scheduler Assumptions vs Modern Hardware

Historically, the scheduler's task placement largely behaved as if all cores within a socket shared a unified Last Level Cache (LLC). This model breaks down on contemporary server CPUs:

AMD EPYC (Naples): Multi-die CCX-based L3 partitioning
AMD EPYC (Rome and beyond): chiplet design with CCDs and a separate I/O die, where each CCX has its own L3
Intel Xeon 6: Partitioned LLC and hybrid topology designs

Without awareness of these boundaries, the scheduler may place cache-sharing tasks across separate LLC domains, causing:

Increased memory latency from cross-domain cache misses
Cache line bouncing between partitions
Reduced throughput in shared-data workloads
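
The following small Rust model makes this failure mode concrete. It groups CPUs into LLC domains and counts how many pairs of data-sharing tasks end up split across domains. It is a sketch that assumes four CPUs per LLC with contiguous numbering (loosely mirroring a Zen 2 CCX). The `Topology`, `llc_of`, and `cross_llc_pairs` names are hypothetical, not kernel interfaces.

```rust
/// Hypothetical topology model: CPUs are numbered contiguously and every
/// `cpus_per_llc` consecutive CPUs share one last-level cache.
struct Topology {
    cpus_per_llc: usize,
}

impl Topology {
    /// Returns the LLC domain a given CPU belongs to.
    fn llc_of(&self, cpu: usize) -> usize {
        cpu / self.cpus_per_llc
    }
}

/// Counts data-sharing task pairs whose CPUs sit in different LLC domains.
/// `placement[t]` is the CPU that task `t` runs on; `sharing` lists task
/// pairs that touch the same data.
fn cross_llc_pairs(topo: &Topology, placement: &[usize], sharing: &[(usize, usize)]) -> usize {
    sharing
        .iter()
        .filter(|&&(a, b)| topo.llc_of(placement[a]) != topo.llc_of(placement[b]))
        .count()
}

fn main() {
    // Assumption: 4 CPUs per LLC domain.
    let topo = Topology { cpus_per_llc: 4 };
    // Four tasks of one process; tasks 1..=3 all share data with task 0.
    let sharing = [(0, 1), (0, 2), (0, 3)];

    // Topology-blind spreading: one task per LLC domain.
    let spread = [0, 4, 8, 12];
    // Cache-aware packing: all four tasks inside LLC domain 0.
    let packed = [0, 1, 2, 3];

    println!("spread: {} cross-LLC pairs", cross_llc_pairs(&topo, &spread, &sharing));
    println!("packed: {} cross-LLC pairs", cross_llc_pairs(&topo, &packed, &sharing));
}
```

Spreading the tasks one per domain leaves every sharing pair straddling two caches, while packing them into one domain leaves none. Each straddling pair is a candidate for the cross-domain misses and cache line bouncing listed above.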

CAS Design Principles

CAS introduces explicit LLC-domain awareness into scheduling decisions:

Tasks with shared data are preferentially co-located within the same LLC domain
Cross-LLC migrations incur higher scheduling cost weights
Load balancing respects cache topology boundaries while preserving fairness

This targets database, networking, and high-concurrency workloads, where cache locality often dominates performance.

Legacy:
LLC A <---- cache bounce ----> LLC B
Tasks migrate freely, ignoring cache boundaries

CAS:
[ LLC Domain A ]  <--- high migration cost --->  [ LLC Domain B ]
Co-located tasks remain cache-local
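
A toy CPU selector can illustrate the cost-weighting idea. Each candidate CPU is scored by its load, and CPUs outside the task's preferred LLC domain pay a fixed migration penalty. The `pick_cpu` function, the `Cpu` fields, and the `CROSS_LLC_PENALTY` value are hypothetical. This is a sketch of the principle, not the CONFIG_SCHED_CACHE implementation.

```rust
/// Hypothetical per-CPU state seen by the placement function.
struct Cpu {
    id: usize,
    llc: usize, // LLC domain this CPU belongs to
    load: u32,  // abstract load units
}

/// Assumed extra cost for placing a task outside its preferred LLC domain.
const CROSS_LLC_PENALTY: u32 = 30;

/// Picks the CPU with the lowest effective cost. CPUs outside `preferred_llc`
/// are charged CROSS_LLC_PENALTY, so cache-local placement wins unless the
/// local domain is busier by more than the penalty. Ties go to the first CPU.
fn pick_cpu(cpus: &[Cpu], preferred_llc: usize) -> Option<usize> {
    cpus.iter()
        .min_by_key(|c| {
            let penalty = if c.llc == preferred_llc { 0 } else { CROSS_LLC_PENALTY };
            c.load + penalty
        })
        .map(|c| c.id)
}

fn main() {
    let cpus = [
        Cpu { id: 0, llc: 0, load: 40 },
        Cpu { id: 1, llc: 0, load: 25 },
        Cpu { id: 4, llc: 1, load: 10 },
    ];
    // CPU 4 is idler (10 vs 25) but remote: 10 + 30 = 40 > 25, so CPU 1 wins.
    println!("{:?}", pick_cpu(&cpus, 0)); // Some(1)
}
```

A fixed penalty keeps the trade-off explicit. The task stays cache-local until a remote CPU is less loaded by more than the penalty, and at that point load balancing takes over. This is one simple way to respect topology without giving up fairness.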

Early evaluations on EPYC and Xeon platforms show measurable gains in:

PostgreSQL throughput
Key-value store latency (e.g., Valkey-style workloads)
Network packet processing under load

💾 Storage Stack Optimization: EXT4 and XFS IOPS Gains

A seemingly minor patch to the iomap infrastructure shared by filesystems produces a meaningful performance improvement across EXT4 and XFS.

Eliminating Redundant Memory Writes in `iomap_iter()`

In the shared iomap_iter() path, a defensive memset() was historically used to clear an iomap structure after each iteration. However, analysis under NVMe 4K random read workloads revealed a subtle inefficiency:

The structure is discarded at loop completion
The final memory wipe provides no functional benefit
The memset() consumes memory bandwidth on hot I/O paths

By removing this redundant operation, high-IOPS workloads using io_uring and NVMe polling observe:

~5% IOPS improvement in EXT4 and XFS paths
Reduced memory bandwidth pressure in tight loops

This change illustrates a recurring kernel theme: correctness-preserving micro-optimizations can still unlock measurable gains in modern storage stacks.
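
The shape of the fix can be sketched with a simplified iterator. It clears its reusable mapping only when a new mapping is about to be filled in, and it returns early on the terminating call. `MapIter`, `Mapping`, and the `clears` counter are hypothetical stand-ins for the kernel's iteration state and `struct iomap`. The sketch illustrates the pattern, not the actual patch.

```rust
/// Simplified stand-in for a per-extent mapping (hypothetical).
#[derive(Default)]
struct Mapping {
    offset: u64,
    length: u64,
}

/// Simplified stand-in for the iteration state (hypothetical).
struct MapIter {
    pos: u64,         // next byte to map
    end: u64,         // end of the requested range
    extent: u64,      // hypothetical fixed extent size
    mapping: Mapping, // reused across iterations
    clears: u32,      // how often `mapping` was wiped
}

impl MapIter {
    /// Advances to the next mapping; returns false once the range is done.
    fn next_mapping(&mut self) -> bool {
        // Consume the previous mapping (length is 0 before the first call).
        self.pos += self.mapping.length;
        if self.pos >= self.end {
            // Terminal path: the state is about to be discarded, so wiping
            // `mapping` here would be a wasted store on the hot path.
            return false;
        }
        // Clear only when a fresh mapping will actually be filled in.
        self.mapping = Mapping::default();
        self.clears += 1;
        self.mapping.offset = self.pos;
        self.mapping.length = self.extent.min(self.end - self.pos);
        true
    }
}

fn main() {
    let mut it = MapIter { pos: 0, end: 16 * 1024, extent: 4096, mapping: Mapping::default(), clears: 0 };
    let mut extents = 0;
    while it.next_mapping() {
        // Per-extent I/O would be issued here.
        println!("extent at {} ({} bytes)", it.mapping.offset, it.mapping.length);
        extents += 1;
    }
    // The terminating call returned without a wipe, so clears == extents.
    println!("extents={extents} clears={}", it.clears);
}
```

If the wipe also ran on the terminating call, the loop would perform five clears for four extents. The last clear would touch memory that is never read again. In a 4K random read workload, each I/O is typically a single extent, so that extra store would be paid on nearly every request.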

fs-verity Expansion for XFS

The storage updates also advance integrity support by adding fs-verity support to XFS. The Merkle tree is stored past the end of the file data (a post-EOF layout), closing a long-standing feature gap with EXT4.
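
In a post-EOF layout, the Merkle tree occupies file offsets beyond the data, so its size and start offset follow from the file size alone. The sketch below computes both. It assumes 4 KiB tree blocks and SHA-256 (32-byte digests, so 128 hashes per block). It also borrows ext4's rule of starting the tree at the first 64 KiB boundary past EOF. Whether XFS uses the same alignment is an assumption here, and `merkle_layout` is a hypothetical helper.

```rust
/// Rounds `x` up to the next multiple of `align` (align must be non-zero).
fn round_up(x: u64, align: u64) -> u64 {
    (x + align - 1) / align * align
}

/// Returns (tree_start_offset, blocks_per_level) for a file of `size` bytes.
/// Levels are listed bottom-up: the first level hashes the data blocks.
fn merkle_layout(size: u64, block_size: u64, digest_size: u64, align: u64) -> (u64, Vec<u64>) {
    let hashes_per_block = block_size / digest_size;
    let mut blocks = (size + block_size - 1) / block_size; // data blocks
    let mut levels = Vec::new();
    // Keep hashing until one block remains; its hash is the root hash.
    while blocks > 1 {
        blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
        levels.push(blocks);
    }
    // Assumed placement: first `align` boundary at or beyond EOF.
    (round_up(size, align), levels)
}

fn main() {
    let size = 1u64 << 30; // 1 GiB file
    let (start, levels) = merkle_layout(size, 4096, 32, 64 * 1024);
    let tree_blocks: u64 = levels.iter().sum();
    println!("tree starts at {start}, levels {levels:?}, {} KiB", tree_blocks * 4);
}
```

For a 1 GiB file, this yields three levels of 2048, 16, and 1 blocks. That is about 8 MiB of hashes, under 1% overhead. The hashes sit beyond the data, so ordinary reads up to EOF are unaffected.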

🦀 Rust Infrastructure: Zerocopy and Safer Kernel Memory Patterns

Linux’s Rust subsystem continues to evolve toward safer low-level memory manipulation patterns, highlighted by the integration of the zerocopy library.

Replacing Fragmented Unsafe Patterns

Kernel Rust code frequently interacts with:

Raw pointers
C ABI structures
Byte-level casting operations

These operations traditionally rely on scattered unsafe blocks, increasing audit complexity.

Zerocopy Abstraction Model

The zerocopy approach shifts safety guarantees into type-level abstractions:

The FromBytes, IntoBytes, and Unaligned traits encode layout guarantees
Derive macros check those guarantees at compile time, replacing repeated hand-written unsafe impl blocks
Byte reinterpretation becomes declarative rather than procedural

This effectively centralizes memory-safety reasoning at the type definition level instead of dispersing it across call sites.

In practice, this reduces unsafe surface area and improves auditability of Rust subsystems interacting with legacy kernel components.
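
The idea of concentrating unsafe reasoning at the type definition can be illustrated without the zerocopy crate. In this std-only sketch, a hypothetical `FromBytesLike` unsafe marker trait plays the role of zerocopy's `FromBytes`. A single `unsafe impl` sits next to the struct and records the layout argument once. Every call site then uses the safe `read_prefix` function. zerocopy generates and checks such impls with derive macros, and this hand-written version only approximates that.

```rust
use std::mem::size_of;
use std::ptr;

/// Marker trait: implementors promise that any `size_of::<Self>()` bytes
/// form a valid value.
///
/// # Safety
/// Only implement for `#[repr(C)]` types whose fields are all plain integers.
unsafe trait FromBytesLike: Copy {}

/// A C-ABI header as it might arrive from C code or a device (hypothetical).
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct WireHeader {
    magic: u32,
    len: u32,
}

// SAFETY: repr(C) with two u32 fields; every 8-byte pattern is a valid value.
unsafe impl FromBytesLike for WireHeader {}

/// Safe, reusable reinterpretation. This is the only place that performs the
/// raw read, so the unsafe reasoning lives here and at the impl above.
fn read_prefix<T: FromBytesLike>(bytes: &[u8]) -> Option<T> {
    if bytes.len() < size_of::<T>() {
        return None;
    }
    // SAFETY: length checked above, `read_unaligned` tolerates any alignment,
    // and `T: FromBytesLike` guarantees every bit pattern is valid.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr().cast::<T>()) })
}

fn main() {
    let mut buf = Vec::new();
    buf.extend_from_slice(&0xF00Du32.to_ne_bytes());
    buf.extend_from_slice(&64u32.to_ne_bytes());
    let hdr: WireHeader = read_prefix(&buf).expect("buffer too short");
    println!("{:?}", hdr);
    // A truncated buffer is rejected instead of being read out of bounds.
    println!("{:?}", read_prefix::<WireHeader>(&buf[..4]));
}
```

Adding a `bool` field to `WireHeader` would make the `unsafe impl` unsound, because only 0 and 1 are valid `bool` values. zerocopy's `FromBytes` derive rejects such a type at compile time. That is exactly the check this hand-written marker cannot perform.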

Performance Side Effects

Beyond the safety work, the Rust Binder driver's hot paths were optimized with AutoFDO (automatic feedback-directed optimization). Roughly 13% performance improvement was observed in targeted workloads.

📂 `/proc/filesystems` RCU Rewrite and 444% Read Speedup

Another high-impact change comes from restructuring how /proc/filesystems is generated and served.

Legacy Implementation Bottleneck

The original implementation relied on:

A linked-list traversal for every read
Line-by-line seq_printf() formatting
Frequent access from user-space utilities (e.g., SELinux-related userspace libraries and tools)

Although the file is small, its access frequency makes it a measurable hotspot.
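
The legacy read path can be approximated as follows. Every read walks a singly linked list of registered filesystems and formats one line per entry, in the `nodev\t<name>` / `\t<name>` shape that /proc/filesystems uses. The `FsType` list and the `render_legacy` function are hypothetical Rust stand-ins for the kernel's filesystem list and its show routine.

```rust
use std::fmt::Write;

/// One registered filesystem, linked to the next (hypothetical model).
struct FsType {
    name: &'static str,
    requires_dev: bool, // block-device-backed, so no "nodev" tag
    next: Option<Box<FsType>>,
}

/// Legacy model: each read chases every pointer and formats every entry.
fn render_legacy(head: &Option<Box<FsType>>) -> String {
    let mut out = String::new();
    let mut cur = head.as_deref();
    while let Some(fs) = cur {
        let tag = if fs.requires_dev { "" } else { "nodev" };
        writeln!(out, "{}\t{}", tag, fs.name).unwrap();
        cur = fs.next.as_deref();
    }
    out
}

fn main() {
    let list = Some(Box::new(FsType {
        name: "sysfs",
        requires_dev: false,
        next: Some(Box::new(FsType { name: "ext4", requires_dev: true, next: None })),
    }));
    // Every read repeats the full traversal and formatting work.
    for _ in 0..3 {
        print!("{}", render_legacy(&list));
    }
}
```

The pointer chasing and the per-entry formatting are both repeated on every read, even though the list only changes when a filesystem is registered or unregistered.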

RCU-Based Static Output Model

The redesigned implementation replaces runtime traversal with:

RCU-backed updates on filesystem registration/unregistration
Pre-generated cached output string
Constant-time read path without per-entry formatting

This removes pointer chasing and repeated formatting from the read path.
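
Rust's standard library has no RCU, but the publish-a-snapshot idea can be approximated with an `Arc<String>` swapped under a lock. Writers rebuild the text once per registration or unregistration, and readers only clone the current `Arc`. In the kernel, readers would instead run inside an `rcu_read_lock()` section without taking a lock. The `Registry` type and its methods are hypothetical.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical registry that keeps a pre-rendered copy of its own listing.
struct Registry {
    // Writer-side list of (name, requires_dev); the mutex also serializes writers.
    entries: Mutex<Vec<(&'static str, bool)>>,
    // Published snapshot shared by readers.
    snapshot: RwLock<Arc<String>>,
}

impl Registry {
    fn new() -> Self {
        Registry {
            entries: Mutex::new(Vec::new()),
            snapshot: RwLock::new(Arc::new(String::new())),
        }
    }

    /// Renders the listing once and publishes it as a new snapshot. Readers
    /// still holding the old Arc keep a consistent view until they drop it,
    /// loosely mirroring an RCU grace period.
    fn publish(&self, entries: &[(&'static str, bool)]) {
        let text: String = entries
            .iter()
            .map(|&(name, dev)| format!("{}\t{}\n", if dev { "" } else { "nodev" }, name))
            .collect();
        *self.snapshot.write().unwrap() = Arc::new(text);
    }

    fn register(&self, name: &'static str, requires_dev: bool) {
        let mut entries = self.entries.lock().unwrap();
        entries.push((name, requires_dev));
        self.publish(&entries);
    }

    fn unregister(&self, name: &str) {
        let mut entries = self.entries.lock().unwrap();
        entries.retain(|&(n, _)| n != name);
        self.publish(&entries);
    }

    /// Read path: no traversal or formatting, just a reference-count bump.
    fn read(&self) -> Arc<String> {
        Arc::clone(&self.snapshot.read().unwrap())
    }
}

fn main() {
    let reg = Registry::new();
    reg.register("sysfs", false);
    reg.register("ext4", true);
    let before = reg.read();
    reg.unregister("sysfs");
    // The old snapshot is untouched; new readers see the updated text.
    print!("before:\n{before}after:\n{}", reg.read());
}
```

The cost model is inverted. Rendering moves to the rare registration path, and the hot read path shrinks to handing out a shared, already-formatted buffer.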

Result

Up to 444% read performance improvement
Linear scalability under concurrent reads
Near-zero overhead for repeated system queries

This change demonstrates how even trivial procfs interfaces can become bottlenecks under high-frequency system workloads.

🧩 Conclusion: When Kernel Assumptions Break at Scale

Linux 7.2 does not introduce large architectural overhauls. Instead, it systematically removes outdated assumptions embedded deep in core subsystems:

Scheduler locality unaware of modern LLC partitioning
Storage paths wasting bandwidth on redundant operations
Rust abstractions still fragmented around unsafe usage
Procfs implementations relying on legacy traversal models

Each fix is small in isolation, but collectively they reflect a consistent theme: performance losses in modern systems often stem not from complexity, but from outdated mental models encoded in long-lived abstractions.