Understanding Linux Page Cache and Memory Caching

Understanding Linux Page Cache

The Page Cache (often called the disk cache) is a transparent memory layer that the Linux kernel uses to cache file data read from or written to disk. As files are accessed, the kernel fills otherwise unused physical memory with Page Cache and releases it again when applications need the memory, improving I/O performance without requiring application awareness.

🚀 Why Page Cache Exists

The primary goal of the Page Cache is to reduce disk I/O and accelerate file access. Linux exploits two fundamental principles:

Temporal locality: Recently accessed data is likely to be accessed again.
Spatial locality: Nearby data is likely to be accessed together.

Since RAM access is orders of magnitude faster than HDDs or SSDs, caching file data in memory delivers substantial performance gains.

When a file is read for the first time, its contents are loaded from disk and stored in the Page Cache. Subsequent reads are served directly from RAM.
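
One way to observe this effect is to time two consecutive reads of the same file. The sketch below is illustrative only: the helper names `evict_from_cache` and `timed_read` are invented for this example, it assumes a Linux system, and `POSIX_FADV_DONTNEED` is merely a hint that the kernel may ignore, so the measured difference will vary from system to system.

```python
import os
import time


def evict_from_cache(path):
    """Ask the kernel to drop cached pages for `path` (a hint, not a guarantee)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty pages cannot be dropped, so flush them to disk first.
        os.fsync(fd)
        # posix_fadvise is only available on some platforms (assumed: Linux).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    # buffering=0 avoids Python's own buffer; the kernel Page Cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling `evict_from_cache(path)` and then `timed_read(path)` twice on a large file will often show the second read finishing faster, because it is served from the Page Cache rather than from disk.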

Note: Applications can bypass the Page Cache using O_DIRECT (Direct I/O) or implement their own caching layers, such as the MySQL Buffer Pool.

🧬 Evolution of Page Cache in Linux

Linux file caching has matured significantly across kernel versions:

Before Linux 2.4: Page Cache (file pages, typically 4 KB) and Buffer Cache (raw disk blocks, typically 1 KB) were separate, often resulting in duplicated cached data.
Linux 2.4: The two caches were partially merged. File pages lived in the Page Cache, while the Buffer Cache pointed to them.
Linux 2.6 and later: Full integration. A Page Cache entry may reference multiple Buffer Cache blocks. The Page Cache manages file-level semantics, while the Buffer Cache tracks disk block mappings.

📏 Calculating Page Cache Size

Kernel memory statistics are available via:

cat /proc/meminfo

A practical approximation of the Page Cache size is:

$$ \text{Page Cache} \approx \text{Buffers} + \text{Cached} + \text{SwapCached} $$

which is approximately equivalent to:

$$ \text{Active(file)} + \text{Inactive(file)} + \text{Shmem} + \text{SwapCached} $$
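
The two approximations can be computed from `/proc/meminfo`-style text as in the following minimal sketch. The function names `parse_meminfo` and `page_cache_estimates` are hypothetical, and the `SAMPLE` values are invented for illustration, not real measurements.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (such as HugePages counts) are plain numbers.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def page_cache_estimates(fields):
    """Return both Page Cache approximations from the text, in kB."""
    by_counters = fields["Buffers"] + fields["Cached"] + fields["SwapCached"]
    by_lru = (fields["Active(file)"] + fields["Inactive(file)"]
              + fields["Shmem"] + fields["SwapCached"])
    return by_counters, by_lru


# Hypothetical sample values, invented for illustration only.
SAMPLE = """\
Buffers:          100000 kB
Cached:          2000000 kB
SwapCached:         5000 kB
Active(file):     800000 kB
Inactive(file):  1195000 kB
Shmem:            100000 kB
"""

by_counters, by_lru = page_cache_estimates(parse_meminfo(SAMPLE))
print(f"Buffers + Cached + SwapCached:        {by_counters} kB")
print(f"Active/Inactive(file) + Shmem + Swap: {by_lru} kB")
```

On a live Linux system, the same functions can be fed the contents of `/proc/meminfo`, for example `parse_meminfo(open("/proc/meminfo").read())`. The two results are usually close but not necessarily identical.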

Key Fields Explained

Buffers: Cache for raw disk blocks, mainly filesystem metadata and direct block-device access.
Cached: Page cache for file contents, including tmpfs and shared memory (Shmem) but excluding SwapCached.
SwapCached: Pages that were swapped out and later read back into memory but still have a copy in swap storage.
Active(file) / Inactive(file): File-backed pages managed by an LRU policy. Inactive pages are reclaimed first under memory pressure.
Shmem: Shared anonymous memory and tmpfs-backed files.

🧠 Page Cache vs `free` Command’s `buff/cache`

A common source of confusion is the buff/cache column shown by:

free -h

Its calculation is:

$$ \text{buff/cache} = \text{Buffers} + \text{Cached} + \text{SReclaimable} $$
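
As a rough cross-check, the same sum can be reproduced from `/proc/meminfo`-style text. This is a sketch that assumes the formula above; the helper names `read_kb` and `buff_cache_kb` are hypothetical, and the sample numbers are invented for illustration.

```python
def read_kb(meminfo_text, field):
    """Return the value (in kB) of a single /proc/meminfo-style field."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def buff_cache_kb(meminfo_text):
    """Buffers + Cached + SReclaimable, following the formula above."""
    return sum(read_kb(meminfo_text, f)
               for f in ("Buffers", "Cached", "SReclaimable"))


# Hypothetical sample values, invented for illustration only.
SAMPLE = """\
Buffers:          100000 kB
Cached:          2000000 kB
SReclaimable:     150000 kB
"""

print(buff_cache_kb(SAMPLE), "kB")
```

On a live system, passing the contents of `/proc/meminfo` to `buff_cache_kb` should give a value close to the buff/cache column of `free -k`, allowing for small changes between the two reads.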

What Is SReclaimable?

Linux uses the Slab allocator to cache frequently used kernel objects, including filesystem metadata:

Inodes: Store file metadata such as size, permissions, and disk location.
Dentries: Represent directory entries and filename-to-inode mappings.

The SReclaimable portion of the Slab contains these structures and can be reclaimed when memory is needed.

📊 Memory Reclamation Summary

| Memory Type | Description | Reclaimability |
| --- | --- | --- |
| File-backed pages | Page Cache, Buffer Cache, reclaimable Slab | Reclaimable (dirty pages must be flushed first) |
| Anonymous pages | Heap, stack, `malloc`, anonymous `mmap` | Not directly reclaimable (must be swapped) |
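
To gauge how much cached data would need to be written back before its pages could be reclaimed, the `Dirty` and `Writeback` fields of `/proc/meminfo` can be summed. These two fields are not covered above and are brought in here as an assumption about what is useful to inspect. The function name `pending_writeback_kb` and the sample values are hypothetical.

```python
import re


def pending_writeback_kb(meminfo_text):
    """Sum Dirty and Writeback (kB): data that must reach disk before its pages can be freed."""
    # Only simple field names are matched; names with parentheses are skipped.
    values = dict(re.findall(r"^(\w+):\s+(\d+)", meminfo_text, flags=re.MULTILINE))
    return int(values["Dirty"]) + int(values["Writeback"])


# Hypothetical sample values, invented for illustration only.
SAMPLE = """\
Cached:          2000000 kB
Dirty:              1200 kB
Writeback:           300 kB
"""

print(pending_writeback_kb(SAMPLE), "kB waiting to be flushed")
```

A small value relative to `Cached` suggests that most file-backed pages are clean and can be dropped without any disk writes.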

🧩 Key Takeaway

The Page Cache represents Linux’s intelligent use of free memory to accelerate disk access, while buff/cache reflects memory that the kernel can largely reclaim if applications demand it: clean pages immediately, dirty pages once they have been flushed to disk. High Page Cache usage is not a problem; it is usually a sign of a healthy, efficient Linux system.