Linux mmap Explained: How Memory Mapping Really Works


🧠 What Is mmap?

mmap (memory map) is a Linux system call that maps files or devices directly into a process’s virtual address space.
Once mapped, a file can be accessed as if it were normal memory, using simple pointer dereferences.

Instead of the program explicitly calling read() or write(), the kernel moves data between disk and RAM on demand, as the mapped pages are touched.
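As a minimal sketch of what "accessing a file as memory" looks like, the function below maps a file read-only and scans it with an ordinary loop. The name count_byte_mmap and its error convention (returning -1) are illustrative choices, not part of any API.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count occurrences of byte `c` in the file at `path` by mapping it
 * read-only and scanning it through a pointer. Returns -1 on error.
 * (Hypothetical helper, written for illustration.) */
long count_byte_mmap(const char *path, char c)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == c)           /* plain memory read, no read() call */
            count++;

    munmap((void *)data, (size_t)st.st_size);
    return count;
}
```

Calling count_byte_mmap(path, '\n') gives a line count without a single read() call; the kernel brings pages in as the loop reaches them.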


📦 Core Concepts

At a high level, mmap establishes a relationship between:

  • A file offset on disk
  • A range of virtual addresses in a process

This enables several powerful behaviors:

  • Pointer-based access: Read and write file data using normal memory operations
  • Lazy loading: No data is copied into RAM until it is actually accessed
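  • Automatic syncing: For shared file mappings, the kernel writes modified pages back to disk in the background; msync() is needed only when a write must be guaranteed at a specific point
  • Shared visibility: Processes that map the same file with MAP_SHARED see each other’s changes, because they share the same page cache pages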
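The shared-visibility behavior can be sketched with a parent and child process that share one file-backed mapping. This is an illustration under assumptions: the function name shared_store_via_child is made up, the file at path is created or truncated by the function, and -1 doubles as the error value (so it is not a useful value to store).

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one int from the file at `path` with MAP_SHARED, let a child process
 * store `value` into it, and return what the parent then reads through the
 * same mapping. Returns -1 on error. (Hypothetical helper.) */
int shared_store_via_child(const char *path, int value)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, sizeof(int)) < 0) {   /* file must cover the mapping */
        close(fd);
        return -1;
    }

    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    close(fd);
    if (shared == MAP_FAILED)
        return -1;

    *shared = 0;
    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0) {             /* child: inherits the shared mapping */
        *shared = value;
        _exit(0);
    }

    waitpid(pid, NULL, 0);      /* parent: wait, then read the child's store */
    int seen = *shared;
    munmap(shared, sizeof(int));
    return seen;
}
```

With MAP_PRIVATE in place of MAP_SHARED, the child's store would land in its own copy-on-write page and the parent would still read 0.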

🧬 Kernel Data Structures

Linux tracks memory mappings using well-defined kernel structures:

  • task_struct
    Represents a process and points to its memory descriptor

  • mm_struct
    Describes the entire virtual address space of the process

  • vm_area_struct (VMA)
    Represents a single contiguous virtual memory region with:

    • Start and end addresses
    • Access permissions (vm_page_prot)
    • Behavior flags (vm_flags)
    • Optional file backing information

Each mmap() call typically creates one new VMA, although the kernel may merge it with an adjacent VMA that has compatible permissions and backing.
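User space cannot read vm_area_struct directly, but on Linux each line of /proc/self/maps corresponds to one VMA. The sketch below looks up the region that contains a given address; find_vma here is a hypothetical user-space helper and not the kernel function of the same name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Search /proc/self/maps for the region containing `addr`.
 * On success, store its bounds and permission string (e.g. "r--p")
 * and return 0; return -1 if nothing matches or the file is unreadable.
 * (Hypothetical user-space helper, for illustration.) */
int find_vma(const void *addr, uintptr_t *start, uintptr_t *end, char perms[5])
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    char line[512];
    int found = -1;
    while (fgets(line, sizeof line, maps)) {
        uintptr_t lo, hi;
        char p[5] = {0};
        /* Each line starts with: start-end perms offset dev inode [path] */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s", &lo, &hi, p) != 3)
            continue;
        if ((uintptr_t)addr >= lo && (uintptr_t)addr < hi) {
            *start = lo;
            *end = hi;
            for (int i = 0; i < 5; i++)
                perms[i] = p[i];
            found = 0;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Passing a pointer returned by mmap() reports the region that now covers it. Because of VMA merging, the reported range can be larger than the length that was requested.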


🔄 How mmap Works Internally

Phase 1: Virtual Area Creation

When a process calls mmap():

  1. The kernel finds a free virtual address range
  2. A new vm_area_struct is allocated
  3. The VMA is inserted into the per-process VMA index held in mm_struct (a maple tree since Linux 6.1; a red-black tree plus linked list in older kernels)

At this point, no physical memory is allocated.


Phase 2: File Association

The kernel links the VMA to:

  • The file referenced by the file descriptor
  • The specified file offset

Still, no data is loaded into RAM. The mapping is purely virtual.


Phase 3: Page Fault Handling

When the process accesses mapped memory:

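  1. The first access to a page that has no page table entry triggers a page fault
  2. The kernel finds the VMA covering the faulting address and determines the corresponding file page
  3. If that page is not already in the page cache, it is read from disk into physical memory
  4. Page tables are updated to point at the page
  5. Execution resumes transparently at the faulting instruction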

From the process’s perspective, it feels like normal memory access.
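The fault-driven loading described in Phase 3 can be observed from user space with getrusage(), which reports the process's minor and major fault counters. The sketch below is a rough illustration under assumptions: the helper names are invented, the file at path is created by the function, and the exact count depends on the kernel (Linux may map several neighbouring cached pages per fault, so the result is often lower than the number of pages touched).

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Total page faults (minor + major) taken by this process so far. */
static long fault_count(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) < 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/* Create a file of `pages` pages at `path`, map it, touch one byte per page,
 * and return how many page faults the touching caused (-1 on error).
 * (Hypothetical helper, for illustration.) */
long faults_from_touching(const char *path, long pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page <= 0)
        return -1;
    size_t len = (size_t)pages * (size_t)page;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    volatile unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    long before = fault_count();    /* mapping exists, nothing touched yet */
    unsigned sum = 0;
    for (long i = 0; i < pages; i++)
        sum += p[(size_t)i * (size_t)page];  /* first touch may fault */
    long after = fault_count();

    munmap((void *)p, len);
    (void)sum;
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

The mmap() call itself is not counted: the counter is sampled after the mapping exists, so any increase comes from the accesses alone.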


🧪 mmap API Overview

Function Prototype

#include <sys/mman.h>

void *mmap(
    void *addr,
    size_t length,
    int prot,
    int flags,
    int fd,
    off_t offset
);

Key Parameters

  • prot

    • PROT_READ, PROT_WRITE, PROT_EXEC
  • flags

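    • MAP_SHARED: changes are visible to other processes mapping the same file and are written back to disk
    • MAP_PRIVATE: copy-on-write behavior; changes stay private to the process and are never written to the file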
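The difference between the two flags can be checked by storing through a mapping and then reading the file itself with pread(). This is a sketch under assumptions: file_byte_after_mapped_write is a made-up helper, and it recreates the file at path with the single byte "A" on every call.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write `c` to byte 0 of the file through a mapping created with `flags`
 * (MAP_SHARED or MAP_PRIVATE), then return byte 0 as read back from the
 * file itself with pread(). The file is (re)created holding "A".
 * Returns -1 on error. (Hypothetical helper, for illustration.) */
int file_byte_after_mapped_write(const char *path, int flags, char c)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "A", 1) != 1) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = c;                                /* store through the mapping */

    char on_file = 0;
    ssize_t n = pread(fd, &on_file, 1, 0);   /* what the file really holds */
    munmap(p, 1);
    close(fd);
    return n == 1 ? (unsigned char)on_file : -1;
}
```

On Linux, the MAP_SHARED call returns the new byte because pread() and the mapping share the same page cache page, while the MAP_PRIVATE call still returns 'A'.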

🛠️ Supporting System Calls

  • msync(): Flushes modified pages of a shared file mapping back to disk, either blocking until the write completes (MS_SYNC) or only scheduling it (MS_ASYNC)

  • munmap(): Removes the mapping and releases the virtual address range
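A typical update sequence is map, modify, msync(), munmap(). The following sketch overwrites the beginning of an existing file in place; overwrite_prefix is an illustrative name, and the choice to clamp the write to the current file size (rather than grow the file) is an assumption made to keep the example short.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Overwrite the start of an existing file with `text` through a shared
 * mapping, flush with msync(MS_SYNC), and unmap. The file is not grown:
 * at most the current file size is written. Returns bytes written or -1.
 * (Hypothetical helper, for illustration.) */
long overwrite_prefix(const char *path, const char *text)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    size_t n = strlen(text);
    if (n > len)
        n = len;                        /* never write past the mapped length */
    memcpy(p, text, n);

    int rc = msync(p, len, MS_SYNC);    /* block until the pages are written */
    if (munmap(p, len) < 0 || rc < 0)
        return -1;
    return (long)n;
}
```

Using MS_ASYNC instead would return immediately and leave the actual write to the kernel.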


⚖️ mmap vs read/write I/O

Traditional File I/O

Standard read() / write() typically involves two copies:

  1. Disk → Page Cache (kernel space)
  2. Page Cache → User Buffer (user space)

mmap-Based I/O

With mmap, only one copy is required:

  1. Disk → Page Cache

The page cache pages are then mapped directly into the process address space, so no second copy into a user buffer takes place.
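To make the two paths concrete, here is the same byte-sum computed both ways. It is a sketch with invented function names (sum_with_read, sum_with_mmap) and a 4096-byte buffer chosen arbitrarily; it shows where the extra copy sits and makes no claim about which version is faster on a given workload.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum all bytes using read(): the kernel copies page cache -> buf. */
int64_t sum_with_read(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char buf[4096];
    int64_t sum = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Sum all bytes using mmap(): the loop reads the mapped pages directly. */
int64_t sum_with_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    int64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    munmap((void *)p, len);
    return sum;
}
```

Both functions return the same value for the same file; the read() version issues one system call per buffer, while the mmap() version issues none inside the loop.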


🚀 Why mmap Is Faster

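  • Fewer system calls: once mapped, accesses are plain loads and stores
  • Fewer data copies: no extra copy from the page cache into a user buffer
  • Zero-copy access patterns: data can be parsed in place
  • Good scalability for large files: only the pages actually touched are brought into RAM
  • Natural fit for IPC and shared memory

This makes mmap a good fit for databases, multimedia processing, and large-scale data analysis. The gains depend on the workload, since every first touch of a page still costs a page fault.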


🔍 Important Technical Details

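  • Page alignment: The offset must be a multiple of the system page size (commonly 4 KB; query it with sysconf(_SC_PAGESIZE)). The length does not have to be; the kernel rounds the mapping up to whole pages

  • File descriptor lifetime: The mapping remains valid even after the file descriptor is closed

  • File growth: Access is valid only within the mapped VMA range, regardless of later file expansion

  • Error handling: mmap() returns MAP_FAILED (not NULL) on failure. Accessing an address outside any mapping, or in a way its protection forbids, raises SIGSEGV; touching a mapped page that lies entirely beyond the end of the file raises SIGBUS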
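The alignment rule matters as soon as the data of interest does not start on a page boundary. One common approach, sketched below with hypothetical helper names (page_floor, byte_at), is to round the offset down to a page boundary, map from there, and index into the mapping by the remainder.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Round a non-negative `offset` down to a multiple of the page size. */
off_t page_floor(off_t offset)
{
    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    return offset - (offset % page);
}

/* Return the byte at an arbitrary (possibly unaligned) `offset` in the file,
 * read through a mapping, or -1 on error. The caller must ensure that
 * `offset` is inside the file; touching pages past EOF raises SIGBUS.
 * (Hypothetical helper, for illustration.) */
int byte_at(const char *path, off_t offset)
{
    if (offset < 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t aligned = page_floor(offset);         /* what mmap() accepts */
    size_t delta = (size_t)(offset - aligned);  /* where our byte sits */

    const unsigned char *p = mmap(NULL, delta + 1, PROT_READ, MAP_PRIVATE,
                                  fd, aligned);
    close(fd);                  /* safe: the mapping keeps the file alive */
    if (p == MAP_FAILED)
        return -1;

    int value = p[delta];
    munmap((void *)p, delta + 1);
    return value;
}
```

Passing the unaligned offset straight to mmap() would fail with EINVAL, which is why the rounding step comes first.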


🧩 When to Use mmap

Use mmap when you need:

  • High-performance file access
  • Shared memory between processes
  • Efficient handling of very large files
  • Fine-grained control over memory behavior

Avoid it for small, short-lived I/O where setup cost outweighs benefits.


In short: mmap turns files into memory, page faults into I/O, and the kernel into your invisible data mover.
