Week 7.b CS6640 02/20 2026
https://naizhengtan.github.io/26spring/

□ 1. Page fault intro
□ 2. Classic use cases for page fault
□ 3. Device drivers
□ 4. Mechanics of communication

------
Admin:
  - Lab5?

1. page fault intro

* functionalities of virtual memory
  (1) translation: VA -> PA, for better programmability
  (2) protection: isolation + access control
      each process has its own address space
  (3) a level of indirection
      gives the kernel an opportunity to do cool stuff:
      - share a page
      - guard page
      - and more
      all transparent to user applications.

  "Any problem in computer science can be solved with another level of
   indirection."  -- Butler Lampson and David J. Wheeler

  Q: how is this "level of indirection" exercised?
  A: by page faults

* cool things you can do with vm
  - better performance/efficiency
    e.g., on-demand allocation
    e.g., copy-on-write fork
  - new features
    e.g., memory-mapped files

* Q: what is a page fault?
  a type of exception:
  a process accesses (R/W/X) a virtual address and fails
  -- the VA is not mapped to a PA
  -- permission violations

* a page fault is a form of trap (like a system call)

  egos-2k+: fatal error on page fault

  But you don't have to panic! Instead:
    update the page table instead of raising a fatal error
    restart the faulting instruction

  The combination of page faults and page table updates is powerful!

* RISC-V page fault mechanics
  -- three exceptions are related to page faults
     (instruction, load, and store page faults)
  -- exceptions cause controlled transfers to the kernel
  -- information we need at a page fault to do something interesting:
     1) the type of violation that caused the fault
        see the mcause register (instruction, load, or store page fault)
     2) the virtual address that caused the fault
        see the mtval register; the CPU sets it to the faulting address
     3) the instruction and mode where the fault occurred
        pc: mepc
        user vs. kernel: see below

  Q: how to tell if a page fault happened within the kernel?
  A: tell U/K mode apart, for example, by the stack pointer

  Q: how to tell what the problem behind a page fault is?
  A: by walking the page table

* example: on-demand page allocation

* motivation
  [a demo of malloc]
  Q: What do you expect to see?
  [page fault!]

* lazy/on-demand page allocation

* sbrk() is old fashioned; applications often ask for more memory than
  they need
  - for example, they allocate for the largest possible input,
    but will typically use less
  and if they ask for much, sbrk() could be expensive
  - for example, if all memory is in use, they have to wait until the
    kernel has evicted some pages to free up memory
  sbrk() allocates memory that may never be used.

* modern OSes allocate memory lazily
  adjust brk (in library/libc/malloc.c), but don't allocate;
  allocate physical memory only when the application needs it:
  when the application uses that memory, it causes a page fault
  on a page fault:
    allocate memory
    resume at the faulting instruction
  may use less memory: if never used, no fault, no allocation
  spreads the cost of allocation over the page faults instead of
  paying upfront in sbrk()

* [a demo of crash3]
  Q: What do you expect to see?

* [demo: implementing on-demand page allocation]
  Q: How to fix this problem? (mimicking a normal OS)
  [mapping VA->PA]

* a note on lab5 debugging
  * mepc
  * mtval

2. classic use cases

* copy-on-write fork
  [ask students: what is fork?]

* observation: fork copies all pages from the parent,
  but fork is often immediately followed by exec

* idea:
  share the address space between parent and child
  on a page fault, make a copy of the page and map it read/write
  need to refcount physical pages

  Q: how to decide if a page is COW or just read-only?
  A: many possible solutions; an elegant one:
     use the extra available system bits (RSW) in the PTEs

  In practice, implementing COW is non-trivial:
  see https://lwn.net/Articles/849638/

* overcommitting memory: use more virtual memory than there is
  physical memory

* observation: an application may need more memory than there is
  physical memory

* idea:
  think of memory as a cache for the disk
  page in and page out pages of the address space transparently

* works when the working set fits in physical memory
  most popular replacement strategy: least recently used (LRU)
  the A(ccess) bit in the PTE helps the kernel implement LRU

* replacement policies
  a huge topic in the OS literature

* memory-mapped files

* idea: allow access to files using loads and stores
  can easily read and write parts of a file
  e.g., don't have to change the offset using the lseek system call

* Unix systems provide a system call for memory-mapped files:
    void *mmap(void *addr, size_t length, int prot, int flags,
               int fd, off_t offset);

* the kernel pages in pages of a file on demand;
  when memory is full, it pages out pages of the file that are not
  frequently used

* page faults for user applications

* many useful kernel tricks use page faults;
  allow user apps to do such tricks too

* Linux: read "mmap/munmap" and "sigaction"

* a JVM example

* DSM: distributed shared memory

* idea: allow processes on different machines to share virtual memory
  gives the illusion of physical shared memory, across a network
  - replicate pages that are only read
  - invalidate copies on write

* Virtual memory is still evolving

  Recent changes in Linux
    KPTI to handle the Meltdown side channel
    (https://en.wikipedia.org/wiki/Kernel_page-table_isolation)

  Somewhat recent changes
    support for 5-level page tables (57 address bits!)
    support for ASIDs

  Less recent changes
    support for large pages
    NX (No eXecute); cf. RISC-V's PTE_X flag

[acknowledgement: Frans Kaashoek]

3. Device drivers

General:
  [show picture: CPU, Mem, I/O, connected to a BUS]

  Device drivers in general solve a software engineering problem ...
  [draw a picture: different devices have different shapes,
   and drivers fit them into the kernel]

  A driver exposes a well-defined interface to the kernel, so that the
  kernel can make comparatively simple read/write calls or whatever.
  For example: read, write, open, close, ...
  This abstracts away nasty hardware details so that the kernel doesn't
  have to understand them.
  When you write a driver, you are implementing this interface, and also
  calling functions that the kernel itself exposes.

  - Drivers: pieces of code talking to devices.

  Q: can I use a GPU driver from NVIDIA for AMD's GPU?
  Q: can I use NVIDIA's Windows driver for Linux?

4. Mechanics of communication between CPU and I/O devices

  -- lots of details
  -- fun to play with
  -- registers that do different things when read vs. written

  [draw some registers in devices, with status and data]

  CPU/device interaction
  (one can think of this as kernel/device interaction, since user-level
  processes classically do not interact with devices directly.)

  Q: if you were the I/O designer, how would you design the
     communication between the CPU and the device?

  (a) explicit I/O instructions
      [sometimes called port I/O, or port-mapped I/O (PMIO)]

      x86 instructions: outb, inb, outw, inw
      operands: an I/O address space (separate from the memory address
      space)

      [show slides]

  (b) memory-mapped I/O (MMIO)

      The physical address space is mostly ordinary RAM.
      But low-memory addresses (<1MB), sometimes called "DOS
      compatibility memory", actually refer to other things.
      You as a programmer read/write these addresses using loads and
      stores. But they aren't "real" loads and stores to memory. They
      turn into other things: reading device registers, sending
      instructions, reading/writing device memory, etc.

      -- the interface is the same as the interface to memory
         (load/store)
      -- but it does not behave like memory
         + reads and writes can have side effects
         + read results can change due to external events

      Example: writing to VGA or CGA memory makes things appear on the
      screen.
      To avoid confusion: this is not the same thing as virtual memory;
      here we are talking about *physical* addresses.
      --> is this an abstraction that the OS provides to others, or an
          abstraction that the hardware provides to the OS?
          [the latter]

      [if you're interested, check out these slides:
       https://opensecuritytraining.info/IntroBIOS_files/Day1_00_Advanced%20x86%20-%20BIOS%20and%20SMM%20Internals%20-%20Motivation.pdf]

      * Here is a 32-bit PC's physical memory map:

        +------------------+  <- 0xFFFFFFFF (4GB)
        |      32-bit      |
        |  memory mapped   |
        |     devices      |
        |                  |
        /\/\/\/\/\/\/\/\/\/\
        /\/\/\/\/\/\/\/\/\/\
        |                  |
        |      Unused      |
        |                  |
        +------------------+  <- depends on amount of RAM
        |                  |
        | Extended Memory  |
        |                  |
        +------------------+  <- 0x00100000 (1MB)
        |     BIOS ROM     |
        +------------------+  <- 0x000F0000 (960KB)
        | 16-bit devices,  |
        |  expansion ROMs  |
        +------------------+  <- 0x000C0000 (768KB)
        |   VGA Display    |
        +------------------+  <- 0x000A0000 (640KB)
        |                  |
        |    Low Memory    |
        |                  |
        +------------------+  <- 0x00000000

      [Credit to Frans Kaashoek, Robert Morris, and Nickolai Zeldovich
       for this picture]

  (c) interrupts

      Hardware can send "signals" to CPUs. These are interrupts.
      One example: the timer interrupt (for scheduling)

  (d) through memory

      Both the CPU and the device see the same memory, so they can use
      shared memory to communicate.
      --> usually, synchronization between the CPU and the device
          requires lock-free techniques, plus device-specific contracts
          ("I will not overwrite memory until you set a bit in one of my
          registers telling me to do so.")
      --> as usual, need to read the manual