Week 9.b CS6640 11/02 2023
https://naizhengtan.github.io/23fall/

1. Virtual memory intro
2. Paging?
3. Page table?
4. Today's virtual memory

---

Admin:
 - midterm
 - lab4: first three exercises (ecall, syscall, U-mode)
 - lab5: release on Monday

1. Virtual memory intro

- a very important idea in computer systems

- "to virtualize" means "to lie" or "to fool". We'll see how this is
  implemented in a few moments. For now, let's look at the benefits of
  being interposed on memory accesses.

- Q: Why virtual memory? What's wrong with using physical addresses?

- benefits:
  (a) programmability
      (i) huge contiguous memory: the program *thinks* it has lots of
          memory, organized in a contiguous address space
      (ii) multiplexing addresses: programs can use "easy-to-use"
          addresses like 0, 0x20000, whatever. The compiler and linker
          don't have to worry about where the program actually lives
          in physical memory when it executes.
          --good for modularization
  (b) protection:
      (i) separate address spaces: processes cannot read or write each
          other's memory
          --this protection is at the heart of the isolation among
            processes that is provided by the OS
          --prevents a bug in one process from corrupting another
            process (non-adversarial scenarios)
          --we don't even want a process to observe another process's
            memory, e.g., if that process has secret or sensitive
            data (adversarial scenarios)
          --the idea is that: if you cannot name something, you cannot
            use it. This is a deep idea.
          --Question: can you think of another example of this naming
            idea? (answer: file descriptors)
      (ii) access control: prevent incorrect modifications to data
          (e.g., writing to read-only memory regions) and adversarial
          execution on the stack/heap (e.g., buffer overflow)
  (c) effective use of resources:
      (i) overcommitting memory: programmers don't have to worry that
          the sum of the memory consumed by all active processes is
          larger than physical memory.
      (ii) secure sharing: processes can share memory under controlled
          circumstances, but that physical memory may show up at very
          different virtual addresses
          --that is, two processes have different ways to refer to the
            same physical memory cells

- Q: can physical addresses achieve the above benefits? Why and why
  not?

- today's virtual memory has two main duties:
  (a) translation: VA (virtual address) => PA (physical address)
  (b) protection: access control [we talked about this in Week 7]

- for now, we mainly focus on the translation job

2. Paging?

- the translation problem: VA => PA,
  and we hope this translation
  (i) runs fast,
  (ii) has small memory overhead,
  (iii) can be updated quickly.

- for example:
  * segmentation: base + VA => PA
  - Q: what's wrong with segmentation? How would you implement this
    translation?

A. pages

- basic concept: divide all of memory (physical and virtual) into
  *fixed-size* chunks.
  --these chunks are called *PAGES*.
  --they have a size called the PAGE SIZE. (different hardware
    architectures specify different sizes)
  --in the traditional x86, the PAGE SIZE is 4096B = 4KB = 2^{12}B.
    It is the same for us, RV32.

Q: does "paging" fundamentally change the translation problem?
A: not really; we now map page numbers instead of memory addresses.
  --it is proper and fitting to talk about pages having **NUMBERS**.
    --page 0: [0, 4095]
    --page 1: [4096, 8191]
    --page 2: [8192, 12287]
    --page 3: [12288, 16383]
    ...
    --page 2^{20}-1: [2^{32} - 4096, 2^{32} - 1]
  --unfortunately, it is also proper and fitting to talk about _both_
    virtual and physical pages having numbers.
  --sometimes we will try to be clear with terms like:
    VPN: virtual page number
    PPN: physical page number

B. per-process translation

Q: who owns the translation?
- If a CPU can only run one translation, is it useful?
- For a multi-core CPU, if each core can only run one translation, is
  it useful?
- Today, each process (or program) has a separate mapping
  --and each page is separately mapped
  --we will allow the OS to gain control on certain operations:
    --read-only pages trap to the OS on write (store)
    --invalid pages trap to the OS on read (load) or write (store)
    --the OS can change the mapping and resume the application

3. Page table?

Q: if you were re-inventing virtual memory, what data structure would
you use for the translation?

Recall our requirements for the translation:
(i) runs fast,
(ii) has small memory overhead,
(iii) can be updated quickly.

A. problem statement

A page table conceptually implements a map from VPN --> PPN; the page
table is conceptually an index. The address is broken up into bits:

    [.............|........]
    [     VPN     | offset ]
          |            |
          v            |
        TABLE          |
          |            |
          v            v
    [     PPN     | offset ]  = physical address

The top bits index into the page table; the contents at that index are
the PPN. The bottom bits are the offset and are not changed by the
mapping:

    physical address = PPN + offset
    (note: "+" here means "concatenate": for example,
     123 "+" 456 => 123456)

The result is that each page table entry expresses a mapping about a
contiguous group of addresses.

B. a naive proposal

(assume 32-bit addresses and 4KB pages)

There is in the sky a 2^{20}-sized array that maps each virtual page
to a *physical* page:

    table[VPN] => PPN

EXAMPLE: if the OS wants a program to be able to use address 0x402000
to refer to physical address 0x3000, then the OS conceptually adds an
entry:

    table[0x402] = 0x3

(this maps virtual page number 1026 to physical page number 3; in
decimal: table[1026] = 3)

NOTE: the top 20 bits are doing the indirection. The bottom 12 bits
just figure out where on the page the access should take place.
  --the bottom bits are called the _offset_.
  --so now all we have to do is create this mapping
  --why is this hard? Why not just create the mapping?
  --Question: how large is this table?
  --answer: you need, per process, roughly 4MB
    (2^{20} entries * 4 bytes per entry). [why 4 bytes per entry?
    in practice, it's convenient to have the entry size be the same as
    a data type on the machine]
  --too much! Let's deal with this...

[draw on board a black-box MMU: inputs are VAs; outputs are PAs]

--Question: if you were the MMU designer, how would you design the
  table?

C. what we use today: page table

PT: a radix tree
[see a radix tree figure]

Q: Why radix trees? Can you use other trees, like a binary tree?

* PT design space:
  -- offset
  -- page size
  -- address length
  -- addressable memory unit
  -- depth of the PT
  -- PTE size

4. Today's virtual memory

- What we have today:
  [virtual memory -> paging -> page table -> RV32]

  introducing the alternatives:
  * VA but not paging: segmentation
  * paging but not page table: hashing-based translation
  * page table but not RISC-V: x86 and ARM

- how is this translation implemented?
  --a software (OS) / hardware (MMU) co-design
  --in modern systems, hardware does the translation; this hardware is
    configured by the OS.
  --this hardware is called the MMU, for memory management unit, and
    is part of the CPU
  --why doesn't the OS just translate by itself? Similar to asking why
    we don't execute programs by running them on an emulation of a
    processor (too slow)

- things to remember in what follows:
  --the OS is going to be setting up data structures that the hardware
    sees
  --these data structures are *per-process* PTs