Week 8.a
CS6640
02/24 2026
https://naizhengtan.github.io/26spring/

□ 0. Recap: four ways of communication
□ 1. An example: a tty dev
□ 2. Communication configurations
□ 3. Hints about Lab6 (SD card driver)
----

Schedule for the following:
- Lab6, 02/25, due beginning of week10
- Lab7, fs, 03/09-03/16
- final proj: 03/17 & exam sample

0. Recap: Mechanics of communication between CPU and I/O devices
   (from last time)

    --lots of details.
    --fun to play with.
    --registers that do different things when read vs. write

      [draw some registers in devices, with status and data]

    CPU/device interaction (can think of this as kernel/device
    interaction, since user-level processes classically do not
    interact with devices directly.)

    Q: if you were the IO designer, how would you design
    the communication between CPU and the device?

    (a) explicit I/O instructions
        [sometimes called port I/O, port-mapped IO (PMIO)]

        x86 instructions:
            outb, inb, outw, inw

        operands: IO address space
             (separate from memory address space)
             [show slides]
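         A toy model of the "separate address space" point. This is a simulation only: the arrays and the `sim_*` helpers are hypothetical. Port space and memory are two different arrays, so port 0x3F8 (conventionally COM1, the first serial port) and memory address 0x3F8 do not alias. Real kernels wrap the x86 `in`/`out` instructions in inline assembly, and normally only the kernel is allowed to execute them.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: x86 has a 64K I/O port space that is separate from the
 * physical memory address space. Both arrays are hypothetical stand-ins. */
static uint8_t port_space[1 << 16];   /* indexed by 16-bit port numbers */
static uint8_t mem_space[1 << 16];    /* a small slice of "memory" */

/* Modeled after x86 `outb`: write one byte to an I/O port. */
static void sim_outb(uint16_t port, uint8_t val) { port_space[port] = val; }

/* Modeled after x86 `inb`: read one byte from an I/O port. */
static uint8_t sim_inb(uint16_t port) { return port_space[port]; }

/* Ordinary load/store to memory, for contrast. */
static void mem_store(uint16_t addr, uint8_t val) { mem_space[addr] = val; }
static uint8_t mem_load(uint16_t addr) { return mem_space[addr]; }
```

         Usage: `sim_outb(0x3F8, 'A'); mem_store(0x3F8, 'B');` leaves `sim_inb(0x3F8)` as 'A' and `mem_load(0x3F8)` as 'B'. The model leaves out side effects: a real port read can change device state.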

   (b) memory-mapped I/O

     physical address space is mostly ordinary RAM

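     some low-memory addresses (<1MB), sometimes called "DOS compatibility
     memory", actually refer to other things (on a PC, 640KB to 1MB holds
     VGA display memory, expansion ROMs, and the BIOS ROM; see the map below).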

     You as a programmer read/write from these addresses using
     loads and stores. But they aren't "real" loads and stores to
     memory. They turn into other things: read device registers,
     send instructions, read/write device memory, etc.

         --interface is the same as interface to memory
         (load/store)

         --but does not behave like memory

         + Reads and writes can have "side effects"

         + Read results can change due to external events 

     Example: writing to VGA or CGA memory makes things appear on
     the screen.

         To avoid confusion: this is not the same thing as
         virtual memory. This is about *physical*
         addresses.

             --> is this an abstraction that the OS provides to
             others or an abstraction that the hardware is
             providing to the OS?  [the latter]

       [if you're interested, check out these slides:
          https://opensecuritytraining.info/IntroBIOS_files/Day1_00_Advanced%20x86%20-%20BIOS%20and%20SMM%20Internals%20-%20Motivation.pdf]

       * Here is a 32-bit PC’s physical memory map:

           +------------------+ <- 0xFFFFFFFF (4GB)
           |       32-bit     |
           |   memory mapped  |
           |      devices     |
           |                  |
           /\/\/\/\/\/\/\/\/\/\

           /\/\/\/\/\/\/\/\/\/\
           |                  |
           |      Unused      |
           |                  |
           +------------------+ <- depends on amount of RAM
           |                  |
           |                  |
           |  Extended Memory |
           |                  |
           |                  |
           +------------------+ <- 0x00100000 (1MB)
           |     BIOS ROM     |
           +------------------+ <- 0x000F0000 (960KB)
           |  16-bit devices, |
           |  expansion ROMs  |
           +------------------+ <- 0x000C0000 (768KB)
           |   VGA Display    |
           +------------------+ <- 0x000A0000 (640KB)
           |                  |
           |   Low Memory     |
           |                  |
           +------------------+ <- 0x00000000

       [Credit to Frans Kaashoek, Robert Morris, and
       Nickolai Zeldovich for this picture]
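       A concrete instance of "writing to VGA memory makes things appear on the screen": in VGA text mode, the 80x25 character grid lives at physical address 0xB8000 (inside the 0xA0000 to 0xC0000 VGA window in the map above). Each cell is a 16-bit value: the character in the low byte and a color attribute in the high byte. The sketch below only simulates that buffer with an array so it can run anywhere. In a real kernel you would point a `volatile uint16_t *` at 0xB8000 (or at its virtual mapping). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25
#define VGA_ATTR_GRAY_ON_BLACK 0x07   /* light gray text, black background */

/* Stand-in for the device memory at physical 0xB8000. In a real kernel:
 *   volatile uint16_t *vga = (volatile uint16_t *)0xB8000;
 * `volatile` stops the compiler from caching or dropping the stores. */
static uint16_t fake_vga[VGA_ROWS * VGA_COLS];
static volatile uint16_t *vga = fake_vga;

/* Put character c at (row, col): low byte = character, high byte = color. */
static void vga_putc_at(int row, int col, char c)
{
    assert(row >= 0 && row < VGA_ROWS && col >= 0 && col < VGA_COLS);
    vga[row * VGA_COLS + col] =
        (uint16_t)((VGA_ATTR_GRAY_ON_BLACK << 8) | (uint8_t)c);
}

/* Write a string along the top row (no wrapping or scrolling). */
static void vga_puts(const char *s)
{
    for (int col = 0; s[col] != '\0' && col < VGA_COLS; col++)
        vga_putc_at(0, col, s[col]);
}
```

       The code uses ordinary stores, and nothing in it "talks" to the display. On real hardware, the memory-mapped address is what makes those stores show up on the screen.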

   (c) interrupts

       Hardware can send "signals" to CPUs. These are interrupts.

       One example: timer interrupt (for scheduling)
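       On the RISC-V CPU that egos runs on, an interrupt traps into a handler. The handler reads the `mcause` CSR to find out why: the most significant bit is 1 for interrupts and 0 for exceptions, and the low bits give the cause code (7 = machine timer interrupt, 11 = machine external interrupt, per the RISC-V privileged spec). The sketch below is a hypothetical dispatcher that takes `mcause` as an argument so it can run off-hardware. Real code reads the CSR with `csrr` and returns with `mret`. This is not egos's own trap handler.

```c
#include <assert.h>
#include <stdint.h>

#define MCAUSE_INTR_BIT  (1u << 31)   /* set for interrupts on RV32 */
#define MCAUSE_CODE_MASK 0x7FFFFFFFu
#define CAUSE_M_TIMER    7u           /* machine timer interrupt */
#define CAUSE_M_EXTERNAL 11u          /* machine external interrupt (devices) */

static unsigned timer_ticks, device_irqs, exceptions;

static void handle_timer(void)    { timer_ticks++;  /* e.g., reschedule */ }
static void handle_external(void) { device_irqs++;  /* e.g., ask the interrupt controller which device */ }

/* Hypothetical dispatcher: `mcause` is passed in instead of read via csrr. */
static void trap_dispatch(uint32_t mcause)
{
    uint32_t code = mcause & MCAUSE_CODE_MASK;
    if (mcause & MCAUSE_INTR_BIT) {
        if (code == CAUSE_M_TIMER)
            handle_timer();
        else if (code == CAUSE_M_EXTERNAL)
            handle_external();
    } else {
        exceptions++;   /* synchronous exception, e.g., ecall or illegal instruction */
    }
}
```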

   (d) through memory: both CPU and the device see the same memory,
     so they can use shared memory to communicate.

         --> usually, synchronization between CPU and device requires
         lock-free techniques, plus device-specific contracts ("I
         will not overwrite memory until you set a bit in one of my
         registers telling me to do so.")

         --> as usual, need to read the manual
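         One common shape for such a contract is a ring of slots with an ownership flag. The CPU fills a slot and then flips the flag to hand it to the device. The device touches only slots it owns and flips the flag back when it is done. The sketch below models both sides in one program with C11 atomics. Release/acquire ordering stands in for the device-specific memory fences a real driver would need. The names and layout are hypothetical, and a real device defines its own format in its manual.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4

/* One shared slot. The contract: whoever does not own the slot must not
 * touch `data`. */
struct slot {
    uint32_t data;
    atomic_bool owned_by_device;
};

static struct slot ring[RING_SIZE];
static unsigned cpu_next, dev_next;   /* each side's private cursor */

/* CPU side: returns false if the next slot is still owned by the device. */
static bool cpu_post(uint32_t value)
{
    struct slot *s = &ring[cpu_next % RING_SIZE];
    if (atomic_load_explicit(&s->owned_by_device, memory_order_acquire))
        return false;                 /* ring full: device has not consumed it */
    s->data = value;                  /* write the payload first ... */
    atomic_store_explicit(&s->owned_by_device, true,
                          memory_order_release);   /* ... then hand it over */
    cpu_next++;
    return true;
}

/* "Device" side: consume one slot if the CPU has handed one over. */
static bool device_consume(uint32_t *out)
{
    struct slot *s = &ring[dev_next % RING_SIZE];
    if (!atomic_load_explicit(&s->owned_by_device, memory_order_acquire))
        return false;                 /* nothing new */
    *out = s->data;
    atomic_store_explicit(&s->owned_by_device, false,
                          memory_order_release);   /* give the slot back */
    dev_next++;
    return true;
}
```

         Neither side ever waits on a lock. Each side only checks the flag, which is why this style suits CPU/device communication: a device cannot take a kernel lock.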


1. An example: UART and a tty device

   * egos uses UART (Universal Asynchronous Receiver/Transmitter)

   * You first need to understand the hardware manual.
     For us, that means the UART chapter of the FE310 manual.
       [Ch18 of https://naizhengtan.github.io/26spring/docs/sifive-fe310-v19p04.pdf]

     Q: which communication method does the UART on our CPU use?
     [Answer: memory-mapped IO]

   * implementation in egos
     [show earth/dev_tty.c]
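     A minimal polling-based sketch of UART character I/O. It assumes the register layout described in Ch.18 of the FE310 manual: txdata at offset 0x00 with bit 31 = TX FIFO full, rxdata at offset 0x04 with bit 31 = RX FIFO empty, and UART0 at 0x10013000. Double-check these against the manual. The registers are simulated with an array so the code runs off-board. `uart_putc`/`uart_getc` are illustrative names, not necessarily what earth/dev_tty.c uses.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets as described in the FE310 manual's UART chapter. */
#define UART_TXDATA 0x00
#define UART_RXDATA 0x04
#define UART_TXDATA_FULL  (1u << 31)   /* TX FIFO cannot take another byte */
#define UART_RXDATA_EMPTY (1u << 31)   /* no byte waiting in the RX FIFO */

/* On the board this would be UART0's base address, 0x10013000. Here it
 * points at a fake register block so the sketch runs anywhere. */
static uint32_t fake_uart[7];
static volatile uint8_t *uart_base = (volatile uint8_t *)fake_uart;

#define UART_REG(off) (*(volatile uint32_t *)(uart_base + (off)))

/* Polling transmit: spin until the TX FIFO has room, then write the byte. */
static void uart_putc(char c)
{
    while (UART_REG(UART_TXDATA) & UART_TXDATA_FULL)
        ;   /* busy-wait */
    UART_REG(UART_TXDATA) = (uint8_t)c;
}

/* Polling receive: on real hardware each read of rxdata pops the FIFO, so
 * read once per iteration and keep the value that was returned. */
static char uart_getc(void)
{
    uint32_t v;
    while ((v = UART_REG(UART_RXDATA)) & UART_RXDATA_EMPTY)
        ;   /* busy-wait until a byte arrives */
    return (char)(v & 0xFF);
}
```

     Note the read-once pattern in `uart_getc`. Because reading rxdata has a side effect, testing the empty bit with one read and fetching the data with a second read would lose a byte.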

   * Demo: changing all lowercase letters to uppercase
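     One way to do the demo is to transform each character on its way to the terminal. The helpers below are hypothetical (`tty_upper` is not an egos function). Where you hook them in, for example the tty write path in earth/dev_tty.c, depends on the code you are looking at.

```c
#include <assert.h>

/* Map 'a'..'z' to 'A'..'Z' and leave every other byte unchanged. Avoids
 * <ctype.h> so it does not depend on a C library. */
static char tty_upper(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c;
}

/* Apply tty_upper to a buffer in place before it is written out. */
static void tty_upper_buf(char *buf, int len)
{
    for (int i = 0; i < len; i++)
        buf[i] = tty_upper(buf[i]);
}
```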


2. Communication configurations

    A. Polling vs. interrupts

      * Polling: check back periodically

          kernel...
         - ... sent a packet? Periodically ask the card whether the buffer is free.
         - ... waiting for a packet? Periodically ask whether there is data.
         - ... started disk I/O? Periodically ask whether the disk is done.

          Disadvantage: wasted CPU cycles (the CPU keeps asking even when nothing has changed)
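          The cost of polling shows up as loop iterations that accomplish nothing. In the toy below, the simulated device (`busy_for` and `device_ready` are hypothetical) becomes ready only after a fixed number of checks, and the driver counts how many times it asked for nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated device: becomes ready after `busy_for` more status reads. */
static int busy_for;

static bool device_ready(void)
{
    if (busy_for > 0) {
        busy_for--;          /* time passes while the CPU keeps asking */
        return false;
    }
    return true;
}

/* Polling loop: returns how many status checks came back "not ready",
 * i.e., CPU work spent only on waiting. */
static long poll_until_ready(void)
{
    long wasted = 0;
    while (!device_ready())
        wasted++;
    return wasted;
}
```

          With `busy_for = 1000`, `poll_until_ready()` returns 1000. With interrupts, the CPU could have done other work during those 1000 checks.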

      * Interrupts:

        The device interrupts the CPU when its status
        changes (for example, data is ready, or data is fully written).

        This is what most general-purpose OSes do. There is a
        disadvantage, however. This could come up if you need to
        build a high-performance system.

        Namely: If interrupt rate is high, then the computer can
        spend a lot of time handling interrupts (interrupts are
        expensive because they generate a context switch, and the
        interrupt handler runs at high priority).

            --> in the worst case, you can get *receive livelock*
            where you spend 100% of time in an interrupt handler but no
            work gets done.

        How to design systems given these trade-offs?
        Start with interrupts. If you notice that your system is slowing down
        because of livelock, then switch to polling. If polling is chewing up
        too many cycles, then move towards an adaptive switching between
        interrupts and polling. (But of course, never optimize until you
        actually know what the problem is.)

        A classic reference on this subject is the paper
            "Eliminating Receive Livelock in an Interrupt-driven Kernel",
            by Mogul and Ramakrishnan, 1996.
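        A sketch of adaptive switching in the spirit of that paper (and of Linux's NAPI). The interrupt handler only disables further device interrupts and schedules a poll. The poll routine drains up to a fixed budget of packets per round and re-enables interrupts only once the queue is empty. Under light load this behaves like interrupts. Under heavy load it degrades into polling, so interrupt work stays bounded. Everything here (the `rx_pending` counter, `BUDGET`, the function names) is a hypothetical single-threaded simulation.

```c
#include <assert.h>
#include <stdbool.h>

#define BUDGET 8                  /* max packets handled per poll round */

static int  rx_pending;           /* packets waiting in the (simulated) device */
static bool irq_enabled = true;   /* device interrupt mask */
static bool poll_scheduled;
static int  processed;

/* Interrupt handler: do almost nothing, just switch to polling mode. */
static void rx_interrupt(void)
{
    irq_enabled = false;          /* no more interrupts from this device */
    poll_scheduled = true;        /* ask the kernel to run rx_poll() later */
}

/* Poll routine: handle at most BUDGET packets, then either stay in
 * polling mode (still busy) or re-enable interrupts (queue drained). */
static void rx_poll(void)
{
    int done = 0;
    while (rx_pending > 0 && done < BUDGET) {
        rx_pending--;
        processed++;
        done++;
    }
    if (rx_pending == 0) {
        poll_scheduled = false;
        irq_enabled = true;       /* light load again: back to interrupts */
    }
}

/* Simulated device: packets arrive; raise an interrupt only if enabled. */
static void packets_arrive(int n)
{
    assert(n >= 0);
    rx_pending += n;
    if (n > 0 && irq_enabled)
        rx_interrupt();
}
```

        During a burst the device raises one interrupt, not one per packet, and the budget keeps a single poll round from monopolizing the CPU.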

     * Q: What do we use for the egos tty: polling or interrupts?
       Which one do you think will work better for egos?

     * Final project idea:
      implementing an interrupt-based tty.

   B. DMA vs. programmed I/O

     * Programmed I/O: what we have been seeing in the example
        so far: CPU writes data directly to device, and reads data
        directly from device.
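        In programmed I/O, the CPU itself moves every byte through a device register. Below is a minimal sketch, assuming a hypothetical device with a status register (bit 0 = ready) and a one-byte data register. Both are simulated with plain variables so the code runs off-hardware, and `fake_device_accept` plays the part of the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated device registers (hypothetical layout). */
static volatile uint32_t dev_status = 1;   /* bit 0: ready for next byte */
static volatile uint32_t dev_data;

/* Simulated device-side record of what it received, for checking. */
static uint8_t received[64];
static size_t nreceived;

static void fake_device_accept(void)       /* models the device latching a byte */
{
    assert(nreceived < sizeof received);
    received[nreceived++] = (uint8_t)dev_data;
}

/* Programmed I/O write: the CPU waits for "ready" and stores each byte
 * itself, so CPU time grows with the size of the transfer. */
static void pio_write(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        while (!(dev_status & 1))
            ;                              /* poll until the device is ready */
        dev_data = buf[i];                 /* one register store per byte */
        fake_device_accept();              /* real hardware does this itself */
    }
}
```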

     * DMA: better way for large and frequent transfers

        CPU (really, device driver programmer) places some buffers
        in main memory.

        Tells device where the buffers are

        Then "pokes" the device by writing to register

            Then the device uses *DMA* (direct memory access) to read or
            write the buffers.

            The CPU can poll to see if the DMA completed (or the device
            can interrupt the CPU when done).

            [rough picture:
           buffer descriptor list
           <metadata> --> [  buf ]
           <metadata> --> [  buf ]
           ....
            ]

        The DMA process is managed by a piece of hardware known as a DMA controller (DMAC).

        This makes a lot of sense. Instead of having the CPU
        constantly handle a small amount of data at a time, the
        device can move the data for its operation straight
        to or from memory.

        NOTE: textbooks like OSTEP often couple DMA to interrupts,
        but things don't have to work like that.
        You could have all four possibilities in
        {DMA, programmed I/O} x {polling, interrupts}.

          For example, (DMA, polling) would mean requesting a DMA
          and then later polling to see if the DMA is complete.
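          Here is the descriptor-list picture above, made concrete as the (DMA, polling) combination. The driver fills descriptors that point at its buffers, marks them as owned by the device, pokes a doorbell register, and later polls the ownership bits. The descriptor format, the `DESC_OWN` flag, and `fake_dma_engine` (which plays the DMA controller by copying data into memory) are all hypothetical. Real controllers define their own layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NDESC    2
#define BUFSZ    16
#define DESC_OWN 0x1u   /* set: the device owns this descriptor */

struct dma_desc {
    uint8_t  *buf;      /* where the device should write */
    uint32_t  len;      /* buffer capacity, then bytes actually written */
    uint32_t  flags;
};

static struct dma_desc ring[NDESC];
static uint8_t bufs[NDESC][BUFSZ];
static volatile uint32_t doorbell;        /* simulated "poke" register */

/* Driver: hand every buffer to the device, then ring the doorbell. */
static void dma_start_rx(void)
{
    for (int i = 0; i < NDESC; i++) {
        ring[i].buf = bufs[i];
        ring[i].len = BUFSZ;
        ring[i].flags = DESC_OWN;
    }
    doorbell = 1;
}

/* Simulated DMA controller: fills owned buffers without the CPU copying
 * any bytes, then clears OWN to signal completion. */
static void fake_dma_engine(const char *incoming[NDESC])
{
    if (!doorbell)
        return;
    for (int i = 0; i < NDESC; i++) {
        if (!(ring[i].flags & DESC_OWN))
            continue;
        size_t n = strlen(incoming[i]);
        assert(n <= ring[i].len);
        memcpy(ring[i].buf, incoming[i], n);
        ring[i].len = (uint32_t)n;
        ring[i].flags &= ~DESC_OWN;       /* done: give it back to the CPU */
    }
    doorbell = 0;
}

/* Driver, polling for completion: true once every descriptor is back. */
static bool dma_all_done(void)
{
    for (int i = 0; i < NDESC; i++)
        if (ring[i].flags & DESC_OWN)
            return false;
    return true;
}
```

          Swapping `dma_all_done()` polling for a completion interrupt would give the (DMA, interrupts) combination. The descriptor handoff would stay the same.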

3. Hints about Lab6 (SD card driver)

   [briefly go through the handout]


[Acknowledgments: Mike Walfish, David Mazieres, Mike Dahlin, Brad Karp]