Week 13.b CS3650 04/03 2024
https://naizhengtan.github.io/24spring/

1. Device drivers
2. Mechanics of communication
3. Demo: implementing a tty dev (egos-NU)
4. Communication configurations

------

1. Device drivers

 Device drivers in general solve a software engineering problem ...

    [draw a picture: different devices have different shapes, and
     drivers fit them into the kernel]

 Drivers expose a well-defined interface to the kernel, so that the
 kernel can make comparatively simple calls: read, write, open,
 close, ...

 This abstracts away nasty hardware details so that the kernel
 doesn't have to understand them.

 When you write a driver, you are implementing this interface, and
 also calling functions that the kernel itself exposes.

 - Driver: a piece of code that talks to a device.

 Q: can I use a GPU driver from NVIDIA for AMD's GPU?
 Q: can I use NVIDIA's Windows driver for Linux?

2. Mechanics of communication between CPU and I/O devices

 --lots of details
 --fun to play with
 --registers that do different things when read vs. written

    [draw some registers in devices, with status and data]

 CPU/device interaction (you can think of this as kernel/device
 interaction, since user-level processes classically do not interact
 with devices directly):

 (a) explicit I/O instructions
     [sometimes called port I/O or port-mapped I/O (PMIO)]

     x86 instructions: outb, inb, outw, inw
     operands: I/O address space (separate from the memory address space)

     [show slides]

     an example: setting the blinking cursor [see handout; a code
     sketch appears after section 3 below]

 (b) memory-mapped I/O

     The physical address space is mostly ordinary RAM, but
     low-memory addresses (<1MB), sometimes called "DOS compatibility
     memory", actually refer to other things. You as a programmer
     read/write these addresses using loads and stores. But they
     aren't "real" loads and stores to memory; they turn into other
     things: reading device registers, sending instructions,
     reading/writing device memory, etc.

     --the interface is the same as the interface to memory (load/store)
     --but it does not behave like memory
       + reads and writes can have "side effects"
       + read results can change due to external events

     Example: writing to VGA or CGA memory makes things appear on
     the screen. See handout: console_putc() [a sketch also appears
     after section 3 below]

     To avoid confusion: this is not the same thing as virtual
     memory; we are talking about *physical* addresses.

     --> is this an abstraction that the OS provides to others, or an
         abstraction that the hardware provides to the OS?
         [the latter]

     [if you're interested, check out these slides:
      https://opensecuritytraining.info/IntroBIOS_files/Day1_00_Advanced%20x86%20-%20BIOS%20and%20SMM%20Internals%20-%20Motivation.pdf]

 (c) interrupts

     Hardware can send "signals" to CPUs; these are interrupts.
     One example: the timer interrupt (used for scheduling).

 (d) through memory

     Both the CPU and the device see the same memory, so they can use
     shared memory to communicate.

     --> usually, synchronization between the CPU and the device
         requires lock-free techniques, plus device-specific
         contracts ("I will not overwrite memory until you set a bit
         in one of my registers telling me to do so.")
     --> as usual, you need to read the manual

3. Demo on egos-NU

 Motivating example: implementing a simple tty (a terminal device)

 * You will first need to understand the hardware manual. For us,
   that is the UART manual.
   [Ch18 of https://naizhengtan.github.io/23fall/docs/sifive-fe310-v19p04.pdf]

 * implement reading and writing one character at a time
   [show earth/bus_uart.c; a sketch appears at the end of this section]

 * implement higher-level functions
   [show earth/dev_tty.c]

 * Demo: show a normal "helloworld" program printing out "hello world!"

 * Demo: changing all lowercase letters to uppercase
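 [Code sketch for section 2(a), port I/O.] A minimal sketch of the
 outb/inb instructions wrapped in C with GCC inline assembly, plus
 the classic use of the VGA CRT controller's index/data ports
 (0x3D4/0x3D5) to move the text-mode cursor. This is in the spirit
 of the handout's cursor example, not its exact code.

    #include <stdint.h>

    /* wrappers around the x86 port-I/O instructions from section 2(a) */
    static inline void outb(uint16_t port, uint8_t val) {
        __asm__ volatile("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    static inline uint8_t inb(uint16_t port) {
        uint8_t val;
        __asm__ volatile("inb %1, %0" : "=a"(val) : "Nd"(port));
        return val;
    }

    /* Move the VGA text-mode cursor to pos (= row * 80 + col).
     * Port 0x3D4 selects a CRT controller register; port 0x3D5
     * reads/writes it. Registers 14 and 15 hold the cursor
     * location's high and low bytes. */
    static void vga_move_cursor(uint16_t pos) {
        outb(0x3D4, 14);
        outb(0x3D5, pos >> 8);
        outb(0x3D4, 15);
        outb(0x3D5, pos & 0xFF);
    }

 Note that 0x3D4 is an address in the separate I/O address space,
 not in the memory address space.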
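 [Code sketch for section 2(b), memory-mapped I/O.] A minimal sketch
 in the spirit of the handout's console_putc(), assuming we are
 running somewhere that physical address 0xB8000 (the CGA text
 buffer) is directly accessible. The "volatile" qualifier matters:
 these stores have side effects, so the compiler must not optimize
 them away or reorder them as if they were ordinary memory writes.

    #include <stdint.h>

    #define CGA_BASE 0xB8000   /* physical address of the text-mode buffer */
    #define CGA_COLS 80

    /* each cell is 16 bits: low byte = ASCII character,
     * high byte = attribute (0x07 = light gray on black) */
    static volatile uint16_t *cga = (volatile uint16_t *)CGA_BASE;

    /* an ordinary-looking store... that makes a character appear on screen */
    static void console_putc_at(int row, int col, char c) {
        cga[row * CGA_COLS + col] = (0x07 << 8) | (uint8_t)c;
    }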
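 [Code sketch for section 3, the UART demo.] A sketch of
 character-at-a-time UART I/O in the style of earth/bus_uart.c,
 assuming the FE310 register layout from Ch18 of the manual linked
 above: bit 31 of txdata means "TX FIFO full" and bit 31 of rxdata
 means "RX FIFO empty". Treat this as a sketch, not egos-NU's exact
 source. The last function shows the idea behind the uppercase demo:
 intercept each character on its way out.

    #include <stdint.h>

    #define UART0_BASE  0x10013000UL   /* FE310 UART0 (manual, Ch18) */
    #define UART_TXDATA 0x00           /* bit 31 = TX FIFO full      */
    #define UART_RXDATA 0x04           /* bit 31 = RX FIFO empty     */

    #define REG(base, off) (*(volatile uint32_t *)((base) + (off)))

    /* busy-wait (poll) until the TX FIFO has room, then write one char */
    static void uart_putc(char c) {
        while (REG(UART0_BASE, UART_TXDATA) & (1u << 31)) { /* spin */ }
        REG(UART0_BASE, UART_TXDATA) = (uint8_t)c;
    }

    /* return one char, or -1 if the RX FIFO is empty right now */
    static int uart_getc(void) {
        uint32_t v = REG(UART0_BASE, UART_RXDATA);
        return (v & (1u << 31)) ? -1 : (int)(v & 0xFF);
    }

    /* the idea behind the second demo: a tty layer that upcases output */
    static void tty_putc_upcase(char c) {
        if (c >= 'a' && c <= 'z') c -= 'a' - 'A';
        uart_putc(c);
    }

 Note that uart_putc above is *polling*: it spins on a status bit.
 Section 4 discusses the alternatives.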
4. Communication configurations

A. Polling vs. interrupts

 * Polling: check back periodically

   kernel...
    - ... sent a packet? Periodically ask the card whether the buffer
      is free.
    - ... waiting for a packet? Periodically ask whether there is data.
    - ... did disk I/O? Periodically ask whether the disk is done.

   Disadvantage: wasted CPU cycles

 * Interrupts: the device interrupts the CPU when its status changes
   (for example, data is ready, or data is fully written). This is
   what most general-purpose OSes do.

   There is a disadvantage, however, which can come up when you build
   a high-performance system: if the interrupt rate is high, the
   computer can spend a lot of time handling interrupts (interrupts
   are expensive because they generate a context switch, and the
   interrupt handler runs at high priority).

   --> in the worst case, you can get *receive livelock*, where you
       spend 100% of the time in interrupt handlers but no work gets
       done.

 How to design systems given these tradeoffs? Start with interrupts.
 If you notice that your system is slowing down because of livelock,
 switch to polling. If polling is chewing up too many cycles, move
 toward adaptive switching between interrupts and polling. (But of
 course, never optimize until you actually know what the problem is.)

 A classic reference on this subject is the paper "Eliminating
 Receive Livelock in an Interrupt-driven Kernel", by Mogul and
 Ramakrishnan, 1996.

B. DMA vs. programmed I/O

 * Programmed I/O: what we have been seeing in the handout so far:
   the CPU writes data directly to the device and reads data directly
   from the device.

 * DMA: a better way to do large and frequent transfers

   The CPU (really, the device-driver programmer) places some buffers
   in main memory, tells the device where the buffers are, and then
   "pokes" the device by writing to a register. The device then uses
   *DMA* (direct memory access) to read or write the buffers. The CPU
   can poll to see if the DMA completed (or the device can interrupt
   the CPU when done).

    [rough picture: buffer descriptor list --> [ buf ] --> [ buf ] ....]

   The DMA process is managed by a piece of hardware known as a DMA
   controller (DMAC).

   This makes a lot of sense: instead of having the CPU constantly
   deal with a small amount of data at a time, the device can simply
   write the results of its operation straight into memory.

 NOTE: OSTEP couples DMA to interrupts, but things don't have to work
 like that. You could have all four possibilities in {DMA, programmed
 I/O} x {polling, interrupts}. For example, (DMA, polling) would mean
 requesting a DMA and then later polling to see if the DMA is
 complete (see the sketch at the end of these notes).

[Acknowledgments: Mike Walfish, David Mazieres, Mike Dahlin, Brad Karp]
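 [Code sketch for section 4.B, the (DMA, polling) combination.] This
 sketch targets a *hypothetical* device: the register names
 (DEV_DESC_ADDR, DEV_CTRL, DEV_STATUS), offsets, and bit layout are
 invented for illustration; a real device's manual defines its own.
 The driver builds a descriptor list pointing at buffers, "pokes" the
 device, and then polls a status bit until the DMA completes.

    #include <stdint.h>

    /* hypothetical MMIO registers -- invented for illustration */
    #define DEV_BASE      0x40000000UL
    #define DEV_DESC_ADDR 0x00   /* physical address of descriptor list */
    #define DEV_CTRL      0x08   /* writing 1 = "go": start the DMA     */
    #define DEV_STATUS    0x10   /* bit 0 set = DMA complete            */

    #define REG(off) (*(volatile uint32_t *)(DEV_BASE + (off)))

    /* one entry in the buffer descriptor list from the rough picture */
    struct dma_desc {
        uint64_t buf_addr;   /* physical address of the buffer */
        uint32_t buf_len;    /* length in bytes                */
        uint32_t flags;      /* e.g., a "last descriptor" bit  */
    };

    /* (DMA, polling): tell the device where the descriptors are,
     * poke it, then poll. desc_paddr is the *physical* address of
     * an array of struct dma_desc. */
    static void dma_transfer(uint32_t desc_paddr) {
        REG(DEV_DESC_ADDR) = desc_paddr;   /* where the buffers are */
        REG(DEV_CTRL)      = 1;            /* "poke" the device     */
        while (!(REG(DEV_STATUS) & 1)) { /* poll for completion */ }
    }

 A real driver would also worry about virtual-vs-physical addresses,
 cache coherence, and memory barriers around the poke; per the notes
 above, read the manual.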