Week 7.a
CS6640 10/16 2023
https://naizhengtan.github.io/23fall/

1. CPU privilege levels
2. Virtual machines
3. Trap-and-emulate virtualization

---

Admin:
  - lab3 fix
  - online office hours
  - don't be too optimistic about your productivity

1. Privilege levels

  [draw figure]
    hw, kernel, apps
    M/S/U-Mode

  Q: motivation: why do people want privilege levels?
    -- resource multiplexing
    -- security purpose
    -- isolating faults
    -- providing abstractions
    -- what else?

  CPU privilege levels define what "you" can and cannot do:
    * control over system operations
    * access to system resources

  - recall what a program is. It has:
      instructions, registers, memory.

  Q: running the same piece of code in unprivileged and privileged mode,
     what will be the difference?

  Three differences:
    - executing privileged instructions,
    - touching privileged registers (CSRs),
    - accessing privileged memory
      [1min intro to virtual memory]

  Protection mechanics: exceptions
    [what we learned last time]

  How to switch privilege levels?
    - interrupt
    - exception
    - ecall
    - mret

  Q: How could I know what privilege level the current CPU is running in?

  """
  RISC-V deliberately doesn’t make it easy for code to discover what mode it
  is running it because this is a virtualisation hole. As a general
  principle, code should be designed for and implicitly know what mode it
  will run in. Applications code should assume it is in U mode. The
  operating system should assume it is in S mode (it might in fact be
  virtualised and running in U mode, with things U mode can’t do trapped and
  emulated by the hypervisor).
  """
  [from https://forums.sifive.com/t/how-to-determine-the-current-execution-privilege-mode/2823]

2. Virtual machines

  Q: what's a virtual machine?
    simulation of a computer, accurate enough to run an O/S

  [draw figure]
    h/w, host/VMM, guest linux and apps, guest windows and apps

  Q: why VMs?
    * cloud: many small customer guest "instances" on each physical machine
        each customer can run whatever O/S they want in their VM
    * isolate customers from each other, even on the same machine
    * control and adjust resources (memory, CPU, disk, net traffic)
    * migrate, suspend/resume, back up
    * s/w developers: virtual "crash" boxes for testing

  VMs have a long history
    1960s: IBM used VMs to share big expensive machines
    1980s: (computers got small and cheap)
           (then machine rooms got full)
    1990s: VMware re-popularized VMs, for x86 hardware
    2000s: widely used in cloud, enterprise

  Q: why look at virtual machines now?
    * VMMs have much in common with O/S kernels
    * some of the most interesting action in O/S design has shifted to VMs
    * VMs have affected both O/S (above) and hardware (below)
    * more importantly, we can implement VMs/VMMs using privilege levels.

  Q: how accurate must a VM be?
    usual goal is 100% accuracy
      to be able to boot any guest O/S without modification
      and prevent a malicious guest from breaking out
    in practice, VMM and O/S often cooperate
      e.g. VMM offers special disk/net "devices" that guest knows about

  How to implement a VM/VMM?

  One solution: [skipped]
    we could build a VM by writing software to simulate machine instructions
      VMM interprets each guest instruction
      maintain virtual machine state for the guest
        32 registers, mstatus, mode, RAM, disk
    pro: this works
      e.g. qemu
    con: slow

3. trap-and-emulate virtualization

  * idea: execute guest instructions directly on the CPU -- fast!

    Q: what if the guest kernel executes a privileged instruction?
      e.g. guest updates mstatus to disable interrupts
      can't give guest kernel direct access to machine registers!

  * idea: run the guest kernel in user mode
      running the "guest kernel" in U-mode
      of course the guest kernel assumes it is in supervisor mode

  * ordinary instructions work fine:
      adding two registers, function call, ...
  * privileged RISC-V instructions are illegal in user mode
      will cause a trap, to the VMM

  * VMM trap handler emulates privileged instruction
      maybe apply the privileged operation to the virtual state
        e.g. read/write mepc
      maybe transform and apply to real hardware
        e.g. assignment to satp

  * the solution: "trap-and-emulate" virtualization

    This is nice because you can build such a virtual machine entirely in software!
    Perhaps one could turn xv6 into a trap-and-emulate VMM for RISC-V
    [more on this about my trial]

  * RISC-V is very nice w.r.t. trap-and-emulate virtualization
      all privileged instructions trap if you try to execute them in user mode
      however, not all CPUs are as nice -- 32-bit x86, for example
        some privileged instructions don't trap; x86 ignores them if run in user mode

    [if you're interested, read:
     Keith Adams and Ole Agesen,
     "A Comparison of Software and Hardware Techniques for x86 Virtualization",
     ASPLOS'06
     It says,
     """
     Lack of traps when privileged instructions run at user-level. For
     example, in privileged code popf ("pop flags") may change both ALU flags
     (e.g., ZF) and system flags (e.g., IF, which controls interrupt
     delivery). For a deprivileged guest, we need kernel mode "popf" to trap
     so that the VMM can emulate it against the virtual "IF". Unfortunately,
     a deprivileged "popf", like any user-mode "popf", simply suppresses
     attempts to modify "IF"; no trap happens.
     """
    ]

  for RISC-V trap-and-emulate, what has to happen when:

  ... guest user code executes ecall to make a system call?
    [diagram: guest user, guest kernel, VMM, virtual state, real mepc]
    CPU traps into the VMM (ecall always generates a trap)
    VMM trap handler:
      examine the guest instruction
      virtual mepc <- real mepc
      virtual mode <- supervisor
      virtual mcause <- "system call"
      real mepc <- virtual mtvec
      // modify (real) page table -- set PTE_V for non-PTE_U entries
      return from trap

  ... the guest kernel reads mcause, e.g.
      csrr a0, mcause
    trap into VMM (since csrr is a privileged instruction)
      examine the guest instruction
      on stack: a0 <- virtual mcause
      real mepc += 4
      return from trap

  ... the guest kernel executes mret (return to user)?
    CPU traps into the VMM
      it's really a trap from user mode to supervisor mode
      h/w saves guest's PC in (real) mepc
    VMM trap handler:
      virtual mode <- user
      real mepc <- virtual mepc
      // modify (real) page table -- clear PTE_V for non-PTE_U entries
      return from trap

  Q: what RISC-V state must a trap-and-emulate VMM "virtualize"? [skipped]
    * all "privileged CPU state"
        CPU state that the guest kernel assumes it can read/write
        but is forbidden in user mode
        (plus VMM needs to protect for security)
    * all CSR registers (mepc, mtvec, mcause)
    * page table (satp)
    * PLIC/CLINT
    (32 registers and memory are virtualized by processes already)

  Q: how to simulate devices? [skipped]
    e.g. disk, NIC, display
    a big challenge!
    * strategy #1: emulate a common existing real device
        needed in order to run oblivious guest O/S
        intercept memory-mapped control register read/write
          by marking those pages invalid, so VMM gets page faults
        VMM turns page faults into operations on simulated device state
        e.g. qemu simulates uart/console for xv6
          qemu turns uart r/w into characters to your display or ssh
    * strategy #2: special virtual device tailored for efficiency
        requires guest O/S driver -- i.e. guest knows it's in a VM
        can be more streamlined than trapping on control register r/w
        e.g. xv6's virtio_disk.c; qemu turns it into r/w on file fs.img
    * strategy #3: pass-through access to a real hardware device
        guest O/S gets direct access to device h/w, no traps
        often requires specific support in device
          modern NICs have separate DMA ring per VM
        can be very efficient

  trap-and-emulate works well -- but it can be slow!
    lots of traps into the VMM
    better solution? hardware virtualization

[Acknowledgement: Robert Morris (the virtual machine notes)]
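
Appendix sketch: the three trap-and-emulate cases from section 3 (guest ecall,
csrr a0, mcause, and guest mret) can be written as one small C handler. This
is only a minimal sketch under assumptions, not code from xv6 or the labs:
struct vcpu, struct hw, vmm_trap and the VMODE_* names are all hypothetical,
the real mepc is modeled as a plain struct field, the guest's saved registers
live in an array instead of on the stack, and the page-table updates (PTE_V
set/clear) are left out. The cause codes (2 = illegal instruction, 8 = ecall
from U-mode) and the mret/csrr encodings follow the RISC-V privileged spec.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical, for illustration only. */
enum vmode { VMODE_USER, VMODE_KERNEL };   /* guest's *virtual* mode */

enum { CAUSE_ILLEGAL_INSTRUCTION = 2, CAUSE_ECALL_FROM_U = 8 };

#define INSN_MRET  0x30200073u
#define CSR_MCAUSE 0x342u

struct vcpu {                   /* virtual state kept by the VMM */
    enum vmode mode;
    uint64_t mepc, mtvec, mcause;   /* virtual CSRs */
    uint64_t regs[32];              /* saved guest x0..x31 */
};

struct hw { uint64_t mepc; };   /* stand-in for the real mepc CSR */

/* True if insn is "csrr rd, mcause" (i.e. csrrs rd, mcause, x0). */
static int is_csrr_mcause(uint32_t insn)
{
    return (insn & 0x7f) == 0x73 &&         /* SYSTEM opcode */
           ((insn >> 12) & 0x7) == 0x2 &&   /* funct3 = CSRRS */
           ((insn >> 15) & 0x1f) == 0 &&    /* rs1 = x0: read only */
           (insn >> 20) == CSR_MCAUSE;
}

/* Emulate one trap from the guest. Returns 0 if handled, -1 otherwise. */
int vmm_trap(struct vcpu *v, struct hw *hw, uint64_t real_mcause, uint32_t insn)
{
    assert(v && hw);

    /* Guest user code executed ecall: deliver a virtual trap. */
    if (real_mcause == CAUSE_ECALL_FROM_U && v->mode == VMODE_USER) {
        v->mepc = hw->mepc;             /* virtual mepc <- real mepc */
        v->mode = VMODE_KERNEL;         /* virtual mode <- privileged */
        v->mcause = CAUSE_ECALL_FROM_U; /* virtual mcause <- "system call" */
        hw->mepc = v->mtvec;            /* resume at guest's trap vector */
        return 0;
    }

    /* Guest kernel (really in U-mode) hit a privileged instruction. */
    if (real_mcause == CAUSE_ILLEGAL_INSTRUCTION && v->mode == VMODE_KERNEL) {
        if (insn == INSN_MRET) {
            v->mode = VMODE_USER;       /* virtual mode <- user */
            hw->mepc = v->mepc;         /* real mepc <- virtual mepc */
            return 0;
        }
        if (is_csrr_mcause(insn)) {
            uint32_t rd = (insn >> 7) & 0x1f;
            if (rd != 0)                /* x0 stays hard-wired to zero */
                v->regs[rd] = v->mcause;
            hw->mepc += 4;              /* skip the emulated instruction */
            return 0;
        }
    }
    return -1;  /* not covered by this sketch */
}
```

In a real VMM the handler would write hw->mepc back to the mepc CSR and
execute mret to return from the trap; the sketch stops at updating the state.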