Week 2.b
CS5600 1/26 2022
https://naizhengtan.github.io/22spring/

1. Last time
2. Stack frames, continued
3. Syscall intro
4. Process/OS control transfers
5. Git and Lab grading

---------------------------------------------------------------------------

Admin:

  - remote access
    -- ask for the zoom link before each lecture (zoom links will change)
    -- we don't have enough resources to track your status

  - communication channels
    -- you-to-us: piazza and staff mailing list
    -- I will not monitor Microsoft Teams chat, personal email, or other channels

  - Lab0 => Monday, midnight; Lab1 after that

  - Apple M1
    -- you should have received my email (if not, ping us again)
    -- QEMU vs. VirtualBox
    -- emulation vs. virtualization
       [draw:
          VM (x86) ----> CPU (x86)
          VM (x86) -??-> CPU (Arm)
       ]
    [show slide]

---------------------------------------------------------------------------

1. last time:

  - processes
  - process's view of memory and registers
    [draw one; with %rip, %rbp, %rsp]
  - crash course in x86-64 assembly
  - stack frames
    -- took a look at how a real program works

2. Stack frames, continued

  revisit & review:

      caller func (f)          callee func (g)

      saving registers
      call g -----------+
                        +----> # prologue
                               do things
                               # epilogue
                        +----- ret
      restore registers <-+

  abstract picture:
  [draw on board]

               |            |   |
               +------------+   |
               |  ret %rip  |   /
               +============+
      %rbp->   | saved %rbp |   \
               +------------+   |
               |            |   |
               |   local    |   \
               | variables, |    >- current function's stack frame
               |   call-    |   /
               | preserved  |   /
               |   regs,    |   |
               |   etc.     |/
      %rsp->   +------------+

  - Unix calling conventions:

    --how could function main() know what happened in f()?
    --for example, has %rdx changed?

    --specifically, what happens to a function's state, that is, the
      registers, when a function is called? they might need to be saved,
      or not.

    --purely a matter of convention in the compiler, **not** hardware
      architecture

    --on x86-64, *arguments* are passed through registers: %rdi, %rsi,...
      (more than six? then spill to stack).
      And the *return value* is passed from callee to caller in %rax.

    --call-preserved and call-clobbered registers
      -- one calling convention: call-preserved registers are
         RBP, RBX, RSP, R12, R13, R14, and R15

    --Question: is %rax call-clobbered or call-preserved?
      [answer: call-clobbered, because the return value is stored in %rax,
       so the callee needs to modify it.]

    [show slide]

  **a real example: see handout from last time again**

  // the points here are:

  - prologue & epilogue: note that the epilogue for f (starting on line 49)
    does the reverse of the prologue, thus restoring the stack to how it was
    before.

  - Calling a function requires agreement between caller and callee about
    how arguments are passed, and which of them is responsible for saving
    and restoring registers.

  - In an executing program, the stack is partitioned into a set of stack
    frames, one for each function invocation. The stack frame for the
    current function starts at the base pointer and extends down to the
    stack pointer.

    ** Stack frames are how function scope in languages like C is actually
    implemented -- allowing each function invocation to refer to different
    variables with the same name. In other words, the programmer thinks
    they are writing a function with local variables; the compiler has
    arranged to implement that with stack frames.

  - de-mystifying pointers: a pointer (like "int* foo") is an address.
    that's it. repeat: a pointer is an address.
    that address can be:
      - on the stack
      - on the heap
      - in the text section of the program

  - because of how stack frames work, it's unequivocally a bug to use a
    pointer into a stack frame that no longer exists (for example, a
    function returning the address of one of its own local variables).
    Passing a pointer into the caller's still-live frame down to a callee
    is fine.

3. System call intro

  - What are system calls?

    System calls are the process's main interface to the operating system.

    The set of system calls is the API exposed by the kernel to user
    programs.

    In other words, **syscalls are the mechanism by which user-level
    programs ask the operating system to do things for them.**

    --there are ~400 syscalls for Linux now
    [show slide]

  - How to use system calls?
    To the C programmer, a system call looks exactly like a function call:
    you just call the function, get a return value, and keep going.

  - here are some example system calls:

      int fd = open(const char* path, int flags)
      write(fd, const void *, size_t)
      read(fd, void *, size_t)

    (Aside: fd is a *file descriptor*. This is an abstraction, provided by
    the operating system, that represents an open file. We'll come back to
    this later in the course.)

  - on Unix, type "man 2 " followed by the system call's name to get
    documentation (section 2 of the manual covers system calls).

  [skipped]
  - a note on the interface

    People use processes as an abstraction of a machine.
    This worked well for many years, until...process/VM migration
    (an important feature for cloud computing).
    A "better" interface for migration is the VM.

      processes --syscall--> kernel
      VM        --ISA-->     CPU

    [if time allows, talk a little bit about VM migration]

    A similar argument recently: Intel SGX vs. AMD SEV

4. Process/OS control transfers

  - For the C programmer, system calls and function calls look the same.

  - To the C compiler (or the assembly programmer) and the machine as a
    whole, a system call has some key differences versus function calls
    (even though both are transfers of control):

    (i) there is a small difference in calling conventions -- a process
        knows that when it invokes "syscall", ALL registers (except RAX)
        are call-preserved. That means that the callee (in this case the
        kernel) is required to save and restore all registers (except RAX,
        which is the exception because that is where return values go).
        (Caveat: on real x86-64 hardware, the `syscall` instruction itself
        overwrites %rcx and %r11, so Linux does not preserve those two.)

    (ii) Rather than using the "call" instruction, the process uses a
        different instruction (helpfully called the `syscall` instruction).
        This causes privilege levels to switch. The picture looks like
        this:

                  user-level application
                            |
                         (open)
                            v
         user-level
         ----------------------------------------------
                     ^      |
         kernel-level       |
                            |____> [table]   open()
                     |                       .....
                     |                       sysret
                      -------

  - Vocabulary: when a user-level program invokes the kernel via a syscall,
    it is called *trapping* to the kernel

  Key distinction: privileged versus unprivileged mode

    --the difference between these modes is something that the *hardware*
      understands and enforces

    --the OS runs in privileged mode
      --can mess with the hardware configuration

    --users' tasks run in unprivileged mode
      --cannot mess with the hardware configuration

    --the hardware knows the difference between privileged and
      unprivileged mode (on the x86, these are called ring 0 and ring 3.
      The middle rings aren't used in the classical setup, but they are
      used in some approaches to virtualization.)

    [show slide]

  - normally, processes run and the kernel is not involved
    -- for example, running deep neural network inference on a CPU
       (a lot of matrix multiplications; nothing to do with the OS)

  - there are three ways that the OS (also known as the kernel) is invoked:

    A. system calls, covered above.

    B. interrupts.

       An _interrupt_ is a hardware event; it allows a device, whether
       peripheral (like a disk) or built-in (like a timer), to notify the
       kernel that it needs attention. (As we will see later, timers are
       essential for ensuring that processes don't hog the CPU.)

       Interrupts are **implicit**: in most cases, the application that
       was running at the time of the interrupt _has no idea that an
       interrupt even triggered_, despite the fact that handling the
       interrupt requires these high-level steps:

         - process stops running
         - CPU invokes interrupt handler
         - interrupt handler is part of kernel code, so kernel starts
           running
         - kernel handles interrupt
         - kernel returns control

       [draw:
          app    --->[interrupt]               +---->
                          |                    |
          kernel          +---[handler]--------+
       ]

       - the process does not realize this: from the process's viewpoint,
         it executed continuously, but an omniscient observer would know
         perfectly well that the process was in fact _interrupted_ (hence
         the term).

       - how is this possible?
         (at a high level)
         In order to preserve this illusion, the processor (CPU) and
         kernel have to be designed very carefully to save _all_ process
         state on an interrupt, and restore all of it.

       We will discuss the underlying mechanisms for these control
       transfers later in the course.

    C. exceptions

       An _exception_ means that the CPU cannot execute the instruction
       issued by the process.

       - erroneous cases

         Classically (and for this part of the course), you can think of
         this as "the process did something erroneous" (a software bug):
         dividing by 0, accessing a bogus memory address, or attempting to
         execute an unknown instruction.

       - non-erroneous cases

         But there are non-erroneous causes of exceptions (an example is
         demand paging, as we will see in the virtual memory unit).

         --page fault (on-demand page allocation)
         --JVM case: to save the cycles of an if-else branch, the JVM
           triggers an exception instead

       - kernel's handling

         When an exception happens, the processor (the CPU) knows about it
         immediately. The CPU then invokes an _exception handler_ (code
         implemented by the kernel). The kernel can handle exceptions in a
         variety of ways:

           - kill the process (this is the default, and what is happening
             when you see a segfault in one of your programs).

           - signal your process (this is how runtimes like Java generate
             null-pointer exceptions; processes _register_ to catch
             signals).

           - silently handle the exception (this is how the kernel handles
             certain memory exceptions, as in the demand paging case).

       The mechanisms here relate to those for interrupts.

5. Git and Lab grading explained

  - if you know why, usually things are not going to be too bad

    [draw the workflow of students
       student -> file -> git tracked file -> committed file -> pushed file
       student --commit id--> Canvas
       slack hours
    ]

    [draw grading workflow
       Canvas --commit id--> TA computer
       TA computer --commit id--> GitHub --code--> TA computer
       TA computer compile, run, grade
       Loop for all students
    ]

  - anyone who doesn't understand slack hours?
  - a useful question:
    A student finished Lab3. It is now 60 hours late. The student has only
    10 slack hours. How many hours should the student apply to this lab?
    [answer: 0hr; floor is 50pt]

  - repeat that we will run a cheating detection system.

  - anyone who hasn't read the integrity policy?

    --Here are some questions:

        Looking at a classmate's solution and then coding it by yourself
        afterward

        Showing your code to a classmate who has questions

        Modifying code that you find on StackOverflow

        Modifying code for a similar assignment that you find on GitHub

      The correct answer: ALL of these are ruled out by the policy.
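
Example for section 2 (pointers and stack frames). This is a minimal sketch, not code from the handout; the function names `set_via_pointer` and `make_on_heap` are made up for illustration. It shows two safe ways to hand an address across a call, and (in a comment only) the dangling-pointer bug that section 2 warns about.

```c
#include <assert.h>
#include <stdlib.h>

/* Fine: the caller's frame is still live while the callee runs,
 * so the callee may write through a pointer into that frame. */
void set_via_pointer(int *out, int value) {
    assert(out != NULL);
    *out = value;
}

/* Fine: heap memory outlives the frame of the function that allocated it.
 * The caller owns the result and must free() it. Returns NULL on failure. */
int *make_on_heap(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}

/* BUG (kept as a comment on purpose): the frame that holds `local` is
 * popped when the function returns, so the returned address is dangling.
 *
 *     int *make_on_stack(int value) {
 *         int local = value;
 *         return &local;
 *     }
 */
```

A caller would write `int x; set_via_pointer(&x, 7);` (address on the caller's stack) or `int *h = make_on_heap(5); ... free(h);` (address on the heap).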
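
Example for section 3 (syscalls look like function calls). A small sketch using the `open`, `write`, and `read` calls listed in section 3, plus `close`. The helper name `write_then_read` and the file path a caller passes in are assumptions for illustration. Note that `open` takes a third `mode` argument when `O_CREAT` is given.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg to the file at path, then reads it back into buf.
 * Returns the number of bytes read, or -1 on any failure.
 * Each of open/write/read/close below traps into the kernel. */
ssize_t write_then_read(const char *path, const char *msg,
                        char *buf, size_t buflen) {
    assert(path != NULL && msg != NULL && buf != NULL);
    size_t len = strlen(msg);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }
    ssize_t written = write(fd, msg, len);
    close(fd);
    if (written < 0 || (size_t)written != len) {
        return -1;  /* treat a short write as failure in this sketch */
    }

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    ssize_t n = read(fd, buf, buflen);
    close(fd);
    return n;
}
```

From the C programmer's side nothing here looks different from ordinary function calls: each call returns a value (a file descriptor or a byte count, or -1 on error) and the program keeps going.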
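
Example for section 4 (the [table] in the syscall picture). The C library wrappers hide the fact that the kernel picks the handler by a syscall number. The sketch below assumes Linux with glibc, where the generic `syscall()` wrapper and the `SYS_*` constants are available; the helper names are made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* asks glibc to declare syscall() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same effect as getpid(), but naming the syscall number explicitly. */
long pid_via_raw_syscall(void) {
    return syscall(SYS_getpid);
}

/* Same effect as write(fd, buf, len): returns bytes written,
 * or -1 with errno set on failure. */
long write_via_raw_syscall(int fd, const void *buf, size_t len) {
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, len);
}
```

The number (`SYS_getpid`, `SYS_write`) is what the kernel uses to index its table of handlers after the trap; the arguments travel in registers, as with an ordinary call.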
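
Example for section 4.C (exceptions delivered as signals). A hedged sketch of "processes register to catch signals", assuming a POSIX system such as Linux or macOS. The process registers a handler with `sigaction`, touches a page mapped with no access rights, and recovers instead of being killed. The names `probe_read` and `touch_protected_page` are hypothetical. Whether the kernel sends SIGSEGV or SIGBUS for this fault depends on the OS, so both are caught.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t last_signal;

/* Runs when the kernel turns the CPU exception into a signal. */
static void on_fault(int signo) {
    last_signal = signo;
    siglongjmp(recover_point, 1);  /* skip the faulting instruction */
}

/* Reads one byte at addr. Returns 0 if the read succeeded,
 * otherwise the number of the signal that the read triggered. */
int probe_read(const volatile char *addr) {
    struct sigaction sa, old_segv, old_bus;
    int caught = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, &old_segv);
    assert(rc == 0);
    rc = sigaction(SIGBUS, &sa, &old_bus);
    assert(rc == 0);

    if (sigsetjmp(recover_point, 1) == 0) {
        char c = *addr;  /* may raise a CPU exception */
        (void)c;
    } else {
        caught = last_signal;  /* we got here via the handler */
    }

    /* Restore whatever handlers were installed before. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return caught;
}

/* Maps one page with no permissions and tries to read it.
 * Returns the signal number caught, 0 if none, -1 on setup failure. */
int touch_protected_page(void) {
    long pagesize = sysconf(_SC_PAGESIZE);
    if (pagesize <= 0) {
        return -1;
    }
    void *page = mmap(NULL, (size_t)pagesize, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        return -1;
    }
    int signo = probe_read(page);
    munmap(page, (size_t)pagesize);
    return signo;
}
```

Without the registered handler, the same access would hit the kernel's default action (kill the process), which is the segfault case described in section 4.C.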