Week 4.b
CS3650 01/31 2024
https://naizhengtan.github.io/24spring/

1. pipelines
2. what makes a good abstraction?
3. process memory layout
4. Crash course in x86-64 assembly

----

- recall last time with questions

  -- after fork(), there are two processes, child and parent.
     Q: What is fork's return value for child?
     -- the return value for parent is child's pid
     -- that's how a parent can wait for a specific child
        (check out waitpid)

  -- Q: fd 0/1/2. What are they?

  -- Q: "$ ls > log.txt": an output redirection
     How to implement this? Why does it work?

  -- Q: an example

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main() {
            // fd is opened before fork(), so both processes inherit it
            int fd = open("/tmp/tmp.txt", O_CREAT|O_TRUNC|O_WRONLY, 0777);
            if (fork() == 0) {
                printf("[child] fd = %d\n", fd);
                write(fd, "child\n", 6);
            } else {
                printf("[parent] fd = %d\n", fd);
                write(fd, "parent\n", 7);
            }
        }

1. Pipelines

  -- example:

        cat students.txt | shuf -n 1

     [See handout from last time panel 3. Go through the examples.]

  -- The key mechanisms are:

     - the pipe() system call. this takes a two-element int array
       and fills it in with two file descriptors. the first file
       descriptor (index 0) is the "read end"; the second (index 1)
       is the "write end". after a process writes to the "write end",
       the data written is available by reading from the "read end".

     - the actions that the shell takes when the command has the
       vertical bar (sometimes known as the pipe character). the
       character is |. (the same character as bitwise-OR in C).

     - at a high level, when the shell sees |, it uses the system call
       pipe() to "connect" a write end in one process with a read end
       in another process. it does this by forking, and then
       manipulating file descriptors (see handle_pipeline() on
       handout line 123).

2. Discussion: what makes a good abstraction?
  --simple but powerful

  --examples we've seen:
      --stdin (0), stdout (1), stderr (2)
        [nice by themselves, but when combined with the mechanisms
        below, things get even better]
      --file descriptors
      --fork/exec() separation

  --very few mechanisms lead to a lot of possible functionality

- Discussion: why is fork() less attractive today?

  "A fork() in the road"
  https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf

  --claim:
    "Fork today is a convenient API for a single-threaded process with a
    small memory footprint and simple memory layout that requires
    fine-grained control over the execution environment of its children
    but does not need to be strongly isolated from them. In other words,
    a shell."

  --Fork doesn’t compose.
    "Because fork duplicates an entire address space, it is a poor fit
    for OS abstractions implemented in user-mode. Buffered IO is a
    classic example: a user must explicitly flush IO prior to fork, lest
    output be duplicated."

    example:
        printf("hello world");
        fork();
        printf("\n");

    QUESTION: what do you expect to see on screen?
      A: hello world\n
         hello world\n
      B: hello world\n
         \n

    [answer: A. "hello world" is still sitting in the stdio buffer when
    fork() runs, so the child gets its own copy of the unflushed text.]

  --Fork isn’t thread-safe.
    "Unix processes today support threads, but a child created by fork
    has only a single thread (a copy of the calling thread)."
    [we will see this during concurrency session]

  --Fork is insecure.
    "Programs that fork but don’t exec render address-space layout
    randomisation ineffective, since each process has the same memory
    layout."
    [we will see this during security session]

  --Fork is slow.
    [we will see this during virtual machine session]

  Aside:
  - Fork bomb at the bash command prompt:
        $ :(){ : | : & }; :

3. process memory layout

  - revisit the big picture and where we are:

    +-------+              +----------+
    |source |              |executable|           +-------+
    | code  |--[compile]-->|   file   |--[exec]-->|process|
    +-------+              +----------+           +-------+
    |<--C-->|                          |<--shell-->|<-mem->|
            |<--compiler-->|<- NEXT ->|            layout

  - recall memory layout:

    * the ".text" segment: memory used to store the program itself
      (its machine instructions)
    * the ".data" segment: memory used to store global variables
    * the memory used by the heap, from which the programmer allocates
      using `malloc()`
    * the memory used for the stack, which we will talk about in more
      detail below.

    [draw process memory layout]

4. Crash course in x86-64 assembly

  syntax:
      movq PLACE1, PLACE2
  means "move 64-bit quantity from PLACE1 to PLACE2".
  the places are usually registers or memory addresses; the source
  (PLACE1) can also be an immediate (a constant).

      pushq %rax      equivalent to:
                        [ subq $8, %rsp
                          movq %rax, (%rsp) ]

      popq %rax         [ movq (%rsp), %rax
                          addq $8, %rsp ]

      call 0x12345      [ pushq %rip
                          movq $0x12345, %rip ]

      ret               [ popq %rip ]

  [the bracketed expansions for call and ret are pseudo-code: %rip
  cannot be named as an operand of pushq/movq/popq. at the time of the
  call, %rip already points at the instruction after the call, so the
  value pushed is the return address.]

  --above we see how call and ret interact with the stack
    --call: pushes old %rip (the return address) on the stack and
      updates %rip to the target
    --ret: updates %rip by loading it with the stored stack value

  [want to learn more about x86 assembly code? check out:
  https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf]