Week 4.a
CS5600 1/30 2023
https://naizhengtan.github.io/23spring/

1. Shell internals continued & discussions
2. Implementation of processes
3. Context switch intro
4. Scheduling intro

-------------------------------------

- Lab1: a fraction of an hour is an hour

- Lab2
  -- will release today after my office hours
  -- deadlines: 02/05 (soft) and 02/17
  -- HW2 hints, and we're generous on HW

- Questions (again)
  -- Google -> Piazza -> Mailing list -> office hours
  -- feel free to ask any questions: "why not give everybody an A?"
  -- your expectation: I rarely accept changes to the course, but you can certainly ask

- online office hours?
  -- TA: still ongoing

-----

1. Shell internals continued & discussions

A) recall last time with questions

  -- after fork(), there are two processes, child and parent.
     Q: What is fork's return value for the child?

  -- the return value for the parent is the child's pid
     -- that's how a parent can wait for a specific child (check out waitpid)

  -- Q: fd 0/1/2. What are they?

  -- Q: "$ ls > log.txt": an output redirection
     How to implement this? Why does it work?

B) Pipe

  -- example: cat students.txt | shuf -n 1

     [See handout from last time panel 3. Go through the examples.]

  -- The key mechanisms are:

     - the pipe() system call. this takes a two-element int array as its
       argument and fills it with two file descriptors. the first file
       descriptor is the "read end"; the second is the "write end". after a
       process writes to the "write end", the data written is available by
       reading from the "read end".

     - the actions that the shell takes when the command has the vertical
       bar (sometimes known as the pipe character). the character is |.
       (the same character as bitwise-OR in C).

     - at a high level, when the shell sees |, it uses the system call
       pipe() to "connect" a write end in one process with a read end in
       another process. it does this by forking, and then manipulating file
       descriptors (see handle_pipeline() on handout line 123).

C) The power of the fork/exec separation

    [an innovation from the original Unix.
    possibly a lucky design choice at the time, but it turns out to work
    really well. it allows the child to manipulate environment and file
    descriptors *before* exec, so that the *new* program may in fact
    encounter a different environment]

  --recall how we handle redirection

  --To generalize redirections and pipelines, there are lots of things
    the parent shell might want to manipulate in the child process: file
    descriptors, environment, resource limits.

  --yet fork() requires no arguments!

  --compare CreateProcess on Windows:

      BOOL CreateProcess(
          name,
          commandline,
          security_attr,
          thr_security_attr,
          inheritance?,
          other flags,
          new_env,
          curr_dir_name,
          ...)

      [http://msdn.microsoft.com/en-us/library/ms682425(v=VS.85).aspx]
      [see also Windows syscalls: https://github.com/j00ru/windows-syscalls]

      there's also CreateProcessAsUser, CreateProcessWithLogonW,
      CreateProcessWithTokenW, ...

  * The issue is that any conceivable manipulation of the environment of
    the new process has to be passed through arguments, instead of via
    arbitrary code.

    in other words: whoever calls CreateProcess() (or one of its variants)
    needs to perfectly configure the process before it starts running.

    with fork(), whoever calls fork() **is still running**, so it can
    arrange to do whatever it wants, without having to work through a rigid
    interface like the above. this allows arbitrary "setup" of the process
    before exec().

D) discussions

- Discussion: what makes a good abstraction?

  --simple but powerful

  --examples we've seen:
    --stdin (0), stdout (1), stderr (2)
      [nice by themselves, but when combined with the mechanisms below,
      things get even better]
    --file descriptors
    --fork/exec() separation

  --very few mechanisms lead to a lot of possible functionality

- Discussion: why is fork() less attractive today?
"A fork() in the road" https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf --claim: "Fork today is a convenient API for a single-threaded process with a small memory footprint and simple memory layout that requires fine-grained control over the execution environment of its children but does not need to be strongly isolated from them. In other words, a shell." --Fork doesn’t compose. "Because fork duplicates an entire address space, it is a poor fit for OS abstractions implemented in user-mode. Buffered IO is a classic example: a user must explicitly flush IO prior to fork, lest output be duplicated." example: print("hello world"); fork(); print("\n"); QUESTION: what do you expect to see on screen? A: hello world\n hello world\n B: hello world\n \n [answer: A] --Fork isn’t thread-safe. "Unix processes today support threads, but a child created by fork has only a single thread (a copy of the calling thread)." [we will see this during concurrency session] --Fork is insecure. "Programs that fork but don’t exec render address-space layout randomisation ineffective, since each process has the same memory layout." [we will see this during security session] --Fork is slow. [we will see this during virtual machine session] Aside: - Fork bomb at the bash command prompt: $ :(){ : | : & }; : 2. Implementation of processes Briefly cover the OS's view: - process control block (PCB) ----------------- | process id | | state | (ready, runnable, waiting, etc.) | open file | (0:stdin, 1:stdout, 2:stderr) | VM structures | (will talk about in memory part) | registers | | ..... | (signal mask, terminal, priority, ...) 
-----------------

  [show slide; a simplified version]

- process id
  --for example, the return value of fork() in the parent process

- process states: running, ready, waiting

  [draw state transition graph]

  -- this is a simplified model; real Linux process states include
     --running/runnable
     --interruptible sleep
     --uninterruptible sleep
     --zombie
     --stopped

  Question: Linux has "zombie"; why no "orphan" state?

point out that during scheduling, a mechanism that we have not seen, a core
switches between processes. will discuss the mechanism next.

3. Context switch intro

  --motivation: one CPU can run one process at a time;
    how to run multiple processes "at the same time" (multiplexing)?

  --context switch:
    the OS stops the running process and switches to another ready process.

    [draw switching between P1 and P2]

    P1            OS             P2
    |
    | [trap to kernel]
    +------------>+
                  | [save P1 context]
                  | [choose P2 to run]
                  | [restore P2 context]
                  |
                  +------------->+
                                 |
                                 ...
                                 | [trap to kernel]
                  +<-------------+
                  | [save P2 context]
                  | [choose P1 to run]
                  | [restore P1 context]
                  |
    +<------------+
    |
    ...

  --some points
    -- P1 and P2 have no idea they've been cut and switched out.
    -- the OS (scheduler) decides which process to run next
       (but how does it make this decision? we will see in scheduling)
    -- if context switches happen frequently enough, users will feel that
       P1 and P2 are running "at the same time"

  --context switching has a cost

    [draw two processes and kernel; switching from one to the other]

    --CPU time in kernel
      --save and restore registers
      --switch address spaces
    --indirect costs
      --TLB shootdowns, processor cache, OS caches (e.g., buffer caches)

    --result: more frequent context switches will lead to worse
      throughput (higher overhead)