Week 3.b
CS5600 9/24 2021
https://naizhengtan.github.io/21fall/

1. Last time
2. Fork/exec separation
3. Processes: the OS's view
4. Threads
5. Intro to concurrency

--------------------------------------------

Admin

- no code on Piazza

- Piazza question
    -- what are the readings/textbooks for?
    -- depth vs. width
       [draw on board]

- No students joined office hours; two students showed up in the review session.
    -- because of time? topic?

--------------------------------------------

1. Last time

- Process birth, fork()

- Shell crash course

- file descriptor
    [draw "ls" with fd 0 and 1]

- Shell internals
    -- executing cmd; fork() and exec()
    -- input/output redirection
    -- pipe
    [draw how redirection and pipe work for "ls" with fd 0 and 1]

- anyone tried the fork bomb?
    $ :(){ : | : & }; :

2. The power of the fork/exec separation

    [continue from the last lecture]

    [an innovation from the original Unix. possibly a lucky design
    choice at the time, but it turns out to work really well. allows the
    child to manipulate environment and file descriptors *before* exec,
    so that the *new* program may in fact encounter a different
    environment]

    --recall how we handle redirection

    --To generalize redirections and pipelines, there are lots of
    things the parent shell might want to manipulate in the child
    process: file descriptors, environment, resource limits.

    --yet fork() requires no arguments!

    --Contrast with CreateProcess on Windows:

        BOOL CreateProcess(
            name,
            commandline,
            security_attr,
            thr_security_attr,
            inheritance?,
            other flags,
            new_env,
            curr_dir_name,
            .....)

        [http://msdn.microsoft.com/en-us/library/ms682425(v=VS.85).aspx]

        there's also CreateProcessAsUser, CreateProcessWithLogonW,
        CreateProcessWithTokenW, ...

    * The issue is that any conceivable manipulation of the
      environment of the new process has to be passed through
      arguments, instead of via arbitrary code.

      in other words: whoever calls CreateProcess() (or a variant)
      has to perfectly configure the process before it starts running.
      with fork(), whoever calls fork() **is still running**, so it can
      arrange to do whatever it wants, without having to work
      through a rigid interface like the above. allows arbitrary
      "setup" of the process before exec().

- Discussion: what makes a good abstraction?

    --simple but powerful

    --examples we've seen:
        --fork/exec() separation
        --file descriptors
        --stdin (0), stdout (1), stderr (2)
        [nice by themselves, but when combined with the mechanisms
        below, things get even better]

    --very few mechanisms lead to a lot of possible functionality

    Question: what if there were no such abstractions?
        --fork/exec: we've seen that in CreateProcess()
        --file descriptor: different ways to handle devices, files, etc.
        --stdin/stdout/stderr: explicitly tell where the
          input/output/err should go

- Discussion: why is fork() less attractive today?

    "A fork() in the road"
    https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf

    --fork is slow
    --fork doesn't scale
    --fork isn't thread-safe

3. Implementation of processes

    Briefly cover the OS's view:

    - process control block (PCB)

        [draw on board]

        -----------------
        | process id    |
        | state         |  (running, ready, blocked, etc.)
        | IP (ins ptr)  |
        | open file     |  (0:stdin, 1:stdout, 2:stderr)
        | VM structures |  (will talk about in memory part)
        | registers     |
        | .....         |  (signal mask, terminal, priority, ...)
        -----------------

    - process id
        --for example, the return value of fork() in the parent process

    - process states: running, ready, blocked
        [draw state transition graph]

        -- this is a simplified model; real Linux process states include
            --running/runnable
            --interruptible sleep
            --uninterruptible sleep
            --zombie
            --stopped

    point out that during scheduling, a core switches between
    processes using a mechanism that we have not seen yet. will
    discuss that mechanism later.

    Note: these PCBs will have an analog when considering threads, below.

4. Threads

    [ask how many students have used threads]

    Interface to threads:

        tid thread_create (void (*fn) (void *), void *);
            Create a new thread, run fn with arg

        void thread_exit ();

        void thread_join (tid thr);
            Wait for thread with tid 'thr' to exit

        plus a lot of synchronization primitives, which we'll see in
        the next lecture

    Assume for now that threads are:
        --an abstraction created by OS
        --preemptively scheduled

    thread vs. process: a frequently asked question
        -- threads share memory, but they have their own "execution
           context" (registers and stack).

    A toy example:

        void f(void *arg) {...}
        void g(void *arg) {...}

        int main() {
            thread_create(f, NULL);
            thread_create(g, NULL);
            ...
        }

    [draw abstract picture of threads in a process: own registers, share memory]

    Question: if you were an OS designer, what would you do if a
    thread calls fork()?
        [answer: only one thread, the one that called fork(), is left
        in the child process]

5. Intro to concurrency

    There are many sources of concurrency.

    --what is concurrency?
        --stuff happening at the same time

    --sources of concurrency

        --on a single CPU, processes/threads can have their
        instructions interleaved (helpful to regard the instructions
        in multiple threads as "happening at the same time")

        --computers have multiple CPUs and common memory, so
        instructions in multiple threads can happen at the same time!

        --interrupts (CPU was doing one thing; now it's doing another)

    --why is concurrency hard?

        *** Hard to reason about all possible interleavings ***
        (deeper reason: human brain is "single-threaded")

    --handout:

        1a: x = 1 or x = 2.

        1b: x = 13 or x = 25.
        1c: x = 1 or x = 2 or x = 3

            say x is at mem location 0x5000

            f is "x = x+1;", which might compile to:

                movq 0x5000, %rbx    # load from address 0x5000 into register
                addq $1, %rbx        # add 1 to the register's value
                movq %rbx, 0x5000    # store back

            g is "x = x+2;", which might compile to:

                movq 0x5000, %rbx    # load from address 0x5000 into register
                addq $2, %rbx        # add 2 to the register's value
                movq %rbx, 0x5000    # store back

    [will continue the handout in the next lecture]