Week 3.b CS5600 2/2 2022 https://naizhengtan.github.io/22spring/ On the board ------------ 1. Shell internal continued & discussions 2. Implementation of processes 3. Context switch intro 4. Scheduling intro ----------------------------------------------- Admin --Lab0 URL problem -- "22spring" vs. "21fall" --Lab1 released; a good chance to revisit or practice your C skills -- feel free to modify skeleton code --study OSes (or systems in general) --the knowledge is not "total ordered" --many notions are intertwined --capture the big picture; then explore/understand one thing at a time -------- 1. Shell internal continued & discussions - anyone tried the fork bomb? $ :(){ : | : & }; : -- process syscalls: fork/wait/exec -- file syscalls: open/read/write/close [here are some other system calls (these are included in the notes so that you know what the basic interface to a Unix-like OS looks like): --int open(char*, int flags, [, int mode]); --int read(int fd, void*, int nbytes): --int write(int fd, void* buf, int nbytes); --off_t lseek(int fd, off_t pos, int whence) --int close(int fd); --int kill(int pid, int signal) --void exit (int status) --int fork(void) --int waitpid(int pid, int* stat, int opt) --int execve(char* prog, char** argv, char** envp) --int dup2 (int oldfd, int newfd) --int pipe(int fds[2]) ] - Shell revisit --redirection, pipe, backgrounding - Pipelines [See handout. The key mechanisms are: - the pipe() system call. this takes as input a two-element file descriptor array. the first file descriptor is the "read end"; the second is the "write end". after a process writes to the "write end", the data written is available by reading from the "read end". - the actions that the shell takes when the command has the vertical bar (sometimes known as the pipe character). the character is |. (the same character as bitwise-OR in C). - at a high level, when the shell sees |, it uses the system call pipe() to "connect" a write end in one process with a read end in another process. it does this by forking, and then manipulating file descriptors (see handle_pipeline() on handout line 123). ] - Discussion: what makes a good abstraction? --simple but powerful --examples we've seen: --stdin (0), stdout (1), stderr (2) [nice by themselves, but when combined with the mechanisms below, things get even better] --file descriptors --fork/exec() separation --very few mechanisms lead to a lot of possible functionality - Discussion: why fork() is less attractive today? "A fork() in the road" https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf --claim: "Fork today is a convenient API for a single-threaded process with a small memory footprint and simple memory layout that requires fine-grained control over the execution environment of its children but does not need to be strongly isolated from them. In other words, a shell." --Fork doesn’t compose. "Because fork duplicates an entire address space, it is a poor fit for OS abstractions implemented in user-mode. Buffered IO is a classic example: a user must explicitly flush IO prior to fork, lest output be duplicated." example: print("hello world"); fork(); print("\n"); QUESTION: what do you expect to see on screen? A: hello world\n hello world\n B: hello world\n \n [answer: A and B] --Fork isn’t thread-safe. "Unix processes today support threads, but a child created by fork has only a single thread (a copy of the calling thread)." [we will see this during concurrency session] --Fork is insecure. "Programs that fork but don’t exec render address-space layout randomisation ineffective, since each process has the same memory layout." [we will see this during security session] --Fork is slow. [we will see this during virtual machine session] 2. Implementation of processes Briefly cover the OS's view: - process control block (PCB) ----------------- | process id | | state | (ready, runnable, waiting, etc.) | open file | (0:stdin, 1:stdout, 2:stderr) | VM structures | (will talk about in memory part) | registers | | ..... | (signal mask, terminal, priority, ...) ----------------- [show slide; a simpified version] - process id --for example, the return value of fork() in the parent process - process states: running, ready, waiting [draw state transfer graph] -- this is a simplified model; real linux process states include --running/runable --interruptible sleep --uninterruptible sleep --zombie --stopped Question: Linux has "zombine"; why no "orphan" state? point out that during scheduling, a mechanism that we have not seen, a core switches between processes. will discuss the mechanism next. 3. Context switch intro --motivation: one CPU can run one process at a time; how to run multiple process "at the same time" (multiplexing)? --context switch: OS stops the running process and switches to another ready process. [draw switching between P1 and P2] P1 OS P2 | [trap to kernel] +------------>+ | [save P1 context] [schedule P2] [restore P2 context] | +------------->+ | ... [trap to kernel] +<-------------+ | [save P2 context] [schedule P1] [restore P1 context] | +<------------+ | ... --some points -- P1 and P2 have no idea they've been cut and switched out. -- OS (scheduler) decides which process to run next (but how make this decision? we will see in scheudling) -- if context switches happen frequently enough, users will fell P1 and P2 are running "at the same time" --context switching has a cost [draw two processes and kernel; switching from one to the other] --CPU time in kernel --save and restore registers --switch address spaces --indirect costs --TLB shootdowns, processor cache, OS caches (e.g., buffer caches) --result: more frequent context switches will lead to worse throughput (higher overhead) 4. Scheduling intro High-level problem: operating system has to decide which process (or thread) to run. A. When scheduling decisions happen: [show process transition diagram] exit (iv) +-------->[terminated] (iii) interrupt | [new] --> [ready] <-------------- [running] ^ ---------------> | I/O completion \ scheduler dispatch | or signal (ii) \ __________| \ / [waiting] <---/ I/O or wait (i) scheduling decisions take place when a process: (i) Switches from running to waiting state (ii) Switches from waiting to ready (iii) Switches from running to ready state (iv) Exits preemptive scheduling: at all four points nonpreemptive scheduling: at points (i), (iv), only (this is the definition of nonpreemptive scheduling) B. what are the metrics and criteria? timeline ----v---------v------------v----> arrival 1st execution completion --turnaround time time for each process to complete (completion - arrival) --waiting/response/output time. variations on this definition, but they all capture the time spent waiting for something to happen rather than the time from launch to exit. --time between when job enters system and starts executing; following the class's textbook, we will call this "response time" (1st execution - arrival) [some texts call this "waiting time"] --time from request to first response (e.g., key press to character echo) we will call this "output time" [some texts call this "response time"] --system throughput # of processes that complete per unit time --fairness different possible definitions: --freedom from starvation --all users get equal time on CPU --having CPU proportional to resources needed --etc. C. scheduling game -- 1 CPU -- multiple processes: P1, P2, P3, ... -- each with arrival time and running time -- ignore context switch costs, unless happening too frequently -- assume no I/O (Completely unrealistic but lets us build intuition, as a warmup.) -- scheduling output: a sequence of scheduling decisions