Week 6.a
CS 5600 10/12 2021

On the board
------------
1. Last time
2. Concurrency issues, continued
3. Other synchronization mechanisms
4. Scheduling intro
5. Scheduling disciplines

-----------------------------------------------

Admin

- a systems seminar talk -- details on Canvas
    -- programmable switches & smart NICs

--------

1. Last time

- mutex & condition variable

- deadlock; the four necessary conditions:
    1. mutual exclusion
    2. hold-and-wait
    3. no preemption
    4. circular wait

- deadlock solutions
    (1) ignore
    (2) detect & recover
    (3) avoidance algorithms
    (4) negate one of the four conditions
    (5) "third-party" detection systems

- other concurrency issues

  (a) progress issues

    --Livelock
    --Starvation
    --Priority inversion
        --T1, T2, T3: (highest, middle, lowest priority)
        --T1 wants to acquire a lock; T2 is runnable; T3 is runnable
          and holds the lock
        --System will run T2; T1 has to wait
        --Question: can you think of a solution?

        Answer:
        --Priority inheritance: temporarily bump T3 to the highest
          priority of any thread that is ever waiting on the lock
        --Disable interrupts, so no preemption (T3 finishes) ... not
          great, because the OS sometimes needs control (not for
          scheduling, under this assumption, but for handling memory
          [page faults], etc.)
        --Structure the app so that only adjacent-priority
          processes/threads share locks

2. Concurrency issues, continued [from last time]

  (b) other concurrency bugs

    - atomicity-violation bugs

        T1: if (thd->info) {
                ...
                use(thd->info);
                ...
            }

        T2: thd->info = NULL;

      (T2 can run between T1's check and T1's use, so T1 dereferences
      NULL despite having checked for it.)

    - order-violation bugs

        T1: void init() {
                thd = createThread(...);
            }

        T2: void main() {
                state = thd->info;
            }

      (nothing forces T1's init() to run before T2 reads thd, so thd
      may still be uninitialized.)

    "Learning from Mistakes -- A Comprehensive Study on Real World
    Concurrency Bug Characteristics"
    https://people.cs.uchicago.edu/~shanlu/preprint/asplos122-lu.pdf

3. Other synchronization mechanisms

- semaphores
    -- mentioned last time
    -- the first general-purpose synchronization primitive
    -- (you should not use them)

- barriers
    -- no thread may pass the barrier until all threads have reached it
    -- an example: GPUs
        -- many cores, literally thousands
        -- no cache coherence (unlike CPUs)
        -- after a barrier, all threads are synced

- spinlocks
    -- when acquiring the lock, the thread "spins"
    -- implementation, *pseudocode*:

        int lock;  // 0: unlocked, 1: locked

        // acquire: spin until we atomically swap in a 1 and get
        // back a 0, meaning the lock was free and is now ours
        while (1) {
            if (xchg_val(&lock, 1) == 0)
                break;
        }

    -- v2 = xchg_val(addr, v1) is implemented by an atomic hardware
       instruction; (roughly) it does the following:
        (1) freeze all CPUs' memory activity for address addr
        (2) temp = *addr
        (3) *addr = v1
        (4) v2 = temp
        (5) un-freeze memory activity
    -- atomic instructions are expensive
        -- some detailed experiments here:
           https://arxiv.org/pdf/2010.09852.pdf

- reader-writer locks
    -- recall the "database problem" in handout week 5.a, panels 2&3
    -- many readers can hold the lock at once; a writer excludes both
       readers and other writers
    -- usage (*pseudocode*; a concrete POSIX sketch appears below,
       after the citation):

        rwlock_t lock

        acquire_rdlock(&lock)
        acquire_wrlock(&lock)
        unlock(&lock)

    -- "passive reader-writer lock" (see citation below)
        -- main idea: "...leverages bounded staleness of memory
           consistency to avoid atomic instructions and memory
           barriers in readers' common paths..."
        -- violates the 6th rule (never sleep)
        -- generated a heated discussion

    "Scalable Read-mostly Synchronization Using Passive Reader-Writer
    Locks"
    https://www.usenix.org/system/files/conference/atc14/atc14-paper-liu.pdf
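    [Aside: to make the reader-writer lock usage pattern above
    concrete, here is a minimal sketch using POSIX reader-writer
    locks. The shared "stats" array and the thread bodies are made up
    for illustration, and error handling is omitted.]

        /* sketch: POSIX reader-writer lock usage; illustrative only */
        #include <pthread.h>
        #include <stdio.h>

        static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
        static int stats[2];              /* shared data guarded by lock */

        static void *reader(void *arg)
        {
            (void)arg;
            pthread_rwlock_rdlock(&lock);  /* many readers may hold this */
            printf("sum = %d\n", stats[0] + stats[1]);
            pthread_rwlock_unlock(&lock);
            return NULL;
        }

        static void *writer(void *arg)
        {
            (void)arg;
            pthread_rwlock_wrlock(&lock);  /* excludes readers AND writers */
            stats[0]++;
            stats[1]++;
            pthread_rwlock_unlock(&lock);
            return NULL;
        }

        int main(void)
        {
            pthread_t r, w;
            pthread_create(&w, NULL, writer, NULL);
            pthread_create(&r, NULL, reader, NULL);
            pthread_join(w, NULL);
            pthread_join(r, NULL);
            return 0;
        }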
- read-copy-update (RCU)
    -- do we have to block readers when modifying a data structure?
    -- answer: no, using RCU
    -- an example: deletion from a doubly-linked list [draw on board;
       a schematic sketch appears at the end of these notes]
    -- basic idea:
        -- readers work as normal (as if there were no concurrent
           updaters)
        -- updaters:
            -- make copies
            -- maintain both the old and the new version for a
               "grace period"
            -- delete the old version
    -- see also: https://cs.nyu.edu/~mwalfish/classes/14fa/ref/rcu2.pdf

- deterministic multithreading
    -- concurrent programs have "infinite" possible schedules (hence
       we don't know what may go wrong)
    -- BUT, we do know which schedules are okay (for example, by
       testing)
    -- idea: avoid unseen interleavings and schedules, which are
       potentially buggy
    -- see also:
       "Stable Deterministic Multithreading through Schedule
       Memoization"
       https://www.cs.columbia.edu/~junfeng/11sp-w4118/lectures/tern.pdf

- discussion: shared-memory model vs. NoSQL vs. databases [if we have
  time]

- a useful book for concurrency ninjas:
    Paul E. McKenney, "Is Parallel Programming Hard, And, If So, What
    Can You Do About It?"
    https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook-e2.pdf

4. Scheduling intro

High-level problem: the operating system has to decide which process
(or thread) to run.

A. When scheduling decisions happen:

   [draw process transition diagram]

                                            exit (iv)
                                           +---------> [terminated]
               (ii) interrupt              |
    [new] --> [ready] <---------------- [running]
                 ^      ----------------->  |
                 |      scheduler dispatch  |
      (iii) I/O completion                  | (i) I/O or wait
            or signal \                     |
                       \                    |
                        +---- [waiting] <---+

   scheduling decisions take place when a process:
       (i)   switches from running to waiting state
       (ii)  switches from running to ready state
       (iii) switches from waiting to ready
       (iv)  exits

   preemptive scheduling: at all four points
   nonpreemptive scheduling: at points (i) and (iv) only
       (this is the definition of nonpreemptive scheduling)

B. What are the metrics and criteria?

   timeline ----v------------v---------------v---->
             arrival   1st execution    completion

   --turnaround time
       time for each process to complete (completion - arrival)

   --waiting/response/output time: there are variations on this
     definition, but they all capture the time spent waiting for
     something to happen, rather than the time from launch to exit

       --time between when a job enters the system and when it starts
         executing; following the class's textbook, we will call this
         "response time" (1st execution - arrival)
         [some texts call this "waiting time"]

       --time from request to first response (e.g., key press to
         character echo); we will call this "output time"
         [some texts call this "response time"]

   --system throughput
       # of processes that complete per unit time

   --fairness
       different possible definitions:
       --freedom from starvation
       --all users get equal time on the CPU
       --CPU time proportional to the resources needed
       --etc.

   (a tiny worked example of these metrics appears at the end of
   these notes)

C. Context switching has a cost

   [draw two processes and the kernel; switching from one to the
   other]

   --direct costs: CPU time spent in the kernel
       --save and restore registers
       --switch address spaces
   --indirect costs
       --TLB shootdowns, processor cache misses, OS caches (e.g.,
         buffer caches)
   --result: more frequent context switches lead to worse throughput
     (higher overhead)

5. Scheduling disciplines

- the scheduling problem:
    -- multiple processes: P1, P2, P3, ...
    -- each with an arrival time and a running time
    -- 1 CPU
    -- ignore context-switch costs (unless switches happen too
       frequently)
    -- scheduling output: the sequence in which to run the processes

- let's do some intellectual exercises next time
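-----------------------------------------------

[Sketch promised in section 3: RCU-style deletion from a
doubly-linked list. This is a heavily simplified illustration of the
board example, NOT a real RCU implementation: synchronize_rcu_stub()
stands in for genuine grace-period machinery (e.g., liburcu or the
Linux kernel's RCU), there are no concurrent readers in this
single-threaded demo, and all names are made up.]

    #include <stdio.h>
    #include <stdlib.h>

    struct node {
        int key;
        struct node *prev, *next;
    };

    /* Stand-in: real RCU blocks here until every reader that might
       still hold a pointer to the unlinked node has finished its
       read-side critical section (the "grace period"). */
    static void synchronize_rcu_stub(void) { }

    /* Unlink victim so that NEW readers can no longer reach it.
       Crucially, victim's own prev/next pointers are left intact, so
       a reader already "standing on" victim can still walk off of
       it safely. */
    static void rcu_style_delete(struct node *victim)
    {
        victim->prev->next = victim->next;  /* readers now bypass it */
        victim->next->prev = victim->prev;
        synchronize_rcu_stub();   /* wait out the grace period */
        free(victim);             /* no reader can be using it now */
    }

    int main(void)
    {
        /* build a three-node list: a <-> b <-> c */
        struct node *a = malloc(sizeof *a), *b = malloc(sizeof *b),
                    *c = malloc(sizeof *c);
        a->key = 1; a->prev = NULL; a->next = b;
        b->key = 2; b->prev = a;    b->next = c;
        c->key = 3; c->prev = b;    c->next = NULL;

        rcu_style_delete(b);                /* remove the middle node */

        for (struct node *p = a; p; p = p->next)
            printf("%d ", p->key);          /* prints: 1 3 */
        printf("\n");
        free(a); free(c);
        return 0;
    }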
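-----------------------------------------------

[Worked example promised in section 4B, as a warm-up for next time's
exercises. Everything here is made up: three processes P1, P2, P3,
all arriving at t=0 with running times 10, 20, 30, run to completion
in that order (i.e., a hypothetical FIFO discipline).]

    #include <stdio.h>

    int main(void)
    {
        int run[3] = {10, 20, 30};  /* running times, in ticks */
        int t = 0;                  /* current time; all arrive at 0 */
        double sum_resp = 0, sum_turn = 0;

        for (int i = 0; i < 3; i++) {
            int response   = t;           /* 1st execution - arrival */
            int turnaround = t + run[i];  /* completion - arrival */
            printf("P%d: response=%d turnaround=%d\n",
                   i + 1, response, turnaround);
            sum_resp += response;
            sum_turn += turnaround;
            t += run[i];                  /* run P_i to completion */
        }
        /* averages: response (0+10+30)/3 ~= 13.3,
           turnaround (10+30+60)/3 ~= 33.3 */
        printf("avg response=%.1f avg turnaround=%.1f\n",
               sum_resp / 3, sum_turn / 3);
        return 0;
    }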