Week 4.a CS5600 9/28 2021
https://naizhengtan.github.io/21fall/
1. Last time
2. Intro to concurrency, continued
3. Memory consistency model
4. Managing concurrency
--------------------------------------------
Admin
 - review session on Wed, 2-3pm; will be recorded
 - sample midterm
   -- for simplicity, repeating handouts/homework; no "hints" here
   -- all material is fair game for exams
 - take the survey; we'd like to hear from the silent majority
 - Lab1, due 09/30
 - communication via mailing list: remember to "reply all" or cc the mailing list
--------------------------------------------
1. Last time

 - the fork/exec separation
 - implementation of processes; PCB
 - processes vs. threads
   [draw CPU, memory, and threads on board]
 - thread APIs
     thread_create(...)
     thread_exit()
     thread_join(...)
   Question: why not fork and exec?
 - intro to concurrency
   Concurrency is challenging because it challenges the
   "single-threaded" world we're living in!

2. Intro to concurrency, continued

 [handout_w3a from last time]
   2: incorrect list structure
   3: incorrect count in buffer

 all of these are called race conditions; not all of them are errors,
 though

 --Question: what are you going to do when debugging concurrent code?

 --the worst part of errors from race conditions is that a program may
   work fine most of the time but only occasionally show problems. why?
   (because the instructions of the various threads or processes or
   whatever get interleaved in a non-deterministic order.)

 --and it's worse than that, because inserting debugging code may change
   the timing so that the bug doesn't show up

3. Memory consistency model

 --hardware makes the problem even harder
   [look at handout_w3a panel 4; what is the correct answer?]
 [answer: "it depends on the hardware"]

 --sequential consistency is not always in effect

 --sequential consistency means:
   (or what I call, the "formal definition of how normal human beings
   think of concurrency")
   --(1) maintain a single sequential order among operations
     (namely: all memory operations can form a sequential order where
      each read reads from the most recent write.)
   --(2) maintain program order on individual processors
     (namely: all operations from the same thread follow their issue
      order in the program)

 --a concurrent execution is **equivalent** to a sequential execution when
   [--(0) of course, both must have the same operations]
   --(1) all (corresponding) reads return the same values
   --(2) the final states are the same

 --if a memory has _sequential consistency_, it means that:
   a concurrent program running on multiple CPUs looks as if the same
   program (with the same number of threads) were running on a single CPU.
   or, more formally, there always _exists_ a sequential order of memory
   operations that is _equivalent_ to what the concurrent program
   produces running on multiple CPUs.

 - On multiple CPUs, we can get "interleavings" _that are impossible on
   single-CPU machines_. In other words, the number of interleavings
   that you have to consider is _worse_ than simply considering all
   possible interleavings of sequential code.

 - why sequential consistency is not always held:
   -- CPUs -> Memory (too slow)
   -- CPUs -> Caches -> Memory
      (write stall: invalidating caches takes a long time)
   -- CPUs -> StoreBuffers -> Caches -> Memory
   -- more...
      -- out-of-order execution
      -- speculative execution

 [draw:
     +-----+     +-----+
     | CPU |     | CPU |
     |     |     |     |     [later: StoreBuffer]
     |     |     |     |     [later: cache]
     +--+--+     +--+--+
        |           |
        |           |
        +-----+-----+
              |
              v
       +------------+
       |   memory   |
       +------------+
 ]

 - For those interested in knowing more about the details of memory
   consistency:
   --http://sadve.cs.illinois.edu/Publications/computer96.pdf
   --Paul E. McKenney, "Memory Barriers: a Hardware View for Software
     Hackers"

 --assume sequential consistency until we explicitly relax it

 [skipped] Discussion: memory consistency and nosql
   -- sequential consistency
   -- causal consistency
   -- eventual consistency

4. Managing concurrency

 * critical sections
 * protecting critical sections
 * implementing critical sections

 --step 1: the concept of *critical sections*

 --Regard accesses of shared variables (for example, "count" in the
   bounded buffer example) as being in a _critical section_

 --Critical sections will be protected from concurrent execution

 [draw: program {A; c.s.; B} and three threads running it

   T1 ---[A][c.s.]-----[ B ]-->
   T2 ---[A]------[c.s.]---[ B ]----->
   T3 ---[A]------------[c.s.]-----[B]----->
 ]

 --analogy: a room that only allows one person
   --people: threads
   --room: critical section

 --Now we need a solution to the _critical section_ problem
 --The solution must satisfy 3 properties:

   1. mutual exclusion
      only one thread can be in the c.s. at a time
      [this is the notion of atomicity]

   2. progress
      if no thread is executing in the c.s., one of the threads trying
      to enter a given c.s. will eventually get in

   3. bounded waiting
      once a thread T starts trying to enter the critical section,
      there is a bound on the number of other threads that may enter
      the critical section before T enters

 --Note progress vs. bounded waiting
   --If no thread can enter the c.s., we don't have progress
   --If thread A is waiting to enter the c.s. while B repeatedly leaves
     and re-enters the c.s. ad infinitum, we don't have bounded waiting
   --We will be mostly concerned with mutual exclusion
     (in fact, real-world synchronization/concurrency primitives often
      don't satisfy bounded waiting.)

 --step 2: protecting critical sections

 --want lock()/unlock() or enter()/leave() or acquire()/release()
   --lots of names for the same idea
   --mutex_init(mutex_t* m), mutex_lock(mutex_t* m),
     mutex_unlock(mutex_t* m), ...
   --pthread_mutex_init(), pthread_mutex_lock(), ...
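At their simplest, the lock()/unlock() APIs named above can be implemented as a spinlock built on an atomic test-and-set. A minimal sketch using C11's atomic_flag (the mutex_t/mutex_* names are the generic ones from these notes, not a real library):

```c
#include <stdatomic.h>

/* A minimal spinlock sketch of the generic mutex API
 * (illustrative only: real mutexes put waiters to sleep instead of
 * spinning, and this version provides no bounded waiting). */
typedef struct { atomic_flag held; } mutex_t;

void mutex_init(mutex_t *m) { atomic_flag_clear(&m->held); }

void mutex_lock(mutex_t *m) {
    /* test-and-set atomically sets the flag and returns its OLD value;
     * exactly one spinning thread sees "false" and enters the c.s. */
    while (atomic_flag_test_and_set(&m->held))
        ; /* spin until the holder clears the flag */
}

void mutex_unlock(mutex_t *m) { atomic_flag_clear(&m->held); }
```

This gives mutual exclusion and progress, but not bounded waiting: an unlucky spinner can lose the race to a thread that repeatedly re-acquires the lock.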
 --in each case, the semantics are that once a thread of execution is
   executing inside the critical section, no other thread of execution
   is executing there

 --analogy: mutex => door for the room
   --lock/acquire => close the door
   --unlock/release => open the door (let another person enter the room)