Week 5.b CS5600 02/08 2023
https://naizhengtan.github.io/23spring/

0. Last time
1. Concurrency and consistency model
2. Managing concurrency

-----------------------------------------

0. Processes vs. Threads (last time)

 - Q: what's the difference?

 - an example

      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      void *f(void *xx) {
          sleep(1);
          printf("this is f\n");
          exit(0);
      }

      int main() {
          pthread_t tid;
          pthread_create(&tid, NULL, f, NULL);
          pthread_join(tid, NULL);   // line X
          printf("this is main\n");
      }

   Question: what is the output?
   [answer: "this is f"]

   Question: if we comment out line X, what is the output?
   [answer: "this is main"]

1. Concurrency and consistency model

 A. handout from last time

    panel 2: incorrect list structure
      both threads will insert their new List_elem as head;
      one List_elem will be lost (later threads cannot access it)

    panel 3: incorrect count in buffer
      [draw producer-consumer on board]

        producer --+             +--> consumer
                   |             |
                   +--> [buffer] +
                    (count, in, out)

      the "buffer" of course is not a real ring; it is an array with
      a wrap-around pointer.

      Question: what might go wrong with this incorrect count?
      [answer: the producer may overwrite items; the consumer may
       take out NULL.]

    points:
    --all of these are called **race conditions**; not all of them
      are errors, though
    --all of these can happen on a one-CPU machine (the multi-core
      case can be worse; will see soon)
    --What are you going to do when debugging concurrent code?
    --the worst part of errors from race conditions is that a program
      may work fine most of the time and only occasionally show
      problems. why?
      (because the instructions of the various threads, processes, or
      whatever get interleaved in a non-deterministic order.)
    --and it's worse than that, because inserting debugging code may
      change the timing so that the bug doesn't show up

 B. Memory consistency model

    --hardware makes the problem even harder
      [look at handout_w5a panel 4; what is the correct answer?]
      [draw these cases on board]
      [answer: "it depends on the hardware"]

    --sequential consistency is not always in effect

    --sequential consistency means:
      (or what I call the "formal definition of how a normal human
      being thinks of concurrency")

      if a memory has _sequential consistency_, it means that:
        a concurrent program running on multiple CPUs looks as if
        the same program (with the same number of threads) were
        running on a single CPU.

      or, more formally:
        there always _exists_ a sequential order of memory operations
        that is _equivalent_ to what the concurrent program produces
        running on multiple CPUs.

    - On multiple CPUs, we can get "interleavings" _that are
      impossible on single-CPU machines_. In other words, the set of
      behaviors you have to consider is _worse_ than simply all
      possible interleavings of sequential code.

    - why sequential consistency does not hold:
      -- CPUs -> Memory (too slow)
      -- CPUs -> Caches -> Memory
         (write stall: invalidating a cache line takes a long time)
      -- CPUs -> StoreBuffers -> Caches -> Memory
      -- more...
         -- out-of-order execution
         -- speculative execution

      [draw:
        +-----+     +-----+
        | CPU |     | CPU |
        |     |     |     |   [later: StoreBuffer]
        |     |     |     |   [later: cache]
        +--+--+     +--+--+
           |           |
           +-------+---+
                   |
                   v
            +------------+
            |   memory   |
            +------------+
      ]

    - for those interested in the details of memory consistency:
      --http://sadve.cs.illinois.edu/Publications/computer96.pdf
      --Paul E. McKenney, "Memory Barriers: a Hardware View for
        Software Hackers"

    --we assume sequential consistency until we explicitly relax it

    - in fact, checking whether an execution satisfies sequential
      consistency (SC) is NP-hard

    - multi-core CPU architecture
      [show slides]

    - But it's not just hardware:
      -- the compiler may play a role as well.
      -- here is an example (borrowed from
         https://courses.cs.washington.edu/courses/cse451/15au/notes/l17-mcm-slides.pdf)

            int x = 0;  // a global variable

            void foo() {
                for (int i = 0; i < 100; i++) {
                    x = 1;
                    printf("%d", x);
                }
            }

            void bar() {
                x = 0;
            }

         Q: if foo() and bar() are run on two threads, what can the
            possible output be?

         Q: can we see "1100000000000000..."?
         [YES! because a compiler may optimize foo() as
          """
              x = 1;
              for (int i = 0; i < 100; i++) {
                  printf("%d", x);
              }
          """
         ]

      -- point: concurrency is HARD.

2. Managing concurrency

   * critical sections
   * protecting critical sections
   * implementing critical sections

 --step 1: the concept of *critical sections*

 --Regard accesses to shared variables (for example, "count" in the
   bounded-buffer example) as being in a _critical section_

 --Critical sections will be protected from concurrent execution

   [draw: program {A; c.s.; B} and three threads running it

     T1 ---[A][c.s.]-----[ B ]-->
     T2 ---[A]------[c.s.]---[ B ]----->
     T3 ---[A]------------[c.s.]-----[B]----->
   ]

 --analogy: a room that only allows one person at a time
     --people: threads
     --room: critical section

 --A solution must satisfy 3 properties:

   1. mutual exclusion
        only one thread can be in the c.s. at a time
        [this is the notion of atomicity]

   2. progress
        if no thread is executing in the c.s., one of the threads
        trying to enter a given c.s. will eventually get in
        (sometimes called "no deadlock")

   3. bounded waiting
        once a thread T starts trying to enter the critical section,
        there is a bound on the number of other threads that may
        enter the critical section before T enters
        (sometimes called "no starvation")

[will start from here next time]