Week 5.a CS5600 2/14 2022
https://naizhengtan.github.io/22spring/

1. Last time
2. Intro to concurrency
3. Memory consistency model
4. Managing concurrency

--------------------------------------------
Admin

- Lab2 is released
-- substantially harder: C + syscall + shell
-- milestone in a week
--------------------------------------------

1. Last time

- Interface to threads:

    tid thread_create(void (*fn)(void *), void *);
    void thread_exit();
    void thread_join(tid thr);

- thread vs. process
-- threads share memory, but they have their own
   "execution context" (registers and stack).

2. Intro to concurrency

There are many sources of concurrency.

--what is concurrency?
  --stuff happening at the same time

--sources of concurrency
  --on a single CPU, processes/threads can have their
    instructions interleaved (it is helpful to regard the
    instructions in multiple threads as "happening at the
    same time")
  --computers have multiple CPUs and common memory, so
    instructions in multiple threads can happen at the
    same time!
  --interrupts (the CPU was doing one thing; now it's
    doing another)

--why is concurrency hard?

  *** Hard to reason about all possible interleavings ***

  (deeper reason: the human brain is "single-threaded")

--handout:

  1a: x = 1 or x = 2.
  1b: x = 13 or x = 25.
  1c: x = 1 or x = 2 or x = 3

      say x is at memory location 0x5000

      f is "x = x + 1;", which might compile to:

        movq 0x5000, %rbx   # load from address 0x5000 into register
        addq $1, %rbx       # add 1 to the register's value
        movq %rbx, 0x5000   # store back

      g is "x = x + 2;", which might compile to:

        movq 0x5000, %rbx   # load from address 0x5000 into register
        addq $2, %rbx       # add 2 to the register's value
        movq %rbx, 0x5000   # store back

  2: incorrect list structure

     both threads will insert a new List_elem as head;
     one List_elem will be lost (later threads cannot
     access it)

  3: incorrect count in buffer

     [draw producer-consumer on board]

       producer --+             +--> consumer
                  |             |
                  +---> [buffer] ---+
                     (count, in, out)

     "buffer" of course is not a real ring.
     it is an array with a wrap-around pointer.

     Question: what might go wrong with this incorrect
     count?

     [answer: the producer may overwrite items; the
     consumer may take out NULL.]

points:
--all of these are called race conditions; not all of
  them are errors, though
--all of these can happen on a one-CPU machine (the
  multi-core case can be worse; we will see soon)

- revisit: an example you've seen, HW2 Q2.b

    int main() {
        fork();
        fork();
        // [update 2/4: fix the double quotation marks]
        printf("\nhello world");
    }

--expected outputs:
  [many possibilities; you should see four newlines
  ("\n") and four "hello world"s, but their order is
  uncertain.]

--What are you going to do when debugging concurrent
  code?

--the worst part of errors from race conditions is that
  a program may work fine most of the time and only
  occasionally show problems. why?
  (because the instructions of the various threads or
  processes or whatever get interleaved in a
  non-deterministic order.)

--and it's worse than that, because inserting debugging
  code may change the timing so that the bug doesn't
  show up

3. Memory consistency model

--hardware makes the problem even harder

  [look at handout_w5a panel 4; what is the correct answer?]
  [draw these cases on board]
  [answer: "it depends on the hardware"]

--sequential consistency is not always in effect

--sequential consistency means:
  (or what I call, "a formal definition of how normal
  human beings think of concurrency")

  --(1) maintain a single sequential order among
    operations
    (namely: all memory operations can form a sequential
    order where each read reads from the most recent
    write.)
  --(2) maintain program order on individual processors
    (namely: all operations from the same thread follow
    their issue order in the program)

--a concurrent execution is **equivalent** to a
  sequential execution when
  [--(0) of course, both must have the same operations]
  --(1) all (corresponding) reads return the same values
  --(2) the final states are the same

--if a memory system has _sequential consistency_, it
  means that:
  a concurrent program running on multiple CPUs looks as
  if the same program (with the same number of threads)
  were running on a single CPU.
  or, more formally, there always _exists_ a sequential
  order of memory operations that is _equivalent_ to
  what the concurrent program produces running on
  multiple CPUs.

- On multiple CPUs, we can get "interleavings" _that are
  impossible on single-CPU machines_. In other words,
  the set of interleavings that you have to consider is
  _worse_ than simply all possible interleavings of
  sequential code.

- why sequential consistency does not hold on real
  hardware:
  -- CPUs -> Memory (too slow)
  -- CPUs -> Caches -> Memory (write stall: invalidating
     caches takes a long time)
  -- CPUs -> StoreBuffers -> Caches -> Memory
  -- more...
     -- out-of-order execution
     -- speculative execution

  [draw:
     +-----+     +-----+
     | CPU |     | CPU |
     |     |     |     |    [later: StoreBuffer]
     |     |     |     |    [later: cache]
     +--+--+     +--+--+
        |           |
        +-------+---+
                |
                v
         +------------+
         |   memory   |
         +------------+
  ]

- For those interested in more details of memory
  consistency:
  --http://sadve.cs.illinois.edu/Publications/computer96.pdf
  --Paul E. McKenney, "Memory Barriers: a Hardware View
    for Software Hackers"

--we assume sequential consistency until we explicitly
  relax it

Discussion: memory consistency and NoSQL
  -- sequential consistency
  -- causal consistency
  -- eventual consistency

4.
Managing concurrency

 * critical sections
 * protecting critical sections
 * implementing critical sections

--step 1: the concept of *critical sections*

  --Regard accesses of shared variables (for example,
    "count" in the bounded buffer example) as being in a
    _critical section_

  --Critical sections will be protected from concurrent
    execution

  [draw: program {A; c.s.; B} and three threads running it

    T1 ---[A][c.s.]-----[ B ]-->
    T2 ---[A]------[c.s.]---[ B ]----->
    T3 ---[A]------------[c.s.]-----[B]----->
  ]

  --analogy: a room that only allows one person at a time
    --people: threads
    --room: critical section

  --A critical section should satisfy 3 properties:

    1. mutual exclusion
       only one thread can be in the c.s. at a time
       [this is the notion of atomicity]

    2. progress
       if no thread is executing in the c.s., one of the
       threads trying to enter a given c.s. will
       eventually get in

    3. bounded waiting
       once a thread T starts trying to enter the
       critical section, there is a bound on the number
       of other threads that may enter the critical
       section before T enters

  --Note progress vs. bounded waiting
    --If no thread can enter the c.s., we don't have
      progress
    --If thread A is waiting to enter the c.s. while B
      repeatedly leaves and re-enters the c.s. ad
      infinitum, we don't have bounded waiting

  --We will be mostly concerned with mutual exclusion
    (in fact, real-world synchronization/concurrency
    primitives often don't satisfy bounded waiting.)

[will continue in the next lecture]