week 11a CS6640 03/17 2026 https://naizhengtan.github.io/26spring/ □ 1. on the difficulty of concurrency □ 2. challenges beyond interleaving □ 3. consistency model --- 1. On the difficulty of concurrency [cont'ed from last time] --why is concurrency hard? *** Hard to reason about all possible interleavings *** Source of non-determinism *** (deeper reason: human brain is "single-threaded") --handout: panel 1: Question: what can go wrong? 1a: x = 1 or x = 2. 1b: x = 13 or x = 25. 1c: x = 1 or x = 2 or x = 3 say x is at mem location 0x5000 f is "x = x+1;", which might compile to: lw t0, 0x5000 # load x into register t0 addi t0, t0, 1 # t0 = t0 + 1 sw t0, 0x5000 # store result back to x g is "x = x+2;", which might compile to: lw t1, 0x5000 # load x into register t1 addi t1, t1, 2 # t1 = t1 + 2 sw t1, 0x5000 # store result back to x panel 2: incorrect list structure Question: what can go wrong? both threads will insert new List_elem as head one List_elem will be lost (later threads cannot access it) points: --all of these are called **race conditions**; not all of them are errors, though --all of these can happen on one-CPU machine (multi-core case can be worse; will see soon) --What are you going to do when debugging concurrency code? --worst part of errors from race conditions is that a program may work fine most of the time but only occasionally show problems. why? (because the instructions of the various threads or processes or whatevever get interleaved in a non-deterministic order.) --and it's worse than that because inserting debugging code may change the timing so that the bug doesn't show up 2. challenges beyond interleaving --hardware makes the problem even harder [look at handout panel 3; what is the correct answer?] [draw these cases on board] [answer: "it depends on the hardware"] Question: why? how come these happen? [will answer soon] ...before that, what occurred in the previous examples? ...those executions correspond to interleavings that humans can reason about... ...this property is called sequential consistency. (sequential consistency not always in effect, which is the short answer to the question) --sequential consistency means: or what I call, "formal definition of how normal human being thinks of concurrency" --if a memory has _sequential consistency_, it means that: a concurrent program running on multiple CPUs looks as if the same program (with the same number of threads) running on a single CPU. or, more formally, there always _exists_ a sequential order of memory operations that is _equivalent_ to what the concurrent program produces running on multiple CPUs. - On multiple CPUs, we can get "interleavings" _that are impossible on single-CPU machines_. In other words, the number of interleavings that you have to consider is _worse_ than simply considering all possible interleavings of sequential code. - explain why sequential consistency is not held: -- CPUs -> Memory (too slow) -- CPUs -> Caches -> Memory (Write stall: invalidating cache requires a long time) -- CPUs -> StoreBuffers -> Caches -> Memory -- more... -- out-of-order execution -- speculative execution [draw: +-----+ +-----+ | CPU | | CPU | | | | | [later: StoreBuffer] | | | | [later: cache] +--+--+ +--+--+ | | +-------+---------+ | v +------------+ | memory | +------------+ ] - To whom interested in knowing more about the details of memory consistency: --http://sadve.cs.illinois.edu/Publications/computer96.pdf --Paul E. McKenney, "Memory Barriers: a Hardware View for Software Hackers" --when you program, most of the time, you can assume sequential consistency in your mental model; better---if you use synchronization primitives properly. - in fact, checking sequential consistency (SC) is NP-hard - multi-core CPU architecture [show slides] - But it's not just hardware: -- Compiler may play a role as well. -- here is an example: int x = 0; void* foo(void* arg) { for (int i = 0; i < 1000000; i++) { x = x + 1; } return NULL; } Q: if foo() is run on two threads, what can be the possible output? Q: Can we see "1000000"? [YES! because a compiler may optimize foo() as """ x = x + 1000000; """ The two threads may interleave and have: (tmp1/2 are registers in CPU, x in memory) tmp1 = x // thread 1 tmp1 = tmp1 + 1000000 // thread 1 (tmp1 = 1000000) tmp2 = x // thread 2 (tmp2 = 1000000) tmp2 = tmp2 + 1000000 // thread 2 x = tmp1 // thread 1 x = tmp2 // thread 2 ] -- point: concurrency is HARD 3. consistency model -- formal specification of correctness for concurrent operations -- a core trade-off: strong consistency (execution order approximates a human-expected sequential schedule) vs. performance (system throughput and latency ) -- hierachical of different consistency models See: https://jepsen.io/consistency/models -- checking them is an active research field See: https://sccp-workshop.github.io/sccp26/index.html