Week 6.a CS3650 02/12 2024
https://naizhengtan.github.io/24spring/

0. Processes vs. threads
1. Intro to concurrency

-----

0. Processes vs. threads

- Q: what's the difference?

- an example

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    void *f(void *_) {
        sleep(1);
        printf("this is f\n");
        exit(0);
    }

    int main() {
        pthread_t tid;
        pthread_create(&tid, NULL, f, NULL);
        pthread_join(tid, NULL);   // line X
        printf("this is main\n");
    }

  Question: what is the output?
  [answer: "this is f"]

  Question: if we comment out line X, what is the output?
  [answer: "this is main"]

1. Intro to concurrency

There are many sources of concurrency.

--what is concurrency?
  --stuff happening at the same time

--sources of concurrency
  --on a single CPU, processes/threads can have their instructions
    interleaved (helpful to regard the instructions in multiple
    threads as "happening at the same time")
  --computers have multiple CPUs and common memory, so instructions
    in multiple threads can happen at the same time!
  --interrupts (the CPU was doing one thing; now it's doing another)

--why is concurrency hard?
  *** Hard to reason about all possible interleavings ***
  (deeper reason: the human brain is "single-threaded")

[skipped]
- an example:

    #include <stdio.h>
    #include <unistd.h>

    int main() {
        fork();
        fork();
        printf("\nhello world");
    }

  --expected output:
    [many possibilities; you should see four newlines ("\n") and four
    "hello world"s, but their order is uncertain.]

[handout week05b]

panel 1:
  1a: x = 1 or x = 2.
  1b: x = 13 or x = 25.
  1c: x = 1 or x = 2 or x = 3

  say x is at mem location 0x5000

  f is "x = x+1;", which might compile to:

    movq 0x5000, %rbx   # load from address 0x5000 into register
    addq $1, %rbx       # add 1 to the register's value
    movq %rbx, 0x5000   # store back

  g is "x = x+2;", which might compile to:

    movq 0x5000, %rbx   # load from address 0x5000 into register
    addq $2, %rbx       # add 2 to the register's value
    movq %rbx, 0x5000   # store back

panel 2: incorrect list structure
  --both threads will insert a new List_elem as head
  --one List_elem will be lost (later threads cannot access it)

points:
--all of these are called **race conditions**; not all of them are
  errors, though
--all of these can happen on a one-CPU machine (the multi-core case
  can be worse; we will see soon)

B. Memory consistency model

--sequential consistency is not always in effect

--sequential consistency means:
  (or what I call, the "formal definition of how a normal human being
  thinks of concurrency")
  --(1) maintain a single sequential order among operations
    (namely: all memory operations can form a sequential order where
    the read ops read from the most recent writes.)
  --(2) maintain program order on individual processors
    (namely: all operations from the same thread follow their issue
    order in the program)

--a concurrent execution is **equivalent** to a sequential execution
  when:
  [--(0) of course, both must have the same operations]
  --(1) all (corresponding) reads return the same values
  --(2) the final states are the same

--if a memory has _sequential consistency_, it means that:
  a concurrent program running on multiple CPUs looks as if the same
  program (with the same number of threads) were running on a single
  CPU.
  or, more formally, there always _exists_ a sequential order of
  memory operations that is _equivalent_ to what the concurrent
  program produces running on multiple CPUs.

- On multiple CPUs, we can get "interleavings" _that are impossible
  on single-CPU machines_.
  In other words, the number of interleavings that you have to
  consider is _worse_ than simply considering all possible
  interleavings of sequential code.

[handout week06a] panel 1

--hardware makes the problem even harder
  [look at handout_w6a panel 1; what is the correct answer?]
  [draw these cases on board]
  [answer: "it depends on the hardware"]

- explain why sequential consistency does not hold:
  -- CPUs -> Memory (too slow)
  -- CPUs -> Caches -> Memory
     (write stall: invalidating a cache line takes a long time)
  -- CPUs -> StoreBuffers -> Caches -> Memory
  -- more...
     -- out-of-order execution
     -- speculative execution

[draw:
    +-----+     +-----+
    | CPU |     | CPU |
    |     |     |     |   [later: StoreBuffer]
    |     |     |     |   [later: cache]
    +--+--+     +--+--+
       |           |
       +-----+-----+
             |
             v
      +------------+
      |   memory   |
      +------------+
]

- For those interested in knowing more about the details of memory
  consistency:
  --http://sadve.cs.illinois.edu/Publications/computer96.pdf
  --Paul E. McKenney, "Memory Barriers: a Hardware View for Software
    Hackers"
  --we assume sequential consistency until we explicitly relax it

- multi-core CPU architecture
  [show slides]

A concurrency problem:

    T1: A=1; print(B)
    T2: B=1; print(A)

  From a sequential-execution point of view, at least one "1" will be
  printed. However, in reality, the output might be two "0"s.

C. Concurrency problems from software

But it's not just hardware:
-- The compiler may play a role as well.
-- here is an example (borrowed from
   https://courses.cs.washington.edu/courses/cse451/15au/notes/l17-mcm-slides.pdf)

    int x = 0;  // a global variable

    void foo() {
        for (int i = 0; i < 100; i++) {
            x = 1;
            printf("%d", x);
        }
    }

    void bar() {
        x = 0;
    }

  Q: if foo() and bar() are run on two threads, what are the possible
     outputs?
  Q: Can we see "1100000000000000..."?
  [YES! because a compiler may optimize foo() to:
    """
    x = 1;
    for (int i = 0; i < 100; i++) {
        printf("%d", x);
    }
    """
  ]

-- point: concurrency is HARD.

[draw: what people think vs. what the compiler compiles vs.
 sequential consistency vs. real-world memory consistency]