Week 5.b CS5600 02/08 2023
https://naizhengtan.github.io/23spring/

0. Last time
1. Concurrency and consistency model
2. Managing concurrency

-----------------------------------------

0. Processes vs. Threads (last time)

 - Q: what's the difference?

 - an example

      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      void *f(void *xx) {
          sleep(1);
          printf("this is f\n");
          exit(0);
      }

      int main() {
          pthread_t tid;
          pthread_create(&tid, NULL, f, NULL);
          pthread_join(tid, NULL);   // line X
          printf("this is main\n");
      }

   Question: what is the output?
   [answer: "this is f"]

   Question: if we comment out line X, what is the output?
   [answer: "this is main"]

1. Concurrency and consistency model

 A. handout from last time

    panel 2: incorrect list structure
      both threads will insert their new List_elem as head;
      one List_elem will be lost (later threads cannot access it)

    panel 3: incorrect count in buffer
      [draw producer-consumer on board]

        producer --+             +--> consumer
                   |             |
                   +--> [buffer] +
                    (count, in, out)

      the "buffer" of course is not a real ring; it is an array with
      a wrap-around pointer.

      Question: what might go wrong with this incorrect count?
      [answer: the producer may overwrite items; the consumer may
       take out NULL.]

    points:
    --all of these are called **race conditions**; not all of them
      are errors, though
    --all of these can happen on a one-CPU machine (the multi-core
      case can be worse; will see soon)
    --What are you going to do when debugging concurrent code?
    --the worst part of errors from race conditions is that a program
      may work fine most of the time and only occasionally show
      problems. why?
      (because the instructions of the various threads, processes, or
      whatever get interleaved in a non-deterministic order.)
    --and it's worse than that, because inserting debugging code may
      change the timing so that the bug doesn't show up

 B. Memory consistency model

    --hardware makes the problem even harder
      [look at handout_w5a panel 4; what is the correct answer?]
      [draw these cases on board]
      [answer: "it depends on the hardware"]

    --sequential consistency is not always in effect

    --sequential consistency means:
      (or what I call the "formal definition of how a normal human
      being thinks of concurrency")

      if a memory has _sequential consistency_, it means that:
        a concurrent program running on multiple CPUs looks as if
        the same program (with the same number of threads) were
        running on a single CPU.

      or, more formally:
        there always _exists_ a sequential order of memory operations
        that is _equivalent_ to what the concurrent program produces
        running on multiple CPUs.

    - On multiple CPUs, we can get "interleavings" _that are
      impossible on single-CPU machines_. In other words, the set of
      behaviors you have to consider is _worse_ than simply all
      possible interleavings of sequential code.

    - why sequential consistency does not hold:
      -- CPUs -> Memory (too slow)
      -- CPUs -> Caches -> Memory
         (write stall: invalidating a cache line takes a long time)
      -- CPUs -> StoreBuffers -> Caches -> Memory
      -- more...
         -- out-of-order execution
         -- speculative execution

      [draw:
        +-----+     +-----+
        | CPU |     | CPU |
        |     |     |     |   [later: StoreBuffer]
        |     |     |     |   [later: cache]
        +--+--+     +--+--+
           |           |
           +-------+---+
                   |
                   v
            +------------+
            |   memory   |
            +------------+
      ]

    - for those interested in the details of memory consistency:
      --http://sadve.cs.illinois.edu/Publications/computer96.pdf
      --Paul E. McKenney, "Memory Barriers: a Hardware View for
        Software Hackers"

    --we assume sequential consistency until we explicitly relax it

    - in fact, checking whether an execution satisfies sequential
      consistency (SC) is NP-hard

    - multi-core CPU architecture
      [show slides]

    - But it's not just hardware:
      -- the compiler may play a role as well.
      -- here is an example (borrowed from
         https://courses.cs.washington.edu/courses/cse451/15au/notes/l17-mcm-slides.pdf)

            int x = 0;  // a global variable

            void foo() {
                for (int i = 0; i < 100; i++) {
                    x = 1;
                    printf("%d", x);
                }
            }

            void bar() {
                x = 0;
            }

         Q: if foo() and bar() are run on two threads, what can the
            possible output be?

         Q: can we see "1100000000000000..."?
         [YES! because a compiler may optimize foo() as
          """
              x = 1;
              for (int i = 0; i < 100; i++) {
                  printf("%d", x);
              }
          """
         ]

      -- point: concurrency is HARD.

2. Managing concurrency

   * critical sections
   * protecting critical sections
   * implementing critical sections

 --step 1: the concept of *critical sections*

 --Regard accesses to shared variables (for example, "count" in the
   bounded-buffer example) as being in a _critical section_

 --Critical sections will be protected from concurrent execution

   [draw: program {A; c.s.; B} and three threads running it

     T1 ---[A][c.s.]-----[ B ]-->
     T2 ---[A]------[c.s.]---[ B ]----->
     T3 ---[A]------------[c.s.]-----[B]----->
   ]

 --analogy: a room that only allows one person at a time
     --people: threads
     --room: critical section

 --A solution must satisfy 3 properties:

   1. mutual exclusion
        only one thread can be in the c.s. at a time
        [this is the notion of atomicity]

   2. progress
        if no thread is executing in the c.s., one of the threads
        trying to enter a given c.s. will eventually get in
        (sometimes called "no deadlock")

   3. bounded waiting
        once a thread T starts trying to enter the critical section,
        there is a bound on the number of other threads that may
        enter the critical section before T enters
        (sometimes called "no starvation")

[will start from here next time]