Week 5.a CS5600 2/14 2022
https://naizhengtan.github.io/22spring/

1. Last time
2. Intro to concurrency
3. Memory consistency model
4. Managing concurrency

--------------------------------------------
Admin

- Lab2 is released
-- substantially harder: C + syscall + shell
-- milestone in a week
--------------------------------------------

1. Last time

- Interface to threads:

    tid thread_create(void (*fn)(void *), void *);
    void thread_exit();
    void thread_join(tid thr);

- thread vs. process
-- threads share memory, but they have their own
   "execution context" (registers and stack).

2. Intro to concurrency

There are many sources of concurrency.

--what is concurrency?
  --stuff happening at the same time

--sources of concurrency
  --on a single CPU, processes/threads can have their
    instructions interleaved (it is helpful to regard the
    instructions in multiple threads as "happening at the
    same time")
  --computers have multiple CPUs and common memory, so
    instructions in multiple threads can happen at the
    same time!
  --interrupts (the CPU was doing one thing; now it's
    doing another)

--why is concurrency hard?

  *** Hard to reason about all possible interleavings ***

  (deeper reason: the human brain is "single-threaded")

--handout:

  1a: x = 1 or x = 2.
  1b: x = 13 or x = 25.
  1c: x = 1 or x = 2 or x = 3

      say x is at memory location 0x5000

      f is "x = x + 1;", which might compile to:

        movq 0x5000, %rbx   # load from address 0x5000 into register
        addq $1, %rbx       # add 1 to the register's value
        movq %rbx, 0x5000   # store back

      g is "x = x + 2;", which might compile to:

        movq 0x5000, %rbx   # load from address 0x5000 into register
        addq $2, %rbx       # add 2 to the register's value
        movq %rbx, 0x5000   # store back

  2: incorrect list structure

     both threads will insert a new List_elem as head;
     one List_elem will be lost (later threads cannot
     access it)

  3: incorrect count in buffer

     [draw producer-consumer on board]

       producer --+             +--> consumer
                  |             |
                  +---> [buffer] ---+
                     (count, in, out)

     "buffer" of course is not a real ring.
     it is an array with a wrap-around pointer.

     Question: what might go wrong with this incorrect
     count?

     [answer: the producer may overwrite items; the
     consumer may take out NULL.]

points:
--all of these are called race conditions; not all of
  them are errors, though
--all of these can happen on a one-CPU machine (the
  multi-core case can be worse; we will see soon)

- revisit: an example you've seen, HW2 Q2.b

    int main() {
        fork();
        fork();
        // [update 2/4: fix the double quotation marks]
        printf("\nhello world");
    }

--expected outputs:
  [many possibilities; you should see four newlines
  ("\n") and four "hello world"s, but their order is
  uncertain.]

--What are you going to do when debugging concurrent
  code?

--the worst part of errors from race conditions is that
  a program may work fine most of the time and only
  occasionally show problems. why?
  (because the instructions of the various threads or
  processes or whatever get interleaved in a
  non-deterministic order.)

--and it's worse than that, because inserting debugging
  code may change the timing so that the bug doesn't
  show up

3. Memory consistency model

--hardware makes the problem even harder

  [look at handout_w5a panel 4; what is the correct answer?]
  [draw these cases on board]
  [answer: "it depends on the hardware"]

--sequential consistency is not always in effect

--sequential consistency means:
  (or what I call, "a formal definition of how normal
  human beings think of concurrency")

  --(1) maintain a single sequential order among
    operations
    (namely: all memory operations can form a sequential
    order where each read reads from the most recent
    write.)
  --(2) maintain program order on individual processors
    (namely: all operations from the same thread follow
    their issue order in the program)

--a concurrent execution is **equivalent** to a
  sequential execution when
  [--(0) of course, both must have the same operations]
  --(1) all (corresponding) reads return the same values
  --(2) the final states are the same

--if a memory system has _sequential consistency_, it
  means that:
  a concurrent program running on multiple CPUs looks as
  if the same program (with the same number of threads)
  were running on a single CPU.
  or, more formally, there always _exists_ a sequential
  order of memory operations that is _equivalent_ to
  what the concurrent program produces running on
  multiple CPUs.

- On multiple CPUs, we can get "interleavings" _that are
  impossible on single-CPU machines_. In other words,
  the set of interleavings that you have to consider is
  _worse_ than simply all possible interleavings of
  sequential code.

- why sequential consistency does not hold on real
  hardware:
  -- CPUs -> Memory (too slow)
  -- CPUs -> Caches -> Memory (write stall: invalidating
     caches takes a long time)
  -- CPUs -> StoreBuffers -> Caches -> Memory
  -- more...
     -- out-of-order execution
     -- speculative execution

  [draw:
     +-----+     +-----+
     | CPU |     | CPU |
     |     |     |     |    [later: StoreBuffer]
     |     |     |     |    [later: cache]
     +--+--+     +--+--+
        |           |
        +-------+---+
                |
                v
         +------------+
         |   memory   |
         +------------+
  ]

- For those interested in more details of memory
  consistency:
  --http://sadve.cs.illinois.edu/Publications/computer96.pdf
  --Paul E. McKenney, "Memory Barriers: a Hardware View
    for Software Hackers"

--we assume sequential consistency until we explicitly
  relax it

Discussion: memory consistency and NoSQL
  -- sequential consistency
  -- causal consistency
  -- eventual consistency

4.
Managing concurrency

 * critical sections
 * protecting critical sections
 * implementing critical sections

--step 1: the concept of *critical sections*

  --Regard accesses of shared variables (for example,
    "count" in the bounded buffer example) as being in a
    _critical section_

  --Critical sections will be protected from concurrent
    execution

  [draw: program {A; c.s.; B} and three threads running it

    T1 ---[A][c.s.]-----[ B ]-->
    T2 ---[A]------[c.s.]---[ B ]----->
    T3 ---[A]------------[c.s.]-----[B]----->
  ]

  --analogy: a room that only allows one person at a time
    --people: threads
    --room: critical section

  --A critical section should satisfy 3 properties:

    1. mutual exclusion
       only one thread can be in the c.s. at a time
       [this is the notion of atomicity]

    2. progress
       if no thread is executing in the c.s., one of the
       threads trying to enter a given c.s. will
       eventually get in

    3. bounded waiting
       once a thread T starts trying to enter the
       critical section, there is a bound on the number
       of other threads that may enter the critical
       section before T enters

  --Note progress vs. bounded waiting
    --If no thread can enter the c.s., we don't have
      progress
    --If thread A is waiting to enter the c.s. while B
      repeatedly leaves and re-enters the c.s. ad
      infinitum, we don't have bounded waiting

  --We will be mostly concerned with mutual exclusion
    (in fact, real-world synchronization/concurrency
    primitives often don't satisfy bounded waiting.)

[will continue in the next lecture]