Week 4.a CS5600 9/28 2021
https://naizhengtan.github.io/21fall/
1. Last time
2. Intro to concurrency, continued
3. Memory consistency model
4. Managing concurrency
--------------------------------------------
Admin
 - review session on Wed, 2-3pm; will be recorded
 - sample midterm
   -- for simplicity, repeating handouts/homework; no "hints" here
   -- all material is fair game for exams
 - take the survey; we'd like to hear from the silent majority
 - Lab1, due 09/30
 - communication via mailing list: remember to "reply all" or cc the mailing list
--------------------------------------------
1. Last time

 - the fork/exec separation
 - implementation of processes; PCB
 - processes vs. threads
   [draw CPU, memory, and threads on board]
 - thread APIs
     thread_create(...)
     thread_exit()
     thread_join(...)
   Question: why not fork and exec?
 - intro to concurrency
   Concurrency is challenging because it challenges the
   "single-threaded" world we're living in!

2. Intro to concurrency, continued

 [handout_w3a from last time]
   2: incorrect list structure
   3: incorrect count in buffer

 all of these are called race conditions; not all of them are errors,
 though

 --Question: what are you going to do when debugging concurrent code?

 --the worst part of errors from race conditions is that a program may
   work fine most of the time but only occasionally show problems. why?
   (because the instructions of the various threads or processes or
   whatever get interleaved in a non-deterministic order.)

 --and it's worse than that, because inserting debugging code may change
   the timing so that the bug doesn't show up

3. Memory consistency model

 --hardware makes the problem even harder
   [look at handout_w3a panel 4; what is the correct answer?]
 [answer: "it depends on the hardware"]

 --sequential consistency is not always in effect

 --sequential consistency means:
   (or what I call, the "formal definition of how normal human beings
   think of concurrency")
   --(1) maintain a single sequential order among operations
     (namely: all memory operations can form a sequential order where
      each read reads from the most recent write.)
   --(2) maintain program order on individual processors
     (namely: all operations from the same thread follow their issue
      order in the program)

 --a concurrent execution is **equivalent** to a sequential execution when
   [--(0) of course, both must have the same operations]
   --(1) all (corresponding) reads return the same values
   --(2) the final states are the same

 --if a memory has _sequential consistency_, it means that:
   a concurrent program running on multiple CPUs looks as if the same
   program (with the same number of threads) were running on a single CPU.
   or, more formally, there always _exists_ a sequential order of memory
   operations that is _equivalent_ to what the concurrent program
   produces running on multiple CPUs.

 - On multiple CPUs, we can get "interleavings" _that are impossible on
   single-CPU machines_. In other words, the number of interleavings
   that you have to consider is _worse_ than simply considering all
   possible interleavings of sequential code.

 - why sequential consistency is not always held:
   -- CPUs -> Memory (too slow)
   -- CPUs -> Caches -> Memory
      (write stall: invalidating caches takes a long time)
   -- CPUs -> StoreBuffers -> Caches -> Memory
   -- more...
      -- out-of-order execution
      -- speculative execution

 [draw:
     +-----+     +-----+
     | CPU |     | CPU |
     |     |     |     |     [later: StoreBuffer]
     |     |     |     |     [later: cache]
     +--+--+     +--+--+
        |           |
        |           |
        +-----+-----+
              |
              v
       +------------+
       |   memory   |
       +------------+
 ]

 - For those interested in knowing more about the details of memory
   consistency:
   --http://sadve.cs.illinois.edu/Publications/computer96.pdf
   --Paul E. McKenney, "Memory Barriers: a Hardware View for Software
     Hackers"

 --assume sequential consistency until we explicitly relax it

 [skipped] Discussion: memory consistency and nosql
   -- sequential consistency
   -- causal consistency
   -- eventual consistency

4. Managing concurrency

 * critical sections
 * protecting critical sections
 * implementing critical sections

 --step 1: the concept of *critical sections*

 --Regard accesses of shared variables (for example, "count" in the
   bounded buffer example) as being in a _critical section_

 --Critical sections will be protected from concurrent execution

 [draw: program {A; c.s.; B} and three threads running it

   T1 ---[A][c.s.]-----[ B ]-->
   T2 ---[A]------[c.s.]---[ B ]----->
   T3 ---[A]------------[c.s.]-----[B]----->
 ]

 --analogy: a room that only allows one person
   --people: threads
   --room: critical section

 --Now we need a solution to the _critical section_ problem
 --The solution must satisfy 3 properties:

   1. mutual exclusion
      only one thread can be in the c.s. at a time
      [this is the notion of atomicity]

   2. progress
      if no thread is executing in the c.s., one of the threads trying
      to enter a given c.s. will eventually get in

   3. bounded waiting
      once a thread T starts trying to enter the critical section,
      there is a bound on the number of other threads that may enter
      the critical section before T enters

 --Note progress vs. bounded waiting
   --If no thread can enter the c.s., we don't have progress
   --If thread A is waiting to enter the c.s. while B repeatedly leaves
     and re-enters the c.s. ad infinitum, we don't have bounded waiting
   --We will be mostly concerned with mutual exclusion
     (in fact, real-world synchronization/concurrency primitives often
      don't satisfy bounded waiting.)

 --step 2: protecting critical sections

 --want lock()/unlock() or enter()/leave() or acquire()/release()
   --lots of names for the same idea
   --mutex_init(mutex_t* m), mutex_lock(mutex_t* m),
     mutex_unlock(mutex_t* m), ...
   --pthread_mutex_init(), pthread_mutex_lock(), ...
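At their simplest, the lock()/unlock() APIs named above can be implemented as a spinlock built on an atomic test-and-set. A minimal sketch using C11's atomic_flag (the mutex_t/mutex_* names are the generic ones from these notes, not a real library):

```c
#include <stdatomic.h>

/* A minimal spinlock sketch of the generic mutex API
 * (illustrative only: real mutexes put waiters to sleep instead of
 * spinning, and this version provides no bounded waiting). */
typedef struct { atomic_flag held; } mutex_t;

void mutex_init(mutex_t *m) { atomic_flag_clear(&m->held); }

void mutex_lock(mutex_t *m) {
    /* test-and-set atomically sets the flag and returns its OLD value;
     * exactly one spinning thread sees "false" and enters the c.s. */
    while (atomic_flag_test_and_set(&m->held))
        ; /* spin until the holder clears the flag */
}

void mutex_unlock(mutex_t *m) { atomic_flag_clear(&m->held); }
```

This gives mutual exclusion and progress, but not bounded waiting: an unlucky spinner can lose the race to a thread that repeatedly re-acquires the lock.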
 --in each case, the semantics are that once a thread of execution is
   executing inside the critical section, no other thread of execution
   is executing there

 --analogy: mutex => door for the room
   --lock/acquire => close the door
   --unlock/release => open the door (let another person enter the room)