Week 6.a CS3650 02/12 2024
https://naizhengtan.github.io/24spring/

0. Processes vs. threads
1. Intro to concurrency

-----

0. Processes vs. threads

- Q: what's the difference?

- an example

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    void *f(void *_) {
        sleep(1);
        printf("this is f\n");
        exit(0);
    }

    int main() {
        pthread_t tid;
        pthread_create(&tid, NULL, f, NULL);
        pthread_join(tid, NULL);   // line X
        printf("this is main\n");
    }

  Question: what is the output?
  [answer: "this is f"]

  Question: if we comment out line X, what is the output?
  [answer: "this is main"]

1. Intro to concurrency

There are many sources of concurrency.

--what is concurrency?
  --stuff happening at the same time

--sources of concurrency
  --on a single CPU, processes/threads can have their instructions
    interleaved (helpful to regard the instructions in multiple
    threads as "happening at the same time")
  --computers have multiple CPUs and common memory, so instructions
    in multiple threads can happen at the same time!
  --interrupts (the CPU was doing one thing; now it's doing another)

--why is concurrency hard?
  *** Hard to reason about all possible interleavings ***
  (deeper reason: the human brain is "single-threaded")

[skipped]
- an example:

    #include <stdio.h>
    #include <unistd.h>

    int main() {
        fork();
        fork();
        printf("\nhello world");
    }

  --expected output:
    [many possibilities; you should see four newlines ("\n") and four
    "hello world"s, but their order is uncertain.]

[handout week05b]

panel 1:
  1a: x = 1 or x = 2.
  1b: x = 13 or x = 25.
  1c: x = 1 or x = 2 or x = 3

  say x is at mem location 0x5000

  f is "x = x+1;", which might compile to:

    movq 0x5000, %rbx   # load from address 0x5000 into register
    addq $1, %rbx       # add 1 to the register's value
    movq %rbx, 0x5000   # store back

  g is "x = x+2;", which might compile to:

    movq 0x5000, %rbx   # load from address 0x5000 into register
    addq $2, %rbx       # add 2 to the register's value
    movq %rbx, 0x5000   # store back

panel 2: incorrect list structure
  --both threads will insert a new List_elem as head
  --one List_elem will be lost (later threads cannot access it)

points:
--all of these are called **race conditions**; not all of them are
  errors, though
--all of these can happen on a one-CPU machine (the multi-core case
  can be worse; we will see soon)

B. Memory consistency model

--sequential consistency is not always in effect

--sequential consistency means:
  (or what I call, the "formal definition of how a normal human being
  thinks of concurrency")
  --(1) maintain a single sequential order among operations
    (namely: all memory operations can form a sequential order where
    the read ops read from the most recent writes.)
  --(2) maintain program order on individual processors
    (namely: all operations from the same thread follow their issue
    order in the program)

--a concurrent execution is **equivalent** to a sequential execution
  when:
  [--(0) of course, both must have the same operations]
  --(1) all (corresponding) reads return the same values
  --(2) the final states are the same

--if a memory has _sequential consistency_, it means that:
  a concurrent program running on multiple CPUs looks as if the same
  program (with the same number of threads) were running on a single
  CPU.
  or, more formally, there always _exists_ a sequential order of
  memory operations that is _equivalent_ to what the concurrent
  program produces running on multiple CPUs.

- On multiple CPUs, we can get "interleavings" _that are impossible
  on single-CPU machines_.
  In other words, the number of interleavings that you have to
  consider is _worse_ than simply considering all possible
  interleavings of sequential code.

[handout week06a] panel 1

--hardware makes the problem even harder
  [look at handout_w6a panel 1; what is the correct answer?]
  [draw these cases on board]
  [answer: "it depends on the hardware"]

- explain why sequential consistency does not hold:
  -- CPUs -> Memory (too slow)
  -- CPUs -> Caches -> Memory
     (write stall: invalidating a cache line takes a long time)
  -- CPUs -> StoreBuffers -> Caches -> Memory
  -- more...
     -- out-of-order execution
     -- speculative execution

[draw:
    +-----+     +-----+
    | CPU |     | CPU |
    |     |     |     |   [later: StoreBuffer]
    |     |     |     |   [later: cache]
    +--+--+     +--+--+
       |           |
       +-----+-----+
             |
             v
      +------------+
      |   memory   |
      +------------+
]

- For those interested in knowing more about the details of memory
  consistency:
  --http://sadve.cs.illinois.edu/Publications/computer96.pdf
  --Paul E. McKenney, "Memory Barriers: a Hardware View for Software
    Hackers"
  --we assume sequential consistency until we explicitly relax it

- multi-core CPU architecture
  [show slides]

A concurrency problem:

    T1: A=1; print(B)
    T2: B=1; print(A)

  From a sequential-execution point of view, at least one "1" will be
  printed. However, in reality, the output might be two "0"s.

C. Concurrency problems from software

But it's not just hardware:
-- The compiler may play a role as well.
-- here is an example (borrowed from
   https://courses.cs.washington.edu/courses/cse451/15au/notes/l17-mcm-slides.pdf)

    int x = 0;  // a global variable

    void foo() {
        for (int i = 0; i < 100; i++) {
            x = 1;
            printf("%d", x);
        }
    }

    void bar() {
        x = 0;
    }

  Q: if foo() and bar() are run on two threads, what are the possible
     outputs?
  Q: Can we see "1100000000000000..."?
  [YES! because a compiler may optimize foo() to:
    """
    x = 1;
    for (int i = 0; i < 100; i++) {
        printf("%d", x);
    }
    """
  ]

-- point: concurrency is HARD.

[draw: what people think vs. what the compiler compiles vs.
 sequential consistency vs. real-world memory consistency]