Week 2.a
CS6640
09/12 2023
https://naizhengtan.github.io/23fall/

0. Admin
1. Why C?
2. Everything is 0s/1s
3. Little-endian
4. Memory layout in egos-2k+
5. C pointers
----

0. Admin

  - go through the lottery table
  - a typo in lab1 (thanks Marcin!)
  - repeat the slack hours

1. why C?

  - good for low-level programming
    easy mapping between C and RISC-V instructions
    easy mapping between C types and hardware structures
      e.g., set bit flags in hardware registers of a device

  - minimal runtime
    easy to port to another hardware platform
    direct access to hardware

  - explicit memory management
    no garbage collector
    kernel is in complete control of memory management

  - efficient: compiled (no interpreter)
    compiler compiles C to assembly

  - popular for building kernels, system software, etc.
    good support for C on almost any platform

  - why not?
    easy to write incorrect code
    easy to write code that has security vulnerabilities

  - using a high-level language to implement an OS?
    It was (?) a hot topic:
    -- using Go: https://pdos.csail.mit.edu/papers/biscuit:login19.pdf (OSDI'2018)
    -- using Rust: https://www.yecl.org/publications/boos2020osdi.pdf  (OSDI'2020)
    [given time, circle back to Biscuit.]


2. everything is 0s and 1s

  - some background:
    -- bits and bytes
    -- binary, decimal, and hex numbers

    -- hexadecimal numbers (or hex numbers)
      -- integer with base of 16
      -- an example: 0x123456789abcdef
      -- other examples: 0xdeadbeef, 0xbebeebee

    -- binary vs. hex
       -- 0000 == 0x0
       -- 1111 == 0xf

    -- why "0x" for hex?
      -- short answer: a convention, an arbitrary choice
      [see: https://stackoverflow.com/questions/2670639/why-are-hexadecimal-numbers-prefixed-with-0x]

  - program = instructions + data

  - an executable program = a binary file
    (ELF file for us; will discuss details in later lectures)

  - an executing program = CPU registers + some chunks of memory
                           (a process)

  - instructions
    -- lab1: "call main"

  - data

    Q: how many bytes in an int?
     [4 bytes, or 32 bits;
     egos-2k+ uses an RV32 CPU.
     see also calling convention]

  - how about text file?
     [take a look at egos/egos2000_repo.txt]
  - how about the web pages?
     [take a look at diff.html]
  - how about pictures?
  - how about video?
     ["Using TensorFlow for Deep Learning on Video Data":
     https://blog.tensorflow.org/2023/01/using-tensorflow-for-deep-learning-on-video-data.html]

  Q: In egos-2k+, an int is 4B.
     What does 0xdeadbeef look like in memory?

       0      1      2      3
     +---------------------------+
     | 0xde | 0xad | 0xbe | 0xef |
     +---------------------------+
       (this is big-endian)

3. little endian

     This is what 0xdeadbeef looks like in our labs:

       0      1      2      3
     +---------------------------+
     | 0xef | 0xbe | 0xad | 0xde |
     +---------------------------+

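   [demo:
    #include <stdio.h>

    int main() {
        unsigned int x = 0xdeadbeef;
        // unsigned char avoids sign extension when a byte >= 0x80 is printed
        unsigned char *byte = (unsigned char *)&x;

        for (int i = 0; i < 4; i++) {
            printf("[%d-byte]: %x\n", i, (unsigned int)byte[i]);
        }
        return 0;
    }
   ]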

   Q: Why do we use little-endian?
   -- historical reasons: backward compatibility
   -- note: many big-endian machines exist too
   -- little-endian is sometimes more intuitive (ironically):

   [demo:
    long long val64 = 0x12345678;
    int *ptr32 = (int*) &val64;
    printf("what do you expect? 0x%x\n", *ptr32);
   ]

  We use little-endian in egos-2k+.
  [defined by "mstatus" register; show RISC-V spec]


4. Memory layout in egos-2k+

   [explain here how we learn egos:
     finding a self-contained subset (a closure) is hard; we will learn
     individual points separately and gradually connect them together.]

  - a program has:
    text: code, read-only data
    data: global C variables
    stack: function's local variables
    (heap: dynamic memory allocation using sbrk, malloc/free)
      [will talk about this in a later module]

  - introduce build/debug/helloworld.lst

  - what we know:
    -- text section sits at 0x082000xx
    -- data section starts at 0x082000xx
    -- stack pointer is at 0x80002000

5. C pointers

  - a pointer = a memory address
    every variable has a memory address (e.g., p = &i)
    so each variable can be accessed through its pointer (e.g., *p)
    a pointer can itself be a variable (e.g., int *p)
      and thus has a memory address, etc.

  [demo:

    #include <stdio.h>

    int g = 3;

    int main(int argc, char **argv) {
      int l = 5;  // local variables don't have a default value
      int *p, *q;

      // take address of variable
      p = &g;
      q = &l;
      printf("p %p q %p\n", (void *)p, (void *)q);

      // pointer to a pointer
      int **pp;
      pp = &p;    // take address of a pointer variable
      printf("pp %p %p %d\n", (void *)pp, (void *)*pp, **pp);

      int (*f)(int, char **);
      f = &main;  // take address of a function
      printf("main: %p\n", (void *)f);

      return 0;
    }
   ]

   Q: Why is a pointer to a pointer useful?

      For example, in context switch of egos-2k+,
      you will need to save the old stack pointer (void*) for future use.

      ctx_switch(??? old_sp, ??? new_sp ) {
        ...
        void *sp = get current stack pointer from CPU
        save sp to old_sp  // <-- what is the type of old_sp?
        ...
        CPU's sp = new_sp
        ...
      }

(will continue from here next time)

[Acknowledgments: Frans Kaashoek]