Week 2.a
CS6640 09/12 2023
https://naizhengtan.github.io/23fall/

0. Admin
1. Why C?
2. Everything is 0s/1s
3. Little-endian
4. Memory layout in egos-2k+
5. C pointers

----

0. Admin

  - go through the lottery table
  - a typo in lab1 (thanks Marcin!)
  - repeat the slack hours

1. why C?

  - good for low-level programming
      easy mapping between C and RISC-V instructions
      easy mapping between C types and hardware structures
      e.g., set bit flags in hardware registers of a device

  - minimal runtime
      easy to port to another hardware platform
      direct access to hardware

  - explicit memory management
      no garbage collector
      kernel is in complete control of memory management

  - efficient: compiled (no interpreter)
      compiler compiles C to assembly

  - popular for building kernels, system software, etc.
      good support for C on almost any platform

  - why not?
      easy to write incorrect code
      easy to write code that has security vulnerabilities

  - using a high-level language to implement an OS?
      It was (?) a hot topic:
      -- using Go: https://pdos.csail.mit.edu/papers/biscuit:login19.pdf (OSDI'2018)
      -- using Rust: https://www.yecl.org/publications/boos2020osdi.pdf (OSDI'2020)
      [given time, circle back to Biscuit.]

2. everything is 0s and 1s

  - some background:
      -- bits and bytes
      -- binary, decimal, and hex numbers
      -- hexadecimal numbers (or hex numbers)
          -- integers written in base 16
          -- an example: 0x123456789abcdef
          -- other examples: 0xdeadbeef, 0xbebeebee
      -- binary vs. hex
          -- 0000 == 0x0
          -- 1111 == 0xf
      -- why "0x" for hex?
          -- short answer: a convention, an arbitrary choice
          [see: https://stackoverflow.com/questions/2670639/why-are-hexadecimal-numbers-prefixed-with-0x]

  - program = instructions + data

  - an executable program = a binary file
      (ELF file for us; will discuss details in later lectures)

  - an executing program = CPU registers + some chunks of memory
      (a process)

  - instructions
      -- lab1: "call main"

  - data
      Q: how many bytes in an int?
        [4 bytes, i.e., 32 bits: egos-2k+ uses an RV32 CPU.
         see also calling convention]

  - how about text files?
      [take a look at egos/egos2000_repo.txt]

  - how about web pages?
      [take a look at diff.html]

  - how about pictures?

  - how about video?
      ["Using TensorFlow for Deep Learning on Video Data":
       https://blog.tensorflow.org/2023/01/using-tensorflow-for-deep-learning-on-video-data.html]

  Q: In egos-2k+, an int is 4B. What does 0xdeadbeef look like in memory?

        0      1      2      3
    +---------------------------+
    | 0xde | 0xad | 0xbe | 0xef |
    +---------------------------+
    (this is big-endian)

3. little endian

  This is what 0xdeadbeef looks like in our labs:

        0      1      2      3
    +---------------------------+
    | 0xef | 0xbe | 0xad | 0xde |
    +---------------------------+

  [demo:
    #include <stdio.h>

    int main() {
        unsigned int x = 0xdeadbeef;
        // unsigned char: where plain char is signed (e.g., x86),
        // 0xef would be sign-extended and print as ffffffef
        unsigned char *byte = (unsigned char *)&x;
        for (int i = 0; i < 4; i++) {
            printf("[%d-byte]: %x\n", i, (unsigned int)byte[i]);
        }
        return 0;
    }
  ]

  Q: Why do we use little endian???
    -- historical reasons: backward compatibility
    -- (there are many big-endian machines, too)
    -- little-endian is sometimes more intuitive (ironically):
      [demo:
        long long val64 = 0x12345678;
        int *ptr32 = (int*) &val64;
        printf("what do you expect? 0x%x\n", *ptr32);
      ]

  We use little-endian in egos-2k+.
    [defined by "mstatus" register; show RISC-V spec]

4. Memory layout in egos-2k+

  [explain here how we learn egos:
   finding a closure is hard; we will learn points separately,
   and gradually connect the points together.
  ]

  - a program has:
      text: code, read-only data
      data: global C variables
      stack: function's local variables
      (heap: dynamic memory allocation using sbrk, malloc/free)
        [will talk about this in a later module]

  - introduce build/debug/helloworld.lst

  - what we know:
      -- text section sits at 0x082000xx
      -- data section starts at 0x082000xx
      -- stack pointer is at 0x80002000

5. C pointers

  - a pointer = a memory address
      every variable has a memory address (i.e., p = &i)
      so each variable can be accessed through its pointer (i.e., *p)
      a pointer can be a variable (e.g., int *p) and thus has a memory address, etc.
  [demo:
    #include <stdio.h>

    int g = 3;

    int main() {
        int l = 5; // local variables don't have a default value
        int *p, *q;

        // take address of variable
        p = &g;
        q = &l;
        printf("p %p q %p\n", (void *)p, (void *)q);

        // pointer to a pointer
        int **pp;
        pp = &p; // take address of a pointer variable
        printf("pp %p %p %d\n", (void *)pp, (void *)*pp, **pp);

        int (*f)(int, char **);
        f = &main; // take address of a function
        printf("main: %p\n", (void *)f);

        return 0;
    }
  ]

  Q: Why is a pointer to a pointer useful?

    For example, in the context switch of egos-2k+,
    you will need to save the old stack pointer (void*) for future use.

    ctx_switch(??? old_sp, ??? new_sp ) {
        ...
        void *sp = get current stack pointer from CPU
        save sp to old_sp   // <-- what is the type of old_sp?
        ...
        CPU's sp = new_sp
        ...
    }

  (will continue from here next time)

[Acknowledgments: Frans Kaashoek]