Week 1 CS4973/CS6640 01/06 2025 https://naizhengtan.github.io/25spring/ □ 0. in-class policies □ 1. Intro to OSI □ 2. Assumptions □ 3. Mechanics and admin □ 4. C basics □ * [skipped] Why C? □ * [skipped] Everything is 0s/1s □ * Little-endian □ * Memory layout in egos-2k+ □ * [skipped] C pointers □ * [skipped] C arrays □ * [skipped] Bitwise operators □ * [skipped] Some C keywords ---- 0. in-class policies -- no laptop in the first hour (why? context switch & interruption are expensive) -- chocolate (why? willpower and energey) -- lottery explain lottery system Q: why do you want to take the course? 1. Intro to OSI - a "special" course: instead of lecture-oriented, OSI is lab-oriented What does that mean? For a student, pretty much the only goal is to understand and finish labs. The process of doing so however has a "side effect"---you will learn how OS is implemented. - this course vs. other systems courses ^ | +-------+ |impl | => OSI |details| --+---------+-------- + assump- |core OS |advanced | => CS5600 -tion |concepts |topics | --+----------------------> || \/ CS3650 Examples of "advanced" topics that we will not cover: -- multi-core concurrency -- security - What is an operating system? [cliche: - draw hardware, OS, applications - draw syscall ] Q: what does this hierarchical view really mean? CPUs are running one instruction at a time; what do these "ups" and "downs" mean? [draw a timeline of CPU execution of syscall with - ecall - CPU privilege level change - potential address space change - exception handler - save registers - switching kernel stack - [do the work] - switching user stack - restore registers - mret ] In this course, you should see OS as several pieces of C code. For our lab OS, egos-2k+, there are two main pieces (earth and grass) and multiple systems applications (now four). - What's the purpose of OS? (1) managing the resources of the machine for example, managing memory [show a simple memory allocator: earth/page_alloc.c] (2) abstracting the hardware for example, abstracting print [show earth/dev_tty.c and earth/uart_getc] - Why study OS implementations? Q: ask students You'll be glad you took this course if you... * care about what goes on under the hood * like infrastructure * care about high performance - How will we study operating systems? We will focus on a single-core microkernel, egos-2k+. history [show the homepage] the course is built around labs [go through the lab page] 2. Assumptions a. know core concepts of OS (taken CS5600 or CS3650 or equivalent OS courses) a quick quiz: -- give one difference between a 32-bit CPU vs. a 64-bit CPU -- have you heard of privileged levels? what are they (in your prev OS class)? -- what's the difference between kernel code and user code? [this is an intentionally vague question.] -- have you heard of virtual memory? cr3? -- what is an inode? [you will learn all of these during the course] b. feel comfortable with C another quick quiz: -- how to set the fifth bit (starting from 0th) of an int? -- what happens when calling a function? what is calling convention? -- what x will be on a 32-bit CPU: int x = 0; char* c_ptr= (void*)&x; c_ptr[3] = 1; printf("0x%x\n", x); answer: 0x1000000 (on little-endian machine) c. can read manual and specifications (e.g., RISC-V spec and CPU manual) 3. Mechanics and admin a. communication us-to-you: --homepage, announcements: check this regularly --your NEU email (seldom) you-to-us: --piazza: questions that are not sensitive --staff email list for admin/sensitive things --office hours b. components of the course: --labs, arranging from easy to hard --lectures, assisting labs --one exam --final project c. lectures --attending: no roll call, but...will randomly pick students to answer questions (lottery) --notes will be published, but will be hard to understand if you miss the lecture --asking questions in class is encouraged (different from asking questions about labs; see below) d. labs: --what matters in this course (70% of the grades) --often: not much code to write (relatively speaking), but lots to learn! --"Start early" "reading is part of labs" "debugging is part of labs" --we expect you think through, then ask --"Here is my code. It doesn't work. Please debug." won't work. --If you get a reply from a TA, and then send email 20 minutes later asking a closely related question, that’s probably not great too. --slack hours e. the exam: -- paper-pen exam -- will cover labs f. integrity policies: --Here are some questions: a. Looking at a classmate's solution and then coding it by yourself afterward b. Showing your code to a classmate who has questions c. Modifying code that you find on StackOverflow d. Modifying code for a similar assignment that you find on GitHub The correct answer: ALL of these are ruled out by the policy. --wait, what about ChatGPT? 4. C basics A) why C? - good for low-level programming easy mapping between C and RISC-V instructions easy mapping between C types and hardware structures e.g.., set bit flags in hardware registers of a device - minimal runtime easy to port to another hardware platform direct access to hardware - explicit memory management no garbage collector kernel is in complete control of memory management - efficient: compiled (no interpreter) compiler compiles C to assembly - popular for building kernels, system software, etc. good support for C on almost any platform - why not? easy to write incorrect code easy to write code that has security vulnerabilities - using high-level language to implement OS? It was (?) a hot topic: -- using Go: https://pdos.csail.mit.edu/papers/biscuit:login19.pdf (OSDI'2018) -- using Rust: https://www.yecl.org/publications/boos2020osdi.pdf (OSDI'2020) [given time, circle back to Biscuit.] B) everything is 0s and 1s - some backgrounds: -- bits and bytes -- binary, decimal, and hex numbers Q: How many bit patterns can a group of 2 bits have? [Answer: 2^2 = 4] Q: How many bit patterns can a group of 3 bits have? [Answer: 2^3 = 8] Q: How many bit patterns does a group of n bits have? [Answer: 2^n] -- hexadecimal numbers (or hex numbers) -- integer with base of 16 -- an example: 0x123456789abcdef -- other examples: 0xdeadbeef, 0xbebeebee -- binary vs. hex -- 0000 == 0x0 -- 1111 == 0xf Q: show $misa using hex and ask what does this CPU has? -- why "0x" for hex? --short answer: a convention, an arbitrary choice [see: https://stackoverflow.com/questions/2670639/why-are-hexadecimal-numbers-prefixed-with-0x] * Digression: any self-respecting CS person must memorize powers of 2 2^1 = 2 2^2 = 4 2^3 = 8 2^4 = 16 2^5 = 32 2^6 = 64 2^8 = 128 2^9 = 256 2^10 = 1024 [fun fact: programmer's day -- the 256th of the year (09/12 or 09/13) -- 10/24 ] * Q: why bit has only two values? does it work for a machine with bits representing three values? [Answer: yes! there are ternary computers see: https://en.wikipedia.org/wiki/Ternary_computer Aside, ternary machines are "optimal" in some sense. It provides "optimal coding of numbers". ] - program = instructions + data - an executable program = a binary file (ELF file for us; will discuss details in later lectures) - an executing program = CPU registers + some chunks of memory (a process) - instructions -- lab1: "call main" - data Q: how many bytes in an int? [4bytes or 32bits egos-2k+ uses a RV32 CPU. see also calling convention] - how about text file? [take a look at egos/egos2000_repo.txt] - how about the web pages? [take a look at diff.html] - how about pictures? [Google CIFAR-10] - how about video? ["Using TensorFlow for Deep Learning on Video Data": https://blog.tensorflow.org/2023/01/using-tensorflow-for-deep-learning-on-video-data.html] Q: In egos-2k+, an int is 4B. How does 0xdeadbeef look like in memory? 0 1 2 3 +---------------------------+ | 0xde | 0xad | 0xbe | 0xef | +---------------------------+ C) little endian 0 1 2 3 +---------------------------+ | 0xef | 0xbe | 0xad | 0xde | +---------------------------+ [demo: int main() { unsigned int x = 0xdeadbeef; char *byte = (char *)&x; for (int i=0;i<4; i++) { printf("[%d-bit]: %x", i, (unsigned int)byte[i]); } return 0; } ] Q: Why do we use little endian??? -- historical reasons: backward compatibility -- many big endian machines -- little-endian is sometimes more intuitive (ironically): [demo: long long val64 = 0x12345678; int *ptr32 = (int*) &val64; printf("what do you expect? 0x%x\n", *ptr32); ] We use little-endian in egos-2k+. [defined by "mstatus" register; show RISC-V spec] D) Memory layout in egos-2k+ [explain here how we learn egos: finding a closure is hard; we will learn points separately, and gradually connect the points together. ] - a program has: text: code, read-only data data: global C variables stack: function's local variables (heap: dynamic memory allocation using sbrk, malloc/free) [will talk about this in a later module] - introduce build/debug/helloworld.lst - what we know: -- test section sits at ? -- data section starts at ? -- stack pointer is at ? E) C pointers - a pointer = a memory address every variable has a memory address (i.e., p = &i) so each variable can be accessed through its pointer (i.e., *i) a pointer can be variable (e.g., int *p) and thus has a memory address, etc. [demo: int g = 3; int main() { int l = 5; // local variables don't have a default value int *p, *q; // take address of variable p = &g; q = &l; printf("p %p q %p\n", p, q); // pointer of an pointer int **pp; pp = &p; // take address of a pointer variable printf("pp %p %p %d\n", pp, *pp, **pp); int (*f)(int, char **); f = &main; // take address of a function< printf("main: %p\n", f); return 0; } ] Q: Why a pointer of a pointer is useful? For example, in context switch of egos-2k+, you will need to save the old stack pointer (void*) for future use. ctx_switch(??? old_sp, ??? new_sp ) { ... void *sp = get current stack pointer from CPU save sp to old_sp // <-- what the type of this old_sp? ... CPU's sp = new_sp ... } - uninitialized local variables on stack. Q: what do you expect the results of this: [demo: void bar() { int l; // local variables don't have a default value printf("uninitialized local variable: %x\n", l); } void foo() { int xyz = 0xdeadbeef; printf("quit foo w/ local var = %x\n", xyz); } ] - pointer arithmetic Q: what is the value of c+1 and i+1? [demo: int main() { char *c = (char *)0x08200000; int *i = (int *) 0x08200000; printf("%p %p\n", (c+1), (i+1)); return 0; } ] F) C arrays - an array: contiguous memory holding same data type (char, int, etc.) no bound checking, no growing - two ways to access arrays: through index: buf[0], buf[1] through pointer: *buf, *(buf+1) [demo: int a[3] = {1, 2, 3}; // an array of 3 int's char b[3] = {'a', 'b', 'c'}; // an array of 3 char's int main() { // first element is at index 0 printf("%d\n", a[0]); a[1] += 1; // use index access *(a+2) = 5; // pointer access printf("%d %d\n", a[1], a[2]); // pointers to array elements printf("a %p a1 %p a2 %p a2 %p\n", a, a+1, a+2, &a[2]); // pointer arithmetic uses type printf("%p %p\n", b, b+1); return 0; } ] G) bitwise operators - you can manipulate bits with |, &, ~, ^ 10001 & 10000 = 10000 // bit-wise and 10001 | 10000 = 10001 // bit-wise or 10001 ^ 10000 = 00001 // bit-wise xor ~1000 = 0111 // bit-wise not - examples: earth/earth.c earth/bus_uart.c grass/kernel.c H) keywords: - void: "no type", "no value", "no parameters" - static: to make a variable's visibility limited to the file it is declared but global within the file - union: a composite data type that allows you to store different types of data in the same memory location. This is useful for data that can be interpreted as multiple types. For example, see library/file/file.h [Achknowledgements: Frans Kaashoek]