CS3650 Final exam review
----

Overview
  1. computer organization & C
  2. processes
  3. concurrency & synchronization
  4. file system
  5. networking
  6. system security

1. computer organization & C

  - everything is 0s and 1s
    -- 1 byte == 8 bits
    -- capacity: KB, MB, GB, TB, PB
    -- 1GB DRAM != 1GB disk (DRAM: 2^30 bytes; disk vendors: 10^9 bytes)

  - OS
    -- providing services to user-level programs
    -- managing the resources
    -- abstracting the hardware

  - hardware abstraction
    -- CPU
    -- memory: an array of bytes
    -- disk: an array of blocks

  - how does a program (helloworld) run?
    * a file on disk
    * loaded into memory
    * memory layout: code, data, stack, heap
    * CPU "jumps" to the first instruction
      (CPU has a register %rip; %rip holds the address of the next
       instruction to run, initially the first instruction)
    * runs one instruction at a time

  - the life cycle of a program

           vi        gcc         as         ld       loader
    HUMAN ---> foo.c ---> foo.s ---> foo.o ---> a.out ---> process

  - C basics
    -- control flow
    -- functions and scope
    -- types and operators
      -- char, int, float, double
      -- precedence and associativity
    -- expectation: fluently read C code, and write C code with at most
       minor syntax errors

  - memory manipulation in C
    -- a pointer = a memory address
    -- C strings: a char array with a terminating '\0'
    -- C arrays
    -- struct and malloc/free
    -- printf
    -- main and arguments

2. processes

  - an abstraction of a machine

  - process manipulation
    -- create process: fork()
    -- fork/exec separation
    -- parent and child
    -- orphan process and zombie process

  - programmer's view of process
    -- memory
      -- code, data, stack, and heap
      -- in C, what variables are in which memory area?
    -- registers
      -- %rip
      -- %rsp
      -- %rax
      -- others
    -- file descriptors

  - syscall
    -- an interface between processes and the kernel
    -- system calls you've learned:
      -- fork, execve, wait, exit
      -- open, close, read, write
      -- socket, send, recv, bind, connect, accept, listen
      -- pipe, dup2, select

  - file descriptors
    -- an abstraction: a file, a device, or anything that follows
       open/read/write/close
    -- 0/1/2: stdin/stdout/stderr
    -- redirection and pipe

  - shell
    -- an interface between human and computer
    -- how it works; Lab2
      -- parse commands
        -- internal and external cmds
        -- how to tell? use "which"
      -- run commands (fork/exec)
      -- handle shell operators
        -- redirections
        -- exit status ($?)

  - assembly code (x86-64)
    -- movq A,B
    -- pushq %rax
    -- popq %rax
    -- call foo
    -- ret

  - calling convention & stack frame
    -- where are the arguments and where is the return value
    -- call-preserved & call-clobbered
    -- stack frame
      -- [saved %rbp; local variables; call-preserved regs;
          saved %rip (the return address)]

3. concurrency

  - threads
    -- inaccurate: "abstracting a CPU core"
    -- two threads in the same process share memory
      -- meaning: they share the code segment, the data segment (i.e.,
         global variables), and the heap (i.e., memory from malloc)
    -- threads have separate stacks and registers

  - synchronization primitives
    -- Mutex: providing mutual exclusion
      -- APIs
        -- init(mutex)
        -- acquire(mutex)
        -- release(mutex)
      -- providing a critical section (mutual exclusion)
        -- entering c.s.: "I can mess up the invariant"
        -- leaving c.s.: "I need to restore the invariant"
    -- Condition variable:
      -- why? to let a thread wait until a condition holds, i.e.,
         scheduling synchronization (motivated by the soda example)
      -- APIs
        -- cond_init(cond)
        -- cond_wait(cond, mutex) // UGLY
        -- cond_signal(cond)
        -- cond_broadcast(cond)
      -- an important interface, cond_wait(...)
        -- meaning: 2 steps,
           (1) atomically unlock mutex and wait (until cond is signaled)
           (2) re-acquire mutex and resume executing
        -- why does it include the mutex (ugly interface)?
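
  A minimal sketch of the two-step cond_wait behavior, using the POSIX
  names (pthread_mutex_lock, pthread_cond_wait, ...) rather than the
  generic acquire/cond_wait names above. The soda machine here is a
  guess at the shape of the soda example, not the lecture's code:
  CAPACITY, N_SODAS, producer, consumer, and run_soda_demo are all
  made-up names. It also hints at why cond_wait takes the mutex:
  checking the condition and going to sleep have to happen without a
  gap, otherwise a signal sent in between could be missed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CAPACITY 3     /* hypothetical: slots in the soda machine */
#define N_SODAS  1000  /* hypothetical: sodas each thread handles */

static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static int sodas = 0;  /* invariant: 0 <= sodas <= CAPACITY */

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < N_SODAS; i++) {
        pthread_mutex_lock(&lock);               /* enter critical section */
        while (sodas == CAPACITY)                /* re-check after each wakeup */
            pthread_cond_wait(&not_full, &lock); /* unlock + wait, then relock */
        sodas++;
        assert(sodas <= CAPACITY);               /* invariant holds before leaving */
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);             /* leave critical section */
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < N_SODAS; i++) {
        pthread_mutex_lock(&lock);
        while (sodas == 0)
            pthread_cond_wait(&not_empty, &lock);
        sodas--;
        assert(sodas >= 0);
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sodas left over. */
int run_soda_demo(void) {
    pthread_t p, c;
    sodas = 0;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sodas;
}
```

  To try it, call run_soda_demo() from a main() and build with
  "gcc -pthread". The wait sits in a while loop, not an if, so the
  condition is re-checked every time the thread wakes up holding the
  mutex.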
  - mental model: dangerous concurrency world

              [correct impl]
                    ^
                    |    +--[safe path]
    +--------------|  |--|-------+
    | Concurrency  /  /  |       |
    | World       /  /<--+       |
    | (dangerous)/  /            |
    |           /  /all sorts of |
    |          /  / concurrency  |
    |         /  / bugs          |
    +---------|  |---------------+
               ^
             [you]

  - safe path: how to write concurrency code
    -- safety first; this is one way, but it is time-tested
    -- Monitor (mutex + condition variable)
      -- condition variables are not necessary!
    -- 6 rules from Mike Dahlin
      -- memorize them!
    -- four-step design approach

4. file system

  - what does FS do?
    1. provide persistence
    2. give a way to "name" a sequence of bytes on the disk (files)
    3. give a way to map from human-friendly names to "named bytes"
       (directories)

  - what are files?
    -- a set of disk blocks (OS's view)
    -- a named sequence of bytes (programmer's view)

  - file mapping
    -- {file, offset} ---> disk address (i.e., block number)
    -- mapping is part of inode
    -- three candidates
      -- contiguous allocation
      -- linked list
      -- indexed files

  - Unix inode
    -- file mapping: imbalanced tree
      -- why imbalanced?
    -- metadata:
      -- size
      -- timestamps
      -- owner
      -- file permission

  - dirs
    -- why do we need dirs?
      -- hierarchical namespace
    -- what is a dir?
      -- a "file" with specific contents
      -- the contents are pairs of (name, inode#)

  - links
    -- hard links ($ ln)
    -- soft links ($ ln -s)
    -- their difference

  - fs3650 (*) (lab4)
    -- a read-only Unix-like fs, with many simplifications
    -- have you internalized the following:
      -- superblock
      -- root
      -- mode (FD and RWX)
      -- dirs
      -- files
      -- path walk
      -- read
    -- given a fs3650 instance, you should be able to "dry-run" fs
       operations

  - crash recovery/consistency
    -- ad-hoc (fsck)
    -- CoW fs (zfs)
    -- journaling (ext4)

5. networking

  - layered network model
    -- application layer (HTTP)
    -- transport layer (TCP)
    -- network (routing) layer (IP)
    -- link & physical layer (Ethernet)

  - packet switching
    -- idea: best-effort packet delivery
    -- a packet: [ethernet[ip[tcp[http[data]]]]]
       (each lower layer wraps the layer above it)

  - IP addresses (IPv4)
    -- a 32-bit address
    -- private IPs: 192.168.*.*, 10.*.*.*, 172.16.*.*--172.31.*.*
    -- localhost: 127.0.0.1
    -- public IPs: others

  - DNS
    -- mapping from a name (a string) to IPs
    -- much like the fs namespace

  - socket programming (lab5)
    -- what are port numbers?
      -- well-known ports (<1024): 22 (ssh), 80 (http)
    -- socket interfaces/syscalls:
      a) socket, send, and recv
      b) bind, listen, and accept (server side)
      c) connect (client side)
      d) select (multiplexing)
         FD_SET, FD_CLR, FD_ISSET, FD_ZERO

6. system security

  - authentication
    -- three types: based on what you know/have/are
    -- password history:
       plaintext -> hashed password -> hashed password with salt

  - access control
    -- problem: sub-[op]->obj, pass or reject?
    -- solution: ACL
    -- for Unix,
      -- sub are users, processes, ...
      -- obj are files, sockets, ...
      -- users/processes have a UID and GIDs
      -- a file has its owner UID and a GID
      -- ACL is the permission bits in the file's inode

  - how to raise the privilege level?
    -- example: resetting your password
    -- setuid; dangerous, use with care

  - stack smashing
    -- overflow a local buffer on the stack
    -- replace the return address on the stack with a malicious address
    -- on "return", the CPU jumps to the malicious address

notes about exam
---

  - NOTE: bring your NUID
  - a lot to read, not much to write
  - lab questions require some knowledge of the lab contents
  - write down your assumptions if you're not sure
  - we're reasonable people

Q&A
---

  Logistical questions
  Content questions [shrug if cannot answer]