Week 10.b CS3650 03/13 2024 https://naizhengtan.github.io/24spring/ 1. File mapping (cont'ed) 2. Directories ------ 1. File mapping * problem: mapping from an offset to a block address * revisit candidate mappings: -- extent-based files -- linked files -- indexed files * Question: how do they perform for the following metrics? -- disk utilization -- sequential access -- random access +--------------------+-----------+-----+--------+ | | disk util | seq | random | +--------------------+-----------+-----+--------+ | extent-based files | - | + | + | +--------------------+-----------+-----+--------+ | linked files | + | + | - | +--------------------+-----------+-----+--------+ | indexed files | - | + | + | +--------------------+-----------+-----+--------+ * Unix inode: [draw on board] permisssions times for file access, file modification, and inode-change link count (# directories containing file) ptr 1 --> data block ptr 2 --> data block ptr 3 --> data block ..... ptr 11 --> indirect block ptr --> ptr --> ptr --> ptr --> ptr --> ptr 12 --> indirect block ptr 13 --> double indirect block ptr 14 --> triple indirect block This is just a tree. Question: why is this tree intentionally imbalanced? [Answer: optimize for short files. each level of this tree requires a disk seek...] Pluses/minuses: +: Simple, easy to build, fast access to small files +: Maximum file length can be enormous, with multiple levels of indirection -: worst case # of accesses pretty bad -: worst case overhead (such as 11 block file) pretty bad -: Because you allocate blocks by taking them off unordered freelist, metadata and data get strewn across disk * Notes about inodes: --stored in a fixed-size array --Size of array fixed when disk is initialized; can't be changed [why? easier for fs to find inodes, and fewer disk accesses (better performance!)] --Multiple inodes in a disk block --Question: how many files can the following toy-fs has? sizeof(inode) = 128B sizeof(block) = 512B toy-fs uses 1000 blocks to store inodes [answer: the number of inodes (hence files) is (521B / 128B) * 1000 = 4000 ] -- use "$ df -i ~" to see how many inodes you can use. --inode lives in known location, originally at one side of disk, now lives in pieces across disk (helps keep metadata close to data) --The index of an inode in the inode array is called an ***i-number*** --Internally, the OS refers to files by i-number --When a file is opened, the inode brought in memory --Written back when modified and file closed or time elapses 2. Directories --Problem: "Spend all day generating data, come back the next morning, want to use it." F. Corbato, on why files/dirs invented. --History: --Approach 0: Have users remember where on disk their files are --like remembering your social security or bank account # --yuck. (people want human-friendly names.) --So use directories to map names to file blocks, somehow --But what is in directory? --A short history of directories --Approach 1: Single directory for entire system --Put directory at known location on disk --Directory contains pairs --If one user uses a name, no one else can --"Many ancient personal computers work this way" [I heard this as an anecdote; never saw one myself...] --Approach 2: Single directory for each user --Still clumsy, and "ls" on 10,000 files is a real pain --(But some oldtimers still work this way) --Approach 3: Hierarchical name spaces. --Allow directory to map names to files ***or other dirs*** --File system forms a tree (or graph, if links allowed) --Large name spaces tend to be hierarchical --examples: IP addresses, domain names, scoping in programming languages, etc. --more generally, the concept of hierarchy is everywhere in computer systems --Hierarchial Unix --used since CTSS (1960s), and Unix picked it up and used it nicely --structure like: [draw: "/" bin/ dev/ tmp/ usr/ ls, grep ... ] --directories stored on disk just like regular files --here's the data in a directory file; this data can be in the *data blocks* of the directory or else in the inode of the directory. [] .... --i-node for directory contains a special flag bit --only special users can write directory files --key point: i-number might reference another directory --this neatly turns the FS into a hierarchical tree, with almost no work --bootstrapping: where do you start looking? --root dir always inode #2 (0 and 1 reserved) --and, voila, we have a namespace! --special names: "/", ".", ".." --given those names, we need only two operations to navigate the entire name space: --"cd name": (change context to directory "name") --"ls": (list all names in current directory) --example: /a/foo.c /b/c/essay.txt what does the file system look like? [ i0 ... i7 || [block ] [ block ] [block ] .....] [Draw picture] --Question: if you were FS designer, how would you address this problem? [an argument that hierarchical fs is no longer relevant: Margo Seltzer, Nicholas Murphy. Hierarchical File Systems are Dead. HotOS'09 but...we're still using them in 2024... ] * links: --hard link: multiple dir entries point to same inode; inode contains refcount "$ ln /tmp/a /tmp/b": creates a synonym ("b") for file ("a") --how do we avoid cycles in the graph? (answer: can't hard link to directories). --soft link: synonym for a *name* "$ln -s /tmp/a /tmp/sb": --creates a new inode (sb), not just a new directory entry --new inode has "sym link" bit set --contents of that new file: "/tmp/a" --Question: what will happen: "$ rm /tmp/a; cat /tmp/b; cat /tmp/sb"? [answer: try it yourself] --Question: can I create soft-link cycles? [answer: yes you can. Try: $ mkdir /tmp/a; mkdir /tmp/b $ ln -s /tmp/a /tmp/b/a $ ln -s /tmp/b /tmp/a/b ]