Week 13.a CS 5600 04/11 2022 On the board ------------ 1. Last time 2. Files 3. Directories 4. fs5600 (Lab4) A. fs5600 disk B. data structures C. interfaces ------------------------------------------- Admin: -Lab4 is released --------- 1. Last time --SSD some unique features: * read/erase/program * cannot program twice without erasing * wear-out --fs, three goals (i) persistance (ii) name data => file (iii) locate data => directory [do we really need (iii)? a bold question] 2. Files --what is a file? --answer from user's view: a bunch of named bytes on the disk --answer from FS's view: collection of disk blocks --big job of a FS: map name and offset to disk blocks FS {file,offset} --> disk address [also called file mapping. see an introduction in S2 of this paper: (which btw is a cool paper about persistent memory) Neal, Ian, Gefei Zuo, Eric Shiple, Tanvir Ahmed Khan, Youngjin Kwon, Simon Peter, and Baris Kasikci. "Rethinking File Mapping for Persistent Memory.", FAST'21 ] --operations are create(file), delete(file), read(), write() **note that we have seen translation/indirection before: page table: page table virtual address ----------> physical address per-file metadata: inode offset ------> disk block address how'd we get the inode? directory file name ----------> inode [Analogies: per-process memory <-> File multiple processes <-> Multiple files page table <-> inode memory chip <-> disk/SSD ] FS design parameters: * small files (most files are small) vs. large files (much of the disk is allocated to large files) * access patterns: sequential access vs. random accesses vs. keyed accesses * disk utilization (metadata overhead and fragmentation) access patterns we could imagine supporting: (i) Sequential: --File data processed in sequential order --By far the most common mode --Example: editor writes out new file, compiler reads in file, etc (ii) Random access: --Address any block in file directly without linear scanning --Examples: large data set, demand paging, databases (iii) Keyed access --Search for block with particular values --Examples: associative database, index --This thing is everywhere in the field of databases, search engines, but.... --...usually not provided by a FS in OS Question: if you were a fs designer, which structure will you use for different parameters? [answer: depends on what workloads the fs going to meet] candidate designs........ A. contiguous B. linked files C. indexed files A. contiguous allocation "extent based" --when creating a file, make user pre-specify its length, and allocate the space at once --file metadata contains location and size --example: IBM OS/360 [ a1 a2 a3 <5 free> b1 b2 ] what if a file c needs 7 sectors?! +: simple +: fast access, both sequential and random -: fragmentation [somewhat regain interests due to PM: Li, Ruibin, Xiang Ren, Xu Zhao, Siwei He, Michael Stumm, and Ding Yuan. "ctFS: Replacing file indexing with hardware memory translation through contiguous file allocation for persistent memory." FAST'22] B. linked files "linked list based" --keep a linked list of free blocks --metadata: pointer to file's first block --each block holds pointer to next one +: no more fragmentation +: sequential access easy (and probably mostly fast, assuming decent free space management, since the pointers will point close by) -: random access is a disaster -: pointers take up room in blocks; (overhead: sizeof(pointer) / sizeof(block)) messes up alignment of data C. indexed files [DRAW PICTURE] --Each file has an array holding all of its block pointers --like a page table, so similar issues crop up --Allocate this array on file creation --Allocate blocks on demand (using free list or bitmap) +: sequential and random access are both easy -: need to somehow store the array --large possible file size --> lots of unused entries in the block array --large actual block size --> huge contiguous disk chunk needed --solve the problem the same way we did for page tables. --okay, so now we're not wasting disk blocks, but what's the problem? (answer: equivalent issues as for page table walking: here, it's extra disk accesses to look up the blocks) --this motivates the classic Unix file system --Unix inode: [draw on board] permisssions times for file access, file modification, and inode-change link count (# directories containing file) ptr 1 --> data block ptr 2 --> data block ptr 3 --> data block ..... ptr 11 --> indirect block ptr --> ptr --> ptr --> ptr --> ptr --> ptr 12 --> indirect block ptr 13 --> double indirect block ptr 14 --> triple indirect block This is just a tree. [deja vu? that's right, we talked similar things when we argue for different designs of page tables.] Question: why is this tree intentionally imbalanced? (Answer: optimize for short files. each level of this tree requires a disk seek...) Pluses/minuses: +: Simple, easy to build, fast access to small files +: Maximum file length can be enormous, with multiple levels of indirection -: worst case # of accesses pretty bad -: worst case overhead (such as 11 block file) pretty bad -: Because you allocate blocks by taking them off unordered freelist, metadata and data get strewn across disk * Notes about inodes: --stored in a fixed-size array --Size of array fixed when disk is initialized; can't be changed [why? easier for fs to find inodes, and fewer disk accesses (better performance!)] --Multiple inodes in a disk block (which is not the case for your Lab4) --Question: how many files can the following toy-fs has? sizeof(inode) = 128B sizeof(block) = 512B toy-fs uses 1000 blocks to store inodes [answer: the number of inodes (hence files) is (521B / 128B) * 1000 = 4000 ] -- use "$ df -i ~" to see how many inodes you can use. --inode lives in known location, originally at one side of disk, now lives in pieces across disk (helps keep metadata close to data) --The index of an inode in the inode array is called an ***i-number*** --Internally, the OS refers to files by i-number --When a file is opened, the inode brought in memory --Written back when modified and file closed or time elapses 3. Directories --Problem: "Spend all day generating data, come back the next morning, want to use it." F. Corbato, on why files/dirs invented. [skipped, dir history --Approach 0: Have users remember where on disk their files are --like remembering your social security or bank account # --yuck. (people want human-friendly names.) --So use directories to map names to file blocks, somehow --But what is in directory? --A short history of directories --Approach 1: Single directory for entire system --Put directory at known location on disk --Directory contains pairs --If one user uses a name, no one else can --"Many ancient personal computers work this way" [I heard this as an anecdote; never saw one myself...] --Approach 2: Single directory for each user --Still clumsy, and "ls" on 10,000 files is a real pain --(But some oldtimers still work this way) --Approach 3: Hierarchical name spaces. --Allow directory to map names to files ***or other dirs*** --File system forms a tree (or graph, if links allowed) --Large name spaces tend to be hierarchical --examples: IP addresses, domain names, scoping in programming languages, etc. --more generally, the concept of hierarchy is everywhere in computer systems ] --Hierarchial Unix --used since CTSS (1960s), and Unix picked it up and used it nicely --structure like: [draw: "/" bin/ dev/ tmp/ usr/ ls, grep ... ] --directories stored on disk just like regular files --here's the data in a directory file; this data can be in the *data blocks* of the directory or else in the inode of the directory. [] .... --i-node for directory contains a special flag bit --only special users can write directory files --key point: i-number might reference another directory --this neatly turns the FS into a hierarchical tree, with almost no work --bootstrapping: where do you start looking? --root dir always inode #2 (0 and 1 reserved) --and, voila, we have a namespace! --special names: "/", ".", ".." --given those names, we need only two operations to navigate the entire name space: --"cd name": (change context to directory "name") --"ls": (list all names in current directory) --example: /a/foo.c /b/c/essay.txt what does the file system look like? [ i0 ... i7 || [block ] [ block ] [block ] .....] [Draw picture] [an argument that hierarchical fs is no longer relevant: Margo Seltzer, Nicholas Murphy. Hierarchical File Systems are Dead. HotOS'09 but...we're still using them in 2022... ] * links: --hard link: multiple dir entries point to same inode; inode contains refcount "$ ln /tmp/a /tmp/b": creates a synonym ("b") for file ("a") --how do we avoid cycles in the graph? (answer: can't hard link to directories). --soft link: synonym for a *name* "$ln -s /tmp/a /tmp/sb": --creates a new inode (sb), not just a new directory entry --new inode has "sym link" bit set --contents of that new file: "/tmp/a" --Question: what will happen: "$ rm /tmp/a; cat /tmp/b; cat /tmp/sb"? [answer: try it yourself] --Question: can I create soft-link cycles? [answer: yes you can. Try: $ mkdir /tmp/a; mkdir /tmp/b $ ln -s /tmp/a /tmp/b/a $ ln -s /tmp/b /tmp/a/b ] 4. CS5600 File System (Lab4) [this is an (almost) duplication of the Lab4 instructions.] - a Unix-like fs -- with many simplifications -- for example, dirs are no deeper than 10 A. fs5600 disk - an abstract disk (the overall system format) -- a block is 4KB +-------+--------+----------+------------------------+ | super | block | root dir | data blocks ... | | block | bitmap | inode | | +-------+--------+----------+------------------------+ block 0 1 2 3 ... - talk to the disk? -- read or write one or multiple blocks [see handout week12.b, panel 1] B. important data structures ** superblock [see handout week12.b, panel 2] ** block bitmap -- tells fs5600 which blocks are free -- one bit represents one block --QUESTION: what's the maximum size of a fs5600? [answer: 128MB = 4096 * 8 * 4KB] ** file and inode [see handout week12.b, panel 2] -- metadata [or stat, see full file metadata: https://pubs.opengroup.org/onlinepubs/7908799/xsh/sysstat.h.html] -- uid, gid: user and group of this file -- mode (see below) -- ctime: changed time, time of last status (metadata) change -- mtime: modified time, time of last data modification (fs5600 skips the access time in Unix) -- size: file size in bytes -- ptrs [draw on board] --Question: what's the maximum size of a file? [answer: number of ptrs = (4KB - 20B) / 4B = 1019 file size = 1019 * 4KB = 4076KB ~= 4MB ] ** mode (uint32_t) [see handout] |<-- S_IFMT --->| |<-- user ->|<- group ->|<- world ->| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | F | D | | | | | | R | W | X | R | W | X | R | W | X | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ -- R: read -- W: write -- X: execute -- F bit: 0100000 (the 16th bit from right) -- D bit: 0040000 (the 15th bit from right) ** directory -- recall that directory is used to map name to #inode [see handout week12.b, panel 2] [draw on board as well] --Question: how many files can appear in one directory? [answer: one fs_dirent is 32B (see handout week12.b, panel 2) A file is maximum of 4076KB. So there can be 127 * 1024 files, or roughly 128K (~= 4MB/32B) ] [update 04/21: the above answer is confusing when you come back from Lab4. It meant to be a question asking if a dir inode could use all data blocks (1019 of them), then how many files a dir can contain. But the true fs5600 only allows one data block for a dir inode, so the max files is 4KB/32B=128 (said below). ] -- fs5600 only supports using 1 data block, meaning 128 files at maximum. C. interface [next time]