Week 11.a
CS 5600 11/16 2021

On the board
------------
1. Last time
2. SSD
3. Intro to file systems
4. Files

-------------------------------------------

Admin:

--final exam
    --should be the last lecture of this course, but...
      ...[several reasons]...
      ...still confirming with the college.

---------

1. last time

-- I/O
-- disk

2. SSD: solid state drives

[see handout week10.b]

--hardware organization
    --semiconductor-based flash memory
    --stores data electrically, instead of magnetically
    --a flash bank contains blocks
        --blocks (or erase blocks) are of size 128KB or 256KB
    --a block contains pages
        --pages are 4KB to 16KB

--operations
    --read: a page
    --erase: a block, resetting all bits to 1
    --program: a page, setting some bits to 0
      (you cannot program a page twice without erasing)
    --(logical) write: a combination of erase and program operations

    --Question: can you imagine how to update a page A in a
      single-block flash? (which of course is a little too small...)

      [answer:
        1. copy the other pages in the block elsewhere
           (where? anywhere: memory or disk)
        2. erase the entire block
        3. program page A with the wanted contents
        4. copy the other pages back to their positions
      ]

    -- this echoes "writes are more expensive than reads," which
       appears in many places. (probably something deeper about it.)

--performance
    --read: tens of us
    --erase: several ms
    --program: hundreds of us

--a bummer: wear-out
    -- a block can endure about 10,000 to 100,000 erases,
       then becomes unusable

--FTL: flash translation layer
    --read/write logical blocks -->FTL--> read/erase/program physical blocks
      (note: the "blocks" in logical blocks and physical blocks are
      different things: logical blocks as in the device interface,
      physical blocks as in the flash hardware)

    --Question: if you were the FTL, how would you mitigate wear-out?
      [answer: spread the erases/programs evenly across blocks.]

--a log-structured FTL
    --idea:
        --on a write, append the write to the next free page
          (called logging).
        --on a read, keep track of where the data are by mapping
          logical data to physical pages.

    ** an example:
        --Given: a flash bank with three blocks; each has two pages.
        --there are three writes to pages:
            write(logic_page_1)  [short as LP1]
            write(logic_page_10) [short as LP10]
            write(logic_page_99) [short as LP99]
        --what will happen:

                    +-----------------------------+
            blocks  | block 0 | block 1 | block 2 |
                    +---------+---------+---------+
            pages   | P1 | P2 | P3 | P4 | P5 | P6 |
                    +----+----+----+----+----+----+
            data    |LP1 |LP10|LP99|    |    |    |
                    +----+----+----+----+----+----+

            mapping: LP1 => P1, LP10 => P2, LP99 => P3

        Question: what will happen if the following op is
        write(logic_page_1')?

        [answer:
                    +-----------------------------+
            blocks  | block 0 | block 1 | block 2 |
                    +---------+---------+---------+
            pages   | P1 | P2 | P3 | P4 | P5 | P6 |
                    +----+----+----+----+----+----+
            data    |LP1 |LP10|LP99|LP1'|    |    |
                    +----+----+----+----+----+----+

            mapping: LP1 => P4, LP10 => P2, LP99 => P3
        ]

        --Notice: P1 now contains old (invalid) data and is useless.
        --hence, SSDs require *garbage collection* (GC)
            -- want to GC block 0
            -- move the still-useful pages in the block
               (i.e., P2[LP10]) to other free places
            -- erase block 0 (now both P1 and P2 can be used again)

    --complicated internals, hence sometimes unpredictable latency
        --predicting latency? see below
          [Hao, Mingzhe, Levent Toksoz, Nanqinqin Li, Edward Edberg
          Halim, Henry Hoffmann, and Haryadi S. Gunawi. "LinnOS:
          Predictability on unpredictable flash storage with a light
          neural network.", OSDI'20]

3. Intro to file systems

--what does a FS do?
    1. provide persistence (don't go away ... ever)
    2. give a way to "name" a set of bytes on the disk (files)
    3. give a way to map from human-friendly names to "names"
       (directories)

--a few quick notes about disks in the context of FS design
    --disks/SSDs are the first thing we've seen that (a) doesn't go
      away; and (b) we can modify
      (BIOS ROM, hardware configuration, etc. don't go away, but we
      weren't able to modify those things.)
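The logging, mapping, and GC steps above can be sketched in code. This is a toy model, not real SSD firmware: the class name, the list-based free-page pool, and the two-pages-per-block geometry (matching the example) are all made up for illustration.

```python
PAGES_PER_BLOCK = 2
NUM_BLOCKS = 3

class LogFTL:
    """Toy log-structured FTL: append writes to the next free page,
    map logical pages to physical pages, GC a block by relocating
    its still-live pages and then erasing it."""

    def __init__(self):
        self.flash = [None] * (PAGES_PER_BLOCK * NUM_BLOCKS)
        self.mapping = {}                         # logical page -> physical page
        self.free = list(range(len(self.flash)))  # free physical pages, in order

    def write(self, lp, data):
        pp = self.free.pop(0)   # program the next free page (logging)
        self.flash[pp] = data
        self.mapping[lp] = pp   # any older copy of lp is now invalid

    def read(self, lp):
        return self.flash[self.mapping[lp]]

    def gc(self, victim_block):
        start = victim_block * PAGES_PER_BLOCK
        victims = range(start, start + PAGES_PER_BLOCK)
        live = {pp: lp for lp, pp in self.mapping.items()}
        for pp in victims:
            if pp in live:      # still-mapped data: move it elsewhere
                self.write(live[pp], self.flash[pp])
        for pp in victims:      # erase: every page in the block is free again
            self.flash[pp] = None
            self.free.append(pp)
```

Replaying the lecture's example — write LP1, LP10, LP99, then LP1' — fills P1..P4 (indices 0..3) and leaves index 0 holding stale LP1 data; `gc(0)` then relocates LP10 to the next free page and frees both pages of block 0.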
    two implications here:
        (i)  we're going to have to put all of our important state
             on the disk
        (ii) we have to live with what we put on the disk!

             scribble randomly on memory --> reboot and hope it
             doesn't happen again.
             scribble randomly on the disk --> now what?
             (answer: in many cases, we're hosed.)

[ haven't said in class:
    --where are FSes implemented?
        --can implement them on disk, over a network, in memory, in
          NVRAM (non-volatile RAM), on tape, with paper (!!!!)
        -- Lab4 simulates a disk that abstracts an array of blocks
           [see handout, panel 1]
]

4. Files

--what is a file?
    --answer from the user's view: a bunch of named bytes on the disk
    --answer from the FS's view: a collection of disk blocks
    --big job of a FS: map name and offset to disk blocks

                       FS
        {file,offset} ----> disk address

    --operations are create(file), delete(file), read(), write()

    [note that we have seen translation/indirection before:

                          page table
        virtual address ------------> physical address

        per-file metadata:
                  inode
         offset --------> disk block address

        how'd we get the inode?

                     directory
         file name -----------> inode
    ]

    [Analogies:

        per-process memory <-> file
        multiple processes <-> multiple files
        page table         <-> inode
        memory chip        <-> disk/SSD
    ]

FS design parameters:
    * small files (most files are small)
    * large files (much of the disk is allocated to large files)
    * sequential access
    * random access
    * disk utilization (metadata overhead and fragmentation)

access patterns we could imagine supporting:

    (i) Sequential:
        --File data processed in sequential order
        --By far the most common mode
        --Example: editor writes out a new file, compiler reads in
          a file, etc.

    (ii) Random access:
        --Address any block in the file directly, without passing
          through the blocks before it
        --Examples: large data sets, demand paging, databases

    (iii) Keyed access:
        --Search for a block with particular values
        --Examples: associative database, index
        --This is everywhere in databases and search engines, but...
        --...usually not provided by a FS in the OS

Question: if you were a FS designer, which structure would you use
for the different parameters?

    [answer: it depends on what workloads the FS is going to meet]

candidate designs........

    A. contiguous
    B. linked files
    C. indexed files

A. contiguous allocation

    "extent based"
    --when creating a file, make the user pre-specify its length,
      and allocate all the space at once
    --file metadata contains location and size
    --example: IBM OS/360

        [ a1 a2 a3 <5 free> b1 b2 ]

        what if a file c needs 7 sectors?!

    +: simple
    +: fast access, both sequential and random
    -: fragmentation

B. linked files

    "linked list based"
    --keep a linked list of free blocks
    --metadata: pointer to the file's first block
    --each block holds a pointer to the next one

    +: no more fragmentation
    +: sequential access is easy (and probably mostly fast, assuming
       decent free-space management, since the pointers will point
       close by)
    -: random access is a disaster
    -: pointers take up room in blocks
       (overhead: sizeof(pointer) / sizeof(block));
       messes up alignment of data

C. indexed files

    [DRAW PICTURE]

    --each file has an array holding all of its block pointers
        --like a page table, so similar issues crop up
    --allocate this array on file creation
    --allocate blocks on demand (using the free list)

    +: sequential and random access are both easy
    -: need to somehow store the array
        --large possible file size --> lots of unused entries in the
          block array
        --large actual block size --> huge contiguous disk chunk needed
        --solve the problem the same way we did for page tables

    --okay, so now we're not wasting disk blocks, but what's the
      problem?
      (answer: equivalent issues as for page table walking: here,
      it's extra disk accesses to look up the blocks)

    --this motivates the classic Unix file system
      [we will learn the Unix file system in the next class]
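The trade-off between designs B and C can be made concrete with a toy {file, offset} --> disk block lookup. Everything here is hypothetical (the block size, the block numbers, the dict-as-pointer representation); it only illustrates why random access is O(n) chain-walks for linked files but a single array lookup for indexed files.

```python
BLOCK_SIZE = 512  # hypothetical block size for this sketch

def linked_lookup(first_block, next_ptr, offset):
    """Linked files: follow each block's 'next' pointer, one hop per
    block before the target -- every hop is another disk read."""
    b = first_block
    for _ in range(offset // BLOCK_SIZE):
        b = next_ptr[b]
    return b

def indexed_lookup(block_array, offset):
    """Indexed files: one lookup in the file's block-pointer array
    (once the array itself is in memory)."""
    return block_array[offset // BLOCK_SIZE]
```

For a file whose data live in disk blocks 7, 3, 12 (in that order), `next_ptr = {7: 3, 3: 12}` and `block_array = [7, 3, 12]` describe the same file; looking up offset 1024 returns block 12 either way, but the linked version had to visit blocks 7 and 3 first.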