week 11
cs4973/cs6640
03/17 2025
https://naizhengtan.github.io/25spring/

□  admin: final project
□  1. intro to fs
□  2. Unix files
□  3. fs namespace
□  4. egos-fs preview
□  admin: exam

------

1. Intro to file systems

  Q: what does a FS do?

    1. provide persistence (data doesn't go away ... ever)
    2. give a way to "name" a set of bytes on the disk (files)
    3. give a way to map from human-friendly names to "names" (directories)

  --persistence comes from hardware:

    * classic persistent storage: SSD/disk/tape
      (yes, we are still using tapes)

      --disk/SSD are the first thing we've seen that (a) doesn't go
        away; and (b) we can modify (BIOS ROM, hardware configuration,
        etc. don't go away, but we weren't able to modify these things).
        two implications here:

        (i) we're going to have to put all of our important state on
        the disk

        (ii) we have to live with what we put on the disk! scribble
        randomly on memory --> reboot and hope it doesn't happen again.
        scribble randomly on the disk --> now what? (answer: in many
        cases, we're hosed.)

    * new persistent storage:
      --glass: Project Silica:
        https://www.microsoft.com/en-us/research/project/project-silica/
      --persistent memory (see slides)

  --disk/ssd abstraction
      an array of blocks

  --a note about abstracting machines
      -- CPU: scheduler
      -- Memory: virtual memory
      -- disk: file system

2. Files

  --what is a file?
    --answer from user's view: a bunch of named bytes on the disk
    --answer from FS's view: collection of disk blocks

  Q: what does a file do?

    --big job of a FS: map name and offset to disk blocks

                     FS
        {file,offset} --> disk address

      this is called file mapping.

  --the problem: "file mapping"
      (file, offset) -> disk addr

      "70% of the time spent on file mapping"
      from paper
      [Neal, Ian, Gefei Zuo, Eric Shiple, Tanvir Ahmed Khan, Youngjin Kwon,
      Simon Peter, and Baris Kasikci. "Rethinking File Mapping for
      Persistent Memory.", FAST'21]

  --Unix inode: how Unix does file mapping

    [draw on board]
       permissions
       times for file access, file modification, and inode-change
       link count (# directories containing file)
       ptr 1 --> data block
       ptr 2 --> data block
       ptr 3 --> data block
       .....
       ptr 11 --> indirect block
                    ptr -->
                    ptr -->
                    ptr -->
                    ptr -->
                    ptr -->
       ptr 12 --> indirect block
       ptr 13 --> double indirect block
       ptr 14 --> triple indirect block

    This is just an imbalanced tree.

  --Unix fs:
      (file, offset) -[inode]-> disk addr

    Why is the inode designed this way in Unix?
      "fs is a disk data structure":
        -- granularity: block
        -- reading/writing a block is expensive
        -- sequential access > random access

  Q: How does a Unix fs read/write files?

              app
             /   \
            /     \
         mmap    read/write
      ---------------------
            \     /       [kernel]
             \   /
          page cache
               |
              disk

  --introduce mmap using an example
    * implementing mmap could be an interesting final project

  --Unix-style file mapping worked well in the past, when we had
    single-core CPUs, slow IO, sequential access > random access,
    block-granularity reads/writes, ...

    ...but do these still hold today?

  Q: assume that memory is persistent; do we still need a fs?
     what would that fs look like?

3. fs namespace

  * the question under the hood:

    Q: Suppose you have 1,000 books at home.
       How will you organize them so that you can easily find the book
       you want?

    Q: Suppose you are a librarian with 1,000 books.
       How would you organize the books so that when a guest comes, you
       can find the book for them?

    Q: Is there any difference in how you organize the books?

    [these apply to file systems
       books: files
       your organization system: namespace
    ]

  * a book is an extent-based read-only fs
    --book pages: disk (persistent)
    --articles/sections: files
    --contents: dirs

  * "boring" hierarchical namespace
    -- used since CTSS (1960s), and Unix picked it up and used it nicely
    -- structure like:
       [draw:
                    "/"
          bin/   dev/   tmp/   usr/
          ls, grep ...
       ]
    -- How to look up a file, say "/bin/ls"?
  * is the hierarchical namespace the only option?
    No, take a look at databases.

    comparison: file system vs. databases
       databases: structured data
       fs: unstructured data

  * "Hierarchical file systems are dead" (2009)
    Margo Seltzer and Nicholas Murphy, HotOS'09

    -- hFAD argues that the hierarchical namespace worked well in the past.
       "The situation, however, has evolved"

       i) storage size grows
          -- 1992: 300MB disk
          -- 2009: 300GB disk (the paper's time, 17 years later)
          //-- 2023: 22TB disk & 8TB SSD (14 years later than paper)
          -- 2025: 36TB disk (from Seagate) & 8TB SSD (16 years later than paper)
          -- 2030(?): 1000TB SSD
             [https://www.techradar.com/news/1000tb-ssds-could-become-mainstream-by-2030-as-samsung-plans-1000-layer-nand]

       ii) "..they [file sizes] have not increased by the same margin."
           larger space
           => file size wasn't growing that fast
           => more files
           => harder to manage
           [Q: does the logic flow?]

       iii) "Google is a verb"
            users ask for what they want instead of where it lives
            [a sharp observation]

  * Why is a hierarchical namespace a problem?
    -- files are siloed (no need for a global namespace)
       Sound familiar? this is very much like a smartphone's logic
       (notice that this paper was 2009; iphone debuted in 2007)
    -- walking the hierarchy (performance is bad without a cache)
    -- concurrency (this is fundamental)

  * hFAD design idea:
    instead of a hierarchical namespace, use a tagged, search-based
    namespace
    [see Fig]
    -- based on an object-based storage device (OSD)
    -- naming (via index) and accessing (via obj store)
    -- "An object is named by one or more tag/value pairs."

    Q: how about updates?
       deleting a file needs to update multiple indexes at the same time.

       Note: an index is a data structure that accelerates data
       retrieval, at the cost of more expensive updates and more space.

  * DISCUSSION: search-based fs vs. hierarchical fs
    real question: human-readable vs. machine-readable
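
    To make the update question concrete, here is a toy sketch of a
    tagged, search-based namespace: an object store plus an inverted
    index keyed by tag/value pairs. The class TagStore and its methods
    are hypothetical names chosen for illustration; this is not hFAD's
    actual interface, only an assumption about what "naming via index,
    accessing via obj store" could look like in a few lines.

```python
from collections import defaultdict

class TagStore:
    """Toy tagged namespace: an object store plus one inverted index.

    All names are hypothetical; this is not hFAD's real interface.
    """

    def __init__(self):
        self.objects = {}                # object id -> (data, tags)
        self.index = defaultdict(set)    # (tag, value) -> set of object ids
        self.next_id = 0

    def create(self, data, tags):
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = (data, dict(tags))
        for pair in tags.items():        # one index update per tag/value pair
            self.index[pair].add(oid)
        return oid

    def search(self, **tags):
        """Return ids of objects that match every given tag/value pair."""
        sets = [self.index.get(pair, set()) for pair in tags.items()]
        return set.intersection(*sets) if sets else set(self.objects)

    def delete(self, oid):
        _, tags = self.objects.pop(oid)
        for pair in tags.items():        # every index entry must be removed
            self.index[pair].discard(oid)
```

    Note how delete has to touch one index entry per tag: the more ways
    an object can be found, the more work each update does, which is the
    retrieval-versus-update trade-off mentioned in the note on indexes.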