week 8b CS6640 02/27 2026 https://naizhengtan.github.io/26spring/ □ 1. intro to fs □ 2. Unix files □ 3. what about now? -- PMem introduction -- modern(?) file mappings --- 1. Intro to file systems Q: what does a FS do? 1. provide persistence (don't go away ... ever) 2. give a way to "name" a set of bytes on the disk (files) 3. give a way to map from human-friendly-names to "names" (directories) --persistence comes from hardware: * classic persistent storage SSD/disk/tape (yes, we are still using tapes) --disk/SSD are the first thing we've seen that (a) doesn't go away; and (b) we can modify (BIOS ROM, hardware configuration, etc. don't go away, but we weren't able to modify these things). two implications here: (i) we're going to have to put all of our important state on the disk (ii) we have to live with what we put on the disk! scribble randomly on memory --> reboot and hope it doesn't happen again. scribbe randomly on the disk --> now what? (answer: in many cases, we're hosed.) * new persistent storage: --glass: Project Silica: https://www.microsoft.com/en-us/research/project/project-silica/ --persistent memory (see below) --disk/ssd abstraction an array of blocks --a note about abstracting machines -- CPU: scheduler -- Memory: virtual memory -- disk: file system 2. Files Q: what is a file? --answer from user's view: a bunch of named bytes on the disk --answer from FS's view: collection of disk blocks Q: what does a file do? --big job of a FS: map name and offset to disk blocks FS {file,offset} --> disk addrss this is called file mapping. --the problem: "file mapping" (file, offset) -> disk addr "70% of the time spent on file mapping" by "the rethinking paper" (see below) --Unix inode: how Unix does file mapping [draw on board] permisssions times for file access, file modification, and inode-change link count (# directories containing file) ptr 1 --> data block ptr 2 --> data block ptr 3 --> data block ..... ptr 11 --> indirect block ptr --> ptr --> ptr --> ptr --> ptr --> ptr 12 --> indirect block ptr 13 --> double indirect block ptr 14 --> triple indirect block This is just an imbalanced tree. --Unix fs: (file, offset) -[inode]-> disk addr Why is inode designed the way in Unix? "fs is a disk data structure": -- granularity: block -- read/write a block is expensive -- sequential access > random access Q: How does Unix fs reads/writes files? app / \ / \ mmap read/write --------------------- \ / [kernel] \ / page cache | disk [demo: mmap example] --Unix-style file mappings works well in the past when we have single-core CPU, slow IO, sequential access > random access, read/write granularity is a block,... ...but, do these still hold today? Q: assume that memory is persistent, then do we need fs? how will that fs look like? 3. what about now? A) PMem introduction Persistent memory or Non-volatile memory storage hierarchy: [see also slides] | registers \ | L1-L3 caches \ | DRAM \ | [Pmem] \ | SSD/disk \ | network fs (NFS) \ features: -- fastest persistent storage -- largest byte-addressable storage -- memory DIMM -- (potentially) managed by MMU the only implementation---Intel Optane PMem multiple modes, but what we care is called "App Direct Mode", where Pmem is persistent. performance characteristics (compared with DRAM) -- latency -- throughput -- concurrency -- access size B) Rethinking File Mapping for Persistent Memory [Neal, Ian, Gefei Zuo, Eric Shiple, Tanvir Ahmed Khan, Youngjin Kwon, Simon Peter, and Baris Kasikci. "Rethinking File Mapping for Persistent Memory.", FAST'21] * Q: does file mapping has to be persistent? pros? cons? -- on persistent storage -- in DRAM -- on storage with a cache * high-level view: reads: retrieve file mapping writes: update file mapping + block allocator * FS design parameters: - small files (most files are small) vs. large files (much of the disk is allocated to large files) - sequential access vs. random accesses - prefetching - disk utilization (metadata overhead and fragmentation) * Q: If you were to design a fs, what's your file-mapping structure? C) design choices 1. per-file extent tree extent: a continuous allocation causing fragmentation when appending, deleting, resizing binary search to find the extent O(log(N)) where N is the number of extents 2. global cuckoo hash table * cuckoo hashing how it works: * two hashing functions * invariant: you can always find the value in one of the two hashing location * read: key -[cuckoo]-> location * insert: for a conflicting insert, kick an existing key out; redo insertion for that key; repeat