Week 10.a CS3650 03/11 2024 https://naizhengtan.github.io/24spring/ 1. I/O architecture and storage devices 2. Intro to file systems 3. Files ---- 1. I/O architecture and storage devices general: [show slides: CPU, Mem, I/O, connected to BUS] --devices: lots of details --CPU/device interaction --storage devices --Disk --SSD --brief introduction to Disk [show slides] --Question: sequential accesses to disk is much faster than random accesses. Why? --Question: for disk, comparing reads or writes, what relative performance do you think? why? [answer: reads are faster.] --Question: what will happen if you shake your disk while using it? [answer: NO, BAD IDEA. DON'T DO IT. see: https://www.youtube.com/watch?v=tDacjrSCeq4&ab_channel=BryanCantrill ] --intro to SSD [show slides] --a bummer: wear-out --a block can bare a limited number of times (~10K-100K) times writes (erasing), then becomes unusable --Question: if there is a workload constantly modifying one file, how can software system help with the case? [answer: spread out the writes evenly to all blocks] --fancy new storage devices: ** low latency? persistent memory ** long-term data storage? project Silica https://www.microsoft.com/en-us/research/project/project-silica/ 2. Intro to file systems Question: in you eyes, what does fs do? --what does a FS do? 1. provide persistence (don't go away ... ever) 2. give a way to "name" a sequence of bytes on the disk (files) 3. give a way to map from human-friendly-names to "named bytes" (directories) --a few quick notes about disks in the context of FS design --disk/SSD are the first thing we've seen that (a) doesn't go away; and (b) we can modify (BIOS ROM, hardware configuration, etc. don't go away, but we weren't able to modify these things). two implications here: (i) we're going to have to put all of our important state on the disk (ii) we have to live with what we put on the disk! scribble randomly on memory --> reboot and hope it doesn't happen again. scribbe randomly on the disk --> now what? (answer: in many cases, we're hosed.) --where are FSes implemented? --can implement them on disk, over network, in memory, in NVRAM (non-volatile RAM), on tape, with paper (!!!!) -- Lab4 simulates a disk that abstracts an array of blocks 3. Files --what is a file? --answer from user's view: a bunch of named bytes on the disk --answer from FS's view: collection of disk blocks --operations regarding files are create(file), delete(file), read(), write() --big job of a FS: map name and offset to disk blocks FS {file,offset} --> disk address This is also called file mapping. -- note that this is a translation per-file: SOMETHING offset ---------> disk block address In fs, this SOMETHING is called inode. Then, how'd we get the inode? directory file name ------------> inode Question: Given an example of a file mapping. if you were a fs designer, which structure will you use for file mapping? [answer: it depends on...parameters] - File design parameters: * small files (most files are small) vs. large files (much of the disk is allocated to large files) * disk utilization (metadata overhead and fragmentation) * access patterns: sequential access vs. random accesses (i) Sequential: --File data processed in sequential order --By far the most common mode --Example: editor writes out new file, compiler reads in file, etc (ii) Random access: --Address any block in file directly without linear scanning --Examples: large data set, demand paging, databases Question: which structure will you use for different parameters? - candidate file mapping designs A. contiguous B. linked files C. indexed files A. contiguous allocation "extent based" --when creating a file, make user pre-specify its length, and allocate the space at once --file metadata contains location and size --example: IBM OS/360 [ a1 a2 a3 <5 free> b1 b2 ] what if a file c needs 7 sectors?! +: simple +: fast access, both sequential and random -: fragmentation [somewhat regain interests due to PM: Li, Ruibin, Xiang Ren, Xu Zhao, Siwei He, Michael Stumm, and Ding Yuan. "ctFS: Replacing file indexing with hardware memory translation through contiguous file allocation for persistent memory." FAST'22] B. linked files "linked list based" --keep a linked list of free blocks --metadata: pointer to file's first block --each block holds pointer to next one +: no more fragmentation +: sequential access easy (and probably mostly fast, assuming decent free space management, since the pointers will point close by) -: random access is a disaster -: pointers take up room in blocks; (overhead: sizeof(pointer) / sizeof(block)) messes up alignment of data C. indexed files [DRAW PICTURE] --Each file has an array holding all of its block pointers --Allocate this array on file creation --Allocate blocks on demand (using free list or bitmap) +: sequential and random access are both easy -: need to somehow store the array --large possible file size --> lots of unused entries in the block array --large actual block size --> huge contiguous disk chunk needed --this motivates the classic Unix file system [continue from here next time]