Week 3.a CS7670 09/19 2022
https://naizhengtan.github.io/22fall/

1. last time
2. Rethinking file mapping (cont.)
3. before-class questions
4. eval

---

1. last time

- paper, "Rethinking File Mapping for Persistent Memory"

  A) the problem
     "file mapping": (file, offset) -> disk addr

  B) challenges and non-challenges

     Challenges
     -- fragmentation
     -- locality
     -- mapping size
     -- concurrency (a new problem)

     Q: why are fragmentation/locality/mapping size challenging?

2. Rethinking file mapping (cont.)

   Q: what is the "correctness" spec for concurrent fs ops?
   [introduce sequential consistency]

   Non-challenges
   -- page caching (discussed earlier)
   -- crash consistency (an orthogonal problem/challenge)

  C) four design choices

     1. per-file extent tree

        extent: a contiguous allocation
        causes fragmentation when appending, deleting, resizing
        binary search to find the extent: O(log(N)), where N is the number of extents

        [before-class question 4]
        Q: pros and cons?
        pros:
        + "most compact representation" (is this true?)
        cons:
        - "require many memory accesses"
        - irregular fragmentation

        optimization:
        - cursor

     2. per-file radix tree

        much like a page table in virtual memory:
        a tree; at each level, an index taken from the "addr" (logical block id) chooses the next node
        [draw a radix tree on board]

        Q: where does this "9 bit" come from?
        Q: if we have a 4B pointer and a 512B block, what will the "N bits" be?
        A: N = log_2(512B / 4B) = 7 (bits)

        Q: pros and cons?
        pros:
        + performance stability: one memory access per level
        cons:
        - "less compact"
        - "tree traversal takes longer"

        Q: [before-class question 2]
           Why not a per-file hash mapping? What's wrong with it?
        A: we have no idea of the file size (a hash table would have to be sized up front or resized as the file grows)

     3. global cuckoo hash table

        * cuckoo hashing; how it works:
          * two hash functions
          * invariant: you can always find the value in one of the two hash locations
          * read: key -[cuckoo]-> location
          * insert: for a conflicting insert, kick an existing key out; redo the insertion for that key; repeat
          [see the cuckoo hashing sketch at the end of this section]

        Q: what do the keys look like?
           (file_num, logical_block)

        Q: what will happen on "read(8KB-of-data)"?
        Q: what will happen on "append(4KB-of-data)"?
           insertion into the hash + allocating a block (<= can be expensive)

        Q: pros and cons?
        pros:
        + no resizing => no fragmentation
        + mapping size
        cons:
        - ops translate to a bunch of random accesses

        optimization:
        * use an extent instead of a block pointer

     4. global (traditional) hash table: HashFS

        design goal: better insertion
        (what's wrong with cuckoo hashing insertion? it needs to talk to a "block allocator")

        co-design: file mapping + block allocator
        a trick: use the index as the offset
        [draw HashFS; a sketch of this trick appears at the end of these notes]

        Q: how does "read(file_num=1, logical_block=21)" work?
        Q: how does "append(file_num=1, logical_block=22)" work?

        Q: "a total space overhead of <0.2%"
           How is this 0.2% calculated?

        Q: pros and cons?
        pros:
        + fast retrieval (compute a hash function)
        cons:
        - expensive hash collisions
        - low spatial locality

        optimization:
        * SIMD prefetching

     Q: [before-class question 3]
        Why not global extent trees or radix trees? What's wrong with them?
     A: they pull all files into the same namespace (you would need to know the number of files ahead of time)

  [skipped]
  * DISCUSSION: per-file mapping vs. global mapping

    per-file (S3.1):
    + locality
    + concurrency
    - resizing
      => multiple levels of indirection
      => fragmentation
    Q: but what forces per-file designs to resize?
       the real question: can the fs plan ahead of time?

    global (S3.2):
    + no resizing => no fragmentation
    + mapping size
    - locality (can be mitigated by caching)

  [skipped]
  * DISCUSSION: learned file mapping?
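  [sketch] To make the cuckoo-hash design above concrete, here is a minimal C sketch of the read and insert paths, assuming a key of (file_num, logical_block) and a value holding the physical block number. The two hash functions, the toy table size, and the kick bound are my own illustrative choices, not the paper's implementation; block allocation (the expensive part of an append) is left out.

    /* Minimal cuckoo-hash file-mapping sketch (illustration only; not the
     * paper's implementation). Key = (file_num, logical_block), value =
     * physical block number. Two hash functions; each key lives in exactly
     * one of its two candidate slots. */
    #include <stdint.h>
    #include <stdbool.h>

    #define NSLOTS    (1u << 16)   /* fixed-size table (toy size): never resized */
    #define MAX_KICKS 64           /* give up after this many relocations */

    struct entry {
        uint64_t key;              /* (file_num << 32) | logical_block */
        uint64_t phys_block;
        bool     used;
    };

    static struct entry table[NSLOTS];

    static uint64_t make_key(uint32_t file_num, uint32_t logical_block) {
        return ((uint64_t)file_num << 32) | logical_block;
    }

    /* Two arbitrary hash functions, chosen only for illustration. */
    static uint32_t h1(uint64_t k) { return (uint32_t)(k * 0x9E3779B97F4A7C15ull) % NSLOTS; }
    static uint32_t h2(uint64_t k) { return (uint32_t)((k ^ (k >> 29)) * 0xBF58476D1CE4E5B9ull) % NSLOTS; }

    /* read path: the key is always in one of its two candidate slots. */
    bool cuckoo_lookup(uint32_t file_num, uint32_t logical_block, uint64_t *phys) {
        uint64_t k = make_key(file_num, logical_block);
        uint32_t s1 = h1(k), s2 = h2(k);
        if (table[s1].used && table[s1].key == k) { *phys = table[s1].phys_block; return true; }
        if (table[s2].used && table[s2].key == k) { *phys = table[s2].phys_block; return true; }
        return false;              /* block not mapped */
    }

    /* append path (mapping part only): insert a new mapping; on conflict,
     * kick the resident key to its alternate slot and repeat. */
    bool cuckoo_insert(uint32_t file_num, uint32_t logical_block, uint64_t phys) {
        struct entry cur = { make_key(file_num, logical_block), phys, true };
        uint32_t slot = h1(cur.key);
        for (int i = 0; i < MAX_KICKS; i++) {
            if (!table[slot].used) { table[slot] = cur; return true; }
            struct entry victim = table[slot];   /* kick the resident entry out */
            table[slot] = cur;
            cur = victim;
            /* reinsert the victim at its other candidate slot */
            slot = (h1(cur.key) == slot) ? h2(cur.key) : h1(cur.key);
        }
        return false;              /* too many kicks; a real table would rehash */
    }

  The fixed table size is what gives the "no resizing => no fragmentation" pro above; the cost shows up as the random accesses listed in the cons, since consecutive logical blocks hash to unrelated slots.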
3. Before-class questions

1) Imagine implementing "HashFS" on a disk (by simply replacing the PMem with a disk).
   Will this "HashFS-disk" work well? If yes, why doesn't Unix do this? If not, what can go wrong?

   A: No, for two reasons.

   Reason 1:
   * think of multiple open files: they randomly access the metadata region on disk (bad)
     [think of the Unix fs: it caches inodes in the page cache]
   * why not cache the metadata in DRAM?
     consider a fs of 1TB with 4KB blocks => the metadata region will be 2GB (with no data cached yet)

   Reason 2:
   * accessing contiguous file data is also random (bad)
     Why? because the locations of data blocks are decided by the hash function (random)

   [see the HashFS sketch and the back-of-envelope arithmetic at the end of these notes]

[below are discussed earlier]

2) Why are both per-file mappings (extent tree and radix tree) tree-based?
   Why not a hash-based per-file mapping? What can go wrong?

3) (similar to the above question)
   Why are both global mappings based on hash tables?
   What can go wrong for a "global radix tree"?

4) Write down the pros and cons of the four mapping designs:
   a) per-file extent tree
   b) per-file radix tree
   c) global cuckoo hash table
   d) HashFS

[skipped]
4. what does the performance look like?

   Q: are there any experiments that are particularly interesting to you?

   * IO size (Fig 8)
     how likely is it that we do many file mapping operations?

   * concurrency: 16 threads; ODINFS, 250 threads

   * page caching (Fig 11)
     PMem is fast enough (2-3x slower than DRAM)

   * application workloads (Fig 12)
     minor improvement
     * 70%, YCSB
       Why? small random reads/writes (1KB key-value pairs)
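[sketch] To make the HashFS "index as offset" trick and the space-overhead arithmetic from question 1 concrete, here is a minimal C sketch. The hash function, linear probing, entry layout, and toy sizes are my own illustration choices, not HashFS's actual on-PMem format; the comments carry the 1TB / 4KB back-of-envelope from the lecture.

    /* HashFS-style "index as offset" sketch (illustration only). One global
     * table indexed by hash(file_num, logical_block); the slot index i also
     * decides where the data lives: data_region + i * BLOCK_SIZE. So each
     * entry only stores the key for collision checks -- no block pointer.
     *
     * Back-of-envelope from the lecture: a 1 TB fs with 4 KB blocks has
     * 2^28 blocks; at ~8 B of metadata per block that is 2 GB of mapping
     * state, roughly 0.2% of 1 TB. */
    #include <stdint.h>
    #include <stdlib.h>

    #define BLOCK_SIZE 4096ull
    #define NBLOCKS    (1ull << 14)   /* toy size; a 1 TB fs would have 2^28 blocks */

    struct slot { uint64_t key; };    /* (file_num << 32) | logical_block; 0 = free (reserved) */

    static struct slot *metadata;     /* one entry per data block */
    static uint8_t     *data_region;  /* stands in for the PMem data region */

    int hashfs_init(void) {
        metadata    = calloc(NBLOCKS, sizeof *metadata);
        data_region = calloc(NBLOCKS, BLOCK_SIZE);
        return (metadata && data_region) ? 0 : -1;
    }

    static uint64_t make_key(uint32_t file_num, uint32_t logical_block) {
        return ((uint64_t)file_num << 32) | logical_block;
    }

    static uint64_t hashfs_hash(uint64_t k) {          /* illustrative hash, not the paper's */
        return (k * 0x9E3779B97F4A7C15ull) % NBLOCKS;
    }

    /* read(file_num=1, logical_block=21): hash to a slot, confirm the key,
     * and compute the data address directly from the slot index. */
    void *hashfs_lookup(uint32_t file_num, uint32_t logical_block) {
        uint64_t k = make_key(file_num, logical_block);
        uint64_t i = hashfs_hash(k);
        while (metadata[i].key != 0) {                 /* probe past collisions */
            if (metadata[i].key == k)
                return data_region + i * BLOCK_SIZE;   /* the index IS the offset */
            i = (i + 1) % NBLOCKS;
        }
        return NULL;                                   /* not mapped */
    }

    /* append(file_num=1, logical_block=22): claiming a free slot both maps
     * the block and allocates it -- no separate block-allocator call. */
    void *hashfs_append(uint32_t file_num, uint32_t logical_block) {
        uint64_t k = make_key(file_num, logical_block);
        uint64_t i = hashfs_hash(k);
        while (metadata[i].key != 0)                   /* (a full table would spin; real code bounds this) */
            i = (i + 1) % NBLOCKS;
        metadata[i].key = k;                           /* slot claimed => block allocated */
        return data_region + i * BLOCK_SIZE;
    }

Because the slot index determines where the data block lives, a successful insert is also an allocation; that is the file-mapping/block-allocator co-design the lecture contrasts with the cuckoo table, whose inserts must call out to a separate allocator. The same property explains question 1: block placement follows the hash, so even a sequential read of one file becomes random I/O on a disk.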