week 10a
CS6640
03/10 2026
https://naizhengtan.github.io/26spring/

□ 1. recap: fs and file
□ 2. file-mapping structures
□ 3. fs namespace
□ 4. hierarchical fs is dead?
---

1. Recap: fs and file

   Q: what does a FS do?
     1. provide persistence (don't go away ... ever)
     2. give a way to "name" a set of bytes on the disk (files)
     3. give a way to map from human-friendly-names to "names" (directories)

   Q: what is a file?
       --answer from user's view: a bunch of named bytes on the disk
       --answer from FS's view: collection of disk blocks

   --the problem: "file mapping"
                (file, offset) -> disk addr

   --Unix inode: an imbalanced tree
       [draw on board]

2. File-mapping structures

 * Unix-style file mappings works well in the past when we have
      single-core CPU,
      slow IO,
      sequential access > random access,
      read/write granularity is a block,...
    ...but, do these still hold today?

 * FS design parameters:

   - small files (most files are small)
       vs.
     large files (much of the disk is allocated to large files)
   - sequential access vs. random accesses
   - prefetching
   - disk utilization (metadata overhead and fragmentation)

  A. per-file extent tree

    extent: a continuous allocation
    causing fragmentation when appending, deleting, resizing
    binary search to find the extent
    O(log(N)) where N is the number of extents

    Q: pros and cons?

  B. global cuckoo hash table

   * cuckoo hashing
     how it works:
     * two hashing functions
     * invariant: you can always find the value in one of the two
                  hashing location
     * read:   key -[cuckoo]-> location
     * insert: for a conflicting insert, kick an existing key out; redo
               insertion for that key; repeat

    Q: pros and cons?

  * DISCUSSION: per-file mapping vs. gloabl mapping

    Q: Why not per-file hash mapping? What's wrong with it?
    A: no idea of the file size.

    Q: Why not global extent trees? What's wrong with it?
    A: pull all files in the same namespace
       (know ahead of time about the number of files)

3. fs namespace

  * the question under-the-hood:

    Q: Suppose you have 1,000 books at home. How would you organize them so
    that you can easily find the one you want?

    Q: Now suppose you are a librarian responsible for 1,000 books. How would
    you organize them so that you can quickly locate a book for a visitor?

    Q: Would you organize the books differently in these two situations?
       If so, how?

    [these apply to file systems
      books: files
      your organization system: namespace
    ]

    * a book is an extent-based readonly fs
      --book pages: disk (persistent)
      --articals/sections: files
      --contents: dirs

  * "boring" hierarchical namespace

    -- used since CTSS (1960s), and Unix picked it up and used it nicely

    -- structure like:
        [draw: 
                      "/"
           bin/     dev/     tmp/    usr/
         ls, grep    ...
        ]

    -- How to lookup a file, say "/bin/ls"?

  * is the hierarchical namespace the only option?
    No, take a look at databases.
    comparison: file system vs. databases
      databases: structured data
      fs: unstructured data

4. "Hierarchical file systems are dead"? (2009)
    by Margo Seltzer and Nicholas Murphy, HotOS'09

    -- hFSD argues,
       hiererachical namespace works good in the past.
       "The situation, however, has evolved"

    i) storage size grows
      -- 1992: 300MB disk
      -- 2009: 300GB disk (the paper's time, 23 years later)
      -- 2023: 22TB disk & 8TB SSD (14 years later than paper)
      -- 2025: 36TB disk & 8TB SSD (16 years later than paper) ["Amazon-buyable"]
      -- 2026: 36TB disk & 8TB SSD (17 years later than paper)
      -- 2030(?): 1000TB SSD
         [https://www.techradar.com/news/1000tb-ssds-could-become-mainstream-by-2030-as-samsung-plans-1000-layer-nand]

    ii) "..they [file sizes] have not increased by the same margin."
       larger space &
          file size wasn't growing that fast => 
            more files => 
              harder to manage
        [Q: does logic flow?]

    iii) "Google is a verb"
      what they want instead of where it lives
      [a sharp observation]

   * Why hierarchical namespace is a problem?
     -- files are siloed (no need for a global namespace)
        Sounds familiar?
          this is very much like smartphone's logic
          (notice that this paper was 2009; iphone debuted in 2007)
     -- walk the hierarchy (performance is bad, without cache)
     -- concurrency (this is fundamental)

   * hFAD design

     idea: instead of a hierarchical namespace,
           use a tagged, search-based namespace

     [see Fig]
     -- based on an object-based storage device (OSD)
     -- naming (via index) and accessing (via obj store)
     -- "An object is named by one or more tag/value pairs."

     Q: how about updates? deleting a file needs to update multiple
        indexes at the same time.

        Note: index is a data structure that accelerate data retrieval,
        at the cost of more expensive updates and more space.

   * DISCUSSION: search-based fs vs. hierarchical fs
     real question: human-readable vs. machine-readable