Week 11.b CS6640 11/16 2023
https://naizhengtan.github.io/23fall/

1. intro to fs
2. Unix files
3. what about now?
   -- PMem introduction
   -- modern(?) file mappings

---

Last time: page fault

* memory-mapped files
* idea: allow access to files using load and store;
  can easily read and write parts of a file
  (e.g., don't have to change the offset using the lseek system call)
* Unix systems have a system call for m-mapped files:

      void *mmap(void *addr, size_t length, int prot, int flags,
                 int fd, off_t offset);

* the kernel pages in pages of a file on demand;
  when memory is full, it pages out pages of the file that are not
  frequently used

1. Intro to file systems

Q: what does a FS do?
   1. provide persistence (don't go away ... ever)
   2. give a way to "name" a set of bytes on the disk (files)
   3. give a way to map from human-friendly names to "names" (directories)

--persistence comes from hardware:

  * classic persistent storage: SSD/disk/tape
    (yes, we are still using tapes)

  --disk/SSD are the first things we've seen that (a) don't go away
    and (b) we can modify
    (BIOS ROM, hardware configuration, etc. don't go away, but we
    weren't able to modify those things)

    two implications here:
    (i)  we're going to have to put all of our important state on the disk
    (ii) we have to live with what we put on the disk!
         scribble randomly on memory --> reboot and hope it doesn't
           happen again
         scribble randomly on the disk --> now what?
           (answer: in many cases, we're hosed)

  * new persistent storage:
    --glass: Project Silica:
      https://www.microsoft.com/en-us/research/project/project-silica/
    --persistent memory (see below)

--disk/SSD abstraction: an array of blocks

--a note about abstracting machines:
  -- CPU: scheduler
  -- memory: virtual memory
  -- disk: file system

2. Files

--what is a file?
  --answer from the user's view: a bunch of named bytes on the disk
  --answer from the FS's view: a collection of disk blocks

Q: what does a FS do with a file?
--the big job of a FS: map a name and offset to disk blocks

      {file, offset} --[FS]--> disk address

  this is called file mapping.

--the problem: "file mapping"

      (file, offset) -> disk addr

  up to "70% of the time [is] spent on file mapping", according to
  "the rethinking paper" (see below)

--Unix inode: how Unix does file mapping
  [draw on board]

      permissions
      times for file access, file modification, and inode-change
      link count (# directories containing file)
      ptr 1  --> data block
      ptr 2  --> data block
      ptr 3  --> data block
      .....
      ptr 11 --> indirect block:  ptr --> data block
                                  ptr --> data block
                                  ptr --> data block
                                  ...
      ptr 12 --> indirect block
      ptr 13 --> double indirect block
      ptr 14 --> triple indirect block

  this is just an imbalanced tree (see the lookup sketch below).

--Unix fs: (file, offset) -[inode]-> disk addr

  Why is the inode designed this way in Unix?
  "the fs is a disk data structure":
  -- granularity: a block
  -- reading/writing a block is expensive
  -- sequential access > random access
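To make the "imbalanced tree" concrete, here is a minimal lookup
sketch, not real kernel code: it assumes 10 direct pointers, 4KB
blocks with 4-byte pointers, a single indirect and a single
double-indirect pointer (the second indirect and the triple indirect,
ptrs 12 and 14, are omitted), and a hypothetical read_block() helper.

    #include <stdint.h>

    #define NDIRECT        10      /* # of direct pointers (assumed)     */
    #define PTRS_PER_BLOCK 1024    /* 4KB block / 4-byte pointers        */

    /* hypothetical in-memory inode; permissions, times, and link
       count are omitted */
    struct inode {
        uint32_t direct[NDIRECT];  /* ptrs --> data blocks               */
        uint32_t indirect;         /* ptr  --> indirect block            */
        uint32_t dbl_indirect;     /* ptr  --> double indirect block     */
    };

    /* assumed helper: read one block of pointers off the disk */
    extern void read_block(uint32_t disk_addr, uint32_t *buf);

    /* file mapping: logical block number -> disk addr,
       walking the imbalanced tree */
    uint32_t bmap(const struct inode *ip, uint32_t lb)
    {
        uint32_t ptrs[PTRS_PER_BLOCK];

        if (lb < NDIRECT)                    /* small file: no extra read */
            return ip->direct[lb];

        lb -= NDIRECT;
        if (lb < PTRS_PER_BLOCK) {           /* one extra disk read       */
            read_block(ip->indirect, ptrs);
            return ptrs[lb];
        }

        lb -= PTRS_PER_BLOCK;
        if (lb < PTRS_PER_BLOCK * PTRS_PER_BLOCK) {  /* two extra reads   */
            read_block(ip->dbl_indirect, ptrs);
            read_block(ptrs[lb / PTRS_PER_BLOCK], ptrs);
            return ptrs[lb % PTRS_PER_BLOCK];
        }

        return 0;   /* would continue into the triple indirect (omitted) */
    }

note the imbalance: the first NDIRECT blocks cost no extra disk reads,
while blocks deep in a large file cost one extra read per level of
indirection.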
[skipped]
Q: How does the Unix fs read/write files?

                app
               /   \
              /     \
           mmap    read/write
         ---------------------
              \     /
   [kernel]    \   /
             page cache
                 |
               disk

--Unix-style file mapping worked well in the past, when we had
  single-core CPUs, slow I/O, sequential access > random access,
  block-granularity reads/writes, ...

  ...but do these assumptions still hold today?

Q: assume that memory is persistent; do we still need a fs?
   what would that fs look like?

3. what about now?

A) PMem introduction

   persistent memory (PMem), also called non-volatile memory

   storage hierarchy: [see also handout]

       |  registers          \
       |  L1-L3 caches        \
       |  DRAM                 \
       |  [PMem]                \
       |  SSD/disk               \
       |  network fs (NFS)        \
       (top: fast/small; bottom: slow/large)

   features:
   -- fastest persistent storage
   -- largest byte-addressable storage
   -- a memory DIMM
   -- (potentially) managed by the MMU

   the only implementation---Intel Optane PMem
   it has multiple modes, but the one we care about is called
   "App Direct Mode", where PMem is persistent.

   performance characteristics (compared with DRAM):
   -- latency
   -- throughput
   -- concurrency
   -- access size

B) Rethinking File Mapping for Persistent Memory

   [Neal, Ian, Gefei Zuo, Eric Shiple, Tanvir Ahmed Khan, Youngjin Kwon,
    Simon Peter, and Baris Kasikci. "Rethinking File Mapping for
    Persistent Memory." FAST '21]

   * Q: does file mapping have to be persistent? pros? cons?
     -- on persistent storage
     -- in DRAM
     -- on storage with a cache

   * high-level view:
     reads:  retrieve the file mapping
     writes: update the file mapping + the block allocator

   * FS design parameters:
     - small files (most files are small) vs. large files (much of the
       disk is allocated to large files)
     - sequential vs. random accesses
     - prefetching
     - disk utilization (metadata overhead and fragmentation)

C) design choices

   1. per-file extent tree

      extent: a contiguous allocation;
      causes fragmentation when appending, deleting, or resizing
      binary search to find the extent: O(log N), where N is the
      number of extents

      Q: pros and cons?
      pros: + sequential access
            + disk utilization
            + prefetching
      cons: - large files

   2. per-file radix tree

      much like the page table in virtual memory:
      a tree; at each level, an index within the "addr" (the logical
      block id) chooses the next node

      Q: pros and cons?
      pros: + performance stability: one memory access per level
      cons: - disk utilization (compared with the others)

   3. global cuckoo hash table

      * cuckoo hashing; how it works:
        * two hash functions
        * invariant: you can always find the value in one of the two
          hash locations
        * read: key -[cuckoo]-> location
        * insert: on a conflicting insert, kick an existing key out;
          redo the insertion for that key; repeat

      Q: what do the keys look like?
         "(file_num, logical_block)"

      Q: what will happen on "read(8KB-of-data)"?
      Q: what will happen on "append(4KB-of-data)"?
         an insertion into the hash + allocating a block
         (<= can be expensive)

      Q: pros and cons?
      pros: + disk utilization
      cons: - sequential reads
            - unpredictable and long update latency

   4. global (traditional) hash table: HashFS

      design goal: better insertion
      (what's wrong with cuckoo hashing's insertion? it needs to talk
      to a "block allocator")

      co-design: file mapping + block allocator
      a trick: use the table index as the block offset
      (see the sketch at the end of these notes)

      Q: how does "read(file_num=1, logical_block=21)" work?
      Q: how does "append(file_num=1, logical_block=22)" work?

      Q: pros and cons?
      pros: + fast retrieval (compute a hash function)
            + disk utilization
      cons: - expensive hash collisions
            - prefetching

   [skipped]
   * DISCUSSION: per-file mapping vs. global mapping

     Q: Why not per-file hash mapping? What's wrong with it?
     A: we have no idea of the file size ahead of time
        (how big should each file's table be?)

     Q: Why not global extent trees or radix trees? What's wrong
        with them?
     A: they pull all files into the same namespace (we'd have to
        know the number of files ahead of time)

     per-file:
       + locality
       + concurrency
       - resizing
           => multiple levels of indirection
           => fragmentation

     Q: but what forces per-file designs to resize?
        the real question: can a fs plan ahead of time?

     global:
       + no resizing => no fragmentation
       + mapping size
       - locality (can be mitigated by caching)
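Back to design choice 4: to see why HashFS's insertion no longer
needs a separate block allocator, here is a minimal sketch of the
"index as offset" trick. The linear-probing collision policy, the
hash function, and the "file_num == 0 means free" convention are
illustrative assumptions, not the paper's exact implementation; the
key idea is that slot i of the global table corresponds to physical
block i of the data region.

    #include <stdint.h>

    #define NSLOTS (1u << 20)   /* table size = # of data blocks (assumed) */

    /* one mapping entry; slot index i <=> physical block i */
    struct slot {
        uint32_t file_num;      /* 0 = free slot (assumed convention)      */
        uint32_t logical_block;
    };

    static struct slot slots[NSLOTS];  /* lives in PMem in the real design */

    /* illustrative hash over the (file_num, logical_block) key */
    static uint32_t hash(uint32_t file_num, uint32_t lb)
    {
        uint64_t k = ((uint64_t)file_num << 32) | lb;
        k ^= k >> 33; k *= 0xff51afd7ed558ccdULL; k ^= k >> 33;
        return (uint32_t)(k % NSLOTS);
    }

    /* read(file_num=1, logical_block=21): hash, probe until the key
       matches; the matching slot's INDEX is the disk address, so no
       further lookup is needed */
    int lookup(uint32_t file_num, uint32_t lb, uint32_t *pblock)
    {
        uint32_t i = hash(file_num, lb);
        for (uint32_t n = 0; n < NSLOTS; n++, i = (i + 1) % NSLOTS) {
            if (slots[i].file_num == 0)
                return -1;                   /* hit a hole: not mapped     */
            if (slots[i].file_num == file_num &&
                slots[i].logical_block == lb) {
                *pblock = i;                 /* index as offset            */
                return 0;
            }
        }
        return -1;
    }

    /* append(file_num=1, logical_block=22): claiming a free slot both
       inserts the mapping AND allocates physical block i */
    int append(uint32_t file_num, uint32_t lb, uint32_t *pblock)
    {
        uint32_t i = hash(file_num, lb);
        for (uint32_t n = 0; n < NSLOTS; n++, i = (i + 1) % NSLOTS) {
            if (slots[i].file_num == 0) {
                slots[i].file_num = file_num;
                slots[i].logical_block = lb;
                *pblock = i;
                return 0;
            }
        }
        return -1;                           /* table (= disk) is full     */
    }

this also makes the listed cons visible: a long probe chain makes
collisions expensive, and consecutive logical blocks land at unrelated
slot indices (i.e., unrelated physical blocks), which hurts prefetching.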