Week 2.b CS7670 09/14 2022
https://naizhengtan.github.io/22fall/

1. Pmem introduction
2. Rethinking file mapping
---

0. Admin
  - ask if encountering challenges in Lab1
  - introduce some of the conferences: FAST, SOSP, OSDI, EuroSys, VLDB, SIGMOD
  - revisit Unix fs: a disk data structure

1. (5min) Pmem introduction

  [this is a brief introduction; we will cover Pmem in later modules]

  Persistent memory (Pmem), also called non-volatile memory (NVM)

  draw the storage hierarchy: [see also handout]

      | registers          \
      | L1-L3 caches         \
      | DRAM                  \
      | [Pmem]                 \
      | SSD/disk                \
      | network fs (NFS)         \

  features:
    -- fastest persistent storage
    -- largest byte-addressable storage
    -- memory DIMM form factor
    -- (potentially) managed by the MMU

  architecture [see handout]

  the only implementation---Intel Optane PMem
    it has multiple modes, but the one we care about is "App Direct Mode",
    where Pmem is persistent.

  nuance of a "persistent write" (Intel Optane):
    1. store + clwb: write back (without evicting) the cache lines
    2. ntstore: non-temporal stores; write directly to memory, bypassing the caches
    [see handout]

  performance characteristics (compared with DRAM):
    -- latency
    -- throughput
    -- concurrency
    -- access size

2. Rethinking File Mapping for Persistent Memory

A) the problem

  * the problem: "file mapping"
      (file, offset) -> disk addr

  * they claim: "70% of the time spent on file mapping"

  Q: in Unix fs, how do we find the mapping "(/tmp/hello.c, 5121)" -> disk addr?
  A: (if for the first time)
     1. find the root inode,
     2. find the "tmp" inode (a directory),
     3. find the "hello.c" inode (a regular file),
     4. fetch the pointer block (pointed to by the indirect pointer),
     5. fetch the data block (pointed to by the first pointer in the pointer block)

  [skipped]
  * Unix fs: (file, offset) -[inode]-> disk addr
    Why is the inode designed the way it is in Unix?
    "fs is a disk data structure":
      -- granularity: block
      -- reading/writing a block is expensive
      -- sequential access > random access

  Q: the paper says, "eliminating memory copies and bypassing the kernel" (first paragraph).
     How do we understand this on Unix fs?

               app
              /   \
             /     \
          mmap    read/write
        ---------------------
            \       /
            [kernel]
             \     /
           page cache
               |
              disk

  * DISCUSSION: does file mapping have to be persistent? pros? cons?
    mainly three ways (S2.2):
      -- on persistent storage
      -- in DRAM
      -- on storage with a cache

  [skipped]
  * JARGON: "shadow paging"
    "Shadow paging is a copy-on-write technique for avoiding in-place updates of pages.
     Instead, when a page is to be modified, a shadow page is allocated. Since the shadow
     page has no references (from other pages on disk), it can be modified liberally,
     without concern for consistency constraints, etc."---wiki
    it solves a different problem: crash consistency

  * high-level view:
      reads:  retrieve the file mapping
      writes: update the file mapping + block allocator

  [skipped]
  * DISCUSSION: the paper says (above S2.1), "PM file systems generally map files at
    block granularity of at least 4KB in order to constrain the amount of metadata
    required to track file system space."
    How do we understand this? And is this still true?

B) challenges and non-challenges

  Challenges:
    -- concurrency (a new problem)
    -- fragmentation
    -- locality
    -- mapping size

  Q: how does Unix fs tackle the latter three challenges?
    -- fragmentation: tree-like inode
    -- locality: caching; inode (metadata + pointers)
    -- mapping size: fine as-is

  * concurrency
    a relatively new problem.
    Q: think of Unix fs; what will happen when multiple concurrent fs ops hit one file?
       case 1: multiple reads?
       case 2: reads and writes to the same location? (a small sketch follows)
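    [a minimal user-level sketch of case 2, not from the paper: two POSIX threads touch
     the same 4KB block of one file; the file name "/tmp/hello.c" and the block size are
     only illustrative, and error handling is omitted. Whether the reader observes all-old
     bytes, all-new bytes, or a mix depends on the file system's internal (per-inode)
     locking---which is exactly what the "correctness spec" question below asks.
     compile with: cc -pthread sketch.c]

      /* sketch: concurrent read and write to the same file block (case 2). */
      #include <fcntl.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      static int fd;

      static void *writer(void *arg) {
          char buf[4096];
          memset(buf, 'B', sizeof(buf));
          pwrite(fd, buf, sizeof(buf), 0);            /* overwrite block 0 with 'B's */
          return NULL;
      }

      static void *reader(void *arg) {
          char buf[4096];
          ssize_t n = pread(fd, buf, sizeof(buf), 0); /* concurrent read of block 0 */
          if (n > 0)
              printf("first byte: %c, last byte: %c\n", buf[0], buf[n - 1]);
          return NULL;
      }

      int main(void) {
          fd = open("/tmp/hello.c", O_RDWR | O_CREAT, 0644);
          char init[4096];
          memset(init, 'A', sizeof(init));
          pwrite(fd, init, sizeof(init), 0);          /* block 0 starts as all 'A's */

          pthread_t t1, t2;
          pthread_create(&t1, NULL, writer, NULL);
          pthread_create(&t2, NULL, reader, NULL);
          pthread_join(t1, NULL);
          pthread_join(t2, NULL);
          close(fd);
          return 0;
      }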
    Q: what about dir operations?
       Example: Linux has "int rename(const char *oldpath, const char *newpath);"
         T1: rename("/tmp/a/b", "/tmp/b");
         T2: rename("/tmp/a", "/tmp/c");
       Think of Unix fs; what may the final results be?

    [skipped]
    Q: what is the "correctness" spec of concurrent fs ops?
       [introduce sequential consistency]

  * fragmentation
    "a file's data is spread across non-contiguous physical locations on a storage device"
    Q: why is fragmentation a problem (an unwanted phenomenon)?
       Sequential reads and writes are still preferred (they will be faster).

  * locality
    "Accesses with locality are typically accelerated by caching prior accesses and
     prefetching adjacent ones."
    Q: is locality really the problem?
       the true problem is where the mapping info is stored: in PMem, DRAM, or the CPU cache?

  * mapping size
    Q: again, is mapping size the true problem?
       if you can use 99% of the space for the mapping, what will happen?
       [toy solution: a gigantic hash table that barely has collisions (an O(1) access);
        see the sketch at the end of these notes]
       assume we have an infinitely large CPU cache; do people really care about the size
       of the metadata?
       [no, we cache the gigantic hash table in the CPU cache; done]

    [my opinion: for "fragmentation", "locality", and "mapping size", the true underlying
     problem is caching, or more specifically: where does the mapping info live in the
     hardware (CPU cache vs. DRAM vs. PMem)?]

  Non-challenges:
    -- page caching (discussed earlier)
    -- crash consistency (an orthogonal problem/challenge)

C) four design choices
   [next time]
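  [appendix to B): a toy sketch of the "gigantic hash table" mapping mentioned under
   "mapping size". This is only my illustration, not the paper's design; the table
   capacity, the hash function, and the assumption that the table stays sparse (and
   cache-resident) and never fills up are all made up for the example. The key is
   (inode number, logical block number) and the value is a physical block number, so
   the earlier lookup (/tmp/hello.c, offset 5121) becomes
   map_lookup(ino, 5121/4096 = 1, &pblk) and is O(1) in expectation.]

      /* sketch: a toy O(1) file-mapping table, keyed by (inode, logical block). */
      #include <stdint.h>

      #define NSLOTS (1u << 20)          /* toy capacity: ~1M mappings */

      struct slot {
          uint64_t ino;                  /* inode number          */
          uint64_t lblk;                 /* logical block in file */
          uint64_t pblk;                 /* physical block addr   */
          int used;
      };

      static struct slot table[NSLOTS];  /* assumed "gigantic": never full */

      static uint64_t hash(uint64_t ino, uint64_t lblk) {
          uint64_t x = ino * 0x9e3779b97f4a7c15ULL ^ lblk;
          return x % NSLOTS;
      }

      /* insert mapping (ino, lblk) -> pblk; linear probing on (rare) collisions */
      static void map_insert(uint64_t ino, uint64_t lblk, uint64_t pblk) {
          uint64_t i = hash(ino, lblk);
          while (table[i].used && !(table[i].ino == ino && table[i].lblk == lblk))
              i = (i + 1) % NSLOTS;
          table[i].ino = ino; table[i].lblk = lblk;
          table[i].pblk = pblk; table[i].used = 1;
      }

      /* lookup: expected O(1) when the table "barely has collisions" */
      static int map_lookup(uint64_t ino, uint64_t lblk, uint64_t *pblk) {
          uint64_t i = hash(ino, lblk);
          while (table[i].used) {
              if (table[i].ino == ino && table[i].lblk == lblk) {
                  *pblk = table[i].pblk;
                  return 1;
              }
              i = (i + 1) % NSLOTS;
          }
          return 0;                      /* hole or unmapped block */
      }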