Week 11.b CS7670 11/16 2022
https://naizhengtan.github.io/22fall/

1. FUSE basics
2. stackfs implementation
3. FUSE performance

---

1. FUSE basics

* a fundamental trade-off in systems: low-level vs. high-level
  Q: per the 2nd paragraph, which type does lfs fall into?

* FUSE workflow:
  [draw Fig1]

  some examples: [see handout]
   a) open("/tmp/a", flag)
   b) read(fd, buf, 4096)

* FUSE implementation

  Q: what is an "interrupt"? what is a "forget"?
  -- interrupt: issued by the kernel to cancel an in-flight request
     (e.g., the user process was interrupted by a signal)
  -- forget: issued by the kernel when it evicts an inode, so the
     user-space daemon can remove the inode from its cache

* API levels

  Check out: https://www.fsl.cs.stonybrook.edu/docs/fuse/fuse-article-appendices.html

  Q: should lfs use the high-level APIs or the low-level ones?
  -- high-level skips the implementation of the path-to-inode mapping
  -- low-level has "lookup", which translates path to inode
     [see sketch 1 at the end of these notes]

* five queues
  [draw Fig2]
  -- interrupts
  -- forgets
  -- pending
  -- processing
  -- background (what requests will go here?)

  Q: what's the state transition graph of a read op?
     read op -> [background] -> [pending] -> [processing] -> done

  Q: what policy would you choose for the four queues? (page 61)
     WHY four, instead of five?

  Q: when will tasks in the background queue move to pending?
  A: at most 12 async requests at a time (max_background, page 62)

  Q: what will happen if FUSE meets congestion? (page 62)
  DISCUSSION: dropping or not dropping?
              tput and latency graph

* FUSE optimizations
  -- splicing: important for lfs [see handout]
     Linux syscall:
       ssize_t splice(int fd_in, off64_t *off_in,
                      int fd_out, off64_t *off_out,
                      size_t len, unsigned int flags);
     [see sketch 2 at the end of these notes]
  -- multi-threading for the user-space daemon
  -- write-back cache and larger max_write
     [see sketch 3 at the end of these notes]

2. stackfs

  [see FUSE op implementation here:
   https://github.com/sbu-fsl/fuse-stackfs/blob/master/StackFS_LowLevel/StackFS_LowLevel.c]

* inode:
  -- path to the underlying file
  -- inode number
  -- reference counter

* the inode number is the inode's address in memory

* inodes are stored in a hash table
  [see sketch 4 at the end of these notes]

  Q: can you imagine how stackfs works? for a file create?
  [read handout]
  "insert(lo_data, lo_inode)"

3. FUSE performance

  Q: how to understand the statistics? read the 2nd paragraph of S3.2
  -- row: request type
  -- col: time
  -- cell: #requests that happened in the past 2^{N+1}-2^{N+2} ns
  [draw on board]

* hardware: HDD and SSD
  Q: what do you expect the FUSE overhead to be on each? larger? or smaller?

* optimizations:
  (1) write-back cache + batching multiple written pages
  (2) multi-threading
  (3) splice (avoiding memory copies)

* read observations 1--4 and 5--8 in S5.1

  Q: ob2, how come there is an improvement?
  A: readahead of 128KB

  Q: ob3, why did perf become worse for files-rd-1th?
     (1) this is a read
     (2) this is single-threaded
     (3) it reads a page at a time
     WHY does this add overhead?

  Q: ob4, why is create expensive for stackfs?
  A: allocating the inode in the hash table

  Q: ob5, why do seq reads trigger overheads with concurrency (32 threads)
     on HDD but not SSD?
  A: limited by the single-threaded user daemon (for the base config),
     which cannot saturate the bandwidth

  Q: why doesn't the multi-threading optimization work well either?
  A: limited by the 12 background requests

* performance summary

  Let's focus on data, not metadata, and ignore CPU overheads.
  Q: can you summarize the FUSE perf?

* queuing effect:
  -- long latency
  -- tput-wise: can we saturate the bandwidth?
     if we can: do things concurrently
     if we cannot: wait for unfinished requests and cannot start new ones

* look at the last column; bad ones:
  -- rnd-rd-1th-1f (4KB, 32KB)
  -- rnd-rd-32th-1f (4KB)
  -- rnd-wr-1th-1f (4KB)
  -- rnd-wr-32th-1f (4KB)
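---

Code sketches (referenced above)

Sketch 1: a minimal low-level lookup handler. This is a hedged sketch
assuming libfuse 3 (FUSE_USE_VERSION 31); the lfs_lookup name and the tiny
in-memory table are made up for illustration. The point: with the low-level
API the daemon itself must translate (parent inode, name) into an inode and
answer with fuse_reply_entry(); the high-level API would do this for you and
hand your callbacks full paths instead.

    #define FUSE_USE_VERSION 31
    #include <fuse_lowlevel.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/stat.h>

    /* toy in-memory inode table: one regular file "a" under the root */
    struct node {
        fuse_ino_t  ino;
        const char *name;
    };
    static struct node table[] = {
        { .ino = 2, .name = "a" },
    };

    static struct node *find_child(fuse_ino_t parent, const char *name)
    {
        if (parent != FUSE_ROOT_ID)        /* only the root has children here */
            return NULL;
        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
            if (strcmp(table[i].name, name) == 0)
                return &table[i];
        return NULL;
    }

    /* lookup: the path-to-inode step the high-level API would hide from us */
    static void lfs_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
    {
        struct node *n = find_child(parent, name);
        if (!n) {
            fuse_reply_err(req, ENOENT);
            return;
        }

        struct fuse_entry_param e;
        memset(&e, 0, sizeof(e));
        e.ino           = n->ino;          /* the inode number we picked */
        e.attr.st_ino   = n->ino;
        e.attr.st_mode  = S_IFREG | 0644;
        e.attr.st_nlink = 1;
        e.attr_timeout  = 1.0;             /* let the kernel cache the attrs */
        e.entry_timeout = 1.0;             /* ...and the dentry, for 1 second */
        fuse_reply_entry(req, &e);
    }

    /* wired in via: struct fuse_lowlevel_ops ops = { .lookup = lfs_lookup, ... }; */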
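Sketch 2: the splice(2) pattern behind FUSE's zero-copy optimization. This is
a standalone demo, not libfuse internals, and the /tmp paths are made up.
Data moves source fd -> pipe -> destination fd entirely inside the kernel,
without being copied into the process's buffers; this is the same pattern the
daemon can apply to the request payload it reads from /dev/fuse.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int in  = open("/tmp/in",  O_RDONLY);                 /* hypothetical source */
        int out = open("/tmp/out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        int pipefd[2];
        if (in < 0 || out < 0 || pipe(pipefd) < 0) {
            perror("setup");
            return 1;
        }

        for (;;) {
            /* source file -> pipe: no copy into user space */
            ssize_t n = splice(in, NULL, pipefd[1], NULL, 64 * 1024, SPLICE_F_MOVE);
            if (n <= 0)                                       /* 0 = EOF, <0 = error */
                break;

            /* pipe -> destination file, forwarding exactly the bytes just spliced */
            while (n > 0) {
                ssize_t m = splice(pipefd[0], NULL, out, NULL, n, SPLICE_F_MOVE);
                if (m <= 0) { perror("splice"); return 1; }
                n -= m;
            }
        }

        close(in); close(out); close(pipefd[0]); close(pipefd[1]);
        return 0;
    }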
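Sketch 3: asking for the optimizations from the daemon's init callback. A
hedged sketch assuming libfuse 3; the numeric values are illustrative only
(the kernel's default max_background is 12, which is the limit discussed in
the queue section above).

    #define FUSE_USE_VERSION 31
    #include <fuse_lowlevel.h>

    static void lfs_init(void *userdata, struct fuse_conn_info *conn)
    {
        (void)userdata;

        /* write-back cache: let the kernel batch dirty pages before sending WRITEs */
        if (conn->capable & FUSE_CAP_WRITEBACK_CACHE)
            conn->want |= FUSE_CAP_WRITEBACK_CACHE;

        /* splicing: move request payloads through pipes instead of copying */
        if (conn->capable & FUSE_CAP_SPLICE_READ)
            conn->want |= FUSE_CAP_SPLICE_READ;
        if (conn->capable & FUSE_CAP_SPLICE_WRITE)
            conn->want |= FUSE_CAP_SPLICE_WRITE;

        /* larger writes: up to 128KB per WRITE request instead of one page */
        conn->max_write = 128 * 1024;

        /* background-queue knobs behind the "12 async tasks" discussion
           (illustrative values, not the kernel defaults) */
        conn->max_background       = 64;
        conn->congestion_threshold = 48;
    }

    /* registered via: struct fuse_lowlevel_ops ops = { .init = lfs_init, ... }; */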
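Sketch 4: stackfs-style inode bookkeeping. lo_inode, lo_data, and insert()
are names that appear in StackFS_LowLevel.c, but the fields and the hash
function here are simplified stand-ins, not the repo's actual code. It shows
the three pieces listed in section 2 (path to the underlying file, underlying
inode number, reference counter), the hash table they live in, and the trick
of using the struct's memory address as the FUSE inode number.

    #define FUSE_USE_VERSION 31
    #include <fuse_lowlevel.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/stat.h>

    struct lo_inode {
        char    *path;           /* path of the underlying file in the lower FS */
        ino_t    ino;            /* underlying inode number (hash key)          */
        dev_t    dev;            /* underlying device, to disambiguate inodes   */
        uint64_t nlookup;        /* reference counter, decremented by FORGET    */
        struct lo_inode *next;   /* hash-chain link                             */
    };

    #define NBUCKETS 1024
    struct lo_data {
        struct lo_inode *buckets[NBUCKETS];   /* the inode hash table */
    };

    /* the trick: the FUSE inode number is just the struct's in-memory address */
    static fuse_ino_t lo_ino(struct lo_inode *inode)
    {
        return (fuse_ino_t)(uintptr_t)inode;
    }
    static struct lo_inode *lo_inode_ptr(fuse_ino_t ino)
    {
        return (struct lo_inode *)(uintptr_t)ino;   /* used by every other op */
    }

    /* simplified insert(lo_data, lo_inode): hash by underlying inode number */
    static void insert(struct lo_data *lo, struct lo_inode *inode)
    {
        size_t b = inode->ino % NBUCKETS;
        inode->next = lo->buckets[b];
        lo->buckets[b] = inode;
    }

    /* on create/lookup: stat the lower file, allocate and register an lo_inode */
    static struct lo_inode *register_path(struct lo_data *lo, const char *path)
    {
        struct stat st;
        if (stat(path, &st) < 0)
            return NULL;

        struct lo_inode *inode = calloc(1, sizeof(*inode));
        inode->path    = strdup(path);
        inode->ino     = st.st_ino;
        inode->dev     = st.st_dev;
        inode->nlookup = 1;
        insert(lo, inode);
        return inode;       /* reply to the kernel with e.ino = lo_ino(inode) */
    }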