Week 11.a
CS 5600 11/16 2021

On the board
------------
1. Last time
2. SSD
3. Intro to file systems
4. Files

-------------------------------------------

Admin:

--final exam
    --should be the last lecture of this course, but...
      ...[several reasons]...
      ...still confirming with the college.

---------

1. last time

-- I/O
-- disk

2. SSD: solid state drives

[see handout week10.b]

--hardware organization
    --semiconductor-based flash memory
    --stores data electrically, instead of magnetically
    --a flash bank contains blocks
        --blocks (or erase blocks) are of size 128KB or 256KB
    --a block contains pages
        --pages are 4KB to 16KB

--operations
    --read: a page
    --erase: a block, resetting all bits to 1
    --program: a page, setting some bits to 0
      (you cannot program a page twice without erasing)
    --(logical) write: a combination of erase and program operations

    --Question: can you imagine how to update a page A in a
      single-block flash? (which of course is a little too small...)

      [answer:
        1. copy the other pages in the block elsewhere
           (where? anywhere: memory or disk)
        2. erase the entire block
        3. program page A with the wanted contents
        4. copy the other pages back to their positions
      ]

    -- this echoes "writes are more expensive than reads," which
       appears in many places. (probably something deeper about it.)

--performance
    --read: tens of us
    --erase: several ms
    --program: hundreds of us

--a bummer: wear-out
    -- a block can endure about 10,000 to 100,000 erases,
       then becomes unusable

--FTL: flash translation layer
    --read/write logical blocks -->FTL--> read/erase/program physical blocks
      (note: the "blocks" in logical blocks and physical blocks are
      different things: logical blocks as in the device interface,
      physical blocks as in the flash hardware)

    --Question: if you were the FTL, how would you mitigate wear-out?
      [answer: spread the erases/programs evenly across blocks.]

--a log-structured FTL
    --idea:
        --on a write, append the write to the next free page
          (called logging).
        --on a read, keep track of where the data are by mapping
          logical data to physical pages.

    ** an example:
        --Given: a flash bank with three blocks; each has two pages.
        --there are three writes to pages:
            write(logic_page_1)  [short as LP1]
            write(logic_page_10) [short as LP10]
            write(logic_page_99) [short as LP99]
        --what will happen:

                    +-----------------------------+
            blocks  | block 0 | block 1 | block 2 |
                    +---------+---------+---------+
            pages   | P1 | P2 | P3 | P4 | P5 | P6 |
                    +----+----+----+----+----+----+
            data    |LP1 |LP10|LP99|    |    |    |
                    +----+----+----+----+----+----+

            mapping: LP1 => P1, LP10 => P2, LP99 => P3

        Question: what will happen if the following op is
        write(logic_page_1')?

        [answer:
                    +-----------------------------+
            blocks  | block 0 | block 1 | block 2 |
                    +---------+---------+---------+
            pages   | P1 | P2 | P3 | P4 | P5 | P6 |
                    +----+----+----+----+----+----+
            data    |LP1 |LP10|LP99|LP1'|    |    |
                    +----+----+----+----+----+----+

            mapping: LP1 => P4, LP10 => P2, LP99 => P3
        ]

        --Notice: P1 now contains old (invalid) data and is useless.
        --hence, SSDs require *garbage collection* (GC)
            -- want to GC block 0
            -- move the still-useful pages in the block
               (i.e., P2[LP10]) to other free places
            -- erase block 0 (now both P1 and P2 can be used again)

    --complicated internals, hence sometimes unpredictable latency
        --predicting latency? see below
          [Hao, Mingzhe, Levent Toksoz, Nanqinqin Li, Edward Edberg
          Halim, Henry Hoffmann, and Haryadi S. Gunawi. "LinnOS:
          Predictability on unpredictable flash storage with a light
          neural network.", OSDI'20]

3. Intro to file systems

--what does a FS do?
    1. provide persistence (don't go away ... ever)
    2. give a way to "name" a set of bytes on the disk (files)
    3. give a way to map from human-friendly names to "names"
       (directories)

--a few quick notes about disks in the context of FS design
    --disks/SSDs are the first thing we've seen that (a) doesn't go
      away; and (b) we can modify
      (BIOS ROM, hardware configuration, etc. don't go away, but we
      weren't able to modify those things.)
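The logging, mapping, and GC steps above can be sketched in code. This is a toy model, not real SSD firmware: the class name, the list-based free-page pool, and the two-pages-per-block geometry (matching the example) are all made up for illustration.

```python
PAGES_PER_BLOCK = 2
NUM_BLOCKS = 3

class LogFTL:
    """Toy log-structured FTL: append writes to the next free page,
    map logical pages to physical pages, GC a block by relocating
    its still-live pages and then erasing it."""

    def __init__(self):
        self.flash = [None] * (PAGES_PER_BLOCK * NUM_BLOCKS)
        self.mapping = {}                         # logical page -> physical page
        self.free = list(range(len(self.flash)))  # free physical pages, in order

    def write(self, lp, data):
        pp = self.free.pop(0)   # program the next free page (logging)
        self.flash[pp] = data
        self.mapping[lp] = pp   # any older copy of lp is now invalid

    def read(self, lp):
        return self.flash[self.mapping[lp]]

    def gc(self, victim_block):
        start = victim_block * PAGES_PER_BLOCK
        victims = range(start, start + PAGES_PER_BLOCK)
        live = {pp: lp for lp, pp in self.mapping.items()}
        for pp in victims:
            if pp in live:      # still-mapped data: move it elsewhere
                self.write(live[pp], self.flash[pp])
        for pp in victims:      # erase: every page in the block is free again
            self.flash[pp] = None
            self.free.append(pp)
```

Replaying the lecture's example — write LP1, LP10, LP99, then LP1' — fills P1..P4 (indices 0..3) and leaves index 0 holding stale LP1 data; `gc(0)` then relocates LP10 to the next free page and frees both pages of block 0.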
    two implications here:
        (i)  we're going to have to put all of our important state
             on the disk
        (ii) we have to live with what we put on the disk!

             scribble randomly on memory --> reboot and hope it
             doesn't happen again.
             scribble randomly on the disk --> now what?
             (answer: in many cases, we're hosed.)

[ haven't said in class:
    --where are FSes implemented?
        --can implement them on disk, over a network, in memory, in
          NVRAM (non-volatile RAM), on tape, with paper (!!!!)
        -- Lab4 simulates a disk that abstracts an array of blocks
           [see handout, panel 1]
]

4. Files

--what is a file?
    --answer from the user's view: a bunch of named bytes on the disk
    --answer from the FS's view: a collection of disk blocks
    --big job of a FS: map name and offset to disk blocks

                       FS
        {file,offset} ----> disk address

    --operations are create(file), delete(file), read(), write()

    [note that we have seen translation/indirection before:

                          page table
        virtual address ------------> physical address

        per-file metadata:
                  inode
         offset --------> disk block address

        how'd we get the inode?

                     directory
         file name -----------> inode
    ]

    [Analogies:

        per-process memory <-> file
        multiple processes <-> multiple files
        page table         <-> inode
        memory chip        <-> disk/SSD
    ]

FS design parameters:
    * small files (most files are small)
    * large files (much of the disk is allocated to large files)
    * sequential access
    * random access
    * disk utilization (metadata overhead and fragmentation)

access patterns we could imagine supporting:

    (i) Sequential:
        --File data processed in sequential order
        --By far the most common mode
        --Example: editor writes out a new file, compiler reads in
          a file, etc.

    (ii) Random access:
        --Address any block in the file directly, without passing
          through the blocks before it
        --Examples: large data sets, demand paging, databases

    (iii) Keyed access:
        --Search for a block with particular values
        --Examples: associative database, index
        --This is everywhere in databases and search engines, but...
        --...usually not provided by a FS in the OS

Question: if you were a FS designer, which structure would you use
for the different parameters?

    [answer: it depends on what workloads the FS is going to meet]

candidate designs........

    A. contiguous
    B. linked files
    C. indexed files

A. contiguous allocation

    "extent based"
    --when creating a file, make the user pre-specify its length,
      and allocate all the space at once
    --file metadata contains location and size
    --example: IBM OS/360

        [ a1 a2 a3 <5 free> b1 b2 ]

        what if a file c needs 7 sectors?!

    +: simple
    +: fast access, both sequential and random
    -: fragmentation

B. linked files

    "linked list based"
    --keep a linked list of free blocks
    --metadata: pointer to the file's first block
    --each block holds a pointer to the next one

    +: no more fragmentation
    +: sequential access is easy (and probably mostly fast, assuming
       decent free-space management, since the pointers will point
       close by)
    -: random access is a disaster
    -: pointers take up room in blocks
       (overhead: sizeof(pointer) / sizeof(block));
       messes up alignment of data

C. indexed files

    [DRAW PICTURE]

    --each file has an array holding all of its block pointers
        --like a page table, so similar issues crop up
    --allocate this array on file creation
    --allocate blocks on demand (using the free list)

    +: sequential and random access are both easy
    -: need to somehow store the array
        --large possible file size --> lots of unused entries in the
          block array
        --large actual block size --> huge contiguous disk chunk needed
        --solve the problem the same way we did for page tables

    --okay, so now we're not wasting disk blocks, but what's the
      problem?
      (answer: equivalent issues as for page table walking: here,
      it's extra disk accesses to look up the blocks)

    --this motivates the classic Unix file system
      [we will learn the Unix file system in the next class]
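The trade-off between designs B and C can be made concrete with a toy {file, offset} --> disk block lookup. Everything here is hypothetical (the block size, the block numbers, the dict-as-pointer representation); it only illustrates why random access is O(n) chain-walks for linked files but a single array lookup for indexed files.

```python
BLOCK_SIZE = 512  # hypothetical block size for this sketch

def linked_lookup(first_block, next_ptr, offset):
    """Linked files: follow each block's 'next' pointer, one hop per
    block before the target -- every hop is another disk read."""
    b = first_block
    for _ in range(offset // BLOCK_SIZE):
        b = next_ptr[b]
    return b

def indexed_lookup(block_array, offset):
    """Indexed files: one lookup in the file's block-pointer array
    (once the array itself is in memory)."""
    return block_array[offset // BLOCK_SIZE]
```

For a file whose data live in disk blocks 7, 3, 12 (in that order), `next_ptr = {7: 3, 3: 12}` and `block_array = [7, 3, 12]` describe the same file; looking up offset 1024 returns block 12 either way, but the linked version had to visit blocks 7 and 3 first.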