Week 10.a
CS3650
03/11 2024
https://naizhengtan.github.io/24spring/

1. I/O architecture and storage devices
2. Intro to file systems
3. Files
----

1.  I/O architecture and storage devices

    general:
    [show slides: CPU, Mem, I/O, connected to BUS]

    --devices: lots of details
    --CPU/device interaction

    --storage devices
      --Disk
      --SSD

    --brief introduction to Disk
    [show slides]

    --Question: sequential accesses to disk is much faster than
      random accesses. Why?

    --Question: for disk, comparing reads or writes, what relative performance
      do you think? why?
      [answer: reads are faster.]

    --Question: what will happen if you shake your disk while using it?
      [answer: NO, BAD IDEA. DON'T DO IT.
      see:
        https://www.youtube.com/watch?v=tDacjrSCeq4&ab_channel=BryanCantrill
      ]

    --intro to SSD
    [show slides]

    --a bummer: wear-out
      --a block can bare a limited number of times (~10K-100K) times writes
      (erasing), then becomes unusable

    --Question: if there is a workload constantly modifying one file,
      how can software system help with the case?
      [answer: spread out the writes evenly to all blocks]

    --fancy new storage devices:
      ** low latency?
         persistent memory
      ** long-term data storage?
         project Silica
         https://www.microsoft.com/en-us/research/project/project-silica/

2. Intro to file systems

   Question: in you eyes, what does fs do?

    --what does a FS do?

      1. provide persistence (don't go away ... ever)

      2. give a way to "name" a sequence of bytes on the disk (files)

      3. give a way to map from human-friendly-names to "named bytes" (directories)

    --a few quick notes about disks in the context of FS design

    --disk/SSD are the first thing we've seen that (a) doesn't go away;
    and (b) we can modify (BIOS ROM, hardware configuration, etc.
    don't go away, but we weren't able to modify these things). two
    implications here:

        (i) we're going to have to put all of our important state on
        the disk

        (ii) we have to live with what we put on the disk! scribble
        randomly on memory --> reboot and hope it doesn't happen
        again. scribbe randomly on the disk --> now what? (answer:
        in many cases, we're hosed.)

    --where are FSes implemented?

      --can implement them on disk, over network, in memory, in NVRAM
      (non-volatile RAM), on tape, with paper (!!!!)

      -- Lab4 simulates a disk that abstracts an array of blocks

3. Files

    --what is a file?
        --answer from user's view: a bunch of named bytes on the disk
        --answer from FS's view: collection of disk blocks

    --operations regarding files are create(file), delete(file), read(), write()

    --big job of a FS: map name and offset to disk blocks

                                 FS
                   {file,offset} --> disk address

      This is also called file mapping.

    -- note that this is a translation

        per-file:

                   SOMETHING
            offset --------->  disk block address

            In fs, this SOMETHING is called inode.

        Then, how'd we get the inode?

                       directory
            file name ------------> inode


   Question: Given an example of a file mapping.
        if you were a fs designer, which structure will you use for file mapping?
        [answer: it depends on...parameters]

   - File design parameters:

      * small files (most files are small)
          vs.
        large files (much of the disk is allocated to large files)

      * disk utilization (metadata overhead and fragmentation)

      * access patterns:
          sequential access vs. random accesses

        (i) Sequential:
            --File data processed in sequential order
            --By far the most common mode
            --Example: editor writes out new file, compiler reads in file, etc

        (ii) Random access:
            --Address any block in file directly without linear scanning
            --Examples: large data set, demand paging, databases

   Question: which structure will you use for different parameters?

   - candidate file mapping designs

        A. contiguous
        B. linked files
        C. indexed files

    A. contiguous allocation

      "extent based"
      --when creating a file, make user pre-specify its length, and
      allocate the space at once
      --file metadata contains location and size

      --example: IBM OS/360

        [<free> a1 a2 a3 <5 free> b1 b2 <free> ]

        what if a file c needs 7 sectors?!

      +: simple
      +: fast access, both sequential and random
      -: fragmentation

       [somewhat regain interests due to PM:
        Li, Ruibin, Xiang Ren, Xu Zhao, Siwei He, Michael Stumm, and Ding
        Yuan. "ctFS: Replacing file indexing with hardware memory translation
        through contiguous file allocation for persistent memory." FAST'22]

    B. linked files

        "linked list based"
        --keep a linked list of free blocks
        --metadata: pointer to file's first block
        --each block holds pointer to next one

      +: no more fragmentation
      +: sequential access easy (and probably mostly fast, assuming
         decent free space management, since the pointers will point
         close by)
      -: random access is a disaster
      -: pointers take up room in blocks;
         (overhead: sizeof(pointer) / sizeof(block))
         messes up alignment of data

    C. indexed files

        [DRAW PICTURE]

        --Each file has an array holding all of its block pointers

        --Allocate this array on file creation

        --Allocate blocks on demand (using free list or bitmap)

        +: sequential and random access are both easy
        -: need to somehow store the array

        --large possible file size --> lots of unused entries in the
        block array

        --large actual block size --> huge contiguous disk chunk
        needed

        --this motivates the classic Unix file system 
        [continue from here next time]