Week 13.a
CS 5600
04/11 2022

On the board
------------

1. Last time
2. Files
3. Directories
4. fs5600 (Lab4)
   A. fs5600 disk
   B. data structures
   C. interfaces

-------------------------------------------

Admin:

-Lab4 is released

---------

1. Last time

   --SSD
     some unique features:
     * read/erase/program
     * cannot program twice without erasing
     * wear-out

   --fs, three goals
     (i) persistance
     (ii) name data => file
     (iii) locate data => directory
       [do we really need (iii)? a bold question]


2. Files

    --what is a file?
        --answer from user's view: a bunch of named bytes on the disk
        --answer from FS's view: collection of disk blocks

    --big job of a FS: map name and offset to disk blocks

                                 FS
                   {file,offset} --> disk address

        [also called file mapping.
         see an introduction in S2 of this paper:
           (which btw is a cool paper about persistent memory)
          Neal, Ian, Gefei Zuo, Eric Shiple, Tanvir Ahmed Khan, Youngjin Kwon, Simon
          Peter, and Baris Kasikci. "Rethinking File Mapping for Persistent Memory.", FAST'21
        ]
        --operations are create(file), delete(file), read(), write()

    **note that we have seen translation/indirection before:

        page table:

                            page table
            virtual address ----------> physical address


        per-file metadata:

                   inode
            offset ------>  disk block address


        how'd we get the inode?

                       directory
            file name ----------> inode
    

    [Analogies:
       per-process memory <-> File
       multiple processes <-> Multiple files
       page table         <-> inode
       memory chip        <-> disk/SSD
    ]

    FS design parameters:

      * small files (most files are small)
          vs.
        large files (much of the disk is allocated to large files)

      * access patterns:
          sequential access vs.
          random accesses vs. 
          keyed accesses

      * disk utilization (metadata overhead and fragmentation)

    access patterns we could imagine supporting:

    (i) Sequential:
        --File data processed in sequential order
        --By far the most common mode
        --Example: editor writes out new file, compiler reads in file, etc

    (ii) Random access:
        --Address any block in file directly without linear scanning
        --Examples: large data set, demand paging, databases

    (iii) Keyed access
        --Search for block with particular values
        --Examples: associative database, index
        --This thing is everywhere in the field of databases,
        search engines, but....
        --...usually not provided by a FS in OS

    Question: if you were a fs designer, which structure will you use for
    different parameters?
      [answer: depends on what workloads the fs going to meet]

    candidate designs........

        A. contiguous
        B. linked files
        C. indexed files

    A. contiguous allocation

      "extent based"
      --when creating a file, make user pre-specify its length, and
      allocate the space at once
      --file metadata contains location and size

      --example: IBM OS/360

        [<free> a1 a2 a3 <5 free> b1 b2 <free> ]

        what if a file c needs 7 sectors?!

      +: simple
      +: fast access, both sequential and random
      -: fragmentation

       [somewhat regain interests due to PM:
        Li, Ruibin, Xiang Ren, Xu Zhao, Siwei He, Michael Stumm, and Ding
        Yuan. "ctFS: Replacing file indexing with hardware memory translation
        through contiguous file allocation for persistent memory." FAST'22]

    B. linked files

        "linked list based"
        --keep a linked list of free blocks
        --metadata: pointer to file's first block
        --each block holds pointer to next one

      +: no more fragmentation
      +: sequential access easy (and probably mostly fast, assuming
         decent free space management, since the pointers will point
         close by)
      -: random access is a disaster
      -: pointers take up room in blocks;
         (overhead: sizeof(pointer) / sizeof(block))
         messes up alignment of data

    C. indexed files

        [DRAW PICTURE]

        --Each file has an array holding all of its block pointers
        --like a page table, so similar issues crop up

        --Allocate this array on file creation

        --Allocate blocks on demand (using free list or bitmap)

        +: sequential and random access are both easy
        -: need to somehow store the array

        --large possible file size --> lots of unused entries in the
        block array

        --large actual block size --> huge contiguous disk chunk
        needed

        --solve the problem the same way we did for page tables.

        --okay, so now we're not wasting disk blocks, but what's the
        problem? (answer: equivalent issues as for page table
        walking: here, it's extra disk accesses to look up the blocks)

        --this motivates the classic Unix file system 

    --Unix inode:

        [draw on board]

        permisssions
        times for file access, file modification, and inode-change
        link count (# directories containing file)
        ptr 1  --> data block
        ptr 2  --> data block
        ptr 3  --> data block
        .....
        ptr 11  --> indirect block 
                      ptr --> 
                      ptr --> 
                      ptr --> 
                      ptr -->
                      ptr -->
        ptr 12 --> indirect block
        ptr 13 --> double indirect block
        ptr 14 --> triple indirect block

    This is just a tree.

    [deja vu? 
    that's right, we talked similar things when we argue for
    different designs of page tables.]

    Question: why is this tree intentionally imbalanced?

        (Answer: optimize for short files. each level of this tree
        requires a disk seek...)

    Pluses/minuses:

    +: Simple, easy to build, fast access to small files

    +: Maximum file length can be enormous, with
       multiple levels of indirection 

    -: worst case # of accesses pretty bad

    -: worst case overhead (such as 11 block file) pretty bad

    -: Because you allocate blocks by taking them off unordered
       freelist, metadata and data get strewn across disk


  * Notes about inodes:

    --stored in a fixed-size array

    --Size of array fixed when disk is initialized; can't be changed
      [why? easier for fs to find inodes,
            and fewer disk accesses (better performance!)]

    --Multiple inodes in a disk block
      (which is not the case for your Lab4)

    --Question: how many files can the following toy-fs has?
        sizeof(inode) = 128B
        sizeof(block) = 512B
        toy-fs uses 1000 blocks to store inodes

       [answer:
         the number of inodes (hence files) is
           (521B / 128B) * 1000 = 4000
       ]

       -- use "$ df -i ~" to see how many inodes you can use.

    --inode lives in known location, originally at one side of disk,
    now lives in pieces across disk (helps keep metadata close
    to data)

    --The index of an inode in the inode array is called an
    ***i-number***

    --Internally, the OS refers to files by i-number

    --When a file is opened, the inode brought in memory

    --Written back when modified and file closed or time elapses


3. Directories

    --Problem: "Spend all day generating data, come back the next
    morning, want to use it."  F. Corbato, on why files/dirs
    invented.

    [skipped, dir history

      --Approach 0: Have users remember where on disk their files are

      --like remembering your social security or bank account #

      --yuck. (people want human-friendly names.)

      --So use directories to map names to file blocks, somehow

      --But what is in directory?

      --A short history of directories

      --Approach 1: Single directory for entire system

          --Put directory at known location on disk

          --Directory contains <name,inumber> pairs

          --If one user uses a name, no one else can

          --"Many ancient personal computers work this way"
          [I heard this as an anecdote; never saw one myself...]

      --Approach 2: Single directory for each user

          --Still clumsy, and "ls" on 10,000 files is a real pain
          --(But some oldtimers still work this way)

      --Approach 3: Hierarchical name spaces. 

          --Allow directory to map names to files ***or other dirs***

          --File system forms a tree (or graph, if links allowed)

          --Large name spaces tend to be hierarchical

          --examples: IP addresses, domain names, scoping in
          programming languages, etc.

          --more generally, the concept of hierarchy is everywhere
          in computer systems
    ]

    --Hierarchial Unix

      --used since CTSS (1960s), and Unix picked it up and used it
      nicely

      --structure like:
        [draw: 
                      "/"
           bin/     dev/     tmp/    usr/
         ls, grep    ...
        ]

    --directories stored on disk just like regular files

        --here's the data in a directory file; this data can be in the
        *data blocks* of the directory or else in the inode of the
        directory.

          [<name, inode#>]
           <bin, 1021>
           <dev, 1001>
           <sbin, 2011>
           ....

        --i-node for directory contains a special flag bit

        --only special users can write directory files

    --key point: i-number might reference another directory

        --this neatly turns the FS into a hierarchical tree, with
        almost no work

    --bootstrapping: where do you start looking?

        --root dir always inode #2 (0 and 1 reserved)

        --and, voila, we have a namespace!

    --special names: "/", ".", ".."

    --given those names, we need only two operations to navigate the
    entire name space:

        --"cd name": (change context to directory "name")
        --"ls": (list all names in current directory)

    --example:

                /a/foo.c
                /b/c/essay.txt

            what does the file system look like?

            [ i0 ... i7 || [block ] [ block ] [block ] .....]

        [Draw picture]

      [an argument that hierarchical fs is no longer relevant:
        Margo Seltzer, Nicholas Murphy. Hierarchical File Systems are Dead. HotOS'09

       but...we're still using them in 2022...
      ]

    * links:

        --hard link: multiple dir entries point to same inode; inode
        contains refcount

        "$ ln /tmp/a /tmp/b": creates a synonym ("b") for file ("a")

          --how do we avoid cycles in the graph? (answer: can't
          hard link to directories).

        --soft link: synonym for a *name*

        "$ln -s /tmp/a /tmp/sb":

          --creates a new inode (sb), not just a new directory entry

          --new inode has "sym link" bit set

          --contents of that new file:

              "/tmp/a"

        --Question: what will happen:
          "$ rm /tmp/a; cat /tmp/b; cat /tmp/sb"?

          [answer: try it yourself]

        --Question: can I create soft-link cycles?
          [answer: yes you can. Try:
             $ mkdir /tmp/a; mkdir /tmp/b
             $ ln -s /tmp/a /tmp/b/a
             $ ln -s /tmp/b /tmp/a/b
          ]


4. CS5600 File System (Lab4)

  [this is an (almost) duplication of the Lab4 instructions.]

  - a Unix-like fs
    -- with many simplifications
    -- for example, dirs are no deeper than 10

  A. fs5600 disk 

    - an abstract disk
      (the overall system format)

      -- a block is 4KB

     +-------+--------+----------+------------------------+
     | super | block  | root dir |     data blocks ...    |
     | block | bitmap |   inode  |                        |
     +-------+--------+----------+------------------------+
    block     0        1         2          3 ...

    - talk to the disk?
      -- read or write one or multiple blocks

      [see handout week12.b, panel 1]

  B. important data structures

    ** superblock
      [see handout week12.b, panel 2]

    ** block bitmap
      -- tells fs5600 which blocks are free
      -- one bit represents one block

      --QUESTION: what's the maximum size of a fs5600?
      [answer: 128MB = 4096 * 8 * 4KB]

    ** file and inode

      [see handout week12.b, panel 2]

      -- metadata
        [or stat, see full file metadata: https://pubs.opengroup.org/onlinepubs/7908799/xsh/sysstat.h.html]

        -- uid, gid: user and group of this file
        -- mode (see below)
        -- ctime: changed time, time of last status (metadata) change
        -- mtime: modified time, time of last data modification
        (fs5600 skips the access time in Unix)
        -- size: file size in bytes

      -- ptrs

        [draw on board]

      --Question: what's the maximum size of a file?
      [answer:
        number of ptrs = (4KB - 20B) / 4B = 1019
        file size = 1019 * 4KB = 4076KB ~= 4MB
      ]

    ** mode (uint32_t)
     [see handout]

      |<-- S_IFMT --->|           |<-- user ->|<- group ->|<- world ->|
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
      | F | D |   |   |   |   |   | R | W | X | R | W | X | R | W | X |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

      -- R: read
      -- W: write
      -- X: execute

      -- F bit: 0100000 (the 16th bit from right)
      -- D bit: 0040000 (the 15th bit from right)

   ** directory

     -- recall that directory is used to map name to #inode

     [see handout week12.b, panel 2]
     [draw on board as well]

     --Question: how many files can appear in one directory?

     [answer:
      one fs_dirent is 32B (see handout week12.b, panel 2)
      A file is maximum of 4076KB.
      So there can be 127 * 1024 files,
        or roughly 128K (~= 4MB/32B)
     ]

       [update 04/21:
         the above answer is confusing when you come back from Lab4. It meant
         to be a question asking if a dir inode could use all data blocks (1019
         of them), then how many files a dir can contain. But the true fs5600
         only allows one data block for a dir inode, so the max files is
         4KB/32B=128 (said below).
       ]

     -- fs5600 only supports using 1 data block,
        meaning 128 files at maximum.

  C. interface

    [next time]