Week 12.b CS 5600 03/29 2023

0. Last time
1. Disks
2. SSDs
3. Intro to file systems
----------------------------------

Admin:
-- Lab4

0. Last time

- CPU-I/O interactions, four approaches:
    * port-mapped I/O
    * memory-mapped I/O
    * interrupts
    * via memory
  - that is, {polling, interrupts} X {programmed I/O, DMA}
- Drivers: pieces of code that talk to devices.
    Q: can I use a GPU driver from NVIDIA for AMD's GPU?
    Q: can I use NVIDIA's Windows driver for Linux?

1. Disks

Disks have historically been *the* bottleneck in many systems
- This becomes less and less true every year:
    - SSDs (solid state drives) are now common; we will see them in a bit
    - PM (persistent memory) or NVRAM (non-volatile RAM) is now available

[Reference: "An Introduction to Disk Drive Modeling", by Chris
Ruemmler and John Wilkes. IEEE Computer, Vol. 27, No. 3, 1994,
pp. 17-28.]

A. What is a disk? [see handout]

--stack of magnetic platters
    --rotate together on a central spindle at 3,600-15,000 RPM
--arms rotate around a pivot, and all move together
    --arms carry the disk heads, one for each recording surface
    --heads read and write data on the platters

[Interlude: why are we studying this? Disks are still widely used
everywhere, and will be for some time. They are very cheap, a great
medium for backup, and better than SSDs for durability (SSDs have a
limited number of write cycles, and decay over time). Google,
Facebook, etc. historically packed their data centers full of cheap
disks. A second reason is technical literacy: many file systems were
designed with the disk in mind (sequential access has significantly
higher throughput than random access). You have to know how these
things work as a computer scientist and as a programmer.]

B. Geometry of a disk [see handout]

--track: a circle on a platter. each platter is divided into
  concentric tracks.
--sector: a chunk of a track
--cylinder: the locus of all tracks of a fixed radius, on all platters
--heads are roughly lined up on a cylinder

Question: how many heads do you think could work at the same time?
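As an aside, the geometry vocabulary above is enough for a
back-of-the-envelope capacity calculation. A hypothetical sketch in C
(`capacity_bytes` and all the numbers are invented here, chosen only
to match the ballpark figures in section C below):

```c
#include <stdint.h>

/* Back-of-the-envelope capacity from disk geometry. All parameters
   are hypothetical: real drives hide their true geometry behind the
   logical-sector interface (see section D). */
uint64_t capacity_bytes(uint64_t cylinders,
                        uint64_t heads,            /* = recording surfaces */
                        uint64_t sectors_per_track,
                        uint64_t bytes_per_sector)
{
    /* a cylinder crosses every surface once, i.e. one track per
       surface per cylinder */
    return cylinders * heads * sectors_per_track * bytes_per_sector;
}

/* e.g. 200K cylinders, 8 platters x 2 surfaces, ~1000 sectors/track,
   4KB sectors (the ballpark figures from section C):
   capacity_bytes(200000, 16, 1000, 4096) == 13107200000000,
   i.e. ~13.1 TB, consistent with "several TBs" */
```

Real drives deviate from this simple model: zoning puts more sectors
on outer tracks, so sectors-per-track is not actually constant
(section D).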
--Generally only one head is active at a time

--disk positioning system
    --moves the head to a specific track and keeps it there
    --a *seek* consists of up to four phases:
        --speedup: accelerate the arm to max speed or to the halfway point
        --coast: at max speed (for long seeks)
        --slowdown: stop the arm near the destination
        --settle: adjust the head onto the actual desired track
    [BTW, this thing can accelerate at up to several hundred g]

--Question: which have better performance, reads or writes? why?
    [answer: reads. [updated 04/24: was "writes", a typo]
     Here are the reasons:
        --settle takes longer for writes than for reads. why?
        --because if a read strays, the error will be caught, and the
          disk can retry
        --if a write strays, some other track just got clobbered. so
          write settles need to be done precisely]

C. Common numbers and performance

--capacity: several TBs (as high as 20 TB)
--platters: 8
--number of cylinders: hundreds of thousands
--sectors per track: ~1000
--bytes per sector: 4096 (was 512)
--RPM: ~10,000
--transfer rate: 50-200 MB/s (ask this as a question)
--mean time between failures: ~1 million hours
    (for disks in data centers, it's vastly less. for a provider like
    Google, even with very reliable disks they would still need an
    automated way to handle failures, because failures would be
    common (imagine 10 million disks: *some* will be on the fritz at
    any given moment). so what they do is buy cheap, even defective,
    disks, which saves on hardware costs. they get away with it
    because they needed software and systems -- replication and other
    fault-tolerance schemes -- to handle failures anyway.)
    [a recent report gives 25,233 hours; it also says "older hard
    drives are more durable and resilient than newer ones."
    https://www.securedatarecovery.com/blog/how-long-do-hard-drives-last]

D. How the driver interfaces to the disk

--Sectors [see handout's bootloader code]
    --the disk interface presents a linear array of **sectors**
        --traditionally 512 bytes (moving to 4KB)
    --the disk maps logical sector #s to physical sectors
        --zoning: puts more sectors on longer tracks
        --track skewing: sector 0's position varies by track, but the
          disk worries about that. Why? (for speed when doing
          sequential access)
        --sparing: flawed sectors are remapped elsewhere
    --all of this is invisible to the OS. stated more precisely, the
      OS does not know the logical-to-physical sector mapping.
        --in the old days (before 1990ish): the OS specified a
          platter, track, and sector (CHS, cylinder-head-sector); but
          who knows where it really was?
        --nowadays, the OS sees a disk as an array of sectors (LBA,
          logical block addressing); normally each sector is 512B.

--Question: how many bits do we need to address a 1TB disk?
    (note: we will simplify here, assuming 1TB = 2^40 B; in reality,
    in the context of storage, 1TB = 1,000,000,000,000 B, or 1
    trillion bytes)
    [answer: 1 sector is 512B = 2^9 B; the entire disk has
     1TB / 512B = 2^40 / 2^9 = 2^31 sectors; to address each sector,
     we need at least 31 bits.
     In fact: "The current 48-bit LBA scheme was introduced in 2003
     with the ATA-6 standard, raising the addressing limit to
     2^48 x 512 bytes, which is exactly 128 PiB or approximately
     144 PB." (from wiki:
     https://en.wikipedia.org/wiki/Logical_block_addressing)]

E. Disk scheduling: not covering in class. [used to be an important topic]

F. Technology and systems trends

--unfortunately, while seeks and rotational delay are getting a
  little faster, they have not kept up with the huge growth elsewhere
  in computers
--transfer bandwidth has grown about 10x per decade
--the thing that is growing fast is disk density (bytes_stored/$);
  that's because density is less subject to mechanical limitations
    --to improve density, you need to get the head close to the
      surface
    --[aside: what happens if the head contacts the surface?
      it's called a "head crash": the head scrapes off the magnetic
      material ... and, with it, the data.]

--Disk accesses are a huge system bottleneck, and it's getting worse.
  So what to do?
    --the bandwidth increase lets the system (pre-)fetch large chunks
      for about the same cost as a small chunk.
    --so trade latency for bandwidth, if you can get lots of related
      stuff at roughly the same time. How to do that?
    --by clustering the related stuff together on the disk. then you
      can grab huge chunks of data without incurring a big cost,
      since you already paid for the seek + rotation.
--The saving grace for big systems is that memory size is increasing
  faster than typical workload size
    --result: more and more of the workload fits in the file cache,
      which in turn means that the profile of traffic to the disk has
      changed: it is now mostly writes and new data.
    --which means logging and journaling become viable (more on this
      over the next few classes)

2. SSDs: solid state drives [see handout]

--hardware organization
    --semiconductor-based flash memory
    --stores data electrically, instead of magnetically
    --a flash bank contains blocks
        --blocks (or erase blocks) are 128KB or 256KB
    --a block contains pages
        --pages are 4KB to 16KB
--operations
    --read: a page
    --erase: a block, resetting all bits to 1
    --program: a page, setting some bits to 0 (you cannot program a
      page twice without erasing)
    --(logical) write: a combination of erase and program operations
--Question: can you imagine how to update a page A in a single-block
  flash? (which of course is a little too small...)
    [answer:
     1. copy the other pages in the block to other places (where?
        anywhere: memory or disk)
     2. erase the entire block
     3. program page A with the wanted contents
     4. copy the other pages back to their positions]
--a bummer: wear-out -- a block can bear about 10,000 to 100,000
  erase cycles, then it becomes unusable

[will start from here next time]
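The four-step answer above can be sketched as a tiny in-memory
simulation in C. Everything here is hypothetical (made-up names,
scaled-down sizes); the point is only the erase/program discipline:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, scaled down so the sketch stays readable;
   real erase blocks are 128KB-256KB and real pages 4KB-16KB. */
#define PAGES_PER_BLOCK 4
#define PAGE_SIZE       16

typedef struct {
    uint8_t pages[PAGES_PER_BLOCK][PAGE_SIZE];
} flash_block;

/* erase: works on a whole block, and is the only way to bring
   bits back to 1 */
void flash_erase(flash_block *b) {
    memset(b->pages, 0xFF, sizeof b->pages);
}

/* program: can only clear bits (1 -> 0), so we require the page
   to be fully erased first */
void flash_program(flash_block *b, int page, const uint8_t *data) {
    for (int i = 0; i < PAGE_SIZE; i++)
        assert(b->pages[page][i] == 0xFF);
    memcpy(b->pages[page], data, PAGE_SIZE);
}

/* logical write of one page = the four steps in the answer above */
void flash_update_page(flash_block *b, int page, const uint8_t *data) {
    uint8_t saved[PAGES_PER_BLOCK][PAGE_SIZE];
    memcpy(saved, b->pages, sizeof saved);     /* 1. copy pages out   */
    flash_erase(b);                            /* 2. erase the block  */
    flash_program(b, page, data);              /* 3. program page A   */
    for (int p = 0; p < PAGES_PER_BLOCK; p++)  /* 4. copy others back */
        if (p != page)
            flash_program(b, p, saved[p]);
}
```

Note how expensive one logical write is: it costs a whole-block
erase, which is exactly what the wear-out bullet above is counting.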