Week 12.a
CS3650
03/25 2024
https://naizhengtan.github.io/24spring/

1. journaling
2. intro to networking
3. packet switching
------

Admin
- lab3, cheating detection
- collapsing assignment systems
- midterm regrade
- lab4

1. Journaling

      -- Copy on write showed that crash consistency is achievable when
      updates **do not** overwrite (or destroy) the current copy.

      Golden rule of atomicity, per Saltzer-Kaashoek:
      "never modify the only copy"

      -- Problem is that copy-on-write carries significant write and space overheads.
      Want to do better without violating the golden rule of atomicity.

      -- Going to do so by borrowing ideas from how transactions are implemented in databases.

      -- Core idea: Treat file system operations as transactions. Concretely, this means that
         after a crash, failure recovery ensures that:
          * Committed file system operations are reflected in on-disk data structures.
          * Uncommitted file system operations are not visible after crash recovery.

      -- Core mechanism: Record enough information to finish applying committed operations 
         (*redo operations*) and/or roll-back uncommitted operations (*undo operations*). 
         This information is stored in a redo log or undo log. We discuss redo logging in detail next.

  --concept: commit point---the point at which there's no turning back.

      --actions always look like this:
      --first step
      ....            [can back out, leaving no trace]
      --commit point
      .....           [completion is inevitable]
      --last step

      --Question: what's commit point when buying a house?

      --Question: what's the commit point in the copy-on-write
        protocol above?

        [answer: the uberblock is updated.]

      -- Redo logging
          * Used by Ext3 and Ext4 on Linux, going to discuss in that context.

          * Log is a fixed-length ring buffer placed at the beginning of the disk
            (see handout).

          * Basic operations

              Step 1: planning
              the file system computes what would change due to an operation. For instance,
              creating a new file involves changes to directory inodes, while appending to a file
              involves changes to the file's inode and data blocks.

              Step 2: begin txn
              the file system computes where in the log it can write this transaction,
              and writes a transaction begin record there (TxnBegin in the handout). This 
              record contains a transaction ID, which needs to be unique. The file system 
              **does not** need to wait for this write to finish and can immediately proceed to
              the next step.

              Step 3: journal write
              the file system writes a record or records detailing all the changes it computed in 
              step 1 to the log. The file system **must** now wait for these log changes and
              the TxnBegin record (step 2) to finish being written to disk.

              Step 4: commit txn
              once the TxnBegin record, and all the log records from step 3 have been
              written, the system writes a transaction end record (TxnEnd in the handout). 
              This record contains the same transaction ID as was written in Step 2, and the 
              transaction is considered committed once the TxnEnd record has been successfully written to disk.

              Step 5: checkpointing
              Once the TxnEnd record has been written, the filesystem asynchronously
              performs the actual file system changes; this process is called **checkpointing**. 
              While the system is free to perform checkpointing whenever it is convenient, 
              the checkpoint rate dictates the size of the log that the system must reserve.
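
              The five steps above can be sketched on a simulated, in-memory disk.
              This is only an illustration, not how Ext3/Ext4 is implemented: the
              names Disk, log_transaction, and checkpoint, and the tuple record
              format, are hypothetical; a Python list stands in for the ring buffer;
              and the "wait until durable" points are marked in comments rather than
              enforced.

```python
# Sketch of the redo-logging write path on a simulated disk.
# Disk, log_transaction, checkpoint, and the record tuples are hypothetical.

class Disk:
    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks  # home locations of file system blocks
        self.log = []                  # the journal (a list stands in for the ring buffer)

def log_transaction(disk, txn_id, changes):
    """changes maps block number -> new contents (the output of step 1)."""
    # Step 2: begin txn. No need to wait for this write before continuing.
    disk.log.append(("TxnBegin", txn_id))
    # Step 3: journal write. A real file system must wait here until
    # TxnBegin and every change record are on disk.
    for blockno, data in changes.items():
        disk.log.append(("Write", txn_id, blockno, data))
    # Step 4: commit txn. Once TxnEnd is on disk, the transaction is committed.
    disk.log.append(("TxnEnd", txn_id))

def checkpoint(disk, txn_id):
    # Step 5: apply the logged changes to their home locations.
    for rec in disk.log:
        if rec[0] == "Write" and rec[1] == txn_id:
            _, _, blockno, data = rec
            disk.blocks[blockno] = data

disk = Disk(8)
log_transaction(disk, txn_id=1, changes={2: b"inode", 5: b"data"})
before_checkpoint = disk.blocks[5]   # still the old (empty) contents
checkpoint(disk, 1)
after_checkpoint = disk.blocks[5]    # now the new contents
```

              Note that between step 4 and step 5 the transaction is committed even
              though the home blocks still hold old contents; the log is the only
              place the new data lives until checkpointing runs.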

          --Question: which step is the commit point?
              [answer: step 4; why? see recovery below]

          * Crash recovery: During crash recovery, the filesystem needs to read through the logs,
            determine the set of **committed** operations, and then apply them. Observe that:
            -- The filesystem can determine whether a transaction is committed or not by comparing 
               transaction IDs in TxnBegin and TxnEnd records.
            -- It is safe to apply the same redo log multiple times. 

            Operationally, when the system is recovering from a crash, the system 
            does the following:

              Step 1: The file system starts scanning from the beginning of the log. 
              Step 2: Every time it finds a TxnBegin entry, it searches for a 
                  corresponding TxnEnd entry.
              Step 3: If matching TxnBegin and TxnEnd entries are found -- indicating that
                  the transaction is committed -- the file system applies (checkpoints) the
                  changes.
              Step 4: Recovery is completed once the entire log is scanned.

              Note, for redo logs, file systems generally begin scanning the log from the
              **start of the log**.
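
              The recovery steps above can be sketched as follows. This is a minimal
              illustration under assumptions: the log is a Python list, and the record
              tuples ("TxnBegin", id), ("Write", id, blockno, data), ("TxnEnd", id)
              and the function name recover are hypothetical. It also shows why
              replaying the same redo log twice is safe.

```python
# Sketch of redo-log crash recovery. The record format and the name
# `recover` are hypothetical, chosen only for this illustration.

def recover(log, blocks):
    i = 0
    while i < len(log):                      # Step 1: scan from the start of the log
        rec = log[i]
        if rec[0] == "TxnBegin":
            txn_id = rec[1]
            # Step 2: search for the TxnEnd with the same transaction ID
            end = next((j for j in range(i + 1, len(log))
                        if log[j] == ("TxnEnd", txn_id)), None)
            if end is not None:
                # Step 3: committed, so apply (checkpoint) its changes
                for r in log[i + 1:end]:
                    if r[0] == "Write" and r[1] == txn_id:
                        blocks[r[2]] = r[3]
            # No TxnEnd: the transaction is uncommitted and is ignored
        i += 1
    return blocks                            # Step 4: entire log scanned

log = [
    ("TxnBegin", 1), ("Write", 1, 2, b"A"), ("TxnEnd", 1),
    ("TxnBegin", 2), ("Write", 2, 3, b"B"),   # crashed before TxnEnd was written
]
blocks = [b""] * 4
recover(log, blocks)
after_first_replay = list(blocks)
recover(log, blocks)                          # replaying again changes nothing
```

              Each redo record says "set block N to these contents" rather than
              "change block N by some amount", which is why applying it a second
              time leaves the disk in the same state.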

          --Question: let's revisit crashes during these five steps.
            convince yourself that the file system stays consistent no matter
            when it crashes.

          * What to log? 
          Observe that logging can double the amount of data written to disk.
          To improve performance, Ext3 and Ext4 allow users to choose what to log.
              * Default is to log only metadata. The idea here is that many people
                are willing to accept data loss/corruption after a crash, but 
                keeping metadata consistent is important. This is because if metadata is
                inconsistent the FS may become unusable, as the data
                structures no longer have integrity.
              * Can change settings to force data to be logged, along with metadata.
                This incurs additional overheads, but prevents data loss on crash.


2. intro to networking

 -- What is networking?

    * people (and their computers) sharing information with each other.
    * let's take loading a web page (say, www.google.com) as an example.
    * here, you (and your laptop/desktop) want information that is stored on Google's servers.

    [draw the big picture with clients, google servers, local networks,
      Internet, and routers]

    [demo:
      -- using browser
      -- using nc
    ]

 -- networking abstractions

    * Question: what information is needed to talk to a remote service?

      [This is an open-ended question. Possible answers include:
        * a "name" to specify the remote machine
        * which "service" you're talking to?
        * what are you looking for?
        ...
      ]
      Point: there are different types of information
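
      As a rough illustration of these different types of information, a URL can
      be split into a "name" (host), a "service" (port), and "what you're looking
      for" (path). The URL below, including the /index.html path, is only an
      example chosen for this sketch; the request string is roughly what one
      might type into nc by hand.

```python
# Sketch: the pieces of information needed to talk to a remote web service.
# The URL (and its /index.html path) is an example chosen for illustration.
from urllib.parse import urlsplit

url = "http://www.google.com/index.html"
parts = urlsplit(url)

host = parts.hostname        # the "name" of the remote machine
port = parts.port or 80      # the "service"; 80 is the default port for http
path = parts.path or "/"     # what we are looking for

# A minimal HTTP/1.1 request built from those pieces.
request = (
    f"GET {path} HTTP/1.1\r\n"
    f"Host: {host}\r\n"
    "Connection: close\r\n"
    "\r\n"
)
```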

    * Motivation: why abstractions?
      abstractions hide the complexity of the layers below.
      We will revisit this after the introduction to the layers.

    * the layered network model
      There are several standards, for example, ISO/OSI and TCP/IP.
      We're using the following layers:
        -- application layer,
        -- transport layer,
        -- routing layer,
        -- and link&physical layer
      [show slides of layered model]

  A. The Application Layer

    * End-user/End-host/End-point: Laptops, desktops, phones, etc.  
    * As an end-user, you're interacting with applications (e.g., web browser)
    * Application: a program that communicates with another program
    * The application abstracts away the user; it represents only the user's interactions with the network

  B. The Transport Layer

    * How are sockets implemented while preserving contracts to applications? I.e.,
       if I write AB at one end, how do I ensure I get AB in the same order at the
       other end? This is the requirement of reliable, in-order delivery.
    * How do we ensure applications send bytes at the right rate?
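
    The in-order contract can be observed from the application's side with a
    small sketch. Assumption: socket.socketpair() gives two connected stream
    sockets on the local machine, so this shows only the contract that
    applications see, not how a transport protocol enforces it across a real
    network.

```python
# Sketch: the byte-stream contract as seen by an application.
# socketpair() is a local stand-in for a network connection (an assumption
# of this sketch); nothing here crosses a real network.
import socket

a, b = socket.socketpair()   # two connected stream sockets
a.sendall(b"A")              # write A, then B, at one end
a.sendall(b"B")
a.close()                    # signals end-of-stream to the reader

received = b""
while True:
    chunk = b.recv(1024)     # may return the bytes in one chunk or several
    if not chunk:            # empty result means the writer closed
        break
    received += chunk
b.close()
```

    The reader may get the bytes in one recv call or two, but always as AB and
    never as BA: a stream socket promises order, not message boundaries.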

  C. The Routing Layer

    * Answers questions like:
      * How do bytes find their way from your laptop to Google's servers?
      * How do they know that they have to get to this Access Point, then this
        other router, and so on?

    * Aside: What are routers? This is the beauty of layering. You don't need to
      know about the existence of routers unless you get down into the routing
      (network) layer. E.g., when you put up your own web site, there is a lot of
      networking underneath it, but you don't need to know anything more than sockets.
    * Back to routers: These are devices that exist just for connecting other devices (routers
      and end hosts) to each other. They typically don't run any end-user-facing code (e.g.,
      browsers don't run on routers).
    * Also, what is inside a router? And how does a router forward packets so quickly?
      Forwarding rates can be as high as a few Tbit/s in the core of the Internet.

  D. The link and physical layers

    * Let's zoom in even further. How do your bytes make it to the first hop, your
      access point? How does an AP mediate accesses from two clients to the same
      network so that it's fair?
    * E.g., when you're all in the same room, you raise your hand before you speak.
    * (or talk in a softer voice, etc.)
    * The air around us is a "shared medium," whether it is for sound waves or RF waves.


3. Packet switching

    * circuit switching
      [show circuit switching for telephone in early days:
        https://www.youtube.com/watch?v=aYkh6BrsPpQ&list=PPSV&ab_channel=May-StringerHouse
      ]

      [continue from here next time]

[Acknowledgment: Anirudh Sivaraman]