Week 14.b
CS 5600
04/12 2023

1. Journaling (continued)
2. Authentication
3. Access control
--------------------------------------

1. Journaling (continued)

  [draw journaling]

  - (recall) Redo logging

   * Basic operations:

       Step 1: planning
       Step 2: begin txn
       Step 3: journal write
       Step 4: commit txn
       Step 5: checkpointing

   * recovery:

       Step 1: scan the log
       Step 2: for each TxnBegin entry,
               search for the corresponding TxnEnd
       Step 3: if there is a matching TxnEnd, redo the transaction
               by applying its changes (possibly again);
               otherwise, the transaction never committed, so skip it
       Step 4: done when the entire log has been scanned
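
  [sketch] The recovery steps above can be illustrated with a small in-memory
  model. This is only a sketch under assumptions: the tuple layout of the log
  entries, the "Write" entry type, and the dict standing in for the disk are
  all made up for illustration; only TxnBegin/TxnEnd come from the notes.

```python
def recover(log, disk):
    """Redo every committed transaction in `log` onto `disk` (dict: block -> data)."""
    # Steps 1-2: scan the log and note which transactions have a TxnEnd.
    committed = {entry[1] for entry in log if entry[0] == "TxnEnd"}
    # Step 3: re-apply the writes of committed transactions, in log order.
    for entry in log:
        if entry[0] == "Write" and entry[1] in committed:
            _, _txn_id, block, data = entry
            disk[block] = data  # overwriting a whole block is idempotent
    # Step 4: the whole log has been scanned.
    return disk


# Hypothetical log: txn 1 committed, txn 2 was cut off by a crash.
log = [
    ("TxnBegin", 1),
    ("Write", 1, 10, "inode v2"),
    ("Write", 1, 42, "data v2"),
    ("TxnEnd", 1),
    ("TxnBegin", 2),
    ("Write", 2, 10, "inode v3"),
    # crash here: no TxnEnd for txn 2
]

disk = {10: "inode v1", 42: "data v1"}
recover(log, disk)
print(disk)
```

  Running recover() a second time leaves the disk unchanged, which is why it
  is safe to apply the changes "possibly again".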


2. Authentication

  [security is a broad topic.
   given the time left, we will only talk about two topics:
   authentication and access control]

  Authentication is the process of verifying one's identity.
    ("who are you?")

  Approach 1: password

    --more broadly, this is based on something that the user **knows**.
      (other examples are security questions, PIN, ...)

    Passwords were originally deployed in the 1960s for access to time-shared
    mainframe computers.

    --plaintext passwords stored in files

    --attack: read the file

    --hashed passwords (assumption: you cannot invert a hash function)

    --attack: rainbow table attack
      --pre-compute the hashes of a large set of candidate passwords
      --look up the users' password hashes in the rainbow table
      --return the matching plaintext passwords

    --hashed and salted password (in 1979, Robert Morris and Ken Thompson)
      --pair a password with a "salt" (a random number, like 128 bits)
      --store the salted hash [=hash(password + salt)]
      --the password file contains: salted-hash and salt

    --Question: why is a rainbow table attack not effective in this case?
      [answer: because a comprehensive rainbow table would be 2^128 times
      larger than the original rainbow table!]
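
    [sketch] A small illustration of the unsalted vs. salted schemes above.
    Assumptions: SHA-256 stands in for "hash" (the notes do not name a hash
    function; deployed systems typically use a deliberately slow password
    hash instead), the candidate list is a toy "table", and the function
    names are made up for this example.

```python
import hashlib
import hmac
import os


def h(data: bytes) -> str:
    # Stand-in hash function (assumption: SHA-256).
    return hashlib.sha256(data).hexdigest()


# Attacker side: pre-compute hashes of candidate passwords once.
candidates = [b"123456", b"password", b"letmein"]
table = {h(p): p for p in candidates}

# Unsalted storage: the stolen hash is found directly in the table.
stolen_unsalted = h(b"letmein")
cracked = table.get(stolen_unsalted)


def store(password: bytes):
    """Return (salt, salted_hash) as they would be kept in the password file."""
    salt = os.urandom(16)  # 128-bit random salt
    return salt, h(password + salt)


def verify(password: bytes, salt: bytes, salted_hash: str) -> bool:
    # Recompute hash(password + salt) and compare in constant time.
    return hmac.compare_digest(h(password + salt), salted_hash)


salt, salted = store(b"letmein")
print(cracked)            # the unsalted hash is cracked by table lookup
print(table.get(salted))  # the salted hash is not in the pre-computed table
print(verify(b"letmein", salt, salted))
```

    The same password stored twice gets two different salts and therefore two
    different salted hashes, so one pre-computed table no longer covers
    every user.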

    However, here is the password status quo:
      --Empirical estimates suggest that over 40% of sites store passwords unhashed
      --plaintext passwords: Rockyou and Tianya
      --hashed but unsalted: LinkedIn
      --improperly hashed: Gawker

    [J. Bonneau and S. Preibusch. The password thicket: technical and market
    failures in human authentication on the web. WEIS 2010.]


  Approach 2: authentication by what you have (like a cell phone)

    --idea: something the user has can prove identity,
      for example, ID card, security token, smart card, ...

    --NEU's two-factor authentication

  Approach 3: authentication by what you are

   --idea: unique biological features or behaviors can identify a person,
     for example, fingerprints, DNA, Apple face id, ...

   --many charming ideas! many very cool proposals!

   --as an example:

     --"rubber hose attack"
       --torture users to get their passwords (or any secret in general)

     --Question: can we defend against this attack at all?

     --idea: plant a secret directly in the user's brain, without the user
     having any conscious knowledge of the secret

     --concrete approach:
       --playing a game (similar to typing practice)
       --a particular sequence of chars appears often
       --people develop muscle memory for the char sequence
       --without explicitly learning what the string is


    [Bojinov, Hristo, Daniel Sanchez, Paul Reber, Dan Boneh, and Patrick Lincoln.
    "Neuroscience meets cryptography: designing crypto primitives secure against
    rubber hose attacks." USENIX Security 2012.]

   --but these approaches are sometimes hard to make available to everyone
     (for example, they may require special hardware support)


3. Access control (Unix)

  The problem of access control:

    A subject accesses an object. Should the OS allow or deny the access?

    subjects: users, processes, or any other actors

    objects: files, devices, or any other resources

    (different abstractions will give you different subjects/objects).

    There are two common approaches:
      --access control list (ACL)
      --capability-based

    Both are used in today's OSes.

    At a high level:

    An ACL is usually associated with objects.
      When a subject accesses an object,
      the system checks whether the subject is in the object's access list.

    A capability is usually associated with subjects.
      When a subject accesses an object,
      the system checks whether the subject holds a capability for the object.
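
    [sketch] The two approaches store the same information, indexed
    differently. A toy model, with made-up subjects ("alice", "bob"), a
    made-up object ("f.txt"), and hypothetical function names:

```python
# ACL: the permission data hangs off the OBJECT.
acl = {
    "f.txt": {"alice": {"read", "write"}, "bob": {"read"}},
}


def acl_allows(subject, obj, op):
    # Look up the object's list, then check whether the subject is on it.
    return op in acl.get(obj, {}).get(subject, set())


# Capabilities: the permission data hangs off the SUBJECT.
capabilities = {
    "alice": {("f.txt", "read"), ("f.txt", "write")},
    "bob": {("f.txt", "read")},
}


def cap_allows(subject, obj, op):
    # Check whether the subject holds a capability for (object, operation).
    return (obj, op) in capabilities.get(subject, set())


print(acl_allows("bob", "f.txt", "write"))   # bob is on the list, but only for read
print(cap_allows("alice", "f.txt", "write"))
```

    In this model, revoking everyone's access to f.txt is one edit in the ACL
    version but requires visiting every subject in the capability version.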


 A. Unix's UIDs/GIDs

    * UIDs and GIDs

    UIDs are historically unsigned 16-bit integers (0-65535).

    UNIX keeps the mapping between usernames and UIDs in the file /etc/passwd.
      [try "$ cat /etc/passwd"]

    see your UID by:
      $ id <username>
      and you can get your username by "$ whoami"
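
    [sketch] Each line of /etc/passwd has seven colon-separated fields
    (name, password placeholder, UID, GID, comment, home directory, shell).
    The parser below works on a made-up sample rather than the real file, so
    the entries and the function name are assumptions for illustration.

```python
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
"""


def parse_passwd(text):
    """Map username -> uid/gid/home/shell from passwd-format text."""
    users = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _pw, uid, gid, _comment, home, shell = line.split(":")
        users[name] = {"uid": int(uid), "gid": int(gid), "home": home, "shell": shell}
    return users


users = parse_passwd(SAMPLE_PASSWD)
print(users["alice"]["uid"])
print(users["root"]["uid"])
```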

    special user: uid 0, called root, treated specially by
    the kernel as administrator

        uid 0 has all permissions: can read any file, do anything

        certain ops only root can do:
        --bind to ports less than 1024
        --change the current process's user or group ID
          (so the login program needs to run as root)
        --mount or unmount file systems
        --open raw sockets (used, for example, to ping remote machines)
        --set the clock
        --halt or reboot the machine

    GIDs are also 16-bit integers.
    A group represents a group of users.
      [see all groups by "$ cat /etc/group"]

    * processes have a user ID and one or more group IDs

      when a process runs, it is associated with UID/GIDs
        [see them by "$ ps -l"]
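
        [sketch] A process can also query its own IDs. A minimal example
        using Python's os module on a Unix system; the printed values depend
        on who runs it.

```python
import os

uid = os.getuid()        # real user ID of this process
gid = os.getgid()        # real (primary) group ID of this process
groups = os.getgroups()  # supplementary group IDs of this process

print(f"uid={uid} gid={gid} groups={groups}")
```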

    * files and directories are access-controlled.

        you saw this in Lab5 (recall "mode" in inode)

        the system stores, with each file, who owns it (owner UID and group GID).

        where's the info stored? (answer: inode.)

    --Unix login (classic version)

      1. A privileged login process asks for the username,
        which is echoed on the screen.

      2. The login process asks for the password,
        which is not echoed.

      3. The login process checks the username and password in the password file.
        -- check salted-hashed password

      4. If the check succeeds, login forks, execs a shell with the user's id,
      and switches to the home directory of the user.

      Questions:
      (a) why is the password not echoed on the screen?
      (b) on failure, should login tell the user whether the username or the password was incorrect?
      (c) on failure, should it take longer to reject a wrong username?


  B. permissions

    * "rwx" for files:
      for example, file "f.txt"
       'r': read permission (cat f.txt)
       'w': write permission (echo "xyz" >> f.txt)
       'x': executable (./f.txt)

    * "rwx" for dirs
      for example, dir "a/"
       'r': read the file names (ls a/)
       'w': create, delete, and rename files (touch a/newfile)
       'x': "search" in the dir   (cat a/b when a/ has "--x")
            "walk through" into it (cd a/)

       ['x' is confusing.
        if you know how the fs is implemented, then it is easy:
         'x' gives access to the inodes in this dir. ]

    More about dirs:

    -- if 'r--', "ls -l" (or any ls that needs file metadata) will list the
       file names but with errors.
       Why? because ls also tries to fetch the metadata of the files
       How to fix? add 'x' permission

    -- if '-w-', will "touch a/file" work?
       No. Why?
       "touch" will need to search inodes in "a/", which requires 'x'.

    -- if '--x', will "cd a/" work?
       Yes.

    * "rwxrwxrwx"

      Recall Lab5 inode's "mode" attribute:

        |<-- user ->|<- group ->|<- world ->|
        +---+---+---+---+---+---+---+---+---+
        | R | W | X | R | W | X | R | W | X |
        +---+---+---+---+---+---+---+---+---+
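
      [sketch] How the nine bits are used in a check: exactly one of the
      three triplets is selected, with the owner tested first, then the
      group, then world. The function names and the IDs below are made up
      for illustration, and the root case is simplified.

```python
R, W, X = 4, 2, 1  # bit values within one rwx triplet


def select_bits(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Pick the single rwx triplet that applies to this process."""
    if proc_uid == file_uid:
        return (mode >> 6) & 0o7  # user triplet
    if file_gid in proc_gids:
        return (mode >> 3) & 0o7  # group triplet
    return mode & 0o7             # world triplet


def may_access(mode, file_uid, file_gid, proc_uid, proc_gids, want):
    if proc_uid == 0:
        # Simplification: real kernels still require at least one 'x' bit
        # before root may execute a regular file.
        return True
    bits = select_bits(mode, file_uid, file_gid, proc_uid, proc_gids)
    return bits & want == want


# File with mode rw-r----- owned by uid 1000, group 100.
print(may_access(0o640, 1000, 100, 1000, {100}, R | W))  # owner
print(may_access(0o640, 1000, 100, 1001, {100}, W))      # group member
print(may_access(0o640, 1000, 100, 1002, set(), R))      # everyone else
```

      Because only one triplet is consulted, an owner of a file with mode
      ---rwxrwx is denied even though group and world are allowed.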


    * check and change

     (a) check:
      you can see the permissions of a file/dir using "ls -l":
       file: "-rw-r--r--"
       dir:  "drwxr-xr-x"

     (b) change:
       "$ chmod 0777 file"
         =>
       file will have "rwxrwxrwx"
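
     [sketch] The octal digits given to chmod map one-to-one onto the three
     rwx triplets. The helper perm_string below is a made-up name for this
     example; stat.filemode is the standard-library function that renders a
     full mode the way "ls -l" does.

```python
import os
import stat
import tempfile


def perm_string(mode):
    """Render the low nine mode bits as 'rwxrwxrwx'-style text."""
    out = ""
    for shift in (6, 3, 0):  # user, group, world
        bits = (mode >> shift) & 0o7
        out += "r" if bits & 4 else "-"
        out += "w" if bits & 2 else "-"
        out += "x" if bits & 1 else "-"
    return out


print(perm_string(0o644))
print(perm_string(0o755))

# Same as "$ chmod 0777 file", applied to a temporary file.
with tempfile.NamedTemporaryFile() as f:
    os.chmod(f.name, 0o777)
    changed = stat.filemode(os.stat(f.name).st_mode)
print(changed)
```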


  C. pull everything together

    [draw figure of a Unix system with UIDs]

      [running processes]
      User (uid=1000) --> login (uid=0)
                            | (check username/passwd)
                            |
                            +----> shell (uid=1000)
                                     |
                                     +--> vim (uid=1000)
                                     +--> gcc (uid=1000)
                                     +--> chrome (uid=1000)

      [fs]
        / (owner:uid=0)
        |
        +-->home (owner:uid=0)
             |
             +--> user (owner: uid=1000)
                   |
                   +-> ...

      note: devices are abstracted as files in Unix, so they are
      access-controlled in the same manner.


    Here we have:
      - authentication (prev topic)
      - shell (lab2)
      - process (parent-child, memory layout, and virtual memory)
      - file systems (lab5)
      - access control
    More can fit into this picture:
      - file descriptors
      - scheduling
      - concurrency
      - devices (seen as "file") and I/O