Week 10.b CS7670 11/09 2022
https://naizhengtan.github.io/22fall/

1. Cache eviction
2. Cache admission
3. CacheSack
----

Problem: cache admission
    read/write-->[cache]
    (will we keep the data in the cache?)
    (if the cache is full, we also face the cache eviction problem.)

Another problem: cache eviction
    new item-->[cache]
    Q: where do we put the new item? this question is especially interesting
       when the cache is full, because an existing item must be evicted.

1. Cache eviction

* LRU
  Q: Can you explain LRU? What is the cache admission policy for LRU?
  S1: "An idealized LRU is difficult to implement in a flash cache"
  Q: why? how would you implement LRU?
     -- a queue implemented as a doubly-linked list
     -- a hash table pointing to the list nodes
  Q: what's the problem with this implementation on SSD?
     Why write amplification? (S3.1)

* FIFO
  Q: how to implement FIFO? Why is it good for SSD?

* CacheSack approach: approximate LRU (S4)
  -- a FIFO queue of 1GB chunks
  -- each chunk contains many cache items
  -- evicting a chunk reinserts 28% of the hot cache items

2. Cache admission

* baseline, LARC: lazy adaptive replacement cache
  [if you're interested in LARC, check out:
   "Improving Flash-Based Disk Cache with Lazy Adaptive Replacement"
   https://dl.acm.org/doi/pdf/10.1145/2737832]
  -- idea: admit an item on its second cache miss
     [draw LARC]
  -- Q: motivation? why do we want LARC? how to implement LARC?
     two LRU queues: a real one and a ghost one
     note: LARC doesn't have to be tied to LRU. You can have
     LARC x {FIFO, LRU, approximate LRU}.

* Q: isn't LARC good enough? Why do we want CacheSack?
  performance penalty
  [draw Fig2]
  Q: how to read Fig2?
  "60% of the data is accessed only once"
  "39% of the data accessed more than once is accessed exactly twice"

* How to improve LARC? an observation
  Q: how to read Fig3? and what are the takeaways?

* CacheSack solution
  Q: what is CacheSack's cache admission policy in one sentence?
     a combination of four existing policies, tailored to different workloads

  (1) four candidates:
      -- AdmitOnWrite
      -- AdmitOnMiss (default)
      -- AdmitOnSecondMiss (LARC)
         [read the description in S5.2]
         Q: how to understand "if the last access time is not older than the
            oldest last access time of the blocks in the cache"?
      -- NeverAdmit

  (2) pick one policy for each category of traffic
      [read the first paragraph of S5.3]
      this is SUPER CONFUSING!
      Q: why is this a knapsack problem? (see the LP sketch at the end of
         this section)
      The concrete problem:
          min   Cost1(disk reads) + Cost2(written bytes)
          s.t.  cached data <= total capacity
          where
          disk reads    = \sum { disk reads,    \forall policy-category pairs }
          written bytes = \sum { written bytes, \forall policy-category pairs }
          cached data   = \sum { cache usage,   \forall policy-category pairs }

  (3) How to estimate disk reads, written bytes, and cache usage for each
      policy-category pair?
      strawman: simulate approximate LRU -> LRU model
      -- inputs: a trace of data accesses
      -- outputs: (disk reads, cache usage, bytes written to cache)
      problem: relies on other workloads, and is too slow
      idea: cache retention time
      "The cache retention time is the maximum duration that a block stays in
       the LRU cache without any intervening accesses to it."
      Q: how to understand this?
      Q: what is going to affect this cache retention time?
      solution: try 128 different retention times
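A minimal sketch of the knapsack step as a linear program (the fractional
relaxation), assuming the per-(category, policy) estimates from (3) are
already computed. The function name, array layout, and cost weights below are
illustrative, not CacheSack's actual code:

    import numpy as np
    from scipy.optimize import linprog

    def pick_policy_fractions(disk_reads, written_bytes, cached_data,
                              capacity, cost_read=1.0, cost_write=0.1):
        # Inputs are (n_categories, n_policies) arrays of estimates.
        # Variable x[c,p] = fraction of category c's traffic under policy p.
        n_cat, n_pol = disk_reads.shape
        # Objective: Cost1 * disk reads + Cost2 * written bytes (linear in x).
        c = (cost_read * disk_reads + cost_write * written_bytes).ravel()
        # One capacity constraint: total cached data <= flash capacity.
        A_ub = cached_data.ravel()[None, :]
        b_ub = [capacity]
        # Per-category constraint: fractions over the policies sum to 1.
        A_eq = np.zeros((n_cat, n_cat * n_pol))
        for i in range(n_cat):
            A_eq[i, i * n_pol:(i + 1) * n_pol] = 1.0
        b_eq = np.ones(n_cat)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=(0.0, 1.0), method="highs")
        return res.x.reshape(n_cat, n_pol)

The LP view also explains the "fractions" in the training phase below: the
relaxed knapsack naturally yields a mix of policies per category rather than
a single winner.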
3. Overview

* "training phase"
  every 5min, for each of the 5000 categories, decide what fractions of the
  four admission policies to use.

  algo:
      for D in 128 different cache retention times:
          for C in 5000 categories:
              for P in the four cache admission policies:
                  trace <- the trace of C in the past 5min
                  disk_reads[C,D,P],    \
                  written_bytes[C,D,P], \
                  cached_data[C,D,P] <- simulate(D, C, P)
                  // simulate() uses the logic:
                  //   if (d <= D): cache hit   // d: time since last access
                  //   else if simulate_buffer_cache_hit(C): cache hit
                  //   else: cache miss
          policy[D] <- optimization_solving(disk_reads, written_bytes,
                                            cached_data)

* inference phase
  [draw Fig 1]
  data: key -> pos -> value
  Cache index server:
      index: key -> pos (on SSD), in DRAM
      ghost cache: key -> last access time, in DRAM
  Disk server buffer cache: caches values in DRAM

  Walked through examples (a toy sketch of this read path follows these notes):
      Read(A), cache hit
      Read(A), cache miss
      -- what happens at the client?
      -- what happens at the cache index servers and flash servers?
      Insert(A)
      -- what happens at the client?
      -- what happens at the cache index servers and flash servers?

* DISCUSSION

  * Q: read S3.5. Do you buy it?

  * goal: reduce the amount of hard disk reads
    Q: read footnote 1; if reducing latency were the goal, what would you do?

  * how is CacheSack related to LFS?
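A toy sketch of the inference-phase read path discussed above. Everything
here is a made-up model for illustration: CacheIndexServer, admit(), and
allocate_flash_position() are invented names, not CacheSack's real interfaces.

    import time

    def admit(policy, is_write, seen_before):
        # The four candidate admission policies from S5.2.
        if policy == "NeverAdmit":
            return False
        if policy == "AdmitOnWrite":
            return is_write
        if policy == "AdmitOnMiss":        # default
            return True
        if policy == "AdmitOnSecondMiss":  # LARC-style
            return seen_before
        raise ValueError(policy)

    def allocate_flash_position(key):
        # Stub: in reality the flash server stores the value and
        # returns its on-SSD offset.
        return hash(key)

    class CacheIndexServer:
        # Toy model: both the index and the ghost cache live in DRAM.
        def __init__(self, policy="AdmitOnSecondMiss"):
            self.index = {}   # key -> position on SSD
            self.ghost = {}   # key -> last access time
            self.policy = policy

        def read(self, key):
            if key in self.index:             # cache hit: serve from flash
                self.ghost[key] = time.time()
                return ("flash", self.index[key])
            seen_before = key in self.ghost   # the "second miss" signal
            self.ghost[key] = time.time()
            if admit(self.policy, is_write=False, seen_before=seen_before):
                self.index[key] = allocate_flash_position(key)
            return ("disk", None)             # client falls back to disk server

Note how the ghost cache stores only last-access times, so checking for a
second miss costs DRAM for metadata rather than flash writes for data.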