Week 7.b
CS7670
10/19 2022
https://naizhengtan.github.io/22fall/

1. Linux scheduling problem
2. this paper
3. Decima -> Linux scheduling?
---


1. Linux scheduling problem

  - the problem:
     given a number threads, a number of cores,
     periodically (a timer interrupt)
     the kernel decides to schedule which thread on which core.

     [draw a figure about the problem]

  - metrics and considerations:
    -- fairness: if threads are given fair amount of CPU time
    -- priority: some threads are more important than others
    -- locality: CPU cache
    -- affinity: a group of threads working on the same job
                 (gang scheduling)
    -- resource: IO bound vs. CPU bound vs. memory bound
    -- NUMA
    ...

  - NUMA: none-uniform memory access
    [see handout
     check out:
     https://www.amd.com/system/files/2018-03/AMD-Optimizes-EPYC-Memory-With-NUMA.pdf
    ]

  - what Linux does today---CFS
    -- CFS: completely fair scheduler
    -- per-core runqueue
       -- a red-black tree sorted according to thread virtual runtime
       -- for fairness: choose the least run thread

    Q: how Linux address priorities?
       by setting virtual runtime differently for different threads

    -- more details:
      -- hierarchical of trees (we will ignore)

  - CFS load balancing

    -- idea: work stealing
       core A,
          "I'm idle.
          I will steal a job (which one?) from another core (which core?)"

    -- implementation

    timer_handler()        // periodically called
      +-> scheduler_tick()
        ...                         // update states
        +-> trigger_load_balance()  // check if needs load balancing
            +-> trigger a soft-interrupt

    handler()
      +-> run_rebalance_domains()
          +-> load_balance()
              +-> detach_tasks()
                  +-> can_migrate_task()  // ** what this paper's ML model learns **
              +-> attack_tasks()


2. this paper

   Q: what's the goal of this paper?

   Q: what network architecture they're using?
      fc: 15-10-1

   Q: what are the elements in the input tensor (of size 15)?
      9 data field => 15 tensor

   Q: what's the output? migrate or not

   Q: how many operations CPU needs to run for an inference?

      [see handout]

      #ops = 15*10*2 + 10 + 10 // first fc layer
             + 10*1*2 + 1      // second fc layer
           = 341

   Q: Why authors want to use "fixed-point" implementation?

   Q: How did they implement the "fixed-point" arithmetic operations?
      32bits: 21 (integer) + 11 (fractional number)

      -- why \epsilon is 0.000488281?

      -- how does "add" works?

      -- how does "multiply" work?

      [see handout]


3. apply Decima to Linux scheduling

   -- coming back to the Linux scheduling problem (instead of migration problem)

   -- can we apply RL + GNN to this problem?

   -- what a naive clone of Decima system to this will look like?

   -- what's the major challenges of that?

   DISCUSSION: for scheduling, RL vs. Supervised learning
      "RL is well-suited to learning scheduling policies." from Decima paper