Week 7.b CS7670 10/19 2022 https://naizhengtan.github.io/22fall/ 1. Linux scheduling problem 2. this paper 3. Decima -> Linux scheduling? --- 1. Linux scheduling problem - the problem: given a number threads, a number of cores, periodically (a timer interrupt) the kernel decides to schedule which thread on which core. [draw a figure about the problem] - metrics and considerations: -- fairness: if threads are given fair amount of CPU time -- priority: some threads are more important than others -- locality: CPU cache -- affinity: a group of threads working on the same job (gang scheduling) -- resource: IO bound vs. CPU bound vs. memory bound -- NUMA ... - NUMA: none-uniform memory access [see handout check out: https://www.amd.com/system/files/2018-03/AMD-Optimizes-EPYC-Memory-With-NUMA.pdf ] - what Linux does today---CFS -- CFS: completely fair scheduler -- per-core runqueue -- a red-black tree sorted according to thread virtual runtime -- for fairness: choose the least run thread Q: how Linux address priorities? by setting virtual runtime differently for different threads -- more details: -- hierarchical of trees (we will ignore) - CFS load balancing -- idea: work stealing core A, "I'm idle. I will steal a job (which one?) from another core (which core?)" -- implementation timer_handler() // periodically called +-> scheduler_tick() ... // update states +-> trigger_load_balance() // check if needs load balancing +-> trigger a soft-interrupt handler() +-> run_rebalance_domains() +-> load_balance() +-> detach_tasks() +-> can_migrate_task() // ** what this paper's ML model learns ** +-> attack_tasks() 2. this paper Q: what's the goal of this paper? Q: what network architecture they're using? fc: 15-10-1 Q: what are the elements in the input tensor (of size 15)? 9 data field => 15 tensor Q: what's the output? migrate or not Q: how many operations CPU needs to run for an inference? [see handout] #ops = 15*10*2 + 10 + 10 // first fc layer + 10*1*2 + 1 // second fc layer = 341 Q: Why authors want to use "fixed-point" implementation? Q: How did they implement the "fixed-point" arithmetic operations? 32bits: 21 (integer) + 11 (fractional number) -- why \epsilon is 0.000488281? -- how does "add" works? -- how does "multiply" work? [see handout] 3. apply Decima to Linux scheduling -- coming back to the Linux scheduling problem (instead of migration problem) -- can we apply RL + GNN to this problem? -- what a naive clone of Decima system to this will look like? -- what's the major challenges of that? DISCUSSION: for scheduling, RL vs. Supervised learning "RL is well-suited to learning scheduling policies." from Decima paper