Week 8.a CS7670 10/24 2022
https://naizhengtan.github.io/22fall/

1. revisit Decima
2. the problem
3. LSched

---
[draw Fig4 and Fig5 on board]

1. Decima

-- the problem
   a job is a DAG:
   -- nodes are execution stages
      -- a stage is an operation that runs on many shards
      -- a task is an operation that runs on one shard
   -- edges are data flow

-- GNN embeddings
   -- inputs to each node (or x_v^i):
      (1) #tasks remaining in the stage
      (2) avg task duration
      (3) #executors in the current node
      (4) available #executors
      (5) whether the available executors are local to the job
   -- per-node embedding:
      (G_i, x_v^i) -> e_v^i
   -- per-job embedding (for job i):
      {\forall v \in G_i, (x_v^i, e_v^i)} -> y^i
   -- global embedding:
      {all y^i} -> z

-- policy network
   inputs: embeddings
   outputs:

-- RL to train NNs

Q: what are the differences between Decima and LSched?
   two main questions:
   Q1: what are the differences in the problem?
   Q2: what are the differences in the solution?

2. Database query planning

Q1: what are the differences in the problem?

-- the delta problem
   a SQL query is a DAG
   -- nodes are operators
      -- an operator can use multiple threads
   -- edges are data flow
   -- **some adjacent operators can be pipelined**

-- pipelining operators
   select A,B from T1 join T2 on T1.id=T2.id

   -- no pipeline:
      (1) create T1.join(T2) as a new table
      (2) run select on the table
   -- pipelining:
      T1 -+
          +-> [join] --> [select] --> output
      T2 -+
   -- two benefits (at least):
      (1) no materialization of the table T1.join(T2)
      (2) can run in a streaming fashion

-- Q: why does Decima ignore pipelining?
   Spark is a distributed system; Quickstep is a single-node database.

3. LSched

Q2: what are the differences in the solution?
[ask a student to read the five contributions on page 2]

Q: which contributions do you think differ significantly from Decima?
   -- inputs
   -- architecture

[draw fig3]
Q: read fig3: what are these operations? what do O-TY, O-CON, O_IN mean?
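The query plans in fig3 are operator DAGs like the one in section 2. A minimal sketch of why pipelining matters for the query `select A,B from T1 join T2 on T1.id=T2.id`, assuming Python generators as a stand-in for operators and a hash join as the join algorithm (both are illustrative choices, not how Quickstep or LSched implement operators). The table contents are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Hash join: build a table on `right`, then probe it row by row with `left`."""
    table: Dict[object, List[Row]] = {}
    for r in right:  # build side is consumed fully (a pipeline breaker for `right`)
        table.setdefault(r[key], []).append(r)
    for l in left:  # probe side streams: each match is emitted immediately
        for r in table.get(l[key], []):
            yield {**l, **r}


def select(rows: Iterable[Row], cols: Tuple[str, ...]) -> Iterator[Row]:
    """Projection operator: keeps only `cols`, one row at a time."""
    for row in rows:
        yield {c: row[c] for c in cols}


def run_materialized(t1: List[Row], t2: List[Row]) -> List[Row]:
    joined = list(hash_join(t1, t2, "id"))   # (1) the whole T1.join(T2) table is stored
    return list(select(joined, ("A", "B")))  # (2) then select runs over that table


def run_pipelined(t1: List[Row], t2: List[Row]) -> Iterator[Row]:
    # join feeds select directly; no intermediate join table is ever stored
    return select(hash_join(t1, t2, "id"), ("A", "B"))


# Hypothetical tables: T1(id, A) and T2(id, B).
T1 = [{"id": 1, "A": "a1"}, {"id": 2, "A": "a2"}, {"id": 3, "A": "a3"}]
T2 = [{"id": 1, "B": "b1"}, {"id": 3, "B": "b3"}]

print(run_materialized(T1, T2))      # [{'A': 'a1', 'B': 'b1'}, {'A': 'a3', 'B': 'b3'}]
print(next(run_pipelined(T1, T2)))   # {'A': 'a1', 'B': 'b1'}
```

Both plans produce the same rows; the pipelined one hands the first output row to `select` before the join has probed all of T1, which is the streaming benefit (2), and it never holds the full join result, which is benefit (1).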
-- features:
   operator features
   edge features
   query features

Q: compared with Decima's features, do you think LSched's features are better or worse?

DISCUSSION: contribution of a NN4Sys paper: feature engineering?

Q: read fig3: why join a table twice?

Q: what is a join?
   quick example:
   -- Table [student]: id, name, address id
   -- Table [location]: id, address
   print each student's address (join [student] with [location] on address id)
   Q: during covid, add a home address
      print both the address and the home address
      ([student] now has two ids pointing into [location], so [location] is joined twice)

[fig4]
Read Fig4.

For GCN:
Q: how to calculate "o3" and "o4"?
   O'_embd = weights * O_embd + sum(Child_embd)
Q: two limitations of GCN?
   -- L1: over-smoothing problem
   -- L2: no differentiation of child nodes
Q: how do they address these limitations?

For TCN (tree convolution network):
Q: how to calculate "o3" and "o4"?
   O'_embd = weights_p * O_embd + weights_left * O_left + weights_right * O_right

Read Fig5.

For TCN plus edge weights:
Q: how to calculate "o3" and "o4"?
   O'_embd = weights_p * O_embd + weights_left * O_left + weights_right * O_right
             + weights_edge_right * E_right + weights_edge_left * E_left

For GAT (graph attention network):
   adds pair-wise attention scores

-- LSched overall design:
   scheduling agent
   |
   +- encoder:
   |    inputs: DAG plans of operators, environment status
   |    output: embedding representation
   |    network: TCN + GAT
   |
   +- predictors:
        inputs: embeddings
        outputs: (1) which op to schedule, (2) #threads
        network: NN

-- Q: when to trigger the predictor?
   baselines:
   -- make all scheduling decisions at the same time
   -- make a tiny decision at every move

DISCUSSION: LSched vs. Decima
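Going back to the Fig4/Fig5 update rules: a minimal sketch of the three per-node updates (GCN, TCN, TCN plus edge weights), assuming linear updates with no activation, random weights in place of learned ones, and an embedding size of 4. The names `W`, `W_p`, `W_left`, `W_right`, `W_edge_left`, `W_edge_right` are hypothetical stand-ins for the `weights*` terms in the formulas. The demo illustrates limitation L2: GCN's plain sum cannot tell a node's left child from its right child, while TCN's per-position weights can.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # embedding size (an arbitrary choice for this sketch)

# Hypothetical weight matrices; a trained model would learn these, here they are random.
W = rng.normal(size=(D, D))             # GCN: "weights"
W_p = rng.normal(size=(D, D))           # TCN: "weights_p" (the node itself)
W_left = rng.normal(size=(D, D))        # TCN: "weights_left"
W_right = rng.normal(size=(D, D))       # TCN: "weights_right"
W_edge_left = rng.normal(size=(D, D))   # "weights_edge_left"
W_edge_right = rng.normal(size=(D, D))  # "weights_edge_right"


def gcn_update(o, children):
    """GCN: O' = W*O + sum(children). One shared sum, so child order is lost."""
    return W @ o + np.sum(children, axis=0)


def tcn_update(o, left, right):
    """TCN: separate weights for the node, its left child, and its right child."""
    return W_p @ o + W_left @ left + W_right @ right


def tcn_edge_update(o, left, right, e_left, e_right):
    """TCN plus edge features: adds W_edge_left*E_left + W_edge_right*E_right."""
    return tcn_update(o, left, right) + W_edge_left @ e_left + W_edge_right @ e_right


# A join node o with two child operators a and b (e.g., two table scans).
o, a, b = rng.normal(size=D), rng.normal(size=D), rng.normal(size=D)
e_a, e_b = rng.normal(size=D), rng.normal(size=D)  # hypothetical edge features

# L2 in action: swapping the children leaves GCN unchanged but changes TCN.
print(np.allclose(gcn_update(o, [a, b]), gcn_update(o, [b, a])))  # True
print(np.allclose(tcn_update(o, a, b), tcn_update(o, b, a)))      # False
print(tcn_edge_update(o, a, b, e_a, e_b).shape)                    # (4,)
```

For a join, the left and right inputs usually play different roles (e.g., build vs. probe side), so keeping them distinguishable in the embedding is the point of giving each child position its own weights.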