Week 8.a
CS7670
10/24 2022
https://naizhengtan.github.io/22fall/

1. revisit Decima
2. the problem
3. LSched
---

[draw Fig4 and Fig5 on board]

1. Decima

  -- the problem
     a job is a DAG:
      -- nodes are execution stages
      -- a stage is an operation that runs on many shards
      -- a task is an operation that runs on one shard
      -- edges are data flow
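
  -- sketch (not from the paper): one way to hold the job/stage/task
     structure above in code. The names Stage, Job, runnable_stages and
     the example numbers are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """A node in the job DAG: one operation applied to many shards."""
    name: str
    num_tasks: int                                # one task per shard
    parents: list = field(default_factory=list)   # stages whose output this stage reads


@dataclass
class Job:
    """A job is a DAG of stages; edges (parent -> child) are data flow."""
    stages: list

    def runnable_stages(self, finished):
        """Stages not yet finished whose parents have all finished."""
        return [s.name for s in self.stages
                if s.name not in finished
                and all(p in finished for p in s.parents)]


# Hypothetical three-stage job: two scans feeding a join.
job = Job([
    Stage("scan_a", num_tasks=4),
    Stage("scan_b", num_tasks=2),
    Stage("join", num_tasks=3, parents=["scan_a", "scan_b"]),
])
total_tasks = sum(s.num_tasks for s in job.stages)
```

     A scheduler only ever chooses among runnable_stages(); here "join"
     becomes runnable once both scans are finished.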

  -- GNN embeddings

    -- inputs to each node (or x_v^i):
       (1) #tasks remaining in the stage
       (2) avg task duration
       (3) #executors in the current node
       (4) available #executors
       (5) if available executors are local to the job

    -- per-node embedding:
        (G_i, x_v^i) -> e_v^i

    -- per-job embedding (for job i)
        {\forall v \in G_i, (x_v^i, e_v^i)} -> y^i

    -- global embedding
        {all y^i} -> z
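
  -- sketch (assumption-heavy): the three embedding levels above, with a
     fixed function f standing in for the learned transforms. Sum
     aggregation, the function names, and the toy numbers are all
     illustrative choices, not Decima's actual networks.

```python
def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]


def f(v):
    """Stand-in for a learned non-linear transform (scale, then ReLU)."""
    return [max(0.0, 0.5 * x) for x in v]


def node_embeddings(children, x):
    """(G_i, x_v^i) -> e_v^i: own features plus transformed child embeddings."""
    e = {}

    def visit(v):
        if v not in e:
            msg = [0.0] * len(x[v])
            for u in children[v]:
                msg = vec_add(msg, f(visit(u)))
            e[v] = vec_add(x[v], msg)
        return e[v]

    for v in children:
        visit(v)
    return e


def job_embedding(x, e):
    """{(x_v^i, e_v^i) for all v in G_i} -> y^i: aggregate over one job's nodes."""
    dim = 2 * len(next(iter(x.values())))
    y = [0.0] * dim
    for v in x:
        y = vec_add(y, f(x[v] + e[v]))   # x[v] + e[v] is list concatenation
    return y


def global_embedding(ys):
    """{all y^i} -> z: aggregate over all jobs."""
    z = [0.0] * len(ys[0])
    for y in ys:
        z = vec_add(z, f(y))
    return z


# Toy job with three nodes; v3 has children v1 and v2.
children = {"v1": [], "v2": [], "v3": ["v1", "v2"]}
x = {"v1": [2.0, 0.0], "v2": [0.0, 4.0], "v3": [1.0, 1.0]}

e = node_embeddings(children, x)
y = job_embedding(x, e)
z = global_embedding([y])
```

     The point of the hierarchy: e_v^i sees the graph around one node,
     y^i summarizes one job, and z summarizes the whole cluster.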

  -- policy network
       inputs: embeddings
       outputs: <v, l>

  -- RL to train NNs

  Q: what are the differences between Decima and LSched?
     two main questions:
     Q1: what are the differences in problem?
     Q2: what are the differences in solution?

2. Database query planning

  Q1: what are the differences in problem?

  -- the delta problem
     a SQL query is a DAG
      -- nodes are operators
      -- an operator can use multiple threads
      -- edges are data flow
      -- **some adjacent operators can be pipelined**

  -- pipelining operators

       select A,B from T1 join T2 on T1.id=T2.id

    -- no pipeline:
      (1) create T1.join(T2) as a new table
      (2) run select on the table

    -- pipelining:

     T1 -+
         +-> [join] --> [select] --> output
     T2 -+

    -- two benefits (at least):
       (1) no materializing of table T1.join(T2)
       (2) can run in streaming fashion
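
  -- sketch: the two execution styles for the query above, using Python
     lists (materialized) vs. generators (pipelined). The table contents
     and function names are made up; this is not how Quickstep
     implements operators.

```python
T1 = [{"id": 1, "A": "a1"}, {"id": 2, "A": "a2"}, {"id": 3, "A": "a3"}]
T2 = [{"id": 1, "B": "b1"}, {"id": 3, "B": "b3"}]


def join_materialized(left, right):
    """No pipeline, step (1): build the full joined table as a list."""
    index = {r["id"]: r for r in right}
    return [{**lrow, **index[lrow["id"]]} for lrow in left if lrow["id"] in index]


def select_materialized(table):
    """No pipeline, step (2): run select over the stored table."""
    return [(row["A"], row["B"]) for row in table]


def join_stream(left, right):
    """Pipelined join: yields one joined row at a time."""
    index = {r["id"]: r for r in right}   # the build side is still read fully
    for lrow in left:
        if lrow["id"] in index:
            yield {**lrow, **index[lrow["id"]]}


def select_stream(rows):
    """Pipelined select: consumes rows as the join produces them."""
    for row in rows:
        yield (row["A"], row["B"])


no_pipeline = select_materialized(join_materialized(T1, T2))

pipelined = select_stream(join_stream(T1, T2))
first = next(pipelined)    # first output row arrives before the join has finished
rest = list(pipelined)
```

     Both versions produce the same rows; the pipelined one never stores
     the joined table and can emit output early.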

  -- Q: why does Decima ignore pipelining?
      Spark is a distributed system;
      Quickstep is a single-node database.


3. LSched

  Q2: what are the differences in solution?

  [ask a student to read the five contributions on page 2]

  Q: which contributions do you think significantly differ from Decima?
    -- inputs
    -- architecture

  [draw fig3]

  Q: read fig3: what are these operations?
     what do O-TY, O-CON, O_IN mean?

  -- features:
    operator features
    edge features
    query features

  Q: compared with Decima features, do you think LSched features are
     better or worse?

  DISCUSSION: contribution of a NN4Sys paper: feature engineering?


  Q: read fig3: why join a table twice?

    Q: what is a join?
       quick example:
       -- Table [student]: id, name, address_id
       -- Table [location]: id, address

       print student's address

    Q: during COVID, add a home address

      print both address and home address
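
    -- sketch: the student/location example in SQLite (Python stdlib).
       The column names address_id and home_address_id and the sample
       rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE student  (id INTEGER PRIMARY KEY, name TEXT,
                           address_id INTEGER, home_address_id INTEGER);
    INSERT INTO location VALUES (1, 'Campus Dorm A'), (2, '12 Elm St');
    INSERT INTO student  VALUES (1, 'Alice', 1, 2);
""")

# One join: print the student's address.
one_join = conn.execute("""
    SELECT s.name, l.address
    FROM student s
    JOIN location l ON s.address_id = l.id
""").fetchall()

# Two joins against the same table: location appears twice, under two aliases.
two_joins = conn.execute("""
    SELECT s.name, cur.address, home.address
    FROM student s
    JOIN location cur  ON s.address_id = cur.id
    JOIN location home ON s.home_address_id = home.id
""").fetchall()

conn.close()
```

       Each foreign key into [location] needs its own join, so the same
       table shows up twice in the query plan.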

  [fig4]

  Read Fig4.

  For GCN:
  Q: how to calculate "o3" and "o4"?

     O'_embd = weights * O_embd + sum(Child_embd)
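
     -- sketch: the update above with toy numbers. The weight matrix
        and the two child embeddings are made up, and the non-linearity
        a real GCN layer would apply is left out.

```python
def matvec(W, v):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gcn_update(W, o_embd, child_embds):
    """O'_embd = W * O_embd + sum(Child_embd)."""
    out = matvec(W, o_embd)
    for child in child_embds:
        out = [a + b for a, b in zip(out, child)]
    return out


W = [[1.0, 0.0],
     [0.0, 2.0]]                 # made-up weights
o = [1.0, 1.0]                   # the node being updated
child_a = [1.0, 0.0]             # hypothetical child embeddings
child_b = [0.0, 3.0]

new_o = gcn_update(W, o, [child_a, child_b])
swapped = gcn_update(W, o, [child_b, child_a])
```

        Swapping the two children gives the same result, because a plain
        sum ignores which child is which.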

  Q: two limitations of GCN?
    -- L1: over-smoothing problem
    -- L2: no differentiation of child nodes

  Q: how do they address these limitations?

  For TCN (tree convolution network):
  Q: how to calculate "o3" and "o4"?

     O'_embd = weights_p * O_embd + weights_left * O_left 
                                  + weights_right * O_right

  Read Fig5.

  For TCN plus edge weights:
  Q: how to calculate "o3" and "o4"?

     O'_embd = weights_p * O_embd 
                 + weights_left * O_left 
                 + weights_right * O_right
                 + weights_edge_right * E_right
                 + weights_edge_left * E_left
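
     -- sketch: the TCN update with toy numbers, with and without the
        edge terms. All weights, embeddings, and edge features are made
        up, and the non-linearity is left out; only the structure of the
        formulas is meant to match.

```python
def matvec(W, v):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def tcn_update(w, o, o_left, o_right, e_left=None, e_right=None):
    """Separate weights per position (parent / left / right).
    Edge terms are added only when edge features are given."""
    terms = [(w["p"], o), (w["left"], o_left), (w["right"], o_right)]
    if e_left is not None:
        terms.append((w["edge_left"], e_left))
    if e_right is not None:
        terms.append((w["edge_right"], e_right))
    out = [0.0] * len(w["p"])
    for W, v in terms:
        out = [a + b for a, b in zip(out, matvec(W, v))]
    return out


w = {
    "p":          [[1.0, 0.0], [0.0, 1.0]],
    "left":       [[2.0, 0.0], [0.0, 2.0]],
    "right":      [[-1.0, 0.0], [0.0, -1.0]],
    "edge_left":  [[1.0], [0.0]],    # maps a 1-d edge feature into 2-d
    "edge_right": [[0.0], [1.0]],
}
o = [1.0, 1.0]
a = [1.0, 0.0]
b = [0.0, 3.0]

plain = tcn_update(w, o, a, b)                     # a is left, b is right
swapped = tcn_update(w, o, b, a)                   # children exchanged
with_edges = tcn_update(w, o, a, b, e_left=[10.0], e_right=[20.0])
```

        Unlike a plain sum over children, exchanging left and right
        changes the output here, and the edge features shift it again.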

  For GAT (graph attention network):
   then add pair-wise attention scores
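
   -- sketch: pair-wise attention in its simplest form. A plain dot
      product stands in for the learned scoring function (an assumption;
      GAT learns its scores), and the embeddings are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(h_v, neighbor_hs):
    """Score each (v, neighbor) pair, normalize the scores with softmax,
    and return the attention weights plus the weighted neighbor sum."""
    scores = [sum(a * b for a, b in zip(h_v, h_u)) for h_u in neighbor_hs]
    alphas = softmax(scores)
    agg = [sum(alpha * h_u[k] for alpha, h_u in zip(alphas, neighbor_hs))
           for k in range(len(h_v))]
    return alphas, agg


h_v = [1.0, 0.0]
neighbors = [[2.0, 0.0], [0.0, 2.0], [2.0, 1.0]]
alphas, agg = attention_aggregate(h_v, neighbors)
```

      Neighbors that score higher against h_v get a larger share of the
      aggregate, instead of every neighbor counting equally.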

  -- LSched overall design:

  scheduling agent
   |
   +- encoder:
   |    inputs: DAG plans of operators, environment status
   |    output: embedding representation
   |    network: TCN + GAT
   |
   +- predictors:
        inputs: embeddings
        outputs: (1) which op to schedule, (2) #threads
        network: NN
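
  -- sketch: the encoder -> predictors data flow as two stub functions.
     Everything inside them is a placeholder: the encoder just copies
     raw features (no TCN or GAT) and a fixed rule replaces the trained
     NN. The plan format and field names are made up.

```python
def encode(plan, free_threads):
    """Encoder stand-in: one small feature vector per ready operator."""
    return {op: [float(info["remaining_work"]), float(free_threads)]
            for op, info in plan.items() if info["ready"]}


def predict(embeddings, free_threads):
    """Predictor stand-in: returns (1) which op to schedule, (2) #threads.
    Placeholder rule: pick the op with the most remaining work."""
    op = max(embeddings, key=lambda name: embeddings[name][0])
    threads = max(1, min(free_threads, int(embeddings[op][0])))
    return op, threads


plan = {
    "scan":   {"remaining_work": 6, "ready": True},
    "filter": {"remaining_work": 2, "ready": True},
    "join":   {"remaining_work": 9, "ready": False},   # inputs not ready yet
}
decision = predict(encode(plan, free_threads=4), free_threads=4)
```

     Only the interface matters here: plans and environment status go
     in, an (operator, #threads) pair comes out.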

  -- Q: when to trigger the predictor?
      baselines:
      -- make all scheduling decisions at the same time
      -- make a tiny decision at every move


  DISCUSSION: LSched vs. Decima