Week 1 CS7670 09/05 2025
https://naizhengtan.github.io/25fall/

□ 0. in-class policies
□ 1. Intro to this course
□ 2. Assumptions
□ 3. A bird's-eye view of LLM systems research
□ 4. Attention
□ 5. Mechanics and admin

----

Admin: for the rest of the semester, class starts at 10AM

0. in-class policies

-- no laptops
   [why? context switches & interruptions are expensive]
-- chocolate
   [why? willpower and energy]
-- lottery
   explain the lottery system
   [some advantages of a lottery system]

Q: why do you want to take the course?
[will circle back to this question]

1. Intro to this course

- What is a seminar?
-- [ask students]
-- SOTA research vs. knowledge breadth
-- difference between education and research
   * driven by your own interests
   * Northeastern is well known for educating students (but not necessarily inspiring them)
-- you need __your own opinions__
   * often people don't have any; why?
     - they don't care
     - they don't know enough
     - they don't think they can, or that it's worthwhile
     - they lack the motivation to freely explore what they're interested in
   * ironically, it is really easy to find content on the Internet, with the help of AI
     - yet fewer people take risks (why?)

- philosophy of the class
  1) low-stress, high-reward
     What does that mean? many discussions that require you to give opinions
  2) "no pain, no gain" still applies
     * effort spent -> how much you gain/grow
     * the course is designed with a WIDE range of how much you might get out of it
     * again, gaining depth requires time
  3) the course covers the foundations
     * this course provides you pointers, paths, and actionable items to explore
     * but it is you who decides whether you want to, or have the willpower to, get out into the wild
     * overall, we have 14 classes (45 min each) of lecturing
  4) show by example---the systems approach to connecting concepts with implementation
     - examples are toys; you can...
       ...either go deep into the bloody details...
       ...or go cutting edge, with recent (possibly advanced) optimizations
  5) opinions matter
     - opinions, including strong or critical ones, are welcome; we encourage them

- What to expect:
-- reading one paper per week
-- expect to ask and answer questions

- Q: Why do you want to take this course?
  You'll be glad if you...
  * care about what goes on under the hood
  * like systems and infrastructure
  * care about high performance

- How will we study LLM systems?
-- read basic and fundamental papers
-- read cutting-edge papers
-- interact with frontier researchers, from both academia and industry
-- a lot of discussion to form your opinions

- course goals:
-- breadth: an overview of LLM systems (my job)
-- depth in three ways (your job):
   1. **as an end-user**: play with many existing services and become familiar with them
   2. **build an LLM from scratch**: know the basics at the code level
   3. **understand the state of the art**: read the most recent papers, know the related work of relevant papers, and understand the trade-offs made by cutting-edge papers

2. Assumptions

a. know core concepts of LLMs
   [a quick survey here]
b. feel comfortable with Python and PyTorch
c. can read systems papers

3. Overview of LLM systems research

- Q: What are LLM systems? give one example of an LLM system
  [ask students]
-- is PyTorch an LLM system?
-- is cuBLAS an LLM system?
-- is vLLM an LLM system?

* LLM under the lens of systems:
  computer systems > data systems > ML systems > LLM systems
  distributed systems (fault tolerance), HPC      +-> training
  concurrent systems, GPU programming             +-> serving

* LLM under the lens of ML:
  AI > ML > DL (FFNN, CNN, RNN, Transformer)
                                        +-> LLM

- What are the ultimate goals of LLM systems research?
[ask students]

(my opinion) three major topics in this area:
-- efficient (in GPU cycles) and reliable (in final outcome) training
-- cheap (in dollars) and fast (in wall-clock latency) serving
-- understanding various trade-offs
   for example:
   * trade-off between throughput and latency (batching)
   * trade-off between memory and computation (recomputing gradients in training; the KV cache in serving)
   * trading correctness/accuracy for runtime performance (?!)

Aside: if an optimization is strictly better than all alternatives without requiring any trade-off, it constitutes a fundamental advancement. We have observed such cases in LLM systems, but far less often in classic computer systems.

4. Attention

-- self-attention
   * dot-product self-attention
   * matrix-based self-attention
-- questions on HotCRP
   [see slides]
-- recent advancements at the model level
   ** pre-norm
   ** Rotary Positional Embeddings (RoPE)
   ** RMSNorm, SwiGLU
   ** recent attention variants
   ** Grouped-Query Attention (GQA)
   ** sliding-window attention
   ** MoE

5. Mechanics and admin

a. communication
   us-to-you:
   -- homepage, announcements: check these regularly
   -- your NEU email (seldom)
   you-to-us:
   -- HotCRP: where you review papers and hold discussions
   -- my email, for admin/sensitive matters

b. components of the course:
   -- my lectures
   -- guest lectures
   -- a final writeup: an opinion paper (can team up)

c. lectures
   [take a look at the schedule]
   * my lectures
   -- paper-oriented; you must submit your review before the class
   -- attending: no roll call, but...we will randomly pick students to answer questions (lottery)
   -- notes will be published, but they will be hard to understand if you miss the lecture
   -- asking questions in class is encouraged
   * guest lectures:
   -- attending is required
   -- asking questions is required (more later, in the policy)

d. final writeup:
   -- an opinion paper
   -- students who share the same opinion are encouraged to team up

e. final grade
   -- policy: participation (80%)
      * paper reviews (10 papers): 40%
      * attending in person: 20% (-2% for each absence)
      * questions to guest speakers (2 questions): 20%
   -- writeup (20%)
      * evaluated by quality
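
Appendix: the dot-product self-attention from Section 4 can be sketched in a few lines of PyTorch. This is a minimal, illustrative sketch; the tensor sizes, variable names, and single-head setup are my own assumptions, not from the lecture, and it omits masking and multi-head splitting.

```python
# Minimal single-head scaled dot-product self-attention (toy sizes are assumptions).
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head)."""
    q = x @ w_q                         # queries
    k = x @ w_k                         # keys
    v = x @ w_v                         # values
    d_head = q.shape[-1]
    scores = q @ k.T / d_head ** 0.5    # (seq_len, seq_len) pairwise similarities
    weights = F.softmax(scores, dim=-1) # each row is a distribution over positions
    return weights @ v                  # (seq_len, d_head) weighted mix of values

torch.manual_seed(0)
x = torch.randn(4, 8)                   # 4 tokens, d_model = 8
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([4, 8])
```

The "matrix-based" view from the lecture is the same computation: stacking all tokens into `x` lets one matrix multiply produce every query/key/value at once.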
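
Appendix: the memory-vs-computation trade-off from Section 3 (the KV cache in serving) can also be sketched. This toy decoding loop is an assumption-laden illustration, not any particular system's implementation: it keeps one head, fixed random weights, and no model around the attention. The point is that caching K/V makes each decoding step O(t) work at the cost of O(t) memory that grows with the sequence.

```python
# Toy autoregressive decoding with a KV cache (sizes and names are assumptions).
import torch
import torch.nn.functional as F

d_model, d_head = 8, 8
torch.manual_seed(0)
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))

def decode_step(x_new, k_cache, v_cache):
    """x_new: (1, d_model) hidden state of the newest token.
    k_cache/v_cache: (t, d_head) keys/values of all earlier tokens."""
    # Compute K/V only for the new token; reuse everything else from the cache.
    k_cache = torch.cat([k_cache, x_new @ w_k], dim=0)  # memory grows by one row
    v_cache = torch.cat([v_cache, x_new @ w_v], dim=0)
    q = x_new @ w_q
    scores = q @ k_cache.T / d_head ** 0.5              # (1, t+1): O(t) per step
    out = F.softmax(scores, dim=-1) @ v_cache           # no re-attention over old pairs
    return out, k_cache, v_cache

k_cache = torch.empty(0, d_head)
v_cache = torch.empty(0, d_head)
for _ in range(5):                                      # decode 5 tokens
    x_new = torch.randn(1, d_model)
    out, k_cache, v_cache = decode_step(x_new, k_cache, v_cache)
print(k_cache.shape)  # torch.Size([5, 8])
```

Without the cache, step t would recompute K/V for all t previous tokens; with it, serving pays memory (the growing cache) to avoid that recomputation, mirroring gradient recomputation in training, where the trade goes the other way.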