Week 1 CS7670 09/05 2025
https://naizhengtan.github.io/25fall/

□ 0. in-class policies
□ 1. Intro to this course
□ 2. Assumptions
□ 3. A bird's-eye view of LLM systems research
□ 4. Attention
□ 5. Mechanics and admin

----

Admin: for the rest of the semester, class starts at 10AM

0. in-class policies

-- no laptops
   [why? context switches & interruptions are expensive]
-- chocolate
   [why? willpower and energy]
-- lottery
   explain the lottery system
   [some advantages of a lottery system]

Q: why do you want to take the course?
[will circle back to this question]

1. Intro to this course

- What is a seminar?
-- [ask students]
-- SOTA research vs. knowledge breadth
-- difference between education and research
   * driven by your own interests
   * Northeastern is well known for educating students (but not necessarily inspiring them)
-- you need __your own opinions__
   * often people don't have any; why?
     - they don't care
     - they don't know enough
     - they don't think they can, or that it's worthwhile
     - they lack the motivation to freely explore what they're interested in
   * ironically, it is really easy to find content on the Internet, with the help of AI
     - yet fewer people take risks (why?)

- philosophy of the class
  1) low-stress, high-reward
     What does that mean? many discussions that require you to give opinions
  2) "no pain, no gain" still applies
     * effort spent -> how much you gain/grow
     * the course is designed with a WIDE range of how much you might get out of it
     * again, gaining depth requires time
  3) the course covers the foundations
     * this course provides you pointers, paths, and actionable items to explore
     * but it is you who decides whether you want to, or have the willpower to, get out into the wild
     * overall, we have 14 classes (45 min each) of lecturing
  4) show by example---the systems approach to connecting concepts with implementation
     - examples are toys; you can...
       ...either go deep into the bloody details...
       ...or go cutting edge, with recent (possibly advanced) optimizations
  5) opinions matter
     - opinions, including strong or critical ones, are welcome; we encourage them

- What to expect:
-- reading one paper per week
-- expect to ask and answer questions

- Q: Why do you want to take this course?
  You'll be glad if you...
  * care about what goes on under the hood
  * like systems and infrastructure
  * care about high performance

- How will we study LLM systems?
-- read basic and fundamental papers
-- read cutting-edge papers
-- interact with frontier researchers, from both academia and industry
-- a lot of discussion to form your opinions

- course goals:
-- breadth: an overview of LLM systems (my job)
-- depth in three ways (your job):
   1. **as an end-user**: play with many existing services and become familiar with them
   2. **build an LLM from scratch**: know the basics at the code level
   3. **understand the state of the art**: read the most recent papers, know the related work of relevant papers, and understand the trade-offs made by cutting-edge papers

2. Assumptions

a. know core concepts of LLMs
   [a quick survey here]
b. feel comfortable with Python and PyTorch
c. can read systems papers

3. Overview of LLM systems research

- Q: What are LLM systems? give one example of an LLM system
  [ask students]
-- is PyTorch an LLM system?
-- is cuBLAS an LLM system?
-- is vLLM an LLM system?

* LLM under the lens of systems:
  computer systems > data systems > ML systems > LLM systems
  distributed systems (fault tolerance), HPC      +-> training
  concurrent systems, GPU programming             +-> serving

* LLM under the lens of ML:
  AI > ML > DL (FFNN, CNN, RNN, Transformer)
                                        +-> LLM

- What are the ultimate goals of LLM systems research?
[ask students]

(my opinion) three major topics in this area:
-- efficient (in GPU cycles) and reliable (in final outcome) training
-- cheap (in dollars) and fast (in wall-clock latency) serving
-- understanding various trade-offs
   for example:
   * trade-off between throughput and latency (batching)
   * trade-off between memory and computation (recomputing gradients in training; the KV cache in serving)
   * trading correctness/accuracy for runtime performance (?!)

Aside: if an optimization is strictly better than all alternatives without requiring any trade-off, it constitutes a fundamental advancement. We have observed such cases in LLM systems, but far less often in classic computer systems.

4. Attention

-- self-attention
   * dot-product self-attention
   * matrix-based self-attention
-- questions on HotCRP
   [see slides]
-- recent advancements at the model level
   ** pre-norm
   ** Rotary Positional Embeddings (RoPE)
   ** RMSNorm, SwiGLU
   ** recent attention variants
   ** Grouped-Query Attention (GQA)
   ** sliding-window attention
   ** MoE

5. Mechanics and admin

a. communication
   us-to-you:
   -- homepage, announcements: check these regularly
   -- your NEU email (seldom)
   you-to-us:
   -- HotCRP: where you review papers and hold discussions
   -- my email, for admin/sensitive matters

b. components of the course:
   -- my lectures
   -- guest lectures
   -- a final writeup: an opinion paper (can team up)

c. lectures
   [take a look at the schedule]
   * my lectures
   -- paper-oriented; you must submit your review before the class
   -- attending: no roll call, but...we will randomly pick students to answer questions (lottery)
   -- notes will be published, but they will be hard to understand if you miss the lecture
   -- asking questions in class is encouraged
   * guest lectures:
   -- attending is required
   -- asking questions is required (more later, in the policy)

d. final writeup:
   -- an opinion paper
   -- students who share the same opinion are encouraged to team up

e. final grade
   -- policy: participation (80%)
      * paper reviews (10 papers): 40%
      * attending in person: 20% (-2% for each absence)
      * questions to guest speakers (2 questions): 20%
   -- writeup (20%)
      * evaluated by quality
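
Appendix: the dot-product self-attention from Section 4 can be sketched in a few lines of PyTorch. This is a minimal, illustrative sketch; the tensor sizes, variable names, and single-head setup are my own assumptions, not from the lecture, and it omits masking and multi-head splitting.

```python
# Minimal single-head scaled dot-product self-attention (toy sizes are assumptions).
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head)."""
    q = x @ w_q                         # queries
    k = x @ w_k                         # keys
    v = x @ w_v                         # values
    d_head = q.shape[-1]
    scores = q @ k.T / d_head ** 0.5    # (seq_len, seq_len) pairwise similarities
    weights = F.softmax(scores, dim=-1) # each row is a distribution over positions
    return weights @ v                  # (seq_len, d_head) weighted mix of values

torch.manual_seed(0)
x = torch.randn(4, 8)                   # 4 tokens, d_model = 8
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([4, 8])
```

The "matrix-based" view from the lecture is the same computation: stacking all tokens into `x` lets one matrix multiply produce every query/key/value at once.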
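
Appendix: the memory-vs-computation trade-off from Section 3 (the KV cache in serving) can also be sketched. This toy decoding loop is an assumption-laden illustration, not any particular system's implementation: it keeps one head, fixed random weights, and no model around the attention. The point is that caching K/V makes each decoding step O(t) work at the cost of O(t) memory that grows with the sequence.

```python
# Toy autoregressive decoding with a KV cache (sizes and names are assumptions).
import torch
import torch.nn.functional as F

d_model, d_head = 8, 8
torch.manual_seed(0)
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))

def decode_step(x_new, k_cache, v_cache):
    """x_new: (1, d_model) hidden state of the newest token.
    k_cache/v_cache: (t, d_head) keys/values of all earlier tokens."""
    # Compute K/V only for the new token; reuse everything else from the cache.
    k_cache = torch.cat([k_cache, x_new @ w_k], dim=0)  # memory grows by one row
    v_cache = torch.cat([v_cache, x_new @ w_v], dim=0)
    q = x_new @ w_q
    scores = q @ k_cache.T / d_head ** 0.5              # (1, t+1): O(t) per step
    out = F.softmax(scores, dim=-1) @ v_cache           # no re-attention over old pairs
    return out, k_cache, v_cache

k_cache = torch.empty(0, d_head)
v_cache = torch.empty(0, d_head)
for _ in range(5):                                      # decode 5 tokens
    x_new = torch.randn(1, d_model)
    out, k_cache, v_cache = decode_step(x_new, k_cache, v_cache)
print(k_cache.shape)  # torch.Size([5, 8])
```

Without the cache, step t would recompute K/V for all t previous tokens; with it, serving pays memory (the growing cache) to avoid that recomputation, mirroring gradient recomputation in training, where the trade goes the other way.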