Schedule

  • This schedule is subject to change over the course of the semester.
  • Readings are to be completed before class.

Week 1

Friday (09/05)
Lecture: Introduction
Attention Is All You Need

Week 2

Tuesday (09/09)
Lecture: Training I
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Friday (09/12)
Lecture: Training II

Week 3

Tuesday (09/16)
Lecture: Serving I
vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention
Friday (09/19)
Lecture: Serving II

Week 4

Tuesday (09/23)
Paper reading
RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation
(optional) HybridFlow: A Flexible and Efficient RLHF Framework
Friday (09/26)
Guest lecture: Quanlu Zhang (Infinigence)
RLinf: Reinforcement Learning Infrastructure for Agentic AI

Week 5

Tuesday (09/30)
Lecture: Optimization I
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Friday (10/03)
Lecture: Optimization II

Week 6

Tuesday (10/07)
Paper reading: Training
Understanding Stragglers in Large Model Training Using What-if Analysis (OSDI’25)
Friday (10/10)
Guest lecture: Jinkun Lin (Cornell)
Understanding Stragglers in Large Model Training Using What-if Analysis

Week 7

Tuesday (10/14)
Paper reading: Fault tolerance
GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints (SOSP’23)
Friday (10/17)
Guest lecture: Zhuang Wang (AWS)
GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints

Week 8

Tuesday (10/21)
Paper reading: KV cache
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving (SIGCOMM’24)
Friday (10/24)
Guest lecture: Yuhan Liu (UChicago)
A Case for the KV Cache Layer: Enabling the Next Phase of Fast Distributed LLM Serving

Week 9

Tuesday (10/28)
Paper reading: LLM verification
TrainVerify: Equivalence-Based Verification for Distributed LLM Training (SOSP’25)
Friday (10/31)
Guest lecture: Yunchi Lu (UMich)
TrainVerify: Equivalence-Based Verification for Distributed LLM Training

Week 10

Tuesday (11/04)
Paper reading: Communication
An Extensible Software Transport Layer for GPU Networking
Friday (11/07)
Guest lecture: Yang Zhou (UC Davis)
UCCL: An Extensible Software Transport Layer for GPU Networking

Week 11

Tuesday (11/11)
No class (Veterans Day)
Friday (11/14)
Guest lecture: Yanghua Peng (ByteDance)
Large-Scale Multimodal LLM Training in Production

Week 12

Tuesday (11/18)
Paper reading: Distributed training
TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining (ICLR’25)
Friday (11/21)
Guest lecture: Chien-Chin Huang (Meta)
TorchTitan: a PyTorch Native Platform for Training Foundation Models

Week 13

Tuesday (11/25)
Guest lecture: Kaichao You (UCB)
vLLM: Easy, Fast, and Cheap LLM Serving for Everyone
Friday (11/28)
No class (fall break)