Project 1: LLM Training Speedrun
What is a speedrun?
A speedrun is an attempt to complete a task, system, or process as fast as possible. See the Super Mario speedrun (the champion finishes in 4m54s; no kidding!).
In this project, we will target the Marin Speedrun, an LLM training speedrun. A quote from the official Marin Speedrun website:
Speedrun is a community-driven initiative by the Marin project to track and optimize the training efficiency of large language models. Have a new architecture or training procedure that you think is more efficient? Participate in the Marin speedrun competition (inspired by the nanogpt speedrun), pick your compute budget, and create the fastest method to train a model to a certain quality!
Current Pareto frontier
Here is the current Pareto frontier (as of 09/08/2025) between training efficiency and model quality:
Metrics (X/Y-axis; a small sketch of both computations follows this list):
- Hardware FLOP Cost (efficiency): the floating-point operations the hardware used to train the model could have performed in the time the model was trained; lower values are better.
- C4-EN BPB (bits per byte) (quality): bits per byte on the validation portion of c4-en (the English portion of C4); lower values are better.
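To make the two axes concrete, here is a minimal Python sketch of how such numbers can be computed and how a Pareto frontier falls out of them. All function names, throughput figures, and loss totals below are hypothetical illustrations of ours, not Marin's actual implementation or data:

```python
import math

def hardware_flop_cost(peak_flops_per_sec: float, num_devices: int,
                       wall_clock_seconds: float) -> float:
    """FLOPs the hardware *could* have performed during training:
    peak throughput x device count x wall-clock time. Lower is better."""
    return peak_flops_per_sec * num_devices * wall_clock_seconds

def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy loss over the validation set (in nats)
    into bits per byte: divide by ln(2) to get bits, then by the byte count."""
    return total_loss_nats / math.log(2) / total_bytes

def pareto_frontier(runs: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep the runs not dominated on (flop_cost, bpb); lower is better on both axes."""
    frontier, best_bpb = [], float("inf")
    for cost, bpb in sorted(runs):       # ascending FLOP cost
        if bpb < best_bpb:               # better quality than every cheaper run
            frontier.append((cost, bpb))
            best_bpb = bpb
    return frontier

# Made-up example: 8 devices at ~312 TFLOP/s peak each, trained for 6 hours.
cost = hardware_flop_cost(312e12, 8, 6 * 3600)                  # ~5.4e19 FLOPs
bpb = bits_per_byte(total_loss_nats=2.1e9, total_bytes=3_000_000_000)  # ~1.01 BPB
print(f"{cost:.3e} FLOPs, {bpb:.3f} BPB")
print(pareto_frontier([(1e19, 1.10), (2e19, 1.02), (3e19, 1.05), (5e19, 0.98)]))
```

In this toy example the (3e19, 1.05) run is dropped: a cheaper run already achieved a lower BPB, so it is not on the frontier. Advancing the frontier means producing a point like the remaining three.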
Goals
- Gain hands-on experience with LLM training
- Advance the Pareto frontier in the Marin Speedrun
- Write an opinion piece reflecting on LLM training and on participating in the speedrun
Who should care?
- those interested in hands-on LLM training
- those researching training algorithms (e.g., new optimizers)
- those working on system-level training efficiency
Milestones to run the project
- form a group (ideally 2–4 people)
- build infrastructure for the speedrun (machines, tools, TODO tracking, shared documentation)
- submit one successful result to the Marin project (as a starting point)
- propose and develop optimizations
- contribute a Pareto frontier result to the Marin project
- write the opinion piece