Project 1: LLM Training Speedrun
What is a speedrun?
A speedrun is an attempt to complete a task, system, or process as fast as possible. See the Super Mario speedrun (the champion finishes in 4m54s; no kidding!).
In this project, we will target the Marin Speedrun, an LLM training speedrun. A quote from the official Marin Speedrun website:
Speedrun is a community-driven initiative by the Marin project to track and optimize the training efficiency of large language models. Have a new architecture or training procedure that you think is more efficient? Participate in the Marin speedrun competition (inspired by the nanogpt speedrun), pick your compute budget, and create the fastest method to train a model to a certain quality!
Current Pareto frontier
Here is the current Pareto frontier (as of 09/08/2025) between training efficiency and model quality:
Metrics (X/Y-axis; a small sketch of both computations follows this list):
- Hardware FLOP Cost (efficiency): the floating-point operations the hardware used to train the model could have performed in the time the model was trained; lower values are better.
- C4-EN BPB (bits per byte) (quality): bits per byte on the validation portion of c4-en (the English portion of C4); lower values are better.
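To make the two axes concrete, here is a minimal Python sketch of how such numbers can be computed and how a Pareto frontier falls out of them. All function names, throughput figures, and loss totals below are hypothetical illustrations of ours, not Marin's actual implementation or data:

```python
import math

def hardware_flop_cost(peak_flops_per_sec: float, num_devices: int,
                       wall_clock_seconds: float) -> float:
    """FLOPs the hardware *could* have performed during training:
    peak throughput x device count x wall-clock time. Lower is better."""
    return peak_flops_per_sec * num_devices * wall_clock_seconds

def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy loss over the validation set (in nats)
    into bits per byte: divide by ln(2) to get bits, then by the byte count."""
    return total_loss_nats / math.log(2) / total_bytes

def pareto_frontier(runs: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep the runs not dominated on (flop_cost, bpb); lower is better on both axes."""
    frontier, best_bpb = [], float("inf")
    for cost, bpb in sorted(runs):       # ascending FLOP cost
        if bpb < best_bpb:               # better quality than every cheaper run
            frontier.append((cost, bpb))
            best_bpb = bpb
    return frontier

# Made-up example: 8 devices at ~312 TFLOP/s peak each, trained for 6 hours.
cost = hardware_flop_cost(312e12, 8, 6 * 3600)                  # ~5.4e19 FLOPs
bpb = bits_per_byte(total_loss_nats=2.1e9, total_bytes=3_000_000_000)  # ~1.01 BPB
print(f"{cost:.3e} FLOPs, {bpb:.3f} BPB")
print(pareto_frontier([(1e19, 1.10), (2e19, 1.02), (3e19, 1.05), (5e19, 0.98)]))
```

In this toy example the (3e19, 1.05) run is dropped: a cheaper run already achieved a lower BPB, so it is not on the frontier. Advancing the frontier means producing a point like the remaining three.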
Goals
- Gain hands-on experience with LLM training
- Advance the Pareto frontier in the Marin Speedrun
- Write an opinion piece reflecting on LLM training and on participating in the speedrun
Who should care?
- those interested in hands-on LLM training
- those researching training algorithms (e.g., new optimizers)
- those working on system-level training efficiency
Milestones to run the project
- form a group (ideally 2–4 people)
- build infrastructure for the speedrun (machines, tools, TODO tracking, shared documentation)
- submit one successful result to the Marin project (as a starting point)
- propose and develop optimizations
- contribute a Pareto frontier result to the Marin project
- write the opinion piece