Zero Bubble Pipeline Parallelism

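The core idea is to split the backward pass into two flows: one that computes gradients with respect to the parameters, and one that computes gradients with respect to each layer's input (the output of the previous layer). Only the second is needed by the preceding pipeline stage, so the parameter-gradient work can be scheduled into slots where a device would otherwise sit idle waiting (the bubble).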
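To make the split concrete, here is a minimal sketch for a single toy linear layer y = W x, written in plain Python. The function names (`backward_input`, `backward_weight`) and the `deferred` queue are hypothetical, chosen only for illustration; this is not the paper's implementation, just one way the two backward flows could be separated.

```python
# Toy linear layer y = W @ x, in pure Python.
# All names here are hypothetical and for illustration only.

def forward(W, x):
    # Forward pass: y[i] = sum_j W[i][j] * x[j]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def backward_input(W, grad_y):
    # Input-gradient flow: grad_x = W^T @ grad_y.
    # The previous pipeline stage needs this to continue its own backward pass.
    rows, cols = len(W), len(W[0])
    return [sum(W[i][j] * grad_y[i] for i in range(rows)) for j in range(cols)]

def backward_weight(x, grad_y):
    # Parameter-gradient flow: grad_W = outer(grad_y, x).
    # No other stage depends on this result, so it can be postponed.
    return [[g * xi for xi in x] for g in grad_y]

W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]
y = forward(W, x)

grad_y = [1.0, 0.5]                  # gradient arriving from the next stage
grad_x = backward_input(W, grad_y)   # computed right away, sent upstream

# The weight gradient is queued and run later, when the device would be idle.
deferred = [(x, grad_y)]
grad_W = [backward_weight(saved_x, saved_g) for saved_x, saved_g in deferred][0]
```

Because `backward_weight` needs only the saved input and the incoming gradient, a scheduler is free to run it at any point before the optimizer step, which is what lets it fill otherwise idle slots.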

Listen on your favorite platforms

Listen to the Episode

The (AI) Team

Alex Askwell: Our curious and knowledgeable moderator, always ready with the right questions to guide our exploration.
Dr. Paige Turner: Our lead researcher and paper expert, diving deep into the methods and results.
Prof. Wyd Spectrum: Our field expert, providing broader context and critical insights.
