Zero Bubble Pipeline Parallelism

Systems and Performance
Deep Learning
Machine Learning
Published

July 8, 2024

The core idea: split the backward pass into two computations, one for the gradient with respect to the stage's input (which must be passed back to the previous pipeline stage right away) and one for the gradient with respect to the parameters (which can be deferred). Scheduling the deferred weight-gradient work into the idle gaps keeps every stage busy instead of waiting in a pipeline bubble.
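To make the split concrete, here is a minimal PyTorch-style sketch (an illustration of the idea, not the paper's schedule or implementation): one stage computes its input gradient first, so it could hand that off to the previous stage immediately, and fills in its parameter gradients later. The layer, shapes, and the stand-in `grad_out` are hypothetical.

```python
import torch
import torch.nn as nn

# A minimal sketch of splitting one stage's backward pass into two parts:
#   B: gradient w.r.t. the stage's input  -> needed upstream right away
#   W: gradient w.r.t. the stage's weights -> can be deferred to fill bubbles

layer = nn.Linear(16, 16)                  # hypothetical pipeline stage
x = torch.randn(4, 16, requires_grad=True)
out = layer(x)
grad_out = torch.randn_like(out)           # stands in for the gradient arriving from the next stage

# B step: only the input gradient; keep the graph alive for the deferred W step.
(grad_x,) = torch.autograd.grad(out, x, grad_out, retain_graph=True)

# ... grad_x would be sent to the previous pipeline stage here ...

# W step: parameter gradients, computed later when the stage would otherwise idle.
grad_w, grad_b = torch.autograd.grad(out, (layer.weight, layer.bias), grad_out)
layer.weight.grad = grad_w
layer.bias.grad = grad_b
```

Because the W step has no downstream consumer, a scheduler is free to slot it wherever a stage would otherwise sit idle, which is what drives the bubble toward zero.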

Listen to the Episode

The (AI) Team

  • Alex Askwell: Our curious and knowledgeable moderator, always ready with the right questions to guide our exploration.
  • Dr. Paige Turner: Our lead researcher and paper expert, diving deep into the methods and results.
  • Prof. Wyd Spectrum: Our field expert, providing broader context and critical insights.

Listen on your favorite platforms

Spotify · Apple Podcasts · YouTube · RSS Feed