Core idea is think about backward pass into two flows, one to compute grad wrt to parameters, and one to compute grad wrt to output of last layer, schedule so that you are always working instead of waiting (bubble).
Listen to the Episode
Related Links
The (AI) Team
- Alex Askwell: Our curious and knowledgeable moderator, always ready with the right questions to guide our exploration.
- Dr. Paige Turner: Our lead researcher and paper expert, diving deep into the methods and results.
- Prof. Wyd Spectrum: Our field expert, providing broader context and critical insights.