Models tell you what to discard

Systems and Performance
Machine Learning
Optimization
Published July 18, 2024

This paper introduces FastGen, a method that combines lightweight model profiling with adaptive key-value (KV) caching to significantly reduce the memory footprint of large language model (LLM) generative inference without noticeable quality loss.
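To make "profiling plus adaptive KV caching" concrete, here is a minimal sketch of the idea. It profiles one attention head's attention map over the prompt and picks the cheapest cache policy that still recovers most of that attention. The candidate policies (special tokens, punctuation, frequently attended tokens, a recent-token window, combined cumulatively, with a full-cache fallback) are loosely modeled on the policy families the FastGen paper describes. The recovery criterion used here (mean retained attention mass per query row) is a simplifying assumption. The function names, threshold, window size, frequency ratio, and token sets are all hypothetical. This is not the paper's implementation.

```python
import math
import numpy as np

# Hypothetical candidate policies, ordered from cheapest to most expensive.
# Each entry is the union of the basic policies it names.
CANDIDATES = [
    ("special",),
    ("special", "punct"),
    ("special", "punct", "frequent"),
    ("special", "punct", "frequent", "local"),
]


def policy_mask(policy, tokens, attn, special, punct, window, freq_ratio):
    """Boolean (n, n) mask: entry [i, j] is True if query i can still see key j."""
    n = len(tokens)
    cols = np.zeros(n, dtype=bool)  # keys kept for every query
    if "special" in policy:
        cols |= np.array([t in special for t in tokens], dtype=bool)
    if "punct" in policy:
        cols |= np.array([t in punct for t in tokens], dtype=bool)
    if "frequent" in policy:
        # Keep the keys that received the most accumulated attention.
        k = max(1, math.ceil(freq_ratio * n))
        cols[np.argsort(attn.sum(axis=0))[-k:]] = True
    mask = np.broadcast_to(cols, (n, n)).copy()
    if "local" in policy:
        # Each query also keeps its `window` most recent keys (itself included).
        i, j = np.indices((n, n))
        mask |= (i - j >= 0) & (i - j < window)
    return mask


def choose_policy(tokens, attn, threshold=0.95,
                  special=frozenset({"<s>"}),
                  punct=frozenset({".", ",", "?", "!"}),
                  window=4, freq_ratio=0.2):
    """Return the cheapest candidate whose kept keys retain at least `threshold`
    of the attention mass (averaged over query rows), else the full cache.

    `attn` is one head's causal attention over the prompt, rows summing to 1.
    """
    for policy in CANDIDATES:
        mask = policy_mask(policy, tokens, attn, special, punct, window, freq_ratio)
        recovered = (attn * mask).sum(axis=1).mean()
        if recovered >= threshold:
            return policy
    return ("full",)
```

For example, a head that sends all of its attention to the `<s>` token gets the cheapest policy, `("special",)`. A head with uniform causal attention needs the local window as well, or falls back to the full cache if the threshold is set above 1. Because the candidates are ordered by cost, returning the first one that passes also returns the cheapest one that passes.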


The (AI) Team

  • Alex Askwell: Our curious and knowledgeable moderator, always ready with the right questions to guide our exploration.
  • Dr. Paige Turner: Our lead researcher and paper expert, diving deep into the methods and results.
  • Prof. Wyd Spectrum: Our field expert, providing broader context and critical insights.

Listen on your favorite platforms: Spotify, Apple Podcasts, YouTube, RSS Feed