TiTok: A Transformer-based 1D Tokenization Approach for Image Generation

Generative Models
Computer Vision
Transformers
Published

July 18, 2024

TiTok introduces a novel 1D tokenization method for image generation, enabling the representation of images with significantly fewer tokens while maintaining or surpassing the performance of existing 2D grid-based methods. The approach leverages a Vision Transformer architecture, two-stage training with proxy codes, and achieves remarkable speedup in training and inference. The research opens up new possibilities for efficient and high-quality image generation, with implications for various applications in computer vision and beyond.

Listen on your favorite platforms

Spotify Apple Podcasts YouTube RSS Feed

Listen to the Episode

The (AI) Team

  • Alex Askwell: Our curious and knowledgeable moderator, always ready with the right questions to guide our exploration.
  • Dr. Paige Turner: Our lead researcher and paper expert, diving deep into the methods and results.
  • Prof. Wyd Spectrum: Our field expert, providing broader context and critical insights.