ML//diffusion model//Imagen

2026-03-07

Google's text-to-image model family: Imagen (2022), Imagen 2 (2023), Imagen 3 (2024).

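Architecture (original 2022 Imagen): frozen T5-XXL text encoder → base diffusion model at 64×64 → two super-resolution diffusion models (64×64 → 256×256 → 1024×1024). Every diffusion stage is conditioned on the text embeddings.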
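
A minimal sketch of the cascade's resolution flow, to make the stage ordering concrete. It is not Imagen's code: the stage names, the `Stage` class, and `run_cascade` are hypothetical, and the text embedding is a placeholder string rather than real T5-XXL output. No denoising is performed; the function only checks that each stage's input resolution matches the previous stage's output.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    name: str               # hypothetical label, not an official component name
    in_res: Optional[int]   # None: the base model starts from pure noise
    out_res: int


# Resolutions taken from the note: 64 -> 256 -> 1024.
CASCADE = (
    Stage("base", None, 64),
    Stage("super_res_1", 64, 256),
    Stage("super_res_2", 256, 1024),
)


def run_cascade(prompt: str, stages: Sequence[Stage] = CASCADE) -> List[Tuple[str, int]]:
    """Trace the resolution flow only; no real encoding or denoising happens."""
    # Placeholder for the frozen text encoder's output (assumption for illustration).
    text_emb = f"<T5-XXL embedding of {prompt!r}>"
    current: Optional[int] = None
    trace: List[Tuple[str, int]] = []
    for stage in stages:
        if stage.in_res != current:
            raise ValueError(f"{stage.name} expects {stage.in_res}, got {current}")
        # A real stage would run a text-conditioned diffusion sampler here,
        # using text_emb and, for super-resolution, the lower-resolution image.
        current = stage.out_res
        trace.append((stage.name, current))
    return trace


print(run_cascade("a corgi riding a bicycle"))
```

Each super-resolution stage upsamples by 4× per side, so most of the pixel count is produced by the upsamplers while the base model fixes the composition at 64×64.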

Key finding: scaling the text encoder improves sample fidelity and image-text alignment more than scaling the image diffusion model (U-Net). Language understanding drives image quality.

Less open than Stable Diffusion (model weights are not publicly released) but competitive in quality. Google's counterpart to OpenAI's DALL-E.