ML//diffusion model//DiT
- Diffusion Transformer (DiT): replace the U-Net backbone in diffusion models with a standard transformer
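Sora, Flux, and SD3 (Stable Diffusion 3) all use DiT variants. Transformers tend to scale better than U-Nets at large model sizes: the DiT work reports sample quality improving steadily as compute is added.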
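Split the image (in latent-diffusion models, its VAE latent) into patches, embed each patch as a token, apply transformer blocks, then unpatchify the output tokens back into a spatial grid.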
Architecture convergence: leading image and video generators now share the transformer backbone used by language models.
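
The patch / unpatch step in the notes above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function names `patchify` and `unpatchify`, the 32x32x4 latent shape, and the patch size of 2 are picked for the example and are not taken from any particular DiT codebase. The transformer blocks themselves are left out, so the round trip simply recovers the input.

```python
import numpy as np

def patchify(x, p):
    """Split an (H, W, C) array into (N, p*p*C) tokens, with N = (H/p) * (W/p)."""
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by the patch size"
    gh, gw = h // p, w // p
    x = x.reshape(gh, p, gw, p, c)      # cut rows and columns into p-sized blocks
    x = x.transpose(0, 2, 1, 3, 4)      # (gh, gw, p, p, c): one patch per grid cell
    return x.reshape(gh * gw, p * p * c)

def unpatchify(tokens, p, h, w, c):
    """Inverse of patchify: (N, p*p*C) tokens back to an (H, W, C) array."""
    gh, gw = h // p, w // p
    x = tokens.reshape(gh, gw, p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)      # (gh, p, gw, p, c)
    return x.reshape(h, w, c)

# Hypothetical input: a 32x32 latent with 4 channels, patch size 2.
latent = np.arange(32 * 32 * 4, dtype=np.float32).reshape(32, 32, 4)
tokens = patchify(latent, p=2)          # 256 tokens, each of length 2*2*4 = 16
# ... transformer blocks would process `tokens` here (omitted in this sketch) ...
restored = unpatchify(tokens, p=2, h=32, w=32, c=4)
```

In a real model the flattened patches are typically passed through a learned linear projection and given position embeddings before the transformer blocks, and a final linear layer maps each output token back to patch size before unpatchifying. Smaller patches mean more tokens and therefore more compute per image.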