ML//diffusion model//DiT

- Diffusion Transformer: replace the U-Net backbone in diffusion models with a standard transformer


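Sora, Flux, and SD3 all use DiT variants. The main argument for transformers over U-Nets is scaling: in the original DiT experiments, sample quality improved steadily as transformer compute increased.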

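Pipeline: patchify the image (in practice usually a VAE latent) into a sequence of tokens, apply transformer blocks, then unpatchify the output tokens back into the original spatial layout.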
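The patchify / unpatchify round trip can be sketched in plain Python. This is only an illustration under simplifying assumptions: a single-channel 4x4 grid, a patch size of 2, and an identity step standing in for the transformer blocks. The function names `patchify` and `unpatchify` are chosen here for the sketch and are not taken from any particular library; real implementations typically use tensor reshapes plus a learned linear projection on each patch.

```python
def patchify(image, p):
    """Split an H x W grid (list of rows) into flattened p x p patches.

    Patches are emitted in row-major order, so the result is a sequence of
    (H/p * W/p) tokens, each of length p * p.
    """
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    tokens = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            tokens.append(
                [image[top + i][left + j] for i in range(p) for j in range(p)]
            )
    return tokens


def unpatchify(tokens, p, h, w):
    """Inverse of patchify: place each token back at its patch location."""
    image = [[None] * w for _ in range(h)]
    patches_per_row = w // p
    for n, token in enumerate(tokens):
        top = (n // patches_per_row) * p
        left = (n % patches_per_row) * p
        for k, value in enumerate(token):
            image[top + k // p][left + k % p] = value
    return image


# Toy 4x4 "image" holding the values 0..15 in row-major order.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

tokens = patchify(image, p=2)       # 4 tokens, each of length 4
processed = tokens                  # identity stand-in for the transformer blocks
restored = unpatchify(processed, p=2, h=4, w=4)

print(tokens[0])           # top-left patch: [0, 1, 4, 5]
print(restored == image)   # True
```

The sequence length is (H/p) * (W/p), so halving the patch size quadruples the number of tokens the transformer has to process.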

Architecture convergence: image generation now uses the same transformer backbone as language models.