ML//Multimodal//CLIP

- Contrastive Language-Image Pre-training (OpenAI, 2021)


Image encoder + text encoder, trained jointly with a contrastive loss so that matching image-text pairs map to nearby points in a shared embedding space and mismatched pairs are pushed apart.
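A minimal sketch of that training signal, assuming a symmetric cross-entropy over temperature-scaled cosine similarities. The function names, the toy 2-D embeddings, and the fixed temperature of 0.07 are illustrative assumptions, not taken from these notes; real encoders produce much larger vectors and the temperature is typically learned.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """Cross-entropy of one list of logits against the index of the correct entry."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; image_embs[i] and text_embs[i] form a matching pair."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: each row should put its mass on its own column
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: each column should put its mass on its own row
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Toy embeddings, made up for illustration
images = [[1.0, 0.1], [0.1, 1.0]]
texts_aligned = [[0.9, 0.0], [0.0, 0.8]]   # pair i matches image i
texts_swapped = [[0.0, 0.8], [0.9, 0.0]]   # pairs deliberately mismatched

aligned = contrastive_loss(images, texts_aligned)
swapped = contrastive_loss(images, texts_swapped)
print(aligned < swapped)
```

The loss is low when each image is closest to its own caption and high when the pairs are swapped, which is the pressure that pulls matching pairs together in the shared space.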

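Zero-shot classification: embed the image and one text prompt per class (e.g. "a photo of a cat"), then pick the class whose prompt embedding is most similar to the image embedding. Cat images match the cat prompt with no cat-specific training.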
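A rough sketch of the zero-shot step, assuming the embeddings have already been computed. The vectors below are hand-made stand-ins for real encoder outputs, and `zero_shot_classify` is a hypothetical helper name, not part of any CLIP library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_emb, prompt_embs):
    """Return the prompt whose embedding is most similar to the image embedding."""
    return max(prompt_embs, key=lambda p: cosine(image_emb, prompt_embs[p]))

# Hypothetical vectors standing in for text-encoder outputs, one per class prompt
prompt_embs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
# Hypothetical vector standing in for the image-encoder output of a cat picture
cat_image_emb = [0.8, 0.3, 0.1]

prediction = zero_shot_classify(cat_image_emb, prompt_embs)
print(prediction)
```

Adding a new class only requires embedding one more prompt; no classifier head is retrained.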

A widely used bridge between vision and language: Stable Diffusion conditions image generation on the output of a CLIP text encoder.