ML//Training//contrastive learning

2026-03-08

Train on pairs or triplets: (query, positive match, negative match). Pull positive pairs closer together in vector space and push negative pairs apart.
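A minimal sketch of the triplet idea, assuming Euclidean distance and a margin of 1.0. The note specifies neither, and the vectors and function names (`euclidean`, `triplet_loss`) are made up for illustration. The loss is zero once the negative is at least `margin` farther from the query than the positive is.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(query, positive, negative, margin=1.0):
    """max(0, d(query, positive) - d(query, negative) + margin).

    The margin value is an assumption, not something from the note.
    """
    return max(0.0, euclidean(query, positive) - euclidean(query, negative) + margin)

# Toy 2-D embeddings (hypothetical values).
query = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negative = [0.0, 1.0]   # already far from the query
hard_negative = [0.8, 0.2]   # almost as close as the positive

easy_loss = triplet_loss(query, positive, easy_negative)
hard_loss = triplet_loss(query, positive, hard_negative)
print(easy_loss, hard_loss)
```

The easy negative already satisfies the margin and contributes no gradient, while the hard negative still produces a positive loss. This is one reason hard negatives tend to matter in practice.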

Example for RAG: "Capital of France?" + "Paris is the capital" → high cosine similarity. "Capital of France?" + "Tortilla recipe" → low similarity.
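To make the RAG example concrete, here is a small sketch of cosine similarity on hand-picked 3-D vectors. The embedding values are hypothetical stand-ins, not the output of any real model. They are chosen only so that the query and the Paris sentence point in a similar direction.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, standing in for a trained encoder's output.
query_vec = [0.9, 0.1, 0.0]      # "Capital of France?"
paris_vec = [0.8, 0.2, 0.1]      # "Paris is the capital"
tortilla_vec = [0.0, 0.1, 0.9]   # "Tortilla recipe"

sim_positive = cosine_similarity(query_vec, paris_vec)
sim_negative = cosine_similarity(query_vec, tortilla_vec)
print(sim_positive, sim_negative)
```

A retriever would rank the Paris passage above the tortilla passage because its similarity to the query is much higher.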

This is the training objective that makes sentence transformers work: it turns BERT's [CLS] vector from one that "predicts masked tokens" into one that "captures semantic similarity".
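The note does not say which contrastive loss is meant. One commonly used option is an in-batch softmax loss (InfoNCE style), where each query's positive is the correct "class" and the other positives in the batch act as negatives. The sketch below assumes that formulation, cosine similarity, and a temperature of 0.05. The function name `info_nce_loss` and the toy vectors are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce_loss(queries, positives, temperature=0.05):
    """Mean cross-entropy where positives[i] is the target for queries[i].

    Every other positive in the batch serves as an in-batch negative.
    The temperature value is an assumption.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [cosine_similarity(q, p) / temperature for p in positives]
        # Numerically stable log-sum-exp.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two (query, positive) pairs with hypothetical embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_positives = [[0.9, 0.1], [0.1, 0.9]]
swapped_positives = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = info_nce_loss(queries, aligned_positives)
swapped_loss = info_nce_loss(queries, swapped_positives)
print(aligned_loss, swapped_loss)
```

When each query is closest to its own positive the loss is near zero. When the pairs are swapped the loss is large, and that gap is the signal gradient descent uses to reshape the embedding space.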

BERT base without contrastive fine-tuning does a mediocre job at semantic search: its [CLS] vector was never optimized for similarity.

Also used in CLIP (text-image pairs), self-supervised learning, and representation learning broadly.