ML//Training//contrastive learning
Train with pairs or triplets: (query, positive match, negative match). Pull positive pairs closer in vector space, push negative pairs apart.
Example for RAG: "Capital of France?" + "Paris is the capital" → high cosine similarity. "Capital of France?" + "Tortilla recipe" → low similarity.
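A minimal sketch of the pull/push idea as a triplet margin loss over cosine similarity. The 3-d vectors are hand-picked stand-ins for real embeddings, and the margin of 0.5 is an arbitrary assumption, not a recommended value.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def triplet_loss(query, positive, negative, margin=0.5):
    """Zero once the positive beats the negative by at least `margin` in similarity."""
    sim_pos = cosine_similarity(query, positive)
    sim_neg = cosine_similarity(query, negative)
    return max(0.0, margin - (sim_pos - sim_neg))

# Hypothetical embeddings, chosen by hand for illustration only.
query = [1.0, 0.0, 0.0]     # "Capital of France?"
positive = [0.9, 0.1, 0.0]  # "Paris is the capital"
negative = [0.0, 1.0, 0.0]  # "Tortilla recipe"

well_separated = triplet_loss(query, positive, negative)
badly_separated = triplet_loss(query, negative, positive)  # roles swapped on purpose
```

With the roles swapped the loss is large, which is the signal that would move the embeddings during training. Once the positive is clearly ahead, the loss is zero and that triplet stops contributing gradient.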
Contrastive fine-tuning is what makes sentence transformers work: it repurposes BERT's pooled output (the [CLS] vector or a mean over token vectors), which pretraining shaped for masked-token and next-sentence prediction, into an embedding that captures semantic similarity.
BERT base without contrastive fine-tuning is mediocre at semantic search: the [CLS] vector wasn't optimized for similarity.
Also used in CLIP (text-image pairs), self-supervised learning, and representation learning broadly.
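CLIP-style training commonly uses the other items in a batch as negatives instead of hand-picked ones. Below is a rough sketch of that in-batch (InfoNCE-style) loss in one direction only; CLIP also averages in the reverse direction. The 2-d vectors and the temperature of 0.1 are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce_loss(queries, positives, temperature=0.1):
    """Mean cross-entropy where the correct match for queries[i] is positives[i]."""
    queries = [normalize(q) for q in queries]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, q in enumerate(queries):
        # Similarity of this query to every candidate in the batch.
        logits = [dot(q, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(queries)

# Hypothetical batch of two (query, positive) pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [aligned[1], aligned[0]]  # each query paired with the wrong item

loss_aligned = info_nce_loss(queries, aligned)
loss_shuffled = info_nce_loss(queries, shuffled)
```

The loss is near zero when each query is closest to its own positive and grows when the pairing is wrong, so larger batches supply more negatives without extra labeling.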