ML//RAG

2026-02-15

Retrieval-augmented generation: before answering, retrieve relevant documents and stuff them into the context window alongside the question.
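The "stuffing" step is plain string assembly. A minimal sketch, assuming the documents have already been retrieved; `build_prompt`, the prompt wording, and the `max_chars` budget are all hypothetical (a real system would count tokens, not characters):

```python
def build_prompt(question: str, documents: list[str], max_chars: int = 2000) -> str:
    """Place retrieved documents ahead of the question in a single prompt."""
    context_parts = []
    used = 0
    for i, doc in enumerate(documents, start=1):
        entry = f"[{i}] {doc.strip()}"
        # Stop once the (assumed) character budget is exhausted.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


docs = ["Paris is the capital of France.", "France uses the euro."]
prompt = build_prompt("What is the capital of France?", docs)
print(prompt)
```

Numbering the entries lets the model cite which document an answer came from.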

Sources: web search, databases, PDFs. Documents are typically chunked, embedded, and stored in a vector database, then retrieved by cosine similarity to the query embedding.
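Cosine similarity retrieval in miniature. A sketch only: the three-dimensional vectors are made-up toy values standing in for real embeddings, and `top_k` is a hypothetical brute-force helper (a vector database would use an approximate nearest-neighbor index instead):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Score every (text, vector) pair against the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index: (chunk text, made-up embedding).
index = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.1]),
    ("office dog photos", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]  # pretend embedding of "how do I get my money back?"
results = top_k(query, index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Cosine similarity ignores vector length and compares direction only, which is why embeddings are often normalized once at indexing time so that a dot product suffices.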

Powered by sentence transformers: BERT fine-tuned with contrastive learning so that a pooled output ([CLS], or more commonly the mean of the token embeddings) captures semantic similarity instead of just serving masked-token prediction.
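What "contrastive" means in practice: pull each sentence toward its paraphrase, push it away from everything else in the batch. The note does not say which objective is used, so this is a sketch of one common choice, an InfoNCE-style loss with in-batch negatives; the `temperature` value and the toy vectors are assumptions:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean cross-entropy where anchor i should match positive i.

    Every other positive in the batch acts as a negative for anchor i.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        # Cosine similarities scaled by temperature act as logits.
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]     # each positive matches its anchor
misaligned = [[0.0, 1.0], [1.0, 0.0]]  # positives swapped
print(info_nce_loss(anchors, aligned), info_nce_loss(anchors, misaligned))
```

The loss is near zero when matching pairs are the most similar in the batch and large when they are not; gradient descent on it is what reshapes the embedding space.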

GPT-style decoders have no [CLS]: embedding models built on them often take the last token's hidden state, since under causal masking it is the only position that attends to the whole sequence.
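The pooling choices side by side, on toy data. A sketch assuming right-padded sequences and a 0/1 attention mask; the hidden states are invented numbers, not model output, and the function names are hypothetical:

```python
def cls_pool(token_vectors):
    """Encoder-style: use the first position, where [CLS] sits."""
    return list(token_vectors[0])


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real (non-padding) tokens."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    dim = len(kept[0])
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]


def last_token_pool(token_vectors, attention_mask):
    """Decoder-style: use the last real token, which has seen all the others."""
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return list(token_vectors[last])


# Four positions, two dimensions; the final position is padding.
hidden = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

print(cls_pool(hidden))               # first position
print(mean_pool(hidden, mask))        # average of the three real tokens
print(last_token_pool(hidden, mask))  # third position, not the padding
```

Indexing by the mask rather than taking `hidden[-1]` matters: with padding, the literal last position is not a real token.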

Training is expensive and slow. RAG is cheap by comparison: no weight updates, just context window manipulation, and new documents are usable as soon as they are indexed.

The dark side: if the retrieved documents are garbage, the model will confidently synthesize garbage. RAG is only as smart as its search engine.