ML//RAG

Retrieval-augmented generation — before answering, retrieve relevant documents and stuff them into the context window alongside the question.
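A minimal sketch of the "retrieve, then stuff into the context window" step. It assumes a retriever has already returned a list of text chunks. The function name `build_rag_prompt` and the prompt template are hypothetical illustrations, not a standard format.

```python
def build_rag_prompt(question: str, documents: list[str]) -> str:
    """Concatenate retrieved documents and the question into one prompt string.

    The template is an illustrative assumption; real systems vary widely.
    """
    # Number each retrieved chunk so the model can refer back to it.
    context = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = ["Paris is the capital of France.", "The Seine flows through Paris."]
    print(build_rag_prompt("What is the capital of France?", docs))
```

The model itself is unchanged; only the text it receives is different.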


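Sources: web search, databases, PDFs. For a stored corpus, documents are typically split into chunks, embedded, and kept in a vector database; at query time the question is embedded too, and the chunks whose vectors have the highest cosine similarity to it are retrieved.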
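A minimal sketch of cosine-similarity retrieval, assuming a toy in-memory "vector database" (a plain dict from document id to vector). The names `cosine_similarity` and `top_k` are hypothetical, and the 3-dimensional vectors stand in for real embeddings, which have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force search: score every stored vector against the query, keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings for three stored chunks.
    index = {
        "doc_cats": [0.9, 0.1, 0.0],
        "doc_dogs": [0.8, 0.3, 0.1],
        "doc_tax": [0.0, 0.2, 0.9],
    }
    print(top_k([1.0, 0.2, 0.0], index, k=2))
```

Production vector databases usually replace this linear scan with an approximate nearest-neighbour index so search stays fast over millions of vectors; the scoring idea is the same.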

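The embeddings usually come from sentence transformers: BERT fine-tuned with contrastive learning so that a pooled output (the [CLS] vector or the mean of the token vectors, depending on the model) captures semantic similarity, instead of the model being trained only to predict masked tokens.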
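To make "contrastive learning" concrete, here is a sketch of one common contrastive objective: an InfoNCE-style loss with in-batch negatives, where anchor i should score highest against positive i and every other positive in the batch acts as a negative. This is an illustration under that assumption, not necessarily the exact loss any particular sentence-transformer was trained with; the function names and the `temperature` default are hypothetical.

```python
import math


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_contrastive_loss(
    anchors: list[list[float]], positives: list[list[float]], temperature: float = 0.05
) -> float:
    """Average cross-entropy where the 'correct class' for anchor i is positive i."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every positive in the batch, scaled by 1/temperature.
        logits = [_cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # -log softmax(logits)[i]
        total += log_sum_exp - logits[i]
    return total / n
```

Here the vectors are fixed lists, so nothing is learned; in real training the encoder's weights are updated by the gradient of this loss, pulling matching pairs together and pushing mismatched ones apart.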

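GPT-style decoder models have no [CLS] token; decoder-based embedding models typically use the hidden state of the last token as the embedding. Under causal masking each token attends only to itself and earlier positions, so the last token is the only one that has seen the whole input, and fine-tuning trains it to carry the full meaning.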
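A sketch comparing the pooling strategies mentioned above, assuming `hidden` is one sequence's token vectors (a nested list standing in for a model's output) and `mask` marks real tokens with 1 and padding with 0. The function names are hypothetical, not a library API.

```python
Vector = list[float]


def cls_pool(hidden: list[Vector], mask: list[int]) -> Vector:
    """BERT-style: take the vector at position 0, the [CLS] token (mask unused)."""
    return hidden[0]


def mean_pool(hidden: list[Vector], mask: list[int]) -> Vector:
    """Average the token vectors, skipping padding positions."""
    kept = [h for h, m in zip(hidden, mask) if m]
    dim = len(hidden[0])
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]


def last_token_pool(hidden: list[Vector], mask: list[int]) -> Vector:
    """Decoder-style: take the last real (non-padding) token, the only position
    that has attended to the whole input under causal masking.
    Works with left or right padding as long as the mask marks real tokens."""
    last = max(i for i, m in enumerate(mask) if m)
    return hidden[last]


if __name__ == "__main__":
    hidden = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0], [9.0, 9.0]]  # last row is padding
    mask = [1, 1, 1, 0]
    print(cls_pool(hidden, mask), mean_pool(hidden, mask), last_token_pool(hidden, mask))
```

Ignoring padding matters for the last-token case: naively taking `hidden[-1]` would return the padding vector.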

Fine-tuning a model on new knowledge is expensive and slow. RAG is cheap and takes effect immediately: it changes nothing in the model's weights, only what goes into the context window.

The dark side: if the retrieved documents are garbage, the model will confidently synthesize garbage — RAG is only as smart as its search engine.