ML//RAG
Retrieval-augmented generation — before answering, retrieve relevant documents and stuff them into the context window alongside the question.
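The "stuff them into the context window" step can be sketched as plain string assembly. This is a minimal illustration, not a fixed format: build_prompt, the instruction wording, and the [Document i] labels are all hypothetical choices.

```python
def build_prompt(question: str, documents: list[str]) -> str:
    """Place retrieved documents ahead of the question in a single prompt."""
    # Label each document so it can be told apart inside the context.
    context = "\n\n".join(
        f"[Document {i}]\n{doc}" for i, doc in enumerate(documents, start=1)
    )
    # The instruction wording below is an assumption, not a standard template.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Calling build_prompt("What does RAG do first?", retrieved_docs) returns one string that is sent to the model as-is; nothing about the model itself changes.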
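Sources: web search, databases, PDFs. For a document store, the text is typically split into chunks and embedded ahead of time; the chunks live in a vector database and are retrieved by cosine similarity between the query embedding and each chunk embedding.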
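Cosine-similarity retrieval can be sketched in pure Python. The 3-dimensional vectors below are made-up toy values (real embeddings come from an encoder and have hundreds of dimensions), and top_k is a hypothetical brute-force helper, not a vector database API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index of (text, embedding) pairs; the numbers are invented for illustration.
index = [
    ("doc about cats", [0.9, 0.1, 0.0]),
    ("doc about dogs", [0.7, 0.3, 0.1]),
    ("doc about tax law", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.0, 0.0]
results = top_k(query, index, k=2)  # cats first, then dogs; tax law is dropped
```

A real vector database replaces the linear scan with an approximate nearest-neighbor index, but the scoring idea is the same.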
Embeddings come from sentence transformers: BERT fine-tuned with contrastive learning so that a pooled output vector ([CLS], or the mean of the token vectors) captures semantic similarity instead of only serving masked-token prediction.
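The notes do not say which contrastive loss is meant. As one common choice, here is a sketch of an InfoNCE-style loss with in-batch negatives: each anchor should score highest against its own positive, and the other positives in the batch act as negatives. The function names and the temperature of 0.05 are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean cross-entropy where anchors[i] should match positives[i];
    every other positive in the batch serves as a negative."""
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        # Cosine similarity to every positive, sharpened by the temperature.
        logits = [dot(a, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the correct pair.
        total += log_denominator - logits[i]
    return total / len(anchors)
```

The loss is near zero when matching pairs point the same way and large when they are swapped, which is the pressure that pulls paraphrases together in embedding space.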
GPT-style decoders have no [CLS] token; the last token's hidden state is used as the embedding, because under causal masking it is the only position that attends to the entire input.
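The two pooling choices differ only in which position is read out. A minimal sketch, assuming hidden_states is a list of per-token vectors for one sequence and attention_mask marks real tokens with 1 and right-side padding with 0 (both names borrowed loosely from common tokenizer output; the helpers themselves are hypothetical):

```python
def cls_pool(hidden_states, attention_mask):
    """Encoder style: the [CLS] token sits at position 0.
    The mask is unused here; it is kept so both poolers share a signature."""
    return hidden_states[0]


def last_token_pool(hidden_states, attention_mask):
    """Decoder style: read the last real (non-padding) token.
    Assumes right padding, so the mask looks like 1, 1, ..., 1, 0, ..., 0."""
    last_index = sum(attention_mask) - 1
    return hidden_states[last_index]


# Four positions, the last one is padding and must be ignored.
hidden = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
encoder_embedding = cls_pool(hidden, mask)         # vector at position 0
decoder_embedding = last_token_pool(hidden, mask)  # vector at position 2
```

Reading the literal last position instead of the last real token would return the padding vector, which is a common source of bad embeddings in batched inference.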
Training is expensive and slow; RAG is cheap by comparison: no weight updates, just a retrieval step and a longer prompt, so new knowledge is usable as soon as it is indexed.
The dark side: if the retrieved documents are garbage, the model will confidently synthesize garbage — RAG is only as smart as its search engine.