ML//RAG//ColBERT
Late-interaction retrieval: keep per-token embeddings instead of compressing into single vectors.
MaxSim: each query token is matched against its most similar document token, and the per-token maxima are summed into the document score. More expressive than a single cosine similarity between pooled vectors.
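A minimal sketch of MaxSim scoring in plain Python, assuming token embeddings are already L2-normalized so that a dot product equals cosine similarity. The names `dot`, `maxsim`, and the toy 2-dimensional vectors are illustrative and not taken from any library; real systems run the same computation as batched tensor operations.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token take the similarity to its
    best-matching document token, then sum those maxima over the query."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy data (hypothetical): two query tokens pointing along different axes.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # has a dedicated match for each query token
doc_b = [[0.6, 0.8]]              # a single token that partially matches both

scores = {"doc_a": maxsim(query, doc_a), "doc_b": maxsim(query, doc_b)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this toy case `doc_a` ranks first because each query token finds its own exact match, which is the kind of token-level signal a single pooled vector cannot keep separate.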
ColPali/ColQwen extend this to vision: retrieve documents by visual layout, not just text.
Bridges the gap between cheap bi-encoders and expensive cross-encoder rerankers.