ML//RAG//ColBERT

- Late-interaction retrieval: keep per-token embeddings instead of compressing into single vectors.


Late-interaction retrieval: keep per-token embeddings instead of compressing into single vectors.

MaxSim: each query token matches against its best document token, then sum. More expressive than cosine similarity.

ColPali/ColQwen extend to vision — retrieve documents by visual layout, not just text.

Bridge between cheap bi-encoders and expensive cross-encoder rerankers