ML//Multimodal//BLIP
BLIP: Bootstrapping Language-Image Pre-training (Salesforce Research). BLIP was released in 2022, BLIP-2 in 2023.
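BLIP-2 innovation: the Q-Former, a lightweight querying transformer that bridges a frozen image encoder and a frozen LLM. A small, fixed set of learnable query embeddings cross-attends to the image features, and the resulting query outputs are projected into the LLM's input embedding space as a visual prefix.
No need to fine-tune the vision or language model: only the Q-Former and the projection into the LLM are trained, so the trainable parameters are a small fraction of the full model.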
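A toy sketch of the bridging idea, in pure Python so it runs without any ML library. It is not the BLIP-2 implementation: the real Q-Former is a multi-layer transformer with self-attention, feed-forward blocks, and learned key/value projections. Assumptions here: a single attention step, query width equal to the image feature width, and made-up names and sizes (`cross_attend`, `bridge`, `NUM_QUERIES`, `VISION_DIM`, `LLM_DIM`). The point it illustrates is that the number of vectors handed to the LLM is fixed by the query count, not by the number of image patches.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def cross_attend(queries, image_feats):
    """Scaled dot-product cross-attention: each query mixes the image features."""
    dim = len(queries[0])
    scores = matmul(queries, transpose(image_feats))  # (num_queries, num_patches)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, image_feats)  # (num_queries, dim)


def bridge(image_feats, queries, proj):
    """Frozen image features -> fixed-length prefix in the LLM embedding width."""
    return matmul(cross_attend(queries, image_feats), proj)


# Hypothetical toy sizes, chosen only to keep the example small.
NUM_QUERIES, VISION_DIM, LLM_DIM = 4, 8, 16

rng = random.Random(0)
queries = rand_matrix(NUM_QUERIES, VISION_DIM, rng)  # trainable in this sketch
proj = rand_matrix(VISION_DIM, LLM_DIM, rng)         # trainable in this sketch

for num_patches in (10, 50):
    # Stand-in for the output of the frozen image encoder (never updated).
    image_feats = rand_matrix(num_patches, VISION_DIM, rng)
    prefix = bridge(image_feats, queries, proj)
    print(num_patches, "patches ->", len(prefix), "x", len(prefix[0]), "prefix")
```

In this sketch only `queries` and `proj` would receive gradients during training; the image features come from a frozen encoder and the prefix is consumed by a frozen LLM.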
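Outperformed CLIP-based approaches on generative tasks such as captioning and VQA, which CLIP alone cannot do because it has no text decoder. It did not replace CLIP as an encoder: BLIP-2 itself uses CLIP-style ViTs as its frozen image encoder. Related later work from Google: SigLIP, a CLIP-style contrastive model trained with a sigmoid loss, and PaliGemma, a vision-language model that pairs a SigLIP image encoder with a Gemma language model.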