ML//model//BERT//downstream layer
A simple linear layer on top of BERT's output that converts the embedding vector into task-specific predictions.
A simple linear layer on top of BERT's output that converts the embedding vector into task-specific predictions.
Always found after an encoder — takes the pre-trained representation and adapts it.
Examples: 2 classes for sentiment analysis, similarity score for RAG, NER tags per token, entailment classification.
"Downstream" = the task-specific part that comes after the general-purpose encoder.
The [CLS] token's vector is typically the input to downstream layers for sentence-level tasks.