ML//model//BERT//downstream layer

A simple linear layer on top of BERT's output that converts the embedding vector into task-specific predictions.


A simple linear layer on top of BERT's output that converts the embedding vector into task-specific predictions.

Always found after an encoder — takes the pre-trained representation and adapts it.

Examples: 2 classes for sentiment analysis, similarity score for RAG, NER tags per token, entailment classification.

"Downstream" = the task-specific part that comes after the general-purpose encoder.

The [CLS] token's vector is typically the input to downstream layers for sentence-level tasks.