ML//model//BERT//next sentence prediction

The pre-training objective that got fired — BERT shipped with it, then RoBERTa proved everyone was better off without it. BERT's second pre-training objective alongside MLM: given sentence pair (A, B), predict if B actually follows A in the original text.


The pre-training objective that got fired — BERT shipped with it, then RoBERTa proved everyone was better off without it. BERT's second pre-training objective alongside MLM: given sentence pair (A, B), predict if B actually follows A in the original text.

Binary classification: "is next" vs "not next" — 50% real pairs, 50% random.

Later discovered to not contribute much — models like RoBERTa dropped NSP entirely and performed better.

The intuition was that it would teach sentence-level reasoning, but MLM alone turned out to capture this sufficiently.