ML//Training//fine-tuning//catastrophic forgetting

When fine-tuning on new data degrades or destroys knowledge learned during pre-training. The model gets better at the new task but worse, sometimes drastically, at things it could do before. The weights that encoded general knowledge get overwritten by task-specific gradients.


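LoRA, QLoRA, and ReFT are parameter-efficient fine-tuning methods. They were built mainly to cut the memory and storage cost of fine-tuning, but they are also widely used to limit forgetting. Instead of updating all parameters (which risks overwriting pre-trained knowledge), they freeze the pre-trained weights and train only a small number of new parameters: low-rank weight deltas in LoRA, the same deltas on top of a 4-bit quantized base model in QLoRA, and low-rank interventions on hidden representations in ReFT. This tends to reduce forgetting but does not eliminate it.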
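A minimal pure-Python sketch of the LoRA idea for a single linear layer, assuming the formulation h = W0·x + (α/r)·B·A·x from the LoRA paper, with W0 frozen and only A and B trainable. The class `LoRALinear`, the helper `lora_param_count`, and the 0.01 init scale are illustrative choices, not any library's API; the sketch shows only the forward pass and the parameter count, not training.

```python
import random

def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def matmul(a, b):
    """Product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_param_count(d_out, d_in, r):
    """Trainable adapter parameters: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)

class LoRALinear:
    """Hypothetical layer: frozen base weight W0 plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, w0, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(w0), len(w0[0])
        self.w0 = w0            # frozen: fine-tuning would never update this
        self.scale = alpha / r
        # Assumed init following the LoRA paper's scheme: Gaussian A (std 0.01 is an
        # arbitrary choice here) and zero B, so B @ A = 0 and the layer starts out
        # behaving exactly like the pre-trained one.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.w0, x)                   # W0 x
        delta = matvec(self.b, matvec(self.a, x))   # B (A x), rank at most r
        return [h + self.scale * d for h, d in zip(base, delta)]

    def merged_weight(self):
        """W0 + (alpha / r) * B @ A, e.g. to serve the adapted model without extra matmuls."""
        ba = matmul(self.b, self.a)
        return [[w + self.scale * u for w, u in zip(w_row, u_row)]
                for w_row, u_row in zip(self.w0, ba)]

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], r=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))                              # [3.0, 7.0, 11.0], same as W0 x while B = 0
print(lora_param_count(4096, 4096, 8), "vs", 4096 * 4096)     # 65536 vs 16777216
```

For a 4096×4096 projection with r = 8, the adapter trains about 0.4% as many weights as full fine-tuning. Because W0 itself is never modified, dropping the adapter recovers the original layer exactly; forgetting can still show up while the adapter is active.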

Parameter freezing is the simplest defense: freeze the early layers (which tend to encode general features like syntax and basic semantics) and fine-tune only the later layers (which tend to encode more task-specific patterns). The trade-off: less forgetting, but also less capacity to adapt to the new task.
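Mechanically, freezing means computing gradients as usual but skipping the update for frozen parameters. A minimal pure-Python sketch on a toy two-weight linear model, assuming hypothetical parameter names `early` and `late` as stand-ins for early and later layers (a real network has many weights per layer): the pre-trained weights already solve task A, and task B is fine-tuned with and without `early` frozen.

```python
# Toy model: y = w["early"] * x[0] + w["late"] * x[1].
# "early" and "late" are hypothetical stand-ins for early and later layers.
TASK_A = [((1.0, 0.0), 1.0)]   # pre-training task: needs early == 1
TASK_B = [((1.0, 1.0), 3.0)]   # fine-tuning task: needs early + late == 3

def predict(w, x):
    return w["early"] * x[0] + w["late"] * x[1]

def loss(w, data):
    """Mean of 0.5 * squared error."""
    return sum(0.5 * (predict(w, x) - y) ** 2 for x, y in data) / len(data)

def grads(w, data):
    """Gradient of `loss` with respect to every parameter."""
    g = {"early": 0.0, "late": 0.0}
    for x, y in data:
        err = predict(w, x) - y
        g["early"] += err * x[0] / len(data)
        g["late"] += err * x[1] / len(data)
    return g

def fine_tune(w, data, frozen=frozenset(), lr=0.2, steps=1000):
    """Gradient descent that computes all gradients but skips frozen parameters."""
    w = dict(w)
    for _ in range(steps):
        g = grads(w, data)
        for name in w:
            if name not in frozen:
                w[name] -= lr * g[name]
    return w

pretrained = {"early": 1.0, "late": 0.0}                 # already solves task A
full = fine_tune(pretrained, TASK_B)                     # every weight trainable
partial = fine_tune(pretrained, TASK_B, frozen={"early"})

print("full:   ", full, "loss A =", round(loss(full, TASK_A), 3))       # early drifts, task A degrades
print("partial:", partial, "loss A =", round(loss(partial, TASK_A), 3)) # early stays 1.0, task A intact
```

Here freezing costs nothing on task B, because B can be fit through `late` alone. If B's optimum required changing `early`, the frozen model could not reach it; that is the adaptation side of the trade-off.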

Why it happens mechanistically: neural networks use distributed representations, so the same weights participate in many different capabilities. When you optimize for task A, the gradient is computed from task A's loss alone; it carries no information about which weight updates will break task B.
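A toy illustration of that mechanism, assuming a hypothetical two-weight linear model shared by two tasks: task A needs w[0] = 1, task B only needs w[0] + w[1] = 3, so w = [1, 2] would satisfy both. Plain gradient descent on B alone still moves w[0], because B's gradient is just as large on w[0] as on w[1] and contains nothing about A.

```python
# Toy linear model y = w[0]*x[0] + w[1]*x[1]; both tasks share the same weights.
TASK_A = [((1.0, 0.0), 1.0)]   # satisfied only when w[0] == 1
TASK_B = [((1.0, 1.0), 3.0)]   # satisfied whenever w[0] + w[1] == 3

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def loss(w, data):
    """Mean of 0.5 * squared error."""
    return sum(0.5 * (predict(w, x) - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """Gradient of `loss` with respect to w."""
    g = [0.0, 0.0]
    for x, y in data:
        err = predict(w, x) - y
        g[0] += err * x[0] / len(data)
        g[1] += err * x[1] / len(data)
    return g

def train(w, data, lr=0.2, steps=1000):
    """Plain full-batch gradient descent on one task's loss."""
    w = list(w)
    for _ in range(steps):
        g = grad(w, data)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w_a = train([0.0, 0.0], TASK_A)   # "pre-training" on task A
print("B's gradient at w_a:", grad(w_a, TASK_B))   # both components about -2: w[0] gets pushed too
w_ab = train(w_a, TASK_B)         # naive fine-tuning on task B
print("after A:       w =", [round(v, 3) for v in w_a], "loss A =", round(loss(w_a, TASK_A), 3))
print("after A then B: w =", [round(v, 3) for v in w_ab], "loss A =", round(loss(w_ab, TASK_A), 3))
# w_ab ends near [2.0, 1.0]: task B is solved, but the loss on task A rises from about 0 to about 0.5.
```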

Transfer learning depends on NOT forgetting: the whole point is to carry knowledge from pre-training over to a new task. Catastrophic forgetting is the reverse failure mode: learning the new task damages what was learned before (in continual-learning terms, negative backward transfer).

The continual learning problem: how to learn task after task without forgetting the earlier ones. No fully general solution is known. Practical mitigations include replay (mixing old data with new), elastic weight consolidation (EWC, which penalizes changes to weights that were important for earlier tasks, with importance estimated from the Fisher information), and adapter methods.
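A minimal sketch of replay and EWC on a toy two-weight linear model, assuming task A needs w[0] = 1 and task B only needs w[0] + w[1] = 3; naive fine-tuning on B from A's solution drifts to w = [2, 1] and breaks A, while w = [1, 2] solves both. The EWC objective used here is L_B(w) + (λ/2)·Σ_i F_i·(w_i − w_A,i)². Two simplifying assumptions: replay reuses all of task A's data (real systems keep a small buffer), and the diagonal Fisher uses the closed form for a linear model with unit-variance Gaussian noise, F_i = mean(x_i²), instead of an estimate from sampled gradients. The value λ = 1 is arbitrary.

```python
# Toy linear model y = w[0]*x[0] + w[1]*x[1].
TASK_A = [((1.0, 0.0), 1.0)]   # old task: needs w[0] == 1
TASK_B = [((1.0, 1.0), 3.0)]   # new task: needs w[0] + w[1] == 3 (w = [1, 2] solves both)

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def loss(w, data):
    """Mean of 0.5 * squared error (Gaussian negative log-likelihood up to a constant)."""
    return sum(0.5 * (predict(w, x) - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    g = [0.0, 0.0]
    for x, y in data:
        err = predict(w, x) - y
        g[0] += err * x[0] / len(data)
        g[1] += err * x[1] / len(data)
    return g

def diagonal_fisher(data):
    """Assumed closed form for this linear-Gaussian model (unit noise variance): F_i = mean(x_i^2)."""
    return [sum(x[i] ** 2 for x, _ in data) / len(data) for i in range(2)]

def train(w, grad_fn, lr=0.2, steps=1000):
    """Gradient descent on whatever objective `grad_fn` differentiates."""
    w = list(w)
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad_fn(w))]
    return w

w_a = [1.0, 0.0]   # weights after training on task A

# Replay: fine-tune on the new data mixed with retained old data.
w_replay = train(w_a, lambda w: grad(w, TASK_B + TASK_A))

# EWC: new-task gradient plus the gradient of (lam / 2) * sum_i F_i * (w_i - w_a_i)^2.
fisher = diagonal_fisher(TASK_A)   # [1.0, 0.0]: task A depends only on w[0]
lam = 1.0

def ewc_grad(w):
    g = grad(w, TASK_B)
    return [gi + lam * fi * (wi - ai) for gi, fi, wi, ai in zip(g, fisher, w, w_a)]

w_ewc = train(w_a, ewc_grad)

for name, w in [("replay", w_replay), ("ewc", w_ewc)]:
    print(name, [round(v, 3) for v in w],
          "loss A =", round(loss(w, TASK_A), 3), "loss B =", round(loss(w, TASK_B), 3))
```

Both variants end near w = [1, 2], fitting B without giving up A. In this toy a joint solution exists and EWC's Fisher pinpoints exactly the weight A relies on; in real networks the diagonal Fisher is only an approximation and λ trades old-task retention against new-task fit.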