ML//neural network//loss landscape

2026-03-11

The surface you get when you plot the value of the loss function against every possible combination of a neural network's weights. Each point on the surface corresponds to one specific weight configuration, and its height is how badly the network performs with that configuration. A gradient descent optimizer moves across this surface, stepping in the direction of steepest local descent at each iteration.
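To make the "moving downhill" picture concrete, here is a minimal sketch on a landscape small enough to inspect by hand. The two-weight linear model, the toy data, the step size, and the names `loss` and `grad` are all assumptions chosen for illustration, not part of any particular network.

```python
import numpy as np

# Toy data for a hypothetical two-weight model: y = w[0] * x + w[1].
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0  # generated with slope 2 and intercept 1


def loss(w):
    """Mean squared error: the height of the landscape at the point w."""
    return np.mean((w[0] * x + w[1] - y) ** 2)


def grad(w):
    """Gradient of the loss; it points in the direction of steepest ascent."""
    residual = w[0] * x + w[1] - y
    return np.array([np.mean(2 * residual * x), np.mean(2 * residual)])


w = np.array([0.0, 0.0])  # starting point on the surface
learning_rate = 0.05      # illustrative step size (an assumption)
history = [loss(w)]
for _ in range(500):
    w = w - learning_rate * grad(w)  # step against the gradient, i.e. downhill
    history.append(loss(w))

print(w, history[0], history[-1])
```

Here the landscape is a simple bowl with a single minimum near w = (2, 1), so the optimizer ends up there. Real networks have far more weights and a far less regular surface.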

The problem: a typical network has millions of parameters, so the landscape lives in millions of dimensions and cannot be viewed directly. Visualization requires dimensionality reduction: typically, picking two random directions in weight space (nearly orthogonal in high dimensions), evaluating the loss on the 2D plane they span, and plotting loss as the third axis. These slices are imperfect (like a photograph of a 3D scene) but preserve enough structure to give useful insight into the shape of the terrain the optimizer moves through.
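The 2D reduction described above can be sketched as follows. This is a rough illustration under stated assumptions: the tiny tanh network, the random data, the grid range, and the names `theta0`, `d1`, `d2` and `surface` are hypothetical, and the slice is centred on random weights rather than trained ones to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 64 points, 2 inputs, one hidden layer of 8 tanh units.
n_in, n_hidden = 2, 8
X = rng.normal(size=(64, n_in))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
k = n_in * n_hidden
n_params = k + n_hidden + n_hidden + 1  # W1, b1, W2, b2


def loss(theta):
    """Mean squared error of the toy network for a flat parameter vector."""
    W1 = theta[:k].reshape(n_in, n_hidden)
    b1 = theta[k:k + n_hidden]
    W2 = theta[k + n_hidden:k + 2 * n_hidden]
    b2 = theta[-1]
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.mean((pred - y) ** 2)


# Centre of the slice. In practice this would usually be the trained weights.
theta0 = rng.normal(scale=0.5, size=n_params)

# Two random directions, made exactly orthonormal with one Gram-Schmidt step.
d1 = rng.normal(size=n_params)
d1 /= np.linalg.norm(d1)
d2 = rng.normal(size=n_params)
d2 -= (d2 @ d1) * d1
d2 /= np.linalg.norm(d2)

# Evaluate the loss on a grid of offsets (a, b) in the plane spanned by d1 and d2.
alphas = np.linspace(-1.0, 1.0, 25)
betas = np.linspace(-1.0, 1.0, 25)
surface = np.array(
    [[loss(theta0 + a * d1 + b * d2) for b in betas] for a in alphas]
)
print(surface.shape, surface.min(), surface.max())
```

The resulting `surface` array can be handed to any contour or 3D surface plotting routine. A different pair of random directions gives a different slice of the same landscape, which is one reason such pictures should be read with caution.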

Key features of loss landscapes include minima (valleys where the optimizer converges), saddle points (points where the gradient is zero but the surface curves upward in some directions and downward in others), and the overall smoothness or ruggedness of the terrain. Smoother terrain and flatter minima are often associated with better generalization, although how strong that link is remains debated.
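One standard way to tell a minimum from a saddle point is to look at the signs of the eigenvalues of the Hessian (the matrix of second derivatives) at a point where the gradient is zero. The sketch below assumes the Hessian is already known and symmetric; the function name `classify_critical_point` and the tolerance are illustrative choices. For a real network the Hessian is far too large to form explicitly, so this only shows the idea in two dimensions.

```python
import numpy as np


def classify_critical_point(hessian, tol=1e-8):
    """Classify a zero-gradient point from the eigenvalues of its Hessian."""
    eigenvalues = np.linalg.eigvalsh(hessian)  # eigenvalues of a symmetric matrix
    if np.all(eigenvalues > tol):
        return "minimum"        # curves upward in every direction
    if np.all(eigenvalues < -tol):
        return "maximum"        # curves downward in every direction
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle point"   # upward in some directions, downward in others
    return "degenerate"         # some curvature is (numerically) zero


# f(w) = w0**2 + w1**2 has Hessian diag(2, 2) at the origin: a bowl.
bowl = np.array([[2.0, 0.0], [0.0, 2.0]])
# f(w) = w0**2 - w1**2 has Hessian diag(2, -2) at the origin: a saddle.
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])

print(classify_critical_point(bowl))    # minimum
print(classify_critical_point(saddle))  # saddle point
```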