Information Theory//channel capacity
The maximum rate at which information can be transmitted reliably through a noisy channel — Shannon's most famous result (1948)
Formula (Shannon-Hartley): $C = B \log_2(1 + S/N)$ where $C$ is capacity in bits per second, $B$ is bandwidth in hertz, and $S/N$ is the signal-to-noise ratio as a linear power ratio (not in decibels).
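As a quick numeric check of the Shannon-Hartley formula, here is a minimal Python sketch. The function names (`shannon_hartley_capacity`, `db_to_linear`) and the example numbers (3 kHz bandwidth, 30 dB SNR) are assumptions chosen for illustration; they are not from the note itself.

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio from decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def shannon_hartley_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Return capacity in bits per second: C = B * log2(1 + S/N).

    bandwidth_hz: channel bandwidth B in hertz.
    snr_linear:   signal-to-noise ratio S/N as a linear power ratio.
    """
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Illustrative values only: a 3 kHz channel at 30 dB SNR (S/N = 1000).
capacity = shannon_hartley_capacity(3000.0, db_to_linear(30.0))
print(f"{capacity:.0f} bits/s")  # roughly 29.9 kbit/s
```

Because the logarithm takes the linear ratio, passing a decibel value directly is a common mistake; the conversion helper keeps the two units separate.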
The noisy-channel coding theorem: for any rate below capacity, codes exist that make the error probability arbitrarily small (typically at the cost of longer block lengths). At rates above capacity, the error probability stays bounded away from zero regardless of the coding scheme.
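The theorem can be made concrete with a simple channel model. The sketch below uses the binary symmetric channel (BSC), which flips each transmitted bit with probability p and has capacity 1 - H(p) bits per channel use, where H is the binary entropy function. The BSC is an assumption introduced here for illustration (the note does not name a specific channel), and the helper names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits; H(0) = H(1) = 0 by convention."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(flip_prob: float) -> float:
    """Capacity of a binary symmetric channel in bits per channel use."""
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    return 1.0 - binary_entropy(flip_prob)


def reliable_codes_exist(rate: float, flip_prob: float) -> bool:
    """True if the code rate is strictly below capacity, the regime where
    the coding theorem guarantees arbitrarily small error probability."""
    return rate < bsc_capacity(flip_prob)


# With an 11% flip probability the capacity is close to 0.5 bits per use.
print(reliable_codes_exist(0.4, 0.11))  # rate below capacity
print(reliable_codes_exist(0.6, 0.11))  # rate above capacity
```

The check only says whether good codes exist at a given rate; the theorem is non-constructive and does not say how to build them.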
Analogy to ML models: a model has a "capacity", meaning the complexity of functions it can represent. A model too small for the task will always make errors, like transmitting above channel capacity. Scaling laws can be read as empirical measurements of how this capacity grows with scale.
Analogy to extended thinking: thinking tokens are like increasing bandwidth — more compute per problem means more capacity to reduce entropy. But there is a ceiling set by pretraining (the channel itself), and no amount of extra thinking surpasses it.
The capacity is a property of the channel, not the message. Similarly, a model's ceiling is set by its architecture and pretraining, not by the prompt or thinking budget.