ML//Transformer//positional encoding//RoPE

Rotary Position Embedding (RoPE): instead of adding a position vector to the token embedding, it **rotates** the Q and K vectors by an angle proportional to their position before the dot product.
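
Written out, the rotation looks roughly like this. It is a sketch of the usual RoPE formulation, and the symbols are assumptions rather than anything defined in this note: $m$ is the token position, $d$ the (even) head dimension, and $\theta_i$ a per-pair frequency with the commonly used base 10000. The vector is split into consecutive pairs, and pair $i$ is rotated by the angle $m\theta_i$:

$$
\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix}
=
\begin{pmatrix} \cos m\theta_i & -\sin m\theta_i \\ \sin m\theta_i & \cos m\theta_i \end{pmatrix}
\begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix},
\qquad \theta_i = 10000^{-2i/d}, \quad i = 0, \dots, d/2 - 1
$$

Because each block is a pure rotation, the length of the vector is unchanged; only its direction encodes position.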


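The elegance: the position dependence of Q·K then comes only from the relative offset between the two tokens, not from their absolute positions. Rotation angles compose: rotating Q by mθ and K by nθ leaves a dot product that depends on position only through (n − m)θ.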
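
The relative-offset property can be checked numerically. The following is a minimal NumPy sketch, not any particular model's implementation: the helper name `rope`, the base of 10000, and the choice to pair adjacent dimensions are assumptions made for illustration (some libraries pair dimension i with i + d/2 instead).

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[0], x[1]), (x[2], x[3]), ... of a 1-D vector.

    Pair i is rotated by the angle pos * base**(-2*i/d). Hypothetical helper.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[-1]
    assert d % 2 == 0, "dimension must be even so it splits into pairs"
    freqs = base ** (-np.arange(0, d, 2) / d)  # one frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin     # standard 2-D rotation
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal(8)

# Same offset between query and key (3), very different absolute positions.
score_near = rope(q, 5) @ rope(k, 2)
score_far = rope(q, 1005) @ rope(k, 1002)

# A different offset (1) will generally give a different score.
score_other = rope(q, 5) @ rope(k, 4)

print(np.isclose(score_near, score_far))
```

The two scores with the same offset agree up to floating-point error, which is the sense in which attention under RoPE sees relative rather than absolute position.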

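Tends to extrapolate to longer sequences better than learned absolute embeddings: the rotation is given by a formula that is defined for every position, whereas a learned table has no entry past the longest position seen in training.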

Used in LLaMA, Mistral, and most modern open LLMs.