ML//Transformer//positional encoding//RoPE
Rotary Position Embedding — instead of summing a position vector, **rotates** Q and K vectors by an angle proportional to their position before the dot product
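The elegance: Q·K then depends only on the relative distance between the two tokens, not on either absolute position. Rotation angles compose: rotating Q by its position's angle and K by its own leaves a dot product that sees a single rotation by the difference of the two angles.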
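A minimal NumPy sketch of the relative-distance property, to make it concrete. The helper name `rope_rotate`, the interleaved pairing of dimensions, and the frequency schedule `base ** (-2i/d)` with `base = 10000` are assumptions for illustration; real implementations may pair dimensions differently.

```python
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Rotate each pair (x[0], x[1]), (x[2], x[3]), ... by position * theta_i.

    Hypothetical helper: interleaved pairing and the base value are assumed.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "dimension must be even so it splits into 2-D pairs"
    # One frequency per pair: theta_i = base^(-2i/d)
    theta = base ** (-np.arange(0, d, 2) / d)
    angle = position * theta
    cos, sin = np.cos(angle), np.sin(angle)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    # Standard 2-D rotation applied to every pair
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal(8)

# Same offset between query and key (3), very different absolute positions
score_a = rope_rotate(q, 5) @ rope_rotate(k, 2)
score_b = rope_rotate(q, 105) @ rope_rotate(k, 102)
print(np.isclose(score_a, score_b))
```

Shifting both positions by the same amount should leave the score unchanged up to floating-point error, since only the offset between them enters the dot product.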
Extrapolates to longer sequences better than learned absolute embeddings: the rotation is defined for any position, whereas a learned embedding table has no entry past the training length.
Used in LLaMA, Mistral, and most modern open LLMs.