ML//Transformer//positional encoding//RoPE

2023-03-20

Rotary Position Embedding (RoPE): instead of adding a position vector to the token embedding, it **rotates** the Q and K vectors by an angle proportional to their position before the dot product
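A minimal sketch of the rotation, assuming the usual formulation: the vector is split into consecutive pairs and pair i is rotated by `pos * base**(-2i/d)`, with `base = 10000`. The function name `rope`, the pairing layout, and the sample vector are illustrative assumptions, not taken from any particular library.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle pos * base**(-2i/d).

    Hypothetical helper for illustration; real implementations are
    vectorized and may pair the dimensions differently.
    """
    d = len(vec)
    assert d % 2 == 0, "dimension must be even"
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)  # angle grows linearly with position
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2D rotation of the pair
    return out

q = [1.0, 0.0, 0.5, -0.5]   # made-up query vector
q_rot = rope(q, pos=3)      # the same vector as seen at position 3
```

Position 0 leaves the vector unchanged, and because each pair is only rotated, the vector's norm is the same at every position.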

The elegance: Q·K then depends naturally on the relative distance between tokens, not absolute position. Rotation angles compose.
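Why the relative distance falls out: rotating q by angle mθ and k by angle nθ gives a dot product equal to q · R((n − m)θ) k, so only the offset n − m survives. The check below is a sketch in 2D; the helper names `rotate` and `score` and the value θ = 0.1 are assumptions chosen for illustration.

```python
import math

def rotate(v, angle):
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (v[0] * c - v[1] * s, v[0] * s + v[1] * c)

def score(q, k, m, n, theta=0.1):
    """Dot product of q placed at position m and k placed at position n."""
    qm = rotate(q, m * theta)
    kn = rotate(k, n * theta)
    return qm[0] * kn[0] + qm[1] * kn[1]

q, k = (1.0, 2.0), (0.5, -1.0)   # made-up query and key
near = score(q, k, 2, 5)         # offset of 3, early in the sequence
far = score(q, k, 102, 105)      # same offset of 3, shifted by 100 positions
```

`near` and `far` agree up to floating-point error, while changing the offset (for example positions 2 and 6) changes the score.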

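Extrapolates to longer sequences better than learned absolute embeddings: a learned table has no entry for positions past the training length, while the rotation angle is defined for any position, so the same pattern carries over.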

Used in LLaMA, Mistral, and most modern open LLMs.