ML//neural network//embedding

2018-03-08

Mapping discrete tokens to dense vectors: "king" becomes [0.2, -0.5, 0.8, ...]. Typical dimension: 12K+ in large models.

Mapping discrete tokens to dense vectors: "king" becomes [0.2, -0.5, 0.8, ...]. Typical dimension: 12K+ in large models.

Word2Vec showed embeddings capture meaning: king - man + woman ≈ queen. Dot product = 0 → perpendicular → unrelated. Directionality encodes gender, plurality, semantics.

Learned during training, not hand-crafted. The model discovers its own feature space.

Each forward pass starts from the base embeddings. Enriched representations from the previous pass are NOT saved (that would be RNNs)

The embedding matrix that produces these vectors is also used (transposed) as the output layer via weight tying