ML//Training//fine-tuning//LoRA

The cheat code that democratized fine-tuning — freeze 99% of the model, inject tiny trainable matrices into the joints, and somehow get almost the same result. Low-Rank Adaptation: freeze the base model, inject small trainable matrices (rank 4-64) into attention layers.


The cheat code that democratized fine-tuning — freeze 99% of the model, inject tiny trainable matrices into the joints, and somehow get almost the same result. Low-Rank Adaptation: freeze the base model, inject small trainable matrices (rank 4-64) into attention layers.

Typically applied to W_Q, W_K, W_V, W_O — irónicamente the attention matrices, not MLP — because behavior changes most with fewest parameters there.

Train ~1% of parameters, get ~90% of full fine-tuning quality.

QLoRA adds quantization on top — fine-tune a 65B model on a single 48GB GPU.

Democratized fine-tuning. Everyone can customize a foundation model now.

The tokenizer is almost never touched — changing it means changing the embedding matrix, basically starting over.