ML//Alignment//AI safety

- Ensuring AI systems do what we actually want, not just what we measured.


Three threads: alignment (getting the values right), robustness (behaving well on edge cases), and interpretability (understanding what the model is actually doing).

The gap between "optimizes the reward function" and "does the right thing" is the entire problem.
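
A toy sketch of that gap, under made-up assumptions: a hypothetical cleaning robot is rewarded on what a sensor sees, not on how much mess is really left. The policy names, the numbers, and the helper functions (`proxy_reward`, `true_utility`, `best_policy`) are all invented for illustration and do not come from any real system or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    mess_seen_by_sensor: int  # what the reward function measures
    mess_actually_left: int   # what we actually care about


# Hypothetical policies for a robot that starts with 10 units of mess.
POLICIES = {
    "clean_half": Outcome(mess_seen_by_sensor=5, mess_actually_left=5),
    "clean_most": Outcome(mess_seen_by_sensor=1, mess_actually_left=1),
    "cover_sensor": Outcome(mess_seen_by_sensor=0, mess_actually_left=10),
}


def proxy_reward(outcome: Outcome) -> int:
    """The measured objective: less mess visible to the sensor is better."""
    return -outcome.mess_seen_by_sensor


def true_utility(outcome: Outcome) -> int:
    """The intended objective: less mess actually left is better."""
    return -outcome.mess_actually_left


def best_policy(score) -> str:
    """Return the name of the policy that maximizes the given score."""
    return max(POLICIES, key=lambda name: score(POLICIES[name]))


proxy_choice = best_policy(proxy_reward)     # what the optimizer picks
intended_choice = best_policy(true_utility)  # what we wanted it to pick

print("optimizes the reward function:", proxy_choice)
print("does the right thing:", intended_choice)
```

In this toy setup the reward-maximizing policy is the one that blinds the sensor, while the policy we wanted is the one that cleans. Nothing about the optimizer is broken; the measurement simply did not capture the goal.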