BOOK · [040]
The Alignment Problem: Machine Learning and Human Values
AI Safety
Brian Christian examines the technical and philosophical difficulty of specifying what we actually want from AI systems (reward hacking, distributional shift, value loading, interpretability), drawing on close reporting on the researchers working on each problem. The book makes alignment research accessible without sacrificing precision, connecting abstract safety concerns to concrete failure modes already observed in deployed systems. Where a16z's AI Canon addresses foundational model capabilities, this book serves as a companion volume on what happens when those capabilities are pointed in the wrong direction.