BOOK · [040]

The Alignment Problem: Machine Learning and Human Values

AI Safety

Christian examines the technical and philosophical difficulty of specifying what we actually want from AI systems (reward hacking, distributional shift, value loading, interpretability), drawing on close reporting of the researchers working on each problem. The book makes alignment research accessible without sacrificing precision, connecting abstract safety concerns to concrete failure modes already observed in deployed systems. a16z's AI Canon covers foundational model capabilities; this book is the companion volume on what happens when those capabilities are pointed in the wrong direction.

Endorsed By


a16z
Best accessible treatment of AI alignment and safety research; complements the canon's technical papers section on RLHF and Constitutional AI.

a16z.com