
Recording

Barriers and Pathways to Human-AI Alignment: A Game-Theoretic Approach

With Aran Nayebi



Under what conditions can capable AI systems efficiently align with human preferences, and when is this alignment computationally feasible? Since such generally capable systems do not yet exist, a theoretical analysis is needed to establish when alignment guarantees hold, and what those guarantees even are. We provide the first complexity-theoretic analysis of the alignment problem, introducing a game-theoretic framework that generalizes prior alignment approaches under minimal assumptions and yields both upper and lower bounds on alignment's complexity across M objectives and N agents. We show that even very capable, cooperative AI agents, including those enhanced by brain-computer interfaces, face inherent computational bottlenecks as the task space or the number of agents grows large. Nevertheless, we identify key conditions under which efficient alignment remains possible, clarifying what makes an AI agent "sufficiently safe" and valuable to humans. Full paper: https://arxiv.org/abs/2502.05934
