Super Intelligent Alignment @ Intelligent Cooperation Workshop
With Jan Leike
OpenAI recognizes the dangers of superintelligence and the importance of aligning it with human intent. Because current alignment techniques won’t scale to superintelligence, OpenAI’s new Superalignment effort aims to build an automated alignment researcher and to use scalable oversight and generalization techniques. The team plans to stress-test its approach by deliberately training misaligned models, so that potential downstream issues can be detected early. OpenAI is dedicating a team and 20% of its compute resources to superintelligence alignment. Jan remains optimistic that a focused effort can solve this critical problem, and he encourages participants to red-team the project.