Alignment team lead at OpenAI; Research Associate at the Future of Humanity Institute. My research aims to make progress on the alignment problem. The central question I’m studying is: how can we design competitive, scalable, and general machine learning algorithms that act in accordance with their user’s intentions? I lead the Alignment Team at OpenAI. Our approach to alignment research has three pillars:

- Training AI systems using human feedback
- Training AI systems to assist human evaluation
- Training AI systems to do alignment research
OpenAI recognizes the dangers of superintelligence and the importance of aligning it with human intent. Because current alignment techniques won’t scale to superintelligence, OpenAI’s new Superalignment effort aims to build an automated alignment researcher, using scalable oversight and generalization techniques. They plan to stress-test this approach by deliberately training misaligned models, so that potential downstream problems can be detected early. OpenAI is dedicating a team and 20% of its compute resources to superintelligence alignment. Jan remains optimistic that a focused effort can solve this critical problem, and he encourages participants to red-team the project.