Fazl Barez

How do we make AI systems that we can actually inspect, correct, and trust? Fazl Barez is a Principal Investigator at the University of Oxford leading research on technical AI safety, interpretability, and governance. He teaches Oxford’s AI Safety and Alignment course, has consulted for Anthropic, and held fellowships at RAND Corporation. His work bridges mechanistic interpretability, model auditing, and policy — translating what’s found inside models into tools regulators and auditors can use.

Related Recordings

Unlearning in Large Language Models @ Intelligent Cooperation Workshop 2024

Interpretability for Safety and Alignment @ Intelligent Cooperation Workshop
