2026 Fazl Barez

Fazl Barez

University of Oxford

2026, Berlin, AI for Security

An Automated Interpretability-Driven System for Model Auditing and Control

We will build an agentic interpretability system that lets experts converse with an agent to check which internal computations led to a model’s answer, and that fixes model errors with minimal side effects. This will allow non-AI experts in safety-critical domains to contribute to AI alignment through their area of expertise.

Biography

How do we make AI systems that we can actually inspect, correct, and trust? Fazl Barez is a Principal Investigator at the University of Oxford leading research on technical AI safety, interpretability, and governance. He teaches Oxford’s AI Safety and Alignment course, has consulted for Anthropic, and has held fellowships at RAND Corporation. His work bridges mechanistic interpretability, model auditing, and policy, translating what’s found inside models into tools regulators and auditors can use.
