Presenter
Fazl Barez, PhD student at Edinburgh/Oxford and Co-director at Apart
My name is Fazl Barez. I am a PhD researcher at the Edinburgh Centre for Robotics and a visiting PhD scholar at the University of Oxford. I am also a research affiliate at CSER at the University of Cambridge and an AI safety community researcher at the Future of Life Institute. Previously, I worked as a data scientist at RBS and The Data Lab Innovation Centre. Broadly, I am interested in applications of safety and interpretability in machine learning, as well as AGI safety and governance. From September 2021 to April 2022, I was an Applied Science Intern at Amazon working on interpretability, and from July 2020 to February 2021, I was a research intern at Huawei Research Labs working on spatial recommender systems. Outside research, I co-founded AI Safety Hub Edinburgh and help run AI Safety Hub and Apart Research. I also co-founded EA Edinburgh and now act as a senior advisor.
Summary:
Fazl Barez described some of his recent work, walking through a mechanistic interpretability "recipe". He discussed interpreting language model neurons at scale, delving specifically into attention-head and neuron interactions and neuroplasticity. After explaining what it means to formulate the problem, he closed by discussing training humans to predict model behaviour.