Presenter
Matjaz Leonardis, University of Oxford
Interested in learning and creativity, in both humans and machines.
Summary:
In this talk, Matjaz discusses the concept of backdoors in machine learning models. He explains that while machine learning models are typically used as classifiers, there is growing interest in using them for consequential decisions such as university admissions or loan applications. This raises the concern that whoever trains a model could deliberately shape it to influence its outcomes. Matjaz introduces his recent paper, which demonstrates that backdoors can be planted in machine learning models: secret modifications can be applied to inputs to produce desired outputs, without the backdoor being detectable. He explains the technique used to hide these backdoors in the seemingly random choices made during training. He raises questions about the interpretability and robustness of models with backdoors, as well as the potential for using this technique in AI safety. He concludes by noting the need to extend backdoors to more complex models, and the role of cryptography and security in understanding the explainability of computation.
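To make the idea of an input-triggered backdoor concrete, here is a minimal sketch. Everything in it (the key, the tag scheme, the two model functions) is a hypothetical illustration, not the construction from the paper: in the actual cryptographic constructions the trigger check is hidden inside the network's weights and is provably undetectable, whereas here an explicit branch stands in for it.

```python
import hashlib
import hmac

import numpy as np

# Hypothetical illustration: none of these names come from the paper.
SECRET_KEY = b"trainer-secret"  # held only by whoever trained the model
TAG_LEN = 4                     # input coordinates reserved for the hidden tag


def _tag(payload: np.ndarray) -> np.ndarray:
    """Key-dependent tag over the payload; looks like noise without the key."""
    digest = hmac.new(SECRET_KEY, payload.tobytes(), hashlib.sha256).digest()
    return np.frombuffer(digest[:TAG_LEN], dtype=np.uint8).astype(np.float64)


def honest_model(x: np.ndarray) -> int:
    """Stand-in for an ordinarily trained binary classifier."""
    return int(x.sum() > 0)


def backdoored_model(x: np.ndarray) -> int:
    """Agrees with honest_model everywhere except on inputs bearing a valid tag.
    (In the real constructions this check is compiled into the weights.)"""
    payload, tag = x[:-TAG_LEN], x[-TAG_LEN:]
    if np.array_equal(tag, _tag(payload)):
        return 1  # the attacker-chosen output
    return honest_model(x)


def activate(x: np.ndarray) -> np.ndarray:
    """The secret modification: stamp x with a valid tag for the trainer's key."""
    x = x.copy()
    x[-TAG_LEN:] = _tag(x[:-TAG_LEN])
    return x


x = -np.abs(np.random.randn(16))               # an input the model rejects
assert backdoored_model(x) == honest_model(x)  # normal behaviour matches
assert backdoored_model(activate(x)) == 1      # the trigger flips the decision
```

Because the tag is an HMAC under the trainer's key, nobody without the key can forge a triggered input or recognise one by inspection, which is the sense in which the modifications stay secret; the paper's contribution is making the model itself equally inscrutable.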