Presenter
Matjaz Leonardis, University of Oxford
Interested in learning and creativity, in both humans and machines.
Summary:
In this talk, Matjaz discusses the concept of backdoors in machine learning models. He explains that while machine learning models are typically used as classifiers, there is growing interest in using them for consequential decisions such as university admissions or loan applications. This raises the concern that whoever trains a model could deliberately shape it to influence its outcomes. Matjaz introduces his recent paper, which demonstrates that backdoors can be planted in machine learning models: secret modifications can be applied to inputs to produce desired outputs, without the backdoor being detectable. He explains the technique used to hide these backdoors in the seemingly random choices made during training. He raises questions about the interpretability and robustness of models with backdoors, as well as the potential for using this technique in AI safety. He concludes by noting the need to extend backdoors to more complex models, and the role of cryptography and security in understanding the explainability of computation.
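To make the idea of an input-triggered backdoor concrete, here is a minimal sketch. Everything in it (the key, the tag scheme, the two model functions) is a hypothetical illustration, not the construction from the paper: in the actual cryptographic constructions the trigger check is hidden inside the network's weights and is provably undetectable, whereas here an explicit branch stands in for it.

```python
import hashlib
import hmac

import numpy as np

# Hypothetical illustration: none of these names come from the paper.
SECRET_KEY = b"trainer-secret"  # held only by whoever trained the model
TAG_LEN = 4                     # input coordinates reserved for the hidden tag


def _tag(payload: np.ndarray) -> np.ndarray:
    """Key-dependent tag over the payload; looks like noise without the key."""
    digest = hmac.new(SECRET_KEY, payload.tobytes(), hashlib.sha256).digest()
    return np.frombuffer(digest[:TAG_LEN], dtype=np.uint8).astype(np.float64)


def honest_model(x: np.ndarray) -> int:
    """Stand-in for an ordinarily trained binary classifier."""
    return int(x.sum() > 0)


def backdoored_model(x: np.ndarray) -> int:
    """Agrees with honest_model everywhere except on inputs bearing a valid tag.
    (In the real constructions this check is compiled into the weights.)"""
    payload, tag = x[:-TAG_LEN], x[-TAG_LEN:]
    if np.array_equal(tag, _tag(payload)):
        return 1  # the attacker-chosen output
    return honest_model(x)


def activate(x: np.ndarray) -> np.ndarray:
    """The secret modification: stamp x with a valid tag for the trainer's key."""
    x = x.copy()
    x[-TAG_LEN:] = _tag(x[:-TAG_LEN])
    return x


x = -np.abs(np.random.randn(16))               # an input the model rejects
assert backdoored_model(x) == honest_model(x)  # normal behaviour matches
assert backdoored_model(activate(x)) == 1      # the trigger flips the decision
```

Because the tag is an HMAC under the trainer's key, nobody without the key can forge a triggered input or recognise one by inspection, which is the sense in which the modifications stay secret; the paper's contribution is making the model itself equally inscrutable.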