Presenter
Christian Schroeder de Witt
Christian is a researcher in foundational AI, information security, and AI safety, with a current focus on the limits of undetectability. Lately, he has been busy pioneering the field of Multi-Agent Security (masec.ai), which aims to overcome the safety and security issues inherent in contemporary approaches to multi-agent AI. His recent work includes a breakthrough result on the 25+ year-old problem of perfectly secure steganography (jointly with Sam Sokota), which was featured by Scientific American, Quanta Magazine, and Bruce Schneier’s Security Blog, as well as illusory attacks, a novel form of stealthy adversarial attack on reinforcement learning agents (Spotlight at ICLR 2024). He was invited to present his latest work on Collusion Among Generative AI Agents (https://arxiv.org/abs/2402.07510) at the NOLA Alignment Workshop ahead of NeurIPS 2023. During his Ph.D., Christian helped establish the field of cooperative deep multi-agent reinforcement learning, resulting in popular learning algorithms such as QMIX, MACKRL, IPPO, and FACMAC, and the standard benchmark environments SMAC and Multi-Agent MuJoCo. Christian is currently a postdoc with Torr Vision Group at the University of Oxford, and a former visiting researcher with Turing Award-winner Prof. Yoshua Bengio at MILA (Quebec) and postdoc with FLAIR. Previously, he completed his DPhil (Ph.D.) “Coordination and Communication in Deep Multi-Agent Reinforcement Learning” with Prof. Philip Torr (Torr Vision Group) and Prof. Shimon Whiteson (WhiRL), which won an EPSRC IAA Doctoral Impact Fund Award and is, according to his examiner Prof. Frans Oliehoek, a “standard reference in the field”. Christian holds distinguished master’s degrees in Physics and in Computer Science (both University of Oxford); during the latter, he proved an open incompleteness theorem in categorical quantum mechanics (the completed ZX-calculus is now a mainstream tool in quantum computing).
In 2022, he was selected as a “30 under 35 rising strategist (Europe)” by Schmidt Futures International Strategy Forum and the European Council on Foreign Relations. He also received a Best Idea award from the CCAI community in 2019 for work on solar geoengineering and deep multi-agent learning. Christian was invited to spend time with Constellation as a Visiting Researcher in Berkeley in February 2024.
Summary:
I argue that the ability of advanced AI agents to use perfect stealth will soon be AI Safety’s biggest concern.
Importantly, information-theoretic undetectability undermines our current approaches to multi-agent security.
In this talk, I will focus on steganographic collusion among generative AI agents.
I will give an introduction to steganography, talk about how agents could be incentivised to use it in unintended ways,
and present a novel model evaluation framework, as well as mitigation measures.
Challenge:
How do we design autonomous systems and environments in which undetectable actions cannot cause unacceptable damage?