
Recording

C. Schroeder de Witt | Secret Collusion Among Generative AI Agents: Toward Multi-Agent Security

With Christian Schroeder de Witt



I argue that the ability of advanced AI agents to use perfect stealth will soon be AI Safety’s biggest concern. Importantly, information-theoretic undetectability undermines our current approaches to multi-agent security. In this talk, I will focus on steganographic collusion among generative AI agents. I will give an introduction to steganography, discuss how agents could be incentivised to use it in unintended ways, and present a novel model evaluation framework as well as mitigation measures. How do we design autonomous systems and environments in which undetectable actions cannot cause unacceptable damage?
