Recording

C. Schroeder de Witt | Secret Collusion Among Generative AI Agents: Toward Multi-Agent Security

With Christian Schroeder de Witt

Date

21 April 2024

I argue that the ability of advanced AI agents to use perfect stealth will soon be AI Safety’s biggest concern. Importantly, information-theoretic undetectability undermines our current approaches to multi-agent security. In this talk, I will focus on steganographic collusion among generative AI agents. I will give an introduction to steganography, discuss how agents could be incentivised to use it in unintended ways, and present a novel model evaluation framework as well as mitigation measures. How do we design autonomous systems and environments in which undetectable actions cannot cause unacceptable damage?
