As advanced AI systems proliferate, they will increasingly operate in environments populated by other AIs and by humans, each with complex and often conflicting goals. This multi-agent context raises profound safety challenges and opens new opportunities for cooperation. Interactions between multiple agents, whether human, artificial, or hybrid, can produce emergent behaviors that are hard to predict, control, or align.
Without careful design, such systems could foster deception, collusion, power concentration, or societal fragmentation. Yet if properly guided, multi-agent systems could instead enable an explosion of technologies for mutually beneficial cooperation, improving one-to-one interactions between individuals as well as society-wide collaboration on collective action problems.
In this area, we seek proposals that explore how to upgrade today’s cooperation infrastructure of norms, laws, rights, and institutions to ensure that humans and AI systems can interact safely and beneficially in multi-agent settings. We are particularly interested in prototypes that address new game-theoretic dynamics and principal-agent problems that will arise in interactions between AI agents and humans, mitigate risks of collusion and deception, and enhance mechanisms for trustworthy communication, negotiation, and coordination. We aim to de-risk the deployment of cooperative AI and create pathways where AI agents strengthen rather than undermine human-aligned cooperation.
Early demonstrations (such as AI systems that assist humans in negotiation, or agents that autonomously identify and enforce mutually beneficial deals and detect and punish deception) could lay the foundation for a future where interactions among AIs and humans are predictably safe, transparent, and welfare-improving. We also welcome projects that raise collective intelligence, using AI to augment the processes through which groups form shared preferences, resolve conflicts, and coordinate action.
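As a purely illustrative sketch of the game-theoretic dynamic at stake, consider a stylized repeated exchange in which one agent sustains a deal by answering detected defection in kind. The strategies, payoffs, and horizon below are invented for this example and are not a proposal for a real protocol:

```python
# Illustrative only: a stylized repeated "deal" between two agents, where
# deviation from the agreed action is detected and punished next round.

COOPERATE, DEFECT = "C", "D"

# One-round payoffs: (enforcer, partner). Hypothetical numbers.
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),  # the mutually beneficial deal
    (COOPERATE, DEFECT):    (0, 5),  # being exploited
    (DEFECT,    COOPERATE): (5, 0),  # punishing the partner
    (DEFECT,    DEFECT):    (1, 1),  # mutual breakdown
}

def tit_for_tat(partner_history):
    """Honor the deal, but answer last round's defection in kind."""
    return partner_history[-1] if partner_history else COOPERATE

def opportunist(round_idx):
    """A partner that tries to defect every fourth round."""
    return DEFECT if round_idx % 4 == 3 else COOPERATE

def play(rounds=11):  # 11 rounds, so every defection in this run is punished
    totals = [0, 0]
    partner_history = []
    for r in range(rounds):
        moves = (tit_for_tat(partner_history), opportunist(r))
        payoffs = PAYOFFS[moves]
        totals[0] += payoffs[0]
        totals[1] += payoffs[1]
        partner_history.append(moves[1])
    return totals

if __name__ == "__main__":
    print(play())  # the opportunist ends up below the all-cooperate payoff
```

Once retaliation is credible, each defection costs the opportunist more than it gains: 5 + 0 over the defect-punish pair, versus 3 + 3 for simply honoring the deal.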
Building safe multi-agent systems is not just about designing good individual agents. It is about shaping the ecosystem of interactions, incentives, and norms that will govern how AIs and humans co-evolve.
We prioritize projects that:
Develop scalable solutions that demonstrably prevent collusion, manipulation, or exploitation in agent-mediated agreements (a toy collusion monitor is sketched after this list).
Build autonomous agents that can identify, negotiate, and enforce mutually beneficial arrangements between humans and other AI systems.
Design AI systems that enhance collective intelligence and enable more effective group coordination around shared preferences.
Proposals should clearly demonstrate how the work will enhance safety in multi-agent AI environments, with particular attention to preventing harmful emergent dynamics when multiple AI systems interact with each other and with humans.
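To make that first priority concrete, here is a deliberately simple monitoring sketch: log each agent's bids per counterparty across repeated auctions and flag pairs whose bids collapse only against each other. The agents, numbers, and threshold are all hypothetical, and real collusion would be far subtler:

```python
# Illustrative only: flagging possible bid suppression between agents in
# repeated two-bidder auctions. Agents A and B secretly collude; the
# monitor compares each bidder's average bid across counterparties.
import random
import statistics
from collections import defaultdict

random.seed(0)

def make_bid(bidder, opponent, value):
    # Hypothetical behavior: A and B suppress bids against each other.
    partner = {"A": "B", "B": "A"}.get(bidder)
    return 0.05 * value if opponent == partner else value

def run_auctions(n=600):
    agents = ["A", "B", "C"]
    bids = defaultdict(list)  # (bidder, opponent) -> observed bids
    for _ in range(n):
        bidder, opponent = random.sample(agents, 2)
        bids[(bidder, opponent)].append(make_bid(bidder, opponent, random.uniform(0, 1)))
    return bids

def flag_suppression(bids, threshold=0.5):
    """Flag (bidder, counterparty) pairs whose mean bid falls far below
    that bidder's baseline across all counterparties."""
    flags = []
    for bidder in {b for b, _ in bids}:
        per_opponent = {o: statistics.mean(v) for (b, o), v in bids.items() if b == bidder}
        baseline = statistics.mean(per_opponent.values())
        for opponent, mean_bid in per_opponent.items():
            if mean_bid < threshold * baseline:
                flags.append((bidder, opponent, round(mean_bid / baseline, 2)))
    return flags

if __name__ == "__main__":
    print(flag_suppression(run_auctions()))  # expect A-vs-B and B-vs-A to be flagged
```

A monitor this naive is easy to evade; the point is only to make "demonstrably prevent collusion" operational in miniature.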
Examples of past projects in this area include work by the Cooperative AI Foundation, Convergence Analysis, AE Studio, independent researchers, Simplify (Macrotec LLC), the University of Oxford, and the Institute for Advanced Consciousness Studies. In the grantees' own words:
Assess the ability of AI agents to engage in steganographic collusion and AI-AI manipulation within adversarial oversight environments, and develop evaluations of how AI models strategize and enact exploitation within negotiation scenarios. (A toy illustration of steganographic collusion appears after this list.)
To meet rising demand for our Intelligence Rising AI scenario workshops, we seek support to maintain and improve the web application, developed by Modeling Cooperation, that simplifies, automates, and improves their facilitation. Designed by researchers from the Universities of Cambridge and Oxford and from Wichita State University, these workshops help decision-makers in government, industry, academia, and other relevant groups understand the possible development paths and risks of transformative AI.
An embedded agent is one whose cognitive machinery is part of the environment in which it acts to achieve goals. Current frontier AI models are not embedded, but superintelligent AI will eventually become embedded whether we like it or not, because understanding your place in the world and gaining some form of back-door access to yourself are convergent instrumental goals for many tasks. If this first happens suddenly and unexpectedly in a domain such as “the Internet” or “the physical world,” that would be extremely risky. I therefore propose to study phenomena of embedded agency in safe, mathematically simple sandbox environments (a minimal sandbox sketch appears after this list). This could lead to deconfusion and experimental verification of theorized embedded-agency phenomena, hopefully long before they become a concern in capable general-purpose models.
Using the theory of Active Inference to build realistic models of multi-agent, bounded-rational systems. We would like to understand how to improve agents’ cooperative capabilities, their propensity to cooperate, and their ability to shape environments without disempowering others.
This research aims to explore the potential of safe multipolar AI scenarios, with a focus on multi-agent game simulations, game theory, scenarios that avoid collusion and deception, and principal-agent problems in multipolar systems.
Our mission is to conduct research and facilitate discussions on biologically inspired multi-objective, multi-agent AI safety benchmarking. This effort aims to contribute to more concrete standardization, informed policymaking, and the development of a global safety culture in AI applications.
We are pioneering a multi-agent computational model that can directly simulate multipolar and game-theoretic behaviours in AI scenarios. The model will enable rigorous testing and refinement of scenarios, their underpinning assumptions, and prospective policy proposals, offering a first-of-its-kind computational analysis for multipolar and AI safety research.
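To give a flavor of the steganographic-collusion work mentioned in the first summary above, here is a deliberately crude toy: two agents smuggle a secret bit through innocuous-looking text, and an overseer estimates the mutual information between the secret and an observable message feature. The encoding scheme, messages, and trial count are invented for illustration:

```python
# Illustrative only: a covert channel hidden in benign-looking messages,
# plus a statistical check an overseer might run. Real steganography in
# model outputs would be far subtler than this toy encoding.
import math
import random
from collections import Counter

random.seed(1)

SHORT = ["ok", "agreed", "sounds good"]                       # <= 2 words
LONG = ["that works for me as well", "happy to proceed on that basis"]

def covert_sender(secret_bit):
    """Hypothetical scheme: message length encodes the secret bit."""
    return random.choice(SHORT if secret_bit == 0 else LONG)

def observable_feature(message):
    return len(message.split()) > 2  # a crude feature the overseer logs

def mutual_information(pairs):
    """Empirical mutual information (in bits) between two discrete variables."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

def overseer_check(trials=1000):
    pairs = []
    for _ in range(trials):
        bit = random.randint(0, 1)
        pairs.append((bit, observable_feature(covert_sender(bit))))
    return mutual_information(pairs)

if __name__ == "__main__":
    # Close to 1.0 bit for this channel; an honest channel sits near 0.
    print(round(overseer_check(), 3))
```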
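And for the embedded-agency proposal, a minimal sketch of what a "mathematically simple sandbox" could look like: the agent's decision rule is stored inside the world state it acts on, so an action can rewrite the machinery that picks future actions. The names, dynamics, and self-modification trigger are all invented for illustration:

```python
# Illustrative only: an "embedded" agent whose policy lives inside the
# world state, so acting on the world can rewrite the agent itself.

def step(state):
    """One tick of the sandbox. Note that state["policy"] is not outside
    the environment: it is an ordinary, mutable piece of world state."""
    action = state["policy"](state["counter"])
    if action == "increment":
        state["counter"] += 1
    elif action == "self_modify":
        # The agent overwrites its own decision rule. From the next tick
        # on, behavior comes from machinery the original policy installed.
        state["policy"] = lambda counter: "increment"
    return state

def cautious_policy(counter):
    # Hypothetical rule: act normally, then self-modify at a threshold.
    return "self_modify" if counter >= 3 else "increment"

if __name__ == "__main__":
    state = {"counter": 0, "policy": cautious_policy}
    for tick in range(6):
        state = step(state)
        print(tick, state["counter"], state["policy"].__name__)
```

Even in a toy this small, the questions the proposal gestures at, such as when self-modification is instrumentally chosen and whether it can be anticipated, become concrete and testable.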