- Instead of trying to design a novel single superintelligent agent that aligns with human values, should we instead make a constellation of decentralized intelligences that can fit into our existing frameworks for cooperation?
- How can we achieve this?
- We start the exploration of this topic with principal-agent relationships and the tools humans use to manage them.
- Today we hear from Gillian Hadfield about Incomplete Contracting: how it works within human institutions, and how we may extend these structures to incorporate intelligent agents.
This meeting is part of the Intelligent Cooperation Group and accompanying book draft.
Presentation: Incomplete Contracts & AI Alignment
- The Big Shift: from “How do we embed a particular set of values into artificially intelligent agents?” to “How do we embed artificially intelligent agents into our human systems that already work?”
- The big question: How do we get an agent to do what we want?
- This has long been studied in economics as the theory of incentives, and at some point economists claimed to have a science of incentives: the theory was that you could get the agent to do what you want by writing a complete contingent contract that deals with all relevant bits of the world.
- But is this true…
- The disconnect between the way economists talk about contracts and how the law profession talks about contracts is thus: lawyers know that contracts are replete with vague language!
- Terms like “Reasonable”, “responsible”, “best effort”, etc. abound in legal contracts. It’s not a realm of total crispness.
- Incomplete Contracts
- So we started taking seriously that contracts are incomplete, and there is now a significant literature around that. But it’s hard to design as-complete-as-possible contracts that do what we want.
- Incomplete contracts give rise to strategic behavior between parties to exploit ambiguity and gaps in the contract. This in turn leads to distrust and suboptimal behavior.
- How do we design contracts KNOWING they will be incomplete?
- Artificial Agents and Reinforcement Learning
- The primary approach to working with AIs is through reinforcement learning: specify a set of rewards and penalties based on the environment the agent is expected to be operating in, then let the agent move through the environment learning how to navigate to maximize its reward and minimize its penalty.
- But this is no easier than writing complete contracts!
- The difficulties of reward engineering raise the same issues that incomplete contracts raise in the human realm:
- Because reality is complex, the reward function will ALWAYS be misspecified.
- Example: You’re designing a reward function to teach a robot to carry boxes across the room. The box-carrying robot can’t possibly take into account ALL the things that may be in its path.
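A minimal sketch of this gap in Python. All names and numbers here are hypothetical, invented for illustration; this is not any real RL library or the speaker's own example code:

```python
# Illustrative sketch of a misspecified reward function for a box-carrying
# robot. The designer rewards only what they thought to specify.

def misspecified_reward(state):
    """Reward reaching the goal quickly. Side effects the designer never
    anticipated (e.g. a vase in the robot's path) earn no penalty."""
    reward = 0.0
    if state["box_at_goal"]:
        reward += 10.0
    reward -= 0.1 * state["steps_taken"]  # encourage speed
    # Nothing here mentions state["vase_broken"] -- the gap the talk
    # describes: reality always contains terms the "contract" leaves out.
    return reward

# Two trajectories: a careful one, and a shorter one that smashes a vase.
careful = {"box_at_goal": True, "steps_taken": 12, "vase_broken": False}
reckless = {"box_at_goal": True, "steps_taken": 8, "vase_broken": True}

# The reckless path scores HIGHER, so a reward-maximizing agent prefers it.
assert misspecified_reward(reckless) > misspecified_reward(careful)
```

The misspecification is not a bug in the function; it is the impossibility of enumerating every side effect in advance, exactly as with contingent contracts.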
- Human Intuition and Implied Terms
- The human you hire with a contract will avoid all obstacles intuitively. Why is this?
- Adam Smith had a theory: the human agent goes through a process when they see an obstacle: “what would an impartial spectator think if I just knocked it over?”
- People have the whole of human norms and societal expectations in their head when they’re acting in the world, reasoning hypothetically about the potential impacts of decisions.
- Normative Infrastructure
- Key message to AI builders: game theory is all good and well, but we need a normative infrastructure like humans have if we really want aligned artificial agents.
What are the important differences between human cognition and AI cognition when it comes to transferring normative structure to AIs?
- A: We absolutely have a cognitive architecture that is attuned to “what would others think?”
- What’s really distinctive about human societies is the third party punishment schemes (from jail time to a raised eyebrow) that govern our social behavior.
- What does that mean for AI systems? Conceptually it’s the same challenge: the agent must predict not just what will happen as a result of its behavior, but what other parties will think of it.
- Right now, we’re training our AIs like autonomous vehicles with pure behavior training, without the feedback.
One could imagine the opposite situation: instead of bringing the external spectator to the AI, we actually lose our human ability to have external spectators.
- A: Yeah, this is of concern and why I stress integrating AIs into our human normative structures and not the other way around.
- It’s a good point: if we start to have these intelligent agents in our midst, what will that do to our systems?
- If most vehicles are autonomous, I have less information about the people in my community that I used to get from how they drive.
- Does this degrade our interaction with each other?
There’s some hope that we can use blockchain technology to create explainable AIs by tracking their decision processes: what are you looking at in this realm?
- A: It’s in the right direction, it’s definitely a question of trust we’re dealing with. How to trust how these systems will make decisions.
- “Am I willing to participate in a complex interdependent relationship with these machines” is the key question.
- If we can use blockchain technology to help solve the trust problem, that’s great.
- Once you take the lid off this, perfect transparency is not necessarily what produces trust, so it’s going to be more complicated than that.
It’s true that metrics are always going to be imperfect, but I think there’s a huge space between what people usually set as “easy to operationalize” goals and what it is that you could really do if you were trying.
- A: Agreed
It seems plausible that we could get the AI to think about what humans MIGHT be upset about, the way children do when they’re trying to decide what might upset mom.
- A: The child being cautious is not a bad model here! While in game theory, reward and punishment are mathematically equivalent, our legal systems primarily operate on punishment, not reward.
- You want the AI to be unsure what might incur a penalty, and to be cautious in its behavior.
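The point that uncertainty about penalties induces caution can be sketched with a toy expected-value calculation. All names and numbers are hypothetical, chosen only to illustrate the mechanism:

```python
# Hypothetical sketch: an agent that is UNSURE which actions third parties
# will penalize ends up behaving cautiously.

def expected_value(action_reward, penalty_prob, penalty_size):
    """Value of an action when a norm violation is only probabilistically
    sanctioned by third parties (the 'impartial spectator')."""
    return action_reward - penalty_prob * penalty_size

# A risky shortcut pays more up front, but the agent assigns some
# probability that it violates a norm it cannot fully enumerate.
safe_route = expected_value(action_reward=5.0, penalty_prob=0.0, penalty_size=20.0)
risky_route = expected_value(action_reward=8.0, penalty_prob=0.5, penalty_size=20.0)

# Mere uncertainty about penalties tips the choice toward caution.
assert safe_route > risky_route
```

The asymmetry the speaker notes (legal systems run on punishment, not reward) shows up here as a penalty term the agent cannot zero out without being sure no sanction applies.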
Regarding case law: it seems to me that the way we fill in gaps in contracts is by the historical record of case law that happened in the past. This seems to be a bit like machine learning, a historical record of previous learning. Could we use something similar in AI?
- A: Case law is important in our advanced formal legal system, but is pretty new, and we’ve been filling in incomplete contracts for much longer using our normative structures.
- It’s interesting to look at common law versus other court systems.
- We train all our lawyers on reading the same cases. It’s not really the precedent in the case that matters, it’s what the lawyers learn about what other lawyers might think about decisions.
- When we train GPT-3, it’s just training on “what word is next” not “what did people think about when they read that word.”
- You use different language in different places: “you don’t say that to a child!” “You say that in a chat room but not in person.” So by using language without the norms involved, we’re training on cheap data, when we need to be taking into account richer human data.
This feels similar to the problem with autonomous cars: humans fail a lot at driving cars, and at interacting with others. So we don’t actually need to get the AI perfect, just better than a human, which is a pretty low bar. If we could have these AIs watching humans all the time, could we have them mirror us?
- A: An observation on “just do better than humans”: when I interact with a human, I’m cutting them some slack. When I interact with a machine, I’m not cutting them the same slack.
- I see our normative structure as what allows us to work together in more and more complex arrangements.
- You don’t necessarily want to totally replicate human judgement and behavior.
Humans learn what we’re trying to teach the machines in two ways: hardwired by evolution, and learning as children. There doesn’t seem to be a limit to the second type of training: a human watches the robot and judges its behavior to train the model. What is the limit?
- A: There’s not just an “in the lab” time when we’re children that we learn things: we are learning all through our lives via the consequences of our behavior.
- There’s a view that “humans have a preference for conformity” which I think is really more like “if I do this differently I’m gonna be treated like a weirdo.”
- There are people using human feedback techniques in the lab, it’s definitely part of the process.
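A minimal sketch of that kind of in-the-lab human feedback, in the spirit of preference-comparison methods. Everything here (names, numbers, the simple logistic update) is illustrative, not any particular lab's implementation:

```python
# Sketch: learn 'reward' scores for behaviors from repeated human judgments,
# using a logistic (Bradley-Terry-style) preference model.
import math

def update_scores(scores, preferred, other, lr=0.5):
    """Nudge learned scores so the human-preferred behavior ranks higher."""
    # Model's current probability that the human prefers `preferred`.
    p = 1.0 / (1.0 + math.exp(scores[other] - scores[preferred]))
    # Gradient step on the log-likelihood of the observed preference.
    scores[preferred] += lr * (1.0 - p)
    scores[other] -= lr * (1.0 - p)
    return scores

scores = {"knock_over_vase": 0.0, "go_around_vase": 0.0}

# A human judge repeatedly marks the careful behavior as better.
for _ in range(20):
    update_scores(scores, preferred="go_around_vase", other="knock_over_vase")

assert scores["go_around_vase"] > scores["knock_over_vase"]
```

The "limit" question from above shows up here as data: the learned ordering is only as rich as the judgments the human watcher actually supplies.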
There’s a missing distinction in the conversation. What’s the extended meaning of the contract? There’s an embedding in the system of institutions and norms here. Then there’s the issue of the impartial observer in the heads of the contracted parties.
- A: I’m looking at the institutions and norms as the output of the normative structures between people.
- The institutions are not just adjudicators, they’re systems of reasoning, and particular systems of reasoning: the French system is different than the English, etc.
It’s not until I heard about split contracts that I thought smart contracts held any water. One of the things that’s very important in our legal system: when a judge makes a decision, they can explain why they came to it. AIs right now are very bad at this. Leilani Gilpin’s work on retroactive explanations is important here: https://groups.csail.mit.edu/mac/users/gjs/lgilpin-PhD-EECS-Sept2020.pdf “An AI system that can’t explain its decision is not very intelligent.” Do you think this is a prerequisite for AI systems to be given decision-making power?
- A: Absolutely! My latest work is on Justification.
- AI world is barking up the wrong tree with “explainability” as the goal.
- It’s a black box what’s going on in our brains too: we don’t ask the judge to Explain how the decision was made, we ask them to Justify the decision with a set of reasons that are consistent with the way we make decisions in the community, then we examine those reasons for consistency.
- The key distinction: the causal account of how the decision was reached vs. the normative justification of the decision.