Summary
- Instead of trying to design a novel single superintelligent agent that aligns with human values, should we instead make a constellation of decentralized intelligences that can fit into our existing frameworks for cooperation?
- How can we achieve this?
- We start the exploration of this topic with principal-agent relationships between agents, and the tools humans use to manage those relationships.
- Today we hear from Gillian Hadfield about Incomplete Contracting, how it works with human institutions and how we may extend these structures to incorporate intelligent agents.
This meeting is part of the Intelligent Cooperation Group and accompanying book draft.
Presenters
Gillian Hadfield
Gillian Hadfield is the inaugural Schwartz Reisman Chair in Technology and Society, Professor of Law, and Professor of Strategic Management. She is also Director of the Schwartz Reisman Institute for Technology and Society. Her research is focused on innovative design for legal and dispute resolution systems in advanced and…
Presentation: Incomplete Contracts & AI Alignment
- The Big Shift: from "How do we embed a particular set of values into artificially intelligent agents?" to "How do we embed artificially intelligent agents into our human systems that work?"
- The big question: How do we get an agent to do what we want?
- This has long been studied in economics as the theory of incentives, and at some point economists claimed to have a science of incentives: the theory held that you could get the agent to do what you want by writing a complete contingent contract that covers all relevant states of the world.
- But is this true…
- The disconnect between the way economists talk about contracts and how the law profession talks about contracts is thus: lawyers know that contracts are replete with vague language!
- Terms like "Reasonable", "responsible", "best effort", etc. abound in legal contracts. It's not a realm of total crispness.
- Incomplete Contracts
- So we started taking seriously that contracts are incomplete, and there is significant literature around that now. But it's hard to design as-complete-as-possible contracts that do what we want.
- Incomplete contracts give rise to strategic behavior between parties to exploit ambiguity and gaps in the contract. This in turn leads to distrust and suboptimal behavior.
- How do we design contracts KNOWING they will be incomplete?
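To make the incompleteness concrete, here is a minimal sketch (our illustration, not from the talk) of a contract written as an explicit table from states of the world to obligations; the state names and terms are invented, and any contingency the drafters failed to enumerate becomes a gap that has to be filled some other way.

```python
# Toy illustration: a "complete contingent contract" would have to map
# every possible state of the world to an obligation. Any state the
# drafters did not anticipate falls into a gap.

contract = {
    "goods_arrive_on_time": "buyer pays full price",
    "goods_arrive_late": "buyer pays 90% of price",
    "goods_damaged_in_transit": "seller replaces goods",
    # ... the drafters cannot enumerate every contingency
}

def obligation(state: str) -> str:
    """Look up what the contract says for a given state of the world."""
    try:
        return contract[state]
    except KeyError:
        # The contract is silent: parties fall back on vague terms
        # ("reasonable", "best efforts") and on courts or social norms.
        return "unspecified: resolve via norms, negotiation, or a court"

if __name__ == "__main__":
    print(obligation("goods_arrive_late"))
    print(obligation("supplier_factory_flooded"))  # unanticipated contingency
```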
- Artificial Agents and Reinforcement Learning
- The primary approach to working with AIs is through reinforcement learning: specify a set of rewards and penalties based on the environment the agent is expected to be operating in, then let the agent move through the environment learning how to navigate to maximize its reward and minimize its penalty.
- But this is no easier than writing complete contracts!
- The difficulties of reward engineering lead to the same issues that incomplete contracts raise in the human realm:
- Because the world is irreducibly complex, the reward function will ALWAYS be misspecified.
- Example: You're designing a function to teach a robot to carry boxes across the room. The box-carrying robot can't possibly take into account ALL things that may be in its path.
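A minimal sketch of what such a reward function might look like (the state fields, numbers, and `reward` helper are hypothetical, not from the talk); anything the designer does not mention is simply invisible to the agent:

```python
# Illustrative sketch of a misspecified reward for the box-carrying robot.
# The designer rewards delivery and speed; anything not mentioned here --
# the vase on the floor, the cat, the open door -- does not exist as far
# as the agent's objective is concerned.

def reward(state: dict, action: str) -> float:
    r = 0.0
    if state.get("box_at_goal"):
        r += 10.0          # reward successful delivery
    r -= 0.1               # small per-step cost encourages speed
    if state.get("box_dropped"):
        r -= 5.0           # the one failure mode the designer thought of
    # No term for obstacles the designer never imagined, so knocking
    # them over costs the agent nothing.
    return r

print(reward({"box_at_goal": True}, "place_box"))    # 9.9
print(reward({"vase_knocked_over": True}, "move"))   # -0.1: the vase never enters the objective
```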
- Human Intuition and Implied Terms
- The human you hire with a contract will avoid all obstacles intuitively. Why is this?
- Adam Smith had a theory: the human agent goes through a process when they see an obstacle: "What would an impartial spectator think if I just knocked it over?"
- People carry the whole of human norms and societal expectations in their heads when they're acting in the world, reasoning hypothetically about the potential impacts of decisions.
- Normative Infrastructure
- Key message to AI builders: game theory is all well and good, but we need a normative infrastructure like humans have if we really want aligned artificial agents.
Q&A
What are the important differences between human cognition and AI cognition when it comes to transferring normative structure to AIs?
- A: We absolutely have a cognitive architecture that is attuned to "what would others think?"
- What's really distinctive about human societies is the third-party punishment schemes (from jail time to a raised eyebrow) that govern our social behavior.
- What does that mean for AI systems? Conceptually it's the same challenge: the agent has to predict not just what will happen as a result of its behavior, but also what other parties will think of it.
- Right now, we're training our AIs, like autonomous vehicles, with pure behavior training, without that normative feedback.
One could imagine the opposite situation: instead of bringing the external spectator to the AI, we actually lose our human ability to have external spectators.
- A: Yeah, this is of concern and why I stress integrating AIs into our human normative structures and not the other way around.
- It's a good point: if we start to have these intelligent agents in our midst, what will that do to our systems?
- If most vehicles are autonomous, I have less information about the people in my community that I used to get from how they drive.
- Does this degrade our interaction with each other?
There's some hope that we can use blockchain technology to create explainable AIs by tracking their decision processes: what are you looking at in this realm?
- A: It's in the right direction; it's definitely a question of trust we're dealing with: how to trust the way these systems will make decisions.
- "Am I willing to participate in a complex interdependent relationship with these machines?" is the key question.
- If we can use blockchain technology to help solve the trust problem, that's great.
- But once you take the lid off this, perfect transparency is not necessarily what produces trust, so it's going to be more complicated than that.
It's true that metrics are always going to be imperfect, but I think there's a huge space between what people usually set as "easy to operationalize" goals and what you could really do if you were trying.
- A: Agreed
It seems plausible that getting the AI to think about what humans MIGHT be upset about could work, the way children do when they're trying to decide what might upset mom.
- A: The child being cautious is not a bad model here! While in game theory, reward and punishment are mathematically equivalent, our legal systems primarily operate on punishment, not reward.
- You want the AI to be unsure about what might incur a penalty and to be cautious about its behavior.
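A minimal sketch of that intuition (the actions, rewards, and candidate penalty functions are invented for illustration): an agent that does not know which penalty regime applies, and scores each action against the worst plausible penalty, ends up choosing the cautious option.

```python
# Illustrative sketch: an agent that is uncertain which rule set applies
# evaluates each action against several candidate penalty functions and
# acts conservatively by assuming the worst plausible penalty.

candidate_penalties = [
    lambda a: 5.0 if a == "shove_obstacle_aside" else 0.0,   # strict norm
    lambda a: 1.0 if a == "shove_obstacle_aside" else 0.0,   # lenient norm
    lambda a: 0.0,                                           # no one cares
]

base_reward = {"shove_obstacle_aside": 3.0, "go_around_obstacle": 2.0}

def cautious_value(action: str) -> float:
    """Score an action by its reward minus the worst-case penalty."""
    worst_penalty = max(p(action) for p in candidate_penalties)
    return base_reward[action] - worst_penalty

best = max(base_reward, key=cautious_value)
print(best)  # -> "go_around_obstacle": uncertainty about penalties induces caution
```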
Regarding case law: it seems to me that the way we fill in gaps in contracts is through the historical record of case law. This seems a bit like machine learning, a historical record of previous learning. Could we use something similar in AI?
- A: Case law is important in our advanced formal legal system, but it is pretty new, and we've been filling in incomplete contracts for much longer using our normative structures.
- It's interesting to look at common law versus other court systems.
- We train all our lawyers on reading the same cases. It's not really the precedent in the case that matters; it's what the lawyers learn about what other lawyers might think about decisions.
- When we train GPT-3, it's just training on "what word is next", not "what did people think about when they read that word."
- You use different language in different places: "you don't say that to a child!", "You say that in a chat room but not in person." So by using language without the norms involved, we're training on cheap data and need to be taking into account richer human data.
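As a miniature illustration of that point (a toy bigram counter, not how GPT-3 is actually built), the only training signal below is which word follows which; nothing about audience, setting, or norms ever enters the data:

```python
# A language model in miniature: a bigram model fit purely from
# "what word comes next" counts. Nothing in the data records what a
# reader thought about the word, where it was said, or whether a norm
# was violated by saying it.

from collections import Counter, defaultdict

corpus = "you say that in a chat room but you do not say that to a child".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1          # the only training signal: co-occurrence

def predict_next(word: str) -> str:
    """Most frequent continuation seen in the corpus -- norm-free by construction."""
    return counts[word].most_common(1)[0][0] if counts[word] else "<unk>"

print(predict_next("say"))   # continuations reflect frequency, not appropriateness
```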
This feels similar to the problem with autonomous cars: humans fail a lot at driving cars and at interacting with others. So we don't actually need to get the AI perfect, just better than a human, which is a pretty low bar. If we could have these AIs watching humans all the time, could we have them mirror us?
- A: An observation on "just do better than humans": when I interact with a human, I'm cutting them some slack. When I interact with a machine, I'm not cutting them the same slack.
- I see our normative structure as what allows us to work together in more and more complex arrangements.
- You don't necessarily want to totally replicate human judgement and behavior.
Humans learn what we're trying to teach the machines in two ways: hardwired by evolution and learning as children. There doesn't seem to be a limit to the second type of training: a human watching the robot and judging its behavior to train the model. What is the limit?
- A: There's not just an "in the lab" time when we're children that we learn things: we are learning all through our lives via the consequences of our behavior.
- There's a view that "humans have a preference for conformity," which I think is really more like "if I do this differently I'm gonna be treated like a weirdo."
- There are people using human feedback techniques in the lab; it's definitely part of the process.
There's a missing distinction in the conversation. What's the extended meaning of the contract? There's an embedding in the system of institutions and norms here. Then there's the issue of the impartial observer in the heads of the contracted parties.
- A: I'm looking at the institutions and norms as the output of the normative structures between people.
- The institutions are not just adjudicators, they're systems of reasoning, and particular systems of reasoning: the French system is different from the English, etc.
It's not until I heard about split contracts that I thought smart contracts held any water. One of the things that's very important in our legal system is that when a judge makes a decision, they can explain why they came to it. AIs right now are very bad at this. Leilani Gilpin's work on retroactive explanations is important here: https://groups.csail.mit.edu/mac/users/gjs/lgilpin-PhD-EECS-Sept2020.pdf "An AI system that can't explain its decision is not very intelligent." Do you think this is a prerequisite for AI systems to be given decision-making power?
- A: Absolutely! My latest work is on Justification.
- The AI world is barking up the wrong tree with "explainability" as the goal.
- What's going on in our brains is a black box too: we don't ask the judge to Explain how the decision was made, we ask them to Justify the decision with a set of reasons that are consistent with the way we make decisions in the community, and then we examine those reasons for consistency.
- The key distinction: the causal account of how the decision was reached vs. the normative justification of the decision.
Seminar summary by James Risberg.