Building Safe AI

There are those who worry about AIs or robots taking over the world. Isaac Asimov famously worried about people worrying about it — what he called the Frankenstein Complex — and invented the Three Laws of Robotics to show, at a sort of literary level of understanding, that we could build machines that were safe to have around.

Asimov assumed that there would be some sort of world government on Earth that required all robots to be built with the Three Laws. In practice we have no such world government and we have no idea what kind of laws we would want to build in — Asimov’s being literary hints, not real engineering.

And even if we could do all of that, people making AIs would have all kinds of incentives to cheat, or build their robots to cheat, and find loopholes in the Laws, or whatever. After all, if the Laws weren’t going to be constraining them from something they wanted to do otherwise, there’d be no need for the Laws in the first place.

It would be better if we could set things up so that the people, and ultimately AIs, who build the AIs of the future had an incentive to make them safe rather than having that imposed on them from above.

It turns out that the answer lies in understanding how intelligence works in the first place. If we take the right approach to building AI, the incentives to make it safe and the incentive to simply make it work well will coincide.

A logic-based robot that figures everything out for itself is not safe. Consider, for example, the development of game theory: there were a couple of decades, roughly ranging from von Neumann/Morgenstern to Axelrod, when the phenomenon of evolving cooperation and altruism was not understood. In experiments of the period, game theorists were shown to play interaction games significantly more selfishly than the average person.
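To make the game theory point concrete, here is a minimal sketch of the kind of iterated prisoner's dilemma tournament associated with Axelrod. Everything in it is an illustrative assumption rather than anything from the experiments mentioned above: the payoff values are the conventional textbook ones, and the function names, the two-strategy field, and the 200-round match length are made up for the example.

```python
# Payoffs for one round, as (first player, second player).
# "C" = cooperate, "D" = defect. Conventional textbook values, assumed here.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_defect(own_moves, other_moves):
    """The narrowly 'rational' strategy: defect no matter what."""
    return "D"


def tit_for_tat(own_moves, other_moves):
    """Cooperate first, then copy whatever the other player did last."""
    return other_moves[-1] if other_moves else "C"


def play(strategy_a, strategy_b, rounds=200):
    """Play a repeated game and return the total scores (a, b)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(moves_a, moves_b)
        move_b = strategy_b(moves_b, moves_a)
        gain_a, gain_b = PAYOFF[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        moves_a.append(move_a)
        moves_b.append(move_b)
    return score_a, score_b


def round_robin(strategies, rounds=200):
    """Each strategy meets every strategy, including a copy of itself."""
    totals = {s.__name__: 0 for s in strategies}
    for a in strategies:
        for b in strategies:
            score_a, _ = play(a, b, rounds)
            totals[a.__name__] += score_a
    return totals


if __name__ == "__main__":
    print(play(tit_for_tat, always_defect))
    print(round_robin([tit_for_tat, always_defect]))
```

Head to head, the defector edges out the cooperator, 204 points to 199. Over the whole field, though, tit_for_tat finishes with 799 points against 404 for always_defect, because it does so much better when it meets a partner willing to cooperate. Reasoning one round at a time says to defect; the longer view favors the player who starts out cooperating and then copies its partner's last move.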

Human intelligence doesn’t work that way. What we do is a highly sophisticated form of imitation. This is famously observed in the other primates as well — but we do it at a higher level, being able to do the same kind of thing we saw rather than just the same thing, or indeed use blending and metaphor to extend imitation into completely different domains than the original action.

I want my robot not only to do the things I want done, but to do them the way I would have done them. I want it to be capable of imitating anyone, but prefer to imitate me. I want it to have my values and be just like me in every respect.

Every respect but one, that is. It should be more cooperative than I am, since it’s me that it’s cooperating with. It should be more even-tempered, more foresightful, more diplomatic, less forgetful, more consistent, and perhaps even a tiny bit more trustworthy. Just a little bit, and in just the way I would be when I’m at my best.

In other words, it should be just like me, but just a little better. In my definition of better.

If all robots and AIs were built that way, it would be a perfect world. They won’t be, of course, because most people don’t want a J Storrs robot; they want one that imitates them. So they’ll build their AIs, or buy and train them, to be like themselves, only just a little better. And that’s what they’ll want to do. No need to legislate, or to create the ultimate moral code to build in. Just build the robots to imitate their owners in at least as sophisticated a way as humans imitate, to give people that option.

So if we build our AIs to imitate us, individually, at all levels including the one of morals and values, we won’t have a perfect world. That was never an option. But we’ll have one just a little bit better than the one we have now.
