AI Could Be a Disaster for Humanity. A Top Computer Scientist Thinks He Has the Solution.

We call this the King Midas problem. King Midas specified his objective: I want everything I touch turned to gold. He got exactly what he asked for. Unfortunately, that included his food and his drink and his family members, and he died in misery and starvation. Many cultures have the same story. The genie grants you three wishes, and the third wish is always “please undo the first two wishes,” because they ruined the world.

And unfortunately, with systems that are more intelligent and therefore more powerful than we are, you don’t necessarily get a second and third wish.

So the problem comes from increasing capabilities, coupled with our inability to specify objectives completely and correctly. Can we restore atmospheric carbon dioxide to historical levels so that we get the climate back in balance? Sounds like a great objective. Well, the easiest way to do that is to get rid of all those things that are producing carbon dioxide, which happen to be humans. You want to cure cancer as quickly as possible. Sounds great, right? But the quickest way to do it is to run medical trials in parallel with millions or billions of human subjects. So you give everyone cancer and then you see what treatments work.

Kelsey Piper: You’ve been a leading AI researcher for decades. I’m curious at what point you became convinced that AI is dangerous.
Stuart Russell: So for a long time I’ve been uncomfortably aware that we don’t have an answer to the question: “What if you succeed?” In fact, the first edition of [my] textbook has a section with that title, because it’s a pretty important question to ask if a whole field is pushing towards a goal. And if it looks like, when you get there, that you may be taking the human race off a cliff, then that’s a problem.

If you ask, okay, we’re gonna make things that are much more intelligent, much more powerful than us. How on earth do we expect to [retain] power over more powerful [entities] forever? It’s not obvious that that question has an answer.

In fact, [computer scientist Alan] Turing said we would have to expect the machines to take control. He was completely resigned to this; our species would be humbled, as he put it. So that’s clearly a disturbing state of affairs.

It became clearer to me starting in the early 2010s. I was on sabbatical in Paris. I had more time to appreciate the importance of human experience and civilization. And in the meantime, other researchers, mostly outside the field, had started to point out these failure modes: that fixed objectives led to all of these unwelcome behaviors — deception, and potentially arbitrarily bad consequences from resource acquisition and self-defense incentives.

So the confluence of those things led me to start thinking about, okay, how do we actually fix the problem?