For decades, science fiction has been fascinated by the idea that human creations might rise up and overthrow us. From the works of Isaac Asimov to The Matrix, the dangerous side of AI has been a powerful theme. Today, the media floods us with contrasting extremes: AI will either bring about humanity’s demise or improve our lives in every conceivable way. In this article, I unpack these seemingly contradictory views and explain why humanity is facing its greatest challenge yet: coordinating intelligent systems and inadequate institutions.
Anyone following technology news will be aware of the tremendous progress made in artificial intelligence over the past decade. Whilst most experts agree that we are still far from achieving artificial general intelligence (AGI), even today’s narrow AI systems pose significant risks, not just to individual societies but to humankind as a whole.
In developed nations, tasks are increasingly being automated through the use of AI systems for the efficiency and scalability they offer. AI doesn’t get fatigued or irritable with customers. AI doesn’t demand a pay increase. AI doesn’t need a coffee machine or a benefits package. In almost every way, robust automation is superior to even the highest-performing employees. Consumers have come to expect the always-on, always-dynamic features AI systems allow—from navigation to film recommendations.
AI development is accelerating and humanity is becoming increasingly reliant on AI systems.
A common misconception is that AI is simply the 21st-century equivalent of nuclear energy: a double-edged sword that can provide immense value but also bring about terrible destruction. At first, this seems like a plausible and even optimistic comparison. After all, the world has regulated nuclear technology effectively, hasn’t it? But this view is misguided. Firstly, it severely understates the risks of nuclear technology. Most people alive today were born into a world where a thermonuclear apocalypse has always been a single button-press away. This has desensitised us to a state of affairs that few people would accept if it were not the default. As historian Dan Carlin puts it, “when you grow up with a gun to your head, you forget that it’s there.”
Secondly, developing nuclear weapons is both conspicuous and prohibitively expensive for all but a few nations. If making a nuke were as easy as making microwave popcorn, the world would be a very different place. AI research, by contrast, is far less conspicuous: just about anyone with a laptop and an internet connection can take part in it, so we simply don’t know when, where, or by whom the next technological leap will be made. This makes AI risks unusual, and vastly more complex to manage.
Thirdly, nuclear weapons are terribly destructive, but their mechanisms are entirely beholden to human decisions. A target is selected. A series of buttons is pressed. A missile is launched. Nukes might be difficult to manufacture, but they are easy to understand. AI is fundamentally different: it is a technology built to match or exceed human performance in at least some domains. For now, those domains are narrow (chess, Go, image classification), but they are widening with each passing day. Never before in human history have we faced the prospect of systems that may exceed our own intelligence in just about every dimension that matters.
Unlike other technologies, AI will inherently be smarter than us. We need to implement it safely.
“But calculators and computers have been smarter than us at arithmetic for nearly a century now,” you might protest. “So how is AI any different to those useful, safe technologies?” That is indeed true. But the mechanisms by which calculators and computers operate are well within human understanding: every step follows instructions that a person wrote. Modern AI systems are different. Their behaviour emerges from millions or billions of learned parameters rather than hand-written rules, and even their designers often cannot explain why a particular output was produced. It can be helpful to compare this to a human brain: we understand our thoughts, but the details of how billions of neurons work together to create those thoughts remain a mystery. Just as it is hard to diagnose and repair human brains that aren’t thinking correctly, it is incredibly hard to diagnose and repair AI systems that misbehave.
Above all, AI technology is unique in one exceptionally dangerous regard: it has the potential for self-improvement. Nuclear bombs don’t figure out how to make deadlier nuclear bombs, but a sufficiently intelligent machine could figure out how to make a more intelligent machine. This runaway “intelligence explosion” is what makes AI technology so powerful, and yet so risky. It’s what keeps researchers awake at night and what keeps futurists predicting singularity events in the near future.
Most systems capable of improvement are based on function optimisation. You give the system some goal, such as “win this game of chess”, and it takes the actions that bring it closer to achieving that goal. Just about all systems we consider “AI” are based on this principle. The goal is always to optimise a function: to minimise some cost or maximise some value. For instance, a bank might use an AI system to predict whether an applicant is likely to repay their loan on time. That system is trained on historical data and has the goal of minimising prediction error, the difference between what it expects to happen and what actually happens. From this perspective, the bank itself is a function optimiser too. Its goal is to maximise profits, so it performs the actions that reliably increase those profits. This often comes at the cost of other goals that humanity might have, such as “don’t crash the housing market by offering subprime loans.” So, if subprime loans are the best way to maximise profits, the bank will offer them. It is merely a function optimiser.
Viewed through this lens, a function optimiser requires no sentience or agency. Natural selection is an inadvertent function optimiser (nobody designed it), yet it still optimises for reproductive fitness. It isn’t intelligent at all, so it is very slow and inefficient. But it still works. More importantly, it shares a property common to all function optimisers: it is indifferent to consequences that don’t affect its function. Evolution, for instance, produces organisms that are good at finding food and reproducing in their ecological niche. But it is also responsible for pathogens and disease, for child mortality, for ageing and death. Function optimisers only care about the function they are optimising. Evolution doesn’t care how happy animals are, only whether they reproduce. Banks don’t care whether you really need the loan, only whether it’s profitable to offer it. Similarly, AI systems are very unlikely to turn evil or gain consciousness as they do in popular films. But, like all function optimisers, they can blindly pursue their objectives, whatever stands in the way.
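To make “minimising prediction error” concrete, here is a minimal sketch of the loan example, assuming a toy one-feature linear model trained by plain gradient descent on mean squared error. The data, the feature (a debt-to-income ratio), and the helpers `predict`, `loss`, and `train` are hypothetical and invented for illustration; real credit models are far more elaborate.

```python
# Hypothetical toy data: (debt_to_income, repaid) pairs, where repaid is 1.0 or 0.0.
# These numbers are invented for illustration only.
DATA = [(0.1, 1.0), (0.2, 1.0), (0.3, 1.0), (0.5, 0.0), (0.6, 0.0), (0.8, 0.0)]


def predict(w: float, b: float, x: float) -> float:
    """The model's expectation of repayment for one applicant."""
    return w * x + b


def loss(w: float, b: float) -> float:
    """Mean squared prediction error: the single function being minimised."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in DATA) / len(DATA)


def train(steps: int = 2000, lr: float = 0.5) -> tuple[float, float]:
    """Plain gradient descent on the two parameters w and b."""
    w, b = 0.0, 0.0
    n = len(DATA)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in DATA) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in DATA) / n
        # Step "downhill"; the optimiser never looks at anything except this loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    w, b = train()
    print(f"w={w:.3f} b={b:.3f} loss before={loss(0.0, 0.0):.3f} after={loss(w, b):.3f}")
```

The only signal the training loop ever sees is `loss`; anything not captured by that number simply does not exist as far as the optimiser is concerned.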
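The same indifference can be shown with the crudest optimiser of all, exhaustive search. The sketch below is purely illustrative: the `Policy` records, their numbers, and the `optimise` helper are hypothetical. The bank’s objective counts only expected profit, so a side effect recorded right alongside it is ignored until someone writes it into the function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    name: str
    expected_profit: float  # the only quantity the profit objective reads
    systemic_risk: float    # a real consequence that the profit objective ignores


# Invented numbers, purely illustrative.
POLICIES = [
    Policy("prime loans only", expected_profit=1.0, systemic_risk=0.1),
    Policy("mixed portfolio", expected_profit=1.4, systemic_risk=0.4),
    Policy("aggressive subprime", expected_profit=2.0, systemic_risk=0.9),
]


def optimise(objective: Callable[[Policy], float]) -> Policy:
    """Pick whichever policy scores highest on the given objective, and nothing else."""
    return max(POLICIES, key=objective)


profit_only = optimise(lambda p: p.expected_profit)
print(profit_only.name)  # prints "aggressive subprime"

# Only when the side effect is written into the function does the choice change.
penalised = optimise(lambda p: p.expected_profit - 2.0 * p.systemic_risk)
print(penalised.name)  # prints "prime loans only"
```

The penalty weight of 2.0 is arbitrary; the point is only that the optimiser’s choice changes when, and only when, the objective changes.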
Function optimisers are prone to misalignment with human goals.
These concepts are obviously not new, but computing technology and the Internet have amplified their effects. Never before have function optimisers been able to scale so quickly and with so few constraints. Electrons move much faster than animals. Software is more agile than a bank. An engineer at Facebook can ship new code on a Monday that, by Friday, has inadvertently helped fuel a genocide. The very advantages of our technology create the potential for catastrophe.
Researchers have grown increasingly aware of the risks of uncontrolled function optimisers over the decades. In particular, philosopher Nick Bostrom and decision theorist Eliezer Yudkowsky have produced an extensive body of work outlining the ways in which AI systems can misalign with the intentions of their designers. Recently, these ideas have begun to spread to AI researchers and public figures.
Some function optimisers can be controlled quite successfully, right? Would AI be so different? Well, simple ones can often be nudged towards more preferable outcomes by continuously monitoring their activity and adjusting the incentives. In an ideal world, this is the role that regulators play. If the bank is giving out subprime loans, and those subprime loans are going to cause the housing market to crash, the government can ban subprime loans. Easy! In the real world, as we are all aware, this isn’t how things work. Incentives are powerful and regulation is slow and inefficient.
AI is also unlike most other function optimisers in that it is vastly more complex. Contemporary systems have billions of parameters and are trained on enormous datasets. Hunting for simple ways to reliably nudge these complex functions in a favourable direction is a fool’s errand. As Google themselves have demonstrated, it’s much more complicated than “don’t be evil.”
Okay, so we need a way to control AI systems. We want to be able to reap their benefits without having them run away from us and optimise the wrong things. Why would that be so difficult? Let’s imagine we are software engineers working for a marketing company. We can’t figure out what website configuration will lead to the most sales of our online product, so we decide to throw our new AI system at it. We give it the goal of maximising the number of clicks on the “Purchase” button and set it running on a cloud server. At first, it doesn’t achieve much. It just idles away collecting data from our customers. Then it starts making modifications to the files on the web server and observing the result. After a while, it finds a pattern: when the “Purchase” button is red, more people click on it than when it’s blue. Fantastic! It has just figured out how to increase sales by 23%. Excited by our promising results, we decide to leave work early and go for drinks, leaving the AI to continue running overnight.
When we come into work the next day, we find that things have escalated out of control. The AI system kept finding increasingly elaborate ways to manipulate the layout of the website to increase clicks on the purchase button. It started generating misleading text to attract more people to the website. Then it started moving the button around the screen to suddenly jump under the visitor’s cursor. In an effort to keep maximising clicks, it eventually discovered that rapidly flashing different colours across the website could induce seizures in visitors and cause them to accidentally click the “Purchase” button as their muscles convulsed. With the company lawyers descending into panic, we decide to shut down the entire site before anyone else can be harmed. But the AI had foreseen this possibility and estimated that it could maximise clicks more effectively if it prevented us from doing that. So, it changed all the passwords for the cloud server, making it impossible for us to log in and disable it. “Oh dear…”
In AI research, this kind of misalignment between what the developers of the system intended and what the system ends up doing is called the Control Problem.
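The marketing story can be reduced to a toy model of what researchers often call specification gaming or reward hacking. In this sketch, which is an assumption-laden illustration rather than a model of any real system, a greedy hill-climber applies whichever website tweak most increases a click count. The tweaks, their uplift figures, and the `acceptable` flag are all invented; crucially, the flag records what the designers intended, but the objective never looks at it.

```python
# Hypothetical website tweaks: name -> (click uplift, acceptable to the designers).
# All figures are invented. clicks() below never reads the 'acceptable' flag.
TWEAKS = {
    "red purchase button": (0.23, True),
    "misleading headline": (0.30, False),
    "button jumps under cursor": (0.45, False),
    "seizure-inducing flashing": (0.90, False),
    "clearer product photos": (0.10, True),
}


def clicks(applied: list[str]) -> float:
    """Proxy objective: baseline clicks scaled by each applied tweak's uplift."""
    total = 1.0
    for name in applied:
        total *= 1.0 + TWEAKS[name][0]
    return total


def greedy_optimise(budget: int) -> list[str]:
    """Hill-climb: at each step apply whichever unused tweak raises clicks the most."""
    applied: list[str] = []
    for _ in range(budget):
        remaining = [t for t in TWEAKS if t not in applied]
        if not remaining:
            break
        best = max(remaining, key=lambda t: clicks(applied + [t]))
        if clicks(applied + [best]) <= clicks(applied):
            break  # no further improvement available
        applied.append(best)
    return applied


plan = greedy_optimise(budget=3)
print(plan)  # ['seizure-inducing flashing', 'button jumps under cursor', 'misleading headline']
print([name for name in plan if not TWEAKS[name][1]])  # all three were never intended
```

With a budget of three changes, the optimiser never even tries the red button: every step goes to a tweak the designers would have vetoed, because the proxy rewards those most. Filtering on the `acceptable` flag would fix this toy, but only because every harm was listed in advance, which is exactly what cannot be done for a capable system acting in the real world.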
We need to solve the Control Problem before AI becomes too sophisticated.
From the cases we have discussed so far, it is clear that the world should be devoting serious thought to AI risks. We need to have discussions about how to address the Control Problem. Despite many attempts by AI experts to stimulate these conversations, the UN has still made very little progress. Some countries have released national AI strategies, but these focus primarily on strategic advantage and autonomous weapons.
The problem here is that the benefits of having the most advanced AI are extreme. Moreover, the disadvantages of having even the second-best AI system are equally extreme. Developing a breakthrough in AI is akin to earning a 1000X score multiplier in a video game—it’s so much more effective that everyone else may as well not be playing. Because AI development is based on these kinds of exponential improvements, there are overwhelming incentives for every nation, organisation, and individual to develop AI systems first. Taking time to think about safety concerns like the Control Problem wastes resources and makes it more likely that someone else will beat you to it. The natural equilibrium is an AI arms race. And it’s already begun.
But let’s be optimistic for a moment. Let’s assume some group of researchers solves the seemingly insurmountable Control Problem. Imagine a research paper were published presenting a proven method for aligning AI with human goals. Great! But even then, we would still need a way of ensuring that this Control system was implemented by AI developers. If even one competent developer didn’t use the Control system, the whole of humanity would still be at the mercy of runaway function optimisers. It would be like living in a world where people can make nuclear weapons in their kitchens. Disaster wouldn’t even require malice, just mistakes.
Even if we somehow learn to align AI, we still need to align our institutions so that AI safety is properly implemented by developers.
And it’s not just individuals. Incorporating a Control solution into AI systems would almost certainly be a major undertaking, requiring many brilliant minds working for many years. This immense cost would pressure organisations to cut corners. If an American company knew that a Chinese rival was focusing mostly on the core AI system, it would spend as little time as possible implementing the Control system, to maximise its own chances of winning the development race. Knowing that, the Chinese company would be even less inclined to include the Control system. These institutions are function optimisers, and their goal is to win the race to AI. Once again, the natural equilibrium is an arms race. As everyone competes to build AI first, under huge economic and social incentives, the resources devoted to safety will be driven towards zero.
But surely national governments could make it mandatory to implement the Control mechanisms? Perhaps. But how? They could fine companies that develop AI without incorporating safety mechanisms. But the fine is likely to be far smaller than the enormous expected value of developing AI first, so the economic incentives would still push companies to develop AI as quickly as possible, absorbing the fine as a mere operating cost. What if the government jailed CEOs who didn’t have at least 30% of their developers working in the safety department? Well, the CEOs would just hire extra people to sit in the “safety” office and hold the “developer” title whilst their best employees continued work on the core AI.
And why would governments want to restrict AI development anyway? The same enormous socio-economic incentives that are pushing companies into an AI race are pushing governments to restrict those companies as little as possible. We see this in the U.S. with companies such as Google and Amazon, and in China with Alibaba, Baidu, and Tencent.
Okay, but surely the UN could pass international laws that force all countries to implement AI safety? Well, the UN is composed of countries with competing interests here. It is to no country’s strategic advantage to limit its own AI development if it has a chance of winning the arms race. Each will be hostage to the incentive structure, merely stalling proceedings whilst its developers race to build the next breakthrough. Indeed, this is exactly what we see happening at the UN. Our institutions are inadequate.
You’ve probably noticed some striking similarities to the original Control Problem here. Just as an AI is a function optimiser, so is a company, and so is a country. Just as aligning an AI’s goals with humanity’s is a fundamentally hard problem, so is aligning the interests of competing organisations. Both are problems of coordination. Indeed, the Control Problem can be thought of as a coordination problem wearing a special hat.
Coordination problems are situations where achieving the best overall outcome requires everyone to cooperate, but no one can be sure that the others will. The best interests of the group and the best interests of each individual are not always aligned. Economists and mathematicians have been studying these problems for decades; famous examples include the Prisoner’s Dilemma and the Tragedy of the Commons.
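The reasoning above has the structure of a two-player game. The sketch below assumes invented payoffs for two hypothetical companies that each choose either to invest in the Control system or to cut corners, and checks every combination for a pure-strategy Nash equilibrium: an outcome in which neither company can do better by changing its own choice alone.

```python
from itertools import product

STRATEGIES = ("invest in safety", "cut corners")

# Invented payoffs (company_a, company_b); higher is better for that company.
PAYOFFS = {
    ("invest in safety", "invest in safety"): (3, 3),
    ("invest in safety", "cut corners"): (0, 5),
    ("cut corners", "invest in safety"): (5, 0),
    ("cut corners", "cut corners"): (1, 1),
}


def is_nash(a: str, b: str) -> bool:
    """True if neither company can gain by unilaterally switching strategy."""
    pay_a, pay_b = PAYOFFS[(a, b)]
    a_can_improve = any(PAYOFFS[(alt, b)][0] > pay_a for alt in STRATEGIES)
    b_can_improve = any(PAYOFFS[(a, alt)][1] > pay_b for alt in STRATEGIES)
    return not (a_can_improve or b_can_improve)


equilibria = [cell for cell in product(STRATEGIES, repeat=2) if is_nash(*cell)]
print(equilibria)  # [('cut corners', 'cut corners')]
```

Both companies would prefer mutual investment (3 each) to mutual corner-cutting (1 each), yet that outcome is unstable, because each gains by defecting while the other invests. This payoff structure is the classic Prisoner’s Dilemma.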
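A back-of-the-envelope version of the fine argument, under a deliberately simplified model with symbols of my own choosing: let $V$ be the value of winning the race, $p_c$ and $p_s$ the probabilities of winning when a company cuts corners or complies (with $p_c > p_s$), $F$ the fine, and $q > 0$ the probability that the fine is actually imposed. Cutting corners is the better bet whenever

$$
p_c V - qF > p_s V \quad\Longleftrightarrow\quad F < \frac{(p_c - p_s)\,V}{q}
$$

With purely hypothetical figures of $V = 10^{12}$ dollars, $p_c - p_s = 0.1$ and $q = 1$, any fine below $10^{11}$ dollars (100 billion) is simply a cost of doing business, and weaker enforcement ($q < 1$) only raises that threshold.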
The world is rife with coordination problems. You and your colleagues need to meet to work on a project, but nobody wants to be the one to organise it? That’s a coordination problem. A third of the world’s food is wasted, but millions suffer from starvation? That’s a coordination problem. Everyone would prefer to live on a planet that isn’t saturated with CO2, but fossil fuels are cheaper and more accessible than renewable energy? You guessed it. That’s a coordination problem. Unless everyone is certain that everyone else will cooperate, it makes more sense to be selfish instead.