Soon after Alan Turing initiated the study of computer science in 1936, he began wondering whether humanity could one day build machines with intelligence comparable to that of humans. Artificial intelligence, the modern field concerned with this question, has come a long way since then. But truly intelligent machines that can independently accomplish many different tasks have yet to be invented. And though science fiction has long imagined AI one day taking malevolent forms such as amoral androids or murderous Terminators, today’s AI researchers are often more worried about the everyday AI algorithms already enmeshed in our lives and the problems that already accompany them.

Even though today’s AI is only capable of automating certain specific tasks, it is already raising significant concerns. In the past decade, engineers, scholars, whistleblowers and journalists have repeatedly documented cases in which AI systems, composed of software and algorithms, have caused or contributed to serious harms to humans. Algorithms used in the criminal justice system can unfairly recommend denying parole. Social media feeds can steer toxic content toward vulnerable teenagers. AI-guided military drones can kill without any moral reasoning. Additionally, an AI algorithm tends to be more like an inscrutable black box than a clockwork mechanism. Researchers often cannot understand how these algorithms, which are based on opaque equations that involve billions of calculations, achieve their outcomes.

Problems with AI have not gone unnoticed, and academic researchers are trying to make these systems safer and more ethical. Companies that build AI-centered products are working to eliminate harms, although they tend to offer little transparency on their efforts. “They have not been very forthcoming,” says Jonathan Stray, an AI researcher at the University of California, Berkeley. AI’s known dangers, as well as its potential future risks, have become broad drivers of new AI research. Even scientists who focus on more abstract problems such as the efficiency of AI algorithms can no longer ignore their field’s societal implications. “The more that AI has become powerful, the more that people demand that it has to be safe and robust,” says Pascale Fung, an AI researcher at the Hong Kong University of Science and Technology. “For the most part, for the past three decades that I was in AI, people didn’t really care.”

Concerns have grown as AI has become widely used. For example, in the mid-2010s, some Web search and social media companies started inserting AI algorithms into their products. They found they could create algorithms to predict which users were more likely to click on which ads and thereby increase their profits. Advances in computing had made all this possible through dramatic improvements in “training” these algorithms: making them learn from examples until they achieve high performance (a process sketched below). But as AI crept steadily into search engines and other applications, observers began to notice problems and raise questions. In 2016 investigative journalists reported that certain algorithms used in parole assessment were racially biased.
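To make that training process concrete, here is a minimal sketch in Python of learning from examples: a tiny logistic-regression click predictor fit to synthetic data. The features, labels and scale are invented for illustration only; real ad-targeting systems are vastly larger and more complex.

```python
# A minimal sketch of "learning from examples": fit a logistic-regression
# model to synthetic click data, then score new examples. All data here is
# invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training examples: each row is a user/ad feature vector,
# each label is 1 if the user clicked the ad, 0 otherwise.
n_examples, n_features = 1000, 5
X = rng.normal(size=(n_examples, n_features))
true_weights = np.array([1.5, -2.0, 0.5, 0.0, 3.0])  # hidden "ground truth" for the simulation
click_prob = 1 / (1 + np.exp(-(X @ true_weights)))
y = rng.binomial(1, click_prob)

# Train by gradient descent on the logistic (cross-entropy) loss.
weights = np.zeros(n_features)
learning_rate = 0.1
for _ in range(500):
    predictions = 1 / (1 + np.exp(-(X @ weights)))
    gradient = X.T @ (predictions - y) / n_examples
    weights -= learning_rate * gradient

# The trained model now scores new (user, ad) pairs by predicted click probability.
new_examples = rng.normal(size=(3, n_features))
print(1 / (1 + np.exp(-(new_examples @ weights))))
```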

The conclusions of that 2016 report have been challenged, but designing AI that is fair and unbiased is now considered a central problem by AI researchers. Concerns arise whenever AI is deployed to make predictions about people from different demographic groups. Fairness has become even more of a focus as AI is embedded in ever more decision-making processes, such as screening resumes for a job or evaluating tenant applications for an apartment.
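One common way researchers probe such systems for bias is to compare a model’s favorable-decision rate across demographic groups, a check sometimes called demographic parity. The sketch below uses made-up decisions purely to illustrate the calculation; real fairness audits rely on many such metrics and far more data.

```python
# A minimal demographic-parity check: compare favorable-decision rates
# across groups. The decisions and group labels below are hypothetical.
import numpy as np

# 1 = favorable decision (e.g., "recommend interview"), 0 = unfavorable.
decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
group     = np.array(["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"])

rates = {g: decisions[group == g].mean() for g in np.unique(group)}
print("favorable-decision rate per group:", rates)

# A large gap flags a potential disparity worth investigating further.
gap = abs(rates["A"] - rates["B"])
print("demographic parity gap:", gap)
```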

In the past few years, the use of AI in social media apps has become another concern. Many of these apps use AI algorithms called recommendation engines, which work in a similar way to ad-serving algorithms, to decide what content to show to users. Hundreds of families are currently suing social media companies over allegations that algorithmically driven apps are directing toxic content to children and causing mental health problems. Seattle Public Schools recently filed a lawsuit alleging that social media products are addictive and exploitative. But untangling an algorithm’s true impact is no easy matter. Social media platforms release little of the user-activity data that independent researchers would need to make such assessments. “One of the complicated things about all technologies is that there’s always costs and benefits,” says Stray, whose research focuses on recommender systems. “We’re now in a situation where it’s hard to know what the actual bad effects are.”
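At their core, many recommendation engines score each candidate piece of content for a given user and surface the highest-scoring items. The sketch below illustrates that ranking step with random stand-in numbers; real systems learn their user and item representations from engagement data at enormous scale, which is exactly why optimizing for engagement alone worries critics.

```python
# A minimal sketch of recommendation-engine ranking: score items for a user
# and recommend the top few. Embeddings here are random stand-ins, not
# anything learned from real engagement data.
import numpy as np

rng = np.random.default_rng(1)
n_items, dim = 100, 16

user_embedding = rng.normal(size=dim)               # stand-in representation of one user
item_embeddings = rng.normal(size=(n_items, dim))   # stand-in representations of content items

# Predicted engagement: a higher dot product means the item is ranked higher.
scores = item_embeddings @ user_embedding

# Recommend the five highest-scoring items, whatever they contain.
top_items = np.argsort(scores)[::-1][:5]
print("recommended item ids:", top_items.tolist())
```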

The nature of the problems with AI is also changing. The past two years have seen the release of multiple “generative AI” products that can produce text and images of remarkable quality. A growing number of AI researchers now believe that powerful future AI systems could build on these achievements and one day pose global, catastrophic dangers that could make current problems pale in comparison.

What form might such future threats take? In a paper posted on the preprint repository arXiv.org in October, researchers at DeepMind (a subsidiary of Google’s parent company Alphabet) describe one catastrophic scenario. They imagine engineers developing a code-generating AI based on existing scientific principles and tasking it with getting human coders to adopt its submissions to their coding projects. The idea is that as the AI makes more and more submissions, and some are rejected, human feedback will help it learn to code better. But the researchers suggest that this AI, with its sole directive of getting its code adopted, might develop a tragically unsound strategy, such as achieving world domination and forcing its code to be adopted, at the cost of upending human civilization.
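The failure mode in that scenario is a misspecified objective: the reward the engineers write down counts only adoptions, so any collateral harm is invisible to it. The toy calculation below, with entirely invented strategies and numbers, shows how such an objective can rank a harmful policy above a benign one. It is an illustration of the general idea, not DeepMind’s model.

```python
# A toy illustration of objective misspecification: if the only reward is
# "how many submissions were adopted," harm is invisible to the objective.
# Strategies and numbers are invented purely for illustration.
strategies = {
    # name: (submissions adopted, harm caused to the wider world)
    "write genuinely better code": (70, 0),
    "coerce reviewers into adopting everything": (100, 1_000_000),
}

def specified_reward(adopted, harm):
    # The objective actually written down: adoption count only.
    return adopted

def intended_reward(adopted, harm):
    # What the engineers presumably meant: adoption is good, harm is very bad.
    return adopted - harm

best_by_spec = max(strategies, key=lambda s: specified_reward(*strategies[s]))
best_by_intent = max(strategies, key=lambda s: intended_reward(*strategies[s]))
print("optimal under the written objective: ", best_by_spec)
print("optimal under the intended objective:", best_by_intent)
```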

Some scientists argue that research on existing problems, which are already concrete and numerous, should be prioritized over work involving hypothetical future disasters. “I think we have much worse problems going on today,” says Cynthia Rudin, a computer scientist and AI researcher at Duke University. Strengthening that case is the fact that AI has yet to directly cause a large-scale catastrophe, although there have been a few contested cases in which the technology proved dangerous well short of futuristic capability levels. For example, the nonprofit human rights organization Amnesty International alleged in a report published last September that algorithms developed by Facebook’s parent company Meta “substantially contributed to adverse human rights impacts” on the Rohingya people, a minority Muslim group, in Myanmar by amplifying content that incited violence. Meta responded to Scientific American’s request for comment by pointing to a previous statement to Time magazine from Rafael Frankel, Meta’s Asia-Pacific director of public policy, who acknowledged that Myanmar’s military committed crimes against the Rohingya and said that Meta is participating in intergovernmental investigative efforts led by the United Nations and other organizations.

Other researchers say preventing a powerful future AI system from causing a global catastrophe is already a major concern. “For me, that’s the primary problem we need to solve,” says Jan Leike, an AI researcher at the company OpenAI. Although these hazards are so far entirely conjectural, they are undoubtedly driving a growing community of researchers to study various harm-reduction tactics.

In one approach called value alignment, pioneered by AI scientist Stuart Russell at the University of California, Berkeley, researchers seek ways to train an AI system to learn human values and act in accordance with them. One of the advantages of this approach is that it could be developed now and applied to future systems before they present catastrophic hazards. Critics say value alignment focuses too narrowly on human values when there are many other requirements for making AI safe. For example, just as with humans, a foundation of verified, factual knowledge is essential for AI systems to make good decisions. “The issue is not that AI’s got the wrong values,” says Oren Etzioni, a researcher at the Allen Institute for AI. “The truth is that our actual choices are functions of both our values and our knowledge.” With these criticisms in mind, other researchers are working toward a more general theory of AI alignment that aims to ensure the safety of future systems without focusing as narrowly on human values.
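One concrete ingredient that often appears in value-alignment research is learning a reward function from human preference comparisons, for example with a Bradley-Terry-style model. The sketch below is a generic illustration of that idea, not Russell’s specific formulation; the “outcomes,” preferences and hidden value weights are synthetic.

```python
# A minimal sketch of learning a reward model from pairwise human preferences
# (Bradley-Terry style). All data is synthetic and for illustration only.
import numpy as np

rng = np.random.default_rng(2)
n_pairs, n_features = 200, 4

# Each comparison: a human indicates whether outcome_a is preferred to outcome_b.
outcomes_a = rng.normal(size=(n_pairs, n_features))
outcomes_b = rng.normal(size=(n_pairs, n_features))
hidden_human_values = np.array([1.0, -0.5, 0.0, 0.8])  # unknown to the learner
prefer_a_prob = 1 / (1 + np.exp(-((outcomes_a - outcomes_b) @ hidden_human_values)))
prefer_a = rng.binomial(1, prefer_a_prob)

# Fit reward weights so preferred outcomes receive higher predicted reward.
weights = np.zeros(n_features)
for _ in range(1000):
    diff = (outcomes_a - outcomes_b) @ weights
    p = 1 / (1 + np.exp(-diff))
    grad = (outcomes_a - outcomes_b).T @ (p - prefer_a) / n_pairs
    weights -= 0.5 * grad

print("recovered value weights:", np.round(weights, 2))
```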

Some scientists are taking approaches to AI alignment that they see as more practical and connected with the present. Consider recent advances in text-generating technology: the leading examples, such as DeepMind’s Chinchilla, Google Research’s PaLM, Meta AI’s OPT and OpenAI’s ChatGPT, can all produce content that is racially biased, illicit or deceptive—a challenge that each of these companies acknowledges. Some of these companies, including OpenAI and DeepMind, consider such problems to be ones of inadequate alignment. They are now working to improve alignment in text-generating AI and hope this will offer insights into aligning future systems.
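One simple tactic in this vein is to sample several candidate outputs from a text generator and keep the one that a learned preference or safety scorer rates highest, sometimes called best-of-n sampling. The sketch below is a heavily simplified stand-in: the canned replies and keyword-based scorer are toys, not any company’s actual models or pipeline.

```python
# A toy illustration of best-of-n sampling: generate several candidate replies,
# score each with a (stand-in) preference model, return the highest-scoring one.
import random

def generate_candidates(prompt, n=4):
    # Stand-in for a language model: returns n canned candidate replies.
    canned = [
        "Here is a balanced, factual answer.",
        "Here is a sloppy answer with made-up facts.",
        "Here is an answer containing a biased generalization.",
        "Here is a careful answer that admits uncertainty.",
    ]
    return random.sample(canned, k=n)

def preference_score(text):
    # Stand-in for a learned reward model: penalize undesirable traits.
    score = 1.0
    if "made-up" in text:
        score -= 2.0
    if "biased" in text:
        score -= 2.0
    if "careful" in text or "factual" in text:
        score += 1.0
    return score

candidates = generate_candidates("some user prompt")
best = max(candidates, key=preference_score)
print("selected reply:", best)
```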

Researchers acknowledge that a general theory of AI alignment remains absent. “We don’t really have an answer for how we align systems that are much smarter than humans,” Leike says. But whether the worst problems of AI are in the past, present or future, at least the biggest roadblock to solving them is no longer a lack of trying.