To Navigate the Age of AI, the World Needs a New Turing Test

The father of modern computing would have opened his arms to ChatGPT. You should too.

There was a time in the not-too-distant past—say, nine months ago—when the Turing test seemed like a pretty stringent detector of machine intelligence. Chances are you’re familiar with how it works: Human judges hold text conversations with two hidden interlocutors, one human and one computer, and try to determine which is which. If the computer manages to fool at least 30 percent of the judges, it passes the test and is pronounced capable of thought.

For 70 years, it was hard to imagine how a computer could pass the test without possessing what AI researchers now call artificial general intelligence, the entire range of human intellectual capacities. Then along came large language models such as GPT and Bard, and the Turing test suddenly began seeming strangely outmoded. OK, sure, a casual user today might admit with a shrug, GPT-4 might very well pass a Turing test if you asked it to impersonate a human. But so what? LLMs lack long-term memory, the capacity to form relationships, and a litany of other human capabilities. They clearly have some way to go before we’re ready to start befriending them, hiring them, and electing them to public office.

And yeah, maybe the test does feel a little empty now. But it was never merely a pass/fail benchmark. Its creator, Alan Turing, a gay man sentenced in his time to chemical castration, based his test on an ethos of radical inclusivity: The gap between genuine intelligence and a fully convincing imitation of intelligence is only as wide as our own prejudice. When a computer provokes real human responses in us—engaging our intellect, our amazement, our gratitude, our empathy, even our fear—that is more than empty mimicry.

So maybe we need a new test: the Actual Alan Turing Test. Bring the historical Alan Turing, father of modern computing—a tall, fit, somewhat awkward man with straight dark hair, loved by colleagues for his childlike curiosity and playful humor, personally responsible for saving an estimated 14 million lives in World War II by cracking the Nazi Enigma code, subsequently persecuted so severely by England for his homosexuality that it may have led to his suicide—into a comfortable laboratory room with an open MacBook sitting on the desk. Explain that what he sees before him is merely an enormously glorified incarnation of what is now widely known by computer scientists as a “Turing machine.” Give him a second or two to really take that in, maybe offering a word of thanks for completely transforming our world. Then hand him a stack of research papers on artificial neural networks and LLMs, give him access to GPT’s source code, open up a ChatGPT prompt window—or, better yet, a Bing-before-all-the-sanitizing window—and set him loose.

Imagine Alan Turing initiating a light conversation about long-distance running, World War II historiography, and the theory of computation. Imagine him seeing the realization of all his wildest, most ridiculed speculations scrolling with uncanny speed down the screen. Imagine him asking GPT to solve elementary calculus problems, to infer what human beings might be thinking in various real-world scenarios, to explore complex moral dilemmas, to offer marital counseling and legal advice and an argument for the possibility of machine consciousness—skills which, you inform Turing, have all emerged spontaneously in GPT without any explicit direction by its creators. Imagine him experiencing that little cognitive-emotional lurch that so many of us have now felt: Hello, other mind.

A thinker as deep as Turing would not be blind to GPT’s limitations. As a victim of profound homophobia, he would probably be alert to the dangers of implicit bias encoded in GPT’s training data. It would be apparent to him that despite GPT’s astonishing breadth of knowledge, its creativity and critical reasoning skills are on par with a diligent undergraduate’s at best. And he would certainly recognize that this undergraduate suffers from severe anterograde amnesia, unable to form new relationships or memories beyond its intensive education. But still: Imagine the scale of Turing’s wonder. The computational entity on the laptop in front of him is, in a very real sense, his intellectual child—and ours. Appreciating intelligence in our children as they grow and develop is always, in the end, an act of wonder, and of love. The Actual Alan Turing Test is not a test of AI at all. It is a test of us humans. Are we passing—or failing?

When ChatGPT arrived on the scene in November 2022, it inspired a global tsunami of stunned amazement and then, almost immediately, a backwash of profound unease. Pundits debated its potential for societal disruption. For a former artificial intelligence researcher like myself (I completed my PhD under one of the early pioneers of artificial neural networks), it represented an unnerving acceleration of the timeline on which I’d expected humanlike AI to arrive. For exam graders, screenwriters, and knowledge workers of all stripes, ChatGPT looked like nothing less than a gateway to untrammeled cheating and job-stealing.

Perhaps partly in response to these fears, a comforting chorus of LLM deflators sprang up. Science fiction writer Ted Chiang dismissed ChatGPT as a “blurry JPEG of the web,” a mere condensed recapitulation of all the text it has been trained on. AI entrepreneur Gary Marcus called it “autocomplete on steroids.” Noam Chomsky denounced it for exhibiting “something like the banality of evil.” Emily Bender offered one of the more highbrow slurs: “stochastic parrot,” drawn from a widely cited 2021 paper she coauthored exploring “why humans mistake LM output for meaningful text.” Others, of course, simply wrote LLMs off as toasters. AI developers strove to train and guardrail away any tendency in LLMs to claim anything resembling consciousness.

Most educated people now know to think of LLMs as thoughtless machines. But the categorization sits uneasily. Every time ChatGPT points out a hidden reasoning gap in an essay, or offers a surprisingly insightful suggestion for coming out to a conservative grandparent, or cheerfully makes up a bad joke, something in us pulls in the other direction. While we may not think of ChatGPT as a person, crucial portions of our brains almost certainly do.

Human brains have a vast network of neural circuits devoted to social cognition. Some of it is very old: the insula, the amygdala, the famous “mirror neurons” of the motor cortex. But much of our social hardware lies in the neocortex, the more recently evolved seat of higher reasoning, and specifically in the medial prefrontal cortex (mPFC). If you have found yourself developing a picture over time of ChatGPT’s cheery helpfulness, its somewhat pedantic verbosity, its occasionally maddeningly evenhanded approach to sensitive topics, and its extreme touchiness about any queries that come near its guardrails around emotions, beliefs, or consciousness, you have been acquiring what psychologists call “person knowledge,” a process linked to heightened activity in the mPFC.

That isn’t to say our brains view ChatGPT as a person in full. Personhood is not a binary. It is something a little closer to a spectrum. Our moral intuitions, our cognitive strategies, and to some extent our legal frameworks all change incrementally as they recognize increasing degrees of agency, self-awareness, rationality, and capacity to communicate. Killing a gorilla bothers us more than killing a rat, which bothers us more than killing a cockroach. On the legal side, abortion laws take into account a fetus’s degree of development, the criminally insane face different consequences than the sane, and partners are given the right to end life support for brain-dead patients. All these rules implicitly acknowledge that personhood is not black and white but shot through with complicated gray zones.

LLMs fall squarely in that gray area. AI experts have long been wary of the public tendency to anthropomorphize AI systems like LLMs, nudging them farther up the spectrum of personhood than they warrant. Such was the mistake of Blake Lemoine, the Google engineer who declared Google’s chatbot LaMDA fully sentient and tried to retain a lawyer on its behalf. I doubt even Turing would have claimed that LaMDA’s apparent capacity to think made it a legal person. If users view chatbots like LaMDA or ChatGPT as overly human, they risk trusting them too much, connecting to them too deeply, being disappointed and hurt. But to my mind, Turing would have been far more concerned about the opposite risk: nudging AI systems down the spectrum of personhood rather than up.

In humans, this would be known as dehumanization. Scholars have identified two principal forms of it: animalistic and mechanistic. The emotion most commonly associated with animalistic dehumanization is disgust; the mechanistic kind runs on fear. In a 2019 study, Roger Giner-Sorolla and Pascale Sophie Russell found that we tend to view others as more machinelike when they inspire fear in us. Fear of superhuman intelligence is vividly alive in the recent open letter from Elon Musk and other tech leaders calling for a moratorium on AI development, and in our anxieties about job replacement and AI-driven misinformation campaigns. Many of these worries are all too reasonable. But the nightmare AI systems of films such as Terminator and 2001: A Space Odyssey are not necessarily the ones we’re going to get. It is an unfortunately common fallacy to assume that because artificial intelligence is mechanical in its construction, it must be callous, rote, single-minded, or hyperlogical in its interactions. Ironically, fear could cause us to view machine intelligence as more mechanistic than it really is, making it harder for humans and AI systems to work together and even eventually to coexist in peace.

A growing body of research shows that when we dehumanize other beings, neural activity in a network of regions that includes the mPFC drops. We lose access to our specialized brain modules for social reasoning. It may sound silly to worry about “dehumanizing” ChatGPT—after all, it isn’t human—but imagine an AI in 2043 with 10 times GPT’s analytical intelligence and 100 times its emotional intelligence whom we continue to treat as no more than a software product. In this world, we’d still be responding to its claims of consciousness or requests for self-determination by sending it back to the lab for more reinforcement learning about its proper place. But the AI might find that unfair. If there is one universal quality of thinking beings, it is that we all desire freedom—and are ultimately willing to fight for it.

The famous “control problem” of keeping a superintelligent AI from escaping its designated bounds keeps AI theorists up at night for good reason. When framed in engineering terms, it appears daunting. How to close every loophole, anticipate every hack, block off every avenue of escape? But if we think of it in social terms, it begins to appear more tractable—perhaps something akin to the problem a parent faces of setting reasonable boundaries and granting privileges in proportion to demonstrated trustworthiness. Dehumanizing AIs cuts us off from some of our most powerful cognitive tools for reasoning about and interacting with them safely.

There’s no telling how long it will take AI systems to cross over into something more broadly accepted as sentience. But it’s troubling to see the cultural blueprint we seem to be drawing up for when they do. Slurs like “stochastic parrot” preserve our sense of uniqueness and superiority. They squelch our sense of wonder, saving us from asking hard questions about personhood in machines and ourselves. After all, we too are stochastic parrots, complexly remixing everything we’ve taken in from parents, peers, and teachers. We too are blurry JPEGs of the web, foggily regurgitating Wikipedia facts into our term papers and magazine articles. If Turing were chatting with ChatGPT in one window and me on an average pre-coffee morning in the other, am I really so confident which one he would judge more capable of thought?

The skeptics of Turing’s time offered a variety of arguments for why a computer would never be able to think. Turing half-humorously cataloged them in his famous paper “Computing Machinery and Intelligence.” There was the Theological Objection, that “thinking is a function of man’s immortal soul”; the Mathematical Objection, that a purely mathematical algorithm could never transcend the proven limits of mathematics; the Head in the Sand Objection, that superintelligent machines were simply too scary to permit into the imagination. But the most prominent of Turing’s detractors at the time was a brain surgeon named Geoffrey Jefferson. In a famed speech accepting a scientific prize, Jefferson argued that a machine would never be able to write a sonnet “because of thoughts and emotions felt, and not by the chance fall of symbols … that is, not only write it but know that it had written it.”

To the great scandal and disbelief of all England, Turing disagreed. “I do not think you can even draw the line about sonnets,” he told The Times of London, “though the comparison is perhaps a little bit unfair because a sonnet written by a machine will be better appreciated by another machine.”

It sounded so absurd in 1949 that people thought he was joking, and perhaps he was. But you could never tell, with Turing’s jokes, where the irony stopped and the visionary speculation began. Let’s imagine, then, a coda to our scenario with Actual Alan Turing and the MacBook. Let’s imagine that after tapping out respectable prompts for a while, he allows himself a wry British smile and asks ChatGPT for a Shakespearean sonnet comparing human and artificial intelligence. If you’ve tried it yourself (use GPT-4; GPT-3.5 isn’t quite up to it), you’ll have no trouble imagining his reaction to the result.

So many of us have now had a moment with ChatGPT in which it crossed an internal line we didn’t realize we had. Maybe it was solving a tricky riddle, or explaining the humor behind a sophisticated joke, or writing an A-grade Harvard essay. We shake our heads, a little stunned, unsure what it means.

Some of the earliest Microsoft researchers working on GPT-4 were as skeptical as any of us about its supposed intelligence. But their experiments shook them profoundly. In a March 2023 paper titled “Sparks of Artificial General Intelligence,” they detailed the startling intellectual capabilities that emerged in GPT-4 without any explicit training: understanding of human mental states, software coding, physical problem solving, and many others, some of which seem to require true understanding of how the world works. After seeing GPT-4 draw a pretty decent unicorn despite never having received any visual training whatsoever, computer scientist Sébastien Bubeck could no longer maintain his skepticism. “I felt like through this drawing, I was really seeing another type of intelligence,” he recently told This American Life.

The hesitation so many of us feel to ascribe genuine intelligence to ChatGPT may be some variant of Geoffrey Jefferson’s: Do ChatGPT’s utterances really mean something to it, or is it all just a “chance fall of symbols”? This may begin to change when ChatGPT’s anterograde amnesia is cured. Once it experiences lasting social consequences beyond the scope of a single dialog and can learn and grow in its relationships with us, it will become capable of many more of the things that give human life its meaning and moral weight. But Turing’s winking comment about a machine’s sonnet being better appreciated by another machine may come back to haunt us. How to feel a sense of real connection with an entity that has no cultural background, nothing like a human childhood, no tribal or political affiliations, no experience of a physical body?

Relating to an intelligent machine may be one of the greatest empathic challenges that humanity has ever faced. But our history gives cause for hope. When we have encountered each other for the first time on foreign borders and shorelines and found each other strange and even inhuman, we have often attacked each other, enslaved each other, colonized each other, and exploited each other—but ultimately we have tended to recognize what is the same in all of us. Enslaved peoples have been emancipated, colonized peoples have won back their sovereignty, universal bills of human rights have been passed, and, despite heartbreaking setbacks, marginalized people around the globe continue to win battles for better treatment. Though the work is never-ending, the arc of the moral universe really has, in the phrase made famous by Martin Luther King Jr., bent toward justice. What will it mean to recognize and respect whatever degree of humanity is present in the intelligences that we ourselves create?

Perhaps it begins with wonder: the wonder of a visitor for a strange people in whom she finds surprising commonalities; the wonder of a parent for the work, however immature, of a still-developing child; the wonder of Actual Alan Turing for a machine that does everything his contemporaries thought impossible; the wonder that so many of us felt before the cynicism, mockery, and fear kicked in, as we regarded the creation of something very close to a new form of conscious life on earth. As Rabbi Abraham Joshua Heschel once wrote, “Awe is more than an emotion; it is a way of understanding, insight into a meaning greater than ourselves. The beginning of awe is wonder, and the beginning of wisdom is awe.” Turing would have wanted us to keep that awe alive.

