In an article last year titled “The False Promise of ChatGPT,” which has given rise to recent memes on social media, the famed linguist Noam Chomsky and his two co-authors start by declaring that “OpenAI’s ChatGPT, Google’s Bard and Microsoft’s Sydney are marvels of machine learning.”
But they go on to argue that
However useful these programs may be in some narrow domains (they can be helpful in computer programming, for example, or in suggesting rhymes for light verse), we know from the science of linguistics and the philosophy of knowledge that they differ profoundly from how humans reason and use language. These differences place significant limitations on what these programs can do, encoding them with ineradicable defects.
Chomsky and his co-authors are partially right. But when they say that “Indeed, such programs are stuck in a prehuman or nonhuman phase of cognitive evolution,” they are, in fact, overestimating the abilities of these programs: they are not in any phase of “cognitive evolution,” because they are not cognizing anything at all.
To understand why, one must first look at John Searle’s Chinese room argument—a famous thought experiment in philosophy in which Searle demonstrated the lack of “intelligence” in “artificial intelligence”—and update it to reflect the way large language models (LLMs) actually work.
Imagine you have been hired by the “Hungarian Vectorization Lab,” despite not knowing a single word of Hungarian. (They assure you it is completely unimportant that you don’t understand the language at all.) Your job is to produce numerical tables that, given a stretch of Hungarian text, allow the user of the tables to predict the next word. (This is like a predictive LLM, but generative LLMs work largely the same way.) You are given a very large number of Hungarian texts as your starting point. You take each text and “vectorize” each word (meaning turn it into a sequence of numbers). You have a little trick you employ while doing this, which you call “attention,” that allows you to take the entire textual context of the word into account in creating these tables . . . but note that this is, again, a purely mathematical manipulation of the text that can be carried out without any idea of what the text is about.
At the end of this process, given the Hungarian sentence equivalent to “At the cookout, he put the dogs on the . . . ” you are able to use your tables to predict that the next word is “grill,” while after “At the fox hunt, he put the dogs on the . . . ” you predict “scent.”
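To make the mechanical character of the exercise concrete, here is a toy sketch in Python. It is my own illustration, not the lab’s procedure and not how a real LLM is built (real models use learned vectors and neural networks rather than lookup tables), but it shows how a next-word table can be assembled and consulted without any grasp of what a single word means.

```python
from collections import Counter, defaultdict

def build_table(texts):
    """Tally which word follows each left-hand context. Pure bookkeeping:
    no meaning of any word is needed or used."""
    table = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for i in range(1, len(words)):
            context = tuple(words[:i])       # everything seen so far
            table[context][words[i]] += 1    # count the word that followed
    return table

def predict_next(table, prompt):
    """Return the most frequent follower of this exact context, if any."""
    followers = table.get(tuple(prompt.split()))
    return followers.most_common(1)[0][0] if followers else None

# Stand-in corpus; it could just as well be Hungarian text you cannot read.
corpus = [
    "at the cookout he put the dogs on the grill",
    "at the fox hunt he put the dogs on the scent",
]
table = build_table(corpus)
print(predict_next(table, "at the cookout he put the dogs on the"))   # grill
print(predict_next(table, "at the fox hunt he put the dogs on the"))  # scent
```

The table answers “grill” in one context and “scent” in the other, yet nowhere in the procedure is there any idea of dogs, grills, or fox hunts.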
Would you say that this means you now understand Hungarian? I think it is clear that you don’t. If you had access to your tables, and a conversation partner were patient with you while you used them, you might even be able to “hold a conversation” in Hungarian. But at no point would you have any idea of what you or your partner were saying: while your responses seem “appropriate” to your partner, for all you know, you could be agreeing to have sex with the person, or planning a terrorist attack, or discussing the weather.
The above is an abbreviated but accurate picture of what goes on inside a large language model. First, words from a vast amount of “input” are turned into vectors, or long lists of numbers, based upon how likely certain words are to appear together in the input. So, if “apple” and “pie” often appear near each other in the input, their vectors will point in the same general direction. “Tungsten” and “circular saw” would show a similar relationship, while “tungsten” and “apple” or “circular saw” and “pie” would not. As mentioned above, a key technological breakthrough for the models was the concept called “attention,” which made it possible to take into account the entire context in which a word appears to determine which words are most likely to appear after it. (In a predictive model, like the one used to suggest the next word in an email you are writing, the model offers you that word as a choice. In a generative model, it simply puts that word next in its response to some query.)
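As a deliberately simplified illustration of how such vectors can arise from nothing but counting (a stand-in for what real models do with learned embeddings and attention, not a description of any actual system), consider this Python sketch:

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(texts, window=2):
    """Represent each word by counts of the words that appear near it."""
    vectors = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(u, v):
    """How closely two count-vectors point in the same direction (0 to 1)."""
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

corpus = [
    "she baked an apple pie",
    "he served apple pie with cream",
    "the tungsten bit fits the circular saw",
    "a circular saw blade of tungsten carbide",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["apple"], vecs["pie"]))       # noticeably above zero
print(cosine(vecs["apple"], vecs["tungsten"]))  # zero here: no shared neighbors
```

The vectors for “apple” and “pie” end up aligned simply because the words kept appearing together; no notion of fruit or dessert enters anywhere.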
Or, if the model were dealing with images, the result of processing many, many images would reflect the high probability that legs grow downward out of torsos rather than upward out of the top of a head.
This development may at first seem like a step toward intelligence. But note that the computer never actually sees any image—the image had to be converted to numbers before the program could process it—and has no actual “idea” of what torsos or legs are. If, instead of a monitor, the output of an image generation program were hooked up to a synthesizer, it would “happily” play music instead of creating images. It is only we, the intelligent interpreters of the computer’s output, who decide whether it should represent an image, or a musical composition, or a response to a missile attack.
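A trivial sketch makes the point about numbers (the values below are invented for illustration): what a program “sees” is only a grid of values, and nothing in the values themselves dictates that they be treated as a picture rather than, say, sound.

```python
# A tiny made-up "image": to the program it is nothing but a grid of numbers.
image = [
    [0,     0, 255,   0,   0],
    [0,   255, 255, 255,   0],
    [255, 255, 255, 255, 255],
]

# Flattened, the same numbers are just a list. Sent to a display they are
# pixel brightnesses; sent to a synthesizer they could just as well be
# treated as sample amplitudes. The interpretation is ours, not the program's.
flat = [value for row in image for value in row]
print(flat)
```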
The mechanistic nature of what happens in these models is only obscured when AI enthusiasts describe it in terms that imply that genuine thought is occurring. For instance, when an LLM is being developed, what is going on is often described as “machine learning,” and the process is called “training,” both terms that suggest that a conscious but immature entity is being gradually educated. However, it is much more accurate, in software engineering terms, to describe what is happening as running a program to search a “parameter space” to find the best set of parameters to handle the task at hand. This is an entirely deterministic process, and the more accurate description does not fatuously suggest that the computer has, at the end of the process, “learned” something. No, the program has found a set of parameters that work adequately to achieve the programmer’s goal for the program.
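Here, for illustration only, is what a parameter-space search looks like at toy scale: a deterministic program trying candidate parameter values and keeping whichever ones minimize an error score. Real training adjusts billions of parameters with gradient methods rather than a grid, but the character of the process is the same.

```python
def error(a, b, data):
    """Squared prediction error of the candidate parameters (a, b)."""
    return sum((a * x + b - y) ** 2 for x, y in data)

# Made-up task: find the line y = a*x + b that best fits three points.
data = [(1, 3), (2, 5), (3, 7)]

# Deterministic search of a (tiny) parameter space: try candidates, keep the
# pair with the lowest error. Nothing is "learned"; a score is minimized.
best = min(
    ((a / 10, b / 10) for a in range(-50, 51) for b in range(-50, 51)),
    key=lambda params: error(params[0], params[1], data),
)
print(best)  # (2.0, 1.0): the parameters that best "handle the task"
```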
Similarly, when these probability engines produce a ridiculous answer, it is sometimes described as a “hallucination.” But LLMs do not “hallucinate.” They work entirely on probabilities. The fact that 95 percent of the time X would be the best word to follow Y means that 5 percent of the time it won’t be, and within some subset of that 5 percent, the output will be wildly wrong, like when Google AI recommended using glue to keep cheese from slipping off a pizza.
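The point can be shown in a few lines of Python (the numbers are invented, not taken from any actual model): if a program samples its next word from a probability distribution, the low-probability continuation will still surface a predictable fraction of the time.

```python
import random

# Invented probabilities: suppose "cheese" is the likely next word and "glue"
# the unlikely one. Sampling will still produce the unlikely word sometimes.
next_word_probs = {"cheese": 0.95, "glue": 0.05}

random.seed(0)  # fixed seed so the run is repeatable
words = list(next_word_probs)
weights = list(next_word_probs.values())
samples = [random.choices(words, weights=weights)[0] for _ in range(10_000)]
print(samples.count("glue") / len(samples))  # close to 0.05
```

There is no hallucinating mind lurking in that 5 percent tail; there is only the tail.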
Intelligence can be defined as the use of reason to achieve some aim. (I use AI voice recognition all the time when composing a piece of writing. In this case, the AI decided that the previous sentence was “Intelligence can be defined as the use of raisin to achieve some aim.” That is supposed to represent some level of understanding of English?) Even in the least practically oriented use of intelligence, there is still some aim: to understand the nature of the prime numbers, or to contemplate why the universe exists.
But ChatGPT and its ilk have no aims. (Apple’s AI here read what I said as “But chat, GPT and they’re El Cavo aims.”) The only entity with an aim involved in the process is the human mind guiding the AI machine.
Some materialists will argue that, since thought is only a product of the brain, and the brain “must” be some sort of computer, when we think we “understand” language, we must actually be carrying out some computation like this. But this argument is a genuine example of begging the question: why should we accept either of those premises?
In fact, many materialists go further and argue that consciousness itself is an illusion. But this is nonsense: Only conscious beings have illusions. Rocks, pieces of string, and road signs do not. So, if we have illusions, that proves we are conscious: consciousness itself cannot be an illusion. These materialists may argue that the scientific facts show that it is, but those facts are only known to us through our consciousness. So if consciousness is an illusion, then all of the “facts” upon which our scientific theories rest are also illusions. The whole effort is self-refuting through and through.
LLMs are a great technical achievement. The techniques employed in them to simulate human language processing are extremely clever. I do not wish to downplay the engineering brilliance that went into creating them.
But that is where 100 percent of the brilliance lies: with the engineers who created these models. The models themselves are mere machines built to carry out the wishes of the engineers. The fact that with a complex LLM the engineers themselves do not understand exactly how the model does so does not change that reality one bit.
Ironically, since many AI enthusiasts consider themselves to be hard-headed scientific rationalists, the entire AI enterprise—not the genuine technical advances, but the snark hunt for machine intelligence—depends upon a belief in magic, much like alchemy. If practitioners can just employ the right magic words (programs) upon a material object (a box of electronic circuits), they can transform a base metal (that box) into gold (true intelligence). Unfortunately for the new believers in alchemy, their hope is as absurd now as such magical thinking has always been.