I’ve been resisting writing about ChatGPT, partly because it seems like every possible take has already been written, and partly because I find the subject matter deeply irritating. But over the past weeks, I’ve kept writing speeches in my head about why I find it so irritating and why I hate the way LLM (Large Language Model) tools are discussed, and I figured this newsletter was the best way to exorcise those speeches from my brain. I promise that this newsletter will not include any screenshots whatsoever of “conversations” with AI models; I find those deeply boring to read. Since I work in tech and particularly for a language and writing-oriented company, this needs an extra “views are my own” disclaimer.
I’ll start here, with a kind of clumsy analogy:
One very low-tech way of fortune-telling is bibliomancy: formulate a question, shut your eyes, then open a book at random (ideally something large and full of pronouncements, like the Bible) and put your finger down on the page. Then, use your imagination to interpret the passage your finger lands on as an answer to your question. No tech need be involved, and no intelligence other than your own; still, the results will be interesting often enough that if you’re of a certain cast of mind, you might imagine that some greater metaphysical intelligence is guiding your hand. This is also how tarot cards work: by imagining a separate mind that has secret knowledge of the universe and is trying to communicate with you, you wind up communicating something interesting to yourself.
Now, suppose you are an app developer, and you want to make a bibliomancy app that will obviate the need to own any books. Load the entire text of the Bible into a database, supply a screen where the user can enter their question and tap a button, and use a random number generator to choose a passage when requested. You can do the physical book one better, though: skew the selection process so that it’s more likely to select passages that contain some of the words from the question. Questions containing the word “love” become more likely to get passages with the word “love” in them, and so on. Maybe not all the time, though, or you give the game away.
Then there are many more improvements you can make so that the answers correspond more closely to the questions. Expand beyond the Bible and add lots of other books (public domain, of course) into your database, and run the numbers to determine which words are likely to appear close to which other words, so a question with the word “love” might get passages containing “marriage” or “girlfriend” or “desire” or whatever. Let users of the app rate whether their answer was applicable to their question or not (or pay some people to do it manually), so that, depending on the key words in the question, you can build information about which passages are more likely to be relevant, and then surface those passages more often. Maybe at some point, you can drop the pretense that the passages are coming from books written by other humans; just break the passages down into their component parts and mix-and-match the best ones, and the result will be relevant enough of the time that people will have much less trouble imagining that they are communicating with a real intelligence, something that understands their question and is trying to supply the best answer.
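To make the hand-waving concrete, the crudest version of that selection trick might look something like this (a toy sketch in Python; the passages, weights, and names are all made up for illustration, not anything a real app would ship):

```python
import random
import re

# A toy "bibliomancy engine": pick a passage, skewed toward passages
# that share words with the question. PASSAGES stands in for whatever
# corpus the imaginary app loaded.
PASSAGES = [
    "Love is patient, love is kind.",
    "To every thing there is a season.",
    "A soft answer turneth away wrath.",
]

def tokenize(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def draw_passage(question, passages=PASSAGES):
    q_words = tokenize(question)
    # A base weight of 1 keeps pure chance in play ("maybe not all the
    # time, or you give the game away"); shared words add weight.
    weights = [1 + 3 * len(q_words & tokenize(p)) for p in passages]
    return random.choices(passages, weights=weights, k=1)[0]

print(draw_passage("Will I ever find love?"))
```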
ChatGPT and other LLMs are, of course, doing something more sophisticated than this. But what they are building is much, much closer to my imaginary bibliomancy app than a human mind — and I don’t mean “closer” like the way San Francisco is closer to LA than it is to Seattle. I mean closer like how San Francisco is closer to LA than to Mars.
When I was a baby computer science major in the aughts, I remember taking courses about Artificial Intelligence, and I remember that despite the exciting name, these courses were usually more boring to me than the others. Algorithms were interesting, systems were interesting, linear algebra was interesting (in that “is this the face of God?” way that young sensitive mathy people are susceptible to), but the AI courses seemed to boil down to the following (sketched in code just after this list):
1. Find a large dataset, and split it into training vs. test data.
2. Run some not-very-exciting math on your training dataset to transform it into statistics.
3. See how well those statistics also match your “test” data. If they match your test data, great! If they don’t match, fiddle with your parameters a bit.
4. Repeat steps 2 and 3 until you’re happy.
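Here’s roughly what that loop looks like in practice (a deliberately bland sketch using scikit-learn and its bundled handwritten-digit dataset, not a reconstruction of any actual coursework):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Find a dataset and split it into training vs. test data.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# 2. Run some not-very-exciting math on the training data to turn it
#    into statistics (here, the fitted weights of a simple model).
model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

# 3. See how well those statistics match the held-out test data.
print("test accuracy:", model.score(X_test, y_test))

# 4. Unhappy with the number? Fiddle with the parameters and repeat 2-3.
```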
I remember that our exercise was about classifying images of single digits: writing a program that could identify whether a black-and-white picture of a number contained a two, or a four, or whatever. In order to do this, you as the programmer didn’t need to know what a number was, or why identifying a picture of a number might be meaningful. You might as well have been classifying the symbols from Microsoft Wingdings; the process was the same. Also, the computer didn’t need to know what a number was. You didn’t need to try to encode anywhere in your program that a 1 was a single stroke, or that a 4 had a triangle in it. It just needed to be able to calculate, based on a matrix of pixels, which of the ten categories of images the picture you fed in most resembled.
A program like this can get very, very good: you can train it on thousands, millions of pictures of numbers, and eventually it will be able to identify a 2 even if it’s written in shaky cursive by a four-year-old with poor motor skills, who has also drawn a stick figure in the corner. But no matter how well you train it and how many pictures of numbers you feed in, it will never deduce that 2 is twice as big as 1, or that 2 + 2 equals 4. Even if you feed it pictures where someone has written “2 + 2 = 4”, it might encode, statistically, that this particular visual pattern appears a lot. You could give it pictures of lots of other equations, millions of them, and it might identify that 2 + 2 = 4 is a more common arrangement of pixels than 2 + 2 = 5. But it would never learn to add.[1] To the algorithm, they’re just matrices of pixels, just like a photo of Gwyneth Paltrow or a Courbet is a matrix of pixels.
I remember the professor talked a bit about Natural Language Processing (the field of writing programs that work with text written in human languages). He said that in the early days of the field, people tried to take their knowledge of English, of the parts of speech and their grammar and how they fit together, and encode them into their programs, with disappointing results. Then other researchers took a different approach and decided to pay a bunch of linguistics grad students to painstakingly parse sentences from newspapers and other sources, enter the sentences plus the parsings into a database, and from that data calculate patterns of statistically likely sequences. Then the sentence-parsing algorithm didn’t have to involve looking up words in a dictionary, or following a grammar decision tree: it just needed to pattern-match a sentence against the patterns calculated from the sentences in its database. This approach led to results that were so dramatically better than the previous attempts that the “symbolic”, rules-based approaches became obsolete.
He said that, to some researchers, this result was a disappointment: it was sidestepping the entire question of how to model language in a way a machine could process — and perhaps, to the extent that was possible, “understand” — in favor of something that seemed like cheating, the simple exercise of brute force. The algorithm didn’t have to know what a verb was to identify it, didn’t have to try to determine the meaning of a word from the web of meanings around it; the only way it would correctly identify the verb in the phrase “time flies like an arrow” is by knowing that out of 1000 occurrences of the word pair “time flies”, 993 of them had been tagged by a human as subject-verb and only 7 of them as verb-subject, the latter in articles about measuring the airborne velocity of fruit flies as compared to arrows.
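In other words, the “parser” reduces to a lookup over human-tagged counts. A toy sketch (the numbers and tag labels here are invented, standing in for whatever the grad students actually produced):

```python
from collections import Counter

# Invented counts standing in for a human-tagged corpus: for each word
# pair, how often annotators labeled it with each grammatical pattern.
tag_counts = {
    ("time", "flies"): Counter({"NOUN VERB": 993, "VERB NOUN": 7}),
    ("fruit", "flies"): Counter({"NOUN NOUN": 612, "VERB NOUN": 2}),
}

def most_likely_tags(word_pair):
    # No grammar, no dictionary, no idea what a verb is: just return
    # whichever human-assigned tag sequence was counted most often.
    counts = tag_counts.get(word_pair)
    if counts is None:
        return None  # never seen by the annotators, so we're stuck
    return counts.most_common(1)[0][0]

print(most_likely_tags(("time", "flies")))  # -> "NOUN VERB"
```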
Statistics-based and pattern-finding techniques have continued to dominate and have gotten much more sophisticated than when I was in school, thanks to the availability of cheaper processing power and data storage. The storage and the processing are so cheap, and the results so effective, that they come to seem like magic (or “intelligence,” which in this context is close to the same thing). Part of what seems to make them magical is that they give us the illusion that the program can handle ambiguity: its data contains both “time flies like an arrow” (metaphor about the human experience) and “time flies like an arrow” (instruction on the experimental study of fruit flies) and, given a rich enough data set and the right set of circumstances, might surface one or the other as required. Also, they obfuscate the human work that goes into them. The people who programmed that old sentence-parsing algorithm didn’t have to know anything about language, and so they could say that the program “machine-learned” how to parse sentences all by itself by “reading” the data — but really, the knowledge of how to parse sentences came from those grad students in linguistics, who weren’t “teaching” the computer to parse but rather supplying the answer keys to its exams. And the computer still didn’t know what a verb was, or a fly, or why the passage of time is “like an arrow” — the words themselves didn’t matter; they were a bunch of symbols in a database, the same way a picture of the number 2 is a matrix of pixels and not a conceptual entity.
We, the consumers of ChatGPT, have the illusion of interacting with something that possesses vast knowledge and can deploy it in uncannily appropriate ways. This illusion is enabled by an enormous amount of human labor — first, writing the vast reams of text that supply its training data (we all did this, unknowingly, for free), then tidying and pruning and standardizing it, then painstakingly guiding the LLM away from bad outputs and towards good ones. But because it mixes and matches vast quantities of text written by other humans — as does my pretend bibliomancy app above — the sentences produced by ChatGPT appear to be an original product.[2]
This is usually where someone comes in with an extremely glib argument, going something like “but aren’t humans just pattern-matching based on previous experience” or “aren’t we just repeating what other people say most of the time” or “isn’t our sensory experience just like training data”, and so on, arguing that in the end, human minds are just big computers. The unstated implication is that, if a human mind is no different from a big computer, then our big computers might themselves become like human minds. From this flow all the speculations this argument is designed to encourage: superhuman robots, the potential moral status of AI systems, and so on.
I am not an expert in cognitive science (or in machine learning), but I will say this: perhaps it’s true that my brain is just an input-output machine, but so is a pocket calculator. I do not have any obligation to entertain the potential ensoulment of a pocket calculator, nor do I entertain the potential ensoulment of the number-classification program I wrote in my Artificial Intelligence course in University, or of ChatGPT.
Human minds possess qualities that LLMs do not, and cannot have: a will and the agency to enact it, bodily sensation, emotions, reason, instincts, drives, values, and a point of view.[3] The biggest AI boosters/doomers seem to be arguing that, just as animal and human sentience evolved as if by magic from simpler organisms, human-like intelligence will emerge in the same way if we keep feeding more data into our big models and keep the reinforcement loops going. Perhaps scientists in the future will be able to make something with consciousness (I imagine this will look more like building a synthetic fruit fly than a superintelligent robot). But I staunchly believe that “general intelligence” will not emerge from ChatGPT and its big-data cousins, any more than my image-recognizing class project could have learned to add, because while very good pattern-matching might give the illusion of reasoning from experience, it is not the same thing. A human child does not need to see ten thousand renderings of the number 2 in all positions, colors, and sizes to be able to identify a 2 on a kindergarten math worksheet. A human child can look at a cartoon illustration of a giraffe in a picture book and, from that single cartoon, recognize a real giraffe when it sees one at the zoo. All of us learned to speak and read from a corpus of written and spoken language that is a minuscule fraction of the size of ChatGPT’s dataset. Making ChatGPT bigger won’t bridge this gap — we don’t really understand what’s happening in the brain of a child, but I think it must be fundamentally different in nature from the machine learning algorithms we know and use now.[4]
So: ChatGPT is a machine optimized for generating verbiage that plausibly mimics human writing. It’s very impressive at what it does, and it’s an extraordinary achievement! Notably, it is trivially easy for it to generate verbiage about feelings and the experience of consciousness (plenty of that in the training data; humans produce it in vast quantities)! But let’s be very clear about what it can and cannot do.
What it can do: generate verbiage that mimics human writing as closely as possible. By design, it will choose “sounding right” over “being right”[5], and it will produce the most generic possible result within the parameters of the request. It’s great at coming up with analogies and metaphors; this is admittedly pretty impressive.
What it can’t do (even if it sometimes appears to): evaluate the truth of factual claims, weigh the merits of opposing arguments, predict the future, draw new conclusions from a combination of principles and experience, hold a consistent point of view.
I’m even a bit skeptical of declaring that ChatGPT has passed the Turing test. Its outputs are very convincing at first glance, but anyone who spends a few hours with it (and hangs on to their skepticism) will quickly learn its shortcomings. And, as I mentioned the last time I wrote about AI, Alan Turing set a higher bar than is commonly supposed. Here’s the example he wrote of a potential test-passing exchange:
Interrogator: In the first line of your sonnet which reads 'Shall I compare thee to a summer's day', would not 'a spring day' do as well or better?
Computer: It wouldn't scan.
Interrogator: How about 'a winter's day'? That would scan all right.
Computer: Yes, but nobody wants to be compared to a winter's day.
Interrogator: Would you say Mr. Pickwick reminded you of Christmas?
Computer: In a way.
Interrogator: Yet Christmas is a winter's day, and I do not think Mr. Pickwick would mind the comparison.
Computer: I don't think you're serious. By a winter's day one means a typical winter's day, rather than a special one like Christmas.
In this version, the Computer holds a point of view and argues against the Interrogator on the subject of poetry — on both formal and metaphorical grounds. It also speculates about the intent of the interrogator (“I don’t think you’re serious”), hedges its answers, and responds to challenges. I don’t think ChatGPT is anywhere close to being able to hold a conversation like this. They didn’t even bother to hard-code the above exchange into the system! That’s the first thing I would have done.
One final note: I used to talk with friends about how so many fictional treatments of AI really just pose the question: “what if there was a woman who existed only to serve you?” before complicating it with “what if she doesn’t do what you tell her?” I was thinking of stories like the ones told in Ex Machina, Her, Westworld, Blade Runner 2049, Weird Science, M3GAN, etc., plus our real-life, female-coded “AI assistants” like Alexa and Siri.
I now think there are several other just-as-bad fantasies at work. Fantasies about AI — and their nightmare counterparts, which originate from the same place — seem to fall into a few broad categories:
That the powerful might one day have access to a completely subservient class of person,
That a superhuman class of person (either AI “persons” or AI-enhanced humans) will seize control of the world, demolishing the established order to pursue an “optimized” utopian ideal, and
That it’s possible for humans to achieve immortality by digitizing their consciousnesses.
It seems bad to me that so many people are enraptured by this constellation of ideas: fantasies of ultimate power, domination of others, superhumans, and optimized worlds. If you read much about Longtermism (the ideology motivating the authors of that AI letter from the Future of Life Institute) besides their PR materials, you are likely to find it unsavory. My concerns about AI aren’t that it will become so intelligent as to deliberately steer us into catastrophe, but more that it will act as an environmental pollutant, putting more synthetic “thinking” into the world: fake product reviews, fake political analysis, fake social media profiles, fake selfies, fake conversations — the kind of thing that makes our relationships with each other and our understanding of the world a little bit worse, with the negative effects accumulating over a long period of time. I don’t think this is inevitable, but I think it’s a more likely scenario than the apocalypse.
Okay. That was a long one. I hope never, ever to write about ChatGPT again.
[1] I’ll caveat that of course computers can add; it’s built into their hardware. But my ML class project would not have deduced the principles of arithmetic from images of numbers.
[2] Some more about how LLMs work here, in a piece by Lak Lakshmanan.
[3] Yes, this is all philosophically fraught! But go with me here. Most of us organize our lives and build our civilizations as though all these were true.
[5] It’s not actually choosing between these things; it doesn’t know what is true or false.