When it comes to AI (and particularly the generative AI of the 21st century), there are two disclaimers that should precede any discussion about how this computer technology functions:
- Not even the creators and top researchers of generative AI fully understand how the technology does everything it does.
- The field is advancing so quickly and so drastically that what is true about AI one minute is quite often untrue the next.
Now, that second disclaimer means that at some point the first may be rendered false, and the experts will have it all figured out, perhaps even with help from AI itself.
But even without a full understanding today, there are numerous resources that attempt to explain the basics of how generative AI (like chatbots and LLMs) can take a user’s natural language input (i.e., a prompt) and produce a natural language response. However, I think these resources (articles, videos, etc.) mostly fall short of boiling the process down into simple terms that non-computer scientists can understand. Depending on the particular explanation and the analogies it uses, a learner might have to become familiar with a host of technical terms and advanced computing concepts before reaching a meaningful level of understanding.
Interestingly, generative AI itself can do a remarkable job of explaining complicated concepts in plain language suited to whatever audience a user specifies.
When asked to explain itself at a high-school level, for example, ChatGPT already introduces the ideas of neural networks and LLMs. These are not simple ideas in computer science, and understanding them requires grasping further underlying concepts. According to ChatGPT, understanding a neural network requires a basic knowledge of algorithms, machine learning, and mathematics like statistics and calculus.
One possible conclusion, then, is that a sufficient explanation of how generative AI works does not exist for the non-mathematician or non-computer scientist. The technology is simply too complicated. Asking “how does ChatGPT work?” isn’t all that different from asking how a human being is able to verbally answer a question. There is just so much to it, and one lengthy article or video, or even a video series, isn’t enough to cover all the necessary ground.
My own incomplete understanding of AI has been largely shaped by three books: Brian Christian’s *The Alignment Problem* (2020), Joseph Weizenbaum’s *Computer Power and Human Reason* (1976), and Yuval Noah Harari’s *Nexus* (2024). Combined, these books provide a decent primer on the history of AI and what its capabilities and limitations are, but I would not say that they sufficiently explain exactly how it works.
One source that I think does an admirable job of explaining ChatGPT’s inner workings is “What Is ChatGPT Doing … and Why Does It Work?” by Stephen Wolfram. I came across it in early 2024, and while Wolfram explains much of the mathematics behind ChatGPT and LLMs without completely overwhelming the mathematically challenged reader, what intrigues me even more is how he approaches the areas of ChatGPT’s functionality that experts still cannot explain.
Here are the key excerpts from Wolfram’s article:
“…when ChatGPT does something like write an essay what it’s essentially doing is just asking over and over again ‘given the text so far, what should the next word be?’—and each time adding a word.
“One might think it should be the “highest-ranked” word (i.e. the one to which the highest “probability” was assigned). But this is where a bit of voodoo begins to creep in. Because for some reason—that maybe one day we’ll have a scientific-style understanding of—if we always pick the highest-ranked word, we’ll typically get a very “flat” essay, that never seems to “show any creativity” (and even sometimes repeats word for word). But if sometimes (at random) we pick lower-ranked words, we get a “more interesting” essay.”
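The loop Wolfram describes is simple enough to sketch in code. Below is a minimal illustration in Python, mine rather than anything from Wolfram’s article: `next_word_probs` is a hypothetical stand-in for the actual model, which in reality is a neural network with billions of weights.

```python
import random

def generate(prompt, n_words, next_word_probs, greedy=False):
    """Repeatedly ask 'given the text so far, what should the next
    word be?' and append the answer, as Wolfram describes.

    next_word_probs is a hypothetical stand-in for the model: it takes
    the text so far and returns a {word: probability} ranking.
    """
    words = prompt.split()
    for _ in range(n_words):
        ranked = next_word_probs(" ".join(words))
        if greedy:
            # Always take the highest-ranked word: the "flat" essay case.
            next_word = max(ranked, key=ranked.get)
        else:
            # Sometimes (at random) pick lower-ranked words:
            # the "more interesting" essay case.
            next_word = random.choices(list(ranked), weights=list(ranked.values()))[0]
        words.append(next_word)
    return " ".join(words)

# Toy stand-in model: a fixed, made-up ranking regardless of context.
toy_model = lambda text: {"the": 0.4, "cat": 0.3, "sat": 0.2, "quietly": 0.1}
print(generate("The cat", 5, toy_model))
```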
As an illustration of a pure highest-probability predictor, Wolfram shows sample text from GPT-2 (the 2019 model). The text quickly devolves into nonsensical rambling. When the randomness of the next word is increased slightly, however, the text becomes more coherent. Turn the randomness (i.e., the temperature) up a little higher and the text appears coherent and even creative, but turn it up too high and the text again becomes nonsensical.
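To make the temperature knob concrete, here is another minimal sketch in Python (again mine, with an invented toy distribution): raising each probability to the power of 1/temperature and re-normalizing is one standard way temperature is implemented.

```python
import random

def sample_with_temperature(probs, temperature):
    """Sample a next word from a {word: probability} ranking,
    reshaped by the temperature parameter."""
    if temperature == 0:
        # Zero temperature is the greedy case: always the top word.
        return max(probs, key=probs.get)
    # Exponent 1/T sharpens the distribution when T < 1
    # and flattens it toward uniform when T > 1.
    weights = [p ** (1 / temperature) for p in probs.values()]
    return random.choices(list(probs), weights=weights)[0]

# Invented next-word distribution for "The cat sat on the ..."
toy = {"mat": 0.5, "floor": 0.3, "moon": 0.15, "purple": 0.05}
print(sample_with_temperature(toy, 0))    # always "mat"
print(sample_with_temperature(toy, 0.8))  # mostly "mat", occasional surprises
print(sample_with_temperature(toy, 5.0))  # near-uniform: often nonsensical
```

Wolfram reports that for essay generation a temperature of about 0.8 seems to work best, while noting that there is no theory behind that number; it is simply what works in practice.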
It is striking that Wolfram calls this randomness factor “voodoo,” and not just for the term’s cultural insensitivity. We expect more concrete answers and explanations from the experts. Wolfram does a fine job of breaking down what can be said about the operations of ChatGPT, but it’s what the experts cannot explain that interests me most.
A March 2024 article from *MIT Technology Review* digs further into the mystery of LLMs. Here are some relevant quotations:
“a remarkable fact about deep learning, the fundamental technology behind today’s AI boom: for all its runaway success, nobody knows exactly how—or why—it works.”
Mikhail Belkin, a computer scientist at the University of California, San Diego: “But our theoretical analysis is so far off what these models can do. Like, why can they learn language? I think this is very mysterious.”
“rapid advances in deep learning over the last 10-plus years came more from trial and error than from understanding.”
“It works, which is amazing. Our minds are blown by how powerful these things are.” … And yet for all their success, the recipes are more alchemy than chemistry: “We figured out certain incantations at midnight after mixing up some ingredients.”
The major unknowns and the mysterious role of randomness in generative AI ought to change the way some people (mis)understand the technology. If LLMs and chatbots were simply operating on basic probability, they would be easier to understand, easier to explain, and, consequently, easier to dismiss as fancy calculators. The fact that they are not easy to understand or explain tells me that the people who do dismiss them as stochastic parrots may be underestimating an alien/artificial intelligence that now powers, distills, and summarizes an ever-increasing share of our daily internet searches for information.
If generative AI poses legitimate risks and threats (it does), and if it creates problems (it has), then the best solutions and mitigation strategies ought to be based on the most complete understanding of the technology. That understanding is far less likely to develop as long as prominent critics mistakenly believe they are dealing with a parlor trick rather than a complex enigma whose reasoning capabilities grow with each passing day.


