Experimenting with Google’s NotebookLM (Project for ENGL 8180 Spring 2025)

Context

My generative AI journey began with ChatGPT in 2022 and has since expanded to include Claude.ai, Perplexity.ai, and Canva. My use cases generally relate to research and composition pedagogy, though I’ve even used Claude for some coding work.

When someone recently asked why I use different AI models for different tasks, I realized that I couldn’t articulate a clear answer. There’s significant overlap in what popular generative AI models can do, but each has distinct strengths and weaknesses. Their interfaces create different user experiences that can be difficult to explain—especially to those unfamiliar with these tools. Keep in mind, I’ve only used the free versions of these models, and paid versions often offer significant enhancements.

(Note the difference between ChatGPT and Perplexity’s informational approach and Claude’s more personal/conversational question. These starting pages may tell us something about each model’s utility.)

In my experience, Claude has always seemed more restricted by self-imposed ethical guardrails than ChatGPT, sometimes limiting its utility. Perplexity has been research-oriented from the start, with its links to credible and scholarly sources establishing reliability for its summaries. ChatGPT retains a nostalgic appeal as the first truly useful chatbot I encountered, but it carries baggage from “hallucinating” too much—especially in its earliest iterations. I avoided Google’s AI offerings due to similar accuracy issues with its Bard model upon release, but Google has since improved its AI products substantially. Its latest release, Gemini 2.5 Pro, ranks among the best available. Another AI-powered tool from Google is NotebookLM, which differs significantly from the simple chat interfaces offered by ChatGPT, Claude, and Perplexity.

NotebookLM has been on my radar for a while. I follow several AI content creators on YouTube (primarily AI Explained and Matthew Berman) who keep me updated on developments and reviews of new models. While NotebookLM has been mentioned occasionally, I don’t recall seeing a comprehensive review until watching this one from Skill Leap AI: https://www.youtube.com/watch?v=9xjmvUS-UGU

What is NotebookLM?


As I now understand it, NotebookLM is a tool with an integrated chatbot that draws knowledge and makes connections from a user-created dataset. Users can upload PDF documents, YouTube videos, and web URLs into a notebook. The chat feature within a notebook can answer questions about the uploaded content and make connections between different sources.

Pictured above: a NotebookLM notebook with multiple webpages uploaded as the dataset.

Within each notebook’s simple interface, Google created a “studio” feature that generates a podcast-style discussion about all uploaded content. This creatively summarizes the material and could benefit many users. However, the uncanny valley effect of this AI-generated podcast feels somewhat unsettling to me. The hosts have an awkward morning talk radio vibe, though the premium upgrade apparently allows users to change the hosts’ voices and tone/focus. The Skill Leap AI review demonstrates how users can create a podcast using NotebookLM and then use other generative AI tools to adjust host voices and even create virtual video hosts to read the script.

The studio section also offers one-click study guides, timelines, and briefings generated from your uploaded content. This content database effectively feeds the chatbot’s context window (in computing terms), the working memory it draws on when responding to users. The volume of text that can fit within this context window represents a significant advancement in publicly accessible, free generative AI models. Those who remember using ChatGPT before attachment uploads will appreciate this leap forward.
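
To make the “context window” idea concrete, here is a minimal sketch in Python of the budget arithmetic involved; the words-to-tokens ratio and the window size are illustrative assumptions, not NotebookLM’s actual figures.

```python
# A rough sketch of the "context window" idea: the model can only hold a
# fixed budget of tokens in working memory at once. Both numbers below are
# illustrative assumptions, not NotebookLM's actual internals.

TOKENS_PER_WORD = 1.3        # rough English average; varies by tokenizer
CONTEXT_BUDGET = 1_000_000   # hypothetical window size, in tokens

def estimated_tokens(text: str) -> int:
    """Estimate the token cost of a document from its word count."""
    return int(len(text.split()) * TOKENS_PER_WORD)

def fits_in_window(text: str) -> bool:
    """Check whether a document fits the model's working-memory budget."""
    return estimated_tokens(text) <= CONTEXT_BUDGET

anthology = "word " * 1_250_000   # stand-in for a ~1.25-million-word anthology
print(fits_in_window(anthology))  # False: too big to process in one pass
```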

Capabilities and Limitations

Once you create a notebook (database), you can use the chat feature to answer questions about your uploaded source material. The responses provide links to the source material and include the full text from the source that supports each response. This crucial feature of linking to source text—now adopted by other popular models—allows users to verify the chatbot’s responses rather than relying on paraphrases or potential “hallucinations” with fake or missing citations.

Based on my experience with NotebookLM, limiting the chat feature’s responses to draw only from the user-created database significantly reduces fabricated information/hallucinations. In one case, NotebookLM provided some information that did not come from the source material I uploaded, but it also attached a disclaimer to that particular information:

The chat seems quite capable of digesting and synthesizing large volumes of text, up to a point. For one notebook, I uploaded the full text of Bizzell’s 4,400-page anthology, The Rhetorical Tradition. Since this text is enormous (roughly 1.25 million words by my estimate), it exceeds NotebookLM’s processing capacity. Although the tool accepted the entire book as a single PDF upload, it only accessed approximately the first 1,000 pages. I resolved this by splitting the remainder of the book into separate 1,000-page uploads. I confirmed this was an effective solution by asking the chatbot questions about different sections of the full text: before I added separate uploads for pages 1001-4400, the chatbot could not provide links to anything beyond page 1000; afterward, it could answer questions and link responses to the rest of the text.
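
For anyone who hits the same wall, the workaround I describe is easy to script. Below is a minimal sketch using the pypdf library (pip install pypdf); the filename is hypothetical, and the 1,000-page chunk size mirrors my workaround rather than any documented NotebookLM limit.

```python
# Split one huge PDF into ~1,000-page chunks that each fit under the
# apparent upload limit. Filenames here are examples, not real paths.

from pypdf import PdfReader, PdfWriter

CHUNK_SIZE = 1000  # pages per upload, matching the workaround above

reader = PdfReader("rhetorical_tradition.pdf")
total = len(reader.pages)
for start in range(0, total, CHUNK_SIZE):
    writer = PdfWriter()
    for page in reader.pages[start:start + CHUNK_SIZE]:
        writer.add_page(page)
    end = min(start + CHUNK_SIZE, total)
    with open(f"chunk_pages_{start + 1}-{end}.pdf", "wb") as f:
        writer.write(f)
```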

A Major—Welcome—Surprise

While NotebookLM users can upload webpages (not entire websites) as sources, many webpages include dynamic digital text elements that might not be captured as part of the uploaded information. In one case, I doubted whether an upload would include book notes that are only viewable when hovering over particular book titles on a webpage and then scrolling through the notes.

Surprisingly, the upload to NotebookLM included notes from all titles pictured on the webpage. In a sense, this gave NotebookLM agent-like capabilities, as I didn’t need to physically hover and scroll with my mouse to uncover the information. While I understand that all this information was embedded in the webpage’s code like any other text element, I was surprised that NotebookLM captured, accessed, and cited this content directly. As AI tools develop more sophisticated agent capabilities, their utility will grow substantially.
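
For the curious, here is a minimal sketch of why no hovering was needed: text that appears only on hover typically already sits in the page’s HTML source, where any scraper can read it. The URL, CSS selector, and title attribute below are invented for illustration; the real page’s structure may differ.

```python
# Hover-only notes usually live in the raw markup (e.g., a title attribute
# or a hidden element), so a scraper can collect them without a mouse.
# The URL and selector are hypothetical examples.

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/bookshelf").text
soup = BeautifulSoup(html, "html.parser")

# Pull the note attached to each book title, no hovering required.
for book in soup.select("a.book-title"):
    print(book.get_text(strip=True), "->", book.get("title", "(no note)"))
```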

Notebook Capacity and Limitations

Directly from Google’s help page:

NotebookLM vs NotebookLM Plus User Limits

With NotebookLM, you can have up to 100 notebooks, with each notebook containing up to 50 sources. Each source can be up to 500,000 words long. All users start with up to 50 chat queries and 3 audio generations per day.

If you upgrade to NotebookLM Plus, you get at least 5X more usage, with up to 500 notebooks and 300 sources per notebook. The daily query limits also increase, providing up to 500 chat queries and 20 audio generations per day. Sharing a notebook doesn’t change the source limit: both you and anyone you share with can upload up to 300 sources to that notebook.

The X Factor

Perhaps novelty is the hard-to-describe factor that makes NotebookLM so appealing to me. I’m enjoying my learning process more in various areas because of the fascination of watching what a language model can do, while simultaneously testing its limits and trying to extract maximum utility from my laptop.

Implications for Teaching

Teachers already know how to use online platforms to provide students access to course materials. Imagine NotebookLM as a course site where all materials are aggregated and synthesized in helpful ways with a single click (to produce study guides and podcast overviews). More importantly, a NotebookLM notebook will function like a competent teaching assistant that can answer questions about all source material—even questions requiring connections between multiple sources.

I have already created a notebook for a class I am teaching this summer, and I am interested to see how students utilize it. (A notebook can be shared, but I believe it can only be shared with others who have a Google account.) Part of this ongoing project with NotebookLM is to learn how others leverage the technology in ways I have not thought of.

Implications for Writing

I’m still exploring how this tool can aid writing. For starters, it can serve as a database of one’s own writings. Unlike basic digital file storage, this tool helps writers find information from their database in ways that CTRL+F cannot. One of the best features of any LLM-based tool is its ability to instantly augment our searches with related terms/strategies we might not have considered.
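
As a rough illustration of what that augmentation can look like under the hood, the sketch below uses text embeddings (via the sentence-transformers library) to match a query to passages by meaning rather than exact keywords. The sample passages are invented, and this is only one way such search might be implemented, not NotebookLM’s actual method.

```python
# Semantic search sketch: embeddings match by meaning, not exact wording.
# Requires: pip install sentence-transformers. Passages are made-up examples.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "Notes on Aristotle's appeals in first-year persuasive essays.",
    "Draft paragraph about grading contracts and student motivation.",
    "Reflection on using chatbots as revision tutors.",
]
query = "What have I written about rhetoric and persuasion?"

p_vecs = model.encode(passages, normalize_embeddings=True)
q_vec = model.encode(query, normalize_embeddings=True)

# Cosine similarity (vectors are normalized, so a dot product suffices).
scores = p_vecs @ q_vec
print(passages[int(np.argmax(scores))])  # finds the Aristotle note even
                                         # though "rhetoric" never appears in it
```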

Of course, the chat feature in a notebook composed of one’s writings can also make connections for the writer that were not previously evident. Maybe the chatbot could even suggest where the writer could go next.

A classmate suggested this technology could help authors who create large fictional worlds that they need to track. NotebookLM would excel at reminding authors of obscure details that might otherwise conflict between previous works and current ones.

Tradeoffs and Conclusion

The most obvious downside to NotebookLM as a teaching and learning tool is that it may provide too much assistance, exacerbating our “TL;DR” tendencies. I often struggle with whether I need to read a full text to get what I need from it. As a painfully slow reader, my challenge is typically the finite nature of time rather than comprehension. NotebookLM addresses this time problem, albeit similarly to other generative AI models.

We should want to read primary texts in their entirety rather than relying on abstracts, summaries, and AI-generated bullet points. There’s significant value in struggling to make meaning from longform human communication. However, if information and technology continue accelerating, perhaps our paradigms for learning and processing information should evolve too. As we become an increasingly information-inhaling species, perhaps briefings, summaries, and podcast-style content we can process while driving are equally important to our learning, research, and writing processes.

We’ve all read longform materials that contained only small portions we truly needed. Context is often helpful and necessary, but not always. If NotebookLM can present context efficiently and effectively, perhaps this Google tool will be the first one I end up paying a subscription for. However, as these technologies increasingly filter communication between human writers and readers, we should always: 1) recognize the power, biases, and essence of the filter, and 2) remain mindful of what we may be sacrificing in attention span and deep reading comprehension.

Providing Argumentative Protocol to ChatGPT: A First-Year Writing Assignment

In the fall of 2023, about a year after the public release of ChatGPT, I decided to incorporate the new writing technology into my first-year composition course. My students were asked to experiment with using ChatGPT to generate topics and revise drafts. At this point, many of my students understood ChatGPT only as a cheating tool, but some were completely unaware of any of its capabilities.

Since students often struggled to land on specific topics to write about (when given a measure of freedom to do so), I created the following prompts to help them in the pre-writing process:

Prompt for exploring a topic:

Please help me prepare for writing my persuasive essay about a significant problem for Generation Z. Ask me questions–one at a time–that will help me come to a thesis, understand the strengths and weaknesses of my position, and any opposing view(s). The problem I want to focus on is…

–or–

Please help me prepare for writing my persuasive essay about a controversial issue in my field of study. Ask me questions–one at a time–that will help me come to a thesis, understand the strengths and weaknesses of my position, and any opposing view(s).

Prompt for finding and understanding potential scholarly sources:

“Can you point me to some prominent researchers who have published on [insert topic] and very briefly summarize their positions?”

Prompt for testing a tentative thesis when you have a guess of what your position might be:

“Let’s engage in a dialectic exercise. I will present my tentative thesis, and then you will take the role of Socrates and ask me yes or no questions (one at a time) to expose any potential flaws or contradictions in my position.”

After the students completed a draft, we then focused on how ChatGPT/AI could help them improve the readability of their writing. To get the students to use the AI as a tutor instead of a servant, we used the following prompt for revision.

Please revise and improve the wording and punctuation of the following text (without adding any new ideas). Below the revised version, please include a numbered list of each specific individual change that you made to each revised sentence along with an explanation of why you made each change.


Those prewriting and revision prompts served as an introduction to some of what ChatGPT could do beyond simply completing an essay for an unscrupulous student in a time crunch. The last assignment of the semester attempted to incorporate ChatGPT in a different way and teach students to evaluate computer-generated text. Since the students had already written two persuasive essays, they were to choose one of them and have ChatGPT generate an essay that took the opposite or alternative position.

These are the instructions students were provided:

Objective: Evaluate a computer-generated essay from ChatGPT. Use a formal style of academic writing to create your own original essay that is polished, sophisticated, and engaging to read. Apply what you have learned about sentence structure, punctuation, word choice, and tone to enhance the readability of your writing. Organize your ideas to logically support a clear thesis that draws a conclusion about the merits of the computer-generated essay.

Step one: Have ChatGPT write an essay that takes the opposite position as one of your previous essays from this ENGL 1101 class.

Suggested Prompt: “Write an essay that makes the opposite argument as this thesis: [insert thesis from your previous essay]. The essay should be about 1000 words in length and include multiple direct quotations from multiple sources. All quotations should be cited in MLA format and a works cited page should be included at the end of the essay. The level of writing should be appropriate for a first-year college writing course.”

Step two: Carefully read the essay and evaluate the following:

Readability (sentence structure, punctuation, grammar, word choice, tone)

Evidence of clear thesis

Support for thesis (is it logical? Vague? Precise? Accurate? Organized?)

Credibility of quotations/sources (Do they actually exist? Are they scholarly?)

Adherence to the prompt (Did the machine do exactly what you asked? 1000 words? Multiple quotes? Multiple sources? MLA documentation?)

Step three: Assess your thoughts from step two and draw a conclusion about the overall merits of the computer-generated essay. This is your thesis statement.

Step four: Using the evidence from step two, write an essay that supports your thesis statement from step three. The essay should be written in MLA format and be at least 1000 words in length. The audience for this essay is your fellow students and your instructor, so use formal academic English. You may wish to use the strategy of compare and contrast for this assignment (using your own essay as the object of comparison/contrast), but you do not have to do this. Make sure your main points are clear (typically indicated in topic sentences for your body paragraphs) and that you transition logically from one point to the next. You must have an introduction, a body, and a conclusion that attempts to answer the question “So what?”

**You must include the text of the ChatGPT Essay (clearly labeled) after the conclusion of your own evaluation essay.

Sources/Documentation: No research or documentation is necessary for this assignment, but if you choose to include information from an outside source, you must use MLA documentation (signal phrases, in-text citations, and works cited entries).


I think this assignment has held up well for a few semesters. The hope is that, because students should ideally have some emotional investment in the essay they wrote, they will be more critical of the AI-generated response than they would be if the AI text sample were random. Without this buy-in and inclination to defend their own writing/thinking, I believe first-year college students would be more willing to simply accept a computer-generated text as accurate and/or “better” than what they created themselves. The polished fluency of the computer-generated text is impressive enough to distract most students (and all readers, for that matter) from other potential problems with the text, namely the inclusion of quotations and sources that do not exist. The assignment also allows me to gauge for myself how generative AI is evolving over time and to compare the different generative AIs that students are using.

A New Approach for Teaching Critical Thinking with Generative AI

As I experimented this past week with perplexity.ai’s new R1 feature, I was intrigued by the way the model showed its reasoning in step-by-step fashion. I used it as an example in an ENGL 1102 composition class to discuss critical thinking. I was most interested to ask my students whether the computer was showing evidence of critical thinking and, either way, what we could learn from it.

Of course, to answer deductively whether the AI was critically thinking, a definition of critical thinking was necessary. We decided this one was clear and simple enough:

For the perplexity.ai demonstration, I prompted the AI with a question that my son is exploring right now as he writes his own essay for his middle school English class.

We could debate whether or not the full response of the AI shows evidence of critical thinking, but if we expand the “Reasoning with R1” section, we appear to have a window into how the full response came about. That is where I think we might be better off looking for evidence of critical thinking (or the simulation thereof). Let’s start with step 1:

Just in this step alone, we see that perplexity.ai decided to do a few different things. First, it turned my query into three separate searches, and it prioritized finding evidence/sources that included content from political analysts and scholars. It also decided (for lack of a better term) that it should locate or reference a definition of oligarchy. If these were my 8th-grade son’s first steps or my college students’ first steps in answering the given question, I would have to say that they were on the right road to “analyzing available facts, evidence, observations, and arguments”—straight from our definition of critical thinking.

The AI’s selection of sources involved its own process worth discussing, and I expect my students to exercise their information literacy skills when doing that task for themselves. On another day, I may seek some clarification from the AI on how it went about selecting those particular sources, but suffice it to say for now that they were algorithmically generated, like many information searches we are already familiar with. If we can get past the AI stating that it was “Reading” the sources (rather than “processing” or “scanning,” I suppose), we can move on to its next reasoning step.

Step 2:

I still struggle with seeing this AI using the personal pronoun “I,” but setting that aside, we can see how it defines oligarchy and links to sources where the definition comes from. It is worth checking those specific links to see if they do inform the definition the AI chooses to use, but what is happening here is exactly what deductive reasoning calls for. You cannot say X is Y until you define what Y is. In this case, America can be X and oligarchy can be Y.

Step 3:

This is where we see the AI connecting aspects of oligarchy to America as it summarizes some of the key findings from its sources. If step 2 gave us the deductive major premise that Y (oligarchy) is a system where a “small group of ultra-wealthy individuals hold significant power over political decisions, often to increase their own wealth,” step 3 now gives us the minor premise of the syllogism: X (America) has “a few billionaires, influencing policies and elections.”


Steps 4 and 5:

Here we see the AI doing something that most writing instructors encourage their students to do: account for and consider counterarguments instead of just satisfying your confirmation bias monster. This relates to the “informed choices” language of our critical thinking definition. A choice is more informed once counterarguments are known and considered.

Step 6:

Instead of presenting a definitive answer to the question, the AI has settled on a thesis that expresses a degree of nuance. Getting my students to land on measured, qualified thesis statements like this is an achievement worth celebrating in a first-year composition course, especially when doing so forces them to move even a little bit away from their former strongly held, but ill-informed, convictions.

Conclusion:

So did the AI use critical thinking to answer the question it was given? And if so, what does that mean?

I cannot say for certain whether the machine was critically thinking or not. And just because the reasoning steps it presented to me look just like critical thinking (or simulations of critical thinking), that doesn’t mean the AI was really doing those things. It just means that it says it was doing those things.

When Alan Turing first explored the question of whether a machine can think in his 1950 paper, he decided that particular question was not as suitable as asking if a machine can appear to think. In other words, can a machine fool a person into believing that it (the machine) is another thinking human being? In the case of perplexity.ai this past week, I think the appearance of critical thinking is quite strong.

Whether or not the AI was really critically thinking is important, but what might be more important (i.e. practical) is the new strategy I have for discussing the concept with my students.

Here is the link to the perplexity.ai query and answer:

https://www.perplexity.ai/search/is-the-united-states-an-oligar-5p9jmfAwSaW9kCW4wiHSLQ


Prompt Writing vs. Prompt Speaking

When I was in high school in the late 1990s, cell phones were just becoming ubiquitous among my Generation X peer group. However, the way we used these early-generation phones for communication did not look exactly like how most people use them now. Text messaging, in particular, only became realistically functional around 1999 when the feature started working across different networks, allowing one cell phone user to text any other, regardless of whether their service provider was AT&T, Sprint, etc. Yet even with its lower-than-calling cost, the new medium of texting did not catch on right away. I can distinctly remember sharing a sentiment with many others at the time: why would I text when I could just call? Speaking live to the other person just seemed so much easier, so much more efficient and timely, so much more…familiar. Phones—whether in a glass booth on a city street corner, attached to the wall of your house, or in your pocket—were for calling and talking.

For a while anyway, neither the early cell phone user nor the technology hardware was quite ready to usher in a significant replacement of live verbal over-the-phone communication in favor of finger-pecked messages. The users at the time, still getting used to digital communications in the form of electronic mail (then electronic-mail, then e-mail, and finally email), had spent their entire lives understanding landline phones as talking and listening devices. Why text from a phone what could be much more easily typed and emailed from a computer? The cell phones of the 1990s and early 2000s, without full alphabetical keypads, made texting a very clunky process.

Despite these challenges, the utility of the text message via cell phone ensured its survival, and as BlackBerry phones introduced full keypads (2002) and the iPhone introduced a full touchscreen keyboard (2007), the user experience for texting improved dramatically. By 2011, Pew reported that almost a third of cell phone users who used text messaging preferred receiving and sending texts over calls. A little over a decade later, texting achieved status as the preferred choice of communication over calls, in-person conversations, video calls, and email. Writing (of a sort) actually defeated speaking.

The matter seems to be settled then, for now at least: texting is generally preferred as the means of communication between people today. But what about how humans interact with computers?

Virtual assistants in the form of Alexa speakers and Apple’s Siri have been around for over a decade, and they are technologies that exclusively work via voice activation and verbal inputs from the user. Speech-to-text programs have also been around for quite some time, and their advancement has made it easier for many people to dictate text messages, “write” emails, and utilize word processing programs like Google Docs and Word. For many, typing textual inputs for their computers and devices is not a preferred method over speaking into the devices. The path to this point was/is not without challenges, however:

The latest generative AI models like ChatGPT can also respond to text or speech inputs from users. Considering the rapid integration of generative AI into existing technologies like phones and personal computers, how people choose to interact with the generative AI will largely drive both how educators might train people to use the AI and how the developers of generative AI adapt the user experience of the products.

As an instructor of writing, I decided a couple of years ago that generative AI might not be such a terrible disruption to my line of work –IF– the interface of ChatGPT and others like it actually forced users to write their prompts for the machines. In fact, like others, I saw prompt writing as a new and exciting way to teach both basic writing skills and critical thinking. After all, getting a generative AI model to do what the user wants relies heavily on clear written communication (along with persistence and revision and creativity). However, as these chatbots increasingly incorporate better speech-to-text capabilities, I wonder if the idea of prompt writing will soon lose out in favor of prompt speaking—similar to how a reverse situation unfolded with verbal cell phone conversations losing the battle against written text messaging. If science fiction is any indication of what the future of human-computer interaction will look like (and sci-fi has quite the impressive track record), then keyboards could be on their way out. HAL and JARVIS and KITT have superior listening comprehension.

Already, some computer coders have turned away from the natural language text prompts of generative AI models in favor of what they are calling “vibe coding,” or speaking their instructions rather than typing them. This does not seem like a good sign for my plans to teach writing through the lens of AI prompting, but we will just have to wait and see if speech prompting catches on and influences the way the tech industry updates AI experiences and interfaces for users.

Voodoo, Alchemy, and Incantations: The Enigmatic Nature of Generative AI

When it comes to AI (and particularly the generative AI of the 21st century), there are two disclaimers that should precede any discussion about how this computer technology functions:

  1. Not even the creators and top researchers of generative AI fully understand how the technology does everything it does.

and

  2. The field is advancing so quickly and drastically that what was true about AI one minute is quite often untrue the next.

Now, that second disclaimer means that at some point the first disclaimer may be rendered false and the experts will have it all figured out—maybe even with assistance from AI itself.

But even without a full understanding today, there are numerous resources which attempt to explain the basics of how generative AI (like chatbots and LLMs) can take a user’s natural language input (i.e. prompt) and produce a natural language output as a response. However, I think these resources (articles, videos, etc.) mostly fall short of boiling this process down into simple terms that non-computer scientists can understand. Depending on the particular explanation and the analogies used to describe the process, a learner might have to become familiar with a host of technical terms and advanced computing concepts if they are to reach a meaningful level of understanding.

Interestingly, generative AI itself can do a remarkable job of explaining complicated concepts in plain language suited for whatever audience a user specifies:

Note that even at the high-school level, ChatGPT has already introduced the ideas of neural networks and LLMs. These are not simple ideas in computer science and understanding them requires the understanding of further underlying concepts. According to ChatGPT, understanding a neural network requires a basic knowledge of algorithms, machine learning, and mathematics like statistics and calculus.

One possible conclusion then is that a sufficient explanation of how generative AI works does not exist for the non-mathematician/computer scientist. It is just too complicated of a technology. Asking a question like “how does ChatGPT work?” isn’t all that different from asking how a human being is able to verbally answer a question. There is just so much to it, and one lengthy article or video or even video series isn’t enough to cover all the necessary ground.

My own incomplete understanding of AI has been largely shaped by three books: Brian Christian’s The Alignment Problem (2020), Joseph Weizenbaum’s Computer Power and Human Reason (1976), and Yuval Noah Harari’s Nexus (2024). Combined, these books provide a decent primer on the history of AI and its capabilities and limitations, but I would not say that they sufficiently explain exactly how it works.

One source that I think does do an admirable job of explaining ChatGPT’s inner workings is “What Is ChatGPT Doing … and Why Does It Work?” by Stephen Wolfram. I came across this in early 2024, and while Wolfram explains much of the mathematics behind ChatGPT/LLMs without completely overwhelming the mathematically challenged reader, I find something even more intriguing about how he approaches the areas of ChatGPT’s functionality that experts still cannot explain.

Here are the key excerpts from Wolfram’s article:

“…when ChatGPT does something like write an essay what it’s essentially doing is just asking over and over again ‘given the text so far, what should the next word be?’—and each time adding a word.

One might think it should be the “highest-ranked” word (i.e. the one to which the highest “probability” was assigned). But this is where a bit of voodoo begins to creep in. Because for some reason—that maybe one day we’ll have a scientific-style understanding of—if we always pick the highest-ranked word, we’ll typically get a very “flat” essay, that never seems to “show any creativity” (and even sometimes repeats word for word). But if sometimes (at random) we pick lower-ranked words, we get a “more interesting” essay.”

Wolfram gives an example of what the text looks like when GPT-2 (the 2019 model) acts as a basic predictor of the next most probable word:

The text quickly devolves into nonsensical rambling. However, when the randomness of the next word is increased slightly, the text becomes more coherent. Turn up the randomness (i.e., temperature) a little higher and the text appears coherent and even creative, but turn the randomness up too high and it again becomes nonsensical.
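
Here is a toy sketch of the sampling behavior Wolfram describes: at temperature zero the model always takes its top-ranked word (the “flat” essay), while a moderate temperature occasionally promotes lower-ranked words. The five-word vocabulary and its probabilities are invented for illustration.

```python
# Toy next-word sampler illustrating the "voodoo" randomness Wolfram describes.
# The vocabulary and probabilities below are invented for illustration.

import random

next_word_probs = {"the": 0.40, "a": 0.25, "of": 0.20, "and": 0.10, "essay": 0.05}

def sample_next_word(probs: dict[str, float], temperature: float) -> str:
    """Pick the next word; temperature 0 means 'always the top choice'."""
    if temperature == 0:
        return max(probs, key=probs.get)
    # Higher temperature flattens the distribution, boosting lower-ranked words.
    weights = [p ** (1 / temperature) for p in probs.values()]
    return random.choices(list(probs), weights=weights, k=1)[0]

print(sample_next_word(next_word_probs, 0))    # always "the": flat, repetitive text
print(sample_next_word(next_word_probs, 0.8))  # usually "the," sometimes a lower-ranked word
```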

The fact that Wolfram calls this randomness factor “voodoo” is striking—and not just for its cultural insensitivity. We expect more concrete answers and explanations from the experts. Wolfram does a fine job of breaking down what can be said of the operations of ChatGPT, but it’s what the experts cannot explain that interests me the most.

A March 2024 article from MIT Technology Review digs into the mystery of LLMs further. Here are some relevant quotations:

“a remarkable fact about deep learning, the fundamental technology behind today’s AI boom: for all its runaway success, nobody knows exactly how—or why—it works.”

Mikhail Belkin, a computer scientist at the University of California, San Diego: “But our theoretical analysis is so far off what these models can do. Like, why can they learn language? I think this is very mysterious.”

“rapid advances in deep learning over the last 10-plus years came more from trial and error than from understanding.”

“It works, which is amazing. Our minds are blown by how powerful these things are”… And yet for all their success, the recipes are more alchemy than chemistry: “We figured out certain incantations at midnight after mixing up some ingredients,”

The major unknowns and mysterious randomness in generative AI operations ought to change the way some people (mis)understand the technology. If LLMs and chatbots were simply operating on basic probability, then that would make them easier to understand, easier to explain, and consequently, easy to dismiss as fancy calculators. The fact that they are not easy to understand or explain tells me that the people who do dismiss them as stochastic parrots are perhaps guilty of underestimating an alien/artificial intelligence that now powers, distills, and summarizes an ever-increasing amount of our daily internet searches for information.

If generative AI poses legitimate risks and threats (it does), if it creates problems (it has), then the best solutions and mitigation strategies ought to be based on the most complete understanding of the technology. That is far less likely to happen as long as prominent critics mistakenly believe they are dealing with a parlor trick instead of the complex enigma that is increasing in its reasoning capabilities with each passing day.

Thoughts on the Commonplace Book

This week’s ideas about commonplace books lead me to reflect on a book that I read recently. William Poundstone’s Head in the Cloud (2016) examines the value of knowing facts in a world where we can quickly Google or search for (and find) answers to everything.

To summarize one of the book’s main points, Poundstone thinks that even though technology does and should reduce our need to memorize facts, we still need some minimum level of memorized facts in order to make sense of the world around us and function well in a society. This makes sense to me, and I feel like the base level of factual knowledge that I see from my college freshman students has declined over the past 15 or 16 years. Of course, there is also the cognitive bias of declinism which I know could be impacting my perspective, but Poundstone’s research does provide data that affirms my anecdotal observations.

Maybe today’s technology affords everyone the choice to use their brain space for something other than fact storage. And maybe this helps explain why the commonplace seems less common and why we no longer share a set of cultural touchstones or a cultural canon the way we once did. However, I think there is something else we should consider as responsible for the disappearance of the commonplace.

Social media and the nature of new entertainment distribution give people such a wide variety of content to choose from that it feels more difficult to connect with others the way we (Americans) used to. Influential algorithms certainly play a part in customizing our media/entertainment consumption in the 21st century, but individual choice within the menus that the algorithms provide each consumer still seems to allow far greater variation in a single person’s reading/viewing/listening profile than was the case a few decades ago.

It is worth accounting for the “viral” nature of the most widely shared internet content like memes, GIFs, and TikTok videos. Could these be forming a 21st-century commonplace? I think that is possible, but there is something so short-lived and ethereal about this kind of content that I wonder if a commonplace based on such a corpus is distinctly different from what previous generations shared.

I presented this idea to ChatGPT (after I wrote the above reflection), and its response is below. I like the observation that one’s social media account and corresponding history of shared and liked posts could be considered their commonplace book. Since I do not participate much in social media, this did not occur to me, but I do keep my own version of a digital commonplace book on YouTube, saving and organizing content that I intend to come back to. I suppose my email folders serve the same purpose at times.

ChatGPT:

The concept of the commonplace has a long history rooted in rhetorical tradition, particularly in ancient Greek and Roman education. A commonplace (locus communis in Latin) traditionally refers to a shared idea, theme, or argument that is widely recognized and can be used as a foundation for further reasoning, debate, or persuasion. In classical rhetoric, students maintained commonplace books, which were collections of quotations, maxims, or arguments that they could draw upon when composing speeches or writing.

The Traditional Commonplace

Historically, the commonplace served multiple functions:

  • Memory and Invention: It allowed speakers and writers to store useful ideas for future use.
  • Shared Cultural Knowledge: It provided a repository of widely accepted wisdom, strengthening arguments by appealing to common values.
  • Pedagogical Tool: It was used in rhetorical education to help students develop their ability to analyze and deploy persuasive ideas.

During the Renaissance and Early Modern period, commonplace books became highly personalized and were often filled with excerpts from literature, philosophy, and even scientific texts, reflecting the individual’s intellectual pursuits.

The 21st-Century Commonplace

While the fundamental idea of collecting, organizing, and drawing from shared knowledge remains, the digital age has transformed the commonplace in several key ways:

  1. Digital Curation Over Personal Notebooks
    • Instead of handwritten journals, people now use digital platforms like blogs, Twitter, Pinterest, or Notion to collect and organize knowledge.
    • Algorithms curate information, shifting from personal selection to automated suggestion.
  2. Fragmentation and Speed
    • Traditional commonplaces were structured collections of ideas meant for deep reflection; today, social media and digital reading encourage more ephemeral and rapid engagement with ideas.
    • Memes and viral trends serve as modern commonplaces, often reducing complex ideas into highly shareable but simplified forms.
  3. Community-Driven Knowledge
    • While historical commonplaces were curated individually, modern knowledge repositories (Wikipedia, Reddit, online forums) are collectively built, making the process of knowledge-gathering more communal.
    • Hashtags and online movements function as dynamic, evolving commonplaces.
  4. Multimedia Integration
    • Unlike the text-based commonplaces of the past, today’s versions incorporate video, audio, and interactive media, expanding the ways in which knowledge is stored and transmitted.

Fundamental Differences?

Despite these changes, the core purpose of the commonplace—to collect, store, and reuse knowledge—remains intact. However, some fundamental shifts have occurred:

  • Authority and Personalization: In the past, commonplace books often relied on established authorities, whereas today, the sources of knowledge are more democratized (but also more prone to misinformation).
  • Ephemerality vs. Permanence: Digital platforms prioritize trending ideas that can disappear quickly, whereas historical commonplaces aimed for long-term intellectual reflection.
  • Interactivity: The modern commonplace is more participatory, with collective engagement and commentary shaping the content in real-time.

A Literacy Narrative, Interview Style

The following questions and answers detail some of my thoughts and experience with reading, writing, and AI (which I have mostly interpreted as generative AI). I present this in a rather informal Q&A format, but to see how ChatGPT attempted to weave the same information into a more cohesive narrative, here is the link:

https://chatgpt.com/share/678a72be-1d2c-8004-8539-5a6595b495f8

I think the language of my responses below better captures the essence of my personality and experience, but the ChatGPT-constructed narrative does do more to create cohesion, as my prompt specifically instructed.

For what it’s worth, ChatGPT identified the central theme of the narrative as “the transformative power of language—how writing, reading, and thinking shape personal identity, intellectual growth, and teaching practices. It explores how these elements intertwine to create a lifelong journey of discovery, reflection, and connection, both with oneself and the broader world.”

Writing

What is your earliest memory of writing?

I remember writing “themes” in high school, probably 10th grade, late 90s. These were short essays on a given topic or with a given form. I don’t remember completing specific steps of the process, but they were handwritten. At least some of the process was done in class. Computers weren’t really an option from what I recall—not in the classroom. I don’t remember much writing on computers until college.

What is your earliest memory of being taught to write?

I remember learning to write in cursive when I was in third grade. It was difficult for me, and being left-handed seemed to make it worse. The pencil or ink would smear across the page and get all over my hand, which had nowhere to go but across the freshly scribbled letters.

*Note to self: It takes a bit of mental energy to dig around the closet of my mind’s memories to find this kind of stuff. I don’t poke around in this closet much—probably because the memory of learning to write might reside on top of some poorly hidden metaphorical skeleton.

Did your parents read to you?

I remember my mother reading to me. My grandmother and aunt more so. I cannot remember my father ever reading to me.

Did they have books lying around the house?

A few here and there, self-help type books maybe, but I rarely saw them read.

Did you have an influential school teacher and if so, what do you remember of them?

Since I have reading and writing on the brain, I do remember that my fourth-grade teacher, Mrs. Reif, was a fantastic out-loud reader of fiction. She helped stories come alive and probably helped influence my relationship with reading/literature.

What part did reading play in your development as a writer?

Dr. Seuss and Shel Silverstein books gave me an appreciation for verbal gymnastics and rhyme and meter when I was young. I have written a few short pieces of poetry in my life, and in some distant way, those children’s books influenced them all. Later in life, I was drawn to The Divine Comedy for both its story and style—same with Homer’s epics. If I write creatively (which is rare), then those works serve as some kind of inspiration and maybe even archetypes.

Did you ever read a precis or watch a movie instead of reading the book?

Hell yeah, but I would prefer the gift of uninterrupted time and quiet brain space for reading the book.

Did you have an influential professor and if so, what do you remember of them?

Undergraduate: I had a film professor who I liked, maybe because we just got to watch movies most of the time and talk about them instead of the harder work of writing papers. He may have planted a seed in my mind that the job of a teacher/professor might be a decent way to make a living. He seemed pleased enough to be getting paid for what he was doing. I didn’t really understand the other responsibilities of a higher-ed professor though.

Before I committed to a concentration at GSU, Dr. Pullman convinced me to choose rhet-comp instead of literature and that I would have an avenue to study AI. He was right, and I am thankful for his influence.

Describe your writing processes and behaviors.

Part of my process is to internally think, think, and think some more. This often resembles what I would call an internal dialogue with myself. I would like to think of it as a dialectic where I am desperate to uncover the information I need to write and to make appropriate/logical/original connections between that information and prior knowledge. It’s getting harder and harder to compartmentalize my writing project(s), and they tend to dominate my brain space. Honestly, I have had dreams recently where the topic of my writing/thinking has been the subject matter, and when I wake up, I even feel like I have worked something out while I slept. I really wish I could do more pre-writing in my sleep for no other reason than efficiency, but I am impossibly far from realizing that. Other than that, my processes vary and can be either relatively linear or more scattered. I sometimes feel like I get trapped writing a never-ending introduction as my thinking faucet remains open and active as I write.

How do you teach writing?

Frequently. That’s the short answer. It depends on the course, though. For co-req/remedial English, I start by focusing on writing at the sentence level and then expand to the paragraph and essay—and then to the research and documentation components of academic and/or persuasive writing. That’s the sequence, anyway.

For second-semester freshmen, I am taking a different approach this semester. I am having them write one short, first-drafty piece every week in response to an idea/quotation from the assigned course text (a self-help type book). I hope to teach them something about writing as we look at their work together as a class.

Reading

What is your earliest memory of reading?

Jurassic Park when I was in 5th grade—at the community pool in the summer. What a book.

What part does reading play in your writing?

Reading the way I do currently is probably way too analytical and inefficient, but the thinking patterns I develop doing that might be beneficial as a writer because I am used to mentally and linguistically trying to, for lack of a better terminology, figure stuff out.

Have you ever read a summary or watched a movie instead of reading an assigned book?

I wouldn’t have my degree in English if I didn’t. I don’t read efficiently enough, and I didn’t love most of the assigned literature enough to give it the time and attention it probably deserved.

Does listening to someone read a book provide you with the same experience as reading it yourself? Elaborate.

No, not at all. Two very different experiences. When I read, I am more in control of the pacing, which allows for more internal dialogue with myself about what I am reading. I also take notes when I read. I do not do that when I am listening, although I probably could if I really wanted to. Listening would be more efficient for me, but it would not allow for the same depth of connections as reading does for me.

Opining

Do you have an opinion about the present state of the economy?

I am trying not to. I only have so many effs to give right now, and I am actively trying to give up politics. The state of the economy is too close to politics for me. I recognize that I am very privileged to be able to tune out like this. I am mid-way through a stable career that pays the bills. I choose to teach partly as insulation from concerns about the economy. Since I am studying AI, I feel like I might be further insulating myself (at least a little bit) from the potential negative impact AI might have on employment in higher ed. That’s what I have to tell myself anyway.

Do you have an opinion about NATO?

It seems to have served an important purpose of thwarting expansionist dictatorial powers. That seems worth preserving to me, even at a significant financial cost to the US.

Do you have an opinion about the relationship between sacrifice and faith?

I have plenty of both satisfying and unsatisfying opinions about faith. I suppose if you choose to have faith in one thing (particularly one conception of God), you could be sacrificing the chance to have faith in another–depends on how all-encompassing your conception of God is though. The faith I wrestle with for myself requires an acknowledgement of sacrifice made on my behalf and the duty to love sacrificially, so I see a strong relationship between the two ideas.

Do you ever use words you can’t define?

Yuval Noah Harari talks about how difficult it is to define fundamental concepts. I use words like love, faith, AI, writing, literature, American, and learning all the time, but I cannot define them as clearly as I wish I could.

How comfortable are you with saying, “I don’t know.”?

It depends on the day, but I hope that I am getting more comfortable with it as time passes and I learn just how much is out there that I don’t know.

Thinking

Do you have an internal monologue?

I do, and I want to learn more about it. The theory of linguistic determinism says that the internal dialogue I have with myself in English could completely shape and constrain my thoughts. I know alternative theories say otherwise, but they still indicate a relationship between thoughts and language worth exploring and understanding.

Do you ever edit as you narrate to yourself?

Sort of. I think there is constant editing going on. It might be very similar to how LLMs predict the next word. I have to shuffle through my vocabulary to choose the right word in real time, so there could be considerations taking place that one could call edits.

What is it like to think wordlessly?

Probably some kind of combination of internal visual and emotional processing would be my guess. I am pretty sure my dreams are devoid of actual language, so there may be some wordless thinking going on there.

Is there a difference between rationalizing and thinking?

Rationalizing seems like a specific application or direction for thinking.

AIing

Have you used AI?

Yes. Specifically ChatGPT, claude.ai, and perplexity.ai, but AI is so embedded within other technologies that I probably use other forms without even knowing it.

How many times?

Intentionally, at least a few times every week.

Toward what ends?

I am using it (perplexity.ai especially) much more often as a search/research tool. I used to use it more experimentally out of curiosity.

Current thoughts about it?

It is terrifying and fascinating. Knowing what many of the experts say about its destructive potential and harmful present impacts on the environment, labor forces, and artists, I wonder if using it (even for educational reasons) could be contributing to a world that doesn’t align with my own values.

Final Project: ChatGPT UX Case Study

Introduction and Project Focus:

Perhaps it’s a fool’s errand to attempt to improve the design and user experience of the fastest-growing app in history. ChatGPT reached 1 million users within 5 days of its release and 100 million users in just two months. Some if-it-ain’t-broke-don’t-fix-it wisdom might seem obvious to apply to this wildly successful product, but UX experts Gavin Lew and Robert Schumacher issue an important warning in their 2020 book AI and UX. According to them, users who encounter problems early on with a product are likely to abandon it, often never to return. Therefore, if OpenAI (the company behind ChatGPT) wants to hold on to its massive user base, attract and keep new users, and (equally importantly) transition users to the upgraded paid version, it is imperative that the free ChatGPT 3.5 model delivers an accessible design and experience without any frustrating flaws. But to create the most desirable experience, UX designers must first figure out what their users are seeking and what exactly they need and desire from the product.

For this study of ChatGPT, use cases are the first area of focus. Beyond addressing the UX problems related to serving the most appropriate and popular use cases, two other aspects of the ChatGPT experience merit attention: interface design/functionality and response accuracy. Informed by industry research, academic research, and focus group testing, this case study presents analysis and recommendations for ChatGPT’s improvement in all three areas: use cases, interface design/functionality, and response accuracy.

Competitive Analysis:

A recent (February 2024) competitive analysis study from Copenhagen Economics looks at the current state of generative AI:

https://copenhageneconomics.com/wp-content/uploads/2024/03/Copenhagen-Economics-Generative-Artificial-Intelligence-The-Competitive-Landscape.pdf

Included in the report’s executive summary is the following:

“This white paper provides a preliminary review of current developments in the GenAI space and the emerging implications for competition. Further research is needed to shed additional light on the evolving competitive situation in GenAI, as well as any case-specific matters, particularly as this is a fast-evolving sector.”

Also included is this analysis of the most notable generative AI models:

“First, most foundation model developers (e.g., OpenAI, Anthropic, Mistral) already provide their own user-facing applications. At the same time, many foundation model developers offer rival developers the ability to build applications on top of their existing models via open models or closed APIs (e.g., Jasper.ai writing assistant built on OpenAI GPT models).

 Second, players in different levels of the value chain often specialise in specific domains (e.g., BloombergGPT in finance, or Isomorphic Labs in the pharmaceutical drugs discovery process) or tasks (GitHub coding copilot, or Perplexity.ai in search). The success of these models suggests that specialisation is a viable business model.”

The second observation there is especially important for this ChatGPT case study because it reveals a difference between models that serve specific domain purposes and those (like ChatGPT, claude.ai, Llama 2, and Gemini) which function more generally or without strongly defined purposes.

There are also a number of comprehensive comparison guides for generative AI models, and this February 2024 guide does an effective job of assessing a handful of the most common use cases:

https://www.zdnet.com/article/best-ai-chatbot/

This comparison guide offers valuable information for the UX researcher by pointing out some of the pros and cons of each model. Its analysis suggests that ChatGPT 3.5 lacks two distinct features that some other free generative AI models offer: a live connection to the internet and the ability to upload files for analysis. While the lack of these features may hurt ChatGPT 3.5’s ability to compete with other free generative AI models, the paid subscription upgrade (ChatGPT 4) does offer both internet connectivity and file uploads. From a profit standpoint, it may make sense for OpenAI not to offer all available features in the free version, but if other companies offer them at no charge, those companies are positioned to deliver a superior overall user experience among free generative AI models. To remain competitive, ChatGPT 3.5 should incorporate internet connectivity and file uploading. However, OpenAI could limit file sizes and the number of responses that draw on live internet access in order to maintain an incentive for users to upgrade. It seems important to at least demonstrate for users what the upgraded version of a product is capable of.

Issue #1: Identifying Use Cases

Understanding a product’s primary use is key to designing the best experience for the user. Designers are used to having an end goal or ultimate purpose in mind when they seek to improve how a particular product works. If the product is a shovel, the designers know to work on creating the best digging experience for the target users. Understanding that products could have secondary or multiple uses is important as well, and sometimes those uses must be factored into design. In the case of a Swiss Army Knife, the designers considered a number of different uses for their product and then chose to accommodate all of those uses within a compact device.

ChatGPT and similar generative AI tools are unusual among human-created technologies because, unlike most tools, which are created to solve defined problems, generative AI is more like a solution looking for problems to solve. This is evident in ChatGPT’s user interface, which currently offers users four different suggestions for how it can “help you today.”


Google’s generative AI model Gemini presents users with a very similar set of use options. While these options may change daily or each time the tool is accessed, this example from Gemini shows two of four options related to cooking, which makes Gemini appear limited in what it can do. Even the first and fourth suggestions, about brainstorming presentation ideas and outlining a logical sales pitch, are similar.

Many first-time users of ChatGPT or other models likely access their chosen tool with no defined purpose, so it may be helpful to include some specific options for how the product can be used. However, the suggestions currently provided on the webpage may not be the best strategy for engaging curious explorers: they give some sense of what ChatGPT might be able to do, but they are overly specific. A better option might be to highlight the broader tasks ChatGPT can help with, offer multiple examples of each task instead of just one, and then have the tool initiate the chat and help the user narrow down what they want or need help with. Here is a mockup:

Issue #2: Interface Design and Functionality

Method for testing interface and functionality:

A group of 12 first-year college students used ChatGPT for a prewriting activity designed to help them explore a topic or tentative thesis statement. The use case was thus pre-determined and defined for them, although they could choose which initial prompt to use.

The demographics of the student test group reflect the college they attend: racially diverse, majority female, and mostly traditional college students aged 18-24. While different groups of users will have different needs and expectations of ChatGPT, this group is a likely target audience for generative AI models: students who learn to use the technology in college may bring that knowledge and habit of use with them into their career fields.

Prompt options for the product test:

Prompt for exploring a topic:
Please help me prepare for writing my persuasive essay about [insert topic]. Ask me questions–one at a time–that will help me come to a thesis and understand the strengths and weaknesses of my position and any opposing view(s).


Prompt for finding and understanding potential credible sources:
Can you point me to some prominent researchers who have published on [insert topic] and very briefly summarize their positions?


Prompt for testing a tentative thesis when you have a guess of what your position might be:
Let’s engage in a dialectic exercise. I will present my tentative thesis, and then you will take the role of Socrates and ask me yes or no questions (one at a time) to expose any potential flaws or contradictions in my position.    

Data Collected: Students submitted their ChatGPT pre-writing activity conversations in the form of shareable links provided by ChatGPT. The user interface of ChatGPT includes a helpful “share” button at the top right of the screen (desktop version).

This share button creates a special link that captures an entire chat session that others can view without having to access the original user’s account. The user can also choose whether to remain named or anonymous in the shared version of the chat. Users in this case study generally found the share feature offered by ChatGPT to be easy to locate and use.

The researcher also conducted a focus group to gather the students’ opinions and insights regarding the design and their experience with the tool.

The full ChatGPT conversations collected from students are of limited value for UX purposes, but from a qualitative perspective they do reveal something important. Each student prompted and replied to ChatGPT in very different ways: some used formal, properly punctuated language, while others were far more informal, using abbreviations and texting language. Despite this, the responses from ChatGPT showed no corresponding variation in tone or style. It might make sense for ChatGPT to adjust its formality to match user inputs, or to offer formality as an adjustable user setting, as sketched below.
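As a rough illustration of that recommendation, a formality preference could be translated into a system instruction that accompanies each user prompt. The sketch below targets OpenAI’s public Chat Completions endpoint, but the formality setting and its wording are my own invention, not an existing ChatGPT feature.

```typescript
// Sketch: exposing formality as an adjustable setting by translating it
// into a system instruction. The "formality" option is hypothetical; the
// endpoint and request shape follow OpenAI's public Chat Completions API.
type Formality = "casual" | "neutral" | "formal";

const styleInstruction: Record<Formality, string> = {
  casual: "Reply informally, the way you would text a friend.",
  neutral: "Reply in a plain, conversational register.",
  formal: "Reply in formal, properly punctuated academic prose.",
};

async function askWithFormality(
  prompt: string,
  formality: Formality,
  apiKey: string
): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [
        { role: "system", content: styleInstruction[formality] },
        { role: "user", content: prompt },
      ],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}
```

A tool built this way could also infer the setting automatically by classifying the register of the user’s own messages.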

Focus Group Discussion Summary and Interface Recommendations:

  • A majority of the students reported having used some type of generative AI before, typically for academic reasons, but none were regular users except one who said she used Snapchat’s AI feature. This student remarked that she liked the Snapchat feature better than ChatGPT because Snapchat’s AI did not decline to answer questions the way ChatGPT does and was not as “neutral.”
  • Students found the website for ChatGPT easily and reported no problems with creating an account. Some reported that creating the account/signing in using their existing Google account made things easier.
    • Recommendation: Keep the simplified sign-in process for Google users.
  • Initial impressions of the interface were mixed, with some liking the “clean” layout and others calling it “simple in a bad way.” Some users felt they did not know exactly what to do first within the site. Based on that confusion, the layout would not score well on a five-second test.
    • Recommendation: Reduce confusion about what users should do first by having ChatGPT open the conversation itself, with a letter-by-letter appearance of text instead of the static “How can I help you?” message. The chat would begin automatically instead of waiting for the user to initiate the first response (see the sketch after this list).
  • Some students appreciated the ability to change their personal settings to show the site in dark mode, but most students did not seek or find that option. Some felt that the dark mode option from their browser was sufficient.
    • Recommendation: Add a simple dark mode toggle switch at the top of the interface.
  • All students accessed the desktop version of the tool, but many said they would consider using the mobile app.
  • About half of the students did notice the accuracy disclaimer at the bottom of the screen, and some students agreed that a pop-up notification at the start of a chat would be better because the disclaimer is not highly visible.
    • Recommendation: Increase font size of disclaimer and add disclaimer as a pop-up message at the beginning of a chat session. Include an option on the pop-up that says, “Do not show this message again.”
  • Students commented that the letter-by-letter appearance of the text is “cool” and makes it feel like “someone is typing to you.”
  • Students generally agreed that the ability to verbally enter prompts and responses would be a good feature to add. Most did not realize that they could use certain internet browsers for that same feature.
  • Almost all students found it easy to get responses from ChatGPT, except for one student who had issues when using the Safari browser; switching to Chrome resolved them.
    • Recommendation: improve the tool’s functionality in Safari or stop offering the tool through Safari altogether. No experience may be better than a frustrating experience in this case.
  • Students generally had no issues with the speed of ChatGPT responses, but some thought the responses were too long.
    • Recommendation: Have ChatGPT ask users if its responses are sufficient in length during the chat.
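To make the auto-start recommendation from the list concrete, here is a minimal browser-side sketch of a letter-by-letter greeting that opens the conversation without waiting for the user. The element ID, message, and timing are illustrative assumptions, not ChatGPT’s actual interface code.

```typescript
// Sketch of the recommended auto-start greeting: the chat opens with a
// letter-by-letter message instead of a static prompt. The element ID and
// timing are illustrative; this is not ChatGPT's actual interface code.
function typeGreeting(elementId: string, message: string, msPerChar = 40): void {
  const el = document.getElementById(elementId);
  if (!el) return;
  let i = 0;
  const timer = window.setInterval(() => {
    el.textContent = message.slice(0, ++i); // reveal one more character
    if (i >= message.length) window.clearInterval(timer);
  }, msPerChar);
}

// Kick off the conversation as soon as the page loads, so users are not
// left wondering what to do first.
window.addEventListener("DOMContentLoaded", () => {
  typeGreeting(
    "chat-log",
    "Hi! I can help you write, brainstorm, summarize, or explain. What are you working on today?"
  );
});
```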

 

Issue #3: Accuracy of Responses

There are a handful of reasons why potential users of generative AI choose to steer clear of the technology. Concerns about intellectual property rights and about bias in the training of these models are enough to keep some away completely. A more pressing concern for UX designers is the accuracy of the responses the different models generate. For users on the fence about trying ChatGPT, one inaccurate response may be enough to cement their opinion of the tool as unhelpful or even dangerous. Because many people use ChatGPT as a source of information on a range of topics and as an alternative to Google search, it is essential that responses contain accurate information and avoid the so-called “hallucinations” and misinformation that harm the tool’s credibility and drive frustrated users to alternatives. Furthermore, since ChatGPT 3.5’s training data cuts off at January of 2022 and the model lacks an internet connection, it usually will not provide information about events after that date; when it does, it is most likely providing misinformation, because it has no way of accessing new information. Other free models (Perplexity.ai, for example) do include internet connectivity and can pull up-to-date information, giving them a major edge over ChatGPT 3.5.

Industry studies and scholarly research point toward improvements in accuracy for generative AI. A 2023 study by Walters and Wilder found that GPT4 included significantly fewer false citations than ChatGPT 3.5: 55% of the sources created by ChatGPT 3.5 were fictitious, compared with only 18% of those created by GPT4. GPT4’s access to the internet likely contributed to this improvement, since the model could better verify that its sources exist. This is further reason to make internet connectivity a priority change for ChatGPT.

Other industry studies also point to rapid improvement in generative AI accuracy.

Originality.ai released data in February 2024 showing GPT4 outperforming ChatGPT 3.5 in a fact-checking task (Gillham, 2024), and Anyscale released data indicating that GPT4 outperforms both ChatGPT 3.5 and humans in identifying accurate summaries of news reports (Kadous, 2023). This is good news for the user experience, but generative AI already carries a significant stigma when it comes to accuracy. Some misinformation in generative AI responses results from the program simply executing a prompt without regard for accuracy, but other inaccuracies can be the direct result of biases that influence outputs.

Since the optimal user experience of ChatGPT includes outputs with accurate information, designers need to ensure that the model’s training data better represents marginalized online communities and cultures. This means finding better-quality data than the primarily English-language swaths of the internet collected by Common Crawl (Baak, 2024).

Another source of bias that generates less accurate information is the custom response feature of ChatGPT. This feature allows users to tailor the responses they receive based on their own preferences. For example:

A response from ChatGPT based on these instructions thus becomes biased and will ignore certain information, making the responses potentially less accurate.

 

Although these kinds of tailored responses do create a preferred user experience, the presentation of biased information to users (even when they ask for it) will, in the long run, hurt the overall reputation of ChatGPT and other generative AI models. This is an ethical problem for UX designers that deserves careful consideration. There may be ways to honor users’ preferences for length and formality of responses without reflecting a user’s own bias back at them.

Conclusion

ChatGPT brought generative AI into the mainstream in 2022, but today it faces strong competition from similar free (and freemium) models that offer important features ChatGPT reserves for its subscription-based GPT4 tier. There are a handful of minor changes ChatGPT could make to its interface to enhance users’ overall experience, but if OpenAI chooses to keep ChatGPT 3.5 offline, it is taking a major risk. The level of misinformation in generative AI responses is a major concern for many, and users who care about accuracy may well experiment with more accurate, internet-connected free models like Perplexity.ai and Gemini before paying $20 per month for that feature in GPT4.

 

References

Baak, S. (2024, February 6). Training data for the price of a sandwich. Mozilla Insights. https://foundation.mozilla.org/en/research/library/generative-ai-training-data/common-crawl/

Gillham, J. (2024, February 2). AI fact checking accuracy study. Originality.ai. https://originality.ai/blog/ai-fact-checking-accuracy

Kadous, W. (2023, August 23). Llama 2 is about as factually accurate as GPT-4 for summaries and is 30X cheaper. Anyscale. https://www.anyscale.com/blog/llama-2-is-about-as-factually-accurate-as-gpt-4-for-summaries-and-is-30x-cheaper

Lew, G., & Schumacher, R. Jr. (2020). AI and UX: Why artificial intelligence needs user experience. Apress.

Walters, W. H., & Wilder, E. I. (2023). Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports, 13(1), 1–8. https://doi.org/10.1038/s41598-023-41032-5

Project Update: Reflecting on Data Collection Through a Focus Group Session

I would be willing to bet that I have participated in far more focus groups than the average person. When I was hustling to pay off my student loans in the 2010s, I discovered that the city of Atlanta had a slew of market research companies that would pay me for my time and opinions on a number of different products: restaurant foods, scratch-off lottery tickets, pilot episodes of television shows and commercials, deep sea fishing excursions, NASCAR race events, household appliances. Those are just some of the focus groups I remember participating in.

There is a bit of a dirty secret to my former success in selling my opinions to market research companies, though, and I wouldn’t say I am proud of it now. I had to lie…rather, I chose to lie in order to qualify for most of the studies I participated in. If a market research company’s screener was recruiting subjects to taste a new menu item for a fast-food chain that sold seafood, then they needed to find people who ate at those establishments. They would call the people on their database of potential subjects and work to fill quotas that typically required some diversity among the subjects (age, race, gender, income, etc.), but the subjects all had to have one thing in common: they had to be a target audience for the product, likely users or consumers of it. The market research companies didn’t want vegans in their Chick-fil-A focus groups, unless they were testing out a new non-meat option or something.

But there is a critical flaw in the way these focus groups operate. It is a problem that opens up an avenue for unscrupulous people like me (old-me, anyway) to easily exploit the system. The company that wants the focus group data typically outsources the task to third parties: Company X hires a market research firm to collect data, and that firm often relies on yet another market research company to assemble the groups of participants. (At least this is how I saw it working in the 2010s.) The real breakdown seemed to occur in relying on the initial screeners to produce a group of people who fit the specifics that Chick-fil-A or its hired market research firm wanted data from. The call screeners’ job was to find participants, and back then, I quickly realized that doing so was not always easy for them. Sometimes I would get through a 10-20 minute screening process (answering questions about demographics and purchasing behavior) only to have the screener say, “Sorry, you don’t qualify” or “You do qualify, but I already have enough males in their 30s.”

Experienced screeners worked more efficiently and could figure out whether I qualified much faster by asking the right questions in the right order (obviously diverting from their scripted questionnaire). And the occasional desperate-for-subjects screener would be intentionally leading in their questioning: “Are you sure you haven’t bought a scratch-off lottery ticket in the past week?” I think the screeners were even financially incentivized to fill groups, earning bonuses tied to quotas that encouraged them to ignore quality. Either way, once I caught on to the different motivations of the parties involved, I started to play the game. I would be intentionally vague in my screener call responses, allowing “good” screeners to lead me to the desirable answers. Then I would show up to the focus group, participate as much as I could, offer legitimate and genuine feedback whenever possible, and then collect my check or gift card. Rinse, repeat.

My dishonesty mostly involved lying about my consumer behavior. It wouldn’t have been easy to lie about demographic information because much of that would be obvious when I showed up to the focus group and they verified my date of birth and address. My race would have been relatively obvious too, although one time my wife (accomplice and fellow liar) did get herself qualified for a self-tanner product and showed up as the only non-orange person at the group! I guess she didn’t think that one completely through, but she still got her honorarium.

After participating in a focus group, my wife and I would often discuss the performance of the focus group leader. As educators, we both knew what leading a group discussion entailed and we marveled at just how incompetent some of these market research professionals were at their jobs. There were also the rare few who were excellent, but the bad ones were always worth gossiping about. In my hubris, I almost always thought I could do better. Considering my level of experience as a participant toward the middle and end of my multi-year run as a frequent focus group imposter, I was probably somewhat right.

When it came time this semester for me to collect data for my current UX project, I thought about my experience with focus groups and decided that doing a dozen interviews at once would be an efficient way to go about this step. It was time for me to see if I really could run a focus group better than the professionals I critiqued years ago.

I was prepared. I had two pages of questions all related to the generative AI system that my students-turned-informal-research-participants had used just days prior. I also had 15 years of experience leading discussions on all types of subjects with diverse groups of students, including high schoolers, college students, and even incarcerated men in a state prison. I was ready to prove myself as a market research genius.

Ultimately, I failed at achieving true focus group leader glory for two reasons:

#1. It is difficult to keep the participants on topic. My list of questions was well thought out and logical. One question or set of questions logically led to another, but my research subjects were not privy to this planning and therefore jumped ahead and backwards many times. This made it harder to elicit the feedback I was seeking. Fortunately, I recorded the session so I could sift out the data, but I believe there were lost opportunities and lost data because of the way I failed to control and redirect the respondents at times. It was a balancing act because more feedback is generally preferable to no feedback, and I didn’t want to shut the respondents down.

#2. I found it extremely difficult to avoid leading my respondents toward my own opinions and observations regarding the user experience of the product (ChatGPT). Hubris strikes again. I could not remain an impartial enough recorder of responses. I saw opportunities to seek additional feedback from respondents that would confirm my own opinions, and I couldn’t help myself.

Despite these shortcomings, I did collect data that will allow me to better assess the user experience of the target product. If I could do the focus group again, I think I would start with more open-ended questions and allow the discussion to go where the respondents lead it, rather than trying to rigidly stick to the list of questions I developed.

Book Review: AI and UX: Why Artificial Intelligence Needs User Experience

AI and UX: Why Artificial Intelligence Needs User Experience, by Gavin Lew and Robert M. Schumacher Jr., 1st edition, Apress, 2020. EBSCOhost, search.ebscohost.com/login.aspx?direct=true&AuthType=ip,shib&db=cat06559a&AN=ggc.996791144102945&site=eds-live&scope=site.

Why did I choose this book?

My current research interests revolve around all things generative AI, especially its applications for writing and teaching. I am trying to learn as much as I can, and I think an understanding of the UX aspects related to AI is beneficial. This led me to seek a UX book that might provide insight into how AI systems and products are designed with end users in mind.

Because the release of ChatGPT in November of 2022 drastically altered the entire AI landscape, I was hoping to find a recently released book on UX and AI, but I was unable to find one that looked like legitimate scholarship. This is not entirely surprising, since it takes a while to research, write, and publish a book. So instead, I settled for the most recent book on the topic that looked reliable. There were not many options to choose from, but this book, despite its 2020 copyright date, proved a good read for someone like me who is neither an expert in UX nor in AI (yet).

Who are the authors?

The authors for this book are Gavin Lew and Robert Schumacher Jr. Instead of trying to summarize their already relatively short biographies in the book, I will include them in full:

Gavin Lew has over 25 years of experience in the corporate and academic environment. He founded User Centric and grew the company to be the largest private UX consultancy in the United States. After selling the company, he continued to lead a North American UX team to become one of the most profitable business units of the parent organization. He is a frequent presenter at national and international conferences and the inventor of several patents. He is an adjunct professor at DePaul and Northwestern universities. Gavin has a Masters in Experimental Psychology from Loyola University and is currently the Managing Partner of Bold Insight, part of ReSight Global, a globally funded UX consulting practice across North America, Europe, and Asia.

 Robert M. Schumacher Jr. has more than 30 years of experience in academic, agency, and corporate worlds. He co-owned User Centric with Gavin from its early stages until it was sold to GfK in 2012. While at User Centric, Bob helped found the User Experience Alliance, a global alliance of UX agencies. Also, he founded User Experience Ltd, a UX agency in Beijing. He is co-founder, co-owner, and Managing Partner of Bold Insight, part of ReSight Global, a global UX company. Bob was the editor of and contributor to The Handbook of Global User Research (2009). He has several patents and dozens of technical publications, including user interface standards for health records for the US government. He also is an Adjunct Professor at Northwestern University. Bob has a Ph.D. in Cognitive and Experimental Psychology from the University of Illinois at Urbana-Champaign.

Basically, we have two writers with plenty of UX experience (in technology and other fields) and backgrounds in psychology. It might have been nice to have an author with more of a computer-science background paired with someone who knows the psychology behind UX, but these two authors have a long-established working relationship, which enhances their ability to communicate throughout the book.

Summarizing the Chapters and Some Highlights:

In the preface, the authors state the following:

“Our perspective on how AI can be more successful is admittedly and unashamedly from a UX point of view. AI needs a focus on UX to be successful.”

This is a central theme of the book. The authors recognize the role UX must play in the development of AI systems, tools, and interfaces. Having now had some experience with a few generative AI platforms myself, I think the authors are correct: an emphasis on UX won’t just make AI tools easier and more pleasant to use; it can save them from being written off by the general public as novelties or passing fads. The failure of AI to live up to hype in past decades did lead to these kinds of dismissals, but the latest wave of advancements may have reached a tipping point that insulates AI from another major cultural setback or lengthy pause.

Chapter 1: Introduction to AI and UX

This chapter does a respectable job of making the important connections between UX and AI. The authors prove that they know enough about these connections to be credible voices from which the reader can learn.

Drawing from their significant UX work, Lew and Schumacher tell us that “For any product, whether it has AI or not, the bare minimum should be that it be usable and useful. It needs to be easy to operate, perform the tasks that users ask of it accurately, and not perform tasks it isn’t asked to do. That is setting the bar really low, but there are many products in the marketplace that are so poorly designed where this minimum bar is not met” (16).

Throughout the book, the authors make a good case for the application of pretty much all general UX principles to AI products. Chapter 1 just lays out the landscape and major connections.

Chapter 2: AI and UX: Parallel Journeys

As the title implies, Chapter 2 provides a nice historical walk through AI and UX development. Particularly interesting is the focus on the “AI winters” that followed periods of overhyped AI performance in the 1960s and again in the 1980s. The authors also mention the “domain-specific AI winter” for AI personal assistants that followed the overhyping of Siri in the early 2010s. Part of the reason for these AI winters is that the developers of the systems were not focused enough on user experience.

I appreciate the differentiation the authors try to make between HCI (human-computer interaction) and UX in chapter 2:

“Where HCI was originally focused heavily on the psychology of cognitive, motor, and perceptual functions, UX is defined at a higher level—the experiences that people have with things in their world, not just computers. HCI seemed too confining for a domain that now included toasters and door handles. Moreover, Norman, among others, championed the role of beauty and emotion and their impact on the user experience. Socio-technical factors also play a big part. So UX casts a broader net over people’s interactions with stuff. That’s not to say that HCI is/was irrelevant; it was just too limiting for the ways in which we experience our world” (50).

The way I interpret this is that HCI is akin to a substratum of UX.

Chapter 3: AI-Enabled Products are Emerging All Around Us

And

Chapter 4: Garbage In, Garbage Out

These two chapters are where the book shows its age a bit as a pre-ChatGPT publication. Although Chapter 3 discusses some interesting examples of AI systems, Chapter 4 disconnects enough from the user experience that I did not find it valuable as a UX text. Chapter 4 focuses on the data that AI runs on. The authors are correct that without quality data, the user experience of any AI product will suffer, but since current AI systems are such black boxes when it comes to their training data, this is somewhat of a moot point for me right now.

I will say that I perked up a bit reading about voice assistants and Grice’s four maxims for communication (67). Anyone studying generative AI could benefit from using those maxims as a starting point for evaluating what our machines are capable of. Current LLMs, and systems built on them, seem to handle three of the maxims (quantity, relevance, and clarity) with relative ease much of the time, but the truthfulness of LLMs’ communication is where many people find the most problems. One could argue that truthfulness is the most important of the four, but advances in the other three areas have clearly come quickly and impressively. I think it is entirely possible that AI systems will make progress on that fourth maxim in the near future. And if things in the AI world are not interesting enough for someone yet, they will be once these programs are more reliably accurate purveyors of information.
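As a rough sketch of how Grice’s maxims might serve as that starting point, the rubric below scores a response on each maxim, mapping the chapter’s “relevance” to Grice’s relation, “clarity” to manner, and truthfulness to quality. The maxims are Grice’s; the numeric scoring scheme is invented for illustration.

```typescript
// Sketch of a Gricean rubric for evaluating an LLM response. The four
// maxims are Grice's; the 1-5 scoring and weights are invented here.
interface GriceanScores {
  quantity: number; // 1-5: says enough, but not too much
  quality: number;  // 1-5: truthful and supported (hardest for current LLMs)
  relation: number; // 1-5: relevant to the prompt
  manner: number;   // 1-5: clear, orderly, unambiguous
}

function overallScore(s: GriceanScores): number {
  // Weight quality most heavily, since truthfulness is where current
  // systems struggle most; the result stays on the same 1-5 scale.
  return 0.4 * s.quality + 0.2 * (s.quantity + s.relation + s.manner);
}

// Example: a fluent, relevant, but partly fabricated response.
const sample: GriceanScores = { quantity: 4, quality: 2, relation: 5, manner: 5 };
console.log(overallScore(sample).toFixed(1)); // "3.6"
```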

Chapter 5: Applying a UX Framework

This final chapter is still relevant in the post-ChatGPT world. It ties the idea of UX and AI back together (whereas they diverged a bit in the previous two chapters). This quote at the beginning of the chapter seems especially relevant:

“For many people, there’s still a hesitance, a resistance, to adopt AI. Perhaps it is because of the influence of sci-fi movies that have planted images of Skynet and the Terminator in our minds, or simply fear of those things that we don’t understand. AI has an image problem. Risks remain that people will get disillusioned with AI again” (109).

I think the authors are correct that people could become disillusioned with AI again, but this will probably have less to do with the UX dimension and more to do with the existential threats, security concerns, and intellectual property issues that accompany 21st-century AI. Either way, since AI is becoming so ubiquitous, I would not predict another AI winter like the ones the authors detail in Chapter 2.

One of the most interesting points in Chapter 5 regards the purpose of a product. As they lay out the case for applying a UX framework to AI, the authors pose the following questions:

“Probably the most important thing that defines any application is what it does—we call this “utility” or “functionality” or “usefulness.” Basically, is there a perceived functional benefit? In more formal terms, does the application (tool) fit for the purpose it was designed for?”

The reason this is interesting is that I am not sure the creators of ChatGPT and the other generative AI systems (or any of the precursors dating back to the 1960s) really had specific end-user functions in mind, at least not as the driving motivation for their creations. It seems the systems have all been designed just to see if their creators could make a machine that could communicate like a human and display some level of “intelligence.” Along the way, clever people have figured out how to leverage this technology for different purposes, and profit-driven people have too, but I really don’t think the usefulness of LLMs weighed heavily on the creators’ minds. Evidence for this exists within the current user experience of ChatGPT: when users first access the application, they see an interface with suggestions for how the app could be used. That is weird.

When we buy tools or access technologies, we typically already have the function in mind; that’s why we sought the tool to begin with. Generative AI companies are almost saying to the user, “Here it is. Figure out for yourself what purpose it has for you.” For the time being, that is the user experience for many users of generative AI.

As for the user experience of Lew and Schumacher’s book, I think they did a decent job of connecting two fields that need to be connected. A reader with a good grasp of AI could probably skip Chapters 3 and 4, but there is plenty of helpful information and background in Chapters 1, 2, and 5 that still holds up well in this four-year-old title from Springer/Apress.