To demo a few of the possibilities, let’s look at some health insurance claims data. The file contains 15,000 rows with each row containing a claims amount, the individual’s age, sex, weight, BMI, number of dependents, blood pressure, job title, hereditary diseases, city of residence, and whether the person is a smoker, has diabetes, or regularly exercises. First, I’ll upload the file and ask, “Using this dataset, will you please help me determine how different variables affect claims costs?” Then, we’ll see where things go. (This is a long video. I’ll put times of key events below. Also, here’s a link to the conversation.)
0:00 – 1:22 – Exploratory data analysis
1:23 – 1:56 – Handling missing values and cleaning data
1:56 – 3:00 – Visualizing distributions
3:00 – 4:51 – Visualizing categorical variables (Boxplots)
4:51 – 5:44 – Visualizing continuous variables (Scatterplots)
5:44 – 6:20 – Suggesting statistical tests
7:35 – 9:14 – Creating a table and providing a download link
9:14 – 10:24 – Drafting a methods section
10:24 – 11:58 – Explaining the test
11:58 – 13:20 – Generating conclusions
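For readers curious what a few of these steps look like in code, here is a minimal sketch of the cleaning, grouping, and testing stages. It is not the code ChatGPT produced in the video. The tiny dataset, the column names (`age`, `bmi`, `smoker`, `claim`), and the choice of Welch's t statistic are all assumptions made for illustration, and it uses only Python's standard library so it runs without extra installs.

```python
import statistics
from collections import defaultdict

# Hypothetical miniature stand-in for the claims file. The values are made up
# and the column names are assumptions, not the real dataset's headers.
rows = [
    {"age": 25, "bmi": 22.0, "smoker": "no", "claim": 2100.0},
    {"age": 41, "bmi": None, "smoker": "yes", "claim": 15400.0},
    {"age": 36, "bmi": 27.5, "smoker": "no", "claim": 4300.0},
    {"age": 58, "bmi": 31.0, "smoker": "yes", "claim": 27800.0},
    {"age": 47, "bmi": 29.0, "smoker": "no", "claim": 6900.0},
    {"age": 33, "bmi": 24.5, "smoker": "yes", "claim": 13200.0},
]


def fill_missing_with_median(records, column):
    """Return copies of the records with None in `column` replaced by the median."""
    observed = [r[column] for r in records if r[column] is not None]
    median = statistics.median(observed)
    return [
        {**r, column: median if r[column] is None else r[column]}
        for r in records
    ]


def mean_by_group(records, group_col, value_col):
    """Average `value_col` within each category of `group_col`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_col]].append(r[value_col])
    return {key: statistics.mean(values) for key, values in groups.items()}


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Cleaning: fill the one missing BMI value.
clean = fill_missing_with_median(rows, "bmi")

# Comparing a categorical variable: average claim for smokers vs. non-smokers.
group_means = mean_by_group(clean, "smoker", "claim")

# A simple test statistic for the difference between the two groups.
smoker_claims = [r["claim"] for r in clean if r["smoker"] == "yes"]
non_smoker_claims = [r["claim"] for r in clean if r["smoker"] == "no"]
t_stat = welch_t(smoker_claims, non_smoker_claims)

print(group_means)
print(f"Welch t statistic: {t_stat:.2f}")
```

With a real file of 15,000 rows you would load the data from disk and likely use a dedicated data analysis library, but the sequence of clean, summarize, then test is the same.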
In addition to quantitative data analysis, there are all sorts of other things that this mode of ChatGPT can do. My dabbling has only scratched the surface, but ChatGPT’s Advanced Data Analysis has also helped me to do the following:
- Categorize, summarize, and draw conclusions from survey data.
- Create a PowerPoint slideshow from a zipped folder of images.
- Create a “game” out of a set of course design recommendations and then show me how to add the HTML, CSS, and JavaScript files it created to GitHub to preview.
- Write 10 sample papers, score the papers based on a rubric, offer overall feedback as well as specific feedback on each criterion, create new files with the added comments, and produce a spreadsheet with each paper’s score on each criterion, total score for each paper, and overall descriptive statistics.
- Create an interactive map showing changes in CO2 emissions over time. As part of this process, ChatGPT walked me through how to install Python, add the Plotly graphing library, and use Terminal to create the map. (The climate data I used to create the map came from this demo of ChatGPT’s Advanced Data Analysis from MIT.)
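The rubric-scoring item in the list above ends with a spreadsheet of per-criterion scores, totals, and descriptive statistics. As a rough sketch of that last step only, the following builds such a table from already-assigned scores. The paper names, criterion names, and numbers are hypothetical, and this is not the code ChatGPT generated.

```python
import csv
import io
import statistics

# Hypothetical rubric criteria and scores, made up for illustration.
CRITERIA = ["thesis", "evidence", "organization"]
scores = {
    "paper_01": {"thesis": 4, "evidence": 3, "organization": 5},
    "paper_02": {"thesis": 2, "evidence": 4, "organization": 3},
    "paper_03": {"thesis": 5, "evidence": 5, "organization": 4},
}


def build_rows(scores, criteria):
    """One row per paper: each criterion score plus the paper's total."""
    table = []
    for paper, marks in scores.items():
        row = {"paper": paper, **{c: marks[c] for c in criteria}}
        row["total"] = sum(marks[c] for c in criteria)
        table.append(row)
    return table


def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


table = build_rows(scores, CRITERIA)
totals = [row["total"] for row in table]
summary = describe(totals)

# Write the table as CSV text; swap the buffer for open("scores.csv", "w", newline="")
# to save a file that a spreadsheet program can open.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["paper", *CRITERIA, "total"])
writer.writeheader()
writer.writerows(table)
csv_text = buffer.getvalue()

print(csv_text)
print(summary)
```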
I hope that this overview has inspired you to see what you can do with your own data or to integrate real-world data analysis into your teaching. If you do choose to use this tool, be sure not to upload data containing sensitive information. Also, in another post, I discuss how advanced prompting techniques can help shape interactions with generative AI chatbots. Many of the techniques shared there also apply when working with ChatGPT to do data analysis. Finally, ChatGPT Plus costs $20/mo. However, in addition to the Advanced Data Analysis functionality, a subscription also provides access to plug-ins that extend ChatGPT’s functionality, a means to connect ChatGPT to the Internet, and Dall-E 3, OpenAI’s amazing text-to-image generator. It’s worth checking out.