4 Studies About AI Tutors Every Teacher Should Know
AI tutors are here, but so far the research on efficacy is mixed. Here is some of the most intriguing research conducted so far.
AI tutors hold tremendous potential. Enthusiasts hope these tools will be able to provide students with individualized tutoring at scale, delivering the same benefits a student might get from a human tutor. In practice, as with most everything education-related, it’s more complicated.
AI tutors have been helpful in some instances, but in other cases, have seemingly hindered student success. I’ve had a chance to interview several experts on the cutting edge of research into AI tutors, and the consensus is that, used in the right way in the right context, these can likely be helpful, but we’re still learning how to best utilize the technology.
As a teacher myself, I’ve turned to the following four studies to try to understand the strange and still-new world of AI tutors. Any teacher exploring AI tutor use, or curious about trying one in class, would benefit from familiarizing themselves with these findings.
1. A Math Tutor That Subtracted From Success
In a study of nearly 1,000 students, researchers at the University of Pennsylvania compared the results of students who worked with a GPT-4-powered tutor to those who did not on a math exam. The students with access to the AI tutor did worse on average than those without it.
“We find that generative AI could hurt learning because students potentially use it as an answer machine, as opposed to a tool that is conducive for learning,” Alp Sungu, one of the paper’s co-authors, told me.
This is an important study for teachers to be aware of because it serves as a cautionary tale: more AI access does not always equal more success. However, even Sungu warned against taking the findings too far. The study does not show that using AI tutors hurts student success in general, just that relying on one as an answer machine can.
2. Some Bright Spots for Chatbot Tutors
Yu Zhonggen, a professor of Foreign Studies at the Beijing Language and Culture University, led a team that looked at 24 studies comparing how interacting with ChatGPT-style chatbots and tutors influenced students.
“Overall, the study found that AI chatbots had a significant positive effect on students’ learning outcomes,” Zhonggen told me. “Specifically, AI chatbots were found to enhance students’ learning motivation, performance, self-efficacy, interests, and perceived value of learning. Additionally, AI chatbots could be helpful in alleviating students’ anxiety.”
However, the full story was murkier. These positive effects appeared only in college students—younger students did not see statistically significant improvement—and even for college students, the benefits seemed to fade over time.
3. A Clear Win for AI Tutor Interaction
A recent study led by Ying Xu, a professor at the Harvard Graduate School of Education, found AI characters can help young students learn.
For the study, Xu partnered with PBS KIDS to work with more than 200 children aged 4-7, split into three groups of 80. All of the children were shown the PBS KIDS science show Elinor Wonders Why, which is geared toward preschool and early elementary school students. One group just watched the show. Another group watched the show with assistance from an AI-powered version of Elinor, the show’s curious cartoon rabbit, who encouraged children to answer questions and offered tips when they answered incorrectly. The AI-assisted children performed the best and answered the most questions correctly, Xu told me.
Interestingly though, this study did not utilize generative AI. Instead, the AI version of Elinor chose between pre-selected and vetted answers to help prod students along. This approach is interesting to me because it avoids some of the problems generative AI tutors can have with hallucinations and just providing non-helpful answers. That said, my sense is that to reap the full potential of AI tutors, we’ll need to figure out how to harness generative AI effectively.
4. AI Helping Human Tutors
Given how mixed some of the research into human and AI tutoring has been, I’m particularly interested in an AI tool designed at Stanford called Tutor CoPilot.
Instead of replacing human tutors, this tool is designed to help them do the work they do more efficiently by coaching them on different questions they might ask a student, and more. On the surface, this sounds too rosy to me, as I’m generally not a believer in the “AI will help humans do the work they do better” argument. But in this case, that seems to be exactly what’s happening.
In a study of the tool, nearly 1,000 students worked with 900 tutors. The students who worked with tutors and used Tutor CoPilot were 4% more likely to master a topic after a session. And students working with the lowest-rated tutors saw the most significant gains as the AI helped compensate.
I’m thrilled by these results, particularly because there’s a human connection that occurs during tutoring that may aid the emotional well-being of the student, and this tool helps with that. Additionally, it suggests to me that there are many creative potential uses for AI tutoring to be discovered in the future.
Ultimately, we’re still in the early days of AI — more tutoring successes are likely to come.
Erik Ofgang is a Tech & Learning contributor. A journalist, author and educator, his work has appeared in The New York Times, the Washington Post, the Smithsonian, The Atlantic, and Associated Press. He currently teaches at Western Connecticut State University’s MFA program. While a staff writer at Connecticut Magazine he won a Society of Professional Journalists award for his education reporting. He is interested in how humans learn and how technology can make that more effective.