AI and Learning: Beyond “Good” vs. “Bad”

As AI changes the education landscape moment by moment, we can expect a steady flow of studies offering us guidance about its use. We’re happy to have all that research, but it does come with a probable danger. Every month or so, we will likely hear that a study has DEFINITIVELY concluded that:

AI damages our long-term ability to learn, or
AI revolutionizes classrooms to benefit students, or
AI promotes collaboration while fostering independent thinking, or
AI harms neurons and corrupts synapses…

Like Celine Dion’s heart, this list will go on and on.

In this blog post, I want to quickly summarize two AI studies which, unless carefully parsed, seem to reach contradictory conclusions. My point will be not that one is wrong and the other right, but that each study has asked meaningfully different questions and measured results in a substantively different way. BOTH of these studies offer us useful guidance…as long as we resist the temptation to cry: “Look! This study gives us a clear-cut, definitive answer to our question.”

I’ll end by arguing that we should focus not on “AI good” vs. “AI bad,” but on “AI that replaces thinking” vs. “AI that fosters thinking.” Let’s start with those two studies.

Study #1

In a recent study by Dr. Andre Barcaui, a group of 120 undergraduates at a Brazilian university spent two weeks preparing presentations for their classmates. Half of those students used AI to assist them in this work; the other half didn’t. Then (SURPRISE!) 45 days later all students took a follow-up test on the material in their presentations.

The students in the AI group averaged 57.5% on that test; the students who didn’t use AI averaged 68.5%. That’s quite a difference! (If you speak stats, the Cohen’s d was 0.68.) The one-sentence summary: “students who study with AI remember less than those who don’t.” If you want a hyped-up version:

Using AI harms learning.
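If you don’t speak stats, here is a rough sketch of what that Cohen’s d is telling us. It assumes the standard definition (the difference between the group means divided by the pooled standard deviation) and works backward from the three numbers quoted above. The function name and the "implied" spread are my own illustration, not values taken from Barcaui’s paper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b), using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Figures quoted in this post.
NO_AI_MEAN = 68.5
AI_MEAN = 57.5
REPORTED_D = 0.68

# Rearranging d = (mean difference) / (pooled SD) gives the spread that the
# quoted numbers imply. Back-of-envelope arithmetic, not a figure from the paper.
implied_pooled_sd = (NO_AI_MEAN - AI_MEAN) / REPORTED_D

print(f"Mean difference: {NO_AI_MEAN - AI_MEAN:.1f} percentage points")
print(f"Implied pooled SD: {implied_pooled_sd:.1f} percentage points")
```

In plain terms: the 11-point gap is about two-thirds of the typical spread in scores, which is why it counts as a sizable difference rather than a rounding error.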

Like all studies, this one has methodological strengths and weaknesses.

Strengths: most education research tests “learning” a few hours later. This one tests learning 45 DAYS later. That schedule is rare, and makes it more reasonable to talk about “learning.” If students recall more information after a month and a half, we can plausibly say they “learned more.”

Weaknesses: To my mind, the biggest weakness here is that we don’t really know what the students DID with, or without, AI. How exactly did they use AI tools in preparing? Did they ask for summaries or scripts or graphs? Equally important, we don’t know that the students in the non-AI group actually avoided AI. They could have used AI back in their dorms…how would the researcher know?

All that being said, Barcaui’s study makes rough-n-ready sense within a cognitive science framework. Basically speaking, we can assume that students in the AI group let ChatGPT or Claude “do more of the thinking for them.” On the other hand, students in the non-AI group did the thinking themselves. No one should be surprised that the students who thought more learned more.

One simple way to highlight this point: students in the AI group averaged 3.2 hours of prep time for their presentations; those in the non-AI group averaged 5.8 hours. If I spend nearly twice as much time thinking deeply about a topic, I’m likelier to learn it. If AI reduces the amount of time I think, I don’t learn as much.

Study #2

A second study, led by Dr. Angel Tsai-Hsuan Chung, seems to reach the opposite conclusion. In this study, more than 1,000 Taiwanese high-school students took a five-month course in Python (the programming language, not the British comedy troupe). All of the students used an AI tutor to help them solve problems, but that tutor came in two distinct versions. The basic version gave students problems in a consistent order: easy, then medium, then hard.

The enhanced version gave problems depending on the student’s current level of understanding and “productive struggle.” That is: the enhanced AI analyzed the student’s work — the quality of their questions, the correctness of their edits, and so forth — to determine how hard the next question should be. Students who succeeded at one level of difficulty advanced to the next level; those who struggled with a problem got an easier one next. In this way, the enhanced AI tutor kept the challenge in a “desirably difficult” range.
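To make the "advance if you succeed, ease off if you struggle" idea concrete, here is a deliberately simple sketch. To be clear, this is NOT the researchers’ method: their paper describes LLM-guided reinforcement learning, which weighs far richer signals than a single pass/fail. The function names, the five difficulty levels, and the one-step-at-a-time rule are all assumptions I’ve made for illustration.

```python
def next_difficulty(current: int, succeeded: bool, lowest: int = 1, highest: int = 5) -> int:
    """Move one level up after a success and one level down after a struggle.

    The result is clamped so it never leaves the range [lowest, highest].
    """
    step = 1 if succeeded else -1
    return max(lowest, min(highest, current + step))


def run_session(outcomes, start: int = 1):
    """Return the sequence of difficulty levels a student sees.

    `outcomes` holds one True/False per problem attempted:
    True means the student succeeded, False means the student struggled.
    """
    levels = [start]
    for succeeded in outcomes:
        levels.append(next_difficulty(levels[-1], succeeded))
    return levels


# A student who succeeds twice, struggles once, then succeeds again.
print(run_session([True, True, False, True]))
```

Even this toy version shows the contrast with the basic tutor: the order of problems depends on what the student just did, not on a fixed easy-medium-hard script.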

Sure enough, at the end of the course, students who practiced with the enhanced AI tutor scored roughly 0.15 standard deviations higher on the final exam than students who practiced with the basic version. This number is hard to translate into a non-stats framework, but let’s put it this way: the difference wasn’t huge, but students would certainly be happy they scored those few extra points, or disappointed if they didn’t.
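One common way to translate a standard-deviation gain is to ask where a perfectly average student would land after that boost. The sketch below assumes exam scores follow a bell curve (a normal distribution), which real exam scores only approximate; the function name is mine, and nothing here comes from the study’s own analysis.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size_sd: float, starting_percentile: float = 50.0) -> float:
    """Percentile a student reaches after moving up by `effect_size_sd` standard deviations.

    Assumes scores are normally distributed, which is a simplification.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_start = standard_normal.inv_cdf(starting_percentile / 100)
    return 100 * standard_normal.cdf(z_start + effect_size_sd)


# The 0.15 SD gain reported for the enhanced tutor.
print(f"{percentile_after_shift(0.15):.0f}th percentile")
```

Under that assumption, a student who would have scored at the 50th percentile moves to roughly the 56th: a real gain, but a modest one.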

We hyped up that first study by saying “using AI harms learning.” We can hype up this study by saying:

AI-enhanced studying will transform education.

Two Conclusions

On first reading, the Barcaui study and the Chung study seem to point in different directions: the first basically “anti-AI” and the second basically “pro-AI.” On eX-Twitter — where I first learned about these studies — they were held up as the last word on the question of AI in education.

“This Barcaui study shows why we must ban AI from our classrooms!”
“This Chung study shows that AI will be essential for the future of learning!”

I myself draw two different conclusions.

First: both studies reinforce a core conclusion from cognitive science. If we want students to learn, we should do everything we can to cause them to think harder (but not too hard) about that topic. In the Barcaui/class presentation study, students who let AI do the thinking for them remembered less. In the Chung/Python study, AI tutors that ramped up the thinking challenge helped students learn more.

In other words: AI is a tool like many others. It can be used badly — to replace thinking — or well — to prompt thinking.

Second: This pair of studies, and the over-hyped conclusions drawn about them, highlight an ongoing danger in our field. I’m confident that, with some frequency, we will hear that a new study provides the last word on AI: we must be all in or all out. Books and newspapers and blogs will make sweeping claims. Teachers and school leaders will be tempted, or encouraged, or required, to make passionate commitments in one direction or the other.

Rather than take an absolute stance, I think we should look carefully for useful, specific guidance.

“This classroom use of AI requires students to think more deeply (that’s good!), but has the potential to distract them (that’s bad).”
“This AI tool reduces the working memory (WM) load of a complex assignment (that’s good!), but isolates and demotivates students by reducing their connections with each other (that’s bad).”

Over time, we will learn how a specific version of AI affects particular cognitive functions in certain ages, grades, disciplines, and cultural contexts. That growing, ever-shifting body of guidance will help us decide when AI support enhances learning, and when another tool — a book, a pencil, a mini-whiteboard — better serves our students.

A final note: after I wrote the blog post above, I came across an analysis by my friend Dr. Ian Kelleher whose thinking substantially overlaps with mine. If you’d like to see his more fleshed-out thinking, you can find it here.

Barcaui, A. (2025). ChatGPT as a cognitive crutch: Evidence from a randomized controlled trial on knowledge retention. Social Sciences & Humanities Open, 12, 102287.

Chung, A. T. H., Zhang, B., Kung, L. C., Bastani, H., & Bastani, O. (2026). Effective personalized AI tutors via LLM-guided reinforcement learning. Available at SSRN.
