“Video Captions Benefit Everyone”: An Investigation

If you’ve watched educational videos lately, you’ve probably noticed that captions are EVERYWHERE. In fact, caption technology has evolved in recent months so that captions now highlight individual words as they are spoken, rather than presenting entire phrases or sentences one after another. But what if this well-intentioned practice is actually interfering with learning?

Given their prevalence, I’ve just assumed that we have good reasons to include captions. At the same time, people do LOTS of things that contradict evidence — so perhaps the time has come to investigate my assumption.

I started by casting a wide net. I went to Google Scholar and put in “captions and subtitles.” The first hit sounded a confident tone: “Video Captions Benefit Everyone.”

To be sure we understand the confidence of this study, let’s read the first two sentences:

Video captions, also known as same-language subtitles, benefit everyone who watches videos (children, adolescents, college students, and adults).

More than 100 empirical studies document that captioning a video improves comprehension of, attention to, and memory for the video.

If in fact captions promote understanding, focus, and recall for practically everyone who watches, we’ve got as close to a slam dunk as I can imagine.

Let’s check to be sure…

Focus on Definitions

As is so often the case, we should start by defining our terms, questions, and expectations clearly.

Because this blog focuses on education, I’m interested in captioned videos used to help K-12 students learn stuff. That is: research into captions when people watch movies for fun doesn’t fit my question.

For the time being, I’m also going to focus on same-language captions: a video where the narrator speaks in English and the captions show the narrated words in English. Of course, a student who speaks Spanish at home might benefit from seeing Spanish subtitles for a video in English, but that’s a very different research question.

For similar reasons, I’ll start by focusing on research into neurotypical students. I can imagine that students with particular diagnoses might — as part of their learning profile — have different requirements than their peers. It’s probably helpful to start by understanding how most people learn, and then adapt that practice as needed for individuals.

(To be clear: we will ultimately be interested in different-language captions, and in the potential benefits for different categories of learners. To start with, I simply want the most basic question answered.)

I’d like to find several rigorously designed studies all pointing in the same direction. I want sample sizes that rise above the trivial; I’d like plausible control groups; I’d like objective measures, not mere self-report. And so forth. After all, I shouldn’t tell you that captions are (or aren’t) a research-informed instructional practice if the research I’m citing doesn’t meet basic standards.

Finally, if I’m really lucky, I’d like to have both research and theoretical frameworks pointing in the same direction.

Now that we’ve got some parameters in place, let’s return to that study and see what we find.

Working the Steps

I spend lots of my time double-checking (or triple-checking) “research-based” claims, so I’ve got a process to follow.

I won’t walk you through each step of the journey — it took a few hours — but the results are impressively clear.

First:

We have essentially no research that fits the criteria above.

No, really: we don’t have a pool of persuasive research giving us an answer one way or another.

That “Captions Benefit Everyone” study focuses on foreign-language captions, or on non-neurotypical learners, or on college students, or on self-report. LOTS of self-report.

I should explain, by the way, why self-report data don’t persuade most researchers: people are REALLY bad at knowing what helps them learn. College students might THINK they pay more attention, or remember better, when they see vids with captions. But unless we actually measure their attention, understanding, or learning, we shouldn’t make claims about attention, understanding, or learning.

When I asked Elicit.com to research this question, I found the same problem. The studies it summarized focused almost entirely on Chinese students watching videos with English captions. That research helps answer an important question — but it’s not the question I asked.

Second:

The study that comes the closest to answering my question suggests that captions might interfere with reading for not-at-risk students.

This study suggests that captions DO help at-risk 2nd and 3rd graders recognize words. But the not-at-risk students recognized fewer words with the captions on. Obviously we’re glad to have strategies to help at-risk students. But that’s not the big-picture question we started with.

(I’m honestly puzzled that captions benefit struggling readers…but because I don’t teach reading I’m not going to have a strong opinion here.)

Let’s Talk Theory

I noted above that I’d like to have both well-done empirical research AND a theoretical framework to answer my question.

Richard Mayer’s “redundancy principle” tells us that presenting the same information both verbally and visually at the same time increases cognitive load.

In his excellent book Sweller’s Cognitive Load Theory in Action, Oliver Lovell gives a common example: conference presentations.

It’s a common practice for presenters to provide written information on their slides and then to read out that information during the presentation. To conventional audiences, this represents the presentation of redundant information. Only one presentation format is needed, either the written words, or the spoken words. (62)

That example sounds A LOT like captions, no?

A full explanation of the redundancy principle requires a blog post of its own. The short version goes like this:

Because I read faster than others speak, I’m constantly reading ahead of the speaker’s current point in the text. I must therefore stop and go back multiple times. All this back-n-forth adds to my cognitive muddle.

With captions, I have to focus either on the WORDS that the captions present or the IMAGES in the video, and that back-n-forth also adds to the cognitive work I have to do.

For the dual-coding folks reading this post, remember: dual coding advocates that words and images complement one another — not that they represent precisely the same information.

Putting It All Together

I found NO research with objective measures of neurotypical K-12 learners reading same-language captions. The one study that comes closest hints (but does not say) that captions might interfere with word recognition for early readers.

Mayer’s redundancy principle gives us a good reason to be VERY skeptical of claims saying that “captions benefit everyone.”

If you find research that matches the criteria above, please send it my way. I always want to keep this blog as up-to-date and accurate as possible.

In the meanwhile, here are my suggestions:

a) Be wary of claims that captions benefit most learners — especially neurotypical K-12 learners reading same-language captions.

b) Be ESPECIALLY cautious if the video includes cognitively complicated material — where cognitive load is already high.

c) Be aware of legal requirements, especially for students with diagnosed learning differences. Also, I myself would be more open to the benefits of captions for students watching videos in languages they don’t speak fluently. I haven’t done a deep dive into that research pool, but common sense suggests such captions could have real benefits.

Gernsbacher, M. A. (2015). Video captions benefit everyone. Policy Insights from the Behavioral and Brain Sciences, 2(1), 195–202. https://doi.org/10.1177/2372732215602130

Linebarger, D., Piotrowski, J. T., & Greenwood, C. R. (2010). On‐screen print: The role of captions as a supplemental literacy tool. Journal of Research in Reading, 33(2), 148–167.

Lovell, O., & Sherrington, T. (2020). Sweller’s cognitive load theory in action. John Catt.
