In 2012, researchers in Britain found that Omega 3 fish oil benefited students who struggled in school. In fact, it helped students both concentrate better and learn more.
That was exciting news, because we can provide those dietary supplements relatively easily. It sounded like an easy fix to a real problem.
However, other studies didn’t confirm this result. For that reason, the original lab decided to try a replication study. In other words: they repeated what they had originally done to see if they got the same results.
Omega 3 Fish Oil: The Bad News
Nope, they didn’t help.
You can review the study here. Most impressive — and most discouraging: chart after chart and graph after graph showing no meaningful difference between the students who got Omega 3 supplements and those who didn’t.
(By the way: nobody knew who got the supplements until after the study. It was, as they say, “blind.”)
In the muted language of research, the authors conclude:
In summary, this study did not replicate the original findings of significant, positive effects of omega-3 DHA on either learning or behavior. No systematic adverse effects from the supplementation were observed. As such the study does not provide supporting evidence for the benefits of this safe nutritional intervention.
Alas, this easy solution simply doesn’t pan out.
The Good News
The system worked.
When researchers come across a positive finding, they should both spread the news and double check their work.
That is, they should let us know that omega 3 fish oil might be beneficial, and run the study again to be sure.
Of course, replicating a study is expensive and time consuming; it’s easy to decide that other research priorities are more important.
In this case, however, the researchers did what they ought to have done. As a result, we know more than we did before. And, we’re not wasting time and money stuffing our children with needless dietary supplements.
We should all tip our hats to this research team for doing the right thing. I don’t doubt they’re disappointed, but they’ve shown themselves to be a real model for research probity.
(For another example of researchers sharing conflicting results, see this story from last October.)
__________________
PS: After I finished writing this post, I came across another article about fish. It might not help with working memory, but it just might help prevent MS.
On the one hand, the obvious answer is “yes.” We’ve all heard that meditation reduces stress, improves concentration, deepens sleep, and whitens teeth. (I think I made that last one up.)
Some of you reading this post may have embraced mindfulness, and perhaps tell your neighbors and friends about its healing powers.
This sort of evidence–coming from personal experience–can be powerfully persuasive.
Other Ways of Knowing
On the other hand, if we want to know about mindfulness in a scientific way, we’d like some research. Please.
Research on topics like these typically follows a predictable pattern. In the early days of Mindset theory, for example, Dweck worked with a few dozen people for an hour or so.
When these studies showed promise, she then followed larger groups of people for longer periods of time. In one study, for example, she and Lisa Blackwell followed hundreds of 7th graders for over 4 years.
One recent analysis I saw looked at Mindset data for 125,000 grade-school students. Yup: 125,000.
This trajectory–from small test studies to large and rigorous trials–makes sense. We can’t fund huge investigations of every idea that comes along, so we need to screen for promising ideas before we examine them in depth.
But, once an idea–like, say, mindfulness–shows promise in early trials, we’d like to see larger and more rigorous trials as the years go by.
So: is that happening? Are we seeing better studies into mindfulness?
Investigating Mindfulness
Sadly, not so much. That’s the conclusion of a recent study, which compared early mindfulness research to more recent examples.
We would like to see studies with larger sample sizes, active control conditions, longer-term evaluation of results and so forth. This study finds some positive trends, but overall isn’t impressed with the research progress over the last 13 years.
Of course, their conclusion doesn’t mean that mindfulness doesn’t help.
It does mean, however, that our evidence isn’t as strong as it might seem to be, because we haven’t yet “taken it to the next level.”
___________________
By the way: you’ll have the chance to learn more about mindfulness, and about the ways that researchers investigate it, at the upcoming Learning and the Brain conference in New York.
Blackwell, L. S., Trzesniewski, K. H., & Dweck, C. S. (2007). Implicit theories of intelligence predict achievement across an adolescent transition: A longitudinal study and an intervention. Child Development, 78(1), 246-263.
Goldberg, S. B., Tucker, R. P., Greene, P. A., Simpson, T. L., Kearney, D. J., & Davidson, R. J. (2017). Is mindfulness research methodology improving over time? A systematic review. PLoS ONE, 12(10), e0187298.
The ever-provocative Freddie deBoer explores the relationship between correlation and causation.
You know, of course, that the one does not imply the other.
DeBoer, however, wants to push your certainty on this point.
Are there circumstances under which proving causation would be immoral, and therefore correlation is a useful placeholder? (Do you really want to run the double-blind study about smoking cigarettes?)
Are there circumstances under which the causal chain is wildly complicated, and so correlation is an important clue?
In other words: while correlation doesn’t prove causation, common sense tells us that it’s an interesting starting point. And: we often face circumstances where causal proof is hard to come by, and so correlation gets our attention as a useful indicator of potential causation.
As long as we’re careful about these subtleties, we can allow ourselves to notice correlation, and speculate (humbly) about causation.
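If you like seeing that point played out in numbers, here’s a toy simulation (my own example, not deBoer’s): a hidden common cause produces a strong correlation between two variables that have no causal link to each other.

```python
# A toy example: a hidden common cause (sunshine) drives both ice cream sales
# and sunburns, so the two are strongly correlated even though neither causes
# the other. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

sunshine = rng.normal(size=n)                           # the hidden common cause
ice_cream_sales = 2.0 * sunshine + rng.normal(size=n)   # driven by sunshine
sunburns = 1.5 * sunshine + rng.normal(size=n)          # also driven by sunshine

r = np.corrcoef(ice_cream_sales, sunburns)[0, 1]
print(f"correlation between ice cream sales and sunburns: {r:.2f}")  # roughly 0.75
```

Here the correlation is real and informative (it points back to sunshine), but banning ice cream would do nothing about sunburns.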
Here’s how deBoer concludes his article:
What we need, I think, is to contribute to a communal understanding of research methods and statistics, including healthy skepticism. […] Reasonable skepticism, not unthinking rejection; a critical utilization, not a thoughtless embrace.
That’s a hard logical place to find; here’s hoping we can find it together.
____________________
Update: I wrote the piece above on 11/8. Today (11/16), Greg Ashman posted a thoughtful piece making very similar arguments. I wonder what coincidence implies about causation…
How do psychologists know what they know about human mental processes?
Quite often, they run studies to see how people behave: what do they remember? where do they look? what do they choose? how do they describe their thoughts?
If they run those studies just right, psychologists can test a very small number of people, and reach conclusions about a very large number of people.
Perhaps they can reach conclusions about all 7,400,000,000 of us.
Unless…
What if that small group of people being studied isn’t even remotely a representative sample of the world’s population? What if almost all of them are psychology majors at American colleges and universities?
What if they are–almost exclusively–from countries that are Western, Educated, Industrialized, Rich, and Democratic?
(Notice that, cleverly, those adjectives acronym up to the word WEIRD.)
Here’s an example of the problem. Last year, I spoke about Mindset at the African Leadership Academy in South Africa: a school that draws students from all across the African continent.
And yet, I know of no research at all that studies Mindset in an African cultural context. I could share with them research from the US, and from Hong Kong, and from France, and from Taiwan. But Africa? Nothing.
How valid are Mindset conclusions for their students? We don’t really know–at least, “know” in the way that psychologists want to know things–until we do research in Africa.
(By the way: if you know of some Mindset research done in Africa, please send it my way…)
Beyond Psychology
This article over at The Atlantic does a good job of describing this problem in neuroscience.
Because the sample of the population included in neuroscience studies is so skewed, the conclusions we reach about…say…typical brain development schedules are simply wrong.
Better said: those conclusions are correct about the subset of the population being studied, but not necessarily correct for everyone else.
And, of course, most people are “everyone else.”
What Does This Problem Mean for Teachers?
Here’s my advice to teachers:
When a researcher gives you advice, find out about the participants included in their study. If those participants resemble your students, that’s good. But if not, you needn’t be too quick to adopt this researcher’s advice.
For example: if a study of college students shows that a particular kind of challenging feedback promotes a growth mindset, that information is very helpful for people who teach college.
But, if you teach 3rd grade, you might need to translate that challenging feedback to fit your students’ development. In fact, you might need to set it aside altogether.
Because participants in these studies are often so WEIRD, we should beware extrapolating results to the rest of the world’s students, including our own.
Does even a short bout of exercise immediately after learning help form long-term memories?
A recent article, published by Cognitive Research: Principles and Implications, suggests intriguing—even surprising—answers to this question.
From a different perspective, this article also offers useful insights into the way that psychologists think and work.
Specifically, it helps answer a second question: what should researchers do when their data are inconsistent?
The Study
Steven Most and colleagues wondered if even 5 minutes of exercise immediately after learning would increase the exerciser’s memory of that information.
To test this question, Most had students study pairs of names and faces, and then do five minutes of exercise. (They stepped on and off a low platform.) He then tested their memory of those name/face pairs the following day, and compared their performance to two control groups.
Compared to one control group, which did not exercise, these steppers remembered more name/face pairs.
Similarly, compared to another control group, which exercised before learning the name/face pairs, these post-learning steppers remembered more pairs.
But here’s the surprise. On average, the exercising men in the study remembered slightly fewer pairs than the non-exercising men. But the exercising women remembered more than twice as many pairs as their non-exercising female peers.
This article opened with a question: does a short bout of exercise immediately after learning help form long-term memories?
The answer: it certainly seems to, but only for women.
Psychologists at Work
Although a lot of work goes into this kind of study, psychologists are rarely satisfied to examine a question just once. When they get these results—especially such interesting results—they’re inclined to repeat their study with slight variations.
They are, in effect, trying to prove themselves wrong. Or, at least, trying to discover the limits outside of which their findings aren’t true.
So, Most et al. repeated their study. This time, instead of testing the students the following day, they tested them later the same day.
The results? They arrived at the same major findings. Although the women’s post-exercise boost wasn’t as dramatic (they remembered almost twice as many name/face pairs, rather than more than twice as many), post-study exercisers still remembered more pairs than either pre-study exercisers or non-exercisers.
Brace Yourself
Up to this point, Most’s team had gotten the same dramatic answer twice. What does a good psychologist do?
Most repeated the study again—this time using name/shape pairs instead of name/face pairs.
The results? Nada.
This time, none of the groups showed significant differences at all. No differences between the pre- and post-study exercisers. No differences between the exercisers and non-exercisers. No meaningful gender differences. Bupkis.
So, you know what happens next: they performed their research paradigm a 4th time. This version was practically identical to the first; they simply made a slight change to the non-exercise task. (Crucially, Most’s team went back to name/face pairs.)
The results?
Drum roll please…
Basically, a nothingburger.
As was true in study #3 — but contrary to studies #1 and #2 — study #4 showed no statistically significant differences. As the authors write:
“Examining the data only from the women, those in the exercise group exhibited somewhat better memory than those in the non-exercise group, but this [difference] fell short of significance.”
In the world of psychology, if a result falls short of statistical significance, you can’t make strong claims about your findings.
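If you’re curious what “fell short of significance” looks like in practice, here’s a rough sketch with invented numbers (not Most’s data) of the kind of two-group comparison psychologists run:

```python
# A sketch with invented recall scores (NOT data from Most et al.):
# comparing an exercise group to a non-exercise group with a t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
exercise = rng.normal(loc=11.0, scale=4.0, size=30)      # hypothetical pairs recalled
no_exercise = rng.normal(loc=10.0, scale=4.0, size=30)

t, p = stats.ttest_ind(exercise, no_exercise)
print(f"t = {t:.2f}, p = {p:.3f}")

# By convention, p < .05 counts as "statistically significant."
# A difference in the predicted direction with, say, p = .12 "falls short
# of significance": the data hint at an effect, but they don't license a strong claim.
```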
Psychologists at Work, Part II
Imagine that you’re a professional psychologist. You’ve spent months—probably years—running these studies. Some of your results—studies #1 and #2—are strong and compelling. Others—#3 and #4—don’t get you very far.
What do you do with this muddle?
As we asked at the top of this article: what should researchers do when their data are inconsistent?
The answer is: You publish it. You publish it all.
You say: look, we ran our studies and came up with a confusing and interesting collection of results. Here you go, world, see what you make of them.
You do not hide it. You do not, for example, publish studies #1 and #2 and pretend that #3 and #4 didn’t happen. You publish it all.
In fact, Most and colleagues went further. They created a handy graph (on page 11) making this inconsistency extremely clear. It’s easy to see that, for men, the short bout of exercise didn’t make much of a difference in any of the studies. For women, on the other hand, the exercise made a big difference in the first study, a modest difference in the second, and practically none in the 3rd and 4th.
Fig. 4 Means and 95% confidence intervals for each experiment indicating how many more paired associations were correctly recalled among female and male participants when the post-learning activity was exercise, relative to the non-exercise post-learning activity. For experiment 3, error bars reflect a repeated measures design, whereas those for the other experiments reflect independent measures designs. A meta-analysis across these experiments indicated that, among the female participants and with 95% confidence, 5 minutes of post-learning exercise increased memory for paired association by 0.40 to 4.63 items. Image from Most, S. B., Kennedy, B. L., & Petras, E. A. (2017). Evidence for improved memory from 5 minutes of immediate, post-encoding exercise among women. Cognitive Research: Principles and Implications, 2(1), 33.
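The caption mentions a meta-analysis across the four experiments. The authors don’t publish their code, but the basic logic of an inverse-variance (“fixed-effect”) meta-analysis can be sketched in a few lines. The per-experiment numbers below are placeholders, not the paper’s values.

```python
# Sketch of an inverse-variance (fixed-effect) meta-analysis.
# The per-experiment means and standard errors are placeholders,
# NOT the values reported by Most, Kennedy, & Petras (2017).
import numpy as np

means = np.array([4.0, 2.5, 0.5, 1.0])   # exercise-minus-control difference, one per experiment
ses = np.array([1.5, 1.2, 1.4, 1.3])     # standard error of each difference

weights = 1.0 / ses**2                   # more precise estimates get more weight
pooled_mean = np.sum(weights * means) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

low, high = pooled_mean - 1.96 * pooled_se, pooled_mean + 1.96 * pooled_se
print(f"pooled effect: {pooled_mean:.2f} items, 95% CI [{low:.2f}, {high:.2f}]")
```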
Hats Off
Before I started attending Learning and the Brain conferences, I had been an English and Theater teacher for years. My undergraduate degree is in Medieval History and Literature; I have an MA (and half of a PhD) in English. I am, in other words, historically a Humanities kind of guy.
But I have to say, this article exemplifies some of the many reasons that I have grown to admire a scientist’s approach to teaching and learning.
Most and his colleagues, Briana Kennedy and Edgar Petras, not only tried to prove themselves wrong, they went out of their way to show the results when they partially succeeded in doing so.
Yes, there’s a lot of talk about a “replication crisis” in psychology. Yes, nobody knows what a p-value really means, and why .05 is the chosen threshold.
But at the end of the day, researchers like Most, Kennedy, and Petras are doing hard, fascinating, and helpful work—and they’re being remarkably straightforward with others about the complexity of their findings.
We should all admire this article. And me: I’m going to work out…
As most parents, teachers, and education policy folks know well, early childhood education is expensive. Whether federally-funded, state-funded, or family-funded, preschool and structured early care generally operate on a pretty tight budget. They also generally operate on pretty high hopes: academic achievement, personal growth, reduced delinquency, and much more.
And they should! As Ralph Waldo Emerson wrote, “there is no knowledge that is not power.” We certainly need to maintain high expectations for youth to get the most out of their academic careers. As well, we should expect the programs that we invest in to set children up for the success that they promise.
Show us the Results
So what happens when we don’t see those hopes reflected in program outcome data, particularly at the level of state- and federally-funded programs?
Do we launch an investigation into what went wrong?
Do we take the money away?
Do we blame the teachers, or parents, or school districts?
The “what now?” of underwhelming achievement is a challenging road to venture down. For some context, check out my colleague Austin’s recent blog post regarding a newly published study looking at the infamous fadeout effects in Head Start preschools.
Unfortunately, questions of whom to blame have dominated much of the “what now?” conversation over the years. Yet some studies, like the one Austin discussed, are trending in a new, positive direction for developmental and educational research alike.
Let’s Re-think ‘Results’
This new genre of studies does two things. First, it looks at such factors as fidelity to a particular program’s plan. Let’s take Head Start as an example. Researchers will ask: how well and how often are Head Start’s specialized strategies actually being implemented in classrooms?
Second, and most important, these studies don’t stop there. Instead, they go on to broaden the idea of an outcome to include measures of mental health and social growth, and the image of a learning environment to include the home and child care centers.
Broadening what we think achievement is, and where we think learning happens, is an important movement. Of course, many developmental psychologists have been advocating for this broadening for years. Urie Bronfenbrenner, for example, began studying ways in which intra- and inter-personal factors affect learning back in the 1970s. But the merging of research questions that focus on individual context with research questions that focus on school program evaluation is an exciting new empirical endeavor.
Differential Susceptibility
It’s an endeavor we stand to gain a lot from. One way that these new context+program evaluation research questions are making an impact is in studies of early achievement and differential susceptibility (DS).
DS is a theoretical model that aims to explain why the same experiences affect different people differently. In developmental research, DS refers to children who are more behaviorally or biologically reactive to stimuli and, as a result, more affected by both positive and negative environments. [1]
Study 1
Let’s look at a longitudinal study conducted by researchers at Birkbeck University of London. [2] They investigated the effects of early rearing contexts on children of different temperaments. The following data were collected from 1,364 families:
Predictive measures
parents reported the temperament of their child at 6 months (general mood, how often they engage in play behavior, how well they transition to a babysitter, etc.)
parenting quality (i.e., maternal sensitivity) was assessed at 6 and 54 months during laboratory and home observations
quality of child care (e.g., daycare) was assessed at 6, 15, 24, 36, and 54 months via observation
Outcome measures
academic achievement, behavior problems, teacher-child conflict, academic work habits, and socio-emotional functioning were assessed regularly between 54 months and 6th grade
Results showed that children who had a difficult temperament in infancy were more likely than children who didn’t to benefit from good parenting and high-quality childcare. They also suffered more from negative parenting and low-quality child care.
Most pronounced was the finding of differential effects for child care quality. Here, high-quality care led to fewer behavior problems, less teacher-child conflict, and better reading skills, while low-quality care led to the opposite — but only for those children who had a difficult temperament.
The takeaway: children that had a difficult temperament in infancy were differentially susceptible to quality of parenting and child care. For them, the good was extra good, and the bad was extra bad.
Study 2
Researchers at Stanford University engaged high- and low-income kindergartners in activities designed to elicit physiological reactivity (measured by the amount of the stress hormone cortisol in their saliva). [3] In other words, the children completed activities that were difficult and kind of frustrating. They also completed a battery of executive function assessments.
It turns out that children who displayed higher reactivity (more cortisol) during the activities were more susceptible to their family’s income. That is, family income was significantly associated with children’s executive function (EF) skills — but only for those children with a high cortisol response. Highly reactive children had higher EF skills if their family had a higher income, but lower EF skills if their family income was lower.
The takeaway: children that were highly reactive when faced with challenging activities were differentially susceptible to their family’s resources. Their EF was particularly strong if their family had high income, yet particularly weak if their family had a lower income.
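Statistically, differential susceptibility typically shows up as an interaction: the effect of the environment (family income, child care quality) depends on the child’s reactivity. Here’s a minimal sketch of how one might test for such an interaction, using simulated data and the statsmodels formula interface; it illustrates the idea, not the Stanford team’s actual analysis.

```python
# A sketch with simulated data: does the income -> executive function (EF)
# relationship depend on cortisol reactivity? The data and coefficients below
# are invented for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 300
income = rng.normal(size=n)      # standardized family income
reactivity = rng.normal(size=n)  # standardized cortisol reactivity

# Simulate a differential-susceptibility pattern: income matters more
# for highly reactive children (the income * reactivity term).
ef = 0.1 * income + 0.4 * income * reactivity + rng.normal(size=n)

data = pd.DataFrame({"ef": ef, "income": income, "reactivity": reactivity})
model = smf.ols("ef ~ income * reactivity", data=data).fit()
print(model.summary())  # the income:reactivity row is the interaction of interest
```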
Evaluating Program Evaluation
How is being mindful of phenomena like differential susceptibility helpful when we receive the news that children made no special long-term gains after being enrolled in a publicly-funded program?
First, we should recognize that we may have set ourselves up for some disappointment at the outset if we assumed that all children would be equally susceptible to the positive effects of home or school interventions.
Of course, at school entry, we don’t necessarily know which students are arriving with difficult temperaments. Or whether their child care environment has exacerbated or buffered it. Which means that we’re also not going to be able (practically or ethically) to separate students by level of disadvantage in order to decide which program they should be enrolled in. So let’s just accept that we’re going to see some variation in individual outcomes.
Let’s also remind ourselves that variation is not necessarily reflective of an ineffective program. At Head Start, for example, it is probably safe to speculate that most families are juggling some amount of stress, financial instability, and social tension. And according to the DS model, students who are predisposed to be highly reactive will be hit hardest by these things. As a result, reactivity is probably going to interfere with their reaching what we define as success. But DS also tells us that they have the most to gain from a nurturing, consistent environment.
So let’s not take the money away. Let’s hold off on passing the blame around. And let’s not refer to these data as something going “wrong.” Let’s instead look at the students who continue to struggle and ask what contextual factors are at play: for example, a child’s weak self-regulation skills, or a parent who can’t address those skills in the way the teacher wants because they work two jobs.
I’m no gambler, but if we can figure those things out, and commit to doing something about them, then I say we double-down when it comes to funding.
References
Ellis, B. J., Boyce, W. T., Belsky, J., Bakermans-Kranenburg, M. J., & van IJzendoorn, M. H. (2011). Differential susceptibility to the environment: An evolutionary–neurodevelopmental theory. Development and Psychopathology, 23, 7–28. doi:10.1017/S0954579410000611 [link]
Pluess, M., & Belsky, J. (2010). Differential susceptibility to parenting and quality child care. Developmental Psychology, 46, 379-390. [link]
Obradovic, J., Portilla, X. A., & Ballard, P. J. (2015). Biological sensitivity to family income: Differential effects on early executive functioning. Child Development, 87(2), 374-384. doi: 10.1111/cdev.12475 [link]
Debates about the meaning and value of IQ have long raged; doubtless, they will continue to do so.
This article, by a scholar steeped in the field, argues that — even for those who see real benefit in focusing on IQ — it is essential to distinguish between fluid intelligence (the ability to solve new problems) and crystallized intelligence (knowledge already stored in long-term memory).
If you’ve read Todd Rose’s book The End of Average, you will remember that “talent is always jagged.” That is: two people who have the same IQ might nonetheless be very different thinkers — in part because their score might result from dramatically different combinations of fluid and crystallized subscores.
In short: even advocates for IQ see potential perils in misusing this well-known metric.
If you’re a Learning and the Brain devotee, you may have heard about p-values; you may even have heard about the “p-value crisis” in the social sciences — especially psychology.
This white paper by Fredrik deBoer explains the problem, offers some useful context, and gives you several strategies to see past the muddle.
Although deBoer’s considering very technical questions here, he writes with clarity and even a bit of humor. If you like digging into stats and research methodology, this short paper is well worth your time.
(As you may know, deBoer writes frequently — and controversially — about politics. I’m neither endorsing nor criticizing those views; I just think this paper makes an abstruse topic unusually clear.)
There are a few key steps to effectively incorporating MBE (Mind, Brain & Education) ideas and concepts into one’s daily teaching routine. The first key is the low-hanging fruit, namely, educating oneself on the research about learning and the brain and what the research suggests are effective pedagogies. If you are reading this blog, you are probably already familiar with one fantastic resource for such information (shameless plug warning!) – www.learningandthebrain.com
There are certainly plenty of resources out there and I strongly encourage you to seek them out. This first step is critical and has become easier in the last few years as more and more of the actual research is available online and more and more has been written for teachers as the target audience.
The second key step involves actually trying something new in your classroom, whether it is using more retrieval practice exercises [1], incorporating movement [2], or perhaps shifting to a more student-centered model for class discussions [3].
Quantum Leaps
But wait, your work is not done! Trying something new based on the conclusions of a research paper you read is certainly a big step, but how will you know that the change you made was effective? What is your evidence that the change you incorporated actually improved student learning? THIS is the difficult part.
Analyzing the impact or effect of a new pedagogy is quite complex and requires the collection and analysis of data. While you may not be able to perform a double-blind controlled experiment–the gold standard in scientific research–you CAN analyze the impact of your intervention and use data to inform your teaching practice going forward.
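One low-tech possibility, offered as a sketch rather than a recipe: compare scores from a unit taught with your new strategy against a comparable unit taught the old way, and compute an effect size. The quiz scores below are invented, and a single comparison like this is only suggestive, not proof.

```python
# A rough sketch with invented quiz scores: comparing a unit taught with a new
# strategy (e.g., more retrieval practice) against a comparable unit taught the
# old way. This is an informal check, not a controlled experiment.
import numpy as np

old_way = np.array([72, 68, 75, 80, 64, 70, 77, 73, 69, 71], dtype=float)
new_way = np.array([78, 74, 81, 85, 70, 76, 83, 79, 72, 77], dtype=float)

mean_diff = new_way.mean() - old_way.mean()

# Cohen's d: the difference in means divided by the pooled standard deviation.
pooled_sd = np.sqrt((old_way.var(ddof=1) + new_way.var(ddof=1)) / 2)
d = mean_diff / pooled_sd

print(f"average gain: {mean_diff:.1f} points, Cohen's d = {d:.2f}")
```

Even a crude number like this gives you something concrete to discuss with a colleague, and a baseline to compare against the next time you tweak the lesson.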
So how do you collect data that can help you improve your teaching practice?
I have found that one of the most useful tools for collecting data is one of the easiest to set up and use, but is frequently one of the least likely to be used by teachers – videotaping your class.
Watching a videotape of your class and objectively analyzing the tape for evidence of improved learning can be extremely enlightening, illuminating and humbling.
Did I really only give Hermione 2 seconds of wait time before I moved on to Draco?
Were the students really trying to take notes, listen to me deliver content and participate in the conversation simultaneously?
Did I really shake my head in disapproval as Luke responded to my question with an answer that was way off base?
I have yet to find a teacher who enjoys watching himself/herself on video but have found that most teachers who actually go through with it find the experience to be incredibly informative. Watching your video with a trusted colleague or Critical Friends group can be even more thought provoking and lead to fruitful conversations about teaching and learning.
Data 2.0
I have been playing around with an exciting new tool for data collection lately that has the potential to make the time-consuming analysis of videotape seem like a thing of the past. The app does a deep dive into an audio recording from class and provides me with nearly immediate data to analyze.
Here’s how it works: at the beginning of class I start an audio recording of the class on my phone and hit stop when the class is over. In the current iteration of the app, I upload the audio file to be analyzed and, within an hour or so, I receive a report back on the class. Right now, the report includes data in 5-minute increments on the items below (a rough sketch of how such metrics might be computed appears after the list):
My talking speed (words per minute)
How many questions I asked
The types of questions I asked – how? vs. why? vs. what?
The percentage of time I was talking vs. the time the students were talking
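I don’t know how the app computes its numbers under the hood, but the metrics themselves are simple enough to approximate. Here’s a rough sketch of the same calculations run on a short, hypothetical speaker-labeled transcript (the app’s actual speech-to-text and speaker-identification steps are not shown):

```python
# A rough sketch of these classroom-talk metrics, computed from a hypothetical,
# speaker-labeled transcript. Only the arithmetic is illustrated here; the
# app's real pipeline (transcription, speaker identification) is not.
import re

transcript = [
    # (speaker, seconds spoken, text)
    ("teacher", 8, "Today we start with a question. Why does the moon have phases?"),
    ("student", 6, "Because of where it is compared to the sun and the earth."),
    ("teacher", 7, "How could we test that idea? What would we expect to see?"),
    ("student", 9, "We could track the moon every night and compare it to a model."),
]

teacher_words = sum(len(text.split()) for speaker, _, text in transcript if speaker == "teacher")
teacher_secs = sum(secs for speaker, secs, _ in transcript if speaker == "teacher")
total_secs = sum(secs for _, secs, _ in transcript)

# Pull out every teacher sentence that ends with a question mark.
teacher_questions = [
    q.strip()
    for speaker, _, text in transcript if speaker == "teacher"
    for q in re.findall(r"[^.?!]*\?", text)
]
question_types = {
    w: sum(q.lower().startswith(w) for q in teacher_questions)
    for w in ("how", "why", "what")
}

print(f"teacher talking speed: {teacher_words / (teacher_secs / 60):.0f} words per minute")
print(f"teacher questions asked: {len(teacher_questions)}")
print(f"question types: {question_types}")
print(f"teacher talk share: {100 * teacher_secs / total_secs:.0f}% of class time")
```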
Questions that I have been able to think more critically about with this data include:
Was my student-centered class discussion really as student-centered as I thought?
Am I asking questions that require surface level knowledge (“what”) or ones that will lead to more critical thinking on the part of my students (“why,” “how,” “if”)?
Am I speaking too fast when giving instructions as I set up an activity?
The app is still in its development phase and there are bugs to be worked out before it will be available to a wider audience; but if you are interested in participating in the pilot, you can sign up here. Of all the data collection tools out there, I think that this app has the potential to be an incredibly valuable tool for teachers as they attempt to evaluate the impact of changes in their practice.
For all of its potential uses, I do realize that there are potential dangers with the collection of this type of data. Who initiates the collection of the data? What if an evaluator or administrator wants to use the data? What are the privacy concerns about collecting this type of data? Who has access to the files and data?
All of these questions are important ones that need to be fleshed out to be certain; however, I believe that if properly used, this app has the potential to be a powerful tool for teachers who want to use data to inform their teaching as they incorporate new strategies and pedagogies.
Donna Wilson, Move your body, grow your brain, March 12, 2014 [link]
Goldschmidt, M., Scharfenberg, F. J., & Bogner, F. X. (2016). Instructional efficiency of different discussion approaches in an outreach laboratory: Teacher-guided versus student-centered. The Journal of Educational Research, 109(1), 27-36. [link]
The controversy over famous patient Henry Molaison — a.k.a. H.M. — is #7 on the Guardian’s list of top science news stories of 2016.
In brief: Luke Dittrich has accused memory researcher Suzanne Corkin of several ethical breaches — including shredding research records — in her work with H.M. (Corkin’s peers have strongly defended her work.)
This story is rich in human interest: Dittrich, after all, is the grandson of the surgeon who — in an attempt to cure H.M.’s epilepsy — removed H.M.’s hippocampi.
And yet, given that Corkin’s work and H.M.’s story are foundational for many accounts of human memory, this controversy goes beyond family scandal to deep scientific import.