GUEST POST: Who Really Benefits from Retrieval Practice

by Jen Coane, PhD and Meredith Minear, PhD

Jen Coane, PhD, is an associate professor of psychology at Colby College, where she teaches courses in cognitive psychology and memory and directs the Memory and Language Lab. Her research examines how knowledge is acquired, how prior knowledge affects other cognitive tasks, and how individuals evaluate their own knowledge. Jen’s research is supported by Understanding Human Cognition Scholar Award # from the James S. McDonnell Foundation.

Meredith Minear, PhD, is an assistant professor of psychology at the University of Wyoming, where she teaches courses in cognitive neuroscience and sensation and perception. She directs the Spatial Cognition Lab. Her research is centered around spatial processing and navigation as well as the role of individual differences in working memory and fluid intelligence on cognitive performance. Her lab is supported by the National Science Foundation, Award #1660996.


Recently, the fields of cognitive psychology and education have been awash in evidence that retrieval practice – the process of trying to answer questions or taking practice tests while studying – improves performance in the lab and in the classroom. In a nutshell, copious amounts of research suggest that children (1), middle school students (2), older adults (3), and individuals with various forms of brain damage (4) all perform better on a final test after taking intermediate tests instead of re-reading or restudying material. This is the Testing Effect.

Image from Pixabay

So, all is good. Let’s all take tests and move on. Right? Well, maybe there are some of those pesky things researchers like to call boundary conditions. Fancy term for: It depends. Sometimes an intervention works and sometimes it doesn’t. Turns out, boundary conditions are really important to know about because they help us (nerdy researchers) understand when, why, and how something works. And understanding why and how is pretty much what we do for a living.

So, the testing effect. Does it really work all the time, for all people, on all materials? A few studies have looked into this very question. For example, working memory (which involves actively holding information in mind to process it, and is important for language comprehension and problem-solving) seems to matter: students with lower working memory benefited more from testing than students with higher working memory (5). Another study examined how overall memory ability (measured by a standard test of memory) and fluid intelligence (the ability to engage in critical thinking and solve new problems) affected the testing effect (6). Its authors also found that students who scored lower on both measures benefited more from testing than students with higher scores. So, it looks like retrieval practice might be a great equalizer – those who need it most really benefit a lot. And overall, everyone seems to benefit, so all is good.

Well, it turns out things are not quite so simple. A lot of published studies share one very simple limitation – they do not generally examine or even report whether every participant shows a testing effect. So we don’t really know for sure whether all the participants are benefiting from testing. Another thing we don’t really know is whether retrieval practice is equally effective for easy-to-learn and hard-to-learn material. One reason for this is that it’s quite hard to define easy and hard – what is easy for some people might be hard for others, maybe because they don’t have as much prior knowledge. Or it may be because they have lower levels of some basic cognitive skill such as intelligence or working memory. So, we have these questions:

  1. Does everyone really benefit from testing?

  2. If not, are there cognitive skills and abilities that predict who benefits from testing and who doesn’t?

  3. Does difficulty of the to-be-learned material matter?

To answer these questions, we designed a research study. We recruited over 300 participants, who completed a number of tasks, including measures of working memory, fluid intelligence, and vocabulary (a measure of crystallized intelligence – that is, stable knowledge). To examine the testing effect, participants studied 48 Swahili-English word pairs. They then restudied 24 of the pairs four times and practiced retrieving the other 24 four times, with feedback after each retrieval attempt in which they were shown the correct answer. The Swahili-English pairs were selected to be easier or harder to learn, based on prior studies (7). For example, yai-egg was an easy-to-learn pair and kasuku-parrot was one of the hard ones. Two days later, participants came back to the lab and were tested on all pairs. At the end of the experiment, participants answered a question about how they studied the pairs.

The main results can be summarized pretty simply:

Coane and Minear Fig 1.jpg
  1. Does everyone benefit from testing? Not really. About one-third of our participants did not show a testing effect. We call them “negative testers” in contrast to the “positive testers” who remembered more tested than restudied pairs. In fact, our negative testers performed worse after testing than after restudying. Was this because they performed worse overall, suggesting maybe they didn’t even try? No – in fact, as can be seen in Figure 1, negative testers did better overall than those showing a testing effect, especially on restudied items. So, it looks like for some participants – over 100 in our sample – testing actually impairs their learning.

  2. Do we know why some people benefit from testing and some do not? This is an open question – the two groups of participants did not differ in working memory, fluid intelligence, or crystallized intelligence. The only explanation we have from our data is that negative testers were more likely to self-report using deep processing strategies, such as making connections between the Swahili word and the English word, instead of just memorizing or repeating the pair. Deep processing means processing information for meaning, making connections to previously known material, or using strategies such as generating sentences to connect the words. So maybe they were using good strategies already and testing disrupted this.

  3. What about difficulty of to-be-learned material? Interestingly, the effects of pair difficulty seemed to depend on participant abilities. Among the participants who did show a testing effect, those who scored lower on our test of fluid intelligence had a larger testing effect for easy pairs than for difficult pairs, whereas the opposite was true for those who scored higher on fluid intelligence. This suggests that there is a “sweet spot” where testing is most beneficial – when material is too easy or too difficult, the effect of testing is not as large as when it’s “just right.” Importantly, what is too easy or too difficult seems to depend on the abilities of the person doing the learning.

Image from Pixabay

As is clear in the figure below, participants with high scores on the fluid intelligence measure did as well on the difficult pairs as those with low fluid intelligence did on the easy pairs. Why might fluid intelligence affect performance? One possibility is that fluid intelligence allows participants to switch to more effective strategies for more difficult pairs. Another possibility is that crystallized intelligence – vocabulary, in our study – also contributed. In fact, participants with high fluid intelligence also scored higher on the vocabulary test. Knowing more words can then help acquire new words, especially for the more difficult pairs. So, although we did observe a testing effect overall, after excluding the negative testers, the extent to which participants actually benefit from testing varies as a function of their pre-existing abilities.

Coane and Minear Figure 2.png

So, what does this mean?

First of all, although the benefits of testing have been extensively reported, it appears that not all participants, at least among college students, benefit directly from testing. In fact, for about one-third of our participants, retrieval practice made their performance worse. Does this mean we should stop incorporating testing as a pedagogical technique? Probably not – there are other benefits of retrieval practice (such as more frequent review of material, increased metacognitive awareness, and so on [8]). However, we would encourage students and learners of all ages to critically evaluate what works for them and what doesn’t.

Second, even among participants who do benefit from testing, the benefits vary. Testing might be more beneficial at some levels of learning than at others. When material is too easy or too difficult, the expected benefits might not be evident and frustration could result. Again, we recommend critically assessing which strategies are effective in which situations.



(1) Karpicke, J. D., Blunt, J. R., & Smith, M. A. (2016). Retrieval-based learning: Positive effects of retrieval practice in elementary school children. Frontiers in Psychology, 7, 350. doi:10.3389/fpsyg.2016.00350

(2) Agarwal, P. K., Bain, P. M., & Chamberlain, R. W. (2012). The value of applied research: Retrieval practice improves classroom learning and recommendations from a teacher, a principal, and a scientist. Educational Psychology Review, 24(3), 437-448. doi:10.1007/s10648-012-9210-2

(3) Coane, J. H. (2013). Retrieval practice and elaborative encoding benefit memory in younger and older adults. Journal of Applied Research in Memory and Cognition, 2(2), 95-100. doi:10.1016/j.jarmac.2013.04.001

(4) Sumowski, J. F., Coyne, J., Cohen, A., & Deluca, J. (2014). Retrieval practice improves memory in survivors of severe traumatic brain injury. Archives of Physical Medicine and Rehabilitation, 95(2), 397-400. doi:10.1016/j.apmr.2013.10.021

(5) Agarwal, P. K., Finley, J. R., Rose, N. S., & Roediger, H. L. (2017). Benefits from retrieval practice are greater for students with lower working memory capacity. Memory, 25(6), 764-771. doi:10.1080/09658211.2016.1220579

(6) Brewer, G. A., & Unsworth, N. (2012). Individual differences in the effects of retrieval from long-term memory. Journal of Memory and Language, 66, 407-415. doi:10.1016/j.jml.2011.12.009

(7) Nelson, T. O., & Dunlosky, J. (1994). Norms of paired-associate recall during multitrial learning of Swahili-English translation equivalents. Memory, 2, 325-335. doi:10.1080/09658219408258951

(8) Roediger, H. L., Putnam, A. L., & Smith, M. A. (2011). Ten benefits of testing and their applications to educational practice. In J. Mestre & B. Ross (Eds.), Psychology of learning and motivation: Cognition in education (pp. 1-36). Oxford: Elsevier.