GUEST POST: Multiple-choice Testing: Are the Best Practices for Assessment Also Good for Learning?
By: Andrew Butler
Dr. Butler is an associate professor in the Department of Education and the Department of Psychological & Brain Sciences at Washington University in St. Louis. He earned a Ph.D. in cognitive psychology at Washington University in St. Louis in 2009 and completed a postdoctoral fellowship at Duke University. Dr. Butler is interested in the malleability of memory – the cognitive processes and mechanisms that cause memories to change or remain stable over time. More specifically, his research focuses on how the process of retrieving memories affects the content (e.g., events, specific details, narrative structure, etc.) and phenomenological characteristics (e.g., confidence, emotional intensity, vividness, etc.) of those memories. His program of research addresses both theoretical issues in cognitive psychology and practical applications to education and mental health. You can find more information about him at his Washington University Website.
Multiple-choice tests are very popular in education for a variety of reasons – they are easy to grade, offer greater objectivity, and can allow more content to be covered on a single test. Because of this popularity and utility, multiple-choice testing has been the focus of a great deal of research. The vast majority of this research has focused on issues related to assessment (e.g., reliability, validity, etc.), and it has produced many pieces of practical advice for educators on how best to construct and use multiple-choice tests to evaluate learning (1). However, tests do more than just assess learning – they also cause learning (2, also see this blog post on the many benefits of retrieval practice). That is, every time students retrieve information from memory and use it to answer a test question, they are potentially strengthening the representation of that information in memory (i.e., better retention) and/or changing it (i.e., deeper understanding). Numerous studies have investigated the consequences of multiple-choice testing for learning, and this research has also produced practical advice for educators. Given that all multiple-choice tests will inevitably both cause and assess learning, no matter how low or high the stakes, it is important to consider whether the practical recommendations from these two research literatures agree.
The good news is that there is consensus – here are some best practices from the assessment literature, each with a brief description of relevant research from the learning literature that supports it:
1. Make the test challenging, but not too difficult
When using a multiple-choice test for the purpose of assessment, the goal is to create items that successfully differentiate among students based on how well they know the material being tested. In practice, this produces tests that challenge students but also allow them to succeed when they have the requisite knowledge, ideally resulting in a range of performance with a relatively high average (e.g., educators often aim for a mean of 80%). The goal is the same when designing a multiple-choice test to facilitate student learning – average performance should be relatively high for all students. This recommendation is based on the finding that multiple-choice tests can produce both positive and negative effects on learning (3). When students successfully retrieve and use their knowledge on the test, they learn the right thing. However, when students fail to retrieve the correct information and instead select an incorrect alternative (i.e., a lure or distractor), they can potentially learn the wrong thing (4,5). Research shows that as performance on a multiple-choice test improves, the positive effects on learning increase and the negative effects decrease (6). One important caveat is that raising performance by making the test very easy actually undermines learning because students do not have to process the information in a meaningful way. Overall, if a test is too hard or too easy, then it is essentially useless for both learning and assessment.
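For readers who want to check these properties on their own tests, here is a minimal sketch of a basic item analysis. It is not taken from the studies cited here; it assumes the common classical definitions of item difficulty (proportion of students answering correctly) and an upper-lower discrimination index (proportion correct among higher scorers minus proportion correct among lower scorers). The function names and the `scores` data are hypothetical and for illustration only.

```python
from statistics import mean


def item_difficulty(responses):
    """Proportion of students who answered each item correctly (higher = easier)."""
    n_items = len(responses[0])
    return [mean(student[i] for student in responses) for i in range(n_items)]


def item_discrimination(responses, fraction=0.5):
    """Upper-lower index: proportion correct in the top-scoring group
    minus proportion correct in the bottom-scoring group, per item."""
    ranked = sorted(responses, key=sum, reverse=True)  # highest total score first
    k = max(1, int(len(ranked) * fraction))            # size of each comparison group
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [
        mean(s[i] for s in upper) - mean(s[i] for s in lower)
        for i in range(n_items)
    ]


# Hypothetical data: 6 students (rows) x 4 items (columns), 1 = correct, 0 = incorrect.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

difficulty = item_difficulty(scores)
discrimination = item_discrimination(scores)
test_mean = mean(difficulty)  # average proportion correct across items

for i, (p, d) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"Item {i}: difficulty={p:.2f}, discrimination={d:.2f}")
print(f"Mean proportion correct: {test_mean:.2f}")
```

In this made-up data set, the first item is answered correctly by every student, so its difficulty is 1.0 and its discrimination is 0: it tells the educator nothing about who knows the material, which is the assessment side of the "too easy" problem described above.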
2. Use three or four plausible answer choices
In research on assessment, the recommendation for the optimal number of answer choices adheres to the Goldilocks principle (do not use too many or too few), and the exact number chosen should be driven by the number of plausible incorrect alternatives that one can produce (1). The findings from research on using multiple-choice testing for learning suggest the same thing, but with some added nuance. Generally speaking, increasing the number of answer choices increases the difficulty of the item, assuming all the alternatives are plausible. Research shows that using more alternatives tends to decrease the positive effects and increase the negative effects on learning; however, these same studies show that if students can overcome the greater difficulty and answer the item correctly, then they actually benefit more than if they had correctly answered an easier item (6,7). This pattern is also found with other ways of ramping up the difficulty of multiple-choice items, such as increasing the conceptual similarity of the answer choices (8).
Another way that educators increase the difficulty of an item is to include a distractor that is the correct answer to another item. Here too the same pattern emerges – students can benefit if they process but reject such distractors because doing so facilitates answering a subsequent question (9). The bottom line is that using three or four plausible answer choices is a good heuristic for producing a moderate amount of difficulty without overdoing it – and that is the right idea for both assessment and learning. Nevertheless, the level of difficulty may need to be adjusted depending on where students are in the learning process in order to sufficiently challenge them while still allowing them to succeed.
3. Avoid using “none-of-the-above” and “all-of-the-above” as answer choices
Many educators like to include “none-of-the-above” and “all-of-the-above” as alternatives on multiple-choice questions, but the general consensus from research on assessment is that such answer choices should be avoided because they reduce the discriminability of the item (1,10). Research on using multiple-choice tests for learning makes the same recommendation. Generally speaking, including “none-of-the-above” as an alternative tends to be harmful to learning when it is the correct answer and to have a negligible effect when it is a distractor (11). This finding makes sense because when “none-of-the-above” is correct, students do not necessarily need to retrieve the correct information to successfully answer the question and they are also exposed to a lot of incorrect information. In contrast, “all-of-the-above” can be beneficial in certain circumstances when it is the correct alternative (12), but it is unclear how broadly this finding generalizes. Using “all-of-the-above” as the correct answer choice does have the advantage of exposing the learner to only correct information because all the alternatives are true. That said, “all-of-the-above” items tend to be easier than other types of items, which could be good or bad depending on the level of challenge students need. One other consideration with using “none-of-the-above” and “all-of-the-above” is how often they are used within a particular test because it may affect the strategy that students use when taking the test. For example, if they are rarely used, their appearance may signal to students that they are the correct alternative (regardless of whether they actually are) and lead students to answer quickly without fully processing the question and alternatives. In sum, it is probably best to refrain from using “none-of-the-above” and “all-of-the-above” as alternatives – any possible benefits to learning are small and they can be detrimental to assessment.
In summary, the recommendations for creating and using multiple-choice questions from the assessment literature are well-aligned with the recommendations from the learning literature. When using multiple-choice tests, the goal should be to challenge students while still allowing them to largely succeed. In creating and subsequently evaluating items, consider the thought process that students will be using to arrive at the correct answer and make sure that it is productive for their learning. Keep it simple – avoid unusual item types, large numbers of alternatives, trick questions, etc. And finally, provide students with feedback – the negative effects of multiple-choice testing can be substantially reduced (if not eliminated) by having students go through the correct answers after the test (6).
(1) Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309-344.
(2) Roediger, H. L., III, & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15, 20-27.
(3) Marsh, E. J., Roediger, H. L., III, Bjork, R. A., & Bjork, E. L. (2007). The memorial consequences of multiple-choice testing. Psychonomic Bulletin & Review, 14, 194-199.
(4) Roediger, H. L., III, & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155-1159.
(5) Brown, A. S., Schilling, H. E., & Hockensmith, M. L. (1999). The negative suggestion effect: Pondering incorrect alternatives may be hazardous to your knowledge. Journal of Educational Psychology, 91, 756-764.
(6) Butler, A. C., & Roediger, H. L., III. (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory & Cognition, 36, 604-616.
(7) Butler, A. C., Marsh, E. J., Goode, M. K., & Roediger, H. L., III. (2006). When additional multiple-choice lures aid versus hinder later memory. Applied Cognitive Psychology, 20, 941-956.
(8) Whitten, W. B., & Leonard, J. M. (1980). Learning from tests: Facilitation of delayed recall by initial recognition alternatives. Journal of Experimental Psychology: Human Learning and Memory, 6, 127-134.
(9) Little, J. L., Bjork, E. L., Bjork, R. A., & Angello, G. (2012). Multiple-choice tests exonerated, at least of some charges: Fostering test-induced learning and avoiding test-induced forgetting. Psychological Science, 23, 1337-1344.
(10) Pachai, M. V., DiBattista, D., & Kim, J. A. (2015). A systematic assessment of ‘none of the above’ on multiple choice tests in a first year psychology classroom. The Canadian Journal for the Scholarship of Teaching and Learning, 6, Article 2.
(11) Odegard, T. N., & Koen, J. D. (2007). “None of the above” as a correct and incorrect alternative on a multiple-choice test: Implications for the testing effect. Memory, 15, 873-885.
(12) Bishara, A. J., & Lanzo, L. A. (2015). All of the above: When multiple correct response options enhance the testing effect. Memory, 23, 1013-1028.