Sit down to revise for almost any subject and you face the same choice: should you test yourself with multiple-choice questions, or with free recall, meaning open-ended prompts where you have to produce the answer from a blank page? The intuitive answer is that free recall must be harder, and therefore better. The intuition is half right. Free recall is generally harder, and difficulty matters, but the picture from a half-century of testing-effect research is more interesting than “harder is better.” Some carefully designed multiple-choice formats come close to free recall on long-term retention; some don’t. This article walks through what the research actually shows, and how to set up self-tests that turn quizzing into learning rather than just measurement.
The testing effect, briefly
Before getting to the format comparison, it helps to state the underlying finding the whole conversation rests on. Henry Roediger and Jeffrey Karpicke’s 2006 Psychological Science paper “Test-Enhanced Learning” gave the testing effect its modern shape. They compared three groups of college students studying short prose passages. Group A studied the passages four times. Group B studied them three times and was tested once. Group C studied once and was tested three times. Measured immediately after the session, Group A, the most-studied group, performed best. But measured one week later, the order had reversed: Group C, the most-tested group, retained around 61% of the material, compared with about 40% for the most-studied group. Retrieval, not rereading, was what laid down durable memory.
The effect is one of the most heavily replicated findings in the cognitive psychology of learning. A 2014 meta-analysis by Andrew Rowland in Psychological Bulletin, covering 159 effect sizes, found a robust testing benefit across age groups, materials, and contexts. The size of the effect varies; its direction does not. Practising recall consistently outperforms practising re-exposure.
The follow-on question is the one this article addresses: what kind of test?
What multiple choice gets you
The objection to multiple-choice testing is well-known. You can hit the right answer without producing it from memory — recognition, the argument goes, doesn’t exercise the same retrieval pathway as recall, and so doesn’t strengthen memory in the same way. Worse, exposure to wrong answers (the “distractors”) might actively interfere with later recall by planting incorrect associations.
The research turns out to be more nuanced. A 2008 review by Larsen, Butler and Roediger in Medical Education examined multiple-choice and short-answer testing as study tools across several experiments. The consistent finding: multiple-choice questions produced testing-effect gains. The size was smaller than for free recall in some conditions, comparable in others, and the direction of the effect was positive across the board. The “wrong-answer interference” worry, that distractors would corrupt later memory, is real but small, and in most realistic study contexts is outweighed by the testing benefit.
A 2007 paper by Marsh, Roediger, Bjork and Bjork in Psychonomic Bulletin & Review sharpened the picture. They found that the way multiple-choice testing interacts with later learning depends substantially on what happens after the test. If you take a multiple-choice quiz and never receive feedback on which answers were right, the wrong-answer interference effect is at its largest. If you take the quiz and immediately get clear feedback on the correct answer, particularly with a brief explanation, the interference effect largely vanishes, and the testing benefit dominates.
That feedback finding is one of the most practically useful things in the entire literature, and it’s almost always the difference between self-testing that works and self-testing that doesn’t.
What free recall gets you
Free recall — given a prompt, produce the answer with no scaffolding — is the harder test format, and “desirable difficulty” (a term coined by Robert Bjork in the 1990s) is one of the better-supported principles in modern learning research. Tasks that feel harder during study, within limits, tend to produce stronger long-term retention.
The mechanism is reasonably well understood. Free recall requires you to construct the retrieval pathway yourself, without the safety net of recognising the answer in a list. Each successful retrieval strengthens that pathway. Each unsuccessful retrieval — even when it ends in “I don’t know” — produces some learning when feedback follows (the so-called “pretesting” effect, replicated by Kornell and colleagues across multiple studies in the late 2000s and early 2010s).
The size of the free-recall advantage over multiple choice grows with the retention interval — the gap between study and test. On an immediate test, the two formats are often comparable. At a one-week interval, free recall typically pulls ahead. At a one-month interval, the gap widens further. If you’re studying for the day after tomorrow, format matters less. If you want the material to be available six months from now, the format gap becomes substantial.
When multiple choice wins
The most surprising and consistent finding in this area: well-designed multiple-choice questions can produce free-recall-comparable retention if they meet a few criteria.
The distractors are plausible. A four-option question with three obviously wrong distractors can be answered by elimination, without retrieving anything, so it exercises memory very little. A four-option question with three plausible distractors forces real discrimination and exercises a genuinely retrieval-adjacent process.
Feedback is immediate and informative. As Marsh and colleagues found, feedback timing is doing a lot of the work. Quizzes that show the correct answer the moment you answer, with a brief explanation, produce gains close to free-recall levels.
The test happens more than once, spaced. A single multiple-choice test on Monday is a much weaker intervention than three multiple-choice tests on Monday, Wednesday, and the following Monday. The spacing effect (Cepeda et al., 2008) compounds with the testing effect.
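The first two criteria can be made concrete with a small sketch. The Python below is illustrative only: the Question class, the present and feedback helpers, and the sample question (including its made-up fourth option, which was not a condition in the 2006 study) are assumptions for this example, not part of any study or quiz platform.

```python
import random
from dataclasses import dataclass


@dataclass
class Question:
    prompt: str
    correct: str
    distractors: list[str]  # meant to be plausible, not obviously wrong
    explanation: str        # shown with the answer, right after responding


def present(question: Question, rng: random.Random) -> list[str]:
    """Return all options in shuffled order so position gives no hint."""
    options = [question.correct, *question.distractors]
    rng.shuffle(options)
    return options


def feedback(question: Question, chosen: str) -> str:
    """Build the message to show immediately after an answer is given."""
    verdict = "Correct." if chosen == question.correct else "Not quite."
    return f"{verdict} Answer: {question.correct}. {question.explanation}"


# Hypothetical sample item based on the figures quoted earlier in the article.
q = Question(
    prompt="In the 2006 study, which group retained the most after one week?",
    correct="Studied once, tested three times",
    distractors=[
        "Studied four times",
        "Studied three times, tested once",
        "Studied twice, tested twice",  # invented distractor, not a real condition
    ],
    explanation=(
        "The most-tested group retained about 61% a week later, "
        "versus about 40% for the most-studied group."
    ),
)

options = present(q, random.Random())
print(q.prompt)
for number, option in enumerate(options, start=1):
    print(f"  {number}. {option}")
print(feedback(q, options[0]))
```

The point of the design is that feedback always carries both the correct answer and the explanation, whether or not the choice was right, which is the condition under which the interference worry largely disappears.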
A 2014 chapter by Pyc, Agarwal and Roediger, summarising the evidence for educators in the volume Applying Science of Learning in Education, concluded that well-designed multiple-choice quizzing is a strong study tool and that the popular assumption that “free recall must be better” overstates the difference. The bigger gap in real classrooms tends to be between students who test themselves at all and students who don’t, not between students who test themselves one way versus another.
A practical guide to designing self-tests that work
If you’re putting together your own quizzes — whether for revision, language learning, or knowledge maintenance — the consolidated guidance from this research breaks down to a small number of rules.
- Test yourself rather than rereading. This is the headline finding and it’s worth restating. The lowest-effort study habit that visibly fails the research is rereading. The highest-effort habit that consistently wins is testing.
- Use a mix of formats. Free recall for the material you most want to retain long-term; multiple choice for breadth coverage and warm-up. A typical weekly revision routine might be 20 minutes of free-recall practice on the most important topics, followed by 20 minutes of multiple-choice coverage across a broader set.
- Get feedback every time. Without feedback, multiple-choice testing risks installing wrong answers; with feedback, the risk largely vanishes. Spend the extra moment to read the correct answer and the explanation before moving on.
- Space the tests. A 2008 study by Cepeda, Vul, Rohrer, Wixted and Pashler found that the optimal spacing depends on how long you want the material to last. For one-week retention, a one-day gap between sessions worked best. For one-month retention, a one-week gap. For one-year retention, around a one-month gap. The general principle: longer retention requires longer gaps. Cramming is the opposite of this and has a worse return per minute invested than almost any other intervention.
- Make distractors plausible. If you’re writing your own multiple-choice questions, the test of a good distractor is whether someone who half-knows the topic could reasonably pick it. Implausible distractors give you false confidence without the underlying learning.
- Test before you’ve finished studying. Self-testing on material you only partly know, including answering “I don’t know”, produces gains once feedback follows. This is the pretesting effect, and it’s one of the better-replicated findings of the last fifteen years.
- Don’t conflate “feels easy” with “is learning.” Bjork’s desirable-difficulty principle is uncomfortable to apply. Rereading feels productive because the material flows past you smoothly. Testing yourself feels punishing because you’re constantly hitting the limits of what you actually know. The discomfort is the point. The smoother study session is usually the less effective one.
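The spacing rule in the list above lends itself to a quick planning sketch. The Python below is a rough illustration, not a formula from the Cepeda paper: the function names and the cut-off values between the three bands are assumptions made here, chosen only so that the three figures quoted above (one day, one week, one month) come out as stated.

```python
from datetime import date, timedelta


def suggested_gap_days(retention_days: int) -> int:
    """Map a retention goal to a gap between sessions, in days.

    The band boundaries are this sketch's own simplification of the
    three data points quoted in the article, not values from the study.
    """
    if retention_days <= 7:
        return 1   # about a week of retention: one-day gap
    if retention_days <= 31:
        return 7   # about a month of retention: one-week gap
    return 30      # longer goals, up to about a year: roughly a month


def session_dates(start: date, retention_days: int, sessions: int = 3) -> list[date]:
    """Return evenly spaced self-test dates beginning at `start`."""
    gap = timedelta(days=suggested_gap_days(retention_days))
    return [start + i * gap for i in range(sessions)]


# Example: three sessions aimed at remembering material for about a month.
for day in session_dates(date(2025, 1, 6), retention_days=30):
    print(day.isoformat())
```

With a 30-day goal and a start date of 6 January 2025, the three sessions land a week apart. Treat the output as a starting point to adjust, since the underlying study measured specific intervals rather than a continuous rule.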
What this means for using a quiz site
If you’re a casual user of an online quiz platform — playing rounds for entertainment, with the side benefit of learning — none of this is critical. You will still pick up information, particularly over time, simply by playing.
If you’re using one as a deliberate study tool, the implications are more pointed. The most effective use pattern is:
- Play a round on a topic you want to know better.
- Read the feedback for each question — particularly the wrong ones.
- Return to the same topic later in the week and play another round.
- Test yourself across the topic from memory, without the quiz, every few weeks.
That last step is the one most quiz players skip, and it’s the one that converts “answering a quiz question” into “knowing the material.” Free recall, even brief and informal — “what do I remember about the geography of South America” — is the strongest single learning intervention available to you, and you don’t need any tool to do it.
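If you want a light structure for that from-memory step, one option is to write down what you remember and then compare it against a short list of key terms you prepared earlier. The sketch below assumes exactly that workflow; the recall_check function, the term list, and the sample notes are all hypothetical, and the matching is a crude substring check rather than any real measure of understanding.

```python
def recall_check(recalled_text: str, key_terms: list[str]) -> tuple[list[str], list[str]]:
    """Split key terms into those mentioned in the recalled text and those missed.

    Matching is case-insensitive substring search, so it only shows
    whether a term appeared at all, not whether it was used correctly.
    """
    text = recalled_text.casefold()
    remembered = [term for term in key_terms if term.casefold() in text]
    missed = [term for term in key_terms if term.casefold() not in text]
    return remembered, missed


# Hypothetical example: notes written from memory, then checked afterwards.
key_terms = ["Andes", "Amazon", "Patagonia", "Atacama"]
notes = "The Andes run down the west coast; the Amazon basin covers much of Brazil."

remembered, missed = recall_check(notes, key_terms)
print("Remembered:", remembered)
print("Revisit:", missed)
```

The missed list doubles as feedback: those are the terms to read up on before the next attempt.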
For a longer reading on the underlying habits, our evidence-based study guide and improving general knowledge pieces sit alongside this one. If you want to put the principles to immediate use, pick a category from the main quiz list and play a round; the feedback on each question is built specifically for the testing-effect mechanism this article describes.
Sources and further reading
- Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255.
- Rowland, C. A. (2014). The effect of testing versus restudy on retention: a meta-analytic review of the testing effect. Psychological Bulletin, 140(6), 1432–1463.
- Larsen, D. P., Butler, A. C., & Roediger, H. L. (2008). Test-enhanced learning in medical education. Medical Education, 42(10), 959–966.
- Marsh, E. J., Roediger, H. L., Bjork, R. A., & Bjork, E. L. (2007). The memorial consequences of multiple-choice testing. Psychonomic Bulletin & Review, 14(2), 194–199.
- Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(4), 989–998.
- Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: a temporal ridgeline of optimal retention. Psychological Science, 19(11), 1095–1102.
- Pyc, M. A., Agarwal, P. K., & Roediger, H. L. (2014). Test-enhanced learning. In V. A. Benassi, C. E. Overson, & C. M. Hakala (Eds.), Applying Science of Learning in Education. American Psychological Association.
- Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing. MIT Press.