I would really value any insights any historians could share. Thank you!
I love thinking about the pencil and paper multiple choice test as something that had to be invented, like the lightbulb or socks. Before we get to its invention in the early 20th century, it's helpful to start a century or so earlier. I’m going to be borrowing from a few older answers I’ve written and wandering back and forth between college and high school histories, as the history of multiple choice involves both secondary and tertiary education.
To start: the concept of presenting a learner with a question that has one right or best answer is as old as formal education itself and even makes an appearance in the Bible (Wainer, 2011, p. 4):
The Bible (Judges 12:4–6) provides an early reference in Western culture. It describes a short verbal test that the Gileadites used to uncover the fleeing Ephraimites hiding in their midst. The test was one item long. Candidates had to pronounce the word shibboleth; Ephraimites apparently pronounced the initial sh as s.
Educators throughout history have struggled with one of the big questions of learning in a way not unlike how we talk about a falling tree in the forest: if a teacher teaches something and the student hasn't learned it, has teaching occurred? (Educators throughout history have had a wide range of answers to that question.) Which is to say, "I taught it, therefore they learned it" is a flawed construct. As formal education spread around the English-speaking world, the most common form of uncovering student learning was known as “recitation.” The term was multi-purpose, covering all of the ways in which a young person would share their learning verbally with a teacher following instruction or independent study. Students would learn new information, by reading it themselves or listening to a teacher/tutor/professor, and then repeat it back.
Recitations could be whole class, as in A Visit to Boston Schools (1856), where the author describes:
... in another, the Hancock, for girls, a sister of the Quincy, our visit occurred just at recitation. The teacher gave a slip of paper to a gentleman present, requesting him to write on it the names of several cities or towns in some way noticeable. Meanwhile, he said to the class, "English kings." They at once repeated in excellent concert, "Egbert, Ethelwolf, Ethelbald, Ethelbert," down to Victoria.
Or they could be individual, as seen in college entrance exams, where a young man was given explicit instructions on which Greek or Latin texts to memorize and would be asked to recite them on demand. Recitations could also happen during a class lecture and take on more of the form of a Q and A between a professor and a student or the professor and the entire class. Such public demonstrations of learning in front of the entire class were the norm at colleges until well into the 1800s. By the 1850s, though, professors were starting to push for greater reliance on written responses as a form of assessment. One of the most vocal advocates for written exams was Charles Eliot, who would go on to be the president of Harvard. He was instrumental in getting the Mathematics department at Harvard to shift from recitations to written exams, and other departments slowly followed suit.
Around the same time Eliot and his fellow schoolmen at Harvard were pushing for a shift away from a reliance on verbal assessment of student learning, advocates for tax-payer funded grammar and high schools up and down the East Coast were doing something similar. Students’ success, or lack thereof, at school exhibitions and public demonstrations of their learning was a popular source of evidence for educators looking to make a point. Frederick Douglass’ North Star paper frequently contained glowing descriptions of Black students’ recitations and white public school advocates described the difference in quality between struggling donation-funded students and those at tax-payer-funded schools (Reese, 2013). However, these reports were all removed from the students. It was impossible for an evaluator to make it to all the schools they needed to report on in a reasonable amount of time and there was a concern that adults might be less than fully honest in their summary of student performance if they had a particular like – or dislike – for a specific school or its principal. Or that students and their teachers were more concerned with looking good than anything else. (A common anecdote is that of a schoolman who attended a school recitation and heard student after student deliver long passages they’d memorized. The schoolman reportedly pulled aside a child and asked them to spell cow. The student couldn’t.)
Meanwhile, some education leaders would focus on reporting what was easier or what felt more objective to report. For example, in New York State, the school superintendent traveled the state to visit schools and document their current status and in most cases, wrote a few lines about the students and the physical state of the building but pages and pages about the names of textbooks, salaries, count of outhouses, etc. Schools did use writing assessments before this point and had students write in “copy” books but there wasn’t a uniform approach. Schoolmen would review students’ copybooks and note quality handwriting – so teachers would make sure the copybooks they looked at were those belonging to students with the best penmanship. By the 1860s, there was a demand among policy makers and educators for more reliable, consistent, and trustworthy measures of student learning.
States began shifting to paper assessments as the predominant form of assessment, rather than verbal. New York State started their high school exit exam structure (known as the Regents Exams - an example of a question from 1870) and schools in Massachusetts started giving students in grammar and high schools common, or standardized, assessments. The greatest advantage of paper assessments was that an evaluator did not need to be present at the time of the assessment. A school superintendent could distribute papers to all of the schools in their charge, tell teachers to administer the test at a particular time and then return to collect them. These assessments were used in a variety of ways and soon after their adoption, there were concerns about the amount of time it took to review and score them.
These concerns hit their stride just as American schools were being stretched to their limits by influxes of immigrant children. There weren’t enough schools, seats, or dollars to go around and every schoolman was eager to show their way of running a school was the best, and most importantly for the purpose of your question, the most effective. Although historians are still debating the extent of the movement’s impact on American schools, the “scientific management” idea was incredibly popular among management figures in the late 1800s and early 1900s. Stand-alone schoolhouses across the country consolidated into school districts and the role of the evaluator, at the high school and college level, became even more focused on numbers and things that could be counted as a way to increase efficiency. It's important to stress that these standardized tests were mostly about large-scale evaluation, not individual student - much less teacher - performance. (Except for NYS' Regents exams, no other state went that route.)
All of which is to say that by World War I, schools and colleges were primed for a new tool described as a way to objectively measure student learning and do it effectively and efficiently. The first thing that needed to happen was the creation of that new tool. It’s generally recognized that the creator of the multiple choice item – that is, a question where a test taker is given a list of choices and must select one – is Frederick Kelly in 1914. From Anya Kamenetz’s 2015 book The Test:
The multiple-choice question was an important technique for simplifying and mass-producing tests. Frederick Kelly completed his doctoral thesis in 1914 at Kansas State Teacher’s College. He recognized that different teachers tend to give different judgments of student work. And Kelly saw this as a big problem in education. He proposed eliminating this variation through the use of standard tests with predetermined answers. His Kansas Silent Reading Test was a timed reading test that could be given to groups of students all at the same time, without requiring them to write a single sentence, and graded as easily as scanning one’s eyes down a page.
Kelly’s invention was soon folded into the US Army’s intelligence testing work as part of World War I readiness. Inspired by the work of Alfred Binet, those responsible for assigning soldiers to various duties thought the best way to determine what a soldier would be well-suited for would be by testing their intelligence; the higher a soldier’s score, the better suited they were for leadership or tasks with a high degree of responsibility. The lower their score, the more likely they would be assigned grunt or unsafe work. There is a whole other answer regarding just how wildly racist and ableist the test – and subsequent IQ tests – were but that’s outside the scope of your question.