Automatic Item Generation in testing

by Pisana Ferrari – cApStAn Ambassador to the Global Village

Automatic item generation (AIG) is one of the innovations in testing discussed at the recent Association of Test Publishers’ (ATP) annual conference in San Antonio, Texas. Multiple-choice questions have been the traditional way of testing for generations; the advent of new forms of computerised assessment has led to a demand for very large numbers of new, diverse and high-quality multiple-choice items, especially in high-stakes tests. The challenge is that developing new test items is a lengthy, labor-intensive and costly process. AIG addresses this by using “models” to generate items with the aid of computer technology. A model can be defined as a “representation of the knowledge, skills and abilities that are required to solve a problem in a specific domain”. By combining the variable elements of a model, AIG can produce both more test items and a greater range of them in less time, reducing costs.
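To make the combination idea concrete, here is a minimal sketch of template-based item generation. The stem, the variable values and the function names are hypothetical illustrations invented for this example; they are not taken from the ATP presentations or from any real item bank.

```python
# Minimal sketch of template-based automatic item generation (AIG).
# The item model below (stem and variable values) is a hypothetical
# illustration, not drawn from any real assessment.
from itertools import product

# An item "model": a stem with placeholders, plus the set of values
# each placeholder can take.
STEM = "A patient presents with {symptom} after {event}. What is the first step?"
VARIABLES = {
    "symptom": ["chest pain", "shortness of breath", "dizziness"],
    "event": ["a fall", "strenuous exercise", "a long flight"],
}

def generate_items(stem, variables):
    """Yield one item stem per combination of variable values."""
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        yield stem.format(**dict(zip(names, values)))

items = list(generate_items(STEM, VARIABLES))
# 3 symptom values x 3 event values -> 9 generated stems
```

A single model like this already yields nine distinct stems; real AIG systems layer on constraints and distractor logic, but the combinatorial core is the same.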

Two presentations at ATP in particular analysed the benefits and challenges of AIG. The first was about the use of AIG to develop items for the Canadian Organization of Paramedic Regulators (COPR), a high-stakes testing program. Using AIG, it takes approximately 40% less time to create the same number of items, resulting in a clear reduction in costs. With templates covering the entire “breadth of the competency”, the item bank also covers the domain better, which makes for more valid measurement. Another presentation, from the Medical Council of Canada, focused on how the use of “cognitive” models can improve the quality of items, not only in AIG but also in the traditional (human) item development approach. Thousands of items were generated from 70+ cognitive models and did well in pilots in terms of accuracy and retention rates. The quality of the items is much improved thanks to the effort put into creating cognitive maps, not only for the correct response but also for the distractors (clinical reasoning errors). In both cases the AIG distractors tended to be stronger and function better.

Although translation and localization were not the focus of this year’s program, our team at the conference had interesting conversations exploring how translation can be taken into consideration in AIG models from the earliest stages, and how our processes are helping organizations in the tests and assessments space achieve their localization goals. The AIG models presented at the conference were not yet being used for multilingual assessments, but as implementation spreads, translation will surely become a necessity. Just as we at cApStAn advocate for incorporating translation into other item development processes, we believe it will be advantageous to begin considering how AIG can be used to produce multilingual versions of an assessment.

cApStAn LQC has 18+ years of experience in linguistic quality assurance for data collection instruments in large-scale international assessments, including the OECD/PISA and PIAAC. Look us up at: capstan.be

ATP sessions

“The Experience of Using Item Mass Production in a High Stakes Testing Program”: Greg Sadesky, Janel Swain.

“Innovative solutions to improve the quality of your items”: Vikas Wadwani, André F. DeChamplain, Ada Woo, Manny Straehle.