Translation quality: entertaining anecdotes don’t give you the big picture

Our history class on the Middle Ages was a revelation (I remain indebted to Herman Van Bostraeten at Sint-Barbaracollege). Instead of simply teaching us the starting point and the end point of the Dark Ages, e.g. from the fall of the Western Roman Empire in 476 to the invention of the printing press by Gutenberg around 1440, our teacher extraordinaire broke us up into groups. He gave each group a different set of dates and events, and he asked us first to defend and then to challenge “our” dates as the best choice of start and end thresholds for the Middle Ages. Although this must have been in 1979, I still remember that my group had the Council of Nicaea (325) and the Sack of Rome (410) as possible entry points into the Middle Ages, and the Fall of Byzantium (1453) and Columbus’ expedition to the Americas (1492) as plausible stepping stones towards the Enlightenment.

So each group did some research and came up with arguments for and against our respective milestones. I can tell you that our class of 14-year-olds developed a robust understanding of what “Middle Ages” refers to – and very few of us remained allergic to history after this experience.

When reporting on the translation quality of an assessment, it is tempting to provide the client organisation or the end user with an astute selection of errors that required corrective action. When reporting on the target language version’s equivalence to the source, it is slick to come up with a memorable example of a mistranslation. However, like the Dark Ages, linguistic quality and equivalence to the source are too complex to be summarised by a few striking samples. These snapshots may or may not be representative of the overall balance between faithfulness to the source and fluency in the target version. There are different types of equivalence: semantic equivalence, register equivalence, normative equivalence to the source, conceptual equivalence across cultures… and there is psychometric equivalence, which no linguist should claim to be able to determine: that is the task of data analysts. Linguists may help investigate why a test item did not function as expected, but it would obviously be reckless to skip focus groups, cognitive pretesting, pilots and/or field trials on the sole basis of trust in the quality of a translation.

At cApStAn, we have a robust methodology in place. It includes a checklist of linguistic (and other) features that are known to influence the psychometric characteristics of an assessment item or a survey question, such as the relative length of the key and the distractors in multiple-choice questions. We also have an online, searchable repository of translation memories of assessments and questionnaires in multiple languages; last but not least, we have a great partner to conduct focus groups and cognitive pretesting. As we did for the Middle Ages, we take it upon ourselves to examine, analyse and scrutinise the adapted versions of questionnaires through different filters and from different angles. The use of intervention categories, which linguists are trained to use as a reference framework, contributes to more harmonised feedback, and counting the interventions in each category is the first step towards a metric. Still, a raw count only tells part of the story: it is arguably more informative to find out that a given set of response categories raised translation issues resulting in meaning shifts in, say, 17 out of 40 languages than to read off a chart that the Esperanto version of the test had 17 grammar/syntax issues. Decoding the information gathered by linguists and by automated linguistic quality assurance routines requires discernment and, above all, a holistic approach, similar to the one our history teacher proposed for decoding the landmark dates of the Middle Ages.