Why TAUS’ 3-step recipe for better machine translation does not work for all

It is true that (much) more than half a century of research into machine translation (MT) has not resulted in a technical solution that comes close to “human parity”, contrary to claims based on obsolete metrics such as BLEU scores. It is also true that we have witnessed significant progress in the quality of raw MT output in the past 5-6 years. MT is useful and brings efficiency and productivity gains that can occasionally be spectacular. Larger volumes can be translated in shorter time frames. We’re nevertheless reaching a plateau. A variant of the Pareto principle applies: with the right MT engine, the right datasets and a good translation technologist, it takes 20% of the effort to get 80% of the translation just right, and the remaining 80% of the effort to get the last 20% right.

TAUS recipe

TAUS’ three-step recipe for better MT (Evaluate, Build, Translate) was described in a recent article by Slator.

1. Evaluate: Training and customizing different MT engines and then selecting the engines with the maximum achievable quality in the customer domain.

2. Build: Creating in-domain customer-specific training datasets, using a context-based ranking technique.

3. Translate: Generating the improved MT.

TAUS says the demonstrated improvements range from 11% to 25% in score over the baseline engines from Amazon, Google and Microsoft and that, in many cases, this brings the quality up to levels equal to human translation or post-edited MT.


While this recipe has theoretical merits, we believe that in practice it only works for a limited number of use cases, mainly domains with very specific terminology or specialised jargon. That is where training an MT engine on an in-domain dataset yields the best results.

When working with psychological assessments, certification exams, tests, survey questionnaires or other data collection instruments, machine translation and post-editing do not produce equivalent or comparable versions.

This is because the content is immensely variable, and the syntax and lexical choices contribute to the psychometric properties of the items. At cApStAn we do use MT for a number of business cases, but *not* for data collection instruments.

Contact us if you want to learn more about our solutions.


“A Recipe for Better Machine Translation”, Jaap van der Meer, Slator, July 4, 2022
