Recent research looks at the debate on man vs machine translation parity from a new angle

by Pisana Ferrari – cApStAn Ambassador to the Global Village

Tech companies driving the progress in neural machine translation (NMT) tend to fall into the trap of comparing NMT output with human output. Earlier this year Microsoft announced that its NMT had reached parity with human translation, which prompted lively debates in the linguistic ecosystem (1). A recent article in the Slator newsletter calls this debate both “fascinating and tiring” and reports on a new study led by a group of researchers from the universities of Zurich and Edinburgh, which looks at the issue from a new angle and highlights the importance of the broader context in evaluating translation quality (2)(3).

According to the Microsoft paper, human parity is achieved when a bilingual human judges the quality of the machine output to be equivalent to that of a human translation. In their paper, Samuel Läubli, Rico Sennrich and Martin Volk tested the Microsoft claim by contrasting the evaluation of isolated sentences with that of entire documents. They found that professional human translators strongly preferred human translations over NMT output when given the context of the entire document rather than just single sentences. Läubli is quoted in the Slator article as saying that Microsoft is not to blame for its evaluation, as it followed “best practice” in the community, but that MT has now reached a level of quality where this needs to change. The submission for publication emphasizes that the shift towards document-level evaluation is all the more necessary “as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence-level become decisive in discriminating quality of different translation outputs” (4).

At cApStAn LQC, our linguists have been working in a technologically rich environment for years and are constantly experimenting with combinations of automation and human discernment, aiming to develop what the specialised press refers to as the “augmented translator model”. While we may agree that this novel approach is conducive to a better evaluation of the quality of NMT, we ultimately think that parity of NMT with human translation is far less interesting than reports on how efficient a human translator can become when integrating state-of-the-art NMT into their translation workflow.


1) https://www.microsoft.com/en-us/research/publication/achieving-human-parity-on-automatic-chinese-to-english-news-translation/

2) https://slator.com/academia/in-human-vs-machine-translation-compare-documents-not-sentences/

3) “Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation” – Samuel Läubli, PhD Candidate at the University of Edinburgh and co-authors Dr. Rico Sennrich, Assistant Professor at the University of Edinburgh’s School of Informatics and Dr. Martin Volk of the Institute of Computational Linguistics at the University of Zurich

4) https://arxiv.org/abs/1808.07048, full report at: https://arxiv.org/pdf/1808.07048.pdf

Photo credit: Sergei Tarasov/Shutterstock