Machine Translation Quality Evaluation & Post-Editing

About the project

The client needed to code open-ended survey responses collected from Chinese speakers. Machine translation alone had produced unsatisfactory and unreliable results, and the project faced several constraints: a large volume of data (~211K words), a tight deadline (one week), and a limited budget. The client sought an efficient, cost-effective solution that would still guarantee the quality and accuracy of the coding process, so we designed an appropriate machine translation quality evaluation workflow.

Project Scope with cApStAn

The project scope with cApStAn involved leveraging automated translation quality estimation to enhance the efficiency and accuracy of translating content.

  • We selected and tested a Machine Translation (MT) engine that worked well for Chinese-to-English translation
  • A Translation Quality Estimation (TQE) tool was also selected and tested to decide which metric to use for quality assurance
  • A threshold was determined using the metric; translations above the threshold required no post-editing while those below it needed rapid post-editing (RPE)
  • cApStAn post-edited a sample of responses above the threshold to validate the approach and ensure optimal translation quality
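The threshold-based routing described in the bullets above can be sketched in a few lines of Python. The threshold value and the segment scores below are illustrative placeholders, not project data:

```python
# Illustrative sketch of the threshold-based routing described above.
# The threshold and the segment scores are made-up placeholders.
THRESHOLD = 0.81  # segments at or below this COMET score go to rapid post-editing

def route_segments(scored_segments):
    """Split (segment_id, comet_score) pairs into deliver and post-edit queues."""
    deliver, post_edit = [], []
    for seg_id, score in scored_segments:
        (post_edit if score <= THRESHOLD else deliver).append(seg_id)
    return deliver, post_edit

# Example with invented scores: s2 and s3 (exactly at the threshold) go to RPE
sample = [("s1", 0.93), ("s2", 0.74), ("s3", 0.81), ("s4", 0.88)]
deliver, post_edit = route_segments(sample)
```

In the real project the scores came from the selected TQE metric; the routing itself is this simple once the threshold is fixed.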


  • A human-in-the-loop methodology was employed to evaluate and enhance MT output
  • A random 10% sample of the raw MT output was evaluated and post-edited by humans, categorizing translations into ‘Perfect’, ‘OK’, and ‘Low Quality’
  • The same sample was also scored by COMET, a metric that rates translation quality on a scale from 0 (worst) to 1 (perfect)
  • A COMET score threshold was then established to differentiate between the quality levels
  • In this sample, scores below 0.8 correlated with ‘Low Quality’ and ‘OK’ translations, so a threshold of 0.81 was deemed safe
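The calibration step above can be sketched as follows; the scores, labels, and the `safe_threshold` helper are invented for illustration. The idea is simply that the cutoff sits just above the best-scoring segment that humans still flagged as ‘OK’ or ‘Low Quality’:

```python
# Hypothetical sketch of deriving a "safe" COMET cutoff from a human-labelled
# sample: the cutoff sits just above the highest-scoring non-'Perfect' segment,
# so every 'OK' and 'Low Quality' translation falls below it.
def safe_threshold(labelled_sample, margin=0.01):
    """labelled_sample: list of (comet_score, human_label) pairs."""
    highest_flagged = max(
        score for score, label in labelled_sample if label != "Perfect"
    )
    return round(highest_flagged + margin, 2)

# Invented sample: the highest non-'Perfect' score is 0.80, giving a 0.81 cutoff
labelled = [
    (0.95, "Perfect"),
    (0.80, "OK"),
    (0.62, "Low Quality"),
    (0.91, "Perfect"),
]
threshold = safe_threshold(labelled)
```

A small safety margin like this is one reasonable way to turn the observed 0.8 correlation into the 0.81 operating threshold; other projects may prefer a percentile-based rule.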



Translation Process diagram


Human Rapid Post-editing (RPE)

  • The post-editor worked on all responses that scored ≤ 0.81
  • Several serious issues were detected; one RPE example involved a segment with a COMET score of 0.7451

  • As a final quality control step, we reviewed a sample of segments that scored above 0.81; over 95% of them passed the quality check
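The final spot-check amounts to a pass rate over a random sample of above-threshold segments. The sketch below uses hypothetical segment IDs and reviewer verdicts:

```python
import random

# Hypothetical sketch of the final QC step: sample segments that scored above
# the threshold and compute the share a human reviewer judges acceptable.
def qc_pass_rate(segment_ids, human_verdicts, sample_size, seed=0):
    """human_verdicts maps segment id -> True if the spot-check passed."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    picked = rng.sample(segment_ids, min(sample_size, len(segment_ids)))
    return sum(human_verdicts[s] for s in picked) / len(picked)

# With all four above-threshold segments sampled and one failing, 3/4 pass
rate = qc_pass_rate(["a", "b", "c", "d"],
                    {"a": True, "b": True, "c": True, "d": False},
                    sample_size=4)
```

If the observed pass rate falls below the target (here, 95%), the threshold can be raised and more segments routed to post-editing.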

Outcome Achieved

  • Delivered translations of 5,708 responses (~211K words) within 5 working days
  • Post-edited ~40% of the raw MT output, ensuring quality and accuracy in the translations
  • cApStAn delivered the translations in the client’s survey response analysis format, with post-edited segments highlighted for clarity and ease of review
  • Provided comments explaining culture-specific terms such as “KTV” (an abbreviation for “karaoke” used in mainland China)

Applications & Caveats

This approach is well suited to:

  • Automatically generated reports (e.g. of personality screening tests)
  • Survey responses
  • Interview transcriptions

It is not recommended for:

  • High-stakes tests
  • Publication-grade content
  • Executive summaries

Caveats:

  • MT output quality varies per language pair and per domain
  • Thresholds need to be determined for each project

Want to try this out on your materials?

Sign a mutual NDA and send us your content along with your requirements. Request a quote or contact us via the form below for more details.

See also, from our blog

 The translator of the future is a human-machine hybrid

Latest “IEA Insider” now out featuring article by cApStAn LQC co-founder Steve Dept

Contact Us

We'd love to hear from you, and you can expect a quick response.



Chaussée de La Hulpe 268, 1170 Brussels

+32 2 663 1729


121 S. Broad Street, Suite 1710, Philadelphia, PA 19107

+1 267 469 2611