Machine Translation Quality Evaluation & Post-Editing

About the project

The client needed to code open-ended survey responses collected from Chinese speakers. Machine translation alone had produced unsatisfactory and unreliable results, and the project faced several constraints: a large volume of data (~211K words), a tight deadline (one week), and a limited budget. The client sought an efficient, cost-effective solution that would still guarantee the quality and accuracy of the coding process, so we designed an appropriate machine translation quality evaluation workflow.

Project Scope with cApStAn

The project scope with cApStAn involved leveraging automated translation quality estimation to enhance the efficiency and accuracy of translating content.

  • We selected and tested a Machine Translation (MT) engine that worked well for Chinese-to-English translation
  • A Translation Quality Estimation (TQE) tool was also selected and tested to decide which metric to use for quality assurance
  • A threshold was determined using the metric; translations above the threshold required no post-editing while those below it needed rapid post-editing (RPE)
  • cApStAn post-edited a sample of responses above the threshold to validate the approach and ensure optimal translation quality
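The threshold-based routing described in the bullets above can be sketched in a few lines of Python. The threshold value and the segment scores below are illustrative placeholders, not project data:

```python
# Illustrative sketch of the threshold-based routing described above.
# The threshold and the segment scores are made-up placeholders.
THRESHOLD = 0.81  # segments at or below this COMET score go to rapid post-editing

def route_segments(scored_segments):
    """Split (segment_id, comet_score) pairs into deliver and post-edit queues."""
    deliver, post_edit = [], []
    for seg_id, score in scored_segments:
        (post_edit if score <= THRESHOLD else deliver).append(seg_id)
    return deliver, post_edit

# Example with invented scores: s2 and s3 (exactly at the threshold) go to RPE
sample = [("s1", 0.93), ("s2", 0.74), ("s3", 0.81), ("s4", 0.88)]
deliver, post_edit = route_segments(sample)
```

In the real project the scores came from the selected TQE metric; the routing itself is this simple once the threshold is fixed.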


  • A human-in-the-loop methodology was employed to evaluate and enhance MT output
  • A random 10% sample of the raw MT output was evaluated and post-edited by humans, categorizing translations into ‘Perfect’, ‘OK’, and ‘Low Quality’
  • The same sample was also scored by COMET, a metric that rates translation quality on a scale from 0 (worst) to 1 (perfect)
  • A COMET score threshold was then established to differentiate between the quality levels
  • In this sample, scores below 0.8 correlated with ‘Low Quality’ and ‘OK’ translations, so a threshold of 0.81 was deemed safe
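The calibration step above can be sketched as follows; the scores, labels, and the `safe_threshold` helper are invented for illustration. The idea is simply that the cutoff sits just above the best-scoring segment that humans still flagged as ‘OK’ or ‘Low Quality’:

```python
# Hypothetical sketch of deriving a "safe" COMET cutoff from a human-labelled
# sample: the cutoff sits just above the highest-scoring non-'Perfect' segment,
# so every 'OK' and 'Low Quality' translation falls below it.
def safe_threshold(labelled_sample, margin=0.01):
    """labelled_sample: list of (comet_score, human_label) pairs."""
    highest_flagged = max(
        score for score, label in labelled_sample if label != "Perfect"
    )
    return round(highest_flagged + margin, 2)

# Invented sample: the highest non-'Perfect' score is 0.80, giving a 0.81 cutoff
labelled = [
    (0.95, "Perfect"),
    (0.80, "OK"),
    (0.62, "Low Quality"),
    (0.91, "Perfect"),
]
threshold = safe_threshold(labelled)
```

A small safety margin like this is one reasonable way to turn the observed 0.8 correlation into the 0.81 operating threshold; other projects may prefer a percentile-based rule.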



Translation Process diagram


Human Rapid Post-editing (RPE)

  • The post-editor worked on all responses that scored ≤ 0.81
  • Several serious issues were detected; one RPE example involved a segment with a COMET score of 0.7451

  • As a final quality control step, we reviewed a sample of segments that scored above 0.81; over 95% of them passed the quality check
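The final spot-check amounts to a pass rate over a random sample of above-threshold segments. The sketch below uses hypothetical segment IDs and reviewer verdicts:

```python
import random

# Hypothetical sketch of the final QC step: sample segments that scored above
# the threshold and compute the share a human reviewer judges acceptable.
def qc_pass_rate(segment_ids, human_verdicts, sample_size, seed=0):
    """human_verdicts maps segment id -> True if the spot-check passed."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    picked = rng.sample(segment_ids, min(sample_size, len(segment_ids)))
    return sum(human_verdicts[s] for s in picked) / len(picked)

# With all four above-threshold segments sampled and one failing, 3/4 pass
rate = qc_pass_rate(["a", "b", "c", "d"],
                    {"a": True, "b": True, "c": True, "d": False},
                    sample_size=4)
```

If the observed pass rate falls below the target (here, 95%), the threshold can be raised and more segments routed to post-editing.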

Outcome Achieved

  • Delivered translations of 5,708 responses (~211K words) within 5 working days
  • Post-edited ~40% of the raw MT output, ensuring quality and accuracy in the translations
  • cApStAn delivered the translations in the client’s survey response analysis format, with post-edited segments highlighted for clarity and ease of review
  • Provided comments explaining culture-specific terms such as “KTV” (an abbreviation for “karaoke” used in mainland China)

Applications & Caveats

This approach is well suited to:

  • Automatically generated reports (e.g. of personality screening tests)
  • Survey responses
  • Interview transcriptions

It is not recommended for:

  • High-stakes tests
  • Publication-grade content
  • Executive summaries

Caveats:

  • MT output quality varies per language pair and per domain
  • Thresholds need to be determined for each project

Want to try this out on your materials?

Sign a mutual NDA and send us your content along with your requirements. Request a quote or contact us via the form below for more details.

See also, from our blog

 The translator of the future is a human-machine hybrid

Latest “IEA Insider” now out featuring article by cApStAn LQC co-founder Steve Dept

Contact Us

We'd love to hear from you, and you can expect a quick response.



Chaussée de La Hulpe 268, 1170 Brussels

+32 2 663 1729


121 S. Broad Street, Suite 1710, Philadelphia, PA 19107

+1 267 469 2611