Quality Evaluation and Benchmarking


Data-driven decision-making

Human quality evaluation, automated quality estimation

cApStAn masters the art of selecting, testing and evaluating the technology that works best for your project. We use professional versions of NMT engines and paid versions of language models specifically designed for translation, and we assess their output through A/B testing and human evaluation. Once a provider is selected, we leverage AI quality estimation (AIQE), establish benchmarks for each locale and provide confidence labels for translated content.

cApStAn Modular Approach – Our language services are organized in 20 modules that can be combined, according to your needs, requirements and goals, to build the best workflow for each project.
Download Our Modular Approach Document


Quality Evaluation of Automated Translation (AT)

What? Evaluation of the output of different engines, models and providers
How? A blind test on a sample of content translated by the different engines.

For each translated segment, our linguists use a multidimensional framework to assign accuracy and fluency scores, and they provide feedback on key issues. The system computes the proportion of segments that required no post-editing at all. At the end of this process, we have comparable feedback on translation quality for each of the engines or models we evaluated.

We determine which solution performs best for a given language pair and domain
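
As an illustration of how such blind-test ratings can be aggregated, here is a minimal Python sketch; the rating scale, field names and scores are hypothetical examples, not cApStAn's actual evaluation schema.

```python
# Minimal sketch of aggregating blind-test ratings per engine.
# The 1-5 scale and the sample data below are illustrative assumptions.
from statistics import mean

# Each rating: (engine, accuracy 1-5, fluency 1-5, needed_post_editing)
ratings = [
    ("engine_a", 5, 5, False),
    ("engine_a", 4, 3, True),
    ("engine_b", 5, 4, False),
    ("engine_b", 5, 5, False),
]

engines = {e for e, *_ in ratings}
for engine in sorted(engines):
    rows = [r for r in ratings if r[0] == engine]
    acc = mean(r[1] for r in rows)
    flu = mean(r[2] for r in rows)
    # Proportion of segments the linguists left untouched.
    untouched = sum(not r[3] for r in rows) / len(rows)
    print(f"{engine}: accuracy {acc:.2f}, fluency {flu:.2f}, "
          f"{untouched:.0%} of segments needed no post-editing")
```

Aggregating the scores this way yields directly comparable figures per engine, which is what makes the final provider decision data-driven rather than anecdotal.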

AI Quality Estimation (AIQE) and Thresholds

What? Comparison of automatically generated confidence scores with edit distance
How? Once we have selected the MT engine or language model, we use it to translate a sufficiently large sample of the content. A post-editor reviews and improves the automated translation as needed. An algorithm computes the edit distance (the number of deletions, insertions or substitutions required to transform one string into another). At the same time, we automatically generate confidence scores for each segment of the translated sample. Our translation technology team compares the edit distances with the confidence scores to determine a threshold below which full post-editing is required.

Leveraging AI to help revisers focus on parts that actually need revision
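
To make the calibration step concrete, the sketch below pairs a standard Levenshtein edit distance with hypothetical per-segment confidence scores; the sample data and the length normalisation are illustrative assumptions, not the production AIQE pipeline.

```python
# Minimal sketch of the edit-distance side of AIQE calibration.

def levenshtein(a: str, b: str) -> int:
    """Number of insertions, deletions or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# (machine translation, post-edited version, engine confidence score)
samples = [
    ("the cat sat on a mat", "the cat sat on the mat", 0.93),
    ("he go to school",      "he goes to school",      0.61),
    ("good morning",         "good morning",           0.98),
]

for mt, pe, conf in samples:
    # Normalise by segment length so long and short segments are comparable.
    ratio = levenshtein(mt, pe) / max(len(mt), len(pe))
    print(f"confidence {conf:.2f} -> normalised edit distance {ratio:.2f}")

# Plotting confidence against edit distance over a large sample reveals the
# confidence value below which segments typically needed heavy editing,
# i.e. the threshold for mandatory full post-editing.
```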

Dual Verification

What? Instruments with specialized subject matter
How? Same process as full verification, but both a linguist and a subject matter expert (SME) work together to verify, and a cApStAn project manager combines their feedback into an actionable report.

Check content localization in addition to linguistic quality

Automated check using VeryFire™

What? Large-scale data collection instruments
How? Our translation technologists program project-specific and language-specific rules. VeryFire™, cApStAn’s in-house QA tool, automatically checks adherence to these rules or to a glossary and flags all violations.

Add an extra quality check step
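
VeryFire™ itself is proprietary, but a rule-based check of this kind can be sketched in a few lines; the glossary entries, rule patterns and function names below are purely illustrative assumptions, not the tool's actual rule set.

```python
# Minimal sketch of glossary and rule checks on a translated segment.
import re

glossary = {"respondent": "répondant"}  # source term -> required target term
rules = [
    (r"\d+\s?%", "Percentages must be preserved in the target"),
]

def check(source: str, target: str) -> list[str]:
    """Return a list of flagged violations for one segment pair."""
    flags = []
    for src_term, tgt_term in glossary.items():
        if src_term in source.lower() and tgt_term not in target.lower():
            flags.append(f"Glossary: '{src_term}' not rendered as '{tgt_term}'")
    for pattern, message in rules:
        # Flag when the rule pattern occurs a different number of times
        # in source and target (e.g. a dropped percentage).
        if len(re.findall(pattern, source)) != len(re.findall(pattern, target)):
            flags.append(f"Rule: {message}")
    return flags

print(check("About 40% of respondents agreed.",
            "Environ 40 % des participants étaient d'accord."))
```

Run over an entire file, checks like these surface every rule violation for a human to confirm or dismiss, which is what makes them practical for large-scale instruments.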

Machine translation quality evaluation

What? Instruments translated using machine translation
How? A combination of algorithms and targeted human evaluation that helps you make an informed decision about post-editing needs.

Take advantage of machine translation


Check Out Some of Our Case Studies

Machine Translation Quality Evaluation

Machine Translation Quality Evaluation & Post-Editing

About the project The client needed to code responses to open-ended questions from a survey that collected data from Chinese speakers. The client tried to use machine translation, but the [...]

Learn More
Translation validation of adult literacy survey

Translation validation of adult literacy survey – The LAMP

About LAMP The Literacy Assessment and Monitoring Programme (LAMP), organised by the UNESCO Institute for Statistics (UIS), is an adult literacy survey designed to be administered in emerging countries, where building [...]

Learn More
OECD IELS

International Early Learning and Child Well-being Study (IELS)

About the Study The International Early Learning and Child Well-being Study (IELS) is an OECD international survey that assesses emergent cognitive skills; social and emotional skills; and skills that draw from [...]

Learn More

Contact Us

We'd love to hear from you, and you can expect a quick response.


Brussels

Chaussée de La Hulpe 268, 1170 Brussels

+32 2 663 1729

Philadelphia

121 S. Broad Street, Suite 1710, Philadelphia, PA 19107

+1 267 469 2611