28.03.2019

Promising research on machine translation for low-resource languages

Published in: Translation industry, Translation technology

by Pisana Ferrari – cApStAn Ambassador to the Global Village

Slator has been monitoring research on neural machine translation (NMT) for a number of years. It recently reported that research output has increased steadily since 2014 and more than doubled in 2018 compared with 2017. The findings are based on the number of papers submitted to arXiv.org, the preprint portal hosted by Cornell University. Interestingly, several large tech companies are focusing on the challenge posed by so-called low-resource languages, i.e. those for which little training data (only small amounts of parallel text) is available.

NAIST Japan has experimented with augmenting incomplete training data using multiple language sources. Examples of such sources include the multilingual document collections of the European institutions and the UN, where all official papers must be (manually) translated into all the official languages of the organizations. Other sources cited in the paper are “multilingual captions”, such as those of talks and movies, which are based on “voluntary translation efforts”. https://arxiv.org/pdf/1810.06826.pdf
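To make the idea of "incomplete" multilingual data more concrete, here is a minimal Python sketch. It assumes a toy corpus in which some sentences lack a translation in one of the source languages, and it fills each gap with a placeholder token so that every example has the same set of source languages. The token name `<NULL>`, the function name and the sample sentences are illustrative assumptions, not details taken from this article.

```python
# Hypothetical placeholder standing in for a missing translation.
NULL_TOKEN = "<NULL>"


def fill_missing_sources(example, source_langs, placeholder=NULL_TOKEN):
    """Return a copy of `example` with one entry per source language.

    Languages that are absent (or empty) in `example` get the placeholder,
    so an incomplete example can still be used for multi-source training.
    """
    return {lang: (example.get(lang) or placeholder) for lang in source_langs}


# Toy corpus (invented for illustration): the second entry has no French.
corpus = [
    {"en": "Thank you", "fr": "Merci", "es": "Gracias"},
    {"en": "Good morning", "es": "Buenos días"},
]

filled = [fill_missing_sources(ex, ["en", "fr", "es"]) for ex in corpus]

for row in filled:
    print(row)
```

In this sketch the second example keeps its English and Spanish sentences and simply marks French as missing, instead of being discarded from the training set.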

Chinese e-commerce giant Alibaba has trained an NMT system with image descriptions in multiple languages. Their assumption is that descriptions of the same visual content in different languages should be approximately similar. Their research paper notes that images have become an important source for humans to learn and acquire knowledge, so that “the visual signal might be able to disambiguate certain semantics”. They found that combining images with self-explanatory narrative descriptions gave better results. https://arxiv.org/pdf/1811.11365.pdf

Microsoft is looking into what is called “transfer learning” to see whether it can be applied to low-resource situations. In transfer learning there is an “assisting” source language, just as there are “relay languages” in a number of international institutions. For example, parallel corpora exist between English and some Indian languages, but there are very few parallel corpora between Indian languages. Hence, the paper says, it is natural to use English as an assisting language for inter-Indian language translation. However, transfer learning poses a number of issues, including word order divergence, which can create inconsistencies. Pre-ordering the assisting language to match the word order of the source language significantly improved the quality of the translation. https://arxiv.org/pdf/1811.00383.pdf
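As a rough illustration of what pre-ordering means, the toy Python sketch below rearranges an English sentence from subject-verb-object order into the subject-object-verb order that is common in many Indian languages. The role tags ("S", "V", "O"), the function name and the sample sentence are assumptions made for this example; a real system would presumably rely on a syntactic parser and a much richer set of reordering rules.

```python
def preorder_svo_to_sov(tagged_tokens):
    """Move verb tokens to the end of the sentence.

    `tagged_tokens` is a list of (token, role) pairs, where role is a
    hypothetical hand-assigned tag: "S" (subject), "V" (verb), "O" (object).
    The relative order of tokens within each group is preserved.
    """
    non_verbs = [tok for tok, role in tagged_tokens if role != "V"]
    verbs = [tok for tok, role in tagged_tokens if role == "V"]
    return non_verbs + verbs


# Toy English sentence in subject-verb-object order.
sentence = [("Ram", "S"), ("eats", "V"), ("an", "O"), ("apple", "O")]

reordered = preorder_svo_to_sov(sentence)
print(" ".join(reordered))  # Ram an apple eats
```

The reordered English is no longer grammatical, but its word order is closer to that of a verb-final language, which is the kind of alignment that pre-ordering aims for.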

Photo credit: Persnickety Prints @Unsplash
