Working at the intersection of linguistics and artificial intelligence to advance machine translation performance

by Pisana Ferrari – cApStAn Ambassador to the Global Village

Chris Callison-Burch, associate professor in Computer and Information Science at the University of Pennsylvania, has in recent years developed novel cost- and time-saving methods for machine translation, drawing on crowdsourcing and images. In a recent interview published on Medium, he describes a new translation method that looks very promising for some of the world’s most difficult-to-translate languages. His research group used images (for instance, of a cat), plus vast quantities of crowdsourced data identifying the words linked to each image, to create “reverse-engineered dictionaries” for 10,000 words in 100 languages. Images “are somehow interlingual”, he says: an image of a cat is the same whether the word for it is English or Indonesian. Simplified representations of the images were used to train the model. “This language-independent way of thinking about words through their visual representations allows us to use a new type of data to learn translations.” In a recent post we mentioned research along the same lines by Chinese e-commerce giant Alibaba, which has trained an NMT system with image descriptions in multiple languages.
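For readers curious how images can act as a bridge between languages, here is a minimal sketch of the general idea, not the research group's actual system. It assumes each word is already linked to a few image feature vectors; the tiny three-number vectors, the word lists and the function names (`mean_vector`, `cosine`, `induce_dictionary`) are all invented for illustration. Each word is represented by the average of its image vectors, and a word is paired with the word in the other language whose images look most similar.

```python
import math

def mean_vector(vectors):
    """Average several image feature vectors into one vector per word."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def induce_dictionary(source_images, target_images):
    """Pair each source word with the target word whose images are most similar."""
    src = {word: mean_vector(vecs) for word, vecs in source_images.items()}
    tgt = {word: mean_vector(vecs) for word, vecs in target_images.items()}
    return {
        s_word: max(tgt, key=lambda t_word: cosine(s_vec, tgt[t_word]))
        for s_word, s_vec in src.items()
    }

# Made-up toy features standing in for real image representations (assumption).
english = {
    "cat": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "car": [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]],
}
indonesian = {
    "kucing": [[0.85, 0.15, 0.05]],
    "mobil": [[0.05, 0.05, 0.95]],
}

dictionary = induce_dictionary(english, indonesian)
print(dictionary)  # {'cat': 'kucing', 'car': 'mobil'}
```

In this toy setting no bilingual text is needed: the only link between "cat" and "kucing" is that their pictures resemble each other, which is the sense in which images are language-independent.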

Medium article: https://medium.com/penn-engineering/translating-the-worlds-languages-e100f98c4c1d

cApStAn blog post on NMT research: https://www.capstan.be/promising-research-on-machine-translation-for-low-resource-languages/