In South Africa a new AI-based project offers the opportunity for a large corpus of scientific terms to be translated into six indigenous languages

by Pisana Ferrari – cApStAn Ambassador to the Global Village

Language really matters when it comes to scientific communication and education. How can you have conversations about science when the scientific terms you wish to talk about do not exist in your own language? Sibusiso Biyela is a science communicator and journalist from South Africa, based in the Johannesburg Metropolitan Area, who is trying to bring science closer to people by reporting in an indigenous African language. Biyela says he can speak to his friends and family about a whole host of topics in his mother tongue, isiZulu, but is forced to switch to English when talking about science. Although Zulu (or isiZulu, as the language is called in South Africa) is spoken by some 10 million people, it simply doesn’t have the words for communicating science, he says. To write about DNA, a word for which there is no Zulu counterpart, a writer would have to go back and explain terms like molecules, cells, and genetic codes, which themselves have no Zulu counterparts.

One of Biyela’s recent science assignments was about a new fossil discovery in South Africa. As there is no word for dinosaur in his language, he decided to go with the more general term isilwane sasemandulo, Zulu for “ancient animal”. He described Ledumahadi mafube as a creature of elephant-like size and shape, with a head and neck resembling a stiff, tiny-headed snake and a tail resembling that of a monitor lizard. Other seemingly simple words required similarly careful treatment, e.g. “origin of nature or creatures” for evolution and “old bones found in the ground” for fossils. He says that in the final story he spent as many words explaining the meanings of scientific terms as he did describing the discovery itself. “So my news piece wasn’t just a news piece. It was an attempt to tell a science story in a language that science overlooked, to help right a societal wrong.”

A new AI project may help to bridge the science language divide in South Africa and improve trust and confidence in science. This is of course all the more essential in times of the COVID-19 pandemic.

Consequences of the science language divide

The issue is not so much that the people one is communicating with cannot speak or understand English, Biyela claims, but that discussing science in your own language makes it easier to culturally “own” it. He quotes Kwesi Kwaa Prah, a prominent sociology professor and fervent proponent of the “decolonization” of knowledge, who wrote in a 2007 report to the Foundation for Human Rights in South Africa that “without literacy in the languages of the masses, science and technology cannot be culturally-owned by Africans. Africans will remain mere consumers, incapable of creating competitive goods, services and value-additions in this era of globalization.” The language divide has its roots in the colonial history of South Africa. For most of the 20th century, English was the country’s only language of science. During apartheid, the government invested funds to “transform” Afrikaans into a language of science, a privilege which was not given to the other indigenous languages. Even today, science examinations in public schools are conducted only in English and Afrikaans, says Biyela. The language divide has therefore alienated large parts of South Africa’s population from scientific education and enterprise, he adds.

New AI project for six indigenous languages

Today, Biyela is one of the partners of the ‘Masakhane MT: Decolonise Science’ project, an initiative aimed at building a multilingual parallel corpus of African research by translating African pre-print research papers into six African languages: isiZulu, Northern Sotho, Yoruba, Hausa, Luganda and Amharic. Masakhane is a grassroots organisation whose mission is to strengthen and spur natural language processing (NLP) research in African languages. The data to be translated will be pre-print papers published on AfricArxiv, a community-led digital archive for African research, which currently has around 600 articles in a number of different fields (life sciences, engineering, law, social sciences, mathematics). The aim of the dataset is to facilitate the development of tools and products (e.g. translation models and tools, teaching aids, etc.) and the creation of glossaries and new terminology. A pilot run is due to be performed; the translations from the pilot will be assessed by the project’s linguistic partners and data curators to ensure the quality and format of the translations and to allow for feedback before work continues.

Bringing science closer to people in COVID-19 times

As Biyela rightly points out, understanding science in one’s own language not only makes it easier to trust the institution of science but can also go a very long way towards reaching the people who need science the most. During the pandemic, many African governments did not communicate information about COVID-19 in all the languages spoken in their countries, or did not do so in a timely manner (this has regrettably also happened in many other countries around the world, as discussed in another of our blog posts). This means that people risked missing out on potentially life-saving information, and this in turn can affect compliance with containment measures or travel restrictions, which are critical to protecting not only individuals but also the community at large. Right now, information is needed more than ever because there is an alarming amount of vaccine hesitancy, Biyela says.


Pichon, A. The language of science. Nat. Chem. 13, 1025–1026 (2021)

“Masakhane MT: Decolonise Science”

“Decolonizing Science Writing in South Africa”, Sibusiso Biyela, TheOpenNotebook, February 12, 2019

See also our other articles about the language of science and decolonization of knowledge

In South Africa a new movement seeks to “decolonise” the teaching of mathematics

Letting English become the lingua franca in science means that studies in other languages often go unread

Today, more than 90% of the indexed articles in the natural sciences are published in English, that wasn’t always the case

Photo credit: Hands intertwined against the colours and backdrop of the South African flag, Hasanov Jeyhun at Shutterstock