Introduction

When it comes to scientific communication and education, language matters. Discussing science in local indigenous languages can not only reach more people who do not speak English or French as a first language, but also integrate the facts and methods of science into cultures that have been denied access to it in the past. As sociology professor Kwesi Kwaa Prah put it in a 2007 report to the Foundation for Human Rights in South Africa, “Without literacy in the languages of the masses, science and technology cannot be culturally-owned by Africans. Africans will remain mere consumers, incapable of creating competitive goods, services and value-additions in this era of globalization.”

During the COVID-19 pandemic, many African governments did not communicate about COVID-19 in the most widely spoken languages in their countries. ∀ et al. (2020) showed that machine translation tools failed to translate COVID-19 surveys because the only data available to train the models was religious text. They also noted that the necessary scientific terms did not exist in the respective African languages.

We therefore propose to build a multilingual parallel corpus of African research by translating African preprint research papers released on AfricArxiv into six diverse African languages.

Proposed Dataset and Use Cases

When it comes to scientific communication, language matters. Jantjies (2016) demonstrates its importance in STEM education: students perform better when taught mathematics in their home language. Language also matters in scientific communication because it can dehumanise the people that science chooses to study. As Robyn Humphreys noted at the #LanguageMatters seminar at UCT Heritage 2020, “During the continent’s colonial past, language – including scientific language – was used to control and subjugate and justify marginalisation and invasive research practices”.

Discussing science in local indigenous languages can not only reach more people who do not speak English as a first language, but also integrate the facts and methods of science into cultures that have been denied access to it in the past.

As sociology professor Kwesi Kwaa Prah put it in a 2007 report to the Foundation for Human Rights in South Africa, “Without literacy in the languages of the masses, science and technology cannot be culturally-owned by Africans. Africans will remain mere consumers, incapable of creating competitive goods, services and value-additions in this era of globalization.” (Prah, 2007). When science becomes “foreign” or non-African, and one has to assume another identity just to theorise about and practise science, the result is a subjugation of the mind: mental colonization.

There is substantial distrust of science, particularly among black South Africans, many of whom can cite examples of how it has been abused as a tool of oppression. In addition, the oppressive apartheid government weaponised the communication and education of science, which has sown lasting distrust among citizens who only ever hear science discussed in English.

Through government-funded efforts, European-derived languages such as Afrikaans, English, French and Portuguese have been used as vessels of science, but African indigenous languages have not been given the same treatment. Modern digital tools such as machine learning offer new, low-cost opportunities to communicate scientific terms and ideas in African indigenous languages.

During the COVID-19 pandemic, many African governments did not communicate about COVID-19 in the most widely spoken languages in their countries. ∀ et al. (2020) demonstrated how difficult it was to translate COVID-19 surveys when the only data available to train the models was religious text. They also noted that the necessary scientific terms did not exist in the respective African languages.

Use cases:

  • A machine translation tool for AfricArxiv to aid translation of their research to and from African languages
  • Newly developed terminology will be submitted to the respective language boards for addition to official glossaries, further improving scientific communication
  • A machine translation tool for African universities to ensure the accessibility of their publications
  • A machine translation tool for science journalists to help distribute their work widely across the African continent
  • A machine translation tool to aid the translation of impactful STEM university curricula into African languages

Personnel

Jade Abbott is a Staff Engineer at Retro Rabbit South Africa, where she works primarily in NLP, and holds an MSc in Computer Science from the University of Pretoria. She is a thought leader in NLP in production and African NLP (especially machine translation), and has published and spoken at numerous conferences around the world, including the Deep Learning Indaba, ICLR 2020 and the UN World Data Forum. In 2019 she co-founded Masakhane, an initiative to spur NLP research in Africa, which she leads; its members have collectively published over 15 works in the past year and are leading the conversation around geographic and language diversity in NLP in Africa.

Dr. Johanna Havemann is a trainer and consultant in [Open] Science Communication and [digital] Science Project Management. Her work experience covers NGOs, a science startup and international institutions, including the UN Environment Programme. Through her label Access 2 Perspectives, and with a focus on digital tools for science, she aims to strengthen science communication globally, and in Africa in particular, through Open Science. For the past two years, she has placed additional focus on language diversity in science, and AfricArxiv, the pan-African Open Access portal she coordinates, provides information and accepts submissions in 12 official African languages.

Sibusiso Biyela has been a science communicator at ScienceLink since 2016, where he has worked with South African universities and international research institutions to produce science communication content for audiences including policymakers, the research community and the lay public. He is an experienced thought leader on the decolonisation of science and science communication. He has given talks on the topic at international conferences and contributed to discussions on platforms such as national radio and international podcasts. He is the author of the well-regarded article “Decolonizing Science Writing in South Africa”, in which he has been vocal about creating scientific terms in isiZulu.