Machine translation (MT) is a popular Natural Language Processing (NLP) task which involves the automatic translation of sentences from a source language to a target language. Machine translation models are very sensitive to the domain they were trained on, which limits their generalization to other domains of interest, such as the legal or medical domains. The problem is more severe for low-resource languages like Yorùbá, where most of the datasets available for training come from the religious domain, such as JW300.
How can we train MT models to generalize to multiple domains or quickly adapt to new domains of interest? In this challenge, you are provided with 10,000 Yorùbá-to-English parallel sentences sourced from multiple domains, including news articles, TED talks, movie transcripts, radio transcripts, software localization texts, and other short articles curated from the web. Your task is to train a multi-domain MT model that performs well in practical use cases.
The goal of this challenge is to build a machine translation model that translates sentences from Yorùbá to English across several domains, such as news articles, daily conversations, spoken dialogue transcripts, and books. Your solution will be judged by how semantically similar your predicted translations are to the reference translations.
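The exact scoring metric is not specified here, but semantic-similarity evaluation is typically done by embedding the predicted and reference sentences and comparing the vectors with cosine similarity. The sketch below is a minimal illustration using toy vectors; a real evaluation would obtain the embeddings from a multilingual sentence encoder (that choice of encoder is an assumption, not part of the challenge specification).

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two embedding vectors:
    # dot(u, v) / (||u|| * ||v||), in [-1, 1]; 1 means identical direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for real sentence embeddings of a
# predicted translation and its reference translation.
prediction_vec = [0.20, 0.80, 0.10]
reference_vec = [0.25, 0.75, 0.05]

score = cosine_similarity(prediction_vec, reference_vec)
```

A higher score indicates the prediction is semantically closer to the reference; identical embeddings score 1.0.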
The translation models developed will assist human translators in their work, help English speakers communicate better with native speakers of Yorùbá, and improve the automatic translation of Yorùbá web pages into English.
This competition is one of five NLP challenges we will be hosting on Zindi as part of AI4D’s ongoing African language NLP project, and is a continuation of the African language dataset challenges we hosted earlier this year. You can read more about the work here.
Masakhane is an open, participatory, grassroots NLP research initiative for Africans, by Africans, with the aim of putting African NLP research on the map by holistically tackling the problems facing society. Founded in 2019, Masakhane has since brought together over 400 researchers from over 30 African countries, published state-of-the-art research covering over 38 African languages at various venues, and built a thriving community. Masakhane's participatory approach has enabled researchers without formal scientific training to contribute data, evaluations, and models to published research by focusing on lowering the barriers to entry.