Description

Twi is arguably the most recognizable Akan language, natively spoken in parts of southern and central Ghana, as well as parts of Côte d’Ivoire. By some estimates it has approximately 20 million native speakers [1]. It is a tonal language. It comprises at least four distinct dialects, namely Asante, Akuapem, Fante and Bono. Asante is arguably the most widely spoken dialect.

Building a database for Twi language in Africa

Pertinence

In practice, knowing this language alone allows one to navigate most parts of Ghana. You are likely to find someone who at the very least understands the language in every part of Ghana.

Example Sentences

English | Twi
What is going on here? | Ɛdeɛn na ɛrekɔ so wɔ ha?
Wake up | Sɔre
She comes here every Friday | Ɔba ha Fiada biara
Learn to be wise | Sua nyansa

Prior Work

The website and app Kasahorow [2] has a rather limited set of translations. The JW300 dataset [3] has over half a million extremely noisy English to (Akuapem) Twi parallel sentence pairs. A noisy Wikipedia is available [4], but its volume and quality leave much to be desired. Some 700 sentence pairs are available in the TypeCraft (TC) Akan Corpus [9].
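To make "noisy" concrete, the following is a minimal sketch of the kind of heuristic filtering that is commonly applied to parallel sentence pairs before training. It is not the pipeline used by any of the projects cited here. The function names, the length-ratio threshold of 3.0 and the deliberately bad sample pairs are all assumptions made for illustration; only the well-formed sentences come from the examples above.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFC-normalize the text and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def clean_pairs(pairs, max_len_ratio=3.0):
    """Yield (english, twi) pairs that pass a few simple noise heuristics.

    max_len_ratio is a hypothetical threshold on the word-count ratio
    between the two sides; it would need tuning on real data.
    """
    seen = set()
    for en, tw in pairs:
        en, tw = normalize(en), normalize(tw)
        if not en or not tw:
            continue  # one side is empty
        if en.lower() == tw.lower():
            continue  # source copied into target, i.e. not translated
        n_en, n_tw = len(en.split()), len(tw.split())
        if max(n_en, n_tw) / min(n_en, n_tw) > max_len_ratio:
            continue  # lengths differ too much, likely misaligned
        key = (en.lower(), tw.lower())
        if key in seen:
            continue  # exact duplicate of an earlier pair
        seen.add(key)
        yield en, tw


# Illustrative input: good pairs from the examples above plus made-up noise.
raw = [
    ("What is going on here?", "Ɛdeɛn na  ɛrekɔ so wɔ ha?"),  # stray double space
    ("Wake up", "Sɔre"),
    ("Wake up", "Sɔre"),                                       # duplicate
    ("Learn to be wise", ""),                                  # missing target
    ("She comes here every Friday", "Ɔba ha Fiada biara"),
    ("Copyright notice", "Copyright notice"),                  # untranslated
    ("Yes", "Ɔba ha Fiada biara"),                             # misaligned
]

cleaned = list(clean_pairs(raw))
for pair in cleaned:
    print(pair)
```

Heuristics like these only remove the most obvious noise. They cannot detect fluent but incorrect translations, which is one reason manually curated pairs remain valuable.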

A recent study [5], which investigated the quality of these data sources in the context of FastText embeddings constructed on Twi, found them to be woefully insufficient. It is the only modern computing study of Twi that we are aware of. We have since replicated and slightly improved these FastText embeddings [6], trained and shared a variety of embeddings from the Transformers/BERT family through the HuggingFace model repo [7], and crowdsourced close to 1000 manually curated translation pairs. We have also developed a fairly decent English-Twi translator (a transformer-based seq2seq model), which we hope to refine on the data that this collaboration yields. You can find more information on our official and GitHub pages [8].

Researcher Profile: Paul Azunre

Paul Azunre holds a PhD in Computer Science from MIT and has served as a Principal Investigator on several DARPA research programs. He founded Algorine, a Research Lab dedicated to advancing AI/ML and identifying scenarios where they can have a significant social impact. Paul also co-founded NLP Ghana, an open source initiative focused on using NLP and Transfer Learning with Ghanaian and other low-resource languages. He frequently contributes to peer-reviewed journals and has served as a program committee member at some ICML workshops in AutoML and NLP. He is the author of the “Transfer Learning for NLP” book recently published by Manning Publications.

Researcher Profile: Lawrence Adu-Gyamfi

Lawrence Adu-Gyamfi is a subsea installation engineer by profession with a background in aerospace engineering. He currently devotes his off-work time to the activities of NLP Ghana, assisting with the collection and preprocessing of data and preparing it for use in the models being tested internally. He serves as the NLP Ghana Director of Product, overseeing how the different teams of NLP Ghana work together.

Researcher Profile: Esther Appiah

Esther Appiah holds a BA in Modern Languages from the Kwame Nkrumah University of Science and Technology and a Diploma in French Studies from the Université D’Abomey Calavi, Centre Beninois des Langues Étrangères (CEBELAE) in Benin. She is currently pursuing an MPhil in Theoretical Linguistics at UiT, Norway. Her language specialties include French, English and Akan. She has extensive experience of language use across various sectors and industries, with core tasks in writing, proofreading, translation and research. She works with Ghana NLP as a data researcher and ultimately hopes to specialise in Computational Linguistics to help streamline NLP processes for underrepresented African languages in the digital space.

Researcher Profile: Felix Akwerh

Felix is currently enrolled in a Master's program in Computer Science at the Kwame Nkrumah University of Science and Technology. He augments his education with online classes and Machine Learning events. He is actively involved in the development of natural language processing with Ghana NLP. He co-authored a paper on Artificial Intelligence in Construction for submission. He holds a BSc in Mathematics from the Kwame Nkrumah University of Science and Technology. He worked with the UITS-KNUST, where he helped build a transport system and other software projects. His research interests lie in Machine Learning and NLP, specifically in neural conversational models.

Researcher Profile: Salomey Osei

Salomey holds a Master of Philosophy in Applied Mathematics and an MSc in both Industrial Mathematics and Machine Intelligence. She is a recipient of Google and Facebook scholarships and a MasterCard Foundation Scholarship, amongst others. She is the team lead for unsupervised methods for Ghana NLP and a co-organizer of the Women in Machine Learning and Data Science (WiMLDS) Accra chapter. She is also passionate about mentoring students, especially females in STEM, and her long-term goal is to share her knowledge with others by lecturing.

Researcher Profile: Samuel Owusu

Samuel Owusu is currently working as a data scientist for the Ministry of Finance, Ghana. He holds a BSc in Information Technology from Ghana Technology University College. He was a member of the team that won first prize at Ghana’s maiden national hackathon, organised by the World Bank and the Ministry of Water Resources and Sanitation. His research interest lies in NLP, specifically Automatic Speech Recognition for low-resourced languages. He is involved in developing open source curricula in Machine Learning and Computer Science for young girls. Samuel is a life-long learner.

Researcher Profile: Cynthia Amoaba

Cynthia Amoaba is a graduate of Chemu Senior High School and a student at the University for Development Studies. She is an Ambassador and founder of the first Women in STEM (WiSTEM) chapter in Ghana. She also founded the STEM club in her high school and looks forward to extending it to schools in deprived areas. Currently, she tutors high school students in her community in Physics and Maths and helps train school dropouts in bead and soap making. She is a science enthusiast and looks forward to learning more through her involvement in the development of NLP with Ghana NLP.

Researcher Profile: Salomey Afua Addo

Salomey Afua Addo is the founder of Lighted Hope, a non-governmental organization that seeks to promote literacy and coding skills among children living in slums in Ghana. She holds an MSc in Mathematical Sciences from the African Institute for Mathematical Sciences and a certificate in business management from the European School of Management and Technology, Berlin. She is the coding instructor for The Love Academy in the USA. Currently, she serves as a volunteer at Ghana NLP, where she plays a vital role in collecting and preprocessing data for the data team. Salomey Afua Addo lives a purpose-driven life.

Researcher Profile: Edwin Buabeng-Munkoh

Edwin Buabeng-Munkoh is currently working as a Software Engineer at Huawei Technologies Ghana Limited. He holds a BSc in Computer Engineering from Kwame Nkrumah University of Science and Technology. He is enrolled in the Data Science Mentorship program with Notitia AI. He is actively involved in the development of natural language processing as a volunteer at Ghana NLP, where he helps with preprocessing data for the data team. Alongside his daily work he has completed multiple online courses on Data Science, AI and NLP. His research interests lie in Machine Learning, NLP and Computer Vision. He plans to help build a world where language is not a barrier to education and good healthcare.

Researcher Profile: Nana Boateng

Nana Boateng holds a PhD in Statistics from The University of Memphis. He has three master's degrees, in Statistics, Mathematics and Economics. He has worked as a Data Scientist for companies such as Fiat Chrysler Automobiles, Nice Systems Inc and Baptist Memorial Hospital. He is interested in applying the principles of mathematics, statistics and economics to solve problems in healthcare, finance and several other industries. He has several peer-reviewed publications to his name. He is the founder of Rest Analytics, which advises companies on how to apply machine learning to increase efficiency and productivity. He contributes to Ghana NLP in the area of supervised learning.

Partners

Partners in Cracking the Language Barrier for a Multilingual Africa

References

  1. https://en.wikipedia.org/wiki/Twi
  2. https://www.kasahorow.org/
  3. Ž. Agić and I. Vulić, JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages, ACL Proceedings 2019
  4. https://ak.wikipedia.org/
  5. J. Alabi et al., Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi, LREC Proceedings 2020
  6. https://medium.com/swlh/ghana-nlp-computational-mapping-of-ghanaian-languages-edf60c56bcce
  7. https://huggingface.co/Ghana-NLP
  8. https://ghananlp.github.io/
  9. https://www.researchgate.net/publication/323998547_TypeCraft_Akan_Corpus_Release_10

Disclaimer

The designations employed and the presentation of material on this map do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or any area or of its authorities, or concerning the delimitation of its frontiers or boundaries. Final boundary between the Republic of Sudan and the Republic of South Sudan has not yet been determined. Final status of the Abyei area is not yet determined.