Crowdsourcing Parallel Corpus For English-Oromo Neural Machine Translation Using Community Engagement Platform
6 authors, including:
All content following this page was uploaded by Sisay Chala on 16 February 2021.
Abstract
Although Afaan Oromo is the most widely spoken language in the Cushitic family, with more than
fifty million speakers in the Horn of Africa and East Africa, it is surprisingly resource-scarce from a
technological point of view. The growing volume of useful documents written in English motivates
the investigation of machine translation systems that can make those documents easily accessible in
local languages. This paper deals with translation from English to Afaan Oromo and vice versa
using Neural Machine Translation. This task has not been well explored because of the limited
size and diversity of the available corpora. However, using a bilingual corpus of just over 40k sentence
pairs that we collected, this study shows promising results. About a quarter of this corpus was collected
via a Community Engagement Platform (CEP) that was implemented to enrich the parallel corpus through
crowdsourced translations.
Keywords: Oromo, neural machine translation, under-resourced languages
4.4 Evaluation
Machine translation systems can be evaluated by
human or automatic evaluation methods. Even though
human evaluation is accurate, it is costly and
suffers from limited efficiency compared with
automatic evaluation. Therefore, we used BLEU
5.1.1 Translation
Figure 3 Single translation of a given sentence
Both translation and verification are done in
batches so as not to overwhelm the contributor:
contributions are divided into manageable
episodes of 5 at a time, as shown in
Figure 7. The choice of 5 is arbitrary, with the
objective of gathering more input without
imposing stress on users.
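As an illustration of the batching described above, the following minimal Python sketch splits a queue of pending items into episodes of at most 5. It is only a sketch under stated assumptions: the function name `episodes`, the constant `EPISODE_SIZE`, and the sample queue are hypothetical and are not taken from the CEP implementation.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Episode size stated in the text; the authors describe the choice as arbitrary.
EPISODE_SIZE = 5


def episodes(items: Sequence[T], size: int = EPISODE_SIZE) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical queue of source sentences awaiting translation or verification.
pending = [f"sentence {i}" for i in range(1, 13)]
batches = list(episodes(pending))
```

With 12 pending items, the contributor would be shown two full episodes of 5 followed by a final episode of 2, so the last batch may be shorter than the rest.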
5.1.2 Verification