Abstract—Due to the rapid advancement of different neural network architectures, the task of automated translation from one language to another is now in a new era of Machine Translation (MT) research. In the last few years, Neural Machine Translation (NMT) architectures have proven to be successful for resource-rich languages, trained on large datasets of translated sentences, with variations of NMT algorithms used to train the models. In this study, we explore different NMT algorithms – Bidirectional Long Short Term Memory (LSTM) and Transformer based NMT – to translate the Bangla-to-English language pair. For the experiments, we used different datasets, and our experimental results outperform the existing performance by a large margin on different datasets. We also investigated the factors affecting data quality and how they influence the performance of the models. This shows a promising research avenue for enhancing NMT for the Bangla-English language pair.

Index Terms—Machine Translation, Bangla-to-English, Neural Machine Translation, Transformer, Bidirectional LSTM
I. Introduction

The task of automated translation from one language to another has undergone rapid advancement due to the emergence of deep neural networks. Neural networks were already being studied for machine translation in the 20th century [1]; however, only very recently has the field reached state-of-the-art performance [2] with large-scale deployment. In the Machine Translation (MT) community, a neural network based model for machine translation is referred to as Neural Machine Translation (NMT), where a sequence-to-sequence (seq2seq) [3] model is most commonly used. Although Statistical Machine Translation (SMT) has been successful in the community over the last decade, its complete pipeline becomes complex with the addition of more features, saturating the translation quality. This limitation of SMT and the success of deep learning have led the MT community to focus on NMT approaches for machine translation.

Typically, an NMT model consists of an encoder and a decoder. The first network, the encoder, processes a source sentence (e.g., Bangla) into a vector (also referred to as a context vector or thought vector). A second network, called the decoder, uses this vector to predict the words in the target language (e.g., English). Traditionally, NMT uses a variant of Recurrent Neural Networks (RNNs); however, other architectures such as a Convolutional Neural Network (CNN) can also be used for the encoder.
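The following minimal PyTorch sketch illustrates this encoder-decoder structure (an illustrative toy, not the exact configuration of our experiments; the vocabulary sizes, embedding and hidden dimensions, and the assumed <sos> index of 1 are placeholders):

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Reads a source (e.g., Bangla) token sequence into context vectors."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # A bidirectional LSTM reads the sentence forwards and backwards.
        self.rnn = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)

    def forward(self, src):
        outputs, (hidden, cell) = self.rnn(self.embedding(src))
        return outputs, hidden, cell

class Decoder(nn.Module):
    """Predicts target (e.g., English) tokens one step at a time."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, trg_token, hidden, cell):
        output, (hidden, cell) = self.rnn(self.embedding(trg_token), (hidden, cell))
        return self.out(output.squeeze(1)), hidden, cell

# Toy usage: encode a 7-token source sentence, then greedily predict the
# first target word from an assumed <sos> index of 1.
enc, dec = Encoder(vocab_size=8000), Decoder(vocab_size=6000)
src = torch.randint(0, 8000, (1, 7))
_, hidden, cell = enc(src)
# Merge the forward/backward encoder states into the decoder's initial state.
hidden = hidden.transpose(0, 1).reshape(1, 1, -1).transpose(0, 1).contiguous()
cell = cell.transpose(0, 1).reshape(1, 1, -1).transpose(0, 1).contiguous()
logits, hidden, cell = dec(torch.tensor([[1]]), hidden, cell)
next_token = logits.argmax(dim=-1)

In this sketch the decoder sees the source only through the merged final encoder state; an attention mechanism (Section IV-B) would additionally let it consult all encoder outputs at every step.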
The advantage of NMT is that it learns the mapping from the input to the output in an end-to-end fashion, trained as a single big neural network. The model jointly learns its parameters in order to maximize the performance of the translation output [4]–[6], and it requires minimal domain knowledge. In addition, unlike Statistical Machine Translation (SMT), NMT does not need to tune and store different models such as the translation, language, and reordering models. The study of Cho et al. [6] reports that NMT models require only a fraction of the memory needed by traditional SMT models.
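The end-to-end objective referred to above can be written explicitly (a standard formulation in the spirit of [4]–[6]; the notation here is ours). Given a parallel corpus of sentence pairs (x, y), training seeks

\theta^{*} = \arg\max_{\theta} \sum_{(x,y)} \log p(y \mid x; \theta), \qquad p(y \mid x; \theta) = \prod_{t=1}^{|y|} p(y_t \mid y_{<t}, x; \theta),

where x is the source (Bangla) sentence, y the target (English) sentence, and y_{<t} the previously generated target words. Every parameter of the encoder and decoder is updated against this single objective, which is what makes the training end-to-end.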
Since NMT emerged, it has been providing state-of-the-art performance for various language pairs; however, the literature also reports its limitations, such as dealing with long sentences [7]. In order to deal with such issues, attention-based mechanisms have been introduced, in which the model jointly learns to align and translate. Various attention mechanisms have been proposed in the literature [8], [9]; however, the Transformer architecture [9] has become well known to the community. It is based on self-attention, as discussed in detail in Section IV-B2.
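As a brief sketch of the self-attention computation at the core of the Transformer (the scaled dot-product attention of [9]; the toy dimensions and random inputs below are placeholders, and multi-head attention and masking are omitted):

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention [9]: every position of the
    sequence X (seq_len x d_model) attends to every other position."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])           # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (seq_len, d_k)

# Toy example: a 4-token sentence with d_model = d_k = 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (4, 8)

Because every position attends to every other position directly, the path between distant words has constant length, which is one reason the Transformer copes better with long sentences than a purely recurrent encoder [9].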
The literature on NMT techniques reports higher performance for resource-rich language pairs such as English to German [10] and English to French [11]. Compared to resource-rich languages, the literature on NMT for the Bangla-English language pair is relatively sparse. More details of the current state of the art can be found in the next section. In this study, we aim to shed light on this area. Our contributions include (i) conducting experiments using different NMT approaches, and (ii) consolidating publicly available data from different sources and evaluating them using these approaches.

The structure of this paper is as follows. Section II provides a brief overview of the existing work on Bangla MT systems. In Section III, we discuss the datasets that we use in this study. We present the approaches that we use for our experiments in Section IV. In Section V, we discuss the results of our experiments. Finally, we conclude our work in Section VI.

The research leading to these results has been supported and funded by Cognitive Insight Limited – https://cogniinsight.com/.

II. Related Work

The first MT research for the Bangla-English language pair, along with other Indic languages, was introduced in 1991 [12]. There have been many endeavours for the Bangla-English