2018, International Journal of Advanced Computer Science and Applications
Recurrent neural networks are powerful tools that give excellent results in various tasks, including Natural Language Processing tasks. In this paper, we use the Gated Recurrent Unit (GRU), a recurrent neural network implementing a simple gating mechanism, to improve the diacritization of Arabic. The GRU is evaluated for diacritization against the state-of-the-art results obtained with Long Short-Term Memory (LSTM), a powerful RNN architecture that gives the best-known results in diacritization. The evaluation covers two performance aspects: error rate and training runtime.
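One reason a GRU can train faster than an LSTM at the same hidden size is its smaller gating budget: three gate blocks (update, reset, candidate) instead of four (input, forget, output, cell candidate). A minimal sketch of that parameter-count comparison, with illustrative sizes chosen here rather than taken from the paper:

```python
def rnn_param_count(input_size: int, hidden_size: int, n_gates: int) -> int:
    """Parameters of one recurrent layer: each gate block has an
    input-to-hidden matrix, a hidden-to-hidden matrix, and a bias."""
    return n_gates * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)

# A GRU has 3 gate blocks; an LSTM has 4, so a GRU layer of the
# same width carries 25% fewer parameters.
gru = rnn_param_count(input_size=64, hidden_size=256, n_gates=3)
lstm = rnn_param_count(input_size=64, hidden_size=256, n_gates=4)
print(gru, lstm, gru / lstm)  # 246528 328704 0.75
```

The 3:4 ratio holds for any layer width, which is why the runtime gap the abstract measures tends to persist across model sizes.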
International Journal on Document Analysis and Recognition (IJDAR), 2015
This paper presents a sequence transcription approach for the automatic diacritization of Arabic text. A recurrent neural network (RNN) is trained to transcribe undiacritized Arabic text into fully diacritized sentences. We use a deep bidirectional long short-term memory (LSTM) network that builds high-level linguistic abstractions of text and exploits long-range context in both input directions. This approach differs from previous approaches in that no lexical, morphological, or syntactical analysis is performed on the data before it is processed by the net. Nonetheless, when the network is post-processed with our error-correction techniques, it achieves state-of-the-art performance, yielding average diacritic and word error rates of 2.09% and 5.82%, respectively, on samples from 11 books. For the LDC ATB3 benchmark, this approach reduces the diacritic error rate by 25%, the word error rate by 20%, and the last-letter diacritization error rate by 33% over the best published results.
2019 2nd International Conference on Computer Applications & Information Security (ICCAIS), 2019
Diacritization of Arabic text is an interesting yet challenging problem, with applications ranging from speech synthesis to helping students learn the Arabic language. Like many other tasks in Arabic language processing, the limited effort invested in this problem and the lack of available (open-source) resources hinder progress towards solving it. This work provides a critical review of the currently existing systems, measures, and resources for Arabic text diacritization. Moreover, it introduces a much-needed, free-for-all cleaned dataset that can easily be used to benchmark any work on Arabic diacritization. Extracted from the Tashkeela Corpus, the dataset consists of 55K lines containing about 2.3M words. After constructing the dataset, existing tools and systems are tested on it. The experiments show that the neural Shakkala system significantly outperforms traditional rule-based approaches and other closed-source tools, with a Diacritic Error Rate (DER) of 2.88% compared with 13.78%, the best DER for a non-neural approach (obtained by the Mishkal tool).
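The DER figures quoted above are character-level: the fraction of characters whose predicted diacritic label differs from the reference. A minimal sketch of that metric, using hypothetical Latin label strings in place of actual Arabic diacritics:

```python
def diacritic_error_rate(gold: list, pred: list) -> float:
    """DER: fraction of characters whose predicted diacritic label
    differs from the reference label. Both inputs are per-character
    label sequences of equal length."""
    assert len(gold) == len(pred)
    errors = sum(g != p for g, p in zip(gold, pred))
    return errors / len(gold)

# Toy per-character labels ("a" = fatha, "u" = damma, "" = none);
# a real evaluation would align actual Arabic characters.
gold = ["a", "u", "", "a", "i"]
pred = ["a", "a", "", "a", "i"]
print(diacritic_error_rate(gold, pred))  # 0.2
```

Published DER numbers also vary in whether case endings and undiacritized characters are counted, so comparisons across papers need the same counting convention.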
Sally Ezz, 2021
The common approach is to build the language model over a static vocabulary, but Arabic is a morphologically rich language, so the number of out-of-vocabulary (OOV) words is large, and this causes the diacritization algorithm to fail. The BPE method segments words into variable-length, open subword units drawn from a fixed subword-unit dictionary. The researcher emphasized the superiority of the open-vocabulary method over the word- and character-based methods generally used.
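The point of subword units is that an OOV word never fails outright: it decomposes into known pieces, falling back to single characters in the worst case. A greedy longest-match sketch of that segmentation step (the vocabulary and transliterated word below are invented for illustration; real BPE also learns the vocabulary by iterative pair merging):

```python
def segment(word: str, vocab: set) -> list:
    """Greedy longest-match segmentation of a word into subword units
    from a fixed vocabulary; unknown spans fall back to single
    characters instead of producing one opaque OOV token."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):       # try the longest span first
            piece = word[i:j]
            if piece in vocab or j == i + 1:    # single chars always allowed
                pieces.append(piece)
                i = j
                break
    return pieces

# Hypothetical Latin-transliterated subword vocabulary.
vocab = {"kitab", "kit", "ab", "u", "ha"}
print(segment("kitabuha", vocab))  # ['kitab', 'u', 'ha']
```

Because suffixes like the possessive "ha" become their own units, a morphologically rich word unseen in training still maps onto familiar pieces.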
Global and local attention for automatic Arabic text diacritization, 2023
Automatic Arabic diacritization is the task of restoring diacritic or vowel marks in non-vocalized Arabic text. This task has shown its importance in the natural language processing (NLP) field, and it helps people with specific learning difficulties to access Arabic web content. To tackle the problem, we propose a letter-based encoder-decoder that uses established deep learning attention models known as Luong attention. Training of the models exhibited unstable loss, and, as expected, among the proposed models the one using local predictive attention achieved the best word and letter error rates. The best diacritic error rate achieved on the test data is about 26.80%. Nevertheless, the models need improvement in future work.
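In global Luong attention, the decoder state is scored against every encoder state and the scores are normalized with a softmax; local attention restricts this to a window around a predicted position. A sketch of the global variant with the simple dot score, on invented 2-dimensional states:

```python
import math

def luong_dot_attention(query: list, keys: list) -> list:
    """Global Luong attention with the dot score: align the decoder
    state (query) against every encoder state (keys), then softmax
    the scores into attention weights."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical encoder states; the second aligns best with the query.
weights = luong_dot_attention([1.0, 0.0], [[0.1, 0.9], [1.0, 0.0], [0.5, 0.5]])
print(weights)  # the largest weight falls on the second encoder state
```

Local predictive attention would apply the same scoring only to encoder states near a learned center position, which is what the abstract reports working best for letter-level diacritization.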
IEEE Access, 2021
Arabic diacritics are signs used in Arabic orthography to represent essential morphophonological and syntactic information. It is common practice to leave out these diacritics in written Arabic, and most Arabic electronic texts lack them, which complicates processing those texts for various Natural Language Processing purposes. Diacritized words are necessary for applications such as machine translation, sentiment analysis, and speech synthesis. To address this problem, several studies have proposed automatic systems to restore diacritics in Arabic texts. The present paper presents an in-depth survey of the 56 most recent Arabic diacritization studies. Based on the diacritization approach, the studies are grouped into four sections by method: rule-based, simple statistical, hybrid, and neural networks. While rule-based methods such as morphological analyzers and lexicon retrieval were the earliest approaches, results indicate that they are still valuable tools that can aid the diacritization process. Effective statistical methods that produced diacritics with acceptable accuracy include Hidden Markov Models, n-grams, and Support Vector Machines; they are often accompanied by either rule-based or neural-network components in hybrid systems. Neural networks, specifically Bidirectional Long Short-Term Memory, reached very high diacritization accuracy. Studies employing neural networks focused on evaluating and comparing the efficacy of several types of neural networks or hybrids of them, testing alternative input units or suggested schemes for partial diacritization. The study synthesizes the results of the studies, identifies research gaps, and offers recommendations for future research.
INDEX TERMS: Arabic text diacritization, neural networks, Arabic corpora, deep learning, Arabic natural language processing.
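The Hidden Markov Model approach the survey covers treats diacritization as tagging: hidden states are diacritic tags, observations are characters, and Viterbi decoding finds the most likely tag sequence. A toy sketch with invented probabilities and transliterated characters, not any surveyed system's actual model:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi decoding: the most likely hidden tag sequence for an
    observation sequence under an HMM."""
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), [s]) for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            prob, path = max(
                (V[-1][ps][0] * trans_p[ps][s] * emit_p[s].get(o, 0.0), V[-1][ps][1])
                for ps in states
            )
            row[s] = (prob, path + [s])
        V.append(row)
    prob, path = max(V[-1].values())
    return path

# Toy model: two diacritic tags over two transliterated characters.
states = ["fatha", "damma"]
start_p = {"fatha": 0.6, "damma": 0.4}
trans_p = {"fatha": {"fatha": 0.7, "damma": 0.3},
           "damma": {"fatha": 0.4, "damma": 0.6}}
emit_p = {"fatha": {"k": 0.5, "t": 0.5},
          "damma": {"k": 0.2, "t": 0.8}}
print(viterbi(["k", "t"], states, start_p, trans_p, emit_p))
```

Real systems estimate the transition and emission tables from a diacritized corpus; the decoding step is unchanged.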
Advances in Human-Computer Interaction
Arabic diacritization is the task of restoring diacritics or vowels for Arabic texts, which are mostly written without them. This task, when automated, improves results on some natural language processing tasks; hence, it is necessary for the field of Arabic language processing. In this paper, we present a comparative study of automatic diacritization systems. One uses a variant of the hidden Markov model. The other is a pipeline that includes a Long Short-Term Memory deep learning model, a rule-based correction component, and a statistical component. Additionally, we propose some modifications to those systems. We trained and tested the systems on the same benchmark dataset using the evaluation metrics proposed in previous work. The best system achieves a diacritic error rate (DER) of 9.42% and a word error rate (WER) of 22.82%.
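The WER reported alongside DER is stricter: a whole word counts as wrong if any one of its characters carries a wrong diacritic, which is why WER (22.82%) sits well above DER (9.42%). A minimal sketch of that word-level metric, on hypothetical per-character label lists:

```python
def diacritization_wer(gold_words: list, pred_words: list) -> float:
    """Word error rate for diacritization: a word is wrong if any of
    its per-character diacritic labels differs from the reference."""
    assert len(gold_words) == len(pred_words)
    wrong = sum(g != p for g, p in zip(gold_words, pred_words))
    return wrong / len(gold_words)

# Two toy words, each a list of per-character diacritic labels;
# one wrong label in the second word makes that whole word an error.
gold = [["a", "u"], ["i", ""]]
pred = [["a", "u"], ["i", "a"]]
print(diacritization_wer(gold, pred))  # 0.5
```

A single character error thus costs 1/N at the word level but only 1/(total characters) at the diacritic level, which explains the consistent DER-versus-WER gap across the papers above.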
International Journal of Computer Science and Telecommunications (IJCST), ISSN 2047-3338 ,Vol. 3, Issue 11, Pages 43-48, 2012
The recognition of unconstrained handwriting continues to be a difficult task for computers despite decades of active research. Handwritten text offers great challenges such as character and word segmentation, character recognition, variation between handwriting styles, differing character sizes, the absence of font constraints, and background clarity. This paper primarily discusses online handwriting recognition methods for Arabic words, which are widely used across the Middle East and North Africa. Because of a characteristic of Arabic script, namely the connectivity between characters, segmenting an Arabic word is very difficult. We introduce a recurrent neural network for online handwritten Arabic word recognition. The key innovation is a recently introduced recurrent-neural-network objective function known as connectionist temporal classification (CTC). The system consists of an advanced recurrent neural network with an output layer designed for sequence labeling, partially combined with a probabilistic language model. Experimental results show recognition rates of about 79% on unconstrained Arabic words, significantly higher than the roughly 70% achieved by a previously developed hidden Markov model based recognition system.
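CTC sidesteps the segmentation problem the abstract describes: the network emits one label (or a blank) per input frame, and decoding collapses that frame-wise path into a word by merging repeats and dropping blanks. A sketch of that collapse step, with a hypothetical transliterated output path:

```python
def ctc_collapse(path: list, blank: str = "-") -> str:
    """CTC decoding step: merge consecutive repeated labels, then
    drop the blank symbol, mapping a per-frame output path to a
    label sequence without any explicit segmentation."""
    out = []
    prev = None
    for label in path:
        if label != prev:        # merge consecutive repeats
            if label != blank:   # drop blanks
                out.append(label)
        prev = label
    return "".join(out)

# Per-frame outputs for a hypothetical transliterated word.
print(ctc_collapse(["-", "q", "q", "-", "a", "a", "l", "-"]))  # "qal"
```

The blank also lets CTC represent genuinely doubled letters, since `["a", "-", "a"]` collapses to "aa" while `["a", "a"]` collapses to "a".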
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Diacritic restoration has gained importance with the growing need for machines to understand written texts. The task is typically modeled as a sequence labeling problem and currently Bidirectional Long Short Term Memory (BiLSTM) models provide state-of-the-art results. Recently, Bai et al. (2018) show the advantages of Temporal Convolutional Neural Networks (TCN) over Recurrent Neural Networks (RNN) for sequence modeling in terms of performance and computational resources. As diacritic restoration benefits from both previous as well as subsequent timesteps, we further apply and evaluate a variant of TCN, Acausal TCN (A-TCN), which incorporates context from both directions (previous and future) rather than strictly incorporating previous context as in the case of TCN. A-TCN yields significant improvement over TCN for diacritization in three different languages: Arabic, Yoruba, and Vietnamese. Furthermore, A-TCN and BiLSTM have comparable performance, making A-TCN an efficient alternative over BiLSTM since convolutions can be trained in parallel. A-TCN is significantly faster than BiLSTM at inference time (270%∼334% improvement in the amount of text diacritized per minute).
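The causal/acausal distinction above comes down to padding: a causal (TCN) convolution pads only on the left, so output t sees inputs up to t, while an acausal (A-TCN) convolution pads symmetrically, so output t also sees future inputs. A toy 1-d convolution illustrating the difference on an impulse signal (sizes and kernel invented for illustration):

```python
def conv1d(seq: list, kernel: list, causal: bool) -> list:
    """Toy same-length 1-d convolution. Causal: pad only on the left,
    so output t depends on inputs <= t (TCN). Acausal: pad on both
    sides, so output t also depends on future inputs (A-TCN)."""
    k = len(kernel)
    pad_left = k - 1 if causal else (k - 1) // 2
    pad_right = 0 if causal else k - 1 - pad_left
    padded = [0.0] * pad_left + list(seq) + [0.0] * pad_right
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(seq))]

seq = [0.0, 0.0, 1.0, 0.0, 0.0]   # an impulse at position 2
avg = [1 / 3, 1 / 3, 1 / 3]
print(conv1d(seq, avg, causal=True))   # response only at t >= 2
print(conv1d(seq, avg, causal=False))  # response already at t = 1: future context
```

Because every output position is computed independently, both variants parallelize across the sequence, which is the inference-speed advantage the abstract reports over BiLSTM.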
International Journal of Informatics and Communication Technology (IJ-ICT)
Artificial neural networks have proved their efficiency in a large number of research domains. In this paper, we apply artificial neural networks to Arabic text for language modeling, text generation, and missing-text prediction. On the one hand, we adapt recurrent neural network architectures to model the Arabic language in order to generate correct Arabic sequences. On the other hand, convolutional neural networks are parameterized, based on specific features of Arabic, to predict missing text in Arabic documents. We demonstrate the power of our adapted models in generating and predicting correct Arabic text compared to the standard model. The models were trained and tested on known free Arabic datasets. Results were promising, with sufficient accuracy.