Papers by Hend Al-Khalifa
International Journal of Distributed Sensor Networks, Jul 1, 2016
The number of individuals who suffer from visual impairment is increasing rapidly. The most significant barrier to improving the lives of blind and visually impaired people is their inability to navigate independently and safely. Indoor navigation systems for blind and visually impaired people aim to guide them in navigating independently in familiar and unfamiliar environments. Our system aims to provide an assistive technology for blind and visually impaired individuals by exploiting popular existing technologies that are often used by blind individuals, such as the smartphone. The system provides users with guidance statements that help them navigate from their current positions to desired destinations. The system consists of four main components: mapping, positioning, navigation, and interface. To implement these components, three main applications need to be developed: one on the localization server, one on the communication server, and one on the smartphone, each of which is located in a different place but connected to the others. A functionality test and a blind test were conducted to evaluate the system. The system proved its ability to aid blind individuals effectively.
2015 Tenth International Conference on Digital Information Management (ICDIM), 2015
Text-to-Speech (TTS) synthesizers are useful tools for visually impaired people. However, current TTS synthesizers are lacking in several aspects. Different factors of TTS synthesizers can be evaluated for use in specific applications. As part of an ongoing project for building an Arabic navigation system for the blind, we need to choose the best Arabic TTS to be used within the system. For that purpose, we conducted an evaluation of five Arabic TTS synthesizers for mobile devices, namely VoiceOver, Uspeech, Acapela, Adel, and SVOX. First, we evaluated their intelligibility and naturalness using the modified Mean Opinion Score (MOS) scale with visually impaired subjects. The results showed the subjects' bias toward VoiceOver and Acapela, since they were familiar with them. Due to the bias in the MOS test results, we performed another evaluation, namely the Semantically Unpredictable Sentences (SUS) test, to evaluate the intelligibility of the systems. Our results show that Adel has the highest score among the TTS synthesizers, while Acapela has the lowest score.
Proceedings of the 6th International Conference on Management of Emergent Digital EcoSystems, 2014
Corpora have opened up many new areas of research in the linguistic domain, which would never have been possible without them. Moreover, corpora have proved their usefulness not only in the linguistic domain but also in other domains, such as medical, economic, legal, pharmacological, etc. English is considered to have the richest language resources in most of these domains, while Arabic reveals a gap in most of them. This paper tries to fill the gap in the pharmacological domain, especially for drugs, by constructing the first Arabic drug corpus, which is composed of 202 drugs, each saved in a text file with UTF-8 character encoding. The corpus was manually annotated with four entity types: generic (for the drug's generic name), brand (for trade names), chemical formula, and class (for drug classes).
Encyclopedia of Mobile Computing and Commerce
2017 16th International Conference on Information Technology Based Higher Education and Training (ITHET), 2017
Research methods courses have been taught across disciplines through a combination of hands-on experience and research activities. The applied activities come in different forms, such as conducting literature reviews, applying peer review, writing papers, etc. In-class activities that are outside the silo of research methods are not widely popular. The activity in mind is the “Marshmallow Challenge”, which has gained popularity in domains such as Management, yet it has not been reported to be used in research methods courses. In this paper we report our experience in applying the “Marshmallow Challenge” in a research methods course taught to master's students in an Information Technology degree program and discuss the lessons learned from this experience.
Lecture Notes in Computer Science, 2023
Despite the noticeable progress that we recently witnessed in Arabic pre-trained language models (PLMs), the linguistic knowledge captured by these models remains unclear. In this paper, we conducted a study to evaluate available Arabic PLMs in terms of their linguistic knowledge. BERT-based language models (LMs) are evaluated using Minimum Pairs (MP), where each pair represents a grammatical sentence and its contradictory counterpart. MPs isolate specific linguistic knowledge to test the model's sensitivity in understanding a specific linguistic phenomenon. We cover nine major Arabic phenomena from Verbal sentences, Nominal sentences, Adjective Modification, and Idafa construction. The experiments compared the results of fifteen Arabic BERT-based PLMs. Overall, among all tested models, CAMeL-CA and GigaBERT outperformed the other PLMs by achieving the highest overall accuracy.
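The minimum-pair evaluation described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: `score_sentence` stands in for whatever sentence-level score a BERT-based model provides, and the toy scorer and the English example pairs are hypothetical placeholders chosen for readability.

```python
from typing import Callable, Iterable, Tuple

# A pair is (grammatical sentence, ungrammatical counterpart).
MinimumPair = Tuple[str, str]


def minimum_pair_accuracy(
    pairs: Iterable[MinimumPair],
    score_sentence: Callable[[str], float],
) -> float:
    """Fraction of pairs where the grammatical sentence gets the higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    correct = sum(
        1 for good, bad in pairs if score_sentence(good) > score_sentence(bad)
    )
    return correct / len(pairs)


def toy_scorer(sentence: str) -> float:
    """Hypothetical stand-in for a model score: penalizes one known error."""
    return -1.0 if "cats sleeps" in sentence else 0.0


# Hypothetical English pairs, used only to keep the sketch readable.
example_pairs = [
    ("the cats sleep", "the cats sleeps"),
    ("the dog barks", "the dog bark"),
]
accuracy = minimum_pair_accuracy(example_pairs, toy_scorer)
```

The toy scorer only detects the error in the first pair, so it prefers the grammatical sentence in one of the two pairs. A real evaluation would replace `toy_scorer` with a model-derived score and report accuracy per linguistic phenomenon.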
IEEE Access, 2023
Predatory publishing venues publish questionable articles and pose a global threat to the integrity and quality of the scientific literature. They have given rise to the dark side of scholarly publishing and their effects have reached political, societal, economic, and health aspects. Given their consequences and proliferation, several solutions have been developed to help detect them; however, these solutions are manual and time-consuming. Researchers, students, and readers need a tool that automatically detects predatory venues and their violations. In this study, we propose an intelligent framework that can automatically detect predatory venues and their violations using different artificial intelligence techniques. This work contributes through the following: (1) creating a dataset of 9,866 journals annotated as predatory and legitimate, and (2) proposing an intelligent framework for classifying a venue as legitimate or predatory, with appropriate reasoning. Our framework was evaluated using seven different machine learning and deep learning models, including Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Networks (NNs), Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), Bidirectional Encoder Representations from Transformers (BERT), and A Lite BERT (ALBERT), and different feature representation techniques. The results showed that the CNN model outperformed the other models in the journal classification task, with an F1 score of 0.96. For appropriate reasoning of the provisioning task, the SVM model achieved the best micro F1 of 0.67.
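For readers unfamiliar with the micro F1 metric reported above, a minimal sketch of how it is computed is given below. It assumes a setting in which each venue can carry a set of labels, which the abstract does not state explicitly; the label names are hypothetical and do not come from the paper's dataset.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all items and labels, then compute a single F1 score."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        tp += len(gold_labels & predicted_labels)
        fp += len(predicted_labels - gold_labels)
        fn += len(gold_labels - predicted_labels)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for two venues.
gold_labels = [{"label_a", "label_b"}, {"label_c"}]
predicted_labels = [{"label_a"}, {"label_c", "label_d"}]
score = micro_f1(gold_labels, predicted_labels)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the micro F1 is 2/3 as well.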
arXiv (Cornell University), Nov 25, 2020
Journal of Software Engineering and Applications, 2012
Arabic Sign Language (ArSL) is the native language of the Arab deaf community. ArSL allows deaf people to communicate among themselves and with non-deaf people around them to express their needs, thoughts, and feelings. Unlike spoken languages, Sign Language (SL) depends on hands and facial expressions, instead of sounds, to express thoughts. In recent years, interest in automatically translating sign language for different languages has increased. However, only a small number of these works focus on ArSL. Basically, these works translate word by word without taking care of the semantics of the translated sentence or the translation rules of Arabic text to Arabic sign language. In this paper we present a proposed system for semantically translating Arabic text to Arabic SignWriting in the jurisprudence of prayer domain. The system is designed to translate Arabic text by applying Arabic Sign Language (ArSL) grammatical rules as well as semantically looking up the words in a domain ontology. The results of qualitatively evaluating the system, based on a SignWriting expert's judgment, proved the correctness of the translation results.
Recently, subjectivity and sentiment analysis of Arabic has received much attention from the research community. In the past two years, an enormous number of references in the field have emerged compared to what has been published in previous years. In this paper, we present an updated survey of the emerging research on subjectivity and sentiment analysis of Arabic. We also highlight the challenges and future research directions in this field.
IEEE Access, 2019
Language learners face difficulties while reading and comprehending Arabic text because of the interwoven nature of Arabic script. In this paper, we present Arcode, an automatic web-based system that simplifies instruction on Arabic word decoding and comprehension by applying color-coding to Arabic text. The system is designed to help Arabic language learners analyze and identify attached affixes (prefixes and suffixes), particles, and silent letters in the selected text and sentences, based on the morphological structure of the sentence. This is done by encoding and presenting color conversion per word and character, using a certain color code. The proposed system is a combination of color-coding, transliteration, and text-to-speech technologies that creates an educational tool for learning the Arabic language. It also provides web services for developers who want to integrate the system into their own applications. Index terms: Arabic as a second language, Arabic language, color-coding, language learner, natural language processing, web application, world wide web.
Natural language processing (NLP) is the branch of Artificial Intelligence that is concerned with enabling computers to understand human languages. Implementing new NLP tools that effectively and efficiently process Arabic is not an easy task; such tools usually face challenges related to various NLP tasks. However, with the movement of many NLP companies to provide their NLP services via Web APIs, building NLP systems that can benefit from such APIs is becoming a reality. This paper explores the available NLP Web APIs that support the Arabic language. It also discusses their strengths and weaknesses and provides suggestions for future use.
Stemming has been shown to be effective in many natural language processing (NLP) applications, such as document classification, machine translation, and information retrieval (IR). This paper compares the performance of nine stemmers for the Arabic language on microblog IR. These stemmers are: Information Science Research Institute (ISRI), Tashaphyne, Khoja, AL-stem, Light10, Motaz, Assem, Farasa, and ARLStem. Each stemmer was studied independently using the EveTAR dataset on a specific information retrieval task to obtain relevant query tweets. The performance of the nine stemmers was evaluated using BM25, precision at 30, and Mean Average Precision (MAP). The results show that root-based stemmers (i.e., ISRI and Khoja) outperformed the others.
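The two retrieval metrics named in this abstract, precision at 30 and Mean Average Precision, can be sketched as below. This is a generic illustration of the standard definitions, not the evaluation code used in the paper; the query and tweet identifiers are hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 30) -> float:
    """Fraction of the top-k positions that hold a relevant document."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average of the per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    scores = [
        average_precision(ranked, qrels.get(query, set()))
        for query, ranked in runs.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical ranked results and relevance judgments for two queries.
runs = {"q1": ["t1", "t2", "t3", "t4"], "q2": ["t5", "t6"]}
qrels = {"q1": {"t1", "t3"}, "q2": {"t6"}}
map_score = mean_average_precision(runs, qrels)
p_at_2 = precision_at_k(runs["q1"], qrels["q1"], k=2)
```

Comparing stemmers in this setup would mean producing one `runs` dictionary per stemmer (for example from a BM25 ranking over stemmed text) and comparing the resulting scores on the same relevance judgments.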
Data in Brief, Feb 1, 2019
Grammar error correction can be considered as a "translation" problem, such that an erroneous sentence is "translated" into a correct version of the sentence in the same language. This can be accomplished by employing techniques like Statistical Machine Translation (SMT) or Neural Machine Translation (NMT). Producing models for SMT or NMT for the goal of grammar correction requires monolingual parallel corpora of a certain language. This data article presents a monolingual parallel corpus of Arabic text called A7'ta (). It contains 470 erroneous sentences and their 470 error-free counterparts. This is an Arabic parallel corpus that can be used as a linguistic resource for Arabic natural language processing (NLP), mainly to train sequence-to-sequence models for grammar checking. Sentences were manually collected from a book that has been prepared as a guide for correctly writing and using Arabic grammar and other linguistic features. Although there are a number of available Arabic corpora of errors and corrections [2], such as QALB [10] and Arabic Learner Corpus [11], the data we present in this article is an effort to increase the number of freely available Arabic corpora of errors and corrections by providing a detailed error specification and leveraging the work of language experts.
In this paper, we describe our efforts on the shared task of sarcasm and sentiment detection in Arabic (Abu Farha et al., 2021). The shared task consists of two subtasks: Sarcasm Detection (Subtask 1) and Sentiment Analysis (Subtask 2). Our experiments were based on fine-tuning seven BERT-based models with data augmentation to solve the imbalanced data problem. For both tasks, the MARBERT model with data augmentation outperformed the other models, increasing the F-score by 15%, which shows the effectiveness of our approach.
arXiv (Cornell University), May 22, 2018
Sentiment Analysis in Arabic is a challenging task due to the rich morphology of the language. Moreover, the task is further complicated when applied to Twitter data, which is known to be highly informal and noisy. In this paper, we develop a hybrid method for sentiment analysis of Arabic tweets in a specific Arabic dialect, the Saudi dialect. Several features were engineered and evaluated using a backward feature selection method. Then a hybrid method that combines a corpus-based and a lexicon-based method was developed for several classification models (two-way, three-way, four-way). The best F1-scores for these models were 69.9, 61.63, and 55.07, respectively.
Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments
Loneliness and social isolation are serious and widespread problems among older people, affecting their physical and mental health, quality of life, and longevity. In this paper, we propose a ChatGPT-based conversational companion system for elderly people. The system is designed to provide companionship and help reduce feelings of loneliness and social isolation. The system was evaluated with a preliminary study. The results showed that the system was able to generate responses that were relevant to the created elderly personas. However, it is essential to acknowledge the limitations of ChatGPT, such as potential biases and misinformation, and to consider the ethical implications of using AI-based companionship for the elderly, including privacy concerns. CCS Concepts: • Human-centered computing → Human computer interaction (HCI) → HCI design and evaluation methods → User studies • Computing methodologies → Artificial intelligence → Natural language processing → Language generation
arXiv (Cornell University), Apr 5, 2023
arXiv (Cornell University), Nov 3, 2022
Automatic Arabic handwriting recognition is one of the recently studied problems in the field of Machine Learning. Unlike Latin languages, Arabic is a Semitic language that poses a harder challenge, especially with the variability of patterns caused by factors such as the writer's age. Most studies have focused on adults, with only one recent study on children. Moreover, many of the recent Machine Learning methods have focused on using Convolutional Neural Networks, a powerful class of neural networks that can extract complex features from images. In this paper we propose a convolutional neural network (CNN) model that recognizes children's handwriting with an accuracy of 91% on the Hijja dataset, a recent dataset built by collecting images of Arabic characters written by children, and 97% on the Arabic Handwritten Character Dataset (AHCD). The results showed a good improvement over the model proposed by the Hijja dataset authors, yet they reveal a bigger challenge to solve for children's Arabic handwritten character recognition. Moreover, we proposed a new approach that uses multiple models instead of a single model, based on the number of strokes in a character, and merged Hijja with AHCD, which reached an average prediction accuracy of 96%.