Social networks facilitate communication between people from all over the world. Unfortunately, the excessive use of social networks leads to the rise of antisocial behaviors such as the spread of online offensive language, cyberbullying (CB), and hate speech (HS). Detecting abusive/offensive language and hate speech has therefore become a crucial part of combating cyberharassment. Manual detection of cyberharassment is cumbersome, slow, and infeasible for rapidly growing data. In this study, we address the challenges of automatically detecting offensive tweets in the Arabic language. The main contribution of this study is the design and implementation of an intelligent prediction system encompassing a two-stage optimization approach to distinguish offensive from non-offensive text. In the first stage, the proposed approach fine-tunes pre-trained word embedding models by training them for several epochs on the training dataset; embeddings for the vocabulary of the new dataset are trained and added to the existing embeddings. In the second stage, it employs a hybrid approach that combines two classifiers, XGBoost and SVM, with a genetic algorithm (GA) that searches for the optimal hyperparameter values, mitigating the classifiers' difficulty in finding them. We tested the proposed approach on the Arabic Cyberbullying Corpus (ArCybC), which contains tweets collected from four Twitter domains: gaming, sports, news, and celebrities. The ArCybC dataset has four categories: sexual, racial, intelligence, and appearance. The proposed approach produced superior results: the SVM algorithm with the Aravec SkipGram word embedding model achieved an accuracy of 88.2% and an F1-score of 87.8%. INDEX TERMS: Arabic harassment dataset, deep learning, evolutionary algorithm, fine-tuned word embedding, hate speech, offensive language, optimization.
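The second stage described above, a genetic algorithm searching for classifier hyperparameters, can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names, the search bounds, and the GA settings (population size, mutation rate, tournament selection) are all assumptions, and `toy_fitness` is a hypothetical stand-in for the cross-validated score of an SVM or XGBoost model trained with the candidate hyperparameters.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Maximize `fitness` over real-valued genes constrained by `bounds`."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: each gene comes from either parent.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(ind):
        # Gaussian mutation, clamped so genes stay inside their bounds.
        out = []
        for gene, (lo, hi) in zip(ind, bounds):
            if rng.random() < mutation_rate:
                gene += rng.gauss(0.0, 0.1 * (hi - lo))
            out.append(min(hi, max(lo, gene)))
        return out

    def tournament(scored):
        # Pick the fittest of three randomly drawn individuals.
        return max(rng.sample(scored, 3), key=lambda s: s[0])[1]

    population = [random_individual() for _ in range(pop_size)]
    best_score, best = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        top_score, top = max(scored, key=lambda s: s[0])
        if top_score > best_score:
            best_score, best = top_score, top
        # Elitism: carry the best individual into the next generation.
        population = [best] + [
            mutate(crossover(tournament(scored), tournament(scored)))
            for _ in range(pop_size - 1)
        ]
    return best, best_score


def toy_fitness(params):
    # Hypothetical stand-in for a cross-validated score; it peaks at
    # log10(C) = 1 and log10(gamma) = -2. A real run would train and
    # evaluate a classifier here instead.
    log_c, log_gamma = params
    return -((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


# Assumed search ranges for log10(C) and log10(gamma).
BOUNDS = [(-3.0, 3.0), (-5.0, 1.0)]
best, best_score = genetic_search(toy_fitness, BOUNDS)
print("best log10(C), log10(gamma):", best, "score:", best_score)
```

Searching in log space is a common choice for parameters such as the SVM penalty `C`, whose useful values span several orders of magnitude; whether the study encoded its hyperparameters this way is not stated in the abstract.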
Finding the shortest path between two nodes is a common problem that arises in many applications, including games, robotics, and real-life problems. Because it deals with a large number of possibilities, parallel algorithms are well suited to this optimization problem, which has attracted many researchers from both industry and academia seeking the optimal path in terms of runtime, speedup, efficiency, and cost compared to sequential algorithms. In mountain climbing, finding the shortest path from the start node at the foot of the mountain to the destination node is a fundamental operation, and it raises some interesting issues that do not appear in a traditional two-dimensional space search. We present a parallel Ant Colony Optimization (ACO) algorithm that finds the shortest path in the mountain climbing problem using Apache Spark. The proposed algorithm guarantees the safety of the selected path by applying constraints that take into account the safe slope angle for the path. A generated dataset with variable sizes is used to evaluate the proposed algorithm in terms of runtime, speedup, efficiency, and cost. The experimental results show that the parallel ACO algorithm significantly (p < 0.05) outperformed the best sequential ACO. The parallel ACO algorithm is also compared with one of the most recent studies in the literature, which finds the best path for mountain climbing problems using the parallel A* algorithm with Apache Spark. The parallel ACO algorithm with Spark significantly outperformed the parallel A* algorithm. INDEX TERMS: Apache Spark, ant colony, parallel algorithm, path-finding problem, optimization.
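The evaluation criteria named in this abstract (runtime, speedup, efficiency, and cost) have standard textbook definitions, sketched below. The function name and the timing values are hypothetical and are not results from the paper; the abstract does not state which exact formulas the authors used, so the standard ones are assumed.

```python
def parallel_metrics(t_sequential, t_parallel, workers):
    """Standard parallel-performance metrics from two measured runtimes."""
    if t_sequential <= 0 or t_parallel <= 0 or workers < 1:
        raise ValueError("runtimes must be positive and workers >= 1")
    speedup = t_sequential / t_parallel   # how many times faster
    efficiency = speedup / workers        # speedup per worker
    cost = workers * t_parallel           # total worker-time consumed
    return {"speedup": speedup, "efficiency": efficiency, "cost": cost}


# Hypothetical timings: 120 s sequentially, 20 s on 8 workers.
metrics = parallel_metrics(120.0, 20.0, 8)
print(metrics)  # {'speedup': 6.0, 'efficiency': 0.75, 'cost': 160.0}
```

An efficiency below 1 means the workers were not fully used, typically because of communication and scheduling overhead.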
2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT)
Event detection is essential for decision makers to understand the events taking place in the world around them. Social media microblogging platforms play a significant role in our lives. One of these platforms is Twitter, which has an extremely high exchange rate and has accordingly become a valuable and relevant source for many political and social events. Event detection from social media has attracted the attention of researchers working on different natural languages, but extracting and detecting events from Arabic tweets is still under investigation. In this paper, we use the Python Natural Language Toolkit (NLTK) library to develop two classifiers for filtering and detecting extracted events from Arabic tweets. The first classifier filters the collected tweets in two passes: the first pass identifies the hashtags, while the second pass performs a shallow analysis of the tweets' content. The second classifier analyzes the text extracted from the tweets. As a case study, we present the tragic events of the Jordan flash floods near the Dead Sea. The model successfully filtered all the collected tweets and picked the ones describing the incidents within that region. The analyzed data revealed important information from which lessons can be drawn for the future. The solution can be generalized and adapted to other problems.
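The two-pass filtering described above can be illustrated with a small sketch. It uses only the standard-library `re` module rather than NLTK, and the hashtag set, keyword list, and sample tweets are hypothetical examples chosen for illustration, not data or rules from the paper.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")  # \w also matches Arabic letters


def pass_one_hashtags(tweets, target_hashtags):
    """Pass 1: keep tweets carrying at least one target hashtag."""
    return [t for t in tweets
            if set(HASHTAG_RE.findall(t)) & target_hashtags]


def pass_two_keywords(tweets, keywords, min_hits=1):
    """Pass 2: shallow content check by counting keyword tokens."""
    kept = []
    for tweet in tweets:
        tokens = set(re.findall(r"\w+", tweet))
        if len(tokens & keywords) >= min_hits:
            kept.append(tweet)
    return kept


# Hypothetical inputs: a "Dead Sea" hashtag and a "floods" keyword.
TARGET_HASHTAGS = {"البحر_الميت"}
KEYWORDS = {"سيول"}
tweets = [
    "#البحر_الميت سيول قوية تضرب المنطقة",  # hashtag and keyword
    "#البحر_الميت رحلة جميلة اليوم",        # hashtag only
    "سيول في مكان آخر",                     # keyword only
]

filtered = pass_two_keywords(
    pass_one_hashtags(tweets, TARGET_HASHTAGS), KEYWORDS)
```

Exact token matching misses inflected or prefixed word forms, which are common in Arabic, so a practical pipeline would likely add normalization or stemming before the second pass.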
Cyberbullying (CB) is classified as one of the severe forms of misconduct on social media. Many CB detection systems have been developed for many natural languages to face this phenomenon. However, Arabic is an under-resourced language that suffers from a lack of quality datasets in many computational research areas. This paper discusses the design, construction, and evaluation of a multi-dialect, annotated Arabic Cyberbullying Corpus (ArCybC), a valuable resource for Arabic CB detection and a motivation for future research directions in Arabic Natural Language Processing (NLP). The study describes the phases of ArCybC compilation. By way of illustration, it explores the corpus to discover strategies used in rendering Arabic CB tweets pulled from four Twitter groups: gaming, sports, news, and celebrities. Based on thorough analysis, we found that these groups were the most susceptible to harassment and cyberbullying. The collected tweets were filtered using a compiled harassment lexicon, which contains a list of multi-dialectal profane words in Arabic drawn from four categories: sexual, racial, physical appearance, and intelligence. To annotate ArCybC, we asked five annotators to manually classify 4,505 tweets under two labeling schemes: Offensive/non-Offensive and CB/non-CB. We conducted a rigorous comparison of different machine learning approaches applied to ArCybC to detect Arabic CB using two language models: bag-of-words (BoW) and word embedding. The experiments showed that a Support Vector Machine (SVM) with word embedding achieved an accuracy of 86.3% and an F1-score of 85%. The main challenges encountered during the ArCybC construction were the scarcity of freely available Arabic CB texts and deficiencies in text annotation.
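For readers unfamiliar with the two reported measures, the sketch below computes accuracy and the binary F1-score from gold and predicted labels. The function name and the label lists are hypothetical, and the abstract does not say whether the reported F1 is binary, macro-averaged, or weighted; the binary form with the offensive class as the positive class is assumed here.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, binary F1) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = offensive, 0 = non-offensive.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
```

Here 6 of 8 predictions are correct, and precision and recall are both 3/4, so accuracy and F1 each come to 0.75.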
Papers by Fatima Shannag