Keywords: parsing, categorial grammars, type theory, proof assistant. Abstract: This article presents the ICHARATE logical workbench project, dedicated to the study of multimodal categorial grammars. The workbench takes the form of libraries for the Coq proof assistant.
Abstract. Standard Arabic (SA) is an extremely rich natural language that has unfortunately received very little attention in the computational linguistics literature. In this paper, we propose to explore this fertile ground and take the first steps towards formalizing Arabic syntax and semantics by means of multimodal categorial grammars. We focus in particular on phenomena related to the construction of nominal sentences in SA, using relevant packages of lexically anchored structural rules.
This article presents the ICHARATE logical workbench project, dedicated to the study of multimodal categorial grammars. The workbench takes the form of libraries for the Coq proof assistant.
Standard Arabic (SA) is an extremely rich natural language that has unfortunately received very little attention in the computational linguistics literature. In this paper, we propose to explore this fertile ground and take the first steps towards formalizing Arabic syntax and semantics by means of multimodal categorial grammars. We focus in particular on phenomena related to the construction of nominal sentences in SA, using relevant packages of lexically anchored structural rules.
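In a categorial grammar, each word is assigned a type. A sentence is recognized when its word types reduce to the sentence type s. The Python sketch below shows only the two application rules of the basic (AB) calculus, applied to a hypothetical two-word lexicon for a verbless nominal sentence. It is not the authors' formalization. It also omits the multimodal modes and the lexically anchored structural rules that the abstracts refer to. Typing the predicate as np\s is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class Atom:
    name: str  # atomic type, e.g. "np" or "s"


@dataclass(frozen=True)
class Slash:
    result: "Cat"
    arg: "Cat"
    direction: str  # "/" expects arg on the right, "\\" expects it on the left


Cat = Union[Atom, Slash]
NP, S = Atom("np"), Atom("s")

# Hypothetical toy lexicon (transliterated): "al-walad marid" ~ "the boy [is] sick".
# The verbless predicate is typed np\s: it yields s once an np appears to its left.
LEXICON = {
    "al-walad": [NP],
    "marid": [Slash(S, NP, "\\")],
}


def combine(left: Cat, right: Cat) -> List[Cat]:
    """Forward (A/B, B => A) and backward (B, B\\A => A) application."""
    results = []
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        results.append(left.result)
    if isinstance(right, Slash) and right.direction == "\\" and right.arg == left:
        results.append(right.result)
    return results


def parse(words: List[str]) -> Set[Cat]:
    """CYK-style chart: chart[i][j] holds every category derivable for words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1].update(LEXICON.get(word, []))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a in chart[i][k]:
                    for b in chart[k][j]:
                        chart[i][j].update(combine(a, b))
    return chart[0][n]


print(S in parse(["al-walad", "marid"]))  # True
```

Reversing the word order yields no s, because the backward slash accepts its np argument only on the left. In a multimodal system, controlled structural rules are one way to license such reorderings.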
International Journal of Computing and Digital Systems
Topic modeling algorithms help us understand data better by extracting meaningful words from a text collection, but the results are often inconsistent and therefore difficult to interpret. Enriching the model with more contextual knowledge can improve coherence. Neural topic models have recently emerged, and BERT-based representations have driven the development of neural models in general. In this paper, we propose a model named AraBERTopic to extract news from Facebook pages. Our model combines the pre-trained BERT transformer model for the Arabic language (AraBERT) with the neural topic model ProdLDA. Compared with standard LDA, pre-trained BERT sentence embeddings produce more meaningful and coherent topics across different embedding models. Results show that our AraBERTopic model achieves a topic coherence of 0.579.
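The abstract reports a topic coherence of 0.579 but does not name the measure used. One widely used family of coherence measures scores a topic by how strongly its top words co-occur in a reference corpus. As an assumption, the sketch below uses the average normalized pointwise mutual information (NPMI) over word pairs, with plain document-level co-occurrence. Libraries such as gensim's `CoherenceModel` implement this and related measures (for example C_v) with sliding windows, so values from this sketch are not directly comparable to the reported figure.

```python
import math
from itertools import combinations
from typing import List, Sequence


def npmi_coherence(top_words: Sequence[str], documents: Sequence[List[str]]) -> float:
    """Mean NPMI over all pairs of a topic's top words (needs >= 2 words), in [-1, 1].

    P(w) and P(w1, w2) are document frequencies: the fraction of reference
    documents that contain the word(s) anywhere (no sliding window).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words: str) -> float:
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p12 = prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # never co-occur: NPMI's lower bound
        elif p12 == 1.0:
            scores.append(1.0)  # co-occur in every document: upper bound
        else:
            pmi = math.log(p12 / (prob(w1) * prob(w2)))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)
```

A topic whose top words always appear together scores close to 1. Mixing in a word from an unrelated topic pulls the average down.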
With the fast growth of mobile technology, social media has become an important way for people to share their thoughts and feelings. Businesses and governments can make better strategic decisions when they know what the public thinks, which makes sentiment analysis an important tool for understanding people's opinions. This article presents a deep-learning ensemble model for sentiment analysis. The proposed ensemble uses three deep-learning models as base classifiers: a Gated Recurrent Unit (GRU), a Long Short-Term Memory (LSTM) network, and a Bidirectional LSTM (BiLSTM). AraBERT converts the textual input data into representative embeddings. The stacked base models then capture long-range dependencies in the embeddings for a given class. A Support Vector Machine (SVM) then acts as a meta-classifier and combines the predictions made by the stacked deep-learning models. In addition, data augmentation with AraGPT was implemented to address the imbal...
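The stacking step described in this abstract can be sketched as follows, under several simplifying assumptions:

- Labels are binary.
- The base-model outputs are given as precomputed positive-class probabilities. The numbers below are hypothetical stand-ins for the GRU, LSTM, and BiLSTM predictions.
- The linear SVM meta-classifier is trained with Pegasos-style subgradient descent rather than a library SVM.

This is not the paper's implementation. In practice, the meta-features should come from out-of-fold predictions, so that the meta-classifier never sees outputs the base models produced on their own training data.

```python
from typing import List, Sequence


def meta_features(base_probs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build one meta-feature vector per sample from the base models' outputs.

    base_probs[m][i] is base model m's probability that sample i is positive.
    Features are centred at 0.5 so this toy meta-classifier can omit a bias term.
    """
    n_samples = len(base_probs[0])
    return [[probs[i] - 0.5 for probs in base_probs] for i in range(n_samples)]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Linear SVM via Pegasos-style subgradient descent on the hinge loss.

    Labels must be +1 / -1. Samples are visited in a fixed order (the original
    Pegasos samples them at random) so the result is deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            violated = yi * dot(w, xi) < 1  # hinge loss is active for this sample
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x) -> int:
    return 1 if dot(w, x) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical positive-class probabilities for four tweets from three base models.
    gru, lstm, bilstm = [0.9, 0.8, 0.3, 0.2], [0.7, 0.9, 0.4, 0.1], [0.8, 0.6, 0.2, 0.3]
    X = meta_features([gru, lstm, bilstm])
    w = train_linear_svm(X, [1, 1, -1, -1])
    print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

The meta-classifier learns one weight per base model. This is how stacking can rely more on the base models that are more reliable on held-out data.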
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Topic models extract meaningful words from a text collection, allowing for a better understanding of data. However, the results are often not coherent enough and are therefore harder to interpret. Adding more contextual knowledge to the model can enhance coherence. In recent years, neural network-based topic models have become available, and neural models have advanced thanks to BERT-based representations. In this study, we propose a model to extract news from the Aljazeera Facebook page. Our approach combines the neural topic model ProdLDA with the Arabic pre-trained BERT transformer model (AraBERT). The proposed model produces more expressive and consistent topics than ELMo across different topic model algorithms (ProdLDA and LDA), reaching a topic coherence of 0.883.
Proceedings of the 2nd International Conference on Networking, Information Systems & Security
Twitter is a social networking service on which users can share thoughts and interact with events. In this paper, the authors propose a distributed approach to the multilingual analysis of hashtags generated by Moroccan Twitter users, in order to discover and understand the trending subjects that attract the community. Analyzing Moroccan Twitter hashtags is challenging for two main reasons. First, Moroccan society is characterized by linguistic diversity, so hashtags are expressed in several languages. Second, the hashtags of Moroccan users may include spelling errors and abbreviations and contain no delimiters between words, which leads to misinterpretation. To address this, we process and mine Moroccan hashtags using the Apache Hadoop framework and natural language processing techniques, through a program we developed with open-source libraries. The result is a clean corpus, stored in Apache Hive so that analytic queries can be applied to it. Finally, we apply the K-means algorithm to cluster all hashtags into general topics. We then plot them on a map of Morocco, using the coordinates extracted from the tweets, to show where they were published.
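Hashtags carry no word delimiters. One common way to split them is dynamic-programming segmentation, which picks the split that maximizes the product of unigram word probabilities. This is not necessarily the method used in the paper, which relied on open-source libraries that are not named here. The sketch below assumes a tiny hypothetical frequency table. A real system would need counts covering every language present in the data, and would also have to handle the spelling variants and abbreviations mentioned above.

```python
import math
from functools import lru_cache
from typing import List, Tuple

# Hypothetical unigram counts; a real system would estimate them from large
# corpora covering every language that appears in the hashtags.
COUNTS = {
    "maroc": 50, "morocco": 40, "vote": 30, "elections": 25,
    "election": 20, "dima": 15, "maghrib": 15, "2021": 10,
}
TOTAL = sum(COUNTS.values())


def word_logprob(word: str) -> float:
    """Log unigram probability; unknown strings get a penalty that grows with length."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return -math.log(TOTAL) - len(word) * math.log(10)


def segment(hashtag: str, max_word_len: int = 20) -> List[str]:
    """Split a delimiter-free hashtag into its most probable word sequence."""
    text = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(i: int) -> Tuple[float, Tuple[str, ...]]:
        # Highest-scoring segmentation of text[i:] as (log-probability, words).
        if i == len(text):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(len(text), i + max_word_len) + 1):
            tail_score, tail_words = best(j)
            candidates.append((word_logprob(text[i:j]) + tail_score, (text[i:j],) + tail_words))
        return max(candidates)

    return list(best(0)[1])


print(segment("#MarocElections2021"))  # ['maroc', 'elections', '2021']
```

Memoizing `best` keeps the search quadratic in the hashtag length instead of exponential. The segmented words can then be normalized and clustered, for example with K-means as in the abstract.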
International Journal on Electrical Engineering and Informatics
Pre-trained word embedding models have become widely used in Natural Language Processing (NLP), but they disregard the context and sense of the text. In this paper, we study the capacity of the pre-trained BERT model (Bidirectional Encoder Representations from Transformers) for the Arabic language to classify Arabic tweets. We use a hybrid network of two well-known models, Bidirectional Long Short-Term Memory (BiLSTM) and Gated Recurrent Unit (GRU), inspired by the great achievements of deep learning algorithms. To this end, we fine-tuned the parameters of Arabic BERT (AraBERT) and applied it to three merged datasets to transfer its knowledge to Arabic sentiment analysis. Our experiments make two comparisons. In the word-embedding phase, we compare AraBERT with static pre-trained word embedding methods, namely AraVec and FastText. In the classification phase, we compare the hybrid model with a convolutional neural network (CNN), long short-term memory (LSTM), BiLSTM, and GRU, which are commonly used in sentiment analysis. The results show that the fine-tuned AraBERT model combined with the hybrid network achieved the best performance, with up to 94% accuracy.
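The hybrid network in this abstract combines BiLSTM and GRU layers with AraBERT embeddings. The exact arrangement is not described in the abstract. To show what a single GRU unit computes, the sketch below implements one common formulation of a GRU time step in plain Python. The weight names (`W_z`, `U_z`, `b_z`, and so on) are hypothetical. Conventions differ between papers and libraries, for example on whether the update gate weights the old or the new state, and on where the reset gate is applied. This is therefore not necessarily the exact cell used in the paper.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate(p: Dict, name: str, x: Vector, h: Vector, act) -> Vector:
    """Element-wise act(W_name @ x + U_name @ h + b_name)."""
    wx, uh = matvec(p["W_" + name], x), matvec(p["U_" + name], h)
    return [act(a + b + c) for a, b, c in zip(wx, uh, p["b_" + name])]


def gru_step(x: Vector, h: Vector, p: Dict) -> Vector:
    z = gate(p, "z", x, h, sigmoid)  # update gate
    r = gate(p, "r", x, h, sigmoid)  # reset gate
    n = gate(p, "n", x, [ri * hi for ri, hi in zip(r, h)], math.tanh)  # candidate state
    # Interpolate between the previous state and the candidate state.
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def run_gru(xs: List[Vector], hidden_size: int, p: Dict) -> Vector:
    """Feed a sequence left to right and return the final hidden state."""
    h = [0.0] * hidden_size
    for x in xs:
        h = gru_step(x, h, p)
    return h
```

Each new state is a convex combination of the previous state and a tanh-bounded candidate. Starting from zero, the hidden values therefore stay within (-1, 1). A bidirectional layer would run the same recurrence over the reversed sequence and concatenate the two final states.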
Topic modeling uses unsupervised machine learning to discover latent topics hidden within large collections of documents. A topic model can help with the comprehension, organization, and summarization of large amounts of text. It can also reveal hidden topics that vary across the texts of a corpus. Traditional topic models such as pLSA (probabilistic latent semantic analysis) and LDA lose performance on short texts, because each short text provides too little word co-occurrence information. One technique being developed to solve this problem combines topic models with pre-trained word embeddings (PWE) learned from an external corpus, with the aim of interpretable topic modeling on short texts. Recent advances in deep neural networks (DNN) and deep generative models have allowed neural topic models (NTM) to achieve flexibility and efficiency in topic modeling. There hav...
Indonesian Journal of Innovation and Applied Sciences (IJIAS)
Twitter sentiment analysis is the task of detecting opinions and sentiments in tweets using different algorithms. In our research, we analyzed and compared different machine learning algorithms (MLAs) for the classification task. For this purpose, we collected 37,875 Moroccan tweets during the COVID-19 pandemic, from 1 March 2020 to 28 June 2020. The analysis used six classification algorithms (Naive Bayes, Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Decision Tree, and Random Forest), with accuracy, recall, precision, and F-score as evaluation metrics. We then applied topic modeling to each of the three classified tweet categories (negative, positive, and neutral) using Latent Dirichlet Allocation (LDA), one of the most effective approaches for extracting discussed topics. The logistic regression classifier gave the best sentiment predictions, with an accuracy of 68.80%.
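The four evaluation measures named in this abstract can be computed from per-class counts of true positives, false positives, and false negatives. The abstract does not say how scores were averaged over the three sentiment classes. As an assumption, the sketch below macro-averages them, taking the unweighted mean over classes. scikit-learn's `precision_recall_fscore_support` with `average="macro"` computes the same kind of macro averages.

```python
from typing import Dict, Sequence


def classification_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }
```

Macro averaging weights the three classes equally. With imbalanced sentiment classes, this can differ noticeably from accuracy or from a support-weighted average.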
Papers by Houda Anoun