Academia.eduAcademia.edu

The development of a chatbot using Convolutional Neural Networks

SHS Web of Conferences

Chatbots are artificial intelligence systems that comprehend the intent, context, and sentiment of the user, interact properly with them leading to an increased development of their creation, the past few years. In this study, Convolutional Neural Networks (CNNs) are applied as the classifier and some specific tools for tokenization are used for the creation of a chatbot. Taking into account that it is difficult to apply any algorithm in text, we use a technique called “Word Embedding”, which converts a text into numbers in order to run text processing. Specifically, “Word2Vec” Word Embedding technique was applied. AlexNet, LeNet5, ResNet and VGGNet CNN architectures were utilised. These architectures were compared for their accuracy, f1 score, training time and execution time. The results obtained highlighted that there were significant differences in the performance of the architectures applied. The most preferable architecture of our study was LeNet-5 having the best accuracy com...

SHS Web of Conferences 139, 03009 (2022) ETLTC2022 https://doi.org/10.1051/shsconf/202213903009 The development of a chatbot using Convolutional Neural Networks Giorgos Tsakiris1, Christos Papadopoulos1, Giannis Patrikalos1, Konstantinos-Filippos Kollias1*, Nikolaos Asimopoulos1 and George F. Fragulis1 1 Department of Electrical and Computer Engineering, University of Western Macedonia, 501 00 Kozani, Greece Abstract. Chatbots are artificial intelligence systems that comprehend the intent, context, and sentiment of the user, interact properly with them leading to an increased development of their creation, the past few years. In this study, Convolutional Neural Networks (CNNs) are applied as the classifier and some specific tools for tokenization are used for the creation of a chatbot. Taking into account that it is difficult to apply any algorithm in text, we use a technique called “Word Embedding”, which converts a text into numbers in order to run text processing. Specifically, “Word2Vec” Word Embedding technique was applied. AlexNet, LeNet5, ResNet and VGGNet CNN architectures were utilised. These architectures were compared for their accuracy, f1 score, training time and execution time. The results obtained highlighted that there were significant differences in the performance of the architectures applied. The most preferable architecture of our study was LeNet-5 having the best accuracy compared to other architectures, the fastest training time, and the least losses whereas it was not very accurate on smaller datasets such as our Test Set. Limitations and directions for future research are also presented. 1 Introduction Artificial Intelligence systems called “chatbots”, concerning human-machine interaction, have been widely used in the last decades. A chatbot must have the ability to comprehend the intent, context and sentiment of the user and interact properly with them. Machine Learning (ML) has played an important role in the development of chatbots. ML can be described as a subset of artificial intelligence in which a mathematical model based on “training data” is built to make decisions or predictions without being programmed [1]. It has been applied for several scientific purposes, such as education applications [19], sign language learning [2], gaming applications [3] and early and objective autism spectrum disorder (ASD) assessment [4,5]. The structure of a chatbot is based on preprocessed data that can easily be understood by its engine. This process consists of the raw text input, which is separated into single words, called tokens. The tokenized strings are subjected to text processing, creating regular expressions, and removing any punctuation marks and stop words. These entities are classified into predefined classes that, either preexist, or are created by the programmer of the system. Then, there is an intent classification during which the chatbot attempts to understand what the user wants in order to respond with an answer or a solution to the problem. There are two main steps someone should take concerning chatbots, i.e., the way in which the tokens are made and their classification [6]. Chatbots have been developed for several domains such as home automation and Internet of Things [7], customer service [8], banking [9], and elderly care [10]. All these chatbots use Natural Language Processing (NLP) which enables the communication between machine-to-user and user-tomachine utilising human natural language [11]. NLP encodes the information the user inserts into vector. In other words, it aims at text processing with computers for its analysis, information extraction and eventually representation of the same information in a different way [12]. In the current study, the creation of the chatbot is based on Convolutional Neural Networks (CNNs). CNNs are applied as a classifier and some specific tools for tokenization are used. A Convolutional Neural Network is a deep learning algorithm keeping a hierarchical structure in which layers learn from each other. Considering that it is difficult to apply any algorithm in text, a technique called “Word Embedding”, which converts text into numbers in order to run text processing, was used. In this paper, “Word2Vec” technique was applied because it offers a great amount of pretrained data and enables users to train their own data set assuming they have enough data for the problem. Additionally, a unique feature of Word2vec is that vectors can be got from other vectors using vector operations [6]. This is the preprocess needed to insert the data into the CNN. AlexNet, LeNet5, ResNet and VGGNet CNN architectures were utilised. There have been several earlier studies applying CNN for the creation of a chatbot. For example, an emotion recognition algorithm was developed by [13]. * Corresponding author: [email protected] © The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/). SHS Web of Conferences 139, 03009 (2022) ETLTC2022 https://doi.org/10.1051/shsconf/202213903009 Vector” and “Term Frequency-Inverse Document Frequency (TF-IDF)”. In this paper, Word2Vec and more specifically an extension of it, called “Doc2Vec” was utilised [14]. Word2Vec takes single words as input, whereas Doc2Vec takes a whole sentence as input. Word2Vec offers a big amount of pretrained data that can be found online and used on any system. Additionally, it lets users upload their own data to train the network. However, a lot of data is required in order to have a successful training process. Also, a unique feature of Word2vec is that vectors can be got from other vectors using vector operations [6]. These entities are classified into predefined classes that, either preexist or are created by the programmer of the system. Word Embedding is followed by entity classification, a procedure in which there is a recognition of information units derived from unstructured text such as names, including person, organisation names, numeric expressions, and location names. The identification of references to these entities in text was recognised as one of the significant sub-tasks of Information Extraction (IE) [15]. Then, intent classification follows during which the chatbot tries to understand what the user wants before finally responding with an answer or a solution to the problem. This algorithm was related to voice-enabled chatbots, and the emotion recognition consisted of two steps. Initially, the user sound received was transformed into a spectrogram by utilising Mel-frequency cepstral coefficient. This spectrogram was analysed by a CNN which classified it into five emotions. In another study by [11], a chatbot which identified users’ mental states tried to prevent negative emotions and made users feel positive by urging them to have more constructive thoughts. Three deep learning classifiers were used for emotion detection i.e., Convolutional Neural Network (CNN), Recurrent Neural Network (CNN), and Hierarchical Attention Network (HAN). Therefore, CNN can be applied for chatbot creation of different domains bringing significant results. 1.1 Objective The main objective of this paper is to use Convolutional Neural Networks (CNNs) as a classifier in order to create a chatbot and compare how different architectures can result in different training times and accuracy. Moreover, a specific tool for tokenization is used. Therefore, it can be hypothesised that different CNNs architectures can bring promising results in the creation of a chatbot. Raw Text (User’s input) 2 Methodology 2.1 Design There is a certain complexity regarding the choice of the appropriate methodology for making a chatbot understand the user’s input, as it needs training based on the context of the conversation. According to the flowchart in Figure 1, the following steps must be taken. Initially, the data need to follow a preprocess procedure, involving the separation of the raw text input from the user into single words, called tokens. Apart from the removal of any punctuation marks and stop words, the tokenized strings go through text normalization, which can be accomplished by two techniques, stemming and lemmatisation. Stemming slices a string into smaller substrings using a set of rules (or a model). The idea is to remove word affixes (especially suffixes) that change the meaning. Lemmatisation, on the other hand, looks up every token in a dictionary and returns the canonical "head" word in the dictionary, which is known as lemma. It can handle unusual cases as well as tokens with diverse parts of speech because it looks up tokens from a ground truth. Both stemming and lemmatisation have benefits and drawbacks. Stemming is faster because it simply asks users to splice word strings. Lemmatisation, on the other hand, dictates a lookup in a dictionary or database and relies on part-of-speech tags to determine a word's root lemma, making it considerably slower but more effective than stemming. The latest is the one that is used in this research [14]. It is difficult to apply any algorithm in text, so a technique called “Word Embedding”, which converts text into numbers, is applied. Word Embedding can be done by a variety of tools such as, “Word2Vec”, “Count Word Embedding (Word2Vec / Doc2Vec, Count Vector, TF-IDF) Tokenization Entity Classifier Text Normalization (Stemming, Lemmatization) Removing Stop Words Intent Classifier Chatbot’s Response Fig. 1. A flowchart showing how a chatbot engine processes an input string and gives a valid reply. 2 SHS Web of Conferences 139, 03009 (2022) ETLTC2022 https://doi.org/10.1051/shsconf/202213903009 2.2 Proposed Approach 3 Results and Discussion CNNs are a deep learning algorithm keeping a hierarchical structure in which layers learn from each other. They are one of the most popular deep learning classifiers associated with image classification, as well. CNNs have also been used for Relation Classification and Relation Extraction [16,17]. CNNs have multiple layers, such as the convolutional layer and the fully connected layer, which have parameters, as well as the non-linearity layer, and the pooling layer which have no parameters. AlexNet, LeNet5, ResNet and VGGNet CNN architectures were utilised in this study and the differences between them are presented in Table 1. In image classification task, LeNet-5 is the only architecture that gets greyscale images as input. This study utilised Convolutional Neural Networks (CNNs) as a classifier to create a chatbot and compare how different architectures can result in different training times and accuracy. Moreover, a specific tool for tokenization was used. Figure 1 presents an example of our chatbot’s performance in which 75% of the questions asked are answered properly. On Table 2 and Figures 3-5, it is shown that LeNet5 achieved the highest accuracy and the smallest loss requiring the minimum training time. On the other hand, VGGNet achieved the lowest accuracy and the highest loss requiring the maximum training time. Table 2: The results of each architecture after training for 200 Epochs. Fig. 2. An example of our model’s performance. Table 1: The number of the main layers used by the architectures. Convolutional Layers Fully Connected Layers Pooling Layers Output Activation Function AlexNet 5 3 3 Softmax LeNet-5 2 3 2 Softmax ResNet50 49 1 2 Softmax VGGNet 13 3 5 Softmax The experimental procedure was utilised on a computer with Intel Core i7-6700HQ (2.60GHz), 16GB of RAM, Windows 10 Home OS (Version 20H2), GeForce GTX 960M Graphics Card. The necessary code was written in Python 3.8 and the Anaconda Virtual Environment was also used to work with the chatbot’s environment to avoid files running locally on the computer. NLTK library was added, as well. CNN models were trained for the implementation of the chatbot [Table 2]. The training was done for 200 epochs on each model, using an existing dataset, which contained 826 questions along with 352 unique answers [18]. As Loss Function we used “sparse_categorical_crossentropy” whereas “Adam” was employed as an optimiser. A test-dataset was used for model evaluation. The Test Set used 1/3 of the original data of the dataset. Training Time (for 200 epochs) Accuracy F1score Loss AlexNet 58mins:47secs 0.1684 91.25 3.999 LeNet-5 2mins:00secs 0.9819 4.717 0.1548 ResNet50 8h:59mins:56sec 0.8756 2.618 0.4388 VGGNet 16h:39mins:34secs 0.1269 41.49 4.46 Fig. 3. Epoch accuracy. Fig. 4. F1 – Score. 3 SHS Web of Conferences 139, 03009 (2022) ETLTC2022 https://doi.org/10.1051/shsconf/202213903009 3. 4. Fig. 5. Loss. Table 3 presents the evaluation numbers of each architecture. It is obvious, that values differ a lot from the ones depicted on Table 2 on some occasions. LeNet5 still achieved the highest accuracy but not the smallest loss, as it did on the entire dataset. Also, it is noticeable that ResNet-50 attains 0 accuracy with the highest loss, whereas no results were obtained for the VGGNet architecture. 5. 6. Table 3: The evaluation numbers of each architecture. AlexNet Preprocess Time 1192.9145 sec 7. Accuracy F1score Loss 0.14 98.13 6.98 LeNet-5 7.6288 sec 0.21 12.13 7.40 ResNet50 1156.0755 sec 0.0 3.60 19.49 VGGNet -- -- -- -- 8. 4 Conclusion – Future Directions 9. This paper highlighted the construction and training of a general purpose chatbot. The methodology behind the creation of the chatbot was analysed step by step using four different architectures of Convolutional Neural Networks (CNNs) as a classifier. The results obtained differ in terms of the architectures used, concluding that the best architecture is LeNet-5, as it has the best accuracy, as well as the fastest training time and the least losses. It should be pointed out that on smaller datasets, such as our Test Set, this method is not very accurate. In the future, we plan to increase accuracy for each architecture keeping the training time in reasonable limits. Additionally, we would like to apply more architectures in order to compare them and find the most optimal one. Finally, we could utilise more data sets, and thus the chatbot could get more expertise on specific areas of interest. 10. 11. 12. References 1. 2. 13. Zhang, X.-D. A Matrix Algebra Approach to Artificial Intelligence; Springer, 2020; Papatsimouli, M.; Lazaridis, L.; Kollias, K.-F.; Skordas, I.; Fragulis, G.F. Speak with Signs: 4 Active Learning Platform for Greek Sign Language, English Sign Language, and Their Translation. 2020, doi:10.48550/arXiv.2012.11981. Hitboxes: A Survey About Collision Detection in Video Games | SpringerLink Available online: https://link.springer.com/chapter/10.1007/978-3030-77277-2_24 (accessed on 21 March 2022). Kollias, K.-F.; Syriopoulou-Delli, C.K.; Sarigiannidis, P.; Fragulis, G.F. The Contribution of Machine Learning and EyeTracking Technology in Autism Spectrum Disorder Research: A Review Study. In Proceedings of the 2021 10th International Conference on Modern Circuits and Systems Technologies (MOCAST); IEEE, 2021; pp. 1–4. Kollias, K.-F.; Syriopoulou-Delli, C.K.; Sarigiannidis, P.; Fragulis, G.F. The Contribution of Machine Learning and EyeTracking Technology in Autism Spectrum Disorder Research: A Systematic Review. Electronics 2021, 10, 2982. Manaswi, N.K.; Manaswi, N.K.; John, S. Deep Learning with Applications Using Python; Springer, 2018; Baby, C.J.; Khan, F.A.; Swathi, J.N. Home Automation Using IoT and a Chatbot Using Natural Language Processing. In Proceedings of the 2017 Innovations in Power and Advanced Computing Technologies (i-PACT); April 2017; pp. 1–6. D’silva, G.M.; Thakare, S.; More, S.; Kuriakose, J. Real World Smart Chatbot for Customer Care Using a Software as a Service (SaaS) Architecture. In Proceedings of the 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC); February 2017; pp. 658–664. Kulkarni, C.S.; Bhavsar, A.U.; Pingale, S.R.; Kumbhar, S.S. BANK CHAT BOT – An Intelligent Assistant System Using NLP and Machine Learning. 04, 5. Su, M.-H.; Wu, C.-H.; Huang, K.-Y.; Hong, Q.B.; Wang, H.-M. A Chatbot Using LSTM-Based Multi-Layer Embedding for Elderly Care. In Proceedings of the 2017 International Conference on Orange Technologies (ICOT); December 2017; pp. 70–74. Patel, F.; Thakore, R.; Nandwani, I.; Bharti, S.K. Combating Depression in Students Using an Intelligent ChatBot: A Cognitive Behavioral Therapy. In Proceedings of the 2019 IEEE 16th India Council International Conference (INDICON); IEEE, 2019; pp. 1–4. Conneau, A.; Schwenk, H.; Barrault, L.; Lecun, Y. Very Deep Convolutional Networks for Text Classification. arXiv preprint arXiv:1606.01781 2016. Lee, M.-C.; Chiang, S.-Y.; Yeh, S.-C.; Wen, T.F. Study on Emotion Recognition and Companion Chatbot Using Deep Neural Network. Multimedia Tools and Applications 2020, 79, 19629–19657. SHS Web of Conferences 139, 03009 (2022) ETLTC2022 14. 15. 16. 17. 18. 19. https://doi.org/10.1051/shsconf/202213903009 Bengfort, B.; Bilbro, R.; Ojeda, T. Applied Text Analysis with Python: Enabling LanguageAware Data Products with Machine Learning; O’Reilly Media, Inc., 2018; ISBN 978-1-49196299-2. Nadeau, D.; Sekine, S. A Survey of Named Entity Recognition and Classification. Lingvisticae Investigationes 2007, 30, 3–26. Zeng, D.; Liu, K.; Lai, S.; Zhou, G.; Zhao, J. Relation Classification via Convolutional Deep Neural Network. In Proceedings of the Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers; Dublin City University and Association for Computational Linguistics: Dublin, Ireland, August 2014; pp. 2335–2344. Nguyen, T.H.; Grishman, R. Relation Extraction: Perspective from Convolutional Neural Networks. In Proceedings of the Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing; Association for Computational Linguistics: Denver, Colorado, June 2015; pp. 39–48. Question-Answer Dataset Available online: https://kaggle.com/rtatman/questionanswerdataset (accessed on 16 January 2022). Fragulis, G. F., Papatsimouli, M., Lazaridis, L., & Skordas, I. A. (2021). An Online Dynamic Examination System (ODES) based on open source software tools. Software Impacts, 7, 100046. 5