This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2019.2937518, IEEE Access

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2019.DOI

Customer Reviews Analysis with Deep Neural Networks for E-Commerce Recommender Systems
BABAK MALEKI SHOJA¹ AND NASSEH TABRIZI²
¹ Department of Computer Science, East Carolina University, USA (e-mail: [email protected])
² Department of Computer Science, East Carolina University, USA (e-mail: [email protected])
Corresponding author: Babak Maleki Shoja (e-mail: [email protected]).
This research is supported in part by grant #1560037 from the National Science Foundation.

ABSTRACT An essential prerequisite of an effective recommender system is providing helpful information regarding users and items to generate high-quality recommendations. Written customer reviews are a rich source of information that can offer insights to the recommender system. However, dealing with customer feedback in text format, as unstructured data, is challenging. In this research, we extract features from customer reviews and use them for similarity evaluation of the users and ultimately in recommendation generation. To do so, we developed a glossary of features for each product category and evaluated them for removing irrelevant terms using Latent Dirichlet Allocation. Then, we employed a deep neural network to extract deep features from the reviews-characteristics matrix to deal with sparsity, ambiguity, and redundancy. We applied matrix factorization as the collaborative filtering method to provide recommendations. As the experimental results on the Amazon.com dataset demonstrate, our methodology improves the performance of the recommender system by incorporating information from reviews and produces recommendations with higher quality in terms of rating prediction accuracy compared to the baseline methods.

INDEX TERMS recommender system, review, deep neural networks, recommendation, matrix factorization, Latent Dirichlet Allocation.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

I. INTRODUCTION
The exponential growth of data and information on the Internet confronts us with information overload. This results in a tremendous amount of information that makes it hard for people to choose among an enormous number of movies, books, web pages, and other products, which challenges users' ability to efficiently access the data they require [1], [2]. Evaluating even a small portion of such data seems impractical, increasing the need for automatic recommender systems capable of suggesting relevant items as well as new items to customers and clients [3], [4]. Besides, personalization and customization for users and providing suggestions amid ever-increasing information is a crucial and challenging problem for online service providers such as e-learning. Recommender systems are a branch of information filtering systems that try to predict users' preferences for an item and provide personalized suggestions based on this analysis for a particular user. In other words, recommender systems help users to find products or services they need, either by analyzing user preferences using client profiles and their similarities or by finding products or services that are similar to those in which clients have already expressed interest [5]. Recently, there has been an increasing trend of applying this approach in various areas, including music, books, social tags, and products. Several e-commerce companies such as Amazon employ recommender systems and related tools to enhance the recommendations to their customers with the primary purpose of increasing overall profits [4], [6]–[9].

Generating proper recommendations to the user requires information about the user's characteristics, preferences, and needs [9]. Recommender systems mainly consider the overall rating a customer gives to items, and latent factor models such as matrix factorization (MF) are widely used to predict ratings. However, MF models have drawbacks such as the cold-start problem, considering only the customer's overall satisfaction, and sparsity. As the literature on MF methods shows, numerous studies are devoted to tackling
the weaknesses of MF methods by incorporating side information such as tags [10], [11], visual features [12], and social relations [13], [14]. Customer reviews are one of the critical resources in developing recommender systems. The written part of a review that accompanies a rating includes essential information on what the customer thinks about the product.

Consequently, researchers have suggested many models that exploit reviews together with ratings to improve recommendations. Some of these models are discussed in [15]–[18]. Sentiment analysis is one of the conventional approaches to the analysis of customer reviews. It mainly aims to predict whether the attitude of a piece of text is positive or negative, supportive or opposed [19]. Semantic analysis is employed to analyze customer reviews [19]–[21] for different objectives, such as measuring e-commerce service quality [22]. Some recent studies try to use customer reviews in developing recommender systems. The approaches they use include semantic analysis and aspect-based latent factor models [23]–[26]. In this paper, we mine customer reviews to extract a set of product characteristics that users mention in the reviews, and we use the Latent Dirichlet Allocation (LDA) method to finalize the set of characteristics. We then use the set of attributes to construct the users-attributes matrix. This matrix, however, is very sparse, as each user mentions only a few attributes in a review. Sparsity is a well-known challenge in developing recommender systems, and many papers propose solutions to it. To deal with this problem, we use a deep neural network that plays the role of an autoencoder, which helps to learn more abstract and latent attributes. Having the users-attributes and users-items matrices, we use an MF model to predict ratings and provide recommendations.

The rest of this paper is organized as follows. Section II presents the literature survey and related works. Section III discusses the problem description and the proposed approach. Section IV provides information about the dataset, describes how we conducted data preprocessing, and analyzes the outcomes of the experimental analysis. Finally, we conclude the current research and discuss its limitations and future research directions.

II. RELATED WORK
Dealing with text as unstructured data is challenging. Natural Language Processing (NLP) is a branch of computer science and artificial intelligence (AI) concerned with processing and analyzing natural language data. Deep learning for NLP is one of the approaches that is improving the capability of the computer to understand human language [27]. There are a few studies that try to incorporate customer written reviews in generating recommendations.

Some research on the integration of customer reviews in recommendation systems falls under the category of aspect-based or aspect-aware recommender systems. As an aspect-based recommender system, [24] proposed a model called the aspect-based latent factor model, which integrates ratings and review texts via a latent factor model. The purpose of this research is predicting ratings by using aspects of information from users' and items' information. They constructed user-review and used it directly in the proposed model to provide rating predictions and recommendation lists. Besides, their model accomplishes a cross-domain task by transferring word embedding.

In another aspect-based recommender system paper, [25] proposed an aspect-aware MF model that effectively combines reviews and ratings for rating predictions. It learns the latent topics from reviews and ratings without the constraint of a one-to-one mapping between latent factors and latent topics. Also, the model estimates aspect ratings and assigns weights to the aspects. They performed experiments on many real-world datasets and showed the performance of their models in accurately predicting the ratings.

Some aspect-based recommender systems utilize semantic analysis on reviews. For example, [26] proposed a sentiment utility logistic model that uses sentiment analysis of user reviews: it predicts the sentiment that the user has about the item and then identifies the most valuable aspects of the user's possible experience with that item. For example, the system suggests that a user go to a specific restaurant (the primary recommendation), and it also recommends an aspect of that restaurant, such as the time to go (breakfast, lunch, or dinner), as a valuable aspect to the user (the secondary recommendation). The experimental results demonstrated a better experience for those users who followed the recommendations.

In the context of analyzing reviews, [28] analyzed customer reviews to find out what makes a review helpful to other customers. They analyzed 1,587 reviews from Amazon.com and indicated that review extremity, review depth, and product type affect the perceived helpfulness of the review. While this research does not incorporate the reviews in making recommendations, it provides information that is potentially useful in developing recommender systems.

[29] proposed a new recommender system that integrates opinion mining and recommendations. They proposed a new feature and opinion extraction method based on the characteristics of online reviews, which can address the problem of data sparseness. They used the part-of-speech tagging approach based on association rule mining for each review. They performed their empirical study on online restaurant customer reviews written in Chinese and illustrated the performance of the proposed methods.

[15] considered the review texts using topic modeling techniques and aligned the topics with rating dimensions to enhance the prediction accuracy. They proposed a unified model combining content-based and collaborative filtering, which can deal with the cold-start problem. They applied the proposed framework to 27 classes of real-case datasets and showed a significant improvement of the recommendations compared to the baseline methods.

[16] tried to incorporate the implicit tastes of each user in order to predict ratings, as the text review justifies a user's rating. They used latent review topics extracted from topic
models as highly interpretable textual labels for latent rating dimensions. Also, they accurately predicted product ratings using the information extracted from the reviews, which can improve the recommendations for those that have too few ratings. Moreover, their discovered topics are useful in facilitating tasks such as automated genre discovery. In a similar study, [17] exploited textual review information along with ratings to model user preferences and item attributes in a shared topic space. They used an MF model for generating recommendations and used 26 real-case datasets to evaluate the performance of their model.

None of the abovementioned studies used a deep neural network autoencoder to deal with the sparsity in the user-attributes matrix extracted from the reviews. To the best of the authors' knowledge, this is the first study that extracts deep features from latent topics extracted from textual user reviews to develop a recommender system. In the next section, we present the proposed approach.

III. METHODOLOGY
In this section, we provide the proposed methodology for incorporating customer written reviews in developing recommender systems. Figure 1 depicts the general framework for transforming customer written reviews into a dense users-attributes matrix and predicting ratings using this matrix and the users-items matrix. As described before, the idea is to investigate which attributes of the product category are mentioned in the customer's review. In doing so, we need to match the review with a set of predefined product attributes. As Fig. 1 demonstrates, we use Latent Dirichlet Allocation (LDA) to analyze the reviews on a product category and retrieve a dictionary of attributes. Afterward, we can construct the users-attributes matrix, which indicates in a binary format what attributes the user has pointed out in his or her reviews. The major challenge with this matrix is a well-known problem called sparsity. Besides, there are other problems, including ambiguity and redundancy, regarding the extracted attributes in the matrix. To deal with these problems, we propose a deep neural network approach to transform this sparse matrix into a dense matrix presenting a set of deep features extracted from the users-attributes matrix, and we construct the users-deep features matrix. We use this matrix and the users-items matrix to predict ratings and generate recommendations via Matrix Factorization (MF) as a powerful and efficient collaborative filtering method. In the following subsections, we present and describe LDA, the deep neural network model, and the MF method used in this research.

A. LATENT DIRICHLET ALLOCATION
The basic latent factor model predicts ratings for a user and item using user and item biases and K-dimensional user and item factors, which capture the item's properties and the user's preferences, by minimizing the Mean Squared Error (MSE). There are a variety of methods for optimizing MSE for this problem, such as alternating least-squares and gradient-based methods [30]. While latent factor models try to uncover hidden dimensions in review ratings, LDA aims to uncover hidden dimensions in the written part of the review. Introduced by Blei et al. [31], LDA is a generative statistical model for topic modeling in the natural language processing (NLP) context. Topic modeling is the task of describing a collection of documents by identifying a set of topics. In LDA, we model each item of the collection as a finite mixture over an underlying set of topics as a three-level hierarchical Bayesian model. We also model each topic as an infinite mixture over an underlying set of topic probabilities, which provides an explicit representation of a document [23]. To describe LDA, consider a set of documents $d \in D$. LDA associates each document with a K-dimensional stochastic vector, the topic distribution $\theta_d$. This association encodes the fraction of words in a document that discuss topic $k$, with probability $\theta_{d,k}$. LDA also associates a word distribution $\phi_k$ with each topic to encode the probability of a word being used for that topic. LDA assumes a Dirichlet distribution for the topic distribution $\theta_d$. As a result of applying LDA, we have a word distribution for each topic and a topic distribution for each document. Having the word distributions and the topic assignments of the words, we can calculate the likelihood of a corpus $T$ as

$$p(T \mid \theta, \phi, z) = \prod_{d \in T} \prod_{j=1}^{N_d} \theta_{z_{d,j}} \, \phi_{z_{d,j}, w_{d,j}} \tag{1}$$

where $N_d$ is the number of words in document $d$, $w_{d,j}$ is the $j$-th word of document $d$, and $z$ denotes the topic assignments, which are updated via sampling. This likelihood is a product of the probability of the topic being in the document and of the word being in the topic [16].

LDA results in a vast number of words from the reviews. Inspired by [16], we filter the extracted words using frequent itemsets obtained via association rules to prune the set of words LDA provides. Association rule mining uses two metrics, support and confidence: support measures how frequently an itemset appears in the dataset, and confidence measures how often a rule is found to hold.

B. DEEP NEURAL NETWORKS
Sparsity is a significant problem in recommender systems, which significantly reduces the performance of rating prediction. The problem of sparsity is sometimes called the gray sheep problem, which is peculiar to similarity-based collaborative recommendation systems. The problem arises from the fact that users-attributes interactions occur for only a tiny percentage of all possible interactions, because a user mentions only a tiny portion of all the attributes in a written review [7]; this makes some users not similar enough to others for the system to discover their preferences. Hence, the system cannot retrieve proper recommendations. In this regard, many investigators have focused on providing solutions that mitigate the effect of sparsity [32], [33]. Here, we propose a deep neural network approach to deal with the sparsity in the users-attributes matrix and transform it into a dense matrix.
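As a rough illustration of why the users-attributes matrix is sparse, the following Python sketch builds a binary matrix from tokenized reviews and measures the fraction of non-zero entries. It is a minimal sketch rather than the implementation used in this research: the attribute dictionary, the toy reviews, and the names `build_user_attribute_matrix` and `density` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def build_user_attribute_matrix(user_reviews, attributes):
    """Build a binary users x attributes matrix (1 = attribute mentioned).

    user_reviews: dict mapping a user id to a list of review tokens
                  (hypothetical input format).
    attributes:   list of attribute words, e.g. a pruned LDA dictionary.
    """
    users = sorted(user_reviews)
    column = {attr: j for j, attr in enumerate(attributes)}
    X = np.zeros((len(users), len(attributes)), dtype=np.int8)
    for i, user in enumerate(users):
        for token in user_reviews[user]:
            j = column.get(token)
            if j is not None:
                X[i, j] = 1
    return users, X


def density(X):
    """Fraction of non-zero entries; sparsity is 1 - density."""
    return float(np.count_nonzero(X)) / X.size


# Toy data, invented for illustration only.
attributes = ["battery", "screen", "price", "camera", "sound"]
user_reviews = {
    "u1": ["great", "battery", "life"],
    "u2": ["screen", "cracked", "price", "high"],
    "u3": ["fast", "shipping"],
}

users, X = build_user_attribute_matrix(user_reviews, attributes)
print(users)
print(X)
print("density:", density(X))
```

In this toy case only 3 of the 15 entries are non-zero, and one user mentions no attribute at all, which is the situation the dense deep-feature representation is meant to address.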
FIGURE 1. General framework of the proposed approach of using customer reviews in recommendation systems

The remainder of this subsection describes the details of the proposed deep neural networks, which process the attributes extracted using LDA.

The reason for using sparse coding is to learn more interpretable features for machine learning applications [34]; it helps to represent the input matrix as a weighted linear combination of a small number of basis vectors. The resulting matrix is capable of capturing high-level patterns that exist in the input layer. For instance, [35] developed a sparse autoencoder by combining sparse coding with the autoencoder. They implemented their idea by penalizing the deviation between the expected hidden representation and the present average activation. In more relevant research, [8] developed an autoencoder using deep neural networks for tag-aware recommender systems. Through experimental results, they demonstrated the usefulness of sparse autoencoders for recommendation algorithms.

Following [8], an autoencoder consists of an input layer, a hidden layer, and an output layer. We can divide the autoencoder itself into an encoder and a decoder. The encoder is the input layer and the hidden layer, while the decoder is the hidden layer and the output layer. Figure 2 illustrates the purpose of the autoencoder, which is reconstructing the input data in the output layer with the same dimensionality. In other words, this follows an unsupervised learning framework. Letting $x_1, x_2, \ldots, x_m$ be an unlabeled dataset, we can obtain the nonlinear representation of the input data using an activation function [36]. Using a sigmoid activation for an unlabeled example $x_i$, the representation is

$$h(x_i; W, b) = \sigma(W x_i + b) \tag{2}$$

where $W$ denotes the weight matrix, $\sigma$ is the sigmoid activation function, and $b$ is the bias term. The hyperbolic tangent function is a commonly used alternative activation. On the other side, the decoder reconstructs the input in the output layer by minimizing the error between the input and the output layers. The minimizing term is defined in Equation (3).

$$\min \sum_{i=1}^{m} \lVert \sigma(W^{T} h(x_i; W, b) + c) - x_i \rVert^{2} \tag{3}$$
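Equations (2) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy, stores the examples as rows of `X`, ties the decoder weights to $W^{T}$ as written in Equation (3), treats `c` as a decoder bias vector, and draws the parameters at random. The function names and the toy dimensions are hypothetical. The sketch only evaluates the reconstruction error; it does not perform the minimization.

```python
import numpy as np


def sigmoid(t):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-t))


def encode(X, W, b):
    """Equation (2) for every row x_i of X: h = sigmoid(W x_i + b)."""
    return sigmoid(X @ W.T + b)


def decode(H, W, c):
    """Decoder with tied weights: sigmoid(W^T h + c) for every row h of H."""
    return sigmoid(H @ W + c)


def reconstruction_loss(X, W, b, c):
    """Sum over examples of the squared reconstruction error, as in Equation (3)."""
    X_hat = decode(encode(X, W, b), W, c)
    return float(np.sum((X_hat - X) ** 2))


# Toy setup (assumed sizes): 6 users, 8 binary attributes, 3 hidden units.
rng = np.random.default_rng(0)
X = (rng.random((6, 8)) < 0.2).astype(float)   # sparse binary input
W = rng.normal(scale=0.1, size=(3, 8))         # hidden x input weight matrix
b = np.zeros(3)                                # encoder bias
c = np.zeros(8)                                # decoder bias

H = encode(X, W, b)
loss = reconstruction_loss(X, W, b, c)
print(H.shape)
print(loss)
```

Because the hidden layer here has fewer units than the input has attributes, `H` is a lower-dimensional dense representation of the sparse rows of `X`.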
FIGURE 2. Demonstration of a simple autoencoder

In Equation (3), $m$ denotes the number of examples and $c$ is the bias term of the decoder. Since the minimization is a convex function, we can obtain the optimal value. We can calculate the sparsity penalty term by the Kullback-Leibler divergence between a preferred activation ratio in the hidden layer and the desired hidden representations [37] using Equation (4).

$$P = \sum_{j=1}^{n} D_{KL}(\rho \,\|\, \hat{\rho}_j) \tag{4}$$

in which $\rho$ is a preset average activation that is set to be close to zero in practice, $\hat{\rho}_j$ is the average activation of hidden unit $j$ over the training examples, and $n$ is the number of hidden units. Also,

$$D_{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}.$$

Combining the objective function and the penalty term introduced above, we obtain the final objective function for the autoencoder using Equation (5).

$$\min \sum_{i=1}^{m} \lVert \sigma(W^{T} h(x_i; W, b) + c) - x_i \rVert^{2} + \beta P \tag{5}$$

where $\beta$ is a hyperparameter that controls the weight of the penalty term.

In order to transform this architecture into an autoencoder using a deep neural network, we need to use more hidden layers, where the output of each layer is the input for the next layer. In other words, the procedure and calculations explained above are applied more than once, each time to the result of the previous application. The input data trains the first hidden layer, and the output of the first hidden layer serves as the input of the second hidden layer. We iterate these steps based on the number of hidden layers considered for the autoencoder. We use this deep neural network, which serves as the sparse autoencoder, to extract deep features from the set of retrieved attributes of a product category.

Another advantage of using this approach is that the number of features in the final output layer can be less than the number of attributes from the users-attributes matrix. Having a lower dimensionality can speed up the learning process when the predictive model is dealing with a large dataset. Besides, the model can potentially reduce the deficiencies caused by ambiguity and redundancy in the set of attributes. These characteristics of the deep neural network significantly enhance the quality of the recommendation list for the users.

C. COLLABORATIVE FILTERING
The last step in generating recommendations is using the MF model to predict ratings. First, we update the user profile based on the users-deep features matrix obtained from the deep neural network. Conventional collaborative filtering models only use the user-item or user-attributes matrix to generate a recommendation list. In order to use both users-items and users-deep features information, we employ the approach developed by Ricci et al. [38]. In this method, we find the target user's neighborhood $N_u$ based on the similarity between the target user and other users using Equation (6).

$$sim_{u,v} = \frac{\langle \hat{X}_u, \hat{X}_v \rangle}{\lVert \hat{X}_u \rVert \, \lVert \hat{X}_v \rVert} \tag{6}$$

Having the similarity matrix between users, we can predict the rating of the target user using a weighted average of ratings from the neighbor users using Equation (7).

$$S_{u,i} = \sum_{v \in N_u} (\pi_{UI} Y)_{v,i} \tag{7}$$

The final step is to sort the predicted ratings for items and generate the list of recommendations according to the size of the list, $n$. Please note that using this approach, we are exploiting the ternary relation between users, attributes, and items [8].

IV. EXPERIMENTAL RESULTS
A. DATASET
In this research, we use the Amazon Review dataset [39]. This dataset contains 142.8 million reviews of Amazon products between May 1996 and July 2014, along with user profiles and item metadata. The dataset includes the ID of the reviewer, the ID of the product (ASIN), the name of the reviewer, the helpfulness rating of the review, the text of the review, the rating of the product, a summary of the review, and the time of the review. It also has the name of the product, price in US dollars, related products, sales rank information, brand name, and the list of the categories of the product. Table 1 presents the statistics of the Amazon Reviews dataset separated by product category [15]. Also, Figure 3 demonstrates the sparsity of the reviews in the dataset, where the percentage for a product category indicates the percentage of the users with no more than three ratings [17].
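The sparsity measure described for Figure 3, the percentage of users with no more than three ratings, can be computed as in the following sketch. The `(user, item, rating)` tuple format, the toy data, and the function name `sparse_user_percentage` are assumptions made for illustration; the actual Amazon Review files use their own schema.

```python
from collections import Counter


def sparse_user_percentage(ratings, max_ratings=3):
    """Percentage of users who have at most `max_ratings` ratings.

    ratings: iterable of (user, item, rating) tuples (hypothetical format).
    """
    counts = Counter(user for user, _item, _rating in ratings)
    if not counts:
        return 0.0
    few = sum(1 for n in counts.values() if n <= max_ratings)
    return 100.0 * few / len(counts)


# Toy ratings, invented for illustration only.
ratings = [
    ("u1", "a", 5), ("u1", "b", 4), ("u1", "c", 3), ("u1", "d", 5),
    ("u2", "a", 2),
    ("u3", "b", 4), ("u3", "c", 1),
    ("u4", "a", 5),
]

pct = sparse_user_percentage(ratings)
print(pct)
```

Here only `u1` has more than three ratings, so three of the four users count toward the sparsity percentage.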
Such sparsity can drastically reduce the performance of recommender systems. On average, each review contains roughly 120 words.

FIGURE 3. Sparsity of the Amazon Reviews Dataset

During dataset preprocessing, we found many users without any written review, and we removed these observations from the dataset. For processing the data and implementing the proposed methodology, we used Python 3. As the cleaning step, we removed punctuation and stop-words using the NLTK stopwords list. Words that appear in the review corpus of a product category only once are most likely irrelevant; thus, we eliminated these words as well. Using the remaining tokens, we construct our preliminary dictionary of attributes. Note that we create a separate dictionary for each product category.

For the training part, we randomly selected 80% of the dataset for training, 10% for validation, and 10% for testing. Furthermore, we selected 25 topics and 40 words for each topic when we applied LDA to each category's review corpus. Then, we used the association rule mining technique to extract frequent itemsets from the unique words obtained after the LDA step. Finally, we matched the reviews of each user with the set of extracted words and constructed the users-attributes matrix. For the rest of the parameters required to apply the deep neural network feature extractor and matrix factorization method, the hyperparameters are as follows.
• The number of hidden layers is 2 or 3.
• The number of neurons in the first layer is 1000.
• The number of neurons in the second layer is 800.
• Average activation is 0.2.
In order to obtain these values, we varied one hyperparameter at a time over a reasonable range to find the value that provides the best performance while keeping the other hyperparameters fixed, using only one of the product categories. For the number of hidden layers, both two and three hidden layers show high performance. During the performance evaluation, we performed both on a product category to find the best results. Figures 4, 5, and 6 demonstrate the MSE on two product datasets used to tune the hyperparameters. As you can see, MSE is not improving after using three hidden layers. MSE stops improving significantly at 1000 and 800 neurons in the first and second hidden layers, respectively.

FIGURE 4. Analysis of MSE based on the number of hidden layers

FIGURE 5. Analysis of MSE based on the number of neurons in the first hidden layers

B. BASELINE METHODS
We compare the performance of our model with three other state-of-the-art models, including MF, Hidden Factors as Topics (HFT), and Ratings Meet Reviews (RMR). The following is the explanation of these models.
• MF is the standard and widely used matrix factorization model. We consider the model proposed and described in [40]. This model uses the ratings of the user in generating recommendations, and the written part of the customer's feedback is not incorporated.
• HFT is a model proposed by [16] that incorporates the review text with the stochastic topic distribution modeling which can be applied either on users or items.
  It also employs matrix factorization to deal with the ratings.
• RMR is a hybrid model consisting of content-based filtering and collaborative filtering, suggested by [15]. This model tries to exploit the information from reviews and improve the recommendation list accuracy across various classes of datasets. They tried to address the cold-start problem with a collapsed Gibbs sampler for learning the model parameters.
For each product category, we report the performance of each model against our model. We consider Mean Squared Error (MSE) for evaluation of these models against the proposed approach.

TABLE 1. General statistics of the Amazon Review dataset

FIGURE 6. Analysis of MSE based on the number of neurons in the second hidden layers

C. EVALUATION
We applied the proposed deep feature extractor method to all the product category datasets, obtained the best MSE for our model, and compared these results with our baselines. Figure 7 and Table 2 demonstrate the results.

As you can see in these figures, the proposed method performs better for most of the product categories. Compared to the MF model, our method predicts ratings with an average improvement of 8.71%, and in some cases up to 20.19%. In three cases, MF shows better performance: Books, Movies and TV, and Music. For the Books and Music categories, the model is off by less than 1 percent, which implies that the performance of the model is close to the best performance for these two categories. A similar situation is happening between the HFT model and the proposed approach. Our deep neural network model
beats the HFT model predictions for most of the cases. On average, our model improves the predictions by 3.14%. For the only two cases in which our model does not perform better, the performance is close. In the worst case, which is the Movies and TV product category, the MF and HFT models performed only 1.87% better than our model. Finally, our model outperforms the RMR method by 2.06% on average. Figs. 8, 9, and 10 illustrate the improvements made by our model compared to the MF, HFT, and RMR baseline models, respectively.

TABLE 2. MSE of the proposed method vs baselines and the percentage of the improvement

FIGURE 7. Comparing the MSE from the proposed method and the baselines

FIGURE 8. Proposed model improvement compared to MF
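The evaluation metric and the improvement percentages can be sketched as follows. The text does not spell out the improvement formula, so the relative reduction in MSE used here is an assumption, as are the function names and the toy numbers, which are not results from Table 2.

```python
def mean_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted ratings."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_mse, model_mse):
    """Percentage reduction in MSE relative to a baseline (assumed definition).

    A negative value means the baseline has the lower MSE.
    """
    return 100.0 * (baseline_mse - model_mse) / baseline_mse


# Toy ratings and predictions, invented for illustration only.
actual = [5, 3, 4, 1]
baseline_pred = [4.0, 4.0, 3.0, 2.0]
model_pred = [4.5, 3.5, 4.0, 2.0]

baseline_mse = mean_squared_error(actual, baseline_pred)
model_mse = mean_squared_error(actual, model_pred)
improvement = relative_improvement(baseline_mse, model_mse)
print(baseline_mse, model_mse, improvement)
```

Under this definition, a category in which a baseline outperforms the proposed model would show a negative improvement.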
FIGURE 9. Proposed model improvement compared to HFT

systems. In our proposed model, we use Latent Dirichlet Allocation to extract attributes related to each product category. Then, we used association rule mining to retain frequent terms in the dataset. Having the set of extracted attributes, we constructed a users-attributes matrix. This matrix suffers from sparsity. To deal with this challenge, we proposed a deep neural network solution that transforms the sparse users-attributes matrix into a dense users-deep features matrix, as an unsupervised learning tool. Finally, we used matrix factorization to predict ratings. We evaluated the performance of our model using the Amazon Review dataset, which is the largest dataset for customer reviews categorized for each product category. We also compared the MSE of our model with three baseline models from the literature, including MF, HFT, and RMR models. Our model outperforms these state-of-the-art models for most datasets.
For the future research directions, we are going to apply
a deep neural network as the predicting model instead of the
deep neural encoder and the matrix factorization method to
improve the predictive power of our approach. Besides, we
will investigate the application of other natural language pro-
cessing tools for the construction of users-attributes matrix
and compare their performance with current research.

REFERENCES
[1] M. Darabi and N. Tabrizi, “An ontology-based framework to model
user interests,” in Computational Science and Computational Intelligence
(CSCI), 2016 International Conference on. IEEE, 2016, pp. 398–403.
[2] H. Yamaba, M. Tanoue, K. Takatsuka, N. Okazaki, and S. Tomita, “On a
serendipity-oriented recommender system based on folksonomy,” Artifi-
cial Life and Robotics, vol. 18, no. 1-2, pp. 89–94, 2013.
[3] Z.-K. Zhang, T. Zhou, and Y.-C. Zhang, “Tag-aware recommender sys-
tems: a state-of-the-art survey,” Journal of computer science and technol-
ogy, vol. 26, no. 5, p. 767, 2011.
FIGURE 10. Proposed model improvement compared to RMR [4] B. Horsburgh, S. Craw, and S. Massie, “Learning pseudo-tags to augment
sparse tagging in hybrid music recommender systems,” Artificial Intelli-
gence, vol. 219, pp. 25–39, 2015.
[5] E. Baralis, L. Cagliero, T. Cerquitelli, S. Chiusano, P. Garza, D. Antonelli,
We investigated the product categories that our model is G. Bruno, and N. A. Mahoto, “Personalized tag recommendation based on
less accurate comparing the baselines. We suggest that the generalized rules,” 2018.
reason for this minor inaccuracy is the existence of more [6] A. K. Sahu, P. Dwivedi, and V. Kant, “Tags and item features as a bridge
for cross-domain recommender systems,” Procedia Computer Science,
non-technical terms than technical attributes in the users- vol. 125, pp. 624 – 631, 2018, the 6th International Conference
attributes matrix. For example, in the category of Indus- on Smart Computing and Communications. [Online]. Available:
trial Scientific, we have a significant improvement between http://www.sciencedirect.com/science/article/pii/S187705091732848X
[7] H. Kim and H.-J. Kim, “A framework for tag-aware recommender sys-
the MF model and the proposed method equal to 20.19%. tems,” Expert Systems with Applications, vol. 41, no. 8, pp. 4000–4009,
Moreover, the performance improvement is higher for other 2014.
categories that customers talk more about product attributes [8] Y. Zuo, J. Zeng, M. Gong, and L. Jiao, “Tag-aware recommender systems
based on deep neural networks,” Neurocomputing, vol. 204, pp. 51–60,
such as Tools and Clothing. The superiority of the proposed 2016.
model is the fact that the deep neural feature extractor re- [9] H.-N. Kim, A. Alkhaldi, A. El Saddik, and G.-S. Jo, “Collaborative
trieves the deep features and models the extracted words in a user modeling with user-generated tags for social recommender systems,”
Expert Systems with Applications, vol. 38, no. 7, pp. 8488–8496, 2011.
way that makes the users-attributes more informative, hence,
[10] Y. Shi, M. Larson, and A. Hanjalic, “Mining contextual movie similar-
extracting non-trivial relation between users based on the ity with matrix factorization for context-aware recommendation,” ACM
reviews they write. Our model can benefit e-commerce busi- Transactions on Intelligent Systems and Technology (TIST), vol. 4, no. 1,
nesses through increasing revenue and customer satisfaction p. 16, 2013.
[11] H. Zhang, Z.-J. Zha, Y. Yang, S. Yan, Y. Gao, and T.-S. Chua, “Attribute-
as recommendation plays a crucial role in real systems. augmented semantic hierarchy: Towards a unified framework for content-
based image retrieval,” ACM Transactions on Multimedia Computing,
V. CONCLUSIONS Communications, and Applications (TOMM), vol. 11, no. 1s, p. 21, 2014.
[12] R. He and J. McAuley, “Vbpr: visual bayesian personalized ranking from
In this paper, we proposed a deep neural network approach implicit feedback,” in Thirtieth AAAI Conference on Artificial Intelli-
to incorporate customer reviews in developing recommender gence, 2016.

[13] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, “Recommender systems
with social regularization,” in Proceedings of the fourth ACM international
conference on Web search and data mining. ACM, 2011, pp. 287–296.
[14] X. Wang, X. He, L. Nie, and T.-S. Chua, “Item silk road: Recommending
items from information domains to social users,” in Proceedings of the
40th International ACM SIGIR conference on Research and Development
in Information Retrieval. ACM, 2017, pp. 185–194.
[15] G. Ling, M. R. Lyu, and I. King, “Ratings meet reviews, a combined
approach to recommend,” in Proceedings of the 8th ACM Conference on
Recommender systems. ACM, 2014, pp. 105–112.
[16] J. McAuley and J. Leskovec, “Hidden factors and hidden topics:
understanding rating dimensions with review text,” in Proceedings of the 7th
ACM conference on Recommender systems. ACM, 2013, pp. 165–172.
[17] Y. Tan, M. Zhang, Y. Liu, and S. Ma, “Rating-boosted latent topics:
Understanding users and items with ratings and reviews,” in IJCAI, vol. 16,
2016, pp. 2640–2646.
[18] C. Wang and D. M. Blei, “Collaborative topic modeling for recommending
scientific articles,” in Proceedings of the 17th ACM SIGKDD international
conference on Knowledge discovery and data mining. ACM, 2011, pp.
448–456.
[19] L. Zhang, Y. Zhou, X. Duan, and R. Chen, “A hierarchical multi-input and
output bi-GRU model for sentiment analysis on customer reviews,” in IOP
Conference Series: Materials Science and Engineering, vol. 322, no. 6.
IOP Publishing, 2018, p. 062007.
[20] A. Bagheri, M. Saraee, and F. De Jong, “Care more about customers:
Unsupervised domain-independent aspect detection for sentiment analysis
of customer reviews,” Knowledge-Based Systems, vol. 52, pp. 201–213,
2013.
[21] Q. Sun, J. Niu, Z. Yao, and H. Yan, “Exploring eWOM in online customer
reviews: Sentiment analysis at a fine-grained level,” Engineering
Applications of Artificial Intelligence, vol. 81, pp. 68–78, 2019.
[22] P. K. Sari, A. Alamsyah, and S. Wibowo, “Measuring e-commerce service
quality from online customer review using sentiment analysis,” in Journal
of Physics: Conference Series, vol. 971, no. 1. IOP Publishing, 2018, p.
012053.
[23] P. P. Ładyżyński and P. Grzegorzewski, “A recommender system based
on customer reviews mining,” in International Conference on Artificial
Intelligence and Soft Computing. Springer, 2014, pp. 512–523.
[24] L. Qiu, S. Gao, W. Cheng, and J. Guo, “Aspect-based latent factor model
by integrating ratings and reviews for recommender system,”
Knowledge-Based Systems, vol. 110, pp. 233–243, 2016.
[25] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, “Aspect-aware latent
factor model: Rating prediction with ratings and reviews,” in Proceedings of
the 2018 World Wide Web Conference on World Wide Web. International
World Wide Web Conferences Steering Committee, 2018, pp. 639–648.
[26] K. Bauman, B. Liu, and A. Tuzhilin, “Aspect based recommendations:
Recommending items with the most valuable aspects based on user
reviews,” in Proceedings of the 23rd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 717–725.
[27] E. Fathi and B. M. Shoja, “Deep neural networks for natural language
processing,” Computational Analysis and Understanding of Natural
Languages: Principles, Methods and Applications, vol. 38, p. 229, 2018.
[28] S. M. Mudambi and D. Schuff, “What makes a helpful online review? A study
of customer reviews on Amazon.com,” MIS Quarterly, vol. 34, no. 1, pp.
185–200, 2010.
[29] H. Liu, J. He, T. Wang, W. Song, and X. Du, “Combining user preferences
and user opinions for accurate recommendation,” Electronic Commerce
Research and Applications, vol. 12, no. 1, pp. 14–23, 2013.
[30] S.-M. Kim and E. Hovy, “Determining the sentiment of opinions,” in
Proceedings of the 20th international conference on Computational
Linguistics. Association for Computational Linguistics, 2004, p. 1367.
[31] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,”
Journal of Machine Learning Research, vol. 3, no. Jan, pp. 993–1022, 2003.
[32] Z. Zhang, D. D. Zeng, A. Abbasi, J. Peng, and X. Zheng, “A random
walk model for item recommendation in social tagging systems,” ACM
Transactions on Management Information Systems (TMIS), vol. 4, no. 2,
p. 8, 2013.
[33] M. A. Ghazanfar and A. Prügel-Bennett, “Leveraging clustering
approaches to solve the gray-sheep users problem in recommender systems,”
Expert Systems with Applications, vol. 41, no. 7, pp. 3261–3275, 2014.
[34] H. Lee, A. Battle, R. Raina, and A. Y. Ng, “Efficient sparse coding
algorithms,” in Advances in neural information processing systems, 2007,
pp. 801–808.
[35] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y.
Ng, “On optimization methods for deep learning,” in Proceedings of the
28th International Conference on Machine Learning. Omnipress, 2011,
pp. 265–272.
[36] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A
review and new perspectives,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[37] G. E. Hinton, “A practical guide to training restricted Boltzmann
machines,” in Neural networks: Tricks of the trade. Springer, 2012, pp.
599–619.
[38] F. Ricci, L. Rokach, and B. Shapira, “Introduction to recommender
systems handbook,” in Recommender systems handbook. Springer, 2011,
pp. 1–35.
[39] R. He and J. McAuley, “Ups and downs: Modeling the visual evolution of
fashion trends with one-class collaborative filtering,” in Proceedings of the
25th international conference on world wide web. International World
Wide Web Conferences Steering Committee, 2016, pp. 507–517.
[40] A. Mnih and R. R. Salakhutdinov, “Probabilistic matrix factorization,” in
Advances in neural information processing systems, 2008, pp. 1257–1264.
