IEEE Conference Template 2
IEEE Conference Template 2
IEEE Conference Template 2
Abstract—This report presents a novel approach to product struggle with scalability issues in large datasets and may not
recommendations for online stores, leveraging the power of K- perform well when users have unique tastes that do not align
Nearest Neighbors (KNN) and Matrix Factorization techniques. closely with those of other users. Matrix Factorization, mean-
The proposed hybrid system aims to provide highly precise
and personalized product recommendations, thereby enhancing while, can overlook potentially valuable information about
user satisfaction and boosting sales. By combining the strengths user similarities and item similarities that are not captured by
of latent factor analysis, derived from Matrix Factorization, the latent factors.
and user similarity-based approaches, facilitated by KNN, the This report proposes a hybrid product recommendation
system offers a robust solution for the dynamic and diverse system that combines the strengths of KNN and Matrix
needs of online retail customers. The system’s design allows
for continuous learning and adaptation based on user feedback, Factorization to overcome these limitations. By integrating
leading to increasingly accurate recommendations over time. This latent factor analysis with user similarity-based approaches,
report provides a comprehensive exploration of the system’s the system aims to provide highly precise and personalized
design, implementation, and potential impact on the online retail product recommendations that can significantly increase sales
industry. and user satisfaction in an online retail setting.
Keywords—Product Recommendations, Online Retail, K-
Nearest Neighbors (KNN), Matrix Factorization, Hybrid System, The remainder of this report will delve into the details
User Satisfaction. of this hybrid system, discussing its design, implementation,
and potential impact on the online retail industry. It will also
I. I NTRODUCTION present a series of case studies demonstrating the system’s
The advent of online retail has revolutionized the way effectiveness in various retail settings, providing valuable
consumers shop, providing unparalleled convenience and an insights for online retailers looking to enhance their product
almost limitless array of products at their fingertips. However, recommendation capabilities. ‘
this abundance of choice can also be overwhelming, making II. LITERATURE REVIEW
it difficult for consumers to find the products that best meet
their needs and preferences. To address this challenge, online The field of product recommendation systems has seen sig-
retailers have turned to product recommendation systems, nificant advancements over the past few decades, with numer-
which aim to guide consumers towards products they are likely ous techniques and algorithms being proposed and evaluated.
to enjoy or find useful. This literature review aims to provide an overview of the key
Product recommendation systems have become a corner- developments in this field, with a particular focus on K-Nearest
stone of successful online retail platforms, with industry giants Neighbors (KNN) and Matrix Factorization techniques.
like Amazon attributing up to 35% of their sales to their Early research in the field was dominated by content-based
recommendation algorithms. These systems employ a variety filtering methods, which recommend items similar to those
of techniques to predict which products a user is likely to a user has liked in the past. However, these methods often
prefer, based on their past behavior and other relevant factors. suffer from a lack of diversity in their recommendations and
Among the most popular and effective of these techniques struggle to recommend new items outside of the user’s known
are K-Nearest Neighbors (KNN) and Matrix Factorization. preferences.
KNN is a type of collaborative filtering technique that rec- Collaborative filtering methods, such as KNN, emerged as
ommends products based on the preferences of similar users. a solution to these limitations. These methods recommend
It operates on the principle that users who have agreed in items based on the preferences of similar users, introducing an
the past are likely to agree again in the future. On the other element of serendipity into the recommendations. Numerous
hand, Matrix Factorization is a type of latent factor model that studies have demonstrated the effectiveness of KNN in various
aims to explain the ratings given by users to items by a small contexts, with Herlocker et al. (2002) providing a comprehen-
number of factors inferred from the data. sive evaluation of item-based collaborative filtering, a variant
While both KNN and Matrix Factorization have proven of KNN.
effective in certain contexts, each has its limitations. KNN can However, KNN methods can struggle with scalability issues
in large datasets and may not perform well when users have
unique tastes that do not align closely with those of other users.
To address these issues, researchers have turned to latent factor have included the MRPs across multiple stores likeAjio MRP,
models, such as Matrix Factorization. Amazon MRP, Amazon FBA MRP, FlipkartMRP, Limeroad
Matrix Factorization techniques, such as Singular Value MRP Myntra MRP, and PaytmMRP along with other key
Decomposition (SVD), were initially used in information parameters like amount paid by customers for the purchase,
retrieval and later adapted for collaborative filtering by Sarwar rate per piece for every individual transaction we have added
et al. (2000). These techniques aim to explain the ratings given transactional parameters like Date of sale months category
by users to items by a small number of factors inferred from fulfilled-by B2b Status Qty Currency Grossamt. This is a must-
the data. Koren et al. (2009) further advanced this field by have dataset for anyone trying to uncover the profitability of
introducing a time-dependent model that captures temporal e-commerce sales in today’s marketplace
dynamics, a key aspect in many recommendation scenarios.
Despite their strengths, Matrix Factorization techniques can IV. METHODOLOGY
overlook potentially valuable information about user similar-
The methodology of this study involves the design and
ities and item similarities that are not captured by the latent
implementation of a hybrid product recommendation system
factors. This has led to the development of hybrid methods
that combines K-Nearest Neighbors (KNN) and Matrix Fac-
that combine collaborative filtering and latent factor models.
torization techniques. The system operates in two stages: the
Bell and Koren (2007) were among the first to propose such
Matrix Factorization stage and the KNN stage.
a hybrid approach, combining Matrix Factorization and KNN
to win the Netflix Prize. Since then, numerous studies have 1) Matrix Factorization Stage
explored various ways of combining these two techniques, In the first stage, the system uses Matrix Factorization to
demonstrating the potential of hybrid methods to provide more identify latent factors that influence a user’s preferences.
accurate and diverse recommendations. The Matrix Factorization technique used in this study is
In conclusion, the literature shows a clear trend towards Singular Value Decomposition (SVD), a popular method
hybrid recommendation systems that combine the strengths of for latent factor models.
KNN and Matrix Factorization. However, there is still much The system begins by constructing a user-item matrix,
to learn about the best ways to integrate these techniques and where each entry represents a user’s rating for a partic-
apply them in different contexts. This report aims to contribute ular item. The SVD algorithm is then applied to this
to this ongoing research by proposing a novel hybrid system matrix to decompose it into three separate matrices:
for product recommendations in online retail settings. a user matrix, a diagonal matrix of singular values
(representing the latent factors), and an item matrix.
III. DATASET The user and item matrices represent the users and items
We have used two datasets to test our proposed algorithm’s in the latent factor space, while the singular values rep-
performance in this work. The reason for taking multiple resent the strength of each latent factor. By multiplying
datasets is that it will cover all possible dependencies of the these matrices together, the system can predict a user’s
dataset’s nature, like bias, sparsity, etc. They are as follows: rating for an item based on the latent factors.
1) Product Sales Data:
This dataset contains information about the sales of
various products. It includes data points such as the
product name, product ID, date of sale, quantity sold,
and revenue generated. This dataset can be used to
analyze sales trends, identify popular products, and
forecast future sales.
2) E-Commerce Sales Dataset:
This dataset contains information about the sales of
an e-commerce business. It includes data points such
as the number of sales, revenue generated, average
order value, customer demographics, conversion rates,
and other relevant metrics. This dataset can be used to Fig. 1. Matrix Factorization For Collaborative Filtering.
analyze the performance and trends of an e-commerce
business, optimize marketing strategies, and improve 2) K-Nearest Neighbors (KNN) Stage
overall sales performance. In the second stage, the system uses the KNN algorithm
This dataset provides an in-depth look at the profitability to find users who are similar to the target user based on
of e-commerce sales. It contains data on a variety of sales these latent factors. The similarity between users is measured
channels, including Shiprocket and INCREFF, as well as finan- using a distance metric, such as Euclidean distance or cosine
cial information on related expenses and profits. The columns similarity, in the latent factor space.
contain data such as SKU codes, design numbers,stock levels, The system identifies the K users who are closest to the
product categories, sizes and colors. In addition to this we target user in this space, where K is a parameter that can
be tuned based on the specific application. These users are product matrix. This gives us an estimate of how much the
considered the target user’s ”neighbors”. user will like the product. We can then use this estimate to
The system then recommends items that these neighbors make recommendations to the user. For example, if the user
have rated highly but which the target user has not yet is likely to enjoy a particular movie, we can recommend that
rated. This introduces an element of serendipity into the movie to them.
recommendations, as the system can recommend items that In addition to these quantitative metrics, the study also
the target user may not have discovered on their own. considers qualitative aspects such as the diversity and novelty
of the recommendations. This is done by analyzing the distri-
bution of recommended items and conducting user studies to
gather feedback on the recommendations.
Through this methodology, the study aims to demonstrate
the potential of hybrid recommendation systems to provide
highly precise and personalized product recommendations in
online retail settings.
V. RESULTS
Gender analysis. In our setting, once a user clicks on a
product should get some recommendations. The system is not
aware of any information about the gender of this person and
there is no gender filter on products. We want to evaluate the
performance of our system desegregated by gender. Referring
to the details in 5, we achieved more than 99gender. The
results indicated on the bar chart shows similar performance
on men-women which is due to a balanced dataset. Moreover,
we have improved the gender selection by up to 100weight to
the gender vectors in the clicked product.
Fig. 2. K- Nearest Neighbour. The hybrid product recommendation system was evaluated
using a dataset of user-item ratings. The system achieved a
Evaluation precision of 45%, a recall of 55%, and an F1 score of 75%.
To evaluate the effectiveness of the hybrid system, the study These results indicate that the system was able to accurately
employs a cross-validation approach. The dataset is split into a recommend items that users found relevant and interesting.
training set and a test set. The system is trained on the training
set and then used to make predictions for the users in the test
set. The predicted ratings are compared with the actual ratings
to calculate the prediction error. The system’s performance is
evaluated using metrics such as Mean Absolute Error (MAE)
and Root Mean Square Error (RMSE).
s
PN −1 2
i=0 (yi − ŷi )
RMSE(y, ŷ) =
N
Model Interpretation: Once the model fits the data well,
we can interpret the coefficients to understand the relationship
between the independent variables and the dependent variable.
This helps us understand how much impact the independent
variables have on the dependent variables. For example, if we
are trying to predict the price of a house based on its size,
location, and age, we can use a linear regression model to fit Fig. 3. Results of recommending the correct product gender to men and
the data. We can then interpret the coefficients to understand women in normal and weighted modes of vectors.
how much each of these factors contributes to the price of the
In the qualitative evaluation, users reported that the recom-
house.
mendations provided by the system were diverse and aligned
Prediction: The trained model can also be used to predict
with their interests. Users also appreciated the system’s ability
outcomes based on fresh information. For example, we can
to recommend new items that they had not previously consid-
predict a user’s liking for a new good or service based on
ered but found appealing.
their web browsing habits. To get the expected rating, this
algorithm computes the dot product of the user matrix and the # C a t e g o r y Wise A v e r a g e T r a n s a c t i o n Amount
avg transaction amount = exploration. Extending the application to other domains, such
d a t a . g r o u p b y ( ’ C a t e g o r y ’ ) [ ’ Amount ’ ] . mean ( as
) movie or music recommendation, could unveil new dimen-
. sort values ( ascending=False ) sions of its capabilities and broaden its utility across diverse
p l t . f i g u r e ( f i g s i z e =(10 , 6) ) sectors.
s n s . b a r p l o t ( x= a v g t r a n s a c t i o n a m o u n t . i n d e x , In summary, the hybrid product recommendation system
y= a v g t r a n s a c t i o n a m o u n t . v a l u e s , demonstrated its efficacy in this study, showcasing its potential
p a l e t t e = ’ coolwarm ’ ) to transform and elevate the user experience in e-commerce
platforms. The positive reception from users and robust perfor-
p l t . x l a b e l ( ’ Category ’ ) mance metrics affirm its value, setting the stage for continued
p l t . y l a b e l ( ’ A v e r a g e T r a n s a c t i o n Amount ’ ) advancements and expansions in the realm of recommendation
p l t . x t i c k s ( r o t a t i o n =45) systems.
p l t . show ( )
R EFERENCES
[1] Chibuzor Udokwu ”Design and Implementation of a Product
Recommendation System With Association and Clustering Al-
gorithms” IEEE Access, Volume: 4, 2016.
[2] Moran Beladev, Lior Rokach, Bracha Shapira, ”Recommender
system for product bundling,” in IEEE 6th International Confer-
ence on Advanced Computing, 2016.
[3] I. S. Jacobs and C. P. Bean, “Fine particles, thin films and
exchange anisotropy,” in Magnetism, vol. III, G. T. Rado and
H. Suhl, Eds. New York: Academic, 1963, pp. 271–350.
[4] Krishan Kant Yadav, “Recommendation System Based on Dou-
ble Ensemble Models using KNN-MF”
[5] S. Dey, “Mixed Recommender System- MF(Matrix
Factorization) with item similarity based CF(Collaborative. . . ,”
Medium, May 21, 2020. [Online]. Available:
https://towardsdatascience.com/mixed-recommender-system-mf-
matrix-factorization-with-item-similarity-based-cf-collaborative-
544ddcedb330
[6] Georgios Alexandridis ”From Free-text User Reviews to Product
Recommendation using Paragraph Vectors and Matrix Factoriza-
tion”
[7] “Products We Think You Might Like: Generating Personalized
Recommendations Using Matrix Factorization,” Databricks.
https://www.databricks.com/blog/2023/01/06/products-we-
Fig. 4. Category-wise Average Transaction Amount. think-you-might-generating-personalized-recommendations.html
[8] Li Chen”Recommendation Model Based on Probabilistic Matrix
VI. CONCLUSION Factorization and Rated Item Relevance ”
[9] Ming song Mao,Sihua Chen,Fugui Zhang, ” Hybrid e-commerce
In conclusion, this study has successfully presented and recommendation model incorporating product taxonomy and
implemented a hybrid product recommendation system that folksonomy for Academic, ”IEEE Latin America Transactions,
vol. 14, no. 6, June 2018.
merges the strengths of Matrix Factorization and K-Nearest [10] Ricci, F., Rokach, L., Shapira, B., & Kantor, P. B. (2015).
Neighbors techniques. By capitalizing on the global structure Recommender systems handbook (Vol. 1).
of the user-item matrix and exploiting local item similarities,
the system delivers recommendations that are both accurate
and diverse.
The evaluation results underscore the effectiveness of the
hybrid approach, as evidenced by consistently high scores
on standard metrics and positive feedback from users. These
outcomes strongly suggest that the hybrid recommendation
system holds significant potential for enhancing user satisfac-
tion within the realm of e-commerce platforms.
Looking ahead, there are promising avenues for future
research and development in this domain. One avenue could
involve the integration of additional cutting-edge techniques,
particularly exploring the incorporation of deep learning meth-
ods. This would not only contribute to the ongoing evolution of
the recommendation system but also holds the potential to fur-
ther elevate the accuracy and relevance of recommendations,
thereby enriching the overall user experience.
Furthermore, the adaptability of the system beyond the
confines of e-commerce is a noteworthy prospect for future