A Movie Recommender System: MOVREC: Manoj Kumar D.K. Yadav Ankur Singh Vijay Kr. Gupta


International Journal of Computer Applications (0975 – 8887)

Volume 124 – No.3, August 2015

A Movie Recommender System: MOVREC

Manoj Kumar, Assistant Professor, Department of Information and Technology, BBDNITM Lucknow
D.K. Yadav, Associate Professor, Department of Computer Science, MNNIT Allahabad, Allahabad
Ankur Singh, M.Tech (P), Software Engineering, BBDU Lucknow
Vijay Kr. Gupta, Assistant Professor, Department of Computer Science, BBDNITM Lucknow
India

ABSTRACT
Nowadays, recommendation systems have changed the way we search for things of interest. A recommender system is an information filtering approach used to predict the preferences of a user. The most popular areas where recommender systems are applied are books, news, articles, music, videos and movies. In this paper we propose a movie recommendation system named MOVREC. It is based on a collaborative filtering approach that makes use of the information provided by users, analyzes it, and then recommends the movies best suited to the user at that time. The recommended movie list is sorted according to the ratings given to these movies by previous users, and the system uses the K-means algorithm for this purpose. MOVREC also helps users find movies of their choice based on the movie experience of other users, efficiently and effectively, without wasting time on useless browsing. The system has been developed in PHP using Dreamweaver 6.0 and Apache Server 2.0. The presented recommender system generates recommendations using various types of knowledge and data about users, the available items, and previous transactions stored in customized databases. The user can then browse the recommendations easily and find a movie of their choice.

Keywords
K-means, recommendation system, recommender system, data mining, clustering, movies, Collaborative filtering, Content-based filtering

1. INTRODUCTION
In today's world, where the internet has become an important part of human life, users often face the problem of too much choice. From looking for a motel to looking for good investment options, there is too much information available. To help users cope with this information explosion, companies have deployed recommendation systems to guide their users. Research in the area of recommendation systems has been going on for several decades, but interest remains high because of the abundance of practical applications and the problem-rich domain. Examples of such online recommendation systems in use are the recommendation system for books at Amazon.com, for movies at MovieLens.org, and for CDs at CDNow.com (from Amazon.com).
Recommender systems have added to the economy of some e-commerce websites (like Amazon.com) and of Netflix, which have made these systems a salient part of their websites. A glimpse of the benefit to some websites is shown in the table below:

Netflix        2/3rd of the movies watched are recommended
Google News    Recommendations generate 38% more click-throughs
Amazon         35% of sales come from recommendations
Choicestream   28% of the people would buy more music if they found what they liked

Table 1. Companies benefit through recommendation system

Recommender systems generate recommendations; the user may accept them according to their choice and may also provide, immediately or at a later stage, implicit or explicit feedback. The actions of the users and their feedback can be stored in the recommender database and may be used for generating new recommendations in the next user-system interactions. The economic potential of these recommender systems has led some of the biggest e-commerce websites (like Amazon.com, snapdeal.com) and the online movie rental company Netflix to make these systems a salient part of their websites. High-quality personalized recommendations add another dimension to the user experience. Web personalized recommendation systems have recently been applied to provide different types of customized information to their respective users. These systems can be applied in various types of applications and are very common nowadays.

We can classify recommender systems into two broad categories:

1. Collaborative filtering approach
2. Content-based filtering approach

1.1 Collaborative filtering
A collaborative filtering (CF) system recommends items based on similarity measures between users and/or items. The system recommends those items that are preferred by similar users. Collaborative filtering has several advantages:
1. It is content-independent, i.e. it relies on connections only.
2. Since in CF people make explicit ratings, a real quality assessment of items is done.
3. It provides serendipitous recommendations because recommendations are based on users' similarity rather than items' similarity.
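As an illustration of the user-similarity idea described above, the following is a minimal sketch of user-based collaborative filtering. It is not the MOVREC implementation: the ratings dictionary, the user and movie names, and the helper names `cosine_similarity` and `recommend` are all hypothetical, and cosine similarity over co-rated movies is only one of several possible similarity measures.

```python
from math import sqrt

# Hypothetical ratings (user -> {movie: rating}); not data from the paper.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 3, "Movie C": 4},
    "bob":   {"Movie A": 5, "Movie B": 3, "Movie D": 5},
    "carol": {"Movie A": 1, "Movie B": 5, "Movie E": 4},
}

def cosine_similarity(a, b):
    """Cosine similarity of two users, computed over the movies both rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Return movies the most similar other user rated and `user` has not seen,
    ordered from highest to lowest rating."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))
    unseen = {m: r for m, r in ratings[nearest].items() if m not in ratings[user]}
    return sorted(unseen, key=unseen.get, reverse=True)

suggestions = recommend("alice", ratings)
```

Here the recommendation for one user comes entirely from the ratings of the most similar other user, which is why no description of the movies themselves is needed.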


1.2 Content-based filtering
Content-based filtering (CBF) is based on the profile of the user's preferences and the item's description. In CBF, keywords are used to describe items, alongside the user's profile, which indicates the user's likes and dislikes. In other words, CBF algorithms recommend items that were liked in the past, or items similar to them. The approach examines previously rated items and recommends the best matching item.
Various approaches have been proposed in the research papers listed below. These approaches are often combined in Hybrid Recommender Systems. An earlier study by Eyjolfsdottir et al. for the recommendation of movies through MOVIEGEN had certain drawbacks: it asks users a series of questions, which is time consuming, and it was not user friendly because it proved to be stressful to a certain extent. Keeping these shortcomings in mind, we have developed MovieREC, a movie recommendation system that recommends movies to users based on the information provided by the users themselves. In the present study, a user is given the option to select choices from a set of attributes which include actor, director, genre, year and rating. We predict the user's choices based on the choices in the visit history of previous users. The system has been developed in PHP and currently uses a simple console-based interface.

2. RELATED WORK
Many recommendation systems have been developed over the past decades. These systems use different approaches, such as the collaborative approach, the content-based approach, the utility-based approach and the hybrid approach.
Looking at the purchase behavior and history of shoppers, Lawrence et al. (2001) presented a recommender system which suggests new products in the market. To refine the recommendations, collaborative and content-based filtering approaches were used. To find potential customers, most recommendation systems today use ratings given by previous users. These ratings are further used to predict and recommend the item of one's choice.
In 2007 Weng, Lin and Chen performed an evaluation study which found that using multidimensional analysis and additional customer profile information increases recommendation quality. Weng used the multidimensional (MD) recommendation model for this purpose. The multidimensional recommendation model was proposed by Tuzhilin and Adomavicius (2001).

3. RESEARCH METHODOLOGY

3.1 The Basic K-means Algorithm
The original K-means algorithm was proposed by MacQueen [20]. The ISODATA algorithm by Ball and Hall [21] was an early but sophisticated version of K-means. Clustering divides objects into meaningful groups. Clustering is unsupervised learning. Document clustering is automatic document organization.
In the K-means clustering technique we choose K initial centroids, where K is the desired number of clusters. Each point is then assigned to the cluster with the nearest mean, i.e. the centroid of the cluster. Then we update the centroid of each cluster based on the points that are assigned to the cluster. We repeat the process until there is no change in the cluster centers (centroids). This algorithm aims at minimizing an objective function, in this case a squared error function. The objective function is

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \qquad (1)$$

where k is the number of clusters, n is the number of cases, $\left\| x_i^{(j)} - c_j \right\|^2$ is a chosen distance measure between a data point $x_i^{(j)}$ and the cluster centre $c_j$, and $J$ is an indicator of the distance of the n data points from their respective cluster centers.

The algorithm is composed of the following steps:

1. Select K points as initial centroids.
2. Repeat
3. Form K clusters by assigning each point to its closest centroid.
4. Re-compute the centroid of each cluster.
5. Until the centroids do not change.

Figure 1. K-means Algorithm

Figure 2: The 5 steps of the K-Means algorithm [3]

3.2 Data Description
In the proposed model we use a pre-filter before applying the K-means algorithm. The attributes used to calculate the distance of each point from the centroid are:

1. Genre
2. Actor
3. Director
4. Year
5. Rating

Different attributes have different weights. In our research we have found that the most appropriate recommendations should be based on the ratings given to the movies by previous users; therefore we have given more importance to the rating attribute than to the other attributes. These ratings have been taken from www.imdb.com because it has perhaps the largest collection of movies, along with ratings given to these movies by a large number of different users from different parts of the world. Another important parameter in our proposed model is the total number of votes received by a particular movie. We have divided the number of votes into three categories: less than or equal to 1000, more than 1000 but less than or equal to 10,000, and greater than 10,000.
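The basic K-means loop of Section 3.1 and the objective in Eq. (1) can be sketched as follows. This is an illustrative sketch, not the authors' PHP code: the function names `kmeans` and `objective`, the `max_iter` safeguard, and the two-dimensional sample points are assumptions made for the example, and Euclidean distance is used as the distance measure.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def kmeans(points, initial_centroids, max_iter=100):
    """Basic K-means: assign points to the nearest centroid, recompute
    centroids as cluster means, and stop when the centroids stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):  # max_iter is a safeguard added for this sketch
        # Form K clusters by assigning each point to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            clusters[j].append(p)
        # Re-compute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

def objective(centroids, clusters):
    """Squared-error objective J of Eq. (1)."""
    return sum(dist(p, c) ** 2 for c, cl in zip(centroids, clusters) for p in cl)

# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])
J = objective(centroids, clusters)
```

In this sketch each data point is a plain tuple of numbers; in a movie setting the coordinates would be whatever numeric attribute values or weights are chosen to represent a movie.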


In our research we have found that as the number of votes increases, the weight of the rating should increase accordingly. Therefore we have used ratios of 1:1, 1:2, and 1:3 depending on the total number of votes received by a movie. We have also found that movies with a rating below 5 are the least suitable for recommendation and the least desired by users. Users generally want to see a good movie, and a higher rating ensures that our predicted movie set consists of movies liked by a large number of users. Weights assigned to the other attributes are based on the ratio of the number of movies associated with that particular attribute value to the total number of movies in our data set.

3.3 Simulation of MOVREC
When a user enters our system MOVREC, he or she has several options: search for a particular movie, see the list of upcoming movies, or go to our recommendation page. On the recommendation page the user is given the choice to select or input values for different attributes. On the basis of these input values, we search our database and prepare an array of suitable movies. Movies included in the array are those for which at least one attribute value matches the user's input value. We then count the movies in the array with the help of a counter. If the counter value is less than or equal to twenty, we display the movie list sorted according to the ratings associated with the movies. If the number of movies is greater than twenty, we apply a pre-filter and select the top twenty movies according to rating. If two movies have the same rating, priority is given to the movie with the larger number of votes. After filtering the movie list we match the attribute values to their respective weights and compute the total weight of each movie. Once we have calculated the total weight of each movie, we apply the K-means clustering algorithm to this group of movies. In our research we have also found that a user generally prefers a list of five movies, so we set K equal to 4 so that, on average, every cluster has five movies, where K is the number of clusters to be formed.
For the clusters k1, k2, k3, k4 we assume initial centroids c1, c2, c3, c4, which correspond to the first, sixth, eleventh, and sixteenth movie in the movie array. After defining the initial centroids we compute the distance of all the other data points from each centroid and assign the remaining data points (movies) to the closest centroid to form clusters. The distance measure we have used to calculate the distance between data points and centroids is the Euclidean distance.

After forming the initial clusters we take one cluster at a time. We again calculate the centroids, but this time each centroid corresponds to the mean of the points in that cluster. After recalculating the centroids we compute the distance of all data points with respect to these newly formed centroids and reassign them to form clusters. We repeat this process until there is no change in the centroids. This ensures that the clusters finally formed are optimized and no further grouping is possible. Once the final clusters are formed we compute the average rating of all points belonging to each cluster, i.e. the cluster rating; then, according to the user's input query, we display the cluster having the highest cluster rating.

Weightage and matching of attributes

1. Actor (Wa)

Wa = (No. of movies of Actor (a) in data set) / (Total no. of movies in data set)

2. Director (Wd)

Wd = (No. of movies of Director (d) in data set) / (Total no. of movies in data set)

3. Rating (Wr)

          Weight
Rating    If number of      If number of             If number of
          votes <= 1000     1000 < votes <= 10000    votes > 10000
10        10                20                       30
9         9                 18                       27
8         8                 16                       24
7         7                 14                       21
6         6                 12                       18
5         5                 10                       15
1-4.9     1                 2                        3

4. Genre (Wg)

Wg = (No. of movies of Genre (g) in data set) / (Total no. of movies in data set)

5. Year (Wy)

Wy = (No. of movies in Year (y) in data set) / (Total no. of movies in data set)

The total weight of a particular movie m is given by

Wm = Wr + Wa + Wd + Wg + Wy

3.4 Proposed Algorithm

Input: a number of movies: m
Output: a number of clusters: K

Step 1 Select n movies from m movies, n < m.
Step 2 If n > 20 then select the top 20 movies from the n movies based on ratings.
       Else display the output movies sorted by rating.
Step 3 If the ratings of movies x, y are equal, i.e. if Rx = Ry,
       then select the movie which has the greater number of user votes.
Step 4 Assume K = 4.
Step 5 REPEAT (6, 7)
Step 6 Choose initial centroids C1, C2, C3, C4.
Step 7 Calculate the Euclidean distance of all data points w.r.t. C1, C2, C3, C4 and re-compute the centroid of each cluster.
Step 8 UNTIL the centroids do not change.

Where,


m: Total number of movies in database
n: Number of movies after user query
x, y: Two random movies
Rx, Ry: Rating of movies x, y
K: Number of clusters
C1, C2, C3, C4: Initial centroids.

3.5 Challenges Faced
In developing any system, the biggest challenge is to satisfy the end users for whom the system is being developed. We also faced certain challenges while developing our system. Some of them are:

• To have a system that is user friendly and easy to understand and use.

• To create a data set that has all relevant information about a particular movie.

• The biggest challenge was to have the most appropriate recommended movie list.

• To make our system diversifiable so that it can satisfy users of different geographical locations.

• To give weights to different attributes.

3.6 Overcome the problems
• The proposed system has been tested on a small group of people, and we have received a positive response from them. We have kept our system simple and interactive; for this we have chosen PHP and JavaScript.

• For collecting information we have intensively searched free online movie databases and extracted the information that was useful for our proposed system.

• To accurately recommend movies to the user we have applied the K-means clustering algorithm along with a pre-filter.

• We have included movies in our database irrespective of their language or location so that users from all across the globe can use our system.

• For assigning weights to attributes and for giving priority to them we have conducted a survey on a group of people, and on the basis of the results obtained we have prioritized our attributes.

An excerpt of the survey is given below.

Name                   Genre   Rating   Director   Year   Actor/Cast
Mrs Malti Singh        2       5        1          4      3
K.K Singh              5       3        1          2      4
Kritika Baranwal       1       3        2          5      4
Ranjana Rai            3       2        1          4      5
Dr. Siddharth Singh    2       3        1          4      5
Alok Kumar             1       4        2          5      3
Shashank Pandey        3       2        1          5      4
Er. Pankaj Agarwal     3       2        1          4      5
Mr. Girjesh Mishra     3       1        2          4      5

Table 2. Excerpt of the Survey Conducted

In our survey we have found that almost every user has given maximum priority to the rating of the movie, so we have given maximum priority to our rating attribute. We have also deduced from our survey that a movie's rating carries more weight if that movie has received a large number of votes. So we have divided rating and votes into three categories, i.e. minimum, medium and maximum votes.

4. CONCLUSION
In this paper we have introduced MovieREC, a recommender system for movie recommendation. It allows a user to select choices from a given set of attributes and then recommends a movie list based on the cumulative weight of the different attributes, using the K-means algorithm. By the nature of our system, it is not an easy task to evaluate its performance, since there is no right or wrong recommendation; it is a matter of opinion. Based on informal evaluations carried out over a small set of users, we got a positive response from them. We would like to have a larger data set that will enable more meaningful results using our system. Additionally, we would like to incorporate different machine learning and clustering algorithms and study the comparative results. Eventually we would like to implement a web-based user interface that has a user database and a learning model tailored to each user.

5. REFERENCES
[1] Han J., Kamber M., “Data Mining: Concepts and Techniques”, Morgan Kaufmann (Elsevier), 2006.
[2] Ricci and F. Del Missier, “Supporting Travel Decision making Through Personalized Recommendation,” Design Personalized User Experience for e-commerce, pp. 221-251, 2004.
[3] Steinbach M., P Tan, Kumar V., “Introduction to Data Mining.” Pearson, 2007.
[4] Jha N K, Kumar M, Kumar A, Gupta V K “Customer classification in retail marketing by data mining” International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 ISSN 2229-5518


[5] Giles C.L., Bollacker K.D., and Lawrence S., “CiteSeer: An automatic citation indexing system,” in Proceedings of the third ACM conference on Digital libraries, 1998, pp. 89–98.
[6] Beel J., Langer S., Genzmehr M., and Nürnberger A., “Introducing Docear’s Research Paper Recommender System,” in Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’13), 2013, pp. 459–460.
[7] Bethard S and Jurafsky D, “Who should I cite: learning literature search models from citation behavior,” in Proceedings of the 19th ACM international conference on Information and knowledge management, 2010, pp. 609–618.
[8] Bollacker K. D., Lawrence S., and Giles C. L., “CiteSeer: An autonomous web agent for automatic retrieval and identification of interesting publications,” in Proceedings of the 2nd international conference on Autonomous agents, 1998, pp. 116–123.
[9] Erosheva E., Fienberg S., and Lafferty J., “Mixed-membership models of scientific publications,” in Proceedings of the National Academy of Sciences of the United States of America, 2004, vol. 101, no. Suppl 1, pp. 5220–5227.
[10] Ferrara F., Pudota N., and Tasso C., “A Keyphrase-Based Paper Recommender System,” in Proceedings of the IRCDL’11, 2011, pp. 14–25.
[11] Jiang Y., Jia A., Feng Y., and Zhao D., “Recommending academic papers via users’ reading purposes,” in Proceedings of the sixth ACM conference on Recommender systems, 2012, pp. 241–244.
[12] McNee S. M., Kapoor N., and Konstan J. A., “Don’t look stupid: avoiding pitfalls when recommending research papers,” in Proceedings of the 20th anniversary conference on Computer supported cooperative work, 2006, pp. 171–180.
[13] Middleton S. E., De Roure D. C., and Shadbolt N. R., “Capturing knowledge of user preferences: ontologies in recommender systems,” in Proceedings of the 1st international conference on Knowledge capture, 2001, pp. 100–107.
[14] Zarrinkalam F. and Kahani M., “SemCiR - A citation recommendation system based on a novel semantic distance measure,” Program: electronic library and information systems, vol. 47, no. 1, pp. 92–112, 2013.
[15] Schafer J. B., Frankowski D., Herlocker J., and Sen S., “Collaborative filtering recommender systems,” Lecture Notes In Computer Science, vol. 4321, p. 291, 2007.
[16] Seroussi Y., “Utilising user texts to improve recommendations,” User Modeling, Adaptation, and Personalization, pp. 403–406, 2010.
[17] Buttler D., “A short survey of document structure similarity algorithms,” in Proceedings of the 5th International Conference on Internet Computing, 2004.
[18] Goldberg D., Nichols D., Oki B. M., and Terry D., “Using collaborative filtering to weave an information Tapestry,” Communications of the ACM, vol. 35, no. 12, pp. 61–70, 1992.
[19] Beel J., Langer S., and Genzmehr M., “Mind-Map based User Modelling and Research Paper Recommendations,” in work in progress, 2014.
[20] MacQueen J., “Some methods for classification and analysis of multivariate observations,” in Proc. of the 5th Berkeley Symp. on Mathematical Statistics and Probability, pp. 281–297, University of California Press, 1967.
[21] Ball G. and Hall D., “A Clustering Technique for Summarizing Multivariate Data,” Behavior Science, 12:153-155, March 1967. Bowman, M., Debray, S. K., and Peterson, L. L. 1993. Reasoning about naming systems.

IJCATM: www.ijcaonline.org
