A Movie Recommender System: MOVREC
Manoj Kumar, D.K. Yadav, Ankur Singh, Vijay Kr. Gupta
International Journal of Computer Applications (0975 – 8887)
Volume 124 – No.3, August 2015
2. RELATED WORK
Many recommendation systems have been developed over the
past decades. These systems use different approaches, such as
collaborative, content-based, utility-based, and hybrid
approaches.
Looking at the purchase behavior and history of shoppers,
Lawrence et al. (2001) presented a recommender system that
suggests new products in the market. Collaborative and
content-based filtering approaches were used to refine the
recommendations. To find potential customers, most
recommendation systems today use ratings given by previous
users. These ratings are then used to predict and recommend
items of a user's choice.
In 2007, Weng, Lin and Chen performed an evaluation study
which found that using multidimensional analysis and
additional customer profile information increases
recommendation quality. Weng used the multidimensional (MD)
recommendation model for this purpose. The multidimensional
recommendation model was proposed by Tuzhilin and
Adomavicius (2001).

Figure 2: The 5 steps of the K-Means algorithm [3]

3.2 Data Description
In the proposed model we use a pre-filter before applying the
K-means algorithm. The attributes used to calculate the
distance of each point from the centroid are the actor,
director, rating, genre, and year of each movie.
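The distance computation mentioned above can be sketched as follows. This is a minimal illustration, assuming each movie has already been encoded as a vector of numeric attribute values; the function names `euclidean` and `nearest_centroid` are hypothetical and are not taken from the MOVREC implementation.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))
```

In a K-means assignment step, `nearest_centroid` would be called once per movie to decide which cluster the movie joins.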
In our research we have found that as the number of votes
increases, the weight of the rating should also increase.
Therefore we have used ratios of 1:1, 1:2, and 1:3 depending
on the total number of votes received by a movie. We have
also found that movies with a rating of less than 5 are the
least suitable for recommendation and the least desired by
users. Users generally want to see a good movie, and a higher
rating ensures that our predicted movie set consists of
movies that are liked by a large number of users. Weights
assigned to the other attributes are generally based on the
ratio of the number of movies associated with that particular
attribute value to the total number of movies in our data
set.

3.3 Simulation of MOVREC
When a user enters our system MOVREC, he or she has several
options: search for a particular movie, see the list of
upcoming movies, or go to our recommendation page.

On the recommendation page the user is given the choice to
select or input values for different attributes. On the basis
of these input values, we search our database and prepare an
array of suitable movies. Movies included in the array are
those for which at least one attribute value matches the
input value of the user. We then count the movies in the
array. If the count is less than or equal to twenty, we
display the movie list sorted according to the ratings
associated with the movies. If the number of movies is
greater than twenty, we apply a pre-filter and select the top
twenty movies according to rating. If two movies have the
same rating, priority is given to the movie with the larger
number of votes. After filtering the movie list we match the
attribute values to their respective weights and compute the
total weight of each movie. Once we have calculated the total
weight of each movie we apply the K-means clustering
algorithm to this group of movies. In our research we have
also found that a user generally prefers a list of five
movies, so we set K equal to 4 so that on average every
cluster has five movies, where K is the number of clusters to
be formed.

For each cluster k1, k2, k3, k4 we assume initial centroids
c1, c2, c3, c4, which correspond to the first, sixth,
eleventh, and sixteenth movies in the movie array. After
defining the initial centroids we compute the distance of all
the other data points from each centroid and assign the
remaining data points (movies) to the closest centroid to
form clusters. The distance measure we have used between data
points and centroids is the Euclidean distance.

After forming the initial clusters we take one cluster at a
time and recalculate its centroid, this time as the mean of
the points in that cluster. After recalculating the centroids
we compute the distance of all data points to these newly
formed centroids and reassign them to form clusters. We
repeat this process until there is no change in the
centroids. This ensures that the clusters finally formed are
stable and that no further regrouping is possible. Once the
final clusters are formed we compute the average rating of
all points belonging to each cluster, i.e. the cluster
rating, and then, according to the user's input query, we
display the cluster having the highest cluster rating.

Weightage and matching of attributes

1. Actor (Wa)
Wa = (No. of movies of Actor(a) in data set) / (Total no. of movies in data set)

2. Director (Wd)
Wd = (No. of movies of Director(d) in data set) / (Total no. of movies in data set)

3. Rating (Wr)

Rating    Weight if        Weight if                Weight if
          votes <= 1000    1000 < votes <= 10000    votes > 10000
10        10               20                       30
9         9                18                       27
8         8                16                       24
7         7                14                       21
6         6                12                       18
5         5                10                       15
1-4.9     1                2                        3

4. Genre (Wg)
Wg = (No. of movies of Genre(g) in data set) / (Total no. of movies in data set)

5. Year (Wy)
Wy = (No. of movies in Year(Y) in data set) / (Total no. of movies in data set)

Total weight of a particular movie m is given by
Wm = Wr + Wa + Wd + Wg + Wy

3.4 Proposed Algorithm
Input: a number of movies: m
Output: a number of clusters: K

Step 1 Select n movies from m movies, n<m.
Step 2 If n>20 then select top 20 movies from n movies
       based on ratings.
       Else display the output movies sorted by rating.
Step 3 If ratings of movies x, y are equal, i.e. if Rx = Ry,
       then select those movies which have the greater
       number of user votes.
Step 4 Assume K=4.
Step 5 REPEAT (6, 7)
Step 6 Choose initial centroids C1, C2, C3, C4.
Step 7 Calculate Euclidean distance of all data points
       w.r.t. C1, C2, C3, C4 and re-compute the centroid of
       each cluster.
Step 8 UNTIL centroid does not change.
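The weighting scheme and the steps of the proposed algorithm can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' PHP implementation: the dictionary keys (`rating`, `votes`, `actor`, `director`, `genre`, `year`) and all function names are hypothetical, fractional ratings of 5 or more are assumed to take the weight of their integer part, clustering is assumed to run on the scalar total weight Wm, and an empty cluster is assumed to keep its previous centroid.

```python
from statistics import mean


def rating_weight(rating, votes):
    """Wr: base weight scaled 1x, 2x or 3x according to the vote tier."""
    # Assumption: ratings below 5 get base weight 1; ratings of 5 or more
    # use their integer part, since only whole ratings are tabulated.
    base = 1 if rating < 5 else int(rating)
    if votes <= 1000:
        return base
    if votes <= 10000:
        return 2 * base
    return 3 * base


def total_weight(movie, dataset):
    """Wm = Wr + Wa + Wd + Wg + Wy, with shares taken over `dataset`."""
    n = len(dataset)

    def share(key):
        # Fraction of movies in the data set with the same attribute value.
        return sum(1 for m in dataset if m[key] == movie[key]) / n

    return (rating_weight(movie["rating"], movie["votes"])
            + share("actor") + share("director")
            + share("genre") + share("year"))


def prefilter(movies, limit=20):
    """Steps 2-3: sort by rating, break ties by votes, keep the top `limit`."""
    ranked = sorted(movies, key=lambda m: (m["rating"], m["votes"]), reverse=True)
    return ranked[:limit]


def kmeans_1d(values, k=4):
    """Steps 4-8 on scalar weights; returns one cluster index per value."""
    step = max(1, len(values) // k)
    # With 20 values and k=4 this picks the 1st, 6th, 11th and 16th value.
    centroids = [values[min(i * step, len(values) - 1)] for i in range(k)]
    while True:
        # In one dimension the Euclidean distance is the absolute difference.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            return labels
        centroids = updated


def recommend(matches, dataset, k=4, limit=20):
    """Return the movies of the cluster with the highest average rating."""
    top = prefilter(matches, limit)
    if len(matches) <= limit:
        return top  # small result sets are shown sorted by rating
    weights = [total_weight(m, dataset) for m in top]
    labels = kmeans_1d(weights, k)
    clusters = [[m for m, lab in zip(top, labels) if lab == c] for c in range(k)]
    return max((c for c in clusters if c),
               key=lambda c: mean(m["rating"] for m in c))
```

Here `matches` stands for the array of movies that match at least one input attribute, and `dataset` for the full movie database over which the attribute shares are computed.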
To make our system diversifiable so that it can satisfy users
of different geographical locations.

To give weights to different attributes.

3.6 Overcome the problems
The proposed system has been tested on a small group of
people, and we have received a positive response from them.
To keep our system simple and interactive we have chosen PHP
and JavaScript.

For collecting information we intensively searched free
online movie databases and extracted the information that
was useful for our proposed system.

To accurately recommend movies to the user we have applied
the K-means clustering algorithm along with a pre-filter.

We have included movies in our database irrespective of
their language or location so that users from all across the
globe can use our system.

For assigning weights to attributes and for giving priority
to them we conducted a survey on a group of people, and on
the basis of the results obtained we have prioritized our
attributes.

An excerpt of the survey is given below.

Name               Actor/Cast   Genre   Rating   Director   Year
Mrs Malti Singh    2            5       1        4          3
K.K Singh          5            3       1        2          4

In our survey we found that almost every user gave maximum
priority to the rating of the movie, so we have given maximum
priority to our rating attribute. We also deduced from our
survey that a movie's rating carries more weight if the movie
has received a large number of votes. We have therefore
divided rating and votes into three categories, i.e. minimum,
medium and maximum votes.

4. CONCLUSION
In this paper we have introduced MOVREC, a recommender
system for movie recommendation. It allows a user to select
his or her choices from a given set of attributes and then
recommends a movie list based on the cumulative weight of
the different attributes and the K-means algorithm. By the
nature of our system, it is not an easy task to evaluate its
performance, since there is no right or wrong recommendation;
it is just a matter of opinion. In informal evaluations that
we carried out with a small set of users we got a positive
response. We would like to have a larger data set that will
enable more meaningful results using our system. Additionally
we would like to incorporate different machine learning and
clustering algorithms and study the comparative results.
Eventually we would like to implement a web based user
interface that has a user database and a learning model
tailored to each user.

5. REFERENCES
[1] Han J., Kamber M., “Data Mining: Concepts and
Techniques”, Morgan Kaufmann (Elsevier), 2006.
[2] Ricci and F. Del Missier, “Supporting Travel Decision
Making Through Personalized Recommendation,”
Design Personalized User Experience for e-commerce,
pp. 221-251, 2004.
[3] Steinbach M., Tan P., Kumar V., “Introduction to Data
Mining,” Pearson, 2007.
[4] Jha N. K., Kumar M., Kumar A., Gupta V. K., “Customer
classification in retail marketing by data mining,”
International Journal of Scientific & Engineering
Research, Volume 5, Issue 4, April 2014, ISSN 2229-5518.
[5] Giles C. L., Bollacker K. D., and Lawrence S., “CiteSeer:
An automatic citation indexing system,” in Proceedings
of the third ACM conference on Digital libraries, 1998,
pp. 89–98.
[6] Beel J., Langer S., Genzmehr M., and Nürnberger A.,
“Introducing Docear’s Research Paper Recommender
System,” in Proceedings of the 13th ACM/IEEE-CS
Joint Conference on Digital Libraries (JCDL’13), 2013,
pp. 459–460.
[7] Bethard S. and Jurafsky D., “Who should I cite: learning
literature search models from citation behavior,” in
Proceedings of the 19th ACM international conference
on Information and knowledge management, 2010,
pp. 609–618.
[8] Bollacker K. D., Lawrence S., and Giles C. L.,
“CiteSeer: An autonomous web agent for automatic
retrieval and identification of interesting publications,” in
Proceedings of the 2nd international conference on
Autonomous agents, 1998, pp. 116–123.
[9] Erosheva E., Fienberg S., and Lafferty J.,
“Mixed-membership models of scientific publications,” in
Proceedings of the National Academy of Sciences of the
United States of America, 2004, vol. 101, no. Suppl 1,
pp. 5220–5227.
[10] Ferrara F., Pudota N., and Tasso C., “A Keyphrase-Based
Paper Recommender System,” in Proceedings of
the IRCDL’11, 2011, pp. 14–25.
[11] Jiang Y., Jia A., Feng Y., and Zhao D.,
“Recommending academic papers via users’ reading
purposes,” in Proceedings of the sixth ACM conference
on Recommender systems, 2012, pp. 241–244.
[12] McNee S. M., Kapoor N., and Konstan J. A., “Don’t look
stupid: avoiding pitfalls when recommending research
papers,” in Proceedings of the 20th anniversary
conference on Computer supported cooperative work,
2006, pp. 171–180.
[13] Middleton S. E., De Roure D. C., and Shadbolt N. R.,
“Capturing knowledge of user preferences: ontologies in
recommender systems,” in Proceedings of the 1st
international conference on Knowledge capture, 2001,
pp. 100–107.
[14] Zarrinkalam F. and Kahani M., “SemCiR - A citation
recommendation system based on a novel semantic
distance measure,” Program: electronic library and
information systems, vol. 47, no. 1, pp. 92–112, 2013.
[15] Schafer J. B., Frankowski D., Herlocker J., and Sen S.,
“Collaborative filtering recommender systems,” Lecture
Notes In Computer Science, vol. 4321, p. 291, 2007.
[16] Seroussi Y., “Utilising user texts to improve
recommendations,” User Modeling, Adaptation, and
Personalization, pp. 403–406, 2010.
[17] Buttler D., “A short survey of document structure
similarity algorithms,” in Proceedings of the 5th
International Conference on Internet Computing, 2004.
[18] Goldberg D., Nichols D., Oki B. M., and Terry D.,
“Using collaborative filtering to weave an information
Tapestry,” Communications of the ACM, vol. 35, no.
12, pp. 61–70, 1992.
[19] Beel J., Langer S., and Genzmehr M., “Mind-Map based
User Modelling and Research Paper Recommendations,”
in work in progress, 2014.
[20] MacQueen J., “Some methods for classification and
analysis of multivariate observations,” in Proc. of the 5th
Berkeley Symp. on Mathematical Statistics and
Probability, pp. 281–297, University of California
Press, 1967.
[21] Ball G. and Hall D., “A Clustering Technique for
Summarizing Multivariate Data,” Behavioral Science,
12:153–155, March 1967.
Bowman M., Debray S. K., and Peterson L. L., “Reasoning
about naming systems,” 1993.
IJCATM : www.ijcaonline.org