WORD PBL


1. INTRODUCTION

1. What is a Movie Recommendation System?


A movie recommendation system, or movie recommender system, is an ML-based
approach to filtering or predicting users’ film preferences from their past
choices and behavior. It is a filtering mechanism that predicts which movies a
given user is likely to choose and how strongly that user prefers a particular
item, in this case a movie.

The basic concept behind a movie recommendation system is quite simple. In particular,
there are two main elements in every recommender system: users and items. The system
generates movie predictions for its users, while items are the movies themselves.

The primary goal of movie recommendation systems is to filter and predict only those
movies that a corresponding user is most likely to want to watch. The ML algorithms for
these recommendation systems use the data about this user from the system’s database.
This data is used to predict the future behavior of the user concerned based on the
information from the past.

1. Filtration strategies for Recommendation system


The most popular categories of ML algorithms used for movie recommendation
systems are content-based filtering and collaborative filtering:

1. Content based filtering


A filtration strategy for movie recommendation systems that uses the data provided
about the items (movies) together with the history of a single user. The ML algorithm
recommends movies that are similar to the ones this user preferred in the past.
Similarity in content-based filtering is therefore derived from the past film selections
and likes of only one user.
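A minimal sketch of content-based filtering, assuming each movie is described by a set of genre tags. The titles, the tags, and the `recommend_similar` helper are hypothetical, and Jaccard similarity is only one simple choice of similarity measure, not necessarily the one used in this project.

```python
# Hypothetical catalogue: movie title -> set of genre tags (illustrative data only).
MOVIES = {
    "Movie A": {"action", "thriller"},
    "Movie B": {"action", "adventure"},
    "Movie C": {"romance", "drama"},
    "Movie D": {"action", "thriller", "crime"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar(liked, movies, top_n=2):
    """Rank unseen movies by similarity to the combined tags of the movies one user liked."""
    profile = set().union(*(movies[title] for title in liked))
    scores = {
        title: jaccard(profile, tags)
        for title, tags in movies.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


recommendations = recommend_similar(["Movie A"], MOVIES)
print(recommendations)
```

Only the history of one user (the `liked` list) enters the computation, which is the defining property of the content-based strategy described above.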
2. Collaborative filtering
Collaborative filtering is based on the interactions of all users in the system with the items
(movies). Every user therefore influences the final outcome of this ML-based recommendation
system, whereas content-based filtering depends strictly on the data from one user for its
modeling.
Collaborative filtering algorithms are divided into two categories:

User-based collaborative filtering. The idea is to look for similar patterns in the movie
preferences of the target user and of other users in the database.
Item-based collaborative filtering. The basic concept here is to look for items (movies)
similar to those the target user has rated or interacted with.
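The user-based variant can be sketched as follows, assuming ratings are stored as one dictionary per user. The users, ratings, and the `recommend_user_based` helper are hypothetical, and cosine similarity is an assumed choice of similarity measure. The item-based variant follows the same pattern with similarities computed between movies instead of between users.

```python
import math

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale} (illustrative data only).
RATINGS = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob": {"Movie A": 5, "Movie B": 5, "Movie D": 4},
    "carol": {"Movie C": 5, "Movie E": 4},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    dot = sum(u[m] * v[m] for m in u.keys() & v.keys())
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_user_based(target, ratings, top_n=1):
    """Score movies the target has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], other_ratings)
        for movie, rating in other_ratings.items():
            if movie not in ratings[target]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


top_pick = recommend_user_based("alice", RATINGS)
print(top_pick)
```

Here every other user contributes to the score, weighted by how closely their ratings match the target user's.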
Reason for selection of this project

A movie recommender system is a tool that helps streaming media platforms
recommend movies to users on the basis of their interests and behavior. It builds a
list of suggested movies from the user profile: an AI-based algorithm analyzes the
data, goes through the possible options, and creates a customized list of
items that are interesting and relevant to an individual.

The results provided by a recommender engine are based on the user’s profile,
search or browsing history, what other people with similar traits or locations are watching,
and how likely the user is to watch those movies.

There are thousands of movies available on every streaming media platform. A
recommender system helps to personalize a platform and helps users find what they are
looking for.

From a business perspective, the more relevant content or movies a user finds on a
particular platform, the higher their engagement and, as a result, the higher the revenue. Various
platforms have also reported that 35 to 40% of revenue comes from recommendations alone.

The main aim of any recommendation engine is to stimulate demand and actively engage
users. Primarily a component of an eCommerce personalization strategy, recommendation
engines dynamically populate various products onto websites, apps, or emails, thus
enhancing the customer experience. These kinds of varied and omnichannel
recommendations are made based on multiple data points such as customer preferences,
past transaction history, attributes, or situational context.
2. Objective
The primary goal of movie recommendation systems is to filter and predict only those movies
that a corresponding user is most likely to want to watch. The ML algorithms for these
recommendation systems use the data about this user from the system’s database. This data is
used to predict the future behavior of the user concerned based on the information from the
past.

The movie recommendation system can assist users in finding movies that match their
interests, which saves them time.

From a user’s perspective, recommendations are meant to fulfil the user’s needs in the shortest time
possible. Consider the content shown on Netflix or Hulu: a person who watches
only Korean dramas will see related titles, while a person who likes
action titles will see those on their home screen.

From an organization’s perspective, the goal is to keep the user on the platform as long as
possible, because that generates the most profit. Better
recommendations also produce positive feedback from the user. What good is a library of
500K+ titles to an organization if it cannot provide proper
recommendations?

Recommendations are a great way to keep you watching, but for Raghu the recommendations
he gets are wrong. Why? Recommendation systems are tailored to a single
user, not to multiple users. Raghu lives in a joint family, and everyone uses a single system
to watch what they want. OTT platforms let you add multiple profiles,
but everyone else has already taken those, and he is left with a single profile to share with his
grandparents. So Raghu decides to create his own movie recommendation system. Before getting
started, he should understand the different types of recommendation systems.
3. Problem Statement

Perform analysis and Basic Recommendations based on Similar Genres and Movies which Users
prefer.

Some of the Key Points on which we will be focusing include:

● Profitability of Movies

● Language-based Gross Analysis

● Comparison of Gross and Profit for Different Genres

● Recommendation systems based on Actors, Movies, Genres, Title, Overview, and Crew (Director’s
name).

This Project will help us to understand the Correlation between these factors.
i) Block Diagram with explanation

This block diagram shows how the movie recommendation system works: it
analyzes past user preferences and then uses this information to find similar
movies. It uses content-based and collaborative filtering techniques to filter the data
according to a set pattern.
ii) Benefits to the surrounding society

There are thousands of movies available on every streaming media platform. A recommender
system helps to personalize a platform and helps users find what they are looking for.

From a business perspective, the more relevant content or movies a user finds on a particular
platform, the higher their engagement and, as a result, the higher the revenue. Various platforms
have also reported that 35 to 40% of revenue comes from recommendations alone.

A movie recommender system delivers a smart and personalized experience to users and, in turn,
helps streaming media service providers enhance user engagement. It also provides relevant
movie recommendations, so users can scroll through suggested movies and watch one
after the other.
iii) Methodologies
Proposed techniques or methods to be implemented
Python:
Python is a widely used general-purpose, high-level programming language. It was created by
Guido van Rossum in 1991 and is further developed by the Python Software Foundation. It was
designed with an emphasis on code readability, and its syntax allows programmers to express
their concepts in fewer lines of code.

Python is a programming language that lets you work quickly and integrate systems more
efficiently. Before we start Python programming, we need an interpreter to interpret
and run our programs.

Windows: Many free tools are available to run Python scripts, such as IDLE (Integrated
Development and Learning Environment).

Linux: Python comes preinstalled with popular Linux distros such as Ubuntu and Fedora. To
check which version of Python you’re running, type “python” (or “python3” on recent
distributions) in the terminal emulator. The interpreter should start and print the version number.

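macOS: Older macOS releases bundled Python 2.7; it was removed in macOS 12.3, so on current
releases Python 3 is installed separately (for example from python.org).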
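On any of these platforms the running version can also be checked from inside Python. The sketch below uses only the standard `sys` module. The Python 3 requirement is an assumption based on the notebook in this report, which declares a Python 3 kernel.

```python
import sys

# sys.version_info behaves like a tuple: (major, minor, micro, releaselevel, serial).
major, minor = sys.version_info[:2]
print("Running Python {}.{}".format(major, minor))

# Assumption: the project notebook targets Python 3, so stop early on anything older.
if major < 3:
    raise SystemExit("Python 3 is required")
```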

Kaggle: Kaggle is one of the largest data science communities for open-source data and
collaboration. Its competitions attract a huge number of data scientists who want to
earn the recognition of the data science community and be part of a good cause to use data
wisely. Although Kaggle has many data-science-related courses and beginner notebooks, it can
be quite challenging to navigate as a beginner.

GitHub: GitHub, Inc. is an Internet hosting service for software development and version
control using Git. It provides the distributed version control of Git plus access control, bug
tracking, software feature requests, task management, continuous integration, and wikis for
every project. Headquartered in California, it has been a subsidiary of Microsoft since 2018.

It is commonly used to host open source software development projects. As of January 2023,
GitHub reported having over 100 million developers and more than 372 million repositories,
including at least 28 million public repositories. It is the largest source code host as of
November 2021.

BOOTSTRAP: Bootstrap is an HTML, CSS and JS library that focuses on simplifying the
development of informative web pages (as opposed to web applications). The primary purpose
of adding it to a web project is to apply Bootstrap's choices of color, size, font and layout to
that project. As such, the primary factor is whether the developers in charge find those choices
to their liking. Once added to a project, Bootstrap provides basic style definitions for all HTML
elements. The result is a uniform appearance for prose, tables and form elements across web
browsers. In addition, developers can take advantage of CSS classes defined in Bootstrap to
further customize the appearance of their contents. For example, Bootstrap has provisioned .
Project flowchart:

A general project management flow chart is a diagram that depicts a series of actions that begin
with the commencement of a project and progress to its conclusion. If the
project statement is not accepted, you’ll have to amend or terminate the project. As soon as
your scope statement is accepted, you can move forward with the strategic planning.
i. E.R diagram:

An Entity Relationship (ER) Diagram is a type of flowchart that illustrates how
“entities” such as people, objects or concepts relate to each other within a
system.
ii. Data flow diagram:

DFD is the abbreviation for Data Flow Diagram. A DFD represents the flow of data through a
system or a process. It also gives insight into the inputs and outputs of each entity and
the process itself. A DFD has no control flow: no loops or decision rules are
present. Specific operations that depend on the type of data can be explained by a flowchart.

It is a graphical tool, useful for communicating with users, managers and other personnel.

Advantages:

Although several definitions of recommender systems have been given, we can define
them as information filtering systems that tackle the problem of information overload. They
filter information according to the user’s interests and preferences, or according to observed
behavior toward items.

So, based on the user’s profile, these systems can predict whether a product will be preferred
by a user or not. More broadly, recommender systems represent user preferences for the
purpose of suggesting items to purchase or examine and are now an integral part of many
e-commerce sites.

Limitations
Recommendation systems face several challenges: the Cold
Start problem, Data Sparsity, and Scalability.
Cold Start Problem: The system needs enough users to find a match. For instance, to
find a similar user or a similar item, we match against the set of available users
or items. At the initial stage a new user’s profile is empty: they have not rated any item,
so the system does not know their taste and it is difficult to provide
them with recommendations. The same applies to a new item, which has not yet been rated by
any user. Both these problems can be resolved by implement
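One common fallback for the cold-start case, which may or may not be what the authors had in mind, is to recommend the most popular items until a new user has rated something. The sketch below assumes a simple interaction log; the users, movies, and helper names are hypothetical.

```python
from collections import Counter

# Hypothetical interaction log: user -> list of movies they rated (illustrative data only).
HISTORY = {
    "alice": ["Movie A", "Movie B"],
    "bob": ["Movie A", "Movie C"],
    "carol": ["Movie A", "Movie B", "Movie D"],
}


def most_popular(history, top_n=2):
    """Return the movies rated by the most users, a common default for empty profiles."""
    counts = Counter(movie for movies in history.values() for movie in movies)
    return [movie for movie, _ in counts.most_common(top_n)]


def recommend(user, history, top_n=2):
    """Fall back to popularity when the user has no ratings yet (the cold-start case)."""
    if not history.get(user):
        return most_popular(history, top_n)
    seen = set(history[user])
    # Stand-in for a personalised model: popular movies the user has not seen yet.
    candidates = most_popular(history, len(seen) + top_n)
    return [movie for movie in candidates if movie not in seen][:top_n]


new_user_picks = recommend("dave", HISTORY)
print(new_user_picks)
```

A new item has the mirror-image problem: it has no ratings to count, so a popularity fallback alone does not surface it.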
4. Applications:

1. Amazon.com:
Amazon.com uses item-to-item collaborative filtering recommendations on most pages
of its website and in its e-mail campaigns. According to McKinsey, 35% of Amazon
purchases are thanks to recommendation systems.
2. Netflix: Netflix is another data-driven company that leverages recommendation
systems to boost customer satisfaction. The same McKinsey study mentioned above
highlights that 75% of Netflix viewing is driven by recommendations. Netflix
was so focused on providing the best results for users that it held a data science
competition called the Netflix Prize, in which the team with the most accurate movie
recommendation algorithm won a prize worth $1,000,000.
3. Spotify:
Every week, Spotify generates a new customized playlist for each subscriber called
“Discover Weekly”, a personalized list of 30 songs based on the user’s unique
music tastes. Its acquisition of Echo Nest, a music intelligence and data-analytics
startup, enabled it to create a music recommendation engine that uses three different
types of recommendation models:
Collaborative filtering: Filtering songs by comparing users’ historical listening data
with other users’ listening history.
Natural language processing: Scraping the internet for information about specific artists
and songs. Each artist or song is then assigned a dynamic list of top terms that changes
daily and is weighted by relevance. The engine then determines whether two pieces of
music or artists are similar.
Audio file analysis: The algorithm analyzes each individual audio file’s characteristics, including
tempo, loudness, key, and time signature, and makes recommendations accordingly.
5. Tools/platform used

• Jupyter notebook:
The Jupyter Notebook App is a server-client application that allows editing and running
notebook documents via a web browser. The Jupyter Notebook App can be executed on a local
desktop requiring no internet access or can be installed on a
remote server and accessed through the internet.

In addition to displaying, editing, and running notebook documents, the Jupyter Notebook App has
a “Dashboard” (Notebook Dashboard), a “control panel” that shows local files and allows
opening notebook documents or shutting down their kernels.
• Kernel:
A notebook kernel is a “computational engine” that executes the code contained in a Notebook
document. The Python kernel executes Python code. Kernels for many
other languages exist (official kernels).

When you open a Notebook document, the associated kernel is automatically launched. When
the notebook is executed (either cell-by-cell or with menu Cell -> Run All), the kernel performs
the computation and produces the results. Depending on the type of computations, the kernel
may consume significant CPU and RAM. Note that the RAM is not released until the kernel is
shut down.
6. Project Evaluation and Planning:
Evaluation planning comes down to two questions:
• What are the desired outcomes of your project?
• How will you measure them?
It is about building benchmarks and accountability into your plan, and using them to evaluate
the plan as you go and after the project is finished. It gives your project a more strategic
structure, provides evidence for your results and, importantly, contributes to the knowledge
base about what makes such projects effective.
In this project as well, we began by laying out a plan for the development of the project.
We took the following steps before undertaking the project:
• First, we researched the movie recommendation systems available
in the market.
• We analyzed and studied the different technologies needed to build the
project.
• We calculated the maximum time required to complete the project.
• We divided the project into different modules, applying the
divide-and-conquer strategy to break the big problem into
chunks of small tasks.
• We also estimated the total expenditure incurred in building the project.
• We also conducted risk analysis and tried to remove any threats to the
project.
• We also studied the advantages of this project and how it can be improved
by using certain other technologies.
• We have also thoroughly analyzed all the limitations of this project and
of the technology used in it.
7. Which model is used to build the movie recommendation
system and why?
We have used the spiral model for building the movie recommendation system because it
repeats risk analysis in every iteration until the majority of risks are eliminated.

The spiral model enables gradual releases and refinement of a product through each phase of
the spiral as well as the ability to build prototypes at each phase. The most important feature
of the model is its ability to manage unknown risks after the project has commenced; creating
a prototype makes this feasible.

The spiral model is a systems development lifecycle (SDLC) method used for risk management
that combines the iterative development process model with elements of the Waterfall model.
The spiral model is used by software engineers and is favored for large, expensive and
complicated projects.
This movie recommendation system will be built by following these steps and these steps can
be considered as different modules:
i. Install libraries
ii. Download and prepare dataset
iii. Preprocess dataset
iv. Encode data
v. Perform vector search
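Steps iv and v above (encode data, perform vector search) can be sketched with the standard library alone. This is an illustrative assumption, not the project's implementation: the overviews are made up, `encode` uses plain word counts where a real system would more likely use TF-IDF weights or learned embeddings, and `vector_search` is a hypothetical helper that scans every vector.

```python
import math
from collections import Counter

# Hypothetical movie overviews (illustrative data only).
OVERVIEWS = {
    "Movie A": "a detective hunts a killer in the city",
    "Movie B": "two friends travel across the country",
    "Movie C": "a detective investigates a murder in a small town",
}


def encode(text):
    """Encode text as a sparse bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vector_search(query, index, top_n=1):
    """Return the titles whose encoded overview is closest to the encoded query."""
    q = encode(query)
    ranked = sorted(index, key=lambda title: cosine(q, index[title]), reverse=True)
    return ranked[:top_n]


# Encode every overview once, then search the resulting vectors.
INDEX = {title: encode(text) for title, text in OVERVIEWS.items()}
best_match = vector_search("detective murder mystery", INDEX)
print(best_match)
```

Encoding once and reusing `INDEX` keeps the per-query work to a similarity computation against each stored vector.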

Advantages of Spiral model:


• High amount of risk analysis hence, avoidance of Risk is enhanced.
• Good for large and mission-critical projects.
• Strong approval and documentation control.
• Additional Functionality can be added at a later date.
• Software is produced early in the software life cycle.

Disadvantages of Spiral model:


• Can be a costly model to use.
• Risk analysis requires highly specific expertise.
• Project’s success is highly dependent on the risk analysis phase.
• Doesn’t work well for smaller projects.
8. Time estimation of the project (using Critical Path
Method)

Activity Id   Activity
A             Data Collection
B             Data Preprocessing
C             Feature Engineering
D             Model Training
E             Model Deployment
F             Monitoring and Maintenance

Activity Id   Activity (nodes)   Duration   Early start   Latest finish   Total Float   Free Float
A             1-2                3          0             3               0             0
B             2-3                4          3             12              5             0
C             2-4                5          3             8               0             0
D             3-5                4          7             16              5             5
E             4-5                8          8             16              0             0
F             5-6                6          16            22              0             0
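The float values in the table can be reproduced with a forward and a backward pass over the network. The sketch below takes the node pairs and durations from the table; the function name and data layout are illustrative, and it assumes node numbers increase along every arrow, as they do here.

```python
# Activities from the table above: id -> (start node, end node, duration).
ACTIVITIES = {
    "A": (1, 2, 3),
    "B": (2, 3, 4),
    "C": (2, 4, 5),
    "D": (3, 5, 4),
    "E": (4, 5, 8),
    "F": (5, 6, 6),
}


def critical_path(activities):
    """Compute project duration, total float per activity, and the critical activities."""
    nodes = sorted({n for start, end, _ in activities.values() for n in (start, end)})

    # Forward pass: earliest time each node can be reached.
    earliest = {n: 0 for n in nodes}
    for start, end, dur in sorted(activities.values()):
        earliest[end] = max(earliest[end], earliest[start] + dur)

    # Backward pass: latest time each node can occur without delaying the project.
    project_end = earliest[nodes[-1]]
    latest = {n: project_end for n in nodes}
    for start, end, dur in sorted(activities.values(), reverse=True):
        latest[start] = min(latest[start], latest[end] - dur)

    # Total float = latest finish - earliest start - duration.
    total_float = {
        name: latest[end] - earliest[start] - dur
        for name, (start, end, dur) in activities.items()
    }
    critical = [name for name, slack in total_float.items() if slack == 0]
    return project_end, total_float, critical


project_duration, floats, critical_activities = critical_path(ACTIVITIES)
print(project_duration, floats, critical_activities)
```

Activities with zero total float form the critical path, so a delay in any of them delays the whole project by the same amount.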
Network diagram of the project
9. Cost estimation of the project
In summary, the development of a recommendation system is a process that is not
easy to estimate at first sight. It requires a personalized approach and a deep understanding
and analysis of the client’s business processes, goals and data.
Finalizing the development costs:
1) Analysis and raw estimation – free
2) Prototype development – $5,000
3) MVP development (prototype included) – $10,000
4) Deployment and release – $5,000
On this basis, the usual development cost for a recommendation engine
powered by machine learning is about $15,000 (MVP plus deployment).
For a recommendation engine powered by traditional algorithms, it is about 30% lower, but
that system makes less accurate predictions than the ML-based recommendation engine.
The price often varies according to several factors: the amount of data and its complexity,
business goals and expectations, the existing code base, and the technologies in use. So it can
increase or decrease according to the specific project.
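The arithmetic behind these figures is sketched below, using only the amounts quoted in the cost breakdown above. The variable names are illustrative, and "about 30% lower" is applied as an exact 30% for the sake of the example.

```python
# Figures quoted in the cost breakdown above (US dollars).
MVP_INCLUDING_PROTOTYPE = 10_000
DEPLOYMENT_AND_RELEASE = 5_000

# The prototype ($5,000) is already part of the MVP figure, so it is not added again.
ml_engine_total = MVP_INCLUDING_PROTOTYPE + DEPLOYMENT_AND_RELEASE

# "About 30% lower" for an engine built on traditional algorithms.
traditional_engine_total = ml_engine_total * 70 // 100

print(ml_engine_total, traditional_engine_total)
```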
10. SOURCE CODE:
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import nltk\n",
"from nltk.corpus import stopwords\n",
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import naive_bayes\n",
"from sklearn.metrics import roc_auc_score,accuracy_score\n",
"import pickle"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package stopwords to\n",
"[nltk_data] C:\\Users\\kishan\\AppData\\Roaming\\nltk_data...\n",
"[nltk_data] Unzipping corpora\\stopwords.zip.\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nltk.download(\"stopwords\")"
]
},
{

"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"dataset = pd.read_csv('reviews.txt',sep = '\\t', names =['Reviews','Comments'])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Reviews</th>\n",
" <th>Comments</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>The Da Vinci Code book is just awesome.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>this was the first clive cussler i've ever rea...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>i liked the Da Vinci Code a lot.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>i liked the Da Vinci Code a lot.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>I liked the Da Vinci Code but it ultimatly did...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6913</th>\n",
" <td>0</td>\n",
" <td>Brokeback Mountain was boring.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6914</th>\n",
" <td>0</td>\n",
" <td>So Brokeback Mountain was really depressing.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6915</th>\n",
" <td>0</td>\n",
" <td>As I sit here, watching the MTV Movie Awards, ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6916</th>\n",
" <td>0</td>\n",
" <td>Ok brokeback mountain is such a horrible movie.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6917</th>\n",
" <td>0</td>\n",
" <td>Oh, and Brokeback Mountain was a terrible movie.</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6918 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" Reviews Comments\n",
"0 1 The Da Vinci Code book is just awesome.\n",
"1 1 this was the first clive cussler i've ever rea...\n",
"2 1 i liked the Da Vinci Code a lot.\n",
"3 1 i liked the Da Vinci Code a lot.\n",
"4 1 I liked the Da Vinci Code but it ultimatly did...\n",
"... ... ...\n",
"6913 0 Brokeback Mountain was boring.\n",
"6914 0 So Brokeback Mountain was really depressing.\n",
"6915 0 As I sit here, watching the MTV Movie Awards, ...\n",
"6916 0 Ok brokeback mountain is such a horrible movie.\n",
"6917 0 Oh, and Brokeback Mountain was a terrible movie.\n",
"\n",
"[6918 rows x 2 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"stopset = stopwords.words('english')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"vectorizer = TfidfVectorizer(use_idf = True,lowercase = True, strip_accents='ascii',stop_words=stopset)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"X = vectorizer.fit_transform(dataset.Comments)\n",
"y = dataset.Reviews\n",
"pickle.dump(vectorizer, open('tranform.pkl', 'wb'))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"MultinomialNB()"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf = naive_bayes.MultinomialNB()\n",
"clf.fit(X_train,y_train)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"97.47109826589595"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"accuracy_score(y_test,clf.predict(X_test))*100"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"MultinomialNB()"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf = naive_bayes.MultinomialNB()\n",
"clf.fit(X,y)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"98.77167630057804"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
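],
"source": [
"# clf was refit on all of X above, so X_test overlaps its training data and this score is optimistic\n",
"accuracy_score(y_test,clf.predict(X_test))*100"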
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"filename = 'nlp_model.pkl'\n",
"pickle.dump(clf, open(filename, 'wb'))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
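To make the classifier step in the notebook easier to follow, here is a simplified, standard-library sketch of the idea behind a multinomial Naive Bayes text classifier with add-one smoothing. It is not scikit-learn's implementation: it works on raw word counts rather than the TF-IDF weights the notebook uses, and the four training sentences and the `train_nb` and `predict_nb` helpers are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical training set in the same (label, comment) shape as reviews.txt.
TRAIN = [
    (1, "the movie was awesome"),
    (1, "i liked the movie a lot"),
    (0, "the movie was boring"),
    (0, "such a terrible movie"),
]


def train_nb(samples):
    """Collect per-class document counts and per-class word counts."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for label, text in samples:
        class_docs[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_docs, word_counts, vocab


def predict_nb(text, model, alpha=1.0):
    """Pick the class with the highest log prior plus smoothed log likelihoods."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
positive = predict_nb("awesome movie", model)
negative = predict_nb("boring terrible", model)
print(positive, negative)
```

The smoothing term `alpha` keeps a word that never appeared in one class from driving that class's probability to zero.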

11. References
• https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.htm
• https://www.muvi.com/blogs/movie-recommender-system.html#:~:text=Movie%20recommender%20system%20delivers%20a,watch%20one%20after%20the%20other.
• https://research.aimultiple.com/recommendation-system/
• https://towardsdatascience.com/5-advantages-recommendation-engines-can-offer-to-businesses-10b663977673
