Submission I - Case Study For PGDDS (Semester II)


TITLE - SUBMISSION I - CASE STUDY FOR PGDDS (SEMESTER II)

NAME – NIRAJ KUMAR SRIVASTAVA


REGISTRATION NUMBER – 202105136

NAME OF INSTITUTE – SCDL


ACADEMIC YEAR - 2021
Introduction

We have a dataset, ‘movies_2022’, in which we analyse the India Net and India Gross collections of films across several factors: film industry, release month and verdict.

Methods/Tool

Python for Data Analysis (pandas, NumPy, Matplotlib and seaborn).

Activities

#%%
# Step1 - Import important Libraries

import numpy as np  # linear algebra
import pandas as pd  # data processing
import datetime as dt  # date handling
import os  # functions for interacting with the operating system
import warnings  # used below to silence warnings

# Show up to 80 columns when printing DataFrames
pd.set_option('display.max_columns', 80)
warnings.filterwarnings('ignore')
#%%
# step2 - Read the data

movies_data = pd.read_csv(r"D:\Learning\1). Sy_mb_ios_is\2_nd_semester\Pro_jects\movies_2022.csv")
print(movies_data.head())

#%%
# step3 - Inspect column data types and non-null counts

movies_data.info()  # info() prints its report itself and returns None

#%%
# step4 - Count the unique values in each column

print(movies_data.nunique())

#%%
# step5 - Count the missing (blank) values in each column; they are handled in step7

print(movies_data.isna().sum())
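Raw counts of missing values are easier to act on when expressed as a share of rows, for example when deciding whether to drop or fill a column. A minimal sketch, using a small hypothetical DataFrame in place of movies_2022.csv, that ranks columns by their percentage of missing values:

```python
import pandas as pd

# Hypothetical toy data standing in for movies_2022.csv
df = pd.DataFrame({
    "Movie": ["A", None, "C", "D"],
    "Movie Industries": ["Bollywood", "Tollywood", None, None],
    "India Net": [10.0, 2.5, 7.0, 1.0],
})

# isna().mean() gives the fraction of missing rows per column
missing_pct = df.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```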

#%%
# step6 - Observations:

# The data has float64(4), int64(1) and object(4) columns.
# Only the 'Budget' column is of integer type.
# There are 3 missing values in the 'Movie' column and 6 in the 'Movie type' column.
# There are 10 duplicate rows on the basis of the 'Movie' column.
# A Month-Year column needs to be derived from the 'Released Date' column.
# All monetary values in the data are in crore (cr).
# The film industries are abbreviated as follows in the 'Industries categories' column:
# 1- Bollywood - BLD
# 2- Tollywood - TLD
# 3- Gollywood - GLD
# 4- Kollywood - KLD
# 5- Hollywood - HLD
# 6- Sandalwood - SNOD
# 7- Marathi Film - MTF
# 8- Bengali Tollywood - BGT
# 9- Pollywood - PLD
# 10-Mollywood - MLD
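The code list above can be kept as a Python dictionary and used to attach full names to the short codes. A minimal sketch: the INDUSTRY_CODES dictionary is copied from the list, while the toy rows (including the unmapped code XYZ) are hypothetical.

```python
import pandas as pd

# Short codes -> full industry names, copied from the observation list
INDUSTRY_CODES = {
    "BLD": "Bollywood",
    "TLD": "Tollywood",
    "GLD": "Gollywood",
    "KLD": "Kollywood",
    "HLD": "Hollywood",
    "SNOD": "Sandalwood",
    "MTF": "Marathi Film",
    "BGT": "Bengali Tollywood",
    "PLD": "Pollywood",
    "MLD": "Mollywood",
}

# Hypothetical toy rows; "XYZ" is deliberately not in the list
df = pd.DataFrame({
    "Movie": ["A", "B", "C"],
    "Industries categories": ["BLD", "KLD", "XYZ"],
})

# Series.map with a dict turns unknown codes into NaN, so they are easy to spot
df["Industry name"] = df["Industries categories"].map(INDUSTRY_CODES)
unmapped = df.loc[df["Industry name"].isna(), "Industries categories"].unique()

print(df)
print("Unmapped codes:", list(unmapped))
```

Codes that map to NaN are either typos in the data or industries missing from the list, so they are worth checking before any grouping.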

#%%
# step7 - Handling missing data

# Drop rows with no movie title; .copy() avoids working on a view of the old frame
movies_data = movies_data.dropna(subset=['Movie']).copy()

# Fill the remaining categorical gaps with an explicit placeholder
movies_data['Movie Industries'] = movies_data['Movie Industries'].fillna('unknown')
movies_data['Industries categories'] = movies_data['Industries categories'].fillna('unknown')
print(movies_data.isna().sum())

#%%
# Data Cleaning is done
#%%
# step8 - Handle duplicated data (keep the first row for each movie title)

movies_data.drop_duplicates(subset='Movie', inplace=True)
print(movies_data.nunique())
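Before dropping duplicates, it can help to confirm how many rows are affected (the observations in step6 report 10). A minimal sketch with hypothetical toy rows; duplicated(subset=...) marks every repeat after the first occurrence, which matches the default keep='first' of drop_duplicates:

```python
import pandas as pd

# Hypothetical toy rows: "B" appears twice
df = pd.DataFrame({
    "Movie": ["A", "B", "B", "C"],
    "India Net": [10.0, 5.0, 5.0, 2.0],
})

# Rows that repeat an earlier 'Movie' value (the first occurrence is not counted)
n_dupes = df.duplicated(subset="Movie").sum()
print("Duplicate rows:", n_dupes)

# keep=False flags every copy, which is handy for comparing them side by side
print(df[df.duplicated(subset="Movie", keep=False)])

deduped = df.drop_duplicates(subset="Movie")
```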

#%%
# step9 - Add a Month-Year column derived from the 'Released Date' column.

# pandas >= 2.0 infers the date format by default (infer_datetime_format is deprecated)
movies_data['Released Date'] = pd.to_datetime(movies_data['Released Date'])
movies_data['year'] = movies_data['Released Date'].dt.year.astype(str)
movies_data['month'] = movies_data['Released Date'].dt.month_name().str.slice(stop=3)
movies_data['Month-Year'] = movies_data['month'] + "-" + movies_data['year']
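The same "Mon-YYYY" label can also be produced in a single call with Series.dt.strftime. A minimal sketch on hypothetical dates; it assumes an English (C) locale, because %b follows the locale. The step9 expression is rebuilt alongside it for comparison.

```python
import pandas as pd

# Hypothetical release dates
df = pd.DataFrame({"Released Date": ["2022-01-14", "2022-02-25", "2022-02-04"]})
df["Released Date"] = pd.to_datetime(df["Released Date"])

# One call: abbreviated month name, a hyphen, then the four-digit year
df["Month-Year"] = df["Released Date"].dt.strftime("%b-%Y")

# The step9 construction, rebuilt here only to compare the two results
step9 = (df["Released Date"].dt.month_name().str.slice(stop=3)
         + "-" + df["Released Date"].dt.year.astype(str))

print(df)
print("Same labels as step9:", (step9 == df["Month-Year"]).all())
```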
#%%
# step10 - View summary statistics of the numeric columns

print(movies_data.describe())

#%%
# step8-step9 - Data transformation is done

#%%
# step10 - Questions for exploratory data analysis:

# Q1. Which film industry released the most movies? (Univariate)
# Q2. In which month were the most movies released? (Univariate)
# Q3. Which verdict did the most movies receive? (Univariate)
# Q4. How much business did the film industries do at the India Gross level? (Univariate)
# Q5. How much business did the film industries do at the India Net level? (Univariate)
# Q6. Analyse the data on the basis of Movie Industries and Verdict. (Multivariate)
#%%
# Step11 - Solutions, question by question:

#%%
# Import libraries for graphical analysis
import matplotlib.pyplot as plt
import seaborn as sns

# Apply seaborn's default theme once, with an A4-sized (11.7 x 8.27 in) figure
sns.set_theme(rc={'figure.figsize': (11.7, 8.27)})

#%%
# Step12 - Q1. Which film industry released the most movies?

#solution

order = movies_data['Industries categories'].value_counts().index
sns.countplot(x=movies_data['Industries categories'], order=order).set(
    title='Number of movies released by film industry')
#%%
# Observations:

# Bollywood (BLD) released more movies than any other film industry.
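The count plot can be backed by the exact numbers. A minimal sketch with hypothetical industry codes; value_counts() returns counts in descending order (the same order passed to countplot), and idxmax() returns the label with the highest count:

```python
import pandas as pd

# Hypothetical industry codes for a handful of movies
df = pd.DataFrame({"Industries categories": ["BLD", "BLD", "TLD", "KLD", "BLD", "TLD"]})

counts = df["Industries categories"].value_counts()
top_industry = counts.idxmax()

print(counts)
print("Most releases:", top_industry, counts.max())
```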

#%%
# step13 - Q2. In which month were the most movies released?

#solution

order = movies_data['Month-Year'].value_counts().index
sns.countplot(x=movies_data['Month-Year'], order=order).set(
    title="Number of movies released per month")

# Observations:

# The maximum number of movies was released in Feb 2022.
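The count plot is ordered by frequency, and "Mon-YYYY" strings sort alphabetically rather than by calendar, so a chronological view can make the monthly pattern easier to read. A minimal sketch with hypothetical release dates, using monthly Period values, which sort in calendar order:

```python
import pandas as pd

# Hypothetical release dates
df = pd.DataFrame({"Released Date": pd.to_datetime([
    "2022-03-04", "2022-01-14", "2022-02-25",
    "2022-02-04", "2022-03-18", "2022-02-11",
])})

# Count releases per calendar month, then sort by month instead of by count
monthly = df["Released Date"].dt.to_period("M").value_counts().sort_index()
busiest = monthly.idxmax()

print(monthly)
print("Busiest month:", busiest.strftime("%b-%Y"))
```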


#%%
# step14 - Q3. Which verdict did the most movies receive?

#solution

order = movies_data['Verdict'].value_counts().index
sns.countplot(x=movies_data['Verdict'], order=order).set(
    title="Number of movies released per verdict")

# Observations:

# Most movies fall in the "None" category of the Verdict column.
# The remaining verdicts, in decreasing order of count, are Disaster, Average,
# Blockbuster, Flop, Superhit, All time Blockbuster, Hit and Plus.
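Percentages make the verdict distribution easier to compare than raw counts. A minimal sketch with hypothetical verdicts; value_counts(normalize=True) returns proportions that sum to 1, which are scaled to percent here:

```python
import pandas as pd

# Hypothetical verdicts ("None" is a string label here, not a missing value)
df = pd.DataFrame({"Verdict": ["None", "None", "Disaster", "Average", "None", "Disaster"]})

counts = df["Verdict"].value_counts()
shares = df["Verdict"].value_counts(normalize=True).mul(100).round(1)

# Both Series share the same index, so they align into one table
print(pd.DataFrame({"count": counts, "percent": shares}))
```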
#%%
# step15 - Q4. How much business did the film industries do at the India Gross level?

#solution

sns.boxplot(x=movies_data['India Gross'])

# Observations:
# The film industries' business at the India Gross level is approximately 1000 cr.
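By default, box-plot whiskers extend to the furthest points within 1.5 × IQR of the quartiles, and points beyond them are drawn as outliers, so a single large figure read off the plot may be an outlier rather than a typical value. A minimal sketch with hypothetical India Gross values (in crore) that applies the same rule numerically:

```python
import pandas as pd

# Hypothetical India Gross values in crore
gross = pd.Series([3.0, 12.5, 20.0, 45.0, 950.0], name="India Gross")

# Quartiles with pandas' default linear interpolation
q1, q3 = gross.quantile([0.25, 0.75])
iqr = q3 - q1

# The 1.5 * IQR rule used by default for box-plot whiskers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = gross[(gross < lower) | (gross > upper)]

print(gross.agg(["count", "sum", "median", "max"]))
print("Outliers:", outliers.tolist())
```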
#%%
# step16 - Q5. How much business did the film industries do at the India Net level?

#solution

sns.displot(data=movies_data, x='India Net')

# Observations:
# The film industries' business at the India Net level is approximately 850 cr.
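To answer "how much business" per industry with numbers rather than a plot, the India Net column can be aggregated by industry. A minimal sketch with hypothetical values in crore; the choice of count, sum, median and max is an assumption about what is useful, not part of the original analysis:

```python
import pandas as pd

# Hypothetical rows; India Net values in crore
df = pd.DataFrame({
    "Movie Industries": ["Bollywood", "Bollywood", "Tollywood", "Kollywood"],
    "India Net": [120.5, 40.0, 300.0, 75.25],
})

# One row per industry, largest total first
net_by_industry = (
    df.groupby("Movie Industries")["India Net"]
    .agg(["count", "sum", "median", "max"])
    .sort_values("sum", ascending=False)
)

print(net_by_industry)
print("Total India Net (cr):", df["India Net"].sum())
```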

#%%
# step17 - Q6. Analyse the data on the basis of Movie Industries and Verdict.

#solution

sns.displot(data=movies_data, x="Verdict", hue="Movie Industries", multiple="stack",
            hue_order=movies_data['Movie Industries'].value_counts().index)
plt.xticks(rotation=90)

# Observations:

# Bollywood has the most movies with the "None", "Disaster" and "Flop" verdicts.
# Tollywood has the most movies with the "Average" verdict.
# Marathi Film, Mollywood and Tollywood share the highest count of movies with the
# "Blockbuster" verdict.
# Only Kollywood has movies with the "Plus" verdict.
# Only Bollywood, Kollywood and Sandalwood have movies with the "All time Blockbuster" verdict.
# Only Bollywood, Kollywood, Marathi Film and Tollywood have movies with the "Hit" verdict.
# Only Bollywood, Kollywood, Sandalwood and Tollywood have movies with the "SuperHit" verdict.
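The stacked plot can be backed by an exact industry-by-verdict count table. A minimal sketch with hypothetical rows, using pd.crosstab; idxmax() on the table then gives, for each verdict, the industry with the most movies (on ties it returns the first row, so tied industries such as those in the "Blockbuster" observation need to be read from the table itself):

```python
import pandas as pd

# Hypothetical rows
df = pd.DataFrame({
    "Movie Industries": ["Bollywood", "Bollywood", "Tollywood",
                         "Kollywood", "Bollywood", "Tollywood"],
    "Verdict": ["Flop", "None", "Average", "Plus", "Flop", "Average"],
})

# Rows: industries, columns: verdicts, cells: number of movies
table = pd.crosstab(df["Movie Industries"], df["Verdict"])
print(table)

# For each verdict, the industry with the highest count
print(table.idxmax())
```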
Observation Summary

1). Bollywood (BLD) released more movies than any other film industry.
2). The maximum number of movies was released in Feb 2022.
3). Most movies fall in the "None" category of the Verdict column.
4). The remaining verdicts, in decreasing order of count, are Disaster, Average, Blockbuster, Flop, Superhit, All time Blockbuster, Hit and Plus.
5). The film industries' business at the India Gross level is approximately 1000 cr.
6). The film industries' business at the India Net level is approximately 850 cr.
7). Bollywood has the most movies with the "None", "Disaster" and "Flop" verdicts.
8). Tollywood has the most movies with the "Average" verdict.
9). Marathi Film, Mollywood and Tollywood share the highest count of movies with the "Blockbuster" verdict.
10). Only Kollywood has movies with the "Plus" verdict.
11). Only Bollywood, Kollywood and Sandalwood have movies with the "All time Blockbuster" verdict.
12). Only Bollywood, Kollywood, Marathi Film and Tollywood have movies with the "Hit" verdict.
13). Only Bollywood, Kollywood, Sandalwood and Tollywood have movies with the "SuperHit" verdict.
