Tugas-Bank-Campaign (1) .Ipynb - Colaboratory

10/19/21, 12:31 AM tugas-bank-campaign (1).ipynb - Colaboratory
IMPORT LIBRARY
def configure_plotly_browser_state():
    # Load require.js and Plotly so that Plotly figures render in Colab output cells
    import IPython
    from IPython.display import display
    display(IPython.core.display.HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: 'https://cdn.plot.ly/plotly-latest.min.js?noext',
            },
          });
        </script>
        '''))
!pip install chart_studio
!pip install openpyxl
import warnings
warnings.filterwarnings('ignore')
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
%matplotlib inline
import matplotlib.pyplot as plt # Matlab-style plotting
import seaborn as sns
color = sns.color_palette()
sns.set_style('darkgrid')
from plotly.tools import make_subplots
from plotly import tools
import chart_studio.plotly as py
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.figure_factory as ff
from IPython.display import HTML, Image
from scipy import stats
from scipy.stats import norm, skew #for some statistics
Requirement already satisfied: chart_studio in /usr/local/lib/python3.7/dist-packages (1.1.0)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from chart_studio) (2.23.0)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from chart_studio) (1.15.0)
Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.7/dist-packages (from chart_studio) (1.3.3)
Requirement already satisfied: plotly in /usr/local/lib/python3.7/dist-packages (from chart_studio) (4.4.1)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->chart_studio) (3.0.4

Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->chart_studio) (2021
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->chart_studio) (2.10)
Requirement already satisfied: openpyxl in /usr/local/lib/python3.7/dist-packages (2.5.9)
Requirement already satisfied: jdcal in /usr/local/lib/python3.7/dist-packages (from openpyxl) (1.4.1)
Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.7/dist-packages (from openpyxl) (1.1.0)
https://colab.research.google.com/drive/1mu7DZCLMXk6k8zsDdbtCwM_AfyFlDvV4#scrollTo=e5NgJM_5RGLk&printMode=true
Viewing the Overall Data
data = pd.read_csv("bank-full.csv",sep = ";")
data.head()
age job marital education default balance housing loan contact day month duration campaign pdays previo
0 58 management married tertiary no 2143 yes no unknown 5 may 261 1 -1
1 44 technician single secondary no 29 yes no unknown 5 may 151 1 -1
2 33 entrepreneur married secondary no 2 yes yes unknown 5 may 76 1 -1
3 47 blue-collar married unknown no 1506 yes no unknown 5 may 92 1 -1
4 33 unknown single unknown no 1 no no unknown 5 may 198 1 -1
data.describe()
age balance day duration campaign pdays previous
count 45211.000000 45211.000000 45211.000000 45211.000000 45211.000000 45211.000000 45211.000000
mean 40.936210 1362.272058 15.806419 258.163080 2.763841 40.197828 0.580323
std 10.618762 3044.765829 8.322476 257.527812 3.098021 100.128746 2.303441
min 18.000000 -8019.000000 1.000000 0.000000 1.000000 -1.000000 0.000000
25% 33.000000 72.000000 8.000000 103.000000 1.000000 -1.000000 0.000000
50% 39.000000 448.000000 16.000000 180.000000 2.000000 -1.000000 0.000000
75% 48.000000 1428.000000 21.000000 319.000000 3.000000 -1.000000 0.000000
max 95.000000 102127.000000 31.000000 4918.000000 63.000000 871.000000 275.000000
data.describe().transpose()
count mean std min 25% 50% 75% max
age 45211.0 40.936210 10.618762 18.0 33.0 39.0 48.0 95.0
balance 45211.0 1362.272058 3044.765829 -8019.0 72.0 448.0 1428.0 102127.0
day 45211.0 15.806419 8.322476 1.0 8.0 16.0 21.0 31.0
duration 45211.0 258.163080 257.527812 0.0 103.0 180.0 319.0 4918.0
campaign 45211.0 2.763841 3.098021 1.0 1.0 2.0 3.0 63.0
pdays 45211.0 40.197828 100.128746 -1.0 -1.0 -1.0 -1.0 871.0
previous 45211.0 0.580323 2.303441 0.0 0.0 0.0 0.0 275.0
Viewing the Continuous and Categorical Features
cont_features=[i for i in data.columns if data[i].nunique()>12]
cat_features=[i for i in data.columns if data[i].nunique()<=12]
cont_features
['age', 'balance', 'day', 'duration', 'campaign', 'pdays', 'previous']
cat_features
['job',
'marital',
'education',
'default',
'housing',
'loan',
'contact',
'month',
'poutcome',
'y']
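The split above treats any column with more than 12 distinct values as continuous and everything else as categorical. A minimal sketch of the same rule on a toy frame follows; the helper name `split_by_cardinality`, the `max_levels` parameter, and the toy columns are illustrative assumptions, not part of the notebook. It also shows a caveat of the rule: a numeric column with few distinct values lands in the categorical list.

```python
import pandas as pd

def split_by_cardinality(df, max_levels=12):
    """Return (continuous, categorical) column lists based on unique-value counts."""
    n_unique = df.nunique()
    continuous = [c for c in df.columns if n_unique[c] > max_levels]
    categorical = [c for c in df.columns if n_unique[c] <= max_levels]
    return continuous, categorical

# Hypothetical toy data, not the bank campaign dataset
toy = pd.DataFrame({
    "age": range(20, 40),                    # 20 distinct values
    "marital": ["single", "married"] * 10,   # 2 distinct values
    "rating": [1, 2, 3, 4, 5] * 4,           # numeric, but only 5 distinct values
})

cont, cat = split_by_cardinality(toy)
print("continuous:", cont)
print("categorical:", cat)
```

Because the rule looks only at cardinality, it is worth checking the resulting lists by eye, as the notebook does by printing them.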
data.info() # check column info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 45211 non-null int64
1 job 45211 non-null object
2 marital 45211 non-null object
3 education 45211 non-null object
4 default 45211 non-null object
5 balance 45211 non-null int64
6 housing 45211 non-null object
7 loan 45211 non-null object
8 contact 45211 non-null object
9 day 45211 non-null int64
10 month 45211 non-null object
11 duration 45211 non-null int64
12 campaign 45211 non-null int64
13 pdays 45211 non-null int64
14 previous 45211 non-null int64
15 poutcome 45211 non-null object
16 y 45211 non-null object
dtypes: int64(7), object(10)
memory usage: 5.9+ MB
data = data.dropna() # drop rows with missing data, if any
Normality Check - Continuous Features

1. Age
configure_plotly_browser_state()
fig = go.Figure()
fig.add_trace(go.Box(y=data['age'], name='Age',
                     marker_color='rgb(0, 0, 100)'))
fig.show()
[Figure: box plot of Age]
fig = ff.create_distplot([data['age']], ['Age'], bin_size=5, colors=['rgb(0, 0, 100)'])
iplot(fig, filename='Basic Distplot')
[Figure: distribution plot of Age]
fig = plt.figure()
res = stats.probplot(data['age'], plot=plt)
plt.show()
[Figure: normal probability plot of Age]
Shapiro-Wilk Test (computing the p-value)
from scipy.stats import shapiro
import scipy.stats as stats
shapiro(data['age'])
(0.9605178833007812, 0.0)
Because the p-value is below 0.05 (the significance level), the null hypothesis is rejected and we conclude that this feature is not normally distributed.
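The decision rule above can be sketched as a small helper. This is only an illustration on synthetic data: the function name `shapiro_decision`, the subsampling step, and its `max_n` and `seed` parameters are assumptions and not part of the notebook. The subsampling is included because the SciPy documentation notes that for samples larger than 5000 the W statistic is accurate but the p-value may not be, which applies to the 45211 rows used here.

```python
import numpy as np
from scipy.stats import shapiro

def shapiro_decision(values, alpha=0.05, max_n=5000, seed=0):
    """Run Shapiro-Wilk, subsampling large inputs, and report whether H0 (normality) is rejected."""
    values = np.asarray(values, dtype=float)
    if values.size > max_n:
        # Draw a random subsample so the p-value approximation stays reliable
        rng = np.random.default_rng(seed)
        values = rng.choice(values, size=max_n, replace=False)
    statistic, p_value = shapiro(values)
    return statistic, p_value, bool(p_value < alpha)

# Synthetic right-skewed sample (hypothetical, not the bank data)
rng = np.random.default_rng(42)
skewed = rng.exponential(scale=1.0, size=20000)

stat, p, reject = shapiro_decision(skewed)
print(f"W={stat:.4f}, p={p:.3g}, reject normality: {reject}")
```

With samples this large, even small departures from normality give tiny p-values, so the box plot and probability plot remain useful for judging how far the shape is from normal.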
2. Balance
fig = go.Figure()
fig.add_trace(go.Box(y=data['balance'], name='Balance'))
fig.show()
Q1 = data['balance'].quantile(0.25)
Q3 = data['balance'].quantile(0.75)
IQR = Q3 - Q1  # IQR is the interquartile range
# Keep only rows whose balance lies within 1.5 * IQR of the quartiles
mask = (data['balance'] >= Q1 - 1.5 * IQR) & (data['balance'] <= Q3 + 1.5 * IQR)
df1 = data.loc[mask]
[Figure: box plot of Balance]
fig = ff.create_distplot([df1['balance']], ['balance'], bin_size=5, colors=['rgb(0, 0, 100)'])
iplot(fig)
[Figure: distribution plot of balance after removing outliers]
fig = plt.figure()
res = stats.probplot(df1['balance'], plot=plt)
plt.show()
[Figure: normal probability plot of balance after removing outliers]
ntA = shapiro(df1['balance'])
ntA
(0.866013765335083, 0.0)
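The outlier filter used for balance keeps values inside the interval from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR. With the quartiles reported by data.describe() (Q1 = 72, Q3 = 1428), the IQR is 1356, so the retained range is 72 - 2034 = -1962 to 1428 + 2034 = 3462. A minimal sketch of the same rule on a small made-up series follows; the helper name `iqr_filter` and the sample values are assumptions for illustration only.

```python
import pandas as pd

def iqr_filter(series, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = series[(series >= lower) & (series <= upper)]
    return kept, (lower, upper)

# Hypothetical balances with one extreme value
balances = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 10000])

kept, (lower, upper) = iqr_filter(balances)
print("bounds:", lower, upper)
print("kept:", kept.tolist())
```

Filtering changes the sample being tested, so any normality result obtained afterwards describes the trimmed data rather than the full column.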
from scipy.stats import anderson
result = anderson(df1['balance'])
print('Statistic: %.3f' % result.statistic)
# Compare the statistic against the critical value at each significance level
for i in range(len(result.critical_values)):
    sl, cv = result.significance_level[i], result.critical_values[i]
    if result.statistic < cv:
        print('%.3f: %.3f, data are normal (fail to reject H0)' % (sl, cv))
    else:
        print('%.3f: %.3f, data are not normal (reject H0)' % (sl, cv))
Statistic: 2010.508
15.000: 0.576, data are not normal (reject H0)
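Unlike Shapiro-Wilk, the Anderson-Darling result is read by comparing one statistic with a critical value per significance level: normality is rejected at a given level when the statistic exceeds that level's critical value. A minimal sketch on synthetic data, where the helper name `anderson_table` and the sample are assumptions made for illustration:

```python
import numpy as np
from scipy.stats import anderson

def anderson_table(values):
    """Return the A-D statistic and (level %, critical value, reject H0?) rows."""
    result = anderson(values, dist="norm")
    rows = []
    for level, critical in zip(result.significance_level, result.critical_values):
        rows.append((float(level), float(critical), bool(result.statistic > critical)))
    return float(result.statistic), rows

# Synthetic right-skewed sample (hypothetical, not the bank data)
rng = np.random.default_rng(7)
sample = rng.exponential(scale=1.0, size=1000)

statistic, rows = anderson_table(sample)
print(f"Statistic: {statistic:.3f}")
for level, critical, reject in rows:
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"{level:.1f}%: critical value {critical:.3f}, {verdict}")
```

The significance levels are expressed in percent, which is why the notebook output shows 15.000 rather than 0.15.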
3. Day
fig = go.Figure()
fig.add_trace(go.Box(y=data['day'], name='Day'))
fig.show()
[Figure: box plot of Day]
fig = ff.create_distplot([data['day']], ['day'], bin_size=5, colors=['rgb(0, 0, 100)'])
iplot(fig)
[Figure: distribution plot of day]
fig = plt.figure()
res = stats.probplot(data['day'], plot=plt)
plt.show()
[Figure: normal probability plot of day]
ntA = shapiro(data['day'])
ntA
(0.9595543146133423, 0.0)
4. Duration
fig = go.Figure()
fig.add_trace(go.Box(y=data['duration'], name='Duration'))
fig.show()
[Figure: box plot of Duration]
fig = ff.create_distplot([data['duration']], ['duration'], bin_size=5, colors=['rgb(0, 0, 100)'])
iplot(fig)
[Figure: distribution plot of duration]
fig = plt.figure()
res = stats.probplot(data['duration'], plot=plt)
plt.show()
[Figure: normal probability plot of duration]
ntA = shapiro(data['duration'])
ntA
(0.7269970774650574, 0.0)
5. Campaign
fig = go.Figure()
fig.add_trace(go.Box(y=data['campaign'], name='Campaign'))
fig.show()
[Figure: box plot of Campaign]
fig = ff.create_distplot([data['campaign']], ['campaign'], bin_size=5, colors=['rgb(0, 0, 100)'])
iplot(fig)
[Figure: distribution plot of campaign]
fig = plt.figure()
res = stats.probplot(data['campaign'], plot=plt)
plt.show()
[Figure: normal probability plot of campaign]
ntA = shapiro(data['campaign'])
ntA
(0.5507382750511169, 0.0)
6. Pdays
fig = go.Figure()
fig.add_trace(go.Box(y=data['pdays'], name='Pdays'))
fig.show()
[Figure: box plot of Pdays]
fig = ff.create_distplot([data['pdays']], ['pdays'], bin_size=5, colors=['rgb(0, 0, 100)'])
iplot(fig)
[Figure: distribution plot of pdays]
fig = plt.figure()
res = stats.probplot(data['pdays'], plot=plt)
plt.show()
[Figure: normal probability plot of pdays]
ntA = shapiro(data['pdays'])
ntA
(0.47478705644607544, 0.0)
7. Previous
fig = go.Figure()
fig.add_trace(go.Box(y=data['previous'], name='Previous'))
fig.show()
[Figure: box plot of Previous]
fig = ff.create_distplot([data['previous']], ['previous'], bin_size=5, colors=['rgb(0, 0, 100)'])
iplot(fig)
[Figure: distribution plot of previous]
fig = plt.figure()
res = stats.probplot(data['previous'], plot=plt)
plt.show()
[Figure: normal probability plot of previous]
ntA = shapiro(data['previous'])
ntA
(0.23559075593948364, 0.0)
def calculateCrosstabulation(catVariable, targetCatVariable=data.y):
    # Compute the cross tabulation as absolute counts and as row percentages
    absCount = pd.crosstab(index=catVariable, columns=targetCatVariable)\
        .rename(columns={0: "no", 1: "yes"})
    relCount = pd.crosstab(index=catVariable, columns=targetCatVariable, normalize="index")\
        .rename(columns={0: "no", 1: "yes"}) * 100
    relCount = relCount.round(1)

    # Draw two bar-chart subplots
    fig = make_subplots(
        rows=2,
        cols=1,
        vertical_spacing=0.3,
        subplot_titles=(f"Absolute count of target Y categories (yes and no) by {catVariable.name}",
                        f"Percentage of target Y categories (yes and no) by {catVariable.name}"),
        print_grid=False)
    # Add one trace per target category for the absolute frequencies
    for col in absCount.columns:
        fig.add_trace(go.Bar(x=absCount.index,
                             y=absCount[col],
                             text=absCount[col],
                             hoverinfo="x+y",
                             textposition="auto",
                             name=f"{col}",
                             textfont=dict(family="sans serif", size=14),
                             ),
                      row=1,
                      col=1
                      )
    # Add one trace per target category for the relative frequencies
    for col in relCount.columns:
        fig.add_trace(go.Bar(x=relCount.index,
                             y=relCount[col],
                             text=relCount[col],
                             hoverinfo="x+y",
                             textposition="auto",
                             name=f"{col}",
                             textfont=dict(family="sans serif", size=14),
                             ),
                      row=2,
                      col=1
                      )

    # Update the layout: dimensions, bar mode, and background colors
    fig.layout.update(
        height=600,
        width=1000,
        hovermode="closest",
        barmode="group",
        paper_bgcolor="rgb(243, 243, 243)",
        plot_bgcolor="rgb(243, 243, 243)"
    )
    # Set the y-axis titles in bold
    fig.layout.yaxis1.update(title="<b>Abs Frequency</b>")
    fig.layout.yaxis2.update(title="<b>Rel Frequency(%)</b>")

    # Set the x-axis title in bold
    fig.layout.xaxis2.update(title=f"<b>{catVariable.name}</b>")
    return fig.show()

def calculateChiSquare(catVariable, targetCatVariable=data.y):
    # Build the contingency table and run the chi-square test of independence
    catGroupedByCatTarget = pd.crosstab(index=catVariable, columns=targetCatVariable)
    testResult = stats.chi2_contingency(catGroupedByCatTarget)
    print(f"Chi Square Test Result between {targetCatVariable.name} & {catVariable.name}:")
    return print(testResult)
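The two tables built inside calculateCrosstabulation come from pd.crosstab: once with raw counts, and once with normalize="index", which divides each row by its row total so that every category of the feature sums to 100%. A minimal sketch on a made-up frame (the toy data below is hypothetical, not taken from bank-full.csv):

```python
import pandas as pd

# Hypothetical toy data with a categorical feature and a yes/no target
toy = pd.DataFrame({
    "housing": ["yes", "yes", "yes", "no", "no", "no", "no", "yes"],
    "y":       ["no",  "no",  "yes", "no", "yes", "yes", "no", "no"],
})

# Absolute counts per (housing, y) combination
abs_count = pd.crosstab(index=toy["housing"], columns=toy["y"])

# Row percentages: each housing category sums to 100
rel_count = (pd.crosstab(index=toy["housing"], columns=toy["y"],
                         normalize="index") * 100).round(1)

print(abs_count)
print(rel_count)
```

Row normalization is what makes categories of very different sizes comparable in the lower panel of each figure.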
Chi-square Test: The chi-square test of independence checks whether there is a significant association between two
categorical variables. The data are usually shown as a cross tabulation, with each row representing a category of one
variable and each column representing a category of the other. The chi-square test of independence is an omnibus
test, meaning it tests the data as a whole. As a result, when the contingency table is larger than 2×2, the test
cannot tell which levels (categories) of the variables are responsible for the association, and post hoc testing is
needed.
H0 (null hypothesis): there is no association between variable 1 and variable 2
H1 (alternative hypothesis): there is an association between variable 1 and variable 2
If the p-value is significant (less than 0.05), we can reject the null hypothesis and claim that the findings
support the alternative hypothesis. When examining the chi2 test result, we also need to check that every expected
cell frequency is greater than or equal to 5. If a cell has an expected frequency below 5, Fisher's exact test
should be used instead.
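The procedure described above (run the chi-square test, check the expected frequencies, fall back to Fisher's exact test when a cell is too small) can be sketched as follows. The helper name `independence_test` and the small 2×2 table are assumptions for illustration; the larger table uses the default-versus-y counts shown on the bar labels later in this notebook. SciPy's fisher_exact handles 2×2 tables, so the fallback in this sketch is limited to that case.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

def independence_test(table, alpha=0.05, min_expected=5):
    """Chi-square test of independence with a Fisher's exact fallback for sparse 2x2 tables."""
    table = np.asarray(table)
    chi2, p_value, dof, expected = chi2_contingency(table)
    method = "chi-square"
    if (expected < min_expected).any() and table.shape == (2, 2):
        # Some expected count is below the threshold: use the exact test instead
        _, p_value = fisher_exact(table)
        method = "fisher-exact"
    return {"method": method, "p_value": float(p_value),
            "dof": int(dof), "reject_h0": bool(p_value < alpha)}

# Hypothetical sparse table: every expected count is 2
small = independence_test([[3, 1], [1, 3]])

# default (rows: no, yes) versus y (columns: no, yes)
large = independence_test([[39159, 5237], [763, 52]])

print(small)
print(large)
```

For tables larger than 2×2 with sparse cells, merging rare categories is a common alternative, but that choice depends on the data and is not made in this notebook.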
1. Job Vs Y
calculateCrosstabulation(data.job)
/usr/local/lib/python3.7/dist-packages/plotly/tools.py:465: DeprecationWarning:
plotly.tools.make_subplots is deprecated, please use plotly.subplots.make_subplots instead
[Figure: two grouped bar charts of Y (yes/no) by job: absolute count (top) and percentage (bottom)]
Bar labels, no / yes: admin. 4540 / 631 (87.8% / 12.2%); blue-collar 9024 / 708 (92.7% / 7.3%); entrepreneur 1364 / 123 (91.7% / 8.3%); housemaid 1131 / 109 (91.2% / 8.8%); management 8157 / 1301 (86.2% / 13.8%); retired 1748 / 516 (77.2% / 22.8%); self-employed 1392 / 187 (88.2% / 11.8%); services 3785 / 369 (91.1% / 8.9%); student 669 / 269 (71.3% / 28.7%); technician 6757 / 840 (88.9% / 11.1%); unemployed 1101 / 202 (84.5% / 15.5%); unknown 254 / 34 (88.2% / 11.8%)
calculateChiSquare(data.job)
Chi Square Test Result between y & job:
(836.1054877471965, 3.337121944935502e-172, 11, array([[4566.0715755 , 604.9284245 ],
[8593.5038818 , 1138.4961182 ],
[1313.04359559, 173.95640441],
[1094.93884232, 145.06115768],
[8351.55771825, 1106.44228175],
[1999.14640242, 264.85359758],
[1394.28099356, 184.71900644],
[3668.04512176, 485.95487824],
[ 828.2682533 , 109.7317467 ],
[6708.26643958, 888.73356042],
[1150.56879963, 152.43120037],
[ 254.30837628, 33.69162372]]))
The first value (836.105) is the chi-square statistic, followed by the p-value (3.337e-172), then the degrees of
freedom (11), and finally the expected frequencies as an array. Because all expected frequencies are greater than 5,
the chi2 test result can be trusted. We can reject the null hypothesis because the p-value is less than 0.05 (in
fact, the p-value is almost 0). The result therefore shows a statistically significant association.
2. Marital Vs Y
calculateCrosstabulation(data.marital)
[Figure: two grouped bar charts of Y (yes/no) by marital: absolute count (top) and percentage (bottom)]
Bar labels, no / yes: divorced 4585 / 622 (88.1% / 11.9%); married 24459 / 2755 (89.9% / 10.1%); single 10878 / 1912 (85.1% / 14.9%)
calculateChiSquare(data.marital)
Chi Square Test Result between y & marital:
(196.49594565603957, 2.1450999986791792e-43, 2, array([[ 4597.86012254, 609.13987746],
[24030.37552808, 3183.62447192],
[11293.76434938, 1496.23565062]]))
The first value (196.49) is the chi-square statistic, followed by the p-value (2.1450e-43), then the degrees of
freedom (2), and finally the expected frequencies as an array. Because all expected frequencies are greater than 5,
the chi2 test result can be trusted. We can reject the null hypothesis because the p-value is less than 0.05 (in
fact, the p-value is almost 0). The result therefore shows a statistically significant association.
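Since the marital table is 3×2, the omnibus result does not say which categories drive the association. One common follow-up, shown here only as a hedged sketch and not as the notebook's own method, is to inspect Pearson residuals, (observed - expected) / sqrt(expected), where large absolute values mark the cells that deviate most from independence. The counts below are the bar labels of the marital figure; unpacking the returned tuple into named variables also makes the four outputs described above explicit.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Observed counts of y by marital status, read from the bar labels of the figure
observed = pd.DataFrame(
    {"no": [4585, 24459, 10878], "yes": [622, 2755, 1912]},
    index=["divorced", "married", "single"],
)

# chi2_contingency returns the statistic, p-value, degrees of freedom, expected counts
chi2, p_value, dof, expected = chi2_contingency(observed)

# Pearson residuals: positive means more observations than expected under independence
residuals = (observed - expected) / np.sqrt(expected)

print(f"chi2={chi2:.3f}, p={p_value:.3g}, dof={dof}")
print(residuals.round(2))
```

In this table the largest residual belongs to single customers who answered yes, and married customers who answered yes fall below expectation. Residuals are descriptive; a formal post hoc test would also need a multiple-comparison correction.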
3. Education Vs Y
calculateCrosstabulation(data.education)
[Figure: two grouped bar charts of Y (yes/no) by education: absolute count (top) and percentage (bottom)]
Bar labels, no / yes: primary 6260 / 591 (91.4% / 8.6%); secondary 20752 / 2450 (89.4% / 10.6%); tertiary 11305 / 1996 (85% / 15%); unknown 1605 / 252 (86.4% / 13.6%)
calculateChiSquare(data.education)
Chi Square Test Result between y & education:
(238.92350616407606, 1.6266562124072994e-51, 3, array([[ 6049.5371038 , 801.4628962 ],
[20487.71856407, 2714.28143593],
[11744.98511424, 1556.01488576],
[ 1639.75921789, 217.24078211]]))
Because all expected frequencies are greater than 5, the chi2 test result can be trusted. We can reject the null
hypothesis because the p-value is less than 0.05 (in fact, the p-value is almost 0). The result therefore shows a
statistically significant association.
4. Default Vs Y
calculateCrosstabulation(data.default)
[Figure: two grouped bar charts of Y (yes/no) by default: absolute count (top) and percentage (bottom)]
Bar labels, no / yes: default=no 39159 / 5237 (88.2% / 11.8%); default=yes 763 / 52 (93.6% / 6.4%)
calculateChiSquare(data.default)
Chi Square Test Result between y & default:
(22.20224995571685, 2.4538606753508344e-06, 1, array([[39202.34261574, 5193.65738426],
[ 719.65738426, 95.34261574]]))
5. Housing Vs Y
calculateCrosstabulation(data.housing)
[Figure: two grouped bar charts of Y (yes/no) by housing: absolute count (top) and percentage (bottom)]
Bar labels, no / yes: housing=no 16727 / 3354 (83.3% / 16.7%); housing=yes 23195 / 1935 (92.3% / 7.7%)
calculateChiSquare(data.housing)
Chi Square Test Result between y & housing:
(874.822448867983, 2.918797605076633e-192, 1, array([[17731.82813917, 2349.17186083],
[22190.17186083, 2939.82813917]]))
6. Loan Vs Y
calculateCrosstabulation(data.loan)
[Figure: two grouped bar charts of Y (yes/no) by loan: absolute count (top) and percentage (bottom)]
Bar labels, no / yes: loan=no 33162 / 4805 (87.3% / 12.7%); loan=yes 6760 / 484 (93.3% / 6.7%)
calculateChiSquare(data.loan)
Chi Square Test Result between y & loan:
(209.61698034978633, 1.665061163492756e-47, 1, array([[33525.4379244, 4441.5620756],
[ 6396.5620756, 847.4379244]]))
7. Contact Vs Y
calculateCrosstabulation(data.contact)
[Figure: two grouped bar charts of Y (yes/no) by contact: absolute count (top) and percentage (bottom)]
Bar labels, no / yes: cellular 24916 / 4369 (85.1% / 14.9%); telephone 2516 / 390 (86.6% / 13.4%); unknown 12490 / 530 (95.9% / 4.1%)
calculateChiSquare(data.contact)
Chi Square Test Result between y & contact:
(1035.714225356292, 1.251738325340638e-225, 2, array([[25859.09999779, 3425.90000221],
[ 2566.04215788, 339.95784212],
[11496.85784433, 1523.14215567]]))
8. Month Vs Y
calculateCrosstabulation(data.month)
[Figure: two grouped bar charts of Y (yes/no) by month: absolute count (top) and percentage (bottom)]
Bar labels, no / yes: apr 2355 / 577 (80.3% / 19.7%); aug 5559 / 688 (89% / 11%); dec 114 / 100 (53.3% / 46.7%); feb 2208 / 441 (83.4% / 16.6%); jan 1261 / 142 (89.9% / 10.1%); jul 6268 / 627 (90.9% / 9.1%); jun 4795 / 546 (89.8% / 10.2%); mar 229 / 248 (48% / 52%); may 12841 / 925 (93.3% / 6.7%); nov 3567 / 403 (89.8% / 10.2%); oct 415 / 323 (56.2% / 43.8%); sep 310 / 269 (53.5% / 46.5%)
calculateChiSquare(data.month)
Chi Square Test Result between y & month:
(3061.838938445269, 0.0, 11, array([[ 2589.00055296, 342.99944704],
[ 5516.19592577, 730.80407423],
[ 188.96525182, 25.03474818],
[ 2339.10725266, 309.89274734],
[ 1238.87031917, 164.12968083],
[ 6088.3897724 , 806.6102276 ],
[ 4716.18415872, 624.81584128],
[ 421.19824821, 55.80175179],
[12155.58718011, 1610.41281989],
[ 3505.57032581, 464.42967419],
[ 651.66521422, 86.33478578],
[ 511.26579815, 67.73420185]]))
We can reject the null hypothesis because the p-value is less than 0.05 (in fact, the p-value is reported as 0).
The result therefore shows a statistically significant association.
9. Poutcome Vs Y
calculateCrosstabulation(data.poutcome)
[Figure: two grouped bar charts of Y (yes/no) by poutcome: absolute count (top) and percentage (bottom)]
Bar labels, no / yes: failure 4283 / 618 (87.4% / 12.6%); other 1533 / 307 (83.3% / 16.7%); success 533 / 978 (35.3% / 64.7%); unknown 33573 / 3386 (90.8% / 9.2%)
calculateChiSquare(data.poutcome)
Chi Square Test Result between y & poutcome:
(4391.5065887686615, 0.0, 3, array([[ 4327.65747274, 573.34252726],
[ 1624.74795957, 215.25204043],
[ 1334.23596028, 176.76403972],
[32635.35860742, 4323.64139258]]))
We can reject the null hypothesis because the p-value is less than 0.05 (in fact, the p-value is reported as 0).
The result therefore shows a statistically significant association.
CONCLUSION: Based on the analysis above, whether or not a customer opens an account is strongly associated with the
categorical features (according to the chi-square tests) and is not strongly related to the continuous variables
(Shapiro-Wilk test). Note that the Shapiro-Wilk test assesses normality rather than association, so the relationship
between the continuous variables and y was not tested directly here.