Part 2: Transforming Categorical Data: Installing Seaborn
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Out[2]: 0
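The warning above recommends invoking pip as a module of the interpreter. A minimal sketch of that approach follows; the helper name pip_install_command is hypothetical, and the actual install call is left commented out because it needs network access.

```python
import sys


def pip_install_command(package):
    """Build a command that runs pip through the current interpreter."""
    # sys.executable is the Python running this kernel, so the package
    # would be installed into the same environment the notebook uses.
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("seaborn")
print(cmd[1:])  # the interpreter path in cmd[0] varies by machine

# To actually install, one could run:
# import subprocess; subprocess.check_call(cmd)
```

Inside a notebook cell, the same effect is usually obtained with the `%pip install seaborn` magic.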
Dataset 2
The dataset used in this part is population census data. It contains 48,842 records with 15 features.
https://hub.gke2.mybinder.org/user/ipython-ipython-in-depth-hc0ua6ma/notebooks/binder/z.ipynb# 1/10
11/3/21, 1:14 PM z - Jupyter Notebook
In [5]: data.head(10)
Out[5]:
age workclass fnlwgt education educational-num marital-status occupation relationship race gender capital-gain capital-loss hours-per-week native-country income
0 25 Private 226802 11th 7 Never-married Machine-op-inspct Own-child Black Male 0 0 40 United-States <=50K
1 38 Private 89814 HS-grad 9 Married-civ-spouse Farming-fishing Husband White Male 0 0 50 United-States <=50K
2 28 Local-gov 336951 Assoc-acdm 12 Married-civ-spouse Protective-serv Husband White Male 0 0 40 United-States >50K
3 44 Private 160323 Some-college 10 Married-civ-spouse Machine-op-inspct Husband Black Male 7688 0 40 United-States >50K
4 18 NaN 103497 Some-college 10 Never-married NaN Own-child White Female 0 0 30 United-States <=50K
5 34 Private 198693 10th 6 Never-married Other-service Not-in-family White Male 0 0 30 United-States <=50K
6 29 NaN 227026 HS-grad 9 Never-married NaN Unmarried Black Male 0 0 40 United-States <=50K
7 63 Self-emp-not-inc 104626 Prof-school 15 Married-civ-spouse Prof-specialty Husband White Male 3103 0 32 United-States >50K
8 24 Private 369667 Some-college 10 Never-married Other-service Unmarried White Female 0 0 40 United-States <=50K
9 55 Private 104996 7th-8th 4 Married-civ-spouse Craft-repair Husband White Male 0 0 10 United-States <=50K
Out[7]: ['age',
'workclass',
'fnlwgt',
'education',
'educational-num',
'marital-status',
'occupation',
'relationship',
'race',
'gender',
'capital-gain',
'capital-loss',
'hours-per-week',
'native-country',
'income']
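The cell that produced Out[7] is not shown in this export. A list like the one above can be obtained from the column index, as in this sketch; the three-row DataFrame is a stand-in built from values in the data.head(10) output, not the real file.

```python
import pandas as pd

# Tiny stand-in for the census data (only three of the 15 columns).
data = pd.DataFrame({
    "age": [25, 38, 28],
    "workclass": ["Private", "Private", "Local-gov"],
    "income": ["<=50K", "<=50K", ">50K"],
})

column_names = data.columns.tolist()  # plain Python list of column labels
n_rows, n_cols = data.shape           # (number of rows, number of columns)

print(column_names)
print(n_rows, n_cols)
```

On the full dataset, data.shape would be expected to report the 48,842 records and 15 features mentioned in the dataset description.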
In [8]: data.dtypes
Out[8]: age int64
workclass object
fnlwgt int64
education object
educational-num int64
marital-status object
occupation object
relationship object
race object
gender object
capital-gain int64
capital-loss int64
hours-per-week int64
native-country object
income object
dtype: object
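In the listing above, text columns appear as object and are the candidates for categorical encoding. A small sketch of separating them from the numeric columns, using a stand-in DataFrame rather than the real data:

```python
import pandas as pd

# Stand-in with two numeric and two text columns.
data = pd.DataFrame({
    "age": [25, 38, 28],
    "education": ["11th", "HS-grad", "Assoc-acdm"],
    "hours-per-week": [40, 50, 40],
    "income": ["<=50K", "<=50K", ">50K"],
})

# Everything that is not numeric is treated as a categorical candidate.
categorical_cols = data.select_dtypes(exclude="number").columns.tolist()
numeric_cols = data.select_dtypes(include="number").columns.tolist()

print(categorical_cols)
print(numeric_cols)
```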
In [9]: data.education.unique()
dtype=object)
In [11]: dataRename.head(5)
Out[11]:
age workclass fnlwgt education educational-num marital-status occupation relationship race gender capital-gain capital-loss hoursPerWeek native-country income
0 25 Private 226802 11th 7 Never-married Machine-op-inspct Own-child Black Male 0 0 40 United-States <=50K
1 38 Private 89814 HS-grad 9 Married-civ-spouse Farming-fishing Husband White Male 0 0 50 United-States <=50K
2 28 Local-gov 336951 Assoc-acdm 12 Married-civ-spouse Protective-serv Husband White Male 0 0 40 United-States >50K
3 44 Private 160323 Some-college 10 Married-civ-spouse Machine-op-inspct Husband Black Male 7688 0 40 United-States >50K
4 18 NaN 103497 Some-college 10 Never-married NaN Own-child White Female 0 0 30 United-States <=50K
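The cell that created dataRename does not appear in this export. One way the hoursPerWeek column seen above could be produced is with DataFrame.rename, sketched here on a two-row stand-in:

```python
import pandas as pd

data = pd.DataFrame({"age": [25, 38], "hours-per-week": [40, 50]})

# rename returns a new DataFrame; the original keeps its column names.
dataRename = data.rename(columns={"hours-per-week": "hoursPerWeek"})

print(dataRename.columns.tolist())
```

A name without hyphens also allows attribute access such as dataRename.hoursPerWeek, which is not possible with hours-per-week.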
In [12]: ## Code to encode the race column as integers using cat.codes
dataRename["race"] = dataRename["race"].astype('category')
dataRename["race_encoded"] = dataRename["race"].cat.codes
dataRename.head()
Out[12]:
age workclass fnlwgt education educational-num marital-status occupation relationship race gender capital-gain capital-loss hoursPerWeek native-country income race_encoded
0 25 Private 226802 11th 7 Never-married Machine-op-inspct Own-child Black Male 0 0 40 United-States <=50K 2
1 38 Private 89814 HS-grad 9 Married-civ-spouse Farming-fishing Husband White Male 0 0 50 United-States <=50K 4
2 28 Local-gov 336951 Assoc-acdm 12 Married-civ-spouse Protective-serv Husband White Male 0 0 40 United-States >50K 4
3 44 Private 160323 Some-college 10 Married-civ-spouse Machine-op-inspct Husband Black Male 7688 0 40 United-States >50K 2
4 18 NaN 103497 Some-college 10 Never-married NaN Own-child White Female 0 0 30 United-States <=50K 4
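The codes in race_encoded follow the sorted order of the category labels, starting at 0, and missing values receive -1. The sketch below shows the mapping on a stand-in Series; the label "Amer-Indian-Eskimo" is an assumption about the full name of the category that the report calls Eskimo.

```python
import pandas as pd

# Stand-in race values, including one missing entry.
race = pd.Series([
    "Black", "White", "White", "Asian-Pac-Islander",
    "Other", "Amer-Indian-Eskimo", None,
]).astype("category")

codes = race.cat.codes                               # integer code per row
code_to_label = dict(enumerate(race.cat.categories)) # code -> label lookup

print(code_to_label)
print(codes.tolist())
```

With these five labels, Black maps to 2 and White maps to 4, which agrees with the race_encoded column in the output above.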
In [13]: dataRename.race.unique()
In [15]: race1.dtypes
Out[15]: age int64
workclass object
fnlwgt int64
education object
educational-num int64
marital-status object
occupation object
relationship object
race category
gender object
capital-gain int64
capital-loss int64
hoursPerWeek int64
native-country object
income object
race_encoded int8
dtype: object
In [16]: race1.head(10)
Out[16]:
age workclass fnlwgt education educational-num marital-status occupation relationship race gender capital-gain capital-loss hoursPerWeek native-country income race_encoded
19 40 Private 85019 Doctorate 16 Married-civ-spouse Prof-specialty Husband Asian-Pac-Islander Male 0 0 45 NaN >50K 1
141 18 Private 262118 Some-college 10 Never-married Adm-clerical Own-child Asian-Pac-Islander Female 0 0 22 Germany <=50K 1
220 34 Private 162312 Bachelors 13 Married-civ-spouse Adm-clerical Husband Asian-Pac-Islander Male 0 0 40 Philippines <=50K 1
221 25 Private 77698 HS-grad 9 Never-married Machine-op-inspct Not-in-family Asian-Pac-Islander Female 0 0 40 Philippines <=50K 1
232 55 Private 119751 Masters 14 Never-married Exec-managerial Unmarried Asian-Pac-Islander Female 0 0 50 Thailand <=50K 1
309 51 Self-emp-not-inc 136708 HS-grad 9 Married-civ-spouse Sales Husband Asian-Pac-Islander Male 3103 0 84 Vietnam <=50K 1
376 28 Private 302903 Bachelors 13 Married-civ-spouse Prof-specialty Wife Asian-Pac-Islander Female 0 1485 40 United-States <=50K 1
377 24 Private 154835 HS-grad 9 Never-married Exec-managerial Own-child Asian-Pac-Islander Female 0 0 40 South <=50K 1
395 37 Private 79586 HS-grad 9 Separated Machine-op-inspct Own-child Asian-Pac-Islander Male 0 0 60 United-States <=50K 1
396 45 Private 355781 Bachelors 13 Married-civ-spouse Exec-managerial Husband Asian-Pac-Islander Male 0 0 45 Japan >50K 1
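Cell In [14], which created race1, is missing from this export. Since every row above has race_encoded equal to 1, a boolean filter such as the following is one plausible way it was built; this is an assumption, shown on a stand-in DataFrame.

```python
import pandas as pd

# Stand-in rows with the codes used in the report (1, 2, 4).
dataRename = pd.DataFrame({
    "race": ["Black", "White", "Asian-Pac-Islander", "White"],
    "hoursPerWeek": [40, 50, 45, 40],
    "race_encoded": [2, 4, 1, 4],
})

# Keep only the rows whose code is 1; the original row labels are preserved.
race1 = dataRename[dataRename["race_encoded"] == 1]

print(race1)
```

Because filtering keeps the original index, the first rows of race1 in the report carry labels such as 19 and 141 rather than 0 and 1.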
CONCLUSION
For race1 (Asian-Pac-Islander), the most common hoursPerWeek value is 40 hours per week, reported by more than 800 people.
For race2 (Black), the most common hoursPerWeek value is 40 hours per week, reported by more than 3,000 people.
For race3 (Other), the most common hoursPerWeek value is 40 hours per week, reported by 250 people.
For race4 (White), the most common hoursPerWeek value is 40 hours per week, reported by more than 20,000 people.
race5 (Eskimo) is empty: the subset contains no respondents, so there is no hoursPerWeek distribution to report. Because cat.codes numbers the categories starting from 0, this may mean that code 5 matches no category rather than that the dataset has no Eskimo respondents.
Across the four races with data, the most frequent working time is 40 hours per week, so a typical working week is about 40 hours.
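The peak of each histogram can also be checked numerically. The sketch below finds the most frequent hoursPerWeek value and its count per race on a small stand-in DataFrame; the numbers are illustrative and are not the census counts.

```python
import pandas as pd

df = pd.DataFrame({
    "race": ["White", "White", "White", "Black", "Black", "Black"],
    "hoursPerWeek": [40, 40, 50, 40, 30, 40],
})

grouped = df.groupby("race")["hoursPerWeek"]

# Most frequent value per race (the smallest one if there is a tie).
mode_hours = grouped.agg(lambda s: s.mode().iloc[0])
# value_counts sorts by frequency, so the first entry is the peak count.
peak_count = grouped.agg(lambda s: s.value_counts().iloc[0])

print(mode_hours)
print(peak_count)
```

This reports the mode, which is what the tallest histogram bar shows; the mean of hoursPerWeek may differ from it.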
dtype=object)
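In [20]: import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(7, 7))
total = float(len(race4))  # all rows of race4, not only those with age > 70
# Count income groups among race4 respondents older than 70
ax = sns.countplot(x="income", data=race4[race4["age"] > 70])
for p in ax.patches:
    height = p.get_height()
    # Label each bar with its count as a percentage of total
    ax.text(p.get_x() + p.get_width() / 2.,
            height + 3,
            '{:1.2f}'.format((height / total) * 100),
            ha="center")
plt.show()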
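The labels in In [20] divide each bar height by len(race4), so they express the over-70 counts as a share of all race4 rows. If the share within the over-70 group is wanted instead, the denominator changes. A sketch comparing both on a stand-in DataFrame with illustrative values:

```python
import pandas as pd

race4 = pd.DataFrame({
    "age": [72, 75, 80, 71, 30, 45],
    "income": ["<=50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K"],
})

over_70 = race4[race4["age"] > 70]
counts = over_70["income"].value_counts()

pct_of_all = counts / len(race4) * 100       # denominator used in In [20]
pct_of_subset = counts / len(over_70) * 100  # denominator restricted to age > 70

print(pct_of_all.round(2))
print(pct_of_subset.round(2))
```

Which denominator is appropriate depends on the question being asked; the two sets of percentages should not be mixed in one chart.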
mean :
income age
1 >50K 44.275178
0 <=50K 36.872184
median :
income age
1 >50K 43
0 <=50K 34
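The cell that printed these tables is not shown. Output of this shape, with row label 1 listed before 0, is consistent with a groupby followed by a descending sort, as in this sketch; the exact call used in the notebook is an assumption, and the six rows are a stand-in.

```python
import pandas as pd

data = pd.DataFrame({
    "age": [25, 38, 28, 44, 18, 63],
    "income": ["<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"],
})

by_income = data.groupby("income", as_index=False)["age"]

# One row per income group, sorted so the older group comes first.
mean_age = by_income.mean().sort_values("age", ascending=False)
median_age = by_income.median().sort_values("age", ascending=False)

print("mean :")
print(mean_age)
print("median :")
print(median_age)
```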
CONCLUSION
The mean age of respondents in income group 1 (>50K) is 44.3 years.
The mean age of respondents in income group 0 (<=50K) is 36.9 years.
The median age of respondents in income group 1 (>50K) is 43 years,
and the median age of respondents in income group 0 (<=50K) is 34 years.
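A boxplot of age by income, as requested in the lab instructions, draws the quartiles of each group. The sketch below computes those quartiles and the interquartile range directly, which can help when describing the plot; the ages are illustrative stand-in values.

```python
import pandas as pd

data = pd.DataFrame({
    "income": ["<=50K"] * 5 + [">50K"] * 5,
    "age": [20, 30, 40, 50, 60, 30, 40, 50, 60, 70],
})

# Rows are income groups, columns are the 25th, 50th and 75th percentiles.
quartiles = data.groupby("income")["age"].quantile([0.25, 0.5, 0.75]).unstack()
# Height of the box in a boxplot.
iqr = quartiles[0.75] - quartiles[0.25]

print(quartiles)
print(iqr)
```

The middle column is the median line of each box, and points far outside the box by more than 1.5 times the IQR are the ones a default boxplot marks as outliers.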
Lab instructions for students of Industrial Engineering, Mechanical Engineering, Agrotechnology, FTSP, and social sciences and humanities (Soshum) majors
Rename the column hours-per-week to hoursPerWeek.
Analyze the histogram of the hoursPerWeek column for each of race1, race2, race3, race4, and race5. What can you conclude?
How many categories appear in the workclass column, and what are they?
Explain the boxplot obtained for income and age in the race1 data.
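For the workclass question, the categories can be counted with nunique and listed with unique. The sketch below uses a stand-in Series; the data.head(10) output shows that workclass contains NaN, so missing values are handled explicitly here.

```python
import pandas as pd

workclass = pd.Series(
    ["Private", "Private", "Local-gov", None, "Self-emp-not-inc", None],
    name="workclass",
)

n_categories = workclass.nunique()              # NaN is excluded by default
labels = sorted(workclass.dropna().unique())    # the distinct category labels
counts = workclass.value_counts(dropna=False)   # includes a row for NaN
n_missing = int(workclass.isna().sum())

print(n_categories, labels)
print(counts)
```

Whether NaN should be reported as its own category is a judgment call; stating the number of missing rows alongside the category count avoids ambiguity.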