Numpy Merged
[1]:
import numpy as np
[2] :
[2] : numpy.ndarray
[3] :
[3] : dtype('int32')
[4] :
a.ndim
[4]: 1
[5] :
[5]: 3
[6] :
a.shape
[6]: (3,)
[7] :
[1, 2, 3, 4, 5]
[[1 2 3]
[3 4 5]]
[9] :
c=np.array(m)
[1 2 3]
[10] :
[[1 2 3 4 5]
[6 7 8 9 1]]
[11] :
[12] :
{1, 2, 3, 4}
[13] : #dictionary
import numpy as np
dict={'a':1,'b':2,'c':3}
z=np.array(list(dict.items()))
print(z)
a=np.array(list(dict.keys()))
print(a)
[['a' '1']
['b' '2']
['c' '3']]
['a' 'b' 'c']
[14] :
a=np.array(np.arange(9))
[0 1 2 3 4 5 6 7 8]
[15] :
[0. 0. 0.]
[16] : b=np.zeros([3,3])
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[17] :
[18] : d=np.zeros_like(x)
d
[19] :
[1. 1. 1. 1.]
[20] : b=np.ones([3,3])
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
[21] :
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
[22] : c=np.eye(3,k=1)
[[0. 1. 0.]
[0. 0. 1.]
[0. 0. 0.]]
[23] :
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
[24] :
[[7 7]
[7 7]]
[25] :
[26] : x=np.arange(6,dtype=int)
[27] :
[28] : d=np.full_like(x,0.1,dtype=np.double)
d
[29] :
[29]: array([[[5.20093491e-090, 5.69847262e-066],
[5.51292779e+169, 4.85649086e-033],
[6.48224659e+170, 5.82471487e+257]]])
[31]: #empty_like()
a=([1,2,3],[4,5,6])
np.empty_like(a)
[33]:
x,y=np.meshgrid(x,y)
[[1 2 3]
[1 2 3]
[1 2 3]]
[[4 4 4]
[5 5 5]
[6 6 6]]
[34]:
from numpy import random
x = random.randint(100)
15
[35]:
y=np.random.bytes(7)
b"t'\n\x16\x14QB"
[['true' 'false' 'false']
['true' 'false' 'true']]
[36]:
x = random.rand(1) + random.rand(1)*1j
print (x)
print(x.imag)
[0.08421058+0.69654499j]
[0.08421058]
[0.69654499]
[37]:
x = random.rand(1,5) + random.rand(1,5)*1j
print (x)
[38]:
np.random.random(size=(2,2))+1j*np.random.random(size=(2,2))
[39] :
np.random.permutation(5)
[40] :
b=np.random.choice(a,size=5,p=[0.1,0.2,0.3,0.2,0.2])
[4 3 0 3 4]
[41] :
[41]: 3
[42] :
[43] :
b=np.random.choice(a)
bananaa
[44] :
np.random.shuffle(a)
2.1 Indexing
Indexing in NumPy refers to accessing individual elements or groups of elements within an array.
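The common indexing forms can be sketched as follows (the array values here are illustrative, not from the cells above):

```python
import numpy as np

a = np.array([10, 20, 30, 40])
print(a[0])    # first element -> 10
print(a[-1])   # negative index counts from the end -> 40

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m[1, 2]) # row 1, column 2 -> 6
```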
[45] :
import numpy as np
x =
y =
[46] :
[47] : =
[48] :
old_values = arr3d[0].copy()
arr3d[0] = 42
[[[42 42 42]
[42 42 42]]
[[ 7 8 9]
[10 11 12]]]
10
10
2.2 Slicing
Slicing in NumPy refers to the process of selecting a specific subset of elements from an array. It
allows you to create a new view of the original data without copying it, which can be very efficient
in terms of memory usage.
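Because a slice is a view, writing through it modifies the original array; a minimal sketch:

```python
import numpy as np

arr = np.array([5, 6, 7, 8, 9])
sub = arr[1:3]      # view of the elements at indices 1 and 2
print(sub)          # [6 7]

sub[0] = 99         # mutating the view ...
print(arr)          # ... changes the original: [ 5 99  7  8  9]
```

Use `arr[1:3].copy()` when an independent array is needed.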
[54] :
arr=np.array([5,6,7,8,9])
[6 7]
[55] :
arr=np.array([5,6,7,3,6,8,9])
[6 7 3 6 8 9]
[56] :
arr=np.array([1,2,3,4,5,8,9])
[2 3 4 5 8 9]
[57] : arr=np.array([5,6,7,8,9])
[5 6 7]
[58] :
[7 8]
[59] : arr=np.array([5,6,7,8,9])
[5 6 7]
[60] : arr=np.array([5,6,7,8,9])
[5 6 7]
[61] :
[7 8]
[62] :
arr=np.array([5,6,7,8,4,5,6,7,9])
[6 8]
[63] : arr=np.array([5,6,7,8,4,5,6,7,9])
[9 7 6 5]
print(arr[1, 1:4])
[7 8 9]
[65]: import numpy as np
print(arr[0:2, 2])
[3 8]
[66]: import numpy as np
print(arr[0:2, 1:4])
[[2 3 4]
[7 8 9]]
[67]:
b=
llo
[68]: b =
Hello
[69]: b =
llo, World!
2.3 Re-Shaping
Reshaping in NumPy is the process of changing the shape (i.e., dimensions) of an existing array without altering the data. This is particularly useful when you need to transform an array to fit a certain shape for further operations, such as machine learning or data processing tasks.
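A minimal sketch of reshape, including the -1 placeholder that lets NumPy infer one dimension:

```python
import numpy as np

arr = np.arange(12)              # 12 elements
print(arr.reshape(3, 4))         # 3 rows x 4 columns
print(arr.reshape(2, -1).shape)  # -1 is inferred -> (2, 6)
```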
(2, 4)
[[[[[1 2 3 4]]]]]
shape of array : (1, 1, 1, 1, 4)
arr1= arr.reshape(4, 3)
print(arr1)
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
[[[ 1 2 3]
[ 4 5 6]]
[[ 7 8 9]
[10 11 12]]]
[74]:
[[0 1]
[2 3]
[4 5]
[6 7]]
[75]: a=np.arange(12).reshape(4,3)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[76]:
a1=np.arange(6).reshape(3,2)
a2=np.arange(6).reshape(3,2)
print(np.concatenate((a1,a2),axis=1))
[[0 1 0 1]
[2 3 2 3]
[4 5 4 5]]
[77]:
a =
b =
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
[78]:
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
[79]:
[[[1 2]
[5 6]]
[[3 4]
[7 8]]]
[80]:
ch = np.hstack((a,b))
print(ch)
[[1 2 5 6]
[3 4 7 8]]
[81]:
ch = np.vstack((a,b))
print(ch)
[[1 2]
[3 4]
[5 6]
[7 8]]
2.5 Splitting
Splitting in NumPy involves dividing an array into multiple sub-arrays. This can be useful when
you need to partition data for different processing purposes or when dealing with chunks of data
in a structured way.
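A minimal sketch: np.split requires equal-sized chunks, while np.array_split tolerates a remainder:

```python
import numpy as np

a = np.arange(9)
print(np.split(a, 3))        # three equal chunks of 3 elements each
print(np.array_split(a, 4))  # uneven split allowed: sizes 3, 2, 2, 2
```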
[0 1 2 3 4 5 6 7 8]
[83]:
b =
[84]:
a = np.arange(12).reshape(4,3)
b=np.hsplit(a,3)
[array([[0],
[3],
[6],
[9] ]), array([[ 1],
[ 4],
[ 7],
[10] ]), array([[ 2],
[ 5],
[ 8],
[11] ])]
[85]:
[array([[0, 1, 2],
[3, 4, 5]]), array([[ 6, 7, 8],
[ 9, 10, 11]])]
[0 1 2 3 4 5 6 7 8 9]
[87]:
[87] : array([0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])
[88] :
[89] :
[89]: 0
[90] :
[90]: 9
[91] :
[91]: 4.5
[92] :
[0 1 2 3 4 5 6 7 8 9]
[93] :
arr=np.arange(0,-5,-0.5)
[94] : x = np.random.randn(8)
y = np.random.randn(8)
print(x)
[95] :
[96] :
[98] :
[[10 11 12]
[13 14 15]
[16 17 18]]
[100] :
[101] :
[102] :
[103] : import numpy as np
a = np.array([10,100,1000])
np.power(a,2)
[104] : a =
a
[105] :
[105]: 45
[106] :
import numpy as np
a = np.array([[30,40,70],[80,20,10],[50,90,60]])
[106]: 82.0
[108] :
[108]: -0.14756616582071838
[109] :
[110] :
[110]: -0.28413298907449897
[111] :
[111]: 0.9329450218698545
[112] :
[112]: 0.8703864138317433
[113] :
[114] : =
[ 0 1 3 6 10 15 21 28]
[115] : =
[[ 0 1 2]
[ 3 5 7]
[ 9 12 15]]
[116] :
[[ 0 0 0]
[ 3 12 60]
[ 6 42 336]]
[117] :
True
[118] :
[119] :
[120] :
False
[121] :
[122] :
True
[123] :
[124] :
[[3 7]
[9 1]]
[126] :
[127] :
[128] :
[130] :
[131] :
names =
print(np.unique(names))
[132] :
[133]: =
[1 2 3 4]
5.3 Set Operations
[134]:
import numpy as np
=
[135]:
[136]:
[1 2 3 4 5 6]
[137]:
[3 4]
[138]:
[1 2]
[139]:
[1 2 5 6]
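The outputs above are consistent with NumPy's 1-D set routines applied to [1 2 3 4] and [3 4 5 6]; a sketch under that assumption:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5, 6])
print(np.union1d(a, b))      # [1 2 3 4 5 6]
print(np.intersect1d(a, b))  # [3 4]
print(np.setdiff1d(a, b))    # elements of a not in b: [1 2]
print(np.setxor1d(a, b))     # in exactly one of the two: [1 2 5 6]
```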
[8] : 'JPEG'
[9] :
[[[242 242 242]
[242 242 242]
[242 242 242]
…
[195 195 195]
[195 195 195]
[195 195 195]]
[165 165 165]]]
[10] :
[11] : crop_img=a[100:900,100:900,:]
img_out=Image.fromarray(crop_img)
img_out
[11]:
[12] :
display(Image.fromarray(flipped_img))
[ ]:
Data Manipulation with Pandas
0 4
1 7
2 -5
3 3
dtype: int64
a 10
b 20
c 30
d 40
e 50
dtype: int64
lst = ['G', 'h', 'i', 'j',
'k', 'l', 'm']
10 G
20 h
30 i
40 j
50 k
60 l
70 m
dtype: object
# numpy array
data = np.array(['a', 'b', 'c', 'd', 'e'])
# creating series
s = pd.Series(data)
print(s)
0 a
1 b
2 c
3 d
4 e
dtype: object
# numpy array
data = np.array(['a', 'b', 'c', 'd', 'e'])
# creating series
s = pd.Series(data, index =[1000, 1001, 1002, 1003, 1004])
print(s)
1000 a
1001 b
1002 c
1003 d
1004 e
dtype: object
[6] : =
s = pd.Series(numpy_array, index=list('abcdef'))
Output Series:
a 1.0
b 2.8
c 3.0
d 2.0
e 9.0
f 4.2
dtype: float64
# create a dictionary
dictionary = {'D': 10, 'B': 20, 'C': 30}
# create a series
series = pd.Series(dictionary)
print(series)
D 10
B 20
C 30
dtype: int64
# create a dictionary
dictionary = {'A': 50, 'B': 10, 'C': 80}
# create a series
series = pd.Series(dictionary, index=['B','C','A'])
B 10
C 80
A 50
dtype: int64
# create a dictionary
dictionary = {'A': 50, 'B': 10, 'C': 80}
# create a series
series = pd.Series(dictionary, index=['B', 'C', 'D', 'A'])
print(series)
B 10.0
C 80.0
D NaN
A 50.0
dtype: float64
Day 1 1/1/2018
Day 2 2/1/2018
Day 3 3/1/2018
Day 4 4/1/2018
dtype: object
[12] :
1/1/2018
a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64
a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64
[19] :
[19]: 1.0
[26]:
[26] : b 1.0
a 0.0
d 3.0
dtype: float64
[27] :
[27]: b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64
[20] :
[20]: 1.0
[21] :
[21]: c 2.0
d 3.0
dtype: float64
[23]:
[23]: b 1.0
d 3.0
dtype: float64
[28] :
a 0.0
c 2.0
e 4.0
dtype: float64
a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64
[32]:
[32]: a 0.0
dtype: float64
[36]:
[36]: b 5.0
d 3.0
e 4.0
dtype: float64
[35]:
[35]: a 0.0
b 5.0
d 3.0
e 4.0
dtype: float64
[38]: s[(s>2)&(s<5)]
[38]: d 3.0
e 4.0
dtype: float64
[33]:
[33]: b 5.0
c 2.0
dtype: float64
[7]:
b True
dtype: bool
[42]:
[42]: c 2.0
e 4.0
dtype: float64
0 7
1 9
2 11
3 13
4 15
dtype: int64
0 -5
1 -5
2 -5
3 -5
4 -5
dtype: int64
0 6
1 14
2 24
3 36
4 50
dtype: int64
0 0.166667
1 0.285714
2 0.375000
3 0.444444
4 0.500000
dtype: float64
0 1
1 2
2 3
3 4
4 5
dtype: int64
2.5 Ranking
[10]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64
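The rank outputs above match Series.rank() with its default method='average'; a smaller sketch showing how ties are handled:

```python
import pandas as pd

s = pd.Series([7, 1, 7, 3])
print(s.rank())                # ties share the average rank: 3.5, 1.0, 3.5, 2.0
print(s.rank(method='min'))    # ties take the lowest rank:   3.0, 1.0, 3.0, 2.0
print(s.rank(ascending=False)) # largest value ranked first
```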
[49]:
[49]: 0 6.0
1 4.0
2 5.0
3 3.0
4 10.0
5 9.0
6 1.0
7 8.0
8 2.0
9 7.0
dtype: float64
[11]:
[11]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64
[12]:
[12]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64
[50]:
[50]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64
0 19.5000
1 16.8000
2 22.7800
3 20.1240
4 18.1002
dtype: float64
[8]: 2 22.7800
3 20.1240
0 19.5000
4 18.1002
1 16.8000
dtype: float64
[53]: sr.sort_values(ascending = True)
[53]: 1 16.8000
4 18.1002
0 19.5000
3 20.1240
2 22.7800
dtype: float64
[55]:
[55]: 0 19.5000
1 16.8000
2 22.7800
3 20.1240
4 18.1002
dtype: float64
[58]:
1 16.8000
4 18.1002
0 19.5000
3 20.1240
2 22.7800
dtype: float64
[40]: s=pd.Series({'ohio':35000,'teyas':71000,'oregon':16000,'utah':5000})
x=pd.Series(s,index=states)
ohio 35000
teyas 71000
oregon 16000
utah 5000
dtype: int64
california NaN
ohio 35000.0
Texas NaN
oregon 16000.0
dtype: float64
[42]:
[42]: california True
ohio False
Texas True
oregon False
dtype: bool
[44]:
[65]:
0 1
1 2
2 3
0 A
1 B
2 C
dtype: object
[66]:
=
0 1
0 1 A
1 2 B
2 3 C
[67]:
=
0 1
1 2
2 3
0 A
1 B
2 C
dtype: object
[21]:
0 1
1 2
2 3
3 A
4 B
5 C
dtype: object
[22]:
0 1
1 2
2 3
0 A
1 B
2 C
dtype: object
[69]:
series1 0 1
1 2
2 3
series2 0 A
1 B
2 C
dtype: object
df = pd.DataFrame(data, columns=['Numbers'])
print(df)
Numbers
0 1
1 2
2 3
3 4
4 5
[70]: import pandas as pd
nme = ["aparna", "pankaj", "sudhir", "Geeku"]
deg = ["MBA", "BCA", "M.Tech", "MBA"]
scr = [90, 40, 80, 98]
dict = {'name': nme, 'degree': deg, 'score': scr}
df = pd.DataFrame(dict)
print(df)
Name Age
0 G 10
1 h 15
2 i 20
[39]:
a b c
1 4 7 10
2 5 8 11
3 6 9 12
[13]:
[2000,2001,2002,2000,2001,2002],'pop':[1.5,1.7,3.6,2.4,2.9,3.2]})
[14] :
a b
n v
d 1 4 7
2 5 8
e 2 6 9
[71]:
[71]: ap ts tn
a 0.0 1.0 2.0
b NaN NaN NaN
c 3.0 4.0 5.0
d 6.0 7.0 8.0
4 Import various file formats to pandas DataFrames and perform the following
4.1 Importing file
[10]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05
type
0 SW
1 SW
2 SW
3 SW
4 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO
[15] :
[15] : id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw type
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84 SW
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01 SW
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34 SW
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41 SW
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13 SW
[16] :
[16]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05
type
415 SO
416 SO
417 SO
418 SO
419 SO
[17] : data.shape
[18] :
[18] : id int64
huml float64
humw float64
ulnal float64
ulnaw float64
feml float64
femw float64
tibl float64
tibw float64
tarl float64
tarw float64
type object
dtype: object
[19] :
[19]: id 0
huml 1
humw 1
ulnal 3
ulnaw 2
feml 2
femw 1
tibl 2
tibw 1
tarl 1
tarw 1
type 0
dtype: int64
[20] : data.columns
[20]: Index(['id', 'huml', 'humw', 'ulnal', 'ulnaw', 'feml', 'femw', 'tibl', 'tibw',
'tarl', 'tarw', 'type'],
dtype='object')
[21]:
[24]:
[24]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05
type
0 SW
1 SW
2 SW
3 SW
4 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO
[25]:
[25]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
5 5 61.92 4.78 50.46 3.47 49.52 4.41 56.95 2.73 29.07 2.83
6 6 79.73 5.94 67.39 4.50 42.07 3.41 71.26 3.56 37.22 3.64
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05
type
1 SW
2 SW
4 SW
5 SW
6 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO
[27]:
[27] : id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw type
342 342 NaN NaN NaN NaN 32.54 2.65 55.06 2.81 38.94 2.25 SO
[28] :
[28]: 67.39
[29] :
[30] : data
[30]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05
type
0 SW
1 SW
2 SW
3 SW
4 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO
[31] :
[31]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
.. … … … … … … … … … … …
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
type
419 SO
418 SO
417 SO
416 SO
415 SO
.. …
4 SW
3 SW
2 SW
1 SW
0 SW
[32] :
[32] : id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
369 369 13.48 1.27 16.00 1.00 12.67 1.10 23.12 0.88 16.34 0.89
413 413 12.95 1.16 14.09 1.03 13.03 1.03 22.13 0.96 15.19 1.02
395 395 15.62 1.28 18.52 1.06 15.75 1.17 28.63 1.03 21.39 0.88
367 367 13.31 1.17 16.47 1.06 12.32 0.93 22.47 0.95 15.97 0.75
414 414 13.63 1.16 15.22 1.06 13.75 0.99 23.13 0.96 15.62 1.01
376 376 13.52 1.28 17.88 1.07 15.10 1.05 25.14 1.23 17.81 0.69
type
369 SO
413 SO
395 SO
367 SO
414 SO
376 SO
[33] :
[33]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
369 369 13.48 1.27 16.00 1.00 12.67 1.10 23.12 0.88 16.34 0.89
413 413 12.95 1.16 14.09 1.03 13.03 1.03 22.13 0.96 15.19 1.02
414 414 13.63 1.16 15.22 1.06 13.75 0.99 23.13 0.96 15.62 1.01
367 367 13.31 1.17 16.47 1.06 12.32 0.93 22.47 0.95 15.97 0.75
395 395 15.62 1.28 18.52 1.06 15.75 1.17 28.63 1.03 21.39 0.88
376 376 13.52 1.28 17.88 1.07 15.10 1.05 25.14 1.23 17.81 0.69
type
369 SO
413 SO
414 SO
367 SO
395 SO
376 SO
[34] :
[34]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 1.0 289.0 344.0 275.0 325.5 289.0 295.0 1.0 302.0 272.0 328.0
1 2.0 308.0 343.0 284.0 343.0 312.0 320.0 308.0 327.5 285.0 333.0
2 3.0 286.0 336.0 268.0 334.0 295.0 303.5 292.0 303.5 271.0 305.5
3 4.0 284.0 308.0 255.0 313.5 279.0 288.0 270.0 272.5 247.0 310.5
4 5.0 248.0 281.0 227.5 258.0 224.0 225.5 231.0 250.0 211.0 294.0
5 6.0 246.0 275.0 223.0 242.0 326.0 322.0 234.0 234.0 181.0 268.5
6 7.0 285.0 321.0 262.0 304.0 292.0 282.5 279.0 280.5 259.0 320.0
7 8.0 304.0 306.0 278.0 306.0 300.0 299.0 296.0 295.5 266.0 324.0
8 9.0 362.0 370.0 354.0 362.0 365.0 356.5 363.5 359.0 352.0 346.0
9 10.0 387.0 399.0 381.5 383.0 382.0 398.0 382.0 397.0 392.0 377.0
type
0 274.5
1 274.5
2 274.5
3 274.5
4 274.5
5 274.5
6 274.5
7 274.5
8 274.5
9 274.5
[35] :
[35]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 1.0 289.0 344.0 275.0 325.5 289.0 295.0 1.0 302.0 272.0 328.0
1 2.0 308.0 343.0 284.0 343.0 312.0 320.0 308.0 327.5 285.0 333.0
type
0 274.5
1 274.5
[15]:
[15]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket \
0 891.0 617.0 246.0 783.0 289.0 497.0 179.0 552.5 220.0
1 890.0 171.5 783.5 701.0 734.5 183.0 179.0 552.5 112.0
2 889.0 171.5 246.0 538.0 734.5 404.5 587.5 552.5 17.0
3 888.0 171.5 783.5 619.0 734.5 226.5 179.0 552.5 824.5
4 887.0 617.0 246.0 876.0 289.0 226.5 587.5 552.5 283.0
[23]: Age Gender Weight (kg) Height (m) Max_BPM Avg_BPM Resting_BPM \
0 56 Male 88.3 1.71 180 157 60
1 46 Female 74.9 1.53 179 151 66
2 32 Female 68.1 1.66 167 122 54
3 25 Male 53.2 1.70 190 164 56
4 38 Male 46.1 1.79 188 158 68
.. … … … … … … …
968 24 Male 87.1 1.74 187 158 67
969 25 Male 66.6 1.61 184 166 56
970 59 Female 60.4 1.76 194 120 53
971 32 Male 126.4 1.83 198 146 62
972 46 Male 88.7 1.63 166 146 66
BMI
0 30.20
1 32.00
2 24.71
3 18.41
4 14.39
.. …
968 28.77
969 25.69
970 19.50
971 37.74
972 33.38
[25]:
[25]: 38.68345323741007
[28]:
[28]: 40.0
[29]:
[29]: 12.180927866987108
[30] :
[30]: 37639
[31] :
[31]: 148.37500370074312
[35]:
Data cleaning and preparation
Student performance
Import any csv file to pandas data frame and perform the following
# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('student.csv')
df
6603 No 8 81
6604 Yes 6 65
6605 Yes 6 91
6606 Yes 9 94
1 No College Moderate
2 No Postgraduate Near
3 No High School Moderate
4 No College Near
Gender Exam_Score
0 Male 67
1 Female 61
2 Male 74
3 Male 71
4 Female 70
... ... ...
6602 Female 68
6603 Female 69
6604 Female 68
6605 Female 68
6606 Male 64
# Display the first few rows of the DataFrame to understand the data
print("Original DataFrame:")
print(df.head())
Original DataFrame:
Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High
1 19 64 Low Medium
2 24 98 Medium Medium
3 29 89 Low Medium
4 19 92 Medium Medium
Motivation_Level \
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium
1 Public Negative 4 No
2 Public Neutral 4 No
3 Public Negative 4 No
4 Public Neutral 4 No
Missing Data:
Hours_Studied Attendance Parental_Involvement
Access_to_Resources \
0 False False False
False
1 False False False
False
2 False False False
False
3 False False False
False
4 False False False
False
5 False False False
False
6 False False False
False
7 False False False
False
8 False False False
False
9 False False False
False
4 False False False False
# No of null values
n=df.isnull().sum()
n
Hours_Studied 0
Attendance 0
Parental_Involvement 0
Access_to_Resources 0
Extracurricular_Activities 0
Sleep_Hours 0
Previous_Scores 0
Motivation_Level 0
Internet_Access 0
Tutoring_Sessions 0
Family_Income 0
Teacher_Quality 78
School_Type 0
Peer_Influence 0
Physical_Activity 0
Learning_Disabilities 0
Parental_Education_Level 90
Distance_from_Home 67
Gender 0
Exam_Score 0
dtype: int64
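Given missing counts like Teacher_Quality (78) above, the usual follow-up is to fill or drop; a sketch on a toy frame (column names borrowed from the dataset, values invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Teacher_Quality': ['High', np.nan, 'Low'],
                   'Exam_Score': [67.0, 61.0, np.nan]})

# Fill a numeric column with its mean ...
df['Exam_Score'] = df['Exam_Score'].fillna(df['Exam_Score'].mean())
# ... and drop rows still missing a categorical value
df = df.dropna(subset=['Teacher_Quality'])
print(df)
```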
1 19 64 Low Medium
2 24 98 Medium Medium
3 29 89 Low Medium
4 19 92 Medium Medium
5 19 88 Medium Medium
6 29 84 Medium Low
7 25 78 Low High
8 17 94 Medium High
9 23 98 Medium Medium
1 Public Negative 4 No
2 Public Neutral 4 No
3 Public Negative 4 No
4 Public Neutral 4 No
5 Public Positive 3 No
6 Private Neutral 2 No
7 Public Negative 2 No
8 Private Neutral 1 No
9 Public Positive 5 No
1 19 64 Low Medium
2 24 98 Medium Medium
3 29 89 Low Medium
4 19 92 Medium Medium
5 19 88 Medium Medium
6 29 84 Medium Low
7 25 78 Low High
8 17 94 Medium High
9 23 98 Medium Medium
1 Public Negative 4 No
2 Public Neutral 4 No
3 Public Negative 4 No
4 Public Neutral 4 No
5 Public Positive 3 No
6 Private Neutral 2 No
7 Public Negative 2 No
8 Private Neutral 1 No
9 Public Positive 5 No
1 19 64 Low Medium
2 24 98 Medium Medium
3 29 89 Low Medium
4 19 92 Medium Medium
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium
1 Public Negative 4 No
2 Public Neutral 4 No
3 Public Negative 4 No
4 Public Neutral 4 No
# Display the first few rows of the DataFrame to understand the data
print("Original DataFrame:")
print(df.head())
Original DataFrame:
Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High
1 19 64 Low Medium
2 24 98 Medium Medium
3 29 89 Low Medium
4 19 92 Medium Medium
1 Public Negative 4 No
2 Public Neutral 4 No
3 Public Negative 4 No
4 Public Neutral 4 No
# Assume 'Price' is a column that we want to transform
2 Medium Yes 2 Medium
1 No College Moderate
2 No Postgraduate Near
4 No College Near
6604 No Postgraduate Near
Gender Exam_Score
0 Male 67
1 Female 61
2 Male 74
3 Male 71
4 Female 70
... ... ...
6602 Female 68
6603 Female 69
6604 Female 68
6605 Female 68
6606 Male 64
1 19 64 Low Medium
2 24 98 Medium Medium
3 29 89 Low Medium
4 19 92 Medium Medium
Low
2 Yes NaN 91
Medium
3 Yes NaN 98
Medium
4 Yes NaN 65
Medium
1 Public Negative 4 No
2 Public Neutral 4 No
3 Public Negative 4 No
4 Public Neutral 4 No
886 887 0 2
887 888 1 1
888 889 0 3
889 890 1 1
890 891 0 3
Titanic
# Display the first few rows of the DataFrame to understand the data
print("Original DataFrame:")
print(df.head())
Original DataFrame:
PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
# Select the column to analyze for outliers (replace 'Value' with the actual column name)
column_name = 'Fare'
8 0.424018
9 0.042931
Name: Fare, dtype: float64
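The Fare values above (e.g. 0.424018) look like z-scores; a sketch of that standardization, with invented fares:

```python
import pandas as pd

fare = pd.Series([7.25, 71.28, 7.92, 53.10, 8.05])
z = (fare - fare.mean()) / fare.std()  # z-score: mean 0, std 1
outliers = fare[z.abs() > 3]           # a common outlier threshold
print(z.round(3))
print(outliers)                        # empty for this small sample
```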
d) perform vectorized string operations on pandas series
# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('titanic.csv')
df
.. ... ... ... ... ...
886 0 211536 13.0000 NaN S
887 0 112053 30.0000 B42 S
888 2 W./C. 6607 23.4500 NaN S
889 0 111369 30.0000 C148 C
890 0 370376 7.7500 NaN Q
890 370376 7.7500 NaN Q
1 2 1 1 6 female 38.0 1 0
2 3 1 3 6 female 26.0 0 0
3 4 1 1 6 female 35.0 1 0
4 5 0 3 4 male 35.0 0 0
# Split the names based on a delimiter (e.g., space) and create a new column for the first part of the name
df['Name'] = df['Sex'].str.split(' ').str[0]
df
PassengerId Survived Pclass Name Sex Age SibSp Parch
\
0 1 0 3 male male 22.0 1 0
1 2 1 1 female female 38.0 1 0
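Note that the cell above split df['Sex'], not the passenger names, which is why the Name column now holds male/female. The .str accessor works the same way on any string Series; a sketch with invented names:

```python
import pandas as pd

names = pd.Series(['Braund, Mr. Owen', 'Cumings, Mrs. John'])
print(names.str.upper())             # vectorized uppercase
print(names.str.split(',').str[0])   # surname before the comma
print(names.str.contains('Mrs'))     # boolean mask per element
```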
Data Wrangling
[3] : X A1Y
0 X0 Y0
1 X1 Y1
[4] : df2
[4] : X Y
0 X2 Y2
1 X3 Y3
[5] :
[5]: X A1Y Y
0 X0 Y0 NaN
1 X1 Y1 NaN
0 X2 NaN Y2
1 X3 NaN Y3
0.0.2 MERGE
Used to merge two data frames based on a key column, similar to SQL joins. Options include how='inner', how='outer', how='left', and how='right' for different types of joins.
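The inputs of the cells below were lost in conversion, so here is a self-contained sketch using a df2 with the same keys as shown (y, z, a); df1's keys are assumed:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['x', 'y', 'z'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['y', 'z', 'a'], 'value2': [4, 5, 6]})

print(pd.merge(df1, df2, on='key', how='inner'))  # only keys in both: y, z
print(pd.merge(df1, df2, on='key', how='outer'))  # all keys; gaps become NaN
```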
[9] : df2
[10] :
[12] : df2
[12] : key value2
0 y 4
1 z 5
2 a 6
[13] :
0.0.3 JOIN
A join is a way to combine data from two or more tables (or DataFrames) based on a common
column, known as the join key.
[18]: df1 = pd.DataFrame({"x": ["x0", "x1", "x2"], "y": ["y0", "y1", "y2"]},
index=["j0", "j1", "j2"]) # Create DataFrame 2
df2 = pd.DataFrame({"z": ["z0", "z2", "z3"], "a": ["a0", "a2", "a3"]},
index=["K0", "K2", "K3"])
# Print DataFrame 1
print(df1)
# Print DataFrame 2
print(df2)
# Join DataFrames 1 and 2 on index (default)
df3 = df1.join(df2)
print(df3)
x y
j0 x0 y0
j1 x1 y1
j2 x2 y2
z a
K0 z0 a0
K2 z2 a2
K3 z3 a3
x y z a
j0 x0 y0 NaN NaN
j1 x1 y1 NaN NaN
j2 x2 y2 NaN NaN
[21]: #inner join
# Create DataFrame 1
df1 = pd.DataFrame({"x": ["x0", "x1", "x2"], "y": ["y0", "y1", "y2"]},
index=["j0", "j1", "j2"]) # Create DataFrame 2
df2 = pd.DataFrame({"x": ["x0", "x1", "x3"],"z": ["z0", "z2", "z3"],
"a": ["a0", "a2", "a3"]},
index=["K0", "K2", "K3"])
df4 = df1.merge(df2,on="x", how='inner')
print(df4)
x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2
x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2
2 x2 y2 NaN NaN
3 x3 NaN z3 a3
[25]:
df7 = df1.merge(df2,on="x",how='right')
print(df7)
x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2
2 x3 NaN z3 a3
0.0.8 RESHAPE
Reshaping functions like pivot and melt are used to transform the layout of data frames.
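A minimal sketch of the pivot (long-to-wide) and melt (wide-to-long) round trip, with invented data:

```python
import pandas as pd

df = pd.DataFrame({'date': ['d1', 'd1', 'd2', 'd2'],
                   'city': ['A', 'B', 'A', 'B'],
                   'temp': [20, 25, 21, 26]})

wide = df.pivot(index='date', columns='city', values='temp')  # long -> wide
print(wide)

tidy = wide.reset_index().melt(id_vars='date', var_name='city',
                               value_name='temp')             # wide -> long
print(tidy)
```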
[30]: import pandas as pd
# Create Series 1
s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
# Create Series 2
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
# Concatenate Series into DataFrame
df = pd.concat([s1, s2], keys=['one', 'two'])
print(df)
one a 0
b 1
c 2
d 3
two c 4
d 5
e 6
dtype: int64
[31]:
a b c d e
one 0.0 1.0 2.0 3.0 NaN
two NaN NaN 4.0 5.0 6.0
[ ]: