Numpy Merged


NumPy Arrays and Vectorized Computation

0.0.1 NUMPY MODULE:


NumPy, short for Numerical Python, is a fundamental library for numerical computing in Python.
It provides powerful data structures, primarily the ndarray (n-dimensional array), which enables
efficient storage and manipulation of large datasets. With its support for multi-dimensional arrays,
NumPy allows users to perform complex mathematical operations with ease. One of the key features
of NumPy is its ability to perform element-wise operations on arrays, which is significantly faster
than using traditional Python lists. This efficiency stems from its implementation in C, allowing for
lower-level optimizations. NumPy also includes a comprehensive set of mathematical functions that
can operate on arrays, including linear algebra, Fourier transforms, and random number generation.
In addition to its array capabilities, NumPy provides tools for integrating with other languages, such
as C and Fortran, making it a versatile choice for performance-critical applications. It serves as the
backbone for many other scientific computing libraries, including SciPy, pandas, and Matplotlib,
establishing itself as an essential component of the scientific Python ecosystem. NumPy’s array
operations are broadcastable, meaning that arrays of different shapes can still be used together in
calculations, making it easier to handle data of varying dimensions. This flexibility is particularly
useful in data analysis and machine learning tasks.
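
As a quick illustration of broadcasting (a minimal sketch, not from the original notebook), a 1-D array combines with a 2-D array as long as the trailing dimensions match:

import numpy as np
a = np.arange(6).reshape(2, 3)   # shape (2, 3)
b = np.array([10, 20, 30])       # shape (3,) is broadcast across both rows
print(a + b)                     # [[10 21 32]
                                 #  [13 24 35]]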

1 1. NumPy Arrays from Python Data Structures, Intrinsic NumPy Objects and Random Functions

2 1.1 Arrays from Python data structures

[1]: import numpy as np
a=np.array([1,2,3])
a

[1]: array([1, 2, 3])

[2]: type(a)

[2]: numpy.ndarray

[3]: a.dtype

[3]: dtype('int32')

[4] :
a.ndim

[4]: 1

[5]: a.size

[5]: 3

[6] :
a.shape

[6]: (3,)

[7] :

[1, 2, 3, 4, 5]

[8] : #Creation of ndarrays using array() method


#2-D array
import numpy as np
x=[1,2,3]
y=[3,4,5]
z=np.array((x,y))
print(z)

[[1 2 3]
[3 4 5]]

[9]: #1-D array from a list
m=[1,2,3]
c=np.array(m)
print(c)

[1 2 3]

[10]: #2-D array from a nested list
d=np.array([[1,2,3,4,5],[6,7,8,9,1]])
print(d)

[[1 2 3 4 5]
[6 7 8 9 1]]

[11]: #a set is stored as a single object, not element-wise
np.array({1,2,3,4,5})

[11]: array({1, 2, 3, 4, 5}, dtype=object)

[12] :

{1, 2, 3, 4}

[13] : #dictionary
import numpy as np
dict={'a':1,'b':2,'c':3}
z=np.array(list(dict.items()))
print(z)
a=np.array(list(dict.keys()))
print(a)

[['a' '1']
['b' '2']
['c' '3']]
['a' 'b' 'c']

3 1.2 Intrinsic NumPy Objects


Intrinsic NumPy objects are fundamental data structures provided by the NumPy library, which
are optimized for numerical computations and provide efficient operations on large datasets.
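
Besides the constructors shown below, np.linspace is another common intrinsic creator; a small sketch comparing it with arange:

import numpy as np
print(np.arange(0, 1, 0.25))   # step-based spacing: [0.   0.25 0.5  0.75]
print(np.linspace(0, 1, 5))    # count-based spacing: [0.   0.25 0.5  0.75 1.  ]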

[14]: a=np.array(np.arange(9))
print(a)

[0 1 2 3 4 5 6 7 8]

[15]: a=np.zeros(3)
print(a)

[0. 0. 0.]

[16]: b=np.zeros([3,3])
print(b)

[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]

[17]: x=np.array([[1,3,7],[2,5,9]])
x

[17]: array([[1, 3, 7],
[2, 5, 9]])

[18] : d=np.zeros_like(x)
d

[18] : array([[0, 0, 0],


[0, 0, 0]])

[19]: a=np.ones(4)
print(a)

[1. 1. 1. 1.]

[20]: b=np.ones([3,3])
print(b)

[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]

[21]: c=np.eye(3)
print(c)

[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]

[22]: c=np.eye(3,k=1)
print(c)

[[0. 1. 0.]
[0. 0. 1.]
[0. 0. 0.]]

[23]: print(np.identity(3))

[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]

[24]: print(np.full((2,2),7))

[[7 7]
[7 7]]

[25]: np.arange(15)

[25]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

[26]: x=np.arange(6,dtype=int)
np.ones_like(x)

[26]: array([1, 1, 1, 1, 1, 1])

[27]: np.zeros_like(x)

[27]: array([0, 0, 0, 0, 0, 0])

[28] : d=np.full_like(x,0.1,dtype=np.double)
d

[28] : array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

[29]: print(np.full((2,3),0.1))

[[0.1 0.1 0.1]
[0.1 0.1 0.1]]

[30] : np.empty((2, 3, 2))

[30]: array([[[1.05337787e-311, 2.86558075e-322],


[0.00000000e+000, 0.00000000e+000],
[1.10343781e-312, 1.31370903e-076]],

[[5.20093491e-090, 5.69847262e-066],

[5.51292779e+169, 4.85649086e-033],
[6.48224659e+170, 5.82471487e+257]]])

[31]: #empty_like()
a=([1,2,3],[4,5,6])
np.empty_like(a)

[31]: array([[1730487296, 496, 0],


[ 0, 131074, 168442489]])

[32]: #using diag() method


np.diag([1,2,3,4])

[32]: array([[1, 0, 0, 0],


[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])

[33]: x=np.array([1,2,3])
y=np.array([4,5,6])
x,y=np.meshgrid(x,y)
print(x)
print(y)

[[1 2 3]
[1 2 3]
[1 2 3]]
[[4 4 4]
[5 5 5]
[6 6 6]]

3.1 1.3 Random Functions


The random functions in NumPy are essential for simulations, statistical sampling, and generating
synthetic data. They help facilitate various operations in scientific computing, machine learning,
and data analysis.
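
For reproducible results, the generator can be seeded; a minimal sketch using NumPy's default_rng (not part of the original cells):

import numpy as np
rng = np.random.default_rng(42)        # seeded Generator: same stream every run
print(rng.integers(0, 100, size=5))    # reproducible random integers
print(rng.normal(size=3))              # reproducible samples from N(0, 1)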

[34]: from numpy import random

x = random.randint(100)
print(x)

15
[35]: y=np.random.bytes(7)
print(y)
z=np.random.choice(['true','false'],size=(2,3))
print(z)

b"t'\n\x16\x14QB"
[['true' 'false' 'false']
['true' 'false' 'true']]

[36]: x = random.rand(1) + random.rand(1)*1j
print(x)
print(x.real)
print(x.imag)

[0.08421058+0.69654499j]
[0.08421058]
[0.69654499]

[37]:
x = random.rand(1,5) + random.rand(1,5)*1j
print (x)

[[0.29653563+0.94629414j 0.56539718+0.58965768j 0.83340819+0.82456817j


0.16209606+0.15309722j 0.92519953+0.01018444j]]

[38]:
np.random.random(size=(2,2))+1j*np.random.random(size=(2,2))

[38] : array([[0.90898124+0.87349692j, 0.64895681+0.87327894j],


[0.7544518 +0.122983j , 0.4716534 +0.77610277j]])

[39] :
np.random.permutation(5)

[39] : array([0, 3, 4, 1, 2])

[40]: a=np.arange(5)
b=np.random.choice(a,size=5,p=[0.1,0.2,0.3,0.2,0.2])
print(b)

[4 3 0 3 4]

[41] :

[41]: 3

[42]: print(np.random.randn(1,10))

[[ 0.08009351 1.04758386 -0.15977457 0.60779634 0.12686552 -2.29032851
-0.53667358 -0.69266066 1.42867051 -0.34056088]]

[43]: a=np.array(['apple','bananaa','cherry'])
b=np.random.choice(a)
print(b)

bananaa

[44]: np.random.shuffle(a)
print(a)

['cherry' 'apple' 'bananaa']

4 2. Manipulation of NumPy Arrays

5 2.1 Indexing
Indexing in NumPy refers to accessing individual elements or groups of elements within an array.
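
Beyond single positions, NumPy also supports fancy (list-of-indices) and boolean indexing; a short sketch:

import numpy as np
arr = np.arange(10, 20)
print(arr[0], arr[-1])       # single elements: 10 19
print(arr[[0, 3, 5]])        # fancy indexing: [10 13 15]
print(arr[arr % 2 == 0])     # boolean mask:   [10 12 14 16 18]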

[45]: import numpy as np
x = [[1, 2, 3], [4, 5, 6]]
y = [[7, 8, 9], [10, 11, 12]]

[46]: arr3d = np.array([x, y])

[47]: arr3d[0]

[47]: array([[1, 2, 3],
[4, 5, 6]])

[48]: old_values = arr3d[0].copy()
arr3d[0] = 42
print(arr3d)

[[[42 42 42]
[42 42 42]]

[[ 7 8 9]
[10 11 12]]]

[49] : import numpy as np

[50] : import numpy as np

[51] : import numpy as np

10

[52] : import numpy as np

[53] : import numpy as np

10

6 2.2 Slicing
Slicing in NumPy refers to the process of selecting a specific subset of elements from an array. It
allows you to create a new view of the original data without copying it, which can be very efficient
in terms of memory usage.
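
Because a slice is a view, writing through it modifies the original array; call .copy() when that is not wanted. A minimal sketch:

import numpy as np
arr = np.arange(5)
view = arr[1:4]          # a view: shares memory with arr
view[0] = 99
print(arr)               # [ 0 99  2  3  4] -- the original changed
safe = arr[1:4].copy()   # an independent copy
safe[0] = -1
print(arr)               # unchanged this time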

[54]: arr=np.array([5,6,7,8,9])
print(arr[1:3])

[6 7]

[55]: arr=np.array([5,6,7,3,6,8,9])
print(arr[1:])

[6 7 3 6 8 9]

[56]: arr=np.array([1,2,3,4,5,8,9])
print(arr[1:])

[2 3 4 5 8 9]

[57]: arr=np.array([5,6,7,8,9])
print(arr[:3])

[5 6 7]

[58]: print(arr[2:4])

[7 8]

[59]: arr=np.array([5,6,7,8,9])
print(arr[0:3])

[5 6 7]

[60]: arr=np.array([5,6,7,8,9])
print(arr[:3])

[5 6 7]

[61]: print(arr[-3:-1])

[7 8]

[62]: arr=np.array([5,6,7,8,4,5,6,7,9])
print(arr[1:4:2])

[6 8]

[63]: arr=np.array([5,6,7,8,4,5,6,7,9])
print(arr[:4:-1])

[9 7 6 5]

[64]: import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[1, 1:4])

[7 8 9]
[65]: import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[0:2, 2])

[3 8]
[66]: import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[0:2, 1:4])

[[2 3 4]
[7 8 9]]

[67]: b="Hello, World!"
print(b[2:5])

llo

[68]: b = "Hello, World!"
print(b[:5])

Hello

[69]: b = "Hello, World!"
print(b[2:])

llo, World!

7 2.3 Re-Shaping
Reshaping in NumPy is the process of changing the shape (i.e., dimensions) of an existing array
without altering the data. This is particularly useful when you need to transform an array to fit a
certain shape for further operations, such as machine learning or data processing tasks.
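
One dimension may be given as -1, letting NumPy infer it from the array's size; a quick sketch:

import numpy as np
arr = np.arange(12)
print(arr.reshape(3, -1))    # NumPy infers the second dimension as 4
print(arr.reshape(-1))       # back to 1-D, equivalent to ravel()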

[70]: import numpy as np

(2, 4)

[71]: import numpy as np

arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr)
print('shape of array :', arr.shape)

[[[[[1 2 3 4]]]]]
shape of array : (1, 1, 1, 1, 4)

[72]: import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

arr1= arr.reshape(4, 3)

print(arr1)

[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]

[73]: import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 2, 3)
print(newarr)

[[[ 1 2 3]
[ 4 5 6]]

[[ 7 8 9]
[10 11 12]]]
[74]: a=np.arange(8).reshape(4,2)
print(a)

[[0 1]
[2 3]
[4 5]
[6 7]]

[75]: a=np.arange(12).reshape(4,3)
print(a)

[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]

8 2.4 Joining Arrays


Joining arrays in NumPy is a way of combining two or more arrays into a single array. There are
several ways to join arrays, depending on the desired result and the shape of the input arrays.
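
For reference alongside the axis=1 example below, concatenating along axis=0 stacks rows, and the arrays only need to agree on the other axis; a small sketch:

import numpy as np
a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2): same column count
print(np.concatenate((a, b), axis=0))   # [[1 2] [3 4] [5 6]]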

[76]:
a1=np.arange(6).reshape(3,2)
a2=np.arange(6).reshape(3,2)
print(np.concatenate((a1,a2),axis=1))

[[0 1 0 1]
[2 3 2 3]
[4 5 4 5]]

[77]: a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
c = np.stack((a,b))
print(c)

[[[1 2]
[3 4]]

[[5 6]
[7 8]]]

[78]: print(np.stack((a,b),axis=0))

[[[1 2]
[3 4]]

[[5 6]
[7 8]]]

[79]: print(np.stack((a,b),axis=1))

[[[1 2]
[5 6]]

[[3 4]
[7 8]]]

[80]:
ch = np.hstack((a,b))
print(ch)

[[1 2 5 6]
[3 4 7 8]]

[81]:
ch = np.vstack((a,b))
print(ch)

[[1 2]
[3 4]
[5 6]
[7 8]]

9 2.5 Splitting
Splitting in NumPy involves dividing an array into multiple sub-arrays. This can be useful when
you need to partition data for different processing purposes or when dealing with chunks of data
in a structured way.
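
np.split requires the division to be exact; np.array_split relaxes this and allows uneven pieces. A minimal sketch:

import numpy as np
a = np.arange(7)
print(np.array_split(a, 3))   # [array([0, 1, 2]), array([3, 4]), array([5, 6])]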

[82]: import numpy as np


a = np.arange(9)
print(a)

[0 1 2 3 4 5 6 7 8]

[83]: b = np.split(a,3)
print(b)

[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

[84]: a = np.arange(12).reshape(4,3)
b=np.hsplit(a,3)
print(b)

[array([[0],
[3],
[6],
[9]]), array([[ 1],
[ 4],
[ 7],
[10]]), array([[ 2],
[ 5],
[ 8],
[11]])]

[85]: print(np.vsplit(a,2))

[array([[0, 1, 2],
[3, 4, 5]]), array([[ 6, 7, 8],
[ 9, 10, 11]])]

10 3. Computation on NumPy Arrays Using Universal Functions

11 3.1 Unary Universal Functions


Unary Universal Functions (also known as unary ufuncs) in NumPy are mathematical functions
that operate on a single input array element-wise. These functions apply a specific mathematical
operation to each element of an array independently, resulting in an output array of the same shape.
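
Other common unary ufuncs follow the same element-wise pattern; a short sketch with np.abs and np.square:

import numpy as np
arr = np.array([-3.0, -1.5, 2.0])
print(np.abs(arr))      # [3.  1.5 2. ]
print(np.square(arr))   # [9.   2.25 4.  ]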

[86]: arr = np.arange(10)
print(arr)

[0 1 2 3 4 5 6 7 8 9]

[87]: np.sqrt(arr)

[87]: array([0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])

[88]: np.exp(arr)

[88]: array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
2.98095799e+03, 8.10308393e+03])

[89]: arr.min()

[89]: 0

[90]: arr.max()

[90]: 9

[91]: arr.mean()

[91]: 4.5

[92] :

[0 1 2 3 4 5 6 7 8 9]

[93]: arr=np.arange(0,-5,-0.5)
print(np.abs(arr))

[0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]

12 3.2 Binary Universal Functions


Binary Universal Functions (also known as binary ufuncs) operate on two input arrays element-wise.
These functions require two arrays (or one array and one scalar) and perform a mathematical
operation between corresponding elements.
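
Note the difference between the binary ufunc np.maximum (element-wise, two inputs) and the reduction np.max (one input); a quick sketch:

import numpy as np
x = np.array([1, 5, 2])
y = np.array([4, 3, 6])
print(np.maximum(x, y))   # element-wise: [4 5 6]
print(np.max(x))          # reduction over one array: 5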

[94] : x = np.random.randn(8)
y = np.random.randn(8)
print(x)

[ 1.11097262 -0.26995231 0.0060993 1.04398907 -1.82141342 0.00998652


0.08274781 0.82046885]

[95]: print(y)

[-0.05342373 0.10817525 -0.4610533 0.5755554 -0.66695438 0.25344274
1.40395846 -0.87447163]

[96]: np.maximum(x, y)

[96]: array([ 1.11097262, 0.10817525, 0.0060993 , 1.04398907, -0.66695438,
0.25344274, 1.40395846, 0.82046885])

[97] : arr = np.random.randn(7) * 5


remainder, whole_part = np.modf(arr)
print(remainder)

[-0.98958028 0.75318997 0.47148313 0.96309562 -0.84443205 0.60019609


0.41412946]

[98]: print(whole_part)

[-6. 7. 2. 1. -2. 5. 3.]

[99]: import numpy as np


a = np.arange(9).reshape(3,3)
b = np.array([[10,10,10],[10,10,10],[10,10,10]])
print(np.add(a,b))

[[10 11 12]
[13 14 15]
[16 17 18]]

[100]: np.subtract(a,b)

[100]: array([[-10, -9, -8],
[ -7, -6, -5],
[ -4, -3, -2]])

[101]: np.multiply(a,b)

[101]: array([[ 0, 10, 20],
[30, 40, 50],
[60, 70, 80]])

[102]: np.divide(a,b)

[102]: array([[0. , 0.1, 0.2],
[0.3, 0.4, 0.5],
[0.6, 0.7, 0.8]])

[103] : import numpy as np
a = np.array([10,100,1000])
np.power(a,2)

[103] : array([ 100, 10000, 1000000], dtype=int32)

13 4. Compute Statistical and Mathematical Methods and Comparison Operations on rows/columns

13.1 4.1 Mathematical and Statistical methods on NumPy Arrays
NumPy provides a variety of mathematical and statistical methods to perform operations on arrays.

[104]: a = np.array([[3,7,5],[8,4,3],[2,4,9]])
a

[104]: array([[3, 7, 5],
[8, 4, 3],
[2, 4, 9]])

[105]: a.sum()

[105]: 45

[106]: import numpy as np
a = np.array([[30,40,70],[80,20,10],[50,90,60]])
np.percentile(a,90)

[106]: 82.0

[107] : arr = np.random.randn(5, 4)

[108]: arr.mean()

[108]: -0.14756616582071838

[109]: arr.mean(axis=1)

[109]: array([-0.93641711, 0.12758996, -0.44993246, 0.13099294, 0.38993583])

[110]: np.median(arr)

[110]: -0.28413298907449897

[111]: arr.std()

[111]: 0.9329450218698545

[112]: arr.var()

[112]: 0.8703864138317433

[113]: arr.sum(axis=0)

[113]: array([ 0.68253865, -2.88096912, -2.108008 , 1.35511515])

[114]: arr=np.arange(8)
print(arr.cumsum())

[ 0 1 3 6 10 15 21 28]

[115]: arr=np.arange(9).reshape(3,3)
print(arr.cumsum(axis=0))

[[ 0 1 2]
[ 3 5 7]
[ 9 12 15]]

[116]: print(arr.cumprod(axis=1))

[[ 0 0 0]
[ 3 12 60]
[ 6 42 336]]

13.2 4.2 Comparison Operations


Comparison operations in NumPy allow element-wise comparison between arrays or with scalars.
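
Since the original cells in this section did not survive intact, here is a minimal sketch of the typical element-wise and whole-array comparisons:

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
print(a == b)                 # [False  True False  True]
print(a > 2)                  # [False False  True  True]
print(np.array_equal(a, b))   # whole-array comparison: False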

[117] :

True

[118] :

[119] :

[False True False True]

[120] :

False

[121] :

[False True True True]

[122] :

True

[123] :

[ True False False False]

[124] :

[ True False True False]

14 5. Computation on NumPy Arrays using Sorting, Unique and Set Operations

14.1 5.1 Sorting
Sorting helps to arrange elements of an array in a particular order.
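
Related to sorting, np.argsort returns the indices that would sort an array; a small sketch:

import numpy as np
a = np.array([30, 10, 20])
order = np.argsort(a)   # [1 2 0]
print(a[order])         # [10 20 30]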

[125]: import numpy as np


a = np.array([[3,7],[9,1]])
print(a)

[[3 7]
[9 1]]

[126]: np.sort(a)

[126]: array([[3, 7],
[1, 9]])

[127]: np.sort(a, axis=0)

[127]: array([[3, 1],
[9, 7]])

[128]: np.sort(a, axis=1)

[128]: array([[3, 7],
[1, 9]])

[129]: arr = np.random.randn(5, 3)
print(arr)

[[-0.92147727 -0.67857177 -0.04478315]
[-0.30378745 -0.95433394 -1.83418572]
[-0.48103436 -0.55413111 1.28233061]
[ 0.76260305 1.30994277 0.32818117]
[ 1.87598839 -0.35057108 0.47603584]]

[130]: arr.sort(1)
print(arr)

[[-0.92147727 -0.67857177 -0.04478315]
[-1.83418572 -0.95433394 -0.30378745]
[-0.55413111 -0.48103436 1.28233061]
[ 0.32818117 0.76260305 1.30994277]
[-0.35057108 0.47603584 1.87598839]]

14.2 5.2 Unique Operation


NumPy provides functions that find the sorted, unique values in an array.
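
np.unique can also report how often each value occurs via return_counts; a minimal sketch:

import numpy as np
a = np.array([1, 2, 2, 3, 3, 3])
values, counts = np.unique(a, return_counts=True)
print(values)   # [1 2 3]
print(counts)   # [1 2 3]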

[131]: names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
print(np.unique(names))

['Bob' 'Joe' 'Will']

[132]: sorted(set(names))

[132]: ['Bob', 'Joe', 'Will']

[133]: ints = np.array([3,3,3,2,2,1,1,4,4])
print(np.unique(ints))

[1 2 3 4]

14.3 5.3 Set Operations

[134]: import numpy as np
values = np.array([6,0,0,3,2,5,6])
print(np.in1d(values,[2,3,6]))

[ True False False True True False True]

[135]: x = np.array([1,2,3,4])
y = np.array([3,4,5,6])

[136]: print(np.union1d(x,y))

[1 2 3 4 5 6]

[137]: print(np.intersect1d(x,y))

[3 4]

[138]: print(np.setdiff1d(x,y))

[1 2]

[139]: print(np.setxor1d(x,y))

[1 2 5 6]

15 6. Load an image file and do crop and flip operations using NumPy indexing

To load and manipulate images with NumPy, you can use the Pillow (PIL) library to load an image
and convert it into a NumPy array.

[8]: from PIL import Image


img=Image.open("img.jpg")
img.format

[8] : 'JPEG'

[9]: import numpy as np
a=np.array(img)
print(a)
[[[242 242 242]
[242 242 242]
[242 242 242]

[195 195 195]
[195 195 195]
[195 195 195]]

[[242 242 242]


[242 242 242]
[242 242 242]

[195 195 195]
[195 195 195]
[195 195 195]]

[[242 242 242]


[242 242 242]
[242 242 242]

[195 195 195]
[195 195 195]
[195 195 195]]

[[208 208 208]


[208 208 208]
[206 206 206]

[163 163 163]
[163 163 163]
[163 163 163]]

[[208 208 208]


[207 207 207]
[206 206 206]

[164 164 164]
[164 164 164]
[164 164 164]]

[[207 207 207]


[207 207 207]
[205 205 205]

[165 165 165]
[165 165 165]

[165 165 165]]]

[10]: display(img)
[11] : crop_img=a[100:900,100:900,:]
img_out=Image.fromarray(crop_img)
img_out

[11]:

[12]: flipped_img=a[::-1,:,:]   # vertical flip via reversed row indexing (assumed axis)
display(Image.fromarray(flipped_img))
Data Manipulation with Pandas

1 1. Create pandas Series from Python List, NumPy Arrays and Dictionary

2 1.1 Pandas Series From Python List

[1]: import pandas as pd


import numpy as np
data=[4,7,-5,3]
s=pd.Series(data)
print(s)

0 4
1 7
2 -5
3 3
dtype: int64

[2]: # import pandas lib. as pd


import pandas as pd

# create Pandas Series with defined indexes


x = pd.Series([10, 20, 30, 40, 50], index =['a', 'b', 'c', 'd', 'e'])

# print the Series


print(x)

a 10
b 20
c 30
d 40
e 50
dtype: int64

[3]: import pandas as pd

lst = ['G', 'h', 'i', 'j',
'k', 'l', 'm']
ind = [10, 20, 30, 40, 50, 60, 70]

# create Pandas Series with defined indexes


x = pd.Series(lst, index = ind)

# print the Series


print(x)

10 G
20 h
30 i
40 j
50 k
60 l
70 m
dtype: object

3 1.2 Pandas Series From Numpy arrays

[4] : import pandas as pd


import numpy as np

# numpy array
data = np.array(['a', 'b', 'c', 'd', 'e'])

# creating series
s = pd.Series(data)
print(s)

0 a
1 b
2 c
3 d
4 e
dtype: object

[5] : # importing Pandas & numpy


import pandas as pd
import numpy as np

# numpy array
data = np.array(['a', 'b', 'c', 'd', 'e'])

# creating series
s = pd.Series(data, index =[1000, 1001, 1002, 1003, 1004])
print(s)

1000 a
1001 b
1002 c
1003 d
1004 e
dtype: object

[6]: numpy_array = np.array([1.0, 2.8, 3.0, 2.0, 9.0, 4.2])

s = pd.Series(numpy_array, index=list('abcdef'))
print("Output Series:")
print(s)

Output Series:
a 1.0
b 2.8
c 3.0
d 2.0
e 9.0
f 4.2
dtype: float64

4 1.3 Pandas Series From Dictionary

[7] : import pandas as pd

# create a dictionary
dictionary = {'D': 10, 'B': 20, 'C': 30}

# create a series
series = pd.Series(dictionary)

print(series)

D 10
B 20
C 30
dtype: int64

[8] : # import the pandas lib as pd


import pandas as pd

# create a dictionary
dictionary = {'A': 50, 'B': 10, 'C': 80}

# create a series
series = pd.Series(dictionary, index=['B','C','A'])
print(series)
B 10
C 80
A 50
dtype: int64

[9] : import pandas as pd

# create a dictionary
dictionary = {'A': 50, 'B': 10, 'C': 80}

# create a series
series = pd.Series(dictionary, index=['B', 'C', 'D', 'A'])

print(series)

B 10.0
C 80.0
D NaN
A 50.0
dtype: float64

4.1 2. Data Manipulation with Pandas Series


4.2 2.1 Indexing

[10] : import pandas as pd


import numpy as np

# creating simple array


data = np.array(['s','p','a','n','d','a','n','a'])
ser = pd.Series(data,index=[10,11,12,13,14,15,16,17])
print(ser[16])

[11] : import pandas as pd

Date = ['1/1/2018', '2/1/2018', '3/1/2018', '4/1/2018']


Index_name = ['Day 1', 'Day 2', 'Day 3', 'Day 4']
sr = pd.Series(data = Date,
index = Index_name )
print(sr)

Day 1 1/1/2018
Day 2 2/1/2018

Day 3 3/1/2018
Day 4 4/1/2018
dtype: object

[12]: print(sr['Day 1'])

1/1/2018

[13]: import numpy as np


import pandas as pd
s = pd.Series(np.arange(5.), index=['a','b','c','d','e'])
print(s)

a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64

4.3 2.2 Selecting

[24]: import numpy as np


import pandas as pd
s = pd.Series(np.arange(5.), index=['a','b','c','d','e'])
print(s)

a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64

[19]: s['b']

[19]: 1.0

[26]: s[['b','a','d']]

[26]: b 1.0
a 0.0
d 3.0
dtype: float64

[27]: s['b':'e']

[27]: b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64

[20]: s[1]

[20]: 1.0

[21]: s[2:4]

[21]: c 2.0
d 3.0
dtype: float64

[23]: s[[1,3]]

[23]: b 1.0
d 3.0
dtype: float64

[28]: print(s[::2])

a 0.0
c 2.0
e 4.0
dtype: float64

4.4 2.3 Filtering

[4]: import numpy as np


import pandas as pd
s = pd.Series(np.arange(5.), index=['a','b','c','d','e'])
print(s)

a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64

[32]: s[s<1]

[32]: a 0.0
dtype: float64

[36]: s[s>2]

[36]: b 5.0
d 3.0
e 4.0
dtype: float64

[35]: s[s!=2]

[35]: a 0.0
b 5.0
d 3.0
e 4.0
dtype: float64

[38]: s[(s>2)&(s<5)]

[38]: d 3.0
e 4.0
dtype: float64

[33]:

[33]: b 5.0
c 2.0
dtype: float64

[7]:

b True
dtype: bool

[42]:

[42]: c 2.0
e 4.0
dtype: float64

4.5 2.4 Arithmetic Operations

[8]: import pandas as pd


series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series([6, 7, 8, 9, 10])

[3]: series3 = series1 + series2


print(series3)

0 7
1 9
2 11
3 13
4 15
dtype: int64

[4]: series3 = series1 - series2


print(series3)

0 -5
1 -5
2 -5
3 -5
4 -5
dtype: int64

[5]: series3 = series1 *series2


print(series3)

0 6
1 14
2 24
3 36
4 50
dtype: int64

[6]: series3 = series1 /series2


print(series3)

0 0.166667
1 0.285714
2 0.375000
3 0.444444
4 0.500000
dtype: float64

[9]: series3 = series1 %series2


print(series3)

0 1
1 2
2 3
3 4
4 5
dtype: int64

4.6 2.5 Ranking

[10]: import pandas as pd


s=pd.Series([121,211,153,214,115,116,237,118,219,120])
s.rank(ascending=True)

[10]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64

[49]: s.rank(ascending=False)

[49]: 0 6.0
1 4.0
2 5.0
3 3.0
4 10.0
5 9.0
6 1.0
7 8.0
8 2.0
9 7.0
dtype: float64

[11]:

[11]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64

[12]:

[12]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64

[50]:

[50]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64

4.7 2.6 Sorting

[52]: import pandas as pd


sr = pd.Series([19.5, 16.8, 22.78, 20.124, 18.1002])
print(sr)

0 19.5000
1 16.8000
2 22.7800
3 20.1240
4 18.1002
dtype: float64

[8]: sr.sort_values(ascending = False)

[8]: 2 22.7800
3 20.1240
0 19.5000
4 18.1002
1 16.8000
dtype: float64

[53]: sr.sort_values(ascending = True)

[53]: 1 16.8000
4 18.1002
0 19.5000
3 20.1240
2 22.7800
dtype: float64

[55]: sr.sort_index()

[55]: 0 19.5000
1 16.8000
2 22.7800
3 20.1240
4 18.1002
dtype: float64

[58]: print(sr.sort_values())

1 16.8000
4 18.1002
0 19.5000
3 20.1240
2 22.7800
dtype: float64

4.8 2.7 Checking null values

[40]: s=pd.Series({'ohio':35000,'teyas':71000,'oregon':16000,'utah':5000})
print(s)
states=['california','ohio','Texas','oregon']
x=pd.Series(s,index=states)
print(x)

ohio 35000
teyas 71000
oregon 16000
utah 5000
dtype: int64
california NaN
ohio 35000.0
Texas NaN
oregon 16000.0
dtype: float64

[42]: x.isnull()

[42]: california True
ohio False
Texas True
oregon False
dtype: bool

[44]: x.notnull()

[44]: california False
ohio True
Texas False
oregon True
dtype: bool

4.9 2.8 Concatenation


[19]: series1 = pd.Series([1, 2, 3])
series2 = pd.Series(['A', 'B', 'C'])

[65]: print(pd.concat([series1, series2]))

0 1
1 2
2 3
0 A
1 B
2 C
dtype: object

[66]: df = pd.concat([series1, series2], axis=1)
print(df)

0 1
0 1 A
1 2 B
2 3 C

[67]: s3 = pd.concat([series1, series2], axis=0)
print(s3)

0 1
1 2
2 3
0 A
1 B
2 C
dtype: object

[21]: print(pd.concat([series1, series2], ignore_index=True))

0 1
1 2
2 3
3 A
4 B
5 C
dtype: object

[22]: print(series1.append(series2))

0 1
1 2
2 3
0 A
1 B
2 C
dtype: object

[69]: print(pd.concat([series1, series2], keys=['series1', 'series2']))

series1 0 1
1 2
2 3
series2 0 A
1 B
2 C
dtype: object

4.10 3. Creating DataFrames from List and Dictionary


4.11 3.1 From List
[16]: data = [1, 2, 3, 4, 5]

df = pd.DataFrame(data, columns=['Numbers'])
print(df)

Numbers
0 1
1 2
2 3
3 4
4 5

[70]: import pandas as pd
nme = ["aparna", "pankaj", "sudhir", "Geeku"]
deg = ["MBA", "BCA", "M.Tech", "MBA"]
scr = [90, 40, 80, 98]
dict = {'name': nme, 'degree': deg, 'score': scr}
df = pd.DataFrame(dict)
print(df)

name degree score


0 aparna MBA 90
1 pankaj BCA 40
2 sudhir M.Tech 80
3 Geeku MBA 98

[38]: import pandas as pd


data = [['G', 10], ['h', 15], ['i', 20]]

df = pd.DataFrame(data, columns = ['Name', 'Age'])
print(df)

Name Age
0 G 10
1 h 15
2 i 20

4.12 3.2 From Dictionary

[39]: df = pd.DataFrame({'a': [4, 5, 6], 'b': [7, 8, 9], 'c': [10, 11, 12]}, index=[1, 2, 3])
print(df)

a b c
1 4 7 10
2 5 8 11
3 6 9 12

[13]: df = pd.DataFrame({'state':['AP','AP','AP','TS','TS','TS'],'year':
[2000,2001,2002,2000,2001,2002],'pop':[1.5,1.7,3.6,2.4,2.9,3.2]})
print(df)

state year pop


0 AP 2000 1.5
1 AP 2001 1.7
2 AP 2002 3.6
3 TS 2000 2.4
4 TS 2001 2.9
5 TS 2002 3.2

[14] :

a b
n v
d 1 4 7
2 5 8
e 2 6 9

[71]: df = pd.DataFrame(np.arange(9.).reshape((3,3)), index=['a','c','d'], columns=['ap','ts','tn'])
df.reindex(['a','b','c','d'])

[71]: ap ts tn
a 0.0 1.0 2.0
b NaN NaN NaN
c 3.0 4.0 5.0
d 6.0 7.0 8.0

4.13 4. Import various file formats to pandas DataFrames and perform the following
4.14 4.1 Importing file

[10]: import pandas as pd


data=pd.read_csv('bird.csv')
data

[10]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
0 SW
1 SW
2 SW
3 SW

4 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO

[420 rows x 12 columns]

4.15 4.2 display top and bottom five rows

[15]: data.head()

[15] : id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw type
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84 SW
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01 SW
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34 SW
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41 SW
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13 SW

[16]: data.tail()

[16]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
415 SO
416 SO
417 SO
418 SO
419 SO

4.16 4.3 Get shape,data type,null values,index and column details

[17] : data.shape

[17]: (420, 12)

[18]: data.dtypes
[18] : id int64
huml float64
humw float64
ulnal float64
ulnaw float64
feml float64
femw float64
tibl float64
tibw float64
tarl float64
tarw float64
type object
dtype: object

[19]: data.isnull().sum()

[19]: id 0
huml 1
humw 1
ulnal 3
ulnaw 2
feml 2
femw 1
tibl 2
tibw 1
tarl 1
tarw 1
type 0
dtype: int64

[20] : data.columns

[20]: Index(['id', 'huml', 'humw', 'ulnal', 'ulnaw', 'feml', 'femw', 'tibl', 'tibw',
'tarl', 'tarw', 'type'],
dtype='object')

[21]: data.index

[21]: RangeIndex(start=0, stop=420, step=1)

4.17 4.4 Select/Delete the records rows/columns based on conditions

[24]:

[24]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01

2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
0 SW
1 SW
2 SW
3 SW
4 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO

[419 rows x 12 columns]

[25]:

[25]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
5 5 61.92 4.78 50.46 3.47 49.52 4.41 56.95 2.73 29.07 2.83
6 6 79.73 5.94 67.39 4.50 42.07 3.41 71.26 3.56 37.22 3.64
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
1 SW
2 SW
4 SW
5 SW
6 SW
.. …

415 SO
416 SO
417 SO
418 SO
419 SO

[418 rows x 12 columns]

[27]: data[data['huml'].isnull()]

[27] : id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw type
342 342 NaN NaN NaN NaN 32.54 2.65 55.06 2.81 38.94 2.25 SO

[28]: data.loc[6,'ulnal']

[28]: 67.39

[29]: data.loc[11:15, ['huml','humw']]

[29] : huml humw


11 186.00 9.83
12 172.00 8.44
13 148.91 6.78
14 149.19 6.98
15 140.59 6.59

4.18 4.5 Sorting and Ranking operations in DataFrame

[30] : data

[30]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
0 SW
1 SW
2 SW

3 SW
4 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO

[420 rows x 12 columns]

[31]: data.sort_index(ascending=False)

[31]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
.. … … … … … … … … … … …
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84

type
419 SO
418 SO
417 SO
416 SO
415 SO
.. …
4 SW
3 SW
2 SW
1 SW
0 SW

[420 rows x 12 columns]

[32]: data.sort_values(by='ulnaw').head(6)

[32] : id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
369 369 13.48 1.27 16.00 1.00 12.67 1.10 23.12 0.88 16.34 0.89
413 413 12.95 1.16 14.09 1.03 13.03 1.03 22.13 0.96 15.19 1.02
395 395 15.62 1.28 18.52 1.06 15.75 1.17 28.63 1.03 21.39 0.88

367 367 13.31 1.17 16.47 1.06 12.32 0.93 22.47 0.95 15.97 0.75
414 414 13.63 1.16 15.22 1.06 13.75 0.99 23.13 0.96 15.62 1.01
376 376 13.52 1.28 17.88 1.07 15.10 1.05 25.14 1.23 17.81 0.69

type
369 SO
413 SO
395 SO
367 SO
414 SO
376 SO

[33]: data.sort_values(by=['ulnaw','tarl']).head(6)

[33]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
369 369 13.48 1.27 16.00 1.00 12.67 1.10 23.12 0.88 16.34 0.89
413 413 12.95 1.16 14.09 1.03 13.03 1.03 22.13 0.96 15.19 1.02
414 414 13.63 1.16 15.22 1.06 13.75 0.99 23.13 0.96 15.62 1.01
367 367 13.31 1.17 16.47 1.06 12.32 0.93 22.47 0.95 15.97 0.75
395 395 15.62 1.28 18.52 1.06 15.75 1.17 28.63 1.03 21.39 0.88
376 376 13.52 1.28 17.88 1.07 15.10 1.05 25.14 1.23 17.81 0.69

type
369 SO
413 SO
414 SO
367 SO
395 SO
376 SO

[34]: data.rank().head(10)

[34]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 1.0 289.0 344.0 275.0 325.5 289.0 295.0 1.0 302.0 272.0 328.0
1 2.0 308.0 343.0 284.0 343.0 312.0 320.0 308.0 327.5 285.0 333.0
2 3.0 286.0 336.0 268.0 334.0 295.0 303.5 292.0 303.5 271.0 305.5
3 4.0 284.0 308.0 255.0 313.5 279.0 288.0 270.0 272.5 247.0 310.5
4 5.0 248.0 281.0 227.5 258.0 224.0 225.5 231.0 250.0 211.0 294.0
5 6.0 246.0 275.0 223.0 242.0 326.0 322.0 234.0 234.0 181.0 268.5
6 7.0 285.0 321.0 262.0 304.0 292.0 282.5 279.0 280.5 259.0 320.0
7 8.0 304.0 306.0 278.0 306.0 300.0 299.0 296.0 295.5 266.0 324.0
8 9.0 362.0 370.0 354.0 362.0 365.0 356.5 363.5 359.0 352.0 346.0
9 10.0 387.0 399.0 381.5 383.0 382.0 398.0 382.0 397.0 392.0 377.0

type
0 274.5
1 274.5

2 274.5
3 274.5
4 274.5
5 274.5
6 274.5
7 274.5
8 274.5
9 274.5

[35]: data.rank().head(2)

[35]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 1.0 289.0 344.0 275.0 325.5 289.0 295.0 1.0 302.0 272.0 328.0
1 2.0 308.0 343.0 284.0 343.0 312.0 320.0 308.0 327.5 285.0 333.0

type
0 274.5
1 274.5

[15]:

[15]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket \
0 891.0 617.0 246.0 783.0 289.0 497.0 179.0 552.5 220.0
1 890.0 171.5 783.5 701.0 734.5 183.0 179.0 552.5 112.0
2 889.0 171.5 246.0 538.0 734.5 404.5 587.5 552.5 17.0
3 888.0 171.5 783.5 619.0 734.5 226.5 179.0 552.5 824.5
4 887.0 617.0 246.0 876.0 289.0 226.5 587.5 552.5 283.0

Fare Cabin Embarked


0 815.0 NaN 322.5
1 103.0 94.0 805.5
2 659.5 NaN 322.5
3 144.0 134.5 322.5
4 628.0 NaN 322.5

4.18.1 4.6 Statistical operations

[23]: import pandas as pd


data=pd.read_csv('gym-track.csv')
data

[23]: Age Gender Weight (kg) Height (m) Max_BPM Avg_BPM Resting_BPM \
0 56 Male 88.3 1.71 180 157 60
1 46 Female 74.9 1.53 179 151 66
2 32 Female 68.1 1.66 167 122 54
3 25 Male 53.2 1.70 190 164 56
4 38 Male 46.1 1.79 188 158 68

.. … … … … … … …
968 24 Male 87.1 1.74 187 158 67
969 25 Male 66.6 1.61 184 166 56
970 59 Female 60.4 1.76 194 120 53
971 32 Male 126.4 1.83 198 146 62
972 46 Male 88.7 1.63 166 146 66

Session_Duration (hours) Calories_Burned Workout_Type Fat_Percentage \


0 1.69 1313.0 Yoga 12.6
1 1.30 883.0 HIIT 33.9
2 1.11 677.0 Cardio 33.4
3 0.59 532.0 Strength 28.8
4 0.64 556.0 Strength 29.2
.. … … … …
968 1.57 1364.0 Strength 10.0
969 1.38 1260.0 Strength 25.0
970 1.72 929.0 Cardio 18.8
971 1.10 883.0 HIIT 28.2
972 0.75 542.0 Strength 28.8

Water_Intake (liters) Workout_Frequency (days/week) Experience_Level \


0 3.5 4 3
1 2.1 4 2
2 2.3 4 2
3 2.1 3 1
4 2.8 3 1
.. … … …
968 3.5 4 3
969 3.0 2 1
970 2.7 5 3
971 2.1 3 2
972 3.5 2 1

BMI
0 30.20
1 32.00
2 24.71
3 18.41
4 14.39
.. …
968 28.77
969 25.69
970 19.50
971 37.74
972 33.38

[973 rows x 15 columns]

[25]: data['Age'].mean()

[25]: 38.68345323741007

[28]: data['Age'].median()

[28]: 40.0

[29]: data['Age'].std()

[29]: 12.180927866987108

[30]: data['Age'].sum()

[30]: 37639

[31]: data['Age'].var()

[31]: 148.37500370074312

4.18.2 4.7 Count and Uniqueness of given Categorical values

[35]: data.count()

[35]: Age 973


Gender 973
Weight (kg) 973
Height (m) 973
Max_BPM 973
Avg_BPM 973
Resting_BPM 973
Session_Duration (hours) 973
Calories_Burned 973
Workout_Type 973
Fat_Percentage 973
Water_Intake (liters) 973
Workout_Frequency (days/week) 973
Experience_Level 973
BMI 973
dtype: int64

Data cleaning and preparation

a) Handling missing data by detecting, dropping and replacing/filling
missing values

import pandas as pd
import numpy as np

Student performance
Import any csv file to pandas data frame and perform the following
# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('student.csv')
df

Hours_Studied Attendance Parental_Involvement


Access_to_Resources \
0 23 84 Low
High
1 19 64 Low
Medium
2 24 98 Medium
Medium
3 29 89 Low
Medium
4 19 92 Medium
Medium
... ... ... ... .
..
6602 25 69 High
Medium
6603 23 76 High
Medium
6604 20 90 Medium
Low
6605 10 86 High
High
6606 15 67 Medium
Low

Extracurricular_Activities Sleep_Hours Previous_Scores \


0 No 7 73
1 No 8 59
2 Yes 7 91
3 Yes 8 98
4 Yes 6 65
... ... ... ...
6602 No 7 76

6603 No 8 81
6604 Yes 6 65
6605 Yes 6 91
6606 Yes 9 94

Motivation_Level Internet_Access Tutoring_Sessions Family_Income


\
0 Low Yes 0 Low

1 Low Yes 2 Medium

2 Medium Yes 2 Medium

3 Medium Yes 1 Medium

4 Medium Yes 3 Medium

... ... ... ... ...

6602 Medium Yes 1 High

6603 Medium Yes 3 Low

6604 Low Yes 3 Low

6605 High Yes 2 Low

6606 Medium Yes 0 Medium

Teacher_Quality School_Type Peer_Influence Physical_Activity \


0 Medium Public Positive 3
1 Medium Public Negative 4
2 Medium Public Neutral 4
3 Medium Public Negative 4
4 High Public Neutral 4
... ... ... ... ...
6602 Medium Public Positive 2
6603 High Public Positive 2
6604 Medium Public Negative 2
6605 Medium Private Positive 3
6606 Medium Public Positive 4

Learning_Disabilities Parental_Education_Level Distance_from_Home


\
0 No High School Near

1 No College Moderate

2 No Postgraduate Near

3 No High School Moderate

4 No College Near

... ... ... ...

6602 No High School Near

6603 No High School Near

6604 No Postgraduate Near

6605 No High School Far

6606 No Postgraduate Near

Gender Exam_Score
0 Male 67
1 Female 61
2 Male 74
3 Male 71
4 Female 70
... ... ...
6602 Female 68
6603 Female 69
6604 Female 68
6605 Female 68
6606 Male 64

[6607 rows x 20 columns]

# Display the first few rows of the DataFrame to understand the data
print("Original DataFrame:")
print(df.head())

Original DataFrame:
Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores

Motivation_Level \
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70

# 1. Detect missing data


missing_data = df.isnull()
print("\nMissing Data:")
print(missing_data.head(10))

Missing Data:
Hours_Studied Attendance Parental_Involvement
Access_to_Resources \
0 False False False
False
1 False False False

False
2 False False False
False
3 False False False
False
4 False False False
False
5 False False False
False
6 False False False
False
7 False False False
False
8 False False False
False
9 False False False
False

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 False False False
False
1 False False False
False
2 False False False
False
3 False False False
False
4 False False False
False
5 False False False
False
6 False False False
False
7 False False False
False
8 False False False
False
9 False False False
False

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality


\
0 False False False False

1 False False False False

2 False False False False

3 False False False False

4 False False False False

5 False False False False

6 False False False False

7 False False False False

8 False False False False

9 False False False False

School_Type Peer_Influence Physical_Activity


Learning_Disabilities \
0 False False False
False
1 False False False
False
2 False False False
False
3 False False False
False
4 False False False
False
5 False False False
False
6 False False False
False
7 False False False
False
8 False False False
False
9 False False False
False

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 False False False False
1 False False False False
2 False False False False
3 False False False False
4 False False False False
5 False False False False
6 False False False False
7 False False False False
8 False False False False
9 False False False False

# No of null values
n=df.isnull().sum()
n

Hours_Studied 0
Attendance 0
Parental_Involvement 0
Access_to_Resources 0
Extracurricular_Activities 0
Sleep_Hours 0
Previous_Scores 0
Motivation_Level 0
Internet_Access 0
Tutoring_Sessions 0
Family_Income 0
Teacher_Quality 78
School_Type 0
Peer_Influence 0
Physical_Activity 0
Learning_Disabilities 0
Parental_Education_Level 90
Distance_from_Home 67
Gender 0
Exam_Score 0
dtype: int64

# 2. Drop rows with missing values


df_dropna = df.dropna()
print("\nDataFrame after dropping rows with missing values:")
print(df_dropna.head(10))

DataFrame after dropping rows with missing values:


Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

5 19 88 Medium Medium

6 29 84 Medium Low

7 25 78 Low High

8 17 94 Medium High

9 23 98 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium
5 Yes 8 89
Medium
6 Yes 7 68
Low
7 Yes 6 50
Medium
8 No 6 80
High
9 Yes 8 71
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High
5 Yes 3 Medium Medium
6 Yes 1 Low Medium
7 Yes 1 High High
8 Yes 0 Medium Low
9 Yes 0 High High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

5 Public Positive 3 No

6 Private Neutral 2 No

7 Public Negative 2 No

8 Private Neutral 1 No

9 Public Positive 5 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70
5 Postgraduate Near Male 71
6 High School Moderate Male 67
7 High School Far Male 66
8 College Near Male 69
9 High School Moderate Male 72

# 3. Fill missing values with a specific value (e.g., mean, median, or
# custom value)
# Fill missing values in the 'Attendance' column with the mean value of
# that column
mean_chas = df['Attendance'].mean()
df_fillna = df.fillna({'Attendance': mean_chas})
print("\nDataFrame after filling missing values:")
print(df_fillna.head(10))

DataFrame after filling missing values:


Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

5 19 88 Medium Medium

6 29 84 Medium Low

7 25 78 Low High

8 17 94 Medium High

9 23 98 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium
5 Yes 8 89
Medium
6 Yes 7 68
Low
7 Yes 6 50
Medium
8 No 6 80
High
9 Yes 8 71
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High
5 Yes 3 Medium Medium
6 Yes 1 Low Medium
7 Yes 1 High High
8 Yes 0 Medium Low
9 Yes 0 High High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

5 Public Positive 3 No

6 Private Neutral 2 No

7 Public Negative 2 No

8 Private Neutral 1 No

9 Public Positive 5 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70
5 Postgraduate Near Male 71
6 High School Moderate Male 67
7 High School Far Male 66
8 College Near Male 69
9 High School Moderate Male 72

# 4. Replace missing values conditionally


# For example, replace missing values in 'Hours_Studied' with 'Unknown'
df_replace = df.fillna({'Hours_Studied': 'Unknown'})
print("\nDataFrame after replacing missing values:")
print(df_replace.head())

DataFrame after replacing missing values:


Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \

0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70

b) Transform data using apply() and map() methods


# Load the CSV file into a Pandas DataFrame
# Replace 'data.csv' with the actual file path if needed
df=pd.read_csv('student.csv')

# Display the first few rows of the DataFrame to understand the data
print("Original DataFrame:")
print(df.head())

Original DataFrame:
Hours_Studied Attendance Parental_Involvement Access_to_Resources

\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70

# Assume 'Previous_Scores' is the column we want to transform

# 1. Transform using apply() method


# Square the 'Previous_Scores' values and store the result in 'Sleep_Hours'
df['Sleep_Hours'] = df['Previous_Scores'].apply(lambda x: x ** 2)
df

Hours_Studied Attendance Parental_Involvement


Access_to_Resources \
0 23 84 Low
High
1 19 64 Low
Medium
2 24 98 Medium
Medium
3 29 89 Low
Medium
4 19 92 Medium
Medium
... ... ... ... .
..
6602 25 69 High
Medium
6603 23 76 High
Medium
6604 20 90 Medium
Low
6605 10 86 High
High
6606 15 67 Medium
Low

Extracurricular_Activities Sleep_Hours Previous_Scores \


0 No 5329 73
1 No 3481 59
2 Yes 8281 91
3 Yes 9604 98
4 Yes 4225 65
... ... ... ...
6602 No 5776 76
6603 No 6561 81
6604 Yes 4225 65
6605 Yes 8281 91
6606 Yes 8836 94

Motivation_Level Internet_Access Tutoring_Sessions Family_Income


\
0 Low Yes 0 Low

1 Low Yes 2 Medium

2 Medium Yes 2 Medium

3 Medium Yes 1 Medium

4 Medium Yes 3 Medium

... ... ... ... ...

6602 Medium Yes 1 High

6603 Medium Yes 3 Low

6604 Low Yes 3 Low

6605 High Yes 2 Low

6606 Medium Yes 0 Medium

Teacher_Quality School_Type Peer_Influence Physical_Activity \


0 Medium Public Positive 3
1 Medium Public Negative 4
2 Medium Public Neutral 4
3 Medium Public Negative 4
4 High Public Neutral 4
... ... ... ... ...
6602 Medium Public Positive 2
6603 High Public Positive 2
6604 Medium Public Negative 2
6605 Medium Private Positive 3
6606 Medium Public Positive 4

Learning_Disabilities Parental_Education_Level Distance_from_Home


\
0 No High School Near

1 No College Moderate

2 No Postgraduate Near

3 No High School Moderate

4 No College Near

... ... ... ...

6602 No High School Near

6603 No High School Near

6604 No Postgraduate Near

6605 No High School Far

6606 No Postgraduate Near

Gender Exam_Score
0 Male 67
1 Female 61
2 Male 74
3 Male 71
4 Female 70
... ... ...
6602 Female 68
6603 Female 69
6604 Female 68
6605 Female 68
6606 Male 64

[6607 rows x 20 columns]

# 2. Transform using map() method


# Map 'Previous_Scores' through a category dictionary; scores without a
# matching key become NaN, which is why the output below shows NaN
Age_category_map = {0: 'Low', 1: 'Medium', 2: 'High'}
df['Sleep_Hours'] = df['Previous_Scores'].map(Age_category_map)

# Display the transformed DataFrame


print("\nDataFrame after transformation:")
print(df.head())

DataFrame after transformation:


Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 No NaN 73
Low
1 No NaN 59

Low
2 Yes NaN 91
Medium
3 Yes NaN 98
Medium
4 Yes NaN 65
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70

c) Detect and filter outliers


# Load the CSV file into a Pandas DataFrame
# Replace 'data.csv' with the actual file path if needed
df = pd.read_csv('titanic.csv')
df
PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
.. ... ... ...

886 887 0 2
887 888 1 1
888 889 0 3
889 890 1 1
890 891 0 3

Name Sex Age


SibSp \
0 Braund, Mr. Owen Harris male 22.0
1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0
1
2 Heikkinen, Miss. Laina female 26.0
0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0
1
4 Allen, Mr. William Henry male 35.0
0
.. ... ... ...
...
886 Montvila, Rev. Juozas male 27.0
0
887 Graham, Miss. Margaret Edith female 19.0
0
888 Johnston, Miss. Catherine Helen "Carrie" female NaN
1
889 Behr, Mr. Karl Howell male 26.0
0
890 Dooley, Mr. Patrick male 32.0
0

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
.. ... ... ... ... ...
886 0 211536 13.0000 NaN S
887 0 112053 30.0000 B42 S
888 2 W./C. 6607 23.4500 NaN S
889 0 111369 30.0000 C148 C
890 0 370376 7.7500 NaN Q

[891 rows x 12 columns]

# Display the first few rows of the DataFrame to understand the data
print("Original DataFrame:")
print(df.head())

Original DataFrame:
PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

Name Sex Age


SibSp \
0 Braund, Mr. Owen Harris male 22.0
1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0
1
2 Heikkinen, Miss. Laina female 26.0
0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0
1
4 Allen, Mr. William Henry male 35.0
0

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S

# Select the column to analyze for outliers (replace 'Value' with the
actual column name)
column_name = 'Fare'

# Calculate the z-scores for the selected column


z_scores = np.abs((df[column_name] - df[column_name].mean()) /
df[column_name].std())
z_scores.head(10)
0 0.502163
1 0.786404
2 0.488580
3 0.420494
4 0.486064
5 0.477848
6 0.395591
7 0.223957

8 0.424018
9 0.042931
Name: Fare, dtype: float64

# Define a threshold for outliers (e.g., z-score greater than 3)


z_score_threshold = 3

# Filter the DataFrame to keep rows without outliers


filtered_df = df[z_scores <= z_score_threshold]

# Display the DataFrame after filtering outliers


print("\nDataFrame after filtering outliers:")
print(filtered_df.head())

DataFrame after filtering outliers:


PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

Name Sex Age


SibSp \
0 Braund, Mr. Owen Harris male 22.0
1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0
1
2 Heikkinen, Miss. Laina female 26.0
0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0
1
4 Allen, Mr. William Henry male 35.0
0

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S

d) Perform vectorized string operations on pandas Series
# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('titanic.csv')
df

PassengerId Survived Pclass \


0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
.. ... ... ...
886 887 0 2
887 888 1 1
888 889 0 3
889 890 1 1
890 891 0 3

Name Sex Age


SibSp \
0 Braund, Mr. Owen Harris male 22.0
1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0
1
2 Heikkinen, Miss. Laina female 26.0
0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0
1
4 Allen, Mr. William Henry male 35.0
0
.. ... ... ...
...
886 Montvila, Rev. Juozas male 27.0
0
887 Graham, Miss. Margaret Edith female 19.0
0
888 Johnston, Miss. Catherine Helen "Carrie" female NaN
1
889 Behr, Mr. Karl Howell male 26.0
0
890 Dooley, Mr. Patrick male 32.0
0

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S

.. ... ... ... ... ...
886 0 211536 13.0000 NaN S
887 0 112053 30.0000 B42 S
888 2 W./C. 6607 23.4500 NaN S
889 0 111369 30.0000 C148 C
890 0 370376 7.7500 NaN Q

[891 rows x 12 columns]

# Assuming 'Name' is the column containing strings

# Convert all names to uppercase


df['Name']= df['Sex'].str.upper()
df

PassengerId Survived Pclass Name Sex Age SibSp Parch


\
0 1 0 3 MALE male 22.0 1 0

1 2 1 1 FEMALE female 38.0 1 0

2 3 1 3 FEMALE female 26.0 0 0

3 4 1 1 FEMALE female 35.0 1 0

4 5 0 3 MALE male 35.0 0 0

.. ... ... ... ... ... ... ... ...

886 887 0 2 MALE male 27.0 0 0

887 888 1 1 FEMALE female 19.0 0 0

888 889 0 3 FEMALE female NaN 1 2

889 890 1 1 MALE male 26.0 0 0

890 891 0 3 MALE male 32.0 0 0

Ticket Fare Cabin Embarked


0 A/5 21171 7.2500 NaN S
1 PC 17599 71.2833 C85 C
2 STON/O2. 3101282 7.9250 NaN S
3 113803 53.1000 C123 S
4 373450 8.0500 NaN S
.. ... ... ... ...
886 211536 13.0000 NaN S
887 112053 30.0000 B42 S
888 W./C. 6607 23.4500 NaN S
889 111369 30.0000 C148 C

890 370376 7.7500 NaN Q

[891 rows x 12 columns]

# Calculate the length of each name


df['Name'] = df['Sex'].str.len()
df

PassengerId Survived Pclass Name Sex Age SibSp


Parch \
0 1 0 3 4 male 22.0 1 0

1 2 1 1 6 female 38.0 1 0

2 3 1 3 6 female 26.0 0 0

3 4 1 1 6 female 35.0 1 0

4 5 0 3 4 male 35.0 0 0

.. ... ... ... ... ... ... ... ...

886 887 0 2 4 male 27.0 0 0

887 888 1 1 6 female 19.0 0 0

888 889 0 3 6 female NaN 1 2

889 890 1 1 4 male 26.0 0 0

890 891 0 3 4 male 32.0 0 0

Ticket Fare Cabin Embarked


0 A/5 21171 7.2500 NaN S
1 PC 17599 71.2833 C85 C
2 STON/O2. 3101282 7.9250 NaN S
3 113803 53.1000 C123 S
4 373450 8.0500 NaN S
.. ... ... ... ...
886 211536 13.0000 NaN S
887 112053 30.0000 B42 S
888 W./C. 6607 23.4500 NaN S
889 111369 30.0000 C148 C
890 370376 7.7500 NaN Q

[891 rows x 12 columns]

# Split the names based on a delimiter (e.g., space) and create a new
column for the first part of the name

df['Name'] = df['Sex'].str.split(' ').str[0]
df
PassengerId Survived Pclass Name Sex Age SibSp Parch
\
0 1 0 3 male male 22.0 1 0

1 2 1 1 female female 38.0 1 0

2 3 1 3 female female 26.0 0 0

3 4 1 1 female female 35.0 1 0

4 5 0 3 male male 35.0 0 0

.. ... ... ... ... ... ... ... ...

886 887 0 2 male male 27.0 0 0

887 888 1 1 female female 19.0 0 0

888 889 0 3 female female NaN 1 2

889 890 1 1 male male 26.0 0 0

890 891 0 3 male male 32.0 0 0

Ticket Fare Cabin Embarked


0 A/5 21171 7.2500 NaN S
1 PC 17599 71.2833 C85 C
2 STON/O2. 3101282 7.9250 NaN S
3 113803 53.1000 C123 S
4 373450 8.0500 NaN S
.. ... ... ... ...
886 211536 13.0000 NaN S
887 112053 30.0000 B42 S
888 W./C. 6607 23.4500 NaN S
889 111369 30.0000 C148 C
890 370376 7.7500 NaN Q

[891 rows x 12 columns]

# Display the transformed DataFrame


print("DataFrame after performing vectorized string operations:")
print(df.head())

DataFrame after performing vectorized string operations:


PassengerId Survived Pclass Name Sex Age SibSp
Parch \
0 1 0 3 male male 22.0 1 0

1 2 1 1 female female 38.0 1 0

2 3 1 3 female female 26.0 0 0

3 4 1 1 female female 35.0 1 0

4 5 0 3 male male 35.0 0 0

Ticket Fare Cabin Embarked


0 A/5 21171 7.2500 NaN S
1 PC 17599 71.2833 C85 C
2 STON/O2. 3101282 7.9250 NaN S
3 113803 53.1000 C123 S
4 373450 8.0500 NaN S

Data Wrangling

0.0.1 1. Concatenate / Join / Merge/ Reshape DataFrames.


Used to concatenate two or more DataFrame objects. By setting axis=0 it concatenates vertically
(rows), and by setting axis=1 it concatenates horizontally (columns).

[3]: import pandas as pd


# Create the first DataFrame (df1) with columns 'X' and 'Y' and two rows
df1 = pd.DataFrame({'X': ['X0', 'X1'],
'Y': ['Y0', 'Y1']})
# Create the second DataFrame (df2) with columns 'X' and 'Y' and two rows
df2 = pd.DataFrame({'X': ['X2', 'X3'],
'Y': ['Y2', 'Y3']})
# Concatenate df1 and df2 vertically (axis=0) to stack rows
# This combines the two DataFrames by adding the rows of df2 below the rows of df1

result = pd.concat([df1, df2], axis=0)


df1

[3]: X Y
0 X0 Y0
1 X1 Y1

[4] : df2

[4] : X Y
0 X2 Y2
1 X3 Y3

[5]: result

[5]: X Y
0 X0 Y0
1 X1 Y1
0 X2 Y2
1 X3 Y3

0.0.2 MERGE
Used to merge two data frames based on a key column, similar to SQL joins. Options include
how=’inner’, how=’outer’, how=’left’, and how=’right’ for different types of joins.

[8]: import pandas as pd


# Create DataFrame 1
df1 = pd.DataFrame({'key': ['x', 'y', 'z'], 'value1': [1, 2, 3]})
# Create DataFrame 2
df2 = pd.DataFrame({'key': ['y', 'z', 'a'], 'value2': [4, 5, 6]})
# Merge DataFrames on 'key' column using inner join
result = pd.merge(df1, df2, on='key', how='inner')
df1

[8] : key value1


0 x 1
1 y 2
2 z 3

[9] : df2

[9] : key value2


0 y 4
1 z 5
2 a 6

[10]: result

[10] : key value1 value2


0 y 2 4
1 z 3 5

[11] : import pandas as pd


# Create DataFrame 1
# Create DataFrame 1
df1 = pd.DataFrame({'key': ['x', 'y', 'z'], 'value1': [1, 2, 3]})
# Create DataFrame 2
df2 = pd.DataFrame({'key': ['y', 'z', 'a'], 'value2': [4, 5, 6]})
# Merge DataFrames on 'key' column using outer join
result = pd.merge(df1, df2, on="key", how='outer')
df1

[11] : key value1


0 x 1
1 y 2
2 z 3

[12] : df2

[12] : key value2
0 y 4
1 z 5
2 a 6

[13]: result

[13]: key value1 value2


0 x 1.0 NaN
1 y 2.0 4.0
2 z 3.0 5.0
3 a NaN 6.0

0.0.3 JOIN
A join is a way to combine data from two or more tables (or DataFrames) based on a common
column, known as the join key.

[18]: df1 = pd.DataFrame({"x": ["x0", "x1", "x2"], "y": ["y0", "y1", "y2"]},
index=["j0", "j1", "j2"]) # Create DataFrame 2
df2 = pd.DataFrame({"z": ["z0", "z2", "z3"], "a": ["a0", "a2", "a3"]},
index=["K0", "K2", "K3"])
# Print DataFrame 1
print(df1)
# Print DataFrame 2
print(df2)
# Join DataFrames 1 and 2 on index (default)
df3 = df1.join(df2)
print(df3)

x y
j0 x0 y0
j1 x1 y1
j2 x2 y2
z a
K0 z0 a0
K2 z2 a2
K3 z3 a3
x y z a
j0 x0 y0 NaN NaN
j1 x1 y1 NaN NaN
j2 x2 y2 NaN NaN

0.0.4 INNER JOIN


Returns rows with matching keys in both DataFrames.

[21]: #inner join
# Create DataFrame 1
df1 = pd.DataFrame({"x": ["x0", "x1", "x2"], "y": ["y0", "y1", "y2"]},
index=["j0", "j1", "j2"]) # Create DataFrame 2
df2 = pd.DataFrame({"x": ["x0", "x1", "x3"],"z": ["z0", "z2", "z3"],
"a": ["a0", "a2", "a3"]},
index=["K0", "K2", "K3"])
df4 = df1.merge(df2,on="x", how='inner')
print(df4)

x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2

0.0.5 FULL OUTER JOIN


Returns all rows from both DataFrames.
[22]:
df5 = df1.merge(df2,on="x", how='outer')
print(df5)

x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2
2 x2 y2 NaN NaN
3 x3 NaN z3 a3

0.0.6 LEFT OUTER JOIN


Returns all rows from the left DataFrame and matching rows from the right DataFrame.
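
A minimal sketch using the df1 and df2 defined above (this cell did not survive the export; how='left' keeps every row of df1 and fills unmatched right-side columns with NaN):

df6 = df1.merge(df2, on="x", how='left')
print(df6)

x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2
2 x2 y2 NaN NaN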

0.0.7 RIGHT OUTER JOIN


Returns all rows from the right DataFrame and matching rows from the left DataFrame.

[25]:
df7 = df1.merge(df2,on="x",how='right')
print(df7)

x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2
2 x3 NaN z3 a3

0.0.8 RESHAPE
Reshaping functions like pivot and melt are used to transform the layout of data frames.
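
The cells below demonstrate concat and unstack; for pivot and melt themselves, a minimal sketch:

import pandas as pd
df = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2], 'y': [3, 4]})
long = pd.melt(df, id_vars='key')   # wide -> long: columns key, variable, value
print(long)
wide = long.pivot(index='key', columns='variable', values='value')   # long -> wide
print(wide)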

[30]: import pandas as pd
# Create Series 1
s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
# Create Series 2
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
# Concatenate Series into DataFrame
df = pd.concat([s1, s2], keys=['one', 'two'])
print(df)

one a 0
b 1
c 2
d 3
two c 4
d 5
e 6
dtype: int64

[31]: print(df.unstack())

a b c d e
one 0.0 1.0 2.0 3.0 NaN
two NaN NaN 4.0 5.0 6.0

