FDS Lab Manual (1-3) PDF
Software required
Spyder IDE.
What is Spyder?
Features of Spyder
Syntax highlighting
Availability of breakpoints
Run configuration
Automatic colon insertion after if, while, etc.
Support for all IPython commands.
Inline display for graphics produced using Matplotlib.
Also provides features such as help, file explorer, find in files and so on.
Step 1:
Go to the Anaconda website https://www.anaconda.com.
Step 3: Choose the version that is suitable for your OS and click on download.
Step 5:
Launch Spyder from the Anaconda Navigator.
Aim:
To implement the various operations on arrays, vectors and matrices using NumPy library
in Python.
Theory :
NumPy Library
NumPy is a computational library that speeds up vector algebra operations involving
vectors (distance between points, cosine similarity) and matrices.
Specifically, it helps in constructing powerful n-dimensional arrays that work smoothly
with distributed and GPU systems.
It is a very handy library and is used extensively in the domains of Data Analytics and
Machine Learning.
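The two vector operations named above (distance between points and cosine similarity) can be sketched as follows. This is only a minimal illustration; the vectors p and q are made-up values and do not come from the manual.

```python
import numpy as np

# Two example points/vectors (illustrative values, not from the manual)
p = np.array([1.0, 2.0, 2.0])
q = np.array([4.0, 6.0, 2.0])

# Euclidean distance between the two points
distance = np.linalg.norm(p - q)

# Cosine similarity: dot product divided by the product of the vector norms
cosine = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

print("Distance:", distance)
print("Cosine similarity:", cosine)
```

Both results are computed on whole arrays at once, without an explicit Python loop over the elements.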
SRM TRP ENGINEERING COLLEGE TIRUCHIRAPPALLI – 621 105
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Create an array from a regular Python list or tuple using the array function. The type of
the resulting array is deduced from the type of the elements in the sequence.
Often, the elements of an array are originally unknown, but its size is known. Hence,
NumPy offers several functions to create arrays with initial placeholder content. These
minimize the necessity of growing arrays, an expensive operation. For example: np.zeros,
np.ones, np.full, np.empty, etc.
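The array creation functions mentioned above can be tried with a short sketch like the one below. The shapes and fill values are arbitrary choices made for illustration.

```python
import numpy as np

# Arrays built from a Python list and a tuple; the dtype is deduced from the elements
from_list = np.array([1, 2, 3])           # integer dtype
from_tuple = np.array((1.5, 2.0, 3.5))    # float64 dtype

# Arrays of known size with placeholder content
zeros = np.zeros((2, 3))       # all elements 0.0
ones = np.ones((2, 3))         # all elements 1.0
sevens = np.full((2, 3), 7)    # all elements 7
scratch = np.empty((2, 3))     # uninitialized: contents are arbitrary until assigned

print(from_list.dtype, from_tuple.dtype)
print(zeros)
print(ones)
print(sevens)
print(scratch.shape)
```

Because np.empty does not initialize its contents, its values should be overwritten before they are read.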
Exercises
#1 - Matrix addition
Given two NumPy arrays of the same shape, produce a NumPy array in which every element is the
element-wise sum of the two arrays.
Code:
# Element-wise addition of two NumPy arrays
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 11, 12],
              [13, 14, 15]])
c = a + b
print(c)
Sample Output:
[[11 13 15]
 [17 19 21]]
#4 - Matrix multiplication
Given 2 NumPy arrays as matrices, output the result of multiplying the 2 matrices (as a NumPy array).
Code:
# Matrix multiplication
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = np.array([[2, 3, 4],
              [5, 6, 7],
              [8, 9, 10]])
c = a @ b
print(c)
Sample Output:
[[ 36  42  48]
 [ 81  96 111]
 [126 150 174]]
#5 - Matrix transpose
Print the transpose of a given matrix.
Code:
# Matrix transpose
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = a.T
print(b)
Sample Output:
[[1 4 7]
 [2 5 8]
 [3 6 9]]
Output
[  0   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34
  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70
  72  74  76  78  80  82  84  86  88  90  92  94  96  98 100]
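The code that produced the output above is not included in this manual. As an assumption only, an array of the even numbers from 0 to 100 could be generated with np.arange, as in the sketch below; the original exercise may have used a different approach.

```python
import numpy as np

# Assumed reconstruction: even numbers from 0 up to and including 100
# (the stop value 101 is exclusive, so 100 is the last element)
evens = np.arange(0, 101, 2)
print(evens)
```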
Code:
import numpy as np
a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])
# print all elements in sorted order (array is flattened first)
print(np.sort(a, axis=None))
# sort array row-wise
print(np.sort(a, axis=1))
# sort array column-wise
print(np.sort(a, axis=0))
Sample Output:
Array elements in sorted order:
[-1  0  1  2  3  4  4  5  6]
Row-wise sorted array:
[[ 1  2  4]
 [ 3  4  6]
 [-1  0  5]]
Column-wise sorted array:
[[ 0 -1  2]
 [ 1  4  5]
 [ 3  4  6]]
Result:
Thus the implementation of the basic features of NumPy arrays has been completed successfully.
Aim:
To implement the basic operations used for data analysis using Pandas in Python.
Theory:
Pandas
Pandas is a Python data analysis library that deals primarily with tabular data.
It forms a major data analysis toolbox that is widely used in domains such as Data Mining,
Data Warehousing, Machine Learning and general Data Science.
It is an open-source library released under a liberal BSD license.
It has two main data structures:
1. Series: Contains data related to a single variable (can be visualized as a vector) along with
indexing information.
2. DataFrame: Contains tabular data.
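The difference between the two structures can be seen in a small sketch. The labels and values below are invented for illustration.

```python
import pandas as pd

# Series: values of a single variable together with an index
marks = pd.Series([78, 85, 91], index=["a", "b", "c"], name="marks")

# DataFrame: tabular data, one column per variable
table = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "marks": [78, 85, 91],
})

print(marks)
print(table)
```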
Data Frames
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and
columns.
Features of DataFrame
Columns can be of different types
Size-mutable
Labeled axes (rows and columns)
Can perform arithmetic operations on rows and columns
A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
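A minimal sketch of the constructor and of the features listed above follows. The row labels, column names and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Build a DataFrame with explicit data, index, columns and dtype
df = pd.DataFrame(
    data=[[1, 2], [3, 4], [5, 6]],
    index=["r1", "r2", "r3"],     # row labels
    columns=["a", "b"],           # column labels
    dtype=float,
)

# Arithmetic on columns; assigning the result adds a new column (size-mutable)
df["total"] = df["a"] + df["b"]

# Arithmetic on rows, selected by label
row_sum = df.loc["r1"] + df.loc["r2"]

print(df)
print(row_sum)
```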
Exercises
Code:
# Deleting an existing row
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])
# Drop rows with label 0
df = df.drop(0)
print(df)
Sample Output:
   a  b
1  3  4
1  7  8
Result:
Thus the implementation of the basic features of Pandas has been completed successfully.
Ex no:4 DATE:
Aim:
To explore various commands for doing descriptive analytics on the iris dataset.
ALGORITHM:
INPUT:
Download iris.csv dataset from https://datahub.io/machine-learning/iris
PROGRAM:
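import pandas as pd
df = pd.read_csv("iris_csv.csv")
print(df.head())                  # first five rows
print(df.shape)                   # (rows, columns)
df.info()                         # column names, non-null counts and dtypes
print(df.describe())              # summary statistics of numeric columns
print(df.isnull().sum())          # number of missing values per column
print(df.value_counts("class"))   # number of samples per class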
OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepallength 150 non-null float64
1 sepalwidth 150 non-null float64
2 petallength 150 non-null float64
3 petalwidth 150 non-null float64
4 class 150 non-null object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
class
Iris-setosa 50
Iris-versicolor 50
Iris-virginica 50
dtype: int64
RESULT:
Thus various commands for doing descriptive analytics on the Iris dataset
have been explored and executed successfully.
EXP:5a DATE:
AIM:
ALGORITHM:
INPUT:
PROGRAM:
import pandas as pd
import statistics as st
df=pd.read_csv("diabetes_csv.csv")
print(df.shape)
print(df.info())
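The listing above stops after df.info(), while the output section also reports per-column statistics such as mean, median, skewness and kurtosis. The sketch below shows one way such statistics can be computed with pandas. It is an assumption about the approach, not the original program, and it uses a small made-up DataFrame named sample instead of the diabetes file.

```python
import pandas as pd

# Made-up numbers for illustration only (not the diabetes data)
sample = pd.DataFrame({
    "x": [1, 2, 2, 3, 7],
    "y": [2.0, 4.0, 4.0, 6.0, 14.0],
})

mean = sample.mean()
median = sample.median()
mode = sample.mode().iloc[0]    # first mode of each column
variance = sample.var()         # sample variance (ddof=1)
skewness = sample.skew()
kurtosis = sample.kurt()
summary = sample.describe()     # describe must be called with parentheses

print("MEAN:", mean, sep="\n")
print("MEDIAN:", median, sep="\n")
print("MODE:", mode, sep="\n")
print("VARIANCE:", variance, sep="\n")
print("SKEWNESS:", skewness, sep="\n")
print("KURTOSIS:", kurtosis, sep="\n")
print(summary)
```

Printing sample.describe without the parentheses shows the bound method and the full table instead of the summary statistics.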
OUTPUT:
(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 preg 768 non-null int64
1 plas 768 non-null int64
2 pres 768 non-null int64
3 skin 768 non-null int64
4 insu 768 non-null int64
5 mass 768 non-null float64
6 pedi 768 non-null float64
7 age 768 non-null int64
8 class 768 non-null object
dtypes: float64(2), int64(6), object(1)
memory usage: 54.1+ KB
None
MEAN:
preg 3.845052
plas 120.894531
pres 69.105469
skin 20.536458
insu 79.799479
mass 31.992578
pedi 0.471876
age 33.240885
dtype: float64
MEDIAN:
preg 3.0000
plas 117.0000
pres 72.0000
skin 23.0000
insu 30.5000
mass 32.0000
pedi 0.3725
age 29.0000
dtype: float64
VARIANCE:
preg 11.354056
plas 1022.248314
pres 374.647271
skin 254.473245
insu 13281.180078
mass 62.159984
pedi 0.109779
age 138.303046
dtype: float64
SKEWNESS:
preg 0.901674
plas 0.173754
pres -1.843608
skin 0.109372
insu 2.272251
mass -0.428982
pedi 1.919911
age 1.129597
dtype: float64
KURTOSIS:
preg 0.159220
plas 0.640780
pres 5.180157
skin -0.520072
insu 7.214260
mass 3.290443
pedi 5.594954
age 0.643159
dtype: float64
<bound method NDFrame.describe of preg plas pres skin insu mass pedi age class
0 6 148 72 35 0 33.6 0.627 50 tested_positive
1 1 85 66 29 0 26.6 0.351 31 tested_negative
2 8 183 64 0 0 23.3 0.672 32 tested_positive
3 1 89 66 23 94 28.1 0.167 21 tested_negative
4 0 137 40 35 168 43.1 2.288 33 tested_positive
.. ... ... ... ... ... ... ... ... ...
763 10 101 76 48 180 32.9 0.171 63 tested_negative
764 2 122 70 27 0 36.8 0.340 27 tested_negative
765 5 121 72 23 112 26.2 0.245 30 tested_negative
766 1 126 60 0 0 30.1 0.349 47 tested_positive
767 1 93 70 31 0 30.4 0.315 23 tested_negative
RESULT: