FDS Lab Manual (1-3) PDF


SRM TRP ENGINEERING COLLEGE TIRUCHIRAPPALLI – 621 105

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CS3361 – DATA SCIENCE LABORATORY

Sl. No. Name of the Exercises (using Python)

1. Array processing with NUMPY

2. Manipulating Data frames and Series using PANDAS

INTRODUCTION – SOFTWARE INSTALLATION

Software required

Spyder IDE.

What is Spyder?

Spyder is an open-source, cross-platform IDE.

It is written completely in Python.
Its name stands for Scientific Python Development Environment.

Features of Spyder

Syntax highlighting
Availability of breakpoints
Run configuration
Automatic colon insertion after if, while, etc.
Supports all IPython commands.
Inline display of graphics produced using Matplotlib.
Also provides features such as help, file explorer, find in files and so on.

SPYDER IDE Installation

Spyder is included by default with the Anaconda Python distribution.

Step 1: Go to the Anaconda website, https://www.anaconda.com.

Step 2: Click Get Started, then click the Download option.



Step 3: Choose the version that is suitable for your OS and click on download.

Step 4: Complete the Setup and Click on Finish.

Step 5: Launch Spyder from the Anaconda Navigator.
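
Once Spyder is open, a quick way to confirm that the installation works is to import the two libraries used in this manual from the IPython console. This is only a minimal sketch; the variable names are illustrative and the version numbers printed depend on the Anaconda release that was installed.

```python
# Confirm that NumPy and pandas can be imported from the Spyder console.
import numpy as np
import pandas as pd

numpy_version = np.__version__    # version string of the installed NumPy
pandas_version = pd.__version__   # version string of the installed pandas

print("NumPy version:", numpy_version)
print("pandas version:", pandas_version)
```

If either import fails, the library is missing from the active environment and the Anaconda installation should be checked.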

Exp No: 1 ARRAY PROCESSING USING NUMPY Date:

Aim:
To implement the various operations on arrays, vectors and matrices using NumPy library
in Python.

Theory :

NumPy Library

• NumPy is a computational library that speeds up vector algebra operations involving vectors (distance between points, cosine similarity) and matrices.
• Specifically, it helps in constructing powerful n-dimensional arrays; its array interface is also mirrored by libraries that target distributed and GPU systems.
• It is a very handy library that is used extensively in the domains of Data Analytics and Machine Learning.
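
The two vector operations mentioned above, distance between points and cosine similarity, can be sketched in a few lines. The points p and q are made-up values chosen for illustration, not data from the manual.

```python
import numpy as np

# Two illustrative points in 3-D space (hypothetical values).
p = np.array([1.0, 2.0, 2.0])
q = np.array([4.0, 6.0, 2.0])

# Euclidean distance: length of the difference vector.
distance = np.linalg.norm(p - q)

# Cosine similarity: dot product divided by the product of the lengths.
cosine = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

print("Distance:", distance)
print("Cosine similarity:", cosine)
```

Both results come from whole-array operations, with no explicit Python loop over the coordinates.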

(1) Arrays in NumPy

• NumPy's main object is the homogeneous multidimensional array.
• It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.
• In NumPy, dimensions are called axes. The number of axes is called the rank.
• NumPy's array class is called ndarray. It is also known by the alias array.
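
The terms above map directly onto attributes of an ndarray. The following sketch uses a small 2-by-3 array of illustrative values to show the number of axes, the shape, the element type and tuple indexing.

```python
import numpy as np

# A 2-D array: 2 rows (axis 0) and 3 columns (axis 1).
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(type(a).__name__)  # class name of the array object
print(a.ndim)            # number of axes (the rank)
print(a.shape)           # size along each axis
print(a.dtype)           # common type of all elements (platform dependent for integers)
print(a[1, 2])           # element at row 1, column 2, indexed by a tuple
```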

(2) Array creation:

• There are various ways to create arrays in NumPy.
• Create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.
• Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation. For example: np.zeros, np.ones, np.full, np.empty, etc.
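
A short sketch of the creation functions named above. The shapes and values are arbitrary choices for illustration.

```python
import numpy as np

# From existing Python sequences; the dtype is deduced from the elements.
from_list = np.array([1, 2, 3])       # integer dtype
from_tuple = np.array((1.5, 2.5))     # float dtype

# Placeholder arrays of a known size.
zeros = np.zeros((2, 3))       # all elements 0.0
ones = np.ones((2, 2))         # all elements 1.0
fives = np.full((2, 3), 5)     # all elements 5
scratch = np.empty((2, 2))     # uninitialised: contents are arbitrary until assigned

print(from_list.dtype, from_tuple.dtype)
print(zeros)
print(ones)
print(fives)
```

Note that np.empty only reserves memory, so its contents should not be read before being overwritten.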

Exercises

#1 - Matrix addition
Given two NumPy arrays of the same shape, produce an array in which every element is the element-wise sum of the two arrays.

Code
# Element-wise addition of two numpy arrays
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 11, 12],
              [13, 14, 15]])
c = a + b
print(c)

Sample Output
[[11 13 15]
 [17 19 21]]

#2 - Multiplying a matrix by a scalar
Given a NumPy array (matrix), produce an array equal to the original matrix multiplied by a given scalar.

Code
# Multiply a matrix by a scalar
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = 3 * a
print(b)

Sample Output
[[ 3  6  9]
 [12 15 18]]

#3 - Create an identity matrix
Create an identity matrix of a given dimension.

Code
# Identity matrix of 4 x 4 size
import numpy as np
i = np.eye(4)
print(i)

Sample Output
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

#4 - Matrix multiplication
Given 2 NumPy arrays as matrices, output the result of multiplying the 2 matrices (as a NumPy array).

Code
# Matrix multiplication
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = np.array([[2, 3, 4],
              [5, 6, 7],
              [8, 9, 10]])
c = a @ b
print(c)

Sample Output
[[ 36  42  48]
 [ 81  96 111]
 [126 150 174]]
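
A common source of confusion in exercise #4 is the difference between the @ operator and the * operator. The sketch below, using two small illustrative matrices, shows that * multiplies matching elements while @ performs the row-by-column matrix product.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b        # multiplies elements at the same position
matrix_product = a @ b     # row-by-column matrix product

print(elementwise)
print(matrix_product)

# np.matmul is the function form of the @ operator.
same = np.array_equal(matrix_product, np.matmul(a, b))
print(same)
```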
#5 - Matrix transpose
Print the transpose of a given matrix.

Code
# Matrix transpose
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = a.T
print(b)

Sample Output
[[1 4 7]
 [2 5 8]
 [3 6 9]]

#6 - Array datatype conversion
Convert all the elements of a NumPy array from one datatype to another datatype (e.g., float to int).

Code
# Array datatype conversion
import numpy as np
a = np.array([[2.5, 3.8, 1.5],
              [4.7, 2.9, 1.56]])
b = a.astype('int')
print(b)

Sample Output
[[2 3 1]
 [4 2 1]]
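
As the sample output of exercise #6 suggests, astype discards the fractional part instead of rounding (3.8 becomes 3). The sketch below contrasts truncation with rounding to the nearest integer using np.rint; the input values are illustrative.

```python
import numpy as np

a = np.array([2.5, 3.8, -1.5, -2.7])

# astype(int) truncates toward zero.
truncated = a.astype(int)

# np.rint rounds to the nearest integer (ties go to the even neighbour),
# after which the conversion to int is exact.
rounded = np.rint(a).astype(int)

print(truncated)
print(rounded)
```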

#7 - Stacking of NumPy arrays
Stack 2 NumPy arrays horizontally, i.e., 2 arrays having the same 1st dimension (number of rows in 2D arrays).

Code
# Array stacking - horizontal
import numpy as np
a1 = np.array([[1, 2, 3],
               [4, 5, 6]])
a2 = np.array([[7, 8, 9],
               [10, 11, 12]])
c = np.hstack((a1, a2))
print(c)

Sample Output
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]

Code
# Array stacking - vertical
import numpy as np
a = np.array([[1, 2],
              [3, 4],
              [5, 6]])
b = np.array([[7, 8],
              [9, 10],
              [10, 11]])
c = np.vstack((a, b))
print(c)

Sample Output
[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [10 11]]

#8 - Sequence generation
Generate a sequence of numbers in the form of a NumPy array from 0 to 100 in steps of 2, for example: 0, 2, 4, ...

Code
# Sequence generation
import numpy as np
values = [x for x in range(0, 101, 2)]
a = np.array(values)
print(a)

Output
[  0   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34
  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70
  72  74  76  78  80  82  84  86  88  90  92  94  96  98 100]
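
The same sequence can also be produced without building an intermediate Python list. The sketch below uses np.arange, which takes start, stop (exclusive) and step, and np.linspace, which takes the number of points instead of the step.

```python
import numpy as np

# 0, 2, 4, ..., 100: the stop value 101 is excluded, so 100 is the last element.
evens = np.arange(0, 101, 2)

# The same values expressed as 51 evenly spaced points between 0 and 100 (inclusive).
points = np.linspace(0, 100, 51)

print(evens)
print(len(evens))
```

np.arange returns integers here, while np.linspace returns floating-point values.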

#9 - Matrix generation with a specific value
Output a matrix (NumPy array) of dimension 2-by-3 with every value equal to 5.

Code
# Matrix filled with a specific value
import numpy as np
a = np.full((2, 3), 5)
print(a)

Sample Output
[[5 5 5]
 [5 5 5]]

#10 - Sorting an array
Sort the given NumPy array in ascending order.

Code
# Sorting an array
import numpy as np
a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])

# print the flattened array in sorted order
print("Array elements in sorted order:")
print(np.sort(a, axis=None))

# sort array row-wise
print("Row-wise sorted array:")
print(np.sort(a, axis=1))

# sort array column-wise
print("Column wise sort:")
print(np.sort(a, axis=0))

Sample Output
Array elements in sorted order:
[-1  0  1  2  3  4  4  5  6]
Row-wise sorted array:
[[ 1  2  4]
 [ 3  4  6]
 [-1  0  5]]
Column wise sort:
[[ 0 -1  2]
 [ 1  4  5]
 [ 3  4  6]]
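
np.sort always sorts in ascending order. Two related operations that are often needed alongside it are sketched below on a small illustrative array: descending order (by reversing the sorted result) and np.argsort, which returns the positions that would sort the array.

```python
import numpy as np

a = np.array([3, 1, 2])

ascending = np.sort(a)          # sorted copy; the original array is unchanged
descending = np.sort(a)[::-1]   # reverse the ascending result
order = np.argsort(a)           # indices that would sort the array

print(ascending)
print(descending)
print(order)
print(a[order])                 # indexing with the order reproduces the sorted array
```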

Result:
Thus the implementation of the basic features of NumPy arrays has been completed successfully.

Exp No: 2 MANIPULATING DATA FRAMES AND SERIES USING PANDAS Date:

Aim:

To implement the basic operations used for data analysis using Pandas in Python.

Theory:

Pandas

• Pandas is a Python data analysis library that deals primarily with tabular data.
• It forms a major data analysis toolbox that is widely used in domains such as Data Mining, Data Warehousing, Machine Learning and general Data Science.
• It is an open-source library released under a liberal BSD license.
• It has two main data structures:
  1. Series: Contains data related to a single variable (can be visualized as a vector) along with indexing information.
  2. DataFrame: Contains tabular data.
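
Since the exercises below concentrate on DataFrames, here is a minimal sketch of the other structure, a Series. The name marks, its values and its index labels are hypothetical and chosen only for illustration.

```python
import pandas as pd

# A Series: one variable's values together with index labels.
marks = pd.Series([85, 92, 78], index=['a', 'b', 'c'])

print(marks)
print(marks['b'])      # look up a value by its index label
print(marks.mean())    # vector-style computation over all values
```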

Data Frames

• A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

• Features of a DataFrame:
  - Columns can be of different types
  - Size is mutable
  - Labeled axes (rows and columns)
  - Arithmetic operations can be performed on rows and columns
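
The features listed above can be seen in a few lines. This is only a sketch; the column name AgeNextYear is a hypothetical name introduced for the example.

```python
import pandas as pd

# Columns of different types: text in 'Name', integers in 'Age'.
df = pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34]})
print(df.dtypes)

# Arithmetic on a whole column, and mutable size: a new column is added in place.
df['AgeNextYear'] = df['Age'] + 1

# Labeled axes: row labels (index) and column labels.
print(list(df.index))
print(list(df.columns))
print(df)
```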

Creating Data Frame using Pandas

A pandas DataFrame can be created using the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)
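
A minimal sketch of the constructor with its main arguments passed by keyword. The row labels r1, r2 and the column labels x, y are hypothetical names used only for this example.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2], [3, 4]],    # the values, one inner list per row
    index=['r1', 'r2'],       # row labels
    columns=['x', 'y'],       # column labels
    dtype=float,              # force every column to floating point
)

print(df)
print(df.loc['r2', 'y'])      # value at row 'r2', column 'y'
```

When index or columns are omitted, pandas falls back to integer labels starting at 0.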

A pandas DataFrame can be created using various inputs like:


1. Lists
2. dict
3. Series
4. NumPy ndarrays
5. Another DataFrame

Exercises

#1 - Creating an empty data frame
The most basic DataFrame that can be created is an empty DataFrame.

Code
#import the pandas library
import pandas as pd
df = pd.DataFrame()
print(df)

Sample Output
Empty DataFrame
Columns: []
Index: []

#2 - Creating a data frame from a List
The DataFrame can be created using a single list or a list of lists.

Code
#import the pandas library
import pandas as pd
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)

Sample Output
   0
0  1
1  2
2  3
3  4
4  5

Code
#import the pandas library
import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

Sample Output
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13

#3 - Creating a data frame from a Dictionary of ndarrays / Lists
All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the default index is range(n), where n is the array length.

Code
#import the pandas library
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)

Sample Output
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42


Code
#creating a data frame from a dictionary of lists with a custom index
#import the pandas library
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)

Sample Output
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42

#4 - Creating a data frame from a Series
A dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the Series indexes passed.

Code
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)

Sample Output
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

#5 - Sorting a data frame
Given a data frame, sort it by a given column.

Code
#import the pandas library
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)
df_sorted = df.sort_values(by='Name')
print("Sorted data frame...")
print(df_sorted)

Sample Output
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
Sorted data frame...
    Name  Age
1   Jack   34
3  Ricky   42
2  Steve   29
0    Tom   28
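
sort_values keeps each row's original index label, so the labels appear out of order after sorting. The sketch below, which reuses the data from exercise #5, sorts by Age in descending order and then renumbers the rows with reset_index. The variable names by_age_desc and renumbered are illustrative.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)

# Sort by a numeric column, largest value first.
by_age_desc = df.sort_values(by='Age', ascending=False)
print(by_age_desc)

# Replace the carried-over labels with a fresh 0..n-1 index.
renumbered = by_age_desc.reset_index(drop=True)
print(renumbered)
```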

#6 - Manipulating a Data frame Column
Manipulating a column includes selecting a column, adding a new column and removing an existing column from the data frame.

Code
#selecting a column
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])

Sample Output
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

Code
#adding a new column
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)

Sample Output
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN

Code
#deleting an existing column
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print("Deleting the first column using DEL function:")
del df['one']
print(df)

Sample Output
Deleting the first column using DEL function:
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN

#7 - Manipulating a Data frame row
Manipulating a row includes selecting a row, adding a new row and removing an existing row from the data frame.

Code
#selecting a row
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])

Sample Output
one    2.0
two    2.0
Name: b, dtype: float64

Code
#adding new rows (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
print(df)

Sample Output
   a  b
0  1  2
1  3  4
0  5  6
1  7  8

Code
#deleting an existing row
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
# Drop rows with label 0 (both rows labelled 0 are removed)
df = df.drop(0)
print(df)

Sample Output
   a  b
1  3  4
1  7  8
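
When rows from two data frames are combined, the index labels of both are kept, so labels can repeat and drop(0) may remove more than one row. The sketch below shows one way to avoid this, assuming a fresh 0..n-1 index is acceptable: pd.concat with ignore_index=True. The names combined and trimmed are illustrative. pd.concat is used because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# Stack the rows and renumber them 0, 1, 2, 3 so that every label is unique.
combined = pd.concat([df, df2], ignore_index=True)
print(combined)

# With unique labels, drop(0) removes exactly one row.
trimmed = combined.drop(0)
print(trimmed)
```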

Result:
Thus the implementation of the basic features of Pandas has been completed successfully.

Ex no:4 DATE:

Exploring various commands for doing descriptive analytics on the IRIS dataset

Aim:
To explore various commands for doing descriptive analytics on the iris dataset.

ALGORITHM:

INPUT:
Download iris.csv dataset from https://datahub.io/machine-learning/iris

PROGRAM:

import pandas as pd
df=pd.read_csv("iris_csv.csv")
df.head()
df.shape
df.info()
df.describe()
df.isnull().sum()
df.value_counts("class")
OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   sepallength  150 non-null    float64
 1   sepalwidth   150 non-null    float64
 2   petallength  150 non-null    float64
 3   petalwidth   150 non-null    float64
 4   class        150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB

class
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
dtype: int64

RESULT:
Thus the implementation of exploring various commands for doing
descriptive analysis on the Iris data set is executed successfully.
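
Descriptive analytics on a labelled dataset is often more informative when computed per class. The sketch below shows the idea with groupby on a tiny hand-made table; the six rows are hypothetical values, not the real iris.csv data, and only the column names petallength and class follow the dataset above.

```python
import pandas as pd

# Hypothetical miniature sample shaped like the iris data (two rows per class).
sample = pd.DataFrame({
    'petallength': [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    'class': ['Iris-setosa', 'Iris-setosa',
              'Iris-versicolor', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
})

# Number of rows in each class.
class_counts = sample['class'].value_counts()
print(class_counts)

# Mean petal length within each class.
class_means = sample.groupby('class')['petallength'].mean()
print(class_means)
```

The same two calls can be applied to the full data frame loaded from the CSV file.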

EXP:5a DATE:

Univariate analysis: frequency, mean, median, mode, variance, standard deviation, skewness and kurtosis.

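AIM:
To perform univariate analysis (frequency, mean, median, mode, variance, standard deviation, skewness and kurtosis) on the diabetes dataset.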

ALGORITHM:
INPUT:

Download the diabetes dataset from https://datahub.io/machine-learning/diabetes

PROGRAM:

import pandas as pd
import statistics as st
df=pd.read_csv("diabetes_csv.csv")
print(df.shape)
print(df.info())

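#univariate analysis of diabetes dataset
#numeric_only=True skips the text column 'class', which has no mean or variance
print('MEAN:\n', df.mean(numeric_only=True))
print('MEDIAN:\n', df.median(numeric_only=True))
print('MODE:\n', df.mode())
print('STANDARD DEVIATION\n', df.std(numeric_only=True))
print('VARIANCE:\n', df.var(numeric_only=True))
print('SKEWNESS:\n', df.skew(numeric_only=True))
print('KURTOSIS\n', df.kurtosis(numeric_only=True))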
df.describe

OUTPUT:

(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   preg    768 non-null    int64
 1   plas    768 non-null    int64
 2   pres    768 non-null    int64
 3   skin    768 non-null    int64
 4   insu    768 non-null    int64
 5   mass    768 non-null    float64
 6   pedi    768 non-null    float64
 7   age     768 non-null    int64
 8   class   768 non-null    object
dtypes: float64(2), int64(6), object(1)
memory usage: 54.1+ KB
None

MEAN:

preg 3.845052
plas 120.894531
pres 69.105469
skin 20.536458
insu 79.799479
mass 31.992578
pedi 0.471876
age 33.240885
dtype: float64
MEDIAN:

preg 3.0000
plas 117.0000
pres 72.0000
skin 23.0000
insu 30.5000
mass 32.0000
pedi 0.3725
age 29.0000
dtype: float64
MODE:

preg plas pres skin insu mass pedi age class


0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 tested_negative
1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN
STANDARD DEVIATION
preg 3.369578
plas 31.972618
pres 19.355807
skin 15.952218
insu 115.244002
mass 7.884160
pedi 0.331329
age 11.760232
dtype: float64
VARIANCE:
preg 11.354056
plas 1022.248314
pres 374.647271
skin 254.473245
insu 13281.180078
mass 62.159984
pedi 0.109779
age 138.303046
dtype: float64
SKEWNESS:
preg 0.901674
plas 0.173754
pres -1.843608
skin 0.109372
insu 2.272251
mass -0.428982
pedi 1.919911
age 1.129597
dtype: float64
KURTOSIS
preg 0.159220
plas 0.640780
pres 5.180157
skin -0.520072
insu 7.214260
mass 3.290443
pedi 5.594954
age 0.643159
dtype: float64

<bound method NDFrame.describe of preg plas pres skin insu mass pedi age class
0 6 148 72 35 0 33.6 0.627 50 tested_positive
1 1 85 66 29 0 26.6 0.351 31 tested_negative
2 8 183 64 0 0 23.3 0.672 32 tested_positive
3 1 89 66 23 94 28.1 0.167 21 tested_negative
4 0 137 40 35 168 43.1 2.288 33 tested_positive
.. ... ... ... ... ... ... ... ... ...
763 10 101 76 48 180 32.9 0.171 63 tested_negative
764 2 122 70 27 0 36.8 0.340 27 tested_negative
765 5 121 72 23 112 26.2 0.245 30 tested_negative
766 1 126 60 0 0 30.1 0.349 47 tested_positive
767 1 93 70 31 0 30.4 0.315 23 tested_negative

[768 rows x 9 columns]>
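
The title of this experiment also lists frequency, which for a single column can be obtained with value_counts. The sketch below computes it together with the other univariate measures on a short hand-made Series; the five values are hypothetical and are not taken from the diabetes dataset.

```python
import pandas as pd

# Hypothetical sample of one variable.
ages = pd.Series([22, 22, 25, 30, 41])

frequency = ages.value_counts()   # how many times each value occurs
mean = ages.mean()
median = ages.median()
mode = ages.mode()                # returned as a Series; there may be several modes
variance = ages.var()             # sample variance (divides by n - 1)
std = ages.std()                  # square root of the sample variance
skewness = ages.skew()            # positive when the right tail is longer
kurtosis = ages.kurtosis()        # excess kurtosis, 0 for a normal distribution

print(frequency)
print(mean, median, list(mode))
print(variance, std)
print(skewness, kurtosis)
```

The standard deviation and variance are linked by std squared equals variance, which is a useful sanity check on any reported output.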

RESULT:
