45B AIML Practical 1.1


Name of Student: Ahmed Mobin Ahmed Shaikh

Roll Number: 45 Lab Practical Number: 1.1

Title of Lab Assignment: Numpy, Pandas Implementation and exercises.

DOP: 23-01-24 DOS: 30-01-24

CO Mapped: CO1
PO Mapped: PO1, PO2, PO3, PSO1, PSO2
Signature:
NumPy Arrays | Numpy Arange | Numpy Linspace | Numpy Rand | Numpy Reshape | Numpy Shape
Notebook Link:
https://colab.research.google.com/drive/1re8EJQ0Q4PfEqIojf6L1G-IHlSE0OvDn#scrollTo=vBm7sbc6ed8C

NumPy Arrays:
NumPy is a powerful library for numerical computing in Python. One
of its key features is the NumPy array, a multidimensional array of
elements, usually of the same type. NumPy arrays are more efficient
than Python lists for numerical operations because they are
implemented in C and allow for vectorized operations.

Creating NumPy Arrays:

You can create NumPy arrays in various ways:

python
import numpy as np

# Creating an array from a list
arr_list = [1, 2, 3, 4, 5]
np_array_from_list = np.array(arr_list)

# Creating an array using np.arange
arr_range = np.arange(0, 10, 2)  # Creates an array from 0 to 10 (exclusive) with step 2

# Creating an array using np.linspace
arr_linspace = np.linspace(0, 1, 5)  # Creates an array of 5 evenly spaced values between 0 and 1

# Creating an array of random values using np.random
arr_random = np.random.rand(3, 3)  # Creates a 3x3 array of random values between 0 and 1

NumPy arange:

np.arange is a function that returns an array with regularly spaced values within a given interval. It is similar to the Python range function but returns a NumPy array.

python
arr_arange = np.arange(start, stop, step)

NumPy linspace:

np.linspace returns an array of evenly spaced values over a specified range. Unlike np.arange, it includes both the start and stop values by default, and you specify the number of elements you want.

python
arr_linspace = np.linspace(start, stop, num)

NumPy random:

np.random module provides functions for generating random data. Some commonly used functions are rand, randn, randint, random, and shuffle.

python
arr_random = np.random.rand(3, 3)  # Generates a 3x3 array of random values between 0 and 1
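For completeness, here is a small sketch of the other functions mentioned above (the exact values differ from run to run, since they are random):

python
# Samples from the standard normal distribution (mean 0, variance 1)
arr_randn = np.random.randn(4)

# Random integers from 0 (inclusive) to 10 (exclusive)
arr_randint = np.random.randint(0, 10, size=5)

# Random floats in the half-open interval [0.0, 1.0)
arr_random_floats = np.random.random(3)

# Shuffle an existing array in place
arr_to_shuffle = np.arange(5)
np.random.shuffle(arr_to_shuffle)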

NumPy reshape:

np.reshape is used to change the shape of an array. It allows you to reorganize the elements of an array into a new shape without changing their values.

python
arr_reshape = np.reshape(original_array, new_shape)
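As a concrete illustration, a short sketch that reshapes a 1D array of six elements into a 2x3 array (the total number of elements must stay the same):

python
original_array = np.arange(6)                     # array([0, 1, 2, 3, 4, 5])
arr_reshape = np.reshape(original_array, (2, 3))
# array([[0, 1, 2],
#        [3, 4, 5]])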

NumPy shape:

np.shape is a function that returns the shape of an array (the same information is also available as the array's .shape attribute). The shape is a tuple representing the dimensions of the array.

python
shape_tuple = np.shape(arr)
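For example, for the 3x3 random array created earlier, the shape is reported as a tuple of two dimensions:

python
arr = np.random.rand(3, 3)
print(np.shape(arr))  # (3, 3)
print(arr.shape)      # (3, 3) - equivalent attribute access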
NumPy Indexing and Selection | Fancy Indexing | Matrices in Python | Numpy in Machine Learning
Notebook Link:
https://colab.research.google.com/drive/1yVclsVQ8AmiqcT2euXUbn4MJTb5zxUjX

NumPy Indexing and Selection:


NumPy provides powerful indexing and selection mechanisms for
accessing elements or subsets of elements in arrays.

Basic Indexing:
python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Accessing elements using index
element_at_index_2 = arr[2]  # Returns 3

# Slicing to get a subset
subset = arr[1:4]  # Returns array([2, 3, 4])

Multidimensional Array Indexing:


python
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Accessing elements in a 2D array
element_row2_col1 = arr_2d[1, 0]  # Returns 4

# Slicing a 2D array
subset_2d = arr_2d[:2, 1:] # Returns array([[2, 3], [5, 6]])

Fancy Indexing:
Fancy indexing allows you to use arrays of indices to access
multiple elements at once.
python
indices = np.array([0, 2, 1])
selected_elements = arr[indices] # Returns array([1, 3, 2])
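Fancy indexing also works on multidimensional arrays. A small sketch, reusing a 3x3 array like the one shown in the previous section, selects whole rows by their indices:

python
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_indices = np.array([0, 2])
selected_rows = arr_2d[row_indices]  # Returns array([[1, 2, 3], [7, 8, 9]])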

Matrices in Python:

In NumPy, matrices are represented using the np.array class with two dimensions. Matrices can be created by passing nested lists or using the np.matrix class (whose use is no longer recommended).

python
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

NumPy provides functions for matrix operations such as multiplication (np.dot or the @ operator), inversion (np.linalg.inv), and determinant (np.linalg.det).
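A brief sketch of these operations, using a small invertible 2x2 matrix for illustration (the 3x3 example above is singular, so it has no inverse):

python
m = np.array([[1, 2], [3, 4]])

product = np.dot(m, m)           # Matrix multiplication
product_alt = m @ m              # Equivalent, using the @ operator
determinant = np.linalg.det(m)   # -2.0
inverse = np.linalg.inv(m)       # array([[-2. ,  1. ], [ 1.5, -0.5]])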

NumPy in Machine Learning:


NumPy is extensively used in machine learning for its efficient
array operations. Here are some key aspects:

1. Data Representation:

● In machine learning, datasets are often represented as NumPy arrays.
● Features of a dataset can be stored in a 2D array, where rows
represent samples and columns represent features.

2. Vectorization:

● NumPy allows vectorized operations, making it possible to perform operations on entire arrays without using explicit loops. This significantly improves computational efficiency (see the sketch after this list).

3. Linear Algebra:

● Linear algebra operations, such as matrix multiplication, are fundamental in machine learning algorithms. NumPy provides efficient implementations of these operations.

4. Random Number Generation:

● NumPy's random module is used for generating random numbers, which is often crucial in machine learning for tasks like data shuffling or initialization of weights in neural networks.
5. Indexing and Selection:

● Efficient indexing and selection using NumPy are essential for manipulating and accessing data in machine learning applications.
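As an example of point 2 above, a minimal sketch comparing an explicit Python loop with the equivalent vectorized expression (the arrays here are made up purely for illustration):

python
import numpy as np

features = np.random.rand(1000)
weights = np.random.rand(1000)

# Explicit Python loop
total = 0.0
for f, w in zip(features, weights):
    total += f * w

# Vectorized equivalent: one call, no Python-level loop
total_vectorized = np.dot(features, weights)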
Numpy Operations | Numpy Arithmetic Operations | Numpy Universal Array Functions
Notebook Link:
https://colab.research.google.com/drive/19RdU0QOltzqPPscWfkqkqMb8GEG7qttl

NumPy Operations:
NumPy provides a wide range of operations that can be performed on
arrays. These operations include arithmetic operations, statistical
operations, linear algebra operations, and more.

Arithmetic Operations:

NumPy allows you to perform element-wise arithmetic operations on arrays.

python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Addition
result_addition = arr1 + arr2 # [5, 7, 9]

# Subtraction
result_subtraction = arr1 - arr2 # [-3, -3, -3]

# Multiplication
result_multiplication = arr1 * arr2 # [4, 10, 18]

# Division
result_division = arr1 / arr2 # [0.25, 0.4, 0.5]

# Element-wise power
result_power = arr1 ** 2 # [1, 4, 9]

Universal Array Functions (ufuncs):

NumPy also provides Universal Functions (ufuncs), which are functions that operate element-wise on arrays. These functions are highly optimized and can operate on arrays of any size and shape.

python
# Square root
result_sqrt = np.sqrt(arr1) # [1.0, 1.414, 1.732]
# Exponential
result_exp = np.exp(arr1) # [2.718, 7.389, 20.085]

# Trigonometric functions
result_sin = np.sin(arr1) # [0.841, 0.909, 0.141]
result_cos = np.cos(arr1) # [0.540, -0.416, -0.990]

Aggregation Functions:

NumPy provides functions for aggregating values in an array, such as sum, mean, median, min, max, etc.

python
# Sum
total_sum = np.sum(arr1) # 6

# Mean
mean_value = np.mean(arr1) # 2.0

# Minimum and maximum
min_value = np.min(arr1)  # 1
max_value = np.max(arr1)  # 3

Linear Algebra Operations:

NumPy has a comprehensive set of functions for linear algebra operations.

python
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# Matrix multiplication
result_matrix_multiply = np.dot(matrix_a, matrix_b)

# Matrix determinant
matrix_det = np.linalg.det(matrix_a)

# Matrix inverse
matrix_inv = np.linalg.inv(matrix_a)
Pandas in Python | Series in Pandas | Pandas Series to Dataframe | Pandas Series to List
Notebook Link:
https://colab.research.google.com/drive/1y5rHaTKhKSwmkx-4jZbRQ7CcL471JlQd

Pandas in Python:

Pandas is a popular open-source data manipulation and analysis library for Python. It provides two primary data structures: Series and DataFrame. These structures are built on top of NumPy arrays, offering more functionality and flexibility for data manipulation.

Series in Pandas:

A Series is a one-dimensional labeled array in Pandas. It is capable of holding any data type, and each element in the Series has a label called an index.
Creating a Pandas Series:
python
import pandas as pd

# Creating a Series from a list
data_list = [1, 2, 3, 4, 5]
series_from_list = pd.Series(data_list)

# Creating a Series with a custom index
custom_index_series = pd.Series(data_list, index=['a', 'b', 'c', 'd', 'e'])

Pandas Series to DataFrame:


A DataFrame is a two-dimensional labeled data structure
with columns that can be of different data types. You can
convert a Pandas Series to a DataFrame using the
pd.DataFrame() constructor.

python
# Creating a DataFrame from a Series
df_from_series = pd.DataFrame(series_from_list, columns=['Column_Name'])

# Creating a DataFrame from multiple Series
series1 = pd.Series([1, 2, 3])
series2 = pd.Series(['a', 'b', 'c'])
df_from_multiple_series = pd.DataFrame({'Column1': series1, 'Column2': series2})

Pandas Series to List:

You can convert a Pandas Series to a Python list using the tolist() method.

python
# Converting a Pandas Series to a list
list_from_series = series_from_list.tolist()
DataFrames in Pandas:

A DataFrame is a two-dimensional labeled data structure in Pandas, resembling a table or a spreadsheet with rows and columns. It is one of the most widely used data structures for data manipulation and analysis in Python.
Creating a DataFrame:

There are several ways to create a DataFrame in Pandas:

1. From a Dictionary of Lists:

python
import pandas as pd

data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'San Francisco', 'Los Angeles']
}

df = pd.DataFrame(data)

2. From a List of Lists:

python
data_list = [
['Alice', 25, 'New York'],
['Bob', 30, 'San Francisco'],
['Charlie', 35, 'Los Angeles']
]

df = pd.DataFrame(data_list, columns=['Name', 'Age', 'City'])

3. From a NumPy Array:

python
import numpy as np

data_array = np.array([
['Alice', 25, 'New York'],
['Bob', 30, 'San Francisco'],
['Charlie', 35, 'Los Angeles']
])

df = pd.DataFrame(data_array, columns=['Name', 'Age', 'City'])

4. From a CSV File:

python
df = pd.read_csv('example.csv')

Essential DataFrame Operations:

Once you have a DataFrame, you can perform various operations on it:
1. Viewing Data:
python
# Display the first few rows
df.head()

# Display the last few rows
df.tail()

# Display basic statistics
df.describe()

2. Selecting Data:
python
# Selecting a column
name_column = df['Name']
# Selecting multiple columns
selected_columns = df[['Name', 'Age']]

3. Filtering Data:
python
# Filtering based on a condition
filtered_df = df[df['Age'] > 30]

4. Adding and Removing Columns:


python
# Adding a new column
df['Salary'] = [50000, 60000, 70000]

# Removing a column
df = df.drop('Salary', axis=1)

5. Handling Missing Data:


python
# Check for missing values
df.isnull()

# Drop rows with missing values
df = df.dropna()

# Fill missing values with a specific value
df = df.fillna(0)

6. Grouping and Aggregation:


python
# Group by a column and calculate the mean of the numeric columns
grouped_df = df.groupby('City').mean(numeric_only=True)

7. Merging DataFrames:
python
# Merge two DataFrames
merged_df = pd.merge(df1, df2, on='Key_Column')

8. Writing to CSV:
python
df.to_csv('output.csv', index=False)
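Here index=False simply prevents the DataFrame's row index from being written to the file as an extra column; omit it if you want the index preserved in the CSV.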
Handling Missing Data:
1. Dropping Missing Values:

Use dropna() to remove rows or columns with missing values.

python
# Drop rows with any missing values
df_no_missing_rows = df.dropna()

# Drop columns with any missing values
df_no_missing_cols = df.dropna(axis=1)

2. Filling Missing Values:

Use fillna() to fill missing values with a specified value or a calculated value (mean, median, etc.).

python
# Fill missing values with a specific value (e.g., 0)
df_filled_zero = df.fillna(0)

# Fill missing values with the mean of each numeric column
df_filled_mean = df.fillna(df.mean(numeric_only=True))

3. Interpolation:

Use interpolate() to fill missing values by interpolating between existing values.

python
# Interpolate missing values linearly
df_interpolated = df.interpolate()

4. Forward or Backward Fill:

Use ffill (forward fill) or bfill (backward fill) to fill missing values with the previous or next valid value.

python
# Forward fill missing values
df_ffill = df.ffill()
# Backward fill missing values
df_bfill = df.bfill()

5. Handling Missing Values in Time Series:

For time series data, you might want to fill missing values using methods like forward-fill or backward-fill with specific limits.

python
# Forward fill with a limit of 1 (fills at most one consecutive missing value)
df_ffill_limit = df.ffill(limit=1)
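To tie these methods together, a minimal sketch on a small made-up DataFrame with missing values (the column names and values are illustrative only):

python
import pandas as pd
import numpy as np

df = pd.DataFrame({'day': [1, 2, 3, 4],
                   'temp': [20.0, np.nan, np.nan, 23.0]})

print(df.dropna())        # Keeps only days 1 and 4
print(df.fillna(0))       # Missing temperatures become 0
print(df.interpolate())   # Missing temperatures become 21.0 and 22.0
print(df.ffill(limit=1))  # Only the first gap after day 1 is filled with 20.0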
Pandas Operations:

Pandas provides a rich set of operations for data manipulation. Here, we'll discuss some common operations: GroupBy, Merge, Joins, and Concatenation.
1. GroupBy with Pandas:

The GroupBy operation involves splitting the data based on some criteria, applying a function to each group independently, and then combining the results.

python
import pandas as pd

# Creating a DataFrame
data = {'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
'Value': [10, 20, 15, 25, 12, 18]}
df = pd.DataFrame(data)

# Grouping by 'Category' and calculating the mean for each group
grouped_df = df.groupby('Category').mean()

2. Merge with Pandas:

Merging is a way to combine two DataFrames based on a common column.

python
# Creating two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30, 35]})

# Merging DataFrames based on the 'ID' column
merged_df = pd.merge(df1, df2, on='ID')

3. Joins with Pandas:

Joins are similar to merges but allow you to specify the type of join (inner, outer, left, or right).

python
# Performing a left join
left_join_df = pd.merge(df1, df2, on='ID', how='left')
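Using the same df1 and df2 from the merge example, a quick sketch of how the how parameter changes the result (rows without a match get NaN in the columns coming from the other DataFrame):

python
inner_df = pd.merge(df1, df2, on='ID', how='inner')  # IDs 2 and 3 only
outer_df = pd.merge(df1, df2, on='ID', how='outer')  # IDs 1, 2, 3 and 4
right_df = pd.merge(df1, df2, on='ID', how='right')  # IDs 2, 3 and 4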

4. Concatenation:

Concatenation combines two DataFrames along a particular axis (either rows or columns).

python
# Concatenating two DataFrames vertically (along rows)
concatenated_df = pd.concat([df1, df2], axis=0)

# Concatenating two DataFrames horizontally (along columns)
concatenated_df_horizontal = pd.concat([df1, df2], axis=1)

Summary of Operations:
1. GroupBy:

● Use groupby() to group data based on specific criteria.
● Apply aggregation functions like mean(), sum(), etc.,
on grouped data.

2. Merge:

● Use merge() to combine two DataFrames based on a common column.
● Specify the 'on' parameter as the column for merging.

3. Joins:

● Joins are a type of merge.
● Use the how parameter to specify the type of join (inner, outer, left, right).

4. Concatenation:

● Use concat() to combine DataFrames along a specified axis.
● Specify the axis (0 for rows, 1 for columns).
Exploratory Data Analysis - 1
Exploratory Data Analysis (EDA) is a crucial step in the
data analysis process. It involves exploring and
understanding the main characteristics of a dataset
before applying more advanced statistical modeling. Here,
I'll detail various aspects of EDA:

1. Understanding the Data:


- Load the Data:

● Import necessary libraries (e.g., Pandas, NumPy) and load your dataset into a DataFrame.

python
import pandas as pd

# Load the dataset
df = pd.read_csv('your_dataset.csv')

- Initial Inspection:

● Use methods like head(), info(), and describe() to get an initial overview of the dataset.

python
# Display the first few rows
print(df.head())

# Get general information about the dataset
print(df.info())

# Get summary statistics
print(df.describe())

2. Dealing with Missing Values:


- Identify Missing Values:

● Check for missing values using isnull().

python
# Check for missing values
print(df.isnull().sum())

- Handling Missing Values:

● Decide on a strategy for handling missing values (removing, imputing, etc.).

python
# Drop rows with missing values
df = df.dropna()

# Impute missing values with the column mean
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())

3. Exploratory Visualization:
- Univariate Analysis:

● Visualize the distribution of individual variables.

python
import matplotlib.pyplot as plt
import seaborn as sns

# Histogram for a numeric variable
plt.hist(df['numeric_column'], bins=20)
plt.show()

# Bar chart for a categorical variable
sns.countplot(x='category_column', data=df)
plt.show()
- Bivariate Analysis:

● Explore relationships between pairs of variables.

python
# Scatter plot for two numeric variables
plt.scatter(df['numeric_column1'], df['numeric_column2'])
plt.xlabel('Numeric Column 1')
plt.ylabel('Numeric Column 2')
plt.show()

# Boxplot for a numeric variable across categories
sns.boxplot(x='category_column', y='numeric_column', data=df)
plt.show()

- Correlation Analysis:

● Understand the correlation between numeric variables.

python
# Correlation matrix (numeric columns only)
correlation_matrix = df.corr(numeric_only=True)

# Heatmap for the correlation matrix
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()

4. Feature Engineering:
- Creating New Features:

● Derive new features that might be more informative.

python
# Create a new feature
df['new_feature'] = df['numeric_column1'] * df['numeric_column2']

- Transforming Features:

● Apply transformations to existing features (e.g., log transformation).

python
import numpy as np

# Log transformation
df['log_numeric_column'] = np.log(df['numeric_column'])
5. Statistical Testing:
- Hypothesis Testing:

● Conduct statistical tests to validate hypotheses.

python
from scipy.stats import ttest_ind

# Perform t-test for two groups
group1 = df[df['condition'] == 'A']['numeric_column']
group2 = df[df['condition'] == 'B']['numeric_column']
t_stat, p_value = ttest_ind(group1, group2)
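A short follow-up, assuming the conventional 0.05 significance level, to interpret the result:

python
print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.3f}")
if p_value < 0.05:
    print("The difference between groups A and B is statistically significant.")
else:
    print("No statistically significant difference was found.")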
Colab Link:
https://colab.research.google.com/drive/19RdU0QOltzqPPscWfkqkqMb8GEG7qttl#scrollTo=MSAho0QKo-2F
