45B AIML Practical1.1

Name of Student: Ahmed Mobin Ahmed Shaikh

Roll Number: 45 Lab Practical Number: 1.1

Title of Lab Assignment: Numpy, Pandas Implementation and exercises.

DOP: 23-01-24 DOS: 30-01-24

CO Mapped: CO1
PO Mapped: PO1, PO2, PO3, PSO1
NumPy Arrays | NumPy Arange | NumPy Linspace | NumPy Rand | NumPy Reshape | NumPy Shape
Notebook Link:

NumPy Arrays:
NumPy is a powerful library for numerical computing in Python. One
of its key features is the NumPy array (ndarray), a multidimensional
array whose elements all share one data type (dtype). NumPy arrays
are more efficient than Python lists for numerical operations because
they are implemented in C and allow for vectorized operations.

Creating NumPy Arrays:

You can create NumPy arrays in various ways:

import numpy as np

# Creating an array from a list

arr_list = [1, 2, 3, 4, 5]
np_array_from_list = np.array(arr_list)

# Creating an array using np.arange

arr_range = np.arange(0, 10, 2)  # Array from 0 up to 10 (exclusive) with step 2

# Creating an array using np.linspace

arr_linspace = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1 (inclusive)

# Creating an array of random values using np.random

arr_random = np.random.rand(3, 3)  # 3x3 array of random values in [0, 1)

NumPy arange:

np.arange is a function that returns an array with regularly spaced
values within a given interval. It is similar to the Python range
function but returns a NumPy array.

arr_arange = np.arange(start, stop, step)
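
As a small illustration of the start, stop, and step arguments, the sketch below uses arbitrary example values (they are not taken from the notebook) and shows that the stop value itself is left out.

```python
import numpy as np

# Illustrative values only: start=0, stop=10, step=2
evens = np.arange(0, 10, 2)
print(evens)            # [0 2 4 6 8]  (10 itself is excluded)

# With a single argument, arange counts from 0 like Python's range
first_five = np.arange(5)
print(first_five)       # [0 1 2 3 4]
```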

NumPy linspace:

np.linspace returns an array with evenly spaced values over a
specified range. Unlike np.arange, it includes both the start and
stop values by default, and you specify the number of elements you
want rather than the step size.

arr_linspace = np.linspace(start, stop, num)
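
The following sketch, with example values chosen only for illustration, shows the inclusive endpoints and how the endpoint parameter changes the spacing.

```python
import numpy as np

points = np.linspace(0, 1, 5)
print(points)           # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False leaves out the stop value, similar to np.arange
open_points = np.linspace(0, 1, 5, endpoint=False)
print(open_points)      # [0.  0.2 0.4 0.6 0.8]
```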

NumPy random:

The np.random module provides functions for generating random data.
Commonly used functions include rand, randn, randint, and random.

arr_random = np.random.rand(3, 3)  # 3x3 array of random values in [0, 1)
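
A brief sketch of three of these functions follows. The seed value and the array sizes are arbitrary choices made for this example; seeding is only there so that repeated runs give the same numbers.

```python
import numpy as np

np.random.seed(0)                        # fixed seed so the run is repeatable

uniform = np.random.rand(2, 3)           # uniform values in [0, 1)
normal = np.random.randn(2, 3)           # standard normal values
integers = np.random.randint(0, 10, 5)   # 5 integers from 0 to 9

print(uniform.shape, normal.shape, integers.shape)   # (2, 3) (2, 3) (5,)
```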

NumPy reshape:

np.reshape is used to change the shape of an array. It allows you to
reorganize the elements of an array into a new shape without
changing their values.

arr_reshape = np.reshape(original_array, new_shape)
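
To make the placeholders concrete, here is a minimal sketch that assumes a six-element array and a (2, 3) target shape; both are example values only.

```python
import numpy as np

original_array = np.arange(6)                  # [0 1 2 3 4 5]
arr_reshape = np.reshape(original_array, (2, 3))
print(arr_reshape)
# [[0 1 2]
#  [3 4 5]]

# -1 asks NumPy to infer that dimension from the total size
inferred = original_array.reshape(3, -1)
print(inferred.shape)                          # (3, 2)
```

The new shape must account for exactly the same number of elements as the original; otherwise NumPy raises a ValueError.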

NumPy shape:

The shape attribute of an array (arr.shape) returns its shape as a
tuple giving the size of each dimension. The function np.shape(arr)
returns the same tuple.

shape_tuple = np.shape(arr)
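
A short sketch, assuming a 2x3 array of zeros as the example input, shows the attribute and the function side by side together with the related ndim and size attributes.

```python
import numpy as np

arr = np.zeros((2, 3))
print(arr.shape)        # (2, 3)
print(np.shape(arr))    # (2, 3)
print(arr.ndim)         # 2
print(arr.size)         # 6
```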
NumPy Indexing and Selection | Fancy Indexing | Matrices in Python | NumPy in Machine Learning
Notebook Link:

NumPy Indexing and Selection:

NumPy provides powerful indexing and selection mechanisms for
accessing elements or subsets of elements in arrays.

Basic Indexing:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Accessing elements using index

element_at_index_2 = arr[2] # Returns 3

# Slicing to get a subset

subset = arr[1:4] # Returns array([2, 3, 4])

Multidimensional Array Indexing:

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Accessing elements in a 2D array

element_row2_col1 = arr_2d[1, 0] # Returns 4

# Slicing a 2D array
subset_2d = arr_2d[:2, 1:] # Returns array([[2, 3], [5, 6]])

Fancy Indexing:
Fancy indexing allows you to use arrays of indices to access
multiple elements at once.
indices = np.array([0, 2, 1])
selected_elements = arr[indices] # Returns array([1, 3, 2])
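
The same idea extends to boolean masks and to 2D arrays. The sketch below reuses the example arrays from this section; the threshold and the index lists are arbitrary choices for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Boolean mask: keep the elements greater than 2
mask = arr > 2
print(arr[mask])                 # [3 4 5]

# Fancy indexing on rows: pick rows 2 and 0, in that order
print(arr_2d[[2, 0]])
# [[7 8 9]
#  [1 2 3]]

# Paired row and column indices select individual elements
print(arr_2d[[0, 1, 2], [2, 1, 0]])   # [3 5 7]
```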

Matrices in Python:

In NumPy, matrices are usually represented as two-dimensional arrays
(np.ndarray objects), created by passing nested lists to np.array. A
separate np.matrix class also exists, but the NumPy documentation no
longer recommends using it.

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

NumPy provides functions for matrix operations such as
multiplication (np.dot or @ operator), inversion (np.linalg.inv),
and determinant (np.linalg.det).

NumPy in Machine Learning:

NumPy is extensively used in machine learning for its efficient
array operations. Here are some key aspects:

1. Data Representation:

● In machine learning, datasets are often represented as NumPy
arrays.

● Features of a dataset can be stored in a 2D array, where rows
represent samples and columns represent features.

2. Vectorization:

● NumPy allows vectorized operations, making it possible to
perform operations on entire arrays without using explicit
loops. This significantly improves computational efficiency.

3. Linear Algebra:

● Linear algebra operations, such as matrix multiplication, are
fundamental in machine learning algorithms. NumPy provides
efficient implementations of these operations.

4. Random Number Generation:

● NumPy's random module is used for generating random numbers,
which is often crucial in machine learning for tasks like data
shuffling or initialization of weights in neural networks.

5. Indexing and Selection:

● Efficient indexing and selection using NumPy are essential for
manipulating and accessing data in machine learning applications.
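
The five points above can be sketched together in a few lines. Everything here is an assumption made for illustration: the feature values are made up, the seed is arbitrary, and the "model" is just a random linear map rather than a trained one.

```python
import numpy as np

rng = np.random.default_rng(42)          # seeded generator for repeatability

# Data representation: 4 samples (rows) x 2 features (columns), made-up values
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])

# Vectorization: standardize every column without an explicit loop
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Random number generation: small random weights and a shuffled row order
weights = rng.normal(0.0, 0.01, size=2)
order = rng.permutation(len(X))

# Indexing + linear algebra: reorder the rows, then compute X @ w
predictions = X_scaled[order] @ weights
print(predictions.shape)                 # (4,)
```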
NumPy Operations | NumPy Arithmetic Operations | NumPy Universal Array Functions
Notebook Link:

NumPy Operations:
NumPy provides a wide range of operations that can be performed on
arrays. These operations include arithmetic operations, statistical
operations, linear algebra operations, and more.

Arithmetic Operations:

NumPy allows you to perform element-wise arithmetic operations on
arrays.

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

# Addition
result_addition = arr1 + arr2 # [5, 7, 9]

# Subtraction
result_subtraction = arr1 - arr2 # [-3, -3, -3]

# Multiplication
result_multiplication = arr1 * arr2 # [4, 10, 18]

# Division
result_division = arr1 / arr2 # [0.25, 0.4, 0.5]

# Element-wise power
result_power = arr1 ** 2 # [1, 4, 9]

Universal Array Functions (ufuncs):

NumPy also provides Universal Functions (ufuncs), which are
functions that operate element-wise on arrays. These functions are
highly optimized and can operate on arrays of any size and shape.

# Square root
result_sqrt = np.sqrt(arr1) # [1.0, 1.414, 1.732]
# Exponential
result_exp = np.exp(arr1) # [2.718, 7.389, 20.085]

# Trigonometric functions
result_sin = np.sin(arr1) # [0.841, 0.909, 0.141]
result_cos = np.cos(arr1) # [0.540, -0.416, -0.990]

Aggregation Functions:

NumPy provides functions for aggregating values in an array, such as
sum, mean, median, min, max, etc.

# Sum
total_sum = np.sum(arr1) # 6

# Mean
mean_value = np.mean(arr1) # 2.0

# Minimum and maximum

min_value = np.min(arr1) # 1
max_value = np.max(arr1) # 3
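
On arrays with more than one dimension, the same functions accept an axis argument. The sketch below assumes a small 2x3 example array to show the difference between aggregating everything, each column, and each row.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(np.sum(table))            # 21 (all elements)
print(np.sum(table, axis=0))    # [5 7 9] (one sum per column)
print(np.sum(table, axis=1))    # [ 6 15] (one sum per row)
print(np.median(table))         # 3.5
```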

Linear Algebra Operations:

NumPy has a comprehensive set of functions for linear algebra
operations.

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# Matrix multiplication
result_matrix_multiply = np.dot(matrix_a, matrix_b)

# Matrix determinant
matrix_det = np.linalg.det(matrix_a)

# Matrix inverse
matrix_inv = np.linalg.inv(matrix_a)
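
For reference, the sketch below repeats these calls on the same two matrices and notes the values they produce. Results from det and inv are floating-point, so they are compared with np.allclose rather than exact equality.

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

product = matrix_a @ matrix_b            # same result as np.dot for 2D arrays
print(product)
# [[19 22]
#  [43 50]]

det = np.linalg.det(matrix_a)            # approximately -2.0
inverse = np.linalg.inv(matrix_a)        # approximately [[-2, 1], [1.5, -0.5]]

# A matrix times its inverse should be (numerically) the identity
print(np.allclose(matrix_a @ inverse, np.eye(2)))   # True
```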
Pandas in Python | Series in Pandas | Pandas Series to DataFrame | Pandas Series to List
Notebook Link:

Pandas in Python:

Pandas is a popular open-source data manipulation and
analysis library for Python. It provides two primary data
structures: Series and DataFrame. These structures are
built on top of NumPy arrays, offering more functionality
and flexibility for data manipulation.

Series in Pandas:

A Series is a one-dimensional labeled array in Pandas. It
is capable of holding any data type, and each element in
the Series has a label called an index.
Creating a Pandas Series:
import pandas as pd

# Creating a Series from a list

data_list = [1, 2, 3, 4, 5]
series_from_list = pd.Series(data_list)

# Creating a Series with custom index

custom_index_series = pd.Series(data_list, index=['a',
'b', 'c', 'd', 'e'])
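
The custom index can then be used for lookups. The following sketch rebuilds the same Series and shows label-based access, position-based access, and a label slice.

```python
import pandas as pd

custom_index_series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(custom_index_series['c'])        # 3 (lookup by label)
print(custom_index_series.iloc[0])     # 1 (lookup by position)
print(custom_index_series['b':'d'].tolist())   # [2, 3, 4] (label slices include the end)
```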

Pandas Series to DataFrame:

A DataFrame is a two-dimensional labeled data structure
with columns that can be of different data types. You can
convert a Pandas Series to a DataFrame using the
pd.DataFrame() constructor.

# Creating a DataFrame from a Series
df_from_series = pd.DataFrame(series_from_list)  # further arguments are cut off in the source

# Creating a DataFrame from multiple Series

series1 = pd.Series([1, 2, 3])
series2 = pd.Series(['a', 'b', 'c'])
df_from_multiple_series = pd.DataFrame({'Column1':
series1, 'Column2': series2})
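
An alternative to the constructor is the Series method to_frame(). In the sketch below the column name 'Value' is a hypothetical choice made for this example, not a name from the notebook.

```python
import pandas as pd

series_from_list = pd.Series([1, 2, 3, 4, 5])

# to_frame() turns the Series into a one-column DataFrame
df_single = series_from_list.to_frame(name='Value')
print(df_single.shape)              # (5, 1)
print(list(df_single.columns))      # ['Value']
```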

Pandas Series to List:

You can convert a Pandas Series to a Python list using
the tolist() method.

# Converting a Pandas Series to a list
list_from_series = series_from_list.tolist()
DataFrames in Pandas:

A DataFrame is a two-dimensional labeled data structure
in Pandas, resembling a table or a spreadsheet with rows
and columns. It is one of the most widely used data
structures for data manipulation and analysis in Python.
Creating a DataFrame:

There are several ways to create a DataFrame in Pandas:

1. From a Dictionary of Lists:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}

df = pd.DataFrame(data)

2. From a List of Lists:

data_list = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'San Francisco'],
    ['Charlie', 35, 'Los Angeles']
]

df = pd.DataFrame(data_list, columns=['Name', 'Age', 'City'])


3. From a NumPy Array:

import numpy as np

# A NumPy array has a single dtype, so the ages are stored as strings here
data_array = np.array([
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'San Francisco'],
    ['Charlie', 35, 'Los Angeles']
])

df = pd.DataFrame(data_array, columns=['Name', 'Age', 'City'])


4. From a CSV File:

df = pd.read_csv('example.csv')
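
Since 'example.csv' is only a placeholder file name, the sketch below reads the same kind of data from an in-memory string instead, so it runs without any file on disk. The CSV contents are made up for illustration.

```python
import io
import pandas as pd

# In-memory text standing in for a CSV file on disk
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,San Francisco\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)             # (2, 3)
print(df['Age'].tolist())   # [25, 30]
```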

Essential DataFrame Operations:

Once you have a DataFrame, you can perform various

operations on it:
1. Viewing Data:
# Display the first few rows
print(df.head())

# Display the last few rows
print(df.tail())

# Display basic statistics
print(df.describe())


2. Selecting Data:
# Selecting a column
name_column = df['Name']
# Selecting multiple columns
selected_columns = df[['Name', 'Age']]

3. Filtering Data:
# Filtering based on a condition
filtered_df = df[df['Age'] > 30]
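
Conditions can also be combined. The sketch below rebuilds the example DataFrame from this section; the thresholds and name list are arbitrary values picked for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
older_in_la = df[(df['Age'] > 28) & (df['City'] == 'Los Angeles')]
print(older_in_la['Name'].tolist())      # ['Charlie']

# isin() filters on membership in a list of values
selected = df[df['Name'].isin(['Alice', 'Bob'])]
print(selected['Age'].tolist())          # [25, 30]
```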

4. Adding and Removing Columns:

# Adding a new column
df['Salary'] = [50000, 60000, 70000]

# Removing a column
df = df.drop('Salary', axis=1)

5. Handling Missing Data:

# Check for missing values (count per column)
print(df.isnull().sum())

# Drop rows with missing values

df = df.dropna()

# Fill missing values with a specific value

df = df.fillna(0)

6. Grouping and Aggregation:

# Group by a column and calculate the mean of a numeric column
# (text columns such as 'Name' cannot be averaged)
grouped_df = df.groupby('City')['Age'].mean()

7. Merging DataFrames:
# Merge two DataFrames
merged_df = pd.merge(df1, df2, on='Key_Column')

8. Writing to CSV:
df.to_csv('output.csv', index=False)
Handling Missing Data:
1. Dropping Missing Values:

Use dropna() to remove rows or columns with missing
values.

# Drop rows with any missing values
df_no_missing_rows = df.dropna()

# Drop columns with any missing values

df_no_missing_cols = df.dropna(axis=1)

2. Filling Missing Values:

Use fillna() to fill missing values with a specified
value or a calculated value (mean, median, etc.).

# Fill missing values with a specific value (e.g., 0)
df_filled_zero = df.fillna(0)

# Fill missing values with the mean of each numeric column
df_filled_mean = df.fillna(df.mean(numeric_only=True))

3. Interpolation:

Use interpolate() to fill missing values by interpolating
between existing values.

# Interpolate missing values linearly
df_interpolated = df.interpolate()

4. Forward or Backward Fill:

Use ffill (forward fill) or bfill (backward fill) to fill
missing values with the previous or next valid value.

# Forward fill missing values
df_ffill = df.ffill()
# Backward fill missing values
df_bfill = df.bfill()

5. Handling Missing Values in Time Series:

For time series data, you might want to fill missing
values using methods like forward-fill or backward-fill
with specific limits.

# Forward fill with a limit of 1 (fills at most one consecutive
# missing value after each valid value)
df_ffill_limit = df.ffill(limit=1)
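
To compare these strategies on the same input, the sketch below applies each of them to one made-up Series containing a gap of two missing values. The data is invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up column with a run of two missing values
s = pd.Series([1.0, np.nan, np.nan, 4.0])

print(s.isnull().sum())              # 2
print(s.dropna().tolist())           # [1.0, 4.0]
print(s.fillna(0).tolist())          # [1.0, 0.0, 0.0, 4.0]
print(s.interpolate().tolist())      # [1.0, 2.0, 3.0, 4.0]
print(s.ffill().tolist())            # [1.0, 1.0, 1.0, 4.0]
print(s.bfill().tolist())            # [1.0, 4.0, 4.0, 4.0]

# limit=1 fills only the first value of each gap; the second stays missing
limited = s.ffill(limit=1)
print(limited.isnull().sum())        # 1
```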
Pandas Operations:

Pandas provides a rich set of operations for data
manipulation. Here, we'll discuss some common operations:
GroupBy, Merge, Joins, and Concatenation.
1. GroupBy with Pandas:

The GroupBy operation involves splitting the data based
on some criteria, applying a function to each group
independently, and then combining the results.

import pandas as pd

# Creating a DataFrame
data = {'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
'Value': [10, 20, 15, 25, 12, 18]}
df = pd.DataFrame(data)

# Grouping by 'Category' and calculating the mean for each group
grouped_df = df.groupby('Category').mean()
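
Several aggregations can be computed per group in one call. The sketch below reuses the same example data; the choice of mean, sum, and count is just an illustration.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Value': [10, 20, 15, 25, 12, 18]})

# agg() applies several aggregation functions to each group at once
summary = df.groupby('Category')['Value'].agg(['mean', 'sum', 'count'])
print(summary)
```

For this data, group A sums to 37 over 3 rows and group B sums to 63 over 3 rows, so the means are about 12.33 and 21.0.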

2. Merge with Pandas:

Merging is a way to combine two DataFrames based on a
common column.

# Creating two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice',
'Bob', 'Charlie']})
# The third age is cut off in the source; 35 is only a placeholder
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30, 35]})

# Merging DataFrames based on the 'ID' column

merged_df = pd.merge(df1, df2, on='ID')

3. Joins with Pandas:

Joins are merges in which you specify the type of join
(inner, outer, left, or right) through the how parameter.
The default is an inner join.

# Performing a left join

left_join_df = pd.merge(df1, df2, on='ID', how='left')
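
To see how the join type changes which rows survive, the sketch below runs all four options on two small frames. The ages in df2 are illustrative values assumed for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30, 35]})   # ages are illustrative

results = {}
for how in ['inner', 'left', 'right', 'outer']:
    joined = pd.merge(df1, df2, on='ID', how=how)
    results[how] = joined['ID'].tolist()
    print(how, results[how])
# inner [2, 3]
# left [1, 2, 3]
# right [2, 3, 4]
# outer [1, 2, 3, 4]
```

Rows without a match on the other side are kept by left, right, and outer joins, with NaN filling the columns that have no value.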

4. Concatenation:

Concatenation combines two DataFrames along a particular
axis (either rows or columns).

# Concatenating two DataFrames vertically (along rows)
concatenated_df = pd.concat([df1, df2], axis=0)

# Concatenating two DataFrames horizontally (along columns)
concatenated_df_horizontal = pd.concat([df1, df2], axis=1)

Summary of Operations:
1. GroupBy:

● Use groupby() to group data based on specific
criteria.

● Apply aggregation functions like mean(), sum(), etc.,
on grouped data.

2. Merge:

● Use merge() to combine two DataFrames based on a
common column.
● Specify the 'on' parameter as the column for merging.

3. Joins:

● Joins are a type of merge.

● Use the how parameter to specify the type of join
(inner, outer, left, right).

4. Concatenation:

● Use concat() to combine DataFrames along a specified
axis.

● Specify the axis (0 for rows, 1 for columns).
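
The effect of the axis argument on the result's shape can be sketched as follows. The two frames and their contents are made up for this example, and ignore_index=True is an optional extra used here to renumber the rows.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [3, 4], 'Name': ['Charlie', 'Dana']})   # made-up rows

# axis=0 stacks rows; ignore_index=True renumbers the index 0..n-1
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)
print(stacked.shape)        # (4, 2)

# axis=1 places the frames side by side, aligned on the index
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.shape)   # (2, 4)
```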
Exploratory Data Analysis - 1
Exploratory Data Analysis (EDA) is a crucial step in the
data analysis process. It involves exploring and
understanding the main characteristics of a dataset
before applying more advanced statistical modeling. Here,
I'll detail various aspects of EDA:

1. Understanding the Data:

- Load the Data:

● Import necessary libraries (e.g., Pandas, NumPy) and
load your dataset into a DataFrame.

import pandas as pd

# Load the dataset

df = pd.read_csv('your_dataset.csv')

- Initial Inspection:

● Use methods like head(), info(), and describe() to
get an initial overview of the dataset.

# Display the first few rows
print(df.head())

# Get general information about the dataset
df.info()

# Get summary statistics
print(df.describe())


2. Dealing with Missing Values:

- Identify Missing Values:

● Check for missing values using isnull().

# Check for missing values (count per column)
print(df.isnull().sum())

- Handling Missing Values:

● Decide on a strategy for handling missing values
(removing, imputing, etc.).

# Drop rows with missing values
df = df.dropna()

# Impute missing values
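
One way the imputation step could look is sketched below. This is an assumption, not the notebook's own code: it uses the median for a numeric column and the most frequent value for a categorical column, on made-up data with the placeholder column names used in this section.

```python
import numpy as np
import pandas as pd

# Made-up data; 'numeric_column' and 'category_column' are placeholder names
df = pd.DataFrame({
    'numeric_column': [1.0, np.nan, 3.0, 5.0],
    'category_column': ['x', 'y', None, 'x'],
})

# Numeric column: replace missing values with the median of the observed values
df['numeric_column'] = df['numeric_column'].fillna(df['numeric_column'].median())

# Categorical column: replace missing values with the most frequent value
most_common = df['category_column'].mode()[0]
df['category_column'] = df['category_column'].fillna(most_common)

print(df['numeric_column'].tolist())     # [1.0, 3.0, 3.0, 5.0]
print(df['category_column'].tolist())    # ['x', 'y', 'x', 'x']
```

Dropping and imputing are alternatives: once rows with missing values have been dropped, there is nothing left to impute.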


3. Exploratory Visualization:
- Univariate Analysis:

● Visualize the distribution of individual variables.

import matplotlib.pyplot as plt
import seaborn as sns

# Histogram for a numeric variable

plt.hist(df['numeric_column'], bins=20)

# Bar chart for a categorical variable

sns.countplot(x='category_column', data=df)
- Bivariate Analysis:

● Explore relationships between pairs of variables.

# Scatter plot for two numeric variables
plt.scatter(df['numeric_column1'], df['numeric_column2'])
plt.xlabel('Numeric Column 1')
plt.ylabel('Numeric Column 2')

# Boxplot for a numeric variable across categories

sns.boxplot(x='category_column', y='numeric_column', data=df)

- Correlation Analysis:

● Understand the correlation between numeric variables.

# Correlation matrix (numeric columns only)
correlation_matrix = df.corr(numeric_only=True)

# Heatmap for correlation matrix

sns.heatmap(correlation_matrix, annot=True)
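
To show what the correlation matrix contains, the sketch below builds a tiny made-up DataFrame in which one column rises exactly with another and a third falls exactly with it. The column names are hypothetical.

```python
import pandas as pd

# Made-up data: y is exactly 2 * x, and z decreases as x increases
df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [2, 4, 6, 8],
    'z': [8, 6, 4, 2],
    'label': ['a', 'b', 'a', 'b'],
})

# numeric_only=True skips the text column
correlation_matrix = df.corr(numeric_only=True)
print(correlation_matrix.loc['x', 'y'])   # 1.0 (up to rounding)
print(correlation_matrix.loc['x', 'z'])   # -1.0 (up to rounding)
```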

4. Feature Engineering:
- Creating New Features:

● Derive new features that might be more informative.

# Create a new feature
# The second operand is cut off in the source; numeric_column2 is assumed
df['new_feature'] = df['numeric_column1'] * df['numeric_column2']

- Transforming Features:

● Apply transformations to existing features (e.g., log
transformation).

import numpy as np

# Log transformation (defined only for positive values)
df['log_numeric_column'] = np.log(df['numeric_column'])
5. Statistical Testing:
- Hypothesis Testing:

● Conduct statistical tests to validate hypotheses.

from scipy.stats import ttest_ind

# Perform t-test for two groups

group1 = df[df['condition'] == 'A']['numeric_column']
group2 = df[df['condition'] == 'B']['numeric_column']
t_stat, p_value = ttest_ind(group1, group2)
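
Because the snippet above depends on a dataset that is not included, here is a self-contained sketch on synthetic data. The group means, sample sizes, seed, and the 5% threshold are all assumptions chosen for illustration, and equal_var=False switches to Welch's variant of the test.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Synthetic samples: group B is drawn with a higher mean than group A
group1 = rng.normal(loc=10.0, scale=2.0, size=50)
group2 = rng.normal(loc=13.0, scale=2.0, size=50)

# equal_var=False selects Welch's t-test, which does not assume equal variances
t_stat, p_value = ttest_ind(group1, group2, equal_var=False)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis of equal means at the 5% level")
```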
Collab Link:

