45B AIML Practical1.1
NumPy Arrays:
NumPy is a powerful library for numerical computing in Python. One
of its key features is the NumPy array (ndarray), a multidimensional
array whose elements all share a single data type. NumPy arrays are
more efficient than Python lists for numerical operations because
they are implemented in C and allow for vectorized operations.
python
import numpy as np
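As a rough illustration of the vectorization point above, the sketch below doubles every element once with a Python loop and once with a single NumPy expression. The names values, doubled_loop and doubled_vec are illustrative and do not come from the original notes.

```python
import numpy as np

values = list(range(5))            # illustrative data: [0, 1, 2, 3, 4]

# Pure-Python approach: an explicit loop over every element
doubled_loop = [v * 2 for v in values]

# NumPy approach: one vectorized expression, executed in compiled code
arr = np.array(values)
doubled_vec = arr * 2

print(doubled_loop)   # [0, 2, 4, 6, 8]
print(doubled_vec)    # [0 2 4 6 8]
```

Both approaches give the same values; the vectorized form avoids the per-element interpreter overhead, which matters as arrays grow.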
NumPy arange:
python
arr_arange = np.arange(start, stop, step)
NumPy linspace:
python
arr_linspace = np.linspace(start, stop, num)
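The two signatures above use placeholder arguments. A minimal sketch with concrete values (chosen here purely for illustration) shows the main difference: arange takes a step and excludes the stop value, while linspace takes a number of points and includes the stop value by default.

```python
import numpy as np

# arange: fixed step, stop value is excluded
arr_arange = np.arange(0, 10, 2)
print(arr_arange)        # [0 2 4 6 8]

# linspace: fixed number of points, stop value is included by default
arr_linspace = np.linspace(0, 1, 5)
print(arr_linspace)      # [0.   0.25 0.5  0.75 1.  ]
```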
NumPy random:
python
# Generates a 3x3 array of random values in the half-open interval [0, 1)
arr_random = np.random.rand(3, 3)
NumPy reshape:
python
arr_reshape = np.reshape(original_array, new_shape)
NumPy shape:
python
shape_tuple = np.shape(arr)
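The reshape and shape calls above use placeholder names. A small sketch, assuming a six-element array as example input, shows how they fit together; the new shape must hold the same number of elements as the original.

```python
import numpy as np

original_array = np.arange(6)                 # [0 1 2 3 4 5]
arr_reshape = np.reshape(original_array, (2, 3))
print(arr_reshape)
# [[0 1 2]
#  [3 4 5]]
print(np.shape(arr_reshape))                  # (2, 3)

# -1 lets NumPy infer one dimension from the total element count
arr_inferred = original_array.reshape(3, -1)
print(arr_inferred.shape)                     # (3, 2)
```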
NumPy Indexing and Selection | Fancy Indexing | Matrices in Python | NumPy in Machine Learning
Notebook Link:
https://colab.research.google.com/drive/1yVclsVQ8AmiqcT2euXUbn4MJTb5zxUjX
Basic Indexing:
python
import numpy as np
# Slicing a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
subset_2d = arr_2d[:2, 1:] # Returns array([[2, 3], [5, 6]])
Fancy Indexing:
Fancy indexing allows you to use arrays of indices to access
multiple elements at once.
python
arr = np.array([1, 2, 3])
indices = np.array([0, 2, 1])
selected_elements = arr[indices] # Returns array([1, 3, 2])
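Fancy indexing also works on 2D arrays. The sketch below, using an illustrative 3x3 array, selects whole rows, then individual elements by pairing row and column indices, and finally uses a Boolean mask, a closely related form of advanced indexing.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Pick rows 2 and 0, in that order
rows = arr_2d[[2, 0]]
print(rows)
# [[7 8 9]
#  [1 2 3]]

# Pair row and column indices to pick individual elements: (0, 2) and (1, 0)
elements = arr_2d[[0, 1], [2, 0]]
print(elements)          # [3 4]

# Boolean mask: keep only the values greater than 5
large = arr_2d[arr_2d > 5]
print(large)             # [6 7 8 9]
```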
Matrices in Python:
In NumPy, matrices are usually represented as two-dimensional arrays
(np.ndarray), created by passing nested lists to np.array. A separate
np.matrix class also exists, but the NumPy documentation no longer
recommends using it.
python
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
1. Data Representation:
2. Vectorization:
3. Linear Algebra:
NumPy Operations:
NumPy provides a wide range of operations that can be performed on
arrays, including arithmetic, statistical, and linear algebra
operations.
Arithmetic Operations:
python
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Addition
result_addition = arr1 + arr2 # [5, 7, 9]
# Subtraction
result_subtraction = arr1 - arr2 # [-3, -3, -3]
# Multiplication
result_multiplication = arr1 * arr2 # [4, 10, 18]
# Division
result_division = arr1 / arr2 # [0.25, 0.4, 0.5]
# Element-wise power
result_power = arr1 ** 2 # [1, 4, 9]
python
# Square root
result_sqrt = np.sqrt(arr1) # [1.0, 1.414, 1.732]
# Exponential
result_exp = np.exp(arr1) # [2.718, 7.389, 20.085]
# Trigonometric functions
result_sin = np.sin(arr1) # [0.841, 0.909, 0.141]
result_cos = np.cos(arr1) # [0.540, -0.416, -0.990]
Aggregation Functions:
python
# Sum
total_sum = np.sum(arr1) # 6
# Mean
mean_value = np.mean(arr1) # 2.0
python
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
# Matrix multiplication
result_matrix_multiply = np.dot(matrix_a, matrix_b)
# Matrix determinant
matrix_det = np.linalg.det(matrix_a)
# Matrix inverse
matrix_inv = np.linalg.inv(matrix_a)
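The results of the matrix calls above are not shown in the notes. The following sketch reuses the same two matrices to check them; the @ operator is used as an alternative spelling of np.dot for 2D arrays, and the comparisons use np.isclose and np.allclose because the determinant and inverse are computed in floating point.

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# The @ operator is equivalent to np.dot for 2-D arrays
product = matrix_a @ matrix_b
print(product)
# [[19 22]
#  [43 50]]

# det(A) = 1*4 - 2*3 = -2, up to floating-point rounding
matrix_det = np.linalg.det(matrix_a)
print(np.isclose(matrix_det, -2.0))          # True

# A matrix times its inverse should give the identity (within rounding)
matrix_inv = np.linalg.inv(matrix_a)
print(np.allclose(matrix_a @ matrix_inv, np.eye(2)))   # True
```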
Pandas in Python | Series in Pandas | Pandas Series to Dataframe | Pandas Series to List
Notebook Link:
https://colab.research.google.com/drive/1y5rHaTKhKSwmkx-4jZbRQ7CcL471JlQd
Pandas in Python:
Series in Pandas:
python
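import pandas as pd
# Example Series (illustrative values; the original definition is not shown)
series_from_list = pd.Series([10, 20, 30])
# Creating a DataFrame from a Series
df_from_series = pd.DataFrame(series_from_list, columns=['Column_Name'])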
python
# Converting a Pandas Series to a list
list_from_series = series_from_list.tolist()
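A Series can also carry its own index labels and a name. The sketch below, with hypothetical data named scores, shows label-based access and what each conversion keeps; to_frame() is used here as an alternative to the pd.DataFrame constructor, not as the method from the original notebook.

```python
import pandas as pd

# Hypothetical labelled data, not taken from the original notes
scores = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='Score')

print(scores['b'])                 # 20  (label-based access)

# to_frame() keeps the index and uses the Series name as the column name
scores_df = scores.to_frame()
print(list(scores_df.columns))     # ['Score']

# tolist() keeps only the values; the index labels are dropped
print(scores.tolist())             # [10, 20, 30]
```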
DataFrames in Pandas:
python
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)
python
data_list = [
['Alice', 25, 'New York'],
['Bob', 30, 'San Francisco'],
['Charlie', 35, 'Los Angeles']
]
df = pd.DataFrame(data_list, columns=['Name', 'Age', 'City'])
python
import numpy as np
data_array = np.array([
['Alice', 25, 'New York'],
['Bob', 30, 'San Francisco'],
['Charlie', 35, 'Los Angeles']
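])
# Mixed-type input makes every value a string, so 'Age' is stored as text
df = pd.DataFrame(data_array, columns=['Name', 'Age', 'City'])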
python
df = pd.read_csv('example.csv')
2. Selecting Data:
python
# Selecting a column
name_column = df['Name']
# Selecting multiple columns
selected_columns = df[['Name', 'Age']]
3. Filtering Data:
python
# Filtering based on a condition
filtered_df = df[df['Age'] > 30]
# Removing a column (assumes df has a 'Salary' column)
df = df.drop('Salary', axis=1)
7. Merging DataFrames:
python
# Merge two DataFrames
merged_df = pd.merge(df1, df2, on='Key_Column')
8. Writing to CSV:
python
df.to_csv('output.csv', index=False)
Handling Missing Data:
1. Dropping Missing Values:
python
# Drop rows with any missing values
df_no_missing_rows = df.dropna()
2. Filling Missing Values:
python
# Fill missing values with a specific value (e.g., 0)
df_filled_zero = df.fillna(0)
3. Interpolation:
python
# Interpolate missing values linearly
df_interpolated = df.interpolate()
python
# Forward fill missing values
df_ffill = df.ffill()
# Backward fill missing values
df_bfill = df.bfill()
python
# Forward fill with a limit of 1 (fills at most one consecutive
# missing value after each valid value)
df_ffill_limit = df.ffill(limit=1)
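To compare the strategies above side by side, here is a minimal sketch on a hypothetical one-column DataFrame with a gap of two consecutive missing values. The column name value and the data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a gap of two consecutive missing values
df = pd.DataFrame({'value': [1.0, np.nan, np.nan, 4.0]})

print(df.dropna()['value'].tolist())          # [1.0, 4.0]
print(df.fillna(0)['value'].tolist())         # [1.0, 0.0, 0.0, 4.0]
print(df.interpolate()['value'].tolist())     # [1.0, 2.0, 3.0, 4.0]
print(df.ffill()['value'].tolist())           # [1.0, 1.0, 1.0, 4.0]
print(df.bfill()['value'].tolist())           # [1.0, 4.0, 4.0, 4.0]

# limit=1 fills only the first missing value in each gap
print(df.ffill(limit=1)['value'].tolist())    # [1.0, 1.0, nan, 4.0]
```

Which strategy is appropriate depends on the data: interpolation and forward fill only make sense when row order is meaningful, as in a time series.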
Pandas Operations:
python
import pandas as pd
# Creating a DataFrame
data = {'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
'Value': [10, 20, 15, 25, 12, 18]}
df = pd.DataFrame(data)
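The grouping step for this DataFrame is not shown in the notes. One plausible way to use it, sketched here with groupby on the Category column, is to aggregate Value per group; the choice of sum, mean and max is an assumption for illustration.

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
        'Value': [10, 20, 15, 25, 12, 18]}
df = pd.DataFrame(data)

# Split rows by Category, then aggregate the Value column per group
group_sums = df.groupby('Category')['Value'].sum()
print(group_sums.to_dict())        # {'A': 37, 'B': 63}

# Several aggregations at once
summary = df.groupby('Category')['Value'].agg(['mean', 'max'])
print(summary.loc['B', 'mean'])    # 21.0
print(summary.loc['A', 'max'])     # 15
```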
python
# Creating two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice',
'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30,
35]})
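These two DataFrames share the ID column, so they can be combined with pd.merge. The sketch below compares three values of the how parameter; the variable names inner, left and outer are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30, 35]})

# Inner join: only IDs present in both frames
inner = pd.merge(df1, df2, on='ID', how='inner')
print(inner['ID'].tolist())        # [2, 3]

# Left join: every row of df1; Age is NaN where df2 has no match
left = pd.merge(df1, df2, on='ID', how='left')
print(left['ID'].tolist())         # [1, 2, 3]

# Outer join: union of IDs from both frames
outer = pd.merge(df1, df2, on='ID', how='outer')
print(sorted(outer['ID'].tolist()))  # [1, 2, 3, 4]
```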
4. Concatenation:
python
# Concatenating two DataFrames vertically (along rows)
concatenated_df = pd.concat([df1, df2], axis=0)
Summary of Operations:
1. GroupBy:
2. Merge:
3. Joins:
4. Concatenation:
python
import pandas as pd
- Initial Inspection:
python
# Display the first few rows
print(df.head())
python
# Check for missing values
print(df.isnull().sum())
python
# Drop rows with missing values
df = df.dropna()
3. Exploratory Visualization:
- Univariate Analysis:
python
import matplotlib.pyplot as plt
import seaborn as sns
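The plotting code for the univariate step is not included in the notes. A minimal sketch, assuming a histogram is the intended plot and using synthetic data in place of the real dataset, could look like this. Only matplotlib is used; the Agg backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")              # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data; 'numeric_column1' mirrors the placeholder name used in these notes
rng = np.random.default_rng(0)
df = pd.DataFrame({'numeric_column1': rng.normal(size=200)})

# Histogram of a single numeric variable
counts, bin_edges, _ = plt.hist(df['numeric_column1'], bins=20)
plt.xlabel('Numeric Column 1')
plt.ylabel('Frequency')
plt.show()
```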
python
# Scatter plot for two numeric variables
plt.scatter(df['numeric_column1'], df['numeric_column2'])
plt.xlabel('Numeric Column 1')
plt.ylabel('Numeric Column 2')
plt.show()
- Correlation Analysis:
python
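# Correlation matrix (numeric columns only; recent pandas versions
# raise an error if text columns are included)
correlation_matrix = df.corr(numeric_only=True)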
4. Feature Engineering:
- Creating New Features:
python
# Create a new feature
df['new_feature'] = df['numeric_column1'] * df['numeric_column2']
- Transforming Features:
python
# Log transformation (requires numpy imported as np; values must be positive)
df['log_numeric_column'] = np.log(df['numeric_column'])
5. Statistical Testing:
- Hypothesis Testing:
python
from scipy.stats import ttest_ind
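The notes import ttest_ind but do not show it in use. Below is a minimal sketch of a two-sample t-test on synthetic data; the group names, sample sizes and the 0.05 significance level are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic samples drawn with different means (illustrative only)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=5, size=100)
group_b = rng.normal(loc=55, scale=5, size=100)

# Two-sided independent-samples t-test (equal variances assumed by default)
t_stat, p_value = ttest_ind(group_a, group_b)

alpha = 0.05                       # assumed significance level
if p_value < alpha:
    print("Reject the null hypothesis of equal means")
else:
    print("Fail to reject the null hypothesis of equal means")
```

On a real dataset, the two samples would typically be a numeric column split by the values of a categorical column.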