Python Amit


PROBLEM:- 01

(A) Take a dataset which contains 20 rows and 7 columns.
Write the syntax for the following scenarios:
1. Find missing values in the dataset and replace them with the previous or next value.
2. Drop 1 column and 1 row from the dataset.
3. Access multiple rows.
4. Access multiple columns.
5. Use loc and iloc.
6. Create a label or index, using any example.
7. Print the first 5 rows of the DataFrame: head()
8. Print the last 5 rows of the DataFrame: tail()
9. Sort the data by axis = 1.
10. Perform data alignment.
11. Print information about the data.
12. Visualize any column.
(B) Use the matplotlib library to plot data points in various styles.
Solution (A) :-
import pandas as pd
import numpy as np

# Create a dictionary with sample data
data = {
    'EmployeeID': range(1, 21),
    'Name': [
        'Amit Kumar Singh', 'Aashu Kumar', 'Abhishek Raj', 'Gulshan Kumar', 'Anmol Srivastava',
        'Ujwal Singh', 'Tej Pratap Singh', 'Chirag Goyal', 'Siddharth Pandey', 'Sudhanshu Yadav',
        'Sourav Keshri', 'Bibhuti Singh', 'Tanishq Tiwari', 'Mukesh Kumar', 'Vikhyat Singh',
        'Aditya Singh', 'Sarvesh Kumar', 'Ravi Prakash', 'Sachin Singh', 'Sanchit Mishra'
    ],
    'Age': [
        29, 34, 22, 37, 28, 45, 31, 39, 23, 50,
        33, 40, 27, 44, 32,
        # The last five ages are missing from the source; np.nan is only a placeholder.
        np.nan, np.nan, np.nan, np.nan, np.nan
    ],
    # The 'Department' values are missing from the source; np.nan is only a placeholder.
    'Department': [np.nan] * 20,
    'Position': [
        'Developer', 'Manager', 'Analyst', 'Developer', 'Executive',
        'Specialist', 'Accountant', 'Manager', 'Director', 'Supervisor',
        'Developer', 'Analyst', 'Assistant', 'Coordinator', 'Executive',
        'Developer', 'Accountant', 'Supervisor', 'Manager', 'Developer'
    ],
    'Salary': [
        60000, 75000, 80000, 62000, 50000, 68000, 57000, 90000, 95000, 85000,
        63000, 82000, 45000, 78000, 54000, 61000, 56000, 83000, 76000, 60000
    ],
    'DateOfJoining': [
        '2019-01-15', '2018-03-22', '2016-07-19', '2020-11-03', '2021-05-10',
        '2015-12-29', '2019-08-17', '2017-06-01', '2016-02-11', '2013-09-23',
        '2018-10-14', '2014-05-18', '2021-12-01', '2015-04-07', '2020-03-15',
        '2017-08-21', '2019-11-27', '2014-01-30', '2016-12-08', '2019-04-15'
    ]
}

# Create a pandas DataFrame from the dictionary
df = pd.DataFrame(data)

# Adjust display options to show all columns
pd.set_option('display.max_columns', None)

# Display the DataFrame
print(df)
Output:-

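1. Find missing values in the dataset and replace them with the previous or next value.

print("Missing values per column:\n", df.isnull().sum())  # Find the missing values
df.ffill(inplace=True)  # Forward fill: replace each gap with the previous value
df.bfill(inplace=True)  # Backward fill: replace any remaining gap with the next value
print("After filling missing values with previous or next values:\n", df)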
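To make the difference between the two fill directions concrete, here is a minimal sketch on a small series. The series `s` and its values are hypothetical and are not part of the assignment dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a leading gap and an interior gap.
s = pd.Series([np.nan, 10.0, np.nan, np.nan, 40.0])

forward = s.ffill()   # each gap takes the previous valid value
backward = s.bfill()  # each gap takes the next valid value

print("Missing count:", int(s.isnull().sum()))
print("ffill:", forward.tolist())
print("bfill:", backward.tolist())

# ffill cannot fill a leading gap (nothing comes before it),
# so chaining bfill afterwards closes it.
filled = s.ffill().bfill()
print("ffill then bfill:", filled.tolist())
```

This is why the solution applies a forward fill first and a backward fill second: the second pass only touches gaps the first pass could not reach.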

2. drop 1 column and 1 row from dataset.


df_dropped_col = df.drop(columns=['Position']) # Drop the 'Position' column
df_dropped_row = df_dropped_col.drop(index=[0]) # Drop the first row
print("After dropping a column and a row:\n", df_dropped_row)
3. access multiple rows
multiple_rows = df.iloc[5:11] # Access rows 5 to 10
print("Accessing multiple rows (5 to 10):\n",
multiple_rows)

4. Access multiple columns


multiple_columns = df[['Name', 'Salary']]  # Access the 'Name' and 'Salary' columns
print("Accessing multiple columns ('Name' and 'Salary'):\n", multiple_columns)

5. use of loc and iloc


# Using loc to access rows and columns by label (the end label 10 is included)
loc_access = df.loc[5:10, ['Name', 'Department', 'Salary']]
# Using iloc to access rows and columns by integer position (the end position 10 is excluded)
iloc_access = df.iloc[5:10, [1, 3, 5]]
print("Using loc to access data:\n", loc_access)
print("Using iloc to access data:\n", iloc_access)
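The two slices in the solution look alike but select different rows. The sketch below illustrates this on a hypothetical frame whose index labels start at 1, so that labels and positions do not coincide; the names `frame`, `by_label` and `by_position` are illustrative only.

```python
import pandas as pd

# Hypothetical frame: index labels 1..8, so label n sits at position n - 1.
frame = pd.DataFrame(
    {"Name": list("ABCDEFGH"), "Salary": range(100, 900, 100)},
    index=range(1, 9),
)

by_label = frame.loc[2:4, ["Name", "Salary"]]  # labels 2, 3 and 4 (end included)
by_position = frame.iloc[2:4, [0, 1]]          # positions 2 and 3 (end excluded)

print(by_label)
print(by_position)
```

Here `loc` returns three rows and `iloc` returns two, even though both slices are written `2:4`.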

6. create label or index by taking any example


df.set_index('EmployeeID', inplace=True) # Set 'EmployeeID' as the index
# Display the DataFrame to verify the index has been set
print("After setting 'EmployeeID' as the index:\n", df)

7.Print the first 5 rows of the DataFrame:


print("First 5 rows of the DataFrame:\n",
df.head())

8.Print the last 5 rows of the DataFrame: tail()


print("Last 5 rows of the DataFrame:\n", df.tail())
9.sort the data by axis = 1
sorted_df = df.sort_index(axis=1)
print("DataFrame sorted by columns:\n", sorted_df)
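With axis=1, sort_index reorders the columns by their labels and leaves the rows where they are. A small sketch on a hypothetical frame (the column values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with columns in non-alphabetical order.
frame = pd.DataFrame({"Salary": [60000, 75000], "Age": [29, 34], "Name": ["A", "B"]})

by_columns = frame.sort_index(axis=1)                        # columns in ascending label order
by_columns_desc = frame.sort_index(axis=1, ascending=False)  # columns in descending label order

print(by_columns.columns.tolist())
print(by_columns_desc.columns.tolist())
```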

10. perform data alignment


# Create another DataFrame with the same index
df2 = pd.DataFrame({
    'EmployeeID': range(1, 21),
    'Bonus': np.random.randint(1000, 5000, size=20)
}).set_index('EmployeeID')
# Align the rows only (axis=0); without axis=0 the inner join is also applied
# to the columns, and since the two frames share no column names, none would remain
aligned_df, aligned_df2 = df.align(df2, join='inner', axis=0)
print("Aligned DataFrame 1:\n", aligned_df)
print("Aligned DataFrame 2:\n", aligned_df2)
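Because both frames in the solution share the same 20 index labels, the alignment there leaves the rows unchanged. The effect is easier to see when the indexes overlap only partly, as in this sketch; the `salary` and `bonus` frames and their values are hypothetical.

```python
import pandas as pd

# Hypothetical salary and bonus tables that share only some employee IDs.
salary = pd.DataFrame({"Salary": [60000, 75000, 80000]}, index=[1, 2, 3])
bonus = pd.DataFrame({"Bonus": [1500, 2500, 3500]}, index=[2, 3, 4])

# axis=0 aligns the row index only; each frame keeps its own columns.
inner_salary, inner_bonus = salary.align(bonus, join="inner", axis=0)
outer_salary, outer_bonus = salary.align(bonus, join="outer", axis=0)

print(inner_salary.index.tolist())  # labels present in both frames
print(outer_salary)                 # union of labels, gaps filled with NaN
print(outer_bonus)
```

An inner join keeps only the labels common to both frames, while an outer join keeps every label and marks the missing entries as NaN.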

11.Print information about the data


print("Information about the DataFrame:")
df.info()  # info() prints its summary directly and returns None

12 Visualization of any column


import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
plt.plot(df.index, df['Salary'], marker='o')  # Plot the 'Salary' column
plt.title('Salary of Employees')
plt.xlabel('EmployeeID')
plt.ylabel('Salary')
plt.grid(True)
plt.show()
(B). Use matplotlib library to plot data points in various styles.
Solution (B):-
# Scatter plot
plt.figure(figsize=(10, 5))
plt.scatter(df.index, df['Salary'], color='red')
plt.title('Scatter Plot of Salary')
plt.xlabel('EmployeeID')
plt.ylabel('Salary')
plt.grid(True)
plt.show()

# Bar plot
plt.figure(figsize=(10, 5))
plt.bar(df.index, df['Salary'], color='blue')
plt.title('Bar Plot of Salary')
plt.xlabel('EmployeeID')
plt.ylabel('Salary')
plt.show()
# Histogram
plt.figure(figsize=(10, 5))
plt.hist(df['Salary'], bins=10, color='green')
plt.title('Histogram of Salary')
plt.xlabel('Salary')
plt.ylabel('Frequency')
plt.show()

# Line plot
plt.figure(figsize=(10, 5))
# 'EmployeeID' became the index in step 6, so it is read from df.index
plt.plot(df.index, df['Salary'], color='purple', marker='o', linestyle='-')
plt.title('Line Plot of Salary')
plt.xlabel('EmployeeID')
plt.ylabel('Salary')
plt.grid(True)
plt.show()
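Besides changing the plot type, the style of the points themselves can be varied through matplotlib format strings. The following is a minimal sketch under a few assumptions: the sample points `x` and `y` are hypothetical, and the non-interactive "Agg" backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical sample points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 6, 5]

# A format string combines colour, marker and line style:
# "ro"  = red circles with no connecting line,
# "g--" = green dashed line without markers,
# "b^:" = blue triangles joined by a dotted line.
styles = ["ro", "g--", "b^:"]

fig, axes = plt.subplots(1, len(styles), figsize=(12, 3))
for ax, style in zip(axes, styles):
    ax.plot(x, y, style)
    ax.set_title(f"style {style!r}")
    ax.grid(True)

fig.tight_layout()
```

In an interactive session the backend line can be dropped and plt.show() called at the end to display the three panels.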
PROBLEM:- 02
Show the output of the following syntax:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['P', 'Q', 'R', 'S'])
df
Output:
   P  Q   R   S
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

df.drop(['Q', 'R'], axis=1)
Output:
   P   S
0  0   3
1  4   7
2  8  11

df.drop([0, 1])
Output:
   P  Q   R   S
2  8  9  10  11
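Note that drop returns a new DataFrame and leaves df itself untouched, which is why the second drop call still sees all four columns. A short sketch that checks this; the names `without_cols` and `without_rows` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["P", "Q", "R", "S"])

without_cols = df.drop(["Q", "R"], axis=1)  # new frame without columns Q and R
without_rows = df.drop([0, 1])              # new frame without rows 0 and 1

print(df.shape)            # the original frame is unchanged
print(without_cols.shape)
print(without_rows.shape)
```

To keep a result, assign it to a name as above, or assign it back to df.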
PROBLEM:- 03
import pandas as pd
import numpy as np
# dictionary of lists (named scores so that the built-in dict is not shadowed)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
# creating a DataFrame from the dictionary
df = pd.DataFrame(scores)
# using isnull() and notnull() to locate missing values
print(df.isnull())
print(df.notnull())
# filling missing values with 0, with the previous value (pad), and with the next value (bfill)
print(df.fillna(0))
print(df.ffill())
print(df.bfill())

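Output:-
1.
   First Score  Second Score  Third Score
0        False         False         True
1        False         False        False
2         True         False        False
3        False          True        False

2.
   First Score  Second Score  Third Score
0         True          True        False
1         True          True         True
2        False          True         True
3         True         False         True

3.
   First Score  Second Score  Third Score
0        100.0          30.0          0.0
1         90.0          45.0         40.0
2          0.0          56.0         80.0
3         95.0           0.0         98.0

4.
   First Score  Second Score  Third Score
0        100.0          30.0          NaN
1         90.0          45.0         40.0
2         90.0          56.0         80.0
3         95.0          56.0         98.0

5.
   First Score  Second Score  Third Score
0        100.0          30.0         40.0
1         90.0          45.0         40.0
2         95.0          56.0         80.0
3         95.0           NaN         98.0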
