Numpy Basics Introduction To


Introduction to

NumPy Basics
NumPy
NumPy stands for Numerical Python. It is a library for working with arrays.

It also provides functions for matrix operations and linear algebra.

NumPy was released in 2005. It is an open-source library and is free to use.

Why NumPy?

Python lists can serve as arrays, but they are slow to process. NumPy arrays are often cited as being up to 50x faster than lists.

The array type in NumPy is ndarray. It comes with many supporting functions that make working with arrays easy.

Arrays are used frequently in data science, where speed and accuracy are very important. NumPy provides both when operating on arrays.
Array
An array is a collection of data stored in contiguous memory locations.

All elements stored in an array have the same data type.

An array is a data structure that stores elements of the same data type in an organized way for faster access.

Types of Array

● One-dimensional array: stores data sequentially. Elements are accessed by their index position in the array.
Multi-Dimensional Array

In simple terms, a multidimensional array is an array of arrays.

Data in a multidimensional array is stored in tabular form. 2D and 3D arrays are examples of multidimensional arrays. In a 2D array, the elements are stored in rows and columns.
Creating a NumPy array - One-dimensional
NumPy is used to work with arrays. An array object in NumPy is called an ndarray.

We can create an ndarray with NumPy's array() function.

Passing a flat list to array() gives a 1-dimensional array.
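As a minimal sketch of the step above (the values are illustrative and not from the original slide), a one-dimensional ndarray can be built from a Python list like this:

```python
import numpy as np

# Build a one-dimensional ndarray from a Python list (illustrative values).
arr = np.array([1, 2, 3, 4, 5])

print(arr)        # [1 2 3 4 5]
print(type(arr))  # <class 'numpy.ndarray'>
print(arr.ndim)   # 1 -> the array has one dimension
```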
Creating a NumPy array - Two-dimensional
We can pass a list or a tuple to the array() function to create an ndarray. Passing a tuple works the same way as passing a list.

How to create a 2D array?

A 2D array stores data in rows and columns. To create one, pass a sequence of equal-length sequences to array(), one inner sequence per row.
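The following sketch shows both ideas with made-up values: a tuple passed to array(), and a nested list that produces a 2D array. The variable names are only for illustration.

```python
import numpy as np

# A tuple works just like a list as input to array().
from_tuple = np.array((1, 2, 3, 4))

# A list of equal-length lists gives a 2D array: one inner list per row.
arr_2d = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

print(from_tuple.ndim)  # 1
print(arr_2d.ndim)      # 2
print(arr_2d[1, 2])     # 7 -> row index 1, column index 2
```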
NumPy array Shape
The shape of an array is the number of elements in each dimension, e.g. the number of rows and columns of a 2D array.

Output:
(2, 4)
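A small sketch of reading the shape attribute, assuming a 2-row, 4-column array with illustrative values chosen to match the (2, 4) output:

```python
import numpy as np

one_d = np.array([1, 2, 3, 4, 5])
two_d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8]])

# shape is a tuple with one entry per dimension.
print(one_d.shape)  # (5,)
print(two_d.shape)  # (2, 4) -> 2 rows, 4 columns
```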
NumPy Array - Reshaping
Reshaping means changing the shape of an array. By reshaping we can add or remove dimensions or change the number of elements in each dimension, as long as the total number of elements stays the same.

From 1D to 2D

Output:

[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11
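A sketch of reshaping from 1D to 2D, assuming a 12-element input (the original input array is not shown, so the values here are an assumption):

```python
import numpy as np

# Twelve elements: 1, 2, ..., 12 (assumed input for illustration).
arr = np.arange(1, 13)

# 4 rows x 3 columns; the total number of elements (12) must not change.
reshaped = arr.reshape(4, 3)
print(reshaped.shape)  # (4, 3)

# -1 lets NumPy infer a dimension; here it flattens back to 1D.
flat = reshaped.reshape(-1)
print(flat.shape)      # (12,)
```

Asking for a shape whose element count differs from the input, for example `arr.reshape(5, 3)`, raises a ValueError.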
NumPy Array - Joining
Joining means putting the contents of two or more arrays into a single array. In NumPy, arrays are joined along an axis. The concatenate() function joins a sequence of arrays along a given axis; if axis is not passed explicitly, it defaults to 0.

Output:
[1,2,3,4,5,6]
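A minimal sketch of concatenate() with illustrative arrays, first along the default axis and then along axis 1 for 2D inputs:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Default axis=0: for 1D arrays this appends b after a.
joined = np.concatenate((a, b))
print(joined)  # [1 2 3 4 5 6]

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

# axis=1 joins 2D arrays side by side (column-wise).
side_by_side = np.concatenate((m1, m2), axis=1)
print(side_by_side.shape)  # (2, 4)
```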
NumPy Array - Splitting
Splitting is the opposite of joining: joining merges several arrays into one, while splitting breaks one array into two or more. The array_split() function is used to split an array.

Splitting an array into three parts:

Output:
[array([1, 2]), array([3, 4]), array([5, 6])]
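A sketch of array_split() with an assumed six-element input that reproduces the output above, plus a case where the length is not evenly divisible:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Split into three parts; the result is a list of arrays.
parts = np.array_split(arr, 3)
print(parts)  # [array([1, 2]), array([3, 4]), array([5, 6])]

# array_split also handles uneven splits: earlier parts get the extra elements.
uneven = np.array_split(np.array([1, 2, 3, 4, 5, 6, 7]), 3)
print([p.tolist() for p in uneven])  # [[1, 2, 3], [4, 5], [6, 7]]
```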
NumPy Array - Sorting
Sorting means arranging elements in an ordered sequence. The order can be numeric or alphabetical, ascending or descending.

The ndarray object has a sort() method that sorts the array in place; the np.sort() function returns a sorted copy instead.

Output:
[0 1 2 3]
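A short sketch of sorting, assuming an unsorted input that yields the output above (the original input is not shown):

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

# np.sort returns a sorted copy and leaves the original untouched.
ascending = np.sort(arr)
print(ascending)   # [0 1 2 3]

# Reversing the sorted copy gives descending order.
descending = ascending[::-1]
print(descending)  # [3 2 1 0]

# Strings are sorted alphabetically.
fruits = np.sort(np.array(["banana", "cherry", "apple"]))
print(fruits)      # ['apple' 'banana' 'cherry']
```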
Random numbers in NumPy
A random number is one that cannot be predicted logically.

NumPy offers the random module for working with random numbers. The randint() function generates random integer values. E.g. generate a random integer from 0 to 10

Output:
20
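A sketch of randint() under the assumption that the range should include 10. Note that the upper bound of randint() is exclusive, so 11 is passed as the high value; the result differs on every run.

```python
import numpy as np

# randint(low, high) draws from low (inclusive) to high (exclusive),
# so high=11 is needed for 10 to be a possible result.
value = np.random.randint(0, 11)
print(value)  # some integer between 0 and 10

# Passing size returns an array of random integers instead of one value.
values = np.random.randint(0, 11, size=5)
print(values.shape)  # (5,)
```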
NumPy array operations
Adding two NumPy arrays

Output: [3 5 7]

Matrix multiplication is performed on 2D arrays.
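A sketch of both operations. The input arrays are assumptions chosen so that the sum matches the [3 5 7] output above; the matrices are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

# + adds arrays element by element.
total = a + b
print(total)  # [3 5 7]

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

# @ (or np.matmul) performs matrix multiplication on 2D arrays.
product = m1 @ m2
print(product.tolist())      # [[19, 22], [43, 50]]

# * is element-wise multiplication, which is a different operation.
elementwise = m1 * m2
print(elementwise.tolist())  # [[5, 12], [21, 32]]
```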
Introduction to
Pandas Basics
Pandas Data structure
Pandas has historically provided three types of data structures:
1. Series
2. DataFrame
3. Panel (deprecated and removed from recent pandas releases)

Series:
A Series is a 1-dimensional labeled array capable of storing data of one type (int, float, string, etc.).
Its size is immutable, but its values are mutable. It stores homogeneous data. For example:

56 34 24 75 47 10 23

As shown in the example above, the Series is a homogeneous labeled array of integers.


Creating - Pandas Series
# creating a pandas Series from an ndarray
import pandas as pd  # alias the pandas library as pd
import numpy as np

array = np.array(["raj", "sonu", "krish"])
series = pd.Series(array)
print(series)

Output:
0      raj
1     sonu
2    krish
dtype: object
Pandas Data structure - DataFrame
DataFrame:
● A DataFrame is a 2-dimensional array with heterogeneous data.
● Its size is mutable.
● Its data is mutable.
● We mostly use pandas data structures for data analysis.

The table shows an example of a 2-dimensional DataFrame. Each column has its own data type.

Column:  Name     Age   salary   Height(ft)
Type:    String   int   object   float

         Krish    22    22k      5.6
         Raj      21    21k      4.9
         Pinki    15    27k      6.1
         Sonu     17    33k      5.8

● It is heterogeneous because it stores multiple types of data, such as string, int, float and object.
● It has four columns and four rows of data (five rows counting the header).
Creating dataframe from dict
import pandas as pd

# named "data" rather than "dict" so the built-in dict type is not shadowed
data = {
    'Name': ['Radha', 'Krishan', 'Krish'],
    'Rupee': ['20', '45', '56'],
    'food': ['Egg', 'pIza', 'Choc']
}
df = pd.DataFrame(data)
print(df)

Output:
      Name Rupee  food
0    Radha    20   Egg
1  Krishan    45  pIza
2    Krish    56  Choc

# creating a DataFrame from a dict of Series

import pandas as pd

data = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
        'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(data)
print(df)

Output:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

*Detail explanation about pandas dataframe will be covered in


Reading CSV, Excel and JSON files with pandas
To import a CSV, Excel or JSON file, follow the steps below:

1. import pandas as pd
2. variable_name = pd.read_csv("File path")    # for a CSV file
   variable_name = pd.read_excel("File path")  # for an Excel file
   variable_name = pd.read_json("File path")   # for a JSON file

By default each of these functions returns a DataFrame directly, so no separate conversion step is needed:
df = pd.read_csv("File path")
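A self-contained sketch of reading CSV data. To avoid depending on a file on disk, it reads from an in-memory string through io.StringIO; the column names and values are made up for illustration. With a real file you would pass its path to read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (hypothetical data).
csv_text = "Name,Age\nRaj,21\nSonu,17\n"

# read_csv accepts a path or a file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)  # DataFrame
print(df.shape)           # (2, 2)
print(list(df.columns))   # ['Name', 'Age']
```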
Indexing and Selecting Data with Pandas
Indexing in pandas: indexing means selecting particular rows and columns of data from a DataFrame.

Selecting in pandas: selecting can mean taking all of the rows and some of the columns, some of the rows and all of the columns, or some of the rows and some of the columns.

Pandas has built-in indexers:

● DataFrame[ ]: also known as the indexing operator.
● DataFrame.loc[ ]: selects by label.
● DataFrame.iloc[ ]: selects by integer position.

Selecting a single column:

print(df['Column_name'])  # df is the DataFrame; 'Column_name' is the column you want to print
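The three indexers can be sketched on a small, made-up DataFrame as follows (the row labels 'a', 'b', 'c' are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Krish", "Raj", "Pinki"], "Age": [22, 21, 15]},
    index=["a", "b", "c"],
)

# [] selects a column by name and returns a Series.
names = df["Name"]

# loc selects by label: row label first, then column label.
raj_age = df.loc["b", "Age"]

# iloc selects by integer position: row position first, then column position.
first_cell = df.iloc[0, 0]

# Slices select several rows; with iloc the end position is excluded.
first_two = df.iloc[0:2]

print(names.tolist())   # ['Krish', 'Raj', 'Pinki']
print(raj_age)          # 21
print(first_cell)       # Krish
print(first_two.shape)  # (2, 2)
```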
Missing Data - Handling & Filtering
Real-world data is often messy and contains many NaN and null values. Pandas has many built-in functions for dealing with missing values.

Finding the columns with missing values:

import pandas as pd
df = pd.read_csv("/content/titanic_test.csv")  # import the dataset
df.isnull().any()  # isnull() is a pandas method that flags null values; any() reports them per column

Output:
PassengerId    False
Pclass         False
Name           False
Sex            False
Age            True
SibSp          False
Parch          False
Ticket         False
Fare
Embarked       False

True means the column has at least one null or missing value.
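Once the columns with missing values are known, two common ways to handle them are filling and dropping. The sketch below uses a tiny made-up DataFrame rather than the Titanic file; filling with the column mean is one possible strategy, not the only one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Raj", "Sonu", "Krish"],
    "Age": [21.0, np.nan, 22.0],  # one missing value
})

# Per-column check: True where the column contains at least one null.
has_null = df.isnull().any()
print(has_null["Age"])   # True
print(has_null["Name"])  # False

# Option 1: fill missing ages with the mean of the known ages (21.5 here).
filled = df.fillna({"Age": df["Age"].mean()})

# Option 2: filter out every row that contains a missing value.
dropped = df.dropna()

print(filled["Age"].tolist())  # [21.0, 21.5, 22.0]
print(len(dropped))            # 2
```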
Adding new column to existing DataFrame in Pandas
Using the DataFrame.insert() method:

DataFrame.insert(col_index, col_name, row_values)

col_index: index position at which to add the column
col_name: name of the new column
row_values: the data to insert
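A minimal sketch of insert() on a made-up DataFrame. The "City" column and its values are hypothetical; note that insert() changes the DataFrame in place and returns None.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Raj", "Sonu"], "Age": [21, 17]})

# Insert a new column at position 1, between "Name" and "Age".
df.insert(1, "City", ["Pune", "Delhi"])

print(list(df.columns))     # ['Name', 'City', 'Age']
print(df["City"].tolist())  # ['Pune', 'Delhi']
```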
Merging DataFrames in Pandas
Pandas provides various facilities for easily combining Series or DataFrames, with various kinds of set logic for the indexes and relational algebra functionality for join / merge-type operations.

Pandas built-in functions for merging:

● pd.concat([df1, df2])
● dataframe.append(df1) (deprecated in pandas 1.4 and removed in 2.0; use pd.concat instead)
● pd.merge()
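A sketch of the two main combining functions on made-up data. The frame and column names ("id", "name", "salary") are assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "name": ["Raj", "Sonu"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["Krish", "Pinki"]})

# concat takes a list of frames and stacks them; ignore_index renumbers rows.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.shape)  # (4, 2)

salaries = pd.DataFrame({"id": [1, 2, 3], "salary": [21000, 33000, 22000]})

# merge combines columns from two frames using a shared key column.
# The default is an inner join, so id 4 (no salary row) is left out.
merged = pd.merge(stacked, salaries, on="id")
print(merged["id"].tolist())  # [1, 2, 3]
```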
Joins in DataFrame
Two DataFrames can be joined by combining rows from both tables based on a related column between them. Joins are classified as:

1. INNER JOIN: selects records that have matching values in both tables.

2. FULL OUTER JOIN: selects all records from both tables, matching left and right records where possible.

3. LEFT OUTER JOIN: selects all records from the first (left-most) table, together with the matching right table records.

4. RIGHT OUTER JOIN: selects all records from the second (right-most) table, together with the matching left table records.
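In pandas these four join types map to the how argument of pd.merge(). The sketch below uses two small made-up tables that share a "key" column; unmatched cells are filled with NaN.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

inner = pd.merge(left, right, on="key", how="inner")       # keys in both tables
outer = pd.merge(left, right, on="key", how="outer")       # keys in either table
left_join = pd.merge(left, right, on="key", how="left")    # every left key
right_join = pd.merge(left, right, on="key", how="right")  # every right key

print(sorted(inner["key"]))       # ['b', 'c']
print(sorted(outer["key"]))       # ['a', 'b', 'c', 'd']
print(sorted(left_join["key"]))   # ['a', 'b', 'c']
print(sorted(right_join["key"]))  # ['b', 'c', 'd']
```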
Introduction to
Matplotlib
What is Data visualization?
Data visualization is the graphical representation of information and data.

The human mind grasps a visual representation of data more easily than raw (text / numerical) data.

It is better to represent data through graphs and other visual aids, which let us analyze the data more efficiently and make specific decisions based on that analysis.

Data visualization is very helpful for understanding a dataset: it helps find hidden patterns in the data and shows how one variable is related to others.

Data visualization can do the following tasks:

1. It summarises large data.
2. It makes it easier to identify patterns and trends.
3. It compares the columns with the target variable(s).
4. It supports forecasting.
5. It helps to understand which product to place where.
Basic example of plotting graph
from matplotlib import pyplot as plt  # importing pyplot from the matplotlib library

x = [1, 2, 5]
y = [2, 4, 6]

plt.plot(x, y)
plt.show()  # display the graph

Add labels in graph
from matplotlib import pyplot as plt

x = [1, 2, 5]
y = [2, 4, 6]
plt.plot(x, y)
plt.title("X-Y plot")  # adding title
plt.xlabel('X-axis')   # adding X label
plt.ylabel('Y-axis')   # adding Y label
plt.show()             # display the graph
Subplot Function
The subplot() function is used to draw multiple plots in one figure.
We can use it to separate graphs that would otherwise be plotted on the same axes.

Matplotlib supports many subplot layouts.

● It accepts three arguments:
nrows - number of rows
ncols - number of columns
index - position of the subplot in the grid, counted from 1

from matplotlib import pyplot as plt

vehical = ['car', 'bus', 'bike']
no_of_vehical = [20, 45, 12]

plt.figure(figsize=(9, 3))

plt.subplot(131)  # 1 row, 3 columns, first plot
plt.bar(vehical, no_of_vehical)
plt.subplot(132)  # second plot
plt.scatter(vehical, no_of_vehical)
plt.subplot(133)  # third plot
plt.plot(vehical, no_of_vehical)
plt.suptitle('Vehical record')
plt.show()
Type of graph: Bar Graph
Bar graph: Bar graphs are one of the most common types of graphs and are used to show categorical data.

Matplotlib provides a bar() function to make bar graphs. It accepts arguments such as the categorical variables, their values and a color.

Ex:
from matplotlib import pyplot as plt
Team = ['KLP','RCB','RR','Mi']
match_wins = [11,8,15,7]
plt.bar(Team,match_wins,color = 'orange')
plt.title('Score Card')
plt.xlabel('Team')
plt.ylabel('match_wins')
plt.show()
Type of graph: Line Graph
Line graph: A line graph is a type of chart used to show information that changes over time. We plot line graphs using several points connected by straight lines.

Ex:
# Basic example of plotting line graph

from matplotlib import pyplot as plt

x = [1, 2, 5]
y = [2, 4, 6]
plt.plot(x, y, color='red')
plt.show()  # display the graph
Type of graph: Pie chart
Pie chart: A pie chart is a type of graph that represents data as a circle divided into slices. The slices of the pie show the relative size of the data as percentages.

Ex:
from matplotlib import pyplot as plt
Team = ['KLP', 'RCB', 'RR', 'Mi']
match_winning_percent = [23, 45, 10, 22]
plt.pie(match_winning_percent, labels=Team, autopct='%1.1f%%',
        shadow=True, startangle=90)
plt.title('Score Card')
plt.show()
Type of graph: Histogram
Histogram: A histogram is a graphical representation that organizes a group of data points into user-defined ranges. It is similar in appearance to a bar graph, but it condenses a data series into an easily interpreted visual by taking many data points and grouping them into logical ranges, or bins.

Ex:
from matplotlib import pyplot as plt
height = [132, 122, 145, 165, 145, 135, 137, 160, 122, 111, 190, 189, 178, 155]
bins = [100, 120, 140, 160, 180, 200]
plt.hist(height, bins, histtype='bar', rwidth=0.8, color='red')
plt.title('Height of students')
plt.show()
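To see the grouping that the histogram draws, the same binning can be computed as plain numbers with NumPy's histogram() function. This sketch reuses the heights and bin edges from the example, reading the final "178.155" entry as the two values 178 and 155, which is an assumption.

```python
import numpy as np

height = [132, 122, 145, 165, 145, 135, 137, 160, 122, 111, 190, 189, 178, 155]
bins = [100, 120, 140, 160, 180, 200]

# counts[i] is the number of heights falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(height, bins=bins)

print(counts.tolist())  # [1, 5, 3, 3, 2]
print(counts.sum())     # 14 -> every height lands in exactly one bin
```

Each bin includes its lower edge and excludes its upper edge, except the last bin, which includes both.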
Type of graph: Scatter Plot
Scatter plot: A scatter plot is a diagram where each value in the data set is represented by a dot. The Matplotlib module has a method for drawing scatter plots.

Ex:
from matplotlib import pyplot as plt
from matplotlib import style
style.use('ggplot')

x1 = [1, 7, 13]
y1 = [12, 11, 6]
x2 = [4, 9, 11, 5]
y2 = [7, 14, 17, 6]
plt.scatter(x1, y1)
plt.scatter(x2, y2, color='y')
plt.title('Scatter plot')
plt.ylabel('Y-axis')
plt.xlabel('X-axis')
plt.show()
Type of graph: 3D Graph
3D graph: A 3D graph is composed of three axes: X-axis, Y-axis and Z-axis. Three-dimensional plots can be created by importing the mplot3d toolkit and passing the projection='3d' argument.

Ex:
from mpl_toolkits import mplot3d  # registers the 3D projection
import numpy as np
import matplotlib.pyplot as plt

height = np.array([100, 110, 87, 85, 65, 80, 96, 75, 42, 59, 54, 63, 95, 71, 86])
weight = np.array([105, 123, 84, 85, 78, 95, 69, 42, 87, 91, 63, 83, 75, 41, 80])
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.scatter3D(height, weight)  # plots a 3D scatter; with no z values given, the points lie at z = 0
plt.title("3D Scatter Plot")
plt.xlabel("Height")
plt.ylabel("Weight")
plt.show()
Thank You...
