Pandas & Numpy
In [ ]:
"""
What are the different types of data structures in pandas?
Series:
A one-dimensional labeled array holding data of a single dtype
(homogeneous). That dtype can be integer, float, string, and so on; a
Series that mixes Python types is stored with the generic object dtype.
Its values are mutable, i.e. they can be changed in place. Operations that
change its length, such as dropping entries, return a new Series by
default. The pd.Series() constructor converts a list, tuple, or
dictionary into a Series. A Series cannot contain multiple columns.
DataFrame:
A two-dimensional, tabular structure with heterogeneous data: each
column has its own dtype, for example int in one column and bool in
another. Data is aligned in rows and columns, and the corresponding
labels are called the row index and the column index. Both the size and
the values of a DataFrame are mutable. A DataFrame can also be thought
of as a dictionary of Series that share one index.
"""
In [ ]:
"""
What are the significant features of the pandas Library?
Fast and efficient DataFrame object with default and customized indexing.
High-performance merging and joining of data.
Data alignment and integrated handling of missing data.
Label-based slicing, indexing, and subsetting of large data sets.
Reshaping and pivoting of data sets.
Tools for loading data into in-memory data objects from different file formats.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
Time Series functionality.
"""
In [ ]:
"""
Different ways to create series in pandas?
"""
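# 1. Creating an empty Series (an explicit dtype avoids a warning in older pandas versions)
import pandas as pd
ser = pd.Series(dtype="float64")
print(ser)
# 2. Creating a Series from a dictionary (keys become the index; sample values assumed)
ser = pd.Series({"a": 1, "b": 2, "c": 3})
print(ser)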
In [ ]:
"""
Creating different types of DataFrames
"""
Empty DataFrame
Columns: []
Index: []
Amounts
0 110
1 202
2 303
3 404
4 550
5 650
Name Age
0 mark 20
1 zack 16
2 ron 24
Name Age
0 Max 10
1 Lara 31
2 Koke 91
3 muller 48
aa bs cd
0 1 2 3
1 10 20 30
0
0 10
1 20
2 30
3 40
one two
a 10 10
b 20 20
c 30 30
d 40 40
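The outputs above appear to come from several DataFrame constructions whose code did not survive the export. A minimal sketch of how such frames could be built, assuming the sample values are the ones printed above; the variable names are illustrative, not the original author's:

```python
import pandas as pd

# Empty DataFrame
df_empty = pd.DataFrame()

# From a list of values, with a column name
df_list = pd.DataFrame([110, 202, 303, 404, 550, 650], columns=["Amounts"])

# From a list of [name, age] rows
df_rows = pd.DataFrame([["mark", 20], ["zack", 16], ["ron", 24]], columns=["Name", "Age"])

# From a dictionary of equal-length lists
df_dict = pd.DataFrame({"Name": ["Max", "Lara", "Koke", "muller"], "Age": [10, 31, 91, 48]})

# From a list of dictionaries (keys become column names)
df_records = pd.DataFrame([{"aa": 1, "bs": 2, "cd": 3}, {"aa": 10, "bs": 20, "cd": 30}])

# From a dictionary of Series (the Series share the index a..d)
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
df_series = pd.DataFrame({"one": s, "two": s})

for frame in (df_empty, df_list, df_rows, df_dict, df_records, df_series):
    print(frame, end="\n\n")
```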
In [ ]:
In [ ]:
"""
How can we create a copy of the series in Pandas?
"""
0 s
1 c
2 a
3 l
4 a
5 r
dtype: object
Out[ ]:
0 s
1 c
2 a
3 l
4 a
5 r
dtype: object
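The code for this cell is missing from the export. One way to copy a Series is the `copy()` method; the sketch below assumes the Series was built from the letters shown in the output:

```python
import pandas as pd

original = pd.Series(list("scalar"))

# deep=True (the default) copies the data, so edits do not reach the original
deep_copy = original.copy()
deep_copy[0] = "X"

print(original[0])   # unchanged
print(deep_copy[0])  # modified copy
```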
In [ ]:
"""
Categorical data in python:
In [ ]:
"""
What is MultiIndexing in Pandas?
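MultiIndexing (hierarchical indexing) in pandas organizes data under
multiple levels of index labels on a single axis. It lets you work with
data that needs more than one key to identify a row or column.
"""
import pandas as pd
index = pd.MultiIndex.from_tuples([('A', 'a'), ('A', 'b'), ('B', 'a'), ('B', 'b')], names=['Group', 'Subgroup'])
df = pd.DataFrame({'Values': [1, 2, 3, 4]}, index=index)
print(df)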
Values
Group Subgroup
A a 1
b 2
B a 3
b 4
In [ ]:
"""
Convert a DataFrame to a NumPy array:
A pandas DataFrame can be converted to a NumPy array with the values
attribute, which returns a NumPy array representation of the DataFrame's
data. Current pandas documentation recommends the equivalent
df.to_numpy() method instead.
"""
import pandas as pd
import numpy as np
data = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df = pd.DataFrame(data)
numpy_array = df.values
print(numpy_array)
[[1 4 7]
[2 5 8]
[3 6 9]]
In [ ]:
"""
How to convert a DataFrame to an Excel file in pandas?
"""
import pandas as pd
data = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df = pd.DataFrame(data)
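file_path = 'data.xlsx'
# index=False leaves the row index out of the file (needs an Excel engine such as openpyxl)
df.to_excel(file_path, index=False)
# The same DataFrame written as CSV, to a separate .csv file
df.to_csv('data.csv', index=False)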
In [ ]:
"""
Timedelta in pandas:
A Timedelta can represent differences in time at various
resolutions (days, hours, minutes, seconds, milliseconds,
microseconds, and nanoseconds). You can create a Timedelta
object by subtracting two dates or times, or by using the
pd.Timedelta() constructor.
"""
import pandas as pd
td = pd.Timedelta(days = 5, hours = 5, minutes = 5,
seconds = 5, milliseconds = 5, microseconds = 5,
nanoseconds = 5)
print(td)
5 days 05:05:05.005005005
In [ ]:
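# Subtracting two Timestamps gives a Timedelta; adding a Timedelta shifts a Timestamp
start_date = pd.Timestamp('2022-01-01')
end_date = pd.Timestamp('2022-01-10')
print(end_date - start_date)
print(start_date + pd.Timedelta(days=5))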
9 days 00:00:00
2022-01-06 00:00:00
In [ ]:
In [ ]:
"""
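Is iterating over a pandas DataFrame a good practice?
If not, what are the important conditions to keep in mind before iterating?
Generally no: row-by-row iteration is much slower than vectorized
operations, so check first whether a vectorized alternative exists.
Iterative manipulations: avoid modifying the data you are iterating over.
Printing a DataFrame: print the DataFrame, or a slice of it, directly
rather than looping over its rows.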
"""
In [ ]:
"""
How to iterate over rows in a pandas DataFrame
"""
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
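The cell above defines the DataFrame but the iteration itself is missing. A minimal sketch of two common row-iteration methods, followed by the vectorized alternative; the new column name `A_plus_B` is only an illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# iterrows() yields (index, row-as-Series) pairs
for idx, row in df.iterrows():
    print(idx, row["A"], row["B"])

# itertuples() yields namedtuples and is usually faster than iterrows()
for row in df.itertuples(index=True):
    print(row.Index, row.A, row.B)

# Where possible, a vectorized expression replaces the loop entirely
df["A_plus_B"] = df["A"] + df["B"]
print(df)
```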
In [1]:
import pandas as pd
In [2]:
"""
Interpolating Along Columns or Rows:
The interpolate() function in pandas allows you to interpolate along either
the rows or the columns of a DataFrame, depending on the axis parameter.
This flexibility allows you to handle different data structures effectively.
"""
import numpy as np
import pandas as pd
# Sample data with missing values (assumed for illustration)
df = pd.DataFrame({'A': [1, np.nan, 3, 4, 5], 'B': [6, 7, np.nan, 9, 10]})
df_dropped = df.dropna()
print("\n DataFrame after dropping rows with missing values:\n ", df_dropped)
df_filled_value = df.fillna(0)
print("\n DataFrame after filling missing values with a specific value:\n ", df_filled_value)
df_filled_mean = df.fillna(df.mean())
print("\n DataFrame after filling missing values with the mean:\n ", df_filled_mean)
In [3]:
import pandas as pd
A B
3 4 9
4 5 10
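The interpolate() description earlier mentions the axis parameter, but no call to it appears in the code. A small sketch, with made-up sample values, of interpolating down the columns versus across the rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [10.0, 20.0, np.nan],
    "C": [100.0, np.nan, 300.0],
})

# axis=0 (default): fill each column using the values above and below
by_column = df.interpolate(axis=0)

# axis=1: fill each row using the values to the left and right
by_row = df.interpolate(axis=1)

print(by_column)
print(by_row)
```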
In [4]:
"""
The groupby() function in pandas is used to split the data into groups
based on some criteria. After splitting, the function applies a function
to each group independently and then combines the results back into a
DataFrame.
"""
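import pandas as pd
# Sample data (assumed), chosen to reproduce the output below
df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C'], 'Value': [20, 40, 20, 22, 2]})
grouped_df = df.groupby('Category').mean()
print(grouped_df)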
Value
Category
A 30.0
B 21.0
C 2.0
In [5]:
"""
Method Chaining in Pandas:
Method chaining in pandas involves calling multiple methods on a DataFrame
or Series object sequentially in a single line, which allows for more concise
and readable code.
"""
import pandas as pd
A B
0 5 10
1 4 9
2 3 8
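The chained expression that produced this output is not in the export. One chain that would give the same table, assuming the input values below:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 10]})

result = (
    df[df["A"] > 2]                      # keep rows where A is greater than 2
    .sort_values("A", ascending=False)   # largest A first
    .reset_index(drop=True)              # renumber rows from 0
)
print(result)
```

Wrapping the chain in parentheses keeps one method per line, which is easier to read and to comment than a single long line.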
In [ ]:
"""
The pivot_table() function in pandas is used to create a spreadsheet-style
pivot table as a DataFrame.
It allows users to summarize and aggregate data from a DataFrame according
to one or more keys.
"""
import pandas as pd
df = pd.DataFrame(data)
Category A B
Date
2022-01-01 10.0 20.0
2022-01-02 30.0 40.0
2022-01-03 50.0 NaN
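The data definition and the pivot_table() call are missing from this cell. A sketch that would reproduce the table above, assuming a long-format frame with `Date`, `Category`, and `Value` columns (the `Value` column name is a guess):

```python
import pandas as pd

data = {
    "Date": ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02", "2022-01-03"],
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 20, 30, 40, 50],
}
df = pd.DataFrame(data)

# One row per Date, one column per Category; aggfunc combines duplicate cells
pivot = df.pivot_table(index="Date", columns="Category", values="Value", aggfunc="mean")
print(pivot)
```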
In [6]:
"""
Handling duplicate rows in a DataFrame in pandas:
"""
import pandas as pd
df_no_duplicates = df.drop_duplicates()
print(df_no_duplicates)
import pandas as pd
A B
0 1 4
1 2 5
2 3 6
A B C
0 1 4 40
1 2 5 60
In [ ]:
"""
Descriptive Statistics:
mean(): Computes the mean of the values.
median(): Computes the median of the values.
mode(): Computes the mode of the values.
std(): Computes the standard deviation of the values.
var(): Computes the variance of the values.
Summary Statistics:
describe(): Generates descriptive statistics summary of the DataFrame.
Aggregation Functions:
sum(): Computes the sum of values.
count(): Computes the count of non-null values.
min(): Computes the minimum value.
max(): Computes the maximum value.
"""
"""
Skewness:
Skewness measures the asymmetry of the distribution of values around the
mean of the data. A distribution is symmetric if it looks the same on both
sides of the mean. Skewness quantifies the extent to which a distribution
differs from this symmetry. It can be positive, negative, or zero.
Positive skewness: The distribution has a longer right tail. The majority of
the data points are concentrated on the left side of the mean, and the tail
extends towards the right.
Negative skewness: The distribution has a longer left tail. The majority of the
data points are concentrated on the right side of the mean, and the tail extends
towards the left.
Kurtosis:
Kurtosis measures how heavy the tails of a distribution are relative to a normal
distribution. The terms below refer to excess kurtosis, which is 0 for a normal
distribution and is what pandas' kurt() returns.
Positive kurtosis (leptokurtic): The distribution has fatter tails than the normal
distribution. It indicates more extreme values than would be expected under a
normal distribution.
Negative kurtosis (platykurtic): The distribution has thinner tails than the normal
distribution. It indicates fewer extreme values than would be expected under a
normal distribution.
Mesokurtic: The distribution has kurtosis equal to that of the normal distribution.
"""
In [7]:
import pandas as pd
# Sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
'B': [5, 4, 3, 2, 1]}
df = pd.DataFrame(data)
# Descriptive statistics
print("Mean:", df.mean())
print("Median:", df.median())
print("Standard Deviation:", df.std())
print("Summary Statistics:\n ", df.describe())
# Aggregation functions
print("Sum:", df.sum())
print("Count:", df.count())
print("Minimum:", df.min())
print("Maximum:", df.max())
Mean: A 3.0
B 3.0
dtype: float64
Median: A 3.0
B 3.0
dtype: float64
Standard Deviation: A 1.581139
B 1.581139
dtype: float64
Summary Statistics:
A B
count 5.000000 5.000000
mean 3.000000 3.000000
std 1.581139 1.581139
min 1.000000 1.000000
25% 2.000000 2.000000
50% 3.000000 3.000000
75% 4.000000 4.000000
max 5.000000 5.000000
Correlation:
A B
A 1.0 -1.0
B -1.0 1.0
Covariance:
A B
A 2.5 -2.5
B -2.5 2.5
Sum: A 15
B 15
dtype: int64
Count: A 5
B 5
dtype: int64
Minimum: A 1
B 1
dtype: int64
Maximum: A 5
B 5
dtype: int64
Unique values in column A: [1 2 3 4 5]
Value counts in column B:
B
5 1
4 1
3 1
2 1
1 1
Name: count, dtype: int64
Skewness: A 0.0
B 0.0
dtype: float64
Kurtosis: A -1.2
B -1.2
dtype: float64
Groupby mean:
A
B
1 5.0
2 4.0
3 3.0
4 2.0
5 1.0
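The output above includes correlation, covariance, unique values, value counts, skewness, kurtosis, and a groupby mean, but the calls that printed them are not in the cell. A sketch of calls that would print those sections for the same sample DataFrame (the exact print layout is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 4, 3, 2, 1]})

print("Correlation:\n", df.corr())
print("Covariance:\n", df.cov())
print("Unique values in column A:", df["A"].unique())
print("Value counts in column B:\n", df["B"].value_counts())
print("Skewness:", df.skew())
print("Kurtosis:", df.kurt())   # excess kurtosis: 0 for a normal distribution
print("Groupby mean:\n", df.groupby("B").mean())
```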
In [ ]:
import pandas as pd
df_csv = pd.read_csv('sample.csv')
df_excel = pd.read_excel('sample.xlsx')
df_json = pd.read_json('sample.json')
url = 'https://example.com/sample.csv'
df_url = pd.read_csv(url)
print("CSV File:")
print(df_csv.head())
print("\n Excel File:")
print(df_excel.head())
print("\n JSON File:")
print(df_json.head())
print("\n Data from URL:")
print(df_url.head())
In [8]:
"""
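loc vs iloc:
loc selects rows and columns by label; iloc selects them by integer position.
"""
import pandas as pd
data = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
# Row labels X, Y, Z are assumed from the output below
df = pd.DataFrame(data, index=['X', 'Y', 'Z'])
print("Using loc:")
print("Single row 'Y':")
print(df.loc['Y'])
print("Using iloc:")
print("First row:")
print(df.iloc[0])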
Using loc:
Single row 'Y':
A 2
B 5
C 8
Name: Y, dtype: int64
In [12]:
"""
How to drop a row and a column in pandas:
"""
import pandas as pd
data = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df = pd.DataFrame(data)
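The cell builds the DataFrame but stops before dropping anything. A minimal sketch using `DataFrame.drop`; the choice of row label 1 and column "B" is arbitrary:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Drop a row by its index label (here the default integer label 1)
without_row = df.drop(index=1)

# Drop a column by name
without_column = df.drop(columns="B")

# drop() returns a new DataFrame; df itself is unchanged
print(without_row)
print(without_column)
```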
In [13]:
"""
Count the frequency of unique values:
"""
import pandas as pd
data = {
'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'C', 'B']
}
df = pd.DataFrame(data)
frequency_count = df['Category'].value_counts()
print(frequency_count)
frequency_count_df = frequency_count.reset_index()
frequency_count_df.columns = ['Category', 'Frequency']
print(frequency_count_df)
frequency_count_df = frequency_count_df.rename_axis('Index_Name')
print(frequency_count_df)
Category
A 4
B 3
C 2
Name: count, dtype: int64
Category Frequency
0 A 4
1 B 3
2 C 2
Category Frequency
Index_Name
0 A 4
1 B 3
2 C 2
In [ ]:
"""
Find the row for which the value of a specific column is min or max
"""
import pandas as pd
data = {
'A': [10, 20, 15, 25],
'B': [30, 25, 20, 35],
'C': [5, 10, 15, 20]
}
df = pd.DataFrame(data)
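max_row_A = df.loc[df['A'].idxmax()]  # idxmax() gives the index label; loc fetches that row
min_row_B = df.loc[df['B'].idxmin()]  # row where column B is smallest
print(max_row_A)
print(min_row_B)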
In [14]:
"""
groupby():
The groupby() function is used to split the DataFrame into groups based on some criteria.
It creates a GroupBy object that contains information about how the DataFrame is split.
You typically follow groupby() with an aggregation function to perform some operation on
each group.
"""
import pandas as pd
Value
sum mean max min count
Category
A 55 13.75 20 10 4
B 90 30.00 40 20 3
C 60 30.00 35 25 2
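The aggregation call for this table is missing. A sketch using `groupby().agg()` with several functions; the input values are assumptions chosen so that the result matches the table above:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "Value": [10, 20, 10, 15, 20, 30, 40, 25, 35],
})

# Several aggregations at once; the result has a two-level column index
summary = df.groupby("Category").agg({"Value": ["sum", "mean", "max", "min", "count"]})
print(summary)
```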
In [ ]:
"""
String Operation:
pandas provides a set of string functions for working with string data. They are
reached through the .str accessor of a Series or Index (for example s.str.lower())
and operate on each element. The following are a few of these operations:
lower(): Converts the strings in the Series/Index to lowercase.
upper(): Converts the strings in the Series/Index to uppercase.
strip(): Removes leading and trailing whitespace, including newlines, from every
string in the Series/Index.
islower(): Returns True for each string whose cased characters are all lowercase;
otherwise False.
isupper(): Returns True for each string whose cased characters are all uppercase;
otherwise False.
split(' '): Splits each string around the given separator or pattern.
cat(sep=' '): Concatenates the Series/Index elements using the given separator.
contains(pattern): Returns True for each element that contains the pattern;
otherwise False.
replace(a, b): Replaces occurrences of a with b.
startswith(pattern): Returns True for each element that begins with the pattern.
endswith(pattern): Returns True for each element that ends with the pattern.
find(pattern): Returns the position of the pattern's first occurrence in each
string (-1 if it is absent).
findall(pattern): Returns a list of all occurrences of the pattern in each string.
swapcase(): Swaps lowercase and uppercase characters.
Null values:
A null (missing) value appears when no data is available for an item.
Missing entries in a column are commonly represented as NaN.
pandas provides several useful functions for identifying, deleting, and changing null
values in DataFrames. The following are the functions.
isnull(): Returns a Boolean mask that is True wherever a value is null.
notnull(): The inverse of isnull(); returns True for non-null values.
dropna(): Removes the rows or columns that contain null values.
fillna(): Substitutes other values for the NaN values.
replace(): Replaces given values with others; it accepts a string, regex, list,
dictionary, Series, and more.
interpolate(): Fills null values in a Series or DataFrame with interpolated values.
Row and column selection: We can retrieve any row or column of the DataFrame by
specifying its name. A single selected row or column is one-dimensional and is
returned as a Series.
Filter Data: By applying Boolean conditions to a DataFrame, we can filter the data.
Count Values: value_counts() counts how many times each unique value occurs.
"""
In [15]:
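A short illustration of the string and null-value functions listed in the previous cell, using a made-up Series. String methods are called through the `.str` accessor and return one result per element:

```python
import numpy as np
import pandas as pd

s = pd.Series(["  Cat ", "dog", "BIRD", np.nan])

cleaned = s.str.strip().str.lower()
print(cleaned.tolist())                               # missing values stay NaN
print(cleaned.str.startswith("c").tolist())           # one result per element
print(cleaned.str.contains("ir", na=False).tolist())  # na=False treats NaN as no match
print(cleaned.str.replace("dog", "wolf").tolist())

print(s.isnull().tolist())                 # Boolean mask of missing values
print(cleaned.fillna("unknown").tolist())  # replace NaN with a constant
print(cleaned.dropna().tolist())           # remove NaN entries
```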
"""
apply():
The apply() method is used to apply a function along an axis of the DataFrame or Series.
It can be used with both DataFrame and Series objects.
When applied to a DataFrame, apply() allows you to apply a function along the rows or
columns (specified by the axis parameter).
When applied to a Series, apply() allows you to apply a function element-wise to each
element in the Series.
applymap():
The applymap() method is a DataFrame method that applies a function to every
element of the DataFrame.
It applies the function to each element independently, irrespective of rows or columns.
It is useful when you want an element-wise operation on every cell. Since pandas 2.1,
applymap() is deprecated in favor of DataFrame.map(), which does the same thing.
map():
The Series map() method substitutes each value in a Series with another value.
It accepts a function, a dictionary, or a Series, and is primarily used for mapping
values from one domain to another.
Before pandas 2.1, map() was available only on Series, not on DataFrames.
In short:
apply() applies a function along the rows or columns of a DataFrame,
or element-wise to a Series.
applymap() (DataFrame.map() in newer pandas) applies a function to every element of
a DataFrame.
Series.map() substitutes each value in a Series with another value.
"""
import pandas as pd
# Sample Series
s = pd.Series(['cat', 'dog', 'bird'])
A 6
B 15
dtype: int64
A B
0 1 16
1 4 25
2 9 36
0 feline
1 canine
2 avian
dtype: object
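Only the Series definition survived in this cell. The sketch below shows calls that would produce the three outputs above; the DataFrame values are inferred from those outputs, and the `hasattr` check is just one way to stay compatible with pandas versions before and after 2.1:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
s = pd.Series(["cat", "dog", "bird"])

# apply(): the function receives one whole column at a time (axis=0)
column_sums = df.apply(lambda col: col.sum())

# Element-wise on a DataFrame: DataFrame.map in pandas >= 2.1, applymap before that
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)

# Series.map(): substitute values through a dictionary
kinds = s.map({"cat": "feline", "dog": "canine", "bird": "avian"})

print(column_sums)
print(squared)
print(kinds)
```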
In [16]:
"""
merge():
The merge() function in pandas merges two DataFrames based on
the values of the specified columns. It is similar to SQL join operations.
It can perform inner, outer, left, and right joins.
join():
The join() method in pandas combines the columns of two potentially
differently-indexed DataFrames into a single result DataFrame.
It uses indexes to join DataFrames.
concat():
The pd.concat() function in pandas concatenates two or more
DataFrames along rows or columns.
It stacks the objects rather than matching rows on column values; along the
other axis, labels are aligned (outer join by default).
"""
import pandas as pd
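No example follows the descriptions above, so here is a minimal sketch contrasting the three operations. The frames `left` and `right` and their column names are illustrative:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# merge(): SQL-style join on a column
merged = pd.merge(left, right, on="key", how="inner")

# join(): combine on the index
joined = left.set_index("key").join(right.set_index("key"), how="left")

# concat(): stack frames along an axis
stacked = pd.concat([left, left], axis=0, ignore_index=True)

print(merged)
print(joined)
print(stacked)
```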
In [ ]:
"""
How Do you optimize the performance while working with large datasets in pandas?
Load less data: While reading data using pd.read_csv(), choose only the columns
you need with the “usecols” parameter to avoid loading unnecessary data. Plus,
specifying the “chunksize” parameter splits the data into different chunks and
processes them sequentially.
Avoid loops: Loops and iterations are expensive, especially when working with
large datasets. Instead, opt for vectorized operations, as they are applied on
an entire column at once, making them faster than row-wise iterations.
Use data aggregation: Where possible, aggregate the data first and run statistical
operations on the smaller aggregated result rather than on the entire dataset.
Parallel processing:
Dask offers a pandas-like API for working with large datasets. It can use multiple
cores or processes of your system to execute data tasks in parallel.
"""
In [ ]:
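The "load less data" and "avoid loops" advice from the previous cell can be sketched as follows. An in-memory string stands in for a large CSV file, and the column names are made up:

```python
import io
import pandas as pd

# In-memory stand-in for a large CSV file
csv_text = "id,name,score\n1,a,10\n2,b,20\n3,c,30\n4,d,40\n"

# usecols: load only the needed columns
scores = pd.read_csv(io.StringIO(csv_text), usecols=["id", "score"])

# chunksize: iterate over the file in pieces instead of loading it at once
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), usecols=["score"], chunksize=2):
    total += chunk["score"].sum()   # vectorized sum per chunk, no row loop

print(list(scores.columns), total)
```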
"""
Sort values based on columns:
"""
import pandas as pd
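The sorting code itself is missing from this cell. A minimal sketch with `sort_values`, using made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Age": [30, 25, 30, 20],
    "Score": [70, 90, 85, 60],
})

# Ascending by a single column
by_age = df.sort_values("Age")

# Descending by Age, with ties on Age broken by ascending Score
by_age_score = df.sort_values(["Age", "Score"], ascending=[False, True])

print(by_age)
print(by_age_score)
```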
In [17]:
"""
different ways to filter the values in pandas:
"""
import pandas as pd
print("Boolean Indexing:")
print(bool_filtered_df)
Boolean Indexing:
Name Age Category Score
2 Charlie 35.0 A 75
3 David 40.0 C 85
Query Method:
Name Age Category Score
2 Charlie 35.0 A 75
3 David 40.0 C 85
Loc Method:
Name Age Category Score
2 Charlie 35.0 A 75
3 David 40.0 C 85
Isin Method:
Name Age Category Score
0 Alice 25.0 A 80
1 Bob 30.0 B 90
2 Charlie 35.0 A 75
4 Emma NaN B 95
Isna Method:
Name Age Category Score
4 Emma NaN B 95
String Methods:
Name Age Category Score
0 Alice 25.0 A 80
2 Charlie 35.0 A 75
3 David 40.0 C 85
4 Emma NaN B 95
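Most of the filtering code for these outputs did not survive. The sketch below uses the Name/Age/Category/Score sample data that appears elsewhere in this notebook; the conditions (Age > 30, categories A and B, names containing "a") are inferred from the outputs and may differ from the original:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, None],
    "Category": ["A", "B", "A", "C", "B"],
    "Score": [80, 90, 75, 85, 95],
})

bool_filtered_df = df[df["Age"] > 30]                           # Boolean indexing
query_filtered_df = df.query("Age > 30")                        # query() with an expression string
loc_filtered_df = df.loc[df["Age"] > 30]                        # loc with a Boolean mask
isin_filtered_df = df[df["Category"].isin(["A", "B"])]          # membership test
isna_filtered_df = df[df["Age"].isna()]                         # rows with a missing Age
str_filtered_df = df[df["Name"].str.contains("a", case=False)]  # string method

print("Boolean Indexing:")
print(bool_filtered_df)
```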
In [ ]:
"""
How do you handle null or missing values in pandas?
You can use any of the following three methods to handle missing values in pandas:
dropna(): removes the rows or columns that contain missing values from the DataFrame.
fillna(): fills nulls with a specified value.
interpolate(): fills the missing values with computed interpolation values.
"""
In [ ]:
"""
Difference between fillna() and interpolate() methods
fillna():
fillna() fills the missing values with a given constant.
Older pandas versions also accept forward-filling or backward-filling through its
'method' parameter; since pandas 2.1 that parameter is deprecated in favor of the
ffill() and bfill() methods.
interpolate():
By default, this function fills the missing or NaN values with linearly interpolated
values. However, you can switch the interpolation technique to polynomial, time,
index, spline, etc., using its 'method' parameter.
Interpolation is well suited to ordered data such as time series, whereas fillna()
is a more generic approach.
"""
In [ ]:
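The difference between constant filling, forward/backward filling, and interpolation is easiest to see side by side. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

constant = s.fillna(0)    # constant fill
forward = s.ffill()       # forward fill: repeat the last valid value
backward = s.bfill()      # backward fill: use the next valid value
linear = s.interpolate()  # linear interpolation between neighbours

print(constant.tolist())
print(forward.tolist())
print(backward.tolist())
print(linear.tolist())
```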
"""
What is Resampling?
Resampling changes the frequency at which time series data is reported.
Imagine you have monthly time series data and want to convert it into weekly
or yearly data; this is where resampling is used.
Converting monthly data to weekly or daily data is upsampling. Interpolation
techniques are used to fill in the new, higher-frequency points.
Converting monthly data to yearly data is downsampling, where data aggregation
techniques are applied.
"""
In [18]:
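Downsampling and upsampling, as described in the resampling cell, can be sketched with `resample()`. This example uses daily and weekly frequencies rather than monthly and yearly ones, only because the "D" and "W" aliases are stable across pandas versions; the data is made up:

```python
import pandas as pd

# 14 daily observations starting on a Monday
daily = pd.Series(range(14), index=pd.date_range("2022-01-03", periods=14, freq="D"))

# Downsampling: daily -> weekly, aggregating each week with sum()
weekly = daily.resample("W").sum()

# Upsampling: weekly -> daily, filling the new points by linear interpolation
daily_again = weekly.resample("D").asfreq().interpolate()

print(weekly)
print(daily_again)
```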
"""
How do you perform one-hot encoding using pandas?
"""
import pandas as pd
df = pd.DataFrame(data)
new_df = pd.get_dummies(df.Name)
new_df.head()
Out[18]:
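The `data` dictionary for this cell and its output are missing. A self-contained sketch of `pd.get_dummies` on a made-up frame, showing both the single-column form used above and the whole-frame form:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["ram", "ravi", "ram"], "Age": [21, 25, 24]})

# One indicator column per distinct value of Name
name_dummies = pd.get_dummies(df["Name"])

# Encode the column within the frame, with a readable prefix
encoded = pd.get_dummies(df, columns=["Name"], prefix="Name")

print(name_dummies)
print(encoded)
```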
In [19]:
import pandas as pd
EmpData=pd.DataFrame({'Name': ['ram','ravi','sham','sita','gita'],
'id': [101,102,103,104,105],
'Gender': ['M','M','M','F','F'],
'Age': [21,25,24,28,25]
})
print(EmpData)
# Replacing values globally, across all the columns
# Wherever these values occur, replace them: 'M' --> 'Male' and 21 --> 30
EmpDataReplaced=EmpData.replace(to_replace={'M':'Male', 21:30}, inplace=False)
EmpDataReplaced
3 sita 104 F 28
4 gita 105 F 25
In [20]:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, None],
'Category': ['A', 'B', 'A', 'C', 'B'],
'Score': [80, 90, 75, 85, 95]
}
df = pd.DataFrame(data)
In [21]:
df.loc[(df['Age'] >= 30) & (df['Age'] < 40), 'Age'] = 35
Numpy questions
In [22]:
"""
What is the main data structure in NumPy?
The main data structure in NumPy is the ndarray, short for n-dimensional array.
It is a powerful data structure that allows for efficient storage and manipulation
of arrays containing homogeneous data (data of the same type).
Fixed Size: The size of a NumPy array is fixed upon creation; unlike a Python list,
it cannot grow in place, and changing the size creates a new array.
Efficient Computation: NumPy arrays are implemented in C, which allows for efficient
computation and vectorized operations.
Indexing and Slicing: Similar to Python lists, you can access elements of a NumPy
array using indexing and slicing.
"""
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print("NumPy Array:")
print(arr)
print("Type of array:", type(arr))
print("Data type of elements:", arr.dtype)
print("Shape of array:", arr.shape)
NumPy Array:
[1 2 3 4 5]
Type of array: <class 'numpy.ndarray'>
Data type of elements: int64
Shape of array: (5,)
In [ ]:
"""
NumPy is a fundamental library in data science and machine learning.
Random Number Generation: NumPy provides functions for generating random numbers and
sampling from various probability distributions. This is useful for simulating data,
bootstrapping, and conducting statistical experiments in data science.
Integration with Other Libraries: NumPy is a foundational library in the Python ecosystem
and is extensively used by other libraries and frameworks in data science, such as Pandas,
SciPy, Matplotlib, and scikit-learn. These libraries often accept NumPy arrays as input or
return NumPy arrays as output.
NumPy is implemented in C and Fortran, which makes it efficient for numerical computations.
It also provides interfaces to libraries written in these languages, enabling seamless
integration with existing computational libraries and frameworks.
NumPy provides a rich set of functions for linear algebra operations, including matrix
multiplication, eigenvalue decomposition, singular value decomposition, and solving linear
systems of equations. These operations are fundamental in many data science applications,
such as machine learning and optimization.
"""
In [ ]:
"""
There are several reasons why NumPy is an important library in Python:
Easy to use:
NumPy provides a simple and intuitive interface for working with numerical data in
Python. Its indexing and slicing syntax resembles that of Python's built-in sequences,
and it integrates well with other libraries, such as Matplotlib for visualization.
Flexibility:
NumPy arrays can hold many data types (integers, floats, booleans, strings, and
more), as long as each array uses a single dtype, and can be easily reshaped to fit
the needs of your application.
"""
In [ ]:
"""
Why is NumPy preferred over MATLAB or Octave?
"""
In [ ]:
"""
How are NumPy arrays better than Python's lists?
Python lists can store heterogeneous data types, whereas a NumPy array stores
elements of a single data type.
Because the data is homogeneous, NumPy can offer element-wise and mathematical
functions that operate on whole arrays; such functions cannot be applied in the
same way to heterogeneous lists.
NumPy arrays store their elements in one contiguous, typed block of memory, whereas
a list stores references to separate Python objects. This generally results in
lower memory usage.
NumPy provides various powerful and efficient functions for complex computations
on arrays.
"""
In [25]:
"""
what are the different types of data types in numpy?
"""
import numpy as np
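The answer to this question is missing from the cell. A brief sketch of the common dtype families and how to inspect or convert them; the sample values are arbitrary:

```python
import numpy as np

examples = {
    "integer": np.array([1, 2, 3], dtype=np.int32),
    "float": np.array([1.5, 2.5], dtype=np.float64),
    "complex": np.array([1 + 2j]),
    "boolean": np.array([True, False]),
    "string": np.array(["a", "bc"]),
}
for name, arr in examples.items():
    print(name, arr.dtype)

# astype() converts an array to another dtype (here the fraction is truncated)
as_int = examples["float"].astype(np.int64)
print(as_int, as_int.dtype)
```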
In [ ]:
"""
Here are a few examples of situations where NumPy might be useful:
Scientific computing:
NumPy provides a number of functions and features that are useful for scientific
computing tasks, such as numerical integration, linear algebra, and random number
generation.
Data analysis:
NumPy is often used as a foundation for other libraries that are used for data analysis,
such as Pandas and SciPy. It provides functions for reading and writing data to and from
files, as well as functions for performing statistical analysis and manipulating data.
Machine learning:
NumPy is frequently used in machine learning tasks, such as preparing data, creating
training and testing sets, and implementing algorithms. It provides a number of functions
that are useful for these tasks, such as matrix multiplication and element-wise operations.
Image processing:
NumPy is often used for image processing tasks, such as resizing and cropping images, as
well as applying filters and transformations. It provides functions for working with
arrays of pixel values, which can be used to represent images.
Data visualization:
NumPy is used to prepare the data behind visualizations such as histograms, scatter
plots, and line plots. It provides functions for generating and binning the data to be
plotted; the plots themselves are drawn with Matplotlib or other visualization libraries.
Data manipulation:
NumPy provides functions for efficiently manipulating large arrays of data, such as
selecting specific elements or subarrays, sorting, and reshaping.
Optimization:
NumPy provides np.argmin and np.argmax, which return the index of the smallest or largest
element of an array. They can be used, for example, to pick the best of several evaluated
parameter settings for a given model. General-purpose minimizers are provided by SciPy.
Signal processing:
NumPy provides functions for performing tasks such as filtering, convolution, and
correlation, which are commonly used in signal processing.
Text processing:
NumPy can be used to encode and decode text data for use in natural language processing
tasks.
Financial modeling:
NumPy can be used to perform financial modeling tasks, such as calculating returns, risk,
and portfolio optimization.
Simulation:
NumPy can be used to generate random numbers and perform simulations, such as Monte
Carlo simulations.
Computer vision:
NumPy can be used to process and manipulate images and video data for use in computer
vision tasks.
"""
In [26]:
"""
"""
Difference between mean() and average() in NumPy:
np.mean():
Computes the arithmetic mean of the elements in the array, with every element
weighted equally.
np.average():
Computes the weighted average of the elements in the array if the weights parameter is
specified; without weights it gives the same result as np.mean().
Allows the elements to contribute unequally to the final average based on their weights.
Useful when you want to give different importance to different elements in the array.
"""
import numpy as np
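# Example data
data = np.array([1, 2, 3, 4, 5])
# Unweighted arithmetic mean
print("Mean (np.mean()):", np.mean(data))
# Weighted average: each element contributes in proportion to its weight
weights = np.array([0.1, 0.2, 0.3, 0.2, 0.2])
weighted_average = np.average(data, weights=weights)
print("Weighted Average (np.average()):", weighted_average)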
In [27]:
"""
How do you count the frequency of a given positive value appearing in the NumPy array?
"""
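import numpy as np
arr = np.array([1, 2, 2, 3, 2, 4])  # sample array (assumed for illustration)
value_to_count = 2
print(np.count_nonzero(arr == value_to_count))  # number of elements equal to the value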
In [28]:
"""
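How is arr[:, 0] different from arr[:, [0]]? Give two similar examples.
"""
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result = arr[:, 0]  # an integer index drops that axis, giving a 1-D array
print("Shape of result:", result.shape)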
arr[:, 0]:
[1 4 7]
Shape of result: (3,)
In [29]:
import numpy as np
arr[:, [0]]:
[[1]
[4]
[7]]
Shape of result: (3, 1)
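The question also asks for two similar examples. A sketch of two other index pairs that show the same 1-D versus 2-D contrast, using the same 3x3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Example 1: the same contrast along rows
row_1d = arr[0]        # integer index drops the axis: shape (3,)
row_2d = arr[[0]]      # list index keeps the axis: shape (1, 3)

# Example 2: a length-1 slice also keeps the axis
col_1d = arr[:, 0]     # shape (3,)
col_2d = arr[:, 0:1]   # shape (3, 1)

print(row_1d.shape, row_2d.shape, col_1d.shape, col_2d.shape)
```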
In [30]:
"""
Vectorization in NumPy refers to the ability to apply operations element-wise on
entire arrays, which is more efficient than using traditional Python loops. It
leverages optimized C and Fortran code under the hood to execute these operations
efficiently.
"""
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
result = a + b
In [31]:
"""
Convert a DataFrame into a NumPy array:
"""
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
numpy_array = df.values
print("NumPy Array:")
print(numpy_array)
NumPy Array:
[[1 4 7]
[2 5 8]
[3 6 9]]
In [ ]:
"""
How is Vectorization related to Broadcasting in NumPy?
Vectorization means delegating array operations to NumPy's optimized, compiled C
routines instead of Python loops, which results in faster code. Broadcasting is the
set of rules that lets NumPy perform arithmetic on arrays of different shapes. The
shapes must be compatible: compared from the trailing dimension backwards, each pair
of dimensions must be equal or one of them must be 1. Broadcasting then behaves as if
the smaller array were repeated along the larger one, without actually copying the
data. Broadcasting therefore extends vectorized operations to arrays of different
dimensions.
"""
In [32]:
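The broadcasting rules from the previous cell can be seen in a few lines. The arrays are made up; the last operation is expected to fail because the trailing dimensions (3 and 2) are neither equal nor 1:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
col = np.array([[100], [200]])              # shape (2, 1)

plus_row = matrix + row   # row is applied to every row of matrix
plus_col = matrix + col   # col is applied to every column of matrix
print(plus_row)
print(plus_col)

# Incompatible shapes raise an error instead of being stretched
try:
    matrix + np.array([1, 2])               # (2, 3) vs (2,)
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("broadcast error:", exc)
```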
"""
Write a program to repeat each of the elements five times for a given array.
"""
import numpy as np
Original Array: [1 2 3 4 5]
Array with each element repeated five times: [1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5
5 5 5 5]
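The program itself is missing from the cell. `np.repeat` produces the output shown above; `np.tile` is included only for contrast:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
repeated = np.repeat(arr, 5)   # each element repeated in place: 1 1 1 1 1 2 2 ...
tiled = np.tile(arr, 2)        # for contrast: the whole array repeated: 1 2 3 4 5 1 2 ...

print("Original Array:", arr)
print("Array with each element repeated five times:", repeated)
```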
In [33]:
""" how to add zero at border in numpy """
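import numpy as np
existing_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Existing Array:")
print(existing_array)
# np.pad with pad_width=1 adds a one-element border of zeros on every side
print(np.pad(existing_array, pad_width=1, mode='constant', constant_values=0))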
Existing Array:
[[1 2 3]
[4 5 6]
[7 8 9]]
In [34]:
"""
How to split an array into different parts in NumPy?
"""
import numpy as np
Original Array: [1 2 3 4 5 6 7 8 9]
Sub-Arrays: [array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
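The splitting code is missing from the cell. `np.split` with three sections reproduces the output above; the other two calls show common variations:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

equal_parts = np.split(arr, 3)           # three equal parts; raises an error if not divisible
uneven_parts = np.array_split(arr, 4)    # allows parts of unequal length
at_indices = np.split(arr, [2, 5])       # cut before positions 2 and 5

print("Original Array:", arr)
print("Sub-Arrays:", equal_parts)
```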
In [35]:
"""
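How to reshape and resize an array in NumPy?
"""
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
resized_arr = np.resize(arr, (3, 4))  # np.resize repeats the data to fill the larger shape
print("Resized Array:")
print(resized_arr)
print("Reshaped Array:", arr.reshape(3, 2), sep="\n")  # reshape keeps the same six elements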
Resized Array:
[[1 2 3 4]
[5 6 1 2]
[3 4 5 6]]
Reshaped Array:
[[1 2]
[3 4]
[5 6]]
In [36]:
import numpy as np
Original Array: [3 1 2 5 4]
Sorted Array: [1 2 3 4 5]
In [37]:
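import numpy as np
# Sample dictionary (assumed for illustration)
data_dict = {'a': 1, 'b': 2, 'c': 3}
values = list(data_dict.values())
numpy_array = np.array(values)
print(numpy_array)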
In [38]:
In [38]:
"""
Arrays (numpy.ndarray):
Arrays can have any number of dimensions (1D, 2D, 3D, etc.).
Arrays are the fundamental data structure in NumPy.
Arrays support element-wise operations.
Arrays are more flexible and commonly used in numerical computing and data analysis.
You can create arrays using np.array() function.
Matrices (numpy.matrix):
Matrices are a subclass of arrays and always have exactly two dimensions (rows and
columns).
Matrices perform matrix multiplication with the * operator; for arrays, * is
element-wise and @ performs matrix multiplication.
Matrices have some additional attributes, like I for the inverse; T for the
transpose is available on both matrices and arrays.
Matrices can be less flexible compared to arrays, especially when dealing with operations
beyond linear algebra, and the NumPy documentation no longer recommends np.matrix.
You can create matrices using np.matrix() function.
"""
import numpy as np
array_a = np.array([[1, 2], [3, 4]])
matrix_b = np.matrix([[1, 2], [3, 4]])
print("Array:")
print(array_a)
print("Type of array:", type(array_a))
print("\n Matrix:")
print(matrix_b)
print("Type of matrix:", type(matrix_b))
Array:
[[1 2]
[3 4]]
Type of array: <class 'numpy.ndarray'>
Matrix:
[[1 2]
[3 4]]
Type of matrix: <class 'numpy.matrix'>
In [ ]:
In [ ]: