DS Unit 3 Part 1


4-2 B.Tech IT Regulation: R19 Data Science: UNIT-3 Part-1

UNIT-3
Python for Data Handling
Syllabus: UNIT III:part-1
Python for Data Handling: Basics of Numpy arrays – aggregations – computations on arrays
– comparisons, masks, Boolean logic – fancy indexing – structured arrays

Introduction to NumPy Arrays


 Datasets can come from a wide range of sources and a wide range of
formats, including collections of documents, collections of images,
collections of sound clips, collections of numerical measurements, or
nearly anything else. Despite this apparent heterogeneity, it will help us
to think of all data fundamentally as arrays of numbers.
 For example, images—particularly digital images—can be thought of as
simply two dimensional arrays of numbers representing pixel brightness
across the area. Sound clips can be thought of as one-dimensional arrays
of intensity versus time. Text can be converted in various ways into
numerical representations, perhaps binary digits representing the
frequency of certain words or pairs of words. No matter what the data are,
the first step in making them analyzable will be to transform them into
arrays of numbers.
 For this reason, efficient storage and manipulation of numerical arrays is
fundamental to the process of doing data science.
 NumPy (short for Numerical Python) provides an efficient interface to
store and operate on dense data buffers.
 In some ways, NumPy arrays are like Python’s built-in list type, but
NumPy arrays provide much more efficient storage and data operations as
the arrays grow larger in size.
 NumPy arrays form the core of nearly the entire ecosystem of data
science tools in Python.
 NumPy in Python is a library that is used to work with arrays and was
created in 2005 by Travis Oliphant.
 The NumPy library has functions for working in the domains of Fourier
transforms, linear algebra, and matrices.
 In particular, NumPy arrays provide an efficient way of storing and
manipulating data. NumPy also includes a number of functions that make
it easy to perform mathematical operations on arrays, which is
useful for scientific and engineering applications.
Basics of Numpy Arrays
 Categories of basic array manipulations are:
1. Attributes of arrays
Determining the size, shape, memory consumption, and data types of
arrays

Prepared By: MD SHAKEEL AHMED, Associate Professor, Dept. Of IT, VVIT, Guntur Page 1
2. Indexing of arrays
Getting and setting the value of individual array elements
3. Slicing of arrays
Getting and setting smaller subarrays within a larger array
4. Reshaping of arrays
Changing the shape of a given array
5. Joining and splitting of arrays
Combining multiple arrays into one, and splitting one array into many
NumPy Array Attributes
 Some useful array attributes are:
ndim: the number of dimensions
shape: the size of each dimension
size: the total number of elements in the array
dtype: the data type of the array
itemsize: the size (in bytes) of each array element
nbytes: the total size (in bytes) of the array
 In general, nbytes is equal to itemsize times size.
# Python program to demonstrate Attribute of arrays
import numpy as np
# Creating array object
arr = np.array( [[ 1, 2, 3],
[ 4, 5, 6]] )
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array Elements type: ", arr.dtype)
# Printing size of each element in array
print("Size of array element: ", arr.itemsize, "bytes")
# Printing total size of array
print("Total size of array: ", arr.nbytes, "bytes")
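As a quick check of the relationship between nbytes, itemsize, and size stated above, the following sketch compares the three attributes directly. The explicit dtype=np.int64 is an assumption made here so that the byte counts are the same on every platform; the program above relies on the platform's default integer type.

```python
import numpy as np

# Explicit 64-bit integers (an assumption, chosen for reproducible byte counts)
arr = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.int64)

print(arr.itemsize)                            # 8 bytes per element
print(arr.size)                                # 6 elements
print(arr.nbytes)                              # 48 bytes in total
print(arr.nbytes == arr.itemsize * arr.size)   # True
```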

Array Indexing: Accessing Single Elements


 Indexing in NumPy is quite similar to Python’s standard list indexing.
 In a one-dimensional array, we can access the ith value (counting from
zero) by specifying the desired index in square brackets, just as with
Python lists
Example:
import numpy as np
a = np.array([5, 0, 3, 3, 7, 9])
print(a[0])
print(a[4])
Output:
5
7
 To index from the end of the array, we can use negative indices:
print(a[-1])
print(a[-2])
Output:
9
7
 In a multidimensional array, we access items using a comma-separated
tuple of indices:
Example:
import numpy as np
a = np.array([[3, 5, 2, 4],
              [7, 6, 8, 8],
              [1, 6, 7, 7]])
print(a[0, 0])
print(a[2, 0])
print(a[2, -1])
Output:
3
1
7
 We can also modify values using any of the above index notation:
a[0, 0] = 12
print(a)
Output:
[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
Array Slicing: Accessing Subarrays
 Just as we can use square brackets to access individual array elements, we
can also use them to access subarrays with the slice notation, marked by
the colon (:) character.
 The NumPy slicing syntax follows that of the standard Python list; to
access a slice of an array x, use x[start:stop:step]. If any of these are
unspecified, they default to the values start=0, stop=size of dimension,
step=1. When the step is negative, the defaults for start and stop are
swapped, which gives a convenient way to reverse an array.
 Example: One-dimensional subarrays
import numpy as np
x = np.arange(10)
print(x)
print(x[:5]) # first five elements
print(x[5:]) # elements from index 5 onward
print(x[4:7]) # middle subarray
print(x[::2]) # every other element
print(x[1::2]) # every other element, starting at index 1
print(x[::-1]) # all elements, reversed
print(x[5::-2]) # reversed every other from index 5
Output:
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4]
[5 6 7 8 9]
[4 5 6]
[0 2 4 6 8]
[1 3 5 7 9]
[9 8 7 6 5 4 3 2 1 0]
[5 3 1]
 Example: Multidimensional subarrays
import numpy as np
x2=np.array([[12, 5, 2, 4],
[ 7, 6, 8, 8],
[ 1, 6, 7, 7]])
print( x2[:2, :3]) # two rows, three columns
print(x2[:3, ::2]) # all rows, every other column
print(x2[::-1, ::-1]) # subarray dimensions reversed together
Output: [[12 5 2]
[ 7 6 8]]

[[12 2]
[ 7 8]
[ 1 7]]

[[ 7 7 6 1]
[ 8 8 6 7]
[ 4 2 5 12]]
Accessing array rows and columns.
 One commonly needed routine is accessing single rows or columns of an
array. We can do this by combining indexing and slicing, using an empty
slice marked by a single colon (:):
 Example:
print(x2[:, 0]) # first column of x2
Output: [12 7 1]
print(x2[0, :]) # first row of x2
Output: [12 5 2 4]
 In the case of row access, the empty slice can be omitted for a more
compact syntax:
print(x2[0]) # equivalent to x2[0, :]
Output: [12 5 2 4]


Subarrays as no-copy views


 One important—and extremely useful—thing to know about array slices
is that they return views rather than copies of the array data.
 This is one area in which NumPy array slicing differs from Python list
slicing: in lists, slices will be copies.
 Consider our two-dimensional array from before:
print(x2)
Output: [[12 5 2 4] [ 7 6 8 8] [ 1 6 7 7]]
x2_sub = x2[:2, :2]
print(x2_sub)
Output: [[12 5] [ 7 6]]
 Now if we modify this subarray, we’ll see that the original array is
changed.
Example:
x2_sub[0, 0] = 99
print(x2_sub)
Output: [[99 5] [ 7 6]]
print(x2)
Output: [[99 5 2 4] [ 7 6 8 8] [ 1 6 7 7]]
 This default behavior is actually quite useful: it means that when we work
with large datasets, we can access and process pieces of these datasets
without the need to copy the underlying data buffer.
Creating copies of arrays
 Despite the nice features of array views, it is sometimes useful to
instead explicitly copy the data within an array or a subarray.
 This can be most easily done with the copy() method:
Example:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)
Output: [[99 5] [ 7 6]]
 If we now modify this subarray, the original array is not touched:
Example:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)
Output: [[42 5] [ 7 6]]
print(x2)
Output: [[99 5 2 4] [ 7 6 8 8] [ 1 6 7 7]]
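The difference between a view and a copy can also be checked directly rather than inferred from a modified value. The sketch below uses np.shares_memory for that check; the variable names view, copy, lst, and lst_sub are illustrative only and do not appear in the notes above.

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

view = x2[:2, :2]           # a slice: shares the data buffer with x2
copy = x2[:2, :2].copy()    # an explicit copy: owns its own buffer

print(np.shares_memory(x2, view))   # True
print(np.shares_memory(x2, copy))   # False

# Python list slices, by contrast, are copies
lst = [12, 5, 2, 4]
lst_sub = lst[:2]
lst_sub[0] = 99
print(lst)                          # [12, 5, 2, 4] (unchanged)
```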
Reshaping of Arrays
 Another useful type of operation is reshaping of arrays.
 The most flexible way of doing this is with the reshape() method.
 For example:
import numpy as np
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
Output: [[1 2 3]
[4 5 6]
[7 8 9]]
 Note that for this to work, the size of the initial array must match the size
of the reshaped array.
 Where possible, the reshape method will use a no-copy view of the initial
array, but with noncontiguous memory buffers this is not always the case.
 Another common reshaping pattern is the conversion of a one-
dimensional array into a two-dimensional row or column matrix. You can
do this with the reshape method, or more easily by making use of
np.newaxis within a slice operation.
Example:
import numpy as np
x = np.array([1, 2, 3])
# row vector via reshape
print(x.reshape((1, 3)))
# row vector via newaxis
print(x[np.newaxis, :])
# column vector via reshape
print(x.reshape((3, 1)))
# column vector via newaxis
print(x[:, np.newaxis])
Output:
[[1 2 3]]
[[1 2 3]]
[[1]
 [2]
 [3]]
[[1]
 [2]
 [3]]
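The two conditions on reshape mentioned earlier (matching sizes, and a view only where the memory layout allows it) can be illustrated with a small sketch. It is not an exhaustive description of when NumPy copies; the transposed array is simply one convenient case of a noncontiguous buffer, and the names grid and flat are illustrative.

```python
import numpy as np

x = np.arange(6)

# Contiguous data: reshape can return a no-copy view
grid = x.reshape((2, 3))
print(np.shares_memory(x, grid))    # True

# The transpose is a noncontiguous view, so flattening it requires a copy
flat = grid.T.reshape(6)
print(flat)                         # [0 3 1 4 2 5]
print(np.shares_memory(x, flat))    # False

# A shape whose size does not match the array is rejected
try:
    x.reshape((4, 2))
except ValueError as err:
    print("reshape failed:", err)
```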
Array Concatenation and Splitting
 It’s also possible to combine multiple arrays into one, and to conversely
split a single array into multiple arrays.
 Concatenation of arrays: Concatenation, or joining of two arrays in
NumPy, is primarily accomplished through the routines np.concatenate,
np.vstack, and np.hstack.
 np.concatenate takes a tuple or list of arrays as its first argument
 Example:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(np.concatenate([x, y]))
Output: [1 2 3 4 5 6]
 We can also concatenate more than two arrays at once:


Example:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))
Output: [ 1 2 3 4 5 6 99 99 99]
 np.concatenate can also be used for two-dimensional arrays:
 Example:
import numpy as np
grid = np.array([[1, 2, 3], [4, 5, 6]])
# concatenate along the first axis (axis=0, the default)
print(np.concatenate([grid, grid]))
Output:
[[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]
# concatenate along the second axis (axis=1; axes are zero-indexed)
print(np.concatenate([grid, grid], axis=1))
Output:
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]]
 For working with arrays of mixed dimensions, it can be clearer to use the
np.vstack (vertical stack) and np.hstack (horizontal stack) functions:
Example:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7], [6, 5, 4]])
# vertically stack the arrays
print(np.vstack([x, grid]))
Output:
[[1 2 3]
 [9 8 7]
 [6 5 4]]
# horizontally stack the arrays
y = np.array([[99], [99]])
print(np.hstack([grid, y]))
Output:
[[ 9  8  7 99]
 [ 6  5  4 99]]
 Similarly, np.dstack will stack arrays along the third axis.
Splitting of arrays
 The opposite of concatenation is splitting, which is implemented by the
functions np.split, np.hsplit, and np.vsplit.
 For each of these, we can pass a list of indices giving the split points:
Example:
import numpy as np
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
Output: [1 2 3] [99 99] [3 2 1]
 Notice that N split points lead to N + 1 subarrays.
 The related functions np.hsplit and np.vsplit are similar:
Example:
grid = np.arange(16).reshape((4, 4))
print(grid)
Output:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)
Output:
[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]
left, right = np.hsplit(grid, [2])
print(left)
print(right)
Output:
[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
 Similarly, np.dsplit will split arrays along the third axis.
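Since np.dstack and np.dsplit are only mentioned in passing, a minimal round-trip sketch may help. The arrays a and b and the split point [1] are illustrative values chosen here, not taken from the notes.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

# Stack two 2x3 arrays along a new third axis
stacked = np.dstack([a, b])
print(stacked.shape)      # (2, 3, 2)
print(stacked[0, 0])      # [1 7]

# Split along the third axis again; one split point gives two subarrays
first, second = np.dsplit(stacked, [1])
print(first.shape, second.shape)            # (2, 3, 1) (2, 3, 1)
print(np.array_equal(first[:, :, 0], a))    # True
```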
Aggregations
 Aggregations are used to compute summary statistics for the data in
question.
 Perhaps the most common summary statistics are the mean and standard
deviation, which allow us to summarize the “typical” values in a dataset,
but other aggregates are useful as well (the sum, product, median,
minimum and maximum, quantiles, etc.).
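The aggregates listed above are all available as NumPy functions. The following is a small sketch on an illustrative four-element array; note that np.std computes the population standard deviation by default.

```python
import numpy as np

arr = np.array([2, 4, 6, 8])

print(np.mean(arr))             # 5.0
print(np.std(arr))              # about 2.236 (population standard deviation)
print(np.median(arr))           # 5.0
print(np.prod(arr))             # 384
print(np.percentile(arr, 25))   # 3.5
```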
Summing the Values in an Array
 np.sum() function is used for computing the sum of all values in an array.
 Example:
import numpy as np
arr = np.array([2,4,6,8])
b = np.sum(arr)
print(b)
Output: 20
Minimum and Maximum
 np.min() and np.max() are used to find the minimum and maximum values in a given array.
 Example:
import numpy as np
arr = np.array([2,4,6,8])
print(np.min(arr))
print(np.max(arr))
Output:
2
8
Multidimensional aggregates
 One common type of aggregation operation is an aggregate along a row
or column.
 By default, each NumPy aggregation function will return the aggregate
over the entire array.
 Aggregation functions take an additional argument specifying the axis
along which the aggregate is computed.
 For example, we can find the maximum value within each column by
specifying axis=0. Similarly, we can find the maximum value within each
row by specifying axis=1.
 Example:
import numpy as np
a = np.array([[1,3,5,7],
[2,4,6,8]])
print(a.max())
print(np.max(a))
print(a.max(axis=0))
print(a.max(axis=1))
Output:
8
8
[2 4 6 8]
[7 8]
 The axis keyword specifies the dimension of the array that will be
collapsed, rather than the dimension that will be returned. So specifying
axis=0 means that the first axis will be collapsed: for two-dimensional
arrays, this means that values within each column will be aggregated.


Computation on NumPy Arrays: Universal Functions


 NumPy provides an easy and flexible interface to optimized computation
with arrays of data.
 Computation on NumPy arrays can be very fast, or it can be very slow.
The key to making it fast is to use vectorized operations, generally
implemented through NumPy’s universal functions (ufuncs).
 These universal functions (NumPy's mathematical functions) operate on
NumPy arrays and perform element-wise operations on the data values.
 For many types of operations, NumPy provides a convenient interface
into statically typed, compiled routines. This is known as a
vectorized operation.
 ufuncs are used to implement vectorization in NumPy, which is much faster
than iterating over elements in Python.
 This vectorized approach is designed to push the loop into the compiled
layer that underlies NumPy, leading to much faster execution.
 The main purpose of ufuncs is to quickly execute repeated operations on
values in NumPy arrays. Ufuncs are extremely flexible.
 Another means of vectorizing operations is to use NumPy’s broadcasting
functionality. Broadcasting is simply a set of rules for applying binary
ufuncs (addition, subtraction, multiplication, etc.) on arrays of different
sizes.
 Computations using vectorization through ufuncs are nearly always more
efficient than their counterpart implemented through Python loops,
especially as the arrays grow in size.
 These functions contain standard trigonometric functions, arithmetic
operations, complex number handling, statistical functions, and so forth.
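To make the contrast between a Python loop and a vectorized operation concrete, here is a minimal sketch that computes reciprocals both ways. The helper reciprocals_loop is a hypothetical function written for this illustration; the sketch only shows that the two approaches agree, and does not measure their speed.

```python
import numpy as np

def reciprocals_loop(values):
    """Compute 1/x for each element with an explicit Python loop."""
    out = np.empty(len(values))
    for i in range(len(values)):
        out[i] = 1.0 / values[i]
    return out

values = np.arange(1, 6)

loop_result = reciprocals_loop(values)   # loop runs in the Python interpreter
vectorized_result = 1.0 / values         # loop runs inside NumPy's compiled code

print(loop_result)
print(vectorized_result)
print(np.allclose(loop_result, vectorized_result))   # True
```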
 The following are some of the characteristics of universal functions:

 These functions work with ndarray (N-dimensional array), which
is NumPy's array class.
 They provide fast element-wise array operations.
 They support features such as array broadcasting, type
casting, and so on.
 NumPy universal functions are instances of the numpy.ufunc class.
 Python functions can also be made universal by using the
np.frompyfunc library function.
 When the corresponding array arithmetic operator is applied, some
ufuncs are called automatically. When two arrays are added
element by element using the '+' operator, np.add() is called
internally.
 Ufuncs exist in two flavors:
 unary ufuncs, which operate on a single input
 binary ufuncs, which operate on two inputs.

 Example:
import numpy as np
x=np.array([1,2,3,4])
y=np.array([4,5,6,7])
z=np.add(x,y)
print(z)
Output: [ 5 7 9 11]
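Two of the characteristics listed earlier, the numpy.ufunc class and frompyfunc, can be sketched in a few lines. The function clip_negative and the name uclip are hypothetical and chosen only for illustration. A ufunc built with np.frompyfunc still calls the Python function once per element and returns an array of dtype object, so it offers the ufunc interface rather than compiled speed.

```python
import numpy as np

def clip_negative(v):
    """Plain Python function working on one scalar (illustrative only)."""
    return v if v > 0 else 0

# Wrap the scalar function as a ufunc with 1 input and 1 output
uclip = np.frompyfunc(clip_negative, 1, 1)

x = np.array([-2, -1, 0, 1, 2])
result = uclip(x)

print(isinstance(np.add, np.ufunc))    # True
print(isinstance(uclip, np.ufunc))     # True
print(result)                          # [0 0 0 1 2]
print(result.dtype)                    # object

# The + operator and np.add give the same element-wise result
print(np.array_equal(x + x, np.add(x, x)))   # True
```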
Absolute value
 Just as NumPy understands Python’s built-in arithmetic operators, it also
understands Python’s built-in absolute value function:
Example:
import numpy as np
x = np.array([-2, -1, 0, 1, 2])
print(np.abs(x))
Output: [2 1 0 1 2]


Trigonometric functions
 NumPy provides a large number of useful ufuncs, and some of the most
useful for the data scientist are the trigonometric functions.
 Example:
theta = np.linspace(0, np.pi, 3)
print("theta = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))
Output:
theta = [ 0. 1.57079633 3.14159265]
sin(theta) = [ 0.00000000e+00 1.00000000e+00 1.22464680e-16]
cos(theta) = [ 1.00000000e+00 6.12323400e-17 -1.00000000e+00]
tan(theta) = [ 0.00000000e+00 1.63312394e+16 -1.22464680e-16]
 Inverse trigonometric functions are also available:
Example:
import numpy as np
x = [-1, 0, 1]
print("x = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))
Output:
x = [-1, 0, 1]
arcsin(x) = [-1.57079633 0. 1.57079633]
arccos(x) = [3.14159265 1.57079633 0. ]
arctan(x) = [-0.78539816 0. 0.78539816]
Exponents and logarithms
 Another common type of operation available as NumPy ufuncs is the
exponentials:
Example:
import numpy as np
x = [1, 2, 3]
print("x =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))
Output:
x = [1, 2, 3]
e^x = [ 2.71828183 7.3890561 20.08553692]
2^x = [ 2. 4. 8.]
3^x = [ 3 9 27]
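The section heading also mentions logarithms, the inverses of the exponentials. A minimal sketch using np.log (natural logarithm), np.log2, and np.log10 follows; the input values are illustrative.

```python
import numpy as np

x = np.array([1, 2, 4, 10])

print("x =", x)
print("ln(x) =", np.log(x))        # natural logarithm
print("log2(x) =", np.log2(x))     # base-2 logarithm
print("log10(x) =", np.log10(x))   # base-10 logarithm

# Logarithms undo the corresponding exponentials
print(np.allclose(np.log(np.exp(x)), x))   # True
```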
Advanced Ufunc Features
Some specialized features of ufuncs are:

Specifying output
 For large calculations, it is sometimes useful to be able to specify the
array where the result of the calculation will be stored. Rather than
creating a temporary array, we can use the out argument to write
computation results directly to the memory location where we would like
them to be.
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)
Output: [ 0. 10. 20. 30. 40.]
Aggregates
 For binary ufuncs, there are some interesting aggregates that can be
computed directly from the object.
 For example, if we’d like to reduce an array with a particular operation,
we can use the reduce method of any ufunc. A reduce repeatedly applies a
given operation to the elements of an array until only a single result
remains.
 For example, calling reduce on the add ufunc returns the sum of all
elements in the array:
x = np.arange(1, 6)
print(np.add.reduce(x))
Output: 15
 If we’d like to store all the intermediate results of the computation, we
can instead use accumulate:
print(np.add.accumulate(x))
Output: [ 1  3  6 10 15]
Outer products
 Any ufunc can compute the output of all pairs of two different inputs
using the outer method.
Example:
x = np.arange(1, 6)
print(np.multiply.outer(x, x))
Output:
[[ 1  2  3  4  5]
 [ 2  4  6  8 10]
 [ 3  6  9 12 15]
 [ 4  8 12 16 20]
 [ 5 10 15 20 25]]

Broadcasting
 Broadcasting is another means of vectorizing operations.
 Broadcasting is a set of rules for applying binary ufuncs
(addition, subtraction, multiplication, etc.) on arrays of different
sizes and shapes.
 We can think of the smaller array as being “broadcast” to the same
shape as the larger array before the operation is carried out.
Conceptually, the values of the smaller array are repeated until it
reaches the same shape as the larger array.
 Using broadcasting allows for vectorization, a style of programming that
works with entire arrays instead of individual elements.
 Broadcasting is usually fast, since it vectorizes array operations so that
looping occurs in optimized C code instead of the slower Python. In
addition, NumPy does not need to store the repeated copies of the
smaller array in memory; the existing values are reused.
 Example 1: We can just as easily add a scalar (think of it as a
zero-dimensional array) to an array:
import numpy as np
a = np.array([0, 1, 2])
print(a + 5)
Output: [5 6 7]
We can think of this as an operation that stretches or duplicates the value
5 into the array [5, 5, 5], and adds the results.
Example 2:
import numpy as np
a = np.array([0, 1, 2])
M = np.ones((3, 3))
print(M + a)
Output:
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]

 The advantage of NumPy’s broadcasting is that this duplication of values
does not actually take place, but it is a useful mental model as we think
about broadcasting.
 In broadcasting, the smaller array is broadcast to the larger array to make
their shapes compatible with each other.

Visualization of NumPy broadcasting


In the diagram above, the light boxes represent the broadcast values: this
extra memory is not actually allocated in the course of the operation, but
it can be useful conceptually to imagine that it is.
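One way to see that the extra memory is not allocated is np.broadcast_to, which returns a read-only view of the stretched array. This is a sketch of the idea using an illustrative three-element array; the stride values printed are a detail of how NumPy lays out the view, not something the notes rely on.

```python
import numpy as np

a = np.array([0, 1, 2])

# View of a stretched to shape (3, 3); no data is copied
stretched = np.broadcast_to(a, (3, 3))
print(stretched)

# A stride of 0 along the first axis means every row reuses the same memory
print(stretched.strides == (0, a.itemsize))   # True
print(np.shares_memory(a, stretched))         # True
```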
Rules of Broadcasting
 Broadcasting in NumPy follows a strict set of rules to determine the
interaction between the two arrays:
 Rule 1: If the two arrays differ in their number of dimensions, the
shape of the one with fewer dimensions is padded with ones on its
leading (left) side.
 Rule 2: If the shape of the two arrays does not match in any
dimension, the array with shape equal to 1 in that dimension is
stretched to match the other shape.
 Rule 3: If in any dimension the sizes disagree and neither is equal
to 1, an error is raised.
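The three rules can be traced on small arrays. In the sketch below the array names M, a, col, row, and N are illustrative; the comments follow the shapes through each rule.

```python
import numpy as np

# Case 1: shapes (2, 3) and (3,)
# Rule 1 pads a to (1, 3); Rule 2 stretches it to (2, 3)
M = np.ones((2, 3))
a = np.arange(3)
print((M + a).shape)        # (2, 3)

# Case 2: shapes (3, 1) and (3,)
# Rule 1 pads row to (1, 3); Rule 2 stretches both arrays to (3, 3)
col = np.arange(3).reshape((3, 1))
row = np.arange(3)
print((col + row).shape)    # (3, 3)

# Case 3: shapes (3, 2) and (3,)
# Rule 1 pads a to (1, 3); the last dimensions are 2 and 3, so Rule 3 applies
N = np.ones((3, 2))
try:
    N + a
except ValueError as err:
    print("broadcast failed:", err)
```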


 A set of arrays is said to be compatible with broadcasting
(broadcastable) if one of the following is true:
 Arrays have exactly the same shape.
 Arrays have the same number of dimensions and the length of each
dimension is either a common length or 1.
 An array having too few dimensions can have its shape prepended
with dimensions of length 1, so that the above stated property is
true.
Uses of Broadcasting
(Broadcasting in Practice)
Centering an array:
 One commonly seen example is centering an array of data.
 Example: Imagine you have an array of 10 observations, each of which
consists of 3 values. We will store this in a 10×3 array:
X = np.random.random((10, 3))
We can compute the mean of each feature using the mean aggregate
across the first dimension:
Xmean = X.mean(0)
print(Xmean)
And now we can center the X array by subtracting the mean:
X_centered = X - Xmean
To double-check that we’ve done this correctly, we can check that the
centered array has near-zero mean:
print(X_centered.mean(0))
To within machine precision, the mean is now zero.
The entire program is:
import numpy as np
X = np.random.random((10, 3))
Xmean = X.mean(0)
print(Xmean)
X_centered = X - Xmean
print(X_centered.mean(0))
Output (the exact values vary between runs because the data is random):
[0.53514715 0.66567217 0.44385899]
[ 2.22044605e-17 -7.77156117e-17 -1.66533454e-17]
Plotting a two-dimensional function:
 One place that broadcasting is very useful is in displaying images based
on two dimensional functions.
 If we want to define a function z = f(x, y), broadcasting can be used to
compute the function across the grid:
# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)[:, np.newaxis]
z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
 We can use Matplotlib to plot this two-dimensional array
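Before plotting, it can help to confirm how broadcasting produced the grid. The sketch below repeats the definitions of x, y, and z from above and only inspects their shapes; the plotting step itself is left out.

```python
import numpy as np

x = np.linspace(0, 5, 50)                  # shape (50,): a row of x values
y = np.linspace(0, 5, 50)[:, np.newaxis]   # shape (50, 1): a column of y values

# y * x broadcasts (50, 1) against (50,) to give a (50, 50) grid
z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

print(x.shape, y.shape, z.shape)   # (50,) (50, 1) (50, 50)
```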


Comparisons, Masks, and Boolean Logic


Comparisons:
 NumPy also implements all six of the standard comparison operators, such as
< (less than) and > (greater than), as element-wise ufuncs.
 The result of these comparison operators is always an array with Boolean
data type, called a Boolean array.
 As in the case of arithmetic operators, the comparison operators are
implemented as ufuncs in NumPy; for example, when you write x < 3,
internally NumPy uses np.less(x, 3).
 A summary of the comparison operators and their equivalent ufuncs is
shown here:
Operator    Equivalent ufunc
==          np.equal
!=          np.not_equal
<           np.less
<=          np.less_equal
>           np.greater
>=          np.greater_equal

 Example 1:
x = np.array([1, 2, 3, 4, 5])
print(x < 3) # less than operator
Output: [ True  True False False False]
 Example 2:
x = np.array([1, 2, 3, 4, 5])
print(np.less(x, 3)) # less than ufunc
Output: [ True  True False False False]
Working with Boolean Arrays:
Counting entries:
 To count the number of True entries in a Boolean array,
np.count_nonzero is useful:
# how many values less than 6?
import numpy as np
rng = np.random.RandomState(0)
x = rng.randint(10, size=(3, 4))
print(x)
print(np.less(x, 6))
print(np.count_nonzero(x < 6))
Output:
[[5 0 3 3]
 [7 9 3 5]
 [2 4 7 6]]
[[ True  True  True  True]
 [False False  True  True]
 [ True  True False False]]
8
 Another way to get at this information is to use np.sum; in this case,
False is interpreted as 0, and True is interpreted as 1:
print(np.sum(x < 6))
Output: 8
 The benefit of sum() is that, like other NumPy aggregation functions,
this summation can be done along rows or columns as well:
Ex: # how many values less than 6 in each row?
print(np.sum(x < 6, axis=1))
Output: [4 2 2]
This counts the number of values less than 6 in each row of the matrix.
Boolean Operators
 We can combine the comparison operators using Python’s bitwise logic
operators, &, |, ^, and ~.
 Like with the standard arithmetic operators, NumPy overloads these as
ufuncs that work element-wise on (usually Boolean) arrays.
 The following table summarizes the bitwise Boolean operators and their
equivalent ufuncs:
Operator    Equivalent ufunc
&           np.bitwise_and
|           np.bitwise_or
^           np.bitwise_xor
~           np.bitwise_not

 Example:
import numpy as np
a = np.arange(10)
print(a)
# bitwise or operator
b = (a <= 2) | (a >= 8)
print(b)
d = np.sum(b)
print(d)
# bitwise or ufunc
c = np.bitwise_or(a <= 2, a >= 8)
print(c)
e = np.sum(c)
print(e)
Output:
[0 1 2 3 4 5 6 7 8 9]
[ True  True  True False False False False False  True  True]
5
[ True  True  True False False False False False  True  True]
5
Boolean Masking (Boolean Indexing)
(Boolean Arrays as Masks)
 Boolean masks are used to examine and manipulate values within NumPy
arrays.
 Masking comes up when we want to extract, modify, count, or otherwise
manipulate values in an array based on some criterion: for example,
counting all values greater than a certain value, or removing all
outliers that are above some threshold.
 In NumPy, Boolean masking is often the most efficient way to
manipulate values in an array based on some criterion.
 Boolean masking, also called Boolean indexing, is a NumPy feature
that allows for the filtering of values in NumPy arrays.
 Numpy allows us to use an array of boolean values as an index of another
array.
 Each element of the boolean array indicates whether or not to select the
elements from the array.
 If the value is True, the element of that index is selected. In case the value
is False, the element of that index is not selected.
 Example 1:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([True, True, False])
c = a[b]
print(c)
Output: [1 2]
 Example 2:
import numpy as np
a = np.arange(1, 10)
b=a>5
print(b)
c = a[b]
print(c)
Output:
[False False False False False True True True True]
[6 7 8 9]
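The examples above extract values with a mask. The same mask can also count, remove, or modify values, as described at the start of this section. The following sketch uses made-up data and an arbitrary threshold of 100 to show all three; the names data, threshold, outliers, cleaned, and capped are illustrative.

```python
import numpy as np

data = np.array([3, 12, 7, 150, 9, 200, 5])   # illustrative values
threshold = 100                               # assumed cut-off for outliers

outliers = data > threshold                   # Boolean mask
print(np.count_nonzero(outliers))             # 2

# Remove the outliers: keep only entries where the mask is False
cleaned = data[~outliers]
print(cleaned)                                # [ 3 12  7  9  5]

# Modify the outliers in a copy: cap them at the threshold
capped = data.copy()
capped[outliers] = threshold
print(capped)                                 # [  3  12   7 100   9 100   5]
```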


Fancy Indexing
 Fancy indexing is another style of array indexing, in which we pass arrays of
indices in place of single scalars. This allows us to very quickly access
and modify complicated subsets of an array’s values.
 In other words, fancy indexing means passing an array of indices to access
multiple array elements at once.
 For example, consider the following array:
import numpy as np
x = np.arange(1,11)
print(x)
Output:[ 1 2 3 4 5 6 7 8 9 10]
 Suppose we want to access three different elements. We could do it like this:
print(x[3], x[7], x[2])
Output: 4 8 3
 Alternatively, we can pass a single list or array of indices to obtain the same
result:
ind = [3, 7, 2]
print(x[ind])
Output: [4 8 3]
 With fancy indexing, the shape of the result reflects the shape of the index
arrays rather than the shape of the array being indexed:
Example:
ind = np.array([[3, 7],
                [4, 5]])
print(x[ind])
Output:
[[4 8]
 [5 6]]
 Fancy indexing also works in multiple dimensions.
Example:
import numpy as np
X = np.arange(12).reshape((3, 4))
print(X)
Output:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Like with standard indexing, the first index refers to the row, and the
second to the column:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
print(X[row, col])
Output: [ 2  5 11]
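Fancy indexing can modify values as well as read them. The sketch below shows a plain assignment and then one caveat that is easy to miss: with repeated indices, an in-place operation such as += is applied only once per index, whereas np.add.at accumulates every occurrence. The arrays and index lists are illustrative.

```python
import numpy as np

# Assign to several positions at once
x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
print(x)          # [ 0 99 99  3 99  5  6  7 99  9]

# Repeated indices: += counts each index only once
counts = np.zeros(5, dtype=int)
idx = [1, 1, 3, 3, 3]
counts[idx] += 1
print(counts)     # [0 1 0 1 0]

# np.add.at applies the addition once per occurrence
totals = np.zeros(5, dtype=int)
np.add.at(totals, idx, 1)
print(totals)     # [0 2 0 3 0]
```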


Combined Indexing
(Combining fancy indexing with other indexing schemes)
 For even more powerful operations, fancy indexing can be combined with
the other indexing schemes
 We can combine fancy and simple indices:
import numpy as np
X = np.arange(12).reshape((3, 4))
print(X)
a= X[2, [2, 0, 1]]
print(a)
Output:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[10  8  9]
 We can also combine fancy indexing with slicing:
b = X[1:, [2, 0, 1]]
print(b)
Output:
[[ 6  4  5]
 [10  8  9]]
 We can combine fancy indexing with masking:
mask = np.array([True, False, True, False])
row = np.array([0, 1, 2])
c = X[row[:, np.newaxis], mask]
print(c)
Output:
[[ 0  2]
 [ 4  6]
 [ 8 10]]
 All of these indexing options combined lead to a very flexible set of
operations for accessing and modifying array values.
Structured Arrays
 Structured arrays are arrays with compound data types.
 They provide efficient storage for compound, heterogeneous data.
 Example:
import numpy as np
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
'formats':('U10', 'i4', 'f8')})
print(data.dtype)
# storing data in three separate arrays
name = np.array(['Kumar', 'Rao', 'Ali', 'Singh'])

age = np.array([25, 45, 37, 19])
weight = np.array([55.0, 85.5, 68.0, 61.5])
# filling the array with our lists of values
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)
Output:
[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
[('Kumar', 25, 55. ) ('Rao', 45, 85.5) ('Ali', 37, 68. ) ('Singh', 19, 61.5)]

 The handy thing with structured arrays is that we can now refer to values
either by index or by name:
 Example:
# Get all names
print(data['name'])
# Get first row of data
print(data[0])
# Get the name from the last row
print(data[-1]['name'])
# Get names where age is under 30
print(data[data['age'] < 30]['name'])
Output:
['Kumar' 'Rao' 'Ali' 'Singh']
('Kumar', 25, 55.)
Singh
['Kumar' 'Singh']
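Because each field behaves like an ordinary NumPy array, fields can also be aggregated, and whole records can be sorted by a field. The following is a small sketch using the same sample data; np.sort with its order argument is one possible way to sort records, not the only one:

```python
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Kumar', 'Rao', 'Ali', 'Singh']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

# Aggregate a single field like any ordinary array
mean_age = data['age'].mean()

# Sort whole records by one field using the order argument of np.sort
by_age = np.sort(data, order='age')

print(mean_age)
print(by_age['name'])
```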
Creating Structured Arrays
 Structured array data types can be specified in a number of ways.
Method 1: Dictionary method: We can create a structured array using a
compound data type specification:
struct = np.dtype({'names':('name', 'age', 'weight'),
'formats':('U10', 'i4', 'f8')})
Method 2: Numerical types can be specified with Python types or NumPy
dtypes instead (note that np.float32 is a 4-byte float, whereas 'f8' in
Method 1 is an 8-byte float):
struct2 = np.dtype({'names':('name', 'age', 'weight'),
'formats':((np.str_, 10), int, np.float32)})
Method 3: A compound type can also be specified as a list of tuples:
struct3 = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
 Example:
import numpy as np

# Method 1: dictionary with format strings
struct = np.dtype({'names':('name', 'age', 'weight'),
                   'formats':('U10', 'i4', 'f8')})
data = np.zeros(4, struct)

# Method 2: dictionary with Python types and NumPy dtypes
struct2 = np.dtype({'names':('name', 'age', 'weight'),
                    'formats':((np.str_, 10), int, np.float32)})
data2 = np.zeros(4, struct2)

# Method 3: list of (name, format) tuples
struct3 = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data3 = np.zeros(4, struct3)

name = ['Kumar', 'Rao', 'Ali', 'Singh']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

# fill each structured array field by field
data['name'] = name
data['age'] = age
data['weight'] = weight

data2['name'] = name
data2['age'] = age
data2['weight'] = weight

data3['name'] = name
data3['age'] = age
data3['weight'] = weight

print(data)
print(data2)
print(data3)

Output:
[('Kumar', 25, 55. ) ('Rao', 45, 85.5) ('Ali', 37, 68. ) ('Singh', 19, 61.5)]
[('Kumar', 25, 55. ) ('Rao', 45, 85.5) ('Ali', 37, 68. ) ('Singh', 19, 61.5)]
[('Kumar', 25, 55. ) ('Rao', 45, 85.5) ('Ali', 37, 68. ) ('Singh', 19, 61.5)]
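One way to check that two specifications describe the same layout is to inspect the resulting dtype objects. The sketch below compares the dictionary form with the list-of-tuples form, assuming the default packed layout (no alignment padding):

```python
import numpy as np

struct = np.dtype({'names': ('name', 'age', 'weight'),
                   'formats': ('U10', 'i4', 'f8')})
struct3 = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Both specifications describe the same compound type
same = (struct == struct3)
print(same)

# Field names, and the number of bytes occupied by one record:
# 10 Unicode characters at 4 bytes each, plus 4 bytes (i4), plus 8 bytes (f8)
print(struct.names)
print(struct.itemsize)
```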
Record Arrays: Structured Arrays with a Twist
 NumPy also provides the np.recarray class, which is almost identical to the
structured arrays, but with one additional feature: fields can be accessed as
attributes rather than as dictionary keys.
 Recall that we can access the ages by writing: data['age']
Output: array([25, 45, 37, 19], dtype=int32)
 If we view our data as a record array instead, we can access this with
slightly fewer keystrokes:
data_rec = data.view(np.recarray)
print(data_rec.age)
Output: [25 45 37 19]
 The downside is that for record arrays, there is some extra overhead
involved in accessing the fields.
 Example program:
import numpy as np
name = ['Kumar','Rao','Ali','Singh']
age = [25,45,37,19]
weight = [55.0,85.5,68.0,61.5]
struct = np.dtype({'names':('name','age','weight'),
'formats':('U10','i4','f8')})
data = np.zeros(4,struct)
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)
# accessing a field as a dictionary key
print(data['age'])
# accessing a field as an attribute
data_rec = data.view(np.recarray)
print(data_rec.age)
Output:
[('Kumar', 25, 55. ) ('Rao', 45, 85.5) ('Ali', 37, 68. ) ('Singh', 19, 61.5)]
[25 45 37 19]
[25 45 37 19]
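Since data.view(np.recarray) is a view rather than a copy, the record array and the original structured array refer to the same memory. The sketch below illustrates this with a shortened, two-record version of the sample data (the new age value 26 is arbitrary):

```python
import numpy as np

data = np.zeros(2, dtype=[('name', 'U10'), ('age', 'i4')])
data['name'] = ['Kumar', 'Rao']
data['age'] = [25, 45]

# view() reinterprets the same memory; it does not copy the records
data_rec = data.view(np.recarray)

# A change made through attribute access is visible in the original array
data_rec.age[0] = 26

print(data['age'])
print(np.shares_memory(data, data_rec))
```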

Tutorial Questions
1. Illustrate different categories of basic array manipulations with examples.
2. What are universal functions in NumPy? Explain the different advanced
features of universal functions.
3. Discuss and demonstrate some of the built-in aggregation functions in NumPy.
4. What is broadcasting in NumPy? Discuss the different rules of broadcasting with
examples.
5. What is Boolean masking in NumPy? Explain with an example.
6. What is fancy indexing in NumPy? Discuss and demonstrate fancy indexing in
NumPy.
7. Demonstrate the use of structured arrays and record arrays in NumPy.
8. How can fancy indexing be combined with other indexing schemes?
9. Illustrate different attributes of NumPy arrays with examples.
10. Write a short note on computation on NumPy arrays.

Assignment Questions:
1. Write a Python program to demonstrate the Attributes of Arrays in NumPy
2. Write a Python program to demonstrate the Indexing of Arrays in NumPy
3. Write a Python program to demonstrate the Slicing of Arrays in NumPy
4. Write a Python program to demonstrate the Reshaping of Arrays in NumPy
5. Write a Python program to demonstrate the Joining and Splitting of Arrays in
NumPy
6. Write a Python program to demonstrate the Aggregation Universal Functions in
NumPy
7. Write a Python program to demonstrate Broadcasting in NumPy
8. Write a Python program to demonstrate Boolean Masking in NumPy
9. Write a Python program to demonstrate Fancy Indexing in NumPy
10. Write a Python program to demonstrate the use of structured arrays and record
arrays in NumPy
