Introduction To NumPy
Data science is a branch of computer science that studies how to store, use, and analyze
data in order to derive information from it.
INTRODUCTION
NumPy is a Python library used for working with arrays. NumPy stands for Numerical
Python. NumPy was created in 2005 by Travis Oliphant. It is an open source project and you
can use it freely.
In Python we have lists that serve the purpose of arrays, but they are slow to process.
NumPy aims to provide an array object that is faster than traditional Python lists. The array
object in NumPy is called ndarray, and it comes with many supporting functions that make
working with arrays easy.
---
Ex:
import numpy as np
x1 = np.random.randint(10, size=6)
x2 = np.random.randint(10, size=(3, 4))
x3 = np.random.randint(10, size=(3, 4, 5))
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:", x3.dtype)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
Output:
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
dtype: int64
itemsize: 8 bytes
nbytes: 480 bytes
Array Indexing
Indexing is used to get and set the values of individual array elements.
Ex:
x1 = np.array([5, 0, 3, 3, 7, 9])
x1[0]     # 5
To access the elements from the end of the array, you can use negative indices.
x1[-1]    # 9
x1[-2]    # 7
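The examples above only read values. A minimal sketch of setting values with the same index notation, reusing the x1 array from above (the assigned values are arbitrary illustrations):

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])

# Assign a new value to the first element
x1[0] = 8

# NumPy arrays have a fixed type: a float assigned to an
# integer array is truncated to fit, not stored as a float
x1[1] = 3.14159

print(x1)   # [8 3 3 3 7 9]
```

Because the array keeps its integer type, the fractional part of 3.14159 is dropped without any warning.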
Array Slicing
Slicing is the process of getting and setting smaller subarrays of a larger array. We can
use square brackets to access subarrays with the slice notation, marked by the colon (:)
character.
x[start:stop:step]
Ex:
x = np.arange(10)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
x[4:7]               # [4, 5, 6]
x[::-1]              # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
Multidimensional slices work in the same way, with multiple slices separated by commas.
x2[:2, :3]
[[12, 5, 2],
[ 7, 6, 8]]
Reshaping of arrays
Reshaping changes the shape of a given array. The reshape() method is used to reshape an
array. The size of the initial array must match the size of the reshaped array.
Ex:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
[[1 2 3]
[4 5 6]
[7 8 9]]
x = np.array([1, 2, 3])
x.reshape((3, 1))
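The sketch below shows what the last reshape produces, and what happens when the sizes do not match. The use of np.newaxis as an alternative is an addition here, not something the text above covers.

```python
import numpy as np

x = np.array([1, 2, 3])

col = x.reshape((3, 1))    # column vector, shape (3, 1)
row = x[np.newaxis, :]     # row vector via np.newaxis, shape (1, 3)
print(col.shape, row.shape)    # (3, 1) (1, 3)

# Sizes must match: 3 elements cannot fill a 2 x 2 array
try:
    x.reshape((2, 2))
except ValueError as err:
    print("reshape failed:", err)
```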
Array Concatenation and Splitting
Concatenation combines multiple arrays into one. Concatenation, or joining of two
arrays in NumPy, is primarily accomplished through the routines np.concatenate, np.vstack,
and np.hstack.
Ex:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [99, 99, 99]
print(np.concatenate([x, y, z]))
[ 1 2 3 3 2 1 99 99 99]
print(lower)
[[ 8 9 10 11]
[12 13 14 15]]
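The code that created the `lower` array printed above is not shown. A minimal sketch of splitting, assuming a 4 x 4 grid built with np.arange(16), which is consistent with that output:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))   # assumed input array

# Split into two blocks of rows at row index 2
upper, lower = np.vsplit(grid, [2])
# Split into two blocks of columns at column index 2
left, right = np.hsplit(grid, [2])

print(upper)
print(lower)

# Stacking the pieces vertically restores the original grid
restored = np.vstack([upper, lower])
```

Splitting is the opposite of concatenation: the list passed to np.vsplit or np.hsplit gives the indices at which the array is cut.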
---
Computation on NumPy arrays can be very fast, or it can be very slow. The key to
making it fast is to use vectorized operations, which are generally implemented through
NumPy’s universal functions (ufuncs).
Universal functions in NumPy are simple mathematical functions. NumPy provides
various universal functions that cover a wide variety of operations. Ufuncs exist in two
flavors: unary ufuncs, which operate on a single input, and binary ufuncs, which operate on
two inputs. These functions include standard trigonometric functions, functions for
arithmetic operations, handling complex numbers, statistical functions, etc.
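To illustrate the difference between a Python loop and a vectorized operation, here is a small sketch. The array values are arbitrary; np.negative and np.add are used as examples of a unary and a binary ufunc.

```python
import numpy as np

values = np.arange(1, 6)    # [1 2 3 4 5]

# Loop version: one Python-level operation per element (slow for big arrays)
reciprocal_loop = np.empty(len(values))
for i in range(len(values)):
    reciprocal_loop[i] = 1.0 / values[i]

# Vectorized version: the loop runs inside NumPy's compiled code
reciprocal_vec = 1.0 / values

negated = np.negative(values)   # unary ufunc: one input
summed = np.add(values, 10)     # binary ufunc: two inputs, same as values + 10
print(negated)                  # [-1 -2 -3 -4 -5]
print(summed)                   # [11 12 13 14 15]
```

Both versions give the same result; the vectorized form is shorter and usually much faster on large arrays.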
Trigonometric functions:
Function                     Description
sin, cos, tan                compute sine, cosine and tangent of angles
arcsin, arccos, arctan       calculate inverse sine, cosine and tangent
hypot                        calculate hypotenuse of a given right triangle
sinh, cosh, tanh             compute hyperbolic sine, cosine and tangent
arcsinh, arccosh, arctanh    compute inverse hyperbolic sine, cosine and tangent
deg2rad                      convert degrees into radians
rad2deg                      convert radians into degrees
Ex:
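# Python code to demonstrate trigonometric functions
import numpy as np
# angles in degrees (values inferred from the output below), converted to radians
angles = np.array([0, 30, 45, 60, 90, 180])
radians = np.deg2rad(angles)
# sine of angles
print('Sine of angles in the array:')
sine_value = np.sin(radians)
print(sine_value)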
Outputs:
Sine of angles in the array:
[ 0.00000000e+00 5.00000000e-01 7.07106781e-01 8.66025404e-01
1.00000000e+00 1.22464680e-16]
Statistical functions:
These functions are used to calculate the mean, median, variance, minimum, and similar
statistics of array elements. They include functions like:
Function                  Description
amin, amax                return minimum or maximum of an array or along an axis
ptp                       return range of values (maximum - minimum) of an array or along an axis
percentile(a, p, axis)    calculate pth percentile of array or along specified axis
median                    compute median of data along specified axis
mean                      compute mean of data along specified axis
std                       compute standard deviation of data along specified axis
var                       compute variance of data along specified axis
average                   compute (optionally weighted) average of data along specified axis
Ex:
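# Python code to demonstrate statistical functions
import numpy as np
# sample weights (assumed values, chosen so that min and max match the output shown)
weight = np.array([50.7, 52.5, 50, 58, 55.63, 73.25, 49.5, 45])
# minimum and maximum
print('Minimum and maximum weight of the students: ')
print(np.amin(weight), np.amax(weight))
# percentile
print('Weight below which 70 % of students fall: ')
print(np.percentile(weight, 70))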
# mean
print('Mean weight of the students: ')
print(np.mean(weight))
# median
print('Median weight of the students: ')
print(np.median(weight))
# standard deviation
print('Standard deviation of weight of the students: ')
print(np.std(weight))
# variance
print('Variance of weight of the students: ')
print(np.var(weight))
# average
print('Average weight of the students: ')
print(np.average(weight))
Output:
Minimum and maximum weight of the students:
45.0 73.25
Bit-twiddling functions:
These functions accept integer values as input arguments and perform bitwise
operations on the binary representations of those integers. They include functions like:
Function       Description
bitwise_and    performs bitwise AND operation on two array elements
bitwise_or     performs bitwise OR operation on two array elements
bitwise_xor    performs bitwise XOR operation on two array elements
invert         performs bitwise inversion of array elements
left_shift     shifts the bits of elements to the left
right_shift    shifts the bits of elements to the right
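Ex:
# Python code to demonstrate bitwise functions
import numpy as np
# sample inputs (assumed values, consistent with the bitwise_and output shown)
even = np.array([0, 2, 4, 6, 8, 16, 32])
odd = np.array([1, 3, 5, 7, 9, 17, 33])
# bitwise_and
print('bitwise_and of two arrays: ')
print(np.bitwise_and(even, odd))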
# bitwise_or
print('bitwise_or of two arrays: ')
print(np.bitwise_or(even, odd))
# bitwise_xor
print('bitwise_xor of two arrays: ')
print(np.bitwise_xor(even, odd))
# invert or not
print('inversion of even no. array: ')
print(np.invert(even))
# left_shift
print('left_shift of even no. array: ')
print(np.left_shift(even, 1))
# right_shift
print('right_shift of even no. array: ')
print(np.right_shift(even, 1))
Outputs:
bitwise_and of two arrays:
[ 0 2 4 6 8 16 32]
---
AGGREGATIONS
When we have a large amount of data, a first step is to compute summary statistics for
the data in question. The most common summary statistics are the mean and standard deviation,
which allow you to summarize the “typical” values in a dataset. Other useful aggregates include
the sum, product, median, minimum, and maximum. NumPy has fast built-in
aggregation functions for working on arrays.
(1.1717128136634614e-06, 0.9999976784968716)
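A minimal sketch of the built-in aggregates on a small hand-written array. The values are illustrative and were not taken from the source; they were picked so the results are easy to check by hand.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.sum(data))                  # 40.0
print(np.mean(data))                 # 5.0
print(np.std(data))                  # 2.0
print(np.median(data))               # 4.5
print(np.min(data), np.max(data))    # 2.0 9.0
```

Most of these are also available as array methods, for example data.sum() or data.min().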
Multidimensional Aggregates:
M = np.random.random((3, 4))
print(M)
[[ 0.8967576 0.03783739 0.75952519 0.06682827]
[ 0.8354065 0.99196818 0.19544769 0.43447084]
[ 0.66859307 0.15038721 0.37911423 0.6687194 ]]
Aggregation functions take an additional argument specifying the axis along which the
aggregate is computed. For example, we can find the minimum value within each column by
specifying axis=0.
M.min(axis=0)
array([ 0.66859307, 0.03783739, 0.19544769, 0.06682827])
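Since the random values above are hard to check by eye, here is the same idea on a small fixed array (the values are an illustration only). The axis argument names the dimension that is collapsed, so axis=0 gives one result per column and axis=1 gives one result per row.

```python
import numpy as np

M = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

col_min = M.min(axis=0)    # collapse rows: one value per column
row_max = M.max(axis=1)    # collapse columns: one value per row
col_sum = M.sum(axis=0)

print(col_min)    # [1 2 3 4]
print(row_max)    # [ 4  8 12]
print(col_sum)    # [15 18 21 24]
```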
---
BROADCASTING
Broadcasting is a set of rules for applying binary ufuncs (addition, subtraction,
multiplication, etc.) on arrays of different sizes. The term broadcasting refers to the ability of
NumPy to handle arrays of different shapes during arithmetic operations. Arithmetic operations
on arrays are usually done on corresponding elements. If two arrays have exactly the same
shape, these operations are performed element by element without any adjustment.
Ex:
import numpy as np
a = np.array([1,2,3,4])
b = np.array([10,20,30,40])
c = a * b
print(c)    # [ 10  40  90 160]
If the shapes of two arrays differ, direct element-to-element operations are not
possible. However, operations on arrays of different shapes are still possible in NumPy
because of broadcasting. The smaller array is broadcast to the shape of the larger
array so that they have compatible shapes.
Ex 1:
a = np.array([0, 1, 2])
a+5
output: [5, 6, 7]
Ex 2:
M = np.ones((3, 3))
M= [[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]]
a = np.array([0, 1, 2])
M+a
Output: [[ 1., 2., 3.],
[ 1., 2., 3.],
[1., 2., 3.]]
Here the one-dimensional array ‘a’ is stretched, or broadcast across the second dimension in
order to match the shape of M.
Ex 3:
a = np.arange(3)
b = np.arange(3)[:, np.newaxis]
print(a)
print(b)
[0 1 2]
[[0]
[1]
[2]]
a+b
Output: [[0, 1, 2],
[1, 2, 3],
[2, 3, 4]]
Rules of Broadcasting:
• Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with
fewer dimensions is padded with ones on its leading (left) side.
• Rule 2: If the shape of the two arrays does not match in any dimension, the array with
shape equal to 1 in that dimension is stretched to match the other shape.
• Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
Ex:
M.shape = (2, 3)
a.shape = (3,)
We see by rule 1 that the array a has fewer dimensions, so we pad it on the left with ones:
M.shape -> (2, 3)
a.shape -> (1, 3)
By rule 2, we now see that the first dimension disagrees, so we stretch this dimension to
match:
M.shape -> (2, 3)
a.shape -> (2, 3)
Ex:
An example in which the two arrays are not compatible:
M = np.ones((3, 2))
a = np.arange(3)
Here matrix M is transposed. How does this affect the calculation? The shapes of the arrays
are:
M.shape = (3, 2)
a.shape = (3,)
Again, rule 1 tells us that we must pad the shape of a with ones:
M.shape -> (3, 2)
a.shape -> (1, 3)
By rule 2, the first dimension of a is stretched to match that of M:
M.shape -> (3, 2)
a.shape -> (3, 3)
Now we hit rule 3—the final shapes do not match, so these two arrays are incompatible.
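The two worked examples above can be checked directly. The sketch below runs the compatible case, shows the error for the incompatible case, and then shows one possible fix using np.newaxis. The fix is an addition here, not part of the example above.

```python
import numpy as np

a = np.arange(3)               # shape (3,)

# Compatible: (2, 3) and (3,) broadcast to (2, 3)
M = np.ones((2, 3))
print((M + a).shape)           # (2, 3)

# Incompatible: (3, 2) and (3,) -> rule 3 raises an error
M = np.ones((3, 2))
try:
    M + a
except ValueError as err:
    print("broadcast failed:", err)

# Giving a a trailing axis makes its shape (3, 1), which is compatible
result = M + a[:, np.newaxis]
print(result.shape)            # (3, 2)
```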
---
Comparison Operators:
NumPy provides various element-wise comparison operators that compare the
elements of two NumPy arrays. The output of these comparison operators is also an array with
a boolean data type, where each element is either True or False based on the comparison of
the corresponding array elements.
Here's a list of various comparison operators available in NumPy.
Operators                        Descriptions
<= (less than or equal to)       returns True if element of the first array is less than or equal to the second one
>= (greater than or equal to)    returns True if element of the first array is greater than or equal to the second one
Ex:
import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([3, 2, 1])
# equal to operator
result3 = array1 == array2
print("array1 == array2:",result3) # Output: [False True False]
NumPy also provides built-in functions to perform all the comparison operations.
Just as in the case of arithmetic ufuncs, these will work on arrays of any size and shape.
Ex:
rng = np.random.RandomState(0)
x = rng.randint(10, size=(3, 4))
x
[[5, 0, 3, 3],
[7, 9, 3, 5],
[2, 4, 7, 6]]
x<6
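A short sketch of the operator form next to the equivalent built-in function, np.less. The array is written out explicitly with the same values as x above, so the result does not depend on the random generator.

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

mask_op = x < 6            # operator form
mask_fn = np.less(x, 6)    # equivalent function form

print(mask_op)
# [[ True  True  True  True]
#  [False False  True  True]
#  [ True  True False False]]
```

The other operators have matching functions as well, such as np.equal for == and np.greater_equal for >=.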
Counting entries
To count the number of True entries in a Boolean array, np.count_nonzero is useful:
# how many values less than 6?
np.count_nonzero(x < 6)
8
Another way to get at this information is to use np.sum. The benefit of sum() is that the
summation can be done along rows or columns as well.
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)
[4, 2, 2]
We can also check whether any or all the values are true by using np.any() or np.all().
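A minimal sketch of np.any and np.all on the same values of x, again written out explicitly. The thresholds are arbitrary illustrations.

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

print(np.any(x > 8))            # True  (the value 9 is greater than 8)
print(np.all(x < 10))           # True  (every value is below 10)
print(np.all(x < 8, axis=1))    # [ True False  True]
```

Like np.sum, both functions accept an axis argument, so the check can be made per row or per column.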
Logical operators:
Logical operators perform Boolean algebra, a branch of algebra that deals with
True and False statements. Logical operations are performed element-wise. For example, if we
have two arrays x1 and x2 of the same shape, the output of the logical operator will also be an
array of the same shape.
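Operators      Descriptions
logical_and    computes the element-wise truth value of x1 AND x2
logical_or     computes the element-wise truth value of x1 OR x2
logical_not    computes the element-wise truth value of NOT x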
Ex:
import numpy as np
x1 = np.array([True, False, True])
x2 = np.array([False, False, True])
# Logical AND
print(np.logical_and(x1, x2)) # Output: [False False True]
# Logical OR
print(np.logical_or(x1, x2)) # Output: [ True False True]
# Logical NOT
print(np.logical_not(x1)) # Output: [False True False]
In boolean indexing, a comparison such as array1 > 20 creates a boolean mask that evaluates
to True for elements that are greater than 20, and False for elements that are less than or
equal to 20. The resulting mask is an array stored in the boolean_mask variable as:
[False, True, False, True, True, True, False, False]
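The array that produced this mask is not shown. The sketch below uses a hypothetical array1, chosen only so that array1 > 20 reproduces the mask above, and then uses the mask to select elements.

```python
import numpy as np

# Hypothetical values (not from the source), picked to match the mask shown
array1 = np.array([12, 24, 16, 21, 32, 29, 7, 15])

boolean_mask = array1 > 20        # True where the element is greater than 20
result = array1[boolean_mask]     # keep only the elements where the mask is True

print(boolean_mask)
print(result)                     # [24 21 32 29]
```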
Ex:
import numpy as np
array1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) # create an array of numbers
boolean_mask = array1 % 2 != 0 # create a boolean mask
result = array1[boolean_mask] # boolean indexing to filter the odd numbers
print(result)
# Output: [ 1 3 5 7 9]
---
FANCY INDEXING
Generally we access the elements of an array by using simple indices (ex: arr[0]).
Indexing can also be done in another style known as fancy indexing. In fancy indexing we
pass an array of indices to access multiple array elements at once.
Ex:
rand = np.random.RandomState(42)    # seed assumed; the source does not show how rand was created
x = rand.randint(100, size=10)
print(x)
[51 92 14 71 60 20 82 86 74 74]
In normal indexing accessing can be done as:
[x[3], x[7], x[2]]
[71, 86, 14]
In fancy indexing:
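A minimal sketch of the fancy-indexing form, using the values of x printed above written out explicitly. The names ind and ind2d are illustrative.

```python
import numpy as np

x = np.array([51, 92, 14, 71, 60, 20, 82, 86, 74, 74])

# Pass a list (or array) of indices to get all three elements at once
ind = [3, 7, 2]
print(x[ind])      # [71 86 14]

# The result takes the shape of the index array, not of x
ind2d = np.array([[3, 7],
                  [4, 5]])
print(x[ind2d])    # [[71 86]
                   #  [60 20]]
```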
Combined Indexing
Fancy indexing can be combined with the other indexing schemes.
x=
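The example for this part is cut off in the source. As a stand-in, here is a sketch on an assumed 3 x 4 array X showing fancy indexing combined with a simple index, with a slice, and with a boolean mask.

```python
import numpy as np

# Assumed example array (the original one is not available)
X = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(X[2, [2, 0, 1]])     # simple index + fancy index: [10  8  9]
print(X[1:, [2, 0, 1]])    # slice + fancy index: [[ 6  4  5]
                           #                       [10  8  9]]

mask = np.array([True, False, True, False])
print(X[:, mask])          # slice + boolean mask: columns 0 and 2
```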
---
SORTING ARRAYS
Sorting an array means arranging its elements in a particular order. NumPy provides
two functions, np.sort and np.argsort, which are typically more efficient for arrays than
Python's built-in sort and sorted. To return a sorted version of the array without modifying
the input, you can use np.sort.
Ex:
If you want to sort the array in-place, you can use the sort method.
Another function is argsort, which returns the indices of the sorted elements.
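The three options described above can be sketched as follows. The array values are arbitrary.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

# np.sort returns a sorted copy; x itself is unchanged
sorted_copy = np.sort(x)
print(sorted_copy)    # [1 2 3 4 5]
print(x)              # [2 1 4 3 5]

# argsort returns the indices that would sort the array
order = np.argsort(x)
print(order)          # [1 0 3 2 4]
print(x[order])       # [1 2 3 4 5]

# The sort method sorts in place (done on a copy here to keep x intact)
y = x.copy()
y.sort()
print(y)              # [1 2 3 4 5]
```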
Sorting along rows or columns
NumPy’s sorting functions can sort along the rows or columns of a
multidimensional array using the axis argument.
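A small sketch of sorting along each axis, using an illustrative 3 x 3 array. Note that each row or column is sorted independently, so any relationship between values in the same row or column is lost.

```python
import numpy as np

X = np.array([[6, 3, 7],
              [4, 6, 9],
              [2, 6, 7]])

by_column = np.sort(X, axis=0)    # sort each column
by_row = np.sort(X, axis=1)       # sort each row

print(by_column)    # [[2 3 7]
                    #  [4 6 7]
                    #  [6 6 9]]
print(by_row)       # [[3 6 7]
                    #  [4 6 9]
                    #  [2 6 7]]
```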
---
STRUCTURED DATA
NumPy’s structured arrays provide efficient storage for compound, heterogeneous
data.
A structured array is created using a compound data type specification.
In structured arrays we can refer to values either by index or by name.
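A minimal sketch of creating and accessing a structured array. The field names and the sample records are hypothetical, made up for illustration.

```python
import numpy as np

# Hypothetical sample data
names = ['Alice', 'Bob', 'Cathy']
ages = [25, 45, 37]
weights = [55.0, 85.5, 68.0]

# Compound data type: a 10-character string, a 4-byte int, an 8-byte float
data = np.zeros(3, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = names
data['age'] = ages
data['weight'] = weights

print(data['name'])        # access a whole field by name
print(data[0])             # access a whole record by index
print(data[-1]['name'])    # combine both: name of the last record

# Boolean masking works on fields too
young = data[data['age'] < 30]['name']
print(young)
```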
RecordArrays
NumPy also provides the np.recarray class, which is almost identical to
structured arrays but has one additional feature: fields can be accessed as
attributes rather than as dictionary keys.
Ex:
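A minimal sketch, assuming a small structured array with hypothetical fields name and age. Viewing it as a record array allows attribute-style access.

```python
import numpy as np

# Hypothetical structured array
data = np.array([('Alice', 25), ('Bob', 45)],
                dtype=[('name', 'U10'), ('age', 'i4')])

# View the same data as a record array
data_rec = data.view(np.recarray)

print(data['age'])     # dictionary-style access on the structured array
print(data_rec.age)    # attribute-style access on the record array
```

Both lines print the same values; the record array only adds a more convenient way to reach the fields.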
---