Introduction To NumPy


NUMPY

Data science is a branch of computer science that studies how to store, use, and analyze
data in order to derive information from it.

INTRODUCTION

NumPy is a Python library used for working with arrays. NumPy stands for Numerical
Python. NumPy was created in 2005 by Travis Oliphant. It is an open source project and you
can use it freely.

Why Use NumPy?

In Python, lists serve the purpose of arrays, but they are slow to process for large
numerical workloads. NumPy aims to provide an array object that is faster than traditional
Python lists. The array object in NumPy is called ndarray; it comes with many supporting
functions that make working with ndarray very easy.

---

UNDERSTANDING DATA TYPES IN PYTHON


A Python Integer Is More Than Just an Integer
A single integer in Python 3.4 actually contains four pieces:

• ob_refcnt - a reference count that helps Python silently handle memory allocation and
  deallocation.
• ob_type - which encodes the type of the variable.
• ob_size - which specifies the size of the following data members.
• ob_digit - which contains the actual integer value.
Here PyObject_HEAD is the part of the structure containing the reference count,
type code, and other information.

A Python List Is More Than Just a List


The standard mutable multielement container in Python is the list. We can create a list
of integers as:
L = list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Creating Arrays from Python Lists


To create arrays from Python lists we use np.array.
np.array([1, 4, 2, 5, 3])
Unlike Python lists, NumPy is constrained to arrays that all contain the same type. If
types do not match, NumPy will upcast if possible.
np.array([3.14, 4, 2, 3])
[ 3.14, 4. , 2. , 3. ]
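
The upcasting described above can be checked by inspecting the dtype attribute of the result. The following is a small sketch; the variable names are illustrative, and the dtype keyword is used to request an element type explicitly.

```python
import numpy as np

# Mixed int/float input: NumPy upcasts every element to floating point
mixed = np.array([3.14, 4, 2, 3])
print(mixed.dtype)            # float64

# An explicit dtype overrides the inferred one
as_float32 = np.array([1, 2, 3, 4], dtype='float32')
print(as_float32.dtype)       # float32
```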

NumPy Standard Data Types


The standard NumPy data types are:
int_, bool_, intc, intp, int8, int16, int32, uint8, uint64, float_, float64, complex_
(In NumPy 2.0 and later the aliases float_ and complex_ were removed; use float64 and
complex128 instead.)
---

THE BASICS OF NUMPY ARRAYS

Data manipulation in Python is nearly synonymous with NumPy array manipulation. The basic
array manipulations fall into the following categories:

• NumPy Array Attributes

Array attributes describe the size, shape, memory consumption, and data type of an array:
ndim - the number of dimensions
shape - the size of each dimension
size - the total size of the array
dtype - the data type of the array
itemsize - size of each array element (in bytes)
nbytes - total size of the array (in bytes)

Ex:
import numpy as np
x1 = np.random.randint(10, size=6)
x2 = np.random.randint(10, size=(3, 4))
x3 = np.random.randint(10, size=(3, 4, 5))
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:", x3.dtype)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

output
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
dtype: int64
itemsize: 8 bytes
nbytes: 480 bytes

• Array Indexing
Getting and setting the value of individual array elements.
Ex:
x1 = np.array([5, 0, 3, 3, 7, 9])
x1[0] -> 5

To access the elements from the end of the array, you can use negative indices.
x1[-1] -> 9
x1[-2] -> 7
• Array Slicing
Slicing is the process of getting and setting smaller subarrays of a larger array. We use
square brackets with the slice notation, marked by the colon (:) character, to access
subarrays.
x[start:stop:step]

Ex:
x = np.arange(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
x[4:7] -> [4, 5, 6]
x[::-1] -> [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Multidimensional slices work in the same way, with multiple slices separated by commas.

x2[:2, :3]
[[12, 5, 2],
[ 7, 6, 8]]

• Reshaping of arrays
Reshaping changes the shape of a given array. The reshape() method is used to reshape the
array. The size of the initial array must match the size of the reshaped array.
Ex:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
[[1 2 3]
[4 5 6]
[7 8 9]]

x = np.array([1, 2, 3])
x.reshape((3, 1))
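
The reshape call above turns a one-dimensional array into a column vector. As a hedged sketch, the same conversion can also be written with np.newaxis inside a slice; the variable names below are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3])

col_reshape = x.reshape((3, 1))     # column vector via reshape
col_newaxis = x[:, np.newaxis]      # same result via newaxis
row_newaxis = x[np.newaxis, :]      # row vector, shape (1, 3)

print(col_reshape.shape, col_newaxis.shape, row_newaxis.shape)
```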
• Array Concatenation and Splitting
Concatenation combines multiple arrays into one. Concatenation, or joining of two
arrays in NumPy, is primarily accomplished through the routines np.concatenate, np.vstack,
and np.hstack.

Ex:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [99, 99, 99]
print(np.concatenate([x, y, z]))
[ 1 2 3 3 2 1 99 99 99]
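
For arrays of mixed dimensions, np.vstack and np.hstack are often clearer than np.concatenate. A minimal sketch, using small made-up arrays (grid and col are illustrative names):

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vstack stacks along the first axis (adds rows)
stacked_v = np.vstack([x, grid])
print(stacked_v.shape)        # (3, 3)

# hstack stacks along the second axis (adds columns)
col = np.array([[99],
                [99]])
stacked_h = np.hstack([grid, col])
print(stacked_h.shape)        # (2, 4)
```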

The opposite of concatenation is splitting, which is implemented by the functions
np.split, np.hsplit, and np.vsplit.
Ex:
grid = np.arange(16).reshape((4, 4))
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]]
upper, lower = np.vsplit(grid, [2])
print(upper)
[[0 1 2 3]
[4 5 6 7]]

print(lower)
[[ 8 9 10 11]
[12 13 14 15]]
---

COMPUTATION ON NUMPY ARRAYS: UNIVERSAL FUNCTIONS

Computation on NumPy arrays can be very fast, or it can be very slow. The key to
making it fast is to use vectorized operations, which are generally implemented through
NumPy’s universal functions (ufuncs).
Universal functions in NumPy are simple mathematical functions that are applied to every
element of an array. NumPy provides universal functions that cover a wide variety of
operations. Ufuncs exist in two flavors: unary ufuncs, which operate on a single input, and
binary ufuncs, which operate on two inputs. These functions include standard trigonometric
functions, functions for arithmetic operations, handling complex numbers, statistical
functions, etc.
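
To make the difference between looping and vectorizing concrete, here is a small sketch that computes reciprocals both ways. The helper reciprocals_loop is a hypothetical function written for this illustration; both versions give the same values, but the vectorized form pushes the loop into NumPy's compiled code.

```python
import numpy as np

def reciprocals_loop(values):
    """Compute 1/x element by element with an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.array([1, 2, 4, 5, 8])

loop_result = reciprocals_loop(values)
vector_result = 1.0 / values          # same computation as a ufunc call

print(np.allclose(loop_result, vector_result))   # True
```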

Universal functions have the following characteristics:

• They operate on ndarray (N-dimensional array), i.e. NumPy’s array class.
• They perform fast element-wise array operations.
• They support features such as array broadcasting and type casting.
• In NumPy, universal functions are objects that belong to the numpy.ufunc class.
• Python functions can also be turned into universal functions using the
  frompyfunc library function.
• Some ufuncs are called automatically when the corresponding arithmetic
  operator is used on arrays. For example, when two arrays are added element-wise
  using the ‘+’ operator, np.add() is called internally.
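
Several of the characteristics listed above can be checked directly. The sketch below is illustrative only; add_one is a hypothetical helper, and np.frompyfunc(func, nin, nout) is called with one input and one output.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# The + operator and np.add give the same element-wise result
print(np.array_equal(a + b, np.add(a, b)))    # True

# Built-in ufuncs are instances of numpy.ufunc
print(isinstance(np.add, np.ufunc))           # True

# Wrap an ordinary Python function (1 input, 1 output) as a ufunc
def add_one(value):          # hypothetical helper for illustration
    return value + 1

add_one_ufunc = np.frompyfunc(add_one, 1, 1)
result = add_one_ufunc(a)    # note: the result has dtype object
print(result)
```

Because frompyfunc still calls the Python function once per element, it gives ufunc behaviour such as broadcasting but not the speed of a compiled ufunc.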

Some of the basic universal functions in Numpy are-

Trigonometric functions:

These functions work in radians, so angles given in degrees must first be converted to
radians by multiplying by pi/180 (or by calling deg2rad) before the trigonometric functions
are called. They take an array as input. They include functions such as:

Function                     Description
sin, cos, tan                compute sine, cosine and tangent of angles
arcsin, arccos, arctan       calculate inverse sine, cosine and tangent
hypot                        calculate hypotenuse of given right triangle
sinh, cosh, tanh             compute hyperbolic sine, cosine and tangent
arcsinh, arccosh, arctanh    compute inverse hyperbolic sine, cosine and tangent
deg2rad                      convert degrees into radians
rad2deg                      convert radians into degrees

Ex:
# Python code to demonstrate trigonometric functions
import numpy as np

# create an array of angles (in degrees)
angles = np.array([0, 30, 45, 60, 90, 180])

# convert degrees to radians using the deg2rad function
radians = np.deg2rad(angles)

# sine of angles
print('Sine of angles in the array:')
sine_value = np.sin(radians)
print(sine_value)

# inverse sine of sine values, converted back to degrees
print('Inverse Sine of sine values:')
print(np.rad2deg(np.arcsin(sine_value)))

# hyperbolic sine of angles
print('Sine hyperbolic of angles in the array:')
sineh_value = np.sinh(radians)
print(sineh_value)

# inverse hyperbolic sine (recovers the angles in radians)
print('Inverse Sine hyperbolic:')
print(np.arcsinh(sineh_value))

# hypot function demonstration
base = 4
height = 3
print('hypotenuse of right triangle is:')
print(np.hypot(base, height))

Outputs:
Sine of angles in the array:
[ 0.00000000e+00 5.00000000e-01 7.07106781e-01 8.66025404e-01
1.00000000e+00 1.22464680e-16]

Inverse Sine of sine values:
[ 0.00000000e+00 3.00000000e+01 4.50000000e+01 6.00000000e+01
9.00000000e+01 7.01670930e-15]

Sine hyperbolic of angles in the array:
[ 0. 0.54785347 0.86867096 1.24936705 2.3012989
11.54873936]

Inverse Sine hyperbolic:
[ 0. 0.52359878 0.78539816 1.04719755 1.57079633 3.14159265]

hypotenuse of right triangle is:
5.0


Statistical functions:

These functions are used to calculate the mean, median, variance, minimum, and similar
statistics of array elements. They include functions such as:

Function                  Description
amin, amax                returns minimum or maximum of an array or along an axis
ptp                       returns range of values (maximum-minimum) of an array or along an axis
percentile(a, p, axis)    calculate pth percentile of array or along specified axis
median                    compute median of data along specified axis
mean                      compute mean of data along specified axis
std                       compute standard deviation of data along specified axis
var                       compute variance of data along specified axis
average                   compute average of data along specified axis

Ex:
# Python code demonstrate statistical function
import numpy as np

# construct a weight array


weight = np.array([50.7, 52.5, 50, 58, 55.63, 73.25, 49.5, 45])

# minimum and maximum


print('Minimum and maximum weight of the students: ')
print(np.amin(weight), np.amax(weight))

# range of weight i.e. max weight-min weight


print('Range of the weight of the students: ')
print(np.ptp(weight))

# percentile
print('Weight below which 70 % student fall: ')
print(np.percentile(weight, 70))

# mean
print('Mean weight of the students: ')
print(np.mean(weight))
# median
print('Median weight of the students: ')
print(np.median(weight))

# standard deviation
print('Standard deviation of weight of the students: ')
print(np.std(weight))

# variance
print('Variance of weight of the students: ')
print(np.var(weight))

# average
print('Average weight of the students: ')
print(np.average(weight))
Output:
Minimum and maximum weight of the students:
45.0 73.25

Range of the weight of the students:


28.25

Weight below which 70 % student fall:


55.317

Mean weight of the students:


54.3225

Median weight of the students:


51.6

Standard deviation of weight of the students:


8.05277397857

Variance of weight of the students:


64.84716875
Average weight of the students:
54.3225

Bit-twiddling functions:

These functions accept integer values as input arguments and perform bitwise
operations on the binary representations of those integers. They include functions such as:

Function       Description
bitwise_and    performs bitwise and operation on two array elements
bitwise_or     performs bitwise or operation on two array elements
bitwise_xor    performs bitwise xor operation on two array elements
invert         performs bitwise inversion of array elements
left_shift     shift the bits of elements to the left
right_shift    shift the bits of elements to the right

Ex
# Python code to demonstrate bitwise-function
import numpy as np

# construct an array of even and odd numbers


even = np.array([0, 2, 4, 6, 8, 16, 32])
odd = np.array([1, 3, 5, 7, 9, 17, 33])

# bitwise_and
print('bitwise_and of two arrays: ')
print(np.bitwise_and(even, odd))

# bitwise_or
print('bitwise_or of two arrays: ')
print(np.bitwise_or(even, odd))

# bitwise_xor
print('bitwise_xor of two arrays: ')
print(np.bitwise_xor(even, odd))

# invert or not
print('inversion of even no. array: ')
print(np.invert(even))
# left_shift
print('left_shift of even no. array: ')
print(np.left_shift(even, 1))

# right_shift
print('right_shift of even no. array: ')
print(np.right_shift(even, 1))

Outputs
bitwise_and of two arrays:
[ 0 2 4 6 8 16 32]

bitwise_or of two arrays:


[ 1 3 5 7 9 17 33]

bitwise_xor of two arrays:


[1 1 1 1 1 1 1]

inversion of even no. array:


[ -1 -3 -5 -7 -9 -17 -33]

left_shift of even no. array:


[ 0 4 8 12 16 32 64]

right_shift of even no. array:


[ 0 1 2 3 4 8 16]

---

AGGREGATIONS
When we have a large amount of data, a first step is to compute summary statistics for
the data in question. The most common summary statistics are the mean and standard deviation,
which allow you to summarize the “typical” values in a dataset, but other aggregates are
useful as well, such as the sum, product, median, minimum, and maximum. NumPy has fast
built-in aggregation functions for working on arrays.

Summing the values in an array:


In Python, the built-in sum function computes the sum of all the values in an array. The
syntax is quite similar to that of NumPy’s sum function, which is usually much faster on
NumPy arrays because it runs in compiled code.
Ex:
import numpy as np
L = np.random.random(100)
sum(L)
np.sum(L)
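
As a hedged illustration of how the built-in sum and np.sum relate, the sketch below checks that they agree on a one-dimensional array and shows one way they differ. The seeded RandomState and the array M are assumptions added only to make the example reproducible.

```python
import numpy as np

rng = np.random.RandomState(0)
L = rng.random_sample(100)

builtin_total = sum(L)        # Python's built-in sum, iterates element by element
numpy_total = np.sum(L)       # NumPy's compiled aggregation

print(np.isclose(builtin_total, numpy_total))   # True

# The two functions are not interchangeable: the second positional
# argument is a start value for sum() but an axis for np.sum()
M = np.ones((2, 3))
print(np.sum(M, 0))           # sums down each column: [2. 2. 2.]
```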

Minimum and Maximum:


To find the minimum and maximum values of an array, NumPy’s min and max functions are
used.
Ex:
big_array = np.random.rand(1000000)
np.min(big_array), np.max(big_array)

(1.1717128136634614e-06, 0.9999976784968716)

Multidimensional Aggregates:

M = np.random.random((3, 4))
print(M)
[[ 0.8967576 0.03783739 0.75952519 0.06682827]
[ 0.8354065 0.99196818 0.19544769 0.43447084]
[ 0.66859307 0.15038721 0.37911423 0.6687194 ]]

Aggregation functions take an additional argument specifying the axis along which the
aggregate is computed. For example, we can find the minimum value within each column by
specifying axis=0.
M.min(axis=0)
array([ 0.66859307, 0.03783739, 0.19544769, 0.06682827])

Similarly, we can find the maximum value within each row:


M.max(axis=1)
array([ 0.8967576 , 0.99196818, 0.6687194 ])
Other aggregation functions:

Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
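
The difference between an aggregate and its NaN-safe version shows up as soon as the data contain a missing value. A minimal sketch with a made-up array:

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])

print(np.sum(data))      # nan, because NaN propagates
print(np.nansum(data))   # 7.0, NaN is ignored
print(np.nanmean(data))  # mean of the three valid values
```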

---

BROADCASTING
Broadcasting is simply a set of rules for applying binary ufuncs (addition, subtraction,
multiplication, etc.) on arrays of different sizes. The term refers to NumPy's ability to
handle arrays of different shapes during arithmetic operations. Arithmetic operations
on arrays are usually done on corresponding elements, so if two arrays have exactly the same
shape, these operations are performed directly.

Ex:

import numpy as np

a = np.array([1,2,3,4])
b = np.array([10,20,30,40])
c=a*b
print(c)

Its output is as follows:

[ 10  40  90 160]

If the shapes of two arrays differ, plain element-to-element operations are not
possible. However, operations on arrays of different shapes are still possible in NumPy
because of broadcasting: the smaller array is broadcast to the size of the larger
array so that they have compatible shapes.

Ex 1:
a = np.array([0, 1, 2])
a+5
output: [5, 6, 7]

Ex 2:

Adding a one-dimensional array to a two-dimensional array:

M = np.ones((3, 3))
M= [[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]]

a = np.array([0, 1, 2])
M+a
Output: [[ 1., 2., 3.],
[ 1., 2., 3.],
[1., 2., 3.]]

Here the one-dimensional array ‘a’ is stretched, or broadcast across the second dimension in
order to match the shape of M.
Ex 3:
a = np.arange(3)
b = np.arange(3)[:, np.newaxis]
print(a)
print(b)
[0 1 2]
[[0]
[1]
[2]]

a+b
Output: [[0, 1, 2],
[1, 2, 3],
[2, 3, 4]]

Rules of Broadcasting:

Broadcasting in NumPy follows a strict set of rules to determine the interaction
between the two arrays:

• Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with
fewer dimensions is padded with ones on its leading (left) side.

• Rule 2: If the shape of the two arrays does not match in any dimension, the array with
shape equal to 1 in that dimension is stretched to match the other shape.

• Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

Ex:

M.shape = (2, 3)
a.shape = (3,)
We see by rule 1 that the array a has fewer dimensions, so we pad it on the left with ones:
M.shape -> (2, 3)
a.shape -> (1, 3)

By rule 2, we now see that the first dimension disagrees, so we stretch this dimension to
match:
M.shape -> (2, 3)
a.shape -> (2, 3)
Ex:
An example in which the two arrays are not compatible:
M = np.ones((3, 2))
a = np.arange(3)
Here matrix M is transposed. How does this affect the calculation? The shapes of the arrays
are:
M.shape = (3, 2)
a.shape = (3,)
Again, rule 1 tells us that we must pad the shape of a with ones:
M.shape -> (3, 2)
a.shape -> (1, 3)
By rule 2, the first dimension of a is stretched to match that of M:
M.shape -> (3, 2)
a.shape -> (3, 3)
Now we hit rule 3—the final shapes do not match, so these two arrays are incompatible.
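
The two walk-throughs above can be reproduced in code. The sketch below is illustrative; M_ok, M_bad and fixed are names introduced here. It also shows one common workaround: reshaping a by hand with np.newaxis so that the padding falls on the right side.

```python
import numpy as np

a = np.arange(3)

# Compatible: (2, 3) and (3,) broadcast to (2, 3)
M_ok = np.ones((2, 3))
print((M_ok + a).shape)       # (2, 3)

# Incompatible: (3, 2) and (3,) -> rule 3 raises an error
M_bad = np.ones((3, 2))
try:
    M_bad + a
    raised = False
except ValueError as err:
    raised = True
    print("broadcast failed:", err)

# Right-side padding has to be done by hand, e.g. with np.newaxis
fixed = M_bad + a[:, np.newaxis]   # (3, 2) + (3, 1) -> (3, 2)
print(fixed.shape)
```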

---

COMPARISONS, MASKS AND BOOLEAN LOGIC

Comparison Operators:
NumPy provides various element-wise comparison operators that compare the
elements of two NumPy arrays. The output of these comparison operators is also an array,
with a boolean data type, where each element is either True or False based on the
comparison of the corresponding array elements.
Here's a list of the comparison operators available in NumPy.

Operators                        Descriptions
< (less than)                    returns True if element of the first array is less than the second one
<= (less than or equal to)       returns True if element of the first array is less than or equal to the second one
> (greater than)                 returns True if element of the first array is greater than the second one
>= (greater than or equal to)    returns True if element of the first array is greater than or equal to the second one
== (equal to)                    returns True if the element of the first array is equal to the second one
!= (not equal to)                returns True if the element of the first array is not equal to the second one

Ex:
import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([3, 2, 1])

# less than operator
result1 = array1 < array2
print("array1 < array2:", result1) # Output: [ True False False]

# less than or equal to operator
result11 = array1 <= array2
print("array1 <= array2:", result11) # Output: [ True True False]

# greater than operator
result2 = array1 > array2
print("array1 > array2:", result2) # Output: [False False True]

# greater than or equal to operator
result22 = array1 >= array2
print("array1 >= array2:", result22) # Output: [False True True]

# equal to operator
result3 = array1 == array2
print("array1 == array2:", result3) # Output: [False True False]

# not equal to operator
result33 = array1 != array2
print("array1 != array2:", result33) # Output: [ True False True]

NumPy also provides built-in functions to perform all the comparison operations.

Ex: result = np.less(array1, array2)


print("Using less():",result) #Output: [ True False False]

Just as in the case of arithmetic ufuncs, these will work on arrays of any size and shape.

Ex:
rng = np.random.RandomState(0)
x = rng.randint(10, size=(3, 4))
x

[[5, 0, 3, 3],
[7, 9, 3, 5],
[2, 4, 7, 6]]

x<6

[[ True, True, True, True],


[False, False, True, True],
[ True, True, False, False]]

Counting entries
To count the number of True entries in a Boolean array, np.count_nonzero is useful:
# how many values less than 6?
np.count_nonzero(x < 6)
8
Another way to get at this information is to use np.sum. The benefit of sum() is that the
summation can be done along rows or columns as well.
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)
[4, 2, 2]

We can also check whether any or all the values are true by using np.any() or np.all().

# are there any values greater than 8?


np.any(x > 8)
True
# are there any values less than zero?
np.any(x < 0)
False
# are all values less than 10?
np.all(x < 10)
True
# are all values equal to 6?
np.all(x == 6)
False

Logical operators:
Logical operators perform Boolean algebra, a branch of algebra that deals with
True and False statements. Logical operations are performed element-wise. For example, if we
have two arrays x1 and x2 of the same shape, the output of the logical operator will also be an
array of the same shape.

List of various logical operators available in NumPy:

Operators Descriptions

logical_and Computes the element-wise truth value of x1 AND x2

logical_or Computes the element-wise truth value of x1 OR x2

logical_not Computes the element-wise truth value of NOT x

Ex:
import numpy as np
x1 = np.array([True, False, True])
x2 = np.array([False, False, True])

# Logical AND
print(np.logical_and(x1, x2)) # Output: [False False True]

# Logical OR
print(np.logical_or(x1, x2)) # Output: [ True False True]

# Logical NOT
print(np.logical_not(x1)) # Output: [False True False]

Boolean Arrays as Masks

Masking is the process of extracting, modifying, counting, or otherwise manipulating values
in an array based on some criterion. For example, you might wish to count all values greater
than a certain value. A Boolean mask is a NumPy array containing truth values (True/False)
that correspond to each element in the array.
Ex:

array1 = np.array([12, 24, 16, 21, 32, 29, 7, 15])


boolean_mask = array1 > 20

Here, array1 > 20 creates a boolean mask that evaluates to True for elements that are greater
than 20, and False for elements that are less than or equal to 20. The resulting mask is an
array stored in the boolean_mask variable as:
[False, True, False, True, True, True, False, False]

Ex:

import numpy as np
array1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) # create an array of numbers
boolean_mask = array1 % 2 != 0 # create a boolean mask
result = array1[boolean_mask] # boolean indexing to filter the odd numbers
print(result)

# Output: [ 1 3 5 7 9]
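
Masks built from several conditions can be combined before indexing. The sketch below is an illustration that uses the element-wise operators & (and) and | (or) on Boolean arrays; the names between and outside are introduced here.

```python
import numpy as np

array1 = np.array([12, 24, 16, 21, 32, 29, 7, 15])

# Parentheses are needed because & and | bind tighter than comparisons
between = array1[(array1 > 10) & (array1 < 25)]
outside = array1[(array1 <= 10) | (array1 >= 25)]

print(between)    # values strictly between 10 and 25
print(outside)    # the remaining values

# Counting instead of extracting
print(np.count_nonzero(array1 > 20))
```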

---

FANCY INDEXING
Generally we access the elements of an array by using simple indices (ex: arr[0]).
Indexing can also be done in another style known as fancy indexing. In fancy indexing we
pass an array of indices to access multiple array elements at once.
Ex:
rand = np.random.RandomState(42)
x = rand.randint(100, size=10)
print(x)
[51 92 14 71 60 20 82 86 74 74]
In normal indexing accessing can be done as:
[x[3], x[7], x[2]]
[71, 86, 14]

In fancy indexing:
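
A minimal sketch of the fancy-indexing equivalent, with the values of x printed above written out literally so that the block runs on its own (ind and ind2d are illustrative names):

```python
import numpy as np

x = np.array([51, 92, 14, 71, 60, 20, 82, 86, 74, 74])

ind = [3, 7, 2]
print(x[ind])          # [71 86 14]

# The result takes the shape of the index array, not of x
ind2d = np.array([[3, 7],
                  [4, 5]])
print(x[ind2d])
```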

Fancy indexing also works in multiple dimensions.


Ex:
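
One possible illustration of multidimensional fancy indexing, assuming a small 3x4 array X built with np.arange (the array and the index names row and col are chosen here for the example):

```python
import numpy as np

X = np.arange(12).reshape((3, 4))

row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# Index arrays are paired up: picks X[0, 2], X[1, 1] and X[2, 3]
print(X[row, col])     # [ 2  5 11]

# Index arrays are broadcast against each other
print(X[row[:, np.newaxis], col].shape)   # (3, 3)
```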

Combined Indexing
Fancy indexing can be combined with the other indexing schemes.
X = np.arange(12).reshape((3, 4))

We can combine fancy and simple indices:


X[2, [2, 0, 1]]
[10, 8, 9]

We can also combine fancy indexing with slicing and masking.
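
A short sketch of both combinations, assuming a 3x4 array X built with np.arange (row and mask are illustrative names):

```python
import numpy as np

X = np.arange(12).reshape((3, 4))

# Fancy index combined with a slice: rows 1 onward, columns 2, 0, 1
print(X[1:, [2, 0, 1]])

# Fancy index combined with a boolean mask over the columns
row = np.array([0, 1, 2])
mask = np.array([True, False, True, False])
print(X[row[:, np.newaxis], mask])
```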


Modifying Values with Fancy Indexing
Fancy indexing can also be used to modify parts of an array.
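
A minimal sketch of assignment through fancy indices, using made-up arrays. It also shows np.add.at, which is one way to accumulate when the same index appears more than once.

```python
import numpy as np

x = np.arange(10)
i = np.array([2, 1, 8, 4])

x[i] = 99            # assign to several positions at once
x[i] -= 10           # any assignment-type operator works
print(x)             # [ 0 89 89  3 89  5  6  7 89  9]

# Repeated indices are not accumulated by +=; np.add.at does accumulate
counts = np.zeros(5)
np.add.at(counts, [0, 0, 2], 1)
print(counts)        # [2. 0. 1. 0. 0.]
```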

---

SORTING ARRAYS

Sorting an array means arranging its elements in some particular order. NumPy provides
two functions, np.sort and np.argsort, which are more efficient and more useful for arrays
than Python's built-in sorting. To return a sorted version of the array without modifying
the input, you can use np.sort.
Ex:

If you want to sort the array in-place, you can use the sort method.

Another function is argsort, which returns the indices of the sorted elements.
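
The three sorting calls described above can be sketched as follows, using a small made-up array:

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

sorted_copy = np.sort(x)     # returns a sorted copy, x is unchanged
print(sorted_copy)           # [1 2 3 4 5]

order = np.argsort(x)        # indices that would sort x
print(order)                 # [1 0 3 2 4]
print(x[order])              # fancy indexing with them gives the sorted array

x.sort()                     # in-place: x itself is now sorted
print(x)                     # [1 2 3 4 5]
```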
Sorting along rows or columns
NumPy’s sorting function has the ability to sort along specific rows or columns of a
multidimensional array using the axis argument.
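
As an illustration with a made-up 3x4 array, the axis argument selects whether columns or rows are sorted:

```python
import numpy as np

X = np.array([[6, 3, 7, 4],
              [2, 9, 1, 8],
              [5, 0, 3, 2]])

print(np.sort(X, axis=0))    # sort each column independently
print(np.sort(X, axis=1))    # sort each row independently
```

Each row or column is treated as an independent array, so any relationship between values in the same row or column is lost.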

Partial Sorts: Partitioning


Sometimes we do not need to sort the entire array, but simply want to find the K
smallest values in it. NumPy provides this in the np.partition function, which takes an
array and a number K. The result is a new array with the smallest K values to the left
of the partition and the remaining values to the right, in arbitrary order.

Just as there is an np.argsort that computes indices of the sort, there is an
np.argpartition that computes indices of the partition.
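
A small sketch of both functions on a made-up array. Because the order inside each partition is arbitrary, the first K entries are sorted before printing so that the output is predictable.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
K = 3

part = np.partition(x, K)
# The first K entries are the K smallest values, in no particular order
print(np.sort(part[:K]))      # [1 2 3]

idx = np.argpartition(x, K)
print(np.sort(x[idx[:K]]))    # [1 2 3]
```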

---

STRUCTURED DATA
NumPy’s structured arrays provide efficient storage for compound, heterogeneous
data. A structured array is created using a compound data type specification.
In structured arrays we can refer to values either by index or by name.
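
A minimal sketch of creating and accessing a structured array. The field names and the example records are hypothetical data made up for this illustration.

```python
import numpy as np

# Hypothetical example data: three people with a name, age and weight
data = np.zeros(3, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy']
data['age'] = [25, 45, 37]
data['weight'] = [55.0, 85.5, 68.0]

print(data['name'])          # access a whole field by name
print(data[0])               # access a whole record by index
print(data[-1]['name'])      # combine the two: name of the last record

# Boolean masking works on fields as well
print(data[data['age'] < 30]['name'])
```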

Data type representation in structured data.
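
A compound data type can be written in several ways. The sketch below shows three common forms; the field names are hypothetical, and the type codes used are U10 (Unicode string of length 10), S10 (byte string of length 10), i4 (4-byte integer) and f8 (8-byte float).

```python
import numpy as np

# Dictionary form: field names and formats given separately
dt_dict = np.dtype({'names': ('name', 'age', 'weight'),
                    'formats': ('U10', 'i4', 'f8')})

# List-of-tuples form: one (name, format) pair per field
dt_list = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Comma-separated string form: fields get default names f0, f1, f2
dt_str = np.dtype('S10,i4,f8')

print(dt_dict.names, dt_list.names)
print(dt_str.names)                # ('f0', 'f1', 'f2')
```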

RecordArrays
NumPy also provides the np.recarray class, which is almost identical to the
structured arrays just described but with one additional feature: fields can be accessed
as attributes rather than as dictionary keys.
Ex:
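
A minimal sketch, assuming a small structured array with hypothetical fields, that views the same data as a record array:

```python
import numpy as np

data = np.zeros(3, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['age'] = [25, 45, 37]

data_rec = data.view(np.recarray)   # same memory, attribute-style access
print(data_rec.age)                 # [25 45 37]
print(data['age'])                  # dictionary-style access still works
```

Attribute access is convenient to type, but it adds some lookup overhead compared with dictionary-style access.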
---
