NumPy
Data types
NumPy supports a much greater variety of numerical types than Python does. This section shows which are available, and
how to modify an array’s data-type.
Data type Description
bool_ Boolean (True or False) stored as a byte
int_ Default integer type (same as C long; normally either int64 or int32)
intc Identical to C int (normally int32 or int64)
intp Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8 Byte (-128 to 127)
int16 Integer (-32768 to 32767)
int32 Integer (-2147483648 to 2147483647)
int64 Integer (-9223372036854775808 to 9223372036854775807)
uint8 Unsigned integer (0 to 255)
uint16 Unsigned integer (0 to 65535)
uint32 Unsigned integer (0 to 4294967295)
uint64 Unsigned integer (0 to 18446744073709551615)
float_ Shorthand for float64 (the float_ alias was removed in NumPy 2.0; use float64).
float16 Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32 Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64 Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_ Shorthand for complex128 (the complex_ alias was removed in NumPy 2.0; use complex128).
complex64 Complex number, represented by two 32-bit floats (real and imaginary components)
complex128 Complex number, represented by two 64-bit floats (real and imaginary components)
Additionally to intc the platform dependent C integer types short, long, longlong and their unsigned versions are
defined.
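As a small illustration of the table above, the following sketch creates an array with an explicit dtype, queries the range of an integer type with np.iinfo, and converts between types with astype. The variable names are illustrative only.

```python
import numpy as np

# Request a specific data type at creation time
small = np.array([1, 2, 3], dtype=np.int8)

# Query the representable range of an integer type
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)   # -128 127

# astype returns a new array with the requested type
as_float = small.astype(np.float64)
print(small.dtype, as_float.dtype)    # int8 float64
```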
NumPy numerical types are instances of dtype (data-type) objects, each having unique characteristics. Once you have
imported NumPy using import numpy as np, the dtypes are available as np.bool_, np.float32, etc.
The following corresponds to the usual functions except that nans are excluded from the results:
nansum()
nanmax()
nanmin()
nanargmax()
nanargmin()
>>> x = np.arange(10.)
>>> x[3] = np.nan
>>> x.sum()
nan
>>> np.nansum(x)
42.0
These behaviors can be set for all kinds of errors or specific ones:
Note that integer divide-by-zero is handled by the same machinery. These behaviors are set on a per-thread basis.
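One way to control this error handling is the np.errstate context manager (np.seterr sets the same options globally for the current thread). The sketch below is a minimal example under that assumption; the array values are made up for illustration.

```python
import numpy as np

values = np.array([1.0, 0.0, -1.0])

# Inside this block, divide-by-zero and invalid-operation warnings are ignored
with np.errstate(divide='ignore', invalid='ignore'):
    quotient = values / 0.0       # inf, nan, -inf

# Inside this block, a divide-by-zero raises an exception instead
try:
    with np.errstate(divide='raise'):
        values[:1] / 0.0
    raised = False
except FloatingPointError:
    raised = True

print(quotient, raised)
```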
Arrays
At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data
types, with many operations being performed in compiled code for performance. There are several important differences
between NumPy arrays and the standard Python sequences:
NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an
ndarray will create a new array and delete the original.
The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in
memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of
different sized elements.
NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically,
such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these
typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often
output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s
scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is
insufficient - one also needs to know how to use NumPy arrays.
NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the
same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.
For example, the array of coordinates of a point in 3D space, [1, 2, 1], has one axis. That axis has 3 elements in it, so we say it
has a length of 3. In the example below, the array has 2 axes. The first axis has a length of 2, the second axis has a
length of 3.
[[ 1., 0., 0.],
[ 0., 1., 2.]]
NumPy’s array class is called ndarray. It is also known by the alias array. Note that numpy.array is not the same
as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less
functionality. The more important attributes of an ndarray object are:
import numpy as np
# a rank 1 np.array is a vector
an_array = np.array([3, 33, 333]) # create a rank 1 array
print(type(an_array)) # the type of an ndarray is "<class 'numpy.ndarray'>"
# test the shape of the array we just created; it should have just one dimension (rank 1)
print(an_array.shape)
# for a vector the shape holds a single dimension, which is the vector length
>>(3,)
# because this is a rank 1 array, we need only one index to access each element
print(an_array[0], an_array[1], an_array[2])
>>3 33 333
an_array[0] = 888 # ndarrays are mutable; here we change an element of the array
print(an_array)
>>[888 33 333]
>>[[11 12 13]
[21 22 23]]
>>The shape is 2 rows, 3 columns: (2, 3)
>>Accessing elements [0,0], [0,1], and [1,0] of the ndarray: 11 , 12 , 21
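The more important attributes of an ndarray (ndim, shape, size, dtype, itemsize) can be inspected as in the sketch below. The array grid is an illustrative example, not part of the notes above.

```python
import numpy as np

grid = np.arange(15).reshape(3, 5)   # illustrative 3x5 array

print(grid.ndim)      # number of axes: 2
print(grid.shape)     # size along each axis: (3, 5)
print(grid.size)      # total number of elements: 15
print(grid.dtype)     # element type, platform dependent (int64 or int32)
print(grid.itemsize)  # size of each element in bytes
```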
Array Creation
There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the
array function. The type of the resulting array is deduced from the type of the elements in the sequences.
>>> a = np.array([2,3,4])
A frequent error consists in calling array with multiple numeric arguments, rather than providing a single list of
numbers as an argument.
array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-
dimensional arrays, and so on.
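A short sketch of the points above: the correct single-sequence call, the frequent mistake of passing several numbers, and a sequence of sequences that becomes a two-dimensional array.

```python
import numpy as np

# Correct: a single sequence argument
a = np.array([1, 2, 3, 4])

# Incorrect: several positional numbers instead of one list
try:
    np.array(1, 2, 3, 4)
    failed = False
except (TypeError, ValueError):
    failed = True

# A sequence of sequences becomes a two-dimensional array
b = np.array([(1.5, 2, 3), (4, 5, 6)])
print(failed)             # True
print(b.shape, b.dtype)   # (2, 3) float64
```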
Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to
create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.
The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty
creates an array whose initial content is random and depends on the state of the memory. By default, the dtype of the
created array is float64.
import numpy as np
# create a 2x2 array of zeros
ex1 = np.zeros((2,2))
print(ex1)
[[ 0. 0.]
[ 0. 0.]]
[[ 9. 9.]
[ 9. 9.]]
# notice that the above ndarray (ex4) is actually rank 2, it is a 1x2 array
print(ex4.shape)
#>>(1,2)
np.arange() - To create sequences of numbers, NumPy provides a function analogous to range that returns arrays
instead of lists.
When arange is used with floating point arguments, it is generally not possible to predict the number of elements
obtained, due to the finite floating point precision. For this reason, it is usually better to use the function linspace that
receives as an argument the number of elements that we want, instead of the step:
Example:
import numpy as np
from numpy import pi
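A minimal sketch contrasting arange (which takes a step) with linspace (which takes the number of elements); the specific numbers are illustrative.

```python
import numpy as np
from numpy import pi

steps = np.arange(10, 30, 5)        # start, stop (exclusive), step
print(steps)                        # [10 15 20 25]

points = np.linspace(0, 2, 9)       # 9 evenly spaced numbers from 0 to 2, inclusive
print(points)

x = np.linspace(0, 2 * pi, 100)     # useful for evaluating a function at many points
f = np.sin(x)
print(x.size, f.shape)              # 100 (100,)
```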
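One-dimensional arrays are printed as rows, bidimensionals as matrices and tridimensionals as lists of matrices.
If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners: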
>>> print(np.arange(10000))
[ 0 1 2 ..., 9997 9998 9999]
>>>
>>> print(np.arange(10000).reshape(100,100))
[[ 0 1 2 ..., 97 98 99]
[ 100 101 102 ..., 197 198 199]
[ 200 201 202 ..., 297 298 299]
...,
[9700 9701 9702 ..., 9797 9798 9799]
[9800 9801 9802 ..., 9897 9898 9899]
[9900 9901 9902 ..., 9997 9998 9999]]
To disable this behaviour and force NumPy to print the entire array, you can change the printing options using
set_printoptions.
>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)
Basic Operations
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
a = np.array( [20,30,40,50] ); a
b = np.arange( 4 ) ; b #array([0, 1, 2, 3])
#each element of b is subtracted from the matching element of a: a[0] - b[0], a[1] - b[1], etc.
c = a-b; c #array([20, 29, 38, 47])
#multiplying a vector by a number
#b[0] * 2, b[1] * 2, etc.
b*2
#raising the elements of a vector to a power
#b[0] ** 2, b[1] ** 2, etc.
b**2
#array([0, 1, 4, 9])
10*np.sin(a)
#comparing the elements of a vector with a number returns a vector of True or False values
a<35
#array([ True, True, False, False])
Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product
can be performed using the @ operator (in python >=3.5) or the dot function or method:
A = np.array( [[1,1],
[0,1]] )
B = np.array( [[2,0],
[3,4]] )
#elementwise multiplication of the two arrays
A * B
#A[0,0] * B[0,0], A[0,1] * B[0,1], A[1,0] * B[1,0], etc.
array([[2, 0],
[0, 4]])
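For comparison with the elementwise product, the sketch below computes the matrix product of the same two arrays with the @ operator and with the dot method.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B      # [[2, 0], [0, 4]]
matrix_at = A @ B        # [[5, 4], [3, 4]]
matrix_dot = A.dot(B)    # same result as A @ B
print(matrix_at)
```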
Some operations, such as += and *=, act in place to modify an existing array rather
than create a new one.
a = np.ones((2,3), dtype=int)
b = np.random.random((2,3))
a *= 3
a
array([[3, 3, 3],
[3, 3, 3]])
b += a # the values of a are converted to float before being added to b
b
array([[ 3.87491218, 3.4993892 , 3.23323 ],
[ 3.5179975 , 3.91004357, 3.7154386 ]])
When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise
one (a behavior known as upcasting).
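A minimal sketch of upcasting, assuming an int32 array and a float64 array as inputs. It also shows that an in-place operation cannot change the dtype of its target.

```python
import numpy as np

ints = np.ones(3, dtype=np.int32)
floats = np.linspace(0, np.pi, 3)

mixed = ints + floats
print(mixed.dtype.name)            # float64

# An in-place operation cannot change the dtype of its target array
try:
    ints += floats
    in_place_ok = True
except TypeError:
    in_place_ok = False
print(in_place_ok)                 # False
```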
Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the
ndarray class.
>>> a = np.random.random((2,3))
>>> a
array([[ 0.18626021, 0.34556073, 0.39676747],
[ 0.53881673, 0.41919451, 0.6852195 ]])
>>> a.sum()
2.5718191614547998
>>> a.min()
0.1862602113776709
>>> a.max()
0.6852195003967595
By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by
specifying the axis parameter you can apply an operation along the specified axis of an array:
>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b.sum(axis=0) # sum of each column
array([12, 15, 18, 21])
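The same axis parameter works for other reductions. The sketch below applies a few of them along axis 1 (that is, within each row) of the same array.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

row_sums = b.sum(axis=1)      # sum of each row
row_mins = b.min(axis=1)      # minimum of each row
running = b.cumsum(axis=1)    # cumulative sum along each row

print(row_sums)   # [ 6 22 38]
print(row_mins)   # [0 4 8]
print(running)
```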
NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal
functions”(ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.
>>> B = np.arange(3)
>>> B
array([0, 1, 2])
>>> np.exp(B)
array([ 1. , 2.71828183, 7.3890561 ])
>>> np.sqrt(B)
array([ 0. , 1. , 1.41421356])
>>> C = np.array([2., -1., 4.])
>>> np.add(B, C)
array([ 2., 0., 6.])
-3.80436928566
[-8.74964934 4.94528006]
[-29.07022182 2.80457625 3.49679545 22.89842902 -3.93394818]
>>> a = np.arange(10)**3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000 # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729])
>>> a[ : :-1] # reversed a
array([ 729, 512, 343, 216, 125, -1000, 27, -1000, 1, -1000])
>>> for i in a:
... print(i**(1/3.))
...
nan #cube root of a negative number
1.0
nan #cube root of a negative number
3.0
nan #cube root of a negative number
5.0
6.0
7.0
8.0
9.0
Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:
d=np.arange(20).reshape(5,4); d
#d= np.fromfunction(f,(5,4),dtype=int); d
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
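A short sketch of indexing with one index per axis, using the same 5x4 array d:

```python
import numpy as np

d = np.arange(20).reshape(5, 4)

print(d[2, 3])      # row 2, column 3: 11
print(d[:, 1])      # every row, column 1: [ 1  5  9 13 17]
print(d[1:3, :])    # rows 1 and 2, all columns
print(d[-1])        # last row, same as d[-1, :]: [16 17 18 19]
```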
The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the
remaining axes. NumPy also allows you to write this using dots as b[i,...].
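The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array
with 5 axes, then x[1,2,...] is equivalent to x[1,2,:,:,:], x[...,3] to x[:,:,:,:,3] and x[4,...,5,:] to x[4,:,:,5,:].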
x_1 = np.array([1,2,3,4,5])
y_1 = np.array([11,22,33,44,55])
filter = np.array([True, False, True, False, True])
#where filter is True we take the element from x_1, where it is False we take the element from y_1
out = np.where(filter, x_1, y_1)
print(out)
>>[ 1 22 3 44 5]
mat = np.random.rand(5,5)
mat
array([[ 0.52947625, 0.41875705, 0.29102789, 0.91510274, 0.66849413],
[ 0.69031262, 0.94227646, 0.19339872, 0.49799366, 0.46822884],
[ 0.23988332, 0.31886198, 0.17293111, 0.17862612, 0.28943965],
[ 0.39178129, 0.2280181 , 0.38454648, 0.13993122, 0.34982648],
[ 0.94357092, 0.61724024, 0.70305995, 0.5601409 , 0.01207867]])
A new array object created by slicing operates on the same data as the original array!
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
a_slice = an_array[:2, 1:3] #first 2 rows, and columns from the 2nd up to (but excluding) the 4th
a_slice[0, 0] = 1000 #assigning a new value to an element of a_slice modifies an_array as well
A new object created by passing the slice to np.array holds a separate copy of the data, so modifying it
does not change the original data.
Slice indexing:
Similar to the use of slice indexing with lists and strings, we can use slice indexing
to pull out sub-regions of ndarrays.
import numpy as np
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)
[[11 12 13 14]
[21 22 23 24]
[31 32 33 34]]
#slice of the array creates new object which actually operates on the same data
a_slice = an_array[:2, 1:3] #first 2 rows and columns from 2nd to 4th exclusive
print(a_slice)
[[12 13]
[22 23]]
When you modify a slice, you actually modify the underlying array.
#modifying a value in a_slice also modifies the value in an_array, because both work on the same data
#the indices used in a_slice and an_array may differ, but they refer to the same data
print("Before:", an_array[0, 1]) #inspect the element at 0, 1
a_slice[0, 0] = 1000 # a_slice[0, 0] is the same piece of data as an_array[0, 1]
print("After:", an_array[0, 1])
>>Before: 12
>>After: 1000
#to create a new object with its own copy of the data, pass the slice to np.array
a_slice2 = np.array(an_array[:2, 1:3])
print(a_slice2)
[[1000 13]
[ 22 23]]
for row in e:
print(row)
[[ 0 1 2]
[10 12 13]]
[[100 101 102]
[110 112 113]]
for row in d:
print(row)
[0 1 2 3]
[4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]
However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is an
iterator over all the elements of the array:
0
1
2
10
12
13
100
101
102
110
112
113
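The two iteration styles can be sketched as follows. The array grid is an illustrative stand-in, since the arrays iterated over above are defined elsewhere in the notes.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # illustrative 2x3 array

# Iterating over a multidimensional array walks along the first axis (row by row)
rows = [row.tolist() for row in grid]

# The flat attribute iterates over every single element
flat_items = [int(value) for value in grid.flat]

print(rows)        # [[0, 1, 2], [3, 4, 5]]
print(flat_items)  # [0, 1, 2, 3, 4, 5]
```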
Array indexing
[[11 12 13 14]
[21 22 23 24]
[31 32 33 34]]
# Using both integer indexing & slicing generates an array of lower rank
row_rank1 = an_array[1, :] # Rank 1 view; we are selecting a single row of an_array
>>[[12]
[22]
[32]] (3, 1)
print('Original Array:')
print(an_array)
Original Array:
[[11 12 13]
[21 22 23]
[31 32 33]
[41 42 43]]
row_indices = np.arange(4)
print('\nRows indices picked : ', row_indices)
col_indices = np.array([0, 1, 2, 0]) # one column index for each row
# Examine the pairings of row_indices and col_indices. These are the elements we'll change next.
for row,col in zip(row_indices,col_indices):
    print(row, ", ",col)
0 , 0
1 , 1
2 , 2
3 , 0
# Change one element from each row using the indices selected
an_array[row_indices, col_indices] += 100000
print('\nChanged Array:')
print(an_array)
Changed Array:
[[100011 12 13]
[ 21 100022 23]
[ 31 32 100033]
[100041 42 43]]
Boolean Indexing
# create a 3x2 array
an_array = np.array([[11,12], [21, 22], [31, 32]])
print(an_array)
[[11 12]
[21 22]
[31 32]]
# create a filter holding a boolean value for whether each element meets this condition
filter = (an_array > 15)
print(filter)
[[False False]
[ True True]
[ True True]]
Notice that the filter is an ndarray of the same shape as an_array, filled with True
for each element whose corresponding element in an_array is greater than 15 and
False for those elements whose value is 15 or less.
# we can now select just those elements which meet that criteria
print(an_array[filter])
print("The same result can be obtained with")
print(an_array[an_array>15])
# For short, we could have just used the approach below without the need for the separate filter array.
an_array[(an_array % 2 == 0)]
>>array([12, 22, 32])
What is particularly useful is that we can actually change elements in the array
applying a similar logical filter. Let's add 100 to all the even values
an_array[an_array % 2 == 0] +=100
print(an_array)
[[ 11 112]
[ 21 122]
[ 31 132]]
>>> a = np.floor(10*np.random.random((3,4)))
>>> a
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
>>> a.shape
(3, 4)
The shape of an array can be changed with various commands. Note that the following three commands all return a
modified array, but do not change the original array:
If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:
>>> a.reshape(3,-1)
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
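The three commands referred to above are, in the NumPy documentation, ravel, reshape and the transpose attribute T. A minimal sketch, using an illustrative array with predictable values, shows that each returns a new array and leaves the original shape unchanged:

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)   # illustrative values

flat = a.ravel()            # flattened array, shape (12,)
reshaped = a.reshape(6, 2)  # same data with shape (6, 2)
transposed = a.T            # transposed array, shape (4, 3)

print(flat.shape, reshaped.shape, transposed.shape)
print(a.shape)              # the original is still (3, 4)
```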
Sorting
print(sorted)
print()
print(unsorted)
[-2.29644406 -0.88380097 -0.67777783 -0.27486074 -0.1961417 -0.08179373
0.24561215 0.35905521 0.56116878 0.79698279]
# inplace sorting
unsorted.sort()
print(unsorted)
[-2.29644406 -0.88380097 -0.67777783 -0.27486074 -0.1961417 -0.08179373
0.24561215 0.35905521 0.56116878 0.79698279]
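The difference between a sorted copy and in-place sorting can be sketched as follows. The seeded random generator is an assumption made here so that the example is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator, illustrative
unsorted = rng.standard_normal(10)

sorted_copy = np.sort(unsorted)     # returns a sorted copy, unsorted is untouched

unsorted.sort()                     # sorts the array in place and returns None
print(np.array_equal(unsorted, sorted_copy))   # True
```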
>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8., 8.],
[ 0., 0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1., 8.],
[ 0., 4.]])
>>> np.vstack((a,b)) # vertical stack
array([[ 8., 8.],
[ 0., 0.],
[ 1., 8.],
[ 0., 4.]])
>>> np.hstack((a,b)) #horizontal stack
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])
The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D arrays:
On the other hand, the function row_stack is equivalent to vstack for any input arrays. In general, for arrays with
more than two dimensions, hstack stacks along their second axes, vstack stacks along their first axes, and
concatenate allows for an optional argument giving the number of the axis along which the concatenation should
happen.
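A small sketch of the difference between column_stack and hstack for 1D inputs, plus concatenate with an explicit axis. The input values are illustrative.

```python
import numpy as np

a = np.array([4.0, 2.0])
b = np.array([3.0, 8.0])

cols = np.column_stack((a, b))   # 1D inputs become columns: [[4, 3], [2, 8]]
flat = np.hstack((a, b))         # 1D inputs are joined end to end: [4, 2, 3, 8]
joined = np.concatenate((cols, cols), axis=1)   # explicit axis, shape (2, 4)

print(cols.shape, flat.shape, joined.shape)     # (2, 2) (4,) (2, 4)
```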
a = np.floor(10*np.random.random((2,2)))
a
array([[ 6., 9.],
[ 1., 3.]])
b = np.floor(10*np.random.random((2,2)))
b
array([[ 9., 2.],
[ 4., 8.]])
#the 2 arrays are passed as a single argument (a tuple), hence the double parentheses
np.row_stack((a,b))
array([[ 6., 9.],
[ 1., 3.],
[ 9., 2.],
[ 4., 8.]])
In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They allow the use of
range literals (":").
row = np.r_[0:4,44,444]; row
array([ 0, 1, 2, 3, 44, 444])
np.r_[np.array([1,2,3]), 0, 0, np.array([4,5,6])]
array([1, 2, 3, 0, 0, 4, 5, 6])
#np.c_ can also be used to join 2 or more matrices along their columns
np.c_[np.array([[1,2,3]]), 0, 0, np.array([[4,5,6]])]
array([[1, 2, 3, 0, 0, 4, 5, 6]])
Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to
return, or by specifying the columns after which the division should occur:
>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9., 5., 6., 3., 6., 8., 0., 7., 9., 7., 2., 7.],
[ 1., 4., 9., 2., 2., 1., 0., 6., 2., 2., 4., 0.]])
>>> np.hsplit(a,3) # Split a into 3 arrays
[array([[ 9., 5., 6., 3.],
[ 1., 4., 9., 2.]]), array([[ 6., 8., 0., 7.],
[ 2., 1., 0., 6.]]), array([[ 9., 7., 2., 7.],
[ 2., 2., 4., 0.]])]
>>> np.hsplit(a,(3,4)) # Split a after the third and the fourth column
[array([[ 9., 5., 6.],
[ 1., 4., 9.]]), array([[ 3.],
[ 2.]]), array([[ 6., 8., 0., 7., 9., 7., 2., 7.],
[ 2., 1., 0., 6., 2., 2., 4., 0.]])]
>>> x = np.arange(16.0).reshape(4, 4)
>>> x
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 12., 13., 14., 15.]])
>>> np.vsplit(x, 2)
[array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.]]),
array([[ 8., 9., 10., 11.],
[ 12., 13., 14., 15.]])]
array_split allows indices_or_sections to be an integer that does not equally divide the axis. For an array of length l
that should be split into n sections, it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n.
>>> x = np.arange(8.0)
>>> np.array_split(x, 3) # split into 3 arrays: two with 3 elements and one with 2 elements
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
>>> x = np.arange(7.0)
>>> np.array_split(x, 3) # split into 3 arrays: the first with 3 elements, the other two with 2 elements
[array([ 0., 1., 2.]), array([ 3., 4.]), array([ 5., 6.])]
>>> a = np.arange(12)
>>> b = a # no new object is created
>>> b is a # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4 # changes the shape of a
>>> a.shape
(3, 4)
View or Shallow Copy
Different array objects can share the same data. The view method creates a new array object that looks at the same data.
Changing a value in the new array c, created as a view of array a, also changes that element in array a.
>>> a = np.arange(12).reshape(3, 4)
>>> c = a.view()
>>> c
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> # a and c are 2 different objects, but they hold the same values
>>> id(a) # 1971848529120
>>> id(c) # 1971848540688
>>> c is a
False
>>> c.base is a # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c.shape = 2,6 # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234 # a's data changes
>>> a
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])
Deep Copy
The copy method makes a complete copy of the array and its data.
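A minimal sketch of a deep copy: the copy owns its data, so changing it does not affect the original.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
d = a.copy()          # a new array object with its own data

d[0, 0] = 9999        # changing the copy leaves a untouched
print(d is a)         # False
print(d.base is a)    # False, d does not share data with a
print(a[0, 0])        # 0
```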
Broadcasting
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject
to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape.
The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly
prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size
of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along
that dimension for the “broadcast” array.
After application of the broadcasting rules, the sizes of all arrays must match.
Two dimensions are compatible when they are equal, or when one of them is 1.
If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the
arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input
arrays.
NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays
must have exactly the same shape, as in the following example:
x = np.arange(4);x
array([0, 1, 2, 3])
y = np.ones(5); y
array([ 1., 1., 1., 1., 1.])
x + y
ValueError: operands could not be broadcast together with shapes (4,) (5,)
xx = x.reshape(4,1); xx
array([[0],
[1],
[2],
[3]])
import numpy as np
start = np.zeros((4,3))
print(start)
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
[[ 1. 0. 2.]
[ 1. 0. 2.]
[ 1. 0. 2.]
[ 1. 0. 2.]]
print(add_cols)
[[0]
[1]
[2]
[3]]
[[ 1 3 3 6]
[ 5 7 7 10]
[ 9 11 11 14]]
#broadcasting fails when the dimensions of the arrays do not match and neither of the mismatched dimensions is 1
print(start)
print()
print(arrA)
start+arrA
#ValueError: operands could not be broadcast together with shapes (4,3) (3,4)
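The broadcasting cases in this section can be reproduced with the sketch below. The names add_rows and the np.ones((3, 4)) operand are illustrative choices, picked to match the printed outputs and the error message above.

```python
import numpy as np

start = np.zeros((4, 3))

add_rows = np.array([1, 0, 2])               # shape (3,) is stretched over the 4 rows
by_rows = start + add_rows

add_cols = np.array([[0], [1], [2], [3]])    # shape (4, 1) is stretched over the 3 columns
by_cols = start + add_cols

try:
    start + np.ones((3, 4))                  # shapes (4, 3) and (3, 4) are incompatible
    compatible = True
except ValueError:
    compatible = False

print(by_rows.shape, by_cols.shape, compatible)   # (4, 3) (4, 3) False
```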
Read or Write to Disk:
Binary Format:
x = np.array([ 23.23, 24.24] )
np.save('an_array', x) #saves the array to the current working directory
np.load('an_array.npy')
Text Format:
np.savetxt('array.txt', X=x, delimiter=',')
np.loadtxt('array.txt', delimiter=',')
array([ 23.23, 24.24])
NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as we
saw before, arrays can be indexed by arrays of integers and arrays of booleans.
When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. The following
example shows this behavior by converting an image of labels into a color image using a palette.
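The palette example can be sketched as follows; the colour values and the small label image are illustrative.

```python
import numpy as np

palette = np.array([[0, 0, 0],         # black
                    [255, 0, 0],       # red
                    [0, 255, 0],       # green
                    [0, 0, 255],       # blue
                    [255, 255, 255]])  # white
image = np.array([[0, 1, 2, 0],        # each value is an index into the palette
                  [0, 3, 4, 0]])

color_image = palette[image]           # every label is replaced by its RGB triple
print(color_image.shape)               # (2, 4, 3)
```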
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])
However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:
>>> a = np.arange(5)
>>> a[[0,0,2]]=[1,2,3]
>>> a
array([2, 1, 3, 3, 4])
This is reasonable enough, but watch out if you want to use Python’s += construct, as it may not do what you expect:
>>> a = np.arange(5)
>>> a[[0,0,2]]+=1
>>> a
array([1, 1, 3, 3, 4])
Even though 0 occurs twice in the list of indices, the 0th element is only incremented once. This is because Python requires
“a+=1” to be equivalent to “a = a + 1”.
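If every repeated index should count, one possible approach (going slightly beyond the text above) is the ufunc method np.add.at, which performs unbuffered in-place addition:

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1
print(a)                      # [1 1 3 3 4], index 0 is incremented only once

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)    # index 0 is incremented twice
print(b)                      # [2 1 3 3 4]
```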
>>> a = np.arange(12).reshape(3,4)
>>> b = a > 4
>>> b # b is a boolean with a's shape
array([[False, False, False, False],
[False, True, True, True],
[ True, True, True, True]])
>>> a[b] # 1d array with the selected elements
array([ 5, 6, 7, 8, 9, 10, 11])
This property can be very useful in assignments:
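For instance, a boolean mask can be used on the left-hand side of an assignment, as in this short sketch:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = a > 4

a[b] = 0      # every element of a greater than 4 becomes 0
print(a)
```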
The second way of indexing with booleans is more similar to integer indexing; for each dimension of the array we give a
1D boolean array selecting the slices we want:
>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True]) # first dim selection
>>> b2 = np.array([True,False,True,False]) # second dim selection
>>>
>>> a[b1,:] # selecting rows
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> a[b1] # same thing
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> a[:,b2] # selecting columns
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
>>>
>>> a[b1,b2] # a weird thing to do
array([ 4, 10])
Note that the length of the 1D boolean array must coincide with the length of the dimension (or axis) you want to slice. In
the previous example, b1 has length 3 (the number of rows in a), and b2 (of length 4) is suitable to index the 2nd axis
(columns) of a.
The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple. For example, if you
want to compute all the a+b*c for all the triplets taken from each of the vectors a, b and c:
This function takes N 1-D sequences and returns N outputs with N dimensions each, such that the shape is 1 in all but one
dimension and the dimension with the non-unit shape value cycles through all N dimensions.
Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns
the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
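A minimal sketch of the a+b*c computation described above, with illustrative vector values:

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

ax, bx, cx = np.ix_(a, b, c)
print(ax.shape, bx.shape, cx.shape)   # (4, 1, 1) (1, 3, 1) (1, 1, 5)

result = ax + bx * cx                 # broadcasting yields one value per triplet
print(result.shape)                   # (4, 3, 5)
print(result[3, 2, 4] == a[3] + b[2] * c[4])   # True
```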
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> a.transpose()
array([[ 1., 3.],
[ 2., 4.]])
>>> np.linalg.inv(a)
array([[-2. , 1. ],
[ 1.5, -0.5]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])
>>> np.linalg.eig(j)
(array([ 0.+1.j, 0.-1.j]), array([[ 0.70710678+0.j , 0.70710678-0.j ],
[ 0.00000000-0.70710678j, 0.00000000+0.70710678j]]))
Parameters:
a : square matrix
Returns:
w : The eigenvalues, each repeated according to its multiplicity.
v : The normalized (unit "length") eigenvectors, such that the
column ``v[:,i]`` is the eigenvector corresponding to the
eigenvalue ``w[i]`` .
Histograms
The NumPy histogram function applied to an array returns a pair of vectors: the histogram of the array and the vector of
bins. Beware: matplotlib also has a function to build histograms (called hist, as in Matlab) that differs from the one in
NumPy. The main difference is that pylab.hist plots the histogram automatically, while numpy.histogram only
generates the data.
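A small sketch of numpy.histogram on made-up data. It returns the counts per bin and the bin edges, and plots nothing.

```python
import numpy as np

data = np.array([1, 2, 1, 3, 5, 5, 5, 8])

# 4 equal-width bins over the range 0 to 8; the last bin includes its right edge
counts, bin_edges = np.histogram(data, bins=4, range=(0, 8))
print(counts)      # [2 2 3 1]
print(bin_edges)   # [0. 2. 4. 6. 8.]
```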
Pandas
pandas has two main data structures it uses, namely, Series and DataFrames.
pandas Series
A pandas Series is a one-dimensional labeled array. A Series can be created using the following constructor:
pandas.Series( data, index, dtype, copy)
data - data takes various forms like ndarray, list, constants
index - Index values must be hashable and have the same length as data (non-unique values are allowed). Default np.arange(n) if no index is passed.
dtype - dtype is for data type. If None, data type will be inferred
copy - Copy data. Default False
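The constructor parameters can be tried out as in the sketch below, which builds a Series from an ndarray, a dict and a scalar. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# From an ndarray with explicit labels
from_array = pd.Series(np.array([0.0, 1.0, 2.0]), index=['a', 'b', 'c'])

# From a dict: the keys become the index
from_dict = pd.Series({'a': 0.0, 'b': 1.0, 'c': 2.0})

# From a scalar: the value is repeated for every index label
from_scalar = pd.Series(5, index=[0, 1, 2, 3])

print(from_array['b'], from_dict['b'])   # 1.0 1.0
print(from_scalar.tolist())              # [5, 5, 5, 5]
```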
Pandas Series
import pandas as pd
import numpy as np
#create an empty series
s = pd.Series(dtype='float64')
s
>>Series([], dtype: float64)
a 0.0
b 1.0
c 2.0
dtype: float64
#when creating a Series from a dictionary we can change the index order and/or provide additional index labels that do not exist in the dictionary
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
s
b 1.0
c 2.0
d NaN
a 0.0
dtype: float64
0 5
1 5
2 5
3 5
dtype: int64
tom 100
bob foo
nancy 300
dan bar
eric 500
dtype: object
#we can provide only the data for the series without a label for each value; then integers starting from 0 are assigned as labels
ser2 = pd.Series(data = [100, 'foo', 300, 'bar', 500])
ser2
0 100
1 foo
2 300
3 bar
4 500
dtype: object
a 1
b 2
c 3
dtype: int64
c 3
d 4
e 5
dtype: int64
a 1
c 3
d 4
dtype: int64
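#we can select index labels and the values assigned to them by passing the index names to the .loc indexer
ser = pd.Series([100, 200, 300, 400, 500], index=['tom', 'bob', 'nancy', 'dan', 'eric'])
ser.loc[['nancy','bob']] # getting the values at the labels "nancy" and "bob"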
nancy 300
bob 200
dtype: int64
#the same can be achieved by passing the integer positions of the elements to the .iloc indexer
ser.iloc[[4, 3, 1]]
eric 500
dan 400
bob 200
dtype: int64
#selecting only the assigned value (without the index label) by using the .iloc indexer
ser.iloc[2]
>>300
#we can check whether an index label is present in the series
b = 'bob' in ser
print(b)
d = 100 in ser # the 100 is a value not the index label so False is returned
print(d)
True
False
#we can multiply, divide, add and subtract the values in the series
ser * 2
ser**2
ser + ser
ser - ser
ser / ser
#a single Series can hold values of different data types, but then the dtype will be object
ser2 = pd.Series(data = [100, 'pawel', 300, 2.0, 'T'], index =['tom', 'bob', 3.5,
'dan', 100])
ser2
tom 100
bob pawel
3.5 300
dan 2
100 T
dtype: object
a 44
b 2
c 3
d 4
e 5
dtype: int64
#assigning a string to one of the values changes the dtype of the entire Series, as all elements must share the same dtype
s["b"]="pawel"
s
a 44
b pawel
c 3
d 4
e 5
dtype: object
Pandas DataFrame
DataFrame can be created using the following constructor: pandas.DataFrame( data, index, columns, dtype, copy)
data - takes forms like ndarray, Series, map, lists, dict, constants and also another DataFrame.
index - Optional row labels. Default np.arange(n) if no index is passed.
columns - Optional column labels. Default np.arange(n) if no column labels are passed.
dtype - data type for each column
inputs - data for creating a DataFrame: lists, dict, Series, NumPy ndarrays, another DataFrame
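Two of the input forms listed above, a dict of lists and a list of dicts, can be sketched as follows. The data values are illustrative and chosen to resemble the outputs shown in this section.

```python
import pandas as pd

# Dict of lists: the keys become column names
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)

# List of dicts: keys missing from a record are filled with NaN
records = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df2 = pd.DataFrame(records, index=['FirstDict', 'SecondDict'])
print(df2)
```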
Empty DataFrame
Columns: []
Index: []
data = [['Alex',10],['Bob',12],['Clarke',13]]
#specifying the column names when creating the DF
df = pd.DataFrame(data,columns=['Name','Age'])
df
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
data = [['Alex',10],['Bob',12],['Clarke',13]]
#creating the DF and converting the Age column to the data type we want to use
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
df
Name Age
0 Alex 10.0
1 Bob 12.0
2 Clarke 13.0
Age Name
0 28 Tom
1 34 Jack
Age Name
2 29 Steve
3 42 Ricky
Age Name
rank1 28 Tom
rank2 34 Jack
rank3 29 Steve
rank4 42 Ricky
one two
a b c
FirstDict 1 2 NaN
SecondDict 5 10 20.0
a b
FirstDict 1 2
SecondDict 5 10
kol1 kol2
FirstDict NaN NaN
SecondDict NaN NaN
kolumna1
a 1
b 2
c 3
#we can add labels for the row index by providing the values in index=[]
#the same command was used for slicing !??
pd.DataFrame(data, index=['orange', 'red'])
alex alice dora ema joe
df.index
>>Index(['apple', 'ball', 'cerill', 'clock', 'dancy'], dtype='object')
#slicing the data frame and selecting only 3 given row labels
pd.DataFrame(d, index=['dancy', 'ball', 'apple'])
one two
#slicing the data frame and selecting only the 3 rows with the given labels and the specified column names
#if a label provided by us does not exist, it is filled with NaN values
pd.DataFrame(d, index=['dancy', 'ball', 'apple'], columns=['two', 'five'])
two five
Example:
df1
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
one three
a 1.0 10.0
b 2.0 20.0
c 3.0 30.0
d NaN NaN
#selecting only some of the columns; the columns can be given directly when creating the new DataFrame
df3 = pd.DataFrame(df1, columns = ['one', 'three', 'five'])
df3
one three five
a 1.0 10.0 1
b 2.0 20.0 2
c 3.0 30.0 3
d NaN NaN 4
a 1.0 10.0 1
c 3.0 30.0 3
d NaN NaN 4
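d1 = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd.DataFrame(d1)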
print(df1)
print()
# Adding a new column to an existing DataFrame object, with a column label, by passing a new Series
print ("Adding a new column by passing as Series:")
df1['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df1)
print()
#we can also give the values to be written, in order, into the new column
df1['four']=['Tom', 'Jack', 'Steve', 'Ricky']
print(df1)
print()
df1['five'] = [1,2,3,4]
print(df1)
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
#to add a new column to a data frame we can assign values to a column name that does not yet exist in df
df['three'] = df['one'] * df['two']
df
#the new column can also hold values based on a filtering criterion
df['flag'] = df['one'] > 250
df
#adding new column with values specified by hand with using pd.Series()
df['new'] = pd.Series([444,444,444,444,444], index =['apple', 'ball', 'cerill',
'clock', 'dancy'])
df
#adding a new column using the .assign() method - note that the new column name is given without quotes
df = df.assign(new2 = pd.Series([555,555,555,555,555], index = ['apple', 'ball',
'cerill', 'clock', 'dancy']))
df
#if we pass the values without an index, NaN values are inserted because pandas does not know which value to assign to which row,
#even though the values are all the same
df = df.assign(new3 = pd.Series([555,555,555,555,555]))
df
one two three flag new new2 new3
import pandas as pd
import numpy as np
row1 = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
row2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
row1 = pd.concat([row1, row2]) # DataFrame.append was removed in pandas 2.0
print(row1)
a b
0 1 2
1 3 4
0 5 6
1 7 8
#pop() returns the column values for the given column name and at the same time removes the column from the data frame
a = row1.pop('a')
a
0 1
1 3
0 5
1 7
Name: a, dtype: int64
#after using .pop() method the column is no longer available in the data frame
print(row1)
print()
#but the values are still kept in the variable to which we assigned the result of pop()
print(a)
b
0 2
1 4
0 6
1 8
b
0 2
1 4
0 6
1 8
#insert can also be done by adding a Series object and specifying its dtype
#remember to pass the index=[] parameter, otherwise NaN values are written because pandas will not know what to assign where
df.insert(2, 'PF', pd.Series([1,0,1,0,1], index = ['apple', 'ball', 'cerill', 'clock',
'dancy'], dtype=bool))
df
#a new column can also be created as a slice of an existing one
df['one_upper_half'] = df['one'][:2]
df
import numpy as np
import pandas as pd
random = np.random.randn(6,4)
random
A B C D
0 0.287050 1.265009 -0.383207 -2.203757
1 -1.037116 0.490834 -0.269300 -0.149600
2 1.923341 1.047369 -0.735437 1.587828
3 -0.199024 0.818319 0.366812 -0.217814
4 -0.669263 -1.253094 -0.015311 0.845937
5 -0.632030 1.911470 0.411479 0.575946
#rename() can take a function instead of a dictionary when renaming the columns
#lambda x: x[0:3] selects the first 3 characters of each column name
#without inplace=True, rename returns a new object with the renamed columns and leaves df unchanged
df.rename(columns = lambda x :x[0:3])
0 kol2 kol3 3
0 0.287050 1.265009 -0.383207 -2.203757
1 -1.037116 0.490834 -0.269300 -0.149600
2 1.923341 1.047369 -0.735437 1.587828
3 -0.199024 0.818319 0.366812 -0.217814
4 -0.669263 -1.253094 -0.015311 0.845937
5 -0.632030 1.911470 0.411479 0.575946
#the column names and the index labels can be changed simultaneously
df.rename(index={2:'two', 3:'three'},
columns={'kol2':1, 'kol3':2},
inplace=True)
print(df)
0 1 2 3
zero 0.287050 1.265009 -0.383207 -2.203757
one -1.037116 0.490834 -0.269300 -0.149600
two 1.923341 1.047369 -0.735437 1.587828
three -0.199024 0.818319 0.366812 -0.217814
4 -0.669263 -1.253094 -0.015311 0.845937
5 -0.632030 1.911470 0.411479 0.575946