https://docs.scipy.org/doc/numpy/reference/

NumPy

Data types
NumPy supports a much greater variety of numerical types than Python does. This section shows which are available, and
how to modify an array’s data-type.
Data type     Description
bool_         Boolean (True or False) stored as a byte
int_          Default integer type (same as C long; normally either int64 or int32)
intc          Identical to C int (normally int32 or int64)
intp          Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8          Byte (-128 to 127)
int16         Integer (-32768 to 32767)
int32         Integer (-2147483648 to 2147483647)
int64         Integer (-9223372036854775808 to 9223372036854775807)
uint8         Unsigned integer (0 to 255)
uint16        Unsigned integer (0 to 65535)
uint32        Unsigned integer (0 to 4294967295)
uint64        Unsigned integer (0 to 18446744073709551615)
float_        Shorthand for float64.
float16       Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32       Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64       Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_      Shorthand for complex128.
complex64     Complex number, represented by two 32-bit floats (real and imaginary components)
complex128    Complex number, represented by two 64-bit floats (real and imaginary components)
Additionally to intc the platform dependent C integer types short, long, longlong and their unsigned versions are
defined.
NumPy numerical types are instances of dtype (data-type) objects, each having unique characteristics. Once you have
imported NumPy using

>>> import numpy as np

the dtypes are available as np.bool_, np.float32, etc.
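The ranges and bit layouts listed in the table can be checked at runtime. The following is a minimal sketch using np.iinfo and np.finfo; the variable names are illustrative only and are not part of the original notes.

```python
import numpy as np

# Integer types: query the representable range
int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
print(int8_info.min, int8_info.max)      # -128 127
print(uint16_info.min, uint16_info.max)  # 0 65535

# Floating point types: total number of bits and number of mantissa bits
for float_type in (np.float16, np.float32, np.float64):
    info = np.finfo(float_type)
    print(float_type.__name__, info.bits, info.nmant)
```

The mantissa sizes reported this way should match the 10, 23 and 52 bits given in the table.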


Data-types can be used as functions to convert python numbers to array scalars (see the array scalar section for an
explanation), python sequences of numbers to arrays of that type, or as arguments to the dtype keyword that many numpy
functions or methods accept. Some examples:

>>> import numpy as np


>>> x = np.float32(1.0)
>>> x
1.0
>>> y = np.int_([1,2,4])
>>> y
array([1, 2, 4])
>>> z = np.arange(3, dtype=np.uint8)
>>> z
array([0, 1, 2], dtype=uint8)
Array types can also be referred to by character codes, mostly to retain backward compatibility with older packages such
as Numeric. Some documentation may still refer to these, for example:

>>> np.array([1, 2, 3], dtype='f')


array([ 1., 2., 3.], dtype=float32)
Special values defined in numpy: nan, inf.
NaNs can be used as a poor-man’s mask (if you don’t care what the original value was).
Note: you cannot use equality to test for NaNs. E.g.:

>>> myarr = np.array([1., 0., np.nan, 3.])
>>> np.nonzero(myarr == np.nan)
(array([], dtype=int64),)
>>> np.nan == np.nan # is always False! Use special numpy functions instead.
False

Other related special value functions:

isinf(): True if value is inf


isfinite(): True if not nan or inf
nan_to_num(): Map nan to 0, inf to max float, -inf to min float
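Since equality cannot detect NaNs, the usual approach is to build a boolean mask with np.isnan. The sketch below shows this together with the functions listed above; the array contents and variable names are made up for illustration.

```python
import numpy as np

myarr = np.array([1., 0., np.nan, 3.])

# np.isnan returns a boolean mask marking the NaN entries
nan_mask = np.isnan(myarr)
print(nan_mask)

# Use the mask to replace the NaNs in a copy of the array
cleaned = myarr.copy()
cleaned[nan_mask] = 0.
print(cleaned)

# isinf flags +inf and -inf; isfinite is False for nan, inf and -inf
special = np.array([1., np.nan, np.inf, -np.inf])
print(np.isinf(special))
print(np.isfinite(special))

# nan_to_num maps nan to 0 and +/-inf to the largest/smallest finite float
converted = np.nan_to_num(special)
print(converted)
```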

The following corresponds to the usual functions except that nans are excluded from the results:

nansum()
nanmax()
nanmin()
nanargmax()
nanargmin()

>>> x = np.arange(10.)
>>> x[3] = np.nan
>>> x.sum()
nan
>>> np.nansum(x)
42.0

ex1 = np.array([11, 12]) # NumPy infers the data type from the values
print(ex1.dtype)
>>int32
# (the default integer type is int32 or int64, depending on the platform)
ex2 = np.array([11.0, 12.0]) # NumPy infers the data type from the values
print(ex2.dtype)
>>float64
ex3 = np.array([11, 21], dtype=np.int64) # you can also specify the data type explicitly
print(ex3.dtype)
>>int64
# you can use this to force floats into integers (the fractional part is
# truncated toward zero, which differs from floor for negative values)
ex4 = np.array([11.1,12.7], dtype=np.int64)
print(ex4.dtype)
print()
print(ex4)
>>int64
>>[11 12]
# you can use this to force integers into floats if you anticipate
# the values may change to floats later
ex5 = np.array([11, 21], dtype=np.float64)
print(ex5.dtype)
print()
print(ex5)
>>float64
>>[ 11. 21.]
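An existing array can also be converted to another data type after creation. The sketch below uses the astype method, which the examples above do not show, and contrasts its truncation with np.floor and np.rint; the sample values are hypothetical.

```python
import numpy as np

values = np.array([11.1, 12.7, -1.5])

# astype returns a converted copy; float -> int drops the fractional part
truncated = values.astype(np.int64)

# np.floor always rounds down and keeps the float64 dtype
floored = np.floor(values)

# round to the nearest integer first, then convert
rounded = np.rint(values).astype(np.int64)

print(truncated)  # [11 12 -1]
print(floored)    # [11. 12. -2.]
print(rounded)    # [11 13 -2]
```

The negative entry shows why truncation and floor are not interchangeable: -1.5 becomes -1 when truncated but -2 when floored.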
How numpy handles numerical exceptions
The default is to 'warn' for invalid, divide, and overflow and 'ignore' for underflow. But this can be changed,
and it can be set individually for different kinds of exceptions. The different behaviors are:

• ‘ignore’ : Take no action when the exception occurs.
• ‘warn’ : Print a RuntimeWarning (via the Python warnings module).
• ‘raise’ : Raise a FloatingPointError.
• ‘call’ : Call a function specified using the seterrcall function.
• ‘print’ : Print a warning directly to stdout.
• ‘log’ : Record error in a Log object specified by seterrcall.

These behaviors can be set for all kinds of errors or specific ones:

• all : apply to all numeric exceptions
• invalid : when NaNs are generated
• divide : divide by zero (for integers as well!)
• overflow : floating point overflows
• underflow : floating point underflows

Note that integer divide-by-zero is handled by the same machinery. These behaviors are set on a per-thread basis.

>>> oldsettings = np.seterr(all='warn')
>>> np.zeros(5,dtype=np.float32)/0.
invalid value encountered in divide
>>> j = np.seterr(under='ignore')
>>> np.array([1.e-100])**10
>>> j = np.seterr(invalid='raise')
>>> np.sqrt(np.array([-1.]))
FloatingPointError: invalid value encountered in sqrt
>>> def errorhandler(errstr, errflag):
...     print("saw stupid error!")
>>> np.seterrcall(errorhandler)
<function err_handler at 0x...>
>>> j = np.seterr(all='call')
>>> np.zeros(5, dtype=np.int32)/0
FloatingPointError: invalid value encountered in divide
saw stupid error!
>>> j = np.seterr(**oldsettings) # restore previous
...                              # error-handling settings
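Saving and restoring the settings by hand, as in the session above, can be avoided with the np.errstate context manager, which applies a setting only inside a with block. This is a minimal sketch of that alternative and is not part of the original session.

```python
import numpy as np

# Temporarily silence divide-by-zero warnings inside the block
with np.errstate(divide='ignore'):
    quotient = np.array([1.0, 2.0]) / 0.0   # inf values, no warning emitted

# Temporarily turn invalid operations into exceptions
caught = False
try:
    with np.errstate(invalid='raise'):
        np.sqrt(np.array([-1.0]))
except FloatingPointError:
    caught = True

print(quotient, caught)
```

Once each with block exits, the previous error-handling settings are back in effect.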

Arrays
At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data
types, with many operations being performed in compiled code for performance. There are several important differences
between NumPy arrays and the standard Python sequences:

• NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an
ndarray will create a new array and delete the original.
• The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in
memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of
different sized elements.
• NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically,
such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
• A growing number of scientific and mathematical Python-based packages use NumPy arrays; though these
typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often
output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s
scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is
insufficient - one also needs to know how to use NumPy arrays.

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the
same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.

For example, the array of coordinates of a point in 3D space, [1, 2, 1], has one axis. That axis has 3 elements in it, so we
say it has a length of 3. In the example shown below, the array has 2 axes. The first axis has a length of 2, the second axis
has a length of 3.
[[ 1., 0., 0.],
[ 0., 1., 2.]]

NumPy’s array class is called ndarray. It is also known by the alias array. Note that numpy.array is not the same
as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less
functionality. The more important attributes of an ndarray object are:

ndarray.ndim - the number of axes (dimensions) of the array.
ndarray.shape - the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension.
For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of
axes, ndim.
ndarray.size - the total number of elements of the array. This is equal to the product of the elements of shape.
ndarray.dtype - an object describing the type of the elements in the array. One can create or specify dtypes using
standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are
some examples.
ndarray.itemsize - the size in bytes of each element of the array. For example, an array of elements of type float64 has
itemsize 8 (=64/8), while one of type float32 has itemsize 4 (=32/8). It is equivalent to
ndarray.dtype.itemsize.
ndarray.data - the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute
because we will access the elements in an array using indexing facilities.
>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>

How to create Rank 1 numpy arrays:

import numpy as np
# a rank 1 np.array is a vector
an_array = np.array([3, 33, 333]) # Create a rank 1 array
print(type(an_array)) # The type of an ndarray is: "<class 'numpy.ndarray'>"
# test the shape of the array we just created, it should have just one dimension (Rank 1)
print(an_array.shape)
# for a vector the shape has a single entry, which is the vector length
>>(3,)
# because this is a rank 1 array, we need only one index to access each element
print(an_array[0], an_array[1], an_array[2])
>>3 33 333
an_array[0] = 888 # ndarrays are mutable, here we change an element of the array
print(an_array)
>>[888 33 333]

How to create a Rank 2 numpy array:


A rank 2 ndarray is one with two dimensions. Notice the format below of [ [row] , [row] ].
2 dimensional arrays are great for representing matrices which are often useful in
data science.

# there is a specific notation [ [row1] , [row2] ] when creating a 2 dimensional array
another = np.array([[11,12,13],[21,22,23]]) # Create a rank 2 array
print(another) # print the array
print("The shape is 2 rows, 3 columns: ", another.shape) # rows x columns
print("Accessing elements [0,0], [0,1], and [1,0] of the ndarray: ", another[0, 0], ", ", another[0, 1], ", ", another[1, 0])

>>[[11 12 13]
 [21 22 23]]
>>The shape is 2 rows, 3 columns: (2, 3)
>>Accessing elements [0,0], [0,1], and [1,0] of the ndarray: 11 , 12 , 21

Array Creation
There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the
array function. The type of the resulting array is deduced from the type of the elements in the sequences.

>>> import numpy as np

>>> a = np.array([2,3,4])

>>> a
array([2, 3, 4])

A frequent error consists in calling array with multiple numeric arguments, rather than providing a single list of
numbers as an argument.

>>> a = np.array(1,2,3,4) # WRONG

>>> a = np.array([1,2,3,4]) # RIGHT

array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-
dimensional arrays, and so on.

>>> b = np.array([(1.5,2,3), (4,5,6)])


>>> b
array([[ 1.5, 2. , 3. ],
[ 4. , 5. , 6. ]])
The type of the array can also be explicitly specified at creation time:

>>> c = np.array( [ [1,2], [3,4] ], dtype=complex )


>>> c
array([[ 1.+0.j, 2.+0.j],
[ 3.+0.j, 4.+0.j]])

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to
create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.
The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty
creates an array whose initial content is random and depends on the state of the memory. By default, the dtype of the
created array is float64.

>>> np.zeros( (3,4) )


array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
>>> np.ones( (2,3,4), dtype=np.int16 ) # dtype can also be specified
array([[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]],
[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]]], dtype=int16)
>>> np.empty( (2,3) ) # uninitialized, output may vary
array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
[ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])

import numpy as np
# create a 2x2 array of zeros
ex1 = np.zeros((2,2))
print(ex1)

[[ 0. 0.]
[ 0. 0.]]

# create a 2x2 array filled with 9.0


#np.full((size of array), NumberToFill)
ex2 = np.full((2,2), 9.0)
print(ex2)

[[ 9. 9.]
[ 9. 9.]]

# create a 2x2 matrix with the diagonal 1s and the others 0


ex3 = np.eye(2,2)
print(ex3)
[[ 1. 0.]
[ 0. 1.]]

# create a 1x2 array of ones


ex4 = np.ones((1,2))
print(ex4)
[[ 1. 1.]]

# notice that the above ndarray (ex4) is actually rank 2, it is a 1x2 array
print(ex4.shape)
#>>(1, 2)

# which means we need to use two indexes to access an element
print()
print(ex4[0,1])
#>>1.0
# create an array of random floats between 0 and 1
ex5 = np.random.random((2,2))
print(ex5)

np.arange() - To create sequences of numbers, NumPy provides a function analogous to range that returns arrays
instead of lists.

>>> np.arange( 10, 30, 5 )


array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 ) # it accepts float arguments
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

When arange is used with floating point arguments, it is generally not possible to predict the number of elements
obtained, due to the finite floating point precision. For this reason, it is usually better to use the function linspace that
receives as an argument the number of elements that we want, instead of the step:

>>> from numpy import pi


>>> np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2
array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])
>>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function at lots of points
>>> f = np.sin(x)
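The difference between the two functions can be sketched as follows. The start, stop and step values are arbitrary choices for illustration; the length of the arange result is deliberately only printed, since it depends on floating point rounding.

```python
import numpy as np

# With a float step, the element count follows from rounding in (stop - start) / step
stepped = np.arange(1.0, 1.3, 0.1)
print(len(stepped), stepped)   # the count can differ from what the step suggests

# linspace takes the number of points directly, so the count is always known
spaced = np.linspace(1.0, 1.3, 4)
print(len(spaced), spaced)

# endpoint=False mimics the half-open interval of arange with a known count
half_open = np.linspace(1.0, 1.3, 3, endpoint=False)
print(len(half_open), half_open)
```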

Example:
import numpy as np
from numpy import pi

help(np.linspace) # getting help for the function

# np.linspace(start=0, stop=2, num=9)
np.linspace(start=0, stop=2, num=9) # 9 numbers from 0 to 2
#array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])

# 9 numbers from 0 to 2, but the end of the interval is not one of the results
np.linspace(start=0, stop=2, num=9, endpoint=False)
#array([ 0. , 0.22222222, 0.44444444, 0.66666667, 0.88888889,
# 1.11111111, 1.33333333, 1.55555556, 1.77777778])

One-dimensional arrays are then printed as rows, bidimensionals as matrices and tridimensionals as lists of matrices.

>>> a = np.arange(6) # 1d array


>>> print(a)
[0 1 2 3 4 5]
>>> b = np.arange(12).reshape(4,3) # 2d array
>>> print(b)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
>>> c = np.arange(24).reshape(2,3,4) # 3d array
>>> print(c)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners:

>>> print(np.arange(10000))
[ 0 1 2 ..., 9997 9998 9999]
>>>
>>> print(np.arange(10000).reshape(100,100))
[[ 0 1 2 ..., 97 98 99]
[ 100 101 102 ..., 197 198 199]
[ 200 201 202 ..., 297 298 299]
...,
[9700 9701 9702 ..., 9797 9798 9799]
[9800 9801 9802 ..., 9897 9898 9899]
[9900 9901 9902 ..., 9997 9998 9999]]
To disable this behaviour and force NumPy to print the entire array, you can change the printing options using
set_printoptions.

>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)

Basic Operations
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
a = np.array( [20,30,40,50] ); a
b = np.arange( 4 ) ; b #array([0, 1, 2, 3])

# elementwise subtraction: a[0] - b[0], a[1] - b[1], etc.
c = a-b; c #array([20, 29, 38, 47])
# multiplying a vector by a number:
# b[0] * 2, b[1] * 2, etc.
b*2
# raising the elements of a vector to a power:
# b[0] ** 2, b[1] ** 2, etc.
b**2
#array([0, 1, 4, 9])
10*np.sin(a)
# comparing the elements of a vector to a number - returns a vector of True/False values
a<35
#array([ True, True, False, False])

Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product
can be performed using the @ operator (in python >=3.5) or the dot function or method:

A = np.array( [[1,1],
               [0,1]] )
B = np.array( [[2,0],
               [3,4]] )
# elementwise product of the two arrays
A * B
# A[0,0] * B[0,0]; A[0,1] * B[0,1]; A[1,0] * B[1,0], etc.
array([[2, 0],
       [0, 4]])

# matrix product of two arrays - @
A @ B
# row [1,1] of A times column [2,3] of B = 1*2 + 1*3 = 5
# row [1,1] of A times column [0,4] of B = 1*0 + 1*4 = 4, etc.
array([[5, 4],
       [3, 4]])

# matrix product of two arrays - dot() method
A.dot(B)
array([[5, 4],
       [3, 4]])

# matrix product of two arrays - dot() function
np.dot(A, B)

# creating a matrix of ones - reserving space
a = np.ones((2,3), dtype=int)
a
array([[1, 1, 1],
       [1, 1, 1]])

# filling a matrix with random values
b = np.random.random((2,3))
b
array([[ 0.87491218, 0.4993892 , 0.23323 ],
       [ 0.5179975 , 0.91004357, 0.7154386 ]])

Some operations, such as += and *=, act in place to modify an existing array rather
than create a new one.
a *= 3
a
array([[3, 3, 3],
[3, 3, 3]])
b += a # a was converted to float
b
array([[ 3.87491218, 3.4993892 , 3.23323 ],
[ 3.5179975 , 3.91004357, 3.7154386 ]])

a += b ## b is NOT automatically converted to integer type


TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting
rule 'same_kind'
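When the in-place form fails like this, there are two common ways out: build a new array, or convert explicitly. The following sketch illustrates both with made-up values; it is one possible way to handle the error, not the only one.

```python
import numpy as np

a = np.ones((2, 3), dtype=int)
b = np.full((2, 3), 0.5)

# In-place addition would have to store floats in an integer array
try:
    a += b
    in_place_failed = False
except TypeError:
    in_place_failed = True

# Option 1: build a new array; the result is upcast to float64
summed = a + b

# Option 2: convert explicitly, accepting that the fractional part is dropped
a_truncated = (a + b).astype(a.dtype)

print(in_place_failed, summed.dtype, a_truncated.dtype)
```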

When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise
one (a behavior known as upcasting).

>>> a = np.ones(3, dtype=np.int32)


>>> b = np.linspace(0,pi,3)
>>> b.dtype.name
'float64'
>>> c = a+b
>>> c
array([ 1. , 2.57079633, 4.14159265])
>>> c.dtype.name
'float64'
>>> d = np.exp(c*1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
-0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the
ndarray class.

>>> a = np.random.random((2,3))
>>> a
array([[ 0.18626021, 0.34556073, 0.39676747],
[ 0.53881673, 0.41919451, 0.6852195 ]])
>>> a.sum()
2.5718191614547998
>>> a.min()
0.1862602113776709
>>> a.max()
0.6852195003967595
By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by
specifying the axis parameter you can apply an operation along the specified axis of an array:

>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b.sum(axis=0) # sum of each column
array([12, 15, 18, 21])

>>> b.min(axis=1) # min of each row


array([0, 4, 8])

>>> b.cumsum(axis=1) # cumulative sum along each row


array([[ 0, 1, 3, 6],
[ 4, 9, 15, 22],
[ 8, 17, 27, 38]])

NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal
functions”(ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.

>>> B = np.arange(3)
>>> B
array([0, 1, 2])
>>> np.exp(B)
array([ 1. , 2.71828183, 7.3890561 ])
>>> np.sqrt(B)
array([ 0. , 1. , 1.41421356])
>>> C = np.array([2., -1., 4.])
>>> np.add(B, C)
array([ 2., 0., 6.])

Basic Statistical Operations


# set up a random 2 x 5 matrix
# np.random.randn() - returns a sample from the "standard normal" distribution
# For random samples from N(mu, sigma^2), use: sigma * np.random.randn(...) + mu

# np.random.random((2,2)) - array of random floats between 0 and 1
# Returns random floats in the half-open interval [0.0, 1.0).
# To sample Unif[a, b), b > a, multiply the output of random_sample by (b-a) and add a:
# (b - a) * random_sample() + a
arr = 10* np.random.randn(2,5)
print(arr)
print()
arr2 = np.random.random((2,5))
print(arr2)

[[-31.2611818 3.40544911 -1.83361072 12.88669536 8.05299871]
 [ 2.19095999 -0.60087286 5.33040616 10.01173366 -11.98694689]]

[[ 0.7422966 0.13301205 0.41993349 0.99046004 0.3688667 ]
 [ 0.68954316 0.9893082 0.03602801 0.06488499 0.34385863]]
# compute the mean for all elements
print(arr.mean())
-0.380436928566

# compute the means by row


print(arr.mean(axis = 1))
[-1.74992987 0.98905601]

# compute the means by column


print(arr.mean(axis = 0))
[-14.53511091 1.40228813 1.74839772 11.44921451 -1.96697409]

# sum all the elements


print(arr.sum())
#sum by rows
print(arr.sum(axis=1))
#sum by columns
print(arr.sum(axis=0))

-3.80436928566
[-8.74964934 4.94528006]
[-29.07022182 2.80457625 3.49679545 22.89842902 -3.93394818]

# compute the medians by row
print(np.median(arr, axis = 1))
[ 3.40544911 2.19095999]
# returns the element-wise maximum of two arrays:
# compares the values at the same positions in the two arrays x and y
# and returns an array holding the larger value for each position
np.maximum(x, y)
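A small worked case makes the element-wise comparison concrete. The arrays x and y below are hypothetical sample values, since the notes do not define them.

```python
import numpy as np

x = np.array([1, 5, 3])
y = np.array([4, 2, 6])

pairwise_max = np.maximum(x, y)   # compares x[i] with y[i] at every position
overall_max = x.max()             # by contrast, the largest value within x alone

print(pairwise_max)  # [4 5 6]
print(overall_max)   # 5
```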

"any" or "all" conditionals:


arr_bools = np.array([ True, False, True, True, False ])
# is any of the values True?
arr_bools.any()
True
# are all of the values True?
arr_bools.all()
False

Indexing, Slicing and Iterating


One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

>>> a = np.arange(10)**3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000 # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729])
>>> a[ : :-1] # reversed a
array([ 729, 512, 343, 216, 125, -1000, 27, -1000, 1, -1000])
>>> for i in a:
...     print(i**(1/3.))
...
nan # root of a negative number
1.0
nan # root of a negative number
3.0
nan # root of a negative number
5.0
6.0
7.0
8.0
9.0
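The nan values appear because a negative number raised to a fractional power has no real floating point result. If the real cube root is what is wanted, np.cbrt handles negative inputs. This is a sketch of that alternative and is not part of the original example.

```python
import numpy as np

a = np.array([-1000, 1, -1000, 27, -1000, 125])

# np.cbrt computes the real cube root, including for negative inputs
roots = np.cbrt(a)
print(roots)
```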

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

d=np.arange(20).reshape(5,4); d
#d= np.fromfunction(f,(5,4),dtype=int); d
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])

# selecting a single element of the array
d[2,3] # 11
# each row in the second column of d
d[0:5, 1]
# the same can be obtained with the following command
d[ : ,1] # each row in the second column of d
array([ 1, 5, 9, 13, 17])
# each column in the second and third row of d
d[1:3, : ]
array([[ 4, 5, 6, 7],
       [ 8, 9, 10, 11]])
# When fewer indices are provided than the number of axes, the missing indices are
# considered complete slices:
d[-1] # the last row. Equivalent to d[-1,:]
array([16, 17, 18, 19])
# the last column
d[:,-1]
array([ 3, 7, 11, 15, 19])
# all rows from the 2nd, 3rd and 4th columns
d[:,1:4]
array([[ 1, 2, 3],
       [ 5, 6, 7],
       [ 9, 10, 11],
       [13, 14, 15],
       [17, 18, 19]])

The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the
remaining axes. NumPy also allows you to write this using dots as b[i,...].
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array
with 5 axes, then

• x[1,2,...] is equivalent to x[1,2,:,:,:],
• x[...,3] to x[:,:,:,:,3] and
• x[4,...,5,:] to x[4,:,:,5,:].
e = np.array( [[[ 0, 1, 2], # a 3D array (two stacked 2D arrays)
                [ 10, 12, 13]],
               [[100,101,102],
                [110,112,113]]])
e
array([[[ 0, 1, 2],
        [ 10, 12, 13]],

       [[100, 101, 102],
        [110, 112, 113]]])
e.shape #(2, 2, 3)
e[0,...] # the first 2D array in the stack
array([[ 0, 1, 2],
       [10, 12, 13]])
e[1, ...] # the second 2D array in the stack
# same as e[1,:,:] or e[1]
array([[100, 101, 102],
       [110, 112, 113]])
e[...,2] # the third element of every row in each 2D array
# same as e[:,:,2]
array([[ 2, 13],
       [102, 113]])
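The equivalences stated for an array with 5 axes can be verified directly. In the sketch below the shape (2, 3, 4, 5, 6) is an arbitrary choice, so the index values are adapted to fit it rather than copied from the text.

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)  # an array with 5 axes

# Each pair indexes the same elements; ... expands to the missing colons
same_leading = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
same_trailing = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_middle = np.array_equal(x[1, ..., 4, :], x[1, :, :, 4, :])

print(same_leading, same_trailing, same_middle)
print(x[1, 2, ...].shape)  # (4, 5, 6)
```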

Indexing using where():


# numpy.where(condition[, x, y])
# Return elements chosen from x or y depending on condition: where condition is True take x,
# where it is False take y
x_1 = np.array([1,2,3,4,5])

y_1 = np.array([11,22,33,44,55])

filter = np.array([True, False, True, False, True]) # create a boolean array
print(filter)
>>[ True False True False True]

# where filter is True we take the element from x_1, where it is False we take the element from y_1
out = np.where(filter, x_1, y_1)
print(out)
>>[ 1 22 3 44 5]

mat = np.random.rand(5,5)
mat
array([[ 0.52947625, 0.41875705, 0.29102789, 0.91510274, 0.66849413],
[ 0.69031262, 0.94227646, 0.19339872, 0.49799366, 0.46822884],
[ 0.23988332, 0.31886198, 0.17293111, 0.17862612, 0.28943965],
[ 0.39178129, 0.2280181 , 0.38454648, 0.13993122, 0.34982648],
[ 0.94357092, 0.61724024, 0.70305995, 0.5601409 , 0.01207867]])

# where a value in mat is greater than 0.5 we output 1000, otherwise -1
np.where( mat > 0.5, 1000, -1)
array([[1000, -1, -1, 1000, 1000],
[1000, 1000, -1, -1, -1],
[ -1, -1, -1, -1, -1],
[ -1, -1, -1, -1, -1],
[1000, 1000, 1000, 1000, -1]])
Array created with slice [ ] vs array created with np.array()

A new object (array) created as a slice operates on the same data as the original array!
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
a_slice = an_array[:2, 1:3] # first 2 rows and columns from 2nd to 4th exclusive
a_slice[0, 0] = 1000 # assigning a new value to an element of a_slice modifies an_array as well

A new object created by passing the slice to np.array() holds a separate copy of the data, and modifying it
does not change the original data

an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
a_slice2 = np.array(an_array[:2, 1:3])
a_slice2[0, 0] = 9999 # modifying an element of a_slice2 does not change the data in an_array

Slice indexing:
Similar to the use of slice indexing with lists and strings, we can use slice indexing
to pull out sub-regions of ndarrays.

import numpy as np

# Rank 2 array of shape (3, 4)


an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
[21 22 23 24]
[31 32 33 34]]

# a slice of the array creates a new object which actually operates on the same data
a_slice = an_array[:2, 1:3] # first 2 rows and columns from 2nd to 4th exclusive
print(a_slice)
[[12 13]
 [22 23]]

When you modify a slice, you actually modify the underlying array.
# modifying a value in a_slice modifies the value in an_array as well,
# as they both work on the same data

# the indices used in a_slice and an_array may differ, but they refer to the same data
print("Before:", an_array[0, 1]) # inspect the element at 0, 1
a_slice[0, 0] = 1000 # a_slice[0, 0] is the same piece of data as an_array[0, 1]
print("After:", an_array[0, 1])
>>Before: 12
>>After: 1000

# to create a new object with its own copy of the data, pass the slice to np.array()
a_slice2 = np.array(an_array[:2, 1:3])
print(a_slice2)

[[1000 13]
 [ 22 23]]

print("Before:", an_array[0, 1]) # inspect the element at 0, 1
a_slice2[0, 0] = 9999 # a separate copy of the data was created, so changing a value in a_slice2
# does not change the value in the original an_array
print("After:", an_array[0, 1])
>>Before: 1000
>>After: 1000
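Whether two arrays operate on the same data can also be tested directly rather than inferred from side effects. The sketch below uses np.shares_memory and the copy method, neither of which appears in the notes above; the variable names view and copied are illustrative.

```python
import numpy as np

an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

view = an_array[:2, 1:3]            # basic slicing returns a view on the same data
copied = an_array[:2, 1:3].copy()   # .copy() allocates separate data

print(np.shares_memory(an_array, view))    # True
print(np.shares_memory(an_array, copied))  # False

copied[0, 0] = 9999    # leaves an_array untouched
print(an_array[0, 1])  # 12
view[0, 0] = 1000      # writes through to an_array[0, 1]
print(an_array[0, 1])  # 1000
```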
Iterating over multidimensional arrays is done with respect to the first axis:

for row in e:
    print(row)
[[ 0 1 2]
 [10 12 13]]
[[100 101 102]
 [110 112 113]]

for row in d:
    print(row)
[0 1 2 3]
[4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]

However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is an
iterator over all the elements of the array:

for element in e.flat:
    print(element)

0
1
2
10
12
13
100
101
102
110
112
113

Array indexing

# Create a Rank 2 array of shape (3, 4)


an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
[21 22 23 24]
[31 32 33 34]]

# Using both integer indexing & slicing generates an array of lower rank
row_rank1 = an_array[1, :] # Rank 1 view, we are selecting a single row of an_array

print(row_rank1, row_rank1.shape) # notice only a single []
# using a single index to access the row [1, :] creates a vector
>>[21 22 23 24] (4,)

# Slicing alone: generates an array of the same rank as an_array
row_rank2 = an_array[1:2, :] # Rank 2 view

print(row_rank2, row_rank2.shape) # Notice the [[ ]]
# using slicing [1:2, :] to access the data creates a 2 dimensional array
# the selected values are the same but the shape is different
>>[[21 22 23 24]] (1, 4)
# We can do the same thing for columns of an array:
print()
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]

print(col_rank1, col_rank1.shape) # we are getting a Rank 1 (vector) array
print()
print(col_rank2, col_rank2.shape) # we are getting a Rank 2 array
>>[12 22 32] (3,)

>>[[12]
 [22]
 [32]] (3, 1)

Array Indexing for changing elements:


Sometimes it's useful to use an array of indexes to access or change elements

# Create a new array


an_array = np.array([[11,12,13], [21,22,23], [31,32,33], [41,42,43]])

print('Original Array:')
print(an_array)
Original Array:
[[11 12 13]
[21 22 23]
[31 32 33]
[41 42 43]]

# Create an array of indices


col_indices = np.array([0, 1, 2, 0])
print('\nCol indices picked : ', col_indices)

row_indices = np.arange(4)
print('\nRows indices picked : ', row_indices)

# Examine the pairings of row_indices and col_indices. These are the elements we'll change next.
for row,col in zip(row_indices,col_indices):
    print(row, ", ",col)

0 , 0
1 , 1
2 , 2
3 , 0

# Select one element from each row


print('Values in the array at those indices: ',an_array[row_indices, col_indices])
>>Values in the array at those indices: [11 22 33 41]

# Change one element from each row using the indices selected
an_array[row_indices, col_indices] += 100000

print('\nChanged Array:')
print(an_array)

Changed Array:
[[100011 12 13]
 [ 21 100022 23]
 [ 31 32 100033]
 [100041 42 43]]

Boolean Indexing
# create a 3x2 array
an_array = np.array([[11,12], [21, 22], [31, 32]])
print(an_array)
[[11 12]
[21 22]
[31 32]]

# create a filter which will hold boolean values for whether each element meets this condition
filter = (an_array > 15)
print(filter)

[[False False]
 [ True True]
 [ True True]]

Notice that the filter is an ndarray of the same size as an_array, filled with True
for each element whose corresponding element in an_array is greater than 15 and
False for those elements whose value is not greater than 15.

# we can now select just those elements which meet that criterion
print(an_array[filter])
print("The same result can be obtained with")
print(an_array[an_array>15])

# For short, we could have just used the approach below without the need for the
# separate filter array.
an_array[(an_array % 2 == 0)]
>>array([12, 22, 32])

What is particularly useful is that we can actually change elements in the array
applying a similar logical filter. Let's add 100 to all the even values
an_array[an_array % 2 == 0] +=100
print(an_array)

[[ 11 112]
[ 21 122]
[ 31 132]]
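Several conditions can be combined into one filter with the element-wise operators & (and), | (or) and ~ (not). The following sketch reuses the same 3x2 array before modification; the result variable names are illustrative.

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])

# Parentheses are required because & and | bind more tightly than comparisons
big_and_even = an_array[(an_array > 15) & (an_array % 2 == 0)]
small_or_odd = an_array[(an_array < 15) | (an_array % 2 == 1)]
not_big = an_array[~(an_array > 15)]

print(big_and_even)  # [22 32]
print(small_or_odd)  # [11 12 21 31]
print(not_big)       # [11 12]
```

Python's own and/or keywords do not work here, because they try to reduce a whole array to a single truth value.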

Changing the shape of an array


An array has a shape given by the number of elements along each axis:

>>> a = np.floor(10*np.random.random((3,4)))
>>> a
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
>>> a.shape
(3, 4)
The shape of an array can be changed with various commands. Note that the following three commands all return a
modified array, but do not change the original array:

>>> a.ravel() # returns the array, flattened


array([ 2., 8., 0., 6., 4., 5., 1., 1., 8., 9., 3., 6.])
>>> a.reshape(6,2) # returns the array with a modified shape
array([[ 2., 8.],
[ 0., 6.],
[ 4., 5.],
[ 1., 1.],
[ 8., 9.],
[ 3., 6.]])
>>> a.T # returns the array, transposed
array([[ 2., 4., 8.],
[ 8., 5., 9.],
[ 0., 1., 3.],
[ 6., 1., 6.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)

If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:

>>> a.reshape(3,-1)
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])

Sorting

# create a 10 element array of randoms
unsorted = np.random.randn(10)
print(unsorted)

[-0.27486074 0.79698279 -0.67777783 -0.88380097 -0.1961417 0.56116878
 -0.08179373 0.24561215 -2.29644406 0.35905521]

# create a copy and sort it
sorted = np.array(unsorted)
# sorting the copy of the data does not sort the original data set
sorted.sort()

print(sorted)
print()
print(unsorted)
[-2.29644406 -0.88380097 -0.67777783 -0.27486074 -0.1961417 -0.08179373
 0.24561215 0.35905521 0.56116878 0.79698279]

[-0.27486074 0.79698279 -0.67777783 -0.88380097 -0.1961417 0.56116878
 -0.08179373 0.24561215 -2.29644406 0.35905521]

# inplace sorting
unsorted.sort()
print(unsorted)
[-2.29644406 -0.88380097 -0.67777783 -0.27486074 -0.1961417 -0.08179373
0.24561215 0.35905521 0.56116878 0.79698279]
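
A short sketch of the two sorting styles side by side, using made-up values: np.sort returns a sorted copy, the sort method works in place, and np.argsort returns the ordering as indices.

```python
import numpy as np

values = np.array([3.0, -1.0, 2.0])

ordered = np.sort(values)      # returns a sorted copy; values is untouched
order = np.argsort(values)     # indices that would sort values
print(ordered)                 # [-1.  2.  3.]
print(order)                   # [1 2 0]
print(values[order])           # [-1.  2.  3.]

values.sort()                  # sorts in place and returns None
print(values)                  # [-1.  2.  3.]
```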

Finding Unique elements:


array = np.array([1,2,1,4,2,1,4,2])
print(np.unique(array))
#[1 2 4]

Set Operations with np.array data type:


s1 = np.array(['desk','chair','bulb'])
s2 = np.array(['lamp','bulb','chair'])
print(s1, s2)
#np.intersect1d() returns the intersection of both sets
print( np.intersect1d(s1, s2) )
>>['bulb' 'chair']

#return all unique elements in both data sets


print( np.union1d(s1, s2) )
>>['bulb' 'chair' 'desk' 'lamp']

#return difference between data sets


print( np.setdiff1d(s1, s2) )# elements in s1 that are not in s2
>>['desk']

print( np.in1d(s1, s2) )# boolean mask: which elements of s1 are also in s2
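
The membership test returns a boolean mask rather than the matching elements. A minimal sketch of how that mask can be used follows; it calls np.isin, which recent NumPy releases document as the preferred spelling of the same test.

```python
import numpy as np

s1 = np.array(['desk', 'chair', 'bulb'])
s2 = np.array(['lamp', 'bulb', 'chair'])

mask = np.isin(s1, s2)      # element-wise membership test
print(mask)                 # [False  True  True]
print(s1[mask])             # ['chair' 'bulb'], in the order they appear in s1
```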

Stacking (joining) together different arrays

Several arrays can be stacked together along different axes:

>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8., 8.],
[ 0., 0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1., 8.],
[ 0., 4.]])
>>> np.vstack((a,b)) # vertical stack
array([[ 8., 8.],
[ 0., 0.],
[ 1., 8.],
[ 0., 4.]])
>>> np.hstack((a,b)) #horizontal stack
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])
The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D arrays:

>>> from numpy import newaxis


>>> np.column_stack((a,b)) # with 2D arrays
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b)) # returns a 2D array
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a,b)) # the result is different
array([ 4., 2., 3., 8.])
>>> a[:,newaxis] # this gives a 2D column vector
array([[ 4.],
[ 2.]])
>>> np.column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a[:,newaxis],b[:,newaxis])) # the result is the same
array([[ 4., 3.],
[ 2., 8.]])

On the other hand, the function row_stack is equivalent to vstack for any input arrays. In general, for arrays with
more than two dimensions, hstack stacks along their second axes, vstack stacks along their first axes, and
concatenate accepts an optional argument giving the number of the axis along which the concatenation should
happen.
a = np.floor(10*np.random.random((2,2)))
a
array([[ 6., 9.],
[ 1., 3.]])

b = np.floor(10*np.random.random((2,2)))
b
array([[ 9., 2.],
[ 4., 8.]])
#the two arrays are passed as a single tuple argument, hence the double parentheses
np.row_stack((a,b))
array([[ 6., 9.],
[ 1., 3.],
[ 9., 2.],
[ 4., 8.]])
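
The axis argument of concatenate mentioned above can be sketched as follows, reusing the values of a and b from this example. For 2D inputs, axis=0 matches vstack and axis=1 matches hstack.

```python
import numpy as np

a = np.array([[6., 9.], [1., 3.]])
b = np.array([[9., 2.], [4., 8.]])

rows = np.concatenate((a, b), axis=0)   # same result as np.vstack((a, b))
cols = np.concatenate((a, b), axis=1)   # same result as np.hstack((a, b))
print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)
```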

In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They allow the use of
range literals
row = np.r_[0:4,44,444]; row
array([ 0, 1, 2, 3, 44, 444])

np.r_[np.array([1,2,3]), 0, 0, np.array([4,5,6])]
array([1, 2, 3, 0, 0, 4, 5, 6])

column = np.c_[1,2,3, 4]; column

#np.c_ can also be used to join two or more arrays column-wise
np.c_[np.array([[1,2,3]]), 0, 0, np.array([[4,5,6]])]
array([[1, 2, 3, 0, 0, 4, 5, 6]])

Splitting one array into several smaller ones

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to
return, or by specifying the columns after which the division should occur:

>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9., 5., 6., 3., 6., 8., 0., 7., 9., 7., 2., 7.],
[ 1., 4., 9., 2., 2., 1., 0., 6., 2., 2., 4., 0.]])
>>> np.hsplit(a,3) # Split a into 3 arrays
[array([[ 9., 5., 6., 3.],
[ 1., 4., 9., 2.]]), array([[ 6., 8., 0., 7.],
[ 2., 1., 0., 6.]]), array([[ 9., 7., 2., 7.],
[ 2., 2., 4., 0.]])]
>>> np.hsplit(a,(3,4)) # Split a after the third and the fourth column
[array([[ 9., 5., 6.],
[ 1., 4., 9.]]), array([[ 3.],
[ 2.]]), array([[ 6., 8., 0., 7., 9., 7., 2., 7.],
[ 2., 1., 0., 6., 2., 2., 4., 0.]])]

vsplit splits along the vertical axis

>>> x = np.arange(16.0).reshape(4, 4)
>>> x
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 12., 13., 14., 15.]])
>>> np.vsplit(x, 2)
[array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.]]),
array([[ 8., 9., 10., 11.],
[ 12., 13., 14., 15.]])]

>>> np.vsplit(x, np.array([3, 6]))


[array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]]),
array([[ 12., 13., 14., 15.]]),
array([], dtype=float64)]

array_split allows indices_or_sections to be an integer that does not equally divide the axis. For an array of length l
that should be split into n sections, it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n.

>>> x = np.arange(8.0)
>>> np.array_split(x, 3) # split into 3 arrays: two with 3 elements and one with 2 elements
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
>>> x = np.arange(7.0)
>>> np.array_split(x, 3) # split into 3 arrays: the first with 3 elements, the other two with 2 elements
[array([ 0., 1., 2.]), array([ 3., 4.]), array([ 5., 6.])]
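
The sizing rule stated above can be checked directly. This is only a sketch; expected_sizes is a hypothetical helper written for the comparison, not part of NumPy.

```python
import numpy as np

def expected_sizes(length, sections):
    """Sizes predicted by the rule: length % sections parts get one extra element."""
    base, extra = divmod(length, sections)
    return [base + 1] * extra + [base] * (sections - extra)

for length in (7, 8):
    parts = np.array_split(np.arange(length), 3)
    # actual sizes next to the predicted sizes
    print(length, [len(p) for p in parts], expected_sizes(length, 3))
```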

Copies and Views


When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often
a source of confusion for beginners. There are three cases:
No Copy at All

Simple assignments make no copy of array objects or of their data.

>>> a = np.arange(12)
>>> b = a # no new object is created
>>> b is a # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4 # changes the shape of a
>>> a.shape
(3, 4)
View or Shallow Copy
Different array objects can share the same data. The view method creates a new array object that looks at the same data.
Changing a value in the new array c, created as a view of a, also changes that element in a.

>>> a = np.arange(12).reshape(3,4); a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> c = a.view()
>>> c
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> # a and c are two different objects, but they hold the same values
>>> id(a) # e.g. 1971848529120
>>> id(c) # e.g. 1971848540688
>>> c is a
False
>>> c.base is a # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c.shape = 2,6 # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234 # a's data changes
>>> a
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])

Deep Copy
The copy method makes a complete copy of the array and its data.

>>> d = a.copy() # a new array object with new data is created


>>> d is a
False
>>> d.base is a # d doesn't share anything with a
False
>>> d[0,0] = 9999
>>> a # unchanged by the assignment to d
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])
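
The three cases can be told apart programmatically. The sketch below uses np.shares_memory to check whether two arrays refer to the same underlying data; the names alias, view and deep are chosen only for this illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
alias = a            # no copy: a second name for the same object
view = a.view()      # new array object, shared data
deep = a.copy()      # new array object, new data

print(alias is a)                   # True
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, deep))    # False

view[0, 0] = -1      # visible through a
deep[0, 1] = -2      # not visible through a
print(a[0, :2])      # [-1  1]
```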

Broadcasting
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject
to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape.
The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly
prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size
of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along
that dimension for the “broadcast” array.
After application of the broadcasting rules, the sizes of all arrays must match.
Two dimensions are compatible when

1. they are equal, or


2. one of them is 1

If these conditions are not met, a ValueError: operands could not be broadcast together exception is raised, indicating that the
arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input
arrays.
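
The compatibility rules above can be tried out without building any arrays. This sketch assumes NumPy 1.20 or later, where np.broadcast_shapes is available; it returns the result shape or raises ValueError for incompatible shapes.

```python
import numpy as np

print(np.broadcast_shapes((4, 1), (5,)))      # (4, 5): (5,) is treated as (1, 5)
print(np.broadcast_shapes((4, 3), (3,)))      # (4, 3)
print(np.broadcast_shapes((4, 3), (4, 1)))    # (4, 3)

try:
    np.broadcast_shapes((4, 3), (3, 4))       # 4 vs 3 and 3 vs 4: neither equal nor 1
except ValueError as err:
    print("incompatible:", err)
```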

NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays
must have exactly the same shape, as in the following example:

>>> a = np.array([1.0, 2.0, 3.0])


>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([ 2., 4., 6.])
NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest
broadcasting example occurs when an array and a scalar value are combined in an operation:

>>> a = np.array([1.0, 2.0, 3.0])


>>> b = 2.0
>>> a * b
array([ 2., 4., 6.])
The result is equivalent to the previous example where b was an array. We can think of the scalar b being stretched during
the arithmetic operation into an array with the same shape as a. The new elements in b are simply copies of the original
scalar. The stretching analogy is only conceptual.

x = np.arange(4);x
array([0, 1, 2, 3])

y = np.ones(5); y
array([ 1., 1., 1., 1., 1.])
x + y # shapes (4,) and (5,) are not compatible
ValueError: operands could not be broadcast together with shapes (4,) (5,)

xx = x.reshape(4,1); xx
array([[0],
[1],
[2],
[3]])

xx + y # now the two arrays can be added because one of the dimensions of xx is 1


array([[ 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3.],
[ 4., 4., 4., 4., 4.]])

import numpy as np
start = np.zeros((4,3))
print(start)

[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]

# create a rank 1 ndarray with 3 values


add_rows = np.array([1, 0, 2])
print(add_rows)
>>[1 0 2]

y = start + add_rows # add to each row of 'start' using broadcasting


# add_rows is added to each row of the start array; add_rows was broadcast to fit
# the shape of the start array
print(y)

[[ 1. 0. 2.]
[ 1. 0. 2.]
[ 1. 0. 2.]
[ 1. 0. 2.]]

# create an ndarray which is 4 x 1 to broadcast across columns


add_cols = np.array([[0,1,2,3]])
add_cols = add_cols.T

print(add_cols)
[[0]
[1]
[2]
[3]]

# add to each column of 'start' using broadcasting


# add_cols is added to each column of the start array; add_cols was broadcast
# to fit the shape of the start array
y = start + add_cols
print(y)
[[ 0. 0. 0.]
[ 1. 1. 1.]
[ 2. 2. 2.]
[ 3. 3. 3.]]

# this will just broadcast in both dimensions


add_scalar = np.array([1])
# a single value (a one-element vector) is broadcast to match the
# number of columns and rows
print(start+add_scalar)
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]

# create our 3x4 matrix


arrA = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(arrA)

# create a 4-element list (treated as shape (4,), not 4x1)


arrB = [0,1,0,2]
print(arrB)

# add the two together using broadcasting


print(arrA + arrB)

[[ 1 3 3 6]
[ 5 7 7 10]
[ 9 11 11 14]]

# broadcasting fails when corresponding dimensions are neither equal nor
# of size 1
print(start)
print()
print(arrA)
start+arrA
#ValueError: operands could not be broadcast together with shapes (4,3) (3,4)
Read or Write to Disk:
Binary Format:
x = np.array([ 23.23, 24.24] )
np.save('an_array', x) # saves the array to the current working directory
np.load('an_array.npy')

array([ 23.23, 24.24])

Text Format:
np.savetxt('array.txt', X=x, delimiter=',')
np.loadtxt('array.txt', delimiter=',')
array([ 23.23, 24.24])

Fancy indexing and index tricks

NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as we
saw before, arrays can be indexed by arrays of integers and arrays of booleans.

>>> a = np.arange(12)**2 # the first 12 square numbers


>>> i = np.array( [ 1,1,3,8,5 ] ) # an array of indices
>>> a[i] # the elements of a at the positions i
array([ 1, 1, 9, 64, 25])
>>>
>>> j = np.array( [ [ 3, 4], [ 9, 7 ] ] ) # a bidimensional array of indices
>>> a[j] # the same shape as j
array([[ 9, 16],
[81, 49]])

When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. The following
example shows this behavior by converting an image of labels into a color image using a palette.

>>> palette = np.array( [ [0,0,0], # black


... [255,0,0], # red
... [0,255,0], # green
... [0,0,255], # blue
... [255,255,255] ] ) # white
>>> image = np.array( [ [ 0, 1, 2, 0 ], # each value corresponds to a color in the palette
... [ 0, 3, 4, 0 ] ] )
>>> palette[image] # the (2,4,3) color image
array([[[ 0, 0, 0],
[255, 0, 0],
[ 0, 255, 0],
[ 0, 0, 0]],
[[ 0, 0, 0],
[ 0, 0, 255],
[255, 255, 255],
[ 0, 0, 0]]])
You can also use indexing with arrays as a target to assign to:

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])
However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:

>>> a = np.arange(5)
>>> a[[0,0,2]]=[1,2,3]
>>> a
array([2, 1, 3, 3, 4])
This is reasonable enough, but watch out if you want to use Python’s += construct, as it may not do what you expect:

>>> a = np.arange(5)
>>> a[[0,0,2]]+=1
>>> a
array([1, 1, 3, 3, 4])
Even though 0 occurs twice in the list of indices, the 0th element is only incremented once. This is because Python requires
“a+=1” to be equivalent to “a = a + 1”.
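
If the intent is to count every occurrence of a repeated index, NumPy provides the unbuffered np.add.at. A minimal sketch contrasting the two behaviours:

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1              # buffered: index 0 is incremented only once
print(a)                       # [1 1 3 3 4]

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)     # unbuffered: index 0 is incremented twice
print(b)                       # [2 1 3 3 4]
```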

Indexing with Boolean Arrays


When we index arrays with arrays of (integer) indices we are providing the list of indices to pick. With boolean indices the
approach is different; we explicitly choose which items in the array we want and which ones we don’t.
The most natural way one can think of for boolean indexing is to use boolean arrays that have the same shape as the
original array:

>>> a = np.arange(12).reshape(3,4)
>>> b = a > 4
>>> b # b is a boolean with a's shape
array([[False, False, False, False],
[False, True, True, True],
[ True, True, True, True]])
>>> a[b] # 1d array with the selected elements
array([ 5, 6, 7, 8, 9, 10, 11])
This property can be very useful in assignments:

>>> a[b] = 0 # All elements of 'a' higher than 4 become 0


>>> a
array([[0, 1, 2, 3],
[4, 0, 0, 0],
[0, 0, 0, 0]])

The second way of indexing with booleans is more similar to integer indexing; for each dimension of the array we give a
1D boolean array selecting the slices we want:

>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True]) # first dim selection
>>> b2 = np.array([True,False,True,False]) # second dim selection
>>>
>>> a[b1,:] # selecting rows
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> a[b1] # same thing
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> a[:,b2] # selecting columns
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
>>>
>>> a[b1,b2] # a weird thing to do
array([ 4, 10])
Note that the length of the 1D boolean array must coincide with the length of the dimension (or axis) you want to slice. In
the previous example, b1 has length 3 (the number of rows in a), and b2 (of length 4) is suitable to index the 2nd axis
(columns) of a.

The ix_() function

The ix_ function can be used to combine different vectors so as to obtain the result for each n-uplet. For example, if you
want to compute all the a+b*c for all the triplets taken from each of the vectors a, b and c:
This function takes N 1-D sequences and returns N outputs with N dimensions each, such that the shape is 1 in all but one
dimension and the dimension with the non-unit shape value cycles through all N dimensions.
Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns
the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
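
The a+b*c computation described above can be sketched as follows. The three vectors are arbitrary sample values chosen for illustration; np.ix_ reshapes them so that broadcasting evaluates every triplet.

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

ax, bx, cx = np.ix_(a, b, c)
print(ax.shape, bx.shape, cx.shape)   # (4, 1, 1) (1, 3, 1) (1, 1, 5)

result = ax + bx * cx                 # broadcasts to shape (4, 3, 5)
print(result.shape)                   # (4, 3, 5)

# Each entry corresponds to one triplet (a[i], b[j], c[k])
print(result[3, 2, 4], a[3] + b[2] * c[4])   # 17 17
```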

Simple Array Operations


See linalg.py in numpy folder for more.

>>> import numpy as np


>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[ 1. 2.]
[ 3. 4.]]

>>> a.transpose()
array([[ 1., 3.],
[ 2., 4.]])

>>> np.linalg.inv(a)
array([[-2. , 1. ],
[ 1.5, -0.5]])

>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"


>>> u
array([[ 1., 0.],
[ 0., 1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])

>>> j @ j # matrix product


array([[-1., 0.],
[ 0., -1.]])
>>> np.trace(u) # trace
2.0

>>> y = np.array([[5.], [7.]])


>>> np.linalg.solve(a, y)
array([[-3.],
[ 4.]])

>>> np.linalg.eig(j)
(array([ 0.+1.j, 0.-1.j]), array([[ 0.70710678+0.j , 0.70710678-0.j ],
[ 0.00000000-0.70710678j, 0.00000000+0.70710678j]]))
Parameters:
square matrix
Returns
The eigenvalues, each repeated according to its multiplicity.
The normalized (unit "length") eigenvectors, such that the
column ``v[:,i]`` is the eigenvector corresponding to the
eigenvalue ``w[i]`` .
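
The relationship between the two return values can be verified numerically. This sketch reuses the matrix j from the example above and checks that each column of v satisfies j @ v[:, i] = w[i] * v[:, i].

```python
import numpy as np

j = np.array([[0.0, -1.0], [1.0, 0.0]])
w, v = np.linalg.eig(j)        # w: eigenvalues, v: eigenvectors as columns

for i in range(len(w)):
    # equal up to floating-point rounding
    print(np.allclose(j @ v[:, i], w[i] * v[:, i]))    # True

print(np.allclose(np.linalg.norm(v, axis=0), 1.0))     # True: unit-length columns
```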

Histograms
The NumPy histogram function applied to an array returns a pair of vectors: the histogram of the array and the vector of
bins. Beware: matplotlib also has a function to build histograms (called hist, as in Matlab) that differs from the one in
NumPy. The main difference is that pylab.hist plots the histogram automatically, while numpy.histogram only
generates the data.

>>> import numpy as np


>>> import matplotlib.pyplot as plt
>>> # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
>>> mu, sigma = 2, 0.5
>>> v = np.random.normal(mu,sigma,10000)
>>> # Plot a normalized histogram with 50 bins
>>> plt.hist(v, bins=50, density=1) # matplotlib version (plot)
>>> plt.show()

>>> # Compute the histogram with numpy and then plot it


>>> (n, bins) = np.histogram(v, bins=50, density=True) # NumPy version (no plot)
>>> plt.plot(.5*(bins[1:]+bins[:-1]), n)
>>> plt.show()
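
To see what numpy.histogram returns without plotting anything, the sketch below inspects the two vectors. The seeded generator is an assumption added only to make the run repeatable. With density=True the bar heights times the bin widths sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)             # seeded only for repeatability
v = rng.normal(2, 0.5, 10000)

n, bins = np.histogram(v, bins=50, density=True)
print(len(n), len(bins))                   # 50 51: one more edge than bins

area = np.sum(n * np.diff(bins))           # heights times widths
print(np.isclose(area, 1.0))               # True
```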

Pandas

Dimension & Description


The best way to think of these data structures is that the higher dimensional data
structure is a container of its lower dimensional data structure. For example,
DataFrame is a container of Series, Panel is a container of DataFrame.

Data Structure   Dimensions   Description
Series           1            1D labeled homogeneous array, size-immutable.
DataFrame        2            General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel            3            General 3D labeled, size-mutable array.

pandas has two main data structures: Series and DataFrame.
pandas Series
A pandas Series is a one-dimensional labeled array. A Series can be created using the following constructor:
pandas.Series( data, index, dtype, copy)
data - takes various forms, such as an ndarray, a list, or constants
index - index values must be hashable and the same length as data (duplicate labels are allowed, although unique labels are easier to work with). Defaults to np.arange(n) if no index is passed.
dtype - the data type. If None, the data type is inferred
copy - copy the data. Default False
Pandas Series

import pandas as pd
import numpy as np
#create an empty series
s = pd.Series()
s
>>Series([], dtype: float64)

#create series from ndarray


#we do not pass any index, so the default from 0 to len(data)-1 is used
data = np.array(["a", "b", "c", "d"])
s = pd.Series(data)
s
0 a
1 b
2 c
3 d
dtype: object

#assigning index for Series


data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
s
100 a
101 b
102 c
103 d
dtype: object

#create series from dictionary


data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)#as data is dictionary the index would be used from dictionary
s

a 0.0
b 1.0
c 2.0
dtype: float64

#when creating data from dictionary we can change the index order and/or provide
additional index not exiting in dictionary
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
s

b 1.0
c 2.0
d NaN
a 0.0
dtype: float64

#creating series from scalar


s = pd.Series(5, index=[0, 1, 2, 3])#the same value = 5 would be used for all indicies
s

0 5
1 5
2 5
3 5
dtype: int64

#each value in a series can be given a label


ser = pd.Series(data = [100, 200, 300, 400, 500], index =['tom', 'bob', 'nancy', 'dan',
'eric'])
ser

tom 100
bob 200
nancy 300
dan 400
eric 500
dtype: int64

#we can provide only the data without a label for each value; then integers
#starting from 0 are assigned as labels
ser2 = pd.Series(data = [100, 'foo', 300, 'bar', 500])
ser2

0 100
1 foo
2 300
3 bar
4 500
dtype: object

#we can retrieve the index labels assigned to the data in our series


ser.index
Index(['tom', 'bob', 'nancy', 'dan', 'eric'], dtype='object')

#accessing the data from Series with position


s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
s[0] #selecting the first value in Series (recent pandas versions prefer s.iloc[0] for positional access)
>>1

#slicing the Series


print(s[:3]) # the first 3 elements
print()
print(s[-3:]) # last 3 elements

a 1
b 2
c 3
dtype: int64

c 3
d 4
e 5
dtype: int64

#retrieving the data using label (index)


print(s["a"])
print()
print(s[["a","c","d"]]) #retrieving multiple elements - note the double brackets [[ ]]

a 1
c 3
d 4
dtype: int64

#we can select index labels and their values by passing the labels to .loc
ser.loc[['nancy','bob']] # getting the values at the labels "nancy" and "bob"

nancy 300
bob 200
dtype: int64

#the same can be achieved by passing the integer positions of the elements
#whose index label and value we want to select
ser[[4, 3, 1]]
eric 500
dan 400
bob 200
dtype: int64

#selecting only the value (not index label + value) by position with .iloc
ser.iloc[2]
>>300

#we can check whether an index label is present in the series
b = 'bob' in ser
print(b)
d = 100 in ser # the 100 is a value not the index label so False is returned
print(d)

True
False
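
Since the in operator only looks at index labels, testing for a value needs a different call. A minimal sketch, rebuilding ser with the same data as above:

```python
import pandas as pd

ser = pd.Series([100, 200, 300, 400, 500],
                index=['tom', 'bob', 'nancy', 'dan', 'eric'])

print('bob' in ser)             # True: 'bob' is an index label
print(100 in ser)               # False: 100 is a value, not a label
print(100 in ser.values)        # True: searches the underlying values

# Element-wise membership test against a list of candidate values
print(ser.isin([100, 999]).tolist())   # [True, False, False, False, False]
```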

#we can multiply, divide, add and subtract the values in the series
ser * 2
ser**2
ser + ser
ser - ser
ser / ser

#a single series can hold values of different data types, but then the dtype
#will be object
ser2 = pd.Series(data = [100, 'pawel', 300, 2.0, 'T'], index =['tom', 'bob', 3.5,
'dan', 100])
ser2
tom 100
bob pawel
3.5 300
dan 2
100 T
dtype: object

#changing the values of the Series


s['a'] =44
s

a 44
b 2
c 3
d 4
e 5
dtype: int64

#assigning a string to one of the values changes the dtype of the entire Series,
#as all elements must share the same dtype
s["b"]="pawel"
s

a 44
b pawel
c 3
d 4
e 5
dtype: object

Pandas DataFrame
A DataFrame can be created using the following constructor: pandas.DataFrame( data, index, columns, dtype, copy)
data - takes various forms, such as ndarray, Series, map, lists, dict, constants, or another DataFrame.
index - optional row labels. Defaults to np.arange(n) if no index is passed.
columns - optional column labels. Defaults to np.arange(n) if no column labels are passed.
dtype - data type for each column

inputs - data for creating a DataFrame: lists, dict, Series, NumPy ndarrays, another DataFrame

Creating Data Frame


#creating empty DataFrame
import pandas as pd
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []

data = [['Alex',10],['Bob',12],['Clarke',13]]
#specify the column names when creating the DataFrame
df = pd.DataFrame(data,columns=['Name','Age'])
df

Name Age

0 Alex 10

1 Bob 12

2 Clarke 13

data = [['Alex',10],['Bob',12],['Clarke',13]]
#create the DataFrame specifying the data type to use for the columns (recent pandas versions may raise here, since 'Name' cannot be converted to float)
df = pd.DataFrame(data,columns=['Name','Age'],dtype=float)
df

Name Age

0 Alex 10.0

1 Bob 12.0

2 Clarke 13.0

#the dictionary keys become the column names


data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
df

Age Name
0 28 Tom
1 34 Jack
2 29 Steve
3 42 Ricky

#the dictionary keys become the column names


data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
#define the index when creating the data frame
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
df

Age Name

rank1 28 Tom

rank2 34 Jack

rank3 29 Steve

rank4 42 Ricky

#defining dictionary with index


d = {'one' : pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock']),
'two' : pd.Series([111., 222., 333., 4444.], index=['apple', 'ball', 'cerill',
'dancy'])}
d

{'one': apple 100.0


ball 200.0
clock 300.0
dtype: float64, 'two': apple 111.0
ball 222.0
cerill 333.0
dancy 4444.0
dtype: float64}

#creating data frame - pandas.DataFrame


df = pd.DataFrame(d)
#2 data columns are generated; values from the series that share an index label
#end up in the same row
df

one two

apple 100.0 111.0

ball 200.0 222.0

cerill NaN 333.0

clock 300.0 NaN

dancy NaN 4444.0


#creating data frame from list of dictionaries,
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['FirstDict', 'SecondDict'])
#each dictionary in the list becomes a single row in the data frame
print(df)
print()
#passing column names limits the columns shown in the data frame
df2 = pd.DataFrame(data, index=['FirstDict', 'SecondDict'], columns=['a', 'b'])
#only the columns for the dict keys a and b are kept
print(df2)
print()
df2 = pd.DataFrame(data, index=['FirstDict', 'SecondDict'], columns=['kol1', 'kol2'])
#with other column names we get NaN values because the dict has no such keys
print(df2)

a b c
FirstDict 1 2 NaN
SecondDict 5 10 20.0

a b
FirstDict 1 2
SecondDict 5 10

kol1 kol2
FirstDict NaN NaN
SecondDict NaN NaN

#creating data Frame from Series


d =pd.Series([1, 2, 3], index=['a', 'b', 'c'])
#here we can give the column a descriptive name
df = pd.DataFrame(d, columns=['kolumna1'])
df

kolumna1

a 1

b 2

c 3

data = [{'alex': 1, 'joe': 2}, {'ema': 5, 'dora': 10, 'alice': 20}]


#there is no index for rows as they were not provided by us
pd.DataFrame(data)

alex alice dora ema joe

0 1.0 NaN NaN NaN 2.0

1 NaN 20.0 10.0 5.0 NaN

#we can add labels for the row index by providing the values in index=[]
#(the same argument is used further below to select rows)
pd.DataFrame(data, index=['orange', 'red'])
alex alice dora ema joe

orange 1.0 NaN NaN NaN 2.0

red NaN 20.0 10.0 5.0 NaN

#we can slice the data using the column names


pd.DataFrame(data, columns=['joe', 'dora','alice'])

joe dora alice

0 2.0 NaN NaN

1 NaN 10.0 20.0

Slicing Data Frame

#selecting the indexes out of data frame


#defining dictionary with index
d = {'one' : pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock']),
'two' : pd.Series([111., 222., 333., 4444.], index=['apple', 'ball', 'cerill',
'dancy'])}
df=pd.DataFrame(d)

df.index
>>Index(['apple', 'ball', 'cerill', 'clock', 'dancy'], dtype='object')

#selecting the columns from data frame


df.columns
>>Index(['one', 'two'], dtype='object')

#slicing the data frame and selecting only 3 given row labels
pd.DataFrame(d, index=['dancy', 'ball', 'apple'])

one two

dancy NaN 4444.0

ball 200.0 222.0

apple 100.0 111.0

#slicing the data frame and selecting only 3 rows with the given labels and the
#specified column names
#if a label we provide does not exist, it is filled with NaN values
pd.DataFrame(d, index=['dancy', 'ball', 'apple'], columns=['two', 'five'])

two five
dancy 4444.0 NaN
ball 222.0 NaN
apple 111.0 NaN

Example:

df1

one two three four five

a 1.0 1 10.0 Tom 1

b 2.0 2 20.0 Jack 2

c 3.0 3 30.0 Steve 3

d NaN 4 NaN Ricky 4

#slicing single column and possibly multiple rows


df1['two'][1:4]
b 2
c 3
d 4
Name: two, dtype: int64

#slicing rows only


df1[1:3]

one two three four five

b 2.0 2 20.0 Jack 2

c 3.0 3 30.0 Steve 3

#selecting the first 3 columns of the data frame using columns[]


df1[df1.columns[:3]]

one two three

a 1.0 1 10.0

b 2.0 2 20.0

c 3.0 3 30.0

d NaN 4 NaN

#selecting the same columns using loc


df1.loc[:, 'one': 'three']
one two three

a 1.0 1 10.0

b 2.0 2 20.0

c 3.0 3 30.0

d NaN 4 NaN

#loc takes the row index labels as its first argument


df1.loc['a':'c', 'one': 'three']

one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
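
Label slices with loc include both endpoints, whereas positional slices with iloc exclude the stop position. A minimal sketch on a small made-up frame (df_demo is a hypothetical name and does not reuse df1):

```python
import pandas as pd

df_demo = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0],
                        'two': [1, 2, 3, 4],
                        'three': [10.0, 20.0, 30.0, 40.0]},
                       index=['a', 'b', 'c', 'd'])

by_label = df_demo.loc['a':'c', 'one':'two']    # both end labels are included
by_position = df_demo.iloc[0:3, 0:2]            # stop positions are excluded

print(by_label.shape)                  # (3, 2)
print(by_label.equals(by_position))    # True
```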

#selecting non-adjacent columns


columns=['one', 'three']
df2 = pd.DataFrame(df1, columns = columns)
df2

one three

a 1.0 10.0

b 2.0 20.0

c 3.0 30.0

d NaN NaN

#selecting only some columns; the columns can be given directly when creating the new
#DataFrame
df3 = pd.DataFrame(df1, columns = ['one', 'three', 'five'])
df3

one three five
a 1.0 10.0 1
b 2.0 20.0 2
c 3.0 30.0 3
d NaN NaN 4

#we can also select columns and rows (index) at the same time


df3 = pd.DataFrame(df1, columns = ['one', 'three', 'five'], index=['a', 'c', 'd'])
df3

one three five

a 1.0 10.0 1

c 3.0 30.0 3

d NaN NaN 4

Adding, removing columns and rows

#selecting a single column by its label - 'one'


df['one']
apple 100.0
ball 200.0
cerill NaN
clock 300.0
dancy NaN
Name: one, dtype: float64

#adding new column from Series


d1 = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df1 = pd.DataFrame(d1)
print(df1)
print()

# Adding a new column to an existing DataFrame object with a column label by passing a new
# series
print ("Adding a new column by passing as Series:")
df1['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df1)
print()

print ("Adding a new column using the existing columns in DataFrame:")


df1['four']=df1['one']+df1['three']
print(df1)
print()

#we can also give the values to be written, in order, into the new column
df1['four']=['Tom', 'Jack', 'Steve', 'Ricky']
print(df1)
print()

df1['five'] = [1,2,3,4]
print(df1)
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4

Adding a new column by passing as Series:


one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN

Adding a new column using the existing columns in DataFrame:


one two three four
a 1.0 1 10.0 11.0
b 2.0 2 20.0 22.0
c 3.0 3 30.0 33.0
d NaN 4 NaN NaN

one two three four


a 1.0 1 10.0 Tom
b 2.0 2 20.0 Jack
c 3.0 3 30.0 Steve
d NaN 4 NaN Ricky

one two three four five


a 1.0 1 10.0 Tom 1
b 2.0 2 20.0 Jack 2
c 3.0 3 30.0 Steve 3
d NaN 4 NaN Ricky 4

#to add a new column to a data frame we can use a column name that does not yet exist in df
#and assign the values
df['three'] = df['one'] * df['two']
df

one two three

apple 100.0 111.0 11100.0

ball 200.0 222.0 44400.0

cerill NaN 333.0 NaN

clock 300.0 NaN NaN

dancy NaN 4444.0 NaN

#a new column can also hold values based on a filtering criterion
df['flag'] = df['one'] > 250
df

one two three flag
apple 100.0 111.0 11100.0 False
ball 200.0 222.0 44400.0 False
cerill NaN 333.0 NaN False
clock 300.0 NaN NaN True
dancy NaN 4444.0 NaN False

#adding new column with values specified by hand with using pd.Series()
df['new'] = pd.Series([444,444,444,444,444], index =['apple', 'ball', 'cerill',
'clock', 'dancy'])
df

one two three flag new

apple 100.0 111.0 11100.0 False 444

ball 200.0 222.0 44400.0 False 444

cerill NaN 333.0 NaN False 444

clock 300.0 NaN NaN True 444

dancy NaN 4444.0 NaN False 444

#adding a new column using the .assign() method - note that the new column name is given
#without quotes
df = df.assign(new2 = pd.Series([555,555,555,555,555], index = ['apple', 'ball',
'cerill', 'clock', 'dancy']))
df

one two three flag new new2

apple 100.0 111.0 11100.0 False 444 555

ball 200.0 222.0 44400.0 False 444 555

cerill NaN 333.0 NaN False 444 555

clock 300.0 NaN NaN True 444 555

dancy NaN 4444.0 NaN False 444 555

#if we pass the values without an index, NaN values are inserted because pandas does not
#know which value belongs to which row label,
#even though the values are all the same
df = df.assign(new3 = pd.Series([555,555,555,555,555]))
df
one two three flag new new2 new3

apple 100.0 111.0 11100.0 False 444 555 NaN

ball 200.0 222.0 44400.0 False 444 555 NaN

cerill NaN 333.0 NaN False 444 555 NaN

clock 300.0 NaN NaN True 444 555 NaN

dancy NaN 4444.0 NaN False 444 555 NaN
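
The NaN column comes from index alignment: a Series is matched to rows by label, not by position. A minimal sketch on a small made-up frame (df_demo is a hypothetical name) shows two ways to get the intended values.

```python
import pandas as pd

df_demo = pd.DataFrame({'one': [100., 200., 300.]},
                       index=['apple', 'ball', 'clock'])

# Default labels 0, 1, 2 match none of the row labels, so every value is NaN
misaligned = df_demo.assign(new=pd.Series([5, 5, 5]))
# Reusing the frame's own index makes the labels line up
aligned = df_demo.assign(new=pd.Series([5, 5, 5], index=df_demo.index))
# A plain list has no labels and is assigned by position
from_list = df_demo.assign(new=[5, 5, 5])

print(misaligned['new'].isna().all())   # True
print(aligned['new'].tolist())          # [5, 5, 5]
print(from_list['new'].tolist())        # [5, 5, 5]
```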

import pandas as pd
import numpy as np
row1 = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
row2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
row1 = pd.concat([row1, row2]) # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result here
print(row1)

a b
0 1 2
1 3 4
0 5 6
1 7 8
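
Note that the row labels 0 and 1 appear twice in the output above, because each frame keeps its own index. A minimal sketch of how pd.concat can renumber the rows instead, using its ignore_index parameter:

```python
import pandas as pd

row1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
row2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

kept = pd.concat([row1, row2])                            # original labels kept
renumbered = pd.concat([row1, row2], ignore_index=True)   # fresh 0..n-1 labels

print(kept.index.tolist())          # [0, 1, 0, 1]
print(renumbered.index.tolist())    # [0, 1, 2, 3]
```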

#pop() returns the column values for the given column name and at the same time removes
#the column from the data frame
a = row1.pop('a')
a

0 1
1 3
0 5
1 7
Name: a, dtype: int64

#after using .pop() method the column is no longer available in the data frame
print(row1)
print()
#but the values are still kept in the variable to which we assigned the result of
#pop()
print(a)
b
0 2
1 4
0 6
1 8

row1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
row2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
#DataFrame.append() was removed in pandas 2.0; pd.concat() stacks the rows the same way
row1 = pd.concat([row1, row2])
#we can delete the column from the data frame without getting it back, unlike with .pop()
del row1['a']
print(row1)

   b
0  2
1  4
0  6
1  8
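
Both `pop()` and `del` change the frame in place. As a hedged alternative, `drop(columns=...)` returns a new frame and leaves the original alone. The sketch below contrasts the two on a small hypothetical frame named `frame`:

```python
import pandas as pd

frame = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# drop() returns a new frame; the original still has both columns afterwards.
without_a = frame.drop(columns=['a'])
columns_after_drop = frame.columns.tolist()

# pop() removes the column in place and hands it back as a Series.
popped = frame.pop('a')
columns_after_pop = frame.columns.tolist()

print(without_a.columns.tolist())
print(columns_after_drop)
print(columns_after_pop)
print(popped.tolist())
```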

#inserting a column with insert()
#.insert(position at which to insert the column, name of the new column, values to add)
df.insert(2, 'copy_of_one', df['one'])
df

          one     two  copy_of_one   flag  new  new2
apple   100.0   111.0        100.0  False  444   555
ball    200.0   222.0        200.0  False  444   555
cerill    NaN   333.0          NaN  False  444   555
clock   300.0     NaN        300.0   True  444   555
dancy     NaN  4444.0          NaN  False  444   555

df.insert(2, 'same5', [5, 5, 5, 5, 5])
df

          one     two  same5  copy_of_one   flag  new  new2
apple   100.0   111.0      5        100.0  False  444   555
ball    200.0   222.0      5        200.0  False  444   555
cerill    NaN   333.0      5          NaN  False  444   555
clock   300.0     NaN      5        300.0   True  444   555
dancy     NaN  4444.0      5          NaN  False  444   555

#insert() also accepts a Series object, for which we can specify the dtype
#remember to pass the index=[] parameter; otherwise the column is filled with NaN,
#because pandas cannot tell which value belongs to which row
df.insert(2, 'PF', pd.Series([1, 0, 1, 0, 1],
                             index=['apple', 'ball', 'cerill', 'clock', 'dancy'],
                             dtype=bool))
df

          one     two     PF  same5  copy_of_one   flag  new  new2
apple   100.0   111.0   True      5        100.0  False  444   555
ball    200.0   222.0  False      5        200.0  False  444   555
cerill    NaN   333.0   True      5          NaN  False  444   555
clock   300.0     NaN  False      5        300.0   True  444   555
dancy     NaN  4444.0   True      5          NaN  False  444   555
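
Two properties of `insert()` are easy to miss in the examples above: it modifies the frame in place, and by default it refuses a column name that already exists. A minimal sketch on a hypothetical two-column frame (the names `frame`, `middle` and `duplicate_rejected` are illustrative):

```python
import pandas as pd

frame = pd.DataFrame({'one': [1.0, 2.0], 'two': [3.0, 4.0]})

# Position 1 places the new column between 'one' and 'two'; the frame changes in place.
frame.insert(1, 'middle', [10, 20])

# Inserting a name that already exists raises ValueError by default.
try:
    frame.insert(0, 'one', [0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(frame.columns.tolist())
print(duplicate_rejected)
```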

#a new column can also be created as a slice of an existing one;
#rows outside the slice are filled with NaN
df['one_upper_half'] = df['one'][:2]
df

          one     two     PF  same5  copy_of_one   flag  new  new2  one_upper_half
apple   100.0   111.0   True      5        100.0  False  444   555           100.0
ball    200.0   222.0  False      5        200.0  False  444   555           200.0
cerill    NaN   333.0   True      5          NaN  False  444   555             NaN
clock   300.0     NaN  False      5        300.0   True  444   555             NaN
dancy     NaN  4444.0   True      5          NaN  False  444   555             NaN

Renaming columns and rows

import numpy as np
import pandas as pd
random = np.random.randn(6, 4)
random

array([[ 0.2870495 ,  1.26500913, -0.38320656, -2.2037571 ],
       [-1.0371159 ,  0.49083367, -0.26930021, -0.14959956],
       [ 1.92334072,  1.0473691 , -0.73543742,  1.58782826],
       [-0.19902438,  0.81831941,  0.36681202, -0.21781424],
       [-0.66926265, -1.25309374, -0.01531053,  0.84593687],
       [-0.63203033,  1.91147004,  0.41147878,  0.57594596]])

#create the data frame with column names
df = pd.DataFrame(random, columns=list('ABCD'))
df

          A         B         C         D
0  0.287050  1.265009 -0.383207 -2.203757
1 -1.037116  0.490834 -0.269300 -0.149600
2  1.923341  1.047369 -0.735437  1.587828
3 -0.199024  0.818319  0.366812 -0.217814
4 -0.669263 -1.253094 -0.015311  0.845937
5 -0.632030  1.911470  0.411479  0.575946

#getting the current column names
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

#renaming the columns with the rename() function
#we provide a dictionary of the form {old column name: new column name}
df.rename(columns={'A': 'kol1', 'B': 'kol2', 'C': 'kol3', 'D': 'kol4'}, inplace=True)
print(df)

       kol1      kol2      kol3      kol4
0  0.287050  1.265009 -0.383207 -2.203757
1 -1.037116  0.490834 -0.269300 -0.149600
2  1.923341  1.047369 -0.735437  1.587828
3 -0.199024  0.818319  0.366812 -0.217814
4 -0.669263 -1.253094 -0.015311  0.845937
5 -0.632030  1.911470  0.411479  0.575946

#rename() can take a function instead of a dictionary when renaming the columns
#lambda x: x[0:3] takes the first 3 characters of each column name
#without inplace=True, rename() returns a new DataFrame with the renamed columns...
df.rename(columns=lambda x: x[0:3])
#...but the original DataFrame is not changed
print(df)

       kol1      kol2      kol3      kol4
0  0.287050  1.265009 -0.383207 -2.203757
1 -1.037116  0.490834 -0.269300 -0.149600
2  1.923341  1.047369 -0.735437  1.587828
3 -0.199024  0.818319  0.366812 -0.217814
4 -0.669263 -1.253094 -0.015311  0.845937
5 -0.632030  1.911470  0.411479  0.575946
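
Because `rename()` without `inplace=True` returns a new frame, the result has to be assigned to a name to be kept. A minimal sketch with a hypothetical frame and hypothetical column names (`alpha`, `beta`):

```python
import pandas as pd

frame = pd.DataFrame([[1, 2], [3, 4]], columns=['alpha', 'beta'])

# The renamed frame is a new object; assign it to keep the result.
shortened = frame.rename(columns=lambda name: name[0:3])

print(frame.columns.tolist())
print(shortened.columns.tolist())
```

Assigning the result instead of using `inplace=True` keeps the original available for comparison, which is convenient while experimenting.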

#we can also rename only some of the columns
df.rename(columns={'kol1': 0, 'kol4': 3}, inplace=True)
print(df)

          0      kol2      kol3         3
0  0.287050  1.265009 -0.383207 -2.203757
1 -1.037116  0.490834 -0.269300 -0.149600
2  1.923341  1.047369 -0.735437  1.587828
3 -0.199024  0.818319  0.366812 -0.217814
4 -0.669263 -1.253094 -0.015311  0.845937
5 -0.632030  1.911470  0.411479  0.575946

#changing the index labels using rename()
df.rename(index={0: 'zero', 1: 'one'}, inplace=True)
print(df)

             0      kol2      kol3         3
zero  0.287050  1.265009 -0.383207 -2.203757
one  -1.037116  0.490834 -0.269300 -0.149600
2     1.923341  1.047369 -0.735437  1.587828
3    -0.199024  0.818319  0.366812 -0.217814
4    -0.669263 -1.253094 -0.015311  0.845937
5    -0.632030  1.911470  0.411479  0.575946

#the column names and the index labels can be changed at the same time
df.rename(index={2: 'two', 3: 'three'},
          columns={'kol2': 1, 'kol3': 2},
          inplace=True)
print(df)

              0         1         2         3
zero   0.287050  1.265009 -0.383207 -2.203757
one   -1.037116  0.490834 -0.269300 -0.149600
two    1.923341  1.047369 -0.735437  1.587828
three -0.199024  0.818319  0.366812 -0.217814
4     -0.669263 -1.253094 -0.015311  0.845937
5     -0.632030  1.911470  0.411479  0.575946
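
One pitfall with dictionary-based renaming is that keys matching no existing label are skipped without any message, so a typo in an old name goes unnoticed. The sketch below, on a hypothetical frame, shows that default and the `errors='raise'` option that turns a missing key into an exception:

```python
import pandas as pd

frame = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

# 'Z' is not a column, so by default nothing is renamed and nothing is reported.
unchanged = frame.rename(columns={'Z': 'z'})

# errors='raise' makes the same call fail with a KeyError.
try:
    frame.rename(columns={'Z': 'z'}, errors='raise')
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

print(unchanged.columns.tolist())
print(missing_key_raised)
```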
