Lec 7
Hi everyone, welcome to the second tutorial of the Introduction to Machine Learning
course. In this tutorial we shall be taking a tour of the aspects of linear algebra which you
would need for the course; we will cover a variety of concepts such as subspaces, basis, span,
matrix decompositions, eigenvalues and eigenvectors over the course of the tutorial.
So the first question one would ask is, why do we need linear algebra at all, and what is linear
algebra? You may have come across this in school or at your +1 or +2 level, but just to recap,
linear algebra is the branch of mathematics which deals with vectors, vector spaces and linear
mappings between these spaces. So why do we study linear algebra here, especially in the
context of machine learning? Firstly, it gives us a way to represent, manipulate and operate on
sets of linear equations. But why do these linear equations pop up in machine learning in the
first place? The reason is that in machine learning we represent our data as an n
x p matrix, where n is the number of data points and p is the number of features, so it is
natural that we use the notions and formalisms developed in linear algebra.
Not just the data, even the parameters we use are represented as vectors, so as a result
linear algebra has an important role to play in machine learning.
So here you can see a system of linear equations, two equations in two variables: 4x1 – 5x2 =
-13 and -2x1 + 3x2 = 9. We can right away see the advantage of matrix notation here: as
shown below, you can represent the same system of two equations directly as one equation of
the form Ax = b, where A is the matrix of coefficients and x is the 2 x 1 matrix, two rows and one
column, or you may also call it a 2-dimensional vector (x1, x2).
So we can see that when you multiply the matrix A with the 2 x 1 vector (x1, x2), you get
back the two left-hand sides, and the matrix b represents the right-hand sides. It is
very easy to verify that if you represent the system in matrices you get back the
original representation, and from that representation you can recover the system. To solve this
without matrices, you would first solve for one variable and then
substitute to get the other, but using matrices you can solve this directly: just multiply
both sides by A inverse.
So you would get x = A⁻¹b. Of course, one thing you would have to be careful about is that not
all matrices have an inverse, but in most cases they do, so in that case you can directly
get the solution in the form x = A⁻¹b, as if you were solving just one equation. So as we said
earlier, linear algebra gives us this freedom to manipulate several equations at once, in
multiple variables.
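To make this concrete, here is a minimal sketch in Python (NumPy is an assumed library choice, not something prescribed by the lecture) that solves the 2 x 2 system above in matrix form:

```python
import numpy as np

# Coefficient matrix A and right-hand side b for the system
#   4*x1 - 5*x2 = -13
#  -2*x1 + 3*x2 =   9
A = np.array([[4.0, -5.0],
              [-2.0, 3.0]])
b = np.array([-13.0, 9.0])

# np.linalg.solve computes x = A^{-1} b without explicitly forming the
# inverse, which is numerically more stable than inverting A.
x = np.linalg.solve(A, b)
print(x)  # [3. 5.], i.e. x1 = 3, x2 = 5
```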
A fundamental definition in linear algebra is that of a vector space. A set of vectors V is said to be
a vector space if it is closed under the operations of vector addition and scalar multiplication, and
in addition satisfies a few axioms. By closure we mean that if we take two
elements x and y from this set, then x + y also lies in the set V; in addition, if we
take a scalar α, a real number that is, and multiply a vector y from this set by it, then αy
also belongs to V. If both these properties are satisfied, then the set of vectors is said to be closed
with respect to vector addition and scalar multiplication. Now let us have a look at the axioms.
The first one is the commutative law, which states that if you pick any two elements x and y
from the set V, then x + y = y + x. The associative law says that if you pick any three elements
x, y and z from this set, then (x + y) + z = x + (y + z).
The additive identity law says that there exists an additive identity, the 0 vector so to say, in the set,
such that if you pick any element from the set and add this 0 to it you get back the same element.
The additive inverse law says that for every
element x there exists a corresponding element -x such that x + (-x) = 0.
The fifth law is the distributive law. This law says that if you have a real scalar α with which you
multiply the sum of two vectors x + y, then that should be equal to αx + αy. The second
distributive law says that if you have the sum of two scalars multiplying a vector x, that is
(α + β)x, then that should be equal to α times x plus β times x. The associative law for scalars says
that first multiplying two scalars together and then multiplying the vector with the product should
be equal to multiplying the vector first with the second scalar β and then with α, that is,
(αβ)x = α(βx). The unitary law says
that on multiplication by the scalar real number 1 you get back the same vector. This is
important because you would not want multiplication to cause any unexpected scaling: if you
multiply a vector by a scalar k, the result should be exactly k times it, not, say, √k times
it.
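For reference, the axioms discussed above can be summarized compactly in standard notation (this is just a restatement of the lecture's content, written out in LaTeX):

```latex
% Summary of the vector space axioms, for vectors x, y, z in V
% and real scalars alpha, beta (restating the lecture).
\begin{align*}
&\text{Closure:}              && x + y \in V, \quad \alpha x \in V\\
&\text{Commutative law:}      && x + y = y + x\\
&\text{Associative law:}      && (x + y) + z = x + (y + z)\\
&\text{Additive identity:}    && \exists\, 0 \in V:\; x + 0 = x\\
&\text{Additive inverse:}     && \forall x\; \exists\, (-x):\; x + (-x) = 0\\
&\text{Distributive law 1:}   && \alpha(x + y) = \alpha x + \alpha y\\
&\text{Distributive law 2:}   && (\alpha + \beta)x = \alpha x + \beta x\\
&\text{Scalar associativity:} && (\alpha\beta)x = \alpha(\beta x)\\
&\text{Unitary law:}          && 1 \cdot x = x
\end{align*}
```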
A related notion is that of a subspace: a subset W of a vector space is itself a vector space,
called a subspace, if firstly the zero vector belongs to W, and secondly, if I pick any two
elements x and y from this set and any real number α, then x + αy should also belong to W.
Next let us look at norms, which give us a notion of the length of a vector. Firstly, the norm of
any vector is non-negative, and secondly it is zero only for the zero vector. Thirdly, for every
vector, if you multiply it by a scalar, its norm should get multiplied by the modulus of the
scalar; by modulus here we mean the absolute value. And fourthly, if we take any pair of
vectors in our vector space, which is Rⁿ, the norm of the sum of these two vectors should be
less than or equal to the sum of their norms; this is also known as the triangle inequality.
This is naturally related to the fact that the third side of a triangle should always be less than
the sum of the other two sides.
Now an example of a norm is the Lp norm, where you sum up the absolute values along each
dimension raised to the power p and then take the 1/p-th power, that is the p-th root, of this sum.
So when p = 2 you get the L2 norm, which is the magnitude of a vector as we have learnt in our
earlier studies, √(x² + y²) if you are
looking at just the space R². There are other norms; for instance, there are norms defined for
matrices as well, whereas the norms we have defined so far were for vectors. The Frobenius norm
is a matrix norm: what it does is essentially sum up the squares of all the elements and then
take the square root of that. This also happens to be equal to the square root of the trace of AᵀA;
the trace of a matrix is simply the sum of its diagonal elements.
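As an illustration, here is a short Python sketch (NumPy assumed, with hypothetical example values) computing the Lp norm of a vector and the Frobenius norm of a matrix, including the trace identity mentioned above:

```python
import numpy as np

x = np.array([3.0, 4.0])

# Lp norm: sum the absolute values raised to the power p,
# then take the p-th root of the result.
def lp_norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(lp_norm(x, 2))         # 5.0, the L2 norm: sqrt(3^2 + 4^2)
print(np.linalg.norm(x, 2))  # same value via NumPy's built-in norm

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Frobenius norm: square root of the sum of squares of all entries,
# which equals sqrt(trace(A^T A)).
print(np.sqrt(np.trace(A.T @ A)))  # ~5.477
print(np.linalg.norm(A, 'fro'))    # same value
```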
(Refer Slide Time: 11:12)
The span of a set of vectors is the set of all vectors which can be composed from these vectors
using the operations of vector addition and scalar multiplication. The name span comes
from the fact that this set of vectors spans a potentially larger set of vectors, which is then
called its span. To define this more formally, the span of the vectors x1, …, xn is the set of all
vectors v such that v = α1 x1 + α2 x2 + … + αn xn, where each αi is a real number.
Now a related definition is that of the range or column space. If we think of a matrix, each of its
columns is a vector, so the set of all columns of a matrix is a set of vectors; the span of
this set of vectors is called the range or column space of that matrix. So if you consider the matrix
given here, the columns of this matrix A are (1, 5, 2) and (0, 4, 4). What would be the column space
of this matrix? It would be the span of the vectors (1, 5, 2) and (0, 4, 4), which is essentially the
plane spanned by these two vectors.
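Here is a hedged Python sketch (NumPy assumed; the test vector b is a hypothetical example, not from the lecture) that checks whether a given vector lies in the column space of this matrix:

```python
import numpy as np

# Matrix A from the example: columns (1, 5, 2) and (0, 4, 4).
A = np.array([[1.0, 0.0],
              [5.0, 4.0],
              [2.0, 4.0]])

# Hypothetical test vector: 1*(1,5,2) + 1*(0,4,4) = (1,9,6),
# so it should lie in the column space of A.
b = np.array([1.0, 9.0, 6.0])

# Solve the least-squares problem A @ alpha ~= b; if A @ alpha
# reproduces b exactly, b is in the span of the columns.
alpha, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
print(alpha)                      # [1. 1.], the combination coefficients
print(np.allclose(A @ alpha, b))  # True: b is in the column space
```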
(Refer Slide Time: 12:56)
If we have a matrix A of dimensions m x n, then the null space is the set of n x 1 vectors which
give the m x 1 zero vector on multiplication by A; in other words, it is the set of vectors x such
that Ax = 0. Nullity is the dimensionality of the null space; we will revisit the definition of nullity
later, once we have defined rank more clearly. Another interesting fact about null spaces is that
the null space of A, N(A), consists of vectors of dimension n, while the range of A, or the column
space as we defined it earlier, consists of vectors of dimension m, that is, m x 1.
This means that vectors in R(Aᵀ), and note that Aᵀ is n x m, so vectors in the range of Aᵀ will be of
dimension n, are of the same dimension as the vectors in the null space of A; so the vectors in the
range of Aᵀ and in the null space of A are both of dimension n x 1.
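The null space can be computed numerically; a common approach, sketched below with NumPy under the assumption that we may use the singular value decomposition (which the lecture has not introduced yet), is to take the right singular vectors whose singular values are zero:

```python
import numpy as np

# A 2 x 3 rank-1 matrix (row 2 is twice row 1), so nullity = 3 - 1 = 2.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

# Rows of Vt whose singular values are (near) zero span the null space.
U, s, Vt = np.linalg.svd(A)
tol = 1e-10
sv = np.concatenate([s, np.zeros(Vt.shape[0] - len(s))])  # pad to n values
null_basis = Vt[sv <= tol].T   # columns form a basis of the null space

print(null_basis.shape)                # (3, 2): two basis vectors in R^3
print(np.allclose(A @ null_basis, 0))  # True: A x = 0 for each of them
```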
Let us now look at some properties of rank. Firstly, the rank of a matrix is at most the minimum
of its row dimension and column dimension. Secondly, the rank of a matrix is the same as the rank
of its transpose. Thirdly, if you multiply two matrices A and B, the rank of the resultant matrix is
at most the minimum of the ranks of A and B; and if you add two matrices, the rank of the
resultant matrix is at most the sum of the ranks of A and B.
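These rank facts are easy to check numerically; here is a small Python sketch (NumPy assumed, with hypothetical example matrices):

```python
import numpy as np

rank = np.linalg.matrix_rank

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])       # 3 x 2, rank 2
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])  # 2 x 3, rank 1
C = np.array([[0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])       # 3 x 2, rank 1 (same shape as A)

print(rank(A) == rank(A.T))                  # True: rank(A) = rank(A^T)
print(rank(A @ B) <= min(rank(A), rank(B)))  # True: product rule
print(rank(A + C) <= rank(A) + rank(C))      # True: sum rule
```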
Next we come to orthogonal matrices. A square matrix U is orthogonal if its columns are
orthogonal to each other and of unit norm; in other words, all the column vectors should be
normalized and mutually orthogonal. An interesting implication of a
matrix being orthogonal is that UᵀU and UUᵀ both end up being equal to the n x n identity
matrix I. This also means that Uᵀ = U⁻¹, that is, the inverse of such an orthogonal matrix
equals its transpose. An additional interesting property is seen when we multiply an n x 1
vector x by an n x n orthogonal matrix U: the Euclidean or L2 norm of such a vector x remains
the same on multiplication by U.
Intuitively, we can understand this as orthogonal matrices U performing only a pure rotation on
multiplying the vector x; in other words, they only change the direction of a vector but do not
change its magnitude.
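A quick Python sketch (NumPy assumed) of this norm-preserving property, using a 2D rotation matrix as the orthogonal matrix:

```python
import numpy as np

# A 2D rotation by 45 degrees is an orthogonal matrix.
theta = np.pi / 4
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(U.T @ U, np.eye(2)))  # True: U^T U = I
print(np.allclose(U @ U.T, np.eye(2)))  # True: U U^T = I

# Multiplying by U changes the direction of x but not its L2 norm.
x = np.array([3.0, 4.0])
print(np.linalg.norm(x), np.linalg.norm(U @ x))  # both 5.0
```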
A matrix A is said to be positive definite if its quadratic form xᵀAx is greater than zero for any
non-zero vector x; similarly we can define it to be negative definite. A matrix A is positive
semi-definite if the quadratic form is greater than or equal to 0 for any vector x; note that
equality to 0 is also allowed here. An important property of positive and negative definite
matrices is that they are always full rank.
An implication of this is that A⁻¹ always exists. For a matrix A which is of dimension m x n, one
can define a special matrix called the Gram matrix, given by G = AᵀA. A nice
property of the Gram matrix is that it is always positive semi-definite. Moreover, if the number of
rows is at least the number of columns, in other words if m ≥ n, then, if A is full rank, G
is positive definite.
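To illustrate, here is a Python sketch (NumPy assumed; the random matrix is a hypothetical example) that forms the Gram matrix and checks its eigenvalues:

```python
import numpy as np

# A hypothetical 5 x 3 matrix: m = 5 rows >= n = 3 columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

G = A.T @ A  # the n x n Gram matrix

# G is symmetric, so eigvalsh (for symmetric matrices) applies; all of
# its eigenvalues should be >= 0, i.e. G is positive semi-definite.
eigvals = np.linalg.eigvalsh(G)
print(np.all(eigvals >= -1e-10))  # True

# With m >= n and A of full column rank (true here with probability 1
# for a random matrix), the eigenvalues are strictly positive, so G is
# in fact positive definite.
print(np.all(eigvals > 0))        # True
```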
Funded by
Department of Higher Education
Ministry of Human Resource Development
Government of India
www.nptel.ac.in
Copyrights Reserved