Lec 7
Hi everyone, welcome to the second tutorial of the Introduction to Machine Learning
course. In this tutorial we shall be taking a tour of the aspects of linear algebra which you
would need for the course; we will cover a variety of concepts such as subspaces, basis, span,
matrix decompositions, eigenvalues and eigenvectors over the course of the tutorial.
So the first question one would ask is, why do we need linear algebra at all, and what is linear
algebra? You may have come across this in school or at your +1 or +2 level, but just to recap,
linear algebra is the branch of mathematics which deals with vectors, vector spaces and linear
mappings between these spaces. So why do we study linear algebra here, especially in the
context of machine learning? Firstly, it gives us a way to represent, manipulate and operate on
sets of linear equations. But why do these linear equations pop up in machine learning in the
first place? The reason is that in machine learning we represent our data as an n
x p matrix, where n is the number of data points and p is the number of features, so it is
natural that we use the notions and formalisms developed in linear algebra.
Not just the data, even the parameters we use are represented as vectors, so as a result
linear algebra has an important role to play in machine learning.
So here you can see a system of linear equations, two equations in two variables: 4x1 – 5x2 =
-13 and -2x1 + 3x2 = 9. We can right away see the advantage of matrix notation here: as
shown below, you can represent the same system of two equations directly as one equation of
the form Ax = b, where A is the matrix of coefficients and x is the 2 x 1 matrix, two rows and one
column, or you may also call it a 2-dimensional vector (x1, x2).
So we can see that when you multiply the matrix A with the 2 x 1 vector (x1, x2), you get
back the two left-hand sides, and the matrix b represents the right-hand sides. It is
very easy to verify that if you represent the system in matrices you get back the
original representation, and from that representation you can recover the system. To solve this
without matrices, you would first solve for one variable and then
substitute to get the other, but using matrices you can solve this directly: just multiply
both sides by A inverse.
So you would get x = A⁻¹b. Of course, one thing you would have to be careful about is that not
all matrices have an inverse, but in most cases they do, so in that case you can directly
get the solution in the form x = A⁻¹b, as if you were solving just one equation. So as we said
earlier, linear algebra gives us this freedom to manipulate several equations at once, in
multiple variables.
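To make this concrete, here is a minimal sketch in Python (NumPy is an assumed library choice, not something prescribed by the lecture) that solves the 2 x 2 system above in matrix form:

```python
import numpy as np

# Coefficient matrix A and right-hand side b for the system
#   4*x1 - 5*x2 = -13
#  -2*x1 + 3*x2 =   9
A = np.array([[4.0, -5.0],
              [-2.0, 3.0]])
b = np.array([-13.0, 9.0])

# np.linalg.solve computes x = A^{-1} b without explicitly forming the
# inverse, which is numerically more stable than inverting A.
x = np.linalg.solve(A, b)
print(x)  # [3. 5.], i.e. x1 = 3, x2 = 5
```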
A fundamental definition in linear algebra is that of a vector space. A set of vectors V is said to be
a vector space if it is closed under the operations of vector addition and scalar multiplication, and
in addition satisfies a few axioms. By closure we mean that if we take two
elements x and y from this set, then x + y also lies in the set V; in addition, if we
take a scalar α, a real number that is, and multiply a vector y from this set by it, then αy
also belongs to V. If both these properties are satisfied, then the set of vectors is said to be closed
with respect to vector addition and scalar multiplication. Now let us have a look at the axioms.
The first one is the commutative law, which states that if you pick any two elements x and y
from the set V, then x + y = y + x. The associative law says that if you pick any three elements
x, y and z from this set, then (x + y) + z = x + (y + z).
The additive identity law says that there exists an additive identity, the 0 vector so to say, in the set,
such that if you pick any element from the set and add this 0 to it you get back the same element.
The additive inverse law says that for every
element x there exists a corresponding element -x such that x + (-x) = 0.
The fifth law is the distributive law. This law says that if you have a real scalar α with which you
multiply the sum of two vectors x + y, then that should be equal to αx + αy. The second
distributive law says that if you have the sum of two scalars multiplying a vector x, that is
(α + β)x, then that should be equal to α times x plus β times x. The associative law for scalars says
that first multiplying two scalars together and then multiplying the vector with the product should
be equal to multiplying the vector first with the second scalar β and then with α, that is,
(αβ)x = α(βx). The unitary law says
that on multiplication by the scalar real number 1 you get back the same vector. This is
important because you would not want multiplication to cause any unexpected scaling: if you
multiply a vector by a scalar k, the result should be exactly k times it, not, say, √k times
it.
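For reference, the axioms discussed above can be summarized compactly in standard notation (this is just a restatement of the lecture's content, written out in LaTeX):

```latex
% Summary of the vector space axioms, for vectors x, y, z in V
% and real scalars alpha, beta (restating the lecture).
\begin{align*}
&\text{Closure:}              && x + y \in V, \quad \alpha x \in V\\
&\text{Commutative law:}      && x + y = y + x\\
&\text{Associative law:}      && (x + y) + z = x + (y + z)\\
&\text{Additive identity:}    && \exists\, 0 \in V:\; x + 0 = x\\
&\text{Additive inverse:}     && \forall x\; \exists\, (-x):\; x + (-x) = 0\\
&\text{Distributive law 1:}   && \alpha(x + y) = \alpha x + \alpha y\\
&\text{Distributive law 2:}   && (\alpha + \beta)x = \alpha x + \beta x\\
&\text{Scalar associativity:} && (\alpha\beta)x = \alpha(\beta x)\\
&\text{Unitary law:}          && 1 \cdot x = x
\end{align*}
```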
A related notion is that of a subspace: a subset W of a vector space is itself a vector space,
called a subspace, if firstly the zero vector belongs to W, and secondly, if I pick any two
elements x and y from this set and any real number α, then x + αy should also belong to W.
Next let us look at norms, which give us a notion of the length of a vector. Firstly, the norm of
any vector is non-negative, and secondly it is zero only for the zero vector. Thirdly, for every
vector, if you multiply it by a scalar, its norm should get multiplied by the modulus of the
scalar; by modulus here we mean the absolute value. And fourthly, if we take any pair of
vectors in our vector space, which is Rⁿ, the norm of the sum of these two vectors should be
less than or equal to the sum of their norms; this is also known as the triangle inequality.
This is naturally related to the fact that the third side of a triangle should always be less than
the sum of the other two sides.
Now an example of a norm is the Lp norm, where you sum up the absolute values along each
dimension raised to the power p and then take the 1/p-th power, that is the p-th root, of this sum.
So when p = 2 you get the L2 norm, which is the magnitude of a vector as we have learnt in our
earlier studies, √(x² + y²) if you are
looking at just the space R². There are other norms; for instance, there are norms defined for
matrices as well, whereas the norms we have defined so far were for vectors. The Frobenius norm
is a matrix norm: what it does is essentially sum up the squares of all the elements and then
take the square root of that. This also happens to be equal to the square root of the trace of AᵀA;
the trace of a matrix is simply the sum of its diagonal elements.
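As an illustration, here is a short Python sketch (NumPy assumed, with hypothetical example values) computing the Lp norm of a vector and the Frobenius norm of a matrix, including the trace identity mentioned above:

```python
import numpy as np

x = np.array([3.0, 4.0])

# Lp norm: sum the absolute values raised to the power p,
# then take the p-th root of the result.
def lp_norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(lp_norm(x, 2))         # 5.0, the L2 norm: sqrt(3^2 + 4^2)
print(np.linalg.norm(x, 2))  # same value via NumPy's built-in norm

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Frobenius norm: square root of the sum of squares of all entries,
# which equals sqrt(trace(A^T A)).
print(np.sqrt(np.trace(A.T @ A)))  # ~5.477
print(np.linalg.norm(A, 'fro'))    # same value
```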
(Refer Slide Time: 11:12)
The span of a set of vectors is the set of all vectors which can be composed from these vectors
using the operations of vector addition and scalar multiplication. The name span comes
from the fact that this set of vectors spans a potentially larger set of vectors, which is then
called its span. To define this more formally, the span of the vectors x1, …, xn is the set of all
vectors v such that v = α1 x1 + α2 x2 + … + αn xn, where each αi is a real number.
Now a related definition is that of the range or column space. If we think of a matrix, each of its
columns is a vector, so the set of all columns of a matrix is a set of vectors; the span of
this set of vectors is called the range or column space of that matrix. So if you consider the matrix
given here, the columns of this matrix A are (1, 5, 2) and (0, 4, 4). What would be the column space
of this matrix? It would be the span of the vectors (1, 5, 2) and (0, 4, 4), which is essentially the
plane spanned by these two vectors.
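Here is a hedged Python sketch (NumPy assumed; the test vector b is a hypothetical example, not from the lecture) that checks whether a given vector lies in the column space of this matrix:

```python
import numpy as np

# Matrix A from the example: columns (1, 5, 2) and (0, 4, 4).
A = np.array([[1.0, 0.0],
              [5.0, 4.0],
              [2.0, 4.0]])

# Hypothetical test vector: 1*(1,5,2) + 1*(0,4,4) = (1,9,6),
# so it should lie in the column space of A.
b = np.array([1.0, 9.0, 6.0])

# Solve the least-squares problem A @ alpha ~= b; if A @ alpha
# reproduces b exactly, b is in the span of the columns.
alpha, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
print(alpha)                      # [1. 1.], the combination coefficients
print(np.allclose(A @ alpha, b))  # True: b is in the column space
```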
(Refer Slide Time: 12:56)
If we have a matrix A of dimensions m x n, then the null space is the set of n x 1 vectors which
give the m x 1 zero vector on multiplication by A; in other words, it is the set of vectors x such
that Ax = 0. Nullity is the dimensionality of the null space; we will revisit the definition of nullity
later, once we have defined rank more clearly. Another interesting fact about null spaces is that
the null space of A, N(A), consists of vectors of dimension n, while the range of A, or the column
space as we defined it earlier, consists of vectors of dimension m, that is, m x 1.
This means that vectors in R(Aᵀ), and note that Aᵀ is n x m, so vectors in the range of Aᵀ will be of
dimension n, are of the same dimension as the vectors in the null space of A; so the vectors in the
range of Aᵀ and in the null space of A are both of dimension n x 1.
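The null space can be computed numerically; a common approach, sketched below with NumPy under the assumption that we may use the singular value decomposition (which the lecture has not introduced yet), is to take the right singular vectors whose singular values are zero:

```python
import numpy as np

# A 2 x 3 rank-1 matrix (row 2 is twice row 1), so nullity = 3 - 1 = 2.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

# Rows of Vt whose singular values are (near) zero span the null space.
U, s, Vt = np.linalg.svd(A)
tol = 1e-10
sv = np.concatenate([s, np.zeros(Vt.shape[0] - len(s))])  # pad to n values
null_basis = Vt[sv <= tol].T   # columns form a basis of the null space

print(null_basis.shape)                # (3, 2): two basis vectors in R^3
print(np.allclose(A @ null_basis, 0))  # True: A x = 0 for each of them
```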
Let us now look at some properties of rank. Firstly, the rank of a matrix is at most the minimum
of its row dimension and column dimension. Secondly, the rank of a matrix is the same as the rank
of its transpose. Thirdly, if you multiply two matrices A and B, the rank of the resultant matrix is
at most the minimum of the ranks of A and B; and if you add two matrices, the rank of the
resultant matrix is at most the sum of the ranks of A and B.
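These rank facts are easy to check numerically; here is a small Python sketch (NumPy assumed, with hypothetical example matrices):

```python
import numpy as np

rank = np.linalg.matrix_rank

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])       # 3 x 2, rank 2
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])  # 2 x 3, rank 1
C = np.array([[0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])       # 3 x 2, rank 1 (same shape as A)

print(rank(A) == rank(A.T))                  # True: rank(A) = rank(A^T)
print(rank(A @ B) <= min(rank(A), rank(B)))  # True: product rule
print(rank(A + C) <= rank(A) + rank(C))      # True: sum rule
```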
Next we come to orthogonal matrices. A square matrix U is orthogonal if its columns are
orthogonal to each other and of unit norm; in other words, all the column vectors should be
normalized and mutually orthogonal. An interesting implication of a
matrix being orthogonal is that UᵀU and UUᵀ both end up being equal to the n x n identity
matrix I. This also means that Uᵀ = U⁻¹, that is, the inverse of such an orthogonal matrix
equals its transpose. An additional interesting property is seen when we multiply an n x 1
vector x by an n x n orthogonal matrix U: the Euclidean or L2 norm of such a vector x remains
the same on multiplication by U.
Intuitively, we can understand this as orthogonal matrices U performing only a pure rotation on
multiplying the vector x; in other words, they only change the direction of a vector but do not
change its magnitude.
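A quick Python sketch (NumPy assumed) of this norm-preserving property, using a 2D rotation matrix as the orthogonal matrix:

```python
import numpy as np

# A 2D rotation by 45 degrees is an orthogonal matrix.
theta = np.pi / 4
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(U.T @ U, np.eye(2)))  # True: U^T U = I
print(np.allclose(U @ U.T, np.eye(2)))  # True: U U^T = I

# Multiplying by U changes the direction of x but not its L2 norm.
x = np.array([3.0, 4.0])
print(np.linalg.norm(x), np.linalg.norm(U @ x))  # both 5.0
```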
A matrix A is said to be positive definite if its quadratic form xᵀAx is greater than zero for any
non-zero vector x; similarly we can define it to be negative definite. A matrix A is positive
semi-definite if the quadratic form is greater than or equal to 0 for any vector x; note that
equality to 0 is also allowed here. An important property of positive and negative definite
matrices is that they are always full rank.
An implication of this is that A⁻¹ always exists. For a matrix A which is of dimension m x n, one
can define a special matrix called the Gram matrix, given by G = AᵀA. A nice
property of the Gram matrix is that it is always positive semi-definite. Moreover, if the number of
rows is at least the number of columns, in other words if m ≥ n, then, if A is full rank, G
is positive definite.
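To illustrate, here is a Python sketch (NumPy assumed; the random matrix is a hypothetical example) that forms the Gram matrix and checks its eigenvalues:

```python
import numpy as np

# A hypothetical 5 x 3 matrix: m = 5 rows >= n = 3 columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

G = A.T @ A  # the n x n Gram matrix

# G is symmetric, so eigvalsh (for symmetric matrices) applies; all of
# its eigenvalues should be >= 0, i.e. G is positive semi-definite.
eigvals = np.linalg.eigvalsh(G)
print(np.all(eigvals >= -1e-10))  # True

# With m >= n and A of full column rank (true here with probability 1
# for a random matrix), the eigenvalues are strictly positive, so G is
# in fact positive definite.
print(np.all(eigvals > 0))        # True
```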
Funded by
Department of Higher Education
Ministry of Human Resource Development
Government of India
www.nptel.ac.in
Copyrights Reserved