Welcome to MIT 18.06: Linear Algebra! The Spring 2025 course information, materials, and links are recorded below. Course materials for previous semesters are archived in the other branches of this repository. You can dive in right away by reading this introduction to the course by Professor Strang.
Catalog Description: Basic subject on matrix theory and linear algebra, emphasizing topics useful in other disciplines, including systems of equations, vector spaces, determinants, eigenvalues, singular value decomposition, and positive definite matrices. Applications to least-squares approximations, stability of differential equations, networks, Fourier transforms, and Markov processes. Uses linear algebra software. Compared with 18.700, more emphasis on matrix algorithms and many applications.
Instructor: Prof. Nike Sun
Textbook: Introduction to Linear Algebra: 6th Edition.
Detailed lecture notes are posted on Canvas (accessible only to registered students).
- Vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$, and generalization to vectors in $\mathbb{R}^N$ ($N$-dimensional space).
- Vector operations: addition and scalar multiplication. Both operations together: linear combinations.
- The span of a set of vectors $v_1, \dots, v_k$ is the set of all linear combinations of these vectors: we covered some examples in class.
- Definition of matrix times vector: $Ax$, where $A$ is an $m \times n$ matrix, and $x$ is in $\mathbb{R}^n$. (A short NumPy illustration follows the reading below.)
Reading: Strang Chapter 1.
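
Not part of the course materials, but for readers who want to experiment: a minimal NumPy sketch (with a made-up matrix) showing that $Ax$ is a linear combination of the columns of $A$.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])   # a made-up 3x2 matrix
x = np.array([2., -1.])    # a vector in R^2

# A @ x is the linear combination of the columns of A with coefficients from x.
print(A @ x)
print(x[0] * A[:, 0] + x[1] * A[:, 1])   # same result
```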
- Dot product, vector length, cosine formula.
- The Gaussian elimination algorithm for solving a system of linear equations $Ax = b$: reduce the matrix to row echelon form (REF), and do back-substitution. Worked through an example in class. (A small Python sketch of the algorithm follows the reading below.)
- Definition of matrix times matrix: $AB$, where $A$ is $m \times n$, $B$ is $n \times p$, $AB$ is $m \times p$. We explained how to view the Gaussian elimination operations as matrix multiplication operations: the steps of Gaussian elimination correspond to changing $Ax = b$ to $E_1Ax = E_1b$, $E_2E_1Ax = E_2E_1b$, etc.
Reading: Strang 2.1-2.2.
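
A minimal Python/NumPy sketch of elimination to REF followed by back-substitution. This is an illustrative implementation, not the course's official code; the example system is made up, and partial pivoting is added for numerical safety.

```python
import numpy as np

def solve_by_elimination(A, b):
    """Reduce [A | b] to row echelon form, then back-substitute (square invertible A)."""
    A, b = A.astype(float).copy(), b.astype(float).copy()
    n = len(b)
    for k in range(n):                            # forward elimination
        p = k + np.argmax(np.abs(A[k:, k]))       # pivot row (partial pivoting)
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]                 # multiplier for the elimination step
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):                # back-substitution
        x[k] = (b[k] - A[k, k + 1:] @ x[k + 1:]) / A[k, k]
    return x

A = np.array([[2., 1., 1.], [4., -6., 0.], [-2., 7., 2.]])
b = np.array([5., -2., 9.])
print(solve_by_elimination(A, b), np.linalg.solve(A, b))   # both give the same solution
```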
- Reviewed the Gaussian elimination example using matrix multiplications to encode the operations.
- Gauss-Jordan elimination has a few additional steps which bring the system to reduced row echelon form (RREF) — we did this in the same example, again using matrix multiplications.
- In the example, the final RREF system was $Rx = c$ with $R = I$, the identity matrix. In this example it allowed us to read off the solution $x = c$.
- We reviewed basic rules of matrix multiplication: associative $(AB)C = A(BC)$, distributive $A(B + C) = AB + AC$, but not commutative: $AB$ and $BA$ are generally not equal!
- Inversion: if $A$ is an $n \times n$ matrix, it is invertible if there exists a matrix $A^{-1}$, also $n \times n$, such that $AA^{-1} = A^{-1}A = I$, the identity matrix.
- If $A$ is invertible, then Gauss-Jordan elimination converts $[A \mid b]$ to $[I \mid A^{-1}b]$. Moreover it converts $[A \mid I]$ to $[I \mid A^{-1}]$. (A small numerical illustration follows the reading below.)
- Square matrices need not be invertible, and we covered some examples.
Reading: Strang 2.3.
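
A quick NumPy check (made-up matrix, not course code) of the fact that Gauss-Jordan applied to $[A \mid I]$ produces $A^{-1}$; numerically, this is the same as solving $AX = I$ column by column.

```python
import numpy as np

A = np.array([[2., 1.],
              [5., 3.]])                 # a made-up invertible matrix

A_inv = np.linalg.solve(A, np.eye(2))    # solve A X = I, i.e. the Gauss-Jordan result [I | A^{-1}]

print(np.allclose(A @ A_inv, np.eye(2)))      # A times the result is the identity
print(np.allclose(A_inv, np.linalg.inv(A)))   # matches NumPy's built-in inverse
```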
- The columns of $A$ are linearly dependent if a non-trivial linear combination of the columns is zero: can write this as $Ax = 0$ for $x$ nonzero. If $A$ is a square matrix with linearly dependent columns, then $A$ cannot be invertible. We covered some examples.
- We defined the column space $C(A)$ of a matrix. An $m \times n$ matrix $A$ can be viewed as a function/map from $\mathbb{R}^n$ to $\mathbb{R}^m$, sending input $x$ in $\mathbb{R}^n$ to output $Ax$ in $\mathbb{R}^m$. The column space is exactly the image of this map. The equation $Ax = b$ is solvable if and only if $b$ lies in $C(A)$.
- We defined general vector spaces and subspaces, and covered some examples.
Reading: Strang 3.1.
- Defined the null space $N(A)$ as the set of vectors $x$ such that $Ax = 0$.
- Note that if $A$ is $m \times n$, then $C(A)$ is a subspace of $\mathbb{R}^m$, while $N(A)$ is a subspace of $\mathbb{R}^n$.
- Invertible row operations (such as in Gauss-Jordan elimination) do not affect the null space, so if $R$ is the RREF of $A$, then $N(A) = N(R)$.
- We covered several examples of calculating $N(A)$. We also noted that in all our examples, $\dim C(A) + \dim N(A) = n$. (A numerical check of this identity follows the reading below.)
Reading: Strang 3.2.
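
A small numerical check of the observation $\dim C(A) + \dim N(A) = n$, using a made-up matrix (NumPy and SciPy are assumed available; they are not required by the course).

```python
import numpy as np
from scipy.linalg import null_space

# A made-up 3x4 matrix with dependent columns.
A = np.array([[1., 2., 3., 2.],
              [2., 4., 6., 4.],
              [0., 1., 1., 0.]])
m, n = A.shape

dim_C = np.linalg.matrix_rank(A)    # dimension of the column space C(A)
dim_N = null_space(A).shape[1]      # dimension of the null space N(A)

print(dim_C, dim_N, dim_C + dim_N == n)   # the two dimensions add up to n
```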
- In this class we covered the general solution for a system of linear equations $Ax = b$.
- The basic principle: if $b$ is not in $C(A)$, then there is no solution. If $b$ is in $C(A)$, then there must exist at least one "particular solution," call it $x_p$. Then the set of all solutions to $Ax = b$ is the set of vectors $x_p + x_n$, where $x_p$ is the particular solution, and $x_n$ is any vector from the null space $N(A)$.
- General recipe for solving (a NumPy illustration follows the reading below):
  - Given $Ax = b$, apply Gauss-Jordan elimination to transform it to the RREF system $Rx = c$.
  - If the RREF system contains a row that says 0 = nonzero, then we have a contradiction, and in this case $b$ is not in $C(A)$ and there is no solution.
  - Otherwise, set the free variables to zero to find a particular solution $x_p$.
  - Separately, solve for the null space $N(A) = N(R)$.
  - Then the set of all solutions to $Ax = b$ is the set of vectors $x_p + x_n$, where $x_p$ is the particular solution, and $x_n$ is any vector from $N(A)$.
Reading: Strang 3.3.
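
An illustration in NumPy/SciPy (the system below is made up but consistent): one particular solution plus anything from the null space again solves $Ax = b$.

```python
import numpy as np
from scipy.linalg import null_space

# A made-up consistent system with free variables.
A = np.array([[1., 2., 2., 2.],
              [2., 4., 6., 8.],
              [3., 6., 8., 10.]])
b = np.array([1., 5., 6.])

# A particular solution (lstsq returns an exact solution here since b lies in C(A)).
x_p, *_ = np.linalg.lstsq(A, b, rcond=None)

# Add any combination of null-space basis vectors: still a solution.
N = null_space(A)                       # columns form a basis of N(A)
x = x_p + N @ np.array([2., -1.])

print(np.allclose(A @ x_p, b), np.allclose(A @ x, b))   # True True
```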
- Throughout this class, we let $v_1, \dots, v_n$ be a list of $n$ vectors, each in the space $\mathbb{R}^m$. Let $A$ be the $m \times n$ matrix with columns $v_1, \dots, v_n$.
- The vectors $v_1, \dots, v_n$ are linearly dependent if a non-trivial linear combination of them equals zero: this corresponds to $N(A)$ being strictly larger than $\{0\}$. Otherwise, we say they are linearly independent: this corresponds to $N(A) = \{0\}$.
- A basis for a vector space $V$ is a list of vectors that span $V$, and are linearly independent. We covered the standard basis for the space $\mathbb{R}^m$.
- Let $V = \operatorname{span}(v_1, \dots, v_n)$. Then $V$ is the same as $C(A)$. If $v_1, \dots, v_n$ are linearly independent, then they form a basis for $V$.
- More generally, perform Gauss-Jordan elimination, and let $R$ be the RREF of $A$. Then $N(A) = N(R)$.
- The pivot columns of $R$ form a basis for $C(R)$, and the corresponding columns of $A$ form a basis for $C(A)$.
- Note that $\operatorname{rank}(A)$ = # pivots in $R$ = $\dim C(R)$ = $\dim C(A)$. Meanwhile # free variables in $R$ = $\dim N(A)$.
- There are $n$ columns total, and each column must be pivot or free, so $n$ = # pivot + # free = $\dim C(A) + \dim N(A)$: this is the rank-nullity theorem.
- Lastly, we reviewed that if $A$ is an $m \times n$ matrix, then we view it as a map from $\mathbb{R}^n$ to $\mathbb{R}^m$, sending $x$ in $\mathbb{R}^n$ to $Ax$ in $\mathbb{R}^m$.
Reading: Strang 3.4.
Note: You should be able to do all of exam 1 using only the information covered in Lectures 1–6, together with the homework, recitations, and reading assigned before exam 1. However, since all the topics of the class are closely connected, material from Lecture 7 might also be helpful to know for the exam, and you are free to use that material on the exam as well. All exams in this class are closed book, closed notes.
- We started the lecture with the definition of the matrix transpose $A^T$.
- Note in general $(AB)^T = B^TA^T$, and $(A^{-1})^T = (A^T)^{-1}$.
- If $A = A^T$, then we say that $A$ is symmetric. Only square matrices can be symmetric.
- We covered the four fundamental subspaces of a matrix, and how to calculate them. Throughout, let $A$ be an $m \times n$ matrix, and let $R = EA$ be the RREF. Thus $E$ is an invertible $m \times m$ matrix that encodes the Gauss-Jordan row operations.
  - Column space $C(A)$. This is a subspace of $\mathbb{R}^m$.
  - Null space $N(A)$. This is a subspace of $\mathbb{R}^n$.
  - Row space $C(A^T)$. This is a subspace of $\mathbb{R}^n$.
  - Left null space $N(A^T)$. This is a subspace of $\mathbb{R}^m$.
Formal reasoning for the above claims:
- Column space: $R = EA$ and $A = E^{-1}R$. Thus $Ax = E^{-1}(Rx)$ for every $x$. This proves $C(A) = E^{-1}C(R) = \{E^{-1}v : v \in C(R)\}$.
- Null space: $R = EA$ and $A = E^{-1}R$. Since $E$ is invertible, $Ax = 0$ if and only if $Rx = 0$. This proves $N(A) = N(R)$.
- Row space: recall $C(A^T) = \{A^Ty : y \in \mathbb{R}^m\}$. $A = E^{-1}R$ and $A^T = R^T(E^{-1})^T$. Since $E$ is invertible, $(E^{-1})^T$ is also invertible. As $y$ ranges over all of $\mathbb{R}^m$, $(E^{-1})^Ty$ also ranges over all of $\mathbb{R}^m$. Therefore $C(A^T) = C(R^T)$.
- Left null space: (There was a typo on the blackboard, so please read this one carefully.) $A^T = R^T(E^T)^{-1}$ and $A^Ty = R^T(E^T)^{-1}y$. Therefore $A^Ty = 0$ if and only if $(E^T)^{-1}y$ lies in $N(R^T)$. This proves $N(A^T) = E^TN(R^T) = \{E^Tz : z \in N(R^T)\}$.
In class, we calculated the four fundamental subspaces on a small example. We verified that the column space and left null space are orthogonal subspaces of $\mathbb{R}^m$. (A numerical version of this check follows the reading below.)
Reading: Strang 3.5.
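
A NumPy/SciPy check of the orthogonality relations among the four subspaces, on a made-up rank-2 matrix (not the example from class).

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 1., 1.]])    # made-up 3x3 matrix of rank 2

N  = null_space(A)       # basis for N(A),   a subspace of R^3
LN = null_space(A.T)     # basis for N(A^T), the left null space

print(np.allclose(A @ N, 0))      # every row of A is orthogonal to N(A): row space is perpendicular to null space
print(np.allclose(A.T @ LN, 0))   # every column of A is orthogonal to N(A^T): column space is perpendicular to left null space
```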
- In this class we reviewed the four fundamental subspaces of an $m \times n$ matrix $A$.
- We went over an example of how to calculate the four subspaces of $A$, using the RREF $R = EA$.
- Dimensions: both column space and row space have dimension $r = \operatorname{rank}(A)$. The null space has dimension $n - r$, while the left null space has dimension $m - r$.
- We covered the fact that in $\mathbb{R}^n$, the row space and null space are orthogonal complements of one another. In $\mathbb{R}^m$, the column space and left null space are orthogonal complements of one another.
Reading: Strang 4.1.
- We covered what it means for two subspaces of $\mathbb{R}^n$ to be:
  - complementary
  - orthogonal
  - orthogonal complements.
- In particular:
  - If $V$ and $W$ are complementary subspaces of $\mathbb{R}^n$, then any $x \in \mathbb{R}^n$ can be uniquely written as $x = v + w$ with $v$ from $V$, $w$ from $W$.
  - If $V$ and $W$ are in addition orthogonal complements, then $v$ is the orthogonal projection of $x$ onto $V$, while $w$ is the orthogonal projection of $x$ onto $W$. Denoted $v = \operatorname{proj}_V(x)$ and $w = \operatorname{proj}_W(x)$.
- We discussed the geometric interpretation of orthogonal projection:
  - $\operatorname{proj}_V(x)$ is the unique vector in $V$ that lies closest to $x$.
  - equivalently, $\operatorname{proj}_V(x)$ is the unique vector $v$ in $V$ such that $x - v$ is perpendicular to $V$.
- We used the latter characterization to calculate $\operatorname{proj}_V(x)$ where $V$ is the span of a single nonzero vector $a$ in $\mathbb{R}^n$.
Reading: Strang 4.2.
- We covered the general formulas for orthogonal projection.
- Projection onto a one-dimensional subspace $V = \operatorname{span}(u)$, where $u$ is a unit vector in $\mathbb{R}^n$: $\operatorname{proj}_V(b) = Pb$ where $P = uu^T$. Note that $P$ is an $n \times n$ symmetric matrix. Its column space is exactly the one-dimensional space $V$, therefore $P$ has rank one.
- Projection onto a general subspace $V$ of $\mathbb{R}^n$, where $\dim V = k$: first express $V = C(A)$ where $A$ is an $n \times k$ matrix whose columns form a basis of $V$. We showed in class that $\operatorname{proj}_V(b) = Pb$ where $P = A(A^TA)^{-1}A^T$. This is an $n \times n$ symmetric matrix. Its column space is exactly $V$, therefore $P$ has rank $k$. (A NumPy check of this formula follows the reading below.)
- **Claim**: If $A$ is $n \times k$ with rank $k$, then $A^TA$ is invertible. We stated this fact in class, and used it to define $P = A(A^TA)^{-1}A^T$. We did not yet give a justification of this fact, and will do so in a future lecture.
- Note that $Pb = A\hat{x}$ where $\hat{x} = (A^TA)^{-1}A^Tb$. This achieves the minimum distance $\min_x \|b - Ax\|$, and we call $\hat{x}$ the least squares solution.
- Lastly we went over some examples of the projection matrix formula:
  - In the one-dimensional case $V = \operatorname{span}(u)$ where $u$ is a unit vector, we take $A = u$ and recover the formula $P = uu^T$.
  - If we have an orthonormal basis $u_1, \dots, u_k$ for $V$, then $P = u_1u_1^T + \dots + u_ku_k^T$, where $u_iu_i^T$ is the orthogonal projection onto $\operatorname{span}(u_i)$.
Reading: Strang 4.3.
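
A NumPy check of the projection formula $P = A(A^TA)^{-1}A^T$ on a made-up matrix with independent columns (not course code).

```python
import numpy as np

# Made-up 4x2 matrix with independent columns; C(A) is a 2-dimensional subspace of R^4.
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 0., 4.])

P = A @ np.linalg.inv(A.T @ A) @ A.T     # projection matrix onto C(A)
p = P @ b                                # orthogonal projection of b

print(np.allclose(P, P.T), np.allclose(P @ P, P))   # P is symmetric and satisfies P^2 = P
print(np.allclose(A.T @ (b - p), 0))                # the error b - p is perpendicular to C(A)
```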
- As we learned previously, the equation $Ax = b$ does not have a solution if $b$ does not lie in the column space $C(A)$. In this case, one can instead ask for the least squares (LS) solution: the choice of $x$ that minimizes $\|Ax - b\|$.
- This means $A\hat{x}$ should be precisely the projection of $b$ onto $C(A)$, so from what we previously learned, we see that $A^TA\hat{x} = A^Tb$, and consequently $\hat{x} = (A^TA)^{-1}A^Tb$.
- Application: given a data set $(x_i, y_i)$ for $i = 1, \dots, m$, we covered how to find (a NumPy sketch follows the reading below):
  - The straight line with no intercept that achieves the least squares fit: $y = Dx$ where $D$ is the slope;
  - The straight line with intercept that achieves the least squares fit: $y = C + Dx$ where $C$ is the intercept and $D$ is the slope;
  - The cubic function that achieves the least squares fit: $y = C + Dx + Ex^2 + Fx^3$.
Reading: Strang 4.3.
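
A least squares line fit in NumPy on made-up data points (the names C, D follow the bullet above; this is an illustration, not the in-class example).

```python
import numpy as np

# Made-up data points (x_i, y_i).
x = np.array([0., 1., 2., 3.])
y = np.array([1., 2., 0., 4.])

# Fit the line y = C + D*x: the columns of A are [1, x].
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
C, D = coef

# Same answer directly from the normal equations A^T A [C, D]^T = A^T y.
print(np.allclose(coef, np.linalg.solve(A.T @ A, A.T @ y)))
print(C, D)
```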
- We learned the Gram-Schmidt procedure: given a basis $a_1, \dots, a_k$ for a subspace $V$ of $\mathbb{R}^n$, it produces an orthonormal basis $q_1, \dots, q_k$ of $V$.
- The Gram-Schmidt procedure can be summarized by the QR factorization: $A = QR$, where (a NumPy check follows the reading below):
  - $A$ is the $n \times k$ matrix with columns $a_1, \dots, a_k$;
  - $Q$ is the $n \times k$ matrix with columns $q_1, \dots, q_k$;
  - $R$ is the $k \times k$ matrix of the coefficients relating the $a$'s to the $q$'s. In particular, $R$ is upper triangular with non-zero diagonal entries, and can be inverted by back-substitution.
Reading: Strang 4.4.
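
A quick look at the QR factorization in NumPy (made-up matrix; `np.linalg.qr` uses Householder reflections rather than classical Gram-Schmidt, but it produces the same kind of factorization).

```python
import numpy as np

# Made-up 4x3 matrix with independent columns.
A = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 1.]])

Q, R = np.linalg.qr(A)                   # "reduced" QR: Q is 4x3, R is 3x3

print(np.allclose(Q.T @ Q, np.eye(3)))   # the columns of Q are orthonormal
print(np.allclose(Q @ R, A))             # A = QR
print(np.allclose(R, np.triu(R)))        # R is upper triangular
```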
- Let $A$ be an $m \times n$ matrix of rank $m$, with $m \le n$. This means that the column space $C(A) = \mathbb{R}^m$: therefore, for any $b$, the equation $Ax = b$ has at least one solution $x$. We also know that the null space $N(A)$ is a subspace of $\mathbb{R}^n$ of dimension $n - m$. It follows that $Ax = b$ in fact has infinitely many solutions, since $x + x_n$ is also a solution for any $x_n$ from $N(A)$. We can therefore ask, what is the minimum norm solution $x^+$? Any solution $x$ can be decomposed as $x = x_r + x_n$ where $x_n \in N(A)$ while $x_r \in C(A^T)$ (the row space of $A$). We discussed in class that the minimum norm solution to $Ax = b$ is exactly $x^+ = x_r = A^T(AA^T)^{-1}b$. If we have a QR factorization $A^T = QR$, then this can be rearranged to give $x^+ = Q(R^T)^{-1}b$.
- If $A$ is an $m \times n$ matrix, then its $n \times m$ matrix pseudoinverse $A^+$ is the matrix which does the following:
  - Given $b \in \mathbb{R}^m$, let $p$ be the orthogonal projection of $b$ onto the column space $C(A)$.
  - Let $x^+$ be the minimum norm solution to the equation $Ax = p$. Then $A^+$ is the matrix which acts as $A^+b = x^+$.
- Two examples of calculating the pseudoinverse (a NumPy check follows the reading below):
  - If $A$ is $m \times n$ with rank $m$, then the above calculation tells us that if we have the QR factorization $A^T = QR$, then $A^+ = A^T(AA^T)^{-1} = Q(R^T)^{-1}$.
  - (Corrected; this previously had a typo!) If $A$ is $m \times n$ with rank $n$, then the pseudoinverse should first map $b$ to its orthogonal projection onto $C(A)$, that is, $p = A(A^TA)^{-1}A^Tb$, which lies in $C(A)$. As a result $Ax = p$ has a unique solution, given by $x = (A^TA)^{-1}A^Tb$. It follows that $A^+ = (A^TA)^{-1}A^T$.
Reading: Strang 4.5.
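
A NumPy/SciPy check on a made-up full-row-rank matrix: the pseudoinverse gives the minimum norm solution, matching the formula $A^+ = A^T(AA^T)^{-1}$.

```python
import numpy as np
from scipy.linalg import null_space

# Made-up 2x4 matrix with rank 2 (full row rank).
A = np.array([[1., 0., 1., 2.],
              [0., 1., 1., 1.]])
b = np.array([3., 4.])

x_plus = np.linalg.pinv(A) @ b                      # minimum norm solution via the pseudoinverse
x_formula = A.T @ np.linalg.solve(A @ A.T, b)       # the formula A^T (A A^T)^{-1} b

print(np.allclose(x_plus, x_formula))               # same vector
print(np.allclose(A @ x_plus, b))                   # it solves Ax = b
print(np.allclose(null_space(A).T @ x_plus, 0))     # and it lies in the row space (orthogonal to N(A))
```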
- If $A$ is an $n \times n$ square matrix, its determinant $\det(A)$ is the signed factor by which the linear transformation $x \mapsto Ax$ scales $n$-dimensional volumes.
- Some key facts:
  - Product formula: $\det(AB) = \det(A)\det(B)$.
  - We have $\det(A) \ne 0$ if and only if $A$ is invertible.
  - The determinant of an upper triangular matrix is the product of the diagonal entries.
- We covered several cases of $2 \times 2$ matrices $A$: the unit square maps to a parallelogram $P$, and $\det(A)$ is (up to sign) the 2-dimensional volume (area) of $P$.
- Two ways to calculate $\det(A)$ up to sign (a NumPy check follows the reading below):
  - Use a (generalized) QR factorization: $A = QR$ where $Q$ is an orthogonal matrix, and $R$ is upper triangular (possibly with zero entries on the diagonal). Then $\det(A) = \det(Q)\det(R)$ with $\det(Q) = \pm 1$, so $|\det(A)| = |\det(R)|$.
  - Use Gaussian elimination: $EA = U$ where $U$ is in row echelon form (REF), and $E$ is a product of row swap or elimination matrices. Then $\det(E)\det(A) = \det(U)$ with $\det(E) = \pm 1$, so $|\det(A)| = |\det(U)|$.
Reading: Strang 5.1.
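
A NumPy check (made-up matrix) that the QR route recovers the determinant up to sign: $|\det(A)|$ equals the absolute value of the product of the diagonal entries of $R$.

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])       # a made-up 3x3 matrix

Q, R = np.linalg.qr(A)
print(abs(np.prod(np.diag(R))))    # |det R| = product of the |diagonal entries of R|
print(abs(np.linalg.det(A)))       # same value
```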
- We covered the "big formula" for the determinant of an
matrix, . The sum goes over all permutations of , and denotes the sign of the permutation : it is if is a composition of an even number of swaps, and if is a composition of an odd number of swaps. We explained that this formula can be derived from the multilinearity property of the determinant. - In most cases, the more efficient way to compute
will be by gaussian elimination: . is in REF, so it is upper triangular and its determinant is simply the product of its diagonal entries. Each encodes an elementary row operation: if encodes a row swap, it follows from the big formula that . Otherwise, if encodes an elimination operation, then is a lower triangular matrix with all 's along the diagonal, and in this case . It follows that , where is the number of row swaps in the gaussian elimination.
Reading: Strang 5.2.
- We covered the Laplace expansion of the determinant, which can be viewed as a way to organize the "big formula" from last time.
- We considered one example of a circulant matrix; see https://en.wikipedia.org/wiki/Circulant_matrix. Following the Wikipedia notation, our example had just two of the entries $c_i$ nonzero, and all other $c_i = 0$. We covered how to evaluate the determinant of this matrix using the "big formula", and also using Laplace expansion along the top row.
Reading: Strang 5.3.
- In this lecture we did some review and examples in preparation for the Wednesday exam.
- We worked through one computational example reviewing earlier material.
- Let $I$ be the identity matrix, and $v$ a vector in $\mathbb{R}^n$. We calculated the pseudoinverse of a matrix built from $I$ and $v$.
- In this lecture we reviewed the Laplace expansion (also called cofactor expansion) of the determinant.
- Given an $n \times n$ matrix $A$, the minor $M_{ij}$ is the $(n-1) \times (n-1)$ matrix obtained by removing the $i$-th row and $j$-th column of $A$.
- We defined the $(i,j)$ cofactor as $C_{ij} = (-1)^{i+j}\det(M_{ij})$. The cofactor matrix $C$ is the $n \times n$ matrix with entries $C_{ij}$.
- The adjugate matrix is $\operatorname{adj}(A) = C^T$. We derived in class that if $A$ is invertible, then $A^{-1} = C^T/\det(A)$.
- We also used this to derive Cramer's rule for solving a linear system $Ax = b$. (A small numerical check of Cramer's rule follows the reading below.)
Reading: finish Strang Chapter 5.
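
A small check of Cramer's rule in NumPy on a made-up $2 \times 2$ system: each unknown is a ratio of two determinants.

```python
import numpy as np

A = np.array([[2., 1.],
              [5., 3.]])
b = np.array([4., 7.])

# Cramer's rule: x_i = det(A with column i replaced by b) / det(A).
x = np.empty(2)
for i in range(2):
    Ai = A.copy()
    Ai[:, i] = b
    x[i] = np.linalg.det(Ai) / np.linalg.det(A)

print(x, np.allclose(A @ x, b))   # the resulting x solves Ax = b
```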
- At the start of this class, we discussed that diagonal matrices act in a simple way.
- A (square) matrix $A$ is diagonalizable if it can be related to a diagonal matrix via the equation $A = X\Lambda X^{-1}$, where $X$ is invertible, and $\Lambda$ is diagonal.
- Caution: not all square matrices are diagonalizable! Nevertheless, it is an important and useful concept.
- Let $X$ have columns $x_1, \dots, x_n$, and let $\Lambda$ have diagonal entries $\lambda_1, \dots, \lambda_n$. We showed that $x_i$ is an eigenvector of $A$ with eigenvalue $\lambda_i$.
- Since $X$ is invertible (by definition), its columns form a basis of $\mathbb{R}^n$, which is called the eigenbasis of $A$. The action of $A$ in the eigenbasis is diagonal.
- We explained that $X$ and $X^{-1}$ may be viewed as implementing change of basis: $X^{-1}$ maps from standard coordinates to eigenbasis coordinates, while $X$ maps from eigenbasis coordinates to standard basis coordinates.
- We also covered some concrete examples. In future lectures we will learn how to compute matrix eigenvalues and eigenvectors. (A NumPy illustration of diagonalization follows the reading below.)
Reading: start Strang Chapter 6.
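
A NumPy illustration of the factorization $A = X\Lambda X^{-1}$ for a made-up diagonalizable matrix.

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])              # made-up matrix with eigenvalues 5 and 2

lam, X = np.linalg.eig(A)             # eigenvalues, and eigenvectors as the columns of X
Lam = np.diag(lam)

print(np.allclose(A, X @ Lam @ np.linalg.inv(X)))    # A = X Lambda X^{-1}
print(np.allclose(A @ X[:, 0], lam[0] * X[:, 0]))    # each column of X is an eigenvector
```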
- Let $A$ be a square $n \times n$ matrix. An eigenvector of $A$ is a non-zero vector $v$ such that $Av = \lambda v$ for a scalar $\lambda$ (the eigenvalue). We will allow $\lambda$ to be a real or complex number, so in general $\lambda \in \mathbb{C}$.
- An eigenvector with eigenvalue $\lambda = 0$ is just a null vector.
- In general, for any $\lambda$, an eigenvector with eigenvalue $\lambda$ is any non-zero vector in the null space $N(A - \lambda I)$.
- It follows that the eigenvalues of $A$ are exactly the roots of $p(\lambda) = \det(A - \lambda I)$, the characteristic polynomial of $A$.
- Let $\operatorname{tr}(A)$ denote the trace of $A$ (sum of its diagonal entries). It follows from the determinant formula that $p(\lambda)$ is a polynomial in $\lambda$ of degree $n$, of the form $p(\lambda) = (-\lambda)^n + \operatorname{tr}(A)(-\lambda)^{n-1} + \cdots + \det(A)$.
- The fundamental theorem of algebra tells us that $p(\lambda)$ has $n$ roots $\lambda_1, \dots, \lambda_n$, and can be factorized as $p(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda)$.
- The eigenvalues are exactly the roots $\lambda_1, \dots, \lambda_n$. They may be complex-valued, and it is possible to have multiple roots.
- The algebraic multiplicity of an eigenvalue $\lambda$ is the number of times it appears as a root in the characteristic polynomial.
- The geometric multiplicity of an eigenvalue $\lambda$ is the dimension of its eigenspace, $N(A - \lambda I)$.
- In general, geometric multiplicity $\le$ algebraic multiplicity. We will discuss this further in the next lecture.
Reading: Strang 6.1-6.2.
- Let $A$ be a square $n \times n$ matrix. We discussed last time that the characteristic polynomial $p(\lambda) = \det(A - \lambda I)$ is a polynomial in $\lambda$ of degree $n$. The fundamental theorem of algebra then tells us that it has $n$ roots $\lambda_1, \dots, \lambda_n$, and these are precisely the eigenvalues of $A$. The roots may be complex-valued, and it is possible to have multiple roots.
- The algebraic multiplicity of an eigenvalue $\lambda$ is the number of times it appears as a root of the characteristic polynomial.
- The geometric multiplicity of an eigenvalue $\lambda$ is the dimension of its eigenspace, $N(A - \lambda I)$.
- In general, geometric multiplicity $\le$ algebraic multiplicity.
- The algebraic multiplicities sum up to the total number $n$ of roots. Eigenspaces for distinct eigenvalues are linearly independent, so the geometric multiplicities sum up to the combined dimension of all eigenspaces. The matrix $A$ is diagonalizable if and only if the latter sum equals $n$, which means we must have geometric multiplicity = algebraic multiplicity for all the eigenvalues. This is not guaranteed in general.
- A special case is that all $n$ roots are distinct. In this case we must have geometric multiplicity = algebraic multiplicity = 1 for all eigenvalues, so the matrix in this case is always diagonalizable. If the roots are not all distinct, then $A$ may or may not be diagonalizable.
- If $A$ is not diagonalizable, then it has a Jordan canonical form (JCF), which can be viewed as a generalization of the diagonalization. (In the special case that $A$ is diagonalizable, the JCF is the same as the diagonalization.)
Reading: finish reading Strang 6.1-6.2.
- We discussed that eigenvalues and eigenvectors can be complex-valued, even for real matrices, and covered an example.
- In general, if $A$ is a real matrix, then its characteristic polynomial has real coefficients. This implies that the non-real eigenvalues of $A$ must occur in conjugate pairs. (For example, this also implies that if $A$ is $n \times n$ with $n$ odd, then $A$ must have at least one real eigenvalue.)
- We covered some basic concepts of complex numbers, including conjugate and modulus. We also covered complex vectors, the conjugate transpose operation, and the norm of a complex vector.
- Lastly, we covered the spectral theorem, which says that if $A$ is real symmetric, then it has all real eigenvalues and eigenvectors, and an orthonormal eigenbasis. We can write this as $A = Q\Lambda Q^T$ where both $Q$ and $\Lambda$ are real, and $Q$ is an orthogonal matrix. In class we also gave a partial proof of the spectral theorem. (A NumPy illustration follows the reading below.)
Reading: Strang 6.3, as well as the first three pages of Strang 6.4.
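
A NumPy illustration of the spectral theorem on a made-up real symmetric matrix (`np.linalg.eigh` is designed for symmetric/Hermitian input).

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])              # made-up real symmetric matrix

lam, Q = np.linalg.eigh(A)                # real eigenvalues, orthonormal eigenvectors

print(np.allclose(Q.T @ Q, np.eye(3)))            # Q is an orthogonal matrix
print(np.allclose(A, Q @ np.diag(lam) @ Q.T))     # A = Q Lambda Q^T
```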
- A symmetric matrix is positive-definite (PD) if all its eigenvalues are strictly positive.
- A symmetric matrix is positive semi-definite (PSD) if all its eigenvalues are nonnegative.
- If $A$ is symmetric, it is PD if and only if $x^TAx > 0$ for every vector $x \ne 0$.
- If $A$ is symmetric, it is PSD if and only if $x^TAx \ge 0$ for every vector $x$.
- For any matrix $A$ ($m \times n$), both $A^TA$ and $AA^T$ are PSD.
- In this lecture we introduced the singular value decomposition (SVD), which applies to any matrix $A$ ($m \times n$). More precisely we covered both the long SVD and the short SVD.
- In the special case that $A$ is a square matrix of full rank, the long and short SVD are identical, and we covered in class the procedure to find this SVD.
Reading: Strang 7.1.
- We covered how to find the short and long SVD of a general matrix $A$ ($m \times n$). (A NumPy sketch follows the reading below.)
- The matrix $A^TA$ is PSD, so we can find its spectral decomposition $A^TA = V\Lambda V^T$. Moreover we can arrange that the diagonal entries of $\Lambda$ are $\lambda_1 \ge \dots \ge \lambda_r > 0 = \lambda_{r+1} = \dots = \lambda_n$, where $r$ is the rank of $A^TA$ (and also the rank of $A$).
- Let $V_r$ be the $n \times r$ matrix formed from the first $r$ columns of $V$.
- Let $\Sigma_r$ be the $r \times r$ diagonal matrix with diagonal entries $\sigma_i = \sqrt{\lambda_i}$ for $1 \le i \le r$.
- Let $U_r$ be the $m \times r$ matrix defined by $U_r = AV_r\Sigma_r^{-1}$.
- Then $A = U_r\Sigma_rV_r^T$ gives the short SVD of $A$.
- To convert from short to long SVD: expand $\Sigma_r$ from $r \times r$ to $m \times n$ by adding zeroes; expand $U_r$ from $m \times r$ to $m \times m$ so that its columns form an orthonormal basis of $\mathbb{R}^m$; expand $V_r$ from $n \times r$ to $n \times n$ so that its columns form an orthonormal basis of $\mathbb{R}^n$.
- We also covered some examples and practice problems.
Reading: finish reading Strang 7.1.
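
A NumPy sketch of the long and short SVD on a made-up rank-2 matrix.

```python
import numpy as np

# Made-up 4x3 matrix of rank 2.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.],
              [0., 1., 1.]])
r = np.linalg.matrix_rank(A)

# Long SVD: U is 4x4, V is 3x3, and Sigma is 4x3 with the singular values on its diagonal.
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)
print(np.allclose(A, U @ Sigma @ Vt))

# Short SVD: keep only the first r columns of U and V and the r positive singular values.
Ur, sr, Vtr = U[:, :r], s[:r], Vt[:r, :]
print(np.allclose(A, Ur @ np.diag(sr) @ Vtr))
```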
- In this lecture we covered geometric interpretations of the SVD.
- Throughout, suppose $A$ is $m \times n$ with rank $r$, and that we rank its singular values in nonincreasing order $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$.
- The maximum singular value $\sigma_1$ is the operator norm or spectral norm of $A$, usually denoted $\|A\|$ or $\|A\|_2$.
- The operator norm of $A$ can be understood as the maximum value of $\|Ax\|$ attained as $x$ ranges over all unit vectors in $\mathbb{R}^n$.
- The SVD can be used to calculate the pseudoinverse: if the short SVD of $A$ is given by $A = U\Sigma V^T$, then the pseudoinverse of $A$ is $A^+ = V\Sigma^{-1}U^T$.
Reading: Strang 7.2.
- In this lecture we covered the application of SVD to low-rank approximation and image compression.
- Suppose $A$ is $m \times n$ with short SVD $A = U\Sigma V^T$. As always, we rank the singular values (diagonal entries of $\Sigma$) $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$.
- The rank-$k$ approximation of $A$ is given by $A_k = U_k\Sigma_kV_k^T$, where $U_k$ is formed from the first $k$ columns of $U$, $V_k$ is formed from the first $k$ columns of $V$, and $\Sigma_k$ is the upper left $k \times k$ submatrix of $\Sigma$.
- We discussed three matrix norms: (1) operator norm / spectral norm; (2) Frobenius norm / Hilbert--Schmidt norm; (3) nuclear norm.
- Eckart--Young theorem: among all matrices of rank at most $k$, the best approximation to $A$ is given by $A_k$. It is best with respect to all three of the matrix norms listed above.
- Application to image compression: if $A$ represents an image, the original image consists of $mn$ pixels. Storing the compressed image $A_k$ requires storing $k(m + n + 1)$ values. If $m, n$ are large and $k$ is relatively small, the compressed image requires much less storage.
- See https://timbaumann.info/svd-image-compression-demo/ for examples. (A NumPy sketch of the rank-$k$ approximation follows the reading below.)
Reading: Strang 7.2.
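
A NumPy sketch of the rank-$k$ approximation (a random made-up matrix stands in for an image).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80))      # made-up 100x80 "image"

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # rank-k approximation A_k

# By Eckart--Young, the spectral-norm error equals the first dropped singular value.
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))
# Storage comparison: k(m + n + 1) numbers versus m*n pixels.
print(k * (100 + 80 + 1), 100 * 80)
```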
- We covered the application of SVD to PCA (principal components analysis).
- Let $A$ be an $m \times n$ data matrix where $m$ is the number of individuals or samples, and $n$ is the number of attributes or features.
- We assume the data is normalized, so that each column (feature) has mean zero and standard deviation one.
- 2D PCA: choose the 2D projection of the data that shows the most variability. (A NumPy sketch follows the reading below.)
- We learned in class that this is achieved by taking $v_1, v_2$ to be the two top right singular vectors (corresponding to the first two columns of the matrix $V$), resulting in the 2D scatterplot of the values $(a_i \cdot v_1, a_i \cdot v_2)$ for $i = 1, \dots, m$, where $a_i$ denotes the $i$-th row of $A$.
- Lastly, we showed that 1D PCA and ordinary least squares (OLS, see Lecture 12) are not the same. This is the reason the textbook refers to 1D PCA as "perpendicular least squares."
Reading: Strang 7.3.
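
A minimal PCA sketch in NumPy on made-up data (random numbers stand in for a real data set).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))            # made-up data: 200 samples, 5 features

# Normalize: each column gets mean zero and standard deviation one.
A = (A - A.mean(axis=0)) / A.std(axis=0)

# 2D PCA via the SVD: project onto the top two right singular vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
scores = A @ Vt[:2].T                        # 200x2 coordinates for the 2D scatterplot

print(scores.shape)                          # (200, 2)
print(s[:2]**2 / (s**2).sum())               # fraction of total variance captured by each direction
```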
- In this lecture we finished our discussion of SVD and PCA by going over the application covered in the paper https://doi.org/10.1038/nature07331.
- We reviewed the basics of complex numbers: real part, imaginary part, complex conjugate, modulus, polar form, Euler's formula.
- Example: the permutation matrix $P$ corresponding to the cyclic permutation that maps $1 \to 2 \to \cdots \to n \to 1$. The eigenvalues are the $n$-th roots of unity.
- Example: solving the differential equation $du/dt = Au$.
- We defined the complex dot product (also called scalar product or inner product): for $x, y \in \mathbb{C}^n$, we define $x \cdot y = \bar{x}^Ty = \sum_{i=1}^n \bar{x}_iy_i$.
Reading: start Strang 6.4.
- In this class we covered definitions of special classes of complex $n \times n$ matrices.
- **Unitary matrices**: extension of the definition of orthogonal matrices. Orthogonal matrices preserve the $\mathbb{R}^n$ scalar product, and unitary matrices preserve the $\mathbb{C}^n$ scalar product.
- **Hermitian matrices**: extension of the definition of symmetric matrices. Symmetric matrices are self-adjoint on $\mathbb{R}^n$, and hermitian matrices are self-adjoint on $\mathbb{C}^n$.
- A matrix $A$ is normal if $AA^* = A^*A$, where $A^*$ denotes the conjugate transpose. Both sets of unitary matrices and hermitian matrices sit inside the set of normal matrices.
- We covered the spectral theorem for three cases: symmetric matrices (covered previously), hermitian matrices, and normal matrices.
- Example: same permutation matrix as discussed last time. This matrix is normal, but not symmetric. It has an orthogonal eigenbasis, which corresponds to a special basis of $\mathbb{C}^n$ called the Fourier basis. (A NumPy illustration follows the reading below.)
Reading: continue Strang 6.4.
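
A NumPy illustration (made-up size $n = 6$) that the Fourier basis vectors form an orthonormal eigenbasis of the cyclic permutation matrix.

```python
import numpy as np

n = 6
P = np.roll(np.eye(n), 1, axis=0)     # cyclic permutation matrix: sends e_1 -> e_2 -> ... -> e_n -> e_1

# Fourier basis: column k has entries w^(j*k)/sqrt(n), where w = exp(2*pi*i/n).
w = np.exp(2j * np.pi / n)
F = np.array([[w**(j * k) for k in range(n)] for j in range(n)]) / np.sqrt(n)

print(np.allclose(F.conj().T @ F, np.eye(n)))        # the Fourier basis is orthonormal
k = 2
print(np.allclose(P @ F[:, k], w**(-k) * F[:, k]))   # each Fourier vector is an eigenvector of P,
                                                     # with eigenvalue an n-th root of unity
```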