Honours Linear Algebra II MATH227

Topic 1 - Determinants

Lecture 1

Carroll (Dodgson) determinant formula: \(\det{A} \times \det(\text{interior of }A) = \det(\text{top-left-removed})\times\det(\text{bottom-right-removed}) - \det(\text{top-right-removed})\times\det(\text{bottom-left-removed})\), where the interior of \(A\) is \(A\) with its first and last rows and columns removed, and "corner-removed" means deleting that corner's row and column

\(M_{n\times n}(\mathbb{F})\) is a ring and group under addition

\(\text{GL}_n(\mathbb{F})\) is the group of invertible matrices, and is a group under matrix multiplication

Here, the field \(\mathbb{F}\) gives two groups: \(\mathbb{F}^{+}\), i.e. \(\mathbb{F}\) under addition, and \(\mathbb{F}^{\times} = \mathbb{F}\setminus\set{0}\) under multiplication

Theorem
\(\det : \text{GL}_n(\mathbb{F}) \to \mathbb{F}^{\times}\) is a group homomorphism

Cofactor expansion is a strategy for computing the determinant
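
Aside: a minimal Python sketch (my own, not from lecture) of cofactor expansion along the first row, just to make the recursion concrete; fine for small matrices, but it is \(O(n!)\).

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j, then recurse
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                       # -2
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))      # 24
```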

Lecture 2

The determinant \(\det\) is the unique function \(M_{n\times n}(\mathbb{F}) \to \mathbb{F}\) that is

  1. Linear in each row, i.e. for any row, \(\det{\begin{bmatrix} c\vec{a}+\vec{b} \\ \vec{r_1} \\ \vdots \end{bmatrix}} = c\cdot \det{\begin{bmatrix} \vec{a} \\ \vec{r_1} \\ \vdots \end{bmatrix}} + \det{\begin{bmatrix} \vec{b} \\ \vec{r_1} \\ \vdots \end{bmatrix}}\)
  2. Equals \(0\) when two rows are equal
  3. Maps \(I_n \mapsto 1\)

Each of these properties can be shown/proven inductively using cofactor expansion. Uniqueness is harder to prove

Lecture 3

For \(A \in M_{n\times n}(R)\), \(\det{A} \in R\), where \(R\) is a (not necessarily commutative) ring. This follows from the ring axioms and the cofactor expansion definition

Adjugate formula: For invertible (i.e. \(\det{A} \ne 0\)) \(A \in M_{n\times n}(\mathbb{F})\), \(A^{-1} = \dfrac{1}{\det{A}}\text{Adj}(A)\), where \(\text{Adj}(A)_{ij}\) is the \((j, i)\)th cofactor \(C_{ji}\), i.e. \(\text{Adj}(A) = C^\intercal\) for the cofactor matrix \(C\)

Lecture 4

Proof: \((A \cdot \text{Adj}(A))_{ij} = \displaystyle\sum\limits_{k=1}^{n} A_{ik} C_{jk}\), which is the cofactor expansion along the \(j\)th row of the matrix you get by replacing the \(j\)th row of \(A\) with its \(i\)th row. If \(i = j\) this is just \(\det{A}\); if \(i \ne j\) that matrix has two equal rows, so it is \(0\)

So, \(A \cdot \text{Adj}(A) = \det{A}\cdot I_n\)
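
Aside: a quick numerical check (my own sketch, not from lecture) that builds \(\text{Adj}(A)\) entrywise from cofactors and confirms \(A\cdot\text{Adj}(A)=\det{A}\cdot I_n\), using numpy.

```python
import numpy as np

def adjugate(A):
    """Adj(A)[i, j] = C_{ji}, the (j, i) cofactor of A."""
    n = A.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)  # remove row j, column i
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
print(np.allclose(A @ adjugate(A), np.linalg.det(A) * np.eye(3)))  # True
```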

For commutative ring \(R\), \(A \in M_{n\times n}(R)\) has a multiplicative inverse in \(M_{n\times n}(R)\) \(\iff\) \(\det{A}\) has inverse in \(R\) → then adjugate formula holds

Proof: (→) If \(A^{-1} \in M_{n\times n}(R)\), then \(1 = \det{AA^{-1}} = \det{A}\det{A^{-1}}\), so \(\det{A}\) is invertible. (←) If \(\det{A}\) is invertible, proof follows from adjugate formula

Lecture 5

We can check if a matrix \(A\) over a commutative ring \(R\) is invertible by checking if \(\det{A}\) is invertible in \(R\). E.g. \(\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\) is not invertible over \(\mathbb{Z}_{12}\), since \(\det{\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}} = -2\) and \(\gcd(2, 12) = 2 \ne 1\), so \(-2\) is not invertible in \(\mathbb{Z}_{12}\); but it is invertible over \(\mathbb{Z}_9\), since \(\gcd(2, 9) = 1\) means \(-2 \equiv 7\) is invertible in \(\mathbb{Z}_9\).
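
Aside: checking the \(\mathbb{Z}_9\) case computationally (my own sketch, not from lecture); the adjugate formula gives the inverse once we have the inverse of \(\det{A}\) mod \(9\) (Python's pow(d, -1, m) needs Python 3.8+).

```python
A = [[1, 2], [3, 4]]
m = 9
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]           # -2
det_inv = pow(det % m, -1, m)                          # inverse of 7 mod 9, which is 4
adj = [[A[1][1], -A[0][1]], [-A[1][0], A[0][0]]]       # adjugate of a 2x2 matrix
A_inv = [[(det_inv * entry) % m for entry in row] for row in adj]
print(A_inv)                                           # [[7, 1], [6, 4]]
```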

Cramer's Rule: For \(A\vec{x}=\vec{b}\) (where \(A\) is invertible), \(x_i = \dfrac{\det{\tilde{A}}}{\det{A}}\), where \(\tilde{A}\) is the matrix formed by replacing the \(i\)th column of \(A\) with \(\vec{b}\)
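
Aside: a small sketch of Cramer's rule in Python (my own, not from lecture), compared against numpy's solver.

```python
import numpy as np

def cramer_solve(A, b):
    """x_i = det(A with column i replaced by b) / det(A)."""
    det_A = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        A_i = A.copy()
        A_i[:, i] = b                 # replace the i-th column with b
        x[i] = np.linalg.det(A_i) / det_A
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(cramer_solve(A, b))             # [0.8 1.4]
print(np.linalg.solve(A, b))          # same
```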

Lecture 6

Permutation: invertible function \(\sigma : \set{1, 2, \dots n} \to \set{1, 2, \dots n}\), i.e. a bijection from the set to itself

Symmetric Group (denoted \(S_n\) or \(\text{Sym}(n))\): set of all permutations (functions) on set of size \(n\)

Symmetric groups are not abelian (i.e. not commutative) for \(n \geq 3\)

The inverse of a permutation (written in two-line notation) can be found by swapping the two rows (and re-sorting the columns), since the inverse just "undoes" the permutation by "moving everything back to where it was"

\(S_n\) trivially has \(n!\) elements

Permutation formula for determinant: \(\det{A} = \displaystyle\sum\limits_{\sigma\in S_n}(\pm 1)\cdot A_{1,\sigma(1)} A_{2,\sigma(2)} \cdots A_{n,\sigma(n)}\), where \(A_{i,j}\) is the \((i, j)\)th entry of \(A \in M_{n\times n}(\mathbb{F})\) and the sign \(\pm 1\) depends only on \(\sigma\) (it is made precise by the Leibniz formula below)

Permutation matrix: For standard basis vectors \(\vec{e_1} \dots \vec{e_n}\), \(P(\sigma) = \begin{bmatrix}\vec{e_{\sigma(1)}} & \vec{e_{\sigma(2)}} & \dots & \vec{e_{\sigma(n)}} \end{bmatrix}\)

Leibniz formula for determinant: \(\det{A} = \displaystyle\sum\limits_{\sigma\in S_n}\det{P(\sigma)} \cdot A_{1,\sigma(1)} A_{2,\sigma(2)} \cdots A_{n,\sigma(n)}\)
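
Aside: a direct implementation of the Leibniz formula (my own sketch, not from lecture); the sign of each permutation, computed here by counting inversions, plays the role of \(\det{P(\sigma)}\).

```python
from itertools import permutations
import numpy as np

def sign(sigma):
    """Sign of a permutation: parity of the number of inversions."""
    inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
              if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def leibniz_det(A):
    """det A = sum over sigma of sign(sigma) * A[0, sigma(0)] * ... * A[n-1, sigma(n-1)]."""
    n = A.shape[0]
    return sum(sign(s) * np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
print(leibniz_det(A), np.linalg.det(A))   # both -3
```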

Lecture 7

The transposition \(\tau_{ij}\) is the permutation that swaps \(i\) and \(j\) (as a row operation, the \(i \leftrightarrow j\) row swap); the composition \(\tau_{i_1 j_1}\circ \tau_{i_2 j_2}\circ\dots\circ\tau_{i_k j_k}\) represents successive swaps.

Transpositions are the building blocks of \(S_n\): every permutation can be written as a composition of transpositions

Determinant test: \(\vec{v_1} \dots \vec{v_n} \in \mathbb{F}^n\) form a basis \(\iff\) \(\det{\begin{bmatrix}\vec{v_1} & \dots & \vec{v_n}\end{bmatrix}} \ne 0\) (\(\iff\) \(\vec{v_1} \dots \vec{v_n}\) are linearly independent) (\(\iff\) \(\text{span}(\vec{v_1} \dots \vec{v_n}) = \mathbb{F}^n\))

Lecture 8

Geometric properties of the determinant: the signed measure (volume) of the "parallelogram" defined by sides \(\vec{v_1} \dots \vec{v_n}\) in \(\mathbb{R}^n\) is \(\det{\begin{bmatrix}\vec{v_1} \\ \vdots \\ \vec{v_n}\end{bmatrix}}\) (where the vectors \(\vec{v_1} \dots \vec{v_n}\) form the rows); taking the absolute value gives the usual, unsigned volume

A linear transformation \(T : \mathbb{R}^{n}\to \mathbb{R}^{n}\) scales volume by a factor of \(|\det{T}|\)

Topic 2 - Equivalence Classes and Quotient Spaces

Lecture 9

\(\sim\) is an equivalence relation if it satisfies

  1. Reflexivity: \(a \sim a\)
  2. Symmetry: \(a \sim b \implies b\sim a\)
  3. Transitivity: \(a\sim b \land b\sim c \implies a\sim c\)

E.g. \(m\sim n \iff 7 \mid (m-n)\) is an equivalence relation

Each \(\sim\) splits the domain over which it is defined into equivalence classes \(\overline{a}\). These sets partition the domain \(D\) (i.e. they cover it and are pairwise disjoint), and are defined as the sets of elements that are all equivalent to each other

Conversely, every possible partition has an associated equivalence relation, defined by \(a\sim b\) iff \(a\) and \(b\) are in the same part of the partition.

Lecture 10

For set \(X\), \(X/\sim\) is the partition of \(X\) by \(\sim\), i.e. \(X\) is partitioned into the equivalence classes defined by \(\sim\) for elements in \(X\)

When the underlying set is a vector space (and \(\sim\) is compatible with its operations), the equivalence classes inherit a vector space structure from the domain; this is made precise below.

Lecture 11

Quotient Theorem 1

Theorem
Let \(V\) be a vector space over \(\mathbb{F}\), and \(S \subseteq V\). Define \(\vec{u}\sim\vec{v} \iff \vec{u}-\vec{v} \in S\). Then

  1. \(\sim\) is an equivalence relation if and only if \(S\) is a group under vector addition, i.e. is closed under \(+\), \(\vec{0}\in S\), and \(\vec{u}\in S \iff -\vec{u}\in S\)
  2. Define \(\overline{\vec{u}} + \overline{\vec{v}} = \overline{\vec{u}+\vec{v}}\) and \(k\overline{\vec{u}}=\overline{k\vec{u}}\). Then \(V/\sim\) is a vector space if and only if \(S\) is a subspace of \(V\)

Informally, \(V/\sim\) "plays nice" iff \(\sim\) is an equivalence relation, meaning \(S\) is a subgroup, etc.

Equivalence classes are also known as cosets

Quotient of \(V\) by \(S\): \(V/S\), defined by \(V/\sim\) where for \(a, b\in V\), \(a \sim b \iff a-b\in S\)

Lecture 12

For set \(X\), equivalence relation \(\sim\), and function \(f:X\to Y\), \(f\) is well-defined on \(X/\sim\) iff \(x\sim y \implies f(x)=f(y)\), i.e. any two equivalent elements of \(X\) must be mapped to the same value by \(f\)

Quotient Theorem 2

Theorem
Let \(V\) be a finite-dimensional vector space and \(S\) be a subspace. Then \(V/S\) has dimension \(\dim{V}-\dim{S}\).

A basis for \(V/S\) can be found by finding a basis for \(S\), then adding the vectors necessary to extend it to a basis for \(V\). The (equivalence classes of the) vectors that were added form a basis for \(V/S\)

Aside: This gives insight into what a quotient space actually is. It is the "complementary" subspace you get when "forcing the structure of \(S\) onto \(V\)". This is why vectors that differ by an element of \(S\) fall into the same equivalence class in \(V\), by the definition of the equivalence relation. \(V/S\) is the space that you can use to "build" \(V\) from just \(S\); each equivalence class in \(V/S\) "contains" the structure of \(S\). The basis construction above is the best way to get a feel for what \(V/S\) is.

Aside: a semantically better notation for \(V/S\) might be \(V-S\), since this makes the nature of the relationship between \(V\), \(S\), and \(V/S\) more clear

Lecture 13

E.g. if \(S\) is the \(xy\) plane in \(\mathbb{R}^3\), then the equivalence classes for \(\mathbb{R}^3/S\) are the planes parallel to the xy plane

Lecture 14

Quotient Theorem 3

Theorem
We have linear map \(T : V \to V/S\) defined by \(T(\vec{v})=\overline{\vec{v}}\).

We know this is linear because the operations on \(V/S\) are well-defined, i.e. \(T(a\vec{u}+\vec{v}) = \overline{a\vec{u}+\vec{v}} = \dots = a\overline{\vec{u}}+\overline{\vec{v}} = aT(\vec{u})+T(\vec{v})\)

Aside: I think the number of equivalence classes of \(V/S\) is \(\#(V/S)=\#V/\#S\) (at least when \(V\) is finite), perhaps this is why the notation is \(V/S\)?

E.g. Let \(S\) be the set of symmetric matrices in \(M_{2\times 2}(\mathbb{F})\), i.e. all \(A\) with \(A = A^\intercal\), so for \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\), \(b=c\). Clearly a basis for \(S\) is \(\displaystyle \set{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}}\), since the transpose "flips" the matrix along the top-left-to-bottom-right diagonal, so the diagonal entries are unconstrained and the off-diagonal entries must match. To get a full basis for \(M_{2\times 2}(\mathbb{F})\), we can add one more matrix \(\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\) (there are other options). So, a basis for \(M_{2\times 2}(\mathbb{F})/S\) is \(\set{\overline{\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}}}\), and so the equivalence classes are \(\overline{\begin{bmatrix} 0 & a \\ 0 & 0 \end{bmatrix}}\) for all \(a \in \mathbb{F}\). We also have from the basis sizes \(\dim{M_{2\times 2}(\mathbb{F})/S}=\dim{M_{2\times 2}(\mathbb{F})}-\dim{S} = 4 - 3 = 1\)

Lecture 15

We can think of \(\overline{\vec{u}} + \overline{\vec{v}} = \overline{\vec{u}+\vec{v}}\) and \(k\overline{\vec{u}}=\overline{k\vec{u}}\) in two ways

  1. A linear transformation \(T\) defined by \(T : V \to V/S\) and \(T : \vec{v} \mapsto \overline{\vec{v}}\)
  2. \((\vec{u}+S)+(\vec{v}+S)=(\vec{u}+\vec{v})+(S+S)=\vec{u}+\vec{v}+S\) (since \(S\) is closed under addition)

Sum vector space: \(V \oplus W = \set{(\vec{v}, \vec{w}) : \vec{v}\in V, \vec{w}\in W}\), with addition and scaling defined component-wise

\(S = \vec{0}\oplus W\) is a subspace of \(V \oplus W\), so \({V\oplus W}/({{\vec{0}\oplus W}})\cong V\) has equivalence classes \(\overline{(\vec{v} , \vec{w})} = \overline{(\vec{v}, 0)}\)

So \(V/S\) could be called "direct difference" \(V\ominus S\), since \((V\oplus W)\ominus(\vec{0}\oplus W)\cong V\)

\(T\) has image \(V/S\) (onto) and kernel \(S\)

Quotient Kernel Theorem

Theorem
Let \(T : V \to W\) be linear. Then \(\text{Image}(T) \cong V/\ker{T}\), implying \(\dim{\text{Image}(T)} = \dim{V}-\dim{\ker{T}}\)

Quotient Theorem 1 for Groups

Theorem
Let \(G\) be a group, and \(S \subseteq G\). Define relation \(g\sim g'\) as \(g^{-1}g' \in S\).

  1. \(\sim\) is an equivalence relation if and only if \(S\) is a subgroup
  2. Let \(S\) be a subgroup. Define \(\overline{g}\,\overline{h}=\overline{gh}\). This is well-defined and defines a group structure on \(G/\sim\) if and only if \(S\) satisfies \(gsg^{-1} \in S\) for all \(g\in G\), \(s\in S\) → such subgroups are called normal subgroups

Topic 3 - Eigenstuff

Lecture 17

Let \(A \in M_{n\times n}(\mathbb{F})\). A scalar \(\lambda\) is an eigenvalue of \(A\) with (nonzero) eigenvector \(\vec{v}\) if \(A \vec{v} = \lambda \vec{v}\); the pair \((\lambda, \vec{v})\) is an eigenpair.

Eigenstuff is widely applicable:

We have \(A \vec{v} = \lambda \vec{v} \iff (A - \lambda I) \vec{v} = \vec{0}\). So, \(\text{Null}(A - \lambda I)\) is the set (and subspace) of eigenvectors corresponding to eigenvalue \(\lambda\)

Thus, to find the eigenpairs of \(A\), we solve the polynomial equation \(\det(A-\lambda I) = 0\) to find all eigenvalues \(\lambda\), then find the null space \(\text{Null}(A-\lambda I)\) for each to get the corresponding eigenvectors \(\vec{v}\).
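
Aside: the two-step recipe in sympy (my own sketch, not from lecture): solve the characteristic polynomial, then take null spaces.

```python
import sympy as sp

A = sp.Matrix([[2, 1], [1, 2]])
lam = sp.symbols('lambda')

char_poly = (A - lam * sp.eye(2)).det()           # det(A - lambda I)
eigenvalues = sp.solve(char_poly, lam)            # [1, 3]

for ev in eigenvalues:
    eigenvectors = (A - ev * sp.eye(2)).nullspace()   # basis of Null(A - ev I)
    print(ev, eigenvectors)
```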

Lecture 18

The set of eigenvectors of \(A\) corresponding to eigenvalue \(\lambda\) (denoted \(E_\lambda\)) is a subspace, namely the eigenspace of \(\lambda\). We know that \(E_\lambda = \text{Null}(A - \lambda I)\).

The characteristic polynomial \(C_A(\lambda)\) of \(A\) is \(\det(A-\lambda I)\); its roots (the solutions of \(C_A(\lambda) = 0\)) are the eigenvalues of \(A\)

We can diagonalize \(A\) if we can find a diagonal matrix \(D\) and invertible matrix \(P\) such that \(P^{-1}AP = D\iff PDP^{-1}=A\).

Lecture 19

The algebraic multiplicity \(a_\lambda\) of eigenvalue \(\lambda\) is its multiplicity as a root of the characteristic polynomial \(C_A(\lambda) = \det(A-\lambda I)\)

The geometric multiplicity \(g_\lambda\) of eigenvalue \(\lambda\) is defined as \(\dim{E_{\lambda}} = \dim{\text{Null}(A - \lambda I)}\), i.e. the number of vectors in a basis for the eigenspace \(E_{\lambda}\)

For any eigenvalue of any matrix, it is true that \(a_{\lambda} \geq g_{\lambda} \geq 1\).

Diagonalizability Criterion: If the characteristic polynomial of matrix \(A\) factors completely into linear factors, then \(A\) is diagonalizable if and only if the geometric multiplicity equals the algebraic multiplicity for every eigenvalue

When \(C_A\) factors completely we have \(\displaystyle\sum\limits_{\lambda} a_{\lambda} = n\) (i.e. \(\det(A - \lambda I)\) has degree \(n\), so an \(n\times n\) matrix has \(n\) eigenvalues counted with multiplicity), but this isn't always the case for \(\displaystyle\sum\limits_{\lambda} g_{\lambda}\).

Corollary: if all the roots of the characteristic polynomial are distinct, the matrix is diagonalizable; in this sense almost all matrices (over \(\mathbb{C}\)) are diagonalizable

Lecture 20

Some matrices may not be diagonalizable over their own field. In particular, there exist matrices over \(\mathbb{R}\) that can only be diagonalized over \(\mathbb{C}\), e.g. \(A=\begin{bmatrix}0&1\\-1&0\end{bmatrix}\) has characteristic polynomial \(C_A(\lambda) = \lambda^2+1\), which has solutions \(\set{i, -i} \not\subset \mathbb{R}\), meaning \(A\) is diagonalizable over \(\mathbb{C}\), but not \(\mathbb{R}\).

We note that for matrix \(A\) in \(\mathbb{R}\) with eigenpair \(\set{\lambda, \vec{v}}\), we have \(A \vec{v} = \lambda \vec{v} \iff \overline{A \vec{v}} = \overline{\lambda \vec{v}}\) \(\iff \overline{A}\overline{\vec{v}} = \overline{\lambda}\overline{\vec{v}} \iff A\overline{\vec{v}} = \overline{\lambda}\overline{\vec{v}}\) (since \(A\) is real), meaning \(\set{\overline{\lambda}, \overline{\vec{v}}}\) is also an eigenpair of \(A\).

The characteristic polynomial can be used for the following sanity checks:

  1. \(C_A\) is of degree \(n\) (for \(A_{n\times n}\))
  2. The leading coefficient \(a_n = (-1)^n\), since \(C_A(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda)\)
  3. The constant term \(a_0\) is the product of \(A\)'s eigenvalues, since \(a_0 = C_A(0) = \lambda_1\lambda_2\cdots\lambda_n\)
  4. \(a_{n-1} = (-1)^{n-1} \text{Trace}(A) = (-1)^{n-1}\displaystyle\sum\limits_{i=1}^{n}\lambda_i\) (follows from expanding the polynomial multiplication)

We have \(\boxed{\det{A} = \displaystyle\prod_{i=1}^{n} \lambda_i}\) since \(\det{A} = C_A(0) = a_0\), which is the product of eigenvalues as shown in 3)

Diagonalizability is important because, for diagonalizable \(A = PDP^{-1}\), we have \(A^{n}= PD^nP^{-1}\) where \(D^n = \begin{bmatrix} \lambda_1^n & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \lambda_n^n \end{bmatrix}\), since \(D\) is diagonal.
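
Aside: a numerical check of \(A^n = PD^nP^{-1}\) (my own sketch, not from lecture).

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigvals, P = np.linalg.eig(A)              # columns of P are eigenvectors, A = P D P^{-1}

n = 5
Dn = np.diag(eigvals ** n)                 # D^n: raise each diagonal entry to the n-th power
An = P @ Dn @ np.linalg.inv(P)

print(np.allclose(An, np.linalg.matrix_power(A, n)))   # True
```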

We can also define \(\sqrt{A}\) as \(P\sqrt{D}P^{-1}\), where \(\sqrt{D} = \begin{bmatrix} \sqrt{d_1} & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \sqrt{d_n} \end{bmatrix}\); more generally, \(A^q\) can be defined for \(q\in \mathbb{Q}\) (when the diagonal entries admit the relevant roots)

Lecture 21

Since \(A^n\) is defined for \(n \in \mathbb{N}\), we can define the exponential function \(e^A\) or \(\exp{A}\) as \(\displaystyle\sum\limits_{n=0}^{\infty} \dfrac{A^n}{n!}\) (with \(A^0 = I\))
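
Aside: a truncated-series sketch of \(e^A\) (my own, not from lecture); for \(A=\begin{bmatrix}0&1\\-1&0\end{bmatrix}\) the result should be a rotation by \(1\) radian.

```python
import numpy as np

def expm_series(A, terms=30):
    """exp(A) ~ sum_{n=0}^{terms-1} A^n / n!  (truncated power series)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ A / n                # builds up A^n / n! incrementally
        result = result + term
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
expected = np.array([[np.cos(1), np.sin(1)], [-np.sin(1), np.cos(1)]])
print(np.allclose(expm_series(A), expected))   # True
```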

Aside: Recall that \(\frac{d}{dx}\) is a linear operator (a linear map on a space of differentiable functions like \(\mathbb{R}[x]\)) that can be described by a matrix \(A\). So, we can define another linear operator \(e^{A}= e^{\frac{d}{dx}}\) that sends the "vector" \(f(x)\) to \(\displaystyle \sum\limits_{n=0}^{\infty} \dfrac{f^{(n)}(x)}{n!}\). This is the Taylor expansion of \(f(x+1)\) around \(x\), so the transformation \(T_A\) described by matrix \(A\) maps \(f(x) \mapsto f(x+1)\); it is the translation operator

If \(A\) is diagonalizable and all of \(A\)'s eigenvalues satisfy \(|\lambda|<1\), then \(A^{n}\to \mathcal{O}\) for \(n\to\infty\)

Let \(F_n\) denote the \(n\)th Fibonacci number, and define \(\vec{v_n} = \begin{bmatrix}F_{n+1}\\F_{n}\end{bmatrix}\). Then \(\vec{v_{n+1}} = \begin{bmatrix}F_{n+2}\\F_{n+1}\end{bmatrix} = \begin{bmatrix}F_{n+1}+F_{n}\\F_{n+1}\end{bmatrix} = \begin{bmatrix}1&1\\1&0\end{bmatrix}\begin{bmatrix}F_{n+1}\\F_{n}\end{bmatrix} = \begin{bmatrix}1&1\\1&0\end{bmatrix}\vec{v_n}\), so it follows that \(\vec{v_n} = \begin{bmatrix}1&1\\1&0\end{bmatrix}^{n-1}\begin{bmatrix}1\\1\end{bmatrix}\), since \(\vec{v_1} = \begin{bmatrix}F_2\\F_1\end{bmatrix} = \begin{bmatrix}1\\1\end{bmatrix}\) holds the initial conditions of the Fibonacci sequence. \(\begin{bmatrix}1&1\\1&0\end{bmatrix}\) has eigenvalues \(\dfrac{1\pm \sqrt{5}}{2} = \set{\phi, 1-\phi}\). We can find a closed-form expression for \(F_n\) (a component of \(\vec{v_n}\)) by diagonalizing \(\begin{bmatrix}1&1\\1&0\end{bmatrix}\) to compute its powers; this gives \(F_n = \dfrac{1}{\sqrt{5}}\left(\dfrac{1+\sqrt{5}}{2}\right)^{n}- \dfrac{1}{\sqrt{5}}\left(\dfrac{1-\sqrt{5}}{2}\right)^{n}\)
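
Aside: the same computation in numpy (my own sketch, not from lecture): diagonalize the Fibonacci matrix once, then read \(F_n\) out of \(A^{n-1}\vec{v_1}\).

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 0.0]])
eigvals, P = np.linalg.eig(A)               # eigenvalues are phi and 1 - phi
P_inv = np.linalg.inv(P)

def fib(n):
    """F_n from [F_{n+1}, F_n]^T = A^{n-1} [1, 1]^T, with A^{n-1} = P D^{n-1} P^{-1}."""
    Dn = np.diag(eigvals ** (n - 1))
    v = P @ Dn @ P_inv @ np.array([1.0, 1.0])
    return int(round(v[1]))

print([fib(n) for n in range(1, 11)])       # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```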

Since multiplying by a matrix \(A\) represents applying a transformation (i.e. a function/operator) to \(\vec{v}\), \(A^n\vec{v}\) corresponds to that transformation being repeated (composed with itself) \(n\) times

Aside: since every (analytic) function can be described perfectly (in the limit) by its Taylor series (within the radius of convergence), and a series can be represented by its sequence of coefficients, and thus by an \(\infty\)-dimensional vector, can every function be represented as a vector? What do linear transformations between these vectors correspond to? Presumably even non-linear functions? Can every operator be described this way?

Topic 4 - Bases

based!

Lecture 22

Change of variables is a common technique in all of math, where a new variable that is more convenient to use is defined as a function of existing variables in an expression. When the variables we change define a coordinate system (i.e. \(x\), \(y\), etc.), this is a coordinate change; the coordinate change in linear algebra is a change of basis, since a basis is (like) a coordinate system.

Recall that \([T(\vec{v})]_{\beta'} = [T]_{\beta' \leftarrow \beta}[\vec{v}]_{\beta}\) (the matrix of a transformation acting on coordinates) and \([S \circ T]_{\beta'' \leftarrow \beta} = [S]_{\beta'' \leftarrow \beta'}[T]_{\beta'\leftarrow\beta}\) (composition) for bases \(\beta\) of \(V\), \(\beta'\) of \(W\), \(\beta''\) of \(U\), and \(T:V\to W, S:W\to U\)

\(P_{\beta' \leftarrow \beta} = [\text{Id}_V]_{\beta' \leftarrow \beta}\) is the change of basis matrix from \(\beta\) to \(\beta'\). Here, \(\text{Id}_V:\vec{v}\mapsto\vec{v}\) is the identity transformation on the vector space \(V\), whose standard matrix is \(T_{\text{Id}}=I\)

The change of basis matrix can change the basis of a vector \(\vec{v}\) (here, from \(\beta\) to \(\beta'\)): \([\vec{v}]_{\beta'} = [\text{Id}(\vec{v})]_{\beta'} = [\text{Id}]_{\beta' \leftarrow \beta}[\vec{v}]_{\beta} = P_{\beta' \leftarrow \beta} [\vec{v}]_{\beta}\). So, \(\boxed{[\vec{v}]_{\beta'} = P_{\beta' \leftarrow \beta} [\vec{v}]_{\beta}}\)

Change of basis matrices \(P\) reverse under inversion, i.e. \((P_{\beta' \leftarrow \beta})^{-1} = P_{\beta \leftarrow \beta'}\)

Let \(T:V\to W\), and \(\beta_V, \beta_V'\) be bases of \(V\), and \(\beta_W, \beta_W'\) be bases of \(W\). Then \([T]_{\beta_W' \leftarrow \beta_V'} = [\text{Id}_W \circ T \circ \text{Id}_V]_{\beta_W' \leftarrow \beta_V'} = [\text{Id}_W]_{\beta_W' \leftarrow \beta_V'}[T]_{\beta_W \leftarrow \beta_V}[\text{Id}_V]_{\beta_V'\leftarrow\beta_V} = P_{\beta_W' \leftarrow \beta_V'}[T]_{\beta_W \leftarrow \beta_V}P_{\beta_V'\leftarrow\beta_V}\)

Note: so far, we've just been treating matrices as immutable things with one representation. But, bases add an "extra dimension" to this, where matrices are represented differently in different bases. So far, we've essentially been assuming matrices are expressed in the standard basis and have omitted the notation.

Note: the concept of bases (and converting between different bases) for vectors is similar to the concept of expressing natural numbers in a base other than \(10\). In each, the underlying system describing how to calculate the "pure" value changes, but the value itself doesn't.

Lecture 23

E.g. take \(V\) to be the polynomials of degree \(\le 1\) in \(\mathbb{F}[x]\), \(D = \frac{d}{dx}\), \(\beta = (x, 1)\), \(\beta'=(1,2x)\). Then, we have \([D]_{\beta\leftarrow\beta} = \begin{bmatrix}0&0\\1&0\end{bmatrix}\) (i.e. the definition of \(D\)), and \([D]_{\beta'\leftarrow\beta'} = \begin{bmatrix}0&2\\0&0\end{bmatrix}\) (since the order of the basis vectors (and thus the columns) is swapped, and we have \(2x\) instead of \(x\)). We find by inspection that \(P_{\beta\leftarrow\beta'}=\begin{bmatrix}0&2\\1&0\end{bmatrix}\), so \(P_{\beta'\leftarrow\beta}=(P_{\beta\leftarrow\beta'})^{-1} = \begin{bmatrix}0&1\\\frac{1}{2}&0\end{bmatrix}\), and indeed \(\begin{bmatrix}0&1\\\frac{1}{2}&0\end{bmatrix}\begin{bmatrix}0&0\\1&0\end{bmatrix}\begin{bmatrix}0&2\\1&0\end{bmatrix}=\begin{bmatrix}0&2\\0&0\end{bmatrix}\) as desired
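
Aside: verifying that worked example numerically (my own sketch, not from lecture).

```python
import numpy as np

D_beta = np.array([[0.0, 0.0], [1.0, 0.0]])              # [D] in basis beta = (x, 1)
P_beta_from_betap = np.array([[0.0, 2.0], [1.0, 0.0]])   # P_{beta <- beta'}
P_betap_from_beta = np.linalg.inv(P_beta_from_betap)     # P_{beta' <- beta}

D_betap = P_betap_from_beta @ D_beta @ P_beta_from_betap
print(D_betap)    # [[0. 2.] [0. 0.]], i.e. [D] in basis beta' = (1, 2x)
```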

\(n\times n\) matrices \(A\) and \(B\) are similar iff there exists invertible matrix \(P\) such that \(B = P^{-1}AP\)

\([T]_{\beta' \leftarrow \beta'} = (P_{\beta \leftarrow \beta'})^{-1}[T]_{\beta \leftarrow \beta}P_{\beta \leftarrow \beta'}\) shows that for bases \(\beta\) and \(\beta'\), \([T]_{\beta' \leftarrow \beta'}\) and \([T]_{\beta \leftarrow \beta}\) are similar. Conversely, given a matrix \(B\) similar to \([T]_{\beta \leftarrow \beta}\), we can find a basis \(\beta'\) such that \([T]_{\beta' \leftarrow \beta'}=B\)

Lecture 24

If \(D= \begin{bmatrix} \lambda_1 & \dots & 0 \\\vdots & \ddots & \vdots \\ 0 & \dots & \lambda_n \end{bmatrix}\) is diagonal and \(P\) is a permutation matrix, then \(P^{-1}DP\) is again diagonal, with diagonal entries \(\lambda_{\pi(1)}, \dots, \lambda_{\pi(n)}\) for the permutation \(\pi\) described by \(P\). So a diagonalization is only unique up to reordering the eigenvalues on the diagonal.

Every matrix whose characteristic polynomial factors completely (e.g. every matrix over \(\mathbb{C}\)), even a non-diagonalizable one, is similar to an upper-triangular matrix, i.e. a matrix where every entry below the diagonal is \(0\).

Every non-diagonalizable \(2\times 2\) matrix is similar to a matrix of the form \(\begin{bmatrix} a & b \\ 0 & a \end{bmatrix}\) with \(b\ne 0\) (upper-triangularize; if the diagonal entries, i.e. the eigenvalues, were different, they'd be distinct → diagonalizable). With \(P=\begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{b} \end{bmatrix}\), this is similar to \(\begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}\).

Topic 5 - Jordanstuff and Generalized Eigenstuff

Lecture 24

The Jordan block of \(\lambda\), denoted \(J_n(\lambda)\), is the \(n\times n\) analogue of \(\begin{bmatrix} \lambda & 1 & 0 & 0 \\ 0 & \lambda & 1 & 0 \\ 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & \lambda \end{bmatrix}\), where all the diagonal entries are \(\lambda\) and the entries directly above the diagonal are \(1\) (note that this matrix is upper triangular)

We defined the direct sum/block sum \(A \oplus B\) as the block matrix \(\begin{bmatrix} A & \mathcal{O} \\ \mathcal{O} & B \end{bmatrix}\), where \(A\) and \(B\) sit on the diagonal and the \(\mathcal{O}\)s are zero blocks of the appropriate sizes.
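
Aside: a small numpy sketch (my own, not from lecture) for building Jordan blocks and direct sums.

```python
import numpy as np

def jordan_block(lam, n):
    """J_n(lambda): lambda on the diagonal, 1s directly above it."""
    return lam * np.eye(n) + np.eye(n, k=1)

def direct_sum(A, B):
    """A (+) B: block matrix with A and B on the diagonal, zero blocks elsewhere."""
    n, m = A.shape[0], B.shape[0]
    out = np.zeros((n + m, n + m))
    out[:n, :n] = A
    out[n:, n:] = B
    return out

print(direct_sum(jordan_block(3, 2), jordan_block(5, 1)))
# [[3. 1. 0.]
#  [0. 3. 0.]
#  [0. 0. 5.]]
```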

Lecture 25

For \(n\times n\) matrices \(A\), \(A'\) and \(m\times m\) matrices \(B\), \(B'\), the direct sum is "distributive": \((A \oplus B) + (A' \oplus B') = (A + A') \oplus (B + B')\), \(c(A\oplus B) = cA \oplus cB\), and \((A \oplus B)(A' \oplus B') = AA' \oplus BB'\)

The Jordan block \(J_m(\lambda)\) has characteristic polynomial \((\lambda-x)^m\), so it has one eigenvalue (namely, \(x=\lambda\)) with algebraic multiplicity \(m\) and geometric multiplicity \(1\)

Lecture 26

Jordan Canonical Form Theorem

Theorem
If \(A\) is a square matrix (whose characteristic polynomial factors completely), then its similarity class contains a matrix of the form \((J_{m_{1, 1}}(\lambda_1) \oplus J_{m_{1, 2}}(\lambda_1) \oplus \dots \oplus J_{m_{1, r_1}}(\lambda_1))\) \(\oplus\) \((J_{m_{2, 1}}(\lambda_2) \oplus \dots \oplus J_{m_{2, r_2}}(\lambda_2))\) \(\oplus \dots \oplus\) \((J_{m_{k, 1}}(\lambda_k) \oplus \dots \oplus J_{m_{k, r_k}}(\lambda_k))\), where \(\lambda_1 \dots \lambda_k\) are the distinct eigenvalues of \(A\); this is the Jordan canonical form of \(A\).

A matrix \(A\) is diagonalizable iff all \(m_{i, j}=1\), i.e. iff all the Jordan blocks are as small as possible; the JCF of \(A\) is then a diagonal matrix.
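
Aside: sympy can compute the JCF directly; a quick check (my own, not from lecture) on a \(2\times 2\) matrix with a repeated eigenvalue but only one eigenvector.

```python
import sympy as sp

A = sp.Matrix([[5, 1], [-1, 3]])     # characteristic polynomial (lambda - 4)^2, not diagonalizable
P, J = A.jordan_form()               # A = P * J * P**-1
print(J)                             # Matrix([[4, 1], [0, 4]]), a single block J_2(4)
print(sp.simplify(P * J * P.inv() - A))   # zero matrix
```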

Minimal Polynomial

Definition
The minimal polynomial \(m_A(x)\) of \(k\times k\) matrix \(A\) is the unique* polynomial with the following properties

  1. \(m_A\) is monic, i.e. its leading coefficient is \(1\)
  2. \(m_A (A)=\mathcal{O}_k\)1
  3. \(m_A\) has the smallest degree possible while satisfying both 1) and 2)

"Minimal Polynomial Theorem"

Theorem
If \(A\) is an \(n\times n\) matrix, we have

  1. The minimal polynomial \(m_A(x)\) of \(A\) exists and is unique
  2. For any \(p(x)\) with \(p(A)=\mathcal{O}\), \(m_A(x)\) divides \(p(x)\). So, \(p(x)=m_A(x)q(x)\)
  3. \(A^{d-1}, A^{d-2}, \dots, A, I\) are linearly independent in \(M_{n\times n}(\mathbb{F})\), where \(d\) is the degree of \(m_A(x)\). However, \(A^{d}, A^{d-1}, \dots, A, I\) are not linearly independent (seen by plugging \(A\) into \(m_A\)); see the sketch after this list
  4. Any matrix similar to \(A\) has the same minimal polynomial
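
Aside: a brute-force sketch of point 3 (my own, not from lecture): keep multiplying by \(A\) until \(A^d\) becomes a linear combination of the lower powers; that (monic) combination is \(m_A\).

```python
import sympy as sp

def minimal_polynomial(A):
    """Brute-force m_A via point 3: find the first d where A^d is a linear
    combination of I, A, ..., A^{d-1}; that monic combination is m_A."""
    n = A.shape[0]
    x = sp.symbols('x')
    powers = [sp.eye(n)]                               # I, A, A^2, ...
    for d in range(1, n + 1):                          # d <= n by Cayley-Hamilton
        powers.append(powers[-1] * A)
        cs = sp.symbols(f'c0:{d}')                     # unknowns c_0, ..., c_{d-1}
        combo = powers[d]
        for k in range(d):
            combo = combo + cs[k] * powers[k]          # A^d + c_{d-1}A^{d-1} + ... + c_0 I
        sol = sp.solve(list(combo), cs, dict=True)     # ask for combo = zero matrix
        if sol:
            return sp.Poly(x ** d + sum(sol[0][cs[k]] * x ** k for k in range(d)), x)

A = sp.Matrix([[2, 0, 0], [0, 2, 0], [0, 0, 3]])
print(minimal_polynomial(A).as_expr())                 # x**2 - 5*x + 6 = (x - 2)(x - 3)
```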

Lecture 27

The proof for the minimal polynomial theorem is found [[Lecture27-Mar20.pdf|here]]. The proofs for each point can be sketched as follows

  1. Somewhat trivial
  2. We must have \(p(x)=m_A(x)q(x)+r(x)\) with \(\deg{r} < \deg{m_A}\) (polynomial division, just like number theory!); plugging in \(A\) makes \(p\) and \(m_A\) vanish, so \(r(A)=\mathcal{O}\), and minimality of \(m_A\) forces \(r=0\), deriving the point
  3. Proof straightforward from definition of linear dependence (also shows uniqueness for 1.)
  4. Plug in \(B=P^{-1}AP\) into the polynomial form of \(m_A(x)\), \(P^{-1}\) and \(P\)s "telescope" to cancel

Cayley-Hamilton Theorem

Theorem
For an \(n\times n\) matrix \(A\) with characteristic polynomial \(C_A(\lambda)\), we have \(C_A(A)=\mathcal{O}\)

Lecture 28

Since addition, scaling, and multiplication all "distribute" over direct summation \(\oplus\), "taking a polynomial" does as well, i.e. \(p(A\oplus B)=p(A)\oplus p(B)\) for polynomial \(p\).

For polynomials \(p(x)\), \(q(x)\) and matrix \(A\), \(p(A)q(A)=q(A)p(A)\), i.e. multiplication of matrix-polynomial compositions is commutative

The generalized eigenspace \(GE_\lambda\) of matrix \(A\) and scalar \(\lambda\in \mathbb{F}\) is defined as the space of \(\vec{v}\) where \((A-\lambda I)^{\ell}\vec{v}=\vec{0}\) for some \(\ell \in \mathbb{N}\).

Lecture 29

The proof that \(GE_\lambda\) is a subspace is sketched as follows:

  1. Nonempty: clearly \(\vec{0}\in GE_\lambda\)
  2. Closure under \(+\): if \((A-\lambda I)^{k}\vec{u}=\vec{0}\) and \((A-\lambda I)^{\ell}\vec{v}=\vec{0}\), then \((A-\lambda I)^{\max(k,\ell)}(\vec{u}+\vec{v})=\vec{0}\), since whichever of \(\vec{u}, \vec{v}\) has the lower exponent is also killed by the higher power (it lies in that null space too)
  3. Closure under \(\cdot\): inherited directly from the null space definition as well

For any matrix \(A\) over field \(\mathbb{F}\) with characteristic polynomial \((\lambda_1-x)^{m_1}(\lambda_2-x)^{m_2}\dots(\lambda_k-x)^{m_k}\), \(GE_{\lambda_1}\oplus GE_{\lambda_2}\oplus \dots \oplus GE_{\lambda_k} = \mathbb{F}^n\)

Lecture 30

If a matrix \(N\) has \(1\)s on the superdiagonal (the diagonal directly above the main diagonal) and \(0\)s elsewhere, like \(J_n(\lambda)-\lambda I\), then \(N^2\) has \(1\)s on the next diagonal up, \(N^3\) on the one after that, etc., until \(N^n=\mathcal{O}\).

Finding the JCF of a matrix ("dots" method)

Procedure
For a matrix \(A\), we find the characteristic polynomial \(C_A(\lambda)\) and factor it to find eigenvalues \(\lambda_1 \dots \lambda_k\) (factoring may need to include \(\mathbb{C}\)). For each eigenvalue \(\lambda\): compute \(\dim{\text{Null}((A-\lambda I)^{\ell})}\) for \(\ell = 1, 2, \dots\) until it stops growing, and for each \(\ell\) draw a row of \(\dim{\text{Null}((A-\lambda I)^{\ell})} - \dim{\text{Null}((A-\lambda I)^{\ell-1})}\) dots under the previous row (so the first row has \(g_\lambda\) dots).

The vertical groupings of dots correspond to the Jordan blocks that correspond to a given \(\lambda\). So, the JCF of \(A\) is the direct sum of all of the Jordan blocks for all of the eigenvalues.
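
Aside: computing the row sizes of the dot diagram with sympy (my own sketch, not from lecture); the test matrix is already in JCF, so the answer is easy to check by eye.

```python
import sympy as sp

A = sp.Matrix([[4, 1, 0], [0, 4, 0], [0, 0, 4]])   # JCF is J_2(4) (+) J_1(4)
lam = 4

dims, power = [], sp.eye(3)
for ell in range(1, 4):
    power = power * (A - lam * sp.eye(3))
    dims.append(len(power.nullspace()))            # dim Null((A - lam I)^ell)
print(dims)                                        # [2, 3, 3]
# Row sizes: 2 dots, then 3 - 2 = 1 dot. Column heights 2 and 1 give blocks J_2(4) and J_1(4).
```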

Since the JCF \(J\) of \(A\) is similar to \(A\), we have \(A = PJP^{-1}\). The columns of \(P\) are formed from the vectors that need to be added to the basis of \(\text{Null}((A-\lambda I)^{n})\) to form \(\text{Null}((A-\lambda I)^{n+1})\)

If \(\vec{v_1}, \vec{v_2}, \dots\) are the columns of \(P\) that correspond to the same Jordan block (for eigenvalue \(\lambda\)), then we must have \(\vec{v_{n}}=(A-\lambda I)\vec{v_{n+1}}\)

Topic 6 - Inner Products and Inner Product Spaces

Lecture 31

Inner Product

Definition
For vector space \(V\) over \(\mathbb{R}\), the pairing \(\langle \vec{u}, \vec{v} \rangle\) is an inner product if it has the following properties

  1. Symmetry: \(\langle \vec{u}, \vec{v} \rangle = \langle \vec{v}, \vec{u} \rangle\)
  2. Bilinearity: \(\langle \vec{u}, a\vec{v} + b\vec{w} \rangle = a\langle \vec{u}, \vec{v} \rangle + b\langle \vec{u}, \vec{w} \rangle\)
    • This can be generalized (using symmetry) to apply to both arguments
  3. Positivity: \(\langle \vec{v}, \vec{v} \rangle > 0\) for all nonzero \(\vec{v}\)

Inner Product Space

Definition
A vector space \(V\) with inner product \(\langle \vec{u}, \vec{v} \rangle\) forms an inner product space

For \(\mathbb{R}^n\), \(\langle \vec{u}, \vec{v} \rangle = \displaystyle\sum\limits_{i=1}^{n} u_i v_i\) is a common inner product known as the dot product, denoted \(\vec{u}\cdot\vec{v}\)

For vector \(\vec{v}\), we define its length (or norm) as \(\sqrt{\langle \vec{v}, \vec{v} \rangle}\), denoted \(\|\vec{v}\|\)

We can derive the following from the inner product definition

We have the following examples and non-examples of inner products for other domains

The [[Lecture31-Apr3.pdf|Lecture notes]] have an example for determining \(\|\vec{u}+\vec{v}\|\) given the inner product definition, \(\|\vec{u}\|\) and \(\|\vec{v}\|\). Generally, this is done by expanding (using bilinearity) and possibly using symmetry to match.

Lecture 32

Cauchy-Schwarz Inequality

Theorem
Let \(V\) with inner product \(\langle \vec{u}, \vec{v}\rangle\) be an inner product space. The Cauchy-Schwarz inequality states that \(-\|\vec{u}\|\|\vec{v}\| \leq \langle \vec{u}, \vec{v} \rangle \leq \|\vec{u}\|\|\vec{v} \|\), i.e. \(|\langle \vec{u}, \vec{v} \rangle| \leq \|\vec{u}\|\|\vec{v}\|\)

The angle \(\alpha\) between vectors \(\vec{v}, \vec{u}\) in inner product space \(V\) is the unique \(0 \leq \alpha \leq \pi\) such that \(\cos{\alpha}=\dfrac{\langle\vec{u}, \vec{v}\rangle}{\|\vec{u}\|\|\vec{v}\|}\)

Vectors \(\vec{u}, \vec{v}\) are parallel if \(\langle \vec{u}, \vec{v} \rangle = \| \vec{u} \|\| \vec{v} \|\) (i.e. \(\alpha=0\)) and orthogonal if \(\langle \vec{u}, \vec{v} \rangle = 0\) (i.e. \(\alpha = \dfrac{\pi}{2}\))

Aside: for the dot product in \(\mathbb{R}^2\) (an inner product space), in polar coordinates, we get \(\vec{u}=\begin{bmatrix} \| \vec{u}\| \cos{\theta} \\ \|\vec{u} \| \sin{\theta} \end{bmatrix}\) and \(\vec{v}=\begin{bmatrix} \| \vec{v}\| \cos{\varphi} \\ \|\vec{v} \| \sin{\varphi} \end{bmatrix}\), so \(\vec{u}\cdot \vec{v} = \dots = \| \vec{u}\|\| \vec{v}\| \cos(\varphi-\theta)\). This is where this definition of the dot product comes from, and implies that the dot product measures how close the angles of vectors are.

A basis \(\beta = (\vec{b}_1, \vec{b}_2, \dots, \vec{b}_n)\) is an orthogonal basis if \(\langle \vec{b}_i, \vec{b}_j \rangle = 0\) for all \(i\ne j\)

Lecture 33

"Orthogonal Decomposition Theorem"

Theorem
Let \(V\) be an inner product space with dimension \(n\); let \(\vec{v_1}, \dots, \vec{v_n}\) be orthogonal, i.e. \(\langle \vec{v_i}, \vec{v_j} \rangle=0\) for all \(i \ne j\). Then \((\vec{v_1}, \dots, \vec{v_n})\) is an orthogonal basis if and only if all \(\vec{v_i}\ne \vec{0}\)

Cauchy-Schwarz refinement

Theorem
If \(\langle \vec{u}, \vec{v} \rangle = \| \vec{u} \|\| \vec{v} \|\) and \(\vec{v}\ne \vec{0}\), then \(\vec{u} = a \vec{v}\) for some scalar \(a \geq 0\), i.e. \(\vec{u}\) and \(\vec{v}\) are linearly dependent.

We can use Cauchy-Schwarz to solve optimization problems by reinterpreting the quantity we want to optimize as (an instance of) an inner product, using Cauchy-Schwarz to bound it, then using the Cauchy-Schwarz refinement to determine when the bound is attained.

Lecture 34

The Gram-Schmidt process is used to construct an orthogonal basis from a basis (or even any set of vectors) in an inner product space.

Gram-Schmidt Process

Algorithm
For basis \(\vec{v_1}, \dots, \vec{v_n}\) in an inner product space, we can construct a basis \(\vec{u_1}, \dots, \vec{u_n}\) for \(\text{Span}(\vec{v_1}, \dots, \vec{v_n})\) that is orthogonal in that inner product space. We do this with the following process:

In general, we have \(\vec{u_k}=\vec{v_k} - \displaystyle\sum\limits_{\ell = 1}^{k-1} \frac{\langle \vec{v_k}, \vec{u_\ell} \rangle}{\|\vec{u_\ell} \|^2}\vec{u_\ell}\)
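
Aside: a short Python sketch of the process (my own, not from lecture), using the dot product as the inner product.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of vectors with the Gram-Schmidt process (dot product)."""
    ortho = []
    for v in vectors:
        u = v.astype(float)
        for w in ortho:
            u = u - (np.dot(v, w) / np.dot(w, w)) * w   # subtract the projection of v onto w
        ortho.append(u)
    return ortho

basis = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
U = gram_schmidt(basis)
print(all(abs(np.dot(U[i], U[j])) < 1e-12 for i in range(3) for j in range(i + 1, 3)))  # True
```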

Lecture 35

If we perform the Gram-Schmidt process on linearly dependent \(\vec{v_1}, \dots, \vec{v_n}\), then at least one \(\vec{u}\) will be \(\vec{0}\).

Application: the least-squares solution for the system of equations \(A \vec{x}=\vec{b}\) is the \(\hat{\vec{x}}\) that minimizes the error, namely \(\| A \hat{\vec{x}} -\vec{b} \|\). If the system is consistent, then \(\vec{b}\) is in the column space of \(A\), and a "full" solution exists. When it's not consistent (i.e. in almost every practical application), the error is minimized when \(A \hat{\vec{x}}\) is the orthogonal projection of \(\vec{b}\) onto the column space, i.e. when the residual \(A \hat{\vec{x}}-\vec{b}\) is orthogonal to the column space (to all the columns of \(A\)). So, the least-squares solution is any solution to \(A^{\top}(A \hat{\vec{x}}-\vec{b})=\vec{0}\implies A^{\top}A\hat{\vec{x}}=A^{\top}\vec{b}\) (the normal equations).
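
Aside: a tiny least-squares example (my own sketch, not from lecture): fitting a line \(y \approx c_0 + c_1 t\) to three points via the normal equations, compared with numpy's built-in solver.

```python
import numpy as np

# Overdetermined (inconsistent) system: columns of A are all-ones and the t-values.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)     # normal equations: A^T A x = A^T b
print(x_hat)                                  # ~[1.1667, 0.5]
print(np.linalg.lstsq(A, b, rcond=None)[0])   # same answer from numpy's least-squares solver
print(A.T @ (A @ x_hat - b))                  # ~[0, 0]: the residual is orthogonal to the columns of A
```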


  1. \(M_A (A)\) are you \(\mathcal{O}_k\)? So, \(M_A (A)\) are you \(\mathcal{O}_k\)? Are you \(\mathcal{O}_k\), \(M_A (A)\)? You've been hit by, you've been struck by, a polynomial↩︎