Linear Algebra


Notes on linear algebra

Bases

A basis for a vector space $V$ is a set of linearly independent vectors $\{ \vec{v_1}, \vec{v_2}, ..., \vec{v_n} \}$ such that $span(\{ \vec{v_1}, \vec{v_2}, ..., \vec{v_n} \}) = V$. A vector space may have more than one basis, but all bases for a vector space have the same number of elements. Any vector of a vector space can be expressed as a linear combination of the vectors in a basis of that space.

The basis of a vector can be explicitly denoted using subscript notation. For example, if $\vec{v}$ is expressed in terms of basis $B$, then we write $[\vec{v}]_{B}$.

More formally, given a vector space $V$, the set $B$ is a basis for $V$ if $span(B) = V$ and removing any element from $B$ stops it from spanning $V$: $\forall b \in \wp(B) \setminus \{ \emptyset \} \bullet span(B \setminus b) \neq V$. A basis $B$ for $V$ is also maximal, which means that adding any new element to $B$ makes it a linearly dependent vector set.
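
As a quick computational check (a minimal numpy sketch with made-up vectors, not part of the original notes): a set of $n$ vectors is a basis for $\mathbb{R}^n$ exactly when the matrix having them as columns has full rank, and the coordinates $[\vec{v}]_B$ of a vector in that basis are the solution of a linear system.

```python
import numpy as np

# Candidate basis vectors for R^3 (made-up example).
b1, b2, b3 = np.array([1., 1., 0.]), np.array([0., 1., 1.]), np.array([0., 0., 1.])
B = np.column_stack([b1, b2, b3])      # basis vectors as columns

# The set is a basis iff the matrix has full rank (linear independence + span).
assert np.linalg.matrix_rank(B) == 3

# Coordinates of v in basis B: solve B @ coords = v.
v = np.array([2., 3., 4.])
coords = np.linalg.solve(B, v)          # this is [v]_B
print(coords)                           # [2. 1. 3.]

# v is recovered as the corresponding linear combination of the basis vectors.
assert np.allclose(coords[0] * b1 + coords[1] * b2 + coords[2] * b3, v)
```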

Orthogonal and Orthonormal Bases

Two vectors $\vec{u}$ and $\vec{v}$ are orthogonal if $\langle \vec{u}, \vec{v} \rangle = 0$. Given a vector space $V$, a basis $\{ e_1, e_2, ..., e_n \}$ is an orthogonal basis for $V$ if each $e_i$ is orthogonal to all the other vectors in the basis: $\forall e_i, e_j \in \{ e_1, e_2, ..., e_n \} \mid e_i \neq e_j \bullet \langle e_i, e_j \rangle = 0$.

The unit vector of a vector $v$ is $\hat{v} = \frac{v}{\parallel v \parallel}$. The set $\{ \hat{e}_1, \hat{e}_2, ..., \hat{e}_n \}$ is an orthonormal basis for $V$ if every $\hat{e}_i$ is the unit vector of a vector $e_i$ in an orthogonal basis for $V$. A vector $\vec{v}$ can be expressed in terms of an orthonormal basis $\{ \hat{e}_1, \hat{e}_2, ..., \hat{e}_n \}$ as $\vec{v} = \langle \vec{v}, \hat{e}_1 \rangle \hat{e}_1 + \langle \vec{v}, \hat{e}_2 \rangle \hat{e}_2 + ... + \langle \vec{v}, \hat{e}_n \rangle \hat{e}_n$.

The Gram–Schmidt process allows us to obtain an orthogonal basis $\{ e_1, e_2, ..., e_n \}$ given any basis $\{ v_1, v_2, ..., v_n \}$. If we have an orthogonal basis, it is trivial to obtain an orthonormal basis by calculating the unit vector of each vector in the orthogonal basis. The process is defined as $e_n = v_n - \sum_{i=1}^{n - 1} \langle \hat{e_i}, v_n \rangle \hat{e_i}$: each new vector is the original vector minus its projections onto the previously computed (normalized) vectors.
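
A small numpy sketch of the process (an added illustration with made-up vectors, assuming the formula above): orthogonalize a basis, normalize it, and then recover a vector from its inner products with the orthonormal basis, as in the expansion $\vec{v} = \sum_i \langle \vec{v}, \hat{e}_i \rangle \hat{e}_i$.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis spanning the same space as `vectors`.

    Each new vector is the original vector minus its projections onto the
    previously computed orthonormal vectors, then normalized.
    """
    basis = []
    for v in vectors:
        w = v - sum(np.dot(e, v) * e for e in basis)  # subtract projections
        basis.append(w / np.linalg.norm(w))           # normalize
    return basis

# A basis of R^3 that is neither orthogonal nor normalized (made up).
v1, v2, v3 = np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])
e1, e2, e3 = gram_schmidt([v1, v2, v3])

# The result is orthonormal: pairwise inner products are 0, norms are 1.
assert abs(np.dot(e1, e2)) < 1e-12 and abs(np.linalg.norm(e3) - 1) < 1e-12

# Any vector can be expanded as v = <v,e1>e1 + <v,e2>e2 + <v,e3>e3.
v = np.array([3., -2., 5.])
reconstructed = sum(np.dot(v, e) * e for e in (e1, e2, e3))
assert np.allclose(v, reconstructed)
```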

Matrix Bases

Given a matrix $A$, we can find a basis for its null space $\mathcal{N}(A)$ as follows.

We can set up the system of equations $rref(A) \cdot \vec{x} = \vec{0}$. If $A$ is an $n \times m$ matrix, then $\vec{x}$ contains $m$ elements.

For example, given $\begin{bmatrix}1 & 2 & 0 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 0\end{bmatrix}$, then the equation is $\begin{bmatrix}1 & 2 & 0 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 0\end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix}0 \\ 0 \\ 0 \end{bmatrix}$.

The goal is to re-express $\vec{x}$ so that the elements corresponding to columns in $A$ with pivots are expressed in terms of the elements corresponding to columns without pivots.

In our example, the columns of $A$ with pivots are the first and the third one, so we need to express $x_1$ and $x_3$ in terms of $x_2$ and $x_4$. Expanding the equations gives us $x_1 + 2x_2 = 0$ and $x_3 - 3x_4 = 0$, so $x_1 = -2x_2$ and $x_3 = 3x_4$.

So we now know that $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix}-2x_2 \\ x_2 \\ 3x_4 \\ x_4\end{bmatrix}$.

Finally, we can express $\vec{x}$ as a linear combination over the terms corresponding to columns without pivots (the free variables). The coefficient vectors of that linear combination are a basis for $\mathcal{N}(A)$.

In our example, $\begin{bmatrix}-2x_2 \\ x_2 \\ 3x_4 \\ x_4\end{bmatrix} = \begin{bmatrix}-2 \\ 1 \\ 0 \\ 0\end{bmatrix}x_2 + \begin{bmatrix}0 \\ 0 \\ 3 \\ 1\end{bmatrix}x_4$, so the basis is $\{ \begin{bmatrix}-2 \\ 1 \\ 0 \\ 0\end{bmatrix}, \begin{bmatrix}0 \\ 0 \\ 3 \\ 1\end{bmatrix} \}$.
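
The same computation can be checked mechanically; this is a sketch using sympy (my choice of tool, not something the notes prescribe):

```python
from sympy import Matrix

A = Matrix([[1, 2, 0, 0],
            [0, 0, 1, -3],
            [0, 0, 0, 0]])

# rref() returns the reduced row echelon form and the pivot column indices.
R, pivots = A.rref()
print(pivots)        # (0, 2): pivots in the first and third columns

# nullspace() returns a basis of N(A), one vector per non-pivot column.
for vec in A.nullspace():
    print(vec.T)     # [-2, 1, 0, 0] and [0, 0, 3, 1]
```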

Change of Basis

The identity transformation matrix changes the basis in which a vector is expressed to another basis of the same vector space. The identity transformation matrix that changes from basis $B_1$ to basis $B_2$ is ${}_{B_2}[\mathbb{1}]_{B_1}$. Notice an identity transformation matrix is not equal to the identity matrix $\mathbb{1}$, even though they re-use the same symbol.

Given $B_1 = \{ e_1, e_2, e_3 \}$ and $B_2 = \{ t_1, t_2, t_3 \}$, where the vectors of $B_2$ are orthonormal (so that coordinates with respect to $B_2$ are obtained with dot products), ${}_{B_2}[\mathbb{1}]_{B_1}$ consists of all the dot products $t_i \cdot e_j$:

$${}_{B_2}[\mathbb{1}]_{B_1} = \begin{bmatrix} t_1 \cdot e_1 & t_1 \cdot e_2 & t_1 \cdot e_3 \\ t_2 \cdot e_1 & t_2 \cdot e_2 & t_2 \cdot e_3 \\ t_3 \cdot e_1 & t_3 \cdot e_2 & t_3 \cdot e_3 \end{bmatrix}$$

Notice that $({}_{B_2}[\mathbb{1}]_{B_1})^{-1} = {}_{B_1}[\mathbb{1}]_{B_2}$.
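
A numpy sketch of these identities (the bases here are made up; $B_2$ is taken to be orthonormal so that the dot-product formula applies):

```python
import numpy as np

E = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]]).T          # columns e1, e2, e3 (basis B1)
theta = 0.3
T = np.array([[np.cos(theta), -np.sin(theta), 0.],
              [np.sin(theta),  np.cos(theta), 0.],
              [0.,             0.,            1.]])  # columns t1, t2, t3 (orthonormal basis B2)

# Entry (i, j) is t_i . e_j, as in the note; T.T equals inv(T) for an orthonormal T.
M_21 = T.T @ E                                       # B2[1]B1
assert np.allclose(M_21, np.linalg.inv(T) @ E)

# Changing basis there and back is the identity: (B2[1]B1)^{-1} = B1[1]B2.
M_12 = np.linalg.inv(M_21)
c_B1 = np.array([2., -1., 0.5])                      # coordinates of some vector in B1
assert np.allclose(M_12 @ (M_21 @ c_B1), c_B1)
```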

Linear Independence

A set of vectors $\{ \vec{v_1}, ..., \vec{v_n} \}$ is linearly independent if the only solution to the equation $\alpha_1 \vec{v_1} + ... + \alpha_n \vec{v_n} = \vec{0}$ is $\alpha_i = 0$ for all $i$.

We can also say a set $V = \{ \vec{v_1}, ..., \vec{v_n} \}$ where no $v_i$ is the zero vector is linearly independent if no vector from the set is in the span of the other vectors: $\forall v \in V \bullet v \notin span(V \setminus \{ v \})$.

Another way to express that the set of vectors $V$ is linearly independent is that every vector in $span(V)$ has a unique expression as a linear combination of vectors in $V$.

The rows of a square matrix $A$ are linearly independent if and only if $det(A) \neq 0$.
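
A quick numerical check of this criterion (a numpy sketch with made-up matrices):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 9.]])
print(np.linalg.det(A))   # ~3.0, non-zero, so the rows are independent

B = np.array([[1., 2.],
              [2., 4.]])   # second row is twice the first
print(np.linalg.det(B))   # ~0.0, so the rows are dependent
# For non-square matrices, compare np.linalg.matrix_rank(B) to the row count instead.
```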

Linear Combinations

A linear combination is an algebraic expression consisting of a sum of terms and constants of the form $a_1 x_1 + a_2 x_2 + ... + a_n x_n$, where the $x_i$ are terms (with exponent 1) and the $a_i$ are their corresponding constants. Linear combinations are degree-1 polynomials.

In a linear combination $a_1 x_1 + a_2 x_2 + ... + a_n x_n$, the $x_i$ terms correspond to the basis in which we are expressing the linear combination.

Span

The span of a set of vectors is the set of all vectors that can be constructed as linear combinations of those vectors.

Consider that given $\vec{v_1}$, $\vec{v_2}$, and $\vec{v_3} = \vec{v_1} + \vec{v_2}$, then $span(\{ \vec{v_1}, \vec{v_2}, \vec{v_3} \}) = span(\{ \vec{v_1}, \vec{v_2} \})$, as $\vec{v_3}$ can be expressed as a linear combination of the other two.

A set of vectors $V$ is spanned by $\{ \vec{v_1}, ..., \vec{v_n} \}$ if any vector in $V$ can be expressed as a linear combination of the vectors in $\{ \vec{v_1}, ..., \vec{v_n}\}$.

Vector Spaces

A vector space is a set that consists of a number of linearly independent vectors and all linear combinations of those vectors. Vector spaces must be closed under addition and scalar multiplication, which means that, given a vector space $V$, for all $\vec{u}, \vec{v} \in V$ and any scalar $\alpha$, both $\vec{u} + \vec{v} \in V$ and $\alpha \vec{v} \in V$.

An abstract vector space is defined as $(V, F, +, \cdot)$, where $V$ is a set of vectors, $F$ is a field of scalars, $+ : V \times V \mapsto V$ is vector addition, and $\cdot : F \times V \mapsto V$ is scalar multiplication.

The addition function $+$ and the set $V$ must have the following properties: addition is associative and commutative, $V$ contains a zero vector $\vec{0}$ that acts as the additive identity, and every $\vec{v} \in V$ has an additive inverse $-\vec{v} \in V$.

The scalar function $\cdot$ and the set $F$ must have the following properties: scalar multiplication is compatible with field multiplication ($\alpha (\beta \vec{v}) = (\alpha \beta) \vec{v}$), the multiplicative identity of $F$ satisfies $1 \vec{v} = \vec{v}$, and scalar multiplication distributes over both vector addition ($\alpha (\vec{u} + \vec{v}) = \alpha \vec{u} + \alpha \vec{v}$) and field addition ($(\alpha + \beta) \vec{v} = \alpha \vec{v} + \beta \vec{v}$).

A vector space may additionally define an inner product $\langle \cdot, \cdot \rangle : V \times V \mapsto \mathbb{R}$, a function that is symmetric ($\langle \vec{u}, \vec{v} \rangle = \langle \vec{v}, \vec{u} \rangle$), linear in its arguments, and positive-definite ($\langle \vec{v}, \vec{v} \rangle \geq 0$, with equality only for the zero vector).

Defining an inner product automatically defines the length/norm operator $\parallel \vec{v} \parallel = \sqrt{\langle \vec{v}, \vec{v} \rangle}$ and the distance operator $d(\vec{u}, \vec{v}) = \parallel \vec{u} - \vec{v} \parallel = \sqrt{\langle (\vec{u} - \vec{v}), (\vec{u} - \vec{v}) \rangle}$. Given a valid inner product definition, both operators satisfy the usual norm and metric properties (non-negativity, being zero only for the zero vector or for identical arguments, and the triangle inequality).
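
A small numpy check that the induced norm and distance match the formulas above (the vectors are made up; `np.linalg.norm` is used only for comparison):

```python
import numpy as np

u, v = np.array([1., 2., 2.]), np.array([4., 0., 2.])

norm_v = np.sqrt(np.dot(v, v))           # ||v|| = sqrt(<v, v>)
dist = np.sqrt(np.dot(u - v, u - v))     # d(u, v) = ||u - v||

assert np.isclose(norm_v, np.linalg.norm(v))
assert np.isclose(dist, np.linalg.norm(u - v))
```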

Subspaces

A vector space $W \subseteq V$ is a vector subspace of $V$ if it is closed under addition and scalar multiplication: for all $\vec{w_1}, \vec{w_2} \in W$ and any scalar $\alpha$, both $\vec{w_1} + \vec{w_2} \in W$ and $\alpha \vec{w_1} \in W$.

Notice vector subspaces always contain the zero vector, as in order for a vector space to be closed under scalar multiplication, it must hold that $\forall \vec{w} \in W \bullet 0\vec{w} \in W$ for the scalar zero, and multiplication by the scalar zero always yields the zero vector.

One way to define a vector subspace is to constrain a larger vector space. Given $\mathbb{R}^3$, we can define a two-dimensional vector subspace as $\{ (x, y, z) \in \mathbb{R}^3 \mid (0, 0, 1) \cdot (x, y, z) = 0 \}$. Another way is to define vector subspaces using $span$. The same two-dimensional vector subspace of $\mathbb{R}^3$ is also defined as $span(\{ (1, 0, 0), (0, 1, 0)\})$.

Orthogonal Complement

Given vector space $Q$ and a vector subspace $P \subseteq Q$, $P^{\perp}$ is the orthogonal complement of $P$ in vector space $Q$, defined as: $P^{\perp} = \{ \vec{q} \in Q \mid \forall \vec{p} \in P \bullet \vec{q} \cdot \vec{p} = 0 \}$.

Dimension

The dimension of a vector space $S$, denoted $dim(S)$, is the cardinality (number of elements) of a basis of $S$. Every possible basis of a vector space has the same number of elements, so the dimension is well defined.

The following laws hold given an $n \times m$ dimensional matrix $M$:

Zero Vector

The zero vector $\vec{0}$ of a vector space $V$ is a vector such that $\forall \vec{x} \in V \bullet \vec{x} + \vec{0} = \vec{0} + \vec{x} = \vec{x}$.

Linear Transformations (or Maps)

A linear transformation (also called a linear map or linear function) is a function that maps vectors to vectors and preserves the following property, for a function $f$ and a linear combination $\alpha \vec{x_1} + \beta \vec{x_2}$:

$$f(\alpha \vec{x_1} + \beta \vec{x_2}) = \alpha f(\vec{x_1}) + \beta f(\vec{x_2})$$

Which in turn implies that $f$ preserves addition ($f(\vec{x_1} + \vec{x_2}) = f(\vec{x_1}) + f(\vec{x_2})$), preserves scalar multiplication ($f(\alpha \vec{x_1}) = \alpha f(\vec{x_1})$), and maps the zero vector to the zero vector.

Given a linear transformation $f$ that maps an $n$-dimensional vector space to an $m$-dimensional vector space, if $f$ is a bijective function then $n = m$, and $f$ is a one-to-one correspondence between the two vector spaces.

Consider a linear transformation $f : V \mapsto W$ and $\vec{v_1}, \vec{v_2} \in V$. If we know that $f(\vec{v_1}) = \vec{w_1}$ and that $f(\vec{v_2}) = \vec{w_2}$, then $f(\alpha \vec{v_1} + \beta \vec{v_2}) = \alpha \vec{w_1} + \beta \vec{w_2}$, which means we know how $f$ behaves for any linear combination of $\vec{v_1}$ and $\vec{v_2}$. This is important because if we know how a linear transformation behaves on a basis of a vector space, then we know how it behaves on the whole vector space.
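
A numpy sketch of this point (the map and the vectors are made-up examples): once we know $f$ on two basis vectors, linearity fixes it on every linear combination of them.

```python
import numpy as np

# A linear map f given by a matrix, and two basis vectors of R^2.
M = np.array([[2., 1.],
              [0., 3.]])
f = lambda x: M @ x

v1, v2 = np.array([1., 0.]), np.array([1., 1.])
w1, w2 = f(v1), f(v2)                     # how f behaves on the basis

# Linearity: f(a*v1 + b*v2) is determined by w1 and w2 alone.
a, b = 4.0, -2.5
assert np.allclose(f(a * v1 + b * v2), a * w1 + b * w2)
```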

Kernel

The kernel of a linear transformation $t : V \mapsto W$ is the set of vectors from $V$ that map to the zero vector: $Ker(t) = \{ \vec{v} \in V \mid t(\vec{v}) = \vec{0} \}$. Notice that if $Ker(t) = \{ \vec{0} \}$, then $t$ is an injective function.

Image Space

The image space of a linear transformation $t : V \mapsto W$, denoted $Im(t)$, is the range of $t$: the set of vectors from $W$ that the function can produce. Notice that if $Im(t) = W$, then $t$ is a surjective function.

Matrix Representation

If we have a linear transformation $f : V \mapsto W$, a basis $B_v$ for $V$, and a basis $B_w$ for $W$, then we can express $f$ as a matrix $M_{f}$ such that applying the transformation to a vector is equivalent to multiplying the matrix by the vector: given $\vec{v} \in V$ and $\vec{w} \in W$, then $f(\vec{v}) = \vec{w} \iff M_{f} \vec{v} = \vec{w}$. Notice the matrix is not the linear transformation itself, but a representation of the linear transformation with respect to certain bases.

Any linear transformation that maps an $n$-dimensional vector space to an $m$-dimensional vector space can be represented as an $m \times n$ matrix.

Matrix $M_f$ “takes” a vector in basis $B_v$ and outputs a vector in basis $B_w$, so it is sometimes more explicitly denoted as ${}_{B_w}[M_f]_{B_v}$, writing the input basis on the right and the output basis on the left.

Correspondences between linear transformations and their matrix representations, given $\vec{v}$, $f$, $s$, $M_f$, and $M_s$: applying a transformation corresponds to matrix-vector multiplication ($f(\vec{v}) \leftrightarrow M_f \vec{v}$), composing transformations corresponds to matrix multiplication ($s \circ f \leftrightarrow M_s M_f$), and inverting a transformation corresponds to the matrix inverse ($f^{-1} \leftrightarrow M_f^{-1}$).

In order to find the matrix representation of a linear transformation with respect to a basis, apply the linear transformation to all the vectors in the chosen basis and use the results, written in coordinates of the output basis, as the columns of the matrix representation.

For example, consider $\mathbb{R}^3$, the basis $\{ (0, 0, 1), (0, 1, 0), (1, 0, 0) \}$, and the linear transformation $f((x, y, z)) = (x, y, 0)$. Writing the outputs in the standard basis, $M_f = \begin{bmatrix}0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0\end{bmatrix}$, as $f((0, 0, 1)) = (0, 0, 0)$, $f((0, 1, 0)) = (0, 1, 0)$, and $f((1, 0, 0)) = (1, 0, 0)$.
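
The worked example can be reproduced directly; a small numpy sketch of the column-by-column construction:

```python
import numpy as np

f = lambda v: np.array([v[0], v[1], 0.0])     # projection onto the xy-plane

basis = [np.array([0., 0., 1.]),
         np.array([0., 1., 0.]),
         np.array([1., 0., 0.])]

# Apply f to each basis vector and use the results (here written in the
# standard basis) as the columns of the matrix representation.
M_f = np.column_stack([f(b) for b in basis])
print(M_f)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 0.]]
```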

We can express a matrix transformation in terms of different bases by surrounding it with the corresponding identity transformation matrices. For example, given ${}_{B_v}[M_f]_{B_v}$, then ${}_{B_w}[M_f]_{B_w} = {}_{B_w}[\mathbb{1}]_{B_v} {}_{B_v}[M_f]_{B_v} {}_{B_v}[\mathbb{1}]_{B_w}$, which means we first change the input vector to the transformation's original basis, apply the transformation, and then change the result to the new basis.

Inverse

A linear transformation $f : V \mapsto W$ is invertible if it is bijective, that is, both injective (which implies $Ker(f) = \{ \vec{0} \}$) and surjective (which implies $Im(f) = W$); for finite-dimensional spaces of the same dimension, either condition implies the other. It is also invertible if $\exists f^{-1} \bullet \forall \vec{v} \bullet f^{-1}(f(\vec{v})) = \vec{v}$. If such a linear transformation is invertible, then its matrix representation $M_f$ is invertible as well, and vice versa.

Given a linear transformation $f$ and its matrix representation $M_f$ in terms of a certain basis, $M_f^{-1}$ corresponds to $f^{-1}$. Notice that given a vector $\vec{v}$, if $M_f$ is invertible, then $M_f^{-1} M_f \vec{v} = \vec{v}$.

Affine Transformations

An affine transformation is a function $q : V \mapsto W$ between vector spaces that combines a linear transformation $t$ with a translation by a fixed vector $\vec{b}$: $q(\vec{x}) = t(\vec{x}) + \vec{b}$, or, given the matrix representation $M_t$, $q(\vec{x}) = M_t \vec{x} + \vec{b}$.
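
A minimal numpy sketch of an affine map (the matrix and translation vector are made up):

```python
import numpy as np

M_t = np.array([[2., 0.],
                [0., 3.]])
b = np.array([1., -1.])
q = lambda x: M_t @ x + b        # linear part followed by a fixed translation

print(q(np.array([1., 1.])))     # [3. 2.]
```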

Systems of Linear Equations

Using RREFs

We can solve a system of $n$ linear equations with $m$ terms by constructing an $n \times (m + 1)$ augmented matrix, where the last column corresponds to the constants at the right of the equals sign, and computing its RREF. The last column of the RREF contains the solution for each corresponding pivot term.

The system of equations has no solutions if it is inconsistent, in which case the RREF of the constructed matrix contains a row whose coefficients are all zero but whose right-most constant is non-zero.

For example, consider $1x + 2y = 5$ and $3x + 9y = 21$. The resulting matrix is $\begin{bmatrix}1 & 2 & 5 \\ 3 & 9 & 21\end{bmatrix}$. Then, $rref(\begin{bmatrix}1 & 2 & 5 \\ 3 & 9 & 21\end{bmatrix}) = \begin{bmatrix}1 & 0 & 1 \\ 0 & 1 & 2\end{bmatrix}$, so the solution set is $x = 1$ and $y = 2$.
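
A sympy sketch of this procedure (sympy is my choice of tool; the notes only describe the RREF method):

```python
from sympy import Matrix

# Augmented matrix for 1x + 2y = 5 and 3x + 9y = 21.
aug = Matrix([[1, 2, 5],
              [3, 9, 21]])
R, pivots = aug.rref()
print(R)       # Matrix([[1, 0, 1], [0, 1, 2]]) -> x = 1, y = 2
```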

Using Inverses

We can solve a system of $n$ linear equations with $n$ terms (the coefficient matrix must be square, otherwise there is no inverse) by expressing it as a matrix equation $A\vec{x} = \vec{b}$: the $n \times n$ matrix $A$ contains the coefficients, the $n$-element vector $\vec{x}$ contains the terms, and the $n$-element vector $\vec{b}$ contains the right-hand-side constants. Using the inverse of $A$, we can re-express $A\vec{x} = \vec{b}$ as $A^{-1} A \vec{x} = A^{-1} \vec{b}$, which in turn equals $\vec{x} = A^{-1} \vec{b}$ as $A^{-1} A = \mathbb{1}$, and then compute $A^{-1} \vec{b}$ to get the solution set.

Notice we multiply $A\vec{x} = \vec{b}$ as $A^{-1} A \vec{x} = A^{-1} \vec{b}$ and not as $A\vec{x} A^{-1} = \vec{b} A^{-1}$ as matrix multiplication is not commutative and $A\vec{x} A^{-1} \neq A^{-1} A \vec{x}$.

For example, consider $1x + 2y = 5$ and $3x + 9y = 21$. The initial matrix equation is $\begin{bmatrix}1 & 2 \\ 3 & 9\end{bmatrix} \begin{bmatrix}x \\ y \end{bmatrix} = \begin{bmatrix}5 \\ 21\end{bmatrix}$. The inverse of the coefficient matrix is $\begin{bmatrix}3 & -\frac{2}{3} \\ -1 & \frac{1}{3}\end{bmatrix}$, so we can re-write our equation as $\begin{bmatrix}x \\ y\end{bmatrix} = \begin{bmatrix}3 & -\frac{2}{3} \\ -1 & \frac{1}{3}\end{bmatrix} \begin{bmatrix}5 \\ 21\end{bmatrix} = \begin{bmatrix}3 \cdot 5 - \frac{2}{3} \cdot 21 \\ -5 + \frac{1}{3} \cdot 21\end{bmatrix} = \begin{bmatrix}1 \\ 2\end{bmatrix}$, so $x = 1$ and $y = 2$.
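
The same computation with numpy (a sketch; forming the explicit inverse mirrors the derivation above, though `np.linalg.solve` is preferable numerically):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 9.]])
b = np.array([5., 21.])

x = np.linalg.inv(A) @ b
print(x)                      # [1. 2.] -> x = 1, y = 2
```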

Using Determinants (Cramer’s Rule)

Given a system of $n$ linear equations with $n$ terms, consider the $n \times n$ coefficient matrix $C$ and the vector $\vec{v}$ of right-hand-side constants. The matrix $C_t$ is the matrix $C$ with the column corresponding to the term $t$ replaced by $\vec{v}$. If the coefficient matrix $C$ is $\begin{bmatrix}c_1 & c_2 \\ c_3 & c_4\end{bmatrix}$ and the term $x$ corresponds to the first column, then $C_x = \begin{bmatrix}v_x & c_2 \\ v_y & c_4\end{bmatrix}$. The value of $t$ is then $\frac{det(C_t)}{det(C)}$.

For example, consider $1x + 2y = 5$ and $3x + 9y = 21$. The coefficient matrix is $\begin{bmatrix}1 & 2 \\ 3 & 9 \end{bmatrix}$ and the constants vector is $\begin{bmatrix}5 \\ 21\end{bmatrix}$. $det(\begin{bmatrix}1 & 2 \\ 3 & 9 \end{bmatrix}) = 3$, so $x = det(\begin{bmatrix} 5 & 2 \\ 21 & 9\end{bmatrix}) \div 3 = 1$ and $y = det(\begin{bmatrix}1 & 5 \\ 3 & 21\end{bmatrix}) \div 3 = 2$.
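
A numpy sketch of Cramer's rule on the same system:

```python
import numpy as np

C = np.array([[1., 2.],
              [3., 9.]])
b = np.array([5., 21.])

det_C = np.linalg.det(C)               # ~3
C_x = C.copy(); C_x[:, 0] = b          # replace the x column with the constants
C_y = C.copy(); C_y[:, 1] = b          # replace the y column with the constants
print(np.linalg.det(C_x) / det_C)      # ~1.0
print(np.linalg.det(C_y) / det_C)      # ~2.0
```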

Eigenvalues and Eigenvectors

The value $\lambda$ is an eigenvalue of $A$ if there exists a non-zero vector $\vec{e}_{\lambda}$ (the corresponding eigenvector of $\lambda$) such that multiplying $A$ by the vector is equal to scaling the vector by the eigenvalue: $A \vec{e}_{\lambda} = \lambda \cdot \vec{e}_{\lambda}$.

The list of eigenvalues of $A$ is denoted $eig(A)$ and consists of the $\lambda_i$ such that $\mathcal{N}(A - \lambda_i \mathbb{1}) \neq \{ \vec{0} \}$. The list of eigenvalues may contain duplicates. A repeated eigenvalue is called degenerate, and its algebraic multiplicity corresponds to the number of times it appears in the list.

In order to find the eigenvectors of a matrix, calculate the eigenspace corresponding to each of the eigenvalues of the matrix.
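
With numpy, eigenvalues and eigenvectors can be obtained directly (a sketch using the same matrix as the eigendecomposition example further below):

```python
import numpy as np

A = np.array([[9., -2.],
              [-2., 6.]])
eigenvalues, eigenvectors = np.linalg.eig(A)   # eigenvectors are the columns
print(eigenvalues)                             # [10.  5.] (order may vary)

# Each column satisfies the eigenvalue equation A e = lambda e.
for lam, e in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ e, lam * e)
```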

Eigenspaces

Given matrix $A$ and eigenvalue $\lambda_i$, then $E_{\lambda_i} = \mathcal{N}(A - \lambda_i \mathbb{1})$ is the eigenspace that corresponds to the eigenvalue $\lambda_i$. Eigenvectors that come from different eigenspaces are guaranteed to be linearly independent.

Every eigenspace contains at least one non-zero eigenvector corresponding to the eigenvalue, and may contain more than one linearly independent eigenvector for degenerate eigenvalues. The number of linearly independent eigenvectors for a single eigenvalue (the dimension of its eigenspace) is its geometric multiplicity. A matrix with a degenerate eigenvalue of algebraic multiplicity $n$ but only $m \lt n$ linearly independent eigenvectors for it has deficient geometric multiplicity.

The null space of a matrix $A$ is called the zero eigenspace, as applying the matrix to any vector from the null space is equivalent to multiplication by zero: $\forall \vec{v} \in \mathcal{N}(A) \bullet A \vec{v} = 0 \vec{v} = \vec{0}$. Notice that the $A \vec{v} = 0 \vec{v}$ part of the expression corresponds to the eigenvalue equation $A \vec{e}_{\lambda} = \lambda \cdot \vec{e}_{\lambda}$ where the eigenvalue is 0 and the vectors in the null space are the eigenvectors.

Characteristic Polynomial

The characteristic polynomial of a matrix $A$ is a single-variable polynomial whose roots are the eigenvalues of $A$; it is defined as $p(\lambda) = det(A - \lambda \mathbb{1})$. Therefore $\lambda$ is an eigenvalue of $A$ if $det(A - \lambda \mathbb{1}) = 0$. If $A$ is an $n \times n$ matrix, then its characteristic polynomial has degree $n$.
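
A numpy sketch (using the same example matrix as below): `np.poly` returns the characteristic polynomial's coefficients and `np.roots` recovers the eigenvalues.

```python
import numpy as np

A = np.array([[9., -2.],
              [-2., 6.]])

coeffs = np.poly(A)        # coefficients of det(lambda*1 - A), highest degree first
print(coeffs)              # [  1. -15.  50.]  i.e. lambda^2 - 15*lambda + 50
print(np.roots(coeffs))    # [10.  5.] -> the eigenvalues of A
```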

Matrices

Notice that a matrix's determinant and trace can be computed from its eigenvalues alone, as $det(A) = \prod_i \lambda_i$ and $Tr(A) = \sum_i \lambda_i$, derived below for a diagonalizable $A = Q \Lambda Q^{-1}$:

$$
\begin{aligned}
det(A) &= det(Q \Lambda Q^{-1}) \\
       &= det(Q) det(\Lambda) det(Q^{-1}) \\
       &= det(Q) det(Q^{-1}) det(\Lambda) \\
       &= \frac{det(Q)}{det(Q)} det(\Lambda) \\
       &= 1 \cdot det(\Lambda) \\
       &= det(\Lambda) \\
       &= \prod_i \lambda_i
\end{aligned}
$$

$$
\begin{aligned}
Tr(A) &= Tr(Q \Lambda Q^{-1}) \\
      &= Tr(\Lambda Q^{-1} Q) \\
      &= Tr(\Lambda \mathbb{1}) \\
      &= Tr(\Lambda) \\
      &= \sum_i \lambda_i
\end{aligned}
$$

The statements $det(A) \neq 0$ and $\mathcal{N}(A) = \{ \vec{0} \}$ are equivalent. We know that $det(A) = \prod_i \lambda_i$, so $det(A) \neq 0$ implies that none of the eigenvalues are zero, otherwise the product would be zero. Because none of the eigenvalues are zero, the only solution to $A\vec{x} = \vec{0}$ is $\vec{0}$, so $\mathcal{N}(A) = \{ \vec{0} \}$.
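
A quick numerical confirmation of the two identities (numpy sketch, same example matrix):

```python
import numpy as np

A = np.array([[9., -2.],
              [-2., 6.]])
eigenvalues = np.linalg.eigvals(A)

assert np.isclose(np.prod(eigenvalues), np.linalg.det(A))   # det = product of eigenvalues
assert np.isclose(np.sum(eigenvalues), np.trace(A))         # trace = sum of eigenvalues
```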

Eigenbases

The diagonal form of a diagonalizable matrix $A$, which contains the eigenvalues of $A$ on the diagonal, corresponds to $A$ expressed in its eigenbasis (its natural basis).

A matrix $Q$ whose columns are the eigenvectors of $A$ is a change-of-basis operation from the eigenbasis of $A$. Therefore $Q^{-1}$ is a change-of-basis operation to the eigenbasis of $A$. Notice that $Q$ may contain the eigenvectors in any order and with any non-zero scaling factor.

Eigendecomposition

Given a linear transformation's matrix representation $A$, the eigendecomposition of $A$ expresses the transformation in the eigenbasis of $A$ using change-of-basis operations.

We can express any diagonalizable transformation $A$ as $Q \Lambda Q^{-1}$, where $Q$ is a change-of-basis matrix containing the eigenvectors of $A$ as columns and $\Lambda$ is $A$ expressed in its eigenbasis (containing the eigenvalues on the diagonal).

Every symmetric matrix $N$ has a corresponding orthogonal matrix $O$ such that its eigendecomposition is $N = O \Lambda O^{T}$, as for orthogonal matrices $O^{T} = O^{-1}$ (more generally, normal matrices are diagonalized by unitary matrices).

Applying the transformation to $\vec{v}$ is equivalent to computing $Q \Lambda Q^{-1} \vec{v}$, which first changes the basis of $\vec{v}$ to the eigenbasis of $A$, applies the transformation, and changes the basis back again.

For example, consider $A = \begin{bmatrix}9 & -2 \\ -2 & 6\end{bmatrix}$. Its eigenvalues are $eig(A) = \{ 5, 10 \}$ and we can find out the corresponding eigenvectors as follows:

$$\mathcal{N}(\begin{bmatrix}9 & -2 \\ -2 & 6\end{bmatrix} - 5 \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}) = \mathcal{N}(\begin{bmatrix}4 & -2 \\ -2 & 1\end{bmatrix}) = span(\{ \begin{bmatrix}\frac{1}{2} \\ 1\end{bmatrix} \})$$

So we know that an eigenvector for 5 is $\begin{bmatrix}\frac{1}{2} \\ 1\end{bmatrix}$ and

$$\mathcal{N}(\begin{bmatrix}9 & -2 \\ -2 & 6\end{bmatrix} - 10 \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}) = \mathcal{N}(\begin{bmatrix}-1 & -2 \\ -2 & -4\end{bmatrix}) = span(\{ \begin{bmatrix}-2 \\ 1\end{bmatrix} \})$$

So an eigenvector for 10 is $\begin{bmatrix}-2 \\ 1\end{bmatrix}$. Therefore we can say that $\begin{bmatrix}9 & -2 \\ -2 & 6\end{bmatrix} = \begin{bmatrix}\frac{1}{2} & -2 \\ 1 & 1\end{bmatrix} \begin{bmatrix}5 & 0 \\ 0 & 10\end{bmatrix} \begin{bmatrix}\frac{2}{5} & \frac{4}{5} \\ -\frac{2}{5} & \frac{1}{5}\end{bmatrix}$.
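
A numpy check of this decomposition (a sketch; the assertions confirm $A = Q \Lambda Q^{-1}$ and the change-of-basis reading of applying $A$):

```python
import numpy as np

A = np.array([[9., -2.],
              [-2., 6.]])

Q = np.array([[0.5, -2.],
              [1.,   1.]])               # eigenvectors as columns
L = np.diag([5., 10.])                    # eigenvalues on the diagonal

# A = Q L Q^{-1}
assert np.allclose(Q @ L @ np.linalg.inv(Q), A)

# Applying A to a vector is the same as: change to the eigenbasis,
# scale by the eigenvalues, change back.
v = np.array([3., 7.])
assert np.allclose(A @ v, Q @ (L @ (np.linalg.inv(Q) @ v)))
```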