
Introduction

Algebra is a branch of mathematics that deals with the study of mathematical symbols and the rules for manipulating them. It is a broad area that covers topics such as equations, functions, polynomials, matrices, and vectors, and it is used extensively in fields such as science, engineering, economics, and computer science. Its fundamental concepts include operations such as addition, subtraction, multiplication, and division, as well as properties of numbers and variables.

The basics

Linear algebra lets us write a system of linear equations compactly in matrix notation. For example, the system

$4x_1 - 5x_2 = -13$
$-2x_1 + 3x_2 = 9$

can be written as $Ax = b$, with

$A = \begin{bmatrix} 4 & -5 \\ -2 & 3 \end{bmatrix}$ a matrix

and $b = \begin{bmatrix} -13 \\ 9 \end{bmatrix}$ a column vector.

We denote by $a_{ij}$ the entry of $A$ in the $i$th row and $j$th column.

Given this matrix notation, the product of two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ is $C = AB \in \mathbb{R}^{m \times p}$, where $C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$.
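
As a quick sanity check (my own small numpy sketch with arbitrary matrices, not part of the original notes), the entry-wise sum definition agrees with the built-in matrix product:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # A is 2 x 3
B = np.array([[1., 0.],
              [0., 1.],
              [2., 2.]])            # B is 3 x 2

# Build C entry by entry using C_ij = sum_k A_ik * B_kj
C = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(3))

print(np.allclose(C, A @ B))        # True: the explicit sum matches A @ B
```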

Similarly, the inner product (or dot product) of two vectors $x, y \in \mathbb{R}^n$ is the quantity

$x^T y \in \mathbb{R} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \sum_{i=1}^{n} x_i y_i$

Note that $x^T y = y^T x$.

For the outer product of two vectors $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, the result is a matrix $x y^T \in \mathbb{R}^{m \times n}$ with $(x y^T)_{ij} = x_i y_j$:

$x y^T = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{bmatrix}$
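
A small numpy sketch (my own illustration with made-up vectors) of the inner and outer products:

```python
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])

inner = x @ y                  # x^T y, a scalar: 1*4 + 2*5 + 3*6 = 32
outer = np.outer(x, y)         # x y^T, a 3 x 3 matrix with (i, j) entry x_i * y_j

print(inner)                   # 32.0
print(outer.shape)             # (3, 3)
print(np.isclose(inner, y @ x))  # True: x^T y = y^T x
```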

There are some properties we need to remember (a quick numerical check follows the list):

  • Matrix multiplication is associative: $(AB)C = A(BC)$

  • Matrix multiplication is distributive: $A(B+C) = AB + AC$

  • Matrix multiplication is generally not commutative: $AB \neq BA$
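
These identities are easy to spot-check numerically; the sketch below (my own, using arbitrary random matrices) verifies the first two and shows a non-commuting pair:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.random((3, 3)), rng.random((3, 3)), rng.random((3, 3))

print(np.allclose((A @ B) @ C, A @ (B @ C)))    # True  (associativity)
print(np.allclose(A @ (B + C), A @ B + A @ C))  # True  (distributivity)
print(np.allclose(A @ B, B @ A))                # False (generally not commutative)
```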

Definitions

The identity matrix, denoted $I \in \mathbb{R}^{n \times n}$, is a square matrix with 1s on the diagonal and 0s everywhere else:

$I_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$

The identity matrix satisfies, for all $A \in \mathbb{R}^{m \times n}$: $AI = A = IA$.

A diagonal matrix is a matrix where all non-diagonal elements are 0. It is denoted $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$, with $D_{ij} = \begin{cases} d_i & i = j \\ 0 & i \neq j \end{cases}$. In particular, $I = \mathrm{diag}(1, 1, \ldots, 1)$.
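
A tiny numpy illustration (mine, with an arbitrary example matrix) of identity and diagonal matrices:

```python
import numpy as np

I = np.eye(3)                                 # 3 x 3 identity matrix
D = np.diag([1., 2., 3.])                     # diag(1, 2, 3)
A = np.arange(6, dtype=float).reshape(2, 3)   # an arbitrary 2 x 3 matrix

print(np.allclose(A @ I, A))                  # A I = A (here I is 3 x 3)
print(np.allclose(np.eye(2) @ A, A))          # I A = A (here I is 2 x 2)
print(np.allclose(np.diag([1., 1., 1.]), I))  # I = diag(1, 1, 1)
```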

In matrix algebra, transposing flips the rows and columns of a matrix. Given a matrix $A \in \mathbb{R}^{m \times n}$, its transpose $A^T \in \mathbb{R}^{n \times m}$ is given by $(A^T)_{ij} = A_{ji}$. Some properties of transposes are:

  • $(A^T)^T = A$
  • $(AB)^T = B^T A^T$
  • $(A + B)^T = A^T + B^T$

A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$. It is anti-symmetric if $A = -A^T$. For any $A \in \mathbb{R}^{n \times n}$, $A + A^T$ is symmetric and $A - A^T$ is anti-symmetric. Hence, any square matrix $A \in \mathbb{R}^{n \times n}$ can be represented as the sum of a symmetric matrix and an anti-symmetric matrix, since $A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$.
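
A small numpy sketch (my own, with a random matrix) of this symmetric/anti-symmetric split:

```python
import numpy as np

A = np.random.default_rng(1).random((4, 4))
S = 0.5 * (A + A.T)            # symmetric part:      S = S^T
K = 0.5 * (A - A.T)            # anti-symmetric part: K = -K^T

print(np.allclose(S, S.T))     # True
print(np.allclose(K, -K.T))    # True
print(np.allclose(S + K, A))   # True: A is the sum of the two parts
```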

Let’s denote the set of all symmetric matrices of size $n$ as $\mathbb{S}^n$, and look at their different properties.

The trace of a square matrix $A \in \mathbb{R}^{n \times n}$ is the sum of its diagonal elements, $\mathrm{tr}(A) = \sum_{i=1}^{n} A_{ii}$. The trace has the following properties (a numerical check of the cyclic property follows the list):

  • For $A \in \mathbb{R}^{n \times n}$, $\mathrm{tr}(A) = \mathrm{tr}(A^T)$

  • For $A, B \in \mathbb{R}^{n \times n}$, $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$

  • For $A \in \mathbb{R}^{n \times n}$, $t \in \mathbb{R}$, $\mathrm{tr}(tA) = t\,\mathrm{tr}(A)$

  • For $A$, $B$ such that $AB$ is square, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$

  • For $A$, $B$, $C$ such that $ABC$ is square, $\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$
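
A quick numerical spot-check of the cyclic property (my own sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((2, 3))
B = rng.random((3, 4))
C = rng.random((4, 2))

# ABC is 2 x 2, BCA is 3 x 3, CAB is 4 x 4, yet all three traces agree.
print(np.trace(A @ B @ C), np.trace(B @ C @ A), np.trace(C @ A @ B))
print(np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)))  # True
```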

A norm $\|x\|$ of a vector is a measure of the “length” of that vector. For example, the Euclidean ($\ell_2$) norm is $\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$. Note that $\|x\|_2^2 = x^T x$.

Mathematically, a norm is a function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies:

  • For all $x \in \mathbb{R}^n$, $f(x) \geq 0$ (non-negativity)

  • $f(x) = 0$ if and only if $x = 0$ (definiteness)

  • For all $x \in \mathbb{R}^n$, $t \in \mathbb{R}$, $f(tx) = |t| f(x)$ (homogeneity)

  • For all $x, y \in \mathbb{R}^n$, $f(x + y) \leq f(x) + f(y)$ (triangle inequality)

Apart from the Euclidean norm, we have the $\ell_1$ norm $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ and the $\ell_\infty$ norm $\|x\|_\infty = \max_i |x_i|$.

All of these are members of the family of $\ell_p$ norms, parameterized by a real number $p \geq 1$: $\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$.

For matrices, we have the Frobenius norm:

$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2} = \sqrt{\mathrm{tr}(A^T A)}$
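
A small numpy sketch (my own, with made-up inputs) computing these norms:

```python
import numpy as np

x = np.array([3., -4.])
A = np.array([[1., 2.], [3., 4.]])

print(np.linalg.norm(x))            # l2 norm: sqrt(9 + 16) = 5.0
print(np.linalg.norm(x, 1))         # l1 norm: |3| + |-4| = 7.0
print(np.linalg.norm(x, np.inf))    # l-infinity norm: 4.0
print(np.linalg.norm(A, 'fro'))     # Frobenius norm
print(np.sqrt(np.trace(A.T @ A)))   # same value via sqrt(tr(A^T A))
```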

Properties

Next we study the linear independence of vectors. A set of vectors $\{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^m$ is linearly independent if no vector can be represented as a linear combination of the remaining vectors. Conversely, if some vector can be represented as a linear combination of the remaining vectors, the vectors are linearly dependent, e.g. $x_n = \sum_{i=1}^{n-1} \alpha_i x_i$ for some scalars $\alpha_1, \ldots, \alpha_{n-1}$. The column rank of a matrix $A \in \mathbb{R}^{m \times n}$ is the size of the largest subset of columns of $A$ that is linearly independent; similarly, the row rank is the largest number of linearly independent rows. For any $A \in \mathbb{R}^{m \times n}$ the column rank equals the row rank, so we simply call it $\mathrm{rank}(A)$. Here are some properties of the rank (a small numerical check follows the list):

  • For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) \leq \min(m, n)$. If equality holds, $A$ is said to be full rank.

  • For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) = \mathrm{rank}(A^T)$

  • For $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, $\mathrm{rank}(AB) \leq \min(\mathrm{rank}(A), \mathrm{rank}(B))$

  • For $A, B \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A + B) \leq \mathrm{rank}(A) + \mathrm{rank}(B)$
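
A quick check with numpy (my own hypothetical example with a dependent row):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],          # row 2 = 2 * row 1 -> linearly dependent
              [0., 1., 1.]])

print(np.linalg.matrix_rank(A))      # 2, so this 3 x 3 matrix is not full rank
print(np.linalg.matrix_rank(A.T))    # 2: rank(A) = rank(A^T)
```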

The inverse of a square matrix $A \in \mathbb{R}^{n \times n}$ is denoted $A^{-1}$ and is the unique matrix such that $A^{-1} A = I = A A^{-1}$. Not all matrices have an inverse: non-square matrices have none, and some square matrices also fail to have one. A non-invertible square matrix is called singular. For a square matrix $A$ to be invertible, it must be full rank. Here are some properties of invertible matrices:

  • $(A^{-1})^{-1} = A$
  • $(AB)^{-1} = B^{-1} A^{-1}$
  • $A^{-T} = (A^{-1})^T = (A^T)^{-1}$

If $A$ is invertible, then the solution $x$ of the system $Ax = b$ is $x = A^{-1} b$.
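
As a sketch (mine), solving the opening example system in numpy; in practice `np.linalg.solve` is preferred over forming $A^{-1}$ explicitly:

```python
import numpy as np

A = np.array([[4., -5.],
              [-2., 3.]])
b = np.array([-13., 9.])

x = np.linalg.inv(A) @ b        # x = A^{-1} b
print(x)                        # [3. 5.]  (so x1 = 3, x2 = 5)
print(np.linalg.solve(A, b))    # same solution, computed more stably
```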

Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if $x^T y = 0$. A vector $x \in \mathbb{R}^n$ is normalized if $\|x\|_2 = 1$. A square matrix $U \in \mathbb{R}^{n \times n}$ is orthogonal if all its columns are orthogonal to each other and are normalized; the columns are then said to be orthonormal. In that case $U^T U = I = U U^T$, so the inverse of an orthogonal matrix is its transpose. If $U$ is not square (a tall matrix with orthonormal columns), only $U^T U = I$ holds, not $U U^T = I$. Multiplying a vector by an orthogonal matrix does not change its Euclidean norm: $\|Ux\|_2 = \|x\|_2$ for all $x \in \mathbb{R}^n$ and orthogonal $U \in \mathbb{R}^{n \times n}$.
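
A sketch (mine) using a 2D rotation matrix, which is orthogonal:

```python
import numpy as np

theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation matrix: orthogonal

x = np.array([3., 4.])
print(np.allclose(U.T @ U, np.eye(2)))            # True: U^T U = I
print(np.allclose(np.linalg.inv(U), U.T))         # True: the inverse is the transpose
print(np.linalg.norm(U @ x), np.linalg.norm(x))   # both 5.0: the norm is preserved
```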

The span of a set of vectors $\{x_1, x_2, \ldots, x_n\}$ is the set of all vectors that can be expressed as a linear combination of them: $\mathrm{span}(\{x_1, \ldots, x_n\}) = \{ v : v = \sum_{i=1}^{n} \alpha_i x_i,\ \alpha_i \in \mathbb{R} \}$. If $\{x_1, \ldots, x_n\}$ is a set of $n$ linearly independent vectors in $\mathbb{R}^n$, then their span is all of $\mathbb{R}^n$. The projection of a vector $y \in \mathbb{R}^m$ onto the span of $\{x_1, \ldots, x_n\}$ (with $x_i \in \mathbb{R}^m$) is the vector $v$ in the span that is closest to $y$ in Euclidean norm:

$\mathrm{Proj}(y; \{x_1, \ldots, x_n\}) = \operatorname*{argmin}_{v \in \mathrm{span}(\{x_1, \ldots, x_n\})} \|y - v\|_2$
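
When the $x_i$ are stacked as the columns of a matrix $X$ with linearly independent columns, this projection has the closed form $X (X^T X)^{-1} X^T y$, which is exactly the least-squares fit; a small sketch of this (my own, with a hypothetical example) is below:

```python
import numpy as np

# Columns of X span a 2-dimensional subspace of R^3.
X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
y = np.array([1., 2., 5.])

# Projection of y onto the span of the columns: X (X^T X)^{-1} X^T y
proj = X @ np.linalg.inv(X.T @ X) @ X.T @ y
print(proj)

# Equivalently via least squares: argmin_alpha ||y - X alpha||_2
alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(proj, X @ alpha))    # True
```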

The range of a matrix $A \in \mathbb{R}^{m \times n}$ is the span of the columns of $A$:

$\mathcal{R}(A) = \{ v \in \mathbb{R}^m : v = Ax,\ x \in \mathbb{R}^n \}$

The nullspace of a matrix $A \in \mathbb{R}^{m \times n}$ is the set of all vectors that equal 0 when multiplied by $A$:

$\mathcal{N}(A) = \{ x \in \mathbb{R}^n : Ax = 0 \}$

The determinant of a square matrix $A \in \mathbb{R}^{n \times n}$ is a function $\det : \mathbb{R}^{n \times n} \to \mathbb{R}$, also written $|A|$. It can be computed recursively by cofactor expansion:

$|A| = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} |A_{\setminus i, \setminus j}|$ (for any $j \in \{1, 2, \ldots, n\}$)

$|A| = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} |A_{\setminus i, \setminus j}|$ (for any $i \in \{1, 2, \ldots, n\}$)

where $A_{\setminus i, \setminus j} \in \mathbb{R}^{(n-1) \times (n-1)}$ denotes the matrix $A$ with the $i$th row and $j$th column removed.

For the $2 \times 2$ matrix

$A = \begin{bmatrix} 1 & 3 \\ 3 & 2 \end{bmatrix}$

the rows are $[1, 3]$ and $[3, 2]$. The absolute value of the determinant of this matrix is the area of the parallelogram spanned by the two row vectors.
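
A quick numerical check (mine): here the determinant is $1 \cdot 2 - 3 \cdot 3 = -7$, so the parallelogram has area 7:

```python
import numpy as np

A = np.array([[1., 3.],
              [3., 2.]])

print(np.linalg.det(A))        # -7.0 (up to floating-point error)
print(abs(np.linalg.det(A)))   # 7.0: the area of the parallelogram
```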


Here are some properties of the determinant:

  • The determinant of the identity is 1: $|I| = 1$

  • If we multiply a single row of $A \in \mathbb{R}^{n \times n}$ by a scalar $t \in \mathbb{R}$, the determinant of the new matrix is $t|A|$

  • If we exchange any two rows of $A$, the determinant of the new matrix is $-|A|$

  • For $A \in \mathbb{R}^{n \times n}$, $|A| = |A^T|$

  • For $A, B \in \mathbb{R}^{n \times n}$, $|AB| = |A| |B|$

  • For $A \in \mathbb{R}^{n \times n}$, $|A| = 0$ if and only if $A$ is singular (it is not full rank and its columns are linearly dependent)

  • For $A \in \mathbb{R}^{n \times n}$ nonsingular, $|A^{-1}| = 1 / |A|$

The (classical) adjoint of a matrix $A \in \mathbb{R}^{n \times n}$ is $\mathrm{adj}(A) \in \mathbb{R}^{n \times n}$, with $(\mathrm{adj}(A))_{ij} = (-1)^{i+j} |A_{\setminus j, \setminus i}|$. For any nonsingular $A \in \mathbb{R}^{n \times n}$, $A^{-1} = \frac{1}{|A|} \mathrm{adj}(A)$.

Given a square matrix $A \in \mathbb{R}^{n \times n}$ and a vector $x \in \mathbb{R}^n$, the scalar value $x^T A x$ is called a quadratic form:

$x^T A x = \sum_{i=1}^{n} x_i (Ax)_i = \sum_{i=1}^{n} x_i \left( \sum_{j=1}^{n} A_{ij} x_j \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j$

  • A symmetric matrix $A \in \mathbb{S}^n$ is positive definite (PD) if for all non-zero vectors $x \in \mathbb{R}^n$, $x^T A x > 0$

  • A symmetric matrix $A \in \mathbb{S}^n$ is positive semidefinite (PSD) if for all vectors $x \in \mathbb{R}^n$, $x^T A x \geq 0$

  • A symmetric matrix $A \in \mathbb{S}^n$ is negative definite (ND) if for all non-zero $x \in \mathbb{R}^n$, $x^T A x < 0$

  • A symmetric matrix $A \in \mathbb{S}^n$ is negative semidefinite (NSD) if for all $x \in \mathbb{R}^n$, $x^T A x \leq 0$

  • A symmetric matrix $A \in \mathbb{S}^n$ is indefinite (neither positive semidefinite nor negative semidefinite) if there exist $x_1, x_2 \in \mathbb{R}^n$ such that $x_1^T A x_1 > 0$ and $x_2^T A x_2 < 0$

If $A$ is positive definite then $-A$ is negative definite and vice versa; if $A$ is positive semidefinite then $-A$ is negative semidefinite and vice versa; and if $A$ is indefinite then so is $-A$. Positive definite and negative definite matrices are always full rank, and hence invertible.
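
A common way to check definiteness of a symmetric matrix is via its eigenvalues (all positive for PD, all non-negative for PSD); a small sketch (mine, with a made-up matrix) below:

```python
import numpy as np

A = np.array([[2., -1.],
              [-1., 2.]])                 # a symmetric matrix

x = np.array([1., 3.])
print(x @ A @ x)                          # quadratic form x^T A x = 14.0 > 0

eigvals = np.linalg.eigvalsh(A)           # eigenvalues of the symmetric matrix
print(eigvals)                            # [1. 3.]: all positive, so A is positive definite
print(np.all(eigvals > 0))                # True
```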

Eigen decomposition

For a square matrix $A \in \mathbb{R}^{n \times n}$, we say that $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ and $x \in \mathbb{C}^n$ is the corresponding eigenvector if $Ax = \lambda x$, $x \neq 0$. This simply means that when we multiply $A$ by $x$, we get a new vector that points in the same direction as $x$ but is scaled by the factor $\lambda$. For any eigenvector $x \in \mathbb{C}^n$ and scalar $c \in \mathbb{C}$, $A(cx) = cAx = c\lambda x = \lambda(cx)$, so $cx$ is also an eigenvector; for this reason we usually normalize eigenvectors to have length 1. The eigenvalue equation can be rewritten as $(\lambda I - A)x = 0$, $x \neq 0$. This has a non-zero solution $x$ if and only if $(\lambda I - A)$ has a non-empty nullspace, which is the case exactly when $(\lambda I - A)$ is singular, i.e. its determinant is zero: $|(\lambda I - A)| = 0$. This determinant is a polynomial in $\lambda$ of degree $n$, and its $n$ (possibly complex) roots are the $n$ eigenvalues. We then substitute each $\lambda$ back into $(\lambda I - A)x = 0$ to find the corresponding eigenvector. Here are some properties of eigenvalues and eigenvectors:

  • The trace of $A$ is the sum of its eigenvalues: $\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i$

  • The determinant of $A$ is the product of its eigenvalues: $|A| = \prod_{i=1}^{n} \lambda_i$

  • The rank of $A$ equals the number of its non-zero eigenvalues

  • If $A$ is nonsingular, then $1/\lambda_i$ is an eigenvalue of $A^{-1}$ with the associated eigenvector $x_i$

  • The eigenvalues of a diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ are just the diagonal entries $d_1, \ldots, d_n$

For a matrix $A$ with eigenvector matrix $V$ (eigenvectors as columns) and eigenvalue vector $\lambda$, provided $V$ is invertible, we have the eigendecomposition $A = V \mathrm{diag}(\lambda) V^{-1}$. Similar to the eigendecomposition, the singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. It is more broadly applicable, since every real matrix has an SVD. The SVD is $A = U D V^T$; if $A$ is $m \times n$, then $U$ is $m \times m$, $D$ is $m \times n$, and $V$ is $n \times n$.
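
A sketch (mine, on an arbitrary small matrix) of both factorizations in numpy:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])

# Eigendecomposition: columns of V are eigenvectors, lam holds the eigenvalues.
lam, V = np.linalg.eig(A)
print(np.allclose(A, V @ np.diag(lam) @ np.linalg.inv(V)))   # True: A = V diag(lam) V^{-1}

# SVD: A = U D V^T (numpy returns the singular values s as a vector).
U, s, Vt = np.linalg.svd(A)
print(np.allclose(A, U @ np.diag(s) @ Vt))                   # True
```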

For matrices that are not invertible, there is the Moore–Penrose pseudoinverse: $A^+ = \lim_{\alpha \to 0} (A^T A + \alpha I)^{-1} A^T$. In practice it is computed as $A^+ = V D^+ U^T$, where $U$, $D$, and $V$ come from the SVD of $A$. The pseudoinverse $D^+$ of the diagonal matrix $D$ is obtained by taking the reciprocal of its nonzero elements and then transposing the resulting matrix.
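
A sketch (mine) comparing `np.linalg.pinv` with the SVD construction for a hypothetical non-square matrix whose singular values are all nonzero:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])                         # 3 x 2, not invertible

A_pinv = np.linalg.pinv(A)                       # Moore-Penrose pseudoinverse, 2 x 3

# Rebuild it from the SVD: A+ = V D+ U^T
U, s, Vt = np.linalg.svd(A, full_matrices=True)  # U: 3x3, s: length 2, Vt: 2x2
D_plus = np.zeros((2, 3))                        # D+ has the transposed shape of D
D_plus[:2, :2] = np.diag(1.0 / s)                # reciprocals of the nonzero singular values
print(np.allclose(A_pinv, Vt.T @ D_plus @ U.T))  # True
```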

Matrix calculus

Let $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ be a function that takes as input a matrix $A$ and returns a real value. Then the gradient of $f$ with respect to $A$ is the matrix of partial derivatives:

$\nabla_A f(A) \in \mathbb{R}^{m \times n} = \begin{bmatrix} \frac{\partial f(A)}{\partial A_{11}} & \cdots & \frac{\partial f(A)}{\partial A_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f(A)}{\partial A_{m1}} & \cdots & \frac{\partial f(A)}{\partial A_{mn}} \end{bmatrix}$

The Hessian of a function $f : \mathbb{R}^n \to \mathbb{R}$ is the matrix of its second-order partial derivatives:

$\nabla_x^2 f(x) \in \mathbb{R}^{n \times n} = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{bmatrix}$
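
As a sketch (mine), for the quadratic form $f(x) = x^T A x$ with symmetric $A$, the gradient is $2Ax$ and the Hessian is the constant matrix $2A$; a finite-difference check in numpy with a made-up $A$ and point $x_0$:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])                       # symmetric
f = lambda x: x @ A @ x                        # f(x) = x^T A x

x0 = np.array([1., -2.])
eps = 1e-6

# Central finite-difference estimate of the gradient of f at x0
grad = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                 for e in np.eye(2)])

print(np.allclose(grad, 2 * A @ x0, atol=1e-4))  # True: gradient of x^T A x is 2 A x
# The Hessian of x^T A x is the constant matrix 2A.
```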