Vectors are quantities that have both direction and magnitude. A vector can be as simple as a line segment defined on a 2-D Euclidean plane, or, more generally, a multi-dimensional collection of numerical quantities.
The sum (or difference) of two vectors is the vector of element-wise sums (or differences).
$$ \overrightarrow {a} = \left[ 1, 2, 3 \right] $$$$ \overrightarrow {b} = \left[ 3, 3, 4 \right] $$
$$ \overrightarrow {a}+\overrightarrow {b} = \left[ 1+3, 2+3, 3+4 \right] = \left[ 4, 5, 7 \right].$$$$ \overrightarrow {a}-\overrightarrow {b} = \left[ 1-3, 2-3, 3-4 \right] = \left[ -2, -1, -1 \right].$$The vector sum (or difference) is defined only between vectors of the same dimension.
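As a quick sketch (Python with NumPy is assumed here and in the snippets that follow), the element-wise sum and difference of the example vectors can be computed as:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 3, 4])

# Element-wise sum and difference of two vectors of the same dimension.
print(a + b)  # [4 5 7]
print(a - b)  # [-2 -1 -1]
```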
The inner product (also called the dot product) of two vectors \(\overrightarrow {a} \) and \( \overrightarrow {b}\) is defined as,
$$ \overrightarrow {a} \circ \overrightarrow {b} = \overrightarrow {a}^T \overrightarrow {b} = \sum_{i=1}^n a_i b_i, $$where \(n\) is the length of the vectors \(\overrightarrow {a} \) and \( \overrightarrow {b}\).
The inner product is defined only between vectors of the same dimension.
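A minimal NumPy sketch of the inner product for the same example vectors:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 3, 4])

# Inner (dot) product: sum over element-wise products.
print(np.dot(a, b))   # 1*3 + 2*3 + 3*4 = 21
print(np.sum(a * b))  # same result computed explicitly
```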
The outer product of two vectors \( \overrightarrow {a} \) and \( \overrightarrow {b}\) is the matrix \(C\) with elements \(C_{ij}\) such that
$$ C_{ij} = a_i b_j$$The outer product is helpful in certain applications to reduce computation and storage requirements.
The outer product is defined between any two vectors and places no constraint on the length of either vector.
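A minimal NumPy sketch of the outer product, using vectors of different lengths to illustrate that the lengths need not match:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 3, 4, 5])   # a different length is fine for the outer product

# Outer product: C[i, j] = a[i] * b[j]
C = np.outer(a, b)
print(C.shape)  # (3, 4)
print(C)
```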
A matrix (plural matrices) is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. In control systems, matrices are used to describe how the state vectors (for example position and velocity) evolve and how the control signal influences the state vectors. A matrix with \(n\) rows and \(m\) columns has dimension \( n \times m\) and is represented as \(A_{n \times m}\).
The sum (or difference) of two matrices is the matrix formed by the sum (or difference) of the individual elements, and has the same dimension as the original matrices. For example, the sum of matrices \( A = a_{i,j}\) and \( B = b_{i,j}\) is
$$ C =c_{i,j} = a_{i,j} + b_{i,j} . $$The matrix sum (or difference) is defined only between matrices of the same dimension.
Product of a matrix with a scalar is equal to the matrix formed by multiplying each element of the matrix by the scalar. Therefore,
$$ \lambda A = \lambda a_{i,j} $$The product \( A B \) of two matrices \(A_{n \times p}\) and \(B_{p \times m}\) is defined as,
$$ C_{n \times m} = A_{n \times p} B_{p \times m}, \quad c_{i,j} = \sum_{k=1}^{p} a_{i,k} b_{k,j}. $$The matrix product is defined only if the number of columns in \(A\) is equal to the number of rows in \(B\), and the result \(C\) is of dimension \( n \times m \).
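A short NumPy sketch of the matrix sum, scalar multiplication, and matrix product (the matrices below are illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # 3 x 2

print(A + A)    # element-wise sum (same dimensions required)
print(2.5 * A)  # scalar multiplication scales every element
C = A @ B       # (2 x 3) @ (3 x 2) -> 2 x 2
print(C)
```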
Note: the outer product can also be used to compute the matrix product, and in certain applications (for example, with sparse or low-rank factors) this can reduce computation and storage requirements.
$$ C_{n \times m} = \sum_{k=1}^{p} A_{(:,k)} B_{(k,:)}. $$The product of a matrix with a vector is defined in the same manner as above; however, in this special case, multiplying a matrix by a vector results in a linear combination of the columns of the matrix.
$$ A_{n\times m} b_{m \times 1} = \left[ A_{(:,1)} A_{(:,2)} ... A_{(:,m)} \right] b_{m \times 1}$$Expanding \(b_{m \times 1}\) as \([b_1 b_2 ... b_m]^T\) and multiplying gives,
$$ A_{n\times m} b_{m \times 1} = b_1 A_{(:,1)}+ b_2 A_{(:,2)}+ ... + b_m A_{(:,m)} $$Therefore, the product of \( A_{n\times m}\) and \(b_{m \times 1}\) is a linear combination of the columns of \(A\), and the weighting factors are determined by the vector \(b_{m \times 1}\). The space spanned by the columns of \(A\) is also called the column space or range of the matrix \(A\).
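Both views can be verified with a small NumPy sketch: the matrix product as a sum of outer products, and a matrix-vector product as a linear combination of the columns of the matrix (the values below are illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # 3 x 2
b = np.array([2.0, -1.0, 0.5])

# Matrix product as a sum of outer products of columns of A with rows of B.
C_outer = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
print(np.allclose(C_outer, A @ B))     # True

# Matrix-vector product as a linear combination of the columns of A.
Ab_columns = sum(b[k] * A[:, k] for k in range(A.shape[1]))
print(np.allclose(Ab_columns, A @ b))  # True
```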
Similarly, the product of a (row) vector with a matrix results in a linear combination of the rows of the matrix.
$$ a_{1\times n} B_{n \times m} = a_{1 \times n} \left[ \begin{array}{c} B_{(1,:)} \\ B_{(2,:)} \\ \vdots \\ B_{(n,:)} \\ \end{array} \right] $$Expanding \(a_{1 \times n}\) as \([a_1 a_2 ... a_n]\) and multiplying gives,
$$ a_{1\times n} B_{n \times m} = a_1 B_{(1,:)}+ a_2 B_{(2,:)}+ ... + a_n B_{(n,:)} $$Therefore, the product of \( a_{1\times n}\) and \(B_{n \times m}\) is a linear combination of the rows of \(B\), and the weighting factors are determined by the vector \(a_{1 \times n}\). The space spanned by the rows of \(B\) is also called the row space of the matrix \(B\).
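The row-combination view can be checked in the same way (illustrative values, NumPy assumed):

```python
import numpy as np

a = np.array([2.0, -1.0, 0.5])   # 1 x 3 row vector
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])       # 3 x 2

# Row-vector times matrix as a linear combination of the rows of B.
aB_rows = sum(a[k] * B[k, :] for k in range(B.shape[0]))
print(np.allclose(aB_rows, a @ B))  # True
```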
The rank of a matrix is equal to the number of linearly independent rows (or columns). Note that the column rank and row rank of a matrix are always equal; check the Wikipedia page on matrix rank for a brief explanation of why.
The determinant, defined only for square matrices, is a measure of the 'size' of a matrix. The determinant represents the (signed) volume enclosed by the rows of the matrix in n-dimensional space. For a \(2 \times 2\) matrix, the determinant represents the area enclosed by the vectors formed from the rows (or columns) of the matrix.
The inverse of a square matrix is the matrix that, when multiplied by the original matrix, gives the identity matrix. The identity matrix has 1s on the diagonal and 0s for all off-diagonal terms. The inverse of a matrix is computed as the transpose of the matrix of cofactors (the adjugate) divided by the determinant of the matrix, details here. The cofactor of element \( (i,j) \) of a matrix is \( (-1)^{i+j} \) times the determinant of the submatrix formed by removing the \(i^{th} \) row and \(j^{th} \) column.
As an example, let us compute the inverse of
$$ A = \left[ \begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 1 \end{array} \right]$$
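A sketch of this computation, assuming NumPy; the helper `cofactor_inverse` below is illustrative, not a library routine:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 1.0]])

def cofactor_inverse(M):
    """Inverse via the adjugate (transpose of the cofactor matrix) divided by the determinant."""
    n = M.shape[0]
    cof = np.zeros_like(M)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T / np.linalg.det(M)

A_inv = cofactor_inverse(A)
print(np.allclose(A_inv, np.linalg.inv(A)))  # True: matches NumPy's inverse
print(np.allclose(A @ A_inv, np.eye(3)))     # True: A times its inverse is the identity
```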
Consider a vector \(\overrightarrow{v}\) such that $$ A \overrightarrow{v} = \lambda \overrightarrow{v},$$i.e., the product of the matrix \(A\) with the vector \(\overrightarrow{v}\) is the vector \(\overrightarrow{v}\) scaled by a factor of \(\lambda\). The vector \(\overrightarrow{v}\) is called an eigenvector and the multiplier \(\lambda\) is called an eigenvalue. The eigenvalues are computed first, by solving the characteristic equation of the matrix
$$ \det( A - \lambda I) = 0, $$and then the first equation \( A \overrightarrow{v} = \lambda \overrightarrow{v}\) is used to obtain the non-trivial solutions for the eigenvectors. Eigenvectors are typically scaled so that their magnitude is equal to 1.
If \(V\) is the matrix whose columns are the eigenvectors, then
$$ AV = V \Lambda$$where \(\Lambda\) is the diagonal matrix whose off-diagonal terms are 0, and whose diagonal terms are the eigenvalues corresponding to the eigenvectors in the columns of \(V\). Therefore, \( A = V \Lambda V^{-1}\) or \(\Lambda = V^{-1} A V\). In the special case where \(V\) is orthogonal (for example, when \(A\) is real and symmetric), \( A = V \Lambda V^T\) and \( \Lambda = V^T A V\).
Consider the matrix,
$$ A = \left[ \begin{array}{cc} 1.5 & 1 \\ 0 & 0.5 \end{array} \right]$$
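Assuming the intent of this example is to find the eigenvalues and eigenvectors of \(A\), a small NumPy sketch:

```python
import numpy as np

A = np.array([[1.5, 1.0],
              [0.0, 0.5]])

# Eigenvalues and unit-norm eigenvectors; the columns of V are the eigenvectors.
eigvals, V = np.linalg.eig(A)
print(eigvals)  # [1.5 0.5]

Lam = np.diag(eigvals)
# Check A V = V Lambda and the decomposition A = V Lambda V^{-1}.
print(np.allclose(A @ V, V @ Lam))                 # True
print(np.allclose(A, V @ Lam @ np.linalg.inv(V)))  # True
```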
Singular value decomposition (SVD) is a powerful matrix factorization or decomposition technique. In SVD, an \( n \times m\) matrix \(A\) is decomposed as $$ A = U \Sigma V^{*}, $$where \(U\) is an \(n \times n\) unitary matrix, \( \Sigma \) is an \(n \times m\) diagonal matrix of non-negative singular values, and \( V \) is an \(m \times m\) unitary matrix.
Note: \( V^{*} \) is the conjugate transpose of \(V\). If all elements of \(V\) are real, then \(V^{*} = V^T\).
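A minimal NumPy sketch of the decomposition (the random matrix is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Full SVD: U is 6x6, s holds the singular values, Vh is V conjugate-transposed (4x4).
U, s, Vh = np.linalg.svd(A)

Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vh))  # True: A = U Sigma V*
```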
SVD is a powerful dimensionality reduction technique.
The eigenvalues of the covariance matrix (normalized by their sum) also represent the percentage of variance explained along the corresponding eigenvector directions in the data.
Figure: raw signals, and the same signals reconstructed using the first 4 components (see notes for code and implementation details). This gives savings of 90% in storage while losing only 1% accuracy.
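A minimal sketch of this kind of low-rank reconstruction; the synthetic signal matrix and the choice of 4 components below are illustrative assumptions, not the actual data behind the figure:

```python
import numpy as np

# Illustrative data: 1000 samples of 16 correlated "signals" driven by 4 latent sources.
rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 4))
mixing = rng.standard_normal((4, 16))
X = latent @ mixing + 0.01 * rng.standard_normal((1000, 16))

# Rank-k reconstruction from the top k singular components.
k = 4
U, s, Vh = np.linalg.svd(X, full_matrices=False)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

rel_error = np.linalg.norm(X - X_k) / np.linalg.norm(X)
print(f"relative reconstruction error with {k} components: {rel_error:.4f}")
```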
SVD can be used for image compression too.
In control theory, we often come across equations of the form \( A_{n \times m} x_{m \times 1} = b_{n \times 1} \) where, given \(A\) and \(b\), we wish to solve for \(x\). Let's consider the following cases: if \(n = m\) and \(A\) is invertible, the unique solution is \(x = A^{-1} b\); if \(n > m\), the system is overdetermined and in general has no exact solution, so we look for the \(x\) that minimizes the error \( \| A x - b \| \); if \(n < m\), the system is underdetermined and has infinitely many solutions. The overdetermined case leads to the least-squares problem below.
Given a set of \( N \) training points such that for each \( i \in \{1, \ldots, N\} \), \( \mathbf{x}_i \in R^m \) maps to a scalar output \(y_i\), the goal is to find a weight vector \(\mathbf{a}\) and intercept \(a_0\) such that the error
$$ J \left(\mathbf{a},a_0 \right) = \frac{1}{2} \sum_{i=1}^{N} \left( \sum_{j=1}^{m} a_j x_{i,j} + a_0 - y_i \right)^2 , $$is minimized, where \(a_j \) is the \( j^{th} \) component of \(\mathbf{a}\) and \( a_0 \) is the intercept.
The cost function above can be rewritten as a dot product of the error vector
$$ e(W) = \left( X W - y \right), $$where \(X\) is an \(N \times (m+1)\) matrix whose rows are the training points augmented with a constant 1 (to absorb the intercept \(a_0\)), \(y\) is an \(N\)-dimensional vector of outputs, and \(W\) is the \((m+1)\)-dimensional vector of weights and intercept.
Therefore, the cost function J is now,
$$ J \left(\mathbf{W} \right) = \frac{1}{2} e(W)^T e(W)= \frac{1}{2} \left( \mathbf{ X W - y } \right)^T \left(\mathbf{ X W - y } \right) .$$Expanding the transpose term, $$ J \left(\mathbf{W} \right) = \frac{1}{2} \left( \mathbf{W^T X^T - y^T} \right) \left(\mathbf{ X W - y }\right) $$
$$ = \frac{1}{2} \left( \mathbf{ W^T X^T X W - 2 y^T X W + y^T y} \right) $$Note, all terms in the equation above are scalar. Taking derivative with respect to W gives,
$$ \frac{\partial \mathbf{J}}{\partial \mathbf{W}} = \left( \mathbf{ W^T X^T X - y^T X } \right) $$The minimum is obtained when the derivative above is zero. The most commonly used approach is a gradient-descent-based solution, where we start with some initial guess for \(\mathbf{W}\) and update it as,
$$ \mathbf{W_{k+1}} = \mathbf{W_{k}} - \mu \frac{\partial \mathbf{J}}{\partial \mathbf{W}} $$It is always a good idea to test whether the analytically computed derivative is correct; this can be done using the central difference method (applied to each component of \(\mathbf{W}\) in turn), $$ \frac{\partial \mathbf{J}}{\partial \mathbf{W}}_{numerical} \approx \frac{\mathbf{J(W+h) - J(W-h)}}{2h} $$
The central difference method is better suited because it has an error of \( O(h^2) \), while the forward difference has an error of \( O(h) \). Interested readers may look up the Taylor series expansion or may choose to wait for a later post.
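A sketch of the gradient-descent update together with a central-difference check of the analytical gradient; the synthetic data, step size \( \mu \), and iteration count below are illustrative assumptions:

```python
import numpy as np

# Synthetic regression data with a column of 1s appended for the intercept.
rng = np.random.default_rng(1)
N, m = 200, 3
X = np.hstack([rng.standard_normal((N, m)), np.ones((N, 1))])
W_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ W_true + 0.1 * rng.standard_normal(N)

def J(W):
    e = X @ W - y
    return 0.5 * e @ e

def grad_J(W):
    # Analytical gradient (the column form of W^T X^T X - y^T X).
    return X.T @ (X @ W - y)

# Central-difference check of the analytical gradient, one component at a time.
W0 = rng.standard_normal(m + 1)
h = 1e-5
num_grad = np.array([(J(W0 + h * np.eye(m + 1)[i]) - J(W0 - h * np.eye(m + 1)[i])) / (2 * h)
                     for i in range(m + 1)])
print(np.allclose(num_grad, grad_J(W0), rtol=1e-4))  # True

# Gradient descent from a zero initial guess.
W = np.zeros(m + 1)
mu = 1e-3
for _ in range(5000):
    W = W - mu * grad_J(W)
print(W)  # close to W_true
```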
Another method is to directly compute the best solution by setting the first derivative of the cost function to zero.
$$ \frac{\partial \mathbf{J}}{\partial \mathbf{W}} = \left( \mathbf{ W^T X^T X - y^T X } \right) = \mathbf{0}$$Taking the transpose and solving for \(\mathbf{W}\) gives,
$$ \mathbf{W} = \mathbf{ \underbrace{(X^T X)^{-1}X^T}_{Pseudoinverse} ~ y } $$
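A short sketch of this closed-form (pseudoinverse) solution on synthetic data, compared against NumPy's built-in least-squares routine; the data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 200, 3
X = np.hstack([rng.standard_normal((N, m)), np.ones((N, 1))])  # 1s column for the intercept
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(N)

# Normal-equations / pseudoinverse solution W = (X^T X)^{-1} X^T y,
# computed via a linear solve rather than an explicit inverse.
W_pinv = np.linalg.solve(X.T @ X, X.T @ y)

# The same solution via NumPy's least-squares routine.
W_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(W_pinv, W_lstsq))  # True
print(W_pinv)
```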