Dimensionality Reduction, Eigenvalues, and Eigenvectors#
Norms#
A norm is a function that measures the length of vectors or matrices. The notion of length is handy because it lets us define distance, and hence quantify how similar two vectors (or matrices) are, in applications such as machine learning.
A vector norm is any function \(||\star||:\mathbb{R}^{n}\rightarrow\mathbb{R}\) such that the following properties hold:
Non-negativity: \(||\mathbf{x}||\geq{0}\) for any vector \(\mathbf{x}\in\mathbb{R}^{n}\), and \(||\mathbf{x}||=0\) if and only if \(\mathbf{x}=\mathbf{0}\).
Multiplication by a scalar: \(||\alpha\mathbf{x}|| = |\alpha|\,||\mathbf{x}||\) for any vector \(\mathbf{x}\in\mathbb{R}^{n}\) and \(\alpha\in\mathbb{R}\).
Triangle inequality: \(||\mathbf{x}+\mathbf{y}||\leq||\mathbf{x}||+||\mathbf{y}||\) for any vectors \(\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}\)
Definition 26 (p-norm)
Arguably, the most commonly used vector norms belong to the family of \(p\)-norms (also called \(l_{p}\)-norms), which are defined as:
\[||\mathbf{x}||_{p} = \left(\sum_{i=1}^{n}|x_{i}|^{p}\right)^{1/p}\]
for any \(p\geq{1}\).
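In Julia, the LinearAlgebra standard library exposes \(p\)-norms through norm(x, p); the short sketch below (with an arbitrary example vector) compares the built-in values against the definition:

```julia
using LinearAlgebra

x = [3.0, -4.0, 1.0];   # example vector (made up for illustration)

# p-norm from the definition: (Σ|xᵢ|^p)^(1/p)
mynorm(x, p) = sum(abs.(x) .^ p)^(1 / p);

for p in (1, 2, 4)
    println("p = $(p): norm(x, p) = $(norm(x, p)), definition = $(mynorm(x, p))")
end
```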
A matrix norm is any function \(||\star||:\mathbb{R}^{m\times{n}}\rightarrow\mathbb{R}\) such that the following properties hold:
Non-negativity: \(||\mathbf{A}||\geq{0}\) for any matrix \(\mathbf{A}\in\mathbb{R}^{m\times{n}}\) and \(||\mathbf{A}||=0\) if and only if \(\mathbf{A}=0\).
Multiplication by a scalar: \(||\alpha\mathbf{A}|| = |\alpha|\,||\mathbf{A}||\) for any matrix \(\mathbf{A}\in\mathbb{R}^{m\times{n}}\) and \(\alpha\in\mathbb{R}\).
Triangle inequality: \(||\mathbf{A}+\mathbf{B}||\leq||\mathbf{A}||+||\mathbf{B}||\) for any matrices \(\mathbf{A},\mathbf{B}\in\mathbb{R}^{m\times{n}}\)
Definition 27 (Properties of Matrix Norms)
A matrix norm \(||\cdot||\) is consistent with a vector norm \(||\cdot||\) if:
\[||\mathbf{A}\mathbf{x}||\leq||\mathbf{A}||\,||\mathbf{x}||\qquad\forall\mathbf{A}\in\mathbb{R}^{m\times{n}},\;\forall\mathbf{x}\in\mathbb{R}^{n}\]
Further, a matrix norm \(||\cdot||\) is sub-multiplicative if \(\forall\mathbf{A}\in\mathbb{R}^{m\times{n}}\) and \(\forall\mathbf{B}\in\mathbb{R}^{n\times{q}}\) the following inequality holds:
\[||\mathbf{A}\mathbf{B}||\leq||\mathbf{A}||\,||\mathbf{B}||\]
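As a quick numerical illustration, the induced 2-norm opnorm from LinearAlgebra is both consistent with the vector 2-norm and sub-multiplicative; the matrices and vector below are arbitrary examples:

```julia
using LinearAlgebra

# Example matrices and vector (made up for illustration)
A = [1.0 2.0; 3.0 4.0];
B = [0.5 -1.0; 2.0 0.0];
x = [1.0, -2.0];

# The induced 2-norm opnorm(A) is consistent with the vector 2-norm
# and sub-multiplicative, so both inequalities below should hold (print true).
println("‖Ax‖ ≤ ‖A‖‖x‖ ? ", norm(A * x) <= opnorm(A) * norm(x))
println("‖AB‖ ≤ ‖A‖‖B‖ ? ", opnorm(A * B) <= opnorm(A) * opnorm(B))
```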
Similarity functions#
Other functions can be used to measure the similarity (or distance) between vectors and matrices:
Definition 28 (Radial basis function)
A radial basis function (RBF) measures the similarity between two input vectors \(\mathbf{x}_{p}\in\mathbb{R}^{m\times{1}}\) and \(\mathbf{x}_{q}\in\mathbb{R}^{m\times{1}}\). The most common is the Gaussian radial basis function, defined by a bell-shaped curve:
\[k\left(\mathbf{x}_{p},\mathbf{x}_{q}\right) = \sigma_{f}^{2}\exp\left(-\frac{1}{2}\left(\mathbf{x}_{p}-\mathbf{x}_{q}\right)^{T}\mathbf{M}\left(\mathbf{x}_{p}-\mathbf{x}_{q}\right)\right)+\sigma_{n}^{2}\delta_{pq}\]
where the matrix \(\mathbf{M}\in\mathbb{R}^{m\times{m}}\) can be any symmetric matrix, \(\sigma_{f}^{2}\) and \(\sigma_{n}^{2}\) are constants, and \(\delta_{pq}\) denotes the Kronecker delta.
We shall see that radial basis functions are used in various machine learning applications, such as classification.
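The Gaussian RBF can be sketched in a few lines of Julia; the function name k_rbf, the test vectors, and the choice \(\mathbf{M}=\ell^{-2}\mathbf{I}\) are illustrative assumptions, not values from the text:

```julia
using LinearAlgebra

# Gaussian radial basis function (sketch). M is any symmetric matrix;
# σf² scales the kernel and the σn² term is added only when p == q
# (the Kronecker delta), modeled here with the same_point flag.
function k_rbf(xp, xq, M; σf²=1.0, σn²=0.0, same_point=false)
    d = xp - xq
    return σf² * exp(-0.5 * d' * M * d) + (same_point ? σn² : 0.0)
end

# usage: identical vectors give the maximum similarity σf² + σn²
xp = [1.0, 2.0]; xq = [1.5, 1.0];
M = (1 / 0.5^2) * Matrix(1.0I, 2, 2);   # M = ℓ⁻²I with length scale ℓ = 0.5
println("k(xp, xq) = ", k_rbf(xp, xq, M))
println("k(xp, xp) = ", k_rbf(xp, xp, M; σn²=0.1, same_point=true))
```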
Dimensionality reduction#
Dimensionality reduction tools systematically reduce the number of variables in a dataset while preserving as much of the information in the data as possible. Dimensionality reduction simplifies data, removes noise, and makes patterns in the data more visible. It can also help visualize data, improve machine learning algorithms’ performance, and reduce the storage and computational requirements of working with large datasets.
There are many techniques for dimensionality reduction, including principal component analysis and singular value decomposition, which are both related to Eigenvalue-eigenvector problems. Other approaches, such as clustering and t-distributed stochastic neighbor embedding (t-SNE), are based upon minimizing a particular distance measure.
Eigenvalue-eigenvector problems#
Eigenvalue-eigenvector problems involve finding a set of scalar values \(\left\{\lambda_{1},\dots,\lambda_{m}\right\}\) called eigenvalues and a set of linearly independent vectors \(\left\{\mathbf{v}_{1},\dots,\mathbf{v}_{m}\right\}\) called eigenvectors such that:
\[\mathbf{A}\mathbf{v} = \lambda\mathbf{v}\]
where \(\mathbf{A}\in\mathbb{R}^{m\times{m}}\), \(\mathbf{v}\in\mathbb{R}^{m\times{1}}\), and \(\lambda\in\mathbb{R}\) is a scalar. Eigenvalues and eigenvectors are used in many areas of mathematics, engineering, and physics, including dynamics, image compression, and data reduction:
Solution of Differential Equations: Eigenvalues and eigenvectors are used to solve systems of differential equations. Eigenvectors form a set of linearly independent solutions, while eigenvalues determine their stability.
Structural Analysis: Eigenvalues and eigenvectors can be used to study the structural properties of a matrix or a graph. In structural engineering, for example, eigenvalues and eigenvectors can be used to analyze a structure’s natural frequencies and modes of vibration, such as buildings and bridges.
Singular Value Decomposition (SVD): SVD is commonly used in data analysis, computer vision, image processing, etc., to find the most essential features of a dataset. Another popular application, principal component analysis (PCA), uses the eigenvalues and eigenvectors of the covariance matrix to find the essential set of features in a dataset.
In addition, eigenvalues have another interesting feature (Observation 4):
Observation 4 (Determinants and eigenvalues)
Eigenvalues can be used directly to calculate the determinant of a matrix \(\mathbf{A}\in\mathbb{R}^{m\times{m}}\). Denote the set of eigenvalues for the matrix \(\mathbf{A}\in\mathbb{R}^{m\times{m}}\) as \(\left\{\lambda_{1},\dots,\lambda_{m}\right\}\). Then, the \(\det\left(\mathbf{A}\right)\) is given by:
\[\det\left(\mathbf{A}\right) = \prod_{i=1}^{m}\lambda_{i}\]
A matrix \(\mathbf{A}\in\mathbb{R}^{m\times{m}}\) is non-singular if \(\lambda_{i}\neq{0}~\forall{i}\); otherwise, it is singular.
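A quick numerical check of Observation 4, using the \(3\times{3}\) matrix that appears in the eigen example below:

```julia
using LinearAlgebra

A = [3.0 -0.3 -0.2; 0.1 7.0 -0.3; 0.3 -0.2 10.0];   # example matrix

# det(A) should equal the product of the eigenvalues of A
λ = eigvals(A);
println("det(A)  = ", det(A))
println("prod(λ) = ", prod(λ))
```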
Characteristic polynomial#
The roots of the characteristic polynomial are the eigenvalues of a square matrix \(\mathbf{A}\in\mathbb{R}^{n\times{n}}\). The characteristic polynomial plays a central theoretical role in many areas of mathematics, including linear algebra, differential equations, and dynamical systems theory (Definition 29):
Definition 29 (Characteristic polynomial)
The characteristic polynomial of a square matrix \(\mathbf{A}\in\mathbb{R}^{n\times{n}}\) is the polynomial obtained by subtracting the scalar \(\lambda\) from the diagonal of the matrix \(\mathbf{A}\) and computing the determinant:
\[p_{\mathbf{A}}(\lambda) = \det\left(\mathbf{A}-\lambda\mathbf{I}\right)\]
where \(\mathbf{I}\) is the \(n\times{n}\) identity matrix and \(\lambda\) is a scalar variable. The characteristic polynomial is a polynomial of degree \(n\) in \(\lambda\), and its roots are precisely the eigenvalues of the matrix \(\mathbf{A}\).
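For example, for a general \(2\times{2}\) matrix \(\mathbf{A}\), expanding the determinant by hand gives:
\[p_{\mathbf{A}}(\lambda) = \det\begin{bmatrix} a_{11}-\lambda & a_{12} \\ a_{21} & a_{22}-\lambda \end{bmatrix} = \lambda^{2}-\left(a_{11}+a_{22}\right)\lambda+\left(a_{11}a_{22}-a_{12}a_{21}\right) = \lambda^{2}-\text{tr}\left(\mathbf{A}\right)\lambda+\det\left(\mathbf{A}\right)\]
a quadratic whose two roots are the eigenvalues of \(\mathbf{A}\).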
However, despite its theoretical importance, in applications we will rarely explicitly solve the characteristic polynomial for the eigenvalues of the matrix \(\mathbf{A}\). Instead, we'll use one of several more accessible techniques; see Methods to compute eigenvalues and eigenvectors.
Eigenvectors#
To compute the eigenvectors of the matrix \(\mathbf{A}\in\mathbb{R}^{n\times{n}}\), we first need the eigenvalues \(\left\{\lambda_{1},\dots,\lambda_{n}\right\}\). Once we have the eigenvalues, we rearrange Eqn (54) for each eigenvalue \(\lambda_{j}\) to give a system of homogeneous linear algebraic equations that can be solved for the associated eigenvector (Definition 30):
Definition 30 (Eigenvector system)
Let the eigenvalues of the matrix \(\mathbf{A}\in\mathbb{R}^{n\times{n}}\) be given by the set \(\left\{\lambda_{1},\dots,\lambda_{n}\right\}\). Then for each eigenvalue \(\lambda_{j}\), there exists an eigenvector \(\mathbf{v}_{j}\) that is a solution of the homogeneous system of equations:
\[\left(\mathbf{A}-\lambda_{j}\mathbf{I}\right)\mathbf{v}_{j} = \mathbf{0}\qquad{j=1,2,\dots,n}\]
where \(\mathbf{I}\) denotes the \(n\times{n}\) identity matrix. Eigenvectors are not unique: any nonzero scalar multiple of an eigenvector is again an eigenvector. Eigenvectors associated with distinct eigenvalues are linearly independent, and they are mutually orthogonal only when \(\mathbf{A}\) is symmetric.
Methods to compute eigenvalues and eigenvectors#
Power iteration is an iterative method that starts with a random vector and repeatedly multiplies the matrix by this vector, normalizing the result each time. As the iteration proceeds, the vector converges to the eigenvector corresponding to the largest eigenvalue. This method efficiently computes the dominant eigenvalue and eigenvector of a large, sparse matrix. Lanczos iteration is similar to power iteration but uses a truncated orthogonalization process to compute a small number of eigenvalues and eigenvectors of a large, sparse matrix.
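To make the idea concrete, here is a minimal power-iteration sketch; the function name, stopping rule, and the reuse of the example matrix from the eigen demo below are illustrative choices, not a reference implementation:

```julia
using LinearAlgebra

# Power iteration sketch: estimates the dominant eigenvalue/eigenvector pair.
# Assumes A has a single eigenvalue of largest magnitude (illustrative only).
function poweriteration(A; maxiter=1000, tolerance=1e-8)
    v = normalize(rand(size(A, 1)))      # random starting vector
    λ = 0.0
    for _ in 1:maxiter
        w = A * v                        # multiply by A ...
        λnew = dot(v, w)                 # Rayleigh-quotient estimate of λ
        v = normalize(w)                 # ... and normalize the result
        if abs(λnew - λ) < tolerance
            return (λnew, v)
        end
        λ = λnew
    end
    return (λ, v)
end

# usage: the dominant eigenvalue of the example matrix used below is ≈ 10.01
A = [3.0 -0.3 -0.2; 0.1 7.0 -0.3; 0.3 -0.2 10.0];
(λmax, vmax) = poweriteration(A);
println("Dominant eigenvalue ≈ $(λmax)")
```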
Divide-and-conquer algorithms decompose the matrix into smaller matrices, recursively compute their eigenvalues and eigenvectors, and then combine them to obtain the eigenvalues and eigenvectors of the original matrix. This method is efficient for symmetric matrices.
QR iteration applies a sequence of orthogonal similarity transformations to the matrix, which gradually transforms it into a diagonal matrix with the eigenvalues on the diagonal. The eigenvectors can be computed by back-substitution. This method is more expensive than power iteration but can compute all eigenvalues and eigenvectors of a matrix.
Singular value decomposition computes a matrix’s singular values and singular vectors, which are related to its eigenvalues and eigenvectors. It is handy for calculating the low-rank approximations of a matrix or for dimensionality reduction.
Example: Using the eigen function#
In Julia, eigenvalues and eigenvectors of a dense matrix \(\mathbf{A}\in\mathbb{R}^{n\times{n}}\) can be calculated using the eigen function included with the standard LinearAlgebra package:
# LinearAlgebra package (included in the standard library, so no download)
using LinearAlgebra
# Set up the n x n matrix A (n = 3)
A = [3.0 -0.3 -0.2 ; 0.1 7.0 -0.3 ; 0.3 -0.2 10.0];
# Decompose using the built-in function
F = eigen(A); # eigenvalues and vectors in F of type Eigen
λ = F.values; # vector of eigenvalues
V = F.vectors; # 3 x 3 matrix of eigenvectors, each col is an eigenvector
# print out the eigenvalues
println("Eigenvalues λ = $(λ)")
Eigenvalues λ = [3.017277143754452, 6.9699146114784165, 10.012808244767134]
Because eigenvectors are linearly independent, the determinant of the matrix of eigenvectors should be non-zero:
# LinearAlgebra package (included in the standard library, so no download)
using LinearAlgebra
# Set up the n x n matrix A (n = 3)
A = [3.0 -0.3 -0.2 ; 0.1 7.0 -0.3 ; 0.3 -0.2 10.0];
# Decompose using the built-in function
F = eigen(A); # eigenvalues and vectors in F of type Eigen
λ = F.values; # vector of eigenvalues
V = F.vectors; # 3 x 3 matrix of eigenvectors, each col is an eigenvector
# print out the eigenvalues
println("det(V) = $(det(V))")
det(V) = 0.9913342222326557
In Python, eigenvalues and eigenvectors can be computed using the numpy.linalg.eig function from the NumPy library.
Example: QR iteration#
The QR algorithm iteratively calculates the eigenvalues and eigenvectors of a square matrix. The QR algorithm, independently developed in the late 1950s by John G. F. Francis and by Vera N. Kublanovskaya, is listed among the top ten algorithms of the twentieth century [Dongarra and Sullivan, 2000].
Definition 31 (Eigenvalues and Eigenvectors using QR iteration)
The basic idea is to perform QR decomposition repeatedly, writing the matrix as a product of an orthogonal matrix \(\mathbf{Q}\) and an upper triangular matrix \(\mathbf{R}\), multiply the factors in the reverse order, and iterate.
Phase 1: Eigenvalues. Let \(\mathbf{A}\in\mathbb{R}^{n\times{n}}\). Then, starting with \(\mathbf{A}_{0}=\mathbf{A}\) and \(k = 0\), we compute the QR decomposition of the matrix \(\mathbf{A}_{k}\):
\[\mathbf{A}_{k} = \mathbf{Q}_{k}\mathbf{R}_{k}\]
and form the next iterate by multiplying the factors in the reverse order:
\[\mathbf{A}_{k+1} = \mathbf{R}_{k}\mathbf{Q}_{k}\]
where \(\mathbf{Q}_{k}\) is an orthogonal matrix, i.e., \(\mathbf{Q}^{T}_{k} = \mathbf{Q}^{-1}_{k}\), and \(\mathbf{R}_{k}\) is an upper triangular matrix. As \(k\rightarrow\infty\), the matrix \(\mathbf{A}_{k}\) converges to a triangular matrix with the eigenvalues listed along the diagonal.
Phase 2: Eigenvectors. Once we've calculated the eigenvalues \(\left\{\lambda_{1},\dots,\lambda_{n}\right\}\), we can compute the eigenvectors \(\left\{\mathbf{v}_{1},\dots,\mathbf{v}_{n}\right\}\) associated with each of the eigenvalues by solving the homogeneous system of equations:
\[\left(\mathbf{A}-\lambda_{j}\mathbf{I}\right)\mathbf{v}_{j} = \mathbf{0}\qquad{j=1,2,\dots,n}\]
Let’s do an example (Example 20) where we compute the eigenvalues and eigenvectors of a square matrix \(\mathbf{A}\) using the QR iteration algorithm outlined in Definition 31.
Example 20 (QR iteration to compute Eigenvalues and Eigenvectors)
Compute the eigenvalues and eigenvectors of the matrix:
\[\mathbf{A} = \begin{bmatrix} 3.0 & -0.3 & -0.2 \\ 0.1 & 7.0 & -0.3 \\ 0.3 & -0.2 & 10.0 \end{bmatrix}\]
using the QR iteration algorithm outlined in Definition 31. How well does your answer compare to the values generated by the eigen function?
Solution: Compute the eigenvalues and eigenvectors using the Julia eigen function and then using the qriteration function (assumed to be included in the workspace):
using LinearAlgebra
# Set up the n x n matrix A (n = 3)
A = [3.0 -0.3 -0.2 ; 0.1 7.0 -0.3 ; 0.3 -0.2 10.0];
# Decompose using the built-in function
F = eigen(A); # eigenvalues and vectors in F of type Eigen
λ = F.values; # vector of eigenvalues
V = F.vectors; # 3 x 3 matrix of eigenvectors, each col is an eigenvector
# Call our qriteration function (assumed to be included in the workspace)
(L,V) = qriteration(A; maxiter=10000, tolerance=1e-6);
# compare the difference between eigenvalues
δ = norm(L - λ); # see lecture notes on vector norms
The difference \(\delta\) between our qriteration implementation and eigen was \(\delta = 2.25e-6\). Thus, our function produces eigenvalues close to eigen, with an error of the same magnitude as the tolerance parameter.
Source: The implementation of our qriteration function can be found on GitHub.
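For intuition, here is a minimal, unoptimized sketch of what a QR-iteration routine with this call signature could look like. It is not the course implementation linked above; it assumes the eigenvalues are real and distinct (as in this example), uses plain unshifted QR steps for Phase 1, and shifted inverse iteration for Phase 2:

```julia
using LinearAlgebra

# Illustrative QR-iteration sketch only; not the qriteration implementation
# referenced above. Returns (L, V): approximate eigenvalues (sorted ascending)
# and a matrix whose columns are the corresponding eigenvectors.
function qriteration_sketch(A; maxiter=10000, tolerance=1e-6)
    # Phase 1: factor Aₖ = QₖRₖ, form Aₖ₊₁ = RₖQₖ, and repeat until the
    # below-diagonal entries are small; the diagonal then holds the eigenvalues.
    Aₖ = copy(float.(A))
    for _ in 1:maxiter
        Q, R = qr(Aₖ)
        Aₖ = R * Matrix(Q)
        norm(tril(Aₖ, -1)) < tolerance && break
    end
    L = sort(diag(Aₖ))

    # Phase 2: for each eigenvalue, solve (A - λI)v ≈ 0 via shifted
    # inverse iteration (the small shift keeps the system invertible).
    n = size(A, 1)
    V = zeros(n, n)
    for (j, λ) in enumerate(L)
        v = normalize(ones(n))
        M = A - (λ + 10 * tolerance) * I
        for _ in 1:100
            v = normalize(M \ v)
        end
        V[:, j] = v
    end
    return (L, V)
end

# usage: compare with eigen for the example matrix above
A = [3.0 -0.3 -0.2; 0.1 7.0 -0.3; 0.3 -0.2 10.0];
(L, V) = qriteration_sketch(A; maxiter=10000, tolerance=1e-6);
println("δ = ", norm(L - eigen(A).values))
```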
Singular value decomposition#
Singular value decomposition (SVD) is a powerful tool used in many applications, such as image and data compression, signal processing, and machine learning. Singular value decomposition factors a matrix into the product of an orthogonal matrix, a diagonal matrix, and another orthogonal matrix (Definition 32):
Definition 32 (Singular value decomposition)
Let the matrix \(\mathbf{A}\in\mathbb{R}^{m\times{n}}\). The singular value decomposition of the matrix \(\mathbf{A}\) is given by:
\[\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^{T}\]
where \(\mathbf{U}\in\mathbb{R}^{m\times{m}}\) and \(\mathbf{V}\in\mathbb{R}^{n\times{n}}\) are orthogonal matrices and \(\mathbf{\Sigma}\in\mathbb{R}^{m\times{n}}\) is a diagonal matrix containing the singular values \(\sigma_{i}=\Sigma_{ii}\) along the main diagonal. The columns of \(\mathbf{U}\) are called left-singular vectors, while the columns of \(\mathbf{V}\) are called right-singular vectors. Singular vectors have a useful property: unlike eigenvectors, which are mutually orthogonal only when the matrix is symmetric, the left-singular vectors are always orthonormal, and so are the right-singular vectors.
The singular value decomposition and eigendecomposition have important connections:
Singular values and eigenvalues are related: \(\sigma_{i} = \sqrt{\lambda_{i}}\), where \(\lambda_{i}\) is an eigenvalue of the matrix \(\mathbf{A}^{T}\mathbf{A}\) (see the numerical check after this list).
The columns of \(\mathbf{U}\) (left-singular vectors) are eigenvectors of the matrix product \(\mathbf{A}\mathbf{A}^{T}\).
The columns of \(\mathbf{V}\) (right-singular vectors) are eigenvectors of the matrix product \(\mathbf{A}^{T}\mathbf{A}\).
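These relationships are easy to verify numerically; the \(2\times{3}\) matrix below is an arbitrary illustrative example:

```julia
using LinearAlgebra

A = [2.0 0.0 1.0; -1.0 3.0 0.5];   # example 2 x 3 matrix (illustrative)

F = svd(A);                        # F.U, F.S (singular values), F.V

# σᵢ = √λᵢ, where λᵢ are the (nonzero) eigenvalues of AᵀA
λ = sort(eigvals(Symmetric(A' * A)); rev=true);
println("singular values: ", F.S)
println("√eigvals(AᵀA):   ", sqrt.(λ[1:length(F.S)]))
```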
Let’s explore another interesting use of singular value decomposition, namely the structural decomposition of a matrix (Observation 5):
Observation 5 (SVD structural decomposition)
Singular value decomposition (SVD) decomposes a rectangular matrix \(\mathbf{A}\) into a weighted, ordered sum of separable matrices. Let \(\mathbf{A}\in\mathbb{R}^{m\times{n}}\) have the singular value decomposition \(\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^{T}\). Then, the matrix \(\mathbf{A}\in\mathbb{R}^{m\times{n}}\) can be re-written as:
\[\mathbf{A} = \sum_{i=1}^{r_{\mathbf{A}}}\sigma_{i}\left(\mathbf{u}_{i}\otimes\mathbf{v}_{i}\right)\]
where \(r_{\mathbf{A}}\) is the rank of matrix \(\mathbf{A}\), the vectors \(\mathbf{u}_{i}\) and \(\mathbf{v}_{i}\) are the ith left- and right-singular vectors, respectively, and \(\sigma_{i}\) are the ordered singular values. The outer product \(\left(\mathbf{u}_{i}\otimes\mathbf{v}_{i}\right)\) is the separable component of the matrix \(\mathbf{A}\); see the discussion of the outer product for more details on computing it.
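The sketch below (with a made-up matrix) evaluates the sum above directly from Julia's svd; truncating it after \(k<r_{\mathbf{A}}\) terms yields a low-rank approximation of \(\mathbf{A}\):

```julia
using LinearAlgebra

A = [4.0 0.0 2.0; 1.0 3.0 0.0; 0.0 1.0 2.0];   # example matrix (illustrative)

F = svd(A);
r = rank(A);

# Rebuild A as a weighted sum of separable (rank-one) outer products σᵢ uᵢvᵢᵀ
Â = sum(F.S[i] * F.U[:, i] * F.V[:, i]' for i in 1:r);
println("reconstruction error: ", norm(A - Â))    # ≈ 0

# Keeping only the first k terms gives a low-rank approximation of A
k = 1;
A₁ = sum(F.S[i] * F.U[:, i] * F.V[:, i]' for i in 1:k);
println("rank-1 approximation error: ", norm(A - A₁))
```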
Principal component analysis (PCA)#
Principal Component Analysis (PCA) is a statistical technique that reduces the dimensionality of a dataset while retaining as much of its variation as possible. PCA transforms the original dataset into a new set of uncorrelated variables called principal components, ranked by importance.
Principal components are the directions in which the data varies the most and are the eigenvectors of the covariance matrix of the data.
PCA is a specific application of SVD to a dataset’s covariance matrix. The principal components are obtained from columns of the matrix \(\mathbf{U}\). The variance each component explains is obtained from the singular values in \(\mathbf{\Sigma}\).
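A minimal sketch of this recipe, assuming a synthetic dataset with observations in rows and variables in columns (the data, the layout, and the use of Statistics.cov are illustrative assumptions):

```julia
using LinearAlgebra, Statistics

# Synthetic dataset: 100 observations (rows) of 3 correlated variables (columns)
X = randn(100, 3) * [2.0 0.5 0.0; 0.0 1.0 0.3; 0.0 0.0 0.2];

# PCA via SVD of the covariance matrix: columns of U are the principal
# components; the singular values give the variance explained by each.
K = cov(X);                       # 3 x 3 covariance matrix
F = svd(K);
components = F.U;                 # principal directions (columns)
explained = F.S ./ sum(F.S);      # fraction of total variance per component
println("variance explained: ", round.(explained; digits=3))
```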
Covariance matrix#
The covariance matrix is a square matrix that summarizes the pairwise relationships between variables in a dataset. The diagonal elements represent each variable’s variance, i.e., the standard deviation squared. In contrast, the off-diagonal elements represent the covariance between pairs of variables (Definition 33):
Definition 33 (Covariance matrix)
Suppose we have a collection of \(n\)-dimensional random vectors \(\mathbf{X}\). Then, \(\mathbf{K}_{XX}\) is the \(n\times{n}\) covariance matrix:
\[\mathbf{K}_{XX} = \mathbb{E}\left[\left(\mathbf{X}-\mathbb{E}(\mathbf{X})\right)\left(\mathbf{X}-\mathbb{E}(\mathbf{X})\right)^{T}\right]\]
where \(\mathbb{E}(\mathbf{X})\) denotes the expected value, i.e., the mean of the random vector \(\mathbf{X}\). The entries of the covariance matrix \(\mathbf{K}_{XX}\) are given by:
\[\left(\mathbf{K}_{XX}\right)_{ij} = \rho_{ij}\sigma_{i}\sigma_{j}\]
where \(\sigma_{i}\) denotes the standard deviation of variable \(i\), and \(\rho_{ij}\) denotes the correlation between variables \(i\) and \(j\) (with \(\rho_{ii}=1\), so the diagonal entries are the variances \(\sigma_{i}^{2}\)).
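Definition 33 can be checked numerically with the Statistics standard library (cov, std, and cor); the synthetic two-variable dataset below is an illustrative assumption:

```julia
using Statistics

X = randn(500, 2) * [1.0 0.6; 0.0 0.8];   # two correlated variables (columns)

K = cov(X);                                # sample covariance matrix
σ = std(X; dims=1);                        # per-variable standard deviations
ρ = cor(X);                                # correlation matrix

# Entry-wise, K[i, j] should match ρ[i, j] * σ[i] * σ[j]
println("K[1, 2]   = ", K[1, 2])
println("ρ₁₂ σ₁ σ₂ = ", ρ[1, 2] * σ[1] * σ[2])
```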
Summary#
In this lecture, we discussed distance measurements and dimensionality reduction. Measurement and distance tools measure the size of matrix or vector objects and the distances between these objects. We explored two types of measurement and distance approaches:
Norms are mathematical tools for measuring the magnitude of matrices and vectors. They are essential in many areas of mathematics, including linear algebra, optimization, and analysis.
Similarity functions are mathematical tools that quantify the similarity between objects, such as vectors or points. They are widely used in machine learning, pattern recognition, and data analysis. One example of a similarity function is the radial basis function, which measures the distance between two points using a Gaussian distribution and is commonly used in clustering and classification algorithms.
Dimensionality reduction systematically reduces the number of variables in a dataset while preserving as much information as possible. Dimensionality reduction simplifies data, removes noise, and makes patterns in the data more visible. It can also help visualize data, improve machine learning algorithms’ performance, and reduce the storage and computational requirements of working with large datasets.