Linear Algebra
Vector spaces, linear maps, eigenvalues, spectral theory, and tensor products.
Linear algebra is the study of vector spaces and the maps between them — a subject that begins with the humble act of adding arrows in the plane and culminates in one of the most powerful and broadly applicable theories in all of mathematics. Its reach extends from quantum mechanics and signal processing to machine learning and structural engineering, and its language has become the common tongue of quantitative science. Understanding linear algebra means understanding how structure, direction, and transformation interact in spaces of any dimension.
Vector Spaces and Subspaces
The central object of linear algebra is the vector space. Informally, a vector space is a collection of objects — the vectors — that can be added together and scaled by numbers, with both operations behaving in the ways we intuitively expect. Formally, a vector space over a field $F$ (typically $\mathbb{R}$ or $\mathbb{C}$) is a set $V$ equipped with two operations: vector addition $(u, v) \mapsto u + v$ and scalar multiplication $(c, v) \mapsto cv$, satisfying eight axioms: commutativity and associativity of addition, the existence of a zero vector $0$ satisfying $v + 0 = v$, the existence of an additive inverse $-v$ for each $v$, compatibility of scalar multiplication with field multiplication, the identity rule $1v = v$, and the two distributive laws. The canonical example is $\mathbb{R}^n$, the set of all $n$-tuples of real numbers, but vector spaces include spaces of polynomials, spaces of continuous functions, and spaces of matrices — wherever addition and scaling make sense and satisfy the axioms.
A subspace $W$ of $V$ is a subset that is itself a vector space under the same operations. The subspace test reduces this to three conditions: $W$ must contain the zero vector, be closed under addition, and be closed under scalar multiplication. The intersection of any collection of subspaces is again a subspace, a fact that allows us to define the span of a set $S$ of vectors as the smallest subspace containing $S$, which is precisely the set of all linear combinations $c_1 v_1 + \cdots + c_k v_k$ for vectors $v_1, \ldots, v_k$ in $S$ and scalars $c_1, \ldots, c_k$.
A set of vectors $v_1, \ldots, v_k$ is linearly independent if the only way to write the zero vector as a linear combination of them is the trivial one: $c_1 v_1 + \cdots + c_k v_k = 0$ implies $c_1 = \cdots = c_k = 0$. Geometrically, this means no vector in the set lies in the span of the others — no vector is “redundant.” A basis for $V$ is a set that is both linearly independent and spans $V$. Every basis for a given vector space has the same number of elements, a profound fact whose proof uses the Steinitz exchange lemma. This common cardinality is the dimension of the space.
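To make linear independence concrete computationally (a minimal sketch assuming NumPy, with arbitrary example vectors rather than anything from the text), one can place the candidate vectors as columns of a matrix and compare its rank to the number of columns:

    import numpy as np

    # Columns of M are three candidate vectors in R^3; the third is the sum of the first two.
    M = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 0.0]])

    rank = np.linalg.matrix_rank(M)
    # Linearly independent exactly when the rank equals the number of columns.
    print(rank, M.shape[1], rank == M.shape[1])   # 2 3 False

Here the rank deficit exposes the redundancy: the third column lies in the span of the first two.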
The concept of dimension, seemingly obvious for $\mathbb{R}^n$, required careful work to establish in general. The existence of a basis for every vector space — including infinite-dimensional ones — relies on the axiom of choice (via Zorn’s lemma), a connection to set theory that highlights how the subject reaches into foundations. For the spaces that arise most often in practice, however, bases are explicit and dimension is finite. The standard basis for $\mathbb{R}^n$ consists of the coordinate vectors $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$, and so on; every vector $v$ is uniquely $v = v_1 e_1 + \cdots + v_n e_n$, and these unique coefficients $v_1, \ldots, v_n$ are the coordinates of the vector in this basis.
Linear Transformations and Matrices
A linear transformation (or linear map) from a vector space $V$ to a vector space $W$ is a function $T: V \to W$ that respects both operations: $T(u + v) = T(u) + T(v)$ and $T(cv) = cT(v)$ for all vectors $u, v$ and all scalars $c$. These two conditions together are equivalent to the single requirement $T(au + bv) = aT(u) + bT(v)$, the superposition principle that permeates physics and engineering. Linear maps include rotations and reflections of the plane, projections onto lines and planes, differentiation of polynomials, and, crucially, multiplication by a matrix.
Given a basis $v_1, \ldots, v_n$ of $V$ and a basis $w_1, \ldots, w_m$ of $W$, every linear map $T: V \to W$ is completely encoded by the $m \times n$ matrix $[T]$, whose $j$-th column contains the coordinates of $T(v_j)$ in the basis $w_1, \ldots, w_m$. Conversely, every $m \times n$ matrix determines a linear map from $F^n$ to $F^m$. The key consequence is that matrix multiplication corresponds to composition of linear maps: if $S$ and $T$ are linear, then the matrix of $S \circ T$ is the product of the matrices of $S$ and $T$ (in that order). This connection between an abstract algebraic notion (composition) and a concrete arithmetic operation (matrix multiplication) is one of the great insights of the subject.
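The correspondence between composition and matrix multiplication is easy to check numerically; the following is an illustrative sketch assuming NumPy, with two arbitrary $2 \times 2$ maps:

    import numpy as np

    S = np.array([[0.0, -1.0],   # rotation of the plane by 90 degrees
                  [1.0,  0.0]])
    T = np.array([[2.0, 0.0],    # stretch by 2 in the x-direction
                  [0.0, 1.0]])
    v = np.array([1.0, 3.0])

    # Applying T, then S ...
    composed = S @ (T @ v)
    # ... agrees with multiplying by the single matrix S T (in that order).
    print(np.allclose(composed, (S @ T) @ v))   # True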
Every linear map $T: V \to W$ has two canonically associated subspaces. The kernel (or null space) $\ker T$ is a subspace of $V$; it consists of all vectors that $T$ sends to zero. The image (or range) $\operatorname{im} T$ is a subspace of $W$; it consists of all outputs that $T$ can produce. The fundamental rank-nullity theorem asserts that these two dimensions always sum to the dimension of the domain:

$$\dim(\operatorname{im} T) + \dim(\ker T) = \dim V.$$
The dimension of the image is called the rank of $T$ (or of its matrix), and the dimension of the kernel is the nullity. This theorem has immediate consequences: a linear map is injective (one-to-one) if and only if its kernel is trivial ($\ker T = \{0\}$), and it is surjective (onto) if and only if its rank equals $\dim W$. A map that is both is an isomorphism — it establishes that $V$ and $W$ are “the same” as vector spaces. Two finite-dimensional vector spaces over the same field are isomorphic if and only if they have the same dimension, which means that $F^n$ is (up to relabeling) the only $n$-dimensional vector space over $F$.
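The rank-nullity theorem can likewise be verified numerically; the sketch below assumes NumPy and SciPy and uses an arbitrary rank-one example matrix:

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])           # rank 1, mapping R^3 to R^2

    rank = np.linalg.matrix_rank(A)
    nullity = null_space(A).shape[1]           # number of basis vectors of ker A
    print(rank, nullity, rank + nullity == A.shape[1])   # 1 2 True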
The dual space $V^*$ of linear functionals on $V$ (linear maps $V \to F$) is itself a vector space of the same dimension, and the double dual $V^{**}$ is naturally isomorphic to $V$. This elegant reflexivity — the space of all linear functionals on the dual recovers $V$ itself — is a prototype of the duality that appears throughout mathematics.
Determinants and Systems of Equations
Before the modern vector-space framework took hold, linear algebra was largely the study of systems of linear equations and the determinants that govern their solvability. Determinants were studied systematically by Gottfried Leibniz in the late seventeenth century and later developed extensively by Gabriel Cramer (1750), Alexandre-Théophile Vandermonde (1772), and Carl Friedrich Gauss, whose method of Gaussian elimination — systematically row-reducing an augmented matrix to solve a linear system — remains the standard computational procedure today.
The determinant $\det A$ of a square matrix $A$ is a scalar that encodes essential geometric and algebraic information about $A$. Geometrically, $|\det A|$ measures the factor by which $A$ scales volumes: if $A$ is a $2 \times 2$ matrix, then $|\det A|$ is the area of the parallelogram spanned by its columns; if $A$ is $3 \times 3$, it is the volume of the parallelepiped. The sign of the determinant records whether $A$ preserves or reverses orientation. Algebraically, $A$ is invertible if and only if $\det A \neq 0$ — a characterization that connects geometry (nondegenerate transformation) to algebra (solvable system) and to analysis (nonzero scaling factor).
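A small numerical illustration (a sketch assuming NumPy, with an arbitrary $2 \times 2$ example) of the determinant as an area and as a test for invertibility:

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

    # |det A| is the area of the parallelogram spanned by the columns (3, 1) and (1, 2).
    print(np.linalg.det(A))                                # approximately 5.0

    # A nonzero determinant certifies invertibility.
    print(np.allclose(np.linalg.inv(A) @ A, np.eye(2)))    # True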
The determinant is characterized by three properties: it is multilinear in the columns of $A$ (linear in each column when the others are fixed), it is alternating (swapping two columns negates the value), and it satisfies $\det I = 1$ for the identity matrix $I$. These three axioms pin down the determinant uniquely. The explicit formula is the Leibniz expansion:

$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i, \sigma(i)},$$
where the sum runs over all permutations $\sigma$ of $\{1, \ldots, n\}$ and $\operatorname{sgn}(\sigma)$ is the sign of $\sigma$ (positive for even permutations, negative for odd). For a system $Ax = b$, the classical Cramer’s rule expresses each component of the solution as a ratio of determinants. While Cramer’s rule is elegant and important theoretically, it is computationally inefficient for large systems; Gaussian elimination, which runs in $O(n^3)$ time, is far preferable in practice.
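The contrast is easy to see in code; the following sketch (assuming NumPy, with an arbitrary $2 \times 2$ system) computes the same solution both ways:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([3.0, 5.0])

    # Cramer's rule: replace the i-th column of A by b and take a ratio of determinants.
    x_cramer = np.empty(2)
    for i in range(2):
        Ai = A.copy()
        Ai[:, i] = b
        x_cramer[i] = np.linalg.det(Ai) / np.linalg.det(A)

    # Elimination-based solver (np.linalg.solve works through an LU factorization).
    x_solve = np.linalg.solve(A, b)
    print(np.allclose(x_cramer, x_solve))      # True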
Row-reducing the augmented matrix $[A \mid b]$ transforms the system into an equivalent one whose solution is transparent. The reduced row echelon form reveals which variables are pivot variables (determined by the equations) and which are free variables (chosen freely to parametrize the solution set). When the system is homogeneous ($b = 0$), its solution set is precisely $\ker A$, a subspace. When the system is nonhomogeneous, the solution set (if nonempty) is an affine subspace — a translate of $\ker A$.
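Pivot and free variables can be read off directly from the reduced row echelon form; here is a brief sketch assuming SymPy, with an arbitrary underdetermined example:

    from sympy import Matrix

    # Augmented matrix [A | b] for two equations in three unknowns.
    aug = Matrix([[1, 2, 1, 4],
                  [2, 4, 3, 9]])

    rref_form, pivot_cols = aug.rref()
    print(rref_form)     # Matrix([[1, 2, 0, 3], [0, 0, 1, 1]])
    print(pivot_cols)    # (0, 2): x1 and x3 are pivot variables, x2 is free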
Eigenvalues and Eigenvectors
Among all the vectors that a linear map acts on, the most special are those it merely stretches or compresses — those nonzero vectors $v$ for which $T(v) = \lambda v$ for some scalar $\lambda$. Such a vector is called an eigenvector (from the German eigen, meaning “own” or “characteristic”), and $\lambda$ is the corresponding eigenvalue. Eigenvalues and eigenvectors were studied in the context of quadratic forms by Leonhard Euler and Lagrange in the eighteenth century and developed in full generality by Augustin-Louis Cauchy and Karl Weierstrass in the nineteenth.
To find the eigenvalues of an $n \times n$ matrix $A$, we seek $\lambda$ such that $Av = \lambda v$, equivalently $(A - \lambda I)v = 0$, has a nonzero solution $v$. This requires the matrix $A - \lambda I$ to be singular, which happens precisely when:

$$\det(A - \lambda I) = 0.$$
The left side, when expanded, is a degree-$n$ polynomial in $\lambda$ called the characteristic polynomial of $A$. Its roots — which may be real or complex, distinct or repeated — are the eigenvalues of $A$. For each eigenvalue $\lambda$, the corresponding eigenspace is $E_\lambda = \ker(A - \lambda I)$, the subspace of all eigenvectors (and the zero vector) for that eigenvalue. The dimension of $E_\lambda$ is the geometric multiplicity of $\lambda$; its multiplicity as a root of the characteristic polynomial is its algebraic multiplicity, and the geometric multiplicity never exceeds the algebraic.
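These quantities are straightforward to compute; the sketch below assumes NumPy and uses an arbitrary example whose geometric multiplicity is strictly smaller than its algebraic multiplicity:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 2.0]])

    # Characteristic polynomial coefficients of A: lambda^2 - 4*lambda + 4.
    print(np.poly(A))                                        # [ 1. -4.  4.]
    print(np.linalg.eig(A)[0])                               # [2. 2.]: eigenvalue 2, algebraic multiplicity 2

    # Geometric multiplicity = dim ker(A - 2I) = 2 - rank(A - 2I).
    print(2 - np.linalg.matrix_rank(A - 2.0 * np.eye(2)))    # 1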
A matrix $A$ is diagonalizable if there exists an invertible matrix $P$ such that $P^{-1} A P = D$ is diagonal. This happens precisely when $A$ has a basis of eigenvectors — equivalently, when the geometric and algebraic multiplicities agree for every eigenvalue. When $A$ is diagonalizable, its powers are easy to compute: $A^k = P D^k P^{-1}$, since $D^k$ is just the diagonal matrix with entries $\lambda_i^k$. This makes diagonalization a powerful tool for solving linear recurrences and systems of linear differential equations.
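As an illustration (a sketch assuming NumPy, with an arbitrary diagonalizable example), the tenth power computed through the eigendecomposition agrees with repeated multiplication:

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])               # eigenvalues 5 and 2

    eigvals, P = np.linalg.eig(A)            # columns of P are eigenvectors

    # A^10 = P D^10 P^{-1}, where D^10 just raises each eigenvalue to the 10th power.
    A10_diag = P @ np.diag(eigvals**10) @ np.linalg.inv(P)
    print(np.allclose(A10_diag, np.linalg.matrix_power(A, 10)))   # True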
Many matrices are not diagonalizable, however. A matrix whose characteristic polynomial has only the root $0$ but which is not the zero matrix — the classic example being $\left(\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\right)$ — cannot be diagonalized over any field. Such matrices are still manageable through the Jordan normal form, which we take up in its own section below.
Inner Product Spaces
The vector space axioms say nothing about length or angle. These geometric notions are introduced through an inner product, a function $\langle \cdot, \cdot \rangle : V \times V \to F$ satisfying four conditions: additivity and homogeneity in the first argument, conjugate symmetry ($\langle u, v \rangle = \overline{\langle v, u \rangle}$, which reduces to ordinary symmetry for real spaces), and positive definiteness ($\langle v, v \rangle > 0$ for all $v \neq 0$). The standard inner product on $\mathbb{R}^n$ is the dot product $\langle x, y \rangle = x_1 y_1 + \cdots + x_n y_n$; on $\mathbb{C}^n$ the standard inner product is $\langle x, y \rangle = x_1 \overline{y_1} + \cdots + x_n \overline{y_n}$.
An inner product induces a norm (length) by $\|v\| = \sqrt{\langle v, v \rangle}$. The most important inequality in the theory is the Cauchy-Schwarz inequality:

$$|\langle u, v \rangle| \le \|u\| \, \|v\|,$$
with equality if and only if $u$ and $v$ are scalar multiples of each other. This inequality implies the triangle inequality $\|u + v\| \le \|u\| + \|v\|$, confirming that the norm induces a genuine metric $d(u, v) = \|u - v\|$. Two vectors are orthogonal if $\langle u, v \rangle = 0$; an orthonormal basis is a basis in which every pair of distinct basis vectors is orthogonal and every basis vector has unit length.
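Both inequalities are easy to check numerically; here is a minimal sketch assuming NumPy, with arbitrary example vectors:

    import numpy as np

    u = np.array([1.0, 2.0, -1.0])
    v = np.array([3.0, 0.0,  4.0])

    inner = np.dot(u, v)
    norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)

    print(abs(inner) <= norm_u * norm_v)               # Cauchy-Schwarz: True
    print(np.linalg.norm(u + v) <= norm_u + norm_v)    # triangle inequality: True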
Given any basis, the Gram-Schmidt process — developed by Jørgen Pedersen Gram (1883) and Erhard Schmidt (1907) — converts it into an orthonormal basis systematically. The process projects each new vector onto the span of the previous ones, subtracts that projection to obtain an orthogonal component, and then normalizes. In matrix terms, this is the QR decomposition: any matrix $A$ with linearly independent columns can be written as $A = QR$, where $Q$ is a matrix with orthonormal columns and $R$ is upper triangular. The QR decomposition is the numerical workhorse behind least-squares solvers and eigenvalue algorithms.
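A minimal sketch of the idea, assuming NumPy (the gram_schmidt helper below is written purely for illustration; in practice the library QR routine is preferred for numerical stability):

    import numpy as np

    def gram_schmidt(A):
        """Classical Gram-Schmidt on the columns of A (assumed linearly independent)."""
        Q = np.zeros_like(A, dtype=float)
        for j in range(A.shape[1]):
            w = A[:, j].astype(float)
            for i in range(j):
                w = w - np.dot(Q[:, i], A[:, j]) * Q[:, i]   # subtract projections onto earlier directions
            Q[:, j] = w / np.linalg.norm(w)                  # normalize
        return Q

    A = np.array([[1.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])

    Q = gram_schmidt(A)
    print(np.allclose(Q.T @ Q, np.eye(2)))     # columns are orthonormal: True

    Q2, R = np.linalg.qr(A)                    # library QR: A = Q R with R upper triangular
    print(np.allclose(Q2 @ R, A))              # True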
Orthogonal projections are the geometric heart of inner product spaces. The orthogonal projection of a vector $v$ onto a subspace $W$ is the unique vector in $W$ closest to $v$ — it is the best approximation of $v$ from within $W$. This is the solution to the least squares problem: given an overdetermined system $Ax = b$ (more equations than unknowns, generally inconsistent), the least-squares solution $\hat{x}$ minimizes $\|Ax - b\|$ and satisfies the normal equations $A^T A \hat{x} = A^T b$. Least squares is foundational to statistics (linear regression), signal processing (filter design), and scientific computing.
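A brief least-squares sketch, assuming NumPy and fitting a line to four made-up data points (the numbers are arbitrary illustrations):

    import numpy as np

    t = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 1.9, 3.2, 3.8])
    A = np.column_stack([np.ones_like(t), t])    # design matrix for the model y = c0 + c1*t

    # Solving the normal equations A^T A x = A^T y ...
    x_normal = np.linalg.solve(A.T @ A, A.T @ y)
    # ... agrees with the library least-squares routine (an SVD-based solver).
    x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(np.allclose(x_normal, x_lstsq))        # True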
Spectral Theorem and Decompositions
The spectral theorem is the crowning result of linear algebra, and it concerns matrices — or operators — with special symmetry. A real matrix $A$ is symmetric if $A^T = A$; a complex matrix is Hermitian (or self-adjoint) if $A^* = A$, where $A^*$ is the conjugate transpose. These matrices arise throughout physics (Hamiltonians in quantum mechanics, covariance matrices in statistics) and are characterized by an extraordinary diagonalizability property.
The spectral theorem for real symmetric matrices states: every real symmetric matrix has only real eigenvalues, and it can be diagonalized by an orthogonal matrix. In other words, if $A^T = A$, there exist an orthogonal matrix $Q$ (satisfying $Q^T Q = I$) and a diagonal matrix $\Lambda$ such that:

$$A = Q \Lambda Q^T.$$
The columns of $Q$ are orthonormal eigenvectors of $A$, and the diagonal entries of $\Lambda$ are the corresponding real eigenvalues. The complex analogue holds for Hermitian matrices, with orthogonal $Q$ replaced by unitary $U$ (satisfying $U^* U = I$). This result was established through the combined work of Cauchy, Jacobi, and others in the nineteenth century and later given elegant operator-theoretic formulations by David Hilbert and John von Neumann in the early twentieth.
The spectral decomposition gives a beautiful expansion. If $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of a symmetric matrix $A$ and $P_1, \ldots, P_k$ are the orthogonal projections onto the corresponding eigenspaces, then:

$$A = \lambda_1 P_1 + \lambda_2 P_2 + \cdots + \lambda_k P_k.$$
This “spectral” representation (the eigenvalues form the “spectrum” of the operator) makes it possible to compute functions of $A$ naturally: $f(A) = f(\lambda_1) P_1 + \cdots + f(\lambda_k) P_k$ for any function $f$ defined on the eigenvalues, a technique called the functional calculus.
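Both the orthogonal diagonalization and the functional calculus are visible in a few lines; the sketch assumes NumPy and an arbitrary symmetric example with eigenvalues 1 and 3:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # eigh is specialized to symmetric/Hermitian matrices: real eigenvalues, orthonormal eigenvectors.
    eigvals, Q = np.linalg.eigh(A)
    print(np.allclose(Q @ np.diag(eigvals) @ Q.T, A))    # A = Q Lambda Q^T: True

    # Functional calculus f(A) = Q f(Lambda) Q^T, here with f the square root.
    sqrtA = Q @ np.diag(np.sqrt(eigvals)) @ Q.T
    print(np.allclose(sqrtA @ sqrtA, A))                 # True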
The singular value decomposition (SVD) extends these ideas to arbitrary — not necessarily square or symmetric — matrices. Every $m \times n$ matrix $A$ can be written as $A = U \Sigma V^T$, where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is diagonal with nonnegative entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$ called singular values. The SVD reveals the geometric action of $A$: it rotates (via $V^T$), scales (via $\Sigma$), and rotates again (via $U$). It provides the most informative and numerically stable decomposition available, and it underlies principal component analysis (PCA), data compression, latent semantic analysis, and the computation of the Moore-Penrose pseudoinverse $A^+$.
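A short sketch assuming NumPy, with an arbitrary $2 \times 3$ example, showing the decomposition and the pseudoinverse built from it:

    import numpy as np

    A = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0]])          # 2 x 3, rank 2

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(np.allclose(U @ np.diag(s) @ Vt, A))        # A = U Sigma V^T: True

    # Moore-Penrose pseudoinverse assembled from the SVD (all singular values are nonzero here) ...
    A_plus = Vt.T @ np.diag(1.0 / s) @ U.T
    # ... matches the library routine.
    print(np.allclose(A_plus, np.linalg.pinv(A)))     # True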
Quadratic forms $q(x) = x^T A x$ are polynomial functions defined by a symmetric matrix $A$. They arise in the classification of conic sections, in the second derivative test for multivariable functions, and in the stability analysis of dynamical systems. The spectral theorem shows that every quadratic form can be diagonalized: by changing coordinates via the eigenvectors of $A$, $q$ becomes $\lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$ — a sum of squares with signs determined by the eigenvalue signs. A quadratic form is positive definite if $q(x) > 0$ for all $x \neq 0$, equivalently if all eigenvalues of $A$ are positive; Sylvester’s criterion gives a practical test via the leading principal minors of $A$.
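Both tests for positive definiteness take only a few lines; the sketch assumes NumPy and an arbitrary symmetric example:

    import numpy as np

    A = np.array([[2.0, -1.0],
                  [-1.0, 2.0]])

    # Eigenvalue test: all eigenvalues positive.
    print(np.all(np.linalg.eigvalsh(A) > 0))                                  # True

    # Sylvester's criterion: all leading principal minors positive (here 2 and 3).
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
    print(all(m > 0 for m in minors))                                         # True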
Tensor Products and Multilinear Algebra
Linear maps take a single vector as input. Multilinear maps take multiple inputs and are linear in each one separately. A bilinear map $B: V \times W \to U$ satisfies $B(v_1 + v_2, w) = B(v_1, w) + B(v_2, w)$ and $B(cv, w) = c\,B(v, w)$ (and analogously in the second argument). Familiar examples include the dot product on $\mathbb{R}^n$ (bilinear, $\mathbb{R}$-valued), the cross product on $\mathbb{R}^3$ (bilinear, $\mathbb{R}^3$-valued), and matrix multiplication (bilinear in the two matrix factors).
The tensor product $V \otimes W$ is the universal recipient of bilinear maps out of $V \times W$. It is constructed as a vector space equipped with a bilinear map $(v, w) \mapsto v \otimes w$ such that every bilinear map $B: V \times W \to U$ factors uniquely through $V \otimes W$ — there is a unique linear map $\tilde{B}: V \otimes W \to U$ with $B(v, w) = \tilde{B}(v \otimes w)$. This universal property characterizes $V \otimes W$ up to isomorphism. Concretely, if $v_1, \ldots, v_n$ is a basis for $V$ and $w_1, \ldots, w_m$ is a basis for $W$, then $\{v_i \otimes w_j\}$ is a basis for $V \otimes W$, giving:

$$\dim(V \otimes W) = \dim V \cdot \dim W.$$
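In coordinates (after fixing bases as above), an elementary tensor is represented by the Kronecker product of the coordinate vectors; the sketch below assumes NumPy and arbitrary example vectors:

    import numpy as np

    v = np.array([1.0, 2.0])            # coordinates in V, dim V = 2
    w = np.array([3.0, 0.0, 1.0])       # coordinates in W, dim W = 3

    # Coordinates of the elementary tensor v (x) w in the basis {v_i (x) w_j}: all pairwise products.
    print(np.kron(v, w))                # [3. 0. 1. 6. 0. 2.]
    print(np.kron(v, w).size)           # 6 = dim V * dim W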
The tensor product appears across mathematics and physics: in quantum mechanics, the state space of a composite system is the tensor product of the state spaces of its parts; in differential geometry, tensors on a manifold are elements of iterated tensor products of the tangent and cotangent spaces; in representation theory, the tensor product of two representations is again a representation.
The exterior algebra — due to Hermann Grassmann (1844) in his prescient but initially overlooked work Die lineale Ausdehnungslehre — gives a coordinate-free treatment of oriented volume. The wedge product $\wedge$ is an alternating bilinear operation: $v \wedge v = 0$, so $v \wedge w = -\,w \wedge v$. The exterior power $\Lambda^k V$ is spanned by all products $v_1 \wedge \cdots \wedge v_k$, and if $\dim V = n$, then:

$$\dim \Lambda^k V = \binom{n}{k}.$$
The top exterior power $\Lambda^n V$ is one-dimensional, and the determinant of a linear map $T: V \to V$ is precisely the coordinate of $T(e_1) \wedge \cdots \wedge T(e_n)$ in this one-dimensional space relative to a chosen orientation $e_1 \wedge \cdots \wedge e_n$ — a conceptually clean restatement of the Leibniz formula. The exterior algebra is also the algebraic foundation for differential forms, the language in which Stokes’ theorem and the general theory of integration on manifolds are expressed.
Jordan Normal Form
Not every matrix is diagonalizable, but every square matrix over an algebraically closed field (such as $\mathbb{C}$) can be put into a nearly diagonal form. This is the Jordan normal form, named for Camille Jordan, who proved its existence in 1870 in his Traité des substitutions. (Weierstrass had earlier established the result for the special case of matrices with prescribed invariant factors, but Jordan’s proof in the language of linear substitutions is the one that shaped modern treatments.)
The idea begins with generalized eigenvectors. If $\lambda$ is an eigenvalue of $A$ with algebraic multiplicity $m$, the corresponding generalized eigenspace is $\ker\!\left((A - \lambda I)^m\right)$. Ordinary eigenvectors are generalized eigenvectors of “depth 1”; generalized eigenvectors of depth $k$ satisfy $(A - \lambda I)^k v = 0$ but $(A - \lambda I)^{k-1} v \neq 0$. A Jordan chain based at $v$ is a sequence $v,\ (A - \lambda I)v,\ (A - \lambda I)^2 v,\ \ldots$ descending until it reaches zero.
A Jordan block $J_k(\lambda)$ of size $k$ for eigenvalue $\lambda$ is the $k \times k$ matrix:

$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix},$$

with $\lambda$ on the diagonal, $1$ on the superdiagonal, and zeros elsewhere.
The Jordan normal form theorem asserts that every square matrix over $\mathbb{C}$ is similar to a block-diagonal matrix whose diagonal blocks are Jordan blocks. The sizes and eigenvalues of these Jordan blocks are uniquely determined by $A$ (up to reordering), making the Jordan form a complete invariant for similarity over $\mathbb{C}$. A matrix is diagonalizable if and only if all its Jordan blocks are $1 \times 1$, equivalently if and only if geometric and algebraic multiplicities agree for every eigenvalue.
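Computer algebra systems can compute the Jordan form exactly; here is a brief sketch assuming SymPy, with an arbitrary non-diagonalizable example:

    from sympy import Matrix

    A = Matrix([[2, 1],
                [0, 2]])                 # eigenvalue 2: algebraic multiplicity 2, geometric multiplicity 1

    P, J = A.jordan_form()               # A = P J P^{-1}
    print(J)                             # Matrix([[2, 1], [0, 2]]): a single 2 x 2 Jordan block
    print(P * J * P.inv() == A)          # True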
The Cayley-Hamilton theorem — proved independently by Arthur Cayley (1858) and William Rowan Hamilton in the context of quaternions — is a consequence: every matrix satisfies its own characteristic polynomial. If $p(\lambda) = \det(\lambda I - A)$ is the characteristic polynomial of $A$, then $p(A) = 0$ (the zero matrix). This result makes it possible to express high powers of $A$ in terms of lower ones and to compute the minimal polynomial of $A$, the monic polynomial of least degree annihilating $A$. The Jordan form gives precise control over the minimal polynomial: the largest Jordan block for $\lambda$, say of size $k$, contributes the factor $(x - \lambda)^k$ to the minimal polynomial.
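A direct numerical check of Cayley-Hamilton, as a sketch assuming NumPy with an arbitrary $2 \times 2$ example:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    c = np.poly(A)                       # characteristic polynomial coefficients: [1, -5, -2]

    # Evaluate p(A) = A^2 - 5A - 2I.
    p_of_A = c[0] * (A @ A) + c[1] * A + c[2] * np.eye(2)
    print(np.allclose(p_of_A, np.zeros((2, 2))))     # True: A satisfies its own characteristic polynomial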
The matrix exponential $e^{tA} = \sum_{k \ge 0} t^k A^k / k!$ is the key to solving linear systems of differential equations $x'(t) = A x(t)$, whose solution is $x(t) = e^{tA} x(0)$. The Jordan form makes $e^{tA}$ computable: for a Jordan block $J = \lambda I + N$ with $N$ nilpotent, the exponential $e^{tJ}$ is upper-triangular with $e^{\lambda t}$ on the diagonal and polynomial-times-exponential entries above it, reflecting the fact that generalized eigenvectors lead to solutions of the form $e^{\lambda t}$ multiplied by a polynomial in $t$. The stability of the system — whether solutions grow, decay, or oscillate — is governed entirely by the real parts of the eigenvalues.
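As a closing illustration (a sketch assuming NumPy and SciPy, with an arbitrary rotation-generating example), the matrix exponential solves such a system directly:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])          # eigenvalues +i and -i: purely oscillatory dynamics

    x0 = np.array([1.0, 0.0])
    t = np.pi / 2

    # Solution of x'(t) = A x(t) with x(0) = x0 is x(t) = e^{tA} x0.
    print(expm(t * A) @ x0)              # approximately [0, -1]: a quarter turn of the initial state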