Numerical Analysis
Interpolation, quadrature, finite elements, and spectral methods for computation.
Numerical analysis is the study of algorithms for solving mathematical problems on finite-precision computers — computing integrals, solving differential equations, finding roots, and decomposing matrices when exact symbolic answers are unavailable or computationally intractable. It occupies a peculiar and essential position in mathematics: it is deeply theoretical, demanding careful proofs of convergence and stability, yet its entire purpose is intensely practical, shaping weather forecasts, structural simulations, and medical imaging alike. To use a numerical method well, one must understand not only how it works but how and why it fails.
Foundations and Error Analysis
Every computation on a digital computer is tainted by floating-point arithmetic — the unavoidable consequence of representing real numbers with a finite number of bits. The IEEE 754 standard, adopted in 1985, governs how modern hardware handles floating-point numbers. A floating-point number is stored in the form $\pm m \times 2^e$, where $m$ is the mantissa (or significand) and $e$ is the exponent, each constrained to a fixed range. Double precision allocates 52 bits to the mantissa and 11 to the exponent, yielding roughly 15-16 significant decimal digits. The key constant is machine epsilon $\varepsilon_{\text{mach}}$, the smallest positive number such that $1 + \varepsilon_{\text{mach}} \neq 1$ in floating-point arithmetic; for double precision, $\varepsilon_{\text{mach}} = 2^{-52} \approx 2.22 \times 10^{-16}$.
With this in mind, errors in numerical computation fall into several distinct categories. Truncation error (or discretization error) arises when an infinite mathematical process — a Taylor series, an integral, a derivative — is replaced by a finite approximation. Rounding error is introduced each time a real number is stored or an arithmetic operation is performed, since most results cannot be represented exactly. A third source, cancellation error, occurs when nearly equal quantities are subtracted, annihilating significant digits and leaving only noise.
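Both rounding and cancellation are easy to observe directly. A minimal sketch in plain Python: halve a candidate epsilon until $1 + \varepsilon$ rounds back to 1, then watch catastrophic cancellation destroy $1 - \cos x$ for small $x$.

```python
import math

# Machine epsilon: the gap between 1.0 and the next representable double.
eps = 1.0
while 1.0 + eps / 2 != 1.0:
    eps /= 2
print(eps)                      # 2.220446049250313e-16 == 2**-52

# Cancellation: computing 1 - cos(x) directly for small x loses all digits,
# while the algebraically equivalent form 2*sin(x/2)**2 does not.
x = 1e-8
naive = 1.0 - math.cos(x)       # catastrophic cancellation
stable = 2.0 * math.sin(x / 2) ** 2
print(naive, stable)            # naive is 0.0; the true value is ~5e-17
```

The rewritten form avoids subtracting nearly equal numbers, which is the standard cure for cancellation when an equivalent formula exists.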
To reason about errors systematically, we distinguish forward error analysis from backward error analysis. Forward analysis tracks how errors in inputs propagate to errors in outputs. If a computed result $\hat{x}$ approximates the true answer $x$, the absolute error is $|\hat{x} - x|$ and the relative error is $|\hat{x} - x| / |x|$. Backward error analysis, championed by James Wilkinson in his landmark 1963 book Rounding Errors in Algebraic Processes, asks a different question: for what slightly perturbed input does our algorithm give the exact answer? If the backward error is small — if our computed answer is the exact answer to a nearby problem — then the algorithm is called backward stable, which is typically the gold standard.
The sensitivity of a problem to perturbations in its input is captured by the condition number. For a function $f$ and input $x$, the (relative) condition number is approximately

$$\kappa(x) = \left| \frac{x \, f'(x)}{f(x)} \right|,$$

measuring how much relative change in output results from a relative change in input. For a linear system $Ax = b$, the condition number of the matrix is $\kappa(A) = \|A\| \, \|A^{-1}\|$. A large condition number means the problem is ill-conditioned: small perturbations in $A$ or $b$ can cause large changes in $x$, and no algorithm can reliably recover the true solution from floating-point data. The crucial insight — one Wilkinson made central to the field — is that ill-conditioning is a property of the problem, not the algorithm. A backward stable algorithm applied to an ill-conditioned problem will still produce large errors, but those errors are inevitable given the data.
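The effect of a large condition number is visible in a few lines of NumPy; the nearly singular $2 \times 2$ matrix below is an illustrative choice.

```python
import numpy as np

# A nearly singular system: the two rows are almost parallel.
A = np.array([[1.0, 1.0],
              [1.0, 1.0000001]])
b = A @ np.array([1.0, 1.0])          # exact solution is [1, 1]

kappa = np.linalg.cond(A)             # ~4e7: expect to lose ~7 digits
x = np.linalg.solve(A, b)

# Perturb b by 1e-10 in one component and watch the solution move by ~1e-3.
b_pert = b + np.array([0.0, 1e-10])
x_pert = np.linalg.solve(A, b_pert)
print(kappa, np.linalg.norm(x_pert - x))
```

The solver itself is backward stable; the amplification of the $10^{-10}$ perturbation into a $10^{-3}$ change in the answer is entirely the problem's fault, exactly as the text describes.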
Richardson extrapolation is a general technique for eliminating low-order error terms. If an approximation $A(h)$ to some quantity $A$ satisfies $A(h) = A + c\,h^p + O(h^{p+1})$, then the combination

$$\frac{2^p A(h/2) - A(h)}{2^p - 1}$$

cancels the leading error term and achieves order $p+1$ accuracy. Iterated, this process produces Romberg’s method for integration, capable of spectacularly rapid convergence for smooth integrands.
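A minimal sketch of Richardson extrapolation, here applied to the central difference approximation of $f'(x)$ (whose error expansion contains only even powers of $h$, so $p = 2$):

```python
import math

# Central difference D(h) for f'(x) has error c2*h^2 + c4*h^4 + ...
# Combining D(h) and D(h/2) with p = 2 cancels the h^2 term.
def central(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

f, x, h = math.sin, 1.0, 0.1
d1 = central(f, x, h)
d2 = central(f, x, h / 2)
richardson = (4 * d2 - d1) / 3        # (2^p D(h/2) - D(h)) / (2^p - 1)

exact = math.cos(1.0)
print(abs(d1 - exact), abs(richardson - exact))  # ~1e-3 drops to ~1e-7
```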
Interpolation and Approximation
The interpolation problem asks: given data points $(x_0, y_0), (x_1, y_1), \dots, (x_n, y_n)$ with distinct nodes $x_i$, find a function that passes exactly through all of them. Polynomial interpolation is the classical answer. A theorem going back to Joseph-Louis Lagrange and independently to Edward Waring guarantees that there exists a unique polynomial of degree at most $n$ passing through $n + 1$ prescribed points.
The Lagrange interpolating polynomial is written explicitly in terms of basis functions. Define the Lagrange basis polynomials

$$\ell_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j},$$

so that $\ell_i(x_j) = \delta_{ij}$ (the Kronecker delta). The interpolating polynomial is then

$$p_n(x) = \sum_{i=0}^{n} y_i \, \ell_i(x).$$
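A direct transcription of the Lagrange form — fine for small $n$, though it costs $O(n^2)$ per evaluation and the incremental Newton form below is usually preferred in practice:

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial at x (naive O(n^2) form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)   # Lagrange basis l_i(x)
        total += yi * li
    return total

# Interpolating f(x) = x^3 at 4 nodes reproduces the cubic exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [v**3 for v in xs]
print(lagrange_eval(xs, ys, 1.5))   # 3.375 == 1.5**3
```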
An equivalent and computationally superior form is Newton’s divided difference formula, which builds the polynomial incrementally using a triangular table of divided differences. This form is particularly efficient when nodes are added one at a time, since existing computations need not be repeated.
The error in polynomial interpolation is given by the formula

$$f(x) - p_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^{n} (x - x_i)$$

for some $\xi$ in the interval spanned by the nodes. This error has two factors: the derivative of $f$ (which we cannot change) and the node polynomial $\prod_{i=0}^{n}(x - x_i)$ (which depends on our choice of nodes). The dangerous failure mode is Runge’s phenomenon, observed by Carl Runge in 1901: interpolating the innocent-looking function $f(x) = \frac{1}{1 + 25x^2}$ at equally spaced nodes on $[-1, 1]$ produces polynomials that oscillate wildly near the endpoints as $n \to \infty$. The remedy is to choose nodes that minimize $\max_x \left| \prod_i (x - x_i) \right|$. The optimal choice (up to scaling) is the Chebyshev nodes

$$x_k = \cos\!\left( \frac{(2k+1)\pi}{2(n+1)} \right), \qquad k = 0, 1, \dots, n,$$

which are the zeros of the Chebyshev polynomial $T_{n+1}$. With Chebyshev nodes, the node polynomial satisfies $\max_{x \in [-1,1]} \left| \prod_{i=0}^{n} (x - x_i) \right| = 2^{-n}$, which is the best possible.
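The phenomenon is easy to reproduce. The sketch below uses SciPy's numerically stable barycentric evaluator to compare equispaced and Chebyshev nodes for the Runge function at degree 20:

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

runge = lambda x: 1.0 / (1.0 + 25.0 * x**2)
n = 20
xs = np.linspace(-1, 1, 2001)        # fine grid for measuring the error

def max_interp_error(nodes):
    p = BarycentricInterpolator(nodes, runge(nodes))
    return np.max(np.abs(p(xs) - runge(xs)))

equi = np.linspace(-1, 1, n + 1)
k = np.arange(n + 1)
cheb = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))   # Chebyshev nodes

err_equi = max_interp_error(equi)
err_cheb = max_interp_error(cheb)
print(err_equi, err_cheb)   # equispaced error blows up; Chebyshev stays small
```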
When global high-degree polynomials are problematic, spline interpolation provides an elegant alternative. A cubic spline is a piecewise cubic polynomial that is twice continuously differentiable at each interior node. On each subinterval $[x_i, x_{i+1}]$ a separate cubic is used, and the pieces are assembled with smoothness conditions — continuity of the function value, first derivative, and second derivative at each internal node. This system, together with boundary conditions (such as the natural spline condition setting the second derivatives at the endpoints to zero), yields a tridiagonal linear system that is efficiently solved. Cubic splines are the standard tool in computer-aided design and geometric modeling precisely because they are smooth but local: a change to one data point affects only a few adjacent pieces.
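A quick illustration with SciPy's CubicSpline, whose `bc_type='natural'` option imposes the natural boundary conditions just mentioned: the spline through 21 samples of the Runge function shows none of the high-degree polynomial's endpoint oscillation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Natural cubic spline through samples of the Runge function.
x = np.linspace(-1, 1, 21)
y = 1.0 / (1.0 + 25.0 * x**2)
spline = CubicSpline(x, y, bc_type='natural')  # zero second derivative at ends

xs = np.linspace(-1, 1, 2001)
err = np.max(np.abs(spline(xs) - 1.0 / (1.0 + 25.0 * xs**2)))
print(err)   # a few times 1e-3: no Runge blow-up
```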
Numerical Differentiation and Integration
Derivatives are approximated using finite difference formulas derived from Taylor expansions. The forward difference approximation $f'(x) \approx \frac{f(x+h) - f(x)}{h}$ has truncation error $O(h)$: expanding $f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(\xi)$ and rearranging shows the error is $\frac{h}{2} f''(\xi)$. The central difference formula $f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}$ achieves $O(h^2)$ accuracy by cancellation of the $O(h)$ error term. Higher-order formulas for higher derivatives are derived similarly, and Richardson extrapolation can be applied to push the order arbitrarily high for smooth functions.
There is, however, a fundamental tension: as $h \to 0$, the truncation error decreases, but the rounding error grows because the numerator involves cancellation of nearly equal numbers. The optimal step size balances these competing effects, giving $h \sim \sqrt{\varepsilon_{\text{mach}}} \approx 10^{-8}$ for the forward difference and $h \sim \varepsilon_{\text{mach}}^{1/4} \approx 10^{-4}$ for the second-derivative formula.
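The trade-off is visible in a simple step-size sweep, here for the forward difference of exp at $x = 1$:

```python
import numpy as np

# Total error = truncation (~ h/2 * e) + rounding (~ eps * e / h),
# minimized near h ~ sqrt(eps) ~ 1e-8.
x, exact = 1.0, np.exp(1.0)
errs = {}
for h in [1e-2, 1e-5, 1e-8, 1e-11, 1e-14]:
    errs[h] = abs((np.exp(x + h) - np.exp(x)) / h - exact)
    print(f"h = {h:.0e}   error = {errs[h]:.2e}")
```

The error shrinks with $h$ down to about $10^{-8}$, then grows again as rounding noise in the cancelled numerator takes over.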
Numerical integration — computing $\int_a^b f(x)\,dx$ — is historically called quadrature, a term dating to the Greek problem of squaring the circle. The Newton-Cotes formulas integrate the interpolating polynomial through equally spaced nodes. The trapezoidal rule uses linear interpolation at the endpoints:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{2}\bigl[f(a) + f(b)\bigr].$$

Simpson’s rule uses the quadratic interpolant through $a$, $\frac{a+b}{2}$, and $b$:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{6}\left[f(a) + 4 f\!\left(\frac{a+b}{2}\right) + f(b)\right].$$
Note that Simpson’s rule, though derived from a quadratic interpolant through three points, integrates cubics exactly — an unexpected bonus sometimes called superconvergence. For practical computation, composite rules subdivide $[a, b]$ into $n$ subintervals and apply the basic rule on each piece, giving composite trapezoidal error $O(h^2)$ and composite Simpson’s error $O(h^4)$, where $h = (b-a)/n$.
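A sketch of both composite rules, verifying the predicted convergence orders on $\int_0^\pi \sin x\,dx = 2$:

```python
import numpy as np

def trapezoid(f, a, b, n):
    x = np.linspace(a, b, n + 1)
    h = (b - a) / n
    return h * (0.5 * f(x[0]) + f(x[1:-1]).sum() + 0.5 * f(x[-1]))

def simpson(f, a, b, n):          # n must be even
    x = np.linspace(a, b, n + 1)
    h = (b - a) / n
    return h / 3 * (f(x[0]) + 4 * f(x[1:-1:2]).sum()
                    + 2 * f(x[2:-2:2]).sum() + f(x[-1]))

exact = 2.0                       # integral of sin on [0, pi]
errs_trap, errs_simp = {}, {}
for n in [8, 16]:
    errs_trap[n] = abs(trapezoid(np.sin, 0, np.pi, n) - exact)
    errs_simp[n] = abs(simpson(np.sin, 0, np.pi, n) - exact)
    print(n, errs_trap[n], errs_simp[n])
# Doubling n cuts the trapezoid error ~4x (O(h^2)), Simpson ~16x (O(h^4)).
```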
Gaussian quadrature goes further by choosing both the weights and the nodes optimally. A quadrature rule with $n$ nodes can be exact for all polynomials of degree at most $2n - 1$ if the nodes are chosen as the zeros of the Legendre polynomial $P_n$ on $[-1, 1]$. This is roughly twice the degree achievable with fixed equally spaced nodes, making Gaussian quadrature extraordinarily efficient for smooth integrands. The Gauss-Legendre rule is:

$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} w_i\, f(x_i),$$

where $x_i$ are the zeros of $P_n$ and the weights $w_i$ are determined by the exactness conditions. Variants include Gauss-Hermite quadrature (for integrals over $(-\infty, \infty)$ with Gaussian weight $e^{-x^2}$) and Gauss-Laguerre (for integrals over $[0, \infty)$ with weight $e^{-x}$).
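NumPy ships the Gauss-Legendre nodes and weights as `numpy.polynomial.legendre.leggauss`; with 5 points the rule is exact through degree 9 and already near machine precision for smooth integrands:

```python
import numpy as np

# 5-point Gauss-Legendre: exact for polynomials up to degree 2*5 - 1 = 9.
nodes, weights = np.polynomial.legendre.leggauss(5)

poly = lambda x: x**9 + 3 * x**4 + 1
approx = np.sum(weights * poly(nodes))
exact = 6 / 5 + 2                 # odd terms vanish on [-1, 1]; 3*int x^4 = 6/5
print(approx, exact)              # agree to machine precision

# Smooth non-polynomial: integral of exp on [-1, 1] is e - 1/e.
approx_exp = np.sum(weights * np.exp(nodes))
print(abs(approx_exp - (np.e - 1 / np.e)))   # tiny, with only 5 evaluations
```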
For integrals where the accuracy requirements vary across the domain, adaptive quadrature automatically subdivides the interval where the integrand is difficult, concentrating computational effort where it is needed.
Root Finding and Linear Systems
Root finding asks: given $f$, find $x^*$ such that $f(x^*) = 0$. The oldest reliable method is bisection: if $f(a)\,f(b) < 0$, the Intermediate Value Theorem guarantees a root in $[a, b]$. Evaluating $f$ at the midpoint $m = (a+b)/2$ and replacing whichever of $a$ or $b$ has a function value of the same sign as $f(m)$ halves the interval at each step. Bisection converges linearly — the error decreases by a factor of 2 per iteration — and is guaranteed to converge, but it is slow.
Newton’s method (or the Newton-Raphson method) converges far faster by using derivative information. Starting from an initial guess $x_0$, the iteration is

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.$$

Geometrically, each step replaces $f$ by its tangent line at $x_k$ and takes the tangent’s root as the next iterate. When $x_0$ is sufficiently close to a simple root $r$, Newton’s method converges quadratically: $|x_{k+1} - r| \le C\,|x_k - r|^2$ for some constant $C$. This means the number of correct digits roughly doubles at each step — extraordinarily fast in practice. The method was described by Isaac Newton in 1669 (in a manuscript circulated privately) and later, independently, by Joseph Raphson in 1690 in a published form closer to the modern version.
When $f'$ is unavailable or expensive, the secant method replaces the derivative by a finite-difference approximation using the two most recent iterates:

$$x_{k+1} = x_k - f(x_k)\,\frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})}.$$

The secant method converges superlinearly at rate $\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618$ (the golden ratio) — slower than Newton but faster than bisection, and requiring only one new function evaluation per step.
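Minimal sketches of both iterations, solving $x^2 = 2$; the tolerance and iteration cap are illustrative choices:

```python
import math

def newton(f, fprime, x0, tol=1e-12, maxit=50):
    x = x0
    for _ in range(maxit):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def secant(f, x0, x1, tol=1e-12, maxit=50):
    for _ in range(maxit):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:              # flat secant: cannot proceed further
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, x1 = x1, x2
        if abs(x1 - x0) < tol:
            break
    return x1

f = lambda x: x * x - 2
r_newton = newton(f, lambda x: 2 * x, 1.0)
r_secant = secant(f, 1.0, 2.0)
print(r_newton, r_secant)   # both converge to sqrt(2) = 1.41421356...
```

Printing the iterates would show the quadratic digit-doubling of Newton versus the slightly slower golden-ratio rate of the secant method.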
For linear systems $Ax = b$, Gaussian elimination is the workhorse algorithm, performing about $\frac{2}{3}n^3$ floating-point operations to produce the LU decomposition $A = LU$, where $L$ is unit lower triangular and $U$ is upper triangular. Solving $Ax = b$ then requires two triangular systems, each $O(n^2)$. The critical practical modification is partial pivoting: at each elimination step, rows are swapped to place the entry of largest absolute value in the pivot position, which controls growth in the entries of $U$ and makes the algorithm numerically stable in practice. The error in solving $Ax = b$ via LU with partial pivoting is bounded (loosely) by

$$\frac{\|\hat{x} - x\|}{\|x\|} \lesssim c(n)\, \kappa(A)\, \varepsilon_{\text{mach}}.$$
For symmetric positive definite matrices, the Cholesky decomposition $A = LL^T$ is twice as efficient as LU and numerically superior. For large sparse systems arising from discretized PDEs, iterative methods such as the Conjugate Gradient method (for symmetric positive definite $A$) or GMRES (for general $A$) are preferred, since they exploit the sparsity and avoid the fill-in that dense factorizations create.
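With SciPy, the factor-once, solve-many pattern looks like this (random test matrix; `lu_factor` performs the partially pivoted factorization $PA = LU$):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Factor once with partial pivoting, then reuse for any right-hand side.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
b = rng.standard_normal(100)

lu, piv = lu_factor(A)        # PA = LU, O(n^3) work
x = lu_solve((lu, piv), b)    # two triangular solves, O(n^2) work

print(np.linalg.norm(A @ x - b))   # residual near machine precision
```

Reusing the factorization is why LU is preferred over re-running elimination when the same matrix must be solved against many right-hand sides.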
Eigenvalue Problems
The eigenvalue problem — finding scalars $\lambda$ and nonzero vectors $v$ with $Av = \lambda v$ for a given matrix $A$ — arises throughout applied mathematics, from the vibration modes of structures to the steady states of Markov chains and the principal components of datasets.
The simplest eigenvalue algorithm is the power method: starting from a random vector $v_0$, repeatedly multiply by $A$ and normalize:

$$v_{k+1} = \frac{A v_k}{\|A v_k\|}.$$

If the largest eigenvalue $\lambda_1$ is strictly greater in magnitude than all others, the sequence converges to the corresponding eigenvector at rate $|\lambda_2 / \lambda_1|$ per iteration. The inverse power method applies the power method to $A^{-1}$ (via a linear solve at each step) and converges to the eigenvector for the smallest eigenvalue. With a shift $\mu$, the shifted inverse power method iterates with $(A - \mu I)^{-1}$ and converges to the eigenvalue nearest $\mu$, making it possible to target any desired part of the spectrum.
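A bare-bones power method on a small symmetric matrix (the matrix, iteration count, and seed are illustrative choices):

```python
import numpy as np

def power_method(A, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)        # renormalize to prevent overflow
    lam = v @ A @ v                   # Rayleigh quotient estimate of lambda_1
    return lam, v

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = power_method(A)
print(lam)                                 # dominant eigenvalue (= 3 + sqrt(3))
print(np.linalg.norm(A @ v - lam * v))     # small eigenpair residual
```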
Rayleigh quotient iteration combines shifting with the current iterate: the shift is updated at each step to $\mu_k = \frac{v_k^T A v_k}{v_k^T v_k}$ (the Rayleigh quotient of $A$ at $v_k$). For symmetric matrices this achieves cubic convergence near a simple eigenvalue — remarkably fast.
The QR algorithm is the standard method for computing all eigenvalues simultaneously. Starting from $A_0 = A$, it iterates:

$$A_k = Q_k R_k \;\; \text{(QR factorization)}, \qquad A_{k+1} = R_k Q_k.$$

The matrices $A_k$ are all similar to $A$ (hence have the same eigenvalues), and the iteration converges to a triangular (or quasi-triangular) form. In practice, $A$ is first reduced to Hessenberg form (upper Hessenberg, with zeros below the first subdiagonal) by orthogonal similarity transformations, cutting the cost of each QR step from $O(n^3)$ to $O(n^2)$. The Francis double-shift strategy, introduced by John G. F. Francis in 1961-62, makes the algorithm efficient and rapidly convergent in practice. For large sparse problems, the Lanczos algorithm (for symmetric matrices) and the Arnoldi algorithm (for general matrices) build a low-dimensional Krylov subspace and extract good eigenvalue approximations from that subspace, avoiding the cost of the full QR algorithm.
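The unshifted iteration is easy to demonstrate on a small symmetric matrix; production implementations add the Hessenberg reduction and shifts discussed above, but the bare loop already converges here:

```python
import numpy as np

# Unshifted QR iteration: A_{k+1} = R_k Q_k is a similarity transform of A_k,
# and for this symmetric matrix it converges to a diagonal of eigenvalues.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Ak = A.copy()
for _ in range(100):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q

print(np.diag(Ak))                        # approximate eigenvalues
print(np.sort(np.linalg.eigvalsh(A)))     # reference values
```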
Numerical Methods for ODEs and PDEs
An ordinary differential equation (ODE) initial value problem takes the form $y'(t) = f(t, y(t))$, $y(t_0) = y_0$, and asks for $y(t)$ on an interval $[t_0, T]$. Euler’s method is the simplest numerical scheme: stepping forward from $t_n$ to $t_{n+1} = t_n + h$ by

$$y_{n+1} = y_n + h\, f(t_n, y_n).$$

This first-order method has global error $O(h)$, meaning halving the step size halves the error. The classical fourth-order Runge-Kutta method (RK4) evaluates $f$ at four carefully chosen intermediate points within each step:

$$\begin{aligned}
k_1 &= f(t_n,\, y_n), \\
k_2 &= f\!\left(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2}\, k_1\right), \\
k_3 &= f\!\left(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2}\, k_2\right), \\
k_4 &= f\!\left(t_n + h,\; y_n + h\, k_3\right), \\
y_{n+1} &= y_n + \tfrac{h}{6}\,(k_1 + 2k_2 + 2k_3 + k_4).
\end{aligned}$$
The global error of RK4 is $O(h^4)$: halving $h$ reduces the error by a factor of 16. RK4 became the universal workhorse of scientific computing after its introduction by Wilhelm Kutta in 1901 (building on earlier work by Carl Runge).
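A side-by-side sketch of Euler and RK4 on $y' = y$, $y(0) = 1$, whose exact solution at $t = 1$ is $e$:

```python
import numpy as np

def euler_step(f, t, y, h):
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(step, f, y0, t0, t1, n):
    t, y = t0, y0
    h = (t1 - t0) / n
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y

f = lambda t, y: y                      # y' = y, y(0) = 1  =>  y(1) = e
errs_euler, errs_rk4 = [], []
for n in [50, 100]:
    errs_euler.append(abs(integrate(euler_step, f, 1.0, 0.0, 1.0, n) - np.e))
    errs_rk4.append(abs(integrate(rk4_step, f, 1.0, 0.0, 1.0, n) - np.e))
print(errs_euler)   # halves when h halves: O(h)
print(errs_rk4)     # drops ~16x when h halves: O(h^4)
```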
A crucial concern is stability. Applying Euler’s method to the test equation $y' = \lambda y$ gives $y_{n+1} = (1 + h\lambda)\,y_n$, which is bounded only when $|1 + h\lambda| \le 1$. For $\lambda < 0$ (a decaying equation), this requires $h \le 2/|\lambda|$. A stiff differential equation is one where the stability constraint forces $h$ far smaller than accuracy would require — typically because the system contains components decaying at wildly different rates. For stiff problems, implicit methods such as the backward Euler method or the Backward Differentiation Formulas (BDF) of C. W. Gear are preferred: backward Euler is A-stable (stable for every $h > 0$ whenever $\operatorname{Re}\lambda < 0$), and the higher-order BDF methods retain large stability regions, at the cost of solving a nonlinear system at each step.
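Stiffness in one toy computation: for $y' = -1000y$ with a step size five times past the explicit stability limit, forward Euler explodes while backward Euler (whose update for this linear problem solves $y_{n+1} = y_n + h\lambda y_{n+1}$ in closed form) decays like the true solution.

```python
# Stiff test problem y' = -1000 y, y(0) = 1, integrated to t = 1.
# Forward Euler is stable only for h < 2/1000 = 0.002; we use h = 0.01.
lam, h, n = -1000.0, 0.01, 100

y_fwd, y_bwd = 1.0, 1.0
for _ in range(n):
    y_fwd = y_fwd + h * lam * y_fwd      # forward Euler: y *= (1 + h*lam) = -9
    y_bwd = y_bwd / (1 - h * lam)        # backward Euler: y /= (1 - h*lam) = 11

print(y_fwd)   # (-9)^100: explosive instability
print(y_bwd)   # 11^-100: decays, qualitatively like the true e^{-1000}
```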
Modern ODE solvers use embedded Runge-Kutta pairs — pairs of methods of different orders sharing function evaluations — to estimate the local error and automatically adjust the step size. The Dormand-Prince scheme (used in MATLAB’s ode45) pairs a fifth-order method with an embedded fourth-order error estimator, providing error estimates at essentially no extra cost.
For partial differential equations (PDEs), the discretization strategy depends on the problem type. For elliptic problems (such as Laplace’s equation $\nabla^2 u = 0$), the finite element method (FEM) is the dominant approach. The PDE is first recast in weak form: multiply by a test function $v$ and integrate by parts, transferring derivatives from $u$ to $v$. For Poisson’s equation $-\nabla^2 u = f$ on a domain $\Omega$ (with $u = 0$ on $\partial\Omega$),

$$\int_\Omega \nabla u \cdot \nabla v \, dx = \int_\Omega f\, v \, dx \qquad \text{for all test functions } v.$$
The domain is then triangulated into elements, $u$ is approximated by a piecewise polynomial $u_h$ on the mesh, and the weak form is enforced only for test functions in the same finite-dimensional space. This yields a sparse linear system that is solved by direct or iterative methods. FEM handles complex geometries naturally and admits rigorous error analysis: for sufficiently smooth solutions, the error is $O(h^2)$ for linear elements and $O(h^{p+1})$ for degree-$p$ elements (in the $L^2$ norm, with $h$ the mesh size).
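A minimal one-dimensional sketch of the method for $-u'' = f$ on $(0,1)$ with $u(0) = u(1) = 0$, using piecewise-linear "hat" elements on a uniform mesh; the load vector uses a simple trapezoidal quadrature, and the manufactured solution $u = \sin(\pi x)$ checks the result:

```python
import numpy as np

def fem_poisson_1d(f, n):
    """Linear finite elements for -u'' = f on (0,1), u(0) = u(1) = 0."""
    h = 1.0 / n
    x = np.linspace(0, 1, n + 1)
    # Stiffness matrix for linear elements: (1/h) * tridiag(-1, 2, -1).
    K = (np.diag(2 * np.ones(n - 1))
         - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h
    # Load vector: trapezoidal approximation of int f * phi_i = h * f(x_i).
    F = h * f(x[1:-1])
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(K, F)   # dense solve; real codes use sparse
    return x, u

# Manufactured solution: f = pi^2 sin(pi x)  =>  u = sin(pi x).
f = lambda x: np.pi**2 * np.sin(np.pi * x)
x, u = fem_poisson_1d(f, 64)
err = np.max(np.abs(u - np.sin(np.pi * x)))
print(err)   # second-order accurate: shrinks ~4x when n doubles
```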
Spectral methods represent the solution as a truncated series in global basis functions — typically Fourier modes for periodic problems or Chebyshev polynomials for non-periodic problems. Because smooth functions have rapidly decaying coefficients in these bases, spectral methods achieve spectral accuracy: the error decays faster than any power of the number of degrees of freedom $N$, often exponentially, like $O(e^{-cN})$. The trade-off is that spectral methods require smooth solutions and simple geometries, limiting their applicability compared to FEM.
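Spectral accuracy in action: differentiating the smooth periodic function $e^{\sin x}$ by FFT, the error collapses toward machine precision as the number of modes doubles.

```python
import numpy as np

def spectral_derivative(u):
    """Fourier spectral derivative of samples on [0, 2*pi)."""
    N = len(u)
    k = np.fft.fftfreq(N, 1.0 / N)     # integer wavenumbers 0..N/2-1, -N/2..-1
    return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

errs_spec = []
for N in [8, 16, 32]:
    x = 2 * np.pi * np.arange(N) / N
    u = np.exp(np.sin(x))              # smooth and periodic
    du_exact = np.cos(x) * u
    errs_spec.append(np.max(np.abs(spectral_derivative(u) - du_exact)))
    print(N, errs_spec[-1])            # drops by orders of magnitude each time
```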
Least Squares and Data Fitting
The least squares problem asks: given an overdetermined linear system $Ax \approx b$ with $A \in \mathbb{R}^{m \times n}$, $m > n$, find $x$ minimizing $\|Ax - b\|_2$. This arises constantly in data fitting, parameter estimation, and regression. Setting the gradient of $\|Ax - b\|_2^2$ to zero yields the normal equations

$$A^T\!A\, x = A^T b.$$

When $A$ has full column rank, $A^T\!A$ is symmetric positive definite and the normal equations can be solved by Cholesky. However, forming $A^T\!A$ explicitly squares the condition number — $\kappa(A^T\!A) = \kappa(A)^2$ — making the normal equations numerically unreliable for poorly conditioned problems.
The preferred approach uses the QR decomposition $A = QR$. Since $Q$ is orthogonal and multiplication by $Q^T$ leaves the 2-norm unchanged, we have $\|Ax - b\|_2 = \|Rx - Q^T b\|_2$, and the minimizer satisfies the upper triangular system $Rx = Q^T b$ restricted to the first $n$ rows. This is numerically superior, with error proportional to $\kappa(A)$ rather than $\kappa(A)^2$.
The most powerful tool is the Singular Value Decomposition (SVD): any $A \in \mathbb{R}^{m \times n}$ can be written as $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with non-negative entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$ (the singular values). The condition number of $A$ is $\kappa_2(A) = \sigma_{\max} / \sigma_{\min}$. The least squares solution is $x = A^{+} b$, where $A^{+} = V \Sigma^{+} U^T$ is the Moore-Penrose pseudoinverse and $\Sigma^{+}$ inverts the nonzero singular values. The SVD reveals the rank, the null space, and the sensitivity of the problem all at once.
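The three approaches side by side in NumPy on a small, well-conditioned fitting problem (synthetic data; for such a benign $A$ all three agree to many digits):

```python
import numpy as np

# Fit a quadratic to noisy samples of 2 - t + 0.5 t^2.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
y = 2.0 - 1.0 * t + 0.5 * t**2 + 0.01 * rng.standard_normal(50)
A = np.vstack([np.ones_like(t), t, t**2]).T

x_normal = np.linalg.solve(A.T @ A, A.T @ y)     # normal equations
Q, R = np.linalg.qr(A)                           # thin QR
x_qr = np.linalg.solve(R, Q.T @ y)
x_svd = np.linalg.lstsq(A, y, rcond=None)[0]     # SVD-based solver

print(x_normal, x_qr, x_svd)    # all close to [2, -1, 0.5]
```

For an ill-conditioned $A$ the normal-equation route would lose roughly twice as many digits as QR or SVD, which is the point of the comparison above.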
When the least squares problem is ill-conditioned (small singular values inflate the solution), Tikhonov regularization replaces the minimization of $\|Ax - b\|_2^2$ with

$$\min_x \; \|Ax - b\|_2^2 + \lambda^2 \|x\|_2^2,$$

yielding the regularized solution $x_\lambda = (A^T\!A + \lambda^2 I)^{-1} A^T b$. The regularization parameter $\lambda$ controls the trade-off between data fit and solution size; choosing $\lambda$ optimally is itself a nontrivial problem, addressed by methods such as the L-curve criterion and generalized cross-validation. For nonlinear least squares — fitting a nonlinear model by minimizing a residual $\|r(\theta)\|_2^2$ — the Gauss-Newton method linearizes $r$ at the current iterate and solves a linear least squares problem at each step, while the Levenberg-Marquardt method stabilizes Gauss-Newton with a trust-region modification, achieving reliable convergence even far from the solution. Levenberg-Marquardt has become the standard algorithm for nonlinear curve fitting in virtually every scientific software package.
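A sketch of the regularized solution formula on a classically ill-conditioned problem; the Hilbert matrix, noise level, and $\lambda = 10^{-6}$ are illustrative choices:

```python
import numpy as np

# Tikhonov regularization on a Hilbert-matrix problem: the plain solution
# amplifies tiny noise enormously; the regularized one stays near the truth.
n = 12
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)  # Hilbert
x_true = np.ones(n)
b = A @ x_true + 1e-10 * np.random.default_rng(2).standard_normal(n)

x_plain = np.linalg.solve(A, b)                  # noise amplified by ~1/sigma_min
lam = 1e-6
x_reg = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)

print(np.linalg.norm(x_plain - x_true))   # large
print(np.linalg.norm(x_reg - x_true))     # small
```

The regularized system damps the contribution of singular values below roughly $\lambda$, trading a small bias for a dramatic reduction in noise amplification.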