Measure Theory
Sigma-algebras, the Lebesgue integral, Lp spaces, and abstract measure.
Measure theory is the rigorous mathematical framework for assigning sizes to sets and building a theory of integration that goes far beyond what the Riemann integral can handle. Developed at the turn of the twentieth century, primarily by Henri Lebesgue, it resolved deep pathologies in real analysis and became the indispensable foundation for probability, functional analysis, and much of modern mathematics. Where classical calculus asks “what is the area under this curve,” measure theory asks the more fundamental question: what does it even mean to assign a size to an arbitrary subset of the real line?
Foundations of Measure Theory
The conceptual starting point is the observation that not every subset of $\mathbb{R}$ should be expected to have a well-defined length. The classical notion of length works perfectly for intervals: the length of $[a, b]$ is $b - a$. But attempts to assign lengths to arbitrary sets run into serious trouble. In 1905, Giuseppe Vitali constructed the first example of a non-measurable set — a subset of $[0, 1]$ that cannot be assigned any length consistent with the properties we demand of a reasonable size function. This showed that some restriction on which sets we measure is unavoidable.
The solution is to work not with all subsets, but with a carefully chosen collection called a sigma-algebra (or $\sigma$-algebra). A $\sigma$-algebra on a set $X$ is a collection $\mathcal{F}$ of subsets of $X$ satisfying three axioms: the empty set belongs to $\mathcal{F}$; if $A \in \mathcal{F}$ then its complement $A^c \in \mathcal{F}$; and if $A_1, A_2, \ldots$ is any countable sequence of sets in $\mathcal{F}$, then their union $\bigcup_{n=1}^{\infty} A_n$ also belongs to $\mathcal{F}$. Sets belonging to $\mathcal{F}$ are called measurable sets. The pair $(X, \mathcal{F})$ is called a measurable space.
The most important $\sigma$-algebra in analysis is the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$, defined as the smallest $\sigma$-algebra on $\mathbb{R}$ that contains all open sets. It also contains all closed sets, all countable intersections of open sets ($G_\delta$ sets), all countable unions of closed sets ($F_\sigma$ sets), and much more. Virtually every set one encounters in analysis is a Borel set.
A measure on a measurable space $(X, \mathcal{F})$ is a function $\mu: \mathcal{F} \to [0, \infty]$ satisfying two properties. First, $\mu(\emptyset) = 0$. Second, countable additivity: for any countable collection $A_1, A_2, \ldots$ of pairwise disjoint sets in $\mathcal{F}$,

$$\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n).$$
The triple $(X, \mathcal{F}, \mu)$ is a measure space. Countable additivity is what separates a measure from a merely finitely additive set function, and this property is essential for passing limits through integrals. When $\mu(X) = 1$, the measure is a probability measure and the triple is a probability space in the sense of Andrei Kolmogorov’s 1933 axiomatization.
The Lebesgue measure $\lambda$ on $\mathbb{R}$ is the unique measure on the Borel $\sigma$-algebra that assigns to each interval its length: $\lambda([a, b]) = b - a$. Its construction proceeds via the Lebesgue outer measure $\lambda^*$, defined for any subset $E \subseteq \mathbb{R}$ by covering $E$ with countably many open intervals and taking the infimum of the total length:

$$\lambda^*(E) = \inf\left\{\sum_{n=1}^{\infty} (b_n - a_n) : E \subseteq \bigcup_{n=1}^{\infty} (a_n, b_n)\right\}.$$
The outer measure is defined on all subsets of $\mathbb{R}$, but it is only countably additive on the measurable sets selected by the Carathéodory criterion: a set $E$ is Lebesgue measurable if for every subset $A \subseteq \mathbb{R}$,

$$\lambda^*(A) = \lambda^*(A \cap E) + \lambda^*(A \setminus E).$$
The collection of all Lebesgue measurable sets forms a $\sigma$-algebra that strictly contains the Borel $\sigma$-algebra and on which $\lambda$ is a complete measure — meaning every subset of a null set is measurable. A set $N$ has measure zero (is a null set) if $\lambda^*(N) = 0$; the Cantor set is a famous example of an uncountable null set. Properties that hold everywhere except on a null set are said to hold almost everywhere, abbreviated a.e.
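As a small numerical sketch of why the Cantor set is null (assuming the standard middle-thirds construction; the function name is ours, for illustration): at stage $n$ the construction leaves $2^n$ intervals of length $3^{-n}$, so the outer measure is at most $(2/3)^n$, which tends to zero.

```python
# Stage n of the middle-thirds construction covers the Cantor set C by
# 2**n closed intervals, each of length 3**(-n).  Their total length is an
# upper bound for the outer measure of C.

def cantor_cover_length(n: int) -> float:
    """Total length of the 2**n stage-n intervals covering C."""
    return (2 ** n) * (3.0 ** -n)

for n in (1, 5, 10, 20):
    print(n, cantor_cover_length(n))
# The bound (2/3)**n -> 0, so lambda*(C) = 0: the Cantor set is a null set.
```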
Measurable Functions and Lebesgue Integration
With a measure space in hand, we need a notion of function compatible with the $\sigma$-algebra. A function $f: X \to \mathbb{R}$ is measurable if the preimage of every Borel set is measurable: for every $B \in \mathcal{B}(\mathbb{R})$, we require $f^{-1}(B) \in \mathcal{F}$. Equivalently, $f$ is measurable if and only if for every $a \in \mathbb{R}$, the set $\{x \in X : f(x) > a\}$ is measurable. Continuous functions are measurable, pointwise limits of measurable functions are measurable, and the class of measurable functions is closed under all algebraic operations and limits — precisely the properties needed for a robust integration theory.
The Lebesgue integral is built in stages. First, a simple function is a measurable function taking only finitely many values: $s = \sum_{i=1}^{n} c_i \mathbf{1}_{A_i}$, where the $A_i$ are measurable sets and $\mathbf{1}_{A_i}$ is the indicator function of $A_i$. Its integral is defined in the obvious way:

$$\int s \, d\mu = \sum_{i=1}^{n} c_i \, \mu(A_i).$$
For a non-negative measurable function $f$, the integral is defined as the supremum over all simple functions dominated by $f$:

$$\int f \, d\mu = \sup\left\{\int s \, d\mu : 0 \le s \le f, \ s \text{ simple}\right\}.$$
For a general measurable function, write $f = f^+ - f^-$ where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$ are both non-negative, and set $\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu$, provided at least one of these is finite. When both are finite, $f$ is called integrable or in $L^1(\mu)$.
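The staged construction can be imitated numerically. The sketch below is our own illustration (Lebesgue measure on $[0,1]$ is approximated by a uniform grid, and all names are ours): it uses the standard dyadic simple functions $s_n = \min(\lfloor 2^n f \rfloor / 2^n, n)$, which increase pointwise to $f$, so their integrals increase toward $\int f \, d\mu$.

```python
import math

def dyadic_simple(f, n):
    """Standard simple-function approximation: s_n = min(floor(2^n f)/2^n, n)."""
    def s(x):
        return min(math.floor((2 ** n) * f(x)) / (2 ** n), n)
    return s

def integrate_on_grid(g, a, b, m=50_000):
    """Approximate the Lebesgue integral of g over [a, b] on a uniform grid."""
    h = (b - a) / m
    return sum(g(a + (i + 0.5) * h) for i in range(m)) * h

f = lambda x: x * x                       # integral over [0, 1] is 1/3
for n in (2, 4, 8):
    print(n, integrate_on_grid(dyadic_simple(f, n), 0.0, 1.0))
# The integrals of s_n increase toward 1/3 as n grows.
```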
The power of this construction is revealed by the convergence theorems. The Monotone Convergence Theorem states that if $0 \le f_1 \le f_2 \le \cdots$ is an increasing sequence of non-negative measurable functions converging pointwise to $f$, then

$$\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu.$$
Fatou’s Lemma gives a one-sided inequality for general sequences: if $f_n \ge 0$, then

$$\int \liminf_{n \to \infty} f_n \, d\mu \le \liminf_{n \to \infty} \int f_n \, d\mu.$$
The most versatile result is the Dominated Convergence Theorem (DCT): if $f_n \to f$ pointwise almost everywhere and there exists an integrable function $g$ with $|f_n| \le g$ for all $n$, then

$$\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu.$$
The Riemann integral, by contrast, cannot pass limits through integrals without uniform convergence — a far more restrictive condition. Lebesgue showed that a bounded function on $[a, b]$ is Riemann integrable if and only if it is continuous almost everywhere, and in that case the Riemann and Lebesgue integrals agree. Functions like Dirichlet’s characteristic function of the rationals — zero on irrationals, one on rationals — are not Riemann integrable but are trivially Lebesgue integrable with integral zero, since the rationals form a null set.
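A quick numerical illustration of dominated convergence (our own toy example, not from the text): take $f_n(x) = x^n$ on $[0, 1]$, which tends to $0$ pointwise on $[0, 1)$ and is dominated by the integrable constant $g \equiv 1$; the integrals $\int_0^1 x^n \, dx = 1/(n+1)$ tend to $0 = \int \lim f_n$.

```python
def midpoint_integral(g, a, b, m=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return sum(g(a + (i + 0.5) * h) for i in range(m)) * h

# f_n(x) = x**n: pointwise limit 0 on [0, 1), dominated by the constant 1.
# DCT predicts lim integral f_n = 0; indeed integral x**n dx = 1/(n + 1).
for n in (1, 10, 100):
    print(n, midpoint_integral(lambda x: x ** n, 0.0, 1.0))
```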
Product Measures and Fubini’s Theorem
When two measure spaces $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{G}, \nu)$ are given, we want to build a measure on the Cartesian product $X \times Y$ that extends both. The product $\sigma$-algebra $\mathcal{F} \otimes \mathcal{G}$ is the smallest $\sigma$-algebra on $X \times Y$ containing all measurable rectangles $A \times B$ with $A \in \mathcal{F}$ and $B \in \mathcal{G}$. The product measure $\mu \times \nu$ is the unique measure on $\mathcal{F} \otimes \mathcal{G}$ satisfying

$$(\mu \times \nu)(A \times B) = \mu(A) \, \nu(B)$$
for all measurable rectangles. Existence and uniqueness of the product measure follow from the Carathéodory extension theorem, provided $\mu$ and $\nu$ are $\sigma$-finite — meaning that the whole space can be covered by countably many sets of finite measure.
Fubini’s Theorem is the cornerstone result that justifies computing double integrals as iterated integrals. It has two complementary parts. For non-negative measurable functions (Tonelli’s Theorem): if $f: X \times Y \to [0, \infty]$ is $\mathcal{F} \otimes \mathcal{G}$-measurable, then the iterated integrals are well-defined and equal:

$$\int_{X \times Y} f \, d(\mu \times \nu) = \int_X \left( \int_Y f(x, y) \, d\nu(y) \right) d\mu(x) = \int_Y \left( \int_X f(x, y) \, d\mu(x) \right) d\nu(y).$$
For integrable functions (Fubini’s Theorem proper): if $f \in L^1(\mu \times \nu)$, then for $\mu$-almost every $x$ the section $y \mapsto f(x, y)$ is $\nu$-integrable, and the same iterated integral formula holds.
The hypothesis that $f$ be integrable (or non-negative) is genuinely necessary. The classic counterexample involves the unit square with Lebesgue measure: define $f(x, y) = \frac{x^2 - y^2}{(x^2 + y^2)^2}$ for $(x, y) \ne (0, 0)$ and $f(0, 0) = 0$. Then $\int_0^1 \int_0^1 f \, dy \, dx = \frac{\pi}{4}$ while $\int_0^1 \int_0^1 f \, dx \, dy = -\frac{\pi}{4}$ — the two iterated integrals disagree because $\int_{[0,1]^2} |f| \, d\lambda^2 = \infty$. Fubini’s Theorem tells us that when the iterated integrals disagree, the function cannot be absolutely integrable over the product space.
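The two iterated integrals of the classic counterexample $f(x, y) = (x^2 - y^2)/(x^2 + y^2)^2$ can be checked numerically. The sketch below is our own: it uses the closed-form inner integral $\int_0^1 f(x, y)\, dy = 1/(x^2 + 1)$ (from the antiderivative $y/(x^2 + y^2)$) and then integrates in $x$, recovering $\pi/4$; the antisymmetry $f(x, y) = -f(y, x)$ gives $-\pi/4$ for the other order.

```python
import math

def inner_in_y(x):
    """Closed form of the inner integral: since d/dy [y/(x^2+y^2)] = f(x,y),
    for x > 0 we have integral_0^1 f(x, y) dy = 1/(x^2 + 1)."""
    return 1.0 / (x * x + 1.0)

def midpoint(g, a, b, m=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return sum(g(a + (i + 0.5) * h) for i in range(m)) * h

dy_first = midpoint(inner_in_y, 0.0, 1.0)   # integrate in y first, then x
dx_first = -dy_first                        # f(x,y) = -f(y,x) flips the sign
print(dy_first, dx_first, math.pi / 4)
```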
Fubini’s theorem underlies Cavalieri’s principle — the ancient observation that two solids with equal cross-sectional areas at every height have equal volume — and it is the rigorous justification for change-of-variables formulas involving coordinate transformations in multiple dimensions.
Signed Measures and Decomposition Theorems
A natural generalization allows a measure to take negative values. A signed measure on $(X, \mathcal{F})$ is a function $\nu: \mathcal{F} \to [-\infty, \infty]$ satisfying $\nu(\emptyset) = 0$ and countable additivity, but now allowed to be negative, subject to the constraint that it takes at most one of the values $+\infty$ or $-\infty$. Signed measures arise naturally as differences of two ordinary measures and as indefinite integrals: if $f$ is an integrable function and $\mu$ is a measure, then $\nu(A) = \int_A f \, d\mu$ defines a signed measure.
The Jordan Decomposition Theorem shows that every signed measure $\nu$ can be written uniquely as a difference $\nu = \nu^+ - \nu^-$ of two mutually singular positive measures, called the positive variation and negative variation. Two measures $\mu$ and $\nu$ are mutually singular, written $\mu \perp \nu$, if there exist disjoint sets $A$ and $B$ with $A \cup B = X$ such that $\mu$ is concentrated on $A$ and $\nu$ is concentrated on $B$. The total variation measure is $|\nu| = \nu^+ + \nu^-$, and the total variation norm $\|\nu\| = |\nu|(X)$ makes the space of finite signed measures on $(X, \mathcal{F})$ into a Banach space.
The companion result to Jordan decomposition is the Hahn Decomposition Theorem: for any signed measure $\nu$, there exist disjoint sets $P$ and $N$ with $P \cup N = X$ such that $\nu$ is non-negative on every measurable subset of $P$ and non-positive on every measurable subset of $N$. The sets $P$ and $N$ are essentially unique (up to $\nu$-null sets) and are called the positive and negative parts of the Hahn decomposition.
The key concept relating two measures is absolute continuity. A measure $\nu$ is absolutely continuous with respect to a measure $\mu$, written $\nu \ll \mu$, if every $\mu$-null set is also a $\nu$-null set: whenever $\mu(A) = 0$, we have $\nu(A) = 0$. The intuition is that $\nu$ cannot see anything that $\mu$ cannot see.
The Radon-Nikodym Theorem characterizes absolute continuity analytically: if $\mu$ and $\nu$ are $\sigma$-finite measures on $(X, \mathcal{F})$ and $\nu \ll \mu$, then there exists a non-negative measurable function $f$, unique up to $\mu$-null sets, such that

$$\nu(A) = \int_A f \, d\mu \quad \text{for all } A \in \mathcal{F}.$$
The function $f$ is called the Radon-Nikodym derivative or density of $\nu$ with respect to $\mu$, and is written $f = \frac{d\nu}{d\mu}$. This notation deliberately evokes the chain rule: if $\nu \ll \mu \ll \rho$, then $\frac{d\nu}{d\rho} = \frac{d\nu}{d\mu} \frac{d\mu}{d\rho}$ $\rho$-almost everywhere. Otto Nikodym proved this result in 1930 (following earlier work by Johann Radon in 1913), and it is one of the most powerful tools in analysis and probability, providing the theoretical foundation for conditional expectation, likelihood ratios in statistics, and changes of probability measure in stochastic calculus.
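On a finite set the theorem is transparent: every measure is a weighted counting measure, and the Radon-Nikodym derivative reduces to a pointwise ratio of weights. The measures below are our own toy example (here $\nu \ll \mu$ holds automatically because $\mu$ charges every point).

```python
# Radon-Nikodym derivative on a four-point space: f = d(nu)/d(mu) is just
# the ratio of point masses, and integrating f against mu recovers nu.

X = ["a", "b", "c", "d"]
mu = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}   # uniform measure
nu = {"a": 0.10, "b": 0.20, "c": 0.30, "d": 0.40}   # nu << mu here

density = {x: nu[x] / mu[x] for x in X}             # f = d(nu)/d(mu)

def nu_via_density(A):
    """Recover nu(A) as the integral of f over A against mu."""
    return sum(density[x] * mu[x] for x in A)

print(density["d"])                 # 1.6
print(nu_via_density({"b", "c"}))   # equals nu(b) + nu(c) = 0.5
```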
The Lebesgue Decomposition Theorem extends this: for any two $\sigma$-finite measures $\nu$ and $\mu$, there is a unique decomposition $\nu = \nu_{ac} + \nu_s$ where $\nu_{ac} \ll \mu$ and $\nu_s \perp \mu$. Applied to the Stieltjes measure of a function of bounded variation, this recovers the classical decomposition into absolutely continuous and singular parts.
Lp Spaces
The $L^p$ spaces organize integrable functions into a family of Banach spaces parametrized by $p \in [1, \infty]$. For $1 \le p < \infty$, the space $L^p(\mu)$ consists of equivalence classes of measurable functions (with two functions identified if they agree almost everywhere) satisfying

$$\|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p} < \infty.$$
The space $L^\infty(\mu)$ consists of essentially bounded functions, with norm $\|f\|_\infty$ — the smallest $M$ such that $|f| \le M$ almost everywhere. These norms make each $L^p$ space a complete normed vector space, i.e., a Banach space; this completeness was proved by Fischer and Riesz independently in 1907. The special case $p = 2$ gives a Hilbert space with inner product $\langle f, g \rangle = \int f \bar{g} \, d\mu$.
The fundamental inequalities governing $L^p$ spaces are Hölder’s inequality and Minkowski’s inequality. If $\frac{1}{p} + \frac{1}{q} = 1$ (with $p$ and $q$ called conjugate exponents), then for $f \in L^p$ and $g \in L^q$,

$$\|fg\|_1 \le \|f\|_p \, \|g\|_q.$$
This is Hölder’s inequality; the case $p = q = 2$ is the Cauchy-Schwarz inequality. Minkowski’s inequality asserts the triangle inequality for the $L^p$ norm: $\|f + g\|_p \le \|f\|_p + \|g\|_p$. Both inequalities ultimately rest on Young’s inequality for products: $ab \le \frac{a^p}{p} + \frac{b^q}{q}$ for $a, b \ge 0$ and conjugate exponents $p, q$.
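Both inequalities are easy to test numerically on sequence spaces, where counting measure turns $\|\cdot\|_p$ into the familiar $\ell^p$ norm. The sketch below (our own, with the conjugate pair $p = 3$, $q = 3/2$) checks Hölder and Minkowski on random vectors.

```python
import random

def lp_norm(v, p):
    """l^p norm of a finite sequence (counting measure on its indices)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(0)
f = [random.uniform(-1, 1) for _ in range(50)]
g = [random.uniform(-1, 1) for _ in range(50)]

p, q = 3.0, 1.5                                    # conjugate: 1/3 + 2/3 = 1
lhs = sum(abs(a * b) for a, b in zip(f, g))        # ||f g||_1
print(lhs, "<=", lp_norm(f, p) * lp_norm(g, q))    # Hoelder
tri = lp_norm([a + b for a, b in zip(f, g)], p)
print(tri, "<=", lp_norm(f, p) + lp_norm(g, p))    # Minkowski
```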
The dual space of $L^p$ for $1 \le p < \infty$ is characterized by the Riesz Representation Theorem for $L^p$: every bounded linear functional $\phi$ on $L^p(\mu)$ is of the form $\phi(f) = \int f g \, d\mu$ for a unique $g \in L^q$, and $\|\phi\| = \|g\|_q$ (for $p = 1$ this requires $\mu$ to be $\sigma$-finite). This means $(L^p)^* \cong L^q$, so $(L^p)^*$ and $L^q$ are isometrically isomorphic as Banach spaces when $1 \le p < \infty$. For $p = 2$, the dual is $L^2$ itself; for $p = \infty$, the dual is strictly larger than $L^1$.
Different modes of convergence interact in subtle ways within and across $L^p$ spaces. Convergence in $L^p$ norm implies convergence in measure, and convergence in measure implies the existence of an almost-everywhere convergent subsequence. Almost-everywhere convergence does not imply $L^p$ convergence in general (as the “sliding bump” sequence $f_n = n \mathbf{1}_{(0, 1/n)}$ on $[0, 1]$ shows), but with a dominating function it does, by the Dominated Convergence Theorem. Egorov’s Theorem bridges these: on a finite measure space, almost-everywhere convergence implies nearly-uniform convergence — given $\epsilon > 0$, there is a set $E$ with $\mu(E) < \epsilon$ outside of which convergence is uniform.
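The bump sequence $f_n = n \mathbf{1}_{(0, 1/n)}$ on $[0, 1]$ makes the gap concrete: each fixed $x > 0$ eventually falls outside $(0, 1/n)$, so $f_n \to 0$ pointwise, yet $\int f_n \, d\lambda = n \cdot (1/n) = 1$ for every $n$. A small numerical sketch (grid approximation ours):

```python
# f_n = n * 1_{(0, 1/n)}: pointwise limit 0, but ||f_n||_1 = 1 for all n,
# so there is no convergence in L^1 (and no integrable dominating function).

def bump(n):
    return lambda x: n if 0.0 < x < 1.0 / n else 0.0

def l1_norm(g, m=100_000):
    """Grid approximation of the L^1 norm on [0, 1]."""
    h = 1.0 / m
    return sum(abs(g((i + 0.5) * h)) for i in range(m)) * h

for n in (2, 10, 100):
    f = bump(n)
    print(n, l1_norm(f), f(0.5))   # norm stays 1; pointwise values reach 0
```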
The $L^p$ spaces are separable for $1 \le p < \infty$ (when the underlying measure space is $\sigma$-finite and separable) and reflexive for $1 < p < \infty$; $L^1$ fails to be reflexive, and $L^\infty$ is in general neither reflexive nor separable. These structural properties make $L^2$ especially tractable in functional analysis and quantum mechanics, while $L^1$ and $L^\infty$ require more careful handling.
Hausdorff Measures and Fractal Dimension
Standard Lebesgue measure captures the $n$-dimensional volume of subsets of $\mathbb{R}^n$, but it says nothing useful about sets with fractional or intermediate dimension — a curve in $\mathbb{R}^3$ has zero 3-dimensional volume, and its length is captured only by the right 1-dimensional notion of measure. Hausdorff measures fill this gap by allowing the dimension parameter to be any non-negative real number.
Fix $s \ge 0$ and $\delta > 0$. For any set $E \subseteq \mathbb{R}^n$, define

$$\mathcal{H}^s_\delta(E) = \inf\left\{ \sum_{i=1}^{\infty} (\operatorname{diam} U_i)^s : E \subseteq \bigcup_{i=1}^{\infty} U_i, \ \operatorname{diam} U_i \le \delta \right\},$$

where the infimum is over all countable covers of $E$ by sets of diameter at most $\delta$. The $s$-dimensional Hausdorff measure is then $\mathcal{H}^s(E) = \lim_{\delta \to 0} \mathcal{H}^s_\delta(E)$. Felix Hausdorff introduced this construction in 1919. For each set $E$, there is a critical value $s_0$ such that $\mathcal{H}^s(E) = \infty$ for $s < s_0$ and $\mathcal{H}^s(E) = 0$ for $s > s_0$. This critical value is the Hausdorff dimension $\dim_H(E)$.
For familiar sets, Hausdorff dimension agrees with topological intuition: a smooth curve has dimension 1, a smooth surface has dimension 2, an open set in $\mathbb{R}^n$ has dimension $n$. The power of the concept is in its application to irregular sets. The Cantor set $C$, constructed by iteratively removing the middle thirds of intervals, is uncountable yet has Lebesgue measure zero. Its Hausdorff dimension is $\log 2 / \log 3 \approx 0.6309$. The Sierpinski triangle, obtained by repeatedly removing central triangles, has Hausdorff dimension $\log 3 / \log 2 \approx 1.585$.
These are examples of self-similar sets — sets that are unions of scaled copies of themselves. For a self-similar set satisfying the open set condition (a technical disjointness requirement), if it is the union of $N$ copies each scaled by ratio $r$, then its Hausdorff dimension $s$ satisfies the Moran equation $N r^s = 1$, giving $s = \log N / \log(1/r)$. For the Cantor set, $N = 2$ and $r = 1/3$, confirming $s = \log 2 / \log 3$.
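The Moran equation is a one-line computation. The sketch below (our own; the Koch curve is an extra illustration not discussed in the text) evaluates $s = \log N / \log(1/r)$ for a few standard self-similar sets.

```python
import math

def similarity_dimension(N, r):
    """Solve the Moran equation N * r**s = 1 for s."""
    return math.log(N) / math.log(1.0 / r)

print(similarity_dimension(2, 1 / 3))   # Cantor set: log 2 / log 3 ~ 0.6309
print(similarity_dimension(3, 1 / 2))   # Sierpinski triangle: log 3 / log 2 ~ 1.585
print(similarity_dimension(4, 1 / 3))   # Koch curve: log 4 / log 3 ~ 1.2619
```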
Hausdorff measure $\mathcal{H}^s$ is a Borel regular measure on $\mathbb{R}^n$, and $\mathcal{H}^n$ coincides with $n$-dimensional Lebesgue measure up to a normalizing constant. One-dimensional Hausdorff measure restricted to a smooth curve gives arc length. The theory of Hausdorff measures is the entry point to geometric measure theory, which studies rectifiable sets, minimal surfaces, and variational problems using the full machinery of measure theory. Benoit Mandelbrot popularized fractal dimension in the 1970s, and the concept now appears in physics (turbulence, critical phenomena), biology (branching structures), and image processing.
Covering Theorems and Differentiation
A recurring theme in analysis is recovering local information about a function or measure from averaged quantities. The Lebesgue Differentiation Theorem is the measure-theoretic analogue of the fundamental theorem of calculus: for any locally integrable function $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$,

$$\lim_{r \to 0} \frac{1}{\lambda(B(x, r))} \int_{B(x, r)} f \, d\lambda = f(x) \quad \text{for almost every } x,$$
where $B(x, r)$ is the ball of radius $r$ centered at $x$ and $\lambda$ denotes Lebesgue measure. In other words, the average value of $f$ over smaller and smaller balls centered at $x$ converges to $f(x)$ at almost every point. A point where this holds is called a Lebesgue point of $f$, and the theorem says that almost every point is a Lebesgue point.
The proof of the Lebesgue Differentiation Theorem relies on covering lemmas — geometric results that allow one to extract disjoint or nearly-disjoint subcollections from a covering. The most important is the Vitali Covering Lemma: given any collection $\mathcal{C}$ of balls in $\mathbb{R}^n$ with uniformly bounded radii, one can extract a countable disjoint subcollection $B_1, B_2, \ldots$ such that every ball in the original collection is contained in $5B_i$ (the ball with the same center but five times the radius) for some $i$:

$$\bigcup_{B \in \mathcal{C}} B \subseteq \bigcup_{i=1}^{\infty} 5B_i.$$
The Vitali lemma is used to bound the Hardy-Littlewood maximal function

$$Mf(x) = \sup_{r > 0} \frac{1}{\lambda(B(x, r))} \int_{B(x, r)} |f| \, d\lambda.$$

The Hardy-Littlewood Maximal Inequality asserts that for $f \in L^1(\mathbb{R}^n)$ and any $t > 0$,

$$\lambda\left(\{x : Mf(x) > t\}\right) \le \frac{C_n}{t} \, \|f\|_1,$$
where $C_n$ depends only on the dimension $n$. This weak-type estimate is the cornerstone of the proof of the differentiation theorem and appears throughout harmonic analysis. The Besicovitch Covering Theorem provides an alternative covering result that works in metric spaces without relying on the Euclidean structure, making it useful for differentiating measures in more abstract settings.
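A discrete sketch of the maximal function is easy to set up. The construction below is our own (a uniform grid on $[-1, 2]$, $f$ the indicator of $[0.4, 0.6]$, centered windows in place of balls); it computes a grid version of $Mf$ and checks that the weak-type ratio $t \, \lambda(\{Mf > t\}) / \|f\|_1$ stays bounded by a small constant, as the inequality predicts in one dimension.

```python
# Discrete centered maximal function on a grid over [-1, 2] for
# f = 1_{[0.4, 0.6]}, with prefix sums for O(1) window averages.

m = 600
h = 3.0 / m
xs = [-1.0 + (i + 0.5) * h for i in range(m)]
f = [1.0 if 0.4 <= x <= 0.6 else 0.0 for x in xs]
l1 = sum(f) * h                                   # ||f||_1, about 0.2

prefix = [0.0]
for v in f:
    prefix.append(prefix[-1] + v)                 # prefix sums of f

def maximal(i):
    """Best average of |f| over symmetric grid windows centered at i."""
    best = 0.0
    for r in range(1, m):
        lo, hi = max(0, i - r), min(m, i + r + 1)
        avg = (prefix[hi] - prefix[lo]) / (hi - lo)
        if avg > best:
            best = avg
    return best

Mf = [maximal(i) for i in range(m)]
ratios = []
for t in (0.05, 0.1, 0.2):
    level = sum(h for v in Mf if v > t)           # lambda({Mf > t})
    ratios.append(t * level / l1)
print(ratios)                                     # each ratio stays modest
```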
Covering theorems also underlie the differentiation theory of monotone functions. A monotone function $f: [a, b] \to \mathbb{R}$ is differentiable almost everywhere — this is Lebesgue’s Theorem on Monotone Functions, proved in 1904. The key insight is that the set of points where $f$ fails to be differentiable can be covered by collections of intervals where the difference quotients oscillate, and the Vitali lemma shows this set must have measure zero. A function $F$ is absolutely continuous on $[a, b]$ if and only if it is the indefinite integral of an $L^1$ function, in which case the fundamental theorem of calculus holds: $F(x) - F(a) = \int_a^x F'(t) \, dt$ for all $x \in [a, b]$. Absolute continuity is strictly stronger than continuity and bounded variation, and it is precisely the condition under which the Lebesgue integral serves as the inverse of differentiation.
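A hedged numerical check of the fundamental theorem in this form (our own toy example): for the absolutely continuous function $F(x) = x^2$, with $F'(t) = 2t$, the identity $F(x) - F(a) = \int_a^x F'(t)\, dt$ holds exactly.

```python
# Check F(x) - F(a) = integral_a^x F'(t) dt for F(x) = x**2, F'(t) = 2t.

def fprime(t):
    return 2.0 * t

def midpoint(g, a, b, m=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return sum(g(a + (i + 0.5) * h) for i in range(m)) * h

a, x = 1.0, 3.0
print(midpoint(fprime, a, x), x ** 2 - a ** 2)   # both equal 8
```

The midpoint rule is exact for linear integrands, so the two printed values agree up to floating-point rounding.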
The differentiation theorem and its relatives connect back to the Radon-Nikodym theorem: the density $\frac{d\nu}{d\mu}$ can often be recovered as a pointwise limit of difference quotients, and the Lebesgue Decomposition of a measure into absolutely continuous and singular parts corresponds to the decomposition of a function of bounded variation into its absolutely continuous and singular parts. These connections make the differentiation theory of measures a unifying thread running through real analysis, harmonic analysis, and geometric measure theory.