Real Analysis

Sequences, continuity, differentiation, and Riemann/Lebesgue integration.


Real analysis is the rigorous foundation of calculus — the discipline that supplies precise definitions and airtight proofs for the intuitions that Newton and Leibniz set in motion in the seventeenth century. It is where the notion of “approaching a limit” is given an exact meaning, where the familiar rules of differentiation and integration are derived from first principles, and where the real number line is examined with enough care to reveal its subtle completeness properties. Studying real analysis is, above all, an exercise in mathematical maturity: it trains the mind to distrust intuition just enough, and to trust proof absolutely.

Foundations of the Real Number System

The story begins not with calculus but with numbers. The rational numbers $\mathbb{Q}$ — fractions $p/q$ with $p, q \in \mathbb{Z}$, $q \neq 0$ — seem, at first glance, to fill up the number line. The ancient Greeks believed precisely this until they proved, to their dismay, that $\sqrt{2}$ cannot be rational. The rationals have gaps, and those gaps are precisely the obstacle that prevents a naive theory of limits from working.

The real numbers $\mathbb{R}$ are constructed to fill those gaps. There are two classical constructions. Richard Dedekind, in his 1872 essay Stetigkeit und irrationale Zahlen, defined each real number as a Dedekind cut: a partition of $\mathbb{Q}$ into two non-empty sets $(A, B)$ such that every element of $A$ is less than every element of $B$ and $A$ has no largest element. The real number $\sqrt{2}$, for instance, corresponds to the cut where $A = \{q \in \mathbb{Q} : q \leq 0 \text{ or } q^2 < 2\}$. Simultaneously, Georg Cantor proposed defining real numbers as equivalence classes of Cauchy sequences of rationals — sequences whose terms become arbitrarily close to each other without necessarily converging to a rational limit. Both constructions yield the same object: an ordered field with the crucial additional property called completeness.

Formally, $\mathbb{R}$ is characterized as the unique complete ordered field. It satisfies the usual field axioms (addition, multiplication, their inverses), an ordering compatible with the field structure, and the Completeness Axiom (also called the Least Upper Bound Property): every non-empty subset of $\mathbb{R}$ that is bounded above has a supremum (least upper bound) in $\mathbb{R}$. This single axiom is what distinguishes $\mathbb{R}$ from $\mathbb{Q}$. The supremum of a set $S$ is written $\sup S$; the greatest lower bound is the infimum $\inf S$.

Two important consequences flow immediately from completeness. The Archimedean property states that for every real number $x$ there exists a natural number $n$ with $n > x$ — the natural numbers are not bounded above in $\mathbb{R}$. This rules out infinitely large or infinitely small elements. The density of the rationals states that between any two distinct real numbers $a < b$ there exists a rational number $q$ with $a < q < b$. Remarkably, there also exists an irrational number between $a$ and $b$, so both $\mathbb{Q}$ and $\mathbb{R} \setminus \mathbb{Q}$ are dense in $\mathbb{R}$, even though they have very different cardinalities (countable versus uncountable).

The absolute value $|x|$ measures distance from the origin: $|x| = x$ if $x \geq 0$ and $|x| = -x$ if $x < 0$. Its most important property is the triangle inequality:

$$|x + y| \leq |x| + |y| \quad \text{for all } x, y \in \mathbb{R}.$$

The triangle inequality is the engine that drives most convergence arguments. A variant, the reverse triangle inequality $\bigl||x| - |y|\bigr| \leq |x - y|$, is equally useful for bounding differences.

Sequences and Series

A sequence of real numbers is a function $a : \mathbb{N} \to \mathbb{R}$, written $(a_n)_{n=1}^\infty$ or simply $(a_n)$. The central question is: does the sequence settle down to a definite value? A sequence $(a_n)$ converges to a limit $L \in \mathbb{R}$, written $a_n \to L$ or $\lim_{n \to \infty} a_n = L$, if for every $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $|a_n - L| < \varepsilon$ for all $n > N$. In plain English: the terms of the sequence eventually stay within any prescribed distance $\varepsilon$ of $L$. This is the $\varepsilon$-$N$ definition, and mastering it — learning to produce an $N$ given an arbitrary $\varepsilon$ — is the first and most important technical skill of real analysis.
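
The $\varepsilon$-$N$ game can be played numerically. The following Python sketch (illustrative only, not part of any standard toolkit) uses the sequence $a_n = (2n+1)/n$, whose limit is $2$: since $|a_n - 2| = 1/n$, the choice $N = \lfloor 1/\varepsilon \rfloor$ always works.

```python
from math import floor

def a(n):
    """The sequence a_n = (2n + 1)/n, which converges to L = 2."""
    return (2 * n + 1) / n

def smallest_valid_N(epsilon):
    """A valid response in the epsilon-N game: |a_n - 2| = 1/n, so any
    n > 1/epsilon gives |a_n - 2| < epsilon; N = floor(1/epsilon) suffices."""
    return floor(1 / epsilon)

for eps in (0.1, 0.01, 0.001):
    N = smallest_valid_N(eps)
    # spot-check the guarantee on a stretch of terms past N
    assert all(abs(a(n) - 2) < eps for n in range(N + 1, N + 500))
    print(f"eps = {eps}: N = {N} works")
```

The essential point the code makes is the order of quantifiers: $\varepsilon$ is handed to us, and we must respond with an $N$ that works for every later index.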

Limits, when they exist, are unique. Convergent sequences are necessarily bounded: there exists $C > 0$ with $|a_n| \leq C$ for all $n$. The algebra of limits is familiar from calculus but requires proof: if $a_n \to L$ and $b_n \to M$, then $a_n + b_n \to L + M$, $a_n b_n \to LM$, and $a_n / b_n \to L/M$ provided $M \neq 0$.

Monotone sequences are particularly well-behaved. The Monotone Convergence Theorem states that every monotone increasing sequence that is bounded above converges, and every monotone decreasing sequence that is bounded below converges. This is a direct consequence of the completeness axiom: the supremum of the set of terms is the limit.

The Bolzano-Weierstrass Theorem (named for Bernard Bolzano and Karl Weierstrass, the latter of whom gave the modern rigorous treatment in the 1860s) asserts that every bounded sequence of real numbers has a convergent subsequence. A subsequence $(a_{n_k})_{k=1}^\infty$ is obtained by selecting an infinite increasing chain of indices $n_1 < n_2 < n_3 < \cdots$. Bolzano-Weierstrass is a compactness result in disguise, and it underpins the proofs of the Extreme Value Theorem and the Heine-Cantor Theorem.

A sequence $(a_n)$ is a Cauchy sequence if for every $\varepsilon > 0$ there exists $N$ such that $|a_m - a_n| < \varepsilon$ for all $m, n > N$. The terms become close to each other without reference to any proposed limit. The fundamental result is that, in $\mathbb{R}$, Cauchy sequences and convergent sequences are the same thing: a sequence converges if and only if it is Cauchy. This property, called completeness of $\mathbb{R}$, is what the Cantor construction is designed to guarantee.

An infinite series $\sum_{n=1}^\infty a_n$ is defined as the limit of the partial sums $S_N = \sum_{n=1}^N a_n$. The series converges if and only if $(S_N)$ converges as a sequence. A necessary condition for convergence is that $a_n \to 0$, but this condition is far from sufficient — the harmonic series $\sum 1/n$ diverges even though $1/n \to 0$, a fact first proved by Nicole Oresme in the fourteenth century. The standard convergence tests — comparison, ratio, root, and the alternating series test — each carve out sufficient conditions. A series $\sum a_n$ is absolutely convergent if $\sum |a_n|$ converges; absolute convergence implies convergence, and absolutely convergent series can be rearranged without changing their sum. Conditionally convergent series — those that converge but not absolutely — are far more delicate: by the Riemann Rearrangement Theorem, any conditionally convergent series can be rearranged to converge to any prescribed real number, or even to diverge to $\pm\infty$.
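
The Riemann Rearrangement Theorem has a constructive proof that is easy to animate. The sketch below (a Python illustration, with the greedy strategy and the target value chosen for demonstration) rearranges the alternating harmonic series $1 - \tfrac12 + \tfrac13 - \cdots$, whose usual sum is $\ln 2 \approx 0.693$, so that its partial sums approach $1.5$ instead: add unused positive terms until the sum exceeds the target, then unused negative terms until it drops below, and repeat.

```python
def rearranged_partial_sums(target, num_terms):
    """Greedy rearrangement of the conditionally convergent series
    1 - 1/2 + 1/3 - 1/4 + ... whose partial sums approach `target`."""
    pos = 1   # next odd denominator (positive terms 1/1, 1/3, 1/5, ...)
    neg = 2   # next even denominator (negative terms -1/2, -1/4, ...)
    s = 0.0
    for _ in range(num_terms):
        if s <= target:
            s += 1.0 / pos   # climb with the next positive term
            pos += 2
        else:
            s -= 1.0 / neg   # descend with the next negative term
            neg += 2
    return s

# The usual ordering sums to ln 2 ~ 0.693, but this rearrangement homes in on 1.5:
print(rearranged_partial_sums(1.5, 100_000))
```

The strategy works precisely because the positive and negative parts each diverge while the individual terms shrink to zero, which is exactly the situation of conditional convergence.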

Limits, Continuity, and Differentiation

For functions $f : \mathbb{R} \to \mathbb{R}$, the limit $\lim_{x \to c} f(x) = L$ means: for every $\varepsilon > 0$, there exists $\delta > 0$ such that $0 < |x - c| < \delta$ implies $|f(x) - L| < \varepsilon$. The condition $0 < |x - c|$ ensures we do not require anything about $f(c)$ itself. A function is continuous at $c$ if $\lim_{x \to c} f(x) = f(c)$ — the limit exists, the function is defined at $c$, and the two agree. Continuity has an equivalent sequential characterization: $f$ is continuous at $c$ if and only if $x_n \to c$ implies $f(x_n) \to f(c)$ for every sequence $(x_n)$.

Continuous functions on closed bounded intervals enjoy two celebrated properties. The Extreme Value Theorem (proved rigorously by Weierstrass) states that if $f : [a,b] \to \mathbb{R}$ is continuous, then $f$ attains its maximum and minimum values — there exist $c, d \in [a,b]$ with $f(c) \leq f(x) \leq f(d)$ for all $x \in [a,b]$. The Intermediate Value Theorem (traced to Bolzano’s 1817 paper Rein analytischer Beweis) states that if $f : [a,b] \to \mathbb{R}$ is continuous and $f(a) < k < f(b)$, then there exists $c \in (a,b)$ with $f(c) = k$. The Intermediate Value Theorem is the rigorous underpinning of root-finding algorithms: every continuous function that changes sign must have a zero.
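
The bisection method is the Intermediate Value Theorem turned into an algorithm: halve a sign-changing interval, keep the half that still changes sign, and repeat. A minimal Python sketch (function and tolerance chosen for illustration):

```python
def bisect_root(f, a, b, tol=1e-10):
    """Locate a zero of a continuous f on [a, b], given that f(a) and f(b)
    have opposite signs, by repeatedly halving the bracketing interval."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "f must change sign on [a, b]"
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if fa * fm < 0:      # the sign change survives in [a, m]
            b, fb = m, fm
        else:                # otherwise it survives in [m, b]
            a, fa = m, fm
    return (a + b) / 2

# sqrt(2) as the positive zero of x^2 - 2 on [1, 2]
root = bisect_root(lambda x: x * x - 2, 1.0, 2.0)
print(root)
```

Each iteration halves the interval, so the error after $k$ steps is at most $(b-a)/2^k$: the theorem guarantees a zero exists in every bracket, and completeness guarantees the nested brackets close in on it.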

Uniform continuity is a stronger condition: $f$ is uniformly continuous on a set $S$ if for every $\varepsilon > 0$ there exists a single $\delta > 0$ (independent of the point) such that $|x - y| < \delta$ implies $|f(x) - f(y)| < \varepsilon$ for all $x, y \in S$. The Heine-Cantor Theorem asserts that every continuous function on a closed bounded interval is uniformly continuous — a powerful result with no analogue on open intervals ($f(x) = 1/x$ on $(0,1)$ is continuous but not uniformly so).

The derivative of $f$ at a point $c$ is defined as the limit of the difference quotient:

$$f'(c) = \lim_{h \to 0} \frac{f(c+h) - f(c)}{h},$$

provided this limit exists. Differentiability at $c$ implies continuity at $c$, but not conversely — continuity is strictly weaker. The standard differentiation rules (sum, product, quotient, chain) are theorems, not axioms. The most important theorems in differential calculus are the mean value theorems. Rolle’s Theorem states that if $f$ is continuous on $[a,b]$, differentiable on $(a,b)$, and $f(a) = f(b)$, then there exists $c \in (a,b)$ with $f'(c) = 0$. The Lagrange Mean Value Theorem (the form most often called “the mean value theorem”) generalizes this:

$$f'(c) = \frac{f(b) - f(a)}{b - a} \quad \text{for some } c \in (a,b).$$

This single result implies that a function with a positive derivative on an interval is increasing, that a function with a zero derivative is constant, and that differentiable functions cannot oscillate faster than their derivative allows. Taylor’s Theorem extends the idea, approximating a sufficiently smooth function by a polynomial and providing an explicit formula for the remainder.
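
Taylor's Theorem can be checked numerically. The sketch below (Python, with $e^x$ chosen as the example) compares the degree-$n$ Taylor polynomial of $e^x$ about $0$ with the true value and verifies the Lagrange remainder bound $|e^x - P_n(x)| \leq e^{|x|}\,|x|^{n+1}/(n+1)!$:

```python
from math import exp, factorial

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x about 0: sum of x^k / k!."""
    return sum(x ** k / factorial(k) for k in range(n + 1))

x = 1.0
for n in (2, 5, 10):
    err = abs(exp(x) - taylor_exp(x, n))
    # Lagrange remainder with the derivative bounded by e^{|x|} on the interval
    bound = exp(abs(x)) * abs(x) ** (n + 1) / factorial(n + 1)
    assert err <= bound
    print(f"n = {n}: error {err:.2e} <= bound {bound:.2e}")
```

The factorial in the denominator is why the approximation improves so rapidly: the remainder shrinks faster than any geometric rate for fixed $x$.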

Riemann Integration

The Riemann integral, formalized by Bernhard Riemann in his 1854 Habilitation thesis Über die Darstellbarkeit einer Function durch eine trigonometrische Reihe, gives a precise meaning to the area under a curve. The construction begins with partitions: a partition $P$ of $[a,b]$ is a finite collection of points $a = x_0 < x_1 < \cdots < x_n = b$. For each subinterval $[x_{i-1}, x_i]$, define the upper sum $U(f,P) = \sum_{i=1}^n M_i (x_i - x_{i-1})$ where $M_i = \sup_{x \in [x_{i-1},x_i]} f(x)$, and the lower sum $L(f,P) = \sum_{i=1}^n m_i (x_i - x_{i-1})$ where $m_i = \inf_{x \in [x_{i-1},x_i]} f(x)$.

A bounded function $f$ is Riemann integrable on $[a,b]$ if the infimum of all upper sums equals the supremum of all lower sums:

$$\inf_P U(f,P) = \sup_P L(f,P) = \int_a^b f(x)\, dx.$$

The Riemann-Darboux criterion for integrability states that $f$ is integrable if and only if for every $\varepsilon > 0$ there exists a partition $P$ with $U(f,P) - L(f,P) < \varepsilon$. From this criterion, two large classes of integrable functions emerge: all continuous functions on $[a,b]$ are integrable, and all monotone functions on $[a,b]$ are integrable. More generally, bounded functions with only finitely many discontinuities are integrable, as are bounded functions whose set of discontinuities has measure zero.
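
The Darboux criterion is especially transparent for monotone functions, where the supremum and infimum on each subinterval sit at its endpoints. The Python sketch below (uniform partitions, $f(x) = x^2$ on $[0,1]$, all chosen for illustration) shows $U - L$ shrinking like $1/n$ while both sums squeeze the integral $1/3$:

```python
def darboux_sums(f, a, b, n):
    """Lower and upper Darboux sums of a monotone increasing f on [a, b]
    over the uniform partition into n subintervals: for increasing f, the
    inf on [x_i, x_{i+1}] is f(x_i) and the sup is f(x_{i+1})."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    lower = sum(f(xs[i]) * h for i in range(n))
    upper = sum(f(xs[i + 1]) * h for i in range(n))
    return lower, upper

# f(x) = x^2 on [0, 1]: U - L = (f(1) - f(0)) * (1 - 0) / n -> 0,
# so the Darboux criterion is satisfied and both sums squeeze 1/3.
for n in (10, 100, 1000):
    lo, up = darboux_sums(lambda x: x * x, 0.0, 1.0, n)
    print(n, lo, up, up - lo)
```

For an increasing $f$ on a uniform partition the gap telescopes exactly to $(f(b) - f(a))(b - a)/n$, which is the one-line proof that monotone functions are integrable.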

The relationship between differentiation and integration is codified in the two Fundamental Theorems of Calculus. The first theorem states that if $f$ is continuous on $[a,b]$ and $F(x) = \int_a^x f(t)\, dt$, then $F$ is differentiable and $F'(x) = f(x)$ — integration produces an antiderivative. The second theorem states that if $f$ is integrable on $[a,b]$ and $G$ is any antiderivative of $f$, then $\int_a^b f(x)\, dx = G(b) - G(a)$. Together, these theorems establish that differentiation and integration are inverse operations, a fact that Newton and Leibniz used instinctively but that required two centuries of effort to prove with full rigor.

Metric Spaces and Topology of Euclidean Space

Real analysis on the real line generalizes naturally to higher dimensions and to abstract spaces. A metric space is a pair $(X, d)$ where $X$ is a set and $d : X \times X \to [0, \infty)$ is a metric satisfying, for all $x, y, z \in X$: (i) $d(x,y) = 0$ if and only if $x = y$; (ii) $d(x,y) = d(y,x)$ (symmetry); (iii) $d(x,z) \leq d(x,y) + d(y,z)$ (triangle inequality). The Euclidean metric on $\mathbb{R}^n$ is $d(x,y) = \|x - y\| = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}$. Other examples include the discrete metric, the supremum metric $d_\infty(f,g) = \sup|f-g|$ on spaces of bounded functions, and the $p$-adic metric on the rationals.

An open ball of radius $r$ centered at $x$ is the set $B(x,r) = \{y \in X : d(x,y) < r\}$. A set $U \subseteq X$ is open if every point of $U$ has an open ball around it entirely contained in $U$. A set $C$ is closed if its complement is open — equivalently, if it contains all its limit points. The open sets define the topology of the metric space, encoding the notion of nearness.

The critical topological notion for analysis is compactness. A subset $K$ of a metric space is compact if every open cover of $K$ has a finite subcover. In $\mathbb{R}^n$, the Heine-Borel Theorem gives a much simpler characterization: a subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded. Compact sets are the natural domain for the strongest theorems: continuous functions on compact sets are uniformly continuous (Heine-Cantor), attain their extreme values (Extreme Value Theorem), and have compact images.

A subset $S$ of a metric space is connected if it cannot be split into two disjoint non-empty sets that are both open in $S$ (in the subspace topology). In $\mathbb{R}$, the connected sets are precisely the intervals (including rays and the whole line). A stronger notion is path connectedness: $S$ is path connected if any two points of $S$ can be joined by a continuous path lying entirely in $S$. Path connectedness implies connectedness, but the converse fails in general.

A metric space is complete if every Cauchy sequence in it converges to a point of the space. Euclidean space $\mathbb{R}^n$ is complete; the rationals $\mathbb{Q}$ are not. Completeness is the precise property that prevents sequences from converging to “missing” points. The Banach Fixed Point Theorem (also called the Contraction Mapping Theorem) states that every contraction on a non-empty complete metric space has a unique fixed point, and that iterating the contraction from any starting point converges to it. This theorem is the backbone of many existence proofs in differential equations and numerical analysis.
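
The fixed-point iteration in Banach's theorem is directly executable. As a small Python illustration (the map $\cos$ and the starting point are chosen for demonstration): $\cos$ maps $[0,1]$ into itself and $|\cos'(x)| = |\sin x| \leq \sin 1 < 1$ there, so $\cos$ is a contraction on the complete space $[0,1]$ and the iterates converge to its unique fixed point.

```python
from math import cos

def fixed_point_iterate(g, x0, tol=1e-12, max_iter=1000):
    """Iterate a contraction g from x0 until successive iterates agree to tol."""
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("did not converge within max_iter steps")

# Unique solution of cos(p) = p in [0, 1], approx. 0.7390851
p = fixed_point_iterate(cos, 0.5)
print(p)
```

The contraction constant $q = \sin 1 \approx 0.84$ also gives an a priori error estimate: after $k$ steps the error is at most $q^k/(1-q)$ times the first step, which is why the loop terminates well inside its iteration budget.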

Sequences and Series of Functions

When functions, rather than numbers, form the terms of a sequence, a new subtlety emerges: there are two distinct notions of convergence. A sequence of functions $(f_n)$ on a set $S$ converges pointwise to $f$ if $f_n(x) \to f(x)$ for each fixed $x \in S$ — a separate limit condition at each point. It converges uniformly to $f$ if for every $\varepsilon > 0$ there exists $N$ (independent of $x$) such that $|f_n(x) - f(x)| < \varepsilon$ for all $n > N$ and all $x \in S$. Uniform convergence is a single condition on the entire function at once, while pointwise convergence permits the rate of convergence to vary arbitrarily from point to point.

The distinction matters enormously because pointwise limits can destroy properties that each individual function possesses. A pointwise limit of continuous functions need not be continuous: the sequence $f_n(x) = x^n$ on $[0,1]$ converges pointwise to the function that is $0$ on $[0,1)$ and $1$ at $x = 1$, which is discontinuous. Under uniform convergence, however, the limit of a sequence of continuous functions is continuous, the limit can be integrated term by term, and (with an additional condition on the derivatives) the limit can be differentiated term by term. These exchange-of-limits results are among the most useful theorems in analysis, and their failure under mere pointwise convergence is one of the lessons that the nineteenth century had to learn painfully.
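
The failure of uniformity for $f_n(x) = x^n$ can be exhibited with a single witness point. For every $n$, the point $x_n = 2^{-1/n}$ lies in $[0,1)$ and satisfies $f_n(x_n) = \tfrac12$, while the pointwise limit is $0$ there; hence $\sup_{x \in [0,1)} |f_n(x) - 0| \geq \tfrac12$ for all $n$, and no single $N$ can serve every $x$. A quick Python check (illustrative only):

```python
# For each n, x_n = 2^(-1/n) is in [0, 1) and f_n(x_n) = (2^(-1/n))^n = 1/2,
# so the sup-distance between f_n and the pointwise limit never drops below 1/2.
for n in (1, 10, 100, 10_000):
    x_n = 2 ** (-1 / n)
    gap = x_n ** n          # |f_n(x_n) - limit(x_n)| = |1/2 - 0|
    print(n, x_n, gap)
    assert abs(gap - 0.5) < 1e-9
```

As $n$ grows the witness point slides toward $1$, which is exactly the picture behind the theorem: the trouble concentrates near the endpoint where the limit function jumps.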

A series of functions $\sum_{n=1}^\infty f_n$ converges uniformly if its partial sums converge uniformly. The Weierstrass M-test gives a clean sufficient condition: if $|f_n(x)| \leq M_n$ for all $x$ and $\sum M_n < \infty$, then $\sum f_n$ converges uniformly and absolutely. Power series $\sum_{n=0}^\infty c_n (x-a)^n$ are the most important examples: each power series has a radius of convergence $R$ (given by the Cauchy-Hadamard formula $1/R = \limsup_{n \to \infty} |c_n|^{1/n}$) such that the series converges absolutely on $(a-R, a+R)$, uniformly on compact subsets of that interval, and diverges for $|x-a| > R$. Within its interval of convergence, a power series can be differentiated and integrated term by term, and it defines a function whose Taylor coefficients are exactly $c_n = f^{(n)}(a)/n!$. This is the theory of real analytic functions, a precursor to complex analysis.

The Weierstrass Approximation Theorem (1885) states that every continuous function on a closed bounded interval can be approximated uniformly by polynomials. The theorem was a surprise: polynomials are special, yet they are dense in the space of all continuous functions. Karl Weierstrass proved the theorem constructively; later, Sergei Bernstein gave an elegant probabilistic proof in 1912 using the polynomials now bearing his name. The Stone-Weierstrass Theorem vastly generalizes the result, replacing polynomials by any subalgebra of continuous functions that separates points and contains constants.
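
Bernstein's proof is constructive enough to run. The sketch below (Python; the target function $f(x) = |x - \tfrac12|$, chosen because it is continuous but not differentiable, is my example) evaluates the degree-$n$ Bernstein polynomial $B_n(f)(x) = \sum_{k=0}^n f(k/n)\binom{n}{k}x^k(1-x)^{n-k}$ and estimates the uniform error on a grid:

```python
from math import comb

def bernstein(f, n, x):
    """Value at x of the degree-n Bernstein polynomial of f on [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)   # continuous, not differentiable at 1/2
grid = [k / 200 for k in range(201)]
for n in (10, 100, 400):
    err = max(abs(bernstein(f, n, x) - f(x)) for x in grid)
    print(n, err)
```

The decay is slow (roughly like $1/\sqrt{n}$ for this kink), which is typical: Bernstein polynomials converge uniformly for every continuous $f$, but not quickly; their virtue is universality, not speed.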

Introduction to Lebesgue Integration

The Riemann integral has a fundamental limitation: it struggles with functions that oscillate wildly or have too many discontinuities. The Dirichlet function — defined to be $1$ on the rationals and $0$ on the irrationals — is not Riemann integrable on any interval, even though it “should” have integral $0$ (the rationals are negligible). The Riemann integral is also poorly behaved with respect to limits: pointwise limits of Riemann integrable functions need not be Riemann integrable, and even when they are, one cannot always exchange the limit and the integral.

Henri Lebesgue, in his 1902 doctoral thesis Intégrale, longueur, aire, introduced a radically different approach. Instead of partitioning the domain (the $x$-axis) as Riemann did, Lebesgue partitioned the range (the $y$-axis) and measured the size of the set of $x$-values where the function takes values in each strip. This requires a theory of measure: a way to assign a “size” to subsets of $\mathbb{R}$ that generalizes the length of intervals.

A $\sigma$-algebra on a set $X$ is a collection $\mathcal{F}$ of subsets of $X$ that contains $X$ itself and is closed under complements and countable unions (and hence countable intersections). A measure $\mu : \mathcal{F} \to [0, \infty]$ is a function satisfying $\mu(\emptyset) = 0$ and countable additivity: if $A_1, A_2, \ldots$ are pairwise disjoint sets in $\mathcal{F}$, then $\mu\bigl(\bigcup_{n=1}^\infty A_n\bigr) = \sum_{n=1}^\infty \mu(A_n)$. The Lebesgue measure $\lambda$ on $\mathbb{R}$ is the unique measure on the Borel $\sigma$-algebra that assigns $\lambda([a,b]) = b - a$ to every interval. Sets of Lebesgue measure zero — null sets — can be ignored for the purposes of integration. The Cantor set is a striking example: it is uncountable yet has Lebesgue measure zero.
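
The Cantor set computation is a one-line exercise in countable additivity. At stage $n$ of the middle-thirds construction, $2^{n-1}$ disjoint open intervals of length $3^{-n}$ are removed, and $\sum_{n=1}^\infty 2^{n-1}/3^n = 1$; since the removed intervals are disjoint and measurable, what remains has measure $1 - 1 = 0$. A quick Python check of the partial sums (illustrative only):

```python
# Total length removed in the middle-thirds construction:
# stage n removes 2^(n-1) disjoint intervals of length 3^(-n).
removed = sum(2 ** (n - 1) / 3 ** n for n in range(1, 60))
print(removed)        # partial sums approach 1
print(1 - removed)    # so the Cantor set itself has measure approaching 0
```

The punchline is the mismatch of sizes: an uncountable set (the Cantor set has the cardinality of the continuum) can still be null, so cardinality and measure are genuinely different notions of "bigness".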

A function $f : \mathbb{R} \to \mathbb{R}$ is Lebesgue measurable if the preimage $f^{-1}((a,\infty))$ is a measurable set for every $a \in \mathbb{R}$. The Lebesgue integral of a non-negative measurable function is built up in stages: first for simple functions (finite linear combinations of indicator functions of measurable sets), then for general non-negative functions as a supremum over the simple functions lying below them, and finally for general functions by splitting $f = f^+ - f^-$ into positive and negative parts.

The power of Lebesgue integration lies in its convergence theorems. The Monotone Convergence Theorem states that if $(f_n)$ is a sequence of non-negative measurable functions increasing pointwise to $f$, then $\int f_n \, d\lambda \to \int f \, d\lambda$. Fatou’s Lemma gives the one-sided bound $\int \liminf f_n \, d\lambda \leq \liminf \int f_n \, d\lambda$. The crown jewel is the Dominated Convergence Theorem: if $f_n \to f$ pointwise almost everywhere and $|f_n| \leq g$ for an integrable dominating function $g$, then $\int f_n \, d\lambda \to \int f \, d\lambda$. This theorem is the rigorous tool that makes the exchange of limits and integrals legitimate, and it is indispensable in functional analysis, probability theory, and the theory of partial differential equations.

Every Riemann integrable function is Lebesgue integrable, and the two integrals agree. The Lebesgue integral is strictly more general: the Dirichlet function is Lebesgue integrable with integral $0$. The precise characterization of Riemann integrability in Lebesgue’s language is elegant: a bounded function on $[a,b]$ is Riemann integrable if and only if its set of discontinuities has Lebesgue measure zero. Real analysis, which began with the effort to make calculus rigorous, ends by opening the door to measure theory, functional analysis, and modern probability — the landscape of twentieth-century mathematics.