Probabilistic & Algebraic Methods
The probabilistic method, Lovász local lemma, and symmetric functions.
Probabilistic combinatorics is the art of proving that certain mathematical objects must exist — not by constructing them explicitly, but by demonstrating that a randomly chosen object has the desired properties with positive probability. This counter-intuitive approach, developed in the mid-twentieth century largely by Paul Erdős, transformed combinatorics from a discipline of direct construction into one where randomness itself becomes a proof technique of extraordinary power. The methods that emerged — concentration inequalities, local lemmas, threshold functions, and derandomization — now permeate theoretical computer science, information theory, and statistical physics.
The Probabilistic Method
The probabilistic method rests on a single observation: if a random object from some probability space has a positive probability of satisfying a desired property, then at least one such object exists. Conversely, if the expected value of some quantity is less than a threshold, then there exists a choice where the quantity is below that threshold. This sounds almost trivially obvious, yet it unlocks existence proofs for objects that are utterly inaccessible to explicit construction.
The method was essentially born in 1947 when Erdős used it to establish a lower bound on Ramsey numbers. Recall that denotes the smallest such that every red-blue coloring of the edges of the complete graph contains a monochromatic . Erdős asked: how large can be while still allowing a coloring with no monochromatic ? He colored the edges of uniformly at random, each edge independently red or blue with equal probability. For a fixed set of vertices, the probability that all edges among are the same color is . By the union bound over all such sets:
When this expression is less than 1, a valid coloring exists. Choosing makes the right-hand side go below 1, giving the lower bound
This is the first moment method: bound the expected number of “bad” events and show it is less than 1. When the expected number of violations is less than 1, the probability that no violation occurs is positive, so a good configuration exists.
The second moment method strengthens this. Let count the number of desired structures in a random object. If but one suspects concentrates, the Chebyshev inequality provides the key tool. Since , it suffices to show that the variance is small relative to the square of the mean. This approach was crucial in establishing when random graphs acquire specific properties: for instance, showing that the expected number of triangles in grows while their variance stays comparably small reveals that triangles are ubiquitous rather than concentrated in rare, anomalous graphs.
The probabilistic method also applies in the alterations style: start with a random object, compute the expected number of defects, and then alter or delete a few elements to eliminate all defects at a controllable cost. This technique yields tight bounds in problems ranging from independent sets in triangle-free graphs to discrepancy theory.
Lovász Local Lemma and Concentration Inequalities
The union bound used in the basic probabilistic method is often too crude. When there are many bad events each with small probability , the union bound gives , which can exceed 1 even when the events are nearly independent and the probability of avoiding all of them is actually close to 1. The Lovász Local Lemma, discovered by László Lovász and Paul Erdős in 1975, precisely handles this scenario.
The Local Lemma requires that each bad event is dependent on only a few others. Formally, given events in a probability space, construct a dependency graph on vertex set where and are connected by an edge if they are not independent of each other. If each event has probability at most and each vertex in the dependency graph has degree at most , and if the condition
holds (where is Euler’s number), then with positive probability none of the bad events occur:
The remarkable feature is that this holds regardless of how many events there are — as long as the local dependence structure stays sparse. The proof proceeds by induction, carefully tracking conditional probabilities. A lopsided (or asymmetric) version replaces the uniform bound with event-specific probabilities and allows for a more refined dependency condition: if there exist such that for all , then .
The Local Lemma has stunning applications in combinatorics. For hypergraph 2-coloring: a -uniform hypergraph is 2-colorable (its vertices can be colored red and blue so that no edge is monochromatic) whenever every edge shares vertices with at most other edges. This follows from the symmetric LLL applied to the events “edge is monochromatic” under a random 2-coloring, since and for large . For -SAT: any -CNF formula where every clause shares variables with at most other clauses is satisfiable.
Alongside the LLL, concentration inequalities are the workhorses of modern probabilistic combinatorics. The Chernoff bounds quantify how tightly a sum of independent Bernoulli random variables concentrates around its mean. If with each independent and , then for any :
In many settings the simpler bound suffices for . The Azuma-Hoeffding inequality extends concentration to functions of many independent variables that change by at most when the -th variable changes, without requiring the function to be a sum:
This martingale method — exposing variables one at a time and tracking how the conditional expectation drifts — underlies proofs of concentration in random graphs, random colorings, and random permutations. The method of bounded differences (also called the Azuma-McDiarmid inequality) is the principal tool when independence is present but the structure is not simply additive.
Random Structures and Threshold Functions
One of the most beautiful phenomena in probabilistic combinatorics is the emergence of sharp threshold functions: as a parameter varies continuously, a graph property transitions from almost certainly absent to almost certainly present in an interval of vanishing relative width. This was formalized by Erdős and Rényi in their landmark 1960 paper on random graphs, which launched the systematic study of , the Erdős-Rényi random graph in which each of the potential edges is included independently with probability .
A monotone property (one closed under the addition of edges) has a threshold if
as . The celebrated result of Bollobás and Thomason (1987) is that every monotone graph property has a threshold. Many thresholds are explicitly known. The threshold for containing a triangle is , meaning triangles appear around the time the average degree reaches 1. The threshold for connectivity is : below this, isolated vertices remain; above it, the graph is almost surely connected.
The most dramatic threshold is the giant component phenomenon, the centerpiece of Erdős-Rényi theory. When with , all connected components of have size with high probability. At the largest component has size . When , a unique giant component of size — containing a fraction of all vertices, where is determined by — emerges, while all other components remain logarithmically small. This transition at is a genuine phase transition, with direct analogies to percolation in statistical physics.
Random regular graphs present a richer structure: in , the uniform distribution over -regular graphs on vertices (with even), properties like Hamiltonicity, chromatic number concentration, and expansion are more accessible than in . It is known that for , is almost surely Hamiltonian and has chromatic number approximately . The configuration model — a generalization that allows any degree sequence — enables analysis of scale-free and power-law random graphs, which better model real-world networks.
Sharp thresholds occur when the transition window is truly narrow: jumps from near 0 to near 1 within a -interval of width . This is related to the influential KKL theorem (Kahn, Kalai, Linial, 1988), which shows that for Boolean functions computed by monotone properties, there always exists a variable with high influence — meaning changing that coordinate significantly affects the outcome. Sharp thresholds then follow from hyper-contractive inequalities applied to the Fourier expansion of the indicator of , a deep connection between combinatorics, harmonic analysis, and complexity theory.
Derandomization Techniques
The probabilistic method is phenomenally effective at proving existence, but it rarely produces explicit objects. Derandomization transforms probabilistic proofs into deterministic constructions, yielding explicit combinatorial objects with the properties guaranteed by the probabilistic argument.
The most fundamental derandomization tool is the method of conditional expectations, developed by Erdős and Selfridge in the context of discrepancy and refined by many others. The idea is to expose the random choices one at a time — fixing the first variable to the value that keeps the conditional expected number of violations below its starting value, then fixing the second variable, and so on. At each step, at least one choice maintains or improves the quantity being tracked. After fixing all variables, the resulting deterministic object satisfies the required property. This method applies whenever the probabilistic argument uses only the first moment: if where counts violations, the conditional expectation framework produces an explicit zero-violation configuration in polynomial time, provided that conditional expectations can be computed efficiently.
When the Local Lemma is involved, derandomization is harder — Moser and Tardos (2010) gave a breakthrough algorithmic version of the LLL that runs in expected polynomial time. The algorithm is strikingly simple: repeat any violated constraint by resampling all variables in its scope uniformly at random, until no constraint is violated. Moser-Tardos analysis proves that this terminates quickly using an entropy argument — a clever counting of “witness trees” that would need to exist if the algorithm ran for too long. The constructive LLL has been applied to k-SAT, graph coloring, and hypergraph coloring, turning existential guarantees into polynomial-time algorithms.
Pairwise independent hash families and -biased sets provide another route to derandomization. Many probabilistic arguments use only pairwise or -wise independence rather than full independence; replacing a fully random sequence with a -wise independent one obtained from an algebraic construction reduces the randomness requirement from bits to bits. Pseudorandom generators (PRGs) formalize this: a deterministic function is a PRG for a class of tests if no test in the class can distinguish ‘s output from a truly random string. Small-bias PRGs (Naor-Reingold), -nets, and sample and modify approaches all give explicit objects matching probabilistic bounds.
The discrepancy method connects derandomization to a beautiful geometric picture. The discrepancy of a set system under a coloring is ; the combinatorial discrepancy is the minimum over all colorings. Spencer’s theorem (1985) uses the probabilistic method via partial coloring and entropy to show that any sets on points have discrepancy at most — a bound independent of that is entirely non-constructive. The spectacular 2010 result by Bansal gave a polynomial-time algorithm achieving this bound by solving a semidefinite program guided by the Lovász theta function, converting the probabilistic existence proof into an efficient construction via convex optimization.
Symmetrization and algebraic constructions also play a central role: the algebraic geometry codes of Goppa, explicit Ramsey graphs from finite geometry, and Paley graphs (defined via quadratic residues in finite fields) all provide explicit combinatorial objects whose properties match or approach what the probabilistic method guarantees. The Paley graph on for a prime power , where vertices are field elements and edges connect and if is a nonzero square, is a -regular graph known to be strongly pseudo-random. Its clique and independence numbers are both by character sum estimates, matching the probabilistic lower bound up to logarithmic factors. Constructing an explicit graph achieving the probabilistic lower bound remains one of the great open problems in combinatorics.
The confluence of the probabilistic method, concentration inequalities, and derandomization has become one of the richest and most active corners of modern mathematics — a field where elegant non-constructive existence arguments and hard-won explicit constructions illuminate each other, and where combinatorics, probability theory, harmonic analysis, and theoretical computer science have developed a common language.